Compute aggregates by specifying a series of aggregate columns. Unlike other methods in this class, the resulting DataFrame won't automatically include the grouping columns.
The available aggregate methods are defined in org.apache.spark.sql.functions.
  // Selects the age of the oldest employee and the aggregate expense for each department

  // Scala:
  import org.apache.spark.sql.functions._
  df.groupBy("department").agg($"department", max($"age"), sum($"expense"))

  // Java:
  import static org.apache.spark.sql.functions.*;
  df.groupBy("department").agg(col("department"), max(col("age")), sum(col("expense")));
(Java-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, count.
  // Selects the age of the oldest employee and the aggregate expense for each department
  import com.google.common.collect.ImmutableMap;
  df.groupBy("department").agg(ImmutableMap.<String, String>builder()
    .put("age", "max")
    .put("expense", "sum")
    .build());
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, count.
  // Selects the age of the oldest employee and the aggregate expense for each department
  df.groupBy("department").agg(Map(
    "age" -> "max",
    "expense" -> "sum"
  ))
(Scala-specific) Compute aggregates by specifying pairs of column name and aggregate method directly, without wrapping them in a Map. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, count.
  // Selects the age of the oldest employee and the aggregate expense for each department
  df.groupBy("department").agg(
    "age" -> "max",
    "expense" -> "sum"
  )
Compute the mean value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, compute the mean only for those columns.
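The difference between calling the mean aggregate with and without column names can be sketched as follows. This is a minimal illustration, not the library's own example: it assumes a Spark version in which avg accepts optional column names, a local master, and made-up sample data. The object name GroupedAvgSketch and its helpers sampleData and meanExpense are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object GroupedAvgSketch {
  // Hypothetical sample table: (department, age, expense).
  def sampleData(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq(
      ("eng", 30, 100.0),
      ("eng", 40, 300.0),
      ("ops", 50, 200.0)
    )).toDF("department", "age", "expense")
  }

  // Mean expense per department, collected to the driver as department -> mean.
  // The grouping column comes first in each result row, the aggregate second.
  def meanExpense(df: DataFrame): Map[String, Double] =
    df.groupBy("department").avg("expense")
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("grouped-avg-sketch")
    val sc = new SparkContext(conf)
    val df = sampleData(new SQLContext(sc))

    // No column names: one mean per numeric column (age and expense here).
    df.groupBy("department").avg().show()

    // With a column name: only the mean of "expense" is computed.
    println(meanExpense(df))

    sc.stop()
  }
}
```

With this sample data, both departments have a mean expense of 200.0. Collecting to a Map is only reasonable here because the grouped result is tiny; on real data the aggregated DataFrame would normally stay distributed.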
Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.
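As a rough illustration of the row count per group, the sketch below groups a small, made-up table by department and reads back the counts. It assumes Spark is on the classpath and a local master is acceptable; GroupedCountSketch, sampleData, and countByDepartment are hypothetical names introduced only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object GroupedCountSketch {
  // Hypothetical sample table: (department, age).
  def sampleData(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq(
      ("eng", 30),
      ("eng", 40),
      ("ops", 50)
    )).toDF("department", "age")
  }

  // Number of rows per department, collected as department -> count.
  // The result has the grouping column first, followed by a Long count column.
  def countByDepartment(df: DataFrame): Map[String, Long] =
    df.groupBy("department").count()
      .collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("grouped-count-sketch")
    val sc = new SparkContext(conf)
    val df = sampleData(new SQLContext(sc))

    println(countByDepartment(df))

    sc.stop()
  }
}
```

For the sample rows above, "eng" maps to 2 and "ops" maps to 1.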
Compute the max value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, compute the max only for those columns.
Compute the average value for each numeric column for each group. This is an alias for avg. The resulting DataFrame will also contain the grouping columns. When columns are specified, compute the average only for those columns.
Compute the min value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, compute the min only for those columns.
Compute the sum for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, compute the sum only for those columns.
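A per-group sum restricted to one column might look like the sketch below; max and min follow the same calling pattern. This assumes a Spark version in which these methods accept optional column names, a local master, and invented sample data. GroupedSumSketch, sampleData, and totalExpense are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object GroupedSumSketch {
  // Hypothetical sample table: (department, age, expense).
  def sampleData(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq(
      ("eng", 30, 100.0),
      ("eng", 40, 300.0),
      ("ops", 50, 200.0)
    )).toDF("department", "age", "expense")
  }

  // Total expense per department, collected as department -> total.
  def totalExpense(df: DataFrame): Map[String, Double] =
    df.groupBy("department").sum("expense")
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("grouped-sum-sketch")
    val sc = new SparkContext(conf)
    val df = sampleData(new SQLContext(sc))

    // Only "expense" is summed; "age" is left out of the result.
    println(totalExpense(df))

    // Same pattern for the largest and smallest age per department.
    df.groupBy("department").max("age").show()
    df.groupBy("department").min("age").show()

    sc.stop()
  }
}
```

For the sample rows, the totals are 400.0 for "eng" and 200.0 for "ops".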
:: Experimental :: A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.