public class KeyValueGroupedDataset<K,V>
extends Object
implements scala.Serializable
Dataset
has been logically grouped by a user specified grouping key. Users should not
construct a KeyValueGroupedDataset
directly, but should instead call groupByKey
on
an existing Dataset
.
Modifier and Type | Method and Description |
---|---|
<U1> Dataset<scala.Tuple2<K,U1>> |
agg(TypedColumn<V,U1> col1)
Computes the given aggregation, returning a
Dataset of tuples for each unique key
and the result of computing this aggregation over all elements in the group. |
<U1,U2> Dataset<scala.Tuple3<K,U1,U2>> |
agg(TypedColumn<V,U1> col1,
TypedColumn<V,U2> col2)
Computes the given aggregations, returning a
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group. |
<U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>> |
agg(TypedColumn<V,U1> col1,
TypedColumn<V,U2> col2,
TypedColumn<V,U3> col3)
Computes the given aggregations, returning a
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group. |
<U1,U2,U3,U4> |
agg(TypedColumn<V,U1> col1,
TypedColumn<V,U2> col2,
TypedColumn<V,U3> col3,
TypedColumn<V,U4> col4)
Computes the given aggregations, returning a
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group. |
<U,R> Dataset<R> |
cogroup(KeyValueGroupedDataset<K,U> other,
CoGroupFunction<K,V,U,R> f,
Encoder<R> encoder)
(Java-specific)
Applies the given function to each cogrouped data.
|
<U,R> Dataset<R> |
cogroup(KeyValueGroupedDataset<K,U> other,
scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.TraversableOnce<R>> f,
Encoder<R> evidence$5)
(Scala-specific)
Applies the given function to each cogrouped data.
|
Dataset<scala.Tuple2<K,Object>> |
count()
Returns a
Dataset that contains a tuple with each key and the number of items present
for that key. |
<U> Dataset<U> |
flatMapGroups(FlatMapGroupsFunction<K,V,U> f,
Encoder<U> encoder)
(Java-specific)
Applies the given function to each group of data.
|
<U> Dataset<U> |
flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.TraversableOnce<U>> f,
Encoder<U> evidence$3)
(Scala-specific)
Applies the given function to each group of data.
|
<L> KeyValueGroupedDataset<L,V> |
keyAs(Encoder<L> evidence$1)
Returns a new
KeyValueGroupedDataset where the type of the key has been mapped to the
specified type. |
Dataset<K> |
keys()
Returns a
Dataset that contains each unique key. |
<U> Dataset<U> |
mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f,
Encoder<U> evidence$4)
(Scala-specific)
Applies the given function to each group of data.
|
<U> Dataset<U> |
mapGroups(MapGroupsFunction<K,V,U> f,
Encoder<U> encoder)
(Java-specific)
Applies the given function to each group of data.
|
<W> KeyValueGroupedDataset<K,W> |
mapValues(scala.Function1<V,W> func,
Encoder<W> evidence$2)
Returns a new
KeyValueGroupedDataset where the given function func has been applied
to the data. |
<W> KeyValueGroupedDataset<K,W> |
mapValues(MapFunction<V,W> func,
Encoder<W> encoder)
Returns a new
KeyValueGroupedDataset where the given function func has been applied
to the data. |
org.apache.spark.sql.execution.QueryExecution |
queryExecution() |
Dataset<scala.Tuple2<K,V>> |
reduceGroups(scala.Function2<V,V,V> f)
(Scala-specific)
Reduces the elements of each group of data using the specified binary function.
|
Dataset<scala.Tuple2<K,V>> |
reduceGroups(ReduceFunction<V> f)
(Java-specific)
Reduces the elements of each group of data using the specified binary function.
|
public org.apache.spark.sql.execution.QueryExecution queryExecution()
public <L> KeyValueGroupedDataset<L,V> keyAs(Encoder<L> evidence$1)
KeyValueGroupedDataset
where the type of the key has been mapped to the
specified type. The mapping of key columns to the type follows the same rules as as
on
Dataset
.
evidence$1
- (undocumented)public <W> KeyValueGroupedDataset<K,W> mapValues(scala.Function1<V,W> func, Encoder<W> evidence$2)
KeyValueGroupedDataset
where the given function func
has been applied
to the data. The grouping key is unchanged by this.
// Create values grouped by key from a Dataset[(K, V)]
ds.groupByKey(_._1).mapValues(_._2) // Scala
func
- (undocumented)evidence$2
- (undocumented)public <W> KeyValueGroupedDataset<K,W> mapValues(MapFunction<V,W> func, Encoder<W> encoder)
KeyValueGroupedDataset
where the given function func
has been applied
to the data. The grouping key is unchanged by this.
// Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
Dataset<Tuple2<String, Integer>> ds = ...;
KeyValueGroupedDataset<String, Integer> grouped =
ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT()); // Java 8
func
- (undocumented)encoder
- (undocumented)public Dataset<K> keys()
Dataset
that contains each unique key. This is equivalent to doing mapping
over the Dataset to extract the keys and then running a distinct operation on those.
public <U> Dataset<U> flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.TraversableOnce<U>> f, Encoder<U> evidence$3)
Dataset
.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset
. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
org.apache.spark.sql.expressions#Aggregator
.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList
) unless they are sure that this is possible given the memory
constraints of their cluster.
f
- (undocumented)evidence$3
- (undocumented)public <U> Dataset<U> flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)
Dataset
.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset
. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
org.apache.spark.sql.expressions#Aggregator
.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList
) unless they are sure that this is possible given the memory
constraints of their cluster.
f
- (undocumented)encoder
- (undocumented)public <U> Dataset<U> mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder<U> evidence$4)
Dataset
.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset
. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
org.apache.spark.sql.expressions#Aggregator
.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList
) unless they are sure that this is possible given the memory
constraints of their cluster.
f
- (undocumented)evidence$4
- (undocumented)public <U> Dataset<U> mapGroups(MapGroupsFunction<K,V,U> f, Encoder<U> encoder)
Dataset
.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset
. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
org.apache.spark.sql.expressions#Aggregator
.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList
) unless they are sure that this is possible given the memory
constraints of their cluster.
f
- (undocumented)encoder
- (undocumented)public Dataset<scala.Tuple2<K,V>> reduceGroups(scala.Function2<V,V,V> f)
f
- (undocumented)public Dataset<scala.Tuple2<K,V>> reduceGroups(ReduceFunction<V> f)
f
- (undocumented)public <U1> Dataset<scala.Tuple2<K,U1>> agg(TypedColumn<V,U1> col1)
Dataset
of tuples for each unique key
and the result of computing this aggregation over all elements in the group.
col1
- (undocumented)public <U1,U2> Dataset<scala.Tuple3<K,U1,U2>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)
Dataset
of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
col1
- (undocumented)col2
- (undocumented)public <U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)
Dataset
of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
col1
- (undocumented)col2
- (undocumented)col3
- (undocumented)public <U1,U2,U3,U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)
Dataset
of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
col1
- (undocumented)col2
- (undocumented)col3
- (undocumented)col4
- (undocumented)public Dataset<scala.Tuple2<K,Object>> count()
Dataset
that contains a tuple with each key and the number of items present
for that key.
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.TraversableOnce<R>> f, Encoder<R> evidence$5)
Dataset
this
and other
. The function can return an iterator containing elements of an
arbitrary type which will be returned as a new Dataset
.
other
- (undocumented)f
- (undocumented)evidence$5
- (undocumented)public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)
Dataset
this
and other
. The function can return an iterator containing elements of an
arbitrary type which will be returned as a new Dataset
.
other
- (undocumented)f
- (undocumented)encoder
- (undocumented)