pyspark.pandas.groupby.GroupBy.head#
- GroupBy.head(n=5)[source]#
Return first n rows of each group.
- Returns
- DataFrame or Series
Examples
>>> df = ps.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3], ... 'b': [2, 3, 1, 4, 6, 9, 8, 10, 7, 5], ... 'c': [3, 5, 2, 5, 1, 2, 6, 4, 3, 6]}, ... columns=['a', 'b', 'c'], ... index=[7, 2, 4, 1, 3, 4, 9, 10, 5, 6]) >>> df a b c 7 1 2 3 2 1 3 5 4 1 1 2 1 1 4 5 3 2 6 1 4 2 9 2 9 2 8 6 10 3 10 4 5 3 7 3 6 3 5 6
>>> df.groupby('a').head(2).sort_index() a b c 2 1 3 5 3 2 6 1 4 2 9 2 5 3 7 3 7 1 2 3 10 3 10 4
>>> df.groupby('a')['b'].head(2).sort_index() 2 3 3 6 4 9 5 7 7 2 10 10 Name: b, dtype: int64
Supports Groupby positional indexing Since pandas on Spark 3.4 (with pandas 1.4+):
>>> df = ps.DataFrame([["g", "g0"], ... ["g", "g1"], ... ["g", "g2"], ... ["g", "g3"], ... ["h", "h0"], ... ["h", "h1"]], columns=["A", "B"]) >>> df.groupby("A").head(-1) A B 0 g g0 1 g g1 2 g g2 4 h h0