pyspark.pandas.groupby.GroupBy.sum
- GroupBy.sum(numeric_only=False, min_count=0)
Compute the sum of group values.
New in version 3.3.0.
- Parameters
- numeric_only : bool, default False
Include only float, int, boolean columns.
New in version 3.4.0.
Changed in version 4.0.0.
- min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
New in version 3.4.0.
Notes
There is a behavior difference between pandas-on-Spark and pandas:
- when there is a non-numeric aggregation column, it will be ignored even if numeric_only is False.
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})
>>> df.groupby("A").sum().sort_index() B C D A 1 1 6 ab 2 1 8 aa
>>> df.groupby("D").sum().sort_index() A B C D a 5 2 11 b 1 0 3
>>> df.groupby("D").sum(min_count=3).sort_index() A B C D a 5.0 2.0 11.0 b NaN NaN NaN