pyspark.pandas.DataFrame.quantile
- DataFrame.quantile(q=0.5, axis=0, numeric_only=False, accuracy=10000)
Return value at the given quantile.
Note
Unlike pandas, pandas-on-Spark returns an approximate quantile, derived from an approximate percentile computation, because computing an exact quantile across a large dataset is extremely expensive.
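To illustrate how an approximate, percentile-style quantile can differ from an exact interpolated one, here is a minimal pure-Python sketch. It is not the pandas-on-Spark implementation. It assumes that the approximate percentile returns an observed value from the data (a nearest-rank rule) rather than interpolating between neighbours, and both function names are hypothetical.

```python
import math


def exact_quantile_linear(values, q):
    """Exact quantile with linear interpolation between neighbouring values."""
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def nearest_rank_quantile(values, q):
    """Assumed rule: smallest observed value whose cumulative share of rows is >= q."""
    data = sorted(values)
    rank = max(1, math.ceil(q * len(data)))   # 1-based rank
    return float(data[rank - 1])


even = [1, 2, 3, 4]
print(exact_quantile_linear(even, 0.5))   # 2.5 (midpoint of 2 and 3)
print(nearest_rank_quantile(even, 0.5))   # 2.0 (an actual element of the data)
```

For odd-length columns such as those in the Examples section, both rules give the same values, so the difference is only visible when the requested quantile falls between two observations.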
- Parameters
- q : float or array-like, default 0.5 (50% quantile)
0 <= q <= 1, the quantile(s) to compute.
- axis : int or str, default 0 or ‘index’
Only 0 (‘index’) is currently supported.
- numeric_only : bool, default False
Include only float, int or boolean data.
Changed in version 4.0.0: The default value of numeric_only is now False.
- accuracy : int, optional
Accuracy of the approximation (default 10000). A larger value gives better accuracy; the relative error is 1.0 / accuracy.
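The relationship between accuracy and relative error can be made concrete with a small calculation. The sketch below is pure Python and does not call Spark. The helper names are hypothetical, and the rank interpretation (the error is taken relative to the number of rows) is an assumption rather than something this page states.

```python
def relative_error(accuracy: int) -> float:
    """Relative error as documented: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


def max_rank_deviation(n_rows: int, accuracy: int) -> int:
    """Assumed worst-case rank offset: ceil(n_rows / accuracy), in integer arithmetic."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return -(-n_rows // accuracy)


print(relative_error(10000))                  # 0.0001 for the default accuracy
print(max_rank_deviation(1_000_000, 10000))   # 100
print(max_rank_deviation(1_000_000, 100))     # 10000
```

Under that assumption, the default accuracy on a million-row column keeps the returned value within about 100 positions of the exact rank, while lowering accuracy to 100 widens that to about 10000 positions.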
- Returns
- Series or DataFrame
If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles. If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.
Examples
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> psdf
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0

>>> psdf.quantile(.5)
a    3.0
b    7.0
Name: 0.5, dtype: float64

>>> psdf.quantile([.25, .5, .75])
        a    b
0.25  2.0  6.0
0.50  3.0  7.0
0.75  4.0  8.0