pyspark.pandas.DataFrame.quantile

DataFrame.quantile(q=0.5, axis=0, numeric_only=False, accuracy=10000)

Return value at the given quantile.

Note

Unlike pandas, pandas-on-Spark computes an approximate quantile, based on approximate percentile computation, because computing exact quantiles across a large dataset is extremely expensive.
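
To illustrate what the approximation can mean for small inputs, the following pure-Python sketch contrasts linear interpolation (the pandas default) with a nearest-rank rule that always returns an element of the data. It assumes the approximate quantile behaves like Spark's `percentile_approx`, which is documented as returning a value taken from the column. The helper names `interpolated_quantile` and `nearest_rank_quantile` are hypothetical and are not part of either library.

```python
import math

def interpolated_quantile(values, q):
    """Exact quantile with linear interpolation between neighbouring ranks."""
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def nearest_rank_quantile(values, q):
    """Assumed stand-in for an approximate percentile: picks an existing element."""
    data = sorted(values)
    rank = max(math.ceil(q * len(data)), 1)   # 1-based rank, at least 1
    return float(data[rank - 1])

even = [1, 2, 3, 4]
print(interpolated_quantile(even, 0.5))   # 2.5
print(nearest_rank_quantile(even, 0.5))   # 2.0
```

With an odd number of rows, as in the Examples section, both rules give the same median, so the difference only shows up when the target rank falls between two elements.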

Parameters
q : float or array-like, default 0.5 (50% quantile)

0 <= q <= 1, the quantile(s) to compute.

axis : int or str, default 0 or ‘index’

Only axis 0 is currently supported.

numeric_only : bool, default False

Include only float, int or boolean data.

Changed in version 4.0.0: The default value of numeric_only is now False.

accuracy : int, optional, default 10000

Accuracy of the approximation. A larger value gives better accuracy; the relative error of the approximation is 1.0 / accuracy.
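
As a rough sketch of what this relative error means in practice, the snippet below converts `accuracy` into a bound on how far the rank of the returned value may be from the target rank. It assumes the error is relative to the number of rows, following the description of Spark's approximate-quantile algorithm; `max_rank_error` is a hypothetical helper and not part of pandas-on-Spark.

```python
def max_rank_error(n_rows, accuracy=10000):
    """Approximate bound on rank deviation, assuming error = n_rows / accuracy."""
    # 1.0 / accuracy is the relative error; scaling by n_rows gives a rank count.
    return n_rows / accuracy

print(max_rank_error(1_000_000))            # 100.0 with the default accuracy
print(max_rank_error(1_000_000, 100_000))   # 10.0 with a tenfold larger accuracy
```

Under this assumption, a dataset with fewer rows than `accuracy` has a bound below one rank, which is consistent with the small examples on this page returning the expected values.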

Returns
Series or DataFrame

If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles. If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.

Examples

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> psdf
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0
>>> psdf.quantile(.5)
a    3.0
b    7.0
Name: 0.5, dtype: float64
>>> psdf.quantile([.25, .5, .75])
        a    b
0.25  2.0  6.0
0.50  3.0  7.0
0.75  4.0  8.0