pyspark.sql.functions.mode
- pyspark.sql.functions.mode(col, deterministic=False)
Returns the most frequent value in a group.
New in version 3.4.0.
Changed in version 4.0.0: Supports deterministic argument.
- Parameters
- col : Column or str
target column to compute on.
- deterministic : bool, optional
if there are multiple equally-frequent results then return the lowest (defaults to False).
- Returns
Column
the most frequent value in a group.
Notes
Supports Spark Connect.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
|  Java|      2012|
|dotNET|      2012|
+------+----------+
When multiple values share the greatest frequency, an arbitrary one of them is returned if deterministic is false or unset (the default), and the lowest value is returned if deterministic is true.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
>>> df.select(sf.mode("col", False)).show()
+---------+
|mode(col)|
+---------+
|        0|
+---------+
>>> df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
|                                    -10|
+---------------------------------------+
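The tie-breaking rule above can be sketched in plain Python. This is a toy model of the semantics, not Spark's implementation; the helper name `mode_of` is hypothetical:

```python
from collections import Counter

def mode_of(values, deterministic=False):
    """Toy model of mode's tie-breaking: with deterministic=True the
    lowest of the equally-frequent values is returned; otherwise any
    one of them may come back (here: the first one encountered)."""
    counts = Counter(values)
    top = max(counts.values())
    candidates = [v for v, c in counts.items() if c == top]
    return min(candidates) if deterministic else candidates[0]

# -10, 0, and 10 each occur once, so all three tie on frequency.
print(mode_of([-10, 0, 10], deterministic=True))   # -10: the lowest wins
```

With deterministic left as False, any of the tied values is an acceptable result, which is why the non-deterministic output in the example above is not guaranteed across runs.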