pyspark.streaming.DStream.reduceByWindow

DStream.reduceByWindow(reduceFunc, invReduceFunc, windowDuration, slideDuration)

Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.

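If invReduceFunc is not None, the reduction is done incrementally using the old window’s reduced value:

1. reduce the new values that entered the window (e.g., adding new counts)

2. “inverse reduce” the old values that left the window (e.g., subtracting old counts)

This is more efficient than the case where invReduceFunc is None, in which every window is reduced from scratch.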
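The difference between the two paths can be illustrated with a plain-Python simulation. This is only a sketch of the idea, not Spark's implementation: the helper names `windowed_reduce_full` and `windowed_reduce_incremental` are hypothetical, each list entry stands for the already-reduced value of one batch, and `window` and `slide` are counted in batches (with `window >= slide` assumed).

```python
from functools import reduce
from operator import add, sub


def windowed_reduce_full(batches, reduce_func, window, slide):
    """Reduce every window from scratch (the invReduceFunc-is-None path)."""
    results = []
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window)
        results.append(reduce(reduce_func, batches[start:end]))
    return results


def windowed_reduce_incremental(batches, reduce_func, inv_reduce_func, window, slide):
    """Update the previous window's value instead of recomputing it."""
    results = []
    previous = None
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window)

        # Step 1: reduce the values that entered the window.
        entered = reduce(reduce_func, batches[end - slide:end])
        current = entered if previous is None else reduce_func(previous, entered)

        # Step 2: "inverse reduce" the values that left the window.
        previous_start = max(0, end - slide - window)
        left = batches[previous_start:start]
        if left:
            current = inv_reduce_func(current, reduce(reduce_func, left))

        results.append(current)
        previous = current
    return results


per_batch_counts = [1, 2, 3, 4, 5, 6]
full = windowed_reduce_full(per_batch_counts, add, window=3, slide=1)
incremental = windowed_reduce_incremental(per_batch_counts, add, sub, window=3, slide=1)
print(full)         # [1, 3, 6, 9, 12, 15]
print(incremental)  # [1, 3, 6, 9, 12, 15]
```

Both functions produce the same sequence, but the incremental version only touches the batches that entered or left the window on each slide, which is where the efficiency gain comes from when the window is much wider than the slide.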

Parameters

reduceFunc (function): associative and commutative reduce function
invReduceFunc (function): inverse of reduceFunc, such that for all y and invertible x, invReduceFunc(reduceFunc(x, y), x) = y; may be None
windowDuration (int): width of the window; must be a multiple of this DStream’s batching interval
slideDuration (int): sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval
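The constraints on these parameters can be checked before a streaming job is started. The sketch below is an illustration in plain Python rather than part of the PySpark API: `looks_like_inverse` and `durations_are_valid` are hypothetical helpers, the inverse property is only spot-checked on a handful of sample values, and `batch_interval` is assumed to be expressed in the same unit as the two durations.

```python
from operator import add, sub


def looks_like_inverse(reduce_func, inv_reduce_func, samples):
    """Spot-check invReduceFunc(reduceFunc(x, y), x) == y on sample values."""
    return all(
        inv_reduce_func(reduce_func(x, y), x) == y
        for x in samples
        for y in samples
    )


def durations_are_valid(batch_interval, window_duration, slide_duration):
    """Both durations must be positive multiples of the batching interval."""
    return (
        window_duration > 0
        and slide_duration > 0
        and window_duration % batch_interval == 0
        and slide_duration % batch_interval == 0
    )


samples = [0, 1, -3, 7]
print(looks_like_inverse(add, sub, samples))  # True: subtraction undoes addition
print(looks_like_inverse(max, min, samples))  # False: max cannot be undone
print(durations_are_valid(5, 30, 10))         # True
print(durations_are_valid(5, 30, 12))         # False: 12 is not a multiple of 5
```

With a batching interval of 5, a call such as `stream.reduceByWindow(add, sub, 30, 10)` would satisfy both constraints (here `stream` stands for an existing DStream of numbers). A reduction like `max` has no inverse, so in that case invReduceFunc would be passed as None and each window reduced from scratch.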