FPGrowth

class pyspark.mllib.fpm.FPGrowth

A parallel FP-growth algorithm to mine frequent itemsets.

New in version 1.4.0.

Methods

train(data[, minSupport, numPartitions])

Computes an FP-Growth model that contains frequent itemsets.

Methods Documentation

classmethod train(data, minSupport=0.3, numPartitions=-1)

Computes an FP-Growth model that contains frequent itemsets.

New in version 1.4.0.

Parameters
data : pyspark.RDD

The input data set, where each element is a transaction.

minSupport : float, optional

The minimum support level, expressed as a fraction of the total number of transactions. (default: 0.3)

numPartitions : int, optional

The number of partitions used by parallel FP-growth. A value of -1 uses the same number of partitions as the input data. (default: -1)
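Examples

To make the meaning of minSupport concrete, the following is a minimal pure-Python sketch, not the FP-growth algorithm and not part of pyspark. It enumerates candidate itemsets by brute force and keeps those whose transaction count reaches the threshold. The helper name frequent_itemsets and the rounding rule ceil(minSupport * N) are assumptions made for illustration; the sketch is only practical for tiny inputs.

```python
from itertools import combinations
from math import ceil


def frequent_itemsets(transactions, min_support=0.3):
    """Brute-force illustration (hypothetical helper, not a pyspark API).

    Returns {itemset: count} for every itemset contained in at least
    ceil(min_support * N) of the N transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    # Assumed threshold rule: convert the support fraction to a minimum count.
    min_count = ceil(min_support * len(transactions))
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[candidate] = count
    return result


transactions = [
    ["a", "b", "c"],
    ["a", "b", "d", "e"],
    ["a", "c", "e"],
    ["a", "c", "f"],
]
result = frequent_itemsets(transactions, min_support=0.6)
print(result)
# {('a',): 4, ('c',): 3, ('a', 'c'): 3}
```

With four transactions and a support of 0.6, an itemset must appear in at least three of them. On a running Spark installation, the same transactions would typically be turned into an RDD and passed to FPGrowth.train(rdd, minSupport=0.6); the returned model holds the frequent itemsets.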