Class MultivariateOnlineSummarizer

Object
org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
All Implemented Interfaces:
Serializable, MultivariateStatisticalSummary

public class MultivariateOnlineSummarizer extends Object implements MultivariateStatisticalSummary, Serializable
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector format in an online fashion.

Two MultivariateOnlineSummarizer instances can be merged to obtain a statistical summary of the corresponding joint dataset.
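To illustrate what merging two summaries involves, here is a minimal single-column sketch that uses the standard pairwise combination of running mean and sum of squared deviations. The class and member names (MergeSketch, ColumnSummary, m2) are hypothetical and chosen for illustration; this is not the Spark implementation, only one way the idea can be written.

```java
public class MergeSketch {
    /** Running count, mean, and sum of squared deviations (M2) for one column. */
    public static final class ColumnSummary {
        public long n = 0;
        public double mean = 0.0;
        public double m2 = 0.0;

        /** Folds one value into the running summary (Welford-style update). */
        public ColumnSummary add(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
            return this;
        }

        /** Combines another summary into this one without revisiting the data. */
        public ColumnSummary merge(ColumnSummary other) {
            if (other.n == 0) {
                return this;
            }
            if (n == 0) {
                n = other.n;
                mean = other.mean;
                m2 = other.m2;
                return this;
            }
            long total = n + other.n;
            double delta = other.mean - mean;
            // The cross term accounts for the gap between the two partial means.
            m2 += other.m2 + delta * delta * ((double) n * other.n / total);
            mean += delta * other.n / total;
            n = total;
            return this;
        }

        /** Sample variance of everything seen so far (0.0 for fewer than two values). */
        public double variance() {
            return n > 1 ? m2 / (n - 1) : 0.0;
        }
    }

    public static void main(String[] args) {
        ColumnSummary left = new ColumnSummary().add(1.0).add(2.0);
        ColumnSummary right = new ColumnSummary().add(3.0).add(6.0);
        ColumnSummary joint = left.merge(right);
        System.out.println("count=" + joint.n + " mean=" + joint.mean
                + " variance=" + joint.variance());
    }
}
```

The merged summary matches what a single pass over all four values would produce, which is why partial summaries computed on separate partitions can be combined afterwards.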

A numerically stable algorithm is implemented to compute the mean and variance of instances (reference: variance-wiki). Zero elements (including explicit zero values) are skipped when calling add(), so the time complexity is O(nnz) instead of O(n) for each column.
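The following sketch shows one way zero skipping can be combined with a numerically stable update: running statistics are kept over the nonzero entries only, and the skipped zeros are folded back in when the mean or variance is requested. All names here (ZeroSkippingSketch, meanNz, m2Nz) are hypothetical, the sketch is unweighted, and it is not presented as the library's actual code.

```java
public class ZeroSkippingSketch {
    public long rows = 0;          // number of rows added so far
    public final long[] nnz;       // nonzero count per column
    public final double[] meanNz;  // mean over nonzero entries only
    public final double[] m2Nz;    // sum of squared deviations over nonzero entries only

    public ZeroSkippingSketch(int dim) {
        nnz = new long[dim];
        meanNz = new double[dim];
        m2Nz = new double[dim];
    }

    /** Adds one sparse row; the cost is proportional to the number of stored entries. */
    public void add(int[] indices, double[] values) {
        rows++;
        for (int k = 0; k < indices.length; k++) {
            double x = values[k];
            if (x == 0.0) {
                continue; // explicitly stored zeros are skipped as well
            }
            int j = indices[k];
            nnz[j]++;
            double delta = x - meanNz[j];
            meanNz[j] += delta / nnz[j];
            m2Nz[j] += delta * (x - meanNz[j]);
        }
    }

    /** Column mean over all rows, with skipped zeros counted as 0. */
    public double mean(int j) {
        return rows == 0 ? 0.0 : meanNz[j] * nnz[j] / rows;
    }

    /** Sample variance over all rows for column j. */
    public double variance(int j) {
        if (rows <= 1) {
            return 0.0;
        }
        long zeros = rows - nnz[j];
        // Treat the skipped zeros as a second group with mean 0 and no spread,
        // then combine the two groups.
        double m2 = m2Nz[j] + meanNz[j] * meanNz[j] * ((double) nnz[j] * zeros / rows);
        return m2 / (rows - 1);
    }

    public static void main(String[] args) {
        ZeroSkippingSketch s = new ZeroSkippingSketch(2);
        s.add(new int[] {}, new double[] {});            // row (0, 0)
        s.add(new int[] {0}, new double[] {2.0});        // row (2, 0)
        s.add(new int[] {0, 1}, new double[] {0.0, 5.0}); // row (0, 5), explicit zero
        s.add(new int[] {0}, new double[] {4.0});        // row (4, 0)
        System.out.println("mean[0]=" + s.mean(0) + " variance[0]=" + s.variance(0));
    }
}
```

Because add() only touches stored nonzero entries, a mostly empty row costs far less than a dense scan, and the correction for zeros is applied once per query instead of once per row.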

For weighted instances, the unbiased estimate of the variance is defined by the reliability weights: see Reliability weights (Wikipedia).
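For reference, the reliability-weights estimator as usually stated can be written as follows for a single column. The symbols are introduced here for illustration and do not appear in the source: x_i is the value of instance i, w_i is its weight, V_1 is the sum of weights, and V_2 is the sum of squared weights.

```latex
\mu^{*} = \frac{\sum_{i} w_i x_i}{V_1},
\qquad
s^{2} = \frac{\sum_{i} w_i \left(x_i - \mu^{*}\right)^{2}}{V_1 - V_2 / V_1},
\qquad
V_1 = \sum_{i} w_i,
\quad
V_2 = \sum_{i} w_i^{2}
```

When every weight equals 1, V_1 = V_2 = n and the denominator reduces to n - 1, which recovers the ordinary unbiased sample variance.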

See Also: