public class GaussianMixture
extends Object
implements scala.Serializable
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions with associated "mixing" weights specifying each's contribution to the composite.
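As a reading aid, the composite distribution described above can be written as the following density. The symbols are notation assumed here, not names from this class: $w_j$ are the mixing weights, and $\mu_j$, $\Sigma_j$ are the mean and covariance of the $j$-th Gaussian.

```latex
p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad w_j \ge 0, \qquad \sum_{j=1}^{k} w_j = 1
```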
Given a set of sample points, this class will maximize the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol, or until it has reached the max number of iterations. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
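The stopping rule described above can be illustrated with a minimal, self-contained sketch. It is not this class's implementation: it handles only one-dimensional data, uses a deterministic initialization in place of seeded random initialization, and every name in it (`GmmEmSketch`, `fit`, `MIN_VARIANCE`) is hypothetical. It only shows how a loop bounded by `maxIterations` and a log-likelihood tolerance `convergenceTol` might look.

```java
import java.util.Arrays;

/** Hypothetical 1-D EM sketch; illustrative only, not Spark's implementation. */
final class GmmEmSketch {
    // Assumed variance floor to keep components from collapsing to a point.
    static final double MIN_VARIANCE = 1e-6;

    final double[] weights;
    final double[] means;
    final double[] variances;
    final int iterations;

    private GmmEmSketch(double[] weights, double[] means, double[] variances, int iterations) {
        this.weights = weights;
        this.means = means;
        this.variances = variances;
        this.iterations = iterations;
    }

    /** Density of a univariate Gaussian at x. */
    static double pdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    /** Runs EM until the log-likelihood changes by less than convergenceTol or maxIterations is hit. */
    static GmmEmSketch fit(double[] x, int k, int maxIterations, double convergenceTol) {
        int n = x.length;
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        double avg = Arrays.stream(x).average().orElse(0.0);
        double spread = Arrays.stream(x).map(v -> (v - avg) * (v - avg)).average().orElse(1.0);

        // Deterministic start (an assumption of this sketch): equal weights,
        // means at evenly spaced order statistics, shared overall variance.
        double[] w = new double[k];
        double[] mu = new double[k];
        double[] var = new double[k];
        for (int j = 0; j < k; j++) {
            w[j] = 1.0 / k;
            mu[j] = sorted[(int) ((j + 0.5) * n / k)];
            var[j] = Math.max(spread, MIN_VARIANCE);
        }

        double[][] resp = new double[n][k];
        double previousLogLikelihood = Double.NEGATIVE_INFINITY;
        int iterations = 0;
        while (iterations < maxIterations) {
            // E-step: responsibilities, plus the log-likelihood under the
            // parameters entering this iteration. Assumes each point has
            // nonzero density under at least one component.
            double logLikelihood = 0.0;
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = w[j] * pdf(x[i], mu[j], var[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
                logLikelihood += Math.log(total);
            }

            // M-step: re-estimate weight, mean, and variance of each component.
            // Assumes each component keeps a nonzero total responsibility.
            for (int j = 0; j < k; j++) {
                double nj = 0.0;
                double weightedSum = 0.0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    weightedSum += resp[i][j] * x[i];
                }
                mu[j] = weightedSum / nj;
                double weightedSquares = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = x[i] - mu[j];
                    weightedSquares += resp[i][j] * d * d;
                }
                var[j] = Math.max(weightedSquares / nj, MIN_VARIANCE);
                w[j] = nj / n;
            }

            iterations++;
            // Stop once the log-likelihood has changed by less than the tolerance.
            if (Math.abs(logLikelihood - previousLogLikelihood) < convergenceTol) {
                break;
            }
            previousLogLikelihood = logLikelihood;
        }
        return new GmmEmSketch(w, mu, var, iterations);
    }
}
```

Calling `GmmEmSketch.fit(data, 2, 100, 1e-6)` on two well-separated groups of points should stop well before the iteration cap, while a cap of 1 forces a return after a single EM pass regardless of the tolerance.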
Note: This algorithm may perform poorly on high-dimensional data (data with many features), for two reasons: (a) high-dimensional data is difficult to cluster at all (based on statistical/theoretical arguments), and (b) Gaussian distributions suffer from numerical issues in high dimensions.
param: k The number of independent Gaussians in the mixture model
param: convergenceTol The maximum change in log-likelihood at which convergence is considered to have occurred.
param: maxIterations The maximum number of iterations to perform
Constructor and Description |
---|
GaussianMixture()
Constructs a default instance.
|
Modifier and Type | Method and Description |
---|---|
double |
getConvergenceTol()
Return the largest change in log-likelihood at which convergence is
considered to have occurred.
|
scala.Option<GaussianMixtureModel> |
getInitialModel()
Return the user supplied initial GMM, if supplied
|
int |
getK()
Return the number of Gaussians in the mixture model
|
int |
getMaxIterations()
Return the maximum number of iterations to run
|
long |
getSeed()
Return the random seed
|
GaussianMixtureModel |
run(JavaRDD<Vector> data)
Java-friendly version of run()
|
GaussianMixtureModel |
run(RDD<Vector> data)
Perform expectation maximization
|
GaussianMixture |
setConvergenceTol(double convergenceTol)
Set the largest change in log-likelihood at which convergence is
considered to have occurred.
|
GaussianMixture |
setInitialModel(GaussianMixtureModel model)
Set the initial GMM starting point, bypassing the random initialization.
|
GaussianMixture |
setK(int k)
Set the number of Gaussians in the mixture model.
|
GaussianMixture |
setMaxIterations(int maxIterations)
Set the maximum number of iterations to run.
|
GaussianMixture |
setSeed(long seed)
Set the random seed
|
public GaussianMixture()
public GaussianMixture setInitialModel(GaussianMixtureModel model)
model - (undocumented)
public scala.Option<GaussianMixtureModel> getInitialModel()
public GaussianMixture setK(int k)
public int getK()
public GaussianMixture setMaxIterations(int maxIterations)
public int getMaxIterations()
public GaussianMixture setConvergenceTol(double convergenceTol)
convergenceTol - (undocumented)
public double getConvergenceTol()
public GaussianMixture setSeed(long seed)
public long getSeed()
public GaussianMixtureModel run(RDD<Vector> data)
public GaussianMixtureModel run(JavaRDD<Vector> data)
Java-friendly version of run()