org.apache.spark.mllib.clustering
Return the latest model.
Java-friendly version of predictOn.
Use the clustering model to make predictions on batches of data from a DStream.
data: DStream containing vector data
Returns: DStream containing predictions
Java-friendly version of predictOnValues.
Use the model to make predictions on the values of a DStream and carry over its keys.
K: key type
data: DStream containing (key, feature vector) pairs
Returns: DStream containing the input keys and the predictions as values
Set the decay factor directly (for forgetful algorithms).
Set the half life and time unit ("batches" or "points") for forgetful algorithms.
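The relationship between a half life and a decay factor can be sketched without Spark. The snippet below is a minimal illustration, not MLlib's implementation: it assumes the half life is counted in batches, that the decay factor a satisfies a^halfLife = 0.5, and that a center is updated as a weighted average of its discounted old position and the mean of the new batch. The names `DecaySketch`, `decayFromHalfLife`, and `updateCenter` are hypothetical, and the "points" time unit is not covered.

```scala
object DecaySketch {
  // Decay factor a chosen so that a^halfLife == 0.5, i.e. after
  // `halfLife` batches the old data carries half its original weight.
  def decayFromHalfLife(halfLife: Double): Double = {
    require(halfLife > 0.0, "half life must be positive")
    math.exp(math.log(0.5) / halfLife)
  }

  // One forgetful update of a single center (assumed rule):
  //   newWeight = weight * decay + batchCount
  //   newCenter = (center * weight * decay + batchMean * batchCount) / newWeight
  def updateCenter(
      center: Array[Double],
      weight: Double,
      batchMean: Array[Double],
      batchCount: Double,
      decay: Double): (Array[Double], Double) = {
    require(center.length == batchMean.length, "dimension mismatch")
    val discounted = weight * decay
    val newWeight = discounted + batchCount
    require(newWeight > 0.0, "total weight must be positive")
    val newCenter = center.zip(batchMean).map { case (c, x) =>
      (c * discounted + x * batchCount) / newWeight
    }
    (newCenter, newWeight)
  }
}
```

Under these assumptions, a decay factor of 1.0 weights all past data equally, while a decay factor of 0.0 makes each center depend only on the most recent batch.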
Specify initial centers directly.
Set the number of clusters.
Initialize random centers, requiring only the number of dimensions.
dim: Number of dimensions
weight: Weight for each center
seed: Random seed
Java-friendly version of trainOn.
Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.
data: DStream containing vector data
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data. See KMeansModel for details on algorithm and update rules.
Use a builder pattern to construct a streaming k-means analysis in an application.
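A builder-style configuration might look like the following sketch. It is illustrative rather than canonical: the local master, the one-second batch interval, the input directory `/tmp/streaming-kmeans/train`, and the object name `StreamingKMeansSketch` are all assumptions, and the input files are assumed to hold one 5-dimensional vector per line in a form `Vectors.parse` accepts, such as `[1.0,2.0,3.0,4.0,5.0]`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object StreamingKMeansSketch {
  // Configure the analysis with chained setters. setK is called before
  // setRandomCenters so that three random centers are generated.
  def buildModel(): StreamingKMeans =
    new StreamingKMeans()
      .setK(3)                    // number of clusters
      .setDecayFactor(0.5)        // down-weight older batches
      .setRandomCenters(5, 100.0) // 5 dimensions, initial weight 100.0 per center

  def main(args: Array[String]): Unit = {
    // Assumed local setup for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingKMeansSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical source: text files dropped into this directory.
    val training: DStream[Vector] =
      ssc.textFileStream("/tmp/streaming-kmeans/train").map(Vectors.parse)

    val model = buildModel()
    model.trainOn(training)           // registers the stream for training
    model.predictOn(training).print() // cluster index for each incoming vector

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each setter returns the StreamingKMeans instance, which is what allows the calls to be chained. Training and prediction are registered on the streams before the streaming context is started.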