public class DistributedLDAModel extends LDAModel
Distributed model fitted by LDA.
This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
param: oldLocalModelOption Used to implement oldLocalModel as a lazy val, but keeping copy() cheap.
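As a quick orientation, the sketch below shows one way to obtain a DistributedLDAModel: fit an LDA with the EM optimizer and cast the result. The SparkSession variable spark, the input path, and the variable names are assumptions made for the example, not part of this class.

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;
import org.apache.spark.ml.clustering.LDA;
import org.apache.spark.ml.clustering.LDAModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Assumes an existing SparkSession named `spark`; the path points to the
// sample LDA data shipped with Spark and is only illustrative.
Dataset<Row> corpus = spark.read().format("libsvm")
    .load("data/mllib/sample_lda_libsvm_data.txt");

// Only the EM optimizer yields a DistributedLDAModel; the online optimizer
// yields a LocalLDAModel instead.
LDA lda = new LDA()
    .setK(10)
    .setMaxIter(20)
    .setOptimizer("em");

LDAModel model = lda.fit(corpus);
DistributedLDAModel distModel = (DistributedLDAModel) model;
```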
Modifier and Type | Method and Description
---|---
DistributedLDAModel | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params.
void | deleteCheckpointFiles(): Remove any remaining checkpoint files from training.
String[] | getCheckpointFiles(): If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files.
boolean | isDistributed(): Indicates whether this instance is of type DistributedLDAModel.
static DistributedLDAModel | load(String path)
double | logPrior()
static MLReader<DistributedLDAModel> | read()
LocalLDAModel | toLocal(): Convert this distributed model to a local representation.
String | toString()
double | trainingLogLikelihood()
MLWriter | write(): Returns an MLWriter instance for this ML instance.
Methods inherited from class LDAModel:
checkpointInterval, describeTopics, describeTopics, docConcentration, estimatedDocConcentration, featuresCol, k, keepLastCheckpoint, learningDecay, learningOffset, logLikelihood, logPerplexity, maxIter, optimizeDocConcentration, optimizer, seed, setFeaturesCol, setSeed, setTopicDistributionCol, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize
Methods inherited from class Transformer:
transform, transform, transform
Methods inherited from class PipelineStage:
params
Methods inherited from interface LDAParams:
getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, validateAndTransformSchema
Methods inherited from interface HasFeaturesCol:
getFeaturesCol
Methods inherited from interface HasMaxIter:
getMaxIter
Methods inherited from interface HasCheckpointInterval:
getCheckpointInterval
Methods inherited from interface Params:
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Logging:
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
Methods inherited from interface MLWritable:
save
public static MLReader<DistributedLDAModel> read()
public static DistributedLDAModel load(String path)
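For illustration, a model previously persisted with write().save(...) can be restored either through load or through the MLReader returned by read(); the path below is hypothetical.

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;

// Hypothetical path to a previously saved model.
DistributedLDAModel restored = DistributedLDAModel.load("/tmp/distributed-lda-model");

// Equivalent long form via the MLReader.
DistributedLDAModel alsoRestored =
    DistributedLDAModel.read().load("/tmp/distributed-lda-model");
```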
public LocalLDAModel toLocal()
Convert this distributed model to a local representation, discarding information about the training dataset.
WARNING: This involves collecting a large topicsMatrix to the driver.
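A short usage sketch, assuming the distModel variable from the example near the top of this page:

```java
import org.apache.spark.ml.clustering.LocalLDAModel;

// The conversion collects the full topics matrix (vocabSize x k) to the
// driver, so the driver needs enough memory to hold it.
LocalLDAModel localModel = distModel.toLocal();
```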
public DistributedLDAModel copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See defaultCopy() in Params.

public boolean isDistributed()
Indicates whether this instance is of type DistributedLDAModel.
Overrides:
isDistributed in class LDAModel
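When code receives a plain LDAModel (for example from LDA.fit), isDistributed can guard the downcast before calling distributed-only members; the model variable is assumed from the earlier sketch.

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;

// `model` is an LDAModel returned by LDA.fit(...).
if (model.isDistributed()) {
  DistributedLDAModel dist = (DistributedLDAModel) model;
  // Safe to call distributed-only members such as trainingLogLikelihood().
  System.out.println("training log likelihood: " + dist.trainingLogLikelihood());
}
```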
public double trainingLogLikelihood()
Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters). This excludes the log prior; for that, see logPrior().

public double logPrior()
Log probability of the current parameter estimate: log P(topics, topic distributions for docs | Dirichlet hyperparameters).
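A usage sketch combining the two diagnostics, again assuming distModel from the earlier example:

```java
// Assumes `distModel` from the earlier sketch.
double tokenLogLikelihood = distModel.trainingLogLikelihood();
double parameterLogPrior = distModel.logPrior();
System.out.println("training log likelihood = " + tokenLogLikelihood
    + ", log prior = " + parameterLogPrior);
```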
public String[] getCheckpointFiles()
If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files. Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
public void deleteCheckpointFiles()
Remove any remaining checkpoint files from training. See getCheckpointFiles().
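An illustrative cleanup sketch, assuming distModel from the earlier example and that training was run with checkpointing enabled:

```java
// Paths of any checkpoint files still on disk from training.
String[] checkpoints = distModel.getCheckpointFiles();
for (String path : checkpoints) {
  System.out.println("checkpoint: " + path);
}

// Delete them only once no further distributed computation on this model is
// planned; see the note above about lost partitions.
distModel.deleteCheckpointFiles();
```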
public MLWriter write()
Returns an MLWriter instance for this ML instance.
Specified by:
write in interface MLWritable
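For completeness, a save sketch using the returned MLWriter; the output path is hypothetical, and the inherited MLWritable.save(path) shorthand can be used when overwriting is not needed.

```java
// Assumes `distModel` from the earlier sketch; the path is only illustrative.
distModel.write().overwrite().save("/tmp/distributed-lda-model");
```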
public String toString()
Specified by:
toString in interface Identifiable
Overrides:
toString in class Object