public class DecisionTree extends Object implements scala.Serializable, Logging
Constructor and Description |
---|
DecisionTree(Strategy strategy) |
Modifier and Type | Method and Description |
---|---|
DecisionTreeModel | train(RDD<LabeledPoint> input) Method to train a decision tree model over an RDD. |
static DecisionTreeModel | trainClassifier(JavaRDD<LabeledPoint> input, int numClassesForClassification, java.util.Map<Integer,Integer> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins) Java-friendly API for DecisionTree$.trainClassifier(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, int, scala.collection.immutable.Map<java.lang.Object, java.lang.Object>, java.lang.String, int, int). |
static DecisionTreeModel | trainClassifier(RDD<LabeledPoint> input, int numClassesForClassification, scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins) Method to train a decision tree model for binary or multiclass classification. |
static DecisionTreeModel | trainRegressor(JavaRDD<LabeledPoint> input, java.util.Map<Integer,Integer> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins) Java-friendly API for DecisionTree$.trainRegressor(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, scala.collection.immutable.Map<java.lang.Object, java.lang.Object>, java.lang.String, int, int). |
static DecisionTreeModel | trainRegressor(RDD<LabeledPoint> input, scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins) Method to train a decision tree model for regression. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public DecisionTree(Strategy strategy)
public static DecisionTreeModel trainClassifier(RDD<LabeledPoint> input, int numClassesForClassification, scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins)
input
- Training dataset: RDD of LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
numClassesForClassification
- Number of classes for classification.
categoricalFeaturesInfo
- Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
impurity
- Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
maxDepth
- Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
maxBins
- Maximum number of bins used for splitting features. (suggested value: 100)

public static DecisionTreeModel trainClassifier(JavaRDD<LabeledPoint> input, int numClassesForClassification, java.util.Map<Integer,Integer> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins)
Java-friendly API for DecisionTree$.trainClassifier(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, int, scala.collection.immutable.Map<java.lang.Object, java.lang.Object>, java.lang.String, int, int).
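
The impurity parameter of trainClassifier selects the criterion used to score candidate splits. As a rough illustration of what "gini" and "entropy" measure, the standalone sketch below computes both from per-class label counts in a node. It is not Spark's implementation: the class name ImpuritySketch and its method names are hypothetical, and the use of log base 2 for entropy is an assumption about the convention.

```java
public class ImpuritySketch {

    // Gini impurity: 1 - sum of p_i^2, where p_i is the fraction of labels in class i.
    // Hypothetical helper, not part of the Spark API.
    static double gini(long[] classCounts) {
        long total = 0;
        for (long c : classCounts) {
            total += c;
        }
        if (total == 0) {
            return 0.0; // an empty node is treated as pure
        }
        double sumSquares = 0.0;
        for (long c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    // Entropy: -sum of p_i * log2(p_i); classes with zero count contribute nothing.
    // Log base 2 is an assumption made for this sketch.
    static double entropy(long[] classCounts) {
        long total = 0;
        for (long c : classCounts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (long c : classCounts) {
            if (c == 0) {
                continue;
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    public static void main(String[] args) {
        long[] balanced = {5, 5}; // two classes, evenly mixed
        long[] pure = {10, 0};    // all labels in class 0
        System.out.println("gini(balanced)    = " + gini(balanced));
        System.out.println("entropy(balanced) = " + entropy(balanced));
        System.out.println("gini(pure)        = " + gini(pure));
        System.out.println("entropy(pure)     = " + entropy(pure));
    }
}
```

Both measures are zero for a node whose labels all belong to one class and largest when the classes are evenly mixed, which is why a split that lowers them is preferred.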
public static DecisionTreeModel trainRegressor(RDD<LabeledPoint> input, scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins)
input
- Training dataset: RDD of LabeledPoint. Labels are real numbers.
categoricalFeaturesInfo
- Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
impurity
- Criterion used for information gain calculation. Supported values: "variance".
maxDepth
- Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
maxBins
- Maximum number of bins used for splitting features. (suggested value: 100)

public static DecisionTreeModel trainRegressor(JavaRDD<LabeledPoint> input, java.util.Map<Integer,Integer> categoricalFeaturesInfo, String impurity, int maxDepth, int maxBins)
Java-friendly API for DecisionTree$.trainRegressor(org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint>, scala.collection.immutable.Map<java.lang.Object, java.lang.Object>, java.lang.String, int, int).
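
For regression, the only supported impurity is "variance", and it is used for the information gain calculation. The standalone sketch below shows one common way to define these quantities: the population variance of the labels in a node, and the gain of a split as the parent's impurity minus the size-weighted impurity of the two children. It is an illustration under those assumptions, not Spark's implementation, and the names VarianceGainSketch, variance, and gain are hypothetical.

```java
public class VarianceGainSketch {

    // Population variance of the labels in a node: mean squared deviation from the mean.
    // Hypothetical helper, not part of the Spark API.
    static double variance(double[] labels) {
        if (labels.length == 0) {
            return 0.0; // an empty node is treated as having no spread
        }
        double sum = 0.0;
        for (double y : labels) {
            sum += y;
        }
        double mean = sum / labels.length;
        double squaredDeviations = 0.0;
        for (double y : labels) {
            squaredDeviations += (y - mean) * (y - mean);
        }
        return squaredDeviations / labels.length;
    }

    // Gain of a split: parent variance minus the size-weighted variance of the children.
    static double gain(double[] left, double[] right) {
        int n = left.length + right.length;
        if (n == 0) {
            return 0.0;
        }
        // The parent node holds the labels of both children.
        double[] parent = new double[n];
        System.arraycopy(left, 0, parent, 0, left.length);
        System.arraycopy(right, 0, parent, left.length, right.length);
        double weightedChildren =
            (left.length * variance(left) + right.length * variance(right)) / n;
        return variance(parent) - weightedChildren;
    }

    public static void main(String[] args) {
        double[] left = {1.0, 1.0};
        double[] right = {3.0, 3.0};
        System.out.println("variance(parent) = " + variance(new double[]{1.0, 1.0, 3.0, 3.0}));
        System.out.println("gain(left, right) = " + gain(left, right));
    }
}
```

A split that separates the labels into two groups with little internal spread removes most of the parent's variance, so its gain is close to the parent's variance; a split that leaves both children as mixed as the parent has a gain near zero.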
public DecisionTreeModel train(RDD<LabeledPoint> input)
input
- Training data: RDD of LabeledPoint