public class GradientBoostedTrees
extends Object
implements scala.Serializable
Stochastic Gradient Boosting
for regression and binary classification.
The implementation is based upon: J.H. Friedman. "Stochastic Gradient Boosting." 1999.
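To make the idea concrete, here is a minimal plain-Java sketch of stochastic gradient boosting for squared-error regression on a single feature. It is not Spark's implementation, which trains distributed decision trees. The class `StumpBoostingSketch` and the parameter names `numIterations`, `learningRate`, and `subsamplingRate` are assumptions chosen for illustration. Each iteration fits a depth-1 tree to the current residuals (the negative gradient of 0.5 * squared error) on a random subsample of rows, which is the "stochastic" part of Friedman's method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class StumpBoostingSketch {

    /** Depth-1 regression tree on one feature: predicts left if x <= threshold, else right. */
    static final class Stump {
        final double threshold, left, right;
        Stump(double threshold, double left, double right) {
            this.threshold = threshold; this.left = left; this.right = right;
        }
        double predict(double x) { return x <= threshold ? left : right; }
    }

    /** Fits a stump to residuals r on the sampled rows, keeping the split with the lowest squared error. */
    static Stump fitStump(double[] x, double[] r, List<Integer> rows) {
        Stump best = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (int c : rows) {                          // each sampled x value is a candidate threshold
            double sumL = 0.0, sumR = 0.0;
            int nL = 0, nR = 0;
            for (int i : rows) {
                if (x[i] <= x[c]) { sumL += r[i]; nL++; } else { sumR += r[i]; nR++; }
            }
            double left = sumL / nL;                  // nL >= 1 because row c itself falls on the left
            double right = nR == 0 ? 0.0 : sumR / nR; // empty right side contributes nothing
            double sse = 0.0;
            for (int i : rows) {
                double p = x[i] <= x[c] ? left : right;
                sse += (r[i] - p) * (r[i] - p);
            }
            if (sse < bestSse) { bestSse = sse; best = new Stump(x[c], left, right); }
        }
        return best;
    }

    /** Ensemble prediction: the sum of the shrunken stump outputs. */
    static double predict(List<Stump> model, double learningRate, double x) {
        double f = 0.0;
        for (Stump s : model) f += learningRate * s.predict(x);
        return f;
    }

    /** Boosting loop: each iteration fits one stump to the residuals of a random subsample. */
    static List<Stump> boost(double[] x, double[] y, int numIterations,
                             double learningRate, double subsamplingRate, long seed) {
        Random rng = new Random(seed);
        List<Stump> model = new ArrayList<>();
        double[] f = new double[x.length];            // current ensemble prediction per row
        for (int m = 0; m < numIterations; m++) {
            double[] residual = new double[x.length];
            for (int i = 0; i < x.length; i++) residual[i] = y[i] - f[i];
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < x.length; i++) {
                if (rng.nextDouble() < subsamplingRate) rows.add(i);
            }
            if (rows.isEmpty()) continue;             // nothing sampled this round
            Stump s = fitStump(x, residual, rows);
            model.add(s);
            for (int i = 0; i < x.length; i++) f[i] += learningRate * s.predict(x[i]);
        }
        return model;
    }

    /** Mean squared error of the ensemble on (x, y). */
    static double mse(List<Stump> model, double learningRate, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double e = y[i] - predict(model, learningRate, x[i]);
            sum += e * e;
        }
        return sum / x.length;
    }
}
```

The fixed `seed` makes the subsampling reproducible, which is the role a random seed plays in any stochastic boosting procedure.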
Notes on Gradient Boosting vs. TreeBoost:
- This implementation is for Stochastic Gradient Boosting, not for TreeBoost.
- Both algorithms learn tree ensembles by minimizing loss functions.
- TreeBoost (Friedman, 1999) additionally modifies the outputs at tree leaf nodes based on the loss function, whereas the original gradient boosting method does not.
- When the loss is SquaredError, these methods give the same result, but they could differ for other loss functions.
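The last note can be checked numerically with a small plain-Java sketch. It is an illustration only, not code from Spark. The class `LeafValueSketch` and its members are hypothetical, and squared error is assumed to be scaled as 0.5 * e^2 so that its negative gradient is the residual itself. Plain gradient boosting gives a leaf the mean negative gradient of its rows, while a TreeBoost-style update picks the constant that minimizes the loss over the leaf (found here by a brute-force grid search).

```java
import java.util.function.DoubleUnaryOperator;

final class LeafValueSketch {

    // Squared error 0.5 * e^2 has negative gradient e (the residual itself).
    static final DoubleUnaryOperator SQUARED_LOSS = e -> 0.5 * e * e;
    static final DoubleUnaryOperator SQUARED_NEG_GRADIENT = e -> e;
    // Absolute error |e| has negative gradient sign(e).
    static final DoubleUnaryOperator ABSOLUTE_LOSS = e -> Math.abs(e);
    static final DoubleUnaryOperator ABSOLUTE_NEG_GRADIENT = e -> Math.signum(e);

    /** Leaf output of plain gradient boosting: the mean negative gradient of the rows in the leaf. */
    static double gradientLeaf(double[] residuals, DoubleUnaryOperator negativeGradient) {
        double sum = 0.0;
        for (double r : residuals) sum += negativeGradient.applyAsDouble(r);
        return sum / residuals.length;
    }

    /** TreeBoost-style leaf output: the constant c minimizing the summed loss of (residual - c). */
    static double lossOptimalLeaf(double[] residuals, DoubleUnaryOperator loss) {
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (double r : residuals) { lo = Math.min(lo, r); hi = Math.max(hi, r); }
        double best = lo, bestLoss = Double.POSITIVE_INFINITY;
        int steps = 10_000;                           // brute-force grid search, adequate for a demo
        for (int k = 0; k <= steps; k++) {
            double c = lo + (hi - lo) * k / steps;
            double total = 0.0;
            for (double r : residuals) total += loss.applyAsDouble(r - c);
            if (total < bestLoss) { bestLoss = total; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] residuals = {1.0, 2.0, 9.0};
        System.out.println("squared, gradient leaf:  " + gradientLeaf(residuals, SQUARED_NEG_GRADIENT));
        System.out.println("squared, optimal leaf:   " + lossOptimalLeaf(residuals, SQUARED_LOSS));
        System.out.println("absolute, gradient leaf: " + gradientLeaf(residuals, ABSOLUTE_NEG_GRADIENT));
        System.out.println("absolute, optimal leaf:  " + lossOptimalLeaf(residuals, ABSOLUTE_LOSS));
    }
}
```

For the residuals {1, 2, 9}, both squared-error values coincide at the mean, 4. Under absolute error the mean sign is 1 while the loss-minimizing constant is the median, 2, so the two approaches produce different leaf outputs.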
param: boostingStrategy Parameters for the gradient boosting algorithm.
param: seed Random seed.
Constructor and Description |
---|
GradientBoostedTrees(BoostingStrategy boostingStrategy) |
Modifier and Type | Method and Description |
---|---|
GradientBoostedTreesModel | run(JavaRDD<LabeledPoint> input) Java-friendly API for run(RDD<LabeledPoint>). |
GradientBoostedTreesModel | run(RDD<LabeledPoint> input) Method to train a gradient boosting model. |
GradientBoostedTreesModel | runWithValidation(JavaRDD<LabeledPoint> input, JavaRDD<LabeledPoint> validationInput) Java-friendly API for runWithValidation(RDD<LabeledPoint>, RDD<LabeledPoint>). |
GradientBoostedTreesModel | runWithValidation(RDD<LabeledPoint> input, RDD<LabeledPoint> validationInput) Method to train a gradient boosting model and validate it on validationInput. |
static GradientBoostedTreesModel | train(JavaRDD<LabeledPoint> input, BoostingStrategy boostingStrategy) Java-friendly API for train(RDD<LabeledPoint>, BoostingStrategy). |
static GradientBoostedTreesModel | train(RDD<LabeledPoint> input, BoostingStrategy boostingStrategy) Method to train a gradient boosting model. |
public GradientBoostedTrees(BoostingStrategy boostingStrategy)
boostingStrategy - Parameters for the gradient boosting algorithm.
public static GradientBoostedTreesModel train(RDD<LabeledPoint> input, BoostingStrategy boostingStrategy)
input - Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
boostingStrategy - Configuration options for the boosting algorithm.
public static GradientBoostedTreesModel train(JavaRDD<LabeledPoint> input, BoostingStrategy boostingStrategy)
Java-friendly API for train(RDD<LabeledPoint>, BoostingStrategy).
input - Training dataset: JavaRDD of LabeledPoint.
boostingStrategy - Configuration options for the boosting algorithm.
public GradientBoostedTreesModel run(RDD<LabeledPoint> input)
input - Training dataset: RDD of LabeledPoint.
public GradientBoostedTreesModel run(JavaRDD<LabeledPoint> input)
Java-friendly API for run(RDD<LabeledPoint>).
input - Training dataset: JavaRDD of LabeledPoint.
public GradientBoostedTreesModel runWithValidation(RDD<LabeledPoint> input, RDD<LabeledPoint> validationInput)
input - Training dataset: RDD of LabeledPoint.
validationInput - Validation dataset. This dataset should be different from the training dataset, but it should follow the same distribution. E.g., these two datasets could be created from an original dataset by using org.apache.spark.rdd.RDD.randomSplit().
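As a rough illustration of why a random split yields a validation set that is disjoint from the training set but drawn from the same distribution, here is a plain-Java analogue of a two-way random split. It operates on an in-memory `List` rather than an RDD and is not Spark code. The class `RandomSplitSketch` and the parameter `trainFraction` are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomSplitSketch {

    /**
     * Returns [training, validation]. Each element is assigned independently:
     * it goes to the training part with probability trainFraction, else to validation.
     */
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        Random rng = new Random(seed);                // fixed seed makes the split reproducible
        List<T> training = new ArrayList<>();
        List<T> validation = new ArrayList<>();
        for (T item : data) {
            if (rng.nextDouble() < trainFraction) {
                training.add(item);
            } else {
                validation.add(item);
            }
        }
        List<List<T>> parts = new ArrayList<>();
        parts.add(training);
        parts.add(validation);
        return parts;
    }
}
```

Because every element lands in exactly one part and the assignment ignores the element's value, the two parts never overlap and neither is biased toward particular labels or features. The part sizes are only approximately proportional to `trainFraction`.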
public GradientBoostedTreesModel runWithValidation(JavaRDD<LabeledPoint> input, JavaRDD<LabeledPoint> validationInput)
Java-friendly API for runWithValidation(RDD<LabeledPoint>, RDD<LabeledPoint>).
input - Training dataset: JavaRDD of LabeledPoint.
validationInput - Validation dataset: JavaRDD of LabeledPoint.