GradientBoostedTrees (Spark 2.4.1 JavaDoc)

Object
- org.apache.spark.ml.tree.impl.GradientBoostedTrees

public class GradientBoostedTrees
extends Object

Constructor Summary

Constructors
Constructor and Description

GradientBoostedTrees()


Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `static scala.Tuple2<DecisionTreeRegressionModel[],double[]>` | `boost(RDD<LabeledPoint> input, RDD<LabeledPoint> validationInput, BoostingStrategy boostingStrategy, boolean validate, long seed, String featureSubsetStrategy)` Internal method for performing regression using trees as base learners. |
| `static double` | `computeError(RDD<LabeledPoint> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss)` Method to calculate error of the base learner for the gradient boosting calculation. |
| `static RDD<scala.Tuple2<Object,Object>>` | `computeInitialPredictionAndError(RDD<LabeledPoint> data, double initTreeWeight, DecisionTreeRegressionModel initTree, Loss loss)` Compute the initial predictions and errors for a dataset for the first iteration of gradient boosting. |
| `static double[]` | `evaluateEachIteration(RDD<LabeledPoint> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss, scala.Enumeration.Value algo)` Method to compute error or loss for every iteration of gradient boosting. |
| `static scala.Tuple2<DecisionTreeRegressionModel[],double[]>` | `run(RDD<LabeledPoint> input, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy)` Method to train a gradient boosting model. |
| `static scala.Tuple2<DecisionTreeRegressionModel[],double[]>` | `runWithValidation(RDD<LabeledPoint> input, RDD<LabeledPoint> validationInput, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy)` Method to train a gradient boosting model while validating it against a separate validation dataset. |
| `static double` | `updatePrediction(Vector features, double prediction, DecisionTreeRegressionModel tree, double weight)` Add prediction from a new boosting iteration to an existing prediction. |
| `static RDD<scala.Tuple2<Object,Object>>` | `updatePredictionError(RDD<LabeledPoint> data, RDD<scala.Tuple2<Object,Object>> predictionAndError, double treeWeight, DecisionTreeRegressionModel tree, Loss loss)` Update a zipped predictionError RDD (as obtained with computeInitialPredictionAndError). |

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- GradientBoostedTrees
```
public GradientBoostedTrees()
```

Method Detail

run

public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> run(RDD<LabeledPoint> input,
                                                                       BoostingStrategy boostingStrategy,
                                                                       long seed,
                                                                       String featureSubsetStrategy)

Method to train a gradient boosting model.

Parameters:

input - Training dataset: RDD of LabeledPoint.

boostingStrategy - Boosting parameters.

seed - Random seed.

featureSubsetStrategy - (undocumented)

Returns:

Tuple of ensemble models and weights: (array of decision tree models, array of model weights)

runWithValidation

public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> runWithValidation(RDD<LabeledPoint> input,
                                                                                     RDD<LabeledPoint> validationInput,
                                                                                     BoostingStrategy boostingStrategy,
                                                                                     long seed,
                                                                                     String featureSubsetStrategy)

Method to train a gradient boosting model while validating it against a separate validation dataset.

Parameters:

input - Training dataset: RDD of LabeledPoint.

validationInput - Validation dataset. It should be different from the training dataset but follow the same distribution. For example, the two datasets could be created from an original dataset by using org.apache.spark.rdd.RDD.randomSplit().

boostingStrategy - Boosting parameters.

seed - Random seed.

featureSubsetStrategy - (undocumented)

Returns:

Tuple of ensemble models and weights: (array of decision tree models, array of model weights)
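The validationInput description suggests deriving the training and validation datasets from one original dataset. The following is a minimal plain-Java sketch of a seeded random split over an in-memory list, only to illustrate the idea. It does not use Spark: `RandomSplitSketch`, `split`, and `trainFraction` are hypothetical names for this illustration, and the real RDD.randomSplit operates on distributed data and takes an array of weights and a seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomSplitSketch {
    /**
     * Splits data into [training, validation]. Each item goes to the training
     * part with probability trainFraction; the seed makes the split repeatable.
     * Illustrative only, not Spark's implementation.
     */
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        Random rng = new Random(seed);
        List<T> training = new ArrayList<>();
        List<T> validation = new ArrayList<>();
        for (T item : data) {
            if (rng.nextDouble() < trainFraction) {
                training.add(item);
            } else {
                validation.add(item);
            }
        }
        List<List<T>> parts = new ArrayList<>();
        parts.add(training);
        parts.add(validation);
        return parts;
    }
}
```

Because both parts are drawn from the same source with the same random procedure, they follow the same distribution while containing different samples, which is the property the validationInput parameter asks for.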

computeInitialPredictionAndError

public static RDD<scala.Tuple2<Object,Object>> computeInitialPredictionAndError(RDD<LabeledPoint> data,
                                                                                double initTreeWeight,
                                                                                DecisionTreeRegressionModel initTree,
                                                                                Loss loss)

Compute the initial predictions and errors for a dataset for the first iteration of gradient boosting.

Parameters:

data - Training data.

initTreeWeight - Learning rate assigned to the first tree.

initTree - First DecisionTreeRegressionModel.

loss - Evaluation metric.

Returns:

An RDD in which each element is a (prediction, error) pair for the corresponding sample.

updatePredictionError

public static RDD<scala.Tuple2<Object,Object>> updatePredictionError(RDD<LabeledPoint> data,
                                                                     RDD<scala.Tuple2<Object,Object>> predictionAndError,
                                                                     double treeWeight,
                                                                     DecisionTreeRegressionModel tree,
                                                                     Loss loss)

Update a zipped predictionError RDD (as obtained from computeInitialPredictionAndError) with the contribution of a new tree.

Parameters:

data - Training data.

predictionAndError - predictionError RDD.

treeWeight - Learning rate.

tree - Tree with which the prediction and error should be updated.

loss - Evaluation metric.

Returns:

An RDD in which each element is a (prediction, error) pair for the corresponding sample.
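To make the (prediction, error) pairing concrete, here is a minimal plain-Java sketch of computeInitialPredictionAndError followed by updatePredictionError. It works on in-memory arrays instead of RDDs, and the `Tree` and `Loss` interfaces are hypothetical stand-ins for DecisionTreeRegressionModel and Loss. It assumes the initial prediction starts from zero and that each tree's output is scaled by its weight before being added.

```java
final class PredictionErrorSketch {
    /** Hypothetical stand-in for DecisionTreeRegressionModel. */
    interface Tree { double predict(double[] features); }

    /** Hypothetical stand-in for Loss: error of one prediction against its label. */
    interface Loss { double computeError(double prediction, double label); }

    /** Each row of the result is {prediction, error} for the sample at the same index. */
    static double[][] computeInitialPredictionAndError(
            double[][] features, double[] labels,
            double initTreeWeight, Tree initTree, Loss loss) {
        double[][] predictionAndError = new double[labels.length][];
        for (int i = 0; i < labels.length; i++) {
            // Assumption: the first prediction is the weighted output of the first tree.
            double prediction = initTreeWeight * initTree.predict(features[i]);
            predictionAndError[i] =
                new double[] {prediction, loss.computeError(prediction, labels[i])};
        }
        return predictionAndError;
    }

    /** Adds one more tree to every existing prediction and recomputes the error. */
    static double[][] updatePredictionError(
            double[][] features, double[] labels, double[][] predictionAndError,
            double treeWeight, Tree tree, Loss loss) {
        double[][] updated = new double[labels.length][];
        for (int i = 0; i < labels.length; i++) {
            double prediction =
                predictionAndError[i][0] + treeWeight * tree.predict(features[i]);
            updated[i] = new double[] {prediction, loss.computeError(prediction, labels[i])};
        }
        return updated;
    }
}
```

Carrying the previous prediction forward means each boosting iteration only has to evaluate the newest tree, rather than re-evaluating the whole ensemble for every sample.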

updatePrediction

public static double updatePrediction(Vector features,
                                      double prediction,
                                      DecisionTreeRegressionModel tree,
                                      double weight)

Add prediction from a new boosting iteration to an existing prediction.

Parameters:

features - Vector of features representing a single data point.

prediction - The existing prediction.

tree - New Decision Tree model.

weight - Tree weight.

Returns:

Updated prediction.
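The additive update described here can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the Spark source: `Tree` is a hypothetical stand-in for DecisionTreeRegressionModel, features are a plain `double[]` rather than a Vector, and the new tree's output is assumed to be multiplied by its weight before being added to the existing prediction.

```java
final class UpdatePredictionSketch {
    /** Hypothetical stand-in for DecisionTreeRegressionModel. */
    interface Tree { double predict(double[] features); }

    /** Returns the existing prediction plus the weighted prediction of the new tree. */
    static double updatePrediction(double[] features, double prediction,
                                   Tree tree, double weight) {
        return prediction + weight * tree.predict(features);
    }
}
```

For example, with an existing prediction of 1.0, a tree that outputs 2.0 for the given features, and a weight of 0.5, the updated prediction is 2.0.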

computeError
```
public static double computeError(RDD<LabeledPoint> data,
                                  DecisionTreeRegressionModel[] trees,
                                  double[] treeWeights,
                                  Loss loss)
```
Method to calculate error of the base learner for the gradient boosting calculation. Note: This method is not used by the gradient boosting algorithm but is useful for debugging purposes.

Parameters:

data - Training dataset: RDD of LabeledPoint.

trees - Boosted Decision Tree models

treeWeights - Learning rates at each boosting iteration.

loss - evaluation metric.

Returns:

Measure of model error on data
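As a rough illustration of what such an error measure involves, the plain-Java sketch below sums the weighted tree predictions for each sample and averages the per-sample loss. It assumes the measure is a mean over samples; `Tree` and `Loss` are hypothetical stand-ins for DecisionTreeRegressionModel and Loss, and in-memory arrays replace the RDD.

```java
final class ComputeErrorSketch {
    /** Hypothetical stand-in for DecisionTreeRegressionModel. */
    interface Tree { double predict(double[] features); }

    /** Hypothetical stand-in for Loss: error of one prediction against its label. */
    interface Loss { double computeError(double prediction, double label); }

    /** Mean loss of the weighted ensemble over all samples (assumes at least one sample). */
    static double computeError(double[][] features, double[] labels,
                               Tree[] trees, double[] treeWeights, Loss loss) {
        double total = 0.0;
        for (int n = 0; n < labels.length; n++) {
            // Ensemble prediction: weighted sum of every tree's prediction.
            double predicted = 0.0;
            for (int t = 0; t < trees.length; t++) {
                predicted += treeWeights[t] * trees[t].predict(features[n]);
            }
            total += loss.computeError(predicted, labels[n]);
        }
        return total / labels.length;
    }
}
```

With a squared-error loss such as `(p, y) -> (y - p) * (y - p)`, the result is the mean squared error of the ensemble on the given data.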

evaluateEachIteration

public static double[] evaluateEachIteration(RDD<LabeledPoint> data,
                                             DecisionTreeRegressionModel[] trees,
                                             double[] treeWeights,
                                             Loss loss,
                                             scala.Enumeration.Value algo)

Method to compute error or loss for every iteration of gradient boosting.

Parameters:

data - RDD of LabeledPoint.

trees - Boosted Decision Tree models.

treeWeights - Learning rates at each boosting iteration.

loss - Evaluation metric.

algo - Algorithm for the ensemble, either Classification or Regression.

Returns:

An array in which index i holds the loss or error of the ensemble containing the first i+1 trees.
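The indexing of the returned array is easiest to see in code. The plain-Java sketch below keeps a running prediction per sample, adds one tree at a time, and records the mean loss after each addition, so entry i covers the first i+1 trees. It is a sketch for the regression case only: the algo parameter is omitted, `Tree` and `Loss` are hypothetical stand-ins for the Spark types, and in-memory arrays replace the RDD.

```java
final class EvaluateEachIterationSketch {
    /** Hypothetical stand-in for DecisionTreeRegressionModel. */
    interface Tree { double predict(double[] features); }

    /** Hypothetical stand-in for Loss: error of one prediction against its label. */
    interface Loss { double computeError(double prediction, double label); }

    /** errors[i] is the mean loss of the ensemble made of trees 0..i (assumes non-empty data). */
    static double[] evaluateEachIteration(double[][] features, double[] labels,
                                          Tree[] trees, double[] treeWeights, Loss loss) {
        double[] running = new double[labels.length]; // current ensemble prediction per sample
        double[] errors = new double[trees.length];
        for (int i = 0; i < trees.length; i++) {
            double total = 0.0;
            for (int n = 0; n < labels.length; n++) {
                running[n] += treeWeights[i] * trees[i].predict(features[n]);
                total += loss.computeError(running[n], labels[n]);
            }
            errors[i] = total / labels.length;
        }
        return errors;
    }
}
```

A curve of these values against i is a common way to judge how many trees are worth keeping.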

boost

public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> boost(RDD<LabeledPoint> input,
                                                                         RDD<LabeledPoint> validationInput,
                                                                         BoostingStrategy boostingStrategy,
                                                                         boolean validate,
                                                                         long seed,
                                                                         String featureSubsetStrategy)

Internal method for performing regression using trees as base learners.

Parameters:

input - Training dataset.

validationInput - Validation dataset, ignored if validate is set to false.

boostingStrategy - Boosting parameters.

validate - Whether or not to use the validation dataset.

seed - Random seed.

featureSubsetStrategy - (undocumented)

Returns:

Tuple of ensemble models and weights: (array of decision tree models, array of model weights)
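For readers unfamiliar with boosting using trees as base learners, the following self-contained Java sketch shows the general technique on a toy problem. It is not the Spark implementation. It assumes a single numeric feature, a squared-error loss, depth-1 trees (`Stump`), and no validation set. `BoostSketch`, `Stump`, `Ensemble`, and `fitStump` are hypothetical names, and the first tree is given weight 1.0 while later trees use the learning rate, which is an assumption of this sketch.

```java
final class BoostSketch {
    /** Depth-1 regression tree on a single feature; a toy stand-in for a decision tree model. */
    static final class Stump {
        final double threshold, left, right;
        Stump(double threshold, double left, double right) {
            this.threshold = threshold; this.left = left; this.right = right;
        }
        double predict(double x) { return x <= threshold ? left : right; }
    }

    /** Trees plus their weights, analogous to a (models, weights) tuple. */
    static final class Ensemble {
        final Stump[] trees;
        final double[] weights;
        Ensemble(Stump[] trees, double[] weights) { this.trees = trees; this.weights = weights; }
        double predict(double x) {
            double sum = 0.0;
            for (int m = 0; m < trees.length; m++) sum += weights[m] * trees[m].predict(x);
            return sum;
        }
    }

    /** Least-squares stump: tries every observed x as a threshold and keeps the best split. */
    static Stump fitStump(double[] x, double[] target) {
        Stump best = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (double t : x) {
            double sumL = 0.0, sumR = 0.0;
            int nL = 0, nR = 0;
            for (int i = 0; i < x.length; i++) {
                if (x[i] <= t) { sumL += target[i]; nL++; } else { sumR += target[i]; nR++; }
            }
            double meanL = sumL / nL;                   // nL >= 1 because t is one of the x values
            double meanR = nR == 0 ? meanL : sumR / nR; // empty right side: reuse the left mean
            double sse = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = target[i] - (x[i] <= t ? meanL : meanR);
                sse += diff * diff;
            }
            if (sse < bestSse) { bestSse = sse; best = new Stump(t, meanL, meanR); }
        }
        return best;
    }

    /** Fits numIterations stumps, each to the residuals left by the ensemble so far. */
    static Ensemble boost(double[] x, double[] y, int numIterations, double learningRate) {
        Stump[] trees = new Stump[numIterations];
        double[] weights = new double[numIterations];
        double[] prediction = new double[y.length];
        for (int m = 0; m < numIterations; m++) {
            // For squared error the negative gradient is proportional to the residual.
            double[] residual = new double[y.length];
            for (int i = 0; i < y.length; i++) residual[i] = y[i] - prediction[i];
            trees[m] = fitStump(x, residual);
            weights[m] = (m == 0) ? 1.0 : learningRate; // assumption: first tree unshrunk
            for (int i = 0; i < y.length; i++) {
                prediction[i] += weights[m] * trees[m].predict(x[i]);
            }
        }
        return new Ensemble(trees, weights);
    }
}
```

Calling `BoostSketch.boost(x, y, 5, 0.5)` on non-empty arrays of equal length returns an ensemble whose `predict` method sums the weighted stump outputs. Adding a validation set would mean tracking the error on held-out data after each iteration and stopping once it no longer improves.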
