public class MLUtils
extends Object
Constructor and Description |
---|
MLUtils() |
Modifier and Type | Method and Description |
---|---|
static Vector | appendBias(Vector vector) Returns a new vector with 1.0 (bias) appended to the input vector. |
static double | EPSILON() |
static <T> scala.Tuple2<RDD<T>,RDD<T>>[] | kFold(RDD<T> rdd, int numFolds, int seed, scala.reflect.ClassTag<T> evidence$1) :: Experimental :: Return a k element array of pairs of RDDs with the first element of each pair containing the training data, a complement of the validation data and the second element, the validation data, containing a unique 1/kth of the data. |
static RDD<LabeledPoint> | loadLabeledData(SparkContext sc, String dir) :: Experimental :: Load labeled data from a file. |
static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path) Loads binary labeled data in the LIBSVM format into an RDD[LabeledPoint], with number of features determined automatically and the default number of partitions. |
static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path, boolean multiclass) Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the number of features determined automatically and the default number of partitions. |
static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path, boolean multiclass, int numFeatures) Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the default number of partitions. |
static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path, boolean multiclass, int numFeatures, int minPartitions) Loads labeled data in the LIBSVM format into an RDD[LabeledPoint]. |
static void | saveAsLibSVMFile(RDD<LabeledPoint> data, String dir) Save labeled data in LIBSVM format. |
static void | saveLabeledData(RDD<LabeledPoint> data, String dir) :: Experimental :: Save labeled data to a file. |
public static double EPSILON()
public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path, boolean multiclass, int numFeatures, int minPartitions)
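Loads labeled data in the LIBSVM format into an RDD[LabeledPoint]. Each line of the input represents a labeled sparse feature vector in the following format:
label index1:value1 index2:value2 ...
where the indices are one-based and in ascending order.
This method parses each line into a LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint),
where the feature indices are converted to zero-based.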
@param sc Spark context
@param path file or directory path in any Hadoop-supported file system URI
@param multiclass whether the input labels contain more than two classes. If false, any label
with value greater than 0.5 will be mapped to 1.0, or 0.0 otherwise. So it
works for both +1/-1 and 1/0 cases. If true, the double value parsed directly
from the label string will be used as the label value.
@param numFeatures number of features, which will be determined from the input data if a
nonpositive value is given. This is useful when the dataset is already split
into multiple files and you want to load them separately, because some
features may not be present in certain files, which leads to inconsistent
feature dimensions.
@param minPartitions min number of partitions
@return labeled data stored as an RDD[LabeledPoint]
public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path, boolean multiclass, int numFeatures)
public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path, boolean multiclass)
public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path)
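The parsing rules documented for loadLibSVMFile (one-based indices converted to zero-based, the binary label mapping when multiclass is false, and the number of features inferred from the data) can be illustrated without Spark. The following is a minimal plain-Java sketch of those rules only; the class LibSvmLineSketch and its methods are hypothetical names introduced for illustration and are not part of MLUtils or of Spark's actual implementation.

```java
import java.util.List;

/** Illustrative sketch of the documented parsing rules; not Spark code. */
final class LibSvmLineSketch {
    final double label;
    final int[] indices;    // zero-based feature indices
    final double[] values;  // feature values, aligned with indices

    LibSvmLineSketch(double label, int[] indices, double[] values) {
        this.label = label;
        this.indices = indices;
        this.values = values;
    }

    /**
     * Binary case: any label greater than 0.5 becomes 1.0, everything else 0.0,
     * so both +1/-1 and 1/0 inputs work. Multiclass case: the parsed value is kept.
     */
    static double parseLabel(String token, boolean multiclass) {
        double raw = Double.parseDouble(token);
        if (multiclass) {
            return raw;
        }
        return raw > 0.5 ? 1.0 : 0.0;
    }

    /** Parses one line of the form "label index1:value1 index2:value2 ...". */
    static LibSvmLineSketch parseLine(String line, boolean multiclass) {
        String[] tokens = line.trim().split("\\s+");
        double label = parseLabel(tokens[0], multiclass);
        int n = tokens.length - 1;
        int[] indices = new int[n];
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            String[] pair = tokens[i + 1].split(":", 2);
            indices[i] = Integer.parseInt(pair[0]) - 1; // one-based to zero-based
            values[i] = Double.parseDouble(pair[1]);
        }
        return new LibSvmLineSketch(label, indices, values);
    }

    /** Feature count implied by the data: the largest zero-based index plus one. */
    static int inferNumFeatures(List<LibSvmLineSketch> points) {
        int max = -1;
        for (LibSvmLineSketch p : points) {
            for (int index : p.indices) {
                max = Math.max(max, index);
            }
        }
        return max + 1;
    }
}
```

Because the inferred count depends only on the indices that actually occur, two files from the same dataset can yield different dimensions, which is why passing an explicit numFeatures is useful when loading split files separately.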
public static void saveAsLibSVMFile(RDD<LabeledPoint> data, String dir)
data - an RDD of LabeledPoint to be saved
dir - directory to save the data
See also: loadLibSVMFile(org.apache.spark.SparkContext, java.lang.String, org.apache.spark.mllib.util.LabelParser, int, int)
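Saving is the inverse of loading: a point with zero-based feature indices has to be written back as a line with one-based indices. The sketch below shows that conversion in plain Java. It assumes the output line mirrors the "label index1:value1 index2:value2 ..." layout described for loadLibSVMFile; LibSvmFormatSketch and toLibSvmLine are hypothetical names, and the exact number formatting Spark uses is not specified here.

```java
/** Illustrative sketch of writing one LIBSVM line; not Spark code. */
final class LibSvmFormatSketch {
    /**
     * Formats a label and its sparse features as "label index:value ...".
     * The indices passed in are zero-based; the output indices are one-based.
     */
    static String toLibSvmLine(double label, int[] indices, double[] values) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        for (int i = 0; i < indices.length; i++) {
            sb.append(' ')
              .append(indices[i] + 1) // zero-based to one-based
              .append(':')
              .append(values[i]);
        }
        return sb.toString();
    }
}
```

For example, a label of 1.0 with features at zero-based positions 0 and 2 produces a line whose indices are 1 and 3.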
public static RDD<LabeledPoint> loadLabeledData(SparkContext sc, String dir)
sc - SparkContext
dir - Directory to the input data files.
public static void saveLabeledData(RDD<LabeledPoint> data, String dir)
data - An RDD of LabeledPoints containing data to be saved.
dir - Directory to save the data.
public static <T> scala.Tuple2<RDD<T>,RDD<T>>[] kFold(RDD<T> rdd, int numFolds, int seed, scala.reflect.ClassTag<T> evidence$1)
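The shape of the kFold result (k pairs, each holding training data and a validation fold that is its complement, with every element validated once) can be shown on an in-memory list. This is a minimal sketch under stated assumptions: it uses a seeded shuffle followed by round-robin assignment, which is a simplification chosen for illustration and not necessarily how Spark samples an RDD. KFoldSketch is a hypothetical name, and java.util.List stands in for RDD.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative k-fold split over a list; not Spark code. */
final class KFoldSketch {
    /**
     * Returns numFolds pairs. In each pair, element 0 is the training data and
     * element 1 is the validation data; the two are disjoint and together
     * cover the whole input.
     */
    static <T> List<List<List<T>>> kFold(List<T> data, int numFolds, int seed) {
        // Assumption: shuffle once with the seed, then deal elements round-robin.
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));

        List<List<List<T>>> folds = new ArrayList<>();
        for (int fold = 0; fold < numFolds; fold++) {
            List<T> training = new ArrayList<>();
            List<T> validation = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                if (i % numFolds == fold) {
                    validation.add(shuffled.get(i)); // this fold's 1/k share
                } else {
                    training.add(shuffled.get(i));   // complement of the fold
                }
            }
            List<List<T>> pair = new ArrayList<>();
            pair.add(training);
            pair.add(validation);
            folds.add(pair);
        }
        return folds;
    }
}
```

With 10 elements and numFolds set to 5, each validation fold holds 2 elements and each training set holds the remaining 8.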