public final class Bucketizer extends Model<Bucketizer>
Bucketizer maps a column of continuous features to a column of feature buckets.
Constructor and Description |
---|
Bucketizer() |
Bucketizer(java.lang.String uid) |
Modifier and Type | Method and Description |
---|---|
static double | binarySearchForBuckets(double[] splits, double feature) Binary search over the buckets to place each data point. |
static boolean | checkSplits(double[] splits) Checks that splits has length >= 3 and is in strictly increasing order. |
Bucketizer | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params. |
double[] | getSplits() |
Bucketizer | setInputCol(java.lang.String value) |
Bucketizer | setOutputCol(java.lang.String value) |
Bucketizer | setSplits(double[] value) |
DoubleArrayParam | splits() Parameter for mapping continuous features into buckets. |
DataFrame | transform(DataFrame dataset) Transforms the input dataset. |
StructType | transformSchema(StructType schema) :: DeveloperApi :: Derives the output schema from the input schema. |
java.lang.String | uid() An immutable unique ID for the object and its derivatives. |
Methods inherited from class Transformer: transform, transform, transform
Methods inherited from class PipelineStage: transformSchema
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams
Methods inherited from interface Identifiable: toString
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public Bucketizer(java.lang.String uid)
public Bucketizer()
public static boolean checkSplits(double[] splits)
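The summary describes the rule that checkSplits enforces: at least three split points, in strictly increasing order. A minimal standalone sketch of such a check might look like the following. The class name `SplitsCheckSketch` is hypothetical, and this is an illustration of the stated rule, not the library's source; rejecting NaN entries is a choice made in this sketch only.

```java
final class SplitsCheckSketch {

    /**
     * Returns true when splits has at least 3 entries and every entry is
     * strictly smaller than the one after it.
     */
    static boolean checkSplits(double[] splits) {
        if (splits.length < 3) {
            return false; // fewer than 3 split points cannot define 2 buckets
        }
        for (int i = 0; i < splits.length - 1; i++) {
            // !(a < b) rejects equal and decreasing neighbours; it also
            // rejects NaN, which is an assumption of this sketch.
            if (!(splits[i] < splits[i + 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkSplits(new double[] {-0.5, 0.0, 0.5})); // true
        System.out.println(checkSplits(new double[] {0.0, 1.0}));       // false: too short
        System.out.println(checkSplits(new double[] {0.0, 1.0, 1.0}));  // false: not strictly increasing
    }
}
```

With n + 1 valid split points there are n buckets, which is why three is the smallest useful length.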
public static double binarySearchForBuckets(double[] splits, double feature)
Parameters:
splits - (undocumented)
feature - (undocumented)
Throws:
SparkException - if a feature is < splits.head or > splits.last
public java.lang.String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
public DoubleArrayParam splits()
public double[] getSplits()
public Bucketizer setSplits(double[] value)
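To illustrate how the split points passed to setSplits define buckets, here is a hedged standalone sketch of a bucket lookup built on `java.util.Arrays.binarySearch`. It assumes each bucket is half-open, [x, y), except the last, which also includes its upper bound; that assumption follows from the documented rule that an error is raised only for features below the first split or above the last. The class `BucketLookupSketch` and method `bucketFor` are hypothetical names. The sketch returns an `int` and throws `IllegalArgumentException`, whereas the documented binarySearchForBuckets returns a `double` and throws `SparkException`.

```java
import java.util.Arrays;

final class BucketLookupSketch {

    /**
     * Returns the index of the bucket that contains feature, given
     * strictly increasing split points.
     */
    static int bucketFor(double[] splits, double feature) {
        int last = splits.length - 1;
        if (feature == splits[last]) {
            return last - 1; // assumption: the last bucket includes its upper bound
        }
        int idx = Arrays.binarySearch(splits, feature);
        if (idx >= 0) {
            return idx; // feature equals a split point, which opens bucket idx
        }
        int insertPos = -idx - 1; // position where feature would be inserted
        if (insertPos == 0 || insertPos == splits.length) {
            throw new IllegalArgumentException(
                "feature " + feature + " is outside [" + splits[0] + ", " + splits[last] + "]");
        }
        return insertPos - 1; // bucket whose lower bound precedes feature
    }

    public static void main(String[] args) {
        double[] splits = {-1.0, 0.0, 1.0};
        System.out.println(bucketFor(splits, -0.5)); // 0
        System.out.println(bucketFor(splits, 0.0));  // 1
        System.out.println(bucketFor(splits, 1.0));  // 1
    }
}
```

Using Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY as the outermost split points is one way to make sure every finite feature falls into some bucket.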
public Bucketizer setInputCol(java.lang.String value)
public Bucketizer setOutputCol(java.lang.String value)
public DataFrame transform(DataFrame dataset)
Description copied from class: Transformer
Transforms the input dataset.
Specified by:
transform in class Transformer
Parameters:
dataset - (undocumented)
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Derives the output schema from the input schema.
Specified by:
transformSchema in class PipelineStage
Parameters:
schema - (undocumented)
public Bucketizer copy(ParamMap extra)
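Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params.
Specified by:
copy in interface Params
Specified by:
copy in class Model<Bucketizer>
Parameters:
extra - (undocumented)
See Also:
defaultCopy()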