OneHotEncoderEstimator (Spark 2.4.7 JavaDoc)

Object
- org.apache.spark.ml.PipelineStage
  - org.apache.spark.ml.Estimator<OneHotEncoderModel>
    - org.apache.spark.ml.feature.OneHotEncoderEstimator

All Implemented Interfaces:

java.io.Serializable, Logging, OneHotEncoderBase, Params, HasHandleInvalid, HasInputCols, HasOutputCols, DefaultParamsWritable, Identifiable, MLWritable
```
public class OneHotEncoderEstimator
extends Estimator<OneHotEncoderModel>
implements OneHotEncoderBase, DefaultParamsWritable
```
A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 maps to the output vector [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because including it would make the vector entries sum to one and hence be linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
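
The index-to-vector mapping described above can be illustrated without Spark. The following is a minimal plain-Java sketch, not Spark's implementation: the class `OneHotSketch` and its `encode` method are hypothetical names introduced for illustration, and the sketch returns a dense `double[]` for readability, whereas the real encoder emits sparse vectors.

```java
import java.util.Arrays;

// Hypothetical illustration of the mapping; not part of the Spark API.
class OneHotSketch {

    // Encodes a category index as a dense one-hot vector.
    // With dropLast = true the vector has numCategories - 1 entries,
    // so the last category is represented by all zeros.
    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("Invalid category index: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vector = new double[size];
        if (index < size) {
            vector[index] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, false))); // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

With `dropLast` set to false, the last category gets its own slot and the vector has one entry per category.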

See Also:

StringIndexer for converting categorical values into category indices, Serialized Form

Note:

This is different from scikit-learn's OneHotEncoder, which keeps all categories. The output vectors are sparse.
When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros vector.
When encoding multiple columns using the inputCols and outputCols params, input and output columns come in pairs, specified by their order in the arrays, and each pair is treated independently.
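
The interaction between handleInvalid and dropLast noted above can be sketched in plain Java. This is an illustration under stated assumptions, not Spark's code: `InvalidHandlingSketch`, `outputSize`, and `activeIndex` are hypothetical names, and the 'keep' setting is modeled as a simple boolean.

```java
// Hypothetical illustration of how handleInvalid and dropLast interact; not part of the Spark API.
class InvalidHandlingSketch {

    // Length of the output vector for one column.
    static int outputSize(int numCategories, boolean keepInvalid, boolean dropLast) {
        int total = keepInvalid ? numCategories + 1 : numCategories; // extra slot for invalid values
        return dropLast ? total - 1 : total;
    }

    // Position of the single 1.0 in the output vector, or -1 for an all-zeros vector.
    static int activeIndex(int index, int numCategories, boolean keepInvalid, boolean dropLast) {
        boolean invalid = index < 0 || index >= numCategories;
        if (invalid && !keepInvalid) {
            throw new IllegalArgumentException("Invalid category index: " + index);
        }
        int slot = invalid ? numCategories : index; // invalid values share the extra last category
        return slot < outputSize(numCategories, keepInvalid, dropLast) ? slot : -1;
    }

    public static void main(String[] args) {
        System.out.println(outputSize(5, true, true));      // 5
        System.out.println(activeIndex(7, 5, true, true));  // -1
        System.out.println(activeIndex(4, 5, true, true));  // 4
        System.out.println(activeIndex(7, 5, true, false)); // 5
    }
}
```

In this sketch, keeping invalid values while dropping the last category means the dropped slot is the invalid one, so every valid category keeps its own position.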

Constructor Summary

Constructors
Constructor and Description

OneHotEncoderEstimator()

OneHotEncoderEstimator(String uid)

Method Summary

Modifier and Type	Method and Description
`OneHotEncoderEstimator`	`copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params.
`OneHotEncoderModel`	`fit(Dataset<?> dataset)` Fits a model to the input data.
`static OneHotEncoderEstimator`	`load(String path)`
`static MLReader<T>`	`read()`
`OneHotEncoderEstimator`	`setDropLast(boolean value)`
`OneHotEncoderEstimator`	`setHandleInvalid(String value)`
`OneHotEncoderEstimator`	`setInputCols(String[] values)`
`OneHotEncoderEstimator`	`setOutputCols(String[] values)`
`StructType`	`transformSchema(StructType schema)` :: DeveloperApi :: Check transform validity and derive the output schema from the input schema.
`String`	`uid()` An immutable unique ID for the object and its derivatives.

Methods inherited from class org.apache.spark.ml.Estimator
fit, fit, fit, fit

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.feature.OneHotEncoderBase
dropLast, getDropLast, handleInvalid, validateAndTransformSchema

Methods inherited from interface org.apache.spark.ml.param.shared.HasHandleInvalid
getHandleInvalid

Methods inherited from interface org.apache.spark.ml.param.shared.HasInputCols
getInputCols, inputCols

Methods inherited from interface org.apache.spark.ml.param.shared.HasOutputCols
getOutputCols, outputCols

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write

Methods inherited from interface org.apache.spark.ml.util.MLWritable
save

Methods inherited from interface org.apache.spark.internal.Logging
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

- Constructor Detail
  - OneHotEncoderEstimator
```
public OneHotEncoderEstimator(String uid)
```
  - OneHotEncoderEstimator
```
public OneHotEncoderEstimator()
```
- Method Detail
  - load
```
public static OneHotEncoderEstimator load(String path)
```
  - read
```
public static MLReader<T> read()
```
  - uid
```
public String uid()
```
    Description copied from interface: Identifiable
    
    An immutable unique ID for the object and its derivatives.
    
    Specified by:
    
    uid in interface Identifiable
    
    Returns:
    
    (undocumented)
  - setInputCols
```
public OneHotEncoderEstimator setInputCols(String[] values)
```
  - setOutputCols
```
public OneHotEncoderEstimator setOutputCols(String[] values)
```
  - setDropLast
```
public OneHotEncoderEstimator setDropLast(boolean value)
```
  - setHandleInvalid
```
public OneHotEncoderEstimator setHandleInvalid(String value)
```
  - transformSchema
```
public StructType transformSchema(StructType schema)
```
    Description copied from class: PipelineStage
    
    :: DeveloperApi ::
    Check transform validity and derive the output schema from the input schema.
    We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().
    Typical implementation should first conduct verification on schema change and parameter validity, including complex parameter interaction checks.
    
    Specified by:
    
    transformSchema in class PipelineStage
    
    Parameters:
    
    schema - (undocumented)
    
    Returns:
    
    (undocumented)
  - fit
```
public OneHotEncoderModel fit(Dataset<?> dataset)
```
    Description copied from class: Estimator
    
    Fits a model to the input data.
    
    Specified by:
    
    fit in class Estimator<OneHotEncoderModel>
    
    Parameters:
    
    dataset - (undocumented)
    
    Returns:
    
    (undocumented)
  - copy
```
public OneHotEncoderEstimator copy(ParamMap extra)
```
    Description copied from interface: Params
    
    Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
    
    Specified by:
    
    copy in interface Params
    
    Specified by:
    
    copy in class Estimator<OneHotEncoderModel>
    
    Parameters:
    
    extra - (undocumented)
    
    Returns:
    
    (undocumented)
