public class KolmogorovSmirnovTest
extends Object
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution. By comparing the largest difference between the empirical cumulative distribution of the sample data and the theoretical distribution, we can provide a test for the null hypothesis that the sample data comes from that theoretical distribution. For more information on KS Test:
Constructor and Description |
---|
KolmogorovSmirnovTest() |
Modifier and Type | Method and Description |
---|---|
static Dataset<Row> | test(Dataset<?> dataset, String sampleCol, Function<Double,Double> cdf) |
static Dataset<Row> | test(Dataset<?> dataset, String sampleCol, scala.Function1<Object,Object> cdf) |
static Dataset<Row> | test(Dataset<?> dataset, String sampleCol, String distName, double... params) Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. |
static Dataset<Row> | test(Dataset<?> dataset, String sampleCol, String distName, scala.collection.Seq<Object> params) |
public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, String distName, double... params)
Parameters:
dataset - A Dataset or a DataFrame containing the sample of data to test
sampleCol - Name of sample column in dataset, of any numerical type
distName - a String name for a theoretical distribution; currently only "norm" is supported
params - Double* specifying the parameters to be used for the theoretical distribution. For the "norm" distribution, the parameters include mean and variance.
Returns:
A Dataset<Row> containing the test result, with the following fields:
- pValue: Double
- statistic: Double
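The class description characterizes the test by the largest difference between the empirical cumulative distribution of the sample and the theoretical distribution. The following is a minimal plain-Java sketch of that quantity, intended only as an illustration of the definition; it is not Spark's implementation, and the class name `KsStatisticSketch` and method `ksStatistic` are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, not part of Spark: computes the one-sample KS statistic locally.
public class KsStatisticSketch {

    /** Returns the largest gap between the empirical CDF of the sample and the given CDF. */
    public static double ksStatistic(double[] sample, DoubleUnaryOperator cdf) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF steps from i/n up to (i+1)/n at sorted[i],
            // so the gap is checked just below and just above the step.
            double below = f - (double) i / n;
            double above = (double) (i + 1) / n - f;
            d = Math.max(d, Math.max(below, above));
        }
        return d;
    }

    public static void main(String[] args) {
        double[] sample = {0.1, 0.4, 0.7};
        // CDF of the Uniform(0, 1) distribution, used here as the theoretical distribution.
        DoubleUnaryOperator uniformCdf = x -> Math.min(1.0, Math.max(0.0, x));
        System.out.println("statistic = " + ksStatistic(sample, uniformCdf));
    }
}
```

For the three-point sample above, the largest gap occurs at 0.7, where the empirical CDF reaches 1 while the uniform CDF is 0.7, so the statistic is approximately 0.3.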
public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, scala.Function1<Object,Object> cdf)
public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, Function<Double,Double> cdf)
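As a rough usage sketch of the overloads listed above, the following Java program calls the named-distribution variant and the `Function<Double,Double>` variant. It assumes Spark SQL and Spark MLlib are on the classpath and that `Function` is `org.apache.spark.api.java.function.Function`; the class name `KsTestUsageSketch`, the column name `sample`, the sample values, and the logistic CDF are illustrative assumptions, not taken from this page.

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.ml.stat.KolmogorovSmirnovTest;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical example class; only KolmogorovSmirnovTest.test(...) comes from the documented API.
public class KsTestUsageSketch {

    /** Tests the column against the "norm" distribution with parameters 0.0 and 1.0. */
    public static Row testAgainstStandardNormal(Dataset<Row> df, String sampleCol) {
        return KolmogorovSmirnovTest.test(df, sampleCol, "norm", 0.0, 1.0).head();
    }

    /** Tests the column against a user-supplied CDF (here: the standard logistic CDF). */
    public static Row testAgainstLogistic(Dataset<Row> df, String sampleCol) {
        // Declaring the variable with the Java Function type selects that overload;
        // a bare lambda argument could be ambiguous with the scala.Function1 overload.
        Function<Double, Double> logisticCdf = x -> 1.0 / (1.0 + Math.exp(-x));
        return KolmogorovSmirnovTest.test(df, sampleCol, logisticCdf).head();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("KsTestUsageSketch")
            .getOrCreate();

        // Illustrative sample values in a single numeric column named "sample".
        Dataset<Row> df = spark
            .createDataset(Arrays.asList(-1.2, -0.4, 0.1, 0.3, 0.9, 1.5), Encoders.DOUBLE())
            .toDF("sample");

        Row normal = testAgainstStandardNormal(df, "sample");
        System.out.println("norm:     pValue=" + normal.getDouble(normal.fieldIndex("pValue"))
            + ", statistic=" + normal.getDouble(normal.fieldIndex("statistic")));

        Row logistic = testAgainstLogistic(df, "sample");
        System.out.println("logistic: pValue=" + logistic.getDouble(logistic.fieldIndex("pValue"))
            + ", statistic=" + logistic.getDouble(logistic.fieldIndex("statistic")));

        spark.stop();
    }
}
```

The result fields are read by name via `fieldIndex` rather than by position, so the sketch does not depend on the column order of the returned row.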