public class BinaryClassificationMetrics
extends Object
implements org.apache.spark.internal.Logging
param: scoreAndLabels an RDD of (score, label) or (score, label, weight) tuples.
param: numBins if greater than 0, the curves (ROC curve, PR curve) computed internally
will be down-sampled to this many "bins". If 0, no down-sampling occurs.
This is useful because the curve contains a point for each distinct score
in the input, which can be as many points as the input itself (millions
or more), when thousands may be entirely sufficient to summarize
the curve. After down-sampling, the curves will instead consist of approximately
numBins points. Each point is made from a bin of an equal number of
consecutive points. The size of each bin is
floor(scoreAndLabels.count() / numBins), which means the resulting number
of bins may not exactly equal numBins. The last bin in each partition may
be smaller as a result, meaning there may be an extra sample at
partition boundaries.
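The bin-size arithmetic above can be illustrated with a small pure-Java sketch. It is not Spark's implementation: the class BinningSketch and its methods are hypothetical, it works on a single in-memory sequence rather than per RDD partition, and clamping the bin size to at least 1 when the count is smaller than numBins is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the numBins arithmetic; not Spark's implementation.
final class BinningSketch {

    // Bin size = floor(count / numBins). Integer division floors for non-negative ints.
    // Clamping to at least 1 (when count < numBins) is an assumption of this sketch.
    static int binSize(int count, int numBins) {
        if (numBins <= 0) {
            throw new IllegalArgumentException("numBins must be > 0 to down-sample");
        }
        return Math.max(1, count / numBins);
    }

    // Groups `count` consecutive points into bins of binSize(count, numBins) points and
    // returns the length of each bin. The final bin holds the remainder and may be smaller.
    static List<Integer> binLengths(int count, int numBins) {
        int size = binSize(count, numBins);
        List<Integer> lengths = new ArrayList<>();
        for (int start = 0; start < count; start += size) {
            lengths.add(Math.min(size, count - start));
        }
        return lengths;
    }
}
```

For example, BinningSketch.binLengths(10, 3) yields [3, 3, 3, 1]: four bins rather than three, with a smaller last bin, which is why the resulting number of bins may differ from numBins.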
Constructor and Description |
---|
BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins) |
BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels): Defaults numBins to 0. |
Modifier and Type | Method and Description |
---|---|
double | areaUnderPR(): Computes the area under the precision-recall curve. |
double | areaUnderROC(): Computes the area under the receiver operating characteristic (ROC) curve. |
RDD<scala.Tuple2<Object,Object>> | fMeasureByThreshold(): Returns the (threshold, F-Measure) curve with beta = 1.0. |
RDD<scala.Tuple2<Object,Object>> | fMeasureByThreshold(double beta): Returns the (threshold, F-Measure) curve. |
int | numBins() |
RDD<scala.Tuple2<Object,Object>> | pr(): Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve. |
RDD<scala.Tuple2<Object,Object>> | precisionByThreshold(): Returns the (threshold, precision) curve. |
RDD<scala.Tuple2<Object,Object>> | recallByThreshold(): Returns the (threshold, recall) curve. |
RDD<scala.Tuple2<Object,Object>> | roc(): Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it. |
RDD<? extends scala.Product> | scoreAndLabels() |
RDD<scala.Tuple2<Object,scala.Tuple2<Object,Object>>> | scoreLabelsWeight() |
RDD<Object> | thresholds(): Returns thresholds in descending order. |
void | unpersist(): Unpersists intermediate RDDs used in the computation. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)
public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
Defaults numBins to 0.
Parameters: scoreAndLabels - (undocumented)
public RDD<? extends scala.Product> scoreAndLabels()
public int numBins()
public RDD<scala.Tuple2<Object,scala.Tuple2<Object,Object>>> scoreLabelsWeight()
public void unpersist()
public RDD<Object> thresholds()
public RDD<scala.Tuple2<Object,Object>> roc()
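The method summary describes roc() as (false positive rate, true positive rate) pairs with (0.0, 0.0) prepended and (1.0, 1.0) appended. The sketch below computes such points in plain Java for unweighted (score, label) pairs, taking each distinct score as a threshold in descending order and predicting positive when score >= threshold. The class RocSketch is hypothetical, it does not down-sample, and treating labels above 0.5 as positive is an assumption; it is not Spark's distributed implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, unweighted illustration of an ROC curve; not Spark's implementation.
final class RocSketch {

    // Returns ROC points as {falsePositiveRate, truePositiveRate} pairs: one point per
    // distinct score (used as a threshold, in descending order), with (0.0, 0.0)
    // prepended and (1.0, 1.0) appended. Assumes both classes are present.
    static List<double[]> roc(double[] scores, double[] labels) {
        // Per distinct score, count {positives, negatives}; highest score first.
        TreeMap<Double, long[]> byScore = new TreeMap<>(Comparator.<Double>reverseOrder());
        long totalPos = 0;
        long totalNeg = 0;
        for (int i = 0; i < scores.length; i++) {
            long[] counts = byScore.computeIfAbsent(scores[i], k -> new long[2]);
            if (labels[i] > 0.5) { // labels above 0.5 count as positive (assumption)
                counts[0]++;
                totalPos++;
            } else {
                counts[1]++;
                totalNeg++;
            }
        }
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0}); // prepended point
        long tp = 0;
        long fp = 0;
        for (long[] counts : byScore.values()) {
            // Lowering the threshold to this score predicts all of its points as positive.
            tp += counts[0];
            fp += counts[1];
            points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
        }
        points.add(new double[] {1.0, 1.0}); // appended point
        return points;
    }
}
```

For scores 0.9, 0.8, 0.7, 0.6 with labels 1, 0, 1, 0, the result is (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1), (1, 1); the last point repeats because (1.0, 1.0) is always appended.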
public double areaUnderROC()
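The summary only says that areaUnderROC() computes the area under the ROC curve. A common way to integrate such a piecewise-linear curve is the trapezoidal rule; the sketch below assumes that technique and uses a hypothetical TrapezoidAuc class. The same function applies to the (recall, precision) points behind areaUnderPR().

```java
// Hypothetical illustration: area under a piecewise-linear curve via the trapezoidal rule.
final class TrapezoidAuc {

    // xs and ys hold the curve's points in order (e.g. FPR/TPR pairs of an ROC curve).
    static double area(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        double sum = 0.0;
        for (int i = 1; i < xs.length; i++) {
            // Each segment contributes its width times its average height.
            sum += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0;
        }
        return sum;
    }
}
```

For the ROC points (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1), (1, 1), the area is 0.75; the diagonal from (0, 0) to (1, 1), which corresponds to random guessing, gives 0.5.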
public RDD<scala.Tuple2<Object,Object>> pr()
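The method summary stresses that pr() returns (recall, precision) pairs, not (precision, recall), with (0.0, p) prepended, where p is the precision at the lowest recall. The sketch below reproduces that shape in plain Java for unweighted (score, label) pairs. The class PrSketch is hypothetical, labels above 0.5 are assumed positive, and no down-sampling is performed; it is not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, unweighted illustration of a PR curve; not Spark's implementation.
final class PrSketch {

    // Returns {recall, precision} pairs (recall first), one per distinct score used as a
    // threshold in descending order, with (0.0, p) prepended, where p is the precision at
    // the lowest recall (the first point). Assumes non-empty input with a positive label.
    static List<double[]> pr(double[] scores, double[] labels) {
        // Per distinct score, count {positives, negatives}; highest score first.
        TreeMap<Double, long[]> byScore = new TreeMap<>(Comparator.<Double>reverseOrder());
        long totalPos = 0;
        for (int i = 0; i < scores.length; i++) {
            long[] counts = byScore.computeIfAbsent(scores[i], k -> new long[2]);
            if (labels[i] > 0.5) { // labels above 0.5 count as positive (assumption)
                counts[0]++;
                totalPos++;
            } else {
                counts[1]++;
            }
        }
        List<double[]> points = new ArrayList<>();
        long tp = 0;
        long fp = 0;
        for (long[] counts : byScore.values()) {
            tp += counts[0];
            fp += counts[1];
            double recall = (double) tp / totalPos;
            double precision = (double) tp / (tp + fp); // tp + fp >= 1 at every threshold
            points.add(new double[] {recall, precision});
        }
        // Prepend (0.0, p) using the precision of the lowest-recall point.
        points.add(0, new double[] {0.0, points.get(0)[1]});
        return points;
    }
}
```

For scores 0.9, 0.8, 0.7, 0.6 with labels 1, 0, 1, 0, the curve starts (0.0, 1.0), (0.5, 1.0), (0.5, 0.5) and ends at (1.0, 0.5).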
public double areaUnderPR()
public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold(double beta)
Parameters: beta - the beta factor in F-Measure computation.
public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold()
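At each threshold, the F-Measure combines precision P and recall R using the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1.0, the value used by fMeasureByThreshold() without arguments, weights the two equally, and larger beta favors recall. The sketch below evaluates that formula for a single (precision, recall) pair. The class FMeasureSketch is hypothetical, and returning 0.0 when the denominator is zero is an assumption, not documented Spark behavior.

```java
// Hypothetical illustration of the F-Measure for one (precision, recall) pair.
final class FMeasureSketch {

    // F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    // Returning 0.0 when the denominator is 0 is an assumption of this sketch.
    static double fMeasure(double precision, double recall, double beta) {
        double beta2 = beta * beta;
        double denominator = beta2 * precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return (1.0 + beta2) * precision * recall / denominator;
    }
}
```

Conceptually, applying this function at each threshold to the values from precisionByThreshold() and recallByThreshold() yields the points of the (threshold, F-Measure) curve. For P = 0.5 and R = 1.0, beta = 1.0 gives 2/3 while beta = 2.0 gives 2.5/3, reflecting the extra weight on recall.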
public RDD<scala.Tuple2<Object,Object>> precisionByThreshold()
public RDD<scala.Tuple2<Object,Object>> recallByThreshold()