public class BinaryClassificationMetrics extends Object implements Logging
param: scoreAndLabels an RDD of (score, label) pairs.
param: numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
will be down-sampled to this many "bins". If 0, no down-sampling will occur.
This is useful because the curve contains a point for each distinct score
in the input, and this could be as large as the input itself -- millions of
points or more, when thousands may be entirely sufficient to summarize
the curve. After down-sampling, the curves will be made of approximately numBins
points. Points are made from bins of equal numbers of consecutive points. The size
of each bin is floor(scoreAndLabels.count() / numBins), which means the resulting
number of bins may not exactly equal numBins. The last bin in each partition may
be smaller as a result, meaning there may be an extra sample at
partition boundaries.
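The bin-size arithmetic described above can be sketched in plain Java. This is an illustration only, not Spark's internal code: the class name `BinSizeSketch`, its helper methods, and the sample counts are hypothetical, and it assumes `count >= numBins > 0` so the bin size is never zero.

```java
// Illustrative arithmetic only; not Spark's implementation.
class BinSizeSketch {
    /** Bin size as described above: floor(count / numBins). Integer division floors non-negative values. */
    static long binSize(long count, int numBins) {
        return count / numBins;
    }

    /** Number of bins produced when one partition holding `points` curve points is cut into chunks of `size`. */
    static long binsInPartition(long points, long size) {
        return (points + size - 1) / size; // ceiling division: the last chunk may be short
    }

    public static void main(String[] args) {
        long count = 1050;  // hypothetical number of curve points
        int numBins = 100;  // requested number of bins
        long size = binSize(count, numBins);
        System.out.println("bin size = " + size);
        System.out.println("bins, one partition = " + binsInPartition(count, size));
        // Two hypothetical partitions of 525 points each: each ends with a short bin.
        System.out.println("bins, two partitions = " + 2 * binsInPartition(525, size));
    }
}
```

With these sample numbers the bin size is 10, so a single partition yields 105 bins rather than the requested 100, and splitting the same points across two partitions adds one more short bin.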
Constructor and Description |
---|
BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels): Defaults numBins to 0. |
BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels, int numBins) |
Modifier and Type | Method and Description |
---|---|
double | areaUnderPR(): Computes the area under the precision-recall curve. |
double | areaUnderROC(): Computes the area under the receiver operating characteristic (ROC) curve. |
RDD<scala.Tuple2<Object,Object>> | fMeasureByThreshold(): Returns the (threshold, F-Measure) curve with beta = 1.0. |
RDD<scala.Tuple2<Object,Object>> | fMeasureByThreshold(double beta): Returns the (threshold, F-Measure) curve. |
int | numBins() |
RDD<scala.Tuple2<Object,Object>> | pr(): Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve. |
RDD<scala.Tuple2<Object,Object>> | precisionByThreshold(): Returns the (threshold, precision) curve. |
RDD<scala.Tuple2<Object,Object>> | recallByThreshold(): Returns the (threshold, recall) curve. |
RDD<scala.Tuple2<Object,Object>> | roc(): Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it. |
RDD<scala.Tuple2<Object,Object>> | scoreAndLabels() |
RDD<Object> | thresholds(): Returns thresholds in descending order. |
void | unpersist(): Unpersist intermediate RDDs used in the computation. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels, int numBins)
public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
Defaults numBins to 0.
Parameters: scoreAndLabels - (undocumented)
public double areaUnderPR()
public double areaUnderROC()
public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold(double beta)
Parameters: beta - the beta factor in F-Measure computation.
public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold()
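To make the role of beta concrete, here is a minimal sketch of the standard F-measure formula, F = (1 + beta^2) * P * R / (beta^2 * P + R), evaluated at a single threshold. The class `FMeasureSketch` is hypothetical and works on plain doubles rather than RDDs. Returning 0.0 when the denominator is zero is a convention chosen for this sketch, not something this page specifies.

```java
// Standard F-measure formula on plain doubles; not Spark's implementation.
class FMeasureSketch {
    /** F-measure for one (precision, recall) pair; beta > 1 weights recall more, beta < 1 weights precision more. */
    static double fMeasure(double precision, double recall, double beta) {
        double beta2 = beta * beta;
        double denominator = beta2 * precision + recall;
        if (denominator == 0.0) {
            return 0.0; // assumption of this sketch: avoid dividing by zero
        }
        return (1.0 + beta2) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        double precision = 0.5; // hypothetical values at some threshold
        double recall = 1.0;
        System.out.println("F1 = " + fMeasure(precision, recall, 1.0));
        System.out.println("F2 = " + fMeasure(precision, recall, 2.0));
    }
}
```

With beta = 1.0 the formula reduces to the harmonic mean of precision and recall, which is the case the no-argument overload covers.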
public int numBins()
public RDD<scala.Tuple2<Object,Object>> pr()
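The shape of the precision-recall curve, including the (recall, precision) ordering and the prepended (0.0, p) point, can be illustrated with a small array-based sketch. It is not Spark's implementation: `PrCurveSketch` is a hypothetical class, and it assumes the scores are already sorted in descending order, labels are 1.0 or 0.0, and at least one positive label is present.

```java
// Array-based illustration of the PR curve layout; not Spark's implementation.
class PrCurveSketch {
    /**
     * Returns rows of {recall, precision}, one per distinct score (used as a threshold),
     * with {0.0, p} prepended, where p is the precision of the first real point.
     * Assumes scores are sorted descending and at least one label is positive.
     */
    static double[][] prCurve(double[] scores, double[] labels) {
        int n = scores.length;
        int totalPos = 0;
        int distinct = 0;
        for (int i = 0; i < n; i++) {
            if (labels[i] > 0.5) totalPos++;
            if (i == 0 || scores[i] != scores[i - 1]) distinct++;
        }
        double[][] curve = new double[distinct + 1][];
        int tp = 0, fp = 0, next = 1;
        for (int i = 0; i < n; i++) {
            if (labels[i] > 0.5) tp++; else fp++;
            // Emit a point only after all examples tied at this score are counted.
            boolean endOfTieGroup = (i == n - 1) || scores[i + 1] != scores[i];
            if (endOfTieGroup) {
                double recall = (double) tp / totalPos;
                double precision = (double) tp / (tp + fp);
                curve[next++] = new double[] {recall, precision};
            }
        }
        curve[0] = new double[] {0.0, curve[1][1]}; // the prepended (0.0, p) point
        return curve;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6}; // hypothetical data
        double[] labels = {1.0, 0.0, 1.0, 0.0};
        for (double[] point : prCurve(scores, labels)) {
            System.out.println("recall=" + point[0] + " precision=" + point[1]);
        }
    }
}
```

For the sample data the first real point is at recall 0.5 with precision 1.0, so the prepended point is (0.0, 1.0).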
public RDD<scala.Tuple2<Object,Object>> precisionByThreshold()
public RDD<scala.Tuple2<Object,Object>> recallByThreshold()
public RDD<scala.Tuple2<Object,Object>> roc()
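As a rough illustration of the ROC curve layout, the following sketch builds (false positive rate, true positive rate) points with (0.0, 0.0) prepended and (1.0, 1.0) appended, then integrates them with the trapezoidal rule. `RocCurveSketch` is a hypothetical class operating on arrays, not Spark's code. It assumes scores sorted in descending order, labels of 1.0 or 0.0, and at least one example of each class. The trapezoidal rule is a common way to compute the area and is used here as an assumption.

```java
// Array-based illustration of the ROC curve and its area; not Spark's implementation.
class RocCurveSketch {
    /** Rows of {falsePositiveRate, truePositiveRate}, bracketed by {0,0} and {1,1}. Scores must be sorted descending. */
    static double[][] rocCurve(double[] scores, double[] labels) {
        int n = scores.length;
        int totalPos = 0;
        int distinct = 0;
        for (int i = 0; i < n; i++) {
            if (labels[i] > 0.5) totalPos++;
            if (i == 0 || scores[i] != scores[i - 1]) distinct++;
        }
        int totalNeg = n - totalPos;
        double[][] curve = new double[distinct + 2][];
        curve[0] = new double[] {0.0, 0.0};
        int tp = 0, fp = 0, next = 1;
        for (int i = 0; i < n; i++) {
            if (labels[i] > 0.5) tp++; else fp++;
            // One point per distinct score, after counting every example tied at it.
            boolean endOfTieGroup = (i == n - 1) || scores[i + 1] != scores[i];
            if (endOfTieGroup) {
                curve[next++] = new double[] {(double) fp / totalNeg, (double) tp / totalPos};
            }
        }
        curve[next] = new double[] {1.0, 1.0};
        return curve;
    }

    /** Trapezoidal-rule area under a curve given as {x, y} rows with non-decreasing x. */
    static double areaUnderCurve(double[][] curve) {
        double area = 0.0;
        for (int i = 1; i < curve.length; i++) {
            double width = curve[i][0] - curve[i - 1][0];
            double meanHeight = (curve[i][1] + curve[i - 1][1]) / 2.0;
            area += width * meanHeight;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6}; // hypothetical data
        double[] labels = {1.0, 0.0, 1.0, 0.0};
        System.out.println("AUC = " + areaUnderCurve(rocCurve(scores, labels)));
    }
}
```

For the sample data three of the four (positive, negative) pairs are ranked correctly, which matches the trapezoidal area of 0.75.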
public RDD<scala.Tuple2<Object,Object>> scoreAndLabels()
public RDD<Object> thresholds()
public void unpersist()