Defaults numBins to 0.
scoreAndLabels: an RDD of (score, label) pairs.
numBins: if greater than 0, then the curves (ROC curve, PR curve) computed internally
will be down-sampled to this many "bins". If 0, no down-sampling will occur.
This is useful because the curve contains a point for each distinct score
in the input, and this could be as large as the input itself (millions of
points or more) when thousands may be entirely sufficient to summarize
the curve. After down-sampling, the curves will be made of approximately
numBins points instead. Points are made from bins of equal numbers of
consecutive points. The size of each bin is
floor(scoreAndLabels.count() / numBins), which means the resulting number
of bins may not exactly equal numBins. The last bin in each partition may
be smaller as a result, meaning there may be an extra sample at
partition boundaries.
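The bin-size rule above can be illustrated with a minimal sketch in plain Python (no Spark). It assumes a single partition, and the function name `downsample_bins` is hypothetical, not part of the library. It only shows why the resulting number of bins may differ from numBins; how each bin is summarized into one curve point is not modeled here.

```python
def downsample_bins(points, num_bins):
    """Group consecutive points into bins of size floor(len(points) / num_bins).

    Assumption: a single partition. With num_bins == 0, no down-sampling
    occurs and every point stays in its own bin.
    """
    if num_bins <= 0:
        return [[p] for p in points]
    # Bin size as described in the text; clamped to 1 (an assumption of this
    # sketch) so that num_bins > len(points) does not yield a zero step.
    size = max(1, len(points) // num_bins)
    return [points[i:i + size] for i in range(0, len(points), size)]


# 10 points, numBins = 4 -> bin size floor(10 / 4) = 2 -> 5 bins, not 4.
bins = downsample_bins(list(range(10)), 4)
print(len(bins))
```

With 11 points and numBins = 4, the bin size is still 2, so the last bin holds a single point. This is the "smaller last bin" effect described above.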
Computes the area under the precision-recall curve.
Computes the area under the receiver operating characteristic (ROC) curve.
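As an illustration of how an area under a curve can be computed from a list of (x, y) points, here is a minimal plain-Python sketch using the trapezoidal rule. The function name `area_under_curve` is hypothetical, and the sketch is a stand-in for the idea rather than the library's code. The sample points are made up for the example.

```python
def area_under_curve(points):
    """Trapezoidal-rule area under a piecewise-linear curve.

    `points` is a list of (x, y) tuples ordered by non-decreasing x.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Area of the trapezoid between two consecutive points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A small ROC-style curve of (false positive rate, true positive rate) points.
roc_points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(area_under_curve(roc_points))  # 0.75
```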
Returns the (threshold, F-Measure) curve with beta = 1.0.
Returns the (threshold, F-Measure) curve.
beta: the beta factor in F-Measure computation.
returns: an RDD of (threshold, F-Measure) pairs.
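For reference, the standard F-Measure at a single threshold combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R). The plain-Python sketch below shows that formula; the function name `f_measure` is hypothetical, and returning 0.0 when both precision and recall are 0 is an assumption of this sketch.

```python
def f_measure(precision, recall, beta=1.0):
    """Standard F-Measure for one (precision, recall) pair.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        # Assumption of this sketch: define the score as 0 when P = R = 0.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator


print(f_measure(0.5, 0.5))            # 0.5 (beta = 1.0, the harmonic mean)
print(f_measure(0.5, 1.0, beta=2.0))  # about 0.833, recall weighted more
```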
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
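To make the point ordering concrete, the following plain-Python sketch builds a precision-recall curve from (score, label) pairs and prepends (0.0, p) as described above. It is an illustration under assumptions, not the library's code: the function name `pr_curve` is hypothetical, a label above 0.5 is treated as positive, and no down-sampling is applied.

```python
from collections import defaultdict


def pr_curve(score_and_labels):
    """Return (recall, precision) points, with (0.0, p) prepended."""
    # Count positives and negatives for each distinct score.
    counts = defaultdict(lambda: [0, 0])
    for score, label in score_and_labels:
        counts[score][0 if label > 0.5 else 1] += 1
    total_pos = sum(c[0] for c in counts.values())

    points = []
    tp = fp = 0
    # Each distinct score acts as a threshold, visited in descending order.
    for score in sorted(counts, reverse=True):
        tp += counts[score][0]
        fp += counts[score][1]
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        points.append((recall, precision))

    # p is the precision at the lowest recall, i.e. the first point.
    first_precision = points[0][1] if points else 1.0
    return [(0.0, first_precision)] + points


data = [(0.9, 1.0), (0.8, 0.0), (0.7, 1.0), (0.6, 0.0)]
for recall, precision in pr_curve(data):
    print(recall, precision)
```

Note that each tuple is (recall, precision), so recall is the x-coordinate when plotting.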
Returns the (threshold, precision) curve.
Returns the (threshold, recall) curve.
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
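The shape of the returned ROC curve can be sketched in plain Python as follows. This is a hedged illustration, not the library's code: the function name `roc_curve` is hypothetical, a label above 0.5 is treated as positive, and no down-sampling is applied. It shows the (0.0, 0.0) and (1.0, 1.0) end points being added around the per-threshold points.

```python
def roc_curve(score_and_labels):
    """Return (false positive rate, true positive rate) points."""
    pairs = sorted(score_and_labels, key=lambda sl: sl[0], reverse=True)
    total_pos = sum(1 for _, label in pairs if label > 0.5)
    total_neg = len(pairs) - total_pos

    points = [(0.0, 0.0)]  # prepended start point
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every pair that shares this score (one threshold).
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] > 0.5:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr = fp / total_neg if total_neg else 0.0
        tpr = tp / total_pos if total_pos else 0.0
        points.append((fpr, tpr))
    points.append((1.0, 1.0))  # appended end point
    return points


data = [(0.9, 1.0), (0.8, 0.0), (0.7, 1.0), (0.6, 0.0)]
print(roc_curve(data))
```

Because the end point is always appended, the last per-threshold point (1.0, 1.0) appears twice in this example.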
Returns thresholds in descending order.
Unpersist intermediate RDDs used in the computation.
Evaluator for binary classification.