Compute a histogram using the provided buckets. Every bucket is open on the right except the last, which is closed.
For example, for the array [1, 10, 20, 50] the buckets are [1, 10), [10, 20), [20, 50],
i.e. 1<=x<10, 10<=x<20, 20<=x<=50.
For the inputs 1 and 50 the resulting histogram is 1, 0, 1.
If the buckets are evenly spaced (e.g. [0, 10, 20, 30]), setting evenBuckets to true switches each insertion from O(log n) to O(1), where n is the number of buckets.
buckets must be sorted and must not contain duplicates.
buckets must have at least two elements.
All NaN entries are treated the same. If you have a NaN bucket, it must be the maximum value, in the last position, and all NaN entries will be counted in that bucket.
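The bucket rules above can be illustrated with a minimal plain-Python sketch. It is not Spark's implementation: the function names histogram_with_buckets and even_bucket_index are hypothetical, NaN handling is left out (NaN values are simply skipped), and skipping values that fall outside every bucket is an assumption of this sketch.

```python
from bisect import bisect_right


def histogram_with_buckets(values, buckets):
    """Count values into right-open buckets; only the last bucket is closed."""
    if len(buckets) < 2:
        raise ValueError("buckets must have at least two elements")
    if any(a >= b for a, b in zip(buckets, buckets[1:])):
        raise ValueError("buckets must be sorted and must not contain duplicates")
    counts = [0] * (len(buckets) - 1)
    for x in values:
        if x != x:
            continue  # NaN: not handled in this sketch
        if x < buckets[0] or x > buckets[-1]:
            continue  # assumption: values outside every bucket are not counted
        # bisect_right returns how many boundaries are <= x (an O(log n) search);
        # subtracting 1 gives the index of the bucket whose left edge is <= x.
        index = bisect_right(buckets, x) - 1
        # x == buckets[-1] lands one past the end; fold it into the closed last bucket.
        counts[min(index, len(counts) - 1)] += 1
    return counts


def even_bucket_index(x, buckets):
    """O(1) bucket lookup for evenly spaced buckets; x is assumed to be in range."""
    width = (buckets[-1] - buckets[0]) / (len(buckets) - 1)
    index = int((x - buckets[0]) / width)
    # The maximum value belongs to the closed last bucket.
    return min(index, len(buckets) - 2)
```

With the example from the text, histogram_with_buckets([1, 50], [1, 10, 20, 50]) counts 1 in the first bucket and 50 in the closed last bucket.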
Compute a histogram of the data using bucketCount buckets evenly spaced between the minimum and maximum of the RDD. For example, if the minimum is 0, the maximum is 100, and there are two buckets, the resulting buckets are [0, 50) and [50, 100]. bucketCount must be at least 1. If the RDD contains infinity or NaN, an exception is thrown. If the elements of the RDD do not vary (max == min), a single bucket is always returned.
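As a rough illustration of how evenly spaced boundaries follow from the minimum, maximum, and bucketCount, here is a plain-Python sketch. The function name even_bucket_boundaries is hypothetical, and representing the single bucket for max == min as [min, min] is an assumption of this sketch rather than something the text above states.

```python
import math


def even_bucket_boundaries(minimum, maximum, bucket_count):
    """Return bucket_count + 1 boundaries evenly spaced from minimum to maximum."""
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    if not (math.isfinite(minimum) and math.isfinite(maximum)):
        raise ValueError("infinity and NaN are not supported")
    if minimum == maximum:
        # Assumption: a single degenerate bucket covering the one value.
        return [minimum, minimum]
    step = (maximum - minimum) / bucket_count
    boundaries = [minimum + i * step for i in range(bucket_count)]
    # Append the maximum exactly so rounding cannot shift the last boundary.
    boundaries.append(maximum)
    return boundaries
```

For a minimum of 0, a maximum of 100, and two buckets, the boundaries are 0, 50, and 100, which matches the buckets [0, 50) and [50, 100] described above.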
Compute the mean of this RDD's elements.
Approximate operation to return the mean within a timeout.
Compute the population standard deviation of this RDD's elements.
Compute the population variance of this RDD's elements.
Compute the sample standard deviation of this RDD's elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
Compute the sample variance of this RDD's elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
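The difference between the population and sample statistics is only the divisor (N versus N-1). A minimal plain-Python sketch, with hypothetical function names and no connection to Spark's own code:

```python
import math


def population_variance(values):
    """Mean squared deviation from the mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Same sum of squared deviations, dividing by N-1 to correct for bias."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_stdev(values):
    return math.sqrt(population_variance(values))


def sample_stdev(values):
    return math.sqrt(sample_variance(values))
```

For the values 1, 2, 3, 4 the sum of squared deviations is 5, so the population variance is 5/4 and the sample variance is 5/3.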
Return an org.apache.spark.util.StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
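One common way to capture count, mean, and variance in a single pass is Welford's running update, with a merge step for combining partial results. The sketch below shows that technique in plain Python; the class RunningStats and its methods are hypothetical and are not the StatCounter API.

```python
class RunningStats:
    """Single-pass count, mean, and variance using Welford's update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from both the old and the updated mean.
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        """Combine with stats computed over a disjoint set of values."""
        if other.count == 0:
            return self
        total = self.count + other.count
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total
        return self

    def population_variance(self):
        return self.m2 / self.count

    def sample_variance(self):
        return self.m2 / (self.count - 1)
```

The merge step is what makes this shape suitable for distributed data: each partition can build its own partial result, and the partial results can then be combined without revisiting the elements.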
Compute the population standard deviation of this RDD's elements.
Add up the elements in this RDD.
Approximate operation to return the sum within a timeout.
Compute the population variance of this RDD's elements.
Extra functions available on RDDs of Doubles through an implicit conversion.