Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
col1: the name of the first column
col2: the name of the column to calculate the correlation against
Returns: the Pearson Correlation Coefficient as a Double.
Since: 1.4.0
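The coefficient is the covariance of the two columns divided by the product of their standard deviations. Below is a minimal plain-Scala sketch of that formula over in-memory sequences. It is not Spark's implementation, and `PearsonSketch` is a hypothetical name.

```scala
// Hypothetical plain-Scala sketch of the Pearson correlation coefficient;
// it is not Spark's implementation.
object PearsonSketch {
  /** Pearson r of two equal-length, non-empty samples. */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.length == ys.length, "samples must be non-empty and of equal length")
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Sum of co-deviations and sums of squared deviations from each mean.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val syy = ys.map(y => (y - meanY) * (y - meanY)).sum
    // The 1/(n - 1) factors cancel, so they are omitted. A constant input yields NaN (0/0).
    sxy / math.sqrt(sxx * syy)
  }
}
```

On an actual DataFrame the corresponding call would be `df.stat.corr("a", "b")`, where `df`, `a` and `b` are placeholders.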
Calculates the correlation of two columns of a DataFrame.
Calculates the correlation of two columns of a DataFrame. Currently only supports the Pearson Correlation Coefficient. For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.
col1: the name of the first column
col2: the name of the column to calculate the correlation against
Returns: the Pearson Correlation Coefficient as a Double.
Since: 1.4.0
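Spearman's rank correlation is Pearson's coefficient applied to the ranks of the values rather than to the values themselves, which is why it needs its own code path. The following plain-Scala sketch assumes tied values receive the average of the ranks they span. `SpearmanSketch` is a hypothetical name, and MLlib's Statistics may handle ties or numerics differently.

```scala
// Hypothetical plain-Scala sketch of Spearman's rank correlation;
// it is not MLlib's implementation. Ties get the average of their ranks (assumption).
object SpearmanSketch {
  /** 1-based ranks; tied values share the mean of the ranks they occupy. */
  def ranks(xs: Seq[Double]): Vector[Double] = {
    // Stable sort of (value, original index) pairs by value.
    val sorted = xs.zipWithIndex.sortWith(_._1 < _._1).toVector
    val out = Array.ofDim[Double](xs.length)
    var i = 0
    while (i < sorted.length) {
      // Extend j to the end of the run of values equal to sorted(i).
      var j = i
      while (j + 1 < sorted.length && sorted(j + 1)._1 == sorted(i)._1) j += 1
      // 0-based positions i..j map to ranks i+1..j+1; assign their mean to every tie.
      val avgRank = (i + j) / 2.0 + 1.0
      for (k <- i to j) out(sorted(k)._2) = avgRank
      i = j + 1
    }
    out.toVector
  }

  /** Pearson correlation of two equal-length, non-empty samples. */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.length == ys.length)
    val mx = xs.sum / xs.length
    val my = ys.sum / ys.length
    val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    val syy = ys.map(y => (y - my) * (y - my)).sum
    sxy / math.sqrt(sxx * syy)
  }

  /** Spearman's rho: the Pearson correlation of the two rank vectors. */
  def spearman(xs: Seq[Double], ys: Seq[Double]): Double = pearson(ranks(xs), ranks(ys))
}
```

For example, `ys = xs.map(x => x * x)` over positive values is perfectly monotone, so Spearman's rho is 1.0 even though the relationship is not linear.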
Calculates the sample covariance of two numerical columns of a DataFrame.
col1: the name of the first column
col2: the name of the second column
Returns: the covariance of the two columns.
Since: 1.4.0
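"Sample" covariance means the summed products of deviations are divided by n - 1 rather than n. The following plain-Scala sketch shows that formula over in-memory data. It is not Spark's implementation, and `CovarianceSketch` is a hypothetical name.

```scala
// Hypothetical plain-Scala sketch of the sample covariance; it is not Spark's implementation.
object CovarianceSketch {
  /** Sample covariance of two equal-length samples (divides by n - 1). */
  def sampleCov(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.length == ys.length && xs.length >= 2, "need two equal-length samples of size >= 2")
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Sum of products of deviations, then Bessel's correction (n - 1).
    xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum / (n - 1)
  }
}
```

On a DataFrame the corresponding call would be `df.stat.cov("x", "y")`, with `df`, `x` and `y` as placeholders.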
Computes a pair-wise frequency table of the given columns.
Computes a pair-wise frequency table of the given columns. Also known as a contingency table.
The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned.
The first column of each row will hold a distinct value of col1, and the remaining column names will be the distinct values of col2. The name of the first column will be $col1_$col2. Counts will be returned as Longs. Pairs that have no occurrences will have null as their counts.
col1: the name of the first column. Its distinct values make up the first item of each row.
col2: the name of the second column. Its distinct values become the column names of the DataFrame.
Returns: a DataFrame containing the contingency table.
Since: 1.4.0
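The shape of the returned table can be illustrated with a small in-memory sketch in plain Scala. This is not Spark's crosstab implementation. `CrosstabSketch`, the sorted row and column order, and the use of `None` for pairs with no occurrences are assumptions made for illustration.

```scala
// Hypothetical plain-Scala sketch of a contingency table; it is not Spark's crosstab.
object CrosstabSketch {
  type Row = (String, Vector[Option[Long]])

  /**
   * Builds a contingency table from (col1, col2) value pairs. Returns the header
   * and one row per distinct col1 value; cells for pairs that never occur are None.
   */
  def crosstab(col1: String, col2: String, pairs: Seq[(String, String)]): (Vector[String], Vector[Row]) = {
    // Count every (col1 value, col2 value) pair.
    val counts: Map[(String, String), Long] =
      pairs.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
    // Sorted order is an assumption of this sketch, chosen for readable output.
    val rowKeys = pairs.map(_._1).distinct.sorted.toVector
    val colKeys = pairs.map(_._2).distinct.sorted.toVector
    // The first column is named "<col1>_<col2>"; the rest are the distinct col2 values.
    val header = s"${col1}_$col2" +: colKeys
    val rows = rowKeys.map(r => r -> colKeys.map(c => counts.get((r, c))))
    (header, rows)
  }
}
```

For the pairs `(a, x), (a, x), (a, y), (b, x)` this yields the header `key_value, x, y` and the rows `a: 2, 1` and `b: 1, None`. On a DataFrame the corresponding call would be `df.stat.crosstab("key", "value")`, with placeholder column names.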
(Scala-specific) Finding frequent items for columns, possibly with false positives.
(Scala-specific) Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Shenker, and Papadimitriou. Uses a default support of 1%.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
cols: the names of the columns to search frequent items in.
Returns: a local DataFrame containing an Array of frequent items for each column.
Since: 1.4.0
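The one-pass counting idea behind this family of methods can be sketched in plain Scala. This sketch follows the textbook form of the Karp, Shenker, and Papadimitriou algorithm: it keeps at most 1/support counters and, when a new item arrives at a full table, decrements all of them. Spark's own implementation may differ in details. `FreqItemsSketch` is a hypothetical name, and the 1% default simply mirrors the text above.

```scala
import scala.collection.mutable

// Hypothetical plain-Scala sketch of the textbook Karp-Shenker-Papadimitriou
// frequent-element algorithm; it is not Spark's freqItems implementation.
object FreqItemsSketch {
  /**
   * One pass over `items`, keeping at most k = floor(1 / support) counters.
   * Every item occurring more than support * n times is returned;
   * other returned items may be false positives.
   */
  def freqItems[A](items: Iterable[A], support: Double = 0.01): Set[A] = {
    // Bound mirrors the documented "greater than 1e-4" (assumption for this sketch).
    require(support > 1e-4 && support <= 1.0, "support should be in (1e-4, 1]")
    val k = math.floor(1.0 / support).toInt
    val counters = mutable.Map.empty[A, Long]
    for (x <- items) {
      if (counters.contains(x)) counters(x) += 1
      else if (counters.size < k) counters(x) = 1L
      else {
        // Table is full: drop this occurrence, decrement every counter,
        // and evict counters that reach zero.
        for (key <- counters.keys.toList) {
          val c = counters(key) - 1
          if (c == 0L) counters.remove(key)
          else counters.update(key, c)
        }
      }
    }
    counters.keySet.toSet
  }
}
```

On a DataFrame, the corresponding Scala-specific call with the default support would be `df.stat.freqItems(Seq("a", "b"))`, with placeholder column names.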
(Scala-specific) Finding frequent items for columns, possibly with false positives.
(Scala-specific) Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Shenker, and Papadimitriou.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
cols: the names of the columns to search frequent items in.
Returns: a local DataFrame containing an Array of frequent items for each column.
Since: 1.4.0
Finding frequent items for columns, possibly with false positives.
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Shenker, and Papadimitriou. Uses a default support of 1%.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
cols: the names of the columns to search frequent items in.
Returns: a local DataFrame containing an Array of frequent items for each column.
Since: 1.4.0
Finding frequent items for columns, possibly with false positives.
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Shenker, and Papadimitriou. The support should be greater than 1e-4.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
cols: the names of the columns to search frequent items in.
support: the minimum frequency for an item to be considered frequent. Should be greater than 1e-4.
Returns: a local DataFrame containing an Array of frequent items for each column.
Since: 1.4.0
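A counting argument shows why a support threshold produces false positives but no false negatives. It assumes the textbook variant of the algorithm, which keeps k = ⌊1/s⌋ counters for support s over n rows and decrements every counter whenever a new item arrives at a full table. Here D is the number of such decrement steps, f_x is the true frequency of item x, and c_x is its final counter (0 if untracked).

```latex
\begin{aligned}
D &\le \frac{n}{k+1} && \text{each full-table step discards } k+1 \text{ occurrences} \\
c_x &\ge f_x - D && \text{each step costs } x \text{ at most one count} \\
k = \lfloor 1/s \rfloor &\;\Rightarrow\; k+1 > \tfrac{1}{s} \;\Rightarrow\; D < s\,n \\
f_x > s\,n &\;\Rightarrow\; c_x > s\,n - s\,n = 0
\end{aligned}
```

Any item above the support threshold is therefore still tracked at the end of the pass. Items at or below s·n can also survive, which is where the false positives come from.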
:: Experimental :: Statistic functions for DataFrames.
Since: 1.4.0