A Java-friendly version of PowerIterationClustering.run.
Run the PIC algorithm.
an RDD of (i, j, sij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. Each similarity sij must be nonnegative. The matrix is symmetric, so sij = sji. For any (i, j) with nonzero similarity, the input should contain either (i, j, sij) or (j, i, sji). Tuples with i = j are ignored, because the self-similarity sii is assumed to be 0.0.
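As an illustration of the input contract described above, here is a minimal plain-Java sketch that turns a small dense symmetric matrix into (i, j, sij) entries. It does not use Spark; the names `AffinityTuples`, `Entry`, and `fromDense` are hypothetical helpers invented for this example and are not part of the library. The sketch emits one entry per unordered pair with nonzero similarity and skips the diagonal.

```java
import java.util.ArrayList;
import java.util.List;

final class AffinityTuples {
    /** One (i, j, sij) entry of the affinity matrix (hypothetical helper type). */
    static final class Entry {
        final long i;
        final long j;
        final double s;

        Entry(long i, long j, double s) {
            this.i = i;
            this.j = j;
            this.s = s;
        }

        @Override
        public String toString() {
            return "(" + i + ", " + j + ", " + s + ")";
        }
    }

    /** Emits one entry per pair i < j with nonzero similarity; the diagonal is skipped. */
    static List<Entry> fromDense(double[][] a) {
        int n = a.length;
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (a[i].length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
            for (int j = i + 1; j < n; j++) {
                double s = a[i][j];
                if (s < 0.0) {
                    throw new IllegalArgumentException("similarity must be nonnegative");
                }
                if (s != a[j][i]) {
                    throw new IllegalArgumentException("matrix must be symmetric");
                }
                if (s > 0.0) {
                    out.add(new Entry(i, j, s)); // (j, i, s) is implied by symmetry
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {
            {5.0, 1.0, 0.0},
            {1.0, 0.0, 2.0},
            {0.0, 2.0, 0.0}
        };
        for (Entry e : fromDense(a)) {
            System.out.println(e);
        }
    }
}
```

In a Spark program, entries like these would be the elements of the input RDD rather than a local list.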
a PowerIterationClusteringModel that contains the clustering result
Set the initialization mode. This can be either "random", which uses a random vector as the initial vertex properties, or "degree", which uses the normalized sum of similarities. Default: "random".
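To make the "degree" option more concrete, the following plain-Java sketch computes one plausible reading of "normalized sum of similarities": each vertex gets the sum of its similarities to other vertices, divided by the total over all vertices. The exact normalization is an assumption made for illustration, and `DegreeInit` and `degreeVector` are hypothetical names, not library API.

```java
final class DegreeInit {
    /**
     * Returns v0 where v0[i] = (sum over j != i of a[i][j]) / (sum of all such row sums).
     * Assumption: normalization divides by the total, so the entries add up to 1.
     */
    static double[] degreeVector(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    v[i] += a[i][j]; // self-similarities are ignored
                }
            }
            total += v[i];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("affinity matrix has no positive similarities");
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] a = {
            {0.0, 1.0, 0.0},
            {1.0, 0.0, 2.0},
            {0.0, 2.0, 0.0}
        };
        System.out.println(java.util.Arrays.toString(degreeVector(a)));
    }
}
```

A "random" start would instead fill the vector with random values; the degree-based start is deterministic, which can make runs easier to reproduce.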
Set the number of clusters.
Set the maximum number of iterations of the power iteration loop.
:: Experimental ::
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
See also: Spectral clustering (Wikipedia)
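The phrase "truncated power iteration on a normalized pair-wise similarity matrix" can be sketched in plain Java as follows. This is a simplified, single-machine illustration under stated assumptions, not the library's implementation: the matrix is row-normalized (each row of A divided by its row sum), the start vector is the degree-based one, each step rescales the vector to unit L1 norm, and the loop simply stops after a fixed number of iterations. `PowerIterationSketch` and `embed` are hypothetical names.

```java
final class PowerIterationSketch {
    /**
     * Runs maxIterations steps of v <- W v / ||W v||_1, where W is the
     * row-normalized affinity matrix. Returns the one-dimensional embedding.
     */
    static double[] embed(double[][] a, int maxIterations) {
        int n = a.length;
        double[] degree = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    degree[i] += a[i][j]; // diagonal entries are ignored
                }
            }
            total += degree[i];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("affinity matrix has no positive similarities");
        }

        // Degree-based start vector (assumption: normalized by the total degree).
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = degree[i] / total;
        }

        for (int t = 0; t < maxIterations; t++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    if (i != j) {
                        sum += a[i][j] * v[j];
                    }
                }
                // Row normalization: divide by the vertex degree.
                next[i] = degree[i] > 0.0 ? sum / degree[i] : 0.0;
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                break; // nothing left to rescale; keep the previous vector
            }
            for (int i = 0; i < n; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Two groups, {0, 1, 2} and {3, 4}, joined by one weak link between 2 and 3.
        double[][] a = {
            {0.0, 1.0, 1.0, 0.0, 0.0},
            {1.0, 0.0, 1.0, 0.0, 0.0},
            {1.0, 1.0, 0.0, 0.01, 0.0},
            {0.0, 0.0, 0.01, 0.0, 1.0},
            {0.0, 0.0, 0.0, 1.0, 0.0}
        };
        System.out.println(java.util.Arrays.toString(embed(a, 10)));
    }
}
```

Stopping early is the point of the technique: after a few iterations, vertices in the same tightly connected group have nearly equal values while different groups remain separated, so the embedding can then be clustered (for example with k-means) into the requested number of clusters.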