public class PowerIterationClustering
extends Object
implements scala.Serializable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
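To make the abstract's idea concrete, here is a plain-Java sketch that runs truncated power iteration on a tiny dense matrix. It is not Spark's distributed implementation. The class and method names (`PowerIterationSketch`, `rowNormalize`, `powerIter`), the L1 rescaling after each multiplication, the tolerance-based early stop, and the tolerance value in `main` are all assumptions chosen for illustration.

```java
import java.util.Arrays;

/**
 * Sketch of truncated power iteration on a small dense affinity matrix.
 * Plain Java for illustration only; not Spark's distributed implementation.
 */
public class PowerIterationSketch {

    /** Divides each row of the affinity matrix A by its row sum, giving W = D^-1 A. */
    public static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                rowSum += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                w[i][j] = a[i][j] / rowSum;
            }
        }
        return w;
    }

    /**
     * Repeats v <- W v / ||W v||_1 for at most maxIterations steps. Stops early when the
     * per-step change (L1 distance between successive vectors) itself changes by less than tol.
     */
    public static double[] powerIter(double[][] w, double[] v0, int maxIterations, double tol) {
        int n = v0.length;
        double[] v = v0.clone();
        double prevDelta = Double.MAX_VALUE;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            double delta = 0.0;
            for (int i = 0; i < n; i++) {
                next[i] /= norm;
                delta += Math.abs(next[i] - v[i]);
            }
            v = next;
            if (Math.abs(delta - prevDelta) < tol) {
                break;
            }
            prevDelta = delta;
        }
        return v;
    }

    public static void main(String[] args) {
        // Two triangles {0, 1, 2} and {3, 4, 5} joined by one weak edge (2, 3).
        double[][] a = new double[6][6];
        int[][] strong = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}};
        for (int[] e : strong) {
            a[e[0]][e[1]] = 1.0;
            a[e[1]][e[0]] = 1.0;
        }
        a[2][3] = 0.1;
        a[3][2] = 0.1;

        double[] v0 = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6};
        double[] v = powerIter(rowNormalize(a), v0, 10, 1e-5 / 6);
        // Entries within each triangle end up much closer to each other than to the other triangle.
        System.out.println(Arrays.toString(v));
    }
}
```

Stopping early is what makes the iteration "truncated": for a connected, non-bipartite graph, repeated multiplication by the row-stochastic W drives v toward a constant vector, so the cluster structure is only visible in the intermediate iterates.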
Modifier and Type | Class and Description |
---|---|
static class | PowerIterationClustering.Assignment (Experimental): Cluster assignment. |
Constructor and Description |
---|
PowerIterationClustering(): Constructs a PIC instance with default parameters {k: 2, maxIterations: 100, initMode: "random"}. |
PowerIterationClustering(int k, int maxIterations, String initMode) |
Modifier and Type | Method and Description |
---|---|
static Graph<Object,Object> | initDegreeVector(Graph<Object,Object> g): Generates the degree vector as the vertex properties (v0) to start power iteration. |
static VertexRDD<Object> | kMeans(VertexRDD<Object> v, int k): Runs k-means clustering. |
static Graph<Object,Object> | normalize(RDD<scala.Tuple3<Object,Object,Object>> similarities): Normalizes the affinity matrix (A) by row sums and returns the normalized affinity matrix (W). |
static VertexRDD<Object> | powerIter(Graph<Object,Object> g, int maxIterations): Runs power iteration. |
static Graph<Object,Object> | randomInit(Graph<Object,Object> g): Generates random vertex properties (v0) to start power iteration. |
PowerIterationClusteringModel | run(JavaRDD<scala.Tuple3<Long,Long,Double>> similarities): A Java-friendly version of PowerIterationClustering.run. |
PowerIterationClusteringModel | run(RDD<scala.Tuple3<Object,Object,Object>> similarities): Run the PIC algorithm. |
PowerIterationClustering | setInitializationMode(String mode): Set the initialization mode. |
PowerIterationClustering | setK(int k): Set the number of clusters. |
PowerIterationClustering | setMaxIterations(int maxIterations): Set the maximum number of iterations of the power iteration loop. |
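The final step, `kMeans`, clusters the scalar entries of the pseudo-eigenvector. The following plain-Java sketch illustrates that idea with Lloyd's algorithm on one-dimensional values. The class name `OneDimKMeansSketch`, the evenly-spaced-rank initialization, and the iteration cap are assumptions; Spark's method may initialize and converge differently.

```java
import java.util.Arrays;

/**
 * Sketch of clustering the pseudo-eigenvector: Lloyd's k-means on scalar values.
 * Plain Java for illustration; not Spark's KMeans.
 */
public class OneDimKMeansSketch {

    /**
     * Returns a cluster index in [0, k) for each value.
     * Assumes 1 <= k <= values.length and maxIterations >= 1.
     */
    public static int[] cluster(double[] values, int k, int maxIterations) {
        int n = values.length;
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Initialization (an assumption of this sketch): centers at evenly spaced ranks of the sorted values.
        double[] centers = new double[k];
        for (int c = 0; c < k; c++) {
            centers[c] = sorted[c * (n - 1) / Math.max(1, k - 1)];
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each value joins its nearest center.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(values[i] - centers[c]) < Math.abs(values[i] - centers[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
            // Update step: each center moves to the mean of its members; an empty cluster keeps its center.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                sum[assignment[i]] += values[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) {
                    centers[c] = sum[c] / count[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Pseudo-eigenvector entries that fall into two tight groups.
        double[] v = {0.231, 0.102, 0.233, 0.101, 0.230, 0.103};
        System.out.println(Arrays.toString(cluster(v, 2, 20))); // prints [1, 0, 1, 0, 1, 0]
    }
}
```

Because the embedding is one-dimensional, each cluster found this way is a contiguous interval of pseudo-eigenvector values.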
public PowerIterationClustering(int k, int maxIterations, String initMode)
public PowerIterationClustering()
public static Graph<Object,Object> normalize(RDD<scala.Tuple3<Object,Object,Object>> similarities)
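The row normalization that `normalize` performs (W = D^-1 A, where D is the diagonal matrix of row sums of A) can be sketched on in-memory arrays instead of an RDD. The sketch also applies the input rules stated for `run`: negative similarities are rejected, pairs with i = j are skipped, and a single orientation (i, j) fills both entries of the symmetric matrix. `NormalizeSketch`, its parallel-array input, its `Normalized` result holder, and the all-zero-row guard are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of the normalization step W = D^-1 A on in-memory (i, j, s_ij) triples.
 * Plain Java for illustration; not Spark's RDD/GraphX implementation.
 */
public class NormalizeSketch {

    /** Holds the normalized rows of W and the row sums (degrees) of A. */
    public static final class Normalized {
        public final Map<Long, Map<Long, Double>> w;
        public final Map<Long, Double> degrees;

        Normalized(Map<Long, Map<Long, Double>> w, Map<Long, Double> degrees) {
            this.w = w;
            this.degrees = degrees;
        }
    }

    /** Triple t is (src[t], dst[t], sim[t]); the three arrays must have equal length. */
    public static Normalized normalize(long[] src, long[] dst, double[] sim) {
        // Build the symmetric affinity matrix A as sparse rows.
        Map<Long, Map<Long, Double>> a = new TreeMap<>();
        for (int t = 0; t < sim.length; t++) {
            if (sim[t] < 0.0) {
                throw new IllegalArgumentException("Similarity must be nonnegative: " + sim[t]);
            }
            if (src[t] == dst[t]) {
                continue; // s_ii is assumed to be 0.0, so self-pairs are skipped
            }
            // One orientation fills both A(i, j) and A(j, i); a repeated pair overwrites the earlier value.
            a.computeIfAbsent(src[t], key -> new TreeMap<>()).put(dst[t], sim[t]);
            a.computeIfAbsent(dst[t], key -> new TreeMap<>()).put(src[t], sim[t]);
        }
        // Divide each row by its sum d_i.
        Map<Long, Map<Long, Double>> w = new TreeMap<>();
        Map<Long, Double> degrees = new TreeMap<>();
        for (Map.Entry<Long, Map<Long, Double>> row : a.entrySet()) {
            double d = 0.0;
            for (double s : row.getValue().values()) {
                d += s;
            }
            degrees.put(row.getKey(), d);
            Map<Long, Double> wRow = new TreeMap<>();
            for (Map.Entry<Long, Double> e : row.getValue().entrySet()) {
                // Guard (an assumption of this sketch): an all-zero row stays all zeros.
                wRow.put(e.getKey(), d == 0.0 ? 0.0 : e.getValue() / d);
            }
            w.put(row.getKey(), wRow);
        }
        return new Normalized(w, degrees);
    }

    public static void main(String[] args) {
        Normalized n = normalize(new long[] {0, 1, 2}, new long[] {1, 2, 2}, new double[] {1.0, 3.0, 5.0});
        System.out.println("W = " + n.w + ", degrees = " + n.degrees);
    }
}
```

Keeping the row sums alongside W is convenient because the degree-based start vector is built from exactly those sums.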
public static Graph<Object,Object> randomInit(Graph<Object,Object> g)
Parameters:
g - a graph representing the normalized affinity matrix (W)
public static Graph<Object,Object> initDegreeVector(Graph<Object,Object> g)
Parameters:
g - a graph representing the normalized affinity matrix (W)
public static VertexRDD<Object> powerIter(Graph<Object,Object> g, int maxIterations)
Parameters:
g - input graph with edges representing the normalized affinity matrix (W) and vertices representing the initial vector of the power iterations
maxIterations - maximum number of iterations
Returns:
a VertexRDD representing the pseudo-eigenvector
public static VertexRDD<Object> kMeans(VertexRDD<Object> v, int k)
Parameters:
v - a VertexRDD representing the pseudo-eigenvector
k - number of clusters
Returns:
a VertexRDD representing the clustering assignments
public PowerIterationClustering setK(int k)
public PowerIterationClustering setMaxIterations(int maxIterations)
public PowerIterationClustering setInitializationMode(String mode)
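`setInitializationMode` chooses how the starting vector v0 is built: `randomInit` uses random vertex properties and `initDegreeVector` uses the degree vector. In Spark the accepted mode strings are "random" and "degree". The sketch below shows both in plain Java; the class and method names, the use of `java.util.Random` with uniform values, and the rescaling of v0 to unit L1 norm are assumptions, not Spark's exact behavior.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Sketch of the two initialization modes for the power-iteration start vector v0.
 * Plain Java for illustration; not Spark's implementation.
 */
public class InitVectorSketch {

    /** Rescales v so that the absolute values of its entries sum to 1. */
    static Map<Long, Double> l1Normalize(Map<Long, Double> v) {
        double norm = 0.0;
        for (double x : v.values()) {
            norm += Math.abs(x);
        }
        Map<Long, Double> out = new TreeMap<>();
        for (Map.Entry<Long, Double> e : v.entrySet()) {
            out.put(e.getKey(), e.getValue() / norm);
        }
        return out;
    }

    /** "degree" mode: v0_i = d_i / (sum over j of d_j), with d_i the row sum of A. */
    public static Map<Long, Double> degreeInit(Map<Long, Double> degrees) {
        return l1Normalize(degrees);
    }

    /** "random" mode: a uniform random value in [0, 1) per vertex, rescaled to unit L1 norm. */
    public static Map<Long, Double> randomInit(Iterable<Long> vertexIds, long seed) {
        Random random = new Random(seed);
        Map<Long, Double> v = new TreeMap<>();
        for (long id : vertexIds) {
            v.put(id, random.nextDouble());
        }
        return l1Normalize(v);
    }

    /** Picks a mode by name, rejecting anything other than "random" or "degree". */
    public static Map<Long, Double> init(String mode, Map<Long, Double> degrees, long seed) {
        switch (mode) {
            case "random":
                return randomInit(degrees.keySet(), seed);
            case "degree":
                return degreeInit(degrees);
            default:
                throw new IllegalArgumentException("Invalid initialization mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Map<Long, Double> degrees = new TreeMap<>();
        degrees.put(0L, 1.0);
        degrees.put(1L, 4.0);
        degrees.put(2L, 3.0);
        System.out.println("degree: " + init("degree", degrees, 42L));
        System.out.println("random: " + init("random", degrees, 42L));
    }
}
```

A fixed seed makes the random start vector reproducible across runs, which is useful when comparing results for different values of k.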
public PowerIterationClusteringModel run(RDD<scala.Tuple3<Object,Object,Object>> similarities)
Parameters:
similarities - an RDD of (i, j, s_ij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. The similarity s_ij must be nonnegative. This is a symmetric matrix, hence s_ij = s_ji. For any (i, j) with nonzero similarity, the input should contain either (i, j, s_ij) or (j, i, s_ji). Tuples with i = j are ignored, because s_ii is assumed to be 0.0.
Returns:
a PowerIterationClusteringModel that contains the clustering result
public PowerIterationClusteringModel run(JavaRDD<scala.Tuple3<Long,Long,Double>> similarities)
A Java-friendly version of PowerIterationClustering.run.