public class BlockMatrix extends Object implements DistributedMatrix, Logging
param: blocks The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
param: rowsPerBlock Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
param: colsPerBlock Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
param: nRows Number of rows of this matrix. If the supplied value is less than or equal to zero, the number of rows will be calculated when numRows is invoked.
param: nCols Number of columns of this matrix. If the supplied value is less than or equal to zero, the number of columns will be calculated when numCols is invoked.
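The block layout described by these parameters can be illustrated without Spark. The following is a minimal plain-Java sketch, not the library's implementation: the class `BlockLayoutSketch`, the method `split`, and the string keys of the form `"blockRowIndex,blockColIndex"` are all hypothetical names chosen for illustration. It shows how a local matrix maps to indexed blocks and why the blocks on the final row and column edges may be smaller than `rowsPerBlock` x `colsPerBlock`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain-Java model of the block layout, not Spark code.
final class BlockLayoutSketch {

    // Splits a local matrix into blocks keyed by "blockRowIndex,blockColIndex".
    // The string key is an assumption made for readability.
    static Map<String, double[][]> split(double[][] m, int rowsPerBlock, int colsPerBlock) {
        int nRows = m.length;
        int nCols = m[0].length;
        Map<String, double[][]> blocks = new LinkedHashMap<>();
        for (int r0 = 0; r0 < nRows; r0 += rowsPerBlock) {
            for (int c0 = 0; c0 < nCols; c0 += colsPerBlock) {
                // Blocks on the final row or column edge may be smaller.
                int h = Math.min(rowsPerBlock, nRows - r0);
                int w = Math.min(colsPerBlock, nCols - c0);
                double[][] block = new double[h][w];
                for (int i = 0; i < h; i++) {
                    System.arraycopy(m[r0 + i], c0, block[i], 0, w);
                }
                blocks.put((r0 / rowsPerBlock) + "," + (c0 / colsPerBlock), block);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        double[][] m = new double[5][4]; // a 5 x 4 matrix of zeros
        Map<String, double[][]> blocks = split(m, 2, 3);
        for (Map.Entry<String, double[][]> e : blocks.entrySet()) {
            System.out.println("(" + e.getKey() + ") -> "
                + e.getValue().length + " x " + e.getValue()[0].length);
        }
    }
}
```

In this sketch each index appears exactly once, which matches the requirement above that block indices be unique.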
Constructor and Description |
---|
BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock) Alternate constructor for BlockMatrix without the input of the number of rows and columns. |
BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock, long nRows, long nCols) |
Modifier and Type | Method and Description |
---|---|
BlockMatrix | add(BlockMatrix other) Adds the given block matrix other to this block matrix: this + other. |
RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> | blocks() |
BlockMatrix | cache() Caches the underlying RDD. |
int | colsPerBlock() |
BlockMatrix | multiply(BlockMatrix other) |
BlockMatrix | multiply(BlockMatrix other, int numMidDimSplits) |
int | numColBlocks() |
long | numCols() Gets or computes the number of columns. |
int | numRowBlocks() |
long | numRows() Gets or computes the number of rows. |
BlockMatrix | persist(StorageLevel storageLevel) Persists the underlying RDD with the specified storage level. |
int | rowsPerBlock() |
BlockMatrix | subtract(BlockMatrix other) Subtracts the given block matrix other from this block matrix: this - other. |
CoordinateMatrix | toCoordinateMatrix() Converts to CoordinateMatrix. |
IndexedRowMatrix | toIndexedRowMatrix() Converts to IndexedRowMatrix. |
Matrix | toLocalMatrix() Collect the distributed matrix on the driver as a DenseMatrix. |
BlockMatrix | transpose() Transpose this BlockMatrix. |
void | validate() Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeLogging, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock, long nRows, long nCols)
public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock)
blocks - The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
rowsPerBlock - Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock - Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
public int rowsPerBlock()
public int colsPerBlock()
public long numRows()
Gets or computes the number of rows.
Specified by: numRows in interface DistributedMatrix
public long numCols()
Gets or computes the number of columns.
Specified by: numCols in interface DistributedMatrix
public int numRowBlocks()
public int numColBlocks()
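The relationship between the matrix dimensions, the block sizes, and the block counts can be sketched with plain arithmetic. This example assumes that the number of row (or column) blocks is the ceiling of the dimension divided by the block size, which is what the parameter descriptions imply when they allow the final blocks to be smaller. The class `BlockCountSketch` and its methods are hypothetical helpers, not part of the API.

```java
// Illustrative only: plain-Java arithmetic, not the Spark implementation.
final class BlockCountSketch {

    // Number of blocks needed to cover `total` rows (or columns)
    // when each block holds at most `perBlock` of them.
    static int numBlocks(long total, int perBlock) {
        if (total < 0 || perBlock <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and perBlock must be > 0");
        }
        return (int) ((total + perBlock - 1) / perBlock); // ceiling division
    }

    // Rows (or columns) held by the block at `blockIndex`.
    // Only the final block may hold fewer than `perBlock`.
    static int sizeOfBlock(long total, int perBlock, int blockIndex) {
        long start = (long) blockIndex * perBlock;
        return (int) Math.min(perBlock, total - start);
    }

    public static void main(String[] args) {
        long nRows = 10;
        int rowsPerBlock = 3;
        int blocks = numBlocks(nRows, rowsPerBlock);
        System.out.println("row blocks: " + blocks);
        for (int i = 0; i < blocks; i++) {
            System.out.println("block " + i + " holds " + sizeOfBlock(nRows, rowsPerBlock, i) + " rows");
        }
    }
}
```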
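public void validate()
Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
public BlockMatrix cache()
Caches the underlying RDD.
public BlockMatrix persist(StorageLevel storageLevel)
Persists the underlying RDD with the specified storage level.
public CoordinateMatrix toCoordinateMatrix()
Converts to CoordinateMatrix.
public IndexedRowMatrix toIndexedRowMatrix()
Converts to IndexedRowMatrix.
public Matrix toLocalMatrix()
Collect the distributed matrix on the driver as a DenseMatrix.
public BlockMatrix transpose()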
Transpose this BlockMatrix. Returns a new BlockMatrix instance sharing the same underlying data. Is a lazy operation.
public BlockMatrix add(BlockMatrix other)
Adds the given block matrix other to this block matrix: this + other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If one of the blocks being added is an instance of SparseMatrix, the resulting sub-matrix will also be a SparseMatrix, even if it is being added to a DenseMatrix. If two dense matrices are added, the output will also be a DenseMatrix.
other - The block matrix to add to this block matrix.
public BlockMatrix subtract(BlockMatrix other)
Subtracts the given block matrix other from this block matrix: this - other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If one of the blocks being subtracted is an instance of SparseMatrix, the resulting sub-matrix will also be a SparseMatrix, even if it is being subtracted from a DenseMatrix. If two dense matrices are subtracted, the output will also be a DenseMatrix.
other - The block matrix to subtract from this block matrix.
public BlockMatrix multiply(BlockMatrix other)
Left multiplies this BlockMatrix to other, another BlockMatrix. The colsPerBlock of this matrix must equal the rowsPerBlock of other. If other contains SparseMatrix blocks, they will have to be converted to DenseMatrix. The output BlockMatrix will only consist of blocks of DenseMatrix. This may cause some performance issues until support for multiplying two sparse matrices is added.
other - Matrix B in A * B = C
Note: multiply used to throw an error when there were blocks with duplicate indices. Now, the blocks with duplicate indices will be added with each other.
public BlockMatrix multiply(BlockMatrix other, int numMidDimSplits)
Left multiplies this BlockMatrix to other, another BlockMatrix. The colsPerBlock of this matrix must equal the rowsPerBlock of other. If other contains SparseMatrix blocks, they will have to be converted to DenseMatrix. The output BlockMatrix will only consist of blocks of DenseMatrix. This may cause some performance issues until support for multiplying two sparse matrices is added. Blocks with duplicate indices will be added with each other.
other - Matrix B in A * B = C
numMidDimSplits - Number of splits to cut on the middle dimension when doing multiplication. For example, when multiplying a Matrix A of size m x n with Matrix B of size n x k, this parameter configures the parallelism to use when grouping the matrices. The parallelism will increase from m x k to m x k x numMidDimSplits, which in some cases also reduces total shuffled data.
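The effect of numMidDimSplits on parallelism can be sketched in plain Java. This is a simplified model under stated assumptions, not Spark's algorithm: each entry of the arrays stands in for one block, the class `MidDimSplitSketch` and its methods are hypothetical, and assigning a middle index to a split with `mid % numMidDimSplits` is an arbitrary choice made for illustration. The sketch shows that splitting the middle dimension produces more independent units of work (one per output position and split) while the combined result is unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: each array entry stands in for one block.
final class MidDimSplitSketch {

    // Computes partial sums keyed by "i,j,split". Each key is one independent
    // unit of work, so there are up to m * k * numMidDimSplits of them
    // (fewer if numMidDimSplits exceeds the middle dimension n).
    static Map<String, Double> partialProducts(double[][] a, double[][] b, int numMidDimSplits) {
        int m = a.length;
        int n = b.length;
        int k = b[0].length;
        if (a[0].length != n) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        Map<String, Double> partial = new HashMap<>();
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < k; j++) {
                for (int mid = 0; mid < n; mid++) {
                    int split = mid % numMidDimSplits; // hypothetical assignment rule
                    partial.merge(i + "," + j + "," + split, a[i][mid] * b[mid][j], Double::sum);
                }
            }
        }
        return partial;
    }

    // Adds the partial sums for each (i, j) to obtain the final m x k product.
    static double[][] combine(Map<String, Double> partial, int m, int k) {
        double[][] c = new double[m][k];
        for (Map.Entry<String, Double> e : partial.entrySet()) {
            String[] key = e.getKey().split(",");
            c[Integer.parseInt(key[0])][Integer.parseInt(key[1])] += e.getValue();
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};      // m x n = 2 x 3
        double[][] b = {{7, 8}, {9, 10}, {11, 12}}; // n x k = 3 x 2
        for (int splits = 1; splits <= 3; splits++) {
            Map<String, Double> partial = partialProducts(a, b, splits);
            double[][] c = combine(partial, a.length, b[0].length);
            System.out.println("splits=" + splits + " units of work=" + partial.size()
                + " c[0][0]=" + c[0][0]);
        }
    }
}
```

A larger split count trades more, smaller units of work against an extra combining step, which is the trade-off the parameter description refers to.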