@InterfaceStability.Evolving
public abstract class ColumnVector
extends Object
implements AutoCloseable
Spark calls only the get method that matches the data type of this ColumnVector, e.g. if it's int
type, Spark is guaranteed to only call getInt(int) or getInts(int, int).
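The relationship between a single-value getter and its batched counterpart can be sketched as follows. This is a minimal illustration, not Spark's source: SketchIntVector and ArrayBackedIntVector are hypothetical stand-ins, and the loop-based default for getInts is an assumption about how a non-abstract batched getter could fall back to the abstract getInt.

```java
// Hypothetical, simplified stand-in for an int-typed column vector (not Spark's class).
abstract class SketchIntVector {
    // Single-value accessor that every implementation must supply.
    abstract int getInt(int rowId);

    // Batched accessor: reads values from [rowId, rowId + count).
    // Assumed default: fall back to the single-value getter in a loop.
    int[] getInts(int rowId, int count) {
        int[] res = new int[count];
        for (int i = 0; i < count; i++) {
            res[i] = getInt(rowId + i);
        }
        return res;
    }
}

// Array-backed implementation that overrides the batched getter with a bulk copy.
final class ArrayBackedIntVector extends SketchIntVector {
    private final int[] data;

    ArrayBackedIntVector(int[] data) {
        this.data = data;
    }

    @Override
    int getInt(int rowId) {
        return data[rowId];
    }

    @Override
    int[] getInts(int rowId, int count) {
        // One range copy instead of count individual reads.
        return java.util.Arrays.copyOfRange(data, rowId, rowId + count);
    }
}
```

An implementation that already holds its values in a primitive array can override the batched getter like this, while others can rely on the loop-based fallback.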
ColumnVector supports all the data types including nested types. To handle nested types,
ColumnVector can have children and is a tree structure. Please refer to getStruct(int),
getArray(int) and getMap(int) for the details about how to implement nested types.
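As one illustration of the array layout described under getArray(int) (a child data vector plus offsets and lengths kept in 2 int arrays), here is a minimal sketch. SketchIntArrayVector is a hypothetical name, a plain int[] stands in for the child data vector, and the method returns a copy for simplicity rather than a ColumnarArray.

```java
// Hypothetical, simplified model of an array-typed vector (not Spark's classes).
final class SketchIntArrayVector {
    private final int[] elementData; // stands in for the child data vector
    private final int[] offsets;     // start of each row's array within elementData
    private final int[] lengths;     // number of elements in each row's array

    SketchIntArrayVector(int[] elementData, int[] offsets, int[] lengths) {
        if (offsets.length != lengths.length) {
            throw new IllegalArgumentException("offsets and lengths must have one entry per row");
        }
        this.elementData = elementData;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Returns a copy of the elements that make up the array value at rowId.
    int[] getArray(int rowId) {
        int start = offsets[rowId];
        return java.util.Arrays.copyOfRange(elementData, start, start + lengths[rowId]);
    }
}
```

For example, the three rows [1, 2], [] and [3, 4, 5] would be stored as elementData {1, 2, 3, 4, 5}, offsets {0, 2, 2} and lengths {2, 0, 3}.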
ColumnVector is expected to be reused during the entire data loading process, to avoid allocating
memory again and again.
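The reuse pattern can be sketched as a vector whose buffer is allocated once and overwritten for each batch. ReusableIntVector, load and numRows are hypothetical names introduced for illustration only; they are not part of the ColumnVector API.

```java
// Hypothetical sketch of a vector that is reused across batches (not Spark's class).
final class ReusableIntVector {
    private final int[] data; // allocated once, sized to the largest expected batch
    private int numRows;      // number of valid rows in the current batch

    ReusableIntVector(int capacity) {
        this.data = new int[capacity];
    }

    // Overwrites the existing buffer with the next batch instead of allocating a new one.
    void load(int[] batch) {
        if (batch.length > data.length) {
            throw new IllegalArgumentException("batch exceeds capacity");
        }
        System.arraycopy(batch, 0, data, 0, batch.length);
        numRows = batch.length;
    }

    int numRows() {
        return numRows;
    }

    int getInt(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId " + rowId + " is outside the current batch");
        }
        return data[rowId];
    }
}
```

Stale values from an earlier, longer batch may remain in the buffer, which is why the sketch tracks numRows and rejects reads beyond it.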
ColumnVector is meant to maximize CPU efficiency but not to minimize storage footprint.
Implementations should prefer computing efficiency over storage efficiency when designing the
format. Since it is expected to reuse the ColumnVector instance while loading data, the storage
footprint is negligible.

Modifier and Type | Method | Description |
---|---|---|
abstract void | close() | Cleans up memory for this column vector. |
DataType | dataType() | Returns the data type of this column vector. |
abstract ColumnarArray | getArray(int rowId) | Returns the array type value for rowId. |
abstract byte[] | getBinary(int rowId) | Returns the binary type value for rowId. |
abstract boolean | getBoolean(int rowId) | Returns the boolean type value for rowId. |
boolean[] | getBooleans(int rowId, int count) | Gets boolean type values from [rowId, rowId + count). |
abstract byte | getByte(int rowId) | Returns the byte type value for rowId. |
byte[] | getBytes(int rowId, int count) | Gets byte type values from [rowId, rowId + count). |
abstract Decimal | getDecimal(int rowId, int precision, int scale) | Returns the decimal type value for rowId. |
abstract double | getDouble(int rowId) | Returns the double type value for rowId. |
double[] | getDoubles(int rowId, int count) | Gets double type values from [rowId, rowId + count). |
abstract float | getFloat(int rowId) | Returns the float type value for rowId. |
float[] | getFloats(int rowId, int count) | Gets float type values from [rowId, rowId + count). |
abstract int | getInt(int rowId) | Returns the int type value for rowId. |
org.apache.spark.unsafe.types.CalendarInterval | getInterval(int rowId) | Returns the calendar interval type value for rowId. |
int[] | getInts(int rowId, int count) | Gets int type values from [rowId, rowId + count). |
abstract long | getLong(int rowId) | Returns the long type value for rowId. |
long[] | getLongs(int rowId, int count) | Gets long type values from [rowId, rowId + count). |
abstract ColumnarMap | getMap(int ordinal) | Returns the map type value for rowId. |
abstract short | getShort(int rowId) | Returns the short type value for rowId. |
short[] | getShorts(int rowId, int count) | Gets short type values from [rowId, rowId + count). |
ColumnarRow | getStruct(int rowId) | Returns the struct type value for rowId. |
abstract org.apache.spark.unsafe.types.UTF8String | getUTF8String(int rowId) | Returns the string type value for rowId. |
abstract boolean | hasNull() | Returns true if this column vector contains any null values. |
abstract boolean | isNullAt(int rowId) | Returns whether the value at rowId is NULL. |
abstract int | numNulls() | Returns the number of nulls in this column vector. |

public final DataType dataType()
public abstract void close()
Specified by: close in interface AutoCloseable
public abstract boolean hasNull()
public abstract int numNulls()
public abstract boolean isNullAt(int rowId)
public abstract boolean getBoolean(int rowId)
public boolean[] getBooleans(int rowId, int count)
public abstract byte getByte(int rowId)
public byte[] getBytes(int rowId, int count)
public abstract short getShort(int rowId)
public short[] getShorts(int rowId, int count)
public abstract int getInt(int rowId)
public int[] getInts(int rowId, int count)
public abstract long getLong(int rowId)
public long[] getLongs(int rowId, int count)
public abstract float getFloat(int rowId)
public float[] getFloats(int rowId, int count)
public abstract double getDouble(int rowId)
public double[] getDoubles(int rowId, int count)
public final ColumnarRow getStruct(int rowId)
To support struct type, implementations must implement getChild(int) and make this
vector a tree structure. The number of child vectors must be the same as the number of fields of
the struct type, and each child vector is responsible for storing the data for its corresponding
struct field.
public abstract ColumnarArray getArray(int rowId)
To support array type, implementations must construct a ColumnarArray and return it in
this method. ColumnarArray requires a ColumnVector that stores the data of all
the elements of all the arrays in this vector, and an offset and length that point to a range
in that ColumnVector; the range represents the array for rowId. Implementations
are free to decide where to put the data vector and the offsets and lengths. For example, an
implementation can use the first child vector as the data vector and store offsets and lengths
in 2 int arrays in this vector.
public abstract ColumnarMap getMap(int ordinal)
To support map type, implementations must construct a ColumnarMap and return it in
this method. ColumnarMap requires a ColumnVector that stores the data of all
the keys of all the maps in this vector, another ColumnVector that stores the data
of all the values of all the maps in this vector, and a pair of offset and length which
specify the range of the key/value array that belongs to the map type value at rowId.
public abstract Decimal getDecimal(int rowId, int precision, int scale)
public abstract org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
public abstract byte[] getBinary(int rowId)
public final org.apache.spark.unsafe.types.CalendarInterval getInterval(int rowId)
To support interval type, implementations must implement getChild(int) and define 2
child vectors: the first child vector is an int type vector, containing all the month values of
all the interval values in this vector. The second child vector is a long type vector,
containing all the microsecond values of all the interval values in this vector.
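The two-child layout for interval values can be sketched as follows. SketchIntervalVector and its nested Interval holder are hypothetical stand-ins for illustration; plain arrays take the place of the two child vectors, and Interval takes the place of CalendarInterval.

```java
// Hypothetical sketch of an interval-typed vector backed by two children (not Spark's classes).
final class SketchIntervalVector {
    // Hypothetical value holder standing in for CalendarInterval.
    static final class Interval {
        final int months;
        final long microseconds;

        Interval(int months, long microseconds) {
            this.months = months;
            this.microseconds = microseconds;
        }
    }

    private final int[] months;        // first child: int type, one month value per row
    private final long[] microseconds; // second child: long type, one microsecond value per row

    SketchIntervalVector(int[] months, long[] microseconds) {
        if (months.length != microseconds.length) {
            throw new IllegalArgumentException("both children must have the same number of rows");
        }
        this.months = months;
        this.microseconds = microseconds;
    }

    // Assembles the interval at rowId from the same row of both children.
    Interval getInterval(int rowId) {
        return new Interval(months[rowId], microseconds[rowId]);
    }
}
```

Each row's interval is rebuilt on demand by reading the same rowId from both children, in the same way a struct value is read from its field vectors.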