Object

org.apache.spark.sql.vectorized.ColumnVector

org.apache.spark.sql.vectorized.ArrowColumnVector

All Implemented Interfaces: AutoCloseable

@DeveloperApi public class ArrowColumnVector extends ColumnVector

A column vector backed by Apache Arrow.

Constructor Summary

| Constructor | Description |
| --- | --- |
| ArrowColumnVector(org.apache.arrow.vector.ValueVector vector) | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | close() | Cleans up memory for this column vector. |
| ColumnarArray | getArray(int rowId) | Returns the array type value for rowId. |
| byte[] | getBinary(int rowId) | Returns the binary type value for rowId. |
| boolean | getBoolean(int rowId) | Returns the boolean type value for rowId. |
| byte | getByte(int rowId) | Returns the byte type value for rowId. |
| ArrowColumnVector | getChild(int ordinal) | |
| Decimal | getDecimal(int rowId, int precision, int scale) | Returns the decimal type value for rowId. |
| double | getDouble(int rowId) | Returns the double type value for rowId. |
| float | getFloat(int rowId) | Returns the float type value for rowId. |
| int | getInt(int rowId) | Returns the int type value for rowId. |
| CalendarInterval | getInterval(int rowId) | Returns the calendar interval type value for rowId. |
| long | getLong(int rowId) | Returns the long type value for rowId. |
| ColumnarMap | getMap(int rowId) | Returns the map type value for rowId. |
| short | getShort(int rowId) | Returns the short type value for rowId. |
| org.apache.spark.unsafe.types.UTF8String | getUTF8String(int rowId) | Returns the string type value for rowId. |
| org.apache.arrow.vector.ValueVector | getValueVector() | |
| boolean | hasNull() | Returns true if this column vector contains any null values. |
| boolean | isNullAt(int rowId) | Returns whether the value at rowId is NULL. |
| int | numNulls() | Returns the number of nulls in this column vector. |

Methods inherited from class org.apache.spark.sql.vectorized.ColumnVector
dataType, getBooleans, getBytes, getDoubles, getFloats, getInts, getLongs, getShorts, getStruct, getVariant

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- ArrowColumnVector
  
  public ArrowColumnVector(org.apache.arrow.vector.ValueVector vector)
Method Details
- getValueVector
  
  public org.apache.arrow.vector.ValueVector getValueVector()
- hasNull
  
  public boolean hasNull()
  
  Description copied from class: ColumnVector
  
  Returns true if this column vector contains any null values.
  
  Specified by:
  
  hasNull in class ColumnVector
- numNulls
  
  public int numNulls()
  
  Description copied from class: ColumnVector
  
  Returns the number of nulls in this column vector.
  
  Specified by:
  
  numNulls in class ColumnVector
- close
  
  public void close()
  
  Description copied from class: ColumnVector
  
  Cleans up memory for this column vector. The column vector is not usable after this.
  This overrides AutoCloseable.close() to remove the throws clause, because a column vector is in-memory and no exception is expected during closing.
  
  Specified by:
  
  close in interface AutoCloseable
  
  Specified by:
  
  close in class ColumnVector
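Because close() is redeclared without a throws clause, callers can use try-with-resources without a catch block or a throws declaration. The sketch below illustrates that Java rule using only the standard library. `InMemoryVector`, `HeapVector` and `CloseDemo` are hypothetical stand-ins invented for this illustration; they are not Spark classes.

```java
// Hypothetical stand-in for a column vector base class; not a Spark class.
abstract class InMemoryVector implements AutoCloseable {
    // AutoCloseable.close() declares "throws Exception";
    // an override is allowed to drop the throws clause.
    @Override
    public abstract void close();
}

// Hypothetical concrete vector backed by a heap array.
final class HeapVector extends InMemoryVector {
    private int[] data = {1, 2, 3};
    private boolean closed = false;

    int get(int rowId) {
        if (closed) {
            throw new IllegalStateException("vector is closed");
        }
        return data[rowId];
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        data = null;   // release the backing storage
        closed = true; // the vector must not be used after this point
    }
}

final class CloseDemo {
    // No catch block and no "throws Exception" are needed,
    // because HeapVector.close() declares no checked exception.
    static int sumThenClose(HeapVector vector) {
        try (HeapVector v = vector) {
            return v.get(0) + v.get(1) + v.get(2);
        }
    }
}
```

After `CloseDemo.sumThenClose` returns, the vector has been closed and any further `get` call in this sketch throws `IllegalStateException`, mirroring the rule that a column vector is not usable after close().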
- isNullAt
  
  public boolean isNullAt(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns whether the value at rowId is NULL.
  
  Specified by:
  
  isNullAt in class ColumnVector
- getBoolean
  
  public boolean getBoolean(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the boolean type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getBoolean in class ColumnVector
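Since a primitive getter returns an undefined value for a null slot, callers typically check isNullAt(rowId) before reading. The following is a minimal sketch of that pattern against a simplified model of a nullable column. `NullableIntColumn` is a hypothetical class written for this illustration, and its use of a `BitSet` for null tracking is an assumption, not a description of how ArrowColumnVector stores validity.

```java
import java.util.BitSet;

// Hypothetical, simplified model of a nullable int column; not a Spark class.
final class NullableIntColumn {
    private final int[] values;
    private final BitSet nulls; // a set bit marks a null slot

    NullableIntColumn(Integer[] input) {
        values = new int[input.length];
        nulls = new BitSet(input.length);
        for (int i = 0; i < input.length; i++) {
            if (input[i] == null) {
                nulls.set(i); // values[i] keeps an arbitrary filler value
            } else {
                values[i] = input[i];
            }
        }
    }

    int length() {
        return values.length;
    }

    boolean hasNull() {
        return !nulls.isEmpty();
    }

    int numNulls() {
        return nulls.cardinality();
    }

    boolean isNullAt(int rowId) {
        return nulls.get(rowId);
    }

    // The result is meaningless when the slot is null.
    int getInt(int rowId) {
        return values[rowId];
    }

    // Sums only the non-null slots, checking isNullAt before every read.
    static long sumNonNull(NullableIntColumn column) {
        long sum = 0;
        for (int rowId = 0; rowId < column.length(); rowId++) {
            if (!column.isNullAt(rowId)) {
                sum += column.getInt(rowId);
            }
        }
        return sum;
    }
}
```

A caller that knows hasNull() is false could skip the per-row check, which is one reason a column vector exposes both hasNull() and isNullAt(int).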
- getByte
  
  public byte getByte(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the byte type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getByte in class ColumnVector
- getShort
  
  public short getShort(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the short type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getShort in class ColumnVector
- getInt
  
  public int getInt(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the int type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getInt in class ColumnVector
- getLong
  
  public long getLong(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the long type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getLong in class ColumnVector
- getFloat
  
  public float getFloat(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the float type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getFloat in class ColumnVector
- getDouble
  
  public double getDouble(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the double type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
  
  Specified by:
  
  getDouble in class ColumnVector
- getDecimal
  
  public Decimal getDecimal(int rowId, int precision, int scale)
  
  Description copied from class: ColumnVector
  
  Returns the decimal type value for rowId. If the slot for rowId is null, it should return null.
  
  Specified by:
  
  getDecimal in class ColumnVector
- getUTF8String
  
  public org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the string type value for rowId. If the slot for rowId is null, it should return null.
  Note that the returned UTF8String may point to the data of this column vector. Copy it if you need to keep it after this column vector is freed.
  
  Specified by:
  
  getUTF8String in class ColumnVector
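The copy advice above follows from the returned value possibly being a view over the vector's memory rather than an independent object. The sketch below models that hazard with the standard library only. `StringView` and `ViewDemo` are hypothetical classes invented for this illustration, and overwriting the array stands in for the buffer being freed or reused.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of a string value that is a view over a shared buffer.
final class StringView {
    private final byte[] buffer; // may be shared with the owning vector
    private final int offset;
    private final int length;

    StringView(byte[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    // Detaches the value from the shared buffer by copying the referenced bytes.
    StringView copy() {
        byte[] own = Arrays.copyOfRange(buffer, offset, offset + length);
        return new StringView(own, 0, length);
    }

    @Override
    public String toString() {
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }
}

final class ViewDemo {
    // Returns { text seen through the view, text seen through the copy }
    // after the shared buffer has been overwritten.
    static String[] run() {
        byte[] vectorData = "sparkarrow".getBytes(StandardCharsets.UTF_8);
        StringView view = new StringView(vectorData, 5, 5); // covers "arrow"
        StringView kept = view.copy();
        Arrays.fill(vectorData, (byte) '?'); // simulate the buffer being reused
        return new String[] {view.toString(), kept.toString()};
    }
}
```

In this model the view reads the overwritten bytes while the copy still reads "arrow". With a real off-heap buffer the outcome of reading a freed view is not defined, so the copy should be taken before the vector is closed.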
- getInterval
  
  public CalendarInterval getInterval(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the calendar interval type value for rowId. If the slot for rowId is null, it should return null.
  In Spark, a calendar interval value consists of two int values, the number of months and the number of days in the interval, and a long value, the number of microseconds in the interval. An interval type vector is equivalent to a struct type vector with 3 fields: months, days and microseconds.
  To support the interval type, implementations must implement ColumnVector.getChild(int) and define 3 child vectors. The first child is an int type vector containing the month values of all the intervals in this vector. The second child is an int type vector containing the day values. The third child is a long type vector containing the microsecond values. Note that ArrowColumnVector does not follow this protocol; it reads interval values from Arrow's built-in IntervalMonthDayNanoVector instead.
  
  Overrides:
  
  getInterval in class ColumnVector
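The three-child layout described for getInterval can be sketched with plain arrays. `IntervalValue` and `IntervalColumn` are hypothetical classes written for this illustration, not Spark types. The `fromMonthDayNano` helper is also an assumption: it shows one plausible way to map a nanosecond-resolution value onto the microsecond field by integer division, and it is not taken from the ArrowColumnVector source.

```java
// Hypothetical value type mirroring the three fields of a calendar interval.
final class IntervalValue {
    final int months;
    final int days;
    final long microseconds;

    IntervalValue(int months, int days, long microseconds) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
    }

    // Assumption for illustration: nanoseconds are truncated to microseconds.
    static IntervalValue fromMonthDayNano(int months, int days, long nanoseconds) {
        return new IntervalValue(months, days, nanoseconds / 1_000L);
    }
}

// Hypothetical model of an interval column stored as three child vectors.
final class IntervalColumn {
    private final int[] months;        // child 0: month values of every row
    private final int[] days;          // child 1: day values of every row
    private final long[] microseconds; // child 2: microsecond values of every row

    IntervalColumn(int[] months, int[] days, long[] microseconds) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
    }

    // Assembles the interval for one row from the same index in each child.
    IntervalValue getInterval(int rowId) {
        return new IntervalValue(months[rowId], days[rowId], microseconds[rowId]);
    }

    static IntervalColumn example() {
        return new IntervalColumn(
            new int[] {1, 0},
            new int[] {0, 15},
            new long[] {0L, 2_000_000L});
    }
}
```

In `IntervalColumn.example()`, row 0 is an interval of one month and row 1 is an interval of 15 days plus 2,000,000 microseconds.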
- getBinary
  
  public byte[] getBinary(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the binary type value for rowId. If the slot for rowId is null, it should return null.
  
  Specified by:
  
  getBinary in class ColumnVector
- getArray
  
  public ColumnarArray getArray(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the array type value for rowId. If the slot for rowId is null, it should return null.
  To support the array type, implementations must construct a ColumnarArray and return it from this method. A ColumnarArray requires a ColumnVector that stores the elements of all the arrays in this vector, plus an offset and a length that identify the range in that ColumnVector holding the array for rowId. Implementations are free to decide where to keep the data vector, the offsets and the lengths. For example, an implementation can use the first child vector as the data vector and store the offsets and lengths in two int arrays in this vector.
  
  Specified by:
  
  getArray in class ColumnVector
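The offset and length scheme described for getArray can be modelled as one flat element array plus two int arrays. `ArrayColumn` is a hypothetical class written for this illustration. It returns a copied `int[]` for simplicity, whereas a ColumnarArray would refer to the range in the data vector.

```java
import java.util.Arrays;

// Hypothetical model of an array-typed column; not a Spark class.
final class ArrayColumn {
    private final int[] elements; // data vector: elements of all arrays, back to back
    private final int[] offsets;  // offsets[rowId]: start of the row's range in elements
    private final int[] lengths;  // lengths[rowId]: number of elements in the row's array

    ArrayColumn(int[] elements, int[] offsets, int[] lengths) {
        this.elements = elements;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Materializes the array stored at rowId from its (offset, length) range.
    int[] getArray(int rowId) {
        int start = offsets[rowId];
        return Arrays.copyOfRange(elements, start, start + lengths[rowId]);
    }

    // Three rows: [10, 20], [], [30, 40, 50]
    static ArrayColumn example() {
        return new ArrayColumn(
            new int[] {10, 20, 30, 40, 50},
            new int[] {0, 2, 2},
            new int[] {2, 0, 3});
    }
}
```

For the example column, `getArray(2)` yields `[30, 40, 50]` and `getArray(1)` yields an empty array, because row 1 has length 0.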
- getMap
  
  public ColumnarMap getMap(int rowId)
  
  Description copied from class: ColumnVector
  
  Returns the map type value for rowId. If the slot for rowId is null, it should return null.
  In Spark, a map type value consists of a key data array and a value data array. The key and the value at the same index in the two arrays form one entry of the map.
  To support the map type, implementations must construct a ColumnarMap and return it from this method. A ColumnarMap requires a ColumnVector that stores the keys of all the maps in this vector, another ColumnVector that stores the values of all the maps in this vector, and an offset and length pair that specifies the range of the key and value arrays belonging to the map at rowId.
  
  Specified by:
  
  getMap in class ColumnVector
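The map layout described above pairs two parallel data vectors with one offset and length per row. The sketch below models it with plain arrays. `MapColumn` is a hypothetical class written for this illustration, and it builds a `java.util.Map` for readability, whereas a ColumnarMap would expose the key and value ranges directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a map-typed column; not a Spark class.
final class MapColumn {
    private final String[] keys; // key vector: keys of all maps, back to back
    private final int[] values;  // value vector: values of all maps, same positions as keys
    private final int[] offsets; // offsets[rowId]: start of the row's range
    private final int[] lengths; // lengths[rowId]: number of entries in the row's map

    MapColumn(String[] keys, int[] values, int[] offsets, int[] lengths) {
        this.keys = keys;
        this.values = values;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Pairs keys[i] with values[i] for every index i in the row's range.
    Map<String, Integer> getMap(int rowId) {
        Map<String, Integer> entries = new LinkedHashMap<>();
        int start = offsets[rowId];
        for (int i = start; i < start + lengths[rowId]; i++) {
            entries.put(keys[i], values[i]);
        }
        return entries;
    }

    // Two rows: {a=1, b=2} and {c=3}
    static MapColumn example() {
        return new MapColumn(
            new String[] {"a", "b", "c"},
            new int[] {1, 2, 3},
            new int[] {0, 2},
            new int[] {2, 1});
    }
}
```

The same offset and length apply to both the key vector and the value vector, which is why a single pair per row is enough.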
- getChild
  
  public ArrowColumnVector getChild(int ordinal)
  
  Specified by:
  
  getChild in class ColumnVector
  
  Returns:
  
  child ColumnVector at the given ordinal.
