public class HadoopTableReader extends Object implements TableReader
Constructor and Description |
---|
HadoopTableReader(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes, MetastoreRelation relation, HiveContext sc, org.apache.hadoop.hive.conf.HiveConf hiveExtraConf) |
Modifier and Type | Method and Description |
---|---|
static scala.collection.Iterator<org.apache.spark.sql.Row> | fillObject(scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator, org.apache.hadoop.hive.serde2.Deserializer deserializer, scala.collection.Seq<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Attribute,Object>> nonPartitionKeyAttrs, org.apache.spark.sql.catalyst.expressions.MutableRow mutableRow) Transform all given raw Writables into Rows. |
static void | initializeLocalJobConfFunc(String path, org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf) Curried. |
RDD<org.apache.spark.sql.Row> | makeRDDForPartitionedTable(scala.collection.immutable.Map<org.apache.hadoop.hive.ql.metadata.Partition,Class<? extends org.apache.hadoop.hive.serde2.Deserializer>> partitionToDeserializer, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt) Create a HadoopRDD for every partition key specified in the query. |
RDD<org.apache.spark.sql.Row> | makeRDDForPartitionedTable(scala.collection.Seq<org.apache.hadoop.hive.ql.metadata.Partition> partitions) |
RDD<org.apache.spark.sql.Row> | makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable) |
RDD<org.apache.spark.sql.Row> | makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable, Class<? extends org.apache.hadoop.hive.serde2.Deserializer> deserializerClass, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt) Creates a Hadoop RDD to read data from the target table's data directory. |
public HadoopTableReader(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes, MetastoreRelation relation, HiveContext sc, org.apache.hadoop.hive.conf.HiveConf hiveExtraConf)
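HadoopTableReader, MetastoreRelation, and HiveContext are internal to Spark SQL's Hive support, so the sketch below is illustrative rather than user-facing API: attributes, relation, and hiveContext stand for values that are normally supplied by the Hive table-scan operator, and an empty HiveConf is passed as hiveExtraConf only to satisfy the signature.

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.hive.{HadoopTableReader, HiveContext, MetastoreRelation}

// Build a reader for the requested columns of one metastore-backed table.
def buildReader(
    attributes: Seq[Attribute],     // columns the query asks for
    relation: MetastoreRelation,    // resolved metastore entry for the table
    hiveContext: HiveContext): HadoopTableReader = {
  // hiveExtraConf lets the caller layer extra Hive settings on top of the
  // session configuration; an empty HiveConf is used here as a placeholder.
  new HadoopTableReader(attributes, relation, hiveContext, new HiveConf())
}
```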
public static void initializeLocalJobConfFunc(String path, org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf)
Curried.
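The "Curried" note refers to the Scala definition, which takes path and tableDesc in a first parameter list and jobConf in a second; the flattened Java signature above hides this. A minimal sketch of the usual partial application, assuming a TableDesc and an input path are already available:

```scala
import org.apache.hadoop.hive.ql.plan.TableDesc
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.sql.hive.HadoopTableReader

// Partially applying the first parameter list yields a JobConf => Unit closure
// that can be handed to a HadoopRDD and run on each executor to set up the
// local job configuration (for example the input path and table properties).
def localJobConfInitializer(inputPath: String, tableDesc: TableDesc): JobConf => Unit =
  HadoopTableReader.initializeLocalJobConfFunc(inputPath, tableDesc) _
```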
public static scala.collection.Iterator<org.apache.spark.sql.Row> fillObject(scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator, org.apache.hadoop.hive.serde2.Deserializer deserializer, scala.collection.Seq<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Attribute,Object>> nonPartitionKeyAttrs, org.apache.spark.sql.catalyst.expressions.MutableRow mutableRow)
Transform all given raw Writables into Rows.
Parameters:
iterator - Iterator of all Writables to be transformed
deserializer - The Deserializer associated with the input Writable
nonPartitionKeyAttrs - Attributes that should be filled together with their corresponding positions in the output schema
mutableRow - A reusable MutableRow that should be filled
Returns:
Iterator[Row] transformed from iterator
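The Iterator-to-Iterator shape and the reusable MutableRow suggest per-partition use. Below is a minimal sketch of driving fillObject from RDD.mapPartitions; writableRDD, deserializerClass, tableProperties, hiveConf, and attributes are assumed placeholders for values normally produced while planning the Hive table scan, and the Hadoop configuration is broadcast because it is not Java-serializable.

```scala
import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.Deserializer
import org.apache.hadoop.io.Writable
import org.apache.spark.{SerializableWritable, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.{Attribute, GenericMutableRow}
import org.apache.spark.sql.hive.HadoopTableReader

def toRows(
    sc: SparkContext,
    writableRDD: RDD[Writable],                  // raw records read from the table's files
    deserializerClass: Class[_ <: Deserializer], // the table's SerDe
    tableProperties: Properties,                 // SerDe/table properties
    hiveConf: Configuration,
    attributes: Seq[Attribute]): RDD[Row] = {
  // Pair each requested (non-partition-key) attribute with its ordinal in the output row.
  val attrsWithIndex = attributes.zipWithIndex
  // Ship the configuration to executors via a serializable wrapper instead of
  // capturing it directly in the closure.
  val confBroadcast = sc.broadcast(new SerializableWritable(hiveConf))
  writableRDD.mapPartitions { iter =>
    // One Deserializer instance and one reusable row per partition.
    val deserializer = deserializerClass.newInstance()
    deserializer.initialize(confBroadcast.value.value, tableProperties)
    val mutableRow = new GenericMutableRow(attributes.length)
    HadoopTableReader.fillObject(iter, deserializer, attrsWithIndex, mutableRow)
  }
}
```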
public RDD<org.apache.spark.sql.Row> makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable)
Specified by:
makeRDDForTable in interface TableReader
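For a non-partitioned table this is the simplest entry point. A one-line sketch, assuming a reader built as in the constructor sketch above and its MetastoreRelation are in scope (hiveQlTable is assumed to be the relation's underlying org.apache.hadoop.hive.ql.metadata.Table):

```scala
// Scan the table's data directory and get the deserialized rows back as an RDD.
val rows: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] =
  reader.makeRDDForTable(relation.hiveQlTable)
```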
public RDD<org.apache.spark.sql.Row> makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable, Class<? extends org.apache.hadoop.hive.serde2.Deserializer> deserializerClass, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt)
Creates a Hadoop RDD to read data from the target table's data directory.
Parameters:
hiveTable - Hive metadata for the table being scanned.
deserializerClass - Class of the SerDe used to deserialize Writables read from Hadoop.
filterOpt - If defined, then the filter is used to reject files contained in the data directory being read. If None, then all files are accepted.

public RDD<org.apache.spark.sql.Row> makeRDDForPartitionedTable(scala.collection.Seq<org.apache.hadoop.hive.ql.metadata.Partition> partitions)
Specified by:
makeRDDForPartitionedTable in interface TableReader
public RDD<org.apache.spark.sql.Row> makeRDDForPartitionedTable(scala.collection.immutable.Map<org.apache.hadoop.hive.ql.metadata.Partition,Class<? extends org.apache.hadoop.hive.serde2.Deserializer>> partitionToDeserializer, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt)
Create a HadoopRDD for every partition key specified in the query.
Parameters:
partitionToDeserializer - Mapping from a Hive Partition metadata object to the SerDe class to use to deserialize input Writables from the corresponding partition.
filterOpt - If defined, then the filter is used to reject files contained in the data subdirectory of each partition being read. If None, then all files are accepted.
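A minimal sketch of driving this overload by hand, assuming reader (a HadoopTableReader) and partitions (the Hive Partition metadata objects selected for the query) already exist; the PathFilter here only skips hidden files and is optional, so passing None accepts every file:

```scala
import org.apache.hadoop.fs.{Path, PathFilter}
import org.apache.hadoop.hive.ql.metadata.Partition
import org.apache.hadoop.hive.serde2.Deserializer
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HadoopTableReader

def scanPartitions(reader: HadoopTableReader, partitions: Seq[Partition]): RDD[Row] = {
  // Each partition may in principle use its own SerDe, hence the Map from
  // partition metadata to Deserializer class.
  val partitionToDeserializer: Map[Partition, Class[_ <: Deserializer]] =
    partitions.map { p =>
      (p, p.getDeserializer.getClass.asInstanceOf[Class[_ <: Deserializer]])
    }.toMap

  // Reject hidden files (e.g. _SUCCESS markers) in each partition's data subdirectory.
  val skipHidden: PathFilter = new PathFilter {
    override def accept(path: Path): Boolean = {
      val name = path.getName
      !name.startsWith("_") && !name.startsWith(".")
    }
  }

  reader.makeRDDForPartitionedTable(partitionToDeserializer, Some(skipHidden))
}
```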