org.apache.spark.rdd
Class NewHadoopRDD<K,V>

java.lang.Object
  org.apache.spark.rdd.RDD<scala.Tuple2<K,V>>
    org.apache.spark.rdd.NewHadoopRDD<K,V>

public class NewHadoopRDD<K,V>
extends RDD<scala.Tuple2<K,V>>
implements Logging
:: DeveloperApi ::
An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).

Note: Instantiating this class directly is not recommended; use org.apache.spark.SparkContext.newAPIHadoopRDD() instead.

Parameters:
sc - The SparkContext to associate the RDD with.
inputFormatClass - Storage format of the data to be read.
keyClass - Class of the key associated with the inputFormatClass.
valueClass - Class of the value associated with the inputFormatClass.
conf - The Hadoop configuration.
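As a minimal sketch of the recommended route, reading a text file with the new-API TextInputFormat (the input path and the key/value types here are illustrative, not prescribed by this class):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class NewApiHadoopExample {
  public static void main(String[] args) {
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("new-api-example"));

    // Point the new-API FileInputFormat at the input path via the Hadoop configuration.
    Configuration conf = new Configuration();
    conf.set("mapreduce.input.fileinputformat.inputdir", "/tmp/input.txt");

    // Preferred over constructing NewHadoopRDD directly: the context method
    // wires the resulting RDD into the JavaPairRDD API for you.
    JavaPairRDD<LongWritable, Text> lines = jsc.newAPIHadoopRDD(
        conf, TextInputFormat.class, LongWritable.class, Text.class);

    System.out.println(lines.count());
    jsc.stop();
  }
}
```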
Constructor Summary

| Constructor and Description |
|---|
| NewHadoopRDD(SparkContext sc, Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, org.apache.hadoop.conf.Configuration conf) |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| InterruptibleIterator<scala.Tuple2<K,V>> | compute(Partition theSplit, TaskContext context) - :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
| org.apache.hadoop.conf.Configuration | getConf() |
| Partition[] | getPartitions() - Implemented by subclasses to return the set of partitions in this RDD. |
| scala.collection.Seq<String> | getPreferredLocations(Partition hsplit) - Optionally overridden by subclasses to specify placement preferences. |
| <U> RDD<U> | mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1) - Maps over a partition, providing the InputSplit that was used as the base of the partition. |
| NewHadoopRDD<K,V> | persist(StorageLevel storageLevel) - Set this RDD's storage level to persist its values across operations after the first time it is computed. |
Methods inherited from class Object

equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging

initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
Constructor Detail

NewHadoopRDD

public NewHadoopRDD(SparkContext sc,
                    Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass,
                    Class<K> keyClass,
                    Class<V> valueClass,
                    org.apache.hadoop.conf.Configuration conf)
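Direct construction is discouraged (see the note above), but for completeness, a sketch of what it looks like from Java; the class name, input path, and key/value types are illustrative and mirror the earlier example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkContext;
import org.apache.spark.rdd.NewHadoopRDD;

public class DirectConstructionSketch {
  // sc: an already-running SparkContext owned by the caller.
  static NewHadoopRDD<LongWritable, Text> build(SparkContext sc) {
    Configuration conf = new Configuration();
    conf.set("mapreduce.input.fileinputformat.inputdir", "/tmp/input.txt");
    return new NewHadoopRDD<LongWritable, Text>(
        sc, TextInputFormat.class, LongWritable.class, Text.class, conf);
  }
}
```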
Method Detail

getPartitions

public Partition[] getPartitions()

Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.
compute

public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit,
                                                        TaskContext context)

Description copied from class: RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.

Specified by:
compute in class RDD<scala.Tuple2<K,V>>
Parameters:
theSplit - (undocumented)
context - (undocumented)
mapPartitionsWithInputSplit

public <U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f,
                                              boolean preservesPartitioning,
                                              scala.reflect.ClassTag<U> evidence$1)

Maps over a partition, providing the InputSplit that was used as the base of the partition.
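Because this is the Scala API surfaced in Java, calling it from Java means supplying a scala.Function2 and a ClassTag by hand. A minimal sketch, assuming the RDD was built as in the constructor example and that every split is a FileSplit (true for plain file-based input formats); the class and variable names are illustrative:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import scala.Tuple2;
import scala.collection.Iterator;
import scala.collection.JavaConversions;
import scala.runtime.AbstractFunction2;

// Tags every record's value with the path of the file it came from.
// Must be Serializable so Spark can ship it to executors.
class TagWithPath
    extends AbstractFunction2<InputSplit, Iterator<Tuple2<LongWritable, Text>>, Iterator<String>>
    implements Serializable {
  @Override
  public Iterator<String> apply(InputSplit split, Iterator<Tuple2<LongWritable, Text>> records) {
    // Safe for file-based input formats; other formats may use other split types.
    String path = ((FileSplit) split).getPath().toString();
    List<String> out = new ArrayList<String>();
    java.util.Iterator<Tuple2<LongWritable, Text>> it = JavaConversions.asJavaIterator(records);
    while (it.hasNext()) {
      out.add(path + "\t" + it.next()._2().toString());
    }
    // Note: this buffers one partition in memory; acceptable for a sketch.
    return JavaConversions.asScalaIterator(out.iterator());
  }
}

// Usage, given rdd from the constructor example:
// RDD<String> tagged = rdd.mapPartitionsWithInputSplit(
//     new TagWithPath(), true, scala.reflect.ClassTag$.MODULE$.<String>apply(String.class));
```

In Scala this collapses to a two-line closure; the boilerplate above exists only because the method was not mirrored into the Java-friendly JavaPairRDD API.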
getPreferredLocations

public scala.collection.Seq<String> getPreferredLocations(Partition hsplit)

Description copied from class: RDD
Optionally overridden by subclasses to specify placement preferences.

Parameters:
hsplit - (undocumented)
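One way to see this in action is to print the hosts each partition prefers (typically the locations of the underlying HDFS blocks); a small sketch against the rdd built earlier:

```java
import org.apache.spark.Partition;

// For each partition, list the hosts where its underlying split's data lives.
for (Partition p : rdd.getPartitions()) {
  System.out.println("partition " + p.index() + " -> "
      + rdd.getPreferredLocations(p).mkString(", "));
}
```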
persist

public NewHadoopRDD<K,V> persist(StorageLevel storageLevel)

Description copied from class: RDD
Set this RDD's storage level to persist its values across operations after the first time it is computed.

Overrides:
persist in class RDD<scala.Tuple2<K,V>>
Parameters:
storageLevel - (undocumented)
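For example, to keep the deserialized records in memory after the first action (other StorageLevel values work the same way):

```java
import org.apache.spark.storage.StorageLevel;

// persist returns the RDD itself, so calls chain.
rdd.persist(StorageLevel.MEMORY_ONLY());
long n = rdd.count();  // first action materializes and caches the partitions
```

One caveat worth knowing for Hadoop-backed RDDs: Hadoop's RecordReader reuses the same Writable object for each record, so caching such an RDD directly can leave many references to one mutated object. Mapping values to immutable copies (for example, calling toString on Text) before caching is the usual precaution.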
getConf

public org.apache.hadoop.conf.Configuration getConf()
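This exposes the Hadoop configuration the RDD was built with, which is handy for checking what the input format will actually see; the key below is the standard new-API input path property used in the earlier examples:

```java
org.apache.hadoop.conf.Configuration hadoopConf = rdd.getConf();
System.out.println(hadoopConf.get("mapreduce.input.fileinputformat.inputdir"));
```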