public class KafkaRDD<K,V,U extends kafka.serializer.Decoder<?>,T extends kafka.serializer.Decoder<?>,R> extends RDD<R> implements Logging, HasOffsetRanges
Constructor and Description
KafkaRDD(SparkContext sc,
scala.collection.immutable.Map<String,String> kafkaParams,
OffsetRange[] offsetRanges,
scala.collection.immutable.Map<kafka.common.TopicAndPartition,scala.Tuple2<String,Object>> leaders,
scala.Function1<kafka.message.MessageAndMetadata<K,V>,R> messageHandler,
scala.reflect.ClassTag<K> evidence$1,
scala.reflect.ClassTag<V> evidence$2,
scala.reflect.ClassTag<U> evidence$3,
scala.reflect.ClassTag<T> evidence$4,
scala.reflect.ClassTag<R> evidence$5)
Modifier and Type | Method and Description
static <K,V,U extends kafka.serializer.Decoder<?>,T extends kafka.serializer.Decoder<?>,R> KafkaRDD<K,V,U,T,R>
apply(SparkContext sc,
scala.collection.immutable.Map<String,String> kafkaParams,
scala.collection.immutable.Map<kafka.common.TopicAndPartition,Object> fromOffsets,
scala.collection.immutable.Map<kafka.common.TopicAndPartition,KafkaCluster.LeaderOffset> untilOffsets,
scala.Function1<kafka.message.MessageAndMetadata<K,V>,R> messageHandler,
scala.reflect.ClassTag<K> evidence$6,
scala.reflect.ClassTag<V> evidence$7,
scala.reflect.ClassTag<U> evidence$8,
scala.reflect.ClassTag<T> evidence$9,
scala.reflect.ClassTag<R> evidence$10)
scala.collection.Iterator<R>
compute(Partition thePart,
TaskContext context)
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
Partition[]
getPartitions()
Implemented by subclasses to return the set of partitions in this RDD.
scala.collection.Seq<String>
getPreferredLocations(Partition thePart)
Optionally overridden by subclasses to specify placement preferences.
OffsetRange[]
offsetRanges()
Methods inherited from class org.apache.spark.rdd.RDD
aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doCheckpoint, doubleRDDToDoubleRDDFunctions, elementClassTag, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getCreationSite, getNarrowAncestors, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, markCheckpointed, max, min, name, numericRDDToDoubleRDDFunctions, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, retag, retag, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public KafkaRDD(SparkContext sc, scala.collection.immutable.Map<String,String> kafkaParams, OffsetRange[] offsetRanges, scala.collection.immutable.Map<kafka.common.TopicAndPartition,scala.Tuple2<String,Object>> leaders, scala.Function1<kafka.message.MessageAndMetadata<K,V>,R> messageHandler, scala.reflect.ClassTag<K> evidence$1, scala.reflect.ClassTag<V> evidence$2, scala.reflect.ClassTag<U> evidence$3, scala.reflect.ClassTag<T> evidence$4, scala.reflect.ClassTag<R> evidence$5)
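In practice a KafkaRDD is rarely constructed directly: the constructor requires pre-resolved leader metadata and explicit ClassTag evidence. The usual entry point is KafkaUtils.createRDD from the spark-streaming-kafka (Kafka 0.8) artifact, which builds the KafkaRDD for you. A minimal sketch follows; the broker addresses, topic name, and offsets are placeholder values, not part of this API's documentation:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object KafkaRDDExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KafkaRDDExample"))

    // Broker list, NOT ZooKeeper, in host1:port1,host2:port2 form (placeholders).
    val kafkaParams = Map("metadata.broker.list" -> "host1:9092,host2:9092")

    // One OffsetRange per topic/partition: fromOffset inclusive, untilOffset exclusive.
    val offsetRanges = Array(
      OffsetRange("events", partition = 0, fromOffset = 0L, untilOffset = 100L),
      OffsetRange("events", partition = 1, fromOffset = 0L, untilOffset = 100L)
    )

    // Returns an RDD[(K, V)] whose concrete type is a KafkaRDD.
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    println(rdd.count())
    sc.stop()
  }
}
```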
public static <K,V,U extends kafka.serializer.Decoder<?>,T extends kafka.serializer.Decoder<?>,R> KafkaRDD<K,V,U,T,R> apply(SparkContext sc, scala.collection.immutable.Map<String,String> kafkaParams, scala.collection.immutable.Map<kafka.common.TopicAndPartition,Object> fromOffsets, scala.collection.immutable.Map<kafka.common.TopicAndPartition,KafkaCluster.LeaderOffset> untilOffsets, scala.Function1<kafka.message.MessageAndMetadata<K,V>,R> messageHandler, scala.reflect.ClassTag<K> evidence$6, scala.reflect.ClassTag<V> evidence$7, scala.reflect.ClassTag<U> evidence$8, scala.reflect.ClassTag<T> evidence$9, scala.reflect.ClassTag<R> evidence$10)
Parameters:
kafkaParams - Kafka configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), NOT zookeeper servers, specified in host1:port1,host2:port2 form.
fromOffsets - per-topic/partition Kafka offsets defining the (inclusive) starting point of the batch
untilOffsets - per-topic/partition Kafka offsets defining the (exclusive) ending point of the batch
messageHandler - function for translating each message into the desired type
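The public counterpart of this factory is the KafkaUtils.createRDD overload that takes a messageHandler, which determines the element type R of the resulting RDD. A sketch, assuming the Spark 1.4+ form of the spark-streaming-kafka 0.8 API (where the leaders argument is a Map[TopicAndPartition, Broker] and an empty map means leaders are looked up on the driver; earlier 1.x releases used a different leaders type). The topic, offsets, and broker are placeholders:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{Broker, KafkaUtils, OffsetRange}

def readWithHandler(sc: SparkContext): Unit = {
  val kafkaParams = Map("metadata.broker.list" -> "host1:9092") // placeholder broker
  val offsetRanges = Array(OffsetRange("events", 0, 0L, 100L))  // placeholder topic/offsets

  // messageHandler translates each MessageAndMetadata[K, V] into the RDD's
  // element type R; here R = (Long, String), pairing each offset with its payload.
  val handler = (mmd: MessageAndMetadata[String, String]) => (mmd.offset, mmd.message())

  val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder, (Long, String)](
    sc, kafkaParams, offsetRanges,
    Map.empty[TopicAndPartition, Broker], // empty: leaders looked up on the driver
    handler)

  rdd.take(5).foreach(println)
}
```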
public OffsetRange[] offsetRanges()
Specified by:
offsetRanges in interface HasOffsetRanges
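Because createRDD is statically typed as a plain RDD, the offset ranges are recovered by casting to HasOffsetRanges, the pattern the Spark streaming documentation recommends. A small helper sketch (logRanges is a hypothetical name):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// The cast only succeeds on the RDD returned directly by KafkaUtils.createRDD
// (or by a direct stream), before any transformation discards the Kafka type.
def logRanges(rdd: RDD[_]): Unit = {
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"topic=${r.topic} partition=${r.partition} from=${r.fromOffset} until=${r.untilOffset}")
  }
}
```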
public Partition[] getPartitions()
Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.
public scala.collection.Seq<String> getPreferredLocations(Partition thePart)
Description copied from class: RDD
Optionally overridden by subclasses to specify placement preferences.
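For a KafkaRDD the preference for each partition is the host of that partition's Kafka leader broker, so executors co-located with brokers can consume without a network hop. A quick way to inspect the placement, using only public RDD methods (showPlacement is a hypothetical name):

```scala
import org.apache.spark.rdd.RDD

// preferredLocations is the public accessor on RDD that consults
// getPreferredLocations (and any checkpoint state) for a partition.
def showPlacement(rdd: RDD[_]): Unit = {
  rdd.partitions.foreach { p =>
    println(s"partition ${p.index} prefers ${rdd.preferredLocations(p).mkString(", ")}")
  }
}
```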