StreamingContext (Spark 1.0.0 JavaDoc)

Object
- org.apache.spark.streaming.StreamingContext

All Implemented Interfaces:

Logging
```
public class StreamingContext
extends Object
implements Logging
```
Main entry point for Spark Streaming functionality. It provides methods used to create DStreams from various input sources. It can be either created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext. The associated SparkContext can be accessed using context.sparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTransformation() allows the current thread to wait for the termination of the context by stop() or by an exception.

Constructor Summary

Constructors
Constructor and Description
`StreamingContext(SparkConf conf, Duration batchDuration)` Create a StreamingContext by providing the configuration necessary for a new SparkContext.
`StreamingContext(SparkContext sparkContext, Duration batchDuration)` Create a StreamingContext using an existing SparkContext.
`StreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf)` Recreate a StreamingContext from a checkpoint file.
`StreamingContext(String master, String appName, Duration batchDuration, String sparkHome, scala.collection.Seq<String> jars, scala.collection.Map<String,String> environment)` Create a StreamingContext by providing the details necessary for creating a new SparkContext.

Method Summary

Methods
Modifier and Type	Method and Description
`<T> ReceiverInputDStream<T>`	`actorStream(akka.actor.Props props, String name, StorageLevel storageLevel, akka.actor.SupervisorStrategy supervisorStrategy, scala.reflect.ClassTag<T> evidence$3)` Create an input stream with any arbitrary user implemented actor receiver.
`void`	`addStreamingListener(StreamingListener streamingListener)` Add a `StreamingListener` object for receiving system events related to streaming.
`void`	`awaitTermination()` Wait for the execution to stop.
`void`	`awaitTermination(long timeout)` Wait for the execution to stop.
`void`	`checkpoint(String directory)` Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
`String`	`checkpointDir()`
`Duration`	`checkpointDuration()`
`SparkConf`	`conf()`
`static int`	`DEFAULT_CLEANER_TTL()`
`SparkEnv`	`env()`
`<K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>>`	`fileStream(String directory, scala.reflect.ClassTag<K> evidence$6, scala.reflect.ClassTag<V> evidence$7, scala.reflect.ClassTag<F> evidence$8)` Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
`<K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>>`	`fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path,Object> filter, boolean newFilesOnly, scala.reflect.ClassTag<K> evidence$9, scala.reflect.ClassTag<V> evidence$10, scala.reflect.ClassTag<F> evidence$11)` Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
`static StreamingContext`	`getOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)` Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
`org.apache.spark.streaming.DStreamGraph`	`graph()`
`boolean`	`isCheckpointPresent()`
`static scala.Option<String>`	`jarOfClass(Class<?> cls)` Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.
`<T> ReceiverInputDStream<T>`	`networkStream(Receiver<T> receiver, scala.reflect.ClassTag<T> evidence$1)` Create an input stream with any arbitrary user implemented receiver.
`<T> InputDStream<T>`	`queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, scala.reflect.ClassTag<T> evidence$12)` Create an input stream from a queue of RDDs.
`<T> InputDStream<T>`	`queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, RDD<T> defaultRDD, scala.reflect.ClassTag<T> evidence$13)` Create an input stream from a queue of RDDs.
`<T> ReceiverInputDStream<T>`	`rawSocketStream(String hostname, int port, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$5)` Create a input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them.
`<T> ReceiverInputDStream<T>`	`receiverStream(Receiver<T> receiver, scala.reflect.ClassTag<T> evidence$2)` Create an input stream with any arbitrary user implemented receiver.
`void`	`remember(Duration duration)` Set each DStreams in this context to remember RDDs it generated in the last given duration.
`SparkContext`	`sc()`
`org.apache.spark.streaming.scheduler.JobScheduler`	`scheduler()`
`<T> ReceiverInputDStream<T>`	`socketStream(String hostname, int port, scala.Function1<java.io.InputStream,scala.collection.Iterator<T>> converter, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$4)` Create a input stream from TCP source hostname:port.
`ReceiverInputDStream<String>`	`socketTextStream(String hostname, int port, StorageLevel storageLevel)` Create a input stream from TCP source hostname:port.
`SparkContext`	`sparkContext()` Return the associated Spark context
`void`	`start()` Start the execution of the streams.
`scala.Enumeration.Value`	`state()`
`void`	`stop(boolean stopSparkContext)` Stop the execution of the streams immediately (does not wait for all received data to be processed).
`void`	`stop(boolean stopSparkContext, boolean stopGracefully)` Stop the execution of the streams, with option of ensuring all received data has been processed.
`org.apache.spark.streaming.StreamingContext.StreamingContextState$`	`StreamingContextState()` Accessor for nested Scala object
`DStream<String>`	`textFileStream(String directory)` Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).
`static <K,V> PairDStreamFunctions<K,V>`	`toPairDStreamFunctions(DStream<scala.Tuple2<K,V>> stream, scala.reflect.ClassTag<K> kt, scala.reflect.ClassTag<V> vt, scala.math.Ordering<K> ord)`
`<T> DStream<T>`	`transform(scala.collection.Seq<DStream<?>> dstreams, scala.Function2<scala.collection.Seq<RDD<?>>,Time,RDD<T>> transformFunc, scala.reflect.ClassTag<T> evidence$15)` Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
`org.apache.spark.streaming.ui.StreamingTab`	`uiTab()`
`<T> DStream<T>`	`union(scala.collection.Seq<DStream<T>> streams, scala.reflect.ClassTag<T> evidence$14)` Create a unified DStream from multiple DStreams of the same type and same slide duration.
`org.apache.spark.streaming.ContextWaiter`	`waiter()`

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.Logging
initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logTrace, logTrace, logWarning, logWarning

- Constructor Detail
  - StreamingContext
```
public StreamingContext(SparkContext sparkContext,
                Duration batchDuration)
```
    Create a StreamingContext using an existing SparkContext.
    
    Parameters:
    sparkContext - existing SparkContext
    batchDuration - the time interval at which streaming data will be divided into batches
  - StreamingContext
```
public StreamingContext(SparkConf conf,
                Duration batchDuration)
```
    Create a StreamingContext by providing the configuration necessary for a new SparkContext.
    
    Parameters:
    conf - a org.apache.spark.SparkConf object specifying Spark parameters
    batchDuration - the time interval at which streaming data will be divided into batches
  - StreamingContext
```
public StreamingContext(String master,
                String appName,
                Duration batchDuration,
                String sparkHome,
                scala.collection.Seq<String> jars,
                scala.collection.Map<String,String> environment)
```
    Create a StreamingContext by providing the details necessary for creating a new SparkContext.
    
    Parameters:
    master - cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]).
    appName - a name for your job, to display on the cluster web UI
    batchDuration - the time interval at which streaming data will be divided into batches
  - StreamingContext
```
public StreamingContext(String path,
                org.apache.hadoop.conf.Configuration hadoopConf)
```
    Recreate a StreamingContext from a checkpoint file.
    
    Parameters:
    path - Path to the directory that was specified as the checkpoint directory
    hadoopConf - Optional, configuration object if necessary for reading from HDFS compatible filesystems
- Method Detail
  - DEFAULT_CLEANER_TTL
```
public static int DEFAULT_CLEANER_TTL()
```
  - toPairDStreamFunctions
```
public static <K,V> PairDStreamFunctions<K,V> toPairDStreamFunctions(DStream<scala.Tuple2<K,V>> stream,
                                                     scala.reflect.ClassTag<K> kt,
                                                     scala.reflect.ClassTag<V> vt,
                                                     scala.math.Ordering<K> ord)
```
  - getOrCreate
```
public static StreamingContext getOrCreate(String checkpointPath,
                           scala.Function0<StreamingContext> creatingFunc,
                           org.apache.hadoop.conf.Configuration hadoopConf,
                           boolean createOnError)
```
    Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the StreamingContext will be created by called the provided creatingFunc.
    
    Parameters:
    checkpointPath - Checkpoint directory used in an earlier StreamingContext program
    creatingFunc - Function to create a new StreamingContext
    hadoopConf - Optional Hadoop configuration if necessary for reading from the file system
    createOnError - Optional, whether to create a new StreamingContext if there is an error in reading checkpoint data. By default, an exception will be thrown on error.
  - jarOfClass
```
public static scala.Option<String> jarOfClass(Class<?> cls)
```
    Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.
  - isCheckpointPresent
```
public boolean isCheckpointPresent()
```
  - sc
```
public SparkContext sc()
```
  - conf
```
public SparkConf conf()
```
  - env
```
public SparkEnv env()
```
  - graph
```
public org.apache.spark.streaming.DStreamGraph graph()
```
  - checkpointDir
```
public String checkpointDir()
```
  - checkpointDuration
```
public Duration checkpointDuration()
```
  - scheduler
```
public org.apache.spark.streaming.scheduler.JobScheduler scheduler()
```
  - waiter
```
public org.apache.spark.streaming.ContextWaiter waiter()
```
  - uiTab
```
public org.apache.spark.streaming.ui.StreamingTab uiTab()
```
  - StreamingContextState
```
public org.apache.spark.streaming.StreamingContext.StreamingContextState$ StreamingContextState()
```
    Accessor for nested Scala object
  - state
```
public scala.Enumeration.Value state()
```
  - sparkContext
```
public SparkContext sparkContext()
```
    Return the associated Spark context
  - remember
```
public void remember(Duration duration)
```
    Set each DStreams in this context to remember RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration of time and releases them for garbage collection. This method allows the developer to specify how to long to remember the RDDs ( if the developer wishes to query old data outside the DStream computation).
    
    Parameters:
    duration - Minimum duration that each DStream should remember its RDDs
  - checkpoint
```
public void checkpoint(String directory)
```
    Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
    
    Parameters:
    directory - HDFS-compatible directory where the checkpoint data will be reliably stored. Note that this must be a fault-tolerant file system like HDFS for
  - networkStream
```
public <T> ReceiverInputDStream<T> networkStream(Receiver<T> receiver,
                                        scala.reflect.ClassTag<T> evidence$1)
```
    Create an input stream with any arbitrary user implemented receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
    
    Parameters:
    receiver - Custom implementation of Receiver
  - receiverStream
```
public <T> ReceiverInputDStream<T> receiverStream(Receiver<T> receiver,
                                         scala.reflect.ClassTag<T> evidence$2)
```
    Create an input stream with any arbitrary user implemented receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
    
    Parameters:
    receiver - Custom implementation of Receiver
  - actorStream
```
public <T> ReceiverInputDStream<T> actorStream(akka.actor.Props props,
                                      String name,
                                      StorageLevel storageLevel,
                                      akka.actor.SupervisorStrategy supervisorStrategy,
                                      scala.reflect.ClassTag<T> evidence$3)
```
    Create an input stream with any arbitrary user implemented actor receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
    
    Parameters:
    props - Props object defining creation of the actor
    name - Name of the actor
    storageLevel - RDD storage level. Defaults to memory-only.
  - socketTextStream
```
public ReceiverInputDStream<String> socketTextStream(String hostname,
                                            int port,
                                            StorageLevel storageLevel)
```
    Create a input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes is interpreted as UTF8 encoded \n delimited lines.
    
    Parameters:
    hostname - Hostname to connect to for receiving data
    port - Port to connect to for receiving data
    storageLevel - Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
  - socketStream
```
public <T> ReceiverInputDStream<T> socketStream(String hostname,
                                       int port,
                                       scala.Function1<java.io.InputStream,scala.collection.Iterator<T>> converter,
                                       StorageLevel storageLevel,
                                       scala.reflect.ClassTag<T> evidence$4)
```
    Create a input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes it interepreted as object using the given converter.
    
    Parameters:
    hostname - Hostname to connect to for receiving data
    port - Port to connect to for receiving data
    converter - Function to convert the byte stream to objects
    storageLevel - Storage level to use for storing the received objects
  - rawSocketStream
```
public <T> ReceiverInputDStream<T> rawSocketStream(String hostname,
                                          int port,
                                          StorageLevel storageLevel,
                                          scala.reflect.ClassTag<T> evidence$5)
```
    Create a input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.
    
    Parameters:
    hostname - Hostname to connect to for receiving data
    port - Port to connect to for receiving data
    storageLevel - Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
  - fileStream
```
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory,
                                                                                                      scala.reflect.ClassTag<K> evidence$6,
                                                                                                      scala.reflect.ClassTag<V> evidence$7,
                                                                                                      scala.reflect.ClassTag<F> evidence$8)
```
    Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
    
    Parameters:
    directory - HDFS directory to monitor for new file
  - fileStream
```
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory,
                                                                                                      scala.Function1<org.apache.hadoop.fs.Path,Object> filter,
                                                                                                      boolean newFilesOnly,
                                                                                                      scala.reflect.ClassTag<K> evidence$9,
                                                                                                      scala.reflect.ClassTag<V> evidence$10,
                                                                                                      scala.reflect.ClassTag<F> evidence$11)
```
    Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system.
    
    Parameters:
    directory - HDFS directory to monitor for new file
    filter - Function to filter paths to process
    newFilesOnly - Should process only new files and ignore existing files in the directory
  - textFileStream
```
public DStream<String> textFileStream(String directory)
```
    Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
    
    Parameters:
    directory - HDFS directory to monitor for new file
  - queueStream
```
public <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue,
                              boolean oneAtATime,
                              scala.reflect.ClassTag<T> evidence$12)
```
    Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
    
    Parameters:
    queue - Queue of RDDs
    oneAtATime - Whether only one RDD should be consumed from the queue in every interval
  - queueStream
```
public <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue,
                              boolean oneAtATime,
                              RDD<T> defaultRDD,
                              scala.reflect.ClassTag<T> evidence$13)
```
    Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
    
    Parameters:
    queue - Queue of RDDs
    oneAtATime - Whether only one RDD should be consumed from the queue in every interval
    defaultRDD - Default RDD is returned by the DStream when the queue is empty. Set as null if no RDD should be returned when empty
  - union
```
public <T> DStream<T> union(scala.collection.Seq<DStream<T>> streams,
                   scala.reflect.ClassTag<T> evidence$14)
```
    Create a unified DStream from multiple DStreams of the same type and same slide duration.
  - transform
```
public <T> DStream<T> transform(scala.collection.Seq<DStream<?>> dstreams,
                       scala.Function2<scala.collection.Seq<RDD<?>>,Time,RDD<T>> transformFunc,
                       scala.reflect.ClassTag<T> evidence$15)
```
    Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
  - addStreamingListener
```
public void addStreamingListener(StreamingListener streamingListener)
```
    Add a StreamingListener object for receiving system events related to streaming.
  - start
```
public void start()
```
    Start the execution of the streams.
  - awaitTermination
```
public void awaitTermination()
```
    Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
  - awaitTermination
```
public void awaitTermination(long timeout)
```
    Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
    
    Parameters:
    timeout - time to wait in milliseconds
  - stop
```
public void stop(boolean stopSparkContext)
```
    Stop the execution of the streams immediately (does not wait for all received data to be processed).
    
    Parameters:
    stopSparkContext - Stop the associated SparkContext or not
  - stop
```
public void stop(boolean stopSparkContext,
        boolean stopGracefully)
```
    Stop the execution of the streams, with option of ensuring all received data has been processed.
    
    Parameters:
    stopSparkContext - Stop the associated SparkContext or not
    stopGracefully - Stop gracefully by waiting for the processing of all received data to be completed

Class StreamingContext

Constructor Summary

Method Summary

Methods inherited from class Object

Methods inherited from interface org.apache.spark.Logging

Constructor Detail

StreamingContext

StreamingContext

StreamingContext

StreamingContext

Method Detail

DEFAULT_CLEANER_TTL

toPairDStreamFunctions

getOrCreate

jarOfClass

isCheckpointPresent

sc

conf

env

graph

checkpointDir

checkpointDuration

scheduler

waiter

uiTab

StreamingContextState

state

sparkContext

remember

checkpoint

networkStream

receiverStream

actorStream

socketTextStream

socketStream

rawSocketStream

fileStream

fileStream

textFileStream

queueStream

queueStream

union

transform

addStreamingListener

start

awaitTermination

awaitTermination

stop

stop