pyspark.sql.datasource.DataSourceStreamReader

class pyspark.sql.datasource.DataSourceStreamReader

A base class for streaming data source readers. A stream reader outputs data from a streaming data source: it reports offsets, plans partitions for a range of offsets, and reads the rows of each partition.
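
As an illustration, a minimal subclass might look like the sketch below. The names `CounterStreamReader`, `RangePartition`, and `rows_per_batch` are hypothetical and not part of PySpark, and the dict-shaped offsets with an `"offset"` key are just one possible layout. The `try`/`except` fallback defines stand-in base classes only so that the sketch can run where `pyspark.sql.datasource` is not installed.

```python
from typing import Iterator, Tuple

try:
    # Real base classes, shipped by PySpark releases that include
    # the Python data source API.
    from pyspark.sql.datasource import DataSourceStreamReader, InputPartition
except ImportError:
    # Stand-ins (assumption) so the sketch runs without PySpark installed.
    class DataSourceStreamReader:
        pass

    class InputPartition:
        def __init__(self, value):
            self.value = value


class RangePartition(InputPartition):
    """Hypothetical partition covering the half-open range [start, end)."""

    def __init__(self, start: int, end: int):
        self.start = start
        self.end = end


class CounterStreamReader(DataSourceStreamReader):
    """Hypothetical reader that emits consecutive integers."""

    def __init__(self, rows_per_batch: int = 2):
        self.rows_per_batch = rows_per_batch
        self.current = 0

    def initialOffset(self) -> dict:
        # Where the stream begins when a query starts from scratch.
        return {"offset": 0}

    def latestOffset(self) -> dict:
        # Pretend that rows_per_batch new rows arrived since the last call.
        self.current += self.rows_per_batch
        return {"offset": self.current}

    def partitions(self, start: dict, end: dict):
        # One partition per offset range; a real source might split it further.
        return [RangePartition(start["offset"], end["offset"])]

    def read(self, partition) -> Iterator[Tuple]:
        # Yield one tuple per row in the partition's range.
        for i in range(partition.start, partition.end):
            yield (i, str(i))


# Call the methods by hand; in a real query Spark makes these calls.
reader = CounterStreamReader()
start = reader.initialOffset()
end = reader.latestOffset()
rows = [row for p in reader.partitions(start, end) for row in reader.read(p)]
print(rows)  # [(0, '0'), (1, '1')]
```

The partition object carries everything `read` needs (here, the range bounds), so `read` does not depend on state held by the reader instance.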

Methods

commit(end)

Informs the source that Spark has completed processing all data for offsets less than or equal to end and will only request offsets greater than end in the future.

initialOffset()

Returns the initial offset of the streaming data source.

latestOffset()

Returns the most recent offset available.

partitions(start, end)

Returns a list of InputPartition given the start and end offsets.

read(partition)

Generates data for a given partition and returns an iterator of tuples or rows.

stop()

Stops this source and frees any resources it has allocated.
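
The methods above can be exercised together in a hand-written walk-through of one batch. This is only a sketch of how the calls relate, not Spark's actual scheduling logic. `BufferedStreamReader` and its `buffer` attribute are hypothetical, the offset is assumed to count the records consumed so far, and the fallback classes exist only so the sketch runs without PySpark installed.

```python
try:
    from pyspark.sql.datasource import DataSourceStreamReader, InputPartition
except ImportError:
    # Stand-ins (assumption) so the sketch runs without PySpark installed.
    class DataSourceStreamReader:
        pass

    class InputPartition:
        def __init__(self, value):
            self.value = value


class BufferedStreamReader(DataSourceStreamReader):
    """Hypothetical reader that keeps records buffered until they are committed."""

    def __init__(self, records):
        self.buffer = dict(enumerate(records))  # record index -> record
        self.next_offset = len(records)
        self.stopped = False

    def initialOffset(self) -> dict:
        return {"offset": 0}

    def latestOffset(self) -> dict:
        return {"offset": self.next_offset}

    def partitions(self, start: dict, end: dict):
        # Pack the records into the partition so read() needs no reader state.
        values = [(i, self.buffer[i]) for i in range(start["offset"], end["offset"])]
        return [InputPartition(values)]

    def read(self, partition):
        return iter(partition.value)

    def commit(self, end: dict) -> None:
        # Everything before `end` is fully processed, so drop it from the buffer.
        for index in [i for i in self.buffer if i < end["offset"]]:
            del self.buffer[index]

    def stop(self) -> None:
        # Release whatever the reader still holds.
        self.buffer.clear()
        self.stopped = True


reader = BufferedStreamReader(["a", "b", "c"])
start, end = reader.initialOffset(), reader.latestOffset()
rows = [row for p in reader.partitions(start, end) for row in reader.read(p)]
reader.commit(end)               # batch finished: offsets up to `end` are done
remaining = len(reader.buffer)   # nothing left to retain after the commit
reader.stop()
print(rows, remaining, reader.stopped)
```

Deferring cleanup to `commit` means the buffered records stay available until the source is told they will not be requested again.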