pyspark.sql.datasource.DataSourceStreamReader.read#
- abstract DataSourceStreamReader.read(partition)[source]#
Generates data for a given partition and returns an iterator of tuples or rows.
This method is invoked once per partition to read the data. Implementing this method is required for a stream reader. Any non-serializable resources required for reading data from the data source should be initialized within this method.
- Parameters
- partition : InputPartition
The partition to read. It must be one of the partition values returned by DataSourceStreamReader.partitions().
- Returns
- iterator of tuples or Rows
An iterator of tuples or rows. Each tuple or row will be converted to a row in the final DataFrame.
Notes
This method should be static and stateless: do not access mutable class members or keep in-memory state between different invocations of read().
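The contract above can be sketched as follows. This is a minimal, self-contained illustration of a read() implementation, not the library's own code: RangePartition is a hypothetical partition type standing in for a subclass of pyspark.sql.datasource.InputPartition, and the doubling logic is an arbitrary placeholder for real data-source reads.

```python
# Sketch of a read() implementation for a custom stream reader.
# RangePartition is a hypothetical stand-in for an InputPartition
# subclass that DataSourceStreamReader.partitions() would return.
from dataclasses import dataclass


@dataclass
class RangePartition:
    start: int
    end: int


def read(partition):
    # Open non-serializable resources (connections, file handles) here,
    # inside read(), rather than on the reader object: read() runs on
    # executors, while the reader itself is pickled from the driver.
    for value in range(partition.start, partition.end):
        # Each yielded tuple becomes one row in the final DataFrame.
        yield (value, value * 2)
```

Note that read() stays stateless: everything it needs arrives through the partition argument, so repeated invocations over the same partition produce the same rows.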