pyspark.sql.datasource.DataSource

class pyspark.sql.datasource.DataSource(options)

A base class for data sources.

This class represents a custom data source that allows for reading from and/or writing to it. The data source provides methods to create readers and writers for reading and writing data, respectively. At least one of the methods DataSource.reader() or DataSource.writer() must be implemented by any subclass to make the data source either readable or writable (or both).

After implementing this interface, you can load data from your data source using spark.read.format(...).load() and save data to it using df.write.format(...).save().
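As a sketch of the pattern described above, a minimal batch-readable data source might look like the following. The names MyDataSource, MyDataSourceReader, my_source, and the rows option are illustrative rather than part of the API, and the example assumes the Python Data Source API available in recent Spark releases:

    from pyspark.sql.datasource import DataSource, DataSourceReader


    class MyDataSourceReader(DataSourceReader):
        def __init__(self, options):
            self.options = options

        def read(self, partition):
            # Yield tuples that match the schema declared by the data source.
            n = int(self.options.get("rows", 3))
            for i in range(n):
                yield (i, f"value_{i}")


    class MyDataSource(DataSource):
        """A toy data source that produces a fixed number of rows."""

        @classmethod
        def name(cls):
            # The format name used with spark.read.format(...).
            return "my_source"

        def schema(self):
            # A DDL-formatted schema string; a StructType also works.
            return "id INT, value STRING"

        def reader(self, schema):
            # self.options holds the options supplied by the user.
            return MyDataSourceReader(self.options)

After registering the class with spark.dataSource.register(MyDataSource), the data can be read with spark.read.format("my_source").option("rows", "5").load().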

Methods

name()

Returns a string representing the format name of this data source.

reader(schema)

Returns a DataSourceReader instance for reading data.

schema()

Returns the schema of the data source.

simpleStreamReader(schema)

Returns a SimpleDataSourceStreamReader instance for reading streaming data.

streamReader(schema)

Returns a DataSourceStreamReader instance for reading streaming data.

streamWriter(schema, overwrite)

Returns a DataSourceStreamWriter instance for writing data into a streaming sink.

writer(schema, overwrite)

Returns a DataSourceWriter instance for writing data.
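For the writing side, a sketch of a data source implementing writer(schema, overwrite) could look like this. PrintSinkDataSource and PrintSinkWriter are illustrative names; write() is called once per partition on the executors and returns a commit message to the driver:

    from pyspark.sql.datasource import DataSource, DataSourceWriter, WriterCommitMessage


    class PrintSinkWriter(DataSourceWriter):
        def __init__(self, overwrite):
            self.overwrite = overwrite

        def write(self, iterator):
            # Receives an iterator of Rows for one partition.
            for row in iterator:
                print(row)
            return WriterCommitMessage()


    class PrintSinkDataSource(DataSource):
        """A toy sink that prints each row to the executor log."""

        @classmethod
        def name(cls):
            return "print_sink"

        def writer(self, schema, overwrite):
            # overwrite is True for mode("overwrite") and False for mode("append").
            return PrintSinkWriter(overwrite)

After spark.dataSource.register(PrintSinkDataSource), the sink is used with df.write.format("print_sink").mode("append").save().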