@Experimental
public interface RowLevelOperation
Modifier and Type | Interface and Description |
---|---|
static class | RowLevelOperation.Command: A row-level SQL command. |
Modifier and Type | Method and Description |
---|---|
RowLevelOperation.Command | command(): Returns the SQL command that is being performed. |
default String | description(): Returns the description associated with this row-level operation. |
ScanBuilder | newScanBuilder(CaseInsensitiveStringMap options): Returns a ScanBuilder to configure a Scan for this row-level operation. |
WriteBuilder | newWriteBuilder(LogicalWriteInfo info): Returns a WriteBuilder to configure a Write for this row-level operation. |
default NamedReference[] | requiredMetadataAttributes(): Returns metadata attributes that are required to perform this row-level operation. |

default String description()
Returns the description associated with this row-level operation.

RowLevelOperation.Command command()
Returns the SQL command that is being performed.
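As a rough illustration (not taken from the Spark docs), a hypothetical DELETE operation might implement these two methods as follows; the class is kept abstract so the sketch can skip the scan and write builders.

```java
import org.apache.spark.sql.connector.write.RowLevelOperation;

// Hypothetical skeleton focused on command() and description(); a real
// implementation must also provide newScanBuilder and newWriteBuilder.
public abstract class DemoDeleteOperation implements RowLevelOperation {

  @Override
  public Command command() {
    // This operation rewrites data on behalf of a SQL DELETE statement.
    return Command.DELETE;
  }

  @Override
  public String description() {
    // A short, human-readable label for plans and logs (made-up text).
    return "demo delete operation";
  }
}
```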
ScanBuilder newScanBuilder(CaseInsensitiveStringMap options)
Returns a ScanBuilder to configure a Scan for this row-level operation.
Data sources fall into two categories: those that can handle a delta of rows and those that need to replace groups (e.g. partitions, files). Data sources that handle deltas allow Spark to quickly discard unchanged rows and have no requirements for input scans. Data sources that replace groups of rows can discard deleted rows but need to keep unchanged rows to be passed back into the source. This means that scans for such data sources must produce all rows in a group if any are returned. Some data sources will avoid pushing filters into files (file granularity), while others will avoid pruning files within a partition (partition granularity).
For example, if a data source can only replace partitions, all rows from a partition must be returned by the scan, even if a filter can narrow the set of changes to a single file in the partition. Similarly, a data source that can swap individual files must produce all rows from files where at least one record must be changed, not just rows that must be changed.
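To make the granularity discussion concrete, here is a minimal sketch (assumed names, not part of the API docs) of a ScanBuilder that a partition-granularity source's newScanBuilder(options) could return: pushed filters are used only to skip entire partitions, and every filter is handed back to Spark so no rows inside a surviving partition are dropped.

```java
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.SupportsPushDownFilters;
import org.apache.spark.sql.sources.Filter;

// Hypothetical sketch for a partition-granularity source: filters prune whole
// partitions but never drop individual rows, so every row of each affected
// partition is returned and can be written back.
class PartitionGranularityScanBuilder implements SupportsPushDownFilters {

  private Filter[] pushed = new Filter[0];

  @Override
  public Filter[] pushFilters(Filter[] filters) {
    // Remember the filters for coarse-grained partition pruning, but return all
    // of them so Spark still evaluates each filter on top of the scan output.
    pushed = filters;
    return filters;
  }

  @Override
  public Filter[] pushedFilters() {
    return pushed;
  }

  @Override
  public Scan build() {
    // Listing the surviving partitions and building the actual Scan is omitted.
    throw new UnsupportedOperationException("not part of this sketch");
  }
}
```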
WriteBuilder newWriteBuilder(LogicalWriteInfo info)
Returns a WriteBuilder to configure a Write for this row-level operation.
Note that Spark will configure the scan first and then the write, allowing data sources to pass information from the scan to the write. For example, the scan can report which condition was used to read the data, and the write may need that condition under certain isolation levels. Implementations may capture the built scan or required scan information and then use it while building the write.
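A minimal sketch of that pattern, assuming hypothetical helper builders named TrackedScanBuilder and TrackedWriteBuilder:

```java
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.ScanBuilder;
import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.RowLevelOperation;
import org.apache.spark.sql.connector.write.WriteBuilder;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// Hypothetical sketch: keep a reference to the scan builder configured by Spark
// and reuse its state when the write is built. Spark configures the scan before
// the write, so the reference is populated by the time newWriteBuilder is called.
public abstract class ScanAwareOperation implements RowLevelOperation {

  private TrackedScanBuilder lastScan;

  @Override
  public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
    lastScan = new TrackedScanBuilder(options);
    return lastScan;
  }

  @Override
  public WriteBuilder newWriteBuilder(LogicalWriteInfo info) {
    // The write can inspect what the scan read (e.g. the pushed condition or the
    // affected groups) for conflict detection or to know which groups to replace.
    return new TrackedWriteBuilder(info, lastScan);
  }

  // Minimal stubs so the sketch compiles; real builders would produce actual scans and writes.
  static class TrackedScanBuilder implements ScanBuilder {
    final CaseInsensitiveStringMap options;
    TrackedScanBuilder(CaseInsensitiveStringMap options) { this.options = options; }
    @Override
    public Scan build() { throw new UnsupportedOperationException("omitted"); }
  }

  static class TrackedWriteBuilder implements WriteBuilder {
    TrackedWriteBuilder(LogicalWriteInfo info, TrackedScanBuilder scan) {
      // Record any scan-derived state needed by the write here.
    }
  }
}
```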
default NamedReference[] requiredMetadataAttributes()
Returns metadata attributes that are required to perform this row-level operation. Data sources can use this method to project metadata columns needed for writing the data back (e.g. metadata columns for grouping data).
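For example, a group-based source might ask Spark to project hidden metadata columns so rewritten rows can be routed back to the groups they came from; the column names below are made up for the sketch.

```java
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.NamedReference;
import org.apache.spark.sql.connector.write.RowLevelOperation;

// Hypothetical sketch with assumed metadata column names "_file" and "_partition".
public abstract class GroupBasedDemoOperation implements RowLevelOperation {

  @Override
  public NamedReference[] requiredMetadataAttributes() {
    return new NamedReference[] {
        Expressions.column("_file"),       // which file each row came from
        Expressions.column("_partition")   // which partition each row belongs to
    };
  }
}
```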