@Private
public interface ShuffleMapOutputWriter
Modifier and Type | Method and Description |
---|---|
void | abort(Throwable error): Abort all of the writes done by any writers returned by getPartitionWriter(int). |
MapOutputCommitMessage | commitAllPartitions(long[] checksums): Commits the writes done by all partition writers returned by all calls to this object's getPartitionWriter(int), and returns the number of bytes written for each partition. |
ShufflePartitionWriter | getPartitionWriter(int reducePartitionId): Creates a writer that can open an output stream to persist bytes targeted for a given reduce partition id. |
ShufflePartitionWriter getPartitionWriter(int reducePartitionId) throws java.io.IOException
Creates a writer that can open an output stream to persist bytes targeted for a given reduce partition id.
The chunk written through it corresponds to the bytes of the given reduce partition. This method will not be called twice for the same partition within any given map task. The partition identifier will be in the range 0 (inclusive) to numPartitions (exclusive), where numPartitions was provided when this map output writer was created via ShuffleExecutorComponents.createMapOutputWriter(int, long, int).
This method is called with strictly increasing reducePartitionIds: each call receives a reducePartitionId that is strictly greater than the reducePartitionIds given to any previous call. The method is not guaranteed to be called for every partition id in the range described above. In particular, no guarantee is made as to whether it will be called for empty partitions.
Throws:
java.io.IOException
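The call-order contract described above can be sketched as a small check. This is only an illustration: `PartitionIdOrderCheck`, `accept`, and `isValidSequence` are hypothetical names that do not exist in Spark, and the class simply restates the documented rules (ids in the range 0 to numPartitions exclusive, strictly increasing, gaps allowed).

```java
// Hypothetical helper, not part of Spark: restates the documented contract for
// the reducePartitionId values passed to getPartitionWriter(int).
final class PartitionIdOrderCheck {
    private final int numPartitions;
    private int lastPartitionId = -1; // no partition has been requested yet

    PartitionIdOrderCheck(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Accepts an id only if it lies in [0, numPartitions) and is strictly
    // greater than every id accepted before it.
    void accept(int reducePartitionId) {
        if (reducePartitionId < 0 || reducePartitionId >= numPartitions) {
            throw new IllegalArgumentException(
                "partition id out of range: " + reducePartitionId);
        }
        if (reducePartitionId <= lastPartitionId) {
            throw new IllegalStateException(
                "partition id " + reducePartitionId
                    + " is not greater than previous id " + lastPartitionId);
        }
        lastPartitionId = reducePartitionId;
    }

    // Returns true if the whole sequence of ids satisfies the contract.
    static boolean isValidSequence(int numPartitions, int... ids) {
        PartitionIdOrderCheck check = new PartitionIdOrderCheck(numPartitions);
        try {
            for (int id : ids) {
                check.accept(id);
            }
            return true;
        } catch (IllegalArgumentException | IllegalStateException e) {
            return false;
        }
    }
}
```

Under these assumptions, the sequence 0, 2, 3 with four partitions is valid (partition 1 is skipped, for example because it is empty), while 0, 0 and 2, 1 are not.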
MapOutputCommitMessage commitAllPartitions(long[] checksums) throws java.io.IOException
Commits the writes done by all partition writers returned by all calls to this object's getPartitionWriter(int), and returns the number of bytes written for each partition.
This should ensure that the writes conducted by this module's partition writers are available to downstream reduce tasks. If this method throws any exception, this module's abort(Throwable) method will be invoked before propagating the exception.
Shuffle extensions that care about the cause of shuffle data corruption should store the checksums. When corruption happens, Spark provides the checksum of the fetched partition to the shuffle extension to help diagnose the cause.
This can also close any resources and clean up temporary state if necessary.
The returned commit message is a structure with two components:
1) An array of longs that should contain, for each partition id from 0 to numPartitions - 1, the number of bytes written by the partition writer for that partition id.
2) An optional metadata blob that can be used by shuffle readers.
Parameters:
checksums - The checksum values for each partition (where the checksum index equals the partition id) if shuffle checksums are enabled. Otherwise, the array is empty.
Throws:
java.io.IOException
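As a rough sketch of how an implementation might assemble the per-partition length array for the commit message, the class below keeps one counter per partition id. `PartitionLengthTracker` and its methods are hypothetical names, not Spark API. The sketch also assumes that a non-empty checksums array has exactly numPartitions entries, which is inferred from "checksum index equals the partition id" rather than stated outright.

```java
// Hypothetical helper, not part of Spark: tracks bytes written per reduce
// partition so that the array handed to the commit message has one entry
// for every partition id from 0 to numPartitions - 1.
final class PartitionLengthTracker {
    private final long[] partitionLengths;

    PartitionLengthTracker(int numPartitions) {
        // Partitions that never get a writer keep a length of 0.
        this.partitionLengths = new long[numPartitions];
    }

    // Called by a (hypothetical) partition writer as it persists bytes.
    void recordBytesWritten(int reducePartitionId, long bytes) {
        partitionLengths[reducePartitionId] += bytes;
    }

    // Validates the checksums argument and returns a copy of the lengths.
    // Assumption: checksums is either empty (checksums disabled) or has one
    // entry per partition id.
    long[] lengthsForCommit(long[] checksums) {
        if (checksums.length != 0 && checksums.length != partitionLengths.length) {
            throw new IllegalArgumentException(
                "expected 0 or " + partitionLengths.length
                    + " checksums, got " + checksums.length);
        }
        return partitionLengths.clone();
    }
}
```

Returning a copy keeps later writes from changing an array that has already been handed to the commit message.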
void abort(Throwable error) throws java.io.IOException
Abort all of the writes done by any writers returned by getPartitionWriter(int).
This should invalidate the results of writing bytes. This can also close any resources and clean up temporary state if necessary.
Throws:
java.io.IOException
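The interaction between commitAllPartitions and abort (abort is invoked before a commit failure is propagated) can be sketched from the caller's side as follows. This is not Spark's actual calling code: `CommitOrAbortSketch` and `MapOutputWriterLike` are hypothetical stand-ins, and the commit return type is simplified to a long[] so the example compiles without Spark on the classpath.

```java
import java.io.IOException;

final class CommitOrAbortSketch {
    // Hypothetical stand-in mirroring two methods of ShuffleMapOutputWriter.
    // The real commitAllPartitions returns a MapOutputCommitMessage.
    interface MapOutputWriterLike {
        long[] commitAllPartitions(long[] checksums) throws IOException;

        void abort(Throwable error) throws IOException;
    }

    // Tries to commit; on any exception, calls abort with the cause and then
    // rethrows the original exception to the caller.
    static long[] commitOrAbort(MapOutputWriterLike writer, long[] checksums)
            throws IOException {
        try {
            return writer.commitAllPartitions(checksums);
        } catch (Exception e) {
            try {
                writer.abort(e);
            } catch (Exception abortFailure) {
                // Keep the original failure primary; attach the abort failure.
                e.addSuppressed(abortFailure);
            }
            throw e;
        }
    }
}
```

Attaching an abort failure as a suppressed exception is a design choice of this sketch, made so that the original commit error is the one the caller sees.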