An interface to define how a single Spark job commits its outputs.
A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).
Unlike Hadoop's OutputCommitter, this implementation is serializable.
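For illustration, here is a minimal sketch of one way such a wrapper can stay serializable: since Hadoop's OutputCommitter is not itself serializable, it is held in a transient field and rebuilt from the task context on whichever JVM needs it. The class and member names below are hypothetical stand-ins, not Spark's actual internals.

```scala
import org.apache.hadoop.mapreduce.{OutputCommitter, TaskAttemptContext}

// Hypothetical wrapper: keeps the non-serializable OutputCommitter out of
// the serialized state so the committer instance can be shipped from the
// driver to executors.
class SerializableHadoopCommitter(jobId: String, path: String) extends Serializable {

  // Rebuilt lazily per JVM; never serialized.
  @transient private var committer: OutputCommitter = _

  protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = {
    if (committer == null) {
      // Derive the committer from the job's configured OutputFormat.
      val format = context.getOutputFormatClass.getDeclaredConstructor().newInstance()
      committer = format.getOutputCommitter(context)
    }
    committer
  }
}
```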
An interface to define how a single Spark job commits its outputs. Three notes:
1. Implementations must be serializable, as the committer instance instantiated on the driver will be used for tasks on executors.
2. Implementations should have a constructor with either 2 or 3 arguments: (jobId: String, path: String) or (jobId: String, path: String, isAppend: Boolean) (a reflective instantiation sketch follows this list).
3. A committer should not be reused across multiple Spark jobs.
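The constructor rule in note 2 exists so that a committer class can be instantiated reflectively by name. The sketch below illustrates that selection logic under the documented contract; the instantiate helper is a hypothetical name, not Spark's actual factory implementation.

```scala
// Hypothetical reflective factory honoring the documented constructor shapes:
// (jobId: String, path: String) or (jobId: String, path: String, isAppend: Boolean).
def instantiate(className: String, jobId: String, path: String, isAppend: Boolean): AnyRef = {
  val clazz = Class.forName(className)
  try {
    // Prefer the 3-argument constructor when the implementation provides one.
    val ctor = clazz.getConstructor(classOf[String], classOf[String], classOf[Boolean])
    ctor.newInstance(jobId, path, java.lang.Boolean.valueOf(isAppend))
  } catch {
    case _: NoSuchMethodException =>
      // Fall back to the 2-argument form.
      val ctor = clazz.getConstructor(classOf[String], classOf[String])
      ctor.newInstance(jobId, path)
  }
}
```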
The proper call sequence is:
1. The driver calls setupJob.
2. As part of each task's execution, the executor calls setupTask and then commitTask (or abortTask if the task failed).
3. When all necessary tasks have completed successfully, the driver calls commitJob. If the job failed to execute (e.g. too many failed tasks), it should call abortJob.
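Expressed as code, that sequence looks roughly like the sketch below, written against a pared-down stand-in trait. Committer and runJob are hypothetical names; in Spark the tasks run distributed on executors, so the local loop here only illustrates the ordering and error handling, not real execution.

```scala
// Hypothetical pared-down committer trait mirroring the protocol's methods.
trait Committer extends Serializable {
  def setupJob(): Unit
  def setupTask(): Unit
  def commitTask(): Unit
  def abortTask(): Unit
  def commitJob(): Unit
  def abortJob(): Unit
}

// Drives the documented call sequence; the task body would run on executors.
def runJob(committer: Committer, numTasks: Int, runTask: Int => Unit): Unit = {
  committer.setupJob()                      // 1. driver sets up the job
  try {
    (0 until numTasks).foreach { taskId =>
      committer.setupTask()                 // 2. per-task setup on the executor
      try {
        runTask(taskId)
        committer.commitTask()              //    task succeeded
      } catch {
        case e: Exception =>
          committer.abortTask()             //    task failed
          throw e
      }
    }
    committer.commitJob()                   // 3. all tasks committed
  } catch {
    case e: Exception =>
      committer.abortJob()                  //    job failed overall
      throw e
  }
}
```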