An interface to define how a single Spark job commits its outputs.
A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).
Unlike Hadoop's OutputCommitter, this implementation is serializable.
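For illustration, here is a minimal sketch of one way such a wrapper can stay serializable: since Hadoop's OutputCommitter is not itself serializable, it is held in a transient field and rebuilt from the task context on whichever JVM needs it. The class and member names below are hypothetical stand-ins, not Spark's actual internals.

```scala
import org.apache.hadoop.mapreduce.{OutputCommitter, TaskAttemptContext}

// Hypothetical wrapper: keeps the non-serializable OutputCommitter out of
// the serialized state so the committer instance can be shipped from the
// driver to executors.
class SerializableHadoopCommitter(jobId: String, path: String) extends Serializable {

  // Rebuilt lazily per JVM; never serialized.
  @transient private var committer: OutputCommitter = _

  protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = {
    if (committer == null) {
      // Derive the committer from the job's configured OutputFormat.
      val format = context.getOutputFormatClass.getDeclaredConstructor().newInstance()
      committer = format.getOutputCommitter(context)
    }
    committer
  }
}
```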
An interface to define how a single Spark job commits its outputs. Three notes:
1. Implementations must be serializable, as the committer instance instantiated on the driver will be used for tasks on executors.
2. Implementations should have a constructor with either 2 or 3 arguments: (jobId: String, path: String) or (jobId: String, path: String, isAppend: Boolean) (a reflective instantiation sketch follows this list).
3. A committer should not be reused across multiple Spark jobs.
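The constructor rule in note 2 exists so that a committer class can be instantiated reflectively by name. The sketch below illustrates that selection logic under the documented contract; the instantiate helper is a hypothetical name, not Spark's actual factory implementation.

```scala
// Hypothetical reflective factory honoring the documented constructor shapes:
// (jobId: String, path: String) or (jobId: String, path: String, isAppend: Boolean).
def instantiate(className: String, jobId: String, path: String, isAppend: Boolean): AnyRef = {
  val clazz = Class.forName(className)
  try {
    // Prefer the 3-argument constructor when the implementation provides one.
    val ctor = clazz.getConstructor(classOf[String], classOf[String], classOf[Boolean])
    ctor.newInstance(jobId, path, java.lang.Boolean.valueOf(isAppend))
  } catch {
    case _: NoSuchMethodException =>
      // Fall back to the 2-argument form.
      val ctor = clazz.getConstructor(classOf[String], classOf[String])
      ctor.newInstance(jobId, path)
  }
}
```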
The proper call sequence is:
1. The driver calls setupJob.
2. As part of each task's execution, the executor calls setupTask and then commitTask (or abortTask if the task failed).
3. When all necessary tasks have completed successfully, the driver calls commitJob. If the job failed to execute (e.g. too many failed tasks), it should call abortJob.
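Expressed as code, that sequence looks roughly like the sketch below, written against a pared-down stand-in trait. Committer and runJob are hypothetical names; in Spark the tasks run distributed on executors, so the local loop here only illustrates the ordering and error handling, not real execution.

```scala
// Hypothetical pared-down committer trait mirroring the protocol's methods.
trait Committer extends Serializable {
  def setupJob(): Unit
  def setupTask(): Unit
  def commitTask(): Unit
  def abortTask(): Unit
  def commitJob(): Unit
  def abortJob(): Unit
}

// Drives the documented call sequence; the task body would run on executors.
def runJob(committer: Committer, numTasks: Int, runTask: Int => Unit): Unit = {
  committer.setupJob()                      // 1. driver sets up the job
  try {
    (0 until numTasks).foreach { taskId =>
      committer.setupTask()                 // 2. per-task setup on the executor
      try {
        runTask(taskId)
        committer.commitTask()              //    task succeeded
      } catch {
        case e: Exception =>
          committer.abortTask()             //    task failed
          throw e
      }
    }
    committer.commitJob()                   // 3. all tasks committed
  } catch {
    case e: Exception =>
      committer.abortJob()                  //    job failed overall
      throw e
  }
}
```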