An interface to define how a single Spark job commits its outputs.
A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the old mapred API).
Unlike Hadoop's OutputCommitter, this implementation is serializable.
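For context, Spark constructs these committers through a reflection-based factory. A minimal sketch, assuming the Spark-internal FileCommitProtocol.instantiate factory in org.apache.spark.internal.io (these classes live in an internal package, so this snippet is illustrative rather than public API); the job id and output path are hypothetical:

    import org.apache.spark.internal.io.FileCommitProtocol

    // Instantiate a committer by class name via Spark's internal factory.
    // The factory first tries the (jobId, outputPath, dynamicPartitionOverwrite)
    // constructor and falls back to the (jobId, outputPath) one.
    val committer: FileCommitProtocol = FileCommitProtocol.instantiate(
      className = "org.apache.spark.internal.io.HadoopMapRedCommitProtocol",
      jobId = "job-0001",        // hypothetical job id
      outputPath = "/tmp/out",   // hypothetical output path
      dynamicPartitionOverwrite = false)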
A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).
Unlike Hadoop's OutputCommitter, this implementation is serializable.
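To illustrate the serializability note, a minimal sketch, assuming the Spark-internal HadoopMapReduceCommitProtocol two-argument constructor (jobId, path) described further below; the round-trip would throw a NotSerializableException if the committer were not serializable. The constructor arguments are hypothetical:

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
      ObjectInputStream, ObjectOutputStream}
    import org.apache.spark.internal.io.{FileCommitProtocol, HadoopMapReduceCommitProtocol}

    // Built on the driver; must survive Java serialization to reach executor tasks.
    val committer: FileCommitProtocol =
      new HadoopMapReduceCommitProtocol("job-0001", "/tmp/out")

    val buffer = new ByteArrayOutputStream()
    new ObjectOutputStream(buffer).writeObject(committer)
    val copy = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
      .readObject().asInstanceOf[FileCommitProtocol]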
Interface for creating the output format/committer/writer used when saving an RDD using a Hadoop OutputFormat (from either the old mapred API or the new mapreduce API).
Notes:
1. Implementations should throw IllegalArgumentException when the wrong Hadoop API is referenced.
2. Implementations must be serializable, as the instance instantiated on the driver will be used for tasks on executors.
3. Implementations should have a constructor with exactly one argument: (conf: SerializableConfiguration) or (conf: SerializableJobConf); see the sketch below.
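A sketch of the constructor contract from note 3, assuming the Spark-internal HadoopWriteConfigUtil base class and SerializableConfiguration wrapper; since both are private[spark], this compiles only within Spark's own source tree. MyWriteConfigUtil is a hypothetical name, left abstract so the real format/committer/writer methods can be elided:

    import scala.reflect.ClassTag

    import org.apache.spark.internal.io.HadoopWriteConfigUtil
    import org.apache.spark.util.SerializableConfiguration

    // Hypothetical implementation skeleton: exactly one constructor argument,
    // wrapping the Hadoop Configuration so the instance can be shipped to executors.
    abstract class MyWriteConfigUtil[K, V: ClassTag](conf: SerializableConfiguration)
      extends HadoopWriteConfigUtil[K, V] {
      // createJobContext / createCommitter / write / ... elided
    }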
An interface to define how a single Spark job commits its outputs. Three notes:
1. Implementations must be serializable, as the committer instance instantiated on the driver will be used for tasks on executors.
2. Implementations should have a constructor with 2 or 3 arguments: (jobId: String, path: String) or (jobId: String, path: String, dynamicPartitionOverwrite: Boolean); see the sketch after this list.
3. A committer should not be reused across multiple Spark jobs.
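A sketch of the two constructor shapes from note 2, assuming the Spark-internal FileCommitProtocol abstract class; MyCommitProtocol is a hypothetical name, left abstract so the commit/abort methods can be elided:

    import org.apache.spark.internal.io.FileCommitProtocol

    // Hypothetical committer showing both accepted constructor shapes.
    abstract class MyCommitProtocol(
        jobId: String,
        path: String,
        dynamicPartitionOverwrite: Boolean)
      extends FileCommitProtocol with Serializable {

      // Alternative two-argument shape; dynamic partition overwrite defaults off.
      def this(jobId: String, path: String) = this(jobId, path, false)
    }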
The proper call sequence is:
1. The driver calls setupJob.
2. As part of each task's execution, the executor calls setupTask and then commitTask (or abortTask if the task failed).
3. When all necessary tasks have completed successfully, the driver calls commitJob. If the job failed to execute (e.g. too many failed tasks), the driver should call abortJob.
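The same sequence as a hedged sketch, assuming the Spark-internal FileCommitProtocol methods named above (setupJob, setupTask, commitTask, abortTask, commitJob, abortJob) and Hadoop's mapreduce JobContext/TaskAttemptContext. In real Spark the per-task part runs on executors; here everything runs in one method for readability, and all names are illustrative:

    import org.apache.spark.internal.io.FileCommitProtocol
    import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
    import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}

    def runJob(
        committer: FileCommitProtocol,
        jobContext: JobContext,
        taskContexts: Seq[TaskAttemptContext]): Unit = {
      committer.setupJob(jobContext)                   // 1. driver
      try {
        val commits: Seq[TaskCommitMessage] = taskContexts.map { tc =>
          committer.setupTask(tc)                      // 2. executor, per task
          try {
            // ... write the task's output here ...
            committer.commitTask(tc)                   // returns a TaskCommitMessage
          } catch {
            case e: Exception =>
              committer.abortTask(tc)                  // task-level cleanup
              throw e
          }
        }
        committer.commitJob(jobContext, commits)       // 3. driver, on success
      } catch {
        case e: Exception =>
          committer.abortJob(jobContext)               // driver, on job failure
          throw e
      }
    }

Note how the task commit messages collected in step 2 feed the final commitJob call, which is why the committer must not be reused across jobs: it may accumulate per-job state between setupJob and commitJob.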