Package

org.apache.spark.internal

io

Permalink

package io

Visibility
  1. Public
  2. All

Type Members

  1. abstract class FileCommitProtocol extends AnyRef

    Permalink

    An interface to define how a single Spark job commits its outputs.

    An interface to define how a single Spark job commits its outputs. Three notes:

    1. Implementations must be serializable, as the committer instance instantiated on the driver will be used for tasks on executors. 2. Implementations should have a constructor with 2 or 3 arguments: (jobId: String, path: String) or (jobId: String, path: String, dynamicPartitionOverwrite: Boolean) 3. A committer should not be reused across multiple Spark jobs.

    The proper call sequence is:

    1. Driver calls setupJob. 2. As part of each task's execution, executor calls setupTask and then commitTask (or abortTask if task failed). 3. When all necessary tasks completed successfully, the driver calls commitJob. If the job failed to execute (e.g. too many failed tasks), the job should call abortJob.

  2. class HadoopMapRedCommitProtocol extends HadoopMapReduceCommitProtocol

    Permalink

    An FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the old mapred API).

    An FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the old mapred API).

    Unlike Hadoop's OutputCommitter, this implementation is serializable.

  3. class HadoopMapReduceCommitProtocol extends FileCommitProtocol with Serializable with Logging

    Permalink

    An FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).

    An FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).

    Unlike Hadoop's OutputCommitter, this implementation is serializable.

  4. abstract class HadoopWriteConfigUtil[K, V] extends Serializable

    Permalink

    Interface for create output format/committer/writer used during saving an RDD using a Hadoop OutputFormat (both from the old mapred API and the new mapreduce API)

    Interface for create output format/committer/writer used during saving an RDD using a Hadoop OutputFormat (both from the old mapred API and the new mapreduce API)

    Notes: 1. Implementations should throw IllegalArgumentException when wrong hadoop API is referenced; 2. Implementations must be serializable, as the instance instantiated on the driver will be used for tasks on executors; 3. Implementations should have a constructor with exactly one argument: (conf: SerializableConfiguration) or (conf: SerializableJobConf).

Value Members

  1. object FileCommitProtocol

    Permalink

Ungrouped