HadoopMapReduceCommitProtocol

Instance Constructors

new HadoopMapReduceCommitProtocol(jobId: String, path: String, dynamicPartitionOverwrite: Boolean = false)

jobId
the job's or stage's id
path
the job's output path, or null if committer acts as a noop
dynamicPartitionOverwrite
If true, Spark will overwrite partition directories at runtime dynamically, i.e., we first write files under a staging directory with partition path, e.g. /path/to/staging/a=1/b=1/xxx.parquet. When committing the job, we first clean up the corresponding partition directories at destination path, e.g. /path/to/destination/a=1/b=1, and move files from staging directory to the corresponding partition directories under destination path.

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def abortJob(jobContext: JobContext): Unit

Aborts a job after the writes fail.
Aborts a job after the writes fail. Must be called on the driver.
Calling this function is a best-effort attempt, because it is possible that the driver just crashes (or killed) before it can call abort.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
def abortTask(taskContext: TaskAttemptContext): Unit

Aborts a task after the writes have failed.
Aborts a task after the writes have failed. Must be called on the executors when running tasks.
Calling this function is a best-effort attempt, because it is possible that the executor just crashes (or killed) before it can call abort.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit

Commits a job after the writes succeed.
Commits a job after the writes succeed. Must be called on the driver.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage

Commits a task after the writes succeed.
Commits a task after the writes succeed. Must be called on the executors when running tasks.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean

Specifies that a file should be deleted with the commit of this job.
Specifies that a file should be deleted with the commit of this job. The default implementation deletes the file immediately.

Definition Classes
FileCommitProtocol
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String

Notifies the commit protocol to add a new file, and gets back the full path that should be used.
Notifies the commit protocol to add a new file, and gets back the full path that should be used. Must be called on the executors when running tasks.
Note that the returned temp file may have an arbitrary path. The commit protocol only promises that the file will be at the location specified by the arguments after job commit.
A full file path consists of the following parts:
1. the base path 2. some sub-directory within the base path, used to specify partitioning 3. file prefix, usually some unique job id with the task id 4. bucket id 5. source specific file extension, e.g. ".snappy.parquet"
The "dir" parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest are left to the commit protocol implementation to decide.
Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
def newTaskTempFileAbsPath(taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String

Similar to newTaskTempFile(), but allows files to committed to an absolute output location.
Similar to newTaskTempFile(), but allows files to committed to an absolute output location. Depending on the implementation, there may be weaker guarantees around adding files this way.
Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def onTaskCommit(taskCommit: TaskCommitMessage): Unit

Called on the driver after a task commits.
Called on the driver after a task commits. This can be used to access task commit messages before the job has finished. These same task commit messages will be passed to commitJob() if the entire job succeeds.

Definition Classes
FileCommitProtocol
def setupCommitter(context: TaskAttemptContext): OutputCommitter

Attributes
protected
def setupJob(jobContext: JobContext): Unit

Setups up a job.
Setups up a job. Must be called on the driver before any other methods can be invoked.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
def setupTask(taskContext: TaskAttemptContext): Unit

Sets up a task within a job.
Sets up a task within a job. Must be called before any other task related methods can be invoked.

Definition Classes
HadoopMapReduceCommitProtocol → FileCommitProtocol
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package io

class HadoopMapReduceCommitProtocol extends FileCommitProtocol with Serializable with Logging

Instance Constructors

new HadoopMapReduceCommitProtocol(jobId: String, path: String, dynamicPartitionOverwrite: Boolean = false)

Value Members

final def !=(arg0: Any): Boolean

final def ##(): Int

final def ==(arg0: Any): Boolean

def abortJob(jobContext: JobContext): Unit

def abortTask(taskContext: TaskAttemptContext): Unit

final def asInstanceOf[T0]: T0

def clone(): AnyRef

def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit

def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage

def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean

final def eq(arg0: AnyRef): Boolean

def equals(arg0: Any): Boolean

def finalize(): Unit

final def getClass(): Class[_]

def hashCode(): Int

def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean

def initializeLogIfNecessary(isInterpreter: Boolean): Unit

final def isInstanceOf[T0]: Boolean

def isTraceEnabled(): Boolean

def log: Logger

def logDebug(msg: ⇒ String, throwable: Throwable): Unit

def logDebug(msg: ⇒ String): Unit

def logError(msg: ⇒ String, throwable: Throwable): Unit

def logError(msg: ⇒ String): Unit

def logInfo(msg: ⇒ String, throwable: Throwable): Unit

def logInfo(msg: ⇒ String): Unit

def logName: String

def logTrace(msg: ⇒ String, throwable: Throwable): Unit

def logTrace(msg: ⇒ String): Unit

def logWarning(msg: ⇒ String, throwable: Throwable): Unit

def logWarning(msg: ⇒ String): Unit

final def ne(arg0: AnyRef): Boolean

def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String

def newTaskTempFileAbsPath(taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String

final def notify(): Unit

final def notifyAll(): Unit

def onTaskCommit(taskCommit: TaskCommitMessage): Unit

def setupCommitter(context: TaskAttemptContext): OutputCommitter

def setupJob(jobContext: JobContext): Unit

def setupTask(taskContext: TaskAttemptContext): Unit

final def synchronized[T0](arg0: ⇒ T0): T0

def toString(): String

final def wait(): Unit

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

Inherited from Logging

Inherited from Serializable

Inherited from Serializable

Inherited from FileCommitProtocol

Inherited from AnyRef

Inherited from Any

Ungrouped