pyspark.sql.DataFrameWriter.bucketBy

DataFrameWriter.bucketBy(numBuckets, col, *cols)

Buckets the output by the given columns. If specified, the output is laid out on the file system similar to Hive’s bucketing scheme.

New in version 2.3.0.

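Parameters

numBuckets : int
    the number of buckets to save
col : str, list or tuple
    a name of a column, or a list of names
cols : str
    additional names (optional). If col is a list, it should be empty.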

Notes

Applicable for file-based data sources in combination with DataFrameWriter.saveAsTable().
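
As a rough illustration of what a bucketed layout means, the sketch below assigns rows to buckets by hashing the bucketing columns and taking the result modulo the number of buckets. It is plain Python, not Spark code: the helper names bucket_id and assign_buckets are hypothetical, and zlib.crc32 is only a stand-in hash. Spark uses its own hash function internally, so the bucket numbers it produces will differ from the ones computed here.

```python
import zlib


def bucket_id(values, num_buckets):
    """Map a tuple of bucketing-column values to a bucket index.

    ASSUMPTION: zlib.crc32 is a stand-in hash used for illustration only.
    Spark computes bucket ids with its own hash function, so real bucket
    numbers will not match these.
    """
    key = "\x00".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def assign_buckets(rows, cols, num_buckets):
    """Group rows (dicts) by the bucket derived from the given columns."""
    buckets = {}
    for row in rows:
        b = bucket_id(tuple(row[c] for c in cols), num_buckets)
        buckets.setdefault(b, []).append(row)
    return buckets


# Hypothetical sample data with the same bucketing columns as the example below.
rows = [
    {"year": 2023, "month": 1, "amount": 10.0},
    {"year": 2023, "month": 1, "amount": 12.5},
    {"year": 2023, "month": 2, "amount": 7.0},
    {"year": 2024, "month": 1, "amount": 3.0},
]

buckets = assign_buckets(rows, ["year", "month"], 4)
for b in sorted(buckets):
    print(b, buckets[b])
```

Rows that share the same values in the bucketing columns always land in the same bucket, which is the property a bucketed table relies on.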

Examples

>>> (df.write.format('parquet')
...     .bucketBy(100, 'year', 'month')
...     .mode("overwrite")
...     .saveAsTable('bucketed_table'))