DataFrameWriter.bucketBy(numBuckets, col, *cols)
Buckets the output by the given columns. If specified, the output is laid out on the file system similar to Hive’s bucketing scheme.
New in version 2.3.0.
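The bucketed layout described above can be pictured as assigning each row to one of a fixed number of buckets based on a hash of its bucketing columns, so that rows with equal column values always land in the same bucket. The following is a minimal pure-Python sketch of that idea, not Spark's implementation: the `bucket_id` helper, the sample rows, and the use of CRC32 as the hash are all assumptions made for illustration, and Spark's actual hash function differs.

```python
import zlib


def bucket_id(values, num_buckets):
    """Map a tuple of bucketing-column values to a bucket index.

    CRC32 is a deterministic stand-in hash chosen for this sketch;
    it is NOT the hash function that Spark or Hive actually use.
    """
    key = "\x00".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


# Hypothetical sample rows, bucketed by ('year', 'month').
rows = [
    {"year": 2020, "month": 1, "amount": 10},
    {"year": 2020, "month": 1, "amount": 25},
    {"year": 2021, "month": 7, "amount": 40},
]

num_buckets = 100
assignments = [bucket_id((r["year"], r["month"]), num_buckets) for r in rows]

# Rows sharing the same (year, month) always get the same bucket index.
for row, bucket in zip(rows, assignments):
    print(row, "-> bucket", bucket)
```

Because the bucket index depends only on the bucketing columns and the bucket count, the first two rows are guaranteed to share a bucket, which is the property that bucketed tables rely on.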
Parameters
numBuckets: the number of buckets to save.
col: a name of a column, or a list of names.
cols: additional names (optional). If col is a list, it should be empty.
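The two ways of passing column names (a first name plus additional names, or a single list) can be illustrated with a small sketch. The `normalize_bucket_columns` helper below is hypothetical and is not part of PySpark; it only mirrors the documented rule that the additional names must be empty when the first argument is a list. Raising `ValueError` or `TypeError` for invalid input is an assumption of this sketch.

```python
def normalize_bucket_columns(col, *cols):
    """Return the bucketing columns as a flat list of names.

    Hypothetical helper for illustration only; it follows the documented
    rule that `cols` must be empty when `col` is already a list.
    """
    if isinstance(col, (list, tuple)):
        if cols:
            # Assumed behaviour: mixing a list with extra names is an error.
            raise ValueError("col is a list, so cols must be empty")
        names = list(col)
    else:
        names = [col, *cols]

    if not names:
        raise ValueError("at least one column name is required")
    if not all(isinstance(name, str) for name in names):
        raise TypeError("all column names must be strings")
    return names


# Both call styles describe the same set of bucketing columns.
as_varargs = normalize_bucket_columns("year", "month")
as_list = normalize_bucket_columns(["year", "month"])
print(as_varargs, as_list)
```

Under this reading, `bucketBy(100, 'year', 'month')` and `bucketBy(100, ['year', 'month'])` name the same bucketing columns.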
Notes
Applicable for file-based data sources in combination with DataFrameWriter.saveAsTable().
Examples
>>> (df.write.format('parquet')
...     .bucketBy(100, 'year', 'month')
...     .mode("overwrite")
...     .saveAsTable('bucketed_table'))