pyspark.sql.DataFrame.cache

DataFrame.cache() → pyspark.sql.dataframe.DataFrame[source]

Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER).

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns
DataFrame

Cached DataFrame.

Notes

The default storage level has changed to MEMORY_AND_DISK_DESER to match Scala in 3.0.

Examples

>>> df = spark.range(1)
>>> df.cache()
DataFrame[id: bigint]
>>> df.explain()  
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- InMemoryTableScan ...