pyspark.pandas.DataFrame.spark.frame
spark.frame(index_col=None)

Return the current DataFrame as a Spark DataFrame. DataFrame.spark.frame() is an alias of DataFrame.to_spark().

Parameters
index_col : str or list of str, optional, default None
    Column names to be used in Spark to represent pandas-on-Spark's index. The index name in pandas-on-Spark is ignored. By default, the index is always lost.
See also
DataFrame.to_spark
DataFrame.pandas_api
DataFrame.spark.frame
Examples
By default, this method loses the index as below.
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
>>> df.to_spark().show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  7|
|  2|  5|  8|
|  3|  6|  9|
+---+---+---+
>>> df = ps.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
>>> df.spark.frame().show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  7|
|  2|  5|  8|
|  3|  6|  9|
+---+---+---+
If index_col is set, it keeps the index column as specified.
>>> df.to_spark(index_col="index").show()
+-----+---+---+---+
|index|  a|  b|  c|
+-----+---+---+---+
|    0|  1|  4|  7|
|    1|  2|  5|  8|
|    2|  3|  6|  9|
+-----+---+---+---+
Keeping an index column is useful when you want to call some Spark APIs and then convert the result back to a pandas-on-Spark DataFrame without creating a default index, which can hurt performance.
>>> spark_df = df.to_spark(index_col="index")
>>> spark_df = spark_df.filter("a == 2")
>>> spark_df.pandas_api(index_col="index")
       a  b  c
index
1      2  5  8
In the case of a multi-index, pass a list of names to index_col.
>>> new_df = df.set_index("a", append=True)
>>> new_spark_df = new_df.to_spark(index_col=["index_1", "index_2"])
>>> new_spark_df.show()
+-------+-------+---+---+
|index_1|index_2|  b|  c|
+-------+-------+---+---+
|      0|      1|  4|  7|
|      1|      2|  5|  8|
|      2|      3|  6|  9|
+-------+-------+---+---+
The result can be converted back to a pandas-on-Spark DataFrame.
>>> new_spark_df.pandas_api(
...     index_col=["index_1", "index_2"])
                 b  c
index_1 index_2
0       1        4  7
1       2        5  8
2       3        6  9