pyspark.pandas.read_orc

pyspark.pandas.read_orc(path, columns=None, index_col=None, **options)

Load an ORC object from the file path, returning a DataFrame.

Parameters

path (str): Path of the ORC file to read.
columns (list, default None): If not None, only these columns are read from the file.
index_col (str or list of str, optional, default None): Name(s) of the Spark column(s) to use as the index of the returned DataFrame.
**options (dict): All other options, passed directly to Spark's ORC data source.
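
As a minimal sketch of how the extra keyword arguments might be assembled, the snippet below builds a dict of reader options and forwards it through `**options`. The helpers `build_orc_options` and `read_orc_with_options` are hypothetical names introduced only for illustration; they are not part of pyspark. The option names `mergeSchema` and `pathGlobFilter` are assumed to be available in the Spark version in use (they are documented Spark file source options in Spark 3.x).

```python
# Hypothetical helpers, not part of pyspark: a sketch of option forwarding.

def build_orc_options(merge_schema=False, path_glob_filter=None):
    """Collect Spark reader options into a dict of keyword arguments."""
    options = {}
    if merge_schema:
        # Assumed Spark ORC data source option (Spark 3.0+).
        options["mergeSchema"] = "true"
    if path_glob_filter is not None:
        # Assumed generic Spark file source option.
        options["pathGlobFilter"] = path_glob_filter
    return options


def read_orc_with_options(path, columns=None, index_col=None, **options):
    """Forward any extra keyword arguments to ps.read_orc unchanged."""
    # Imported lazily so build_orc_options works without a Spark installation.
    import pyspark.pandas as ps

    return ps.read_orc(path, columns=columns, index_col=index_col, **options)
```

With a running Spark session, a call might look like `read_orc_with_options(path, columns=['id'], **build_orc_options(merge_schema=True))`, which is equivalent to passing `mergeSchema="true"` directly to `ps.read_orc`.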

Returns

DataFrame

Examples

>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path)
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'])
   id
0   0

To preserve the index across the round trip, pass index_col when writing and when reading:

>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path, index_col="index")
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'], index_col="index")
       id
index
0       0