pyspark.pandas.read_json

pyspark.pandas.read_json(path: str, lines: bool = True, index_col: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame

Convert a JSON string to DataFrame.
Parameters
- path : string
    File path.
- lines : bool, default True
    Read the file as a JSON object per line. It should always be True for now.
- index_col : str or list of str, optional, default: None
    Index column of table in Spark.
- options : dict
    All other options passed directly into Spark's data source.
Examples
>>> df = ps.DataFrame([['a', 'b'], ['c', 'd']],
...                   columns=['col 1', 'col 2'])

>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1)
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d

>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1, lineSep='___')
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path, lineSep='___'
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d
You can preserve the index in the roundtrip as below.
>>> df.to_json(path=r'%s/read_json/bar.json' % path, num_files=1, index_col="index")
>>> ps.read_json(
...     path=r'%s/read_json/bar.json' % path, index_col="index"
... ).sort_values(by="col 1")
      col 1 col 2
index
0         a     b
1         c     d
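Keyword arguments other than lines and index_col are forwarded to Spark's JSON data source through **options. The snippet below is a sketch rather than an official example: it assumes the foo.json file written above and forwards Spark's primitivesAsString reader option, which makes the reader type primitive JSON values as strings.

>>> # Sketch (assumption, not from the official examples): forward Spark's
>>> # primitivesAsString JSON reader option through **options.
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path,
...     primitivesAsString='true'
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d

Because col 1 and col 2 already contain strings, the result matches the earlier example; the option only changes how numeric or boolean fields would be typed.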