pyspark.pandas.read_json
- pyspark.pandas.read_json(path, lines=True, index_col=None, **options)
Convert JSON files to a DataFrame.
- Parameters
- path : string
File path.
- lines : bool, default True
Read the file as one JSON object per line. Only True is supported for now.
- index_col : str or list of str, optional, default: None
Index column of table in Spark.
- options : dict
All other options passed directly into Spark's data source.
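With lines=True (the default), read_json expects "JSON Lines" input: one complete JSON object per line of the file. A minimal standard-library sketch to illustrate the format only; the real reader is Spark's distributed JSON data source, not this loop, and the sample data here is made up for illustration.

```python
import json

# Two records in JSON Lines form: one object per line, no enclosing array.
raw = '{"col 1": "a", "col 2": "b"}\n{"col 1": "c", "col 2": "d"}\n'

# Each non-empty line parses independently -- this is what allows Spark
# to split the file and read it in parallel.
records = [json.loads(line) for line in raw.splitlines() if line]
print(records)
```

A file that is a single multi-line JSON document would not split this way, which is why the per-line form is required.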
Examples
>>> df = ps.DataFrame([['a', 'b'], ['c', 'd']],
...                   columns=['col 1', 'col 2'])
>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1)
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d
>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1, lineSep='___')
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path, lineSep='___'
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d
You can preserve the index in the round trip as below.
>>> df.to_json(path=r'%s/read_json/bar.json' % path, num_files=1, index_col="index")
>>> ps.read_json(
...     path=r'%s/read_json/bar.json' % path, index_col="index"
... ).sort_values(by="col 1")
      col 1 col 2
index
0         a     b
1         c     d