pyspark.pandas.json_normalize

pyspark.pandas.json_normalize(data, sep='.')

Normalize semi-structured JSON data into a flat table.

New in version 4.0.0.

Parameters

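data (dict or list of dicts): Unserialized JSON objects.
sep (str, default '.'): Separator used to join the keys of nested records into column names. With the default, the key city nested under address becomes the column address.city.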
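The naming rule for sep can be illustrated with a small, stdlib-only sketch. `flatten_record` is a hypothetical helper written for illustration, not a pyspark.pandas API. It assumes every nested object is a plain dict and only mimics the column names shown in the example below. It does not show how pandas-on-Spark computes them internally.

```python
def flatten_record(record, sep=".", prefix=""):
    """Flatten nested dicts into one level, joining key paths with sep.

    Hypothetical helper for illustration only; not part of pyspark.pandas.
    """
    flat = {}
    for key, value in record.items():
        # Build the column name by appending this key to the parent path.
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten_record(value, sep=sep, prefix=name))
        else:
            # Non-dict values are kept as-is under the joined name.
            flat[name] = value
    return flat


record = {"id": 1, "name": "Alice", "address": {"city": "NYC", "zipcode": "10001"}}
print(flatten_record(record))           # default sep='.'
print(flatten_record(record, sep="_"))  # custom separator
```

With the default separator the nested keys become address.city and address.zipcode. With sep="_" they become address_city and address_zipcode.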

Returns

DataFrame

See also

DataFrame.to_json: Convert the pandas-on-Spark DataFrame to a JSON string.

Examples

>>> import pyspark.pandas as ps
>>> data = [
...     {"id": 1, "name": "Alice", "address": {"city": "NYC", "zipcode": "10001"}},
...     {"id": 2, "name": "Bob", "address": {"city": "SF", "zipcode": "94105"}},
... ]
>>> ps.json_normalize(data)
   id   name address.city address.zipcode
0   1  Alice          NYC           10001
1   2    Bob           SF           94105
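Because the input is semi-structured, records do not have to share the same keys. The plain-Python sketch below shows one way a flat table can be assembled from already-flattened records. The columns are the union of keys in first-seen order, and absent values are filled with None. `rows_to_columns` is hypothetical, and filling gaps with None is an assumption of this sketch. The missing-value marker in a real pandas-on-Spark DataFrame may differ.

```python
def rows_to_columns(rows):
    """Collect already-flat rows into columns (hypothetical helper).

    Columns appear in first-seen order; a row lacking a column gets None.
    """
    # dict.fromkeys keeps insertion order and drops duplicate names.
    columns = list(dict.fromkeys(name for row in rows for name in row))
    return {name: [row.get(name) for row in rows] for name in columns}


rows = [
    {"id": 1, "name": "Alice", "address.city": "NYC", "address.zipcode": "10001"},
    {"id": 2, "name": "Bob", "address.city": "SF"},  # no zipcode in this record
]
table = rows_to_columns(rows)
for name, values in table.items():
    print(name, values)
```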