
Commit 4748ac4

BUG: Handle non-dict items in json_normalize with max_level
1 parent ea75dd7 commit 4748ac4


3 files changed: 18 additions and 1 deletion

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -1105,6 +1105,7 @@ I/O
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
 - Bug in :meth:`HDFStore.select` causing queries on categorical string columns to return unexpected results (:issue:`57608`)
+- Bug in :func:`pandas.json_normalize` raising ``AttributeError`` when ``max_level`` was set and the input data contained ``NaN`` values (:issue:`62829`)
 - Bug in :meth:`MultiIndex.factorize` incorrectly raising on length-0 indexes (:issue:`57517`)
 - Bug in :meth:`read_csv` causing segmentation fault when ``encoding_errors`` is not a string. (:issue:`59059`)
 - Bug in :meth:`read_csv` for the ``c`` and ``python`` engines where parsing numbers with large exponents caused overflows. Now, numbers with large positive exponents are parsed as ``inf`` or ``-inf`` depending on the sign of the mantissa, while those with large negative exponents are parsed as ``0.0`` (:issue:`62617`, :issue:`38794`, :issue:`62740`)

pandas/io/json/_normalize.py

Lines changed: 4 additions & 1 deletion
@@ -117,6 +117,9 @@ def nested_to_record(
         singleton = True
     new_ds = []
     for d in ds:
+        if not isinstance(d, dict):
+            new_ds.append({})
+            continue
         new_d = copy.deepcopy(d)
         for k, v in d.items():
             # each key gets renamed with prefix
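
The guard added in this hunk can be illustrated outside pandas. The sketch below is a deliberately simplified stand-in, not the real `nested_to_record`: the function name `flatten_one_level` is hypothetical, it flattens only one level of nesting, and it ignores `max_level` and `prefix` handling. It only shows how a non-dict item such as `NaN` becomes an empty record instead of reaching `.items()`.

```python
import math


def flatten_one_level(records, sep="."):
    """Hypothetical, simplified stand-in for nested_to_record.

    Flattens one level of nested dicts and joins keys with `sep`.
    """
    flattened = []
    for record in records:
        if not isinstance(record, dict):
            # Same idea as the guard in the commit: a non-dict item
            # (for example NaN) contributes an empty record.
            flattened.append({})
            continue
        new_record = {}
        for key, value in record.items():
            if isinstance(value, dict):
                # Nested dict: prefix each inner key with the outer key.
                for inner_key, inner_value in value.items():
                    new_record[f"{key}{sep}{inner_key}"] = inner_value
            else:
                new_record[key] = value
        flattened.append(new_record)
    return flattened


rows = flatten_one_level([math.nan, {"id": 12, "meta": {"size": 20}}])
print(rows)
```

An empty dict is a natural placeholder here because a frame built from a list of dicts fills missing keys with `NaN`, which matches the expected frame in the test added by this commit.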
@@ -517,7 +520,7 @@ def _pull_records(js: dict[str, Any], spec: list | str) -> list:
         return DataFrame(_simple_json_normalize(data, sep=sep), index=index)
 
     if record_path is None:
-        if any([isinstance(x, dict) for x in y.values()] for y in data):
+        if any(isinstance(y, dict) for y in data):
             # naive normalization, this is idempotent for flat records
             # and potentially will inflate the data considerably for
             # deeply nested structures:
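
The changed condition is easier to follow when both versions are run on the same input. In the sketch below, `old_check` and `new_check` are hypothetical wrapper names around the two expressions from the hunk above. The old expression passes a generator of lists to `any()`, so it tests whether each list is non-empty rather than whether any value is a dict, and it calls `.values()` on every item, which fails for a float `NaN`.

```python
import math


def old_check(data):
    # Expression removed by the commit: any() receives one list per record,
    # so a non-empty list is truthy even when it is [False].
    return any([isinstance(x, dict) for x in y.values()] for y in data)


def new_check(data):
    # Expression added by the commit: only asks whether an item is a dict.
    return any(isinstance(y, dict) for y in data)


records = [math.nan, {"id": 12, "size": 20}]

try:
    old_check(records)
    old_raised = False
except AttributeError:
    # float has no .values(), which is the AttributeError from the issue
    old_raised = True

print("old check raised AttributeError:", old_raised)
print("new check result:", new_check(records))
```

With the new condition, non-dict items never have `.values()` called on them at this point; they are passed on to `nested_to_record`, where the guard from the first hunk replaces them with empty records.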

pandas/tests/io/json/test_normalize.py

Lines changed: 13 additions & 0 deletions
@@ -511,6 +511,19 @@ def test_max_level_with_records_path(self, max_level, expected):
         expected_df = DataFrame(data=expected, columns=result.columns.values)
         tm.assert_equal(expected_df, result)
 
+    def test_json_normalize_max_level_with_nan(self):
+        # GH 62829 - test for bug where max_level=0 fails with nan in input list
+        d = {
+            1: {"id": 10, "status": "AVAL"},
+            2: {"id": 30, "status": "AVAL", "items": {"id": 12, "size": 20}},
+            3: {"id": 50, "status": "AVAL", "items": {"id": 13, "size": 30}},
+        }
+        df = DataFrame.from_dict(d, orient="index")
+        data_list = df["items"].tolist()
+        expected = DataFrame({"id": [np.nan, 12.0, 13.0], "size": [np.nan, 20.0, 30.0]})
+        result = json_normalize(data_list, max_level=0)
+        tm.assert_frame_equal(result, expected)
+
     def test_nested_flattening_consistent(self):
         # see gh-21537
         df1 = json_normalize([{"A": {"B": 1}}])
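
The new test does not put `NaN` in the input explicitly; it appears because the first record has no `"items"` key. The sketch below reproduces that step and then builds the frame by hand, assuming pandas is installed. It does not call `json_normalize`, so it behaves the same with or without this fix; the `cleaned` list is an illustrative equivalent of the guard, not code from the commit.

```python
import math

import pandas as pd

d = {
    1: {"id": 10, "status": "AVAL"},
    2: {"id": 30, "status": "AVAL", "items": {"id": 12, "size": 20}},
    3: {"id": 50, "status": "AVAL", "items": {"id": 13, "size": 30}},
}

# Row 1 has no "items" key, so that cell is filled with NaN.
df = pd.DataFrame.from_dict(d, orient="index")
data_list = df["items"].tolist()
print(data_list)

# Illustrative equivalent of the guard: swap non-dict items for empty dicts,
# then let the DataFrame constructor fill the missing keys with NaN.
cleaned = [item if isinstance(item, dict) else {} for item in data_list]
frame = pd.DataFrame(cleaned)
print(frame)
```

This mirrors the `expected` frame in the test: the first row is all `NaN`, and the integer values become floats because the columns now hold missing data.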
