-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
BUG: fix error when unstacking with sort=False when indexes contains NA
#62334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix bux when indexes contains `nan` and is not sorting would raise an `IndexError` or `ValueError`.
Use `np.unique` instead of `unique`.
63aeec0 to
55117be
Compare
55117be to
6f98429
Compare
| "levels2, expected_columns, expected_data", | ||
| [ | ||
| ( | ||
| Index([None, 1, 2, 3]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could you make the Index call in the body of the test since it's the same across all parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the parametrized argument change would be
- Index([None, 1, 2, 3])
+ [None, 1, 2, 3]or do you expect something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After applying the suggested change I had to alter the expected_columns values, the columns from .unstack() became integers instead of floats.
I also removed expected_data from the parametrization and put it into the test body, as it was the same for all tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I ended up not calling Index in the test, that's why it is no longer float. But I can still reproduce the error that I was seeing in the issue with these new tests:
❯ git restore --source=main pandas/core/reshape/reshape.py
❯ pytest pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan
...
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=first] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=second] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=third] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=last] - IndexError: index 8 is out of bounds for axis 0 with size 8|
For the sake of completeness, here is the result of the reproduction from #61221 with this patch: import pandas as pd
levels1 = ['b', 'a']
levels2 = pd.Index([1, 2, 3, pd.NA], dtype=pd.Int64Dtype())
index = pd.MultiIndex.from_product([levels1, levels2], names=['level1', 'level2'])
df = pd.DataFrame(dict(value=range(len(index))), index=index)
print(df)
# value
# level1 level2
# b 1 0
# 2 1
# 3 2
# <NA> 3
# a 1 4
# 2 5
# 3 6
# <NA> 7
print(df.unstack(level='level2'))
# value
# level2 <NA> 1 2 3
# level1
# a 7 4 5 6
# b 3 0 1 2
print(df.unstack(level='level2', sort=False))
# value
# level2 1 2 3 <NA>
# level1
# b 0 1 2 3
# a 4 5 6 7 |
|
Great! Thanks @Alvaro-Kothe |
unstack(sort=False)and NA in index #61221 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.This pull requests makes the
unstackmethod differentiate how it managesNAwhen not sorting. The main problem was thatNAvalues when not sorting didn't have-1value, resulting inIndexErroronmask.put.The main change is that it keeps track on where
NAshould be and assign it before creating the multiindex in the_repeatermethod.