@SoulSniper1212

Overview

This pull request fixes an issue (#63078) in which a pandas DataFrame whose MultiIndex contains a datetime64[ns] level fails to come back when it is returned from a joblib.delayed call or passed through another multiprocessing library. The error is raised in NDArrayBacked.__setstate__ during unpickling: in the multiprocessing context the method receives a state in a format it does not recognize, and it raises NotImplementedError.
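For context: pickle, which multiprocessing (and joblib's process-based backends) use to move results between processes, rebuilds an object by handing the state produced at pickling time to that object's __setstate__. The ToyBacked class below is a hypothetical, pure-Python stand-in, not pandas' NDArrayBacked; lists stand in for arrays and strings for dtypes. It only mimics the shape of the problem: a __setstate__ that understands one state layout and raises NotImplementedError for any other.

```python
import pickle


class ToyBacked:
    """Hypothetical stand-in for an ndarray-backed array; NOT pandas' NDArrayBacked."""

    def __init__(self, data, dtype, **attrs):
        self._ndarray = list(data)
        self._dtype = dtype
        self.__dict__.update(attrs)

    def __getstate__(self):
        # pickle stores this value and later passes it, unchanged, to __setstate__.
        attrs = {k: v for k, v in self.__dict__.items() if k not in ("_ndarray", "_dtype")}
        return (self._ndarray, self._dtype, attrs)

    def __setstate__(self, state):
        # Only the (data, dtype, attrs_dict) layout is understood; any other
        # layout fails, which is the kind of failure the linked issue reports.
        if isinstance(state, tuple) and len(state) == 3 and isinstance(state[2], dict):
            self._ndarray, self._dtype = state[0], state[1]
            self.__dict__.update(state[2])
        else:
            raise NotImplementedError(state)


arr = ToyBacked([1, 2], "datetime64[ns]", freq=None)
restored = pickle.loads(pickle.dumps(arr))  # __setstate__ receives the 3-tuple
print(restored._ndarray, restored._dtype, restored.freq)  # [1, 2] datetime64[ns] None

try:
    # A bare 2-element state is not a layout this __setstate__ knows.
    ToyBacked.__new__(ToyBacked).__setstate__(([1, 2], "datetime64[ns]"))
except NotImplementedError:
    print("unrecognized state layout")
```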

Checklist

  • Code changes: Modified pandas/_libs/arrays.pyx to handle additional state formats in NDArrayBacked.__setstate__
  • Tests: Added tests in test_fix.py that verify the fix for the reported scenario
  • Documentation: Not required as this is a bug fix that maintains existing functionality

Proof

The fix addresses the issue by adding handling for:

  1. 2-element states, whose tuple layout may differ when they are produced in a multiprocessing context
  2. 3-element states where the third element is a (dtype, array) tuple instead of an attributes dict
  3. Other unexpected state formats that previously raised NotImplementedError
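The three cases above could be folded into a single normalization step that runs before __setstate__ assigns anything. The sketch below is a hypothetical, pure-Python illustration, not the Cython code in pandas/_libs/arrays.pyx: normalize_state and _is_values are made-up names, lists stand in for ndarrays, strings stand in for dtypes, and treating the trailing (dtype, array) pair in case 2 as authoritative is an assumption. The description does not say what the fix does for case 3, so the sketch keeps raising NotImplementedError for layouts it cannot interpret instead of guessing.

```python
def _is_values(obj):
    # Hypothetical stand-in for an ndarray check: in this sketch arrays are
    # lists and dtypes are strings.
    return isinstance(obj, list)


def normalize_state(state):
    """Hypothetical helper: map a pickled state to (values, dtype, attrs) or raise."""
    if not isinstance(state, tuple):
        raise NotImplementedError(state)

    if len(state) == 2:
        # Case 1: a bare (values, dtype) pair, accepted in either order.
        first, second = state
        if _is_values(first):
            return first, second, {}
        if _is_values(second):
            return second, first, {}
        raise NotImplementedError(state)

    if len(state) == 3:
        values, dtype, extra = state
        if _is_values(dtype):
            # Tolerate (dtype, values, ...) ordering by swapping.
            values, dtype = dtype, values
        if isinstance(extra, dict):
            # Existing layout (values, dtype, attrs): unchanged, so old pickles still load.
            return values, dtype, extra
        if isinstance(extra, tuple) and len(extra) == 2:
            # Case 2 (assumption): the trailing (dtype, values) pair is authoritative
            # and there are no extra attributes to restore.
            pair_dtype, pair_values = extra
            return pair_values, pair_dtype, {}

    # Case 3: no fallback is specified in the description, so still refuse.
    raise NotImplementedError(state)


print(normalize_state(([1, 2], "datetime64[ns]", {"freq": None})))
# ([1, 2], 'datetime64[ns]', {'freq': None})
print(normalize_state(("datetime64[ns]", [1, 2])))
# ([1, 2], 'datetime64[ns]', {})
print(normalize_state(([1, 2], "datetime64[ns]", ("datetime64[ns]", [1, 2]))))
# ([1, 2], 'datetime64[ns]', {})
```

A __setstate__ built on this would call normalize_state(state) once and then assign the values, dtype, and attributes, so every accepted layout goes through the same assignment code.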

The test script test_fix.py demonstrates that the fix resolves the issue by:

  1. Passing the problematic state format directly to __setstate__
  2. Reproducing the original scenario with datetime64[ns] MultiIndex
  3. Confirming that pickle/unpickle operations work correctly after the fix
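A stdlib-only sketch of how steps 1 and 3 might be written as pytest-style tests. It is not the contents of test_fix.py: ToyBacked is a hypothetical stand-in (lists for arrays, strings for dtypes) whose __setstate__ tolerates a bare 2-element state, and step 2 is omitted because reproducing the datetime64[ns] MultiIndex scenario requires pandas.

```python
import pickle


class ToyBacked:
    """Hypothetical ndarray-backed wrapper (lists stand in for arrays, strings for dtypes)."""

    def __init__(self, values, dtype):
        self._ndarray = list(values)
        self._dtype = dtype

    def __getstate__(self):
        return (self._ndarray, self._dtype, {})

    def __setstate__(self, state):
        if isinstance(state, tuple) and len(state) == 2:
            # Bare pair: values first if it is a list, otherwise assume (dtype, values).
            values, dtype = state if isinstance(state[0], list) else state[::-1]
            attrs = {}
        elif isinstance(state, tuple) and len(state) == 3 and isinstance(state[2], dict):
            values, dtype, attrs = state
        else:
            raise NotImplementedError(state)
        self._ndarray, self._dtype = values, dtype
        self.__dict__.update(attrs)


def test_setstate_accepts_two_element_state():
    # Step 1: feed the problematic layout to __setstate__ directly.
    obj = ToyBacked.__new__(ToyBacked)
    obj.__setstate__(("datetime64[ns]", [1, 2]))
    assert obj._ndarray == [1, 2] and obj._dtype == "datetime64[ns]"


def test_pickle_roundtrip_all_protocols():
    # Step 3: the normal pickle path still round-trips for protocol 2 and newer.
    original = ToyBacked([1, 2], "datetime64[ns]")
    for protocol in range(2, pickle.HIGHEST_PROTOCOL + 1):
        restored = pickle.loads(pickle.dumps(original, protocol=protocol))
        assert restored._ndarray == original._ndarray
        assert restored._dtype == original._dtype
```

Both functions are collected by pytest as-is; they can also be called directly.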

The changes keep the existing state formats working, so previously pickled objects still load, while tolerating the pickling variations seen under multiprocessing.

Closes #63078

…e64[ns] MultiIndex

Signed-off-by: SoulSniper1212 <warush23@gmail.com>

Development

Successfully merging this pull request may close these issues.

BUG: when np.datetime64[ns] is a type in a MultiIndex, "NotImplementedError" when trying to return the df from a joblib.delayed