
Conversation

@eoincondron

Ref #62449

The performance of array reductions in nanops/bottleneck can be significantly improved for large data using Numba. The improvements come from two factors:

  • single-pass algorithms when null values are present, avoiding any copies;
  • multi-threading over chunks of the array, or over an axis in a single-axis reduction.

Although the added code is fairly complex, it provides a central, unified implementation, built from scratch, that covers the different reductions across data types, array classes, the skipna toggle, masked arrays, etc. It could potentially replace code spread across multiple modules and, in the case of bottleneck, code that lives in a different repository.
It currently covers nan(sum|mean|min|max|var|std|sem) and should be easily extensible. I am seeking code review before bottoming it out completely, so as not to waste effort.

This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.

[screenshot: benchmark results]

I am running the code on a feature branch, and all unit tests for the feature branch are passing locally.
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation

The hardware is a new MacBook Pro with 8 cores.

The speed-up is still present, though smaller, at 1 million rows, and is even greater at larger sizes (8x at 100 million rows).
One caveat: these timings assume all JIT compilation has already completed.
I have carried out a more comprehensive performance comparison and these results hold up.

As with bottleneck, these code paths can be toggled on and off.

@eoincondron eoincondron force-pushed the nanops-numba-implementation branch 3 times, most recently from 66f52c7 to babae54 Compare October 3, 2025 07:32
@eoincondron eoincondron force-pushed the nanops-numba-implementation branch from babae54 to 03168ed Compare October 21, 2025 09:43
eoincondron and others added 6 commits November 5, 2025 20:31
…s_numba

- Tests all 9 private methods prefixed with underscore
- 37 test cases organized in 8 test classes
- Comprehensive coverage of Numba-accelerated reduction operations
- Tests edge cases: NaN handling, empty arrays, masks, different dtypes
- Uses pytest fixtures and parameterization to avoid code duplication
- Tests NumbaList usage for parallel processing
- Uses pandas._testing for consistent assertion helpers
- All tests pass in pandas-dev environment

Functions tested:
- _get_initial_value: Finding first valid values in arrays
- _nb_reduce_single_arr: Single array reduction operations
- _nullify_below_mincount: Minimum count validation
- _reduce_empty_array: Empty array handling
- _chunk_arr_into_arr_list: Array chunking for parallel processing
- _nb_reduce_arr_list_in_parallel: Parallel reduction operations
- _reduce_chunked_results: Combining chunked results
- _cast_to_timelike: DateTime/timedelta type casting
- _nanvar_std_sem: Variance/standard deviation/standard error calculations
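The parameterized comparison tests might look roughly like this (a hypothetical sketch with a trivial pure-Python stand-in for the Numba kernel; the real tests exercise the private functions listed above):

```python
import numpy as np
import pytest

def naive_single_pass_nansum(arr):
    # Stand-in for the Numba kernel under test.
    total = 0.0
    for x in arr:
        if not np.isnan(x):
            total += float(x)
    return total

# Parameterization covers dtypes and NaN patterns without duplicating
# test bodies, checking the result against the NumPy reference.
@pytest.mark.parametrize("dtype", [np.float64, np.float32])
@pytest.mark.parametrize("with_nan", [True, False])
def test_nansum_matches_numpy(dtype, with_nan):
    arr = np.arange(6, dtype=dtype)
    if with_nan:
        arr[2] = np.nan
    assert naive_single_pass_nansum(arr) == np.nansum(arr)
```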

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@eoincondron eoincondron force-pushed the nanops-numba-implementation branch from 03168ed to 9c7f945 Compare November 5, 2025 20:31
@fangchenli fangchenli added the numba numba-accelerated operations label Nov 6, 2025
