
Conversation

@eoincondron

Ref #62449

The performance of array reductions in nanops/bottleneck can be significantly improved for large data using Numba. The improvements come from two factors:

  • single-pass algorithms when null values are present, avoiding any copies;
  • multi-threading over chunks of the array, or over an axis in a single-axis reduction.

Although the added code is fairly complex, it provides a central, unified implementation, built from scratch, that covers the different reductions across data types, array classes, the skipna toggle, masked arrays, etc. It could potentially replace code spread across multiple modules and, in the case of bottleneck, code that lives in a different repository.
It currently covers nan(sum|mean|min|max|var|std|sem) and should be easily extensible. I am seeking code review before bottoming it out completely, so as not to waste effort.

This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.

[screenshot: benchmark results]

I am running the code on a feature branch, and all unit tests for the feature branch are passing locally.
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation

The hardware is a new MacBook Pro with 8 cores.

The speed-up is still present, though smaller, at 1 million rows, and is even greater at larger sizes (8x at 100 million rows).
One caveat: these timings assume all JIT compilation has already completed.
I have carried out a more comprehensive performance comparison and these results hold up.

As with bottleneck, these code paths can be toggled on and off.

@eoincondron eoincondron force-pushed the nanops-numba-implementation branch 3 times, most recently from 66f52c7 to babae54 Compare October 3, 2025 07:32
@eoincondron eoincondron force-pushed the nanops-numba-implementation branch from babae54 to 03168ed Compare October 21, 2025 09:43
eoincondron and others added 6 commits November 5, 2025 20:31
…s_numba

- Tests all 9 private methods prefixed with underscore
- 37 test cases organized in 8 test classes
- Comprehensive coverage of Numba-accelerated reduction operations
- Tests edge cases: NaN handling, empty arrays, masks, different dtypes
- Uses pytest fixtures and parameterization to avoid code duplication
- Tests NumbaList usage for parallel processing
- Uses pandas._testing for consistent assertion helpers
- All tests pass in pandas-dev environment

Functions tested:
- _get_initial_value: Finding first valid values in arrays
- _nb_reduce_single_arr: Single array reduction operations
- _nullify_below_mincount: Minimum count validation
- _reduce_empty_array: Empty array handling
- _chunk_arr_into_arr_list: Array chunking for parallel processing
- _nb_reduce_arr_list_in_parallel: Parallel reduction operations
- _reduce_chunked_results: Combining chunked results
- _cast_to_timelike: DateTime/timedelta type casting
- _nanvar_std_sem: Variance/standard deviation/standard error calculations
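The parameterized comparison tests might look roughly like this (a hypothetical sketch with a trivial pure-Python stand-in for the Numba kernel; the real tests exercise the private functions listed above):

```python
import numpy as np
import pytest

def naive_single_pass_nansum(arr):
    # Stand-in for the Numba kernel under test.
    total = 0.0
    for x in arr:
        if not np.isnan(x):
            total += float(x)
    return total

# Parameterization covers dtypes and NaN patterns without duplicating
# test bodies, checking the result against the NumPy reference.
@pytest.mark.parametrize("dtype", [np.float64, np.float32])
@pytest.mark.parametrize("with_nan", [True, False])
def test_nansum_matches_numpy(dtype, with_nan):
    arr = np.arange(6, dtype=dtype)
    if with_nan:
        arr[2] = np.nan
    assert naive_single_pass_nansum(arr) == np.nansum(arr)
```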

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@eoincondron eoincondron force-pushed the nanops-numba-implementation branch from 03168ed to 9c7f945 Compare November 5, 2025 20:31
@fangchenli fangchenli added the numba numba-accelerated operations label Nov 6, 2025
