Skip to content

Conversation

@s-kganz
Copy link

@s-kganz s-kganz commented Nov 8, 2025

Description of proposed changes

Fixes #184. In the current version batch dimensions that are not a multiple of input dimensions result in empty or partial batches. The empty batches cause an error when accessed. For example:

import xarray as xr
import xbatcher as xb
import numpy as np 

ds = xr.DataArray(np.random.rand(30,30), name = 'var', dims = ['lat', 'lon'])
xbatcher = xb.BatchGenerator(
    ds=ds,
    input_dims={"lat": 6, 'lon': 6},
    batch_dims={"lat": 7, 'lon': 7},
    concat_input_dims=True,
)
print("Len :",len(xbatcher))
print(xbatcher[3])
ValueError: min() arg is an empty sequence

This occurs because _gen_batch_numbers assumes the size of a batch on a dimension is equal to the entry in batch_dims, which isn't the case when batch_size % input_size != 0 and there is no overlap. This scenario wasn't in any of the tests, so I'm not sure what intended behavior is here. Imo this implies a partial patch and should raise an error. This PR adds a check that batch_size % input_size == 0 for all duplicate dims, and raises the below ValueError if otherwise.

ValueError: Input and batch dimension sizes imply partial batches on dimension lat. Input size: 6; Batch size: 7

@s-kganz s-kganz requested review from ianhi and synflyn28 November 8, 2025 00:02
@codecov
Copy link

codecov bot commented Nov 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.95%. Comparing base (43c9135) to head (17d92c7).
⚠️ Report is 34 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
+ Coverage   96.25%   96.95%   +0.70%     
==========================================
  Files           6        6              
  Lines         347      394      +47     
  Branches       82       59      -23     
==========================================
+ Hits          334      382      +48     
+ Misses          8        6       -2     
- Partials        5        6       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@s-kganz s-kganz marked this pull request as draft November 8, 2025 00:10
@s-kganz
Copy link
Author

s-kganz commented Nov 8, 2025

The proposed change doesn't account for overlaps, I'll reopen once I get that part working.

Copy link

@synflyn28 synflyn28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks clean and readable to me. I have nothing else to add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accessing some batches raises an error

2 participants