Fix: fix AtomicData.py mask handling and ensure proper numpy arrays #286
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough: Reworked the `i == j` branch in `neighbor_list_and_relative_vec` in dptb/data/AtomicData.py.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
📜 Recent review details: Configuration used: CodeRabbit UI; Review profile: CHILL; Plan: Pro
📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (2)
Actionable comments posted: 0
🧹 Nitpick comments (2)
dptb/data/AtomicData.py (2)
1013-1027: Numpy-based masking for the `i == j` case looks correct; consider eliminating remaining torch/numpy mixing

The new `mask_np = mask.numpy()` path, using it to index `first_idex`, `second_idex`, and `shifts`, plus `np.atleast_1d(...)`, should fix the earlier incompatibility when indexing numpy arrays with a torch.bool mask and protect against scalar/0-D returns in edge cases. The loop over `o_first_idex`/`o_shift` keeps the previous behavior.

One minor clean-up you might consider later: `mask[first_idex == second_idex] = False` still indexes a torch tensor with a numpy boolean array. It works today but slightly undermines the goal of avoiding cross-framework indexing. You could instead keep the whole `i == j` masking logic in numpy (e.g., have `mask` as a numpy `bool_` array from the start, only converting to torch right before building tensors); a standalone sketch of that variant follows below.
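For illustration, a minimal standalone sketch of that all-numpy variant; the arrays here are hypothetical placeholders, not the actual neighbor-list output in `AtomicData.py`:

```python
import numpy as np
import torch

# Hypothetical neighbor-list output (placeholders for illustration only).
first_idex = np.array([0, 0, 1, 1])
second_idex = np.array([0, 1, 0, 1])
shifts = np.zeros((4, 3))

# Keep the i == j masking entirely in numpy: no torch tensor is used as an index.
mask = np.ones(len(first_idex), dtype=bool)
mask[first_idex == second_idex] = False   # numpy-only analogue of the masking step

first_idex = first_idex[mask]
second_idex = second_idex[mask]
shifts = shifts[mask]

# Convert to torch only at the end, when the output tensors are built.
edge_index = torch.stack(
    (torch.as_tensor(first_idex, dtype=torch.long),
     torch.as_tensor(second_idex, dtype=torch.long)),
    dim=0,
)
shifts_t = torch.as_tensor(shifts, dtype=torch.get_default_dtype())
print(edge_index.shape, shifts_t.shape)   # torch.Size([2, 2]) torch.Size([2, 3])
```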
1052-1056: Avoid double conversion and keep `edge_index` on the same device as `pos`

Using `mask_np = mask.numpy()` and then

```python
first_idex = torch.as_tensor(first_idex[mask_np], dtype=torch.long, device=out_device)
second_idex = torch.as_tensor(second_idex[mask_np], dtype=torch.long, device=out_device)
shifts = torch.as_tensor(shifts[mask_np], dtype=out_dtype, device=out_device)
```

is aligned with the goal of returning proper tensors on `out_device`. However, `edge_index` is then built with

```python
edge_index = torch.vstack(
    (torch.LongTensor(first_idex), torch.LongTensor(second_idex))
)
```

which forces `edge_index` back to CPU and incurs an extra copy if `out_device` is not CPU, while `shifts` and `cell_tensor` stay on `out_device`. This also contradicts the docstring claim that all outputs share the same device as `pos`.

You could instead reuse the already-typed tensors and preserve the device:

```python
edge_index = torch.stack((first_idex, second_idex), dim=0)
```

and, if you adopt this, you no longer need the `torch.LongTensor(...)` conversions.

Also applies to: 1064-1066
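For a concrete illustration of the device behavior, here is a small standalone example; the arrays and device choice are placeholders, and it falls back to CPU when no GPU is available:

```python
import numpy as np
import torch

# Placeholder data and device; the real function receives these as arguments.
out_device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
first_idex = np.array([0, 1, 2])
second_idex = np.array([1, 2, 0])

first_t = torch.as_tensor(first_idex, dtype=torch.long, device=out_device)
second_t = torch.as_tensor(second_idex, dtype=torch.long, device=out_device)

# torch.stack reuses the already-typed tensors and keeps them on out_device ...
edge_index = torch.stack((first_t, second_t), dim=0)
print(edge_index.device)   # same device as out_device

# ... while the legacy constructor always produces a fresh CPU tensor (extra copy).
edge_index_cpu = torch.vstack((torch.LongTensor(first_idex), torch.LongTensor(second_idex)))
print(edge_index_cpu.device)   # cpu
```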
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dptb/data/AtomicData.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dptb/data/AtomicData.py (2)
dptb/data/dataset/lmdb_dataset.py (1)
len (79-80)
dptb/data/dataset/_base_datasets.py (1)
len(170-173)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
Pull request overview
This PR fixes mask handling issues in the `neighbor_list_and_relative_vec` function in `AtomicData.py`, particularly for isolated systems when `self_interaction=False`. The changes address compatibility issues between PyTorch and NumPy array indexing, and also fix a bug where `torch.LongTensor()` was incorrectly called with a `device` parameter.
Key changes:
- Convert torch boolean masks to numpy arrays before indexing numpy arrays to avoid compatibility issues
- Add `np.atleast_1d()` calls to ensure arrays are not scalars in edge cases with isolated systems
- Replace buggy `torch.LongTensor(..., device=...)` calls with `torch.as_tensor(..., dtype=torch.long, device=...)`
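To make the last bullet concrete, here is a minimal sketch of the replacement pattern; the variable names and values are placeholders, not the exact ones in `AtomicData.py`:

```python
import numpy as np
import torch

out_device = torch.device("cpu")      # placeholder target device
idx_np = np.array([0, 1])             # placeholder index array
shifts_np = np.zeros((2, 3))          # placeholder shift vectors

# The call this PR removes looked roughly like
#     torch.LongTensor(idx_np, device=out_device)
# which, per the PR description, does not handle the device argument correctly.
# The replacement uses torch.as_tensor with explicit dtype and device:
first_idex = torch.as_tensor(idx_np, dtype=torch.long, device=out_device)
shifts = torch.as_tensor(shifts_np, dtype=torch.get_default_dtype(), device=out_device)
print(first_idex.dtype, shifts.dtype, first_idex.device)   # torch.int64 torch.float32 cpu
```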
dptb/data/AtomicData.py
Outdated
```python
# Ensure arrays are proper numpy arrays (not scalars) for isolated systems
o_first_idex = np.atleast_1d(o_first_idex)
o_second_idex = np.atleast_1d(o_second_idex)
o_shift = np.atleast_1d(o_shift)
```
Copilot AI (Nov 26, 2025)
Using `np.atleast_1d` on `o_shift` is incorrect. The shifts array from ASE has shape (n, 3), representing shift vectors. When indexing with `~mask_np` results in a single match, `o_shift` would have shape (3,), representing one vector. Applying `np.atleast_1d` keeps it as (3,), which breaks the loop logic on lines 1031 and 1042 where `o_shift[i]` expects to index into the first dimension to get a vector, but would instead get a scalar. Use `o_shift = np.atleast_2d(o_shift)` or ensure the shape is preserved as (1, 3) for isolated systems.
Suggested change:

```diff
- o_shift = np.atleast_1d(o_shift)
+ o_shift = np.atleast_2d(o_shift)
```
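A quick standalone shape check of the two calls (illustrative only, not the PR code):

```python
import numpy as np

# A single 3-component shift vector, as discussed in the comment above.
o_shift = np.array([0.0, 0.0, 1.0])           # shape (3,)

print(np.atleast_1d(o_shift).shape)           # (3,)  -> o_shift[i] would be a scalar
print(np.atleast_2d(o_shift).shape)           # (1, 3) -> o_shift[0] is a full 3-vector
```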
Actionable comments posted: 0
🧹 Nitpick comments (1)
dptb/data/AtomicData.py (1)
1013-1044: Self-edge extraction and `np.atleast_2d` usage look correct; consider making `.numpy()` device-agnostic

The updated flow, using `mask_np` for numpy indexing and `np.atleast_2d(o_shift)`, correctly avoids torch/numpy masking issues and ensures `o_shift[i]` is always a 3-component vector even when there's only a single `i == j` edge, which fixes the earlier shape bug in the loop.

To make this more robust in environments that set a non-CPU default device (e.g. via `torch.set_default_device`), it would be safer to move `mask` explicitly to CPU before converting to numpy:

```diff
- mask_np = mask.numpy()
+ mask_np = mask.cpu().numpy()
```

and likewise for the later occurrence before re-masking at line 1053. This avoids accidental failures if `mask` ever ends up on a CUDA device.
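A tiny standalone check of that point (illustrative; falls back to CPU when no GPU is present):

```python
import torch

mask = torch.tensor([True, False, True])
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
mask = mask.to(device)

# Tensor.numpy() only works on CPU tensors; calling it on a CUDA tensor raises.
# Moving to CPU first, as suggested above, is safe on either device.
mask_np = mask.cpu().numpy()
print(mask_np, mask_np.dtype)   # [ True False  True] bool
```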
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dptb/data/AtomicData.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (1)
dptb/data/AtomicData.py (1)
1052-1056: Final tensor reconstruction from NumPy selections is consistent with the original API

Rebuilding `first_idex`, `second_idex`, and `shifts` via `mask_np` and then converting them back to torch tensors on `out_device` with the appropriate dtypes preserves the original function contract while fixing the mixed torch/numpy masking issues. This segment looks good to me.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
For issue #285, I fix potential problems in AtomicData.py when treating isolated systems with `self_interaction=False`.