-
Notifications
You must be signed in to change notification settings - Fork 7
Enable total energy extractor from ABACUS output #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…igenvalue for k-point extraction
…`, `nscf` and `relax` )
dftio/io/abacus/abacus_parser.py
Outdated
| break | ||
|
|
||
| # Extract energy | ||
| energy = self._extract_energy_from_log(loglines, mode, dump_freq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are unconverged snapshots in MD or relax, the energy array will be smaller than the nframe, this might lead the output energy lose the correspondance with other structural or electronic properties output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This convention is adopted from dpdata which directly ignores the unconverged frames' energy. However, in dftio, I still choose to output the complete frames's structure and hamiltonians, etc. , to keep the original functions working well. Further development is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporarily, the unconverged frames' index would be written in results_dir.
📝 WalkthroughWalkthroughAdds total-energy extraction and unconverged MD-frame reporting to ABACUS parsing and output writers; refactors Siesta file discovery and eigenvalue parsing (unit conversion); exposes an Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI Handler
participant Parser as Parser.write()
participant Abacus as AbacusParser.get_etot()
participant Log as ABACUS log file
participant Writer as Output Writer (DAT / LMDB)
participant FS as File System
CLI->>Parser: write(idx, ..., energy=True)
Parser->>Abacus: get_etot(idx)
Abacus->>FS: _get_output_dir(idx) & open log file
FS-->>Abacus: loglines
Abacus->>Abacus: _extract_energy_from_log(loglines, mode, dump_freq)
Abacus-->>Parser: { total_energy, unconverged_frames }
Parser->>Parser: build energy_frame_mapping (if trajectory)
alt energy data available
Parser->>Writer: write total_energy / unconverged_frames (DAT/LMDB)
Writer->>FS: write files / LMDB entries
Writer-->>Parser: success
else
Parser->>Parser: log warning (no energy)
end
Parser-->>CLI: done
sequenceDiagram
participant Files as Siesta files
participant Parser as SiestaParser
participant Finder as find_content() (static)
participant Conv as convert_kpoints_bohr_inv_to_twopi_over_a
Parser->>Finder: locate k-point/log content
Finder->>Files: open/search candidate files (.fdf, .bands, logs)
Files-->>Finder: matching content (kpoints in 1/Bohr)
Finder-->>Parser: kpoints_bohr_inv
Parser->>Conv: convert(kpoints_bohr_inv, lattice_a)
Conv-->>Parser: kpoints (2π/a units)
Parser->>Files: read .bands file
Files-->>Parser: eigenvalue blocks
Parser-->>Caller: { energies, kpoints }
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🪛 Ruff (0.14.5)dftio/io/abacus/abacus_parser.py76-77: Avoid specifying long messages outside the exception class (TRY003) 193-193: (S603) 194-194: Starting a process with a partial executable path (S607) 571-571: Avoid specifying long messages outside the exception class (TRY003) 604-604: Avoid specifying long messages outside the exception class (TRY003) 🔇 Additional comments (9)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dftio/io/siesta/siesta_parser.py (1)
491-536: density_matrix block mask is computed but not applied; keys/values can misalignYou compute block_mask but then update with all blocks:
block_mask = np.abs(block).max(axis=(1,2)) > cut_tol_dm if np.any(block_mask): keys = list(keys) keys = [keys[k] for k,t in enumerate(block_mask) if t] density_matrix_dict.update(dict(zip(keys, block))) # <- not slicedThis zips masked keys with the full block array (first axis), silently misassigning R‑blocks.
Fix:
- density_matrix_dict.update(dict(zip(keys, block))) + density_matrix_dict.update(dict(zip(keys, block[block_mask])))Mirror the same filtering used in Hamiltonian/Overlap. Add a unit test to catch this.
♻️ Duplicate comments (1)
test/data/abacus_relax/Si_ONCV_PBE-1.0.upf (1)
1-1226: Same licensing concern as MD UPF; consider de‑duplicationThis UPF duplicates the CC BY‑SA–licensed Si pseudopotential already added under
test/data/abacus_md. The same license‑compatibility check applies here; you might also consider de‑duplicating via a shared fixture or symlink if your test layout allows it.
🧹 Nitpick comments (18)
dftio/plot/plot_eigs.py (1)
84-88: Moveax.set_xticks([])outside the loop.The call to
ax.set_xticks([])on line 87 is inside theforloop, causing it to execute once per energy level. Since clearing the x-ticks only needs to happen once, move it outside the loop for better efficiency.Apply this diff to move the call outside the loop:
else: for e in self.eigs[0, bmin:bmax]: ax.hlines(e, xmin=-0.5, xmax=0.5, color='b', linewidth=1) - ax.set_xticks([]) + ax.set_xticks([]) ax.set_title("Energy levels")dftio/__main__.py (1)
156-161: Energy flag wiring is fine; validate downstream handling and consider short option styleThe new
-energy/--energyflag is correctly defined and will be forwarded viadict_argsintoParserRegisterandparser.write, assuming those call sites accept anenergykeyword.Please confirm that:
ParserRegister.__init__and.write()accept anenergy: boolkwarg (or**kwargs) and gate energy parsing on it.- Tests cover
dftio parse ... --energyfor at least ABACUS and Siesta paths.As a minor style nit, you might consider switching the short flag to a single letter (e.g.,
-E) to match typical argparse conventions.test/data/abacus_md/INPUT (1)
1-32: MD INPUT is valid; consider addingmd_dumpfreqif you want to test non‑default dump intervals.Current parameters are reasonable for an MD test, and
get_etotwill treatmd_dumpfreqas 1 since it’s not specified.
If you plan to cover non‑unit dump frequencies in tests, adding an explicitmd_dumpfreqline here (or in a separate dataset) would improve coverage.dftio/io/abacus/abacus_parser.py (1)
469-497: Make MD convergence detection more robust to message variants.For MD you currently treat only lines containing
"!! convergence has not been achieved"as unconverged steps, while SCF mode handles additional variants like"convergence has NOT been achieved!"and"convergence has not been achieved".To avoid silently missing unconverged MD frames on other ABACUS builds, you can reuse the broader pattern from the SCF branch:
- elif mode == "md": - # For MD, extract all energies at dump intervals - nenergy = 0 - for line in loglines: - if "final etot is" in line: - if nenergy % dump_freq == 0: - energy.append(float(line.split()[-2])) - nenergy += 1 - elif "!! convergence has not been achieved" in line: - if nenergy % dump_freq == 0: - energy.append(np.nan) - nenergy += 1 + elif mode == "md": + # For MD, extract all energies at dump intervals + nenergy = 0 + for line in loglines: + if "final etot is" in line: + if nenergy % dump_freq == 0: + energy.append(float(line.split()[-2])) + nenergy += 1 + elif ( + "!! convergence has not been achieved" in line + or "convergence has NOT been achieved!" in line + or "convergence has not been achieved" in line + ): + if nenergy % dump_freq == 0: + energy.append(np.nan) + nenergy += 1This keeps the current behavior but tolerates slightly different wording in the logs.
test/test_abacus_energy.py (3)
79-81: Drop unnecessary f-strings to satisfy Ruff F541 and keep the code cleanThe strings at the following locations are plain literals and don't need f-strings:
- Line 79:
f"Loaded energy doesn't match expected value"- Line 81:
print(f"DAT format energy write test passed")- Line 127:
f"LMDB energy doesn't match expected value"- Line 130:
print(f"LMDB format energy write test passed")You can simplify them as regular strings to quiet Ruff and reduce noise:
- assert np.isclose(loaded_energy[0], -1879.7169812, atol=1e-5), \ - f"Loaded energy doesn't match expected value" + assert np.isclose(loaded_energy[0], -1879.7169812, atol=1e-5), \ + "Loaded energy doesn't match expected value" @@ - print(f"DAT format energy write test passed") + print("DAT format energy write test passed") @@ - assert np.isclose(energy, -1879.7169812, atol=1e-5), \ - f"LMDB energy doesn't match expected value" + assert np.isclose(energy, -1879.7169812, atol=1e-5), \ + "LMDB energy doesn't match expected value" @@ - env.close() - print(f"LMDB format energy write test passed") + env.close() + print("LMDB format energy write test passed")As per static analysis hints
Also applies to: 127-130
195-197: Fix RELAX test comment to match the actual expectationThe comment says “4 frames in test data” but
expected_nframes = 1and the assertion checks for a single value:# Check shape - should be [nframes,] for RELAX (4 frames in test data) expected_nframes = 1Either update the comment to say 1 frame or, if there really are 4
!FINAL_ETOT_ISentries intended, changeexpected_nframesaccordingly. Right now the mismatch is misleading for anyone maintaining the tests.
301-309: Consider dropping the manual__main__test runnerThe explicit
if __name__ == "__main__": ...block duplicates pytest’s test discovery and isn’t typically needed in pytest-based suites. Removing it will keep the module focused on declarative tests and avoid accidental divergence between “manual” andpytestbehavior.If you keep it for local debugging, that’s fine, but it’s worth confirming it stays in sync whenever test signatures change.
test/test_siesta_parse.py (1)
112-128: Useisinstance()for type checks.The tests use
type(x) == dictfor type comparisons. Python best practice is to useisinstance(x, dict)instead, as it properly handles subclass relationships and is more idiomatic.Apply this diff to improve the type checks:
- assert type(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)) == tuple + assert isinstance(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False), tuple) - assert type(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)[0][0]) == dict + assert isinstance(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)[0][0], dict) assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['0_0_0_0_0'].shape == (15, 15) assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['2_3_0_-1_0'].shape == (15, 15) assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['0_0_0_0_0'][0,1] == pytest.approx(-2.3722997) assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['1_2_0_-1_0'][0,1] == pytest.approx(-4.234535e-07) - assert type(parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]) == dict + assert isinstance(parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0], dict) assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'].shape == (15, 15) assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['2_3_0_-1_0'].shape == (15, 15) assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,0] == pytest.approx(1.00000000) assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,1] == pytest.approx(0.9065072) assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,2] == pytest.approx(0.0) assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['1_2_0_1_0'][1,0] == pytest.approx(0.07965754) - assert type(parser.get_blocks(idx=0, hamiltonian=False, overlap=False, density_matrix=True)[2][0]) == dict + assert isinstance(parser.get_blocks(idx=0, hamiltonian=False, overlap=False, density_matrix=True)[2][0], dict)dftio/io/parse.py (3)
214-214: Remove unused**kwargsparameter.The
**kwargsparameter is declared but never used in the method body. If it's not needed for future extensibility, remove it to keep the signature clean.Apply this diff if kwargs is not needed:
- def write(self, idx, outroot, format, eigenvalue, hamiltonian, overlap, density_matrix, band_index_min, energy=False, **kwargs): + def write(self, idx, outroot, format, eigenvalue, hamiltonian, overlap, density_matrix, band_index_min, energy=False):Alternatively, if kwargs is intended for future use, add a docstring comment explaining this design decision.
287-287: Remove unnecessary f-string prefix.Line 287 uses an f-string without any placeholders. Use a regular string instead.
Apply this diff:
- log.warning(f"Parser does not implement get_etot method") + log.warning("Parser does not implement get_etot method")
335-335: Remove unnecessary f-string prefix.Line 335 uses an f-string without any placeholders. Use a regular string instead.
Apply this diff:
- log.warning(f"Parser does not implement get_etot method") + log.warning("Parser does not implement get_etot method")dftio/io/siesta/check_siesta.ipynb (3)
101-109: Clear notebook outputs before committingLarge embedded images/base64 inflate the repo and obscure the diff. Please clear all outputs (nbstripout or jupyter nbconvert --clear-output) and keep visuals generated at runtime. If needed, store heavy artifacts via LFS.
Also applies to: 171-181, 384-392, 426-434
158-166: Convert exploratory notebook code into automated testsMany cells perform assertions/comparisons ideal for pytest (e.g., block diffs, eigenvalue checks). Consider moving them to tests/test_siesta_parse.py with deterministic, project-relative inputs; keep the notebook slim as a usage demo.
Also applies to: 192-201, 217-226, 352-365, 372-389, 414-443, 544-563, 759-775
672-686: Avoid noisy prints; prefer warnings or loggingRaw print warnings reduce notebook signal. Use warnings.warn or logging at INFO/WARN with clear messages; optionally silence verbose third-party outputs where appropriate.
Also applies to: 767-775
dftio/io/siesta/siesta_parser.py (4)
321-330: Rename ambiguous variable ‘l’‘l’ is easy to confuse with ‘1’. Prefer ‘ell’ or ‘orb_l’. This also aligns with Ruff E741.
5-12: Duplicate import and minor hygiene
from collections import Counteris imported twice (lines 5 and 11). Remove the duplicate.
187-189: Use cleaner exception messagesLong, multi-line strings inline lower readability and trip static checks (TRY003). Prefer concise messages or define custom exceptions if needed.
Also applies to: 219-219, 253-253
178-189: Case-insensitive key matching for FDFSIESTA inputs are case-insensitive. In addition to the robust value parse, match the key with
re.IGNORECASE(as shown above) instead of'WriteKbands' in line.split().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
test/data/abacus_md/OUT.ABACUS/running_md.logis excluded by!**/*.logtest/data/abacus_md/OUT.ABACUS/warning.logis excluded by!**/*.logtest/data/abacus_relax/OUT.ABACUS/running_relax.logis excluded by!**/*.logtest/data/abacus_scf/OUT.ABACUS/BANDS_1.datis excluded by!**/*.dattest/data/abacus_scf/OUT.ABACUS/running_scf.logis excluded by!**/*.logtest/data/siesta/siesta_out_withband/out.logis excluded by!**/*.log
📒 Files selected for processing (31)
.gitignore(1 hunks)dftio/__main__.py(1 hunks)dftio/data/_keys.py(1 hunks)dftio/io/abacus/abacus_parser.py(2 hunks)dftio/io/parse.py(6 hunks)dftio/io/siesta/check_siesta.ipynb(16 hunks)dftio/io/siesta/siesta_parser.py(5 hunks)dftio/plot/plot_eigs.py(1 hunks)test/data/abacus_md/INPUT(1 hunks)test/data/abacus_md/KPT(1 hunks)test/data/abacus_md/OUT.ABACUS/INPUT(1 hunks)test/data/abacus_md/OUT.ABACUS/MD_dump(1 hunks)test/data/abacus_md/OUT.ABACUS/STRU.cif(1 hunks)test/data/abacus_md/STRU(1 hunks)test/data/abacus_md/Si_ONCV_PBE-1.0.upf(1 hunks)test/data/abacus_md/Si_gga_8au_60Ry_2s2p1d.orb(1 hunks)test/data/abacus_relax/INPUT(1 hunks)test/data/abacus_relax/KPT(1 hunks)test/data/abacus_relax/OUT.ABACUS/INPUT(1 hunks)test/data/abacus_relax/OUT.ABACUS/STRU.cif(1 hunks)test/data/abacus_relax/OUT.ABACUS/STRU_ION_D(1 hunks)test/data/abacus_relax/STRU(1 hunks)test/data/abacus_relax/Si_ONCV_PBE-1.0.upf(1 hunks)test/data/abacus_relax/Si_gga_8au_60Ry_2s2p1d.orb(1 hunks)test/data/siesta/siesta_out_simple/RUN.fdf(1 hunks)test/data/siesta/siesta_out_simple/STRUCT.fdf(1 hunks)test/data/siesta/siesta_out_withband/Au_cell.bands(1 hunks)test/data/siesta/siesta_out_withband/RUN_band.fdf(1 hunks)test/data/siesta/siesta_out_withband/STRUCT.fdf(1 hunks)test/test_abacus_energy.py(1 hunks)test/test_siesta_parse.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
test/test_abacus_energy.py (2)
dftio/io/abacus/abacus_parser.py (2)
get_etot(516-572)_extract_energy_from_log(421-514)dftio/io/parse.py (2)
write(214-222)formula(88-92)
test/test_siesta_parse.py (1)
dftio/io/siesta/siesta_parser.py (3)
SiestaParser(18-551)convert_kpoints_bohr_inv_to_twopi_over_a(28-51)find_content(54-111)
dftio/io/parse.py (1)
dftio/io/abacus/abacus_parser.py (2)
get_etot(516-572)get_structure(39-49)
dftio/io/siesta/siesta_parser.py (2)
dftio/io/abacus/abacus_parser.py (4)
get_structure(39-49)get_eigenvalue(61-91)get_basis(94-131)get_blocks(134-342)dftio/io/parse.py (3)
get_structure(78-79)get_eigenvalue(143-144)get_blocks(147-148)
🪛 Ruff (0.14.4)
dftio/io/abacus/abacus_parser.py
511-511: Avoid specifying long messages outside the exception class
(TRY003)
540-540: Avoid specifying long messages outside the exception class
(TRY003)
test/test_abacus_energy.py
79-79: f-string without any placeholders
Remove extraneous f prefix
(F541)
81-81: f-string without any placeholders
Remove extraneous f prefix
(F541)
121-121: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
127-127: f-string without any placeholders
Remove extraneous f prefix
(F541)
130-130: f-string without any placeholders
Remove extraneous f prefix
(F541)
test/test_siesta_parse.py
112-112: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
114-114: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
120-120: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
128-128: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
dftio/io/siesta/check_siesta.ipynb
231-231: Found useless expression. Either assign it to a variable or remove it.
(B018)
248-248: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
275-275: Ambiguous variable name: l
(E741)
293-293: Found useless expression. Either assign it to a variable or remove it.
(B018)
301-301: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
dftio/io/parse.py
214-214: Unused method argument: kwargs
(ARG002)
287-287: f-string without any placeholders
Remove extraneous f prefix
(F541)
335-335: f-string without any placeholders
Remove extraneous f prefix
(F541)
dftio/io/siesta/siesta_parser.py
58-59: Avoid specifying long messages outside the exception class
(TRY003)
92-92: Do not catch blind exception: Exception
(BLE001)
109-109: Avoid specifying long messages outside the exception class
(TRY003)
187-188: Avoid specifying long messages outside the exception class
(TRY003)
219-219: Avoid specifying long messages outside the exception class
(TRY003)
253-253: Avoid specifying long messages outside the exception class
(TRY003)
321-321: Ambiguous variable name: l
(E741)
🔇 Additional comments (19)
dftio/plot/plot_eigs.py (1)
80-83: LGTM! Clean band structure plotting logic.The conditional logic correctly handles the multiple k-points scenario with appropriate labels and title for band structure visualization.
test/data/abacus_md/KPT (1)
1-4: LGTM! Valid ABACUS K_POINTS format.The K-points specification correctly defines a single Gamma-point mesh, which is appropriate for test data supporting MD calculations.
.gitignore (1)
160-166: LGTM! Reasonable ignore patterns.The added patterns appropriately exclude generated test outputs, notebook execution state, temporary documentation, and experimental code from version control.
dftio/data/_keys.py (1)
106-106: LGTM! Well-defined constant for unconverged frame tracking.The new key follows naming conventions and is appropriately positioned. It's correctly excluded from
ALL_ENERGY_KEYSsince it represents frame metadata rather than an energy quantity, while still being discoverable viaALLOWED_KEYS.test/data/abacus_relax/INPUT (1)
1-23: LGTM! Valid ABACUS relaxation input.The input parameters are correctly formatted for ABACUS relaxation calculations. The convergence thresholds and computational parameters are appropriate for test data, and the included comments about energy cutoff convergence testing are helpful.
test/data/abacus_md/OUT.ABACUS/STRU.cif (1)
1-35: LGTM! Valid CIF structure file.The CIF file is correctly formatted and represents a silicon structure (likely diamond cubic) with 8 atoms. All required crystallographic information is present and properly structured for test data.
test/data/abacus_relax/OUT.ABACUS/STRU_ION_D (1)
1-22: LGTM! Valid ABACUS structure file.The STRU_ION_D file is correctly formatted with appropriate movement constraints for relaxation testing. The first atom is fixed while the second can move in the x-direction, which is suitable for testing relaxation energy extraction.
test/data/siesta/siesta_out_simple/STRUCT.fdf (1)
1-20: LGTM! Valid Siesta structure file.The FDF file is correctly formatted with proper block delimiters and represents a face-centered cubic gold structure. All required sections (LatticeVectors, AtomicCoordinates, ChemicalSpeciesLabel) are present and correctly structured for testing Siesta parsing.
test/data/abacus_relax/OUT.ABACUS/STRU.cif (1)
1-29: LGTM! Valid CIF structure file.The CIF file is correctly formatted and represents a two-atom silicon structure consistent with the corresponding STRU_ION_D file. All crystallographic information is properly specified for relaxation test scenarios.
test/data/abacus_md/STRU (1)
1-28: STRU fixture looks structurally consistentABACUS sections, lattice, and 8 Si positions all look well-formed and appropriate for MD tests; no issues from a parser/fixture perspective.
test/data/abacus_md/OUT.ABACUS/INPUT (1)
1-486: ABACUS MD INPUT test fixture is coherentParameter blocks and syntax follow ABACUS INPUT conventions and should exercise the MD parsing logic well; no structural problems spotted.
test/data/abacus_relax/KPT (1)
1-4: K_POINTS block is well‑formed for ABACUS relax tests.Single Gamma‑point 1×1×1 mesh is valid and sufficient for the added relax dataset; no issues found.
test/data/siesta/siesta_out_withband/STRUCT.fdf (1)
1-20: Siesta STRUCT.fdf structure looks consistent and parseable.Blocks (lattice, coordinates, species) are syntactically correct and self‑consistent for an Au cell; suitable as test data.
test/data/siesta/siesta_out_withband/Au_cell.bands (1)
1-496: Bands file layout matches expected Siesta.bandsformat.Header, band grid, and trailing high‑symmetry point table are consistent; should be parseable by the updated Siesta bands/eigenvalue logic.
test/data/abacus_relax/STRU (1)
1-24: ABACUS STRU file is syntactically correct and coherent.Species, orbital, lattice, and Cartesian positions are in the expected ABACUS format; suitable for the relax test case.
test/data/abacus_md/OUT.ABACUS/MD_dump (1)
1-187: MD dump format and content are consistent across steps.Each MDSTEP block has stable lattice data and 8‑atom tables with labeled position/force/velocity columns, matching ABACUS MD dump conventions.
dftio/io/abacus/abacus_parser.py (1)
16-19: Logger setup is appropriate.Adding a module‑level
logging.getLogger(__name__)is a good fit for emitting warnings fromget_etotwithout changing existing behavior elsewhere.dftio/io/parse.py (1)
330-380: Energy extraction and frame mapping logic looks solid.The implementation correctly handles:
- Detection of
get_etotmethod availability- Building frame index mapping for unconverged frames in MD/RELAX calculations
- Distinguishing between single-structure (SCF/NSCF) and trajectory cases
- Proper indexing when writing energy data to LMDB
The logic appropriately skips energy values for unconverged frames while maintaining correct alignment with structure frames.
dftio/io/siesta/siesta_parser.py (1)
155-156: Comment mismatchComment says “abacus does not allow non‑pbc structure” in the SIESTA parser. Update or remove to avoid confusion. [suggest_minor_issue]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
dftio/io/abacus/abacus_parser.py (1)
555-567: Make MDmd_dumpfreqdetection more robust and aligned withget_modeRight now
get_etotonly looks formd_dumpfreqinself.raw_datas[idx]/"INPUT", and assumesdump_freq = 1otherwise. This has two practical issues:
get_modereads fromOUT.ABACUS/INPUT, so it’s plausible some datasets only carry theOUT.ABACUS/INPUTcopy; in that case MD energies will always be treated as ifmd_dumpfreq = 1, which can desynchronize energies vs. the MD frames you expect to sample.int(line.split()[1])will raise if the line is malformed or missing the second token.To avoid silent misconfiguration and fragile parsing, you can try both candidate INPUT locations and defensively parse the value, falling back to
1on any error. For example:- # Get dump frequency for MD mode - dump_freq = 1 - if mode == "md": - input_path = os.path.join(self.raw_datas[idx], "INPUT") - if os.path.exists(input_path): - with open(input_path, 'r') as f: - for line in f: - if len(line) > 0 and "md_dumpfreq" in line and "md_dumpfreq" == line.split()[0]: - dump_freq = int(line.split()[1]) - break + # Get dump frequency for MD mode (default to 1 if not found) + dump_freq = 1 + if mode == "md": + # Prefer root INPUT, fall back to OUT.ABACUS/INPUT + input_candidates = [ + os.path.join(self.raw_datas[idx], "INPUT"), + os.path.join(self.raw_datas[idx], "OUT.ABACUS", "INPUT"), + ] + for input_path in input_candidates: + if not os.path.exists(input_path): + continue + try: + with open(input_path, "r") as f_in: + for line in f_in: + tokens = line.split() + if tokens and tokens[0] == "md_dumpfreq": + try: + dump_freq = int(tokens[1]) + except (IndexError, ValueError): + dump_freq = 1 + break + except OSError: + # Fall back to default dump_freq = 1 on read errors + continue + # Stop once we've inspected the first existing INPUT file + breakThis keeps behavior unchanged in the common case while avoiding misaligned MD energies when only one INPUT location is present or the md_dumpfreq line is slightly malformed.
🧹 Nitpick comments (1)
dftio/io/abacus/abacus_parser.py (1)
420-520: Energy‑extraction helper is sound; consider minor robustness improvementsThe mode‑specific parsing logic (SCF/NSCF reversed scan, MD with NaN‑masking and dump‑frequency handling, RELAX with an explicit
ValueErroron failure) looks consistent with the documented behavior and aligns with howget_etotconsumes the result.If you want to make this a bit easier to debug in practice, you could:
- Emit a
log.warningorlog.debugwhen returningNonefor SCF/NSCF/MD (no energies found or all MD frames unconverged), so callers can distinguish “no data” from “parser never ran”.- Optionally narrow the RELAX failure message to include the mode and file context (e.g. mode, log path) if you later catch this higher up, though current behavior is already functionally correct.
These are polish‑level tweaks; the core implementation is fine as is.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dftio/io/abacus/abacus_parser.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
dftio/io/abacus/abacus_parser.py
516-516: Avoid specifying long messages outside the exception class
(TRY003)
549-549: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
dftio/io/abacus/abacus_parser.py (1)
16-18: Module‑level logger wiring looks goodImporting
loggingand creating a module‑levellog = logging.getLogger(__name__)is idiomatic and fits how you use logging later inget_etot.
|
To solve #21 , I add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
dftio/io/abacus/abacus_parser.py (1)
607-616: Add fallback to OUT.ABACUS/INPUT for md_dumpfreq lookup.Currently, only
self.raw_datas[idx]/INPUTis checked, butget_modereads from_get_output_dir(idx)/INPUT(typicallyOUT.ABACUS/INPUT). If only the output directory copy exists,dump_freqwill incorrectly default to 1, potentially desynchronizing energies from frames whenmd_dumpfreq > 1.Apply this diff to check both locations:
dump_freq = 1 if mode == "md": - input_path = os.path.join(self.raw_datas[idx], "INPUT") - if os.path.exists(input_path): - with open(input_path, 'r') as f: - for line in f: - if len(line) > 0 and "md_dumpfreq" in line and "md_dumpfreq" == line.split()[0]: - dump_freq = int(line.split()[1]) - break + # Try root INPUT first, then fall back to OUT.ABACUS/INPUT + input_candidates = [ + os.path.join(self.raw_datas[idx], "INPUT"), + os.path.join(self._get_output_dir(idx), "INPUT"), + ] + for input_path in input_candidates: + if not os.path.exists(input_path): + continue + with open(input_path, 'r') as f: + for line in f: + tokens = line.split() + if tokens and tokens[0] == "md_dumpfreq": + dump_freq = int(tokens[1]) + break + if dump_freq != 1: + breakThis also makes the parsing more robust by checking that
tokensis non-empty before indexing.Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dftio/io/abacus/abacus_parser.py(16 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
dftio/io/abacus/abacus_parser.py
76-77: Avoid specifying long messages outside the exception class
(TRY003)
192-192: Starting a process with a shell, possible injection detected
(S605)
567-567: Avoid specifying long messages outside the exception class
(TRY003)
600-600: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (4)
dftio/io/abacus/abacus_parser.py (4)
16-19: LGTM!The new imports are appropriate and correctly used throughout the added functionality.
39-87: Well-structured output directory discovery.The fallback logic (OUT.ABACUS → OUT.* pattern → most recently modified) is robust and handles multiple deployment scenarios gracefully.
471-571: LGTM!The energy extraction logic correctly handles the different ABACUS calculation modes, properly filters unconverged MD frames, and raises appropriate errors for failed RELAX calculations.
35-35: LGTM!The refactoring to use
_get_output_diris consistently applied across all methods, centralizing output directory discovery and improving maintainability.Also applies to: 102-102, 114-119, 149-149, 190-193, 256-289, 305-389
For #19 , I add method
get_etotin abacus_parser.py and related unit tests.get_etotenable total energy extraction from ABACUS inSCF/NSCF/md/relaxcases.Summary by CodeRabbit
New Features
Chores
Tests
Data