Skip to content

Conversation

@AsymmetryChou
Copy link
Collaborator

@AsymmetryChou AsymmetryChou commented Nov 3, 2025

For #19 , I add method get_etot in abacus_parser.py and related unit tests.

  • get_etot enable total energy extraction from ABACUS in SCF/NSCF/md/relax cases.
  • All related unit tests and data are provided.

Summary by CodeRabbit

  • New Features

    • CLI option to include total energies on export; writers now export total energy and unconverged-frame info.
    • Siesta parsing: eigenvalues and k‑points extraction; band plotting now shows energy levels for single k‑point cases.
  • Chores

    • Updated ignore rules to exclude test artifacts, notebooks, and playground files.
  • Tests

    • Added tests covering energy extraction and Siesta parsing.
  • Data

    • Added multiple ABACUS and Siesta example/input/output datasets.

@AsymmetryChou AsymmetryChou added the enhancement New feature or request label Nov 3, 2025
break

# Extract energy
energy = self._extract_energy_from_log(loglines, mode, dump_freq)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are unconverged snapshots in MD or relax, the energy array will be smaller than the nframe, this might lead the output energy lose the correspondance with other structural or electronic properties output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This convention is adopted from dpdata which directly ignores the unconverged frames' energy. However, in dftio, I still choose to output the complete frames's structure and hamiltonians, etc. , to keep the original functions working well. Further development is needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporarily, the unconverged frames' index would be written in results_dir.

@coderabbitai
Copy link

coderabbitai bot commented Nov 16, 2025

📝 Walkthrough

Walkthrough

Adds total-energy extraction and unconverged MD-frame reporting to ABACUS parsing and output writers; refactors Siesta file discovery and eigenvalue parsing (unit conversion); exposes an --energy CLI flag; updates plotting for single k-point; adds many test fixtures, notebooks, and unit tests.

Changes

Cohort / File(s) Summary
CLI Enhancement
dftio/__main__.py
Added -energy / --energy flag to the parse subcommand to request total-energy extraction when writing outputs.
Keys & Constants
dftio/data/_keys.py
Added UNCONVERGED_FRAME_INDICES_KEY ("unconverged_frames") constant for reporting unconverged MD frames.
ABACUS parser
dftio/io/abacus/abacus_parser.py
Added module logger; _get_output_dir(self, idx) to discover ABACUS output dirs; _extract_energy_from_log(loglines, mode, dump_freq=1) to parse SCF/MD/RELAX energies and detect unconverged frames; extended get_etot to return {TOTAL_ENERGY_KEY, UNCONVERGED_FRAME_INDICES_KEY}; updated path resolution to use discovered output dir.
Output writers / Parse pipeline
dftio/io/parse.py
Propagated energy flag through Parser.write, write_dat, and write_lmdb; added logic to call get_etot() when energy=True, write total_energy and unconverged_frames to DAT, and populate per-frame TOTAL_ENERGY_KEY entries in LMDB with mapping that accounts for trajectories and unconverged frames.
Siesta parser refactor & eigenvalues
dftio/io/siesta/siesta_parser.py
Made find_content a static method with improved candidate selection and error handling; added convert_kpoints_bohr_inv_to_twopi_over_a; added get_eigenvalue returning energies and k-points (unit conversion); updated structure/basis/blocks retrieval to use the new discovery helper.
Plotting
dftio/plot/plot_eigs.py
Conditional plotting: multi k-point -> band-structure; single k-point -> horizontal energy-level plot with adjusted labels/titles.
Repository ignores
.gitignore
Added ignore rules for test artifacts and auxiliary files (test/data/siesta/siesta_out/*, example/siesta_io/siesta_io.ipynb, playground/*, CLAUDE.md, etc.).
Test fixtures — ABACUS MD
test/data/abacus_md/*
Added ABACUS MD dataset files: INPUT, KPT, STRU, OUT.ABACUS/INPUT, OUT.ABACUS/MD_dump, OUT.ABACUS/STRU.cif, plus Si_ONCV_PBE-1.0.upf and .orb orbital files.
Test fixtures — ABACUS Relax
test/data/abacus_relax/*
Added relaxation dataset files: INPUT, KPT, STRU, OUT.ABACUS/* (INPUT, STRU.cif, STRU_ION_D), plus Si_ONCV_PBE-1.0.upf and .orb orbital files.
Test fixtures — Siesta
test/data/siesta/*, dftio/io/siesta/check_siesta.ipynb
Added Siesta input/structure/band output files for Au examples and updated notebook cells (expanded parsing/inspection outputs).
New unit tests
test/test_abacus_energy.py, test/test_siesta_parse.py
Added tests for ABACUS energy extraction (SCF/MD/RELAX, unconverged frames, DAT/LMDB writing) and Siesta parsing (k-point conversion, content discovery, eigenvalues, structure/basis/blocks).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Handler
    participant Parser as Parser.write()
    participant Abacus as AbacusParser.get_etot()
    participant Log as ABACUS log file
    participant Writer as Output Writer (DAT / LMDB)
    participant FS as File System

    CLI->>Parser: write(idx, ..., energy=True)
    Parser->>Abacus: get_etot(idx)
    Abacus->>FS: _get_output_dir(idx) & open log file
    FS-->>Abacus: loglines
    Abacus->>Abacus: _extract_energy_from_log(loglines, mode, dump_freq)
    Abacus-->>Parser: { total_energy, unconverged_frames }
    Parser->>Parser: build energy_frame_mapping (if trajectory)
    alt energy data available
        Parser->>Writer: write total_energy / unconverged_frames (DAT/LMDB)
        Writer->>FS: write files / LMDB entries
        Writer-->>Parser: success
    else
        Parser->>Parser: log warning (no energy)
    end
    Parser-->>CLI: done
Loading
sequenceDiagram
    participant Files as Siesta files
    participant Parser as SiestaParser
    participant Finder as find_content() (static)
    participant Conv as convert_kpoints_bohr_inv_to_twopi_over_a

    Parser->>Finder: locate k-point/log content
    Finder->>Files: open/search candidate files (.fdf, .bands, logs)
    Files-->>Finder: matching content (kpoints in 1/Bohr)
    Finder-->>Parser: kpoints_bohr_inv
    Parser->>Conv: convert(kpoints_bohr_inv, lattice_a)
    Conv-->>Parser: kpoints (2π/a units)
    Parser->>Files: read .bands file
    Files-->>Parser: eigenvalue blocks
    Parser-->>Caller: { energies, kpoints }
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • dftio/io/abacus/abacus_parser.py — correctness of _extract_energy_from_log for SCF/MD/RELAX, dump-frequency handling, unconverged-frame detection, and logging/error paths.
    • dftio/io/parse.py — verify energy propagation, energy_frame_mapping alignment for trajectory vs single-structure cases, and LMDB/DAT write semantics.
    • dftio/io/siesta/siesta_parser.py — ensure all call sites are updated for the static find_content signature and validate k-point unit conversion and eigenvalue parsing.
    • New tests (test/test_abacus_energy.py, test/test_siesta_parse.py) — confirm fixtures match parser expectations and are environment-independent.
    • Large fixture files (*.upf, *.orb) — spot-check parser compatibility and intended inclusion.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main change: adding total energy extraction capability from ABACUS output. It directly reflects the primary objective (get_etot method) and the primary file modified (abacus_parser.py).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd216ce and 8852590.

📒 Files selected for processing (1)
  • dftio/io/abacus/abacus_parser.py (16 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
dftio/io/abacus/abacus_parser.py

76-77: Avoid specifying long messages outside the exception class

(TRY003)


193-193: subprocess call: check for execution of untrusted input

(S603)


194-194: Starting a process with a partial executable path

(S607)


571-571: Avoid specifying long messages outside the exception class

(TRY003)


604-604: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)
dftio/io/abacus/abacus_parser.py (9)

16-22: LGTM! Clean import additions and logger setup.

The new imports (logging, glob, subprocess) are all used appropriately in the code, and the module-level logger follows Python best practices.


39-87: Well-designed output directory discovery with sensible fallback logic.

The method correctly handles the issue #21 requirement by falling back to OUT.* patterns when OUT.ABACUS is not found. The multiple-match handling (using most recent modification time) is a reasonable heuristic.


35-35: LGTM! Correctly uses the new output directory discovery.

The change properly integrates _get_output_dir for structure loading in SCF/NSCF modes.


102-102: LGTM! Consistent use of output directory discovery.


114-119: LGTM! Eigenvalue extraction correctly refactored.


149-149: LGTM! Basis extraction updated consistently.


190-196: Excellent security improvement replacing os.system with subprocess.run.

The subprocess call is properly structured with a list of arguments and check=True to raise exceptions on failure. The static analysis warnings (S603/S607) are acceptable here since paths come from internal data structures.


260-393: LGTM! Comprehensive refactoring of matrix file access.

All matrix file paths now correctly use the discovered output directory, maintaining support for both hierarchical (matrix/) and flat file structures.


621-636: LGTM! Clean energy extraction and result handling.

The method properly calls the extraction helper, handles None results, logs warnings for unconverged frames, and returns a well-structured dictionary. The warning at line 631 provides valuable feedback to users about data quality.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dftio/io/siesta/siesta_parser.py (1)

491-536: density_matrix block mask is computed but not applied; keys/values can misalign

You compute block_mask but then update with all blocks:

block_mask = np.abs(block).max(axis=(1,2)) > cut_tol_dm
if np.any(block_mask):
    keys = list(keys)
    keys = [keys[k] for k,t in enumerate(block_mask) if t]
    density_matrix_dict.update(dict(zip(keys, block)))  # <- not sliced

This zips masked keys with the full block array (first axis), silently misassigning R‑blocks.

Fix:

-    density_matrix_dict.update(dict(zip(keys, block)))
+    density_matrix_dict.update(dict(zip(keys, block[block_mask])))

Mirror the same filtering used in Hamiltonian/Overlap. Add a unit test to catch this.

♻️ Duplicate comments (1)
test/data/abacus_relax/Si_ONCV_PBE-1.0.upf (1)

1-1226: Same licensing concern as MD UPF; consider de‑duplication

This UPF duplicates the CC BY‑SA–licensed Si pseudopotential already added under test/data/abacus_md. The same license‑compatibility check applies here; you might also consider de‑duplicating via a shared fixture or symlink if your test layout allows it.

🧹 Nitpick comments (18)
dftio/plot/plot_eigs.py (1)

84-88: Move ax.set_xticks([]) outside the loop.

The call to ax.set_xticks([]) on line 87 is inside the for loop, causing it to execute once per energy level. Since clearing the x-ticks only needs to happen once, move it outside the loop for better efficiency.

Apply this diff to move the call outside the loop:

     else:
         for e in self.eigs[0, bmin:bmax]:
             ax.hlines(e, xmin=-0.5, xmax=0.5, color='b', linewidth=1)
-            ax.set_xticks([])
+        ax.set_xticks([])
         ax.set_title("Energy levels")
dftio/__main__.py (1)

156-161: Energy flag wiring is fine; validate downstream handling and consider short option style

The new -energy / --energy flag is correctly defined and will be forwarded via dict_args into ParserRegister and parser.write, assuming those call sites accept an energy keyword.

Please confirm that:

  • ParserRegister.__init__ and .write() accept an energy: bool kwarg (or **kwargs) and gate energy parsing on it.
  • Tests cover dftio parse ... --energy for at least ABACUS and Siesta paths.

As a minor style nit, you might consider switching the short flag to a single letter (e.g., -E) to match typical argparse conventions.

test/data/abacus_md/INPUT (1)

1-32: MD INPUT is valid; consider adding md_dumpfreq if you want to test non‑default dump intervals.

Current parameters are reasonable for an MD test, and get_etot will treat md_dumpfreq as 1 since it’s not specified.
If you plan to cover non‑unit dump frequencies in tests, adding an explicit md_dumpfreq line here (or in a separate dataset) would improve coverage.

dftio/io/abacus/abacus_parser.py (1)

469-497: Make MD convergence detection more robust to message variants.

For MD you currently treat only lines containing "!! convergence has not been achieved" as unconverged steps, while SCF mode handles additional variants like "convergence has NOT been achieved!" and "convergence has not been achieved".

To avoid silently missing unconverged MD frames on other ABACUS builds, you can reuse the broader pattern from the SCF branch:

-        elif mode == "md":
-            # For MD, extract all energies at dump intervals
-            nenergy = 0
-            for line in loglines:
-                if "final etot is" in line:
-                    if nenergy % dump_freq == 0:
-                        energy.append(float(line.split()[-2]))
-                    nenergy += 1
-                elif "!! convergence has not been achieved" in line:
-                    if nenergy % dump_freq == 0:
-                        energy.append(np.nan)
-                    nenergy += 1
+        elif mode == "md":
+            # For MD, extract all energies at dump intervals
+            nenergy = 0
+            for line in loglines:
+                if "final etot is" in line:
+                    if nenergy % dump_freq == 0:
+                        energy.append(float(line.split()[-2]))
+                    nenergy += 1
+                elif (
+                    "!! convergence has not been achieved" in line
+                    or "convergence has NOT been achieved!" in line
+                    or "convergence has not been achieved" in line
+                ):
+                    if nenergy % dump_freq == 0:
+                        energy.append(np.nan)
+                    nenergy += 1

This keeps the current behavior but tolerates slightly different wording in the logs.

test/test_abacus_energy.py (3)

79-81: Drop unnecessary f-strings to satisfy Ruff F541 and keep the code clean

The strings at the following locations are plain literals and don't need f-strings:

  • Line 79: f"Loaded energy doesn't match expected value"
  • Line 81: print(f"DAT format energy write test passed")
  • Line 127: f"LMDB energy doesn't match expected value"
  • Line 130: print(f"LMDB format energy write test passed")

You can simplify them as regular strings to quiet Ruff and reduce noise:

-        assert np.isclose(loaded_energy[0], -1879.7169812, atol=1e-5), \
-            f"Loaded energy doesn't match expected value"
+        assert np.isclose(loaded_energy[0], -1879.7169812, atol=1e-5), \
+            "Loaded energy doesn't match expected value"
@@
-        print(f"DAT format energy write test passed")
+        print("DAT format energy write test passed")
@@
-            assert np.isclose(energy, -1879.7169812, atol=1e-5), \
-                f"LMDB energy doesn't match expected value"
+            assert np.isclose(energy, -1879.7169812, atol=1e-5), \
+                "LMDB energy doesn't match expected value"
@@
-        env.close()
-        print(f"LMDB format energy write test passed")
+        env.close()
+        print("LMDB format energy write test passed")

As per static analysis hints

Also applies to: 127-130


195-197: Fix RELAX test comment to match the actual expectation

The comment says “4 frames in test data” but expected_nframes = 1 and the assertion checks for a single value:

# Check shape - should be [nframes,] for RELAX (4 frames in test data)
expected_nframes = 1

Either update the comment to say 1 frame or, if there really are 4 !FINAL_ETOT_IS entries intended, change expected_nframes accordingly. Right now the mismatch is misleading for anyone maintaining the tests.


301-309: Consider dropping the manual __main__ test runner

The explicit if __name__ == "__main__": ... block duplicates pytest’s test discovery and isn’t typically needed in pytest-based suites. Removing it will keep the module focused on declarative tests and avoid accidental divergence between “manual” and pytest behavior.

If you keep it for local debugging, that’s fine, but it’s worth confirming it stays in sync whenever test signatures change.

test/test_siesta_parse.py (1)

112-128: Use isinstance() for type checks.

The tests use type(x) == dict for type comparisons. Python best practice is to use isinstance(x, dict) instead, as it properly handles subclass relationships and is more idiomatic.

Apply this diff to improve the type checks:

-    assert type(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)) == tuple  
+    assert isinstance(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False), tuple)

-    assert type(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)[0][0]) == dict
+    assert isinstance(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)[0][0], dict)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['0_0_0_0_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['2_3_0_-1_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['0_0_0_0_0'][0,1] == pytest.approx(-2.3722997)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['1_2_0_-1_0'][0,1] == pytest.approx(-4.234535e-07)

-    assert type(parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]) == dict
+    assert isinstance(parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0], dict)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['2_3_0_-1_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,0] == pytest.approx(1.00000000)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,1] == pytest.approx(0.9065072)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,2] == pytest.approx(0.0)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['1_2_0_1_0'][1,0] == pytest.approx(0.07965754)

-    assert type(parser.get_blocks(idx=0, hamiltonian=False, overlap=False, density_matrix=True)[2][0]) == dict
+    assert isinstance(parser.get_blocks(idx=0, hamiltonian=False, overlap=False, density_matrix=True)[2][0], dict)
dftio/io/parse.py (3)

214-214: Remove unused **kwargs parameter.

The **kwargs parameter is declared but never used in the method body. If it's not needed for future extensibility, remove it to keep the signature clean.

Apply this diff if kwargs is not needed:

-    def write(self, idx, outroot, format, eigenvalue, hamiltonian, overlap, density_matrix, band_index_min, energy=False, **kwargs):
+    def write(self, idx, outroot, format, eigenvalue, hamiltonian, overlap, density_matrix, band_index_min, energy=False):

Alternatively, if kwargs is intended for future use, add a docstring comment explaining this design decision.


287-287: Remove unnecessary f-string prefix.

Line 287 uses an f-string without any placeholders. Use a regular string instead.

Apply this diff:

-                log.warning(f"Parser does not implement get_etot method")
+                log.warning("Parser does not implement get_etot method")

335-335: Remove unnecessary f-string prefix.

Line 335 uses an f-string without any placeholders. Use a regular string instead.

Apply this diff:

-                log.warning(f"Parser does not implement get_etot method")
+                log.warning("Parser does not implement get_etot method")
dftio/io/siesta/check_siesta.ipynb (3)

101-109: Clear notebook outputs before committing

Large embedded images/base64 inflate the repo and obscure the diff. Please clear all outputs (nbstripout or jupyter nbconvert --clear-output) and keep visuals generated at runtime. If needed, store heavy artifacts via LFS.

Also applies to: 171-181, 384-392, 426-434


158-166: Convert exploratory notebook code into automated tests

Many cells perform assertions/comparisons ideal for pytest (e.g., block diffs, eigenvalue checks). Consider moving them to tests/test_siesta_parse.py with deterministic, project-relative inputs; keep the notebook slim as a usage demo.

Also applies to: 192-201, 217-226, 352-365, 372-389, 414-443, 544-563, 759-775


672-686: Avoid noisy prints; prefer warnings or logging

Raw print warnings reduce notebook signal. Use warnings.warn or logging at INFO/WARN with clear messages; optionally silence verbose third-party outputs where appropriate.

Also applies to: 767-775

dftio/io/siesta/siesta_parser.py (4)

321-330: Rename ambiguous variable ‘l’

‘l’ is easy to confuse with ‘1’. Prefer ‘ell’ or ‘orb_l’. This also aligns with Ruff E741.


5-12: Duplicate import and minor hygiene

from collections import Counter is imported twice (lines 5 and 11). Remove the duplicate.


187-189: Use cleaner exception messages

Long, multi-line strings inline lower readability and trip static checks (TRY003). Prefer concise messages or define custom exceptions if needed.

Also applies to: 219-219, 253-253


178-189: Case-insensitive key matching for FDF

SIESTA inputs are case-insensitive. In addition to the robust value parse, match the key with re.IGNORECASE (as shown above) instead of 'WriteKbands' in line.split().

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e69933 and 1ac08b0.

⛔ Files ignored due to path filters (6)
  • test/data/abacus_md/OUT.ABACUS/running_md.log is excluded by !**/*.log
  • test/data/abacus_md/OUT.ABACUS/warning.log is excluded by !**/*.log
  • test/data/abacus_relax/OUT.ABACUS/running_relax.log is excluded by !**/*.log
  • test/data/abacus_scf/OUT.ABACUS/BANDS_1.dat is excluded by !**/*.dat
  • test/data/abacus_scf/OUT.ABACUS/running_scf.log is excluded by !**/*.log
  • test/data/siesta/siesta_out_withband/out.log is excluded by !**/*.log
📒 Files selected for processing (31)
  • .gitignore (1 hunks)
  • dftio/__main__.py (1 hunks)
  • dftio/data/_keys.py (1 hunks)
  • dftio/io/abacus/abacus_parser.py (2 hunks)
  • dftio/io/parse.py (6 hunks)
  • dftio/io/siesta/check_siesta.ipynb (16 hunks)
  • dftio/io/siesta/siesta_parser.py (5 hunks)
  • dftio/plot/plot_eigs.py (1 hunks)
  • test/data/abacus_md/INPUT (1 hunks)
  • test/data/abacus_md/KPT (1 hunks)
  • test/data/abacus_md/OUT.ABACUS/INPUT (1 hunks)
  • test/data/abacus_md/OUT.ABACUS/MD_dump (1 hunks)
  • test/data/abacus_md/OUT.ABACUS/STRU.cif (1 hunks)
  • test/data/abacus_md/STRU (1 hunks)
  • test/data/abacus_md/Si_ONCV_PBE-1.0.upf (1 hunks)
  • test/data/abacus_md/Si_gga_8au_60Ry_2s2p1d.orb (1 hunks)
  • test/data/abacus_relax/INPUT (1 hunks)
  • test/data/abacus_relax/KPT (1 hunks)
  • test/data/abacus_relax/OUT.ABACUS/INPUT (1 hunks)
  • test/data/abacus_relax/OUT.ABACUS/STRU.cif (1 hunks)
  • test/data/abacus_relax/OUT.ABACUS/STRU_ION_D (1 hunks)
  • test/data/abacus_relax/STRU (1 hunks)
  • test/data/abacus_relax/Si_ONCV_PBE-1.0.upf (1 hunks)
  • test/data/abacus_relax/Si_gga_8au_60Ry_2s2p1d.orb (1 hunks)
  • test/data/siesta/siesta_out_simple/RUN.fdf (1 hunks)
  • test/data/siesta/siesta_out_simple/STRUCT.fdf (1 hunks)
  • test/data/siesta/siesta_out_withband/Au_cell.bands (1 hunks)
  • test/data/siesta/siesta_out_withband/RUN_band.fdf (1 hunks)
  • test/data/siesta/siesta_out_withband/STRUCT.fdf (1 hunks)
  • test/test_abacus_energy.py (1 hunks)
  • test/test_siesta_parse.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
test/test_abacus_energy.py (2)
dftio/io/abacus/abacus_parser.py (2)
  • get_etot (516-572)
  • _extract_energy_from_log (421-514)
dftio/io/parse.py (2)
  • write (214-222)
  • formula (88-92)
test/test_siesta_parse.py (1)
dftio/io/siesta/siesta_parser.py (3)
  • SiestaParser (18-551)
  • convert_kpoints_bohr_inv_to_twopi_over_a (28-51)
  • find_content (54-111)
dftio/io/parse.py (1)
dftio/io/abacus/abacus_parser.py (2)
  • get_etot (516-572)
  • get_structure (39-49)
dftio/io/siesta/siesta_parser.py (2)
dftio/io/abacus/abacus_parser.py (4)
  • get_structure (39-49)
  • get_eigenvalue (61-91)
  • get_basis (94-131)
  • get_blocks (134-342)
dftio/io/parse.py (3)
  • get_structure (78-79)
  • get_eigenvalue (143-144)
  • get_blocks (147-148)
🪛 Ruff (0.14.4)
dftio/io/abacus/abacus_parser.py

511-511: Avoid specifying long messages outside the exception class

(TRY003)


540-540: Avoid specifying long messages outside the exception class

(TRY003)

test/test_abacus_energy.py

79-79: f-string without any placeholders

Remove extraneous f prefix

(F541)


81-81: f-string without any placeholders

Remove extraneous f prefix

(F541)


121-121: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


127-127: f-string without any placeholders

Remove extraneous f prefix

(F541)


130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)

test/test_siesta_parse.py

112-112: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


114-114: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


120-120: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


128-128: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

dftio/io/siesta/check_siesta.ipynb

231-231: Found useless expression. Either assign it to a variable or remove it.

(B018)


248-248: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


275-275: Ambiguous variable name: l

(E741)


293-293: Found useless expression. Either assign it to a variable or remove it.

(B018)


301-301: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

dftio/io/parse.py

214-214: Unused method argument: kwargs

(ARG002)


287-287: f-string without any placeholders

Remove extraneous f prefix

(F541)


335-335: f-string without any placeholders

Remove extraneous f prefix

(F541)

dftio/io/siesta/siesta_parser.py

58-59: Avoid specifying long messages outside the exception class

(TRY003)


92-92: Do not catch blind exception: Exception

(BLE001)


109-109: Avoid specifying long messages outside the exception class

(TRY003)


187-188: Avoid specifying long messages outside the exception class

(TRY003)


219-219: Avoid specifying long messages outside the exception class

(TRY003)


253-253: Avoid specifying long messages outside the exception class

(TRY003)


321-321: Ambiguous variable name: l

(E741)

🔇 Additional comments (19)
dftio/plot/plot_eigs.py (1)

80-83: LGTM! Clean band structure plotting logic.

The conditional logic correctly handles the multiple k-points scenario with appropriate labels and title for band structure visualization.

test/data/abacus_md/KPT (1)

1-4: LGTM! Valid ABACUS K_POINTS format.

The K-points specification correctly defines a single Gamma-point mesh, which is appropriate for test data supporting MD calculations.

.gitignore (1)

160-166: LGTM! Reasonable ignore patterns.

The added patterns appropriately exclude generated test outputs, notebook execution state, temporary documentation, and experimental code from version control.

dftio/data/_keys.py (1)

106-106: LGTM! Well-defined constant for unconverged frame tracking.

The new key follows naming conventions and is appropriately positioned. It's correctly excluded from ALL_ENERGY_KEYS since it represents frame metadata rather than an energy quantity, while still being discoverable via ALLOWED_KEYS.

test/data/abacus_relax/INPUT (1)

1-23: LGTM! Valid ABACUS relaxation input.

The input parameters are correctly formatted for ABACUS relaxation calculations. The convergence thresholds and computational parameters are appropriate for test data, and the included comments about energy cutoff convergence testing are helpful.

test/data/abacus_md/OUT.ABACUS/STRU.cif (1)

1-35: LGTM! Valid CIF structure file.

The CIF file is correctly formatted and represents a silicon structure (likely diamond cubic) with 8 atoms. All required crystallographic information is present and properly structured for test data.

test/data/abacus_relax/OUT.ABACUS/STRU_ION_D (1)

1-22: LGTM! Valid ABACUS structure file.

The STRU_ION_D file is correctly formatted with appropriate movement constraints for relaxation testing. The first atom is fixed while the second can move in the x-direction, which is suitable for testing relaxation energy extraction.

test/data/siesta/siesta_out_simple/STRUCT.fdf (1)

1-20: LGTM! Valid Siesta structure file.

The FDF file is correctly formatted with proper block delimiters and represents a face-centered cubic gold structure. All required sections (LatticeVectors, AtomicCoordinates, ChemicalSpeciesLabel) are present and correctly structured for testing Siesta parsing.

test/data/abacus_relax/OUT.ABACUS/STRU.cif (1)

1-29: LGTM! Valid CIF structure file.

The CIF file is correctly formatted and represents a two-atom silicon structure consistent with the corresponding STRU_ION_D file. All crystallographic information is properly specified for relaxation test scenarios.

test/data/abacus_md/STRU (1)

1-28: STRU fixture looks structurally consistent

ABACUS sections, lattice, and 8 Si positions all look well-formed and appropriate for MD tests; no issues from a parser/fixture perspective.

test/data/abacus_md/OUT.ABACUS/INPUT (1)

1-486: ABACUS MD INPUT test fixture is coherent

Parameter blocks and syntax follow ABACUS INPUT conventions and should exercise the MD parsing logic well; no structural problems spotted.

test/data/abacus_relax/KPT (1)

1-4: K_POINTS block is well‑formed for ABACUS relax tests.

Single Gamma‑point 1×1×1 mesh is valid and sufficient for the added relax dataset; no issues found.

test/data/siesta/siesta_out_withband/STRUCT.fdf (1)

1-20: Siesta STRUCT.fdf structure looks consistent and parseable.

Blocks (lattice, coordinates, species) are syntactically correct and self‑consistent for an Au cell; suitable as test data.

test/data/siesta/siesta_out_withband/Au_cell.bands (1)

1-496: Bands file layout matches expected Siesta .bands format.

Header, band grid, and trailing high‑symmetry point table are consistent; should be parseable by the updated Siesta bands/eigenvalue logic.

test/data/abacus_relax/STRU (1)

1-24: ABACUS STRU file is syntactically correct and coherent.

Species, orbital, lattice, and Cartesian positions are in the expected ABACUS format; suitable for the relax test case.

test/data/abacus_md/OUT.ABACUS/MD_dump (1)

1-187: MD dump format and content are consistent across steps.

Each MDSTEP block has stable lattice data and 8‑atom tables with labeled position/force/velocity columns, matching ABACUS MD dump conventions.

dftio/io/abacus/abacus_parser.py (1)

16-19: Logger setup is appropriate.

Adding a module‑level logging.getLogger(__name__) is a good fit for emitting warnings from get_etot without changing existing behavior elsewhere.

dftio/io/parse.py (1)

330-380: Energy extraction and frame mapping logic looks solid.

The implementation correctly handles:

  • Detection of get_etot method availability
  • Building frame index mapping for unconverged frames in MD/RELAX calculations
  • Distinguishing between single-structure (SCF/NSCF) and trajectory cases
  • Proper indexing when writing energy data to LMDB

The logic appropriately skips energy values for unconverged frames while maintaining correct alignment with structure frames.

dftio/io/siesta/siesta_parser.py (1)

155-156: Comment mismatch

Comment says “abacus does not allow non‑pbc structure” in the SIESTA parser. Update or remove to avoid confusion. [suggest_minor_issue]

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
dftio/io/abacus/abacus_parser.py (1)

555-567: Make MD md_dumpfreq detection more robust and aligned with get_mode

Right now get_etot only looks for md_dumpfreq in self.raw_datas[idx]/"INPUT", and assumes dump_freq = 1 otherwise. This has two practical issues:

  • get_mode reads from OUT.ABACUS/INPUT, so it’s plausible some datasets only carry the OUT.ABACUS/INPUT copy; in that case MD energies will always be treated as if md_dumpfreq = 1, which can desynchronize energies vs. the MD frames you expect to sample.
  • int(line.split()[1]) will raise if the line is malformed or missing the second token.

To avoid silent misconfiguration and fragile parsing, you can try both candidate INPUT locations and defensively parse the value, falling back to 1 on any error. For example:

-        # Get dump frequency for MD mode
-        dump_freq = 1
-        if mode == "md":
-            input_path = os.path.join(self.raw_datas[idx], "INPUT")
-            if os.path.exists(input_path):
-                with open(input_path, 'r') as f:
-                    for line in f:
-                        if len(line) > 0 and "md_dumpfreq" in line and "md_dumpfreq" == line.split()[0]:
-                            dump_freq = int(line.split()[1])
-                            break
+        # Get dump frequency for MD mode (default to 1 if not found)
+        dump_freq = 1
+        if mode == "md":
+            # Prefer root INPUT, fall back to OUT.ABACUS/INPUT
+            input_candidates = [
+                os.path.join(self.raw_datas[idx], "INPUT"),
+                os.path.join(self.raw_datas[idx], "OUT.ABACUS", "INPUT"),
+            ]
+            for input_path in input_candidates:
+                if not os.path.exists(input_path):
+                    continue
+                try:
+                    with open(input_path, "r") as f_in:
+                        for line in f_in:
+                            tokens = line.split()
+                            if tokens and tokens[0] == "md_dumpfreq":
+                                try:
+                                    dump_freq = int(tokens[1])
+                                except (IndexError, ValueError):
+                                    dump_freq = 1
+                                break
+                except OSError:
+                    # Fall back to default dump_freq = 1 on read errors
+                    continue
+                # Stop once we've inspected the first existing INPUT file
+                break

This keeps behavior unchanged in the common case while avoiding misaligned MD energies when only one INPUT location is present or the md_dumpfreq line is slightly malformed.

🧹 Nitpick comments (1)
dftio/io/abacus/abacus_parser.py (1)

420-520: Energy‑extraction helper is sound; consider minor robustness improvements

The mode‑specific parsing logic (SCF/NSCF reversed scan, MD with NaN‑masking and dump‑frequency handling, RELAX with an explicit ValueError on failure) looks consistent with the documented behavior and aligns with how get_etot consumes the result.

If you want to make this a bit easier to debug in practice, you could:

  • Emit a log.warning or log.debug when returning None for SCF/NSCF/MD (no energies found or all MD frames unconverged), so callers can distinguish “no data” from “parser never ran”.
  • Optionally narrow the RELAX failure message to include the mode and file context (e.g. mode, log path) if you later catch this higher up, though current behavior is already functionally correct.

These are polish‑level tweaks; the core implementation is fine as is.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac08b0 and d5573f7.

📒 Files selected for processing (1)
  • dftio/io/abacus/abacus_parser.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
dftio/io/abacus/abacus_parser.py

516-516: Avoid specifying long messages outside the exception class

(TRY003)


549-549: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
dftio/io/abacus/abacus_parser.py (1)

16-18: Module‑level logger wiring looks good

Importing logging and creating a module‑level log = logging.getLogger(__name__) is idiomatic and fits how you use logging later in get_etot.

@AsymmetryChou
Copy link
Collaborator Author

To solve #21 , I add _get_output_dir for general suffix case. If OUT.ABACUS is not found, glob would search and adopt OUT.* as results directory.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
dftio/io/abacus/abacus_parser.py (1)

607-616: Add fallback to OUT.ABACUS/INPUT for md_dumpfreq lookup.

Currently, only self.raw_datas[idx]/INPUT is checked, but get_mode reads from _get_output_dir(idx)/INPUT (typically OUT.ABACUS/INPUT). If only the output directory copy exists, dump_freq will incorrectly default to 1, potentially desynchronizing energies from frames when md_dumpfreq > 1.

Apply this diff to check both locations:

 dump_freq = 1
 if mode == "md":
-    input_path = os.path.join(self.raw_datas[idx], "INPUT")
-    if os.path.exists(input_path):
-        with open(input_path, 'r') as f:
-            for line in f:
-                if len(line) > 0 and "md_dumpfreq" in line and "md_dumpfreq" == line.split()[0]:
-                    dump_freq = int(line.split()[1])
-                    break
+    # Try root INPUT first, then fall back to OUT.ABACUS/INPUT
+    input_candidates = [
+        os.path.join(self.raw_datas[idx], "INPUT"),
+        os.path.join(self._get_output_dir(idx), "INPUT"),
+    ]
+    for input_path in input_candidates:
+        if not os.path.exists(input_path):
+            continue
+        with open(input_path, 'r') as f:
+            for line in f:
+                tokens = line.split()
+                if tokens and tokens[0] == "md_dumpfreq":
+                    dump_freq = int(tokens[1])
+                    break
+        if dump_freq != 1:
+            break

This also makes the parsing more robust by checking that tokens is non-empty before indexing.

Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5573f7 and dd216ce.

📒 Files selected for processing (1)
  • dftio/io/abacus/abacus_parser.py (16 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
dftio/io/abacus/abacus_parser.py

76-77: Avoid specifying long messages outside the exception class

(TRY003)


192-192: Starting a process with a shell, possible injection detected

(S605)


567-567: Avoid specifying long messages outside the exception class

(TRY003)


600-600: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)
dftio/io/abacus/abacus_parser.py (4)

16-19: LGTM!

The new imports are appropriate and correctly used throughout the added functionality.


39-87: Well-structured output directory discovery.

The fallback logic (OUT.ABACUS → OUT.* pattern → most recently modified) is robust and handles multiple deployment scenarios gracefully.


471-571: LGTM!

The energy extraction logic correctly handles the different ABACUS calculation modes, properly filters unconverged MD frames, and raises appropriate errors for failed RELAX calculations.


35-35: LGTM!

The refactoring to use _get_output_dir is consistently applied across all methods, centralizing output directory discovery and improving maintainability.

Also applies to: 102-102, 114-119, 149-149, 190-193, 256-289, 305-389

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants