Enable total energy extractor from ABACUS output #20

AsymmetryChou · 2025-11-03T06:06:24Z

For #19 , I add method get_etot in abacus_parser.py and related unit tests.

get_etot enable total energy extraction from ABACUS in SCF/NSCF/md/relax cases.
All related unit tests and data are provided.

Summary by CodeRabbit

New Features
- CLI option to include total energies on export; writers now export total energy and unconverged-frame info.
- Siesta parsing: eigenvalues and k‑points extraction; band plotting now shows energy levels for single k‑point cases.
Chores
- Updated ignore rules to exclude test artifacts, notebooks, and playground files.
Tests
- Added tests covering energy extraction and Siesta parsing.
Data
- Added multiple ABACUS and Siesta example/input/output datasets.

…igenvalue for k-point extraction

…lue extraction

…`, `nscf` and `relax` )

…tic_method

floatingCatty · 2025-11-14T04:23:19Z

dftio/io/abacus/abacus_parser.py

+                            break
+
+        # Extract energy
+        energy = self._extract_energy_from_log(loglines, mode, dump_freq)


If there are unconverged snapshots in MD or relax, the energy array will be smaller than the nframe, this might lead the output energy lose the correspondance with other structural or electronic properties output.

This convention is adopted from dpdata which directly ignores the unconverged frames' energy. However, in dftio, I still choose to output the complete frames's structure and hamiltonians, etc. , to keep the original functions working well. Further development is needed.

Temporarily, the unconverged frames' index would be written in results_dir.

dftio/io/parse.py

coderabbitai · 2025-11-16T13:50:47Z

📝 Walkthrough

Walkthrough

Adds total-energy extraction and unconverged MD-frame reporting to ABACUS parsing and output writers; refactors Siesta file discovery and eigenvalue parsing (unit conversion); exposes an --energy CLI flag; updates plotting for single k-point; adds many test fixtures, notebooks, and unit tests.

Changes

Cohort / File(s)	Summary
CLI Enhancement `dftio/__main__.py`	Added `-energy` / `--energy` flag to the `parse` subcommand to request total-energy extraction when writing outputs.
Keys & Constants `dftio/data/_keys.py`	Added `UNCONVERGED_FRAME_INDICES_KEY` (`"unconverged_frames"`) constant for reporting unconverged MD frames.
ABACUS parser `dftio/io/abacus/abacus_parser.py`	Added module logger; `_get_output_dir(self, idx)` to discover ABACUS output dirs; `_extract_energy_from_log(loglines, mode, dump_freq=1)` to parse SCF/MD/RELAX energies and detect unconverged frames; extended `get_etot` to return `{TOTAL_ENERGY_KEY, UNCONVERGED_FRAME_INDICES_KEY}`; updated path resolution to use discovered output dir.
Output writers / Parse pipeline `dftio/io/parse.py`	Propagated `energy` flag through `Parser.write`, `write_dat`, and `write_lmdb`; added logic to call `get_etot()` when `energy=True`, write `total_energy` and `unconverged_frames` to DAT, and populate per-frame TOTAL_ENERGY_KEY entries in LMDB with mapping that accounts for trajectories and unconverged frames.
Siesta parser refactor & eigenvalues `dftio/io/siesta/siesta_parser.py`	Made `find_content` a static method with improved candidate selection and error handling; added `convert_kpoints_bohr_inv_to_twopi_over_a`; added `get_eigenvalue` returning energies and k-points (unit conversion); updated structure/basis/blocks retrieval to use the new discovery helper.
Plotting `dftio/plot/plot_eigs.py`	Conditional plotting: multi k-point -> band-structure; single k-point -> horizontal energy-level plot with adjusted labels/titles.
Repository ignores `.gitignore`	Added ignore rules for test artifacts and auxiliary files (`test/data/siesta/siesta_out/`, `example/siesta_io/siesta_io.ipynb`, `playground/`, `CLAUDE.md`, etc.).
Test fixtures — ABACUS MD `test/data/abacus_md/*`	Added ABACUS MD dataset files: `INPUT`, `KPT`, `STRU`, `OUT.ABACUS/INPUT`, `OUT.ABACUS/MD_dump`, `OUT.ABACUS/STRU.cif`, plus `Si_ONCV_PBE-1.0.upf` and `.orb` orbital files.
Test fixtures — ABACUS Relax `test/data/abacus_relax/*`	Added relaxation dataset files: `INPUT`, `KPT`, `STRU`, `OUT.ABACUS/*` (`INPUT`, `STRU.cif`, `STRU_ION_D`), plus `Si_ONCV_PBE-1.0.upf` and `.orb` orbital files.
Test fixtures — Siesta `test/data/siesta/*`, `dftio/io/siesta/check_siesta.ipynb`	Added Siesta input/structure/band output files for Au examples and updated notebook cells (expanded parsing/inspection outputs).
New unit tests `test/test_abacus_energy.py`, `test/test_siesta_parse.py`	Added tests for ABACUS energy extraction (SCF/MD/RELAX, unconverged frames, DAT/LMDB writing) and Siesta parsing (k-point conversion, content discovery, eigenvalues, structure/basis/blocks).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Handler
    participant Parser as Parser.write()
    participant Abacus as AbacusParser.get_etot()
    participant Log as ABACUS log file
    participant Writer as Output Writer (DAT / LMDB)
    participant FS as File System

    CLI->>Parser: write(idx, ..., energy=True)
    Parser->>Abacus: get_etot(idx)
    Abacus->>FS: _get_output_dir(idx) & open log file
    FS-->>Abacus: loglines
    Abacus->>Abacus: _extract_energy_from_log(loglines, mode, dump_freq)
    Abacus-->>Parser: { total_energy, unconverged_frames }
    Parser->>Parser: build energy_frame_mapping (if trajectory)
    alt energy data available
        Parser->>Writer: write total_energy / unconverged_frames (DAT/LMDB)
        Writer->>FS: write files / LMDB entries
        Writer-->>Parser: success
    else
        Parser->>Parser: log warning (no energy)
    end
    Parser-->>CLI: done

sequenceDiagram
    participant Files as Siesta files
    participant Parser as SiestaParser
    participant Finder as find_content() (static)
    participant Conv as convert_kpoints_bohr_inv_to_twopi_over_a

    Parser->>Finder: locate k-point/log content
    Finder->>Files: open/search candidate files (.fdf, .bands, logs)
    Files-->>Finder: matching content (kpoints in 1/Bohr)
    Finder-->>Parser: kpoints_bohr_inv
    Parser->>Conv: convert(kpoints_bohr_inv, lattice_a)
    Conv-->>Parser: kpoints (2π/a units)
    Parser->>Files: read .bands file
    Files-->>Parser: eigenvalue blocks
    Parser-->>Caller: { energies, kpoints }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas needing extra attention:
- dftio/io/abacus/abacus_parser.py — correctness of _extract_energy_from_log for SCF/MD/RELAX, dump-frequency handling, unconverged-frame detection, and logging/error paths.
- dftio/io/parse.py — verify energy propagation, energy_frame_mapping alignment for trajectory vs single-structure cases, and LMDB/DAT write semantics.
- dftio/io/siesta/siesta_parser.py — ensure all call sites are updated for the static find_content signature and validate k-point unit conversion and eigenvalue parsing.
- New tests (test/test_abacus_energy.py, test/test_siesta_parse.py) — confirm fixtures match parser expectations and are environment-independent.
- Large fixture files (*.upf, *.orb) — spot-check parser compatibility and intended inclusion.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.11% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately describes the main change: adding total energy extraction capability from ABACUS output. It directly reflects the primary objective (get_etot method) and the primary file modified (abacus_parser.py).

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd216ce and 8852590.

📒 Files selected for processing (1)

dftio/io/abacus/abacus_parser.py (16 hunks)

🧰 Additional context used

🪛 Ruff (0.14.5)

dftio/io/abacus/abacus_parser.py

76-77: Avoid specifying long messages outside the exception class

(TRY003)

193-193: subprocess call: check for execution of untrusted input

(S603)

194-194: Starting a process with a partial executable path

(S607)

571-571: Avoid specifying long messages outside the exception class

(TRY003)

604-604: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)

dftio/io/abacus/abacus_parser.py (9)

16-22: LGTM! Clean import additions and logger setup.

The new imports (logging, glob, subprocess) are all used appropriately in the code, and the module-level logger follows Python best practices.

39-87: Well-designed output directory discovery with sensible fallback logic.

The method correctly handles the issue #21 requirement by falling back to OUT.* patterns when OUT.ABACUS is not found. The multiple-match handling (using most recent modification time) is a reasonable heuristic.

35-35: LGTM! Correctly uses the new output directory discovery.

The change properly integrates _get_output_dir for structure loading in SCF/NSCF modes.

102-102: LGTM! Consistent use of output directory discovery.

114-119: LGTM! Eigenvalue extraction correctly refactored.

149-149: LGTM! Basis extraction updated consistently.

190-196: Excellent security improvement replacing os.system with subprocess.run.

The subprocess call is properly structured with a list of arguments and check=True to raise exceptions on failure. The static analysis warnings (S603/S607) are acceptable here since paths come from internal data structures.

260-393: LGTM! Comprehensive refactoring of matrix file access.

All matrix file paths now correctly use the discovered output directory, maintaining support for both hierarchical (matrix/) and flat file structures.

621-636: LGTM! Clean energy extraction and result handling.

The method properly calls the extraction helper, handles None results, logs warnings for unconverged frames, and returns a well-structured dictionary. The warning at line 631 provides valuable feedback to users about data quality.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

Provide your own instructions using the high_level_summary_instructions setting.
Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

📝 Description — Summarize the main change in 50–60 words, explaining what was done.

📓 References — List relevant issues, discussions, documentation, or related PRs.

📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.

📊 Contributor Summary — Include a Markdown table showing contributions:
| Contributor | Lines Added | Lines Removed | Files Changed |

✔️ Additional Notes — Add any extra reviewer context.
Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

dftio/io/siesta/siesta_parser.py (1)
491-536: density_matrix block mask is computed but not applied; keys/values can misalign

You compute block_mask but then update with all blocks:
block_mask = np.abs(block).max(axis=(1,2)) > cut_tol_dm
if np.any(block_mask):
    keys = list(keys)
    keys = [keys[k] for k,t in enumerate(block_mask) if t]
    density_matrix_dict.update(dict(zip(keys, block)))  # <- not sliced
This zips masked keys with the full block array (first axis), silently misassigning R‑blocks.

Fix:
-    density_matrix_dict.update(dict(zip(keys, block)))
+    density_matrix_dict.update(dict(zip(keys, block[block_mask])))
Mirror the same filtering used in Hamiltonian/Overlap. Add a unit test to catch this.

♻️ Duplicate comments (1)

test/data/abacus_relax/Si_ONCV_PBE-1.0.upf (1)

1-1226: Same licensing concern as MD UPF; consider de‑duplication

This UPF duplicates the CC BY‑SA–licensed Si pseudopotential already added under test/data/abacus_md. The same license‑compatibility check applies here; you might also consider de‑duplicating via a shared fixture or symlink if your test layout allows it.

🧹 Nitpick comments (18)

dftio/plot/plot_eigs.py (1)
84-88: Move ax.set_xticks([]) outside the loop.

The call to ax.set_xticks([]) on line 87 is inside the for loop, causing it to execute once per energy level. Since clearing the x-ticks only needs to happen once, move it outside the loop for better efficiency.

Apply this diff to move the call outside the loop:
     else:
         for e in self.eigs[0, bmin:bmax]:
             ax.hlines(e, xmin=-0.5, xmax=0.5, color='b', linewidth=1)
-            ax.set_xticks([])
+        ax.set_xticks([])
         ax.set_title("Energy levels")
dftio/__main__.py (1)

156-161: Energy flag wiring is fine; validate downstream handling and consider short option style

The new -energy / --energy flag is correctly defined and will be forwarded via dict_args into ParserRegister and parser.write, assuming those call sites accept an energy keyword.

Please confirm that:

ParserRegister.__init__ and .write() accept an energy: bool kwarg (or **kwargs) and gate energy parsing on it.

Tests cover dftio parse ... --energy for at least ABACUS and Siesta paths.

As a minor style nit, you might consider switching the short flag to a single letter (e.g., -E) to match typical argparse conventions.

test/data/abacus_md/INPUT (1)

1-32: MD INPUT is valid; consider adding md_dumpfreq if you want to test non‑default dump intervals.

Current parameters are reasonable for an MD test, and get_etot will treat md_dumpfreq as 1 since it’s not specified.
If you plan to cover non‑unit dump frequencies in tests, adding an explicit md_dumpfreq line here (or in a separate dataset) would improve coverage.
dftio/io/abacus/abacus_parser.py (1)
469-497: Make MD convergence detection more robust to message variants.

For MD you currently treat only lines containing "!! convergence has not been achieved" as unconverged steps, while SCF mode handles additional variants like "convergence has NOT been achieved!" and "convergence has not been achieved".

To avoid silently missing unconverged MD frames on other ABACUS builds, you can reuse the broader pattern from the SCF branch:
-        elif mode == "md":
-            # For MD, extract all energies at dump intervals
-            nenergy = 0
-            for line in loglines:
-                if "final etot is" in line:
-                    if nenergy % dump_freq == 0:
-                        energy.append(float(line.split()[-2]))
-                    nenergy += 1
-                elif "!! convergence has not been achieved" in line:
-                    if nenergy % dump_freq == 0:
-                        energy.append(np.nan)
-                    nenergy += 1
+        elif mode == "md":
+            # For MD, extract all energies at dump intervals
+            nenergy = 0
+            for line in loglines:
+                if "final etot is" in line:
+                    if nenergy % dump_freq == 0:
+                        energy.append(float(line.split()[-2]))
+                    nenergy += 1
+                elif (
+                    "!! convergence has not been achieved" in line
+                    or "convergence has NOT been achieved!" in line
+                    or "convergence has not been achieved" in line
+                ):
+                    if nenergy % dump_freq == 0:
+                        energy.append(np.nan)
+                    nenergy += 1
This keeps the current behavior but tolerates slightly different wording in the logs.
test/test_abacus_energy.py (3)
79-81: Drop unnecessary f-strings to satisfy Ruff F541 and keep the code clean

The strings at the following locations are plain literals and don't need f-strings:

Line 79: f"Loaded energy doesn't match expected value"

Line 81: print(f"DAT format energy write test passed")

Line 127: f"LMDB energy doesn't match expected value"

Line 130: print(f"LMDB format energy write test passed")

You can simplify them as regular strings to quiet Ruff and reduce noise:
-        assert np.isclose(loaded_energy[0], -1879.7169812, atol=1e-5), \
-            f"Loaded energy doesn't match expected value"
+        assert np.isclose(loaded_energy[0], -1879.7169812, atol=1e-5), \
+            "Loaded energy doesn't match expected value"
@@
-        print(f"DAT format energy write test passed")
+        print("DAT format energy write test passed")
@@
-            assert np.isclose(energy, -1879.7169812, atol=1e-5), \
-                f"LMDB energy doesn't match expected value"
+            assert np.isclose(energy, -1879.7169812, atol=1e-5), \
+                "LMDB energy doesn't match expected value"
@@
-        env.close()
-        print(f"LMDB format energy write test passed")
+        env.close()
+        print("LMDB format energy write test passed")
As per static analysis hints

Also applies to: 127-130

195-197: Fix RELAX test comment to match the actual expectation

The comment says “4 frames in test data” but expected_nframes = 1 and the assertion checks for a single value:
# Check shape - should be [nframes,] for RELAX (4 frames in test data)
expected_nframes = 1
Either update the comment to say 1 frame or, if there really are 4 !FINAL_ETOT_IS entries intended, change expected_nframes accordingly. Right now the mismatch is misleading for anyone maintaining the tests.

301-309: Consider dropping the manual __main__ test runner

The explicit if __name__ == "__main__": ... block duplicates pytest’s test discovery and isn’t typically needed in pytest-based suites. Removing it will keep the module focused on declarative tests and avoid accidental divergence between “manual” and pytest behavior.

If you keep it for local debugging, that’s fine, but it’s worth confirming it stays in sync whenever test signatures change.
test/test_siesta_parse.py (1)
112-128: Use isinstance() for type checks.

The tests use type(x) == dict for type comparisons. Python best practice is to use isinstance(x, dict) instead, as it properly handles subclass relationships and is more idiomatic.

Apply this diff to improve the type checks:
-    assert type(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)) == tuple  
+    assert isinstance(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False), tuple)

-    assert type(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)[0][0]) == dict
+    assert isinstance(parser.get_blocks(idx=0,hamiltonian=True,overlap=False,density_matrix=False)[0][0], dict)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['0_0_0_0_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['2_3_0_-1_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['0_0_0_0_0'][0,1] == pytest.approx(-2.3722997)
     assert parser.get_blocks(idx=0, hamiltonian=True, overlap=False, density_matrix=False)[0][0]['1_2_0_-1_0'][0,1] == pytest.approx(-4.234535e-07)

-    assert type(parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]) == dict
+    assert isinstance(parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0], dict)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['2_3_0_-1_0'].shape == (15, 15)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,0] == pytest.approx(1.00000000)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,1] == pytest.approx(0.9065072)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['0_0_0_0_0'][0,2] == pytest.approx(0.0)
     assert parser.get_blocks(idx=0, hamiltonian=False, overlap=True, density_matrix=False)[1][0]['1_2_0_1_0'][1,0] == pytest.approx(0.07965754)

-    assert type(parser.get_blocks(idx=0, hamiltonian=False, overlap=False, density_matrix=True)[2][0]) == dict
+    assert isinstance(parser.get_blocks(idx=0, hamiltonian=False, overlap=False, density_matrix=True)[2][0], dict)
dftio/io/parse.py (3)
214-214: Remove unused **kwargs parameter.

The **kwargs parameter is declared but never used in the method body. If it's not needed for future extensibility, remove it to keep the signature clean.

Apply this diff if kwargs is not needed:
-    def write(self, idx, outroot, format, eigenvalue, hamiltonian, overlap, density_matrix, band_index_min, energy=False, **kwargs):
+    def write(self, idx, outroot, format, eigenvalue, hamiltonian, overlap, density_matrix, band_index_min, energy=False):
Alternatively, if kwargs is intended for future use, add a docstring comment explaining this design decision.

287-287: Remove unnecessary f-string prefix.

Line 287 uses an f-string without any placeholders. Use a regular string instead.

Apply this diff:
-                log.warning(f"Parser does not implement get_etot method")
+                log.warning("Parser does not implement get_etot method")
335-335: Remove unnecessary f-string prefix.

Line 335 uses an f-string without any placeholders. Use a regular string instead.

Apply this diff:
-                log.warning(f"Parser does not implement get_etot method")
+                log.warning("Parser does not implement get_etot method")
dftio/io/siesta/check_siesta.ipynb (3)

101-109: Clear notebook outputs before committing

Large embedded images/base64 inflate the repo and obscure the diff. Please clear all outputs (nbstripout or jupyter nbconvert --clear-output) and keep visuals generated at runtime. If needed, store heavy artifacts via LFS.

Also applies to: 171-181, 384-392, 426-434

158-166: Convert exploratory notebook code into automated tests

Many cells perform assertions/comparisons ideal for pytest (e.g., block diffs, eigenvalue checks). Consider moving them to tests/test_siesta_parse.py with deterministic, project-relative inputs; keep the notebook slim as a usage demo.

Also applies to: 192-201, 217-226, 352-365, 372-389, 414-443, 544-563, 759-775

672-686: Avoid noisy prints; prefer warnings or logging

Raw print warnings reduce notebook signal. Use warnings.warn or logging at INFO/WARN with clear messages; optionally silence verbose third-party outputs where appropriate.

Also applies to: 767-775

dftio/io/siesta/siesta_parser.py (4)

321-330: Rename ambiguous variable ‘l’

‘l’ is easy to confuse with ‘1’. Prefer ‘ell’ or ‘orb_l’. This also aligns with Ruff E741.

5-12: Duplicate import and minor hygiene

from collections import Counter is imported twice (lines 5 and 11). Remove the duplicate.

187-189: Use cleaner exception messages

Long, multi-line strings inline lower readability and trip static checks (TRY003). Prefer concise messages or define custom exceptions if needed.

Also applies to: 219-219, 253-253

178-189: Case-insensitive key matching for FDF

SIESTA inputs are case-insensitive. In addition to the robust value parse, match the key with re.IGNORECASE (as shown above) instead of 'WriteKbands' in line.split().

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e69933 and 1ac08b0.

⛔ Files ignored due to path filters (6)

test/data/abacus_md/OUT.ABACUS/running_md.log is excluded by !**/*.log
test/data/abacus_md/OUT.ABACUS/warning.log is excluded by !**/*.log
test/data/abacus_relax/OUT.ABACUS/running_relax.log is excluded by !**/*.log
test/data/abacus_scf/OUT.ABACUS/BANDS_1.dat is excluded by !**/*.dat
test/data/abacus_scf/OUT.ABACUS/running_scf.log is excluded by !**/*.log
test/data/siesta/siesta_out_withband/out.log is excluded by !**/*.log

📒 Files selected for processing (31)

.gitignore (1 hunks)
dftio/__main__.py (1 hunks)
dftio/data/_keys.py (1 hunks)
dftio/io/abacus/abacus_parser.py (2 hunks)
dftio/io/parse.py (6 hunks)
dftio/io/siesta/check_siesta.ipynb (16 hunks)
dftio/io/siesta/siesta_parser.py (5 hunks)
dftio/plot/plot_eigs.py (1 hunks)
test/data/abacus_md/INPUT (1 hunks)
test/data/abacus_md/KPT (1 hunks)
test/data/abacus_md/OUT.ABACUS/INPUT (1 hunks)
test/data/abacus_md/OUT.ABACUS/MD_dump (1 hunks)
test/data/abacus_md/OUT.ABACUS/STRU.cif (1 hunks)
test/data/abacus_md/STRU (1 hunks)
test/data/abacus_md/Si_ONCV_PBE-1.0.upf (1 hunks)
test/data/abacus_md/Si_gga_8au_60Ry_2s2p1d.orb (1 hunks)
test/data/abacus_relax/INPUT (1 hunks)
test/data/abacus_relax/KPT (1 hunks)
test/data/abacus_relax/OUT.ABACUS/INPUT (1 hunks)
test/data/abacus_relax/OUT.ABACUS/STRU.cif (1 hunks)
test/data/abacus_relax/OUT.ABACUS/STRU_ION_D (1 hunks)
test/data/abacus_relax/STRU (1 hunks)
test/data/abacus_relax/Si_ONCV_PBE-1.0.upf (1 hunks)
test/data/abacus_relax/Si_gga_8au_60Ry_2s2p1d.orb (1 hunks)
test/data/siesta/siesta_out_simple/RUN.fdf (1 hunks)
test/data/siesta/siesta_out_simple/STRUCT.fdf (1 hunks)
test/data/siesta/siesta_out_withband/Au_cell.bands (1 hunks)
test/data/siesta/siesta_out_withband/RUN_band.fdf (1 hunks)
test/data/siesta/siesta_out_withband/STRUCT.fdf (1 hunks)
test/test_abacus_energy.py (1 hunks)
test/test_siesta_parse.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

test/test_abacus_energy.py (2)

dftio/io/abacus/abacus_parser.py (2)

get_etot (516-572)

_extract_energy_from_log (421-514)

dftio/io/parse.py (2)

write (214-222)

formula (88-92)

test/test_siesta_parse.py (1)

dftio/io/siesta/siesta_parser.py (3)

SiestaParser (18-551)

convert_kpoints_bohr_inv_to_twopi_over_a (28-51)

find_content (54-111)

dftio/io/parse.py (1)

dftio/io/abacus/abacus_parser.py (2)

get_etot (516-572)

get_structure (39-49)

dftio/io/siesta/siesta_parser.py (2)

dftio/io/abacus/abacus_parser.py (4)

get_structure (39-49)

get_eigenvalue (61-91)

get_basis (94-131)

get_blocks (134-342)

dftio/io/parse.py (3)

get_structure (78-79)

get_eigenvalue (143-144)

get_blocks (147-148)

🪛 Ruff (0.14.4)

dftio/io/abacus/abacus_parser.py

511-511: Avoid specifying long messages outside the exception class

(TRY003)

540-540: Avoid specifying long messages outside the exception class

(TRY003)

test/test_abacus_energy.py

79-79: f-string without any placeholders

Remove extraneous f prefix

(F541)

81-81: f-string without any placeholders

Remove extraneous f prefix

(F541)

121-121: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

127-127: f-string without any placeholders

Remove extraneous f prefix

(F541)

130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)

test/test_siesta_parse.py

112-112: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

114-114: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

120-120: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

128-128: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

dftio/io/siesta/check_siesta.ipynb

231-231: Found useless expression. Either assign it to a variable or remove it.

(B018)

248-248: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

275-275: Ambiguous variable name: l

(E741)

293-293: Found useless expression. Either assign it to a variable or remove it.

(B018)

301-301: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

dftio/io/parse.py

214-214: Unused method argument: kwargs

(ARG002)

287-287: f-string without any placeholders

Remove extraneous f prefix

(F541)

335-335: f-string without any placeholders

Remove extraneous f prefix

(F541)

dftio/io/siesta/siesta_parser.py

58-59: Avoid specifying long messages outside the exception class

(TRY003)

92-92: Do not catch blind exception: Exception

(BLE001)

109-109: Avoid specifying long messages outside the exception class

(TRY003)

187-188: Avoid specifying long messages outside the exception class

(TRY003)

219-219: Avoid specifying long messages outside the exception class

(TRY003)

253-253: Avoid specifying long messages outside the exception class

(TRY003)

321-321: Ambiguous variable name: l

(E741)

🔇 Additional comments (19)

dftio/plot/plot_eigs.py (1)

80-83: LGTM! Clean band structure plotting logic.

The conditional logic correctly handles the multiple k-points scenario with appropriate labels and title for band structure visualization.

test/data/abacus_md/KPT (1)

1-4: LGTM! Valid ABACUS K_POINTS format.

The K-points specification correctly defines a single Gamma-point mesh, which is appropriate for test data supporting MD calculations.

.gitignore (1)

160-166: LGTM! Reasonable ignore patterns.

The added patterns appropriately exclude generated test outputs, notebook execution state, temporary documentation, and experimental code from version control.

dftio/data/_keys.py (1)

106-106: LGTM! Well-defined constant for unconverged frame tracking.

The new key follows naming conventions and is appropriately positioned. It's correctly excluded from ALL_ENERGY_KEYS since it represents frame metadata rather than an energy quantity, while still being discoverable via ALLOWED_KEYS.

test/data/abacus_relax/INPUT (1)

1-23: LGTM! Valid ABACUS relaxation input.

The input parameters are correctly formatted for ABACUS relaxation calculations. The convergence thresholds and computational parameters are appropriate for test data, and the included comments about energy cutoff convergence testing are helpful.

test/data/abacus_md/OUT.ABACUS/STRU.cif (1)

1-35: LGTM! Valid CIF structure file.

The CIF file is correctly formatted and represents a silicon structure (likely diamond cubic) with 8 atoms. All required crystallographic information is present and properly structured for test data.

test/data/abacus_relax/OUT.ABACUS/STRU_ION_D (1)

1-22: LGTM! Valid ABACUS structure file.

The STRU_ION_D file is correctly formatted with appropriate movement constraints for relaxation testing. The first atom is fixed while the second can move in the x-direction, which is suitable for testing relaxation energy extraction.

test/data/siesta/siesta_out_simple/STRUCT.fdf (1)

1-20: LGTM! Valid Siesta structure file.

The FDF file is correctly formatted with proper block delimiters and represents a face-centered cubic gold structure. All required sections (LatticeVectors, AtomicCoordinates, ChemicalSpeciesLabel) are present and correctly structured for testing Siesta parsing.

test/data/abacus_relax/OUT.ABACUS/STRU.cif (1)

1-29: LGTM! Valid CIF structure file.

The CIF file is correctly formatted and represents a two-atom silicon structure consistent with the corresponding STRU_ION_D file. All crystallographic information is properly specified for relaxation test scenarios.

test/data/abacus_md/STRU (1)

1-28: STRU fixture looks structurally consistent

ABACUS sections, lattice, and 8 Si positions all look well-formed and appropriate for MD tests; no issues from a parser/fixture perspective.

test/data/abacus_md/OUT.ABACUS/INPUT (1)

1-486: ABACUS MD INPUT test fixture is coherent

Parameter blocks and syntax follow ABACUS INPUT conventions and should exercise the MD parsing logic well; no structural problems spotted.

test/data/abacus_relax/KPT (1)

1-4: K_POINTS block is well‑formed for ABACUS relax tests.

Single Gamma‑point 1×1×1 mesh is valid and sufficient for the added relax dataset; no issues found.

test/data/siesta/siesta_out_withband/STRUCT.fdf (1)

1-20: Siesta STRUCT.fdf structure looks consistent and parseable.

Blocks (lattice, coordinates, species) are syntactically correct and self‑consistent for an Au cell; suitable as test data.

test/data/siesta/siesta_out_withband/Au_cell.bands (1)

1-496: Bands file layout matches expected Siesta .bands format.

Header, band grid, and trailing high‑symmetry point table are consistent; should be parseable by the updated Siesta bands/eigenvalue logic.

test/data/abacus_relax/STRU (1)

1-24: ABACUS STRU file is syntactically correct and coherent.

Species, orbital, lattice, and Cartesian positions are in the expected ABACUS format; suitable for the relax test case.

test/data/abacus_md/OUT.ABACUS/MD_dump (1)

1-187: MD dump format and content are consistent across steps.

Each MDSTEP block has stable lattice data and 8‑atom tables with labeled position/force/velocity columns, matching ABACUS MD dump conventions.

dftio/io/abacus/abacus_parser.py (1)

16-19: Logger setup is appropriate.

Adding a module‑level logging.getLogger(__name__) is a good fit for emitting warnings from get_etot without changing existing behavior elsewhere.

dftio/io/parse.py (1)

330-380: Energy extraction and frame mapping logic looks solid.

The implementation correctly handles:

Detection of get_etot method availability

Building frame index mapping for unconverged frames in MD/RELAX calculations

Distinguishing between single-structure (SCF/NSCF) and trajectory cases

Proper indexing when writing energy data to LMDB

The logic appropriately skips energy values for unconverged frames while maintaining correct alignment with structure frames.

dftio/io/siesta/siesta_parser.py (1)

155-156: Comment mismatch

Comment says “abacus does not allow non‑pbc structure” in the SIESTA parser. Update or remove to avoid confusion. [suggest_minor_issue]

dftio/io/abacus/abacus_parser.py

dftio/io/siesta/check_siesta.ipynb

dftio/io/siesta/siesta_parser.py

test/data/abacus_md/Si_ONCV_PBE-1.0.upf

test/test_abacus_energy.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

dftio/io/abacus/abacus_parser.py (1)

555-567: Make MD md_dumpfreq detection more robust and aligned with get_mode

Right now get_etot only looks for md_dumpfreq in self.raw_datas[idx]/"INPUT", and assumes dump_freq = 1 otherwise. This has two practical issues:

get_mode reads from OUT.ABACUS/INPUT, so it’s plausible some datasets only carry the OUT.ABACUS/INPUT copy; in that case MD energies will always be treated as if md_dumpfreq = 1, which can desynchronize energies vs. the MD frames you expect to sample.
int(line.split()[1]) will raise if the line is malformed or missing the second token.

To avoid silent misconfiguration and fragile parsing, you can try both candidate INPUT locations and defensively parse the value, falling back to 1 on any error. For example:

-        # Get dump frequency for MD mode
-        dump_freq = 1
-        if mode == "md":
-            input_path = os.path.join(self.raw_datas[idx], "INPUT")
-            if os.path.exists(input_path):
-                with open(input_path, 'r') as f:
-                    for line in f:
-                        if len(line) > 0 and "md_dumpfreq" in line and "md_dumpfreq" == line.split()[0]:
-                            dump_freq = int(line.split()[1])
-                            break
+        # Get dump frequency for MD mode (default to 1 if not found)
+        dump_freq = 1
+        if mode == "md":
+            # Prefer root INPUT, fall back to OUT.ABACUS/INPUT
+            input_candidates = [
+                os.path.join(self.raw_datas[idx], "INPUT"),
+                os.path.join(self.raw_datas[idx], "OUT.ABACUS", "INPUT"),
+            ]
+            for input_path in input_candidates:
+                if not os.path.exists(input_path):
+                    continue
+                try:
+                    with open(input_path, "r") as f_in:
+                        for line in f_in:
+                            tokens = line.split()
+                            if tokens and tokens[0] == "md_dumpfreq":
+                                try:
+                                    dump_freq = int(tokens[1])
+                                except (IndexError, ValueError):
+                                    dump_freq = 1
+                                break
+                except OSError:
+                    # Fall back to default dump_freq = 1 on read errors
+                    continue
+                # Stop once we've inspected the first existing INPUT file
+                break

This keeps behavior unchanged in the common case while avoiding misaligned MD energies when only one INPUT location is present or the md_dumpfreq line is slightly malformed.

🧹 Nitpick comments (1)

dftio/io/abacus/abacus_parser.py (1)

420-520: Energy‑extraction helper is sound; consider minor robustness improvements

The mode‑specific parsing logic (SCF/NSCF reversed scan, MD with NaN‑masking and dump‑frequency handling, RELAX with an explicit ValueError on failure) looks consistent with the documented behavior and aligns with how get_etot consumes the result.

If you want to make this a bit easier to debug in practice, you could:

Emit a log.warning or log.debug when returning None for SCF/NSCF/MD (no energies found or all MD frames unconverged), so callers can distinguish “no data” from “parser never ran”.

Optionally narrow the RELAX failure message to include the mode and file context (e.g. mode, log path) if you later catch this higher up, though current behavior is already functionally correct.

These are polish‑level tweaks; the core implementation is fine as is.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac08b0 and d5573f7.

📒 Files selected for processing (1)

dftio/io/abacus/abacus_parser.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.14.4)

dftio/io/abacus/abacus_parser.py

516-516: Avoid specifying long messages outside the exception class

(TRY003)

549-549: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)

dftio/io/abacus/abacus_parser.py (1)

16-18: Module‑level logger wiring looks good

Importing logging and creating a module‑level log = logging.getLogger(__name__) is idiomatic and fits how you use logging later in get_etot.

AsymmetryChou · 2025-11-18T06:29:00Z

To solve #21 , I add _get_output_dir for general suffix case. If OUT.ABACUS is not found, glob would search and adopt OUT.* as results directory.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

dftio/io/abacus/abacus_parser.py (1)

607-616: Add fallback to OUT.ABACUS/INPUT for md_dumpfreq lookup.

Currently, only self.raw_datas[idx]/INPUT is checked, but get_mode reads from _get_output_dir(idx)/INPUT (typically OUT.ABACUS/INPUT). If only the output directory copy exists, dump_freq will incorrectly default to 1, potentially desynchronizing energies from frames when md_dumpfreq > 1.

Apply this diff to check both locations:

 dump_freq = 1
 if mode == "md":
-    input_path = os.path.join(self.raw_datas[idx], "INPUT")
-    if os.path.exists(input_path):
-        with open(input_path, 'r') as f:
-            for line in f:
-                if len(line) > 0 and "md_dumpfreq" in line and "md_dumpfreq" == line.split()[0]:
-                    dump_freq = int(line.split()[1])
-                    break
+    # Try root INPUT first, then fall back to OUT.ABACUS/INPUT
+    input_candidates = [
+        os.path.join(self.raw_datas[idx], "INPUT"),
+        os.path.join(self._get_output_dir(idx), "INPUT"),
+    ]
+    for input_path in input_candidates:
+        if not os.path.exists(input_path):
+            continue
+        with open(input_path, 'r') as f:
+            for line in f:
+                tokens = line.split()
+                if tokens and tokens[0] == "md_dumpfreq":
+                    dump_freq = int(tokens[1])
+                    break
+        if dump_freq != 1:
+            break

This also makes the parsing more robust by checking that tokens is non-empty before indexing.

Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5573f7 and dd216ce.

📒 Files selected for processing (1)

dftio/io/abacus/abacus_parser.py (16 hunks)

🧰 Additional context used

🪛 Ruff (0.14.5)

dftio/io/abacus/abacus_parser.py

76-77: Avoid specifying long messages outside the exception class

(TRY003)

192-192: Starting a process with a shell, possible injection detected

(S605)

567-567: Avoid specifying long messages outside the exception class

(TRY003)

600-600: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)

dftio/io/abacus/abacus_parser.py (4)

16-19: LGTM!

The new imports are appropriate and correctly used throughout the added functionality.

39-87: Well-structured output directory discovery.

The fallback logic (OUT.ABACUS → OUT.* pattern → most recently modified) is robust and handles multiple deployment scenarios gracefully.

471-571: LGTM!

The energy extraction logic correctly handles the different ABACUS calculation modes, properly filters unconverged MD frames, and raises appropriate errors for failed RELAX calculations.

35-35: LGTM!

The refactoring to use _get_output_dir is consistently applied across all methods, centralizing output directory discovery and improving maintainability.

Also applies to: 102-102, 114-119, 149-149, 190-193, 256-289, 305-389

dftio/io/abacus/abacus_parser.py

AsymmetryChou added 30 commits August 23, 2024 12:39

update get_basis

e3a3d08

update get_structure

ef97257

update check_siesta.ipynb

36e9430

update check_siesta.ipynb

d09c362

Merge branch 'main' into io_siesta

5dba3e2

feat( SIESTA parser): enhance find_content method and implement get_e…

8087e74

…igenvalue for k-point extraction

fix(README): update SIESTA support status to include eigenvalues

39d4144

feat(siesta_parser): add k-points unit conversion and improve eigenva…

c89819d

…lue extraction

feat(test): add test data for SIESTA

413a430

update try_siesta.ipynb

5fc959d

fix(try_siesta): update notebook

309502b

set find_content as staticmethod

a67cbca

feature(siesta_parse): add unittest

a2cff97

Add unit test for parsing Siesta output data

092c7fb

add PROJECT_ROOT to enforce pytest the local code ./dftio

1ec76f2

add DM file

250210d

add PROJECT_root and test for DM

2b937d8

update try_siesta.ipynb

b95f5fd

增强SiestaParser以支持多个文件搜索和错误处理，改进LatticeVectors和.band解析

36571a9

更新SiestaParser以扩展跳过文件格式的列表，增加对'ion'和'psdump'格式的支持

5373c0e

更新SiestaParser以改进警告信息，明确未找到目标字符串时使用SIESTA默认值

0d5f02d

增强SiestaParser以检查SIESTA输入文件中WriteKbands设置，确保其为true以提取k点和特征值

8f0264f

增强SiestaParser findcontent函数以支持不区分大小写的字符串搜索

235f8b6

增强SiestaParser以支持不区分大小写的字符串搜索，并改进WriteKbands设置检查

eaeba14

增强SiestaParser以检查SIESTA输入文件中的WriteKbands设置，并确保SystemLabel被定义

5835a1a

Add support for isolated system's enegy level plot in Bandplot

e5ea14b

Merge branch 'main' into io_siesta

8c81a6e

更新 .gitignore，添加测试文件和示例文件的忽略规则

b0b05d1

add total energy extractor and its unittest (to do: check func in `md…

6d79f49

…`, `nscf` and `relax` )

rename abacus as abacus_scf

1dbac38

AsymmetryChou added 5 commits November 3, 2025 11:13

add abacus_md test data

c777c64

update test_abacus_energy.py with accuracy float64 and correct path

b73b5b6

add abacus_relax test data

ce32e8b

将能量返回类型从 float32 更改为 float64，以提高精度

5311322

更新 AbacusParser 以支持 MD 和 RELAX 模式下的能量提取，并添加相应的测试用例

a8eecfb

AsymmetryChou requested a review from floatingCatty November 3, 2025 06:06

AsymmetryChou added the enhancement New feature or request label Nov 3, 2025

AsymmetryChou added 2 commits November 3, 2025 22:09

在能量提取函数中添加关于 ABACUS 版本 3.9.0 的验证说明

ac56166

remove un-necessary nframe and change _extract_energy_from_log as sta…

9b74469

…tic_method

floatingCatty reviewed Nov 14, 2025

View reviewed changes

AsymmetryChou added 2 commits November 16, 2025 20:37

change total_energy.npy as .dat format

7e5e07e

添加对未收敛帧的支持，包括在能量提取和写入过程中记录未收敛帧的索引

1ac08b0

coderabbitai bot reviewed Nov 16, 2025

View reviewed changes

AsymmetryChou requested a review from floatingCatty November 16, 2025 14:05

更新 AbacusParser 文档，详细说明能量提取返回值和未收敛帧的处理逻辑

d5573f7

coderabbitai bot reviewed Nov 16, 2025

View reviewed changes

Refactor: add _get_output_dir for general case

dd216ce

coderabbitai bot reviewed Nov 18, 2025

View reviewed changes

dftio/io/abacus/abacus_parser.py Outdated Show resolved Hide resolved

replace subprocee with os.system for safer operation

8852590

Enable total energy extractor from ABACUS output #20

Are you sure you want to change the base?

Enable total energy extractor from ABACUS output #20

Uh oh!

Conversation

AsymmetryChou commented Nov 3, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

floatingCatty Nov 14, 2025

Choose a reason for hiding this comment

Uh oh!

AsymmetryChou Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

AsymmetryChou Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot commented Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

AsymmetryChou commented Nov 18, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AsymmetryChou commented Nov 3, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 16, 2025 •

edited

Loading