Skip to content

feat(extxyz): support stress header, improve robustness and ASE compatibility#983

Open
SchrodingersCattt wants to merge 4 commits into
deepmodeling:masterfrom
SchrodingersCattt:feat/enhance-extxyz
Open

feat(extxyz): support stress header, improve robustness and ASE compatibility#983
SchrodingersCattt wants to merge 4 commits into
deepmodeling:masterfrom
SchrodingersCattt:feat/enhance-extxyz

Conversation

@SchrodingersCattt

@SchrodingersCattt SchrodingersCattt commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Enhance the native extxyz reader/writer (quip_gap_xyz.py) to fix silent virial data loss and improve ASE interoperability.

Fixes #973

Changes

Reader (handle_single_xyz_frame)

Change Detail
stress→virial fallback Parse stress= header when virial= is absent. Supports both 3×3 (9 values) and Voigt (6 values). Conversion: virial = stress_sign * V * stress (eV/ų assumed).
stress_sign parameter Default -1 (ASE convention). User can pass stress_sign=1 for opposite convention via dpdata.MultiSystems.from_file(..., stress_sign=1).
Tolerate unknown per-atom properties magmom, charges, tags, local_energy, etc. now emit a warning instead of crashing with RuntimeError.
Recognize forces key Accept both force and forces as per-atom property keys (ASE uses forces).
Flexible energy key lookup Try energy, Energy, free_energy, REF_energy, energies in order.
Missing Lattice / nopbc Handle missing Lattice gracefully (nopbc=True + 100Å dummy box). Parse pbc="F F F" header.

Writer (format_single_frame)

Change Detail
forces instead of force ASE SinglePointCalculator requires key forces (addresses @qchempku2017 comment).
Write stress field stress = -virial / volume written alongside virial for ASE interop.
Write pbc field pbc="T T T" or pbc="F F F" based on system data.

Plugin (QuipGapXYZFormat)

Change Detail
Fix from_labeled_system Was a broken passthrough causing TypeError. Now properly reads files when called directly.
Thread **kwargs from_multi_systemsQuipGapxyzSystemshandle_single_xyz_frame so user options (e.g. stress_sign) reach the parser.

Tests

22 new tests added to tests/test_quip_gap_xyz.py covering:

  • Stress→virial (3×3 and Voigt), stress_sign parameter
  • Unknown properties tolerance, forces key, energy key variants
  • nopbc, pbc="F F F" header
  • Writer forces/stress/pbc output
  • Roundtrip stress preservation
  • LabeledSystem direct file reading

All 127 tests in test_quip_gap_xyz.py pass. All 138 tests in related files (test_xyz.py, test_quip_gap_xyz_to_methods.py) pass.

Sign Convention

The stress→virial conversion follows the ASE convention by default:

virial = -V * stress

where V is the cell volume and stress is the stress tensor in eV/ų. This matches dpdata/plugins/ase.py line 110. Users can override with stress_sign=1 for codes using the opposite convention.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced XYZ/extxyz support: parse stress (Voigt 6 or full tensor) into virials, including configurable stress-sign convention.
    • Writer updates to emit stress and/or virial when available, and improved Lattice/pbc/nopbc handling.
    • More flexible input handling when loading single or multiple systems (including parameter propagation).
  • Bug Fixes

    • Per-atom property parsing is more robust: unknown fields warn instead of failing.
    • Energy extraction now supports multiple energy key variants.
  • Tests

    • Expanded unit tests and new fixtures covering stress/virial, force key variants, energy variants, and PBC edge cases.

…tibility

- Add stress= header → virial conversion (virial = stress_sign * V * stress),
  supporting both 3×3 and Voigt (6-component) formats. Default stress_sign=-1
  follows ASE convention. User can pass stress_sign=1 for opposite convention.
  Fixes deepmodeling#973.

- Thread **kwargs from MultiSystems.from_file() through to the parser so that
  stress_sign (and future options) reach handle_single_xyz_frame().

- Tolerate unknown per-atom properties (magmom, charges, tags, etc.) with a
  warning instead of crashing with RuntimeError.

- Recognize both 'force' and 'forces' as per-atom property keys. Writer now
  outputs 'forces' for ASE SinglePointCalculator compatibility (addresses
  qchempku2017's comment on deepmodeling#973).

- Support flexible energy key lookup: try energy, Energy, free_energy,
  REF_energy, energies in order.

- Handle missing Lattice gracefully (nopbc=True + 100Å dummy box) and parse
  pbc="F F F" / pbc="T T T" header field.

- Writer now emits pbc field and stress field (stress = -virial/volume)
  alongside virial for better ASE interoperability.

- Fix from_labeled_system to actually read files when called directly
  (previously was a broken passthrough causing TypeError).

- Add comprehensive tests for all new functionality.
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. dpdata enhancement New feature or request labels Jun 23, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 23, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks


Comparing SchrodingersCattt:feat/enhance-extxyz (dd585ff) with master (c71799f)

Open in CodSpeed

@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.38554% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.75%. Comparing base (c71799f) to head (dd585ff).

Files with missing lines Patch % Lines
dpdata/formats/xyz/quip_gap_xyz.py 97.36% 2 Missing ⚠️
dpdata/plugins/xyz.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #983      +/-   ##
==========================================
+ Coverage   86.68%   86.75%   +0.07%     
==========================================
  Files          89       89              
  Lines        8116     8174      +58     
==========================================
+ Hits         7035     7091      +56     
- Misses       1081     1083       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@SchrodingersCattt, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 44 minutes and 8 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fb31f3b1-2925-4160-96db-772c4c7481d5

📥 Commits

Reviewing files that changed from the base of the PR and between e57ad41 and dd585ff.

📒 Files selected for processing (4)
  • dpdata/formats/xyz/quip_gap_xyz.py
  • tests/test_quip_gap_xyz.py
  • tests/xyz/stresses_key.xyz
  • tests/xyz/virials_key.xyz
📝 Walkthrough

Walkthrough

The QUIP/GAP XYZ parser gains stress→virial conversion (_parse_stress_to_virials), multi-key energy extraction (_ENERGY_KEYS), and graceful skipping of unknown per-atom properties. The formatter now emits ASE-compatible virial, stress, Lattice, pbc, and forces:R:3 fields. The from_labeled_system plugin method is fixed to iterate frames rather than returning the raw filename. Thirteen new test classes and seven fixture files cover all added paths.

Changes

extxyz Parser and Formatter Enhancements

Layer / File(s) Summary
Parser: constants, helpers, class init, frame signature, per-atom decoding, metadata assembly
dpdata/formats/xyz/quip_gap_xyz.py
Adds _ENERGY_KEYS list and _parse_stress_to_virials helper. Extends QuipGapxyzSystems.__init__ to accept **kwargs (forwarded to handle_single_xyz_frame) and adds __del__ to close the file handle. Updates handle_single_xyz_frame signature with stress_sign=-1 and **kwargs. Reworks per-atom Properties decoding to warn-and-skip unknown fields and accept both force and forces. Overhauls comment-line metadata parsing to extract Lattice (with a default cubic cell fallback), apply pbc-driven nopbc override, derive virials from either virial or stress, and search _ENERGY_KEYS in priority order.
Formatter: virial/stress/pbc/forces header output
dpdata/formats/xyz/quip_gap_xyz.py
Updates format_single_frame to always emit Lattice, include virial and derived stress when virials are present, serialize pbc from nopbc, and write forces:R:3 instead of force:R:3 in the Properties descriptor.
Plugin: fix from_labeled_system and propagate kwargs
dpdata/plugins/xyz.py
Changes from_labeled_system from a passthrough to conditional dict-vs-filename handling, iterating QuipGapxyzSystems(..., **kwargs) and raising RuntimeError when no frames are found. Propagates **kwargs in from_multi_systems.
Tests and fixtures
tests/test_quip_gap_xyz.py, tests/xyz/stress_only.xyz, tests/xyz/stress_voigt.xyz, tests/xyz/unknown_props.xyz, tests/xyz/forces_key.xyz, tests/xyz/nopbc.xyz, tests/xyz/pbc_false.xyz, tests/xyz/energy_key_variants.xyz
Adds seven XYZ fixture files and thirteen new test classes: stress-to-virial conversion (full 3×3 and Voigt), stress_sign=1 convention, unknown per-atom property warning, forces key parsing, nopbc/dummy-cell handling, energy key case variants, pbc="F F F" header, writer forces/stress/pbc field output, roundtrip stress fidelity, and direct LabeledSystem(..., fmt="extxyz") read.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the PR's main objective: adding stress header support and improving robustness and ASE compatibility for the extxyz format reader/writer.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #973: stress-to-virial conversion with configurable sign, tolerating unknown per-atom properties, supporting both force/forces keys, flexible energy key lookup, and fixing from_labeled_system method.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue: XYZ format parsing/writing enhancements, stress handling, property robustness, and comprehensive test coverage with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_quip_gap_xyz.py (2)

159-159: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Replace multiplication sign × with letter x in docstring.

Line 159 docstring uses the Unicode multiplication sign × (U+00D7) instead of the Latin letter x. This triggers Ruff's RUF002 warning for ambiguous characters in source code.

🔧 Proposed fix
-        """Voigt stress=[0.01,0.02,0.03,0.004,0.005,0.006] → 3×3 → virial=-V*stress."""
+        """Voigt stress=[0.01,0.02,0.03,0.004,0.005,0.006] → 3x3 → virial=-V*stress."""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_quip_gap_xyz.py` at line 159, The docstring on line 159 contains a
Unicode multiplication sign character `×` (U+00D7) instead of a regular ASCII
letter `x`, which triggers a Ruff RUF002 warning for ambiguous characters.
Replace the Unicode multiplication sign `×` with the regular letter `x` in the
docstring text where it appears in the expression "3×3" to resolve the ambiguous
character warning.

Source: Linters/SAST tools


122-145: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Address Ruff style warnings: prefer next(iter(...)) over list slicing for single-element extraction.

Lines 130, 153, 174, 192, 202, 212, 224, 244, 255, 274, 289, 304, 314, 331, 336 all use list(...)[0] to extract the single system from MultiSystems.systems.values(). Ruff recommends next(iter(...)) for clarity and consistency.

🔧 Suggested refactor

Replace each instance of list(self.ms.systems.values())[0] with:

next(iter(self.ms.systems.values()))

Example for line 130:

- self.system = list(self.ms.systems.values())[0]
+ self.system = next(iter(self.ms.systems.values()))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_quip_gap_xyz.py` around lines 122 - 145, The setUp method in the
TestStressToVirial class uses list slicing `list(self.ms.systems.values())[0]`
to extract a single element from the systems dictionary values, which violates
Ruff style conventions. Replace this pattern with
`next(iter(self.ms.systems.values()))` to extract the first element more
idiomatically. This same pattern should be fixed wherever it appears in the test
file to ensure consistent style compliance with Ruff linting standards.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dpdata/formats/xyz/quip_gap_xyz.py`:
- Line 24: Replace unicode multiplication sign (×) characters with ASCII
lowercase x in the docstring comments throughout the file to satisfy Ruff
linting rules. Locate all occurrences of the unicode × character in docstrings
and comments (including the one in "3×3 matrix" and other similar patterns) and
replace them with the ASCII character x to resolve RUF001/RUF002 lint warnings
across lines 24, 27, and 48.

---

Nitpick comments:
In `@tests/test_quip_gap_xyz.py`:
- Line 159: The docstring on line 159 contains a Unicode multiplication sign
character `×` (U+00D7) instead of a regular ASCII letter `x`, which triggers a
Ruff RUF002 warning for ambiguous characters. Replace the Unicode multiplication
sign `×` with the regular letter `x` in the docstring text where it appears in
the expression "3×3" to resolve the ambiguous character warning.
- Around line 122-145: The setUp method in the TestStressToVirial class uses
list slicing `list(self.ms.systems.values())[0]` to extract a single element
from the systems dictionary values, which violates Ruff style conventions.
Replace this pattern with `next(iter(self.ms.systems.values()))` to extract the
first element more idiomatically. This same pattern should be fixed wherever it
appears in the test file to ensure consistent style compliance with Ruff linting
standards.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3b34f0ae-a3a2-4ebd-b311-e6d46a3eb0c9

📥 Commits

Reviewing files that changed from the base of the PR and between c71799f and e29e074.

📒 Files selected for processing (10)
  • dpdata/formats/xyz/quip_gap_xyz.py
  • dpdata/plugins/xyz.py
  • tests/test_quip_gap_xyz.py
  • tests/xyz/energy_key_variants.xyz
  • tests/xyz/forces_key.xyz
  • tests/xyz/nopbc.xyz
  • tests/xyz/pbc_false.xyz
  • tests/xyz/stress_only.xyz
  • tests/xyz/stress_voigt.xyz
  • tests/xyz/unknown_props.xyz

Comment thread dpdata/formats/xyz/quip_gap_xyz.py Outdated
…KEYS/_STRESS_KEYS constants

Recognize both singular and plural forms for virial and stress header keys,
consistent with _ENERGY_KEYS and _FORCE_KEYS patterns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dpdata enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] extxyz / quip/gap/xyz reader silently ignores stress= headers

1 participant