feat(pt): add embedding sub cli for pt models by OutisLi · Pull Request #5571 · deepmodeling/deepmd-kit

OutisLi · 2026-06-22T08:10:51Z

Summary by CodeRabbit

New Features
- Added dp embed to export descriptor, atomic_feature, and structural_feature into a single HDF5 file with per-system groups.
- Added DeepPot.eval_embedding() with selectable output precision (fp32, fp64, native).
- Added embedding dtype CLI option and multi-task head selection (--head / --model-branch).
Bug Fixes
- Improved fail-fast validation for neighbor-list pairing when model sub-configurations don’t match.
Documentation
- Updated embedding docs, including Python API usage and embedding-only behavior.
Tests
- Added embedding-focused test coverage and removed obsolete OOM-retry tests.

coderabbitai · 2026-06-22T08:21:11Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c534f85e-d643-4012-a53d-96cb03cde3c2

📥 Commits

Reviewing files that changed from the base of the PR and between de0d109 and a68cfcd.

📒 Files selected for processing (5)

deepmd/infer/deep_eval.py
deepmd/pt/infer/deep_eval.py
deepmd/pt/model/model/make_model.py
deepmd/pt/model/model/sezm_model.py
source/tests/pt/model/test_embedding.py

🚧 Files skipped from review as they are similar to previous changes (3)

deepmd/pt/model/model/make_model.py
deepmd/pt/model/model/sezm_model.py
deepmd/pt/infer/deep_eval.py

📝 Walkthrough

Walkthrough

Adds a new dp embed CLI command and DeepEval.eval_embedding API that compute descriptor, atomic_feature, and structural_feature in a single forward pass without forces or virials. The old hook-based eval_descriptor/eval_fitting_last_layer mechanism (middle_output, RetrySignal, OOM retry) is replaced throughout the PyTorch fitting nets, atomic models, and model layer. Results are written to a gzip-compressed HDF5 file per system.

Changes

Embedding extraction feature

Layer / File(s)	Summary
Fitting net: replace middle_output with return_atomic_feature `deepmd/pt/model/task/fitting.py`, `deepmd/pt/model/task/invar_fitting.py`, `deepmd/pt/model/task/dipole.py`, `deepmd/pt/model/task/polarizability.py`, `deepmd/pt/model/task/ener.py`, `deepmd/pt/model/task/sezm_ener.py`, `deepmd/pt/model/descriptor/sezm_nn/dens.py`, `deepmd/pt/model/atomic_model/sezm_atomic_model.py`	Removes `set_return_middle_output`/`middle_output` state from `GeneralFitting` and `SeZMDeNSFittingNet`; adds `return_atomic_feature: bool = False` to `_forward_common` in all fitting nets, which now emits `results["atomic_feature"]` instead of `results["middle_output"]`; `EnergyFittingNetDirect` raises `NotImplementedError` when the flag is set.
Atomic model forward_embedding and has_embedding `deepmd/pt/model/atomic_model/base_atomic_model.py`, `deepmd/pt/model/atomic_model/dp_atomic_model.py`, `deepmd/pt/model/atomic_model/linear_atomic_model.py`, `deepmd/pt/model/model/dp_model.py`	`BaseAtomicModel` gains `has_embedding()`/`forward_embedding()` stubs; `DPAtomicModel` gates autograd clone on `return_atomic_feature`, removes hook state, and implements `forward_embedding` with exclusion masking and structural_feature pooling; `LinearEnergyAtomicModel` delegates to the first embedding-capable sub-model with `strict=True` zip; `DPModelCommon` TorchScript hook methods are removed.
Model-level forward_embedding (DPModel and SeZMModel) `deepmd/pt/model/model/make_model.py`, `deepmd/pt/model/model/sezm_model.py`	`make_model.py` adds a TorchScript-exported `forward_embedding` that builds the neighbor list and calls `atomic_model.forward_embedding` under `no_grad`; `SeZMModel` adds `forward_embedding`, threads `embedding_only` through `forward_common`/`forward_common_lower`/`core_compute`/`trace_and_compile`, manages a separate `compiled_embedding` cache, and removes dens-path eval-hook side effects.
DeepEvalBackend / DeepEval eval_embedding and PyTorch backend `deepmd/infer/deep_eval.py`, `deepmd/pt/infer/deep_eval.py`	`DeepEvalBackend` gains abstract `eval_embedding` returning a 3-tuple; `DeepEval.eval_embedding` standardizes inputs and delegates; `eval_descriptor`/`eval_fitting_last_layer` gain `dtype` parameter; the PyTorch backend adds `_EMBEDDING_DTYPE_TO_TORCH`, implements `eval_embedding`/`_eval_embedding` calling `model.forward_embedding` and casting outputs, and rewrites the legacy methods as thin delegation wrappers removing hook/retry logic.
dp embed CLI, HDF5 entrypoint, and eval_desc dtype `deepmd/main.py`, `deepmd/entrypoints/main.py`, `deepmd/entrypoints/embedding.py`, `deepmd/entrypoints/eval_desc.py`	`deepmd/main.py` adds the `embed` subparser with model/system/output/dtype/head arguments and adds `--dtype`/`--head` to `eval-desc`; `deepmd/entrypoints/main.py` wires the `embed` command; `embedding.py` implements the full eval loop loading systems, calling `eval_embedding`, and writing compressed HDF5 datasets with group attributes; `eval_desc.py` adds `dtype` parameter and strips whitespace-only lines from datafile.
Documentation and tests `doc/inference/embedding.md`, `doc/inference/index.rst`, `doc/inference/python.md`, `doc/model/dpa4.md`, `doc/test/test.md`, `source/tests/pt/model/test_embedding.py`, `source/tests/pt/model/test_linear_atomic_model.py`, `source/tests/pt/test_eval_desc.py`	Adds `doc/inference/embedding.md` documenting outputs and HDF5 format; updates dpa4.md, inference index, python.md, and test.md; adds `test_embedding.py` covering SeZM and se_e2_a embedding APIs, energy reconstruction, compiled parity, dtype propagation, entrypoint HDF5 writing, and the old-frozen-model error case; adds `test_forward_embedding` to linear atomic model test; updates `test_eval_desc.py` header.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant main_py as main.py / entrypoints/main.py
    participant embedding_py as entrypoints/embedding.py
    participant DeepEval as DeepEval (deep_eval.py)
    participant PTDeepEval as PyTorch DeepEval (pt/infer/deep_eval.py)
    participant Model as SeZMModel / DPModel

    User->>main_py: dp embed -m model.pt -s ./data -o out.h5 --dtype fp32
    main_py->>embedding_py: embedding(model, system, output, dtype="fp32")
    loop per discovered system
        embedding_py->>embedding_py: DeepmdData(system_path)
        embedding_py->>DeepEval: eval_embedding(coords, cells, atype, dtype="fp32")
        DeepEval->>DeepEval: _standard_input(coords, cells, atype, mixed_type)
        DeepEval->>PTDeepEval: deep_eval.eval_embedding(coords, cells, atype, dtype="fp32")
        PTDeepEval->>Model: model.forward_embedding(coord, atype, box)
        Model-->>PTDeepEval: descriptor, atomic_feature, structural_feature
        PTDeepEval-->>DeepEval: float32 numpy arrays
        DeepEval-->>embedding_py: (descriptor, atomic_feature, structural_feature)
        embedding_py->>embedding_py: write HDF5 group + gzip datasets + attributes
    end
    embedding_py-->>User: out.h5

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

deepmodeling/deepmd-kit#5391: Adds eval_descriptor/eval_fitting_last_layer and the middle_output toggle via GeneralFitting.set_return_middle_output, which this PR replaces with the unified eval_embedding + return_atomic_feature/forward_embedding flow.

Suggested reviewers

njzjz
iProzd

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(pt): add embedding sub cli for pt models' clearly describes the main feature addition - a new embedding sub-command for PyTorch models.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/entrypoints/embedding.py`:
- Around line 116-120: The datalist.read().splitlines() call in the datafile
parsing section keeps blank lines, which can result in empty strings being
passed to DeepmdData and causing failures. Filter out blank lines when reading
from the datafile by adding a filter condition that excludes empty or
whitespace-only entries from the all_sys list. This ensures only valid system
entries are processed while preserving the original structure of the code.

In `@deepmd/pt/infer/deep_eval.py`:
- Around line 1208-1260: The eval_embedding method directly calls
self.dp.model["Default"].forward_embedding(...) which bypasses the data modifier
that would normally be applied through ModelWrapper. Add a guard condition that
checks if self.modifier is not None before proceeding with the forward_embedding
call, and raise an appropriate error or fail fast if a modifier exists, since
modifier-aware embedding processing is not yet implemented for this code path.

In `@deepmd/pt/model/atomic_model/linear_atomic_model.py`:
- Around line 341-349: Add explicit `strict=False` parameter to both zip() calls
in the list comprehensions to satisfy Ruff B905. The first zip() call combines
self.get_model_rcuts() and self.get_model_nsels(), and the second zip() call
combines self.mixed_types_list, raw_nlists, and self.get_model_sels(). Update
both invocations to include the strict parameter to explicitly indicate that the
iterables may have different lengths.

In `@deepmd/pt/model/task/fitting.py`:
- Around line 855-868: The atomic_feature tensor allocation in the
return_atomic_feature block accesses self.neuron[-1] without checking if the
neuron list is empty. When neuron=[] is configured, this causes an IndexError.
Fix this by replacing the hardcoded self.neuron[-1] with a conditional check
that determines the correct feature dimension. If self.neuron is not empty, use
self.neuron[-1], otherwise determine the feature width from an alternative
source such as the output shape of ll.call_until_last(xx) or from the
descriptor's feature dimension to ensure the atomic_feature tensor is allocated
with the correct size regardless of neuron configuration.

In `@doc/inference/embedding.md`:
- Line 44: On line 44 in the embedding.md file, remove the shell prompt marker
`$ ` from the command to comply with markdownlint rule MD014. Change `$ dp embed
--help` to just `dp embed --help` to fix the lint violation while preserving the
command documentation.

In `@source/tests/pt/model/test_embedding.py`:
- Around line 260-271: The tolerance values (rtol=1e-10 and atol=1e-12) in both
np.testing.assert_allclose calls for eval_descriptor and eval_fitting_last_layer
are too strict for float32 comparisons across separate forward passes, causing
intermittent test failures. Increase both the relative tolerance (rtol) and
absolute tolerance (atol) parameters in both assert_allclose calls to more
reasonable values that account for floating-point precision variations while
still validating correctness.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cf1c59a0-671a-4d28-8ec5-c902cadd310c

📥 Commits

Reviewing files that changed from the base of the PR and between 03682bf and 8c191f0.

📒 Files selected for processing (30)

deepmd/entrypoints/embedding.py
deepmd/entrypoints/eval_desc.py
deepmd/entrypoints/main.py
deepmd/infer/deep_eval.py
deepmd/main.py
deepmd/pt/infer/deep_eval.py
deepmd/pt/model/atomic_model/base_atomic_model.py
deepmd/pt/model/atomic_model/dp_atomic_model.py
deepmd/pt/model/atomic_model/linear_atomic_model.py
deepmd/pt/model/atomic_model/sezm_atomic_model.py
deepmd/pt/model/descriptor/sezm_nn/dens.py
deepmd/pt/model/model/dp_model.py
deepmd/pt/model/model/make_model.py
deepmd/pt/model/model/sezm_model.py
deepmd/pt/model/task/dipole.py
deepmd/pt/model/task/ener.py
deepmd/pt/model/task/fitting.py
deepmd/pt/model/task/invar_fitting.py
deepmd/pt/model/task/polarizability.py
deepmd/pt/model/task/sezm_ener.py
doc/inference/embedding.md
doc/inference/index.rst
doc/inference/python.md
doc/model/dpa4.md
doc/test/test.md
source/tests/infer/test_models.py
source/tests/pt/model/test_embedding.py
source/tests/pt/model/test_linear_atomic_model.py
source/tests/pt/test_eval_desc.py
source/tests/pt/test_oom_retry.py

💤 Files with no reviewable changes (4)

source/tests/pt/test_oom_retry.py
deepmd/pt/model/atomic_model/sezm_atomic_model.py
deepmd/pt/model/model/dp_model.py
deepmd/pt/model/descriptor/sezm_nn/dens.py

codecov · 2026-06-22T09:16:17Z

Codecov Report

❌ Patch coverage is 91.30435% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.16%. Comparing base (f9554f1) to head (a68cfcd).

Files with missing lines	Patch %	Lines
deepmd/entrypoints/embedding.py	93.05%	5 Missing ⚠️
deepmd/pt/infer/deep_eval.py	89.79%	5 Missing ⚠️
deepmd/pt/model/model/sezm_model.py	90.47%	4 Missing ⚠️
deepmd/pt/model/task/sezm_ener.py	0.00%	3 Missing ⚠️
deepmd/entrypoints/main.py	50.00%	2 Missing ⚠️
deepmd/infer/deep_eval.py	86.66%	2 Missing ⚠️
deepmd/entrypoints/eval_desc.py	0.00%	1 Missing ⚠️
...eepmd/pt/model/atomic_model/linear_atomic_model.py	94.11%	1 Missing ⚠️
deepmd/pt/model/task/ener.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5571      +/-   ##
==========================================
+ Coverage   82.14%   82.16%   +0.01%     
==========================================
  Files         900      901       +1     
  Lines      104140   104297     +157     
  Branches     4472     4472              
==========================================
+ Hits        85549    85696     +147     
- Misses      17182    17190       +8     
- Partials     1409     1411       +2

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

njzjz-bot

Thanks for adding the embedding extraction path. I think this needs a few changes before merge, mainly around backend/API compatibility. The PR adds a PyTorch-only eval_embedding implementation, but the CLI/docs deprecate backend-generic descriptor APIs (dp eval-desc, DeepPot.eval_descriptor) in favor of dp embed / eval_embedding. That currently sends TensorFlow/Paddle users from a working descriptor path to an unsupported one. There is also no descriptor-only mode in the replacement command, so users who only need descriptors pay for fitting features and larger HDF5 output.

Reviewed by OpenClaw 2026.6.8 (844f405) (model: custom-chat-jinzhezeng-group/gpt-5.5).

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

deepmd/infer/deep_eval.py (1)

695-708: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the dtype wording in eval_embedding’s docstring.

Line 707 still says all embeddings are returned as float32, but Lines 737-738 now allow "fp64" and "native".

Proposed docstring adjustment

-        embeddings are returned as float32, which is ample for downstream
-        analysis.
+        embeddings are returned as float32 by default, which is ample for
+        downstream analysis. Pass ``dtype="fp64"`` or ``dtype="native"`` to
+        request a different output precision.

Also applies to: 737-738

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/infer/deep_eval.py` around lines 695 - 708, The docstring for the
eval_embedding method on line 712-713 states that all three embeddings are
returned as float32, but the implementation now supports configurable dtypes
including "fp64" and "native" (as shown in the dtype parameter). Update the
docstring statement that mentions float32 to reflect that the returned dtype
depends on the dtype parameter passed to the function, removing the hardcoded
reference to float32 and indicating that the embeddings are returned in the
specified dtype format.

deepmd/pt/infer/deep_eval.py (1)

1151-1164: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the float32-only wording now that dtype is configurable.

Line 1163 says all embeddings are returned as float32, but Lines 1192-1193 now document "fp64" and "native" as valid output dtypes.

Proposed docstring adjustment

-        total energy. All three embeddings are returned as float32, which is
-        ample for downstream analysis.
+        total energy. All three embeddings are returned as float32 by default,
+        which is ample for downstream analysis. Pass ``dtype="fp64"`` or
+        ``dtype="native"`` to request a different output precision.

Also applies to: 1192-1193

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/infer/deep_eval.py` around lines 1151 - 1164, The docstring for the
evaluate_descriptor_atomic_feature_and_structural_feature method (or similar
function name around line 1151-1164) contains outdated information stating that
all three embeddings are always returned as float32. Since the function now
includes a configurable dtype parameter that accepts "fp32", "fp64", and
"native" options (as documented in lines 1192-1193), update the docstring to
reflect that the output dtype depends on the dtype parameter passed to the
function, rather than claiming they are exclusively float32. Remove or modify
the sentence that incorrectly claims fixed float32 output and replace it with
text indicating the dtype is configurable via the dtype parameter.

♻️ Duplicate comments (1)

source/tests/pt/model/test_embedding.py (1)

264-279: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Loosen tolerances for float32 cross-call comparisons.

The tolerances rtol=1e-10, atol=1e-12 are tighter than float32 machine epsilon (~1e-7) and can cause flaky test failures across independent forward passes. Graph context shows float32 comparisons elsewhere in the codebase use rtol=1e-4.

Proposed fix

         np.testing.assert_allclose(
             dp.eval_descriptor(
                 self.coord_np, self.cell_np, self.atype_np, dtype="fp32"
             ),
             descriptor,
-            rtol=1e-10,
-            atol=1e-12,
+            rtol=1e-5,
+            atol=1e-6,
         )
         np.testing.assert_allclose(
             dp.eval_fitting_last_layer(
                 self.coord_np, self.cell_np, self.atype_np, dtype="fp32"
             ),
             atomic_feature,
-            rtol=1e-10,
-            atol=1e-12,
+            rtol=1e-5,
+            atol=1e-6,
         )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt/model/test_embedding.py` around lines 264 - 279, The
tolerances rtol=1e-10, atol=1e-12 are too tight for float32 precision and cause
flaky test failures. Update both assert_allclose calls for eval_descriptor and
eval_fitting_last_layer to use rtol=1e-4 which aligns with float32 comparison
standards used elsewhere in the codebase. Change the rtol parameter from 1e-10
to 1e-4 and the atol parameter from 1e-12 to a value appropriate for float32
(typically around 1e-6 or 1e-5) in both assertions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@deepmd/infer/deep_eval.py`:
- Around line 695-708: The docstring for the eval_embedding method on line
712-713 states that all three embeddings are returned as float32, but the
implementation now supports configurable dtypes including "fp64" and "native"
(as shown in the dtype parameter). Update the docstring statement that mentions
float32 to reflect that the returned dtype depends on the dtype parameter passed
to the function, removing the hardcoded reference to float32 and indicating that
the embeddings are returned in the specified dtype format.

In `@deepmd/pt/infer/deep_eval.py`:
- Around line 1151-1164: The docstring for the
evaluate_descriptor_atomic_feature_and_structural_feature method (or similar
function name around line 1151-1164) contains outdated information stating that
all three embeddings are always returned as float32. Since the function now
includes a configurable dtype parameter that accepts "fp32", "fp64", and
"native" options (as documented in lines 1192-1193), update the docstring to
reflect that the output dtype depends on the dtype parameter passed to the
function, rather than claiming they are exclusively float32. Remove or modify
the sentence that incorrectly claims fixed float32 output and replace it with
text indicating the dtype is configurable via the dtype parameter.

---

Duplicate comments:
In `@source/tests/pt/model/test_embedding.py`:
- Around line 264-279: The tolerances rtol=1e-10, atol=1e-12 are too tight for
float32 precision and cause flaky test failures. Update both assert_allclose
calls for eval_descriptor and eval_fitting_last_layer to use rtol=1e-4 which
aligns with float32 comparison standards used elsewhere in the codebase. Change
the rtol parameter from 1e-10 to 1e-4 and the atol parameter from 1e-12 to a
value appropriate for float32 (typically around 1e-6 or 1e-5) in both
assertions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d11ec34e-9512-4db8-8896-57b97f9f5cc0

📥 Commits

Reviewing files that changed from the base of the PR and between 21bc006 and 2f63199.

📒 Files selected for processing (9)

deepmd/entrypoints/embedding.py
deepmd/entrypoints/eval_desc.py
deepmd/infer/deep_eval.py
deepmd/main.py
deepmd/pt/infer/deep_eval.py
doc/inference/embedding.md
doc/inference/python.md
doc/test/test.md
source/tests/pt/model/test_embedding.py

✅ Files skipped from review due to trivial changes (2)

doc/test/test.md
doc/inference/embedding.md

🚧 Files skipped from review as they are similar to previous changes (1)

deepmd/entrypoints/embedding.py

iProzd

Thanks, this is a useful embedding extraction feature and the backend/docs compatibility issues look much better now.

I still think one back-compat issue should be fixed before merge. eval_descriptor() and eval_fitting_last_layer() are existing public APIs, but they now route through eval_embedding(), which requires the frozen TorchScript model to expose the newly exported forward_embedding() method. Models frozen before this PR cannot have that method, even though the old hook-based descriptor / fitting-last-layer path used to work for them.

The explicit NotImplementedError is fine for the new eval_embedding() API, but it is a regression for the existing descriptor APIs and for dp eval-desc. Could you keep a fallback to the old hook-based path when forward_embedding is unavailable, and add a regression test for a legacy-frozen / legacy-hook model path?

After that, LGTM.

njzjz-bot · 2026-06-23T05:30:34Z

I found one more backend-compatibility issue around the new --dtype option for dp eval-desc.

deepmd/infer/deep_eval.py only omits dtype when it is "native"; for --dtype fp32/fp64 it forwards dtype into self.deep_eval.eval_descriptor(...). That is fine for the PyTorch backend added here, but the same CLI/docs now advertise --dtype for the backend-generic descriptor path (including the graph.pb example), while the TensorFlow backend eval_descriptor(...) signature does not accept dtype. So dp eval-desc -m graph.pb ... --dtype fp32 will raise TypeError instead of casting the output.

Please either keep --dtype PyTorch-scoped, or preferably make it backend-generic by evaluating descriptors through the existing backend API first and casting the returned NumPy array in the high-level wrapper / entrypoint. That would also keep native behavior unchanged for all backends.

This is separate from the already-requested legacy .pth fallback for eval_descriptor() / eval_fitting_last_layer(); I still think that one is the main blocker because it regresses existing frozen TorchScript models.

— OpenClaw 2026.6.8 (844f405), model: custom-chat-jinzhezeng-group/gpt-5.5

… level of features

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/infer/deep_eval.py`:
- Around line 611-615: Remove the dtype parameter from being passed to
backend-specific method calls (eval_descriptor() and eval_fitting_last_layer()).
Instead of adding dtype to eval_kwargs that gets forwarded to the backend,
capture the returned NumPy array from these backend calls and perform the dtype
casting at the high-level wrapper level when dtype is not "native". This ensures
the backend-generic DeepEval wrapper handles dtype conversion independently
without assuming backend support for this parameter, fixing the TensorFlow
backend compatibility issue where eval_descriptor() and
eval_fitting_last_layer() do not accept a dtype argument.
- Around line 704-708: The docstring text in the deep_eval.py file claims that
all three embeddings are returned as float32, but this contradicts the actual
method behavior which accepts dtype parameters like "fp64" and "native". Update
the documentation text to accurately reflect that embeddings are returned
according to the dtype parameter specified in the method, rather than always
being float32.

In `@deepmd/main.py`:
- Around line 495-500: The `--dtype` argument with choices fp32, fp64, and
native is exposed unconditionally without backend-aware validation, causing
TensorFlow users to encounter runtime errors instead of early CLI feedback.
After argument parsing in the eval-desc command, add a validation check that
verifies the selected dtype is compatible with the active backend. If the
backend is TensorFlow and dtype is anything other than native, raise an error
with a clear message explaining that non-native dtypes are only supported on
PyTorch. This ensures the command fails early with helpful guidance rather than
at runtime.

In `@deepmd/pt/model/atomic_model/linear_atomic_model.py`:
- Around line 311-368: The forward_embedding method currently returns the
embedding from the first sub-model that supports embeddings without verifying
that exactly one sub-model has embedding capability, which creates ambiguity
when multiple sub-models support embeddings. Add a check in the
forward_embedding method to count how many sub-models in self.models have
embeddings (by iterating through and counting calls to model.has_embedding()),
and raise a NotImplementedError or ValueError if the count is greater than one,
ensuring only the DP+ZBL contract (exactly one embedding-capable sub-model) is
accepted. Keep the existing behavior of returning from the first
embedding-capable model, but only after verifying there is exactly one.

In `@deepmd/pt/model/model/sezm_model.py`:
- Around line 879-887: The file contains 36 F722 lint errors from Ruff related
to jaxtyping annotations across multiple methods. To fix this, add `# noqa:
F722` suppression comments to all type-annotated parameter lines in the affected
methods: forward_embedding, forward, forward_common, forward_lower,
build_neighbor_list, and build_extended_neighbor_list. Apply the suppression
consistently to each line containing jaxtyping annotations (Float, Int, Tensor
types) across all six methods to ensure Ruff passes validation before CI runs.
- Around line 921-925: Add a training mode guard in the `forward_embedding()`
method to prevent incorrect embedding cache reuse. After the existing mode check
that validates `self.get_active_mode() != "ener"`, add another guard that raises
NotImplementedError when `self.training` is True, since the code comment
indicates this is an eval-only graph and the compiled_embedding cache does not
account for training mode differences like the regular cache does (which keys by
bool(self.training)). This ensures training-mode descriptor graphs are not
cached and later reused incorrectly during evaluation.

In `@source/tests/pt/model/test_embedding.py`:
- Line 7: The file uses unittest.mock.patch.dict, unittest.mock.Mock, and
unittest.mock.patch.object throughout the test file but only imports the base
unittest module without explicitly importing the mock submodule. Add an explicit
import statement for unittest.mock alongside the existing unittest import at the
top of the file to ensure the mock submodule is available when these mock
utilities are used.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 36655b57-8fe8-49cf-8dda-b08f1e539402

📥 Commits

Reviewing files that changed from the base of the PR and between 2f63199 and de0d109.

📒 Files selected for processing (29)

deepmd/entrypoints/embedding.py
deepmd/entrypoints/eval_desc.py
deepmd/entrypoints/main.py
deepmd/infer/deep_eval.py
deepmd/main.py
deepmd/pt/infer/deep_eval.py
deepmd/pt/model/atomic_model/base_atomic_model.py
deepmd/pt/model/atomic_model/dp_atomic_model.py
deepmd/pt/model/atomic_model/linear_atomic_model.py
deepmd/pt/model/atomic_model/sezm_atomic_model.py
deepmd/pt/model/descriptor/sezm_nn/dens.py
deepmd/pt/model/model/dp_model.py
deepmd/pt/model/model/make_model.py
deepmd/pt/model/model/sezm_model.py
deepmd/pt/model/task/dipole.py
deepmd/pt/model/task/ener.py
deepmd/pt/model/task/fitting.py
deepmd/pt/model/task/invar_fitting.py
deepmd/pt/model/task/polarizability.py
deepmd/pt/model/task/sezm_ener.py
doc/inference/embedding.md
doc/inference/index.rst
doc/inference/python.md
doc/model/dpa4.md
doc/test/test.md
source/tests/pt/model/test_embedding.py
source/tests/pt/model/test_linear_atomic_model.py
source/tests/pt/test_eval_desc.py
source/tests/pt/test_oom_retry.py

💤 Files with no reviewable changes (4)

deepmd/pt/model/model/dp_model.py
deepmd/pt/model/atomic_model/sezm_atomic_model.py
source/tests/pt/test_oom_retry.py
deepmd/pt/model/descriptor/sezm_nn/dens.py

✅ Files skipped from review due to trivial changes (6)

doc/inference/index.rst
doc/inference/python.md
source/tests/pt/test_eval_desc.py
doc/test/test.md
doc/model/dpa4.md
doc/inference/embedding.md

🚧 Files skipped from review as they are similar to previous changes (12)

source/tests/pt/model/test_linear_atomic_model.py
deepmd/entrypoints/main.py
deepmd/pt/model/task/invar_fitting.py
deepmd/entrypoints/eval_desc.py
deepmd/pt/model/task/polarizability.py
deepmd/pt/model/task/ener.py
deepmd/pt/model/task/dipole.py
deepmd/pt/model/atomic_model/base_atomic_model.py
deepmd/pt/model/model/make_model.py
deepmd/pt/model/task/sezm_ener.py
deepmd/pt/infer/deep_eval.py
deepmd/pt/model/task/fitting.py

OutisLi · 2026-06-23T09:42:18Z

Closes #5507

njzjz-bot

Thanks for the update. I re-reviewed the compatibility fixes and this looks good to me now.

The two blockers I had are addressed:

eval_descriptor() / eval_fitting_last_layer() now fall back to the legacy TorchScript hook path when forward_embedding is absent, so old frozen .pth models should keep working.
dp eval-desc --dtype is now handled by the backend-agnostic high-level wrapper, so non-PyTorch backends keep using their existing eval_descriptor() signatures and only the returned NumPy array is cast.

I also checked git diff --check, CI is green, and the touched Python files compile locally with py_compile in this checkout. I don't see remaining blockers from my side.

— OpenClaw 2026.6.8 (844f405), model: custom-chat-jinzhezeng-group/gpt-5.5

dosubot Bot added Docs new feature labels Jun 22, 2026

github-actions Bot added the Python label Jun 22, 2026

OutisLi requested review from njzjz and wanghan-iapcm June 22, 2026 08:14

github-advanced-security AI found potential problems Jun 22, 2026

View reviewed changes

Comment thread source/tests/pt/model/test_embedding.py Fixed

OutisLi requested a review from iProzd June 22, 2026 08:14

This was referenced Jun 22, 2026

feat(pt/dpa4): add nvalchemi-toolkit warpper for DPA4 #5568

Draft

[BUG] dp --pt eval-desc OOMs for DPA4-Plus descriptors while dp --pt test inference completes #5507

Closed

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

wanghan-iapcm reviewed Jun 22, 2026

View reviewed changes

Comment thread deepmd/pt/infer/deep_eval.py

wanghan-iapcm reviewed Jun 22, 2026

View reviewed changes

Comment thread deepmd/pt/infer/deep_eval.py

wanghan-iapcm reviewed Jun 22, 2026

View reviewed changes

Comment thread deepmd/entrypoints/embedding.py

wanghan-iapcm reviewed Jun 22, 2026

View reviewed changes

Comment thread deepmd/pt/infer/deep_eval.py Outdated

njzjz-bot suggested changes Jun 22, 2026

View reviewed changes

Comment thread deepmd/main.py Outdated

Comment thread deepmd/main.py Outdated

Comment thread doc/inference/python.md Outdated

Comment thread deepmd/entrypoints/embedding.py