name: pxmeter
description: Used to invoke the PXMeter tool for rigorous quality assessment of biomolecular structure prediction models (e.g., proteins, nucleic acids, small molecules). This skill covers single-sample CIF evaluations and large-scale dataset benchmarks, providing rich result parsing support. Trigger this skill when the user asks to "run PXMeter", "compare PDB/CIF evaluation results of different tools", "check benchmark scores", or asks "how accurate is the structure generated by the model".

PXMeter Agent Skill

EXTREMELY IMPORTANT: DO NOT SCAN FOR FILES. Under NO circumstances should you use find, Glob, ls -R, or Task/Explore subagents to search for missing .cif or .json evaluation files. The dataset storage contains millions of structures, and any recursive search will crash or hang the session. If you don't know the exact path, you MUST construct it from $PXM_MMCIF_DIR or immediately ask the user. DO NOT GUESS.

1. Core Concepts & Metrics

PXMeter has powerful all-atom matching capabilities. Even if the model outputs PDB/CIF chains in a different order from the reference, PXMeter finds the correct correspondence at the sequence and geometric levels through symmetry (permutation) resolution.

Key Metrics Guide (Must-Read for AI):

  1. LDDT (Local Distance Difference Test):
    • Meaning: A superposition-free score for local structure differences.
    • Interpretation: Scored out of 100; a score above 80 is generally considered good quality. For the environment around ligands (LDDT-PLI), this value reflects how accurately the model predicts the binding pocket.
  2. DockQ:
    • Meaning: Measures the quality of interaction interfaces between polymer chains, e.g. in protein-protein complexes.
    • Interpretation: Ranges from 0 to 1. A DockQ > 0.23 is usually considered a successful (Acceptable) interface. Batch benchmarks often report the corresponding success rate as avg_dockq_avg_sr.
  3. Pocket-aligned RMSD (Ligand RMSD):
    • Meaning: Root-mean-square deviation of the ligand calculated after aligning the binding pocket of the model and the reference structure.
    • Interpretation: Unit is Å. Typically, RMSD < 2.0 Å is considered a successfully predicted binding pose (Success).
  4. PoseBusters Validity (Stereochemical Check):
    • Meaning: Verifies whether the generated small molecule conformation violates physicochemical rules (e.g., severe steric clashes, incorrect chirality).
    • Interpretation: Usually outputs success rates, such as pb_all_valid_sr (proportion passing all chemical checks) and pb_all_valid_and_good_rmsd_sr (proportion passing both validity checks and RMSD < 2.0 Å).
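The rules of thumb above can be turned into small labeling helpers when summarizing scores for the user. This is a sketch only: the function names are hypothetical, the LDDT scale is assumed to be 0 to 100 as stated above, and the intermediate DockQ bands (0.49 medium, 0.80 high) are the standard DockQ quality classes rather than anything PXMeter itself outputs.

```python
def lddt_label(lddt: float) -> str:
    # Assumes LDDT on a 0-100 scale, as described above; > 80 counts as good.
    return "good" if lddt > 80 else "needs review"


def dockq_label(dockq: float) -> str:
    # Standard DockQ quality bands; 0.23 is the "Acceptable" cut-off.
    if dockq < 0.23:
        return "incorrect"
    if dockq < 0.49:
        return "acceptable"
    if dockq < 0.80:
        return "medium"
    return "high"


def ligand_pose_ok(rmsd_angstrom: float, pb_valid: bool = True) -> bool:
    # Mirrors the idea of pb_all_valid_and_good_rmsd: valid chemistry AND RMSD < 2.0 Å.
    return pb_valid and rmsd_angstrom < 2.0
```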

2. Single Sample Evaluation CLI Command

🚨 Remember: Never use Glob, find, or ls -R if you lack the paths. If you don't know the exact reference.cif path, either assemble it (e.g. $PXM_MMCIF_DIR/{id}.cif) and do a direct ls check, or STOP and ask the user.
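Direct path assembly can be sketched as follows. It checks only the two exact locations named in this skill ($PXM_MMCIF_DIR/{id}.cif and $PXM_EVAL_DATA_ROOT_PATH/supported_data/mmcif/{id}.cif) and never lists a directory. The helper name is hypothetical, and it assumes files are named exactly `<id>.cif`; a `None` result means: stop and ask the user.

```python
import os
from pathlib import Path
from typing import List, Optional


def reference_cif_path(entry_id: str) -> Optional[Path]:
    """Return the reference CIF via direct path assembly, or None (ask the user)."""
    candidates: List[Path] = []
    mmcif_dir = os.environ.get("PXM_MMCIF_DIR")
    if mmcif_dir:
        candidates.append(Path(mmcif_dir) / f"{entry_id}.cif")
    data_root = os.environ.get("PXM_EVAL_DATA_ROOT_PATH")
    if data_root:
        candidates.append(Path(data_root) / "supported_data" / "mmcif" / f"{entry_id}.cif")
    for path in candidates:
        # One stat call per exact candidate; no globbing, no recursive listing.
        if path.is_file():
            return path
    return None
```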

If the user provides a pair of CIF files and asks for an evaluation, generate and execute the following command:

pxm -r <reference.cif> -m <model.cif> -o <output.json>
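To summarize the resulting <output.json> for the user, one option is to flatten it and keep the numeric values whose key path mentions a headline metric. The exact JSON key layout is not documented here, so this sketch deliberately assumes none: it simply matches "lddt", "dockq", or "rmsd" anywhere in the key path. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

KEYWORDS = ("lddt", "dockq", "rmsd")  # metric families described in section 1


def collect_metrics(node: Any, prefix: str = "") -> Dict[str, float]:
    """Flatten nested JSON; keep numeric leaves whose key path names a metric."""
    found: Dict[str, float] = {}
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = [(str(i), v) for i, v in enumerate(node)]
    else:
        return found
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            found.update(collect_metrics(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            if any(k in path.lower() for k in KEYWORDS):
                found[path] = float(value)
    return found


def summarize(output_json: Union[str, Path]) -> Dict[str, float]:
    """Load a pxm output JSON and return its metric-like numeric values."""
    with open(output_json) as fh:
        return collect_metrics(json.load(fh))
```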

Advanced Parameter Scenarios:

  • Evaluation with Ligands: If the user cares about the binding quality of specific small-molecule ligands, you must use the -l parameter to specify the ligand chain IDs. Without it, ligand RMSD and PoseBusters metrics are not output.
    • pxm -r ref.cif -m model.cif -l A,B -o output.json
  • Processing Specific Assemblies:
    • --ref_assembly_id 1 (if omitted, the asymmetric unit is used).
  • Overriding Built-in Hyperparameters: Use -C, e.g., to ignore ligand mapping:
    • -C mapping.mapping_ligand=false
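When the agent drives pxm from a script, building the argument list explicitly avoids shell-quoting mistakes. This sketch uses only the flags shown above (-r, -m, -o, -l, --ref_assembly_id); the helper names are hypothetical, and joining multiple ligand chains with commas follows the `-l A,B` example.

```python
import subprocess
from typing import List, Optional, Sequence, Union


def pxm_command(ref: str, model: str, out: str,
                ligand_chains: Sequence[str] = (),
                ref_assembly_id: Optional[Union[int, str]] = None) -> List[str]:
    """Assemble a single-sample pxm command from the flags listed above."""
    cmd = ["pxm", "-r", ref, "-m", model, "-o", out]
    if ligand_chains:
        # Without -l, ligand RMSD and PoseBusters metrics are not produced.
        cmd += ["-l", ",".join(ligand_chains)]
    if ref_assembly_id is not None:
        # Omitting this flag evaluates the asymmetric unit.
        cmd += ["--ref_assembly_id", str(ref_assembly_id)]
    return cmd


def run_pxm(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if pxm exits non-zero."""
    subprocess.run(cmd, check=True)
```

For example, `run_pxm(pxm_command("ref.cif", "model.cif", "output.json", ["A", "B"]))` corresponds to the ligand example above.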

3. pxm gen-input Input Generation

If the user only has mmCIF reference structures, or wants to convert between different model input formats (e.g., converting AlphaFold3 JSON to Boltz YAML), PXMeter provides a convenient CLI tool:

pxm gen-input \
  -i <INPUT_PATH> \
  -o <OUTPUT_PATH> \
  -it <cif|af3|protenix|boltz|openfold3> \
  -ot <af3|protenix|boltz|openfold3> \
  --num-seeds <N>

  • This command supports both single-file conversion and batch conversion of a flat directory.
  • Interactive Mode: If the user runs pxm gen-input without any parameters, it enters an interactive terminal mode that guides the user step by step through building the input file.
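A script that calls gen-input can reject unsupported format names before invoking the CLI. The allowed values below are copied from the -it/-ot choices listed above; the helper name is hypothetical and the seed count is left to the caller rather than assuming a pxm default.

```python
from typing import List

INPUT_TYPES = {"cif", "af3", "protenix", "boltz", "openfold3"}
OUTPUT_TYPES = {"af3", "protenix", "boltz", "openfold3"}


def gen_input_command(src: str, dst: str, in_type: str, out_type: str,
                      num_seeds: int) -> List[str]:
    """Build a pxm gen-input call, rejecting format names the CLI does not list."""
    if in_type not in INPUT_TYPES:
        raise ValueError(f"unsupported input type: {in_type}")
    if out_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {out_type}")
    if num_seeds < 1:
        raise ValueError("num_seeds must be >= 1")
    return ["pxm", "gen-input", "-i", src, "-o", dst,
            "-it", in_type, "-ot", out_type, "--num-seeds", str(num_seeds)]
```

For instance, converting AlphaFold3 JSON inputs to Boltz YAML would use `in_type="af3"` and `out_type="boltz"`.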

4. pxm stereocheck Polymer Stereochemistry Validation

In addition to calculating metrics against a reference, PXMeter provides a standalone tool to evaluate the stereochemical rationality of polymer structures (like proteins and nucleic acids) within a single generated CIF file, without needing a reference structure.

If the user wants to "check if the generated structure has stereochemical violations" or "evaluate bond lengths/angles of the protein", run:

pxm stereocheck -c <model.cif> -o <stereochem_report.csv>

  • This command scans <model.cif> and writes a CSV report (default: stereochem_report.csv) listing any stereochemical violations (e.g., severe bond-length or bond-angle deviations) found in the polymer chains. If the structure is fully valid, it reports "No stereochemistry violations found."
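To report the result back to the user, the CSV can be read with the standard library. This sketch assumes the report has a header row followed by one row per violation; it does not interpret the column names, which are whatever pxm writes. A missing or empty file is treated as "no violations". The helper name is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_violations(report: Path) -> List[Dict[str, str]]:
    """Return one dict per row of a stereocheck CSV report (empty if none)."""
    if not report.is_file() or report.stat().st_size == 0:
        return []
    with report.open(newline="") as fh:
        # Assumption: first row is a header, each later row is one violation.
        return list(csv.DictReader(fh))
```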

5. Batch Benchmark Workflow

If the user needs to evaluate and aggregate an entire dataset, follow these steps.

Step 5.1 Evaluation Phase (run_eval.py)

First, set the dataset root directory and run benchmark.run_eval. PXMeter has built-in parsers for the output layouts of several models, such as protenix, af3, chai, and boltz.

export PXM_EVAL_DATA_ROOT_PATH="<path/to/dataset>"

python -m benchmark.run_eval \
    -i <infer_results_dir> \
    -o <output_eval_dir> \
    -m <model_name> \
    -n -1
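The same call can be launched from Python by passing PXM_EVAL_DATA_ROOT_PATH through the child process environment instead of exporting it in the shell. This is a sketch: the helper name is hypothetical, `-n -1` is copied from the command above without assuming its meaning, and `sys.executable` stands in for `python` so the current interpreter is used.

```python
import os
import sys
from typing import Dict, List, Tuple


def run_eval_call(dataset_root: str, infer_dir: str, out_dir: str,
                  model_name: str, n: int = -1) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for `python -m benchmark.run_eval` as in Step 5.1."""
    env = dict(os.environ, PXM_EVAL_DATA_ROOT_PATH=dataset_root)
    argv = [sys.executable, "-m", "benchmark.run_eval",
            "-i", infer_dir, "-o", out_dir, "-m", model_name, "-n", str(n)]
    return argv, env
```

Usage (from the repository root): `argv, env = run_eval_call(...)`, then `subprocess.run(argv, env=env, check=True)`.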

Step 5.2 Aggregation and Presentation Phase (show_results.py)

Aggregate the <output_eval_dir> results from the previous step into final CSV tables. First, create a config.json that declares the model name and result paths:

{
  "MyModel": {
    "model": "protenix",
    "seeds": [1, 2, 3],
    "dataset_path": {
      "RecentPDB": "<output_eval_dir>"
    }
  }
}
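If the config is generated programmatically (for example, one entry per model under comparison), a small writer keeps it in exactly the layout shown above. The function name is hypothetical; the keys "model", "seeds", and "dataset_path" are taken from the example.

```python
import json
from pathlib import Path
from typing import Dict, Sequence


def write_show_results_config(path: Path, label: str, model: str,
                              seeds: Sequence[int],
                              datasets: Dict[str, str]) -> dict:
    """Write a show_results config.json with the layout shown above; return it."""
    config = {label: {"model": model,
                      "seeds": list(seeds),
                      "dataset_path": dict(datasets)}}
    path.write_text(json.dumps(config, indent=2))
    return config
```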

Then run:

python -m benchmark.show_results \
    -c config.json \
    -o ./pxm_results \
    -t Summary,DockQ,LDDT,RMSD

6. Configuration Overrides

PXMeter allows overriding default behaviors via the -C command-line argument, e.g., -C mapping.res_id_alignments=false. Here are important options users might need:

6.1 Mapping Configuration

Controls how the reference and model structures are matched:

  • mapping.mapping_polymer (default true): Whether to map protein/nucleic acid polymers. Set to false if focusing purely on non-polymers.
  • mapping.mapping_ligand (default true): Whether to map ligands (small molecules/ions). Disable to speed up pure protein evaluation if the model has unreliable ligands.
  • mapping.res_id_alignments (default true): If true, residues are matched strictly by residue ID (suitable for refined or consistently numbered models). If false, they are matched via sequence alignment (suitable for de novo predictions, indels, or inconsistent numbering).
  • mapping.auto_fix_model_entities (default true): Attempts to auto-correct erroneous entity annotations in the model. Highly recommended when handling CIFs from heterogeneous sources.

6.2 Metrics Configuration

Toggle specific metrics to save time or resources:

  • metric.calc_lddt / metric.calc_dockq / metric.calc_rmsd / metric.calc_clashes / metric.calc_pb_valid (all default true): Toggles for main metrics.
  • metric.calc_cdr_h3_bb_rmsd (default false): When enabled, uses ANARCII to identify antibody sequences and calculates the backbone RMSD of the antibody CDR-H3 loop. Only enable when evaluating antibodies.

6.3 LDDT & DockQ Fine-Tuning

  • metric.lddt.nucleotide_threshold (default 30.0): Inclusion radius for nucleic acid atoms during LDDT calculation.
  • metric.lddt.non_nucleotide_threshold (default 15.0): Inclusion radius for non-nucleotide (e.g., protein) atoms during LDDT calculation.
  • metric.lddt.calc_backbone_lddt (default true): Calculates and outputs an additional backbone-only LDDT score (bb_lddt).
  • metric.lddt.stereochecks (default false): If true, LDDT will ignore atoms that violate basic stereochemical rules.
  • metric.dockq.exclude_hetatms (default true): Excludes HETATM records before calculating DockQ. Set to false if the interface contains non-standard amino acids (as in some peptide-protein interfaces) so that those residues are not wrongly excluded.
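Several overrides are often needed together, for example for a de novo antibody model with inconsistent numbering. This sketch renders a dict of the options above into `-C key=value` arguments, lowercasing booleans to match the `=false` style used in this document. It assumes -C may be repeated once per override; confirm with `pxm --help` before relying on that. The helper and the example dict are hypothetical.

```python
from typing import Dict, List


def override_args(overrides: Dict[str, object]) -> List[str]:
    """Render config overrides as -C key=value arguments (bools lowercased)."""
    args: List[str] = []
    for key, value in overrides.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        # Assumption: -C can be passed once per override.
        args += ["-C", f"{key}={text}"]
    return args


# Hypothetical preset: de novo antibody model with inconsistent residue numbering.
DE_NOVO_ANTIBODY = {
    "mapping.res_id_alignments": False,   # match via sequence alignment, not residue ID
    "metric.calc_cdr_h3_bb_rmsd": True,   # CDR-H3 backbone RMSD (uses ANARCII)
}
```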

7. Interpreting Benchmark Results

After aggregation, a ./pxm_results directory will be generated. The AI needs to know where to find answers:

  1. "What are the overall metrics/success rates?"
    • Read pxm_results/Summary_table.csv or Summary_table.txt.
    • Look for avg_dockq_avg_sr (interaction success rate) and pb_all_valid_and_good_rmsd_sr (comprehensive ligand prediction success rate).
  2. "Which Cases (PDBs) failed prediction?"
    • Check the specific Details CSVs, such as pxm_results/DockQ_details.csv or RMSD_details.csv.
    • These tables contain fine-grained info, including entry_id (e.g., 7rss), chain_id_1, chain_id_2, and specific lddt or DockQ scores.
  3. Parquet Data Files
    • A *_metrics.parquet file is written to each of the user's dataset_path directories; it is the raw metrics cache compiled from the per-entry JSON outputs.
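To answer "which cases failed?", the details CSVs can be filtered against the success thresholds from section 1 (DockQ 0.23, higher is better; ligand RMSD 2.0 Å, lower is better). The score column name is an assumption to be checked against the actual header first (the helper raises if it is missing); rows with an empty or non-numeric score are skipped. The function name is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def failed_cases(details_csv: Path, score_column: str, threshold: float,
                 higher_is_better: bool = True) -> List[Dict[str, str]]:
    """Return rows of a *_details.csv whose score misses the success threshold."""
    failed: List[Dict[str, str]] = []
    with details_csv.open(newline="") as fh:
        reader = csv.DictReader(fh)
        if score_column not in (reader.fieldnames or []):
            raise KeyError(f"{score_column!r} not in header {reader.fieldnames}")
        for row in reader:
            try:
                score = float(row[score_column])
            except (TypeError, ValueError):
                continue  # empty or non-numeric score
            ok = score >= threshold if higher_is_better else score < threshold
            if not ok:
                failed.append(row)
    return failed
```

For example, `failed_cases(Path("pxm_results/DockQ_details.csv"), "<dockq column>", 0.23)`.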

8. FAQ & Troubleshooting for AI

Q1: The user says their model's output directory structure is not supported and PXMeter throws "Evaluator not found".

Action: Guide the user to write a custom parser:

  1. Tell the user to run tree <prediction_dir> -L 3 to capture the directory tree of a single PDB sample.
  2. Inherit from benchmark.evaluators.base.BaseEvaluator.
  3. Implement the _get_info_from_each_pdb_dir(self, pdb_dir: Path) -> list method, extracting name, pdb_id, seed, sample, pred_cif_path, confidence_json_path. Refer to docs/ai_evaluator_helper.md.
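A custom parser might look like the sketch below. Everything about the directory layout is hypothetical (replace it with what `tree -L 3` shows), as are returning a list of dicts, using the directory name for both name and pdb_id, and storing paths as Path objects; check docs/ai_evaluator_helper.md for the exact record format. It only looks at fixed depths inside one PDB directory and never scans the dataset. The fallback class lets the sketch run outside the repository.

```python
from pathlib import Path
from typing import Dict, List

try:  # available when run from the PXMeter repository root
    from benchmark.evaluators.base import BaseEvaluator
except ImportError:  # stand-in so this sketch also runs on its own
    class BaseEvaluator:  # type: ignore[no-redef]
        pass


def parse_pdb_dir(pdb_dir: Path) -> List[Dict[str, object]]:
    """Collect one record per predicted sample inside a single PDB directory.

    Hypothetical layout:
        <pdb_dir>/seed_<seed>/sample_<k>/model.cif
        <pdb_dir>/seed_<seed>/sample_<k>/confidence.json
    """
    records: List[Dict[str, object]] = []
    for seed_dir in sorted(pdb_dir.glob("seed_*")):  # one level, not recursive
        for sample_dir in sorted(seed_dir.glob("sample_*")):
            cif = sample_dir / "model.cif"
            if not cif.is_file():
                continue
            records.append({
                "name": pdb_dir.name,
                "pdb_id": pdb_dir.name,
                "seed": int(seed_dir.name.split("_", 1)[1]),
                "sample": int(sample_dir.name.split("_", 1)[1]),
                "pred_cif_path": cif,
                "confidence_json_path": sample_dir / "confidence.json",
            })
    return records


class MyModelEvaluator(BaseEvaluator):
    """Hypothetical evaluator delegating to the layout-specific parser."""

    def _get_info_from_each_pdb_dir(self, pdb_dir: Path) -> list:
        return parse_pdb_dir(pdb_dir)
```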

Q2: The user asks: Why are there no small molecule RMSD and PoseBusters outputs in the single file evaluation?

Action: Inform the user: During pxm CLI evaluation, ligand-specific metrics are not calculated by default. You must explicitly declare the ligand chains of interest using -l <label_asym_id> (e.g., -l B).

Q3: The user asks: The multimer chain order predicted by the model is different from the reference structure, will it affect the score?

Action: Explain PXMeter's mapping logic: the chain order does not affect the score. PXMeter matches entities at the sequence and chemical levels before evaluation. For multimers with identical chains, it uses geometric alignment to resolve symmetry, eliminating chain-permutation ambiguity; within each molecule, it also matches at the atomic level, so the evaluated atoms correspond exactly.
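The idea behind chain-permutation resolution can be illustrated with a toy version. This is not PXMeter's algorithm: it reduces each chain of one homomeric entity to a centroid, assumes model and reference are already superposed, and exhaustively picks the model-to-reference assignment with the smallest summed centroid distance, so it only suits a handful of chains.

```python
from itertools import permutations
from math import dist
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def best_chain_mapping(ref: Dict[str, Point], model: Dict[str, Point]) -> Dict[str, str]:
    """Map model chain IDs to reference chain IDs by minimal summed centroid distance."""
    ref_ids = sorted(ref)
    model_ids = sorted(model)
    if len(ref_ids) != len(model_ids):
        raise ValueError("toy version expects equal chain counts")
    best, best_cost = None, float("inf")
    for perm in permutations(model_ids):  # exhaustive: n! assignments
        cost = sum(dist(ref[r], model[m]) for r, m in zip(ref_ids, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return dict(zip(best, ref_ids))
```

With reference chains A at the origin and B at x = 10, and model chains X at x = 10.1 and Y at x = 0.2, the result is `{"Y": "A", "X": "B"}`: the labels differ but the geometry is matched.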

Q4: How to handle custom small molecules (not in the PDB Chemical Component Dictionary, CCD)?

Action: Remind the user that they can use the --chain_id_to_mol_json parameter, passing a JSON file containing a {<chain_id>: <SMILES>} dictionary. This lets PXMeter identify and correctly parse non-standard small molecules generated by the model.
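Writing that mapping file and the matching flag pair can be sketched as below. The helper name and the example file name are hypothetical, and aspirin's SMILES is used purely as an example ligand.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_chain_smiles(path: Path, chain_to_smiles: Dict[str, str]) -> List[str]:
    """Write the {chain_id: SMILES} mapping and return the pxm flag pair for it."""
    path.write_text(json.dumps(chain_to_smiles, indent=2))
    return ["--chain_id_to_mol_json", str(path)]
```

The returned pair can be appended to a `pxm -r ... -m ... -o ...` argument list, e.g. `write_chain_smiles(Path("ligands.json"), {"B": "CC(=O)Oc1ccccc1C(=O)O"})`.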


Instructions for AI Agent:

  1. 🚨 CRITICAL RULE: NEVER SCAN THE DISK FOR CIF FILES: When a user's instruction is missing key parameters (like reference.cif), ABSOLUTELY DO NOT use find, Glob, ls -R, or the Task/Explore subagents to search for missing CIF files! The dataset directories contain millions of files; any form of recursive scanning will hang the system and fail the task.
    • ONLY Try Direct Path Assembly: Try checking the exact path using the $PXM_MMCIF_DIR environment variable (if set) or similar paths like $PXM_EVAL_DATA_ROOT_PATH/supported_data/mmcif. E.g., directly ls $PXM_MMCIF_DIR/7rss.cif. Do not guess subdirectories.
    • If the direct path doesn't work, YOU MUST STOP IMMEDIATELY. Ask the user: "Please provide the exact path to the reference structure (reference CIF) and the model's prediction result. Note: I cannot scan for these files because the dataset is too large."
  2. Proactive Result Parsing: After executing any pxm command that produces output, proactively parse the JSON or generated CSV files. Do not just report that the command executed successfully; you must also summarize the most important metric scores for the user (e.g., "The LDDT score for this structure is 85.2, and the ligand RMSD is 1.5 Å, which falls into the category of a successful prediction").
  3. Addressing Source Code Implementation Queries: When the user asks about the specific implementation principles of pxmeter or benchmark:
    • First, check if a pxmeter directory exists in the current working directory.
    • If not in the current directory, try looking for the pxmeter source code in Python's site-packages installation path (obtainable via python -c "import pxmeter; print(pxmeter.__path__[0])").
    • If the user asks about benchmark aggregation or dataset pipelines (e.g., run_eval), search for the code in the benchmark directory within the current working directory.
    • If the corresponding pxmeter or benchmark code repositories cannot be found in either of these locations, stop searching and explicitly ask the user: "Where is the pxmeter code repository or installation path located?"
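The source-location lookup in step 3 can be done without any disk scan: check `./pxmeter` in the working directory, then ask the import system where the installed package lives (the same information `pxmeter.__path__[0]` gives, but without importing the package). The helper name is hypothetical; a `None` result means it is time to ask the user.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package_source(name: str = "pxmeter") -> Optional[Path]:
    """Check ./<name> first, then the installed package location; no disk scan."""
    local = Path.cwd() / name
    if local.is_dir():
        return local
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not installed: ask the user where the code lives
    if spec.submodule_search_locations:  # a package directory
        return Path(list(spec.submodule_search_locations)[0])
    return Path(spec.origin) if spec.origin else None
```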