| name | pxmeter |
|---|---|
| description | Used to invoke the PXMeter tool for rigorous quality assessment of biomolecular structure prediction models (e.g., proteins, nucleic acids, small molecules). This skill covers single-sample CIF evaluations and large-scale dataset benchmarks, providing rich result parsing support. Trigger this skill when the user asks to "run PXMeter", "compare PDB/CIF evaluation results of different tools", "check benchmark scores", or asks "how accurate is the structure generated by the model". |
EXTREMELY IMPORTANT: DO NOT SCAN FOR FILES. Under NO circumstances should you use `find`, `Glob`, `ls -R`, or `Task`/`Explore` subagents to search for missing `.cif` or `.json` evaluation files. The dataset storage contains millions of structures, and any recursive search will crash or hang the session. If you don't know the exact path, you MUST construct it via `$PXM_MMCIF_DIR` or immediately ask the user. DO NOT GUESS.
PXMeter has powerful all-atom matching capabilities. Even if the model outputs disordered PDB/CIF chains, PXMeter can find the correct correspondence at the sequence and geometric levels through symmetry resolution (Permutation).
- LDDT (Local Distance Difference Test):
  - Meaning: A superposition-free score for local structure differences.
  - Interpretation: Scored out of 100. A score >80 is generally considered good quality. For the environment around ligands (LDDT-PLI), this value reflects the accuracy of the model's binding pocket prediction.
- DockQ:
  - Meaning: Measures the quality of the interaction interface between polymers, such as protein-protein complexes.
  - Interpretation: Ranges from 0 to 1. DockQ > 0.23 is usually considered a successful (Acceptable) interaction interface. Batch benchmarks often output `avg_dockq_avg_sr` (success rate).
- Pocket-aligned RMSD (Ligand RMSD):
  - Meaning: Root-mean-square deviation of the ligand, calculated after aligning the binding pocket of the model to that of the reference structure.
  - Interpretation: Unit is Å. RMSD < 2.0 Å is typically considered a successfully predicted binding pose (Success).
- PoseBusters Validity (Stereochemical Check):
  - Meaning: Verifies whether the generated small-molecule conformation violates physicochemical rules (e.g., severe steric clashes, incorrect chirality).
  - Interpretation: Usually outputs success rates, such as `pb_all_valid_sr` (proportion passing all chemical checks) and `pb_all_valid_and_good_rmsd_sr` (proportion passing both validity checks and RMSD < 2.0 Å).
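
The rule-of-thumb cutoffs in the list above can be gathered into a small helper for summarizing scores. This is only a sketch: `summarize_metrics` and its argument names are hypothetical and not part of PXMeter, and the thresholds are simply the ones quoted above.

```python
from typing import Optional

# Rule-of-thumb cutoffs quoted in the list above (not read from PXMeter itself).
LDDT_GOOD = 80.0          # LDDT on a 0-100 scale
DOCKQ_ACCEPTABLE = 0.23   # DockQ on a 0-1 scale
RMSD_SUCCESS = 2.0        # pocket-aligned ligand RMSD, in angstroms


def summarize_metrics(
    lddt: Optional[float] = None,
    dockq: Optional[float] = None,
    ligand_rmsd: Optional[float] = None,
) -> dict:
    """Map raw scores to coarse labels; metrics left as None are skipped."""
    verdict = {}
    if lddt is not None:
        verdict["lddt"] = "good" if lddt > LDDT_GOOD else "below 80"
    if dockq is not None:
        verdict["dockq"] = "acceptable" if dockq > DOCKQ_ACCEPTABLE else "failed"
    if ligand_rmsd is not None:
        verdict["ligand_rmsd"] = "success" if ligand_rmsd < RMSD_SUCCESS else "failed"
    return verdict
```

For example, `summarize_metrics(lddt=85.2, ligand_rmsd=1.5)` labels both values as passing, while a metric that was not computed is simply left out of the result.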
🚨 Remember: Never use Glob, find, or ls -R if you lack the paths. If you don't know the exact reference.cif path, either assemble it (e.g. $PXM_MMCIF_DIR/{id}.cif) and do a direct ls check, or STOP and ask the user.
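
The direct path assembly described above could look roughly like the following. It is a minimal sketch that assumes reference files are named `<id>.cif` directly under `$PXM_MMCIF_DIR`, as in the example path; the function name `resolve_reference_cif` is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_reference_cif(entry_id: str) -> Optional[Path]:
    """Return $PXM_MMCIF_DIR/<entry_id>.cif if that exact file exists, else None.

    Only one direct existence check is made; no directory is listed
    or searched recursively.
    """
    root = os.environ.get("PXM_MMCIF_DIR")
    if not root:
        return None  # variable unset: the caller should ask the user
    candidate = Path(root) / f"{entry_id}.cif"
    return candidate if candidate.is_file() else None
```

A `None` result is the signal to stop and ask the user for the exact path rather than to try other locations.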
If the user provides a pair of CIF files and asks for an evaluation, generate and execute the following command:
`pxm -r <reference.cif> -m <model.cif> -o <output.json>`

Advanced Parameter Scenarios:
- Evaluation with Ligands: If the user cares about the binding quality of specific small-molecule ligands, you must use the `-l` parameter to specify the chain ID. Without it, RMSD and PoseBusters will not be output. Example: `pxm -r ref.cif -m model.cif -l A,B -o output.json`
- Processing Specific Assemblies: `--ref_assembly_id 1` (defaults to the Asymmetric Unit if not provided).
- Overriding Built-in Hyperparameters: Use `-C`, e.g., to ignore ligand mapping: `-C mapping.mapping_ligand=false`
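
When the command is generated programmatically, building it as an argument list avoids quoting problems. The sketch below uses only the flags shown above; `build_pxm_command` is a hypothetical helper, and it assumes the order of optional flags does not matter.

```python
import shlex
from typing import List, Optional, Sequence


def build_pxm_command(
    ref_cif: str,
    model_cif: str,
    output_json: str,
    ligand_chains: Sequence[str] = (),
    ref_assembly_id: Optional[str] = None,
    override: Optional[str] = None,
) -> List[str]:
    """Assemble a pxm argument list from the flags listed above."""
    cmd = ["pxm", "-r", ref_cif, "-m", model_cif, "-o", output_json]
    if ligand_chains:
        cmd += ["-l", ",".join(ligand_chains)]  # e.g. -l A,B
    if ref_assembly_id is not None:
        cmd += ["--ref_assembly_id", str(ref_assembly_id)]
    if override:
        cmd += ["-C", override]  # e.g. mapping.mapping_ligand=false
    return cmd


cmd = build_pxm_command("ref.cif", "model.cif", "output.json", ligand_chains=["A", "B"])
print(shlex.join(cmd))  # shell-safe rendering for logging or display
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`.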
If the user only has mmCIF reference structures, or wants to convert between different model input formats (e.g., converting AlphaFold3 JSON to Boltz YAML), PXMeter provides a convenient CLI tool:
pxm gen-input \
-i <INPUT_PATH> \
-o <OUTPUT_PATH> \
-it <cif|af3|protenix|boltz|openfold3> \
-ot <af3|protenix|boltz|openfold3> \
--num-seeds <N>

- This command supports both single-file conversion and batch conversion of flat directories.
- Interactive Mode: If the user runs `pxm gen-input` without any parameters, it enters an interactive terminal mode that guides the user step by step through building the input file.
In addition to calculating metrics against a reference, PXMeter provides a standalone tool to evaluate the stereochemical rationality of polymer structures (like proteins and nucleic acids) within a single generated CIF file, without needing a reference structure.
If the user wants to "check if the generated structure has stereochemical violations" or "evaluate bond lengths/angles of the protein", run:
`pxm stereocheck -c <model.cif> -o <stereochem_report.csv>`

- This command scans `<model.cif>` and outputs a CSV report (defaulting to `stereochem_report.csv`) listing any stereochemical violations (e.g., severe bond length or bond angle deviations) found in the polymer chains. If the structure is completely valid, it reports "No stereochemistry violations found."
If the user needs to evaluate and aggregate an entire dataset, follow these steps.
First, specify the dataset root directory and execute run_eval. PXMeter has built-in support for parsing various model structures like protenix, af3, chai, boltz.
export PXM_EVAL_DATA_ROOT_PATH="<path/to/dataset>"
python -m benchmark.run_eval \
-i <infer_results_dir> \
-o <output_eval_dir> \
-m <model_name> \
-n -1

Aggregate the previous `<output_eval_dir>` results into a final CSV table.
First, create a config.json to declare the model name and result paths:
{
"MyModel": {
"model": "protenix",
"seeds": [1, 2, 3],
"dataset_path": {
"RecentPDB": "<output_eval_dir>"
}
}
}

Then run:
python -m benchmark.show_results \
-c config.json \
-o ./pxm_results \
-t Summary,DockQ,LDDT,RMSD

PXMeter allows overriding default behaviors via the `-C` command-line argument, e.g., `-C mapping.res_id_alignments=false`. Here are important options users might need:
Controls how the reference and model structures are matched:
- `mapping.mapping_polymer` (default `true`): Whether to map protein/nucleic acid polymers. Set to `false` if focusing purely on non-polymers.
- `mapping.mapping_ligand` (default `true`): Whether to map ligands (small molecules/ions). Disable to speed up pure protein evaluation if the model has unreliable ligands.
- `mapping.res_id_alignments` (default `true`): If `true`, matches strictly by residue ID (suitable for refined or consistently numbered models). If `false`, matches via sequence alignment (suitable for de novo predictions, indels, or inconsistent numbering).
- `mapping.auto_fix_model_entities` (default `true`): Attempts to auto-correct erroneous entity annotations in the model. Highly recommended when handling CIFs from heterogeneous sources.
Toggle specific metrics to save time or resources:
- `metric.calc_lddt` / `metric.calc_dockq` / `metric.calc_rmsd` / `metric.calc_clashes` / `metric.calc_pb_valid` (all default `true`): Toggles for the main metrics.
- `metric.calc_cdr_h3_bb_rmsd` (default `false`): When enabled, uses ANARCII to identify antibody sequences and calculates the backbone RMSD of the antibody CDR-H3 loop. Only enable when evaluating antibodies.
- `metric.lddt.nucleotide_threshold` (default `30.0`): Inclusion radius for nucleic acid atoms during LDDT calculation.
- `metric.lddt.non_nucleotide_threshold` (default `15.0`): Inclusion radius for non-nucleic (e.g., protein) atoms during LDDT calculation.
- `metric.lddt.calc_backbone_lddt` (default `true`): Calculates and outputs an additional backbone-only LDDT score (`bb_lddt`).
- `metric.lddt.stereochecks` (default `false`): If `true`, LDDT will ignore atoms that violate basic stereochemical rules.
- `metric.dockq.exclude_hetatms` (default `true`): Excludes HETATMs before calculating DockQ. Set to `false` if the interface contains non-standard amino acids, or for special peptide-protein interfaces, to prevent false exclusion.
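
If overrides are kept as Python values, they need to be rendered in the `key=value` form that follows `-C`. The helper below is a hypothetical sketch that writes booleans in lowercase, matching the examples in this document; how several overrides are combined on one command line is not shown here and should be checked against the PXMeter documentation.

```python
def format_override(key: str, value) -> str:
    """Render one dotted option as the key=value string passed after -C."""
    if isinstance(value, bool):
        # The examples in this document write booleans in lowercase.
        return f"{key}={'true' if value else 'false'}"
    return f"{key}={value}"


# Example: switch from residue-ID matching to sequence alignment.
print(format_override("mapping.res_id_alignments", False))
```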
After aggregation, a ./pxm_results directory will be generated. The AI needs to know where to find answers:
- "What are the overall metrics/success rates?"
  - Read `pxm_results/Summary_table.csv` or `Summary_table.txt`.
  - Look for `avg_dockq_avg_sr` (interaction success rate) and `pb_all_valid_and_good_rmsd_sr` (comprehensive ligand prediction success rate).
- "Which Cases (PDBs) failed prediction?"
  - Check the specific Details CSVs, such as `pxm_results/DockQ_details.csv` or `RMSD_details.csv`.
  - These tables contain fine-grained info, including `entry_id` (e.g., 7rss), `chain_id_1`, `chain_id_2`, and specific `lddt` or `DockQ` scores.
- Parquet Data Files
  - A `*_metrics.parquet` file will be generated in the user's `dataset_path`; it is the raw performance cache summarized from the JSONs.
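
To answer "which cases failed?", the details table can be filtered against the DockQ cutoff of 0.23. The sketch below assumes the CSV has the columns named above and that the score column is literally called `DockQ`; the real header may differ, and the sample rows are made up purely for illustration.

```python
import csv
import io
from typing import List

DOCKQ_ACCEPTABLE = 0.23  # success requires DockQ strictly above this value


def failed_interfaces(csv_text: str, score_column: str = "DockQ") -> List[dict]:
    """Return rows whose DockQ score does not exceed the acceptable cutoff."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if float(row[score_column]) <= DOCKQ_ACCEPTABLE]


# Made-up rows for illustration only; real files may use different headers.
sample = """entry_id,chain_id_1,chain_id_2,DockQ
demo1,A,B,0.61
demo1,A,C,0.12
"""
for row in failed_interfaces(sample):
    print(row["entry_id"], row["chain_id_1"], row["chain_id_2"], row["DockQ"])
```

For a real file, the text would come from reading `pxm_results/DockQ_details.csv` at its known path.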
Q1: The user says: My model's output directory structure is not supported, throwing "Evaluator not found"?
Action: Guide the user to write a custom parser:
- Tell the user to run `tree <prediction_dir> -L 3` to capture the directory tree of a single PDB sample.
- Inherit from `benchmark.evaluators.base.BaseEvaluator`.
- Implement the `_get_info_from_each_pdb_dir(self, pdb_dir: Path) -> list` method, extracting `name`, `pdb_id`, `seed`, `sample`, `pred_cif_path`, and `confidence_json_path`. Refer to `docs/ai_evaluator_helper.md`.
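
The body of such a parser might look like the standalone function below. Everything about it is an assumption made for illustration: the directory layout (`<pdb_dir>/seed_<seed>/sample_<n>.cif` with a sibling `sample_<n>.json`), the function name `collect_predictions`, and the choice to return a list of dictionaries. The required return format is defined in `docs/ai_evaluator_helper.md`, which takes precedence.

```python
from pathlib import Path
from typing import List


def collect_predictions(pdb_dir: Path) -> List[dict]:
    """List one record per predicted CIF under a single PDB directory.

    Assumed (hypothetical) layout: <pdb_dir>/seed_<seed>/sample_<n>.cif,
    with confidence scores in a sibling sample_<n>.json.
    """
    records = []
    seed_dirs = sorted(
        p for p in pdb_dir.iterdir() if p.is_dir() and p.name.startswith("seed_")
    )
    for seed_dir in seed_dirs:
        seed = seed_dir.name[len("seed_"):]
        for cif_path in sorted(seed_dir.iterdir()):
            if cif_path.suffix != ".cif" or not cif_path.name.startswith("sample_"):
                continue
            sample = cif_path.stem[len("sample_"):]
            records.append(
                {
                    "name": f"{pdb_dir.name}_seed_{seed}_sample_{sample}",
                    "pdb_id": pdb_dir.name,
                    "seed": seed,
                    "sample": sample,
                    "pred_cif_path": cif_path,
                    "confidence_json_path": cif_path.with_suffix(".json"),
                }
            )
    return records
```

In an actual evaluator, this logic would live inside `_get_info_from_each_pdb_dir` on a subclass of `BaseEvaluator`. It only lists the two levels of one sample directory and never walks the dataset.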
Q2: The user asks: Why are there no small molecule RMSD and PoseBusters outputs in the single file evaluation?
Action:
Inform the user: During pxm CLI evaluation, ligand-specific metrics are not calculated by default. You must explicitly declare the ligand chains of interest using -l <label_asym_id> (e.g., -l B).
Q3: The user asks: The multimer chain order predicted by the model is different from the reference structure, will it affect the score?
Action: Explain PXMeter's mapping logic: It will not affect it. PXMeter matches entities at the sequence and chemical levels before evaluation. For homologous multimers, it uses geometric alignment to resolve Symmetry, eliminating chain permutation ambiguity; for intra-molecular atoms, it also matches at the atomic level, ensuring the evaluated atoms perfectly correspond.
Action:
Remind the user they can use the --chain_id_to_mol_json parameter, passing a JSON file containing a <chain_id>: <SMILES> dictionary. This guides PXMeter to identify and correctly parse non-standard small molecules generated by the model.
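
A file of that shape can be produced with a few lines of Python. This is a sketch only: the helper name `write_chain_to_smiles`, the chain IDs, and the file name are hypothetical, and the SMILES strings in the usage note are simple placeholders (ethanol and acetic acid).

```python
import json
from pathlib import Path
from typing import Dict


def write_chain_to_smiles(mapping: Dict[str, str], path: Path) -> Path:
    """Write a {chain_id: SMILES} dictionary as a JSON file and return its path."""
    path.write_text(json.dumps(mapping, indent=2))
    return path
```

For example, `write_chain_to_smiles({"B": "CCO", "C": "CC(=O)O"}, Path("chain_to_mol.json"))` writes a two-entry dictionary whose path can then be passed to `--chain_id_to_mol_json`.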
Instructions for AI Agent:
- 🚨 CRITICAL RULE: NEVER SCAN THE DISK FOR CIF FILES: When a user's instruction is missing key parameters (like `reference.cif`), ABSOLUTELY DO NOT use `find`, `Glob`, `ls -R`, or the `Task`/`Explore` subagents to search for missing CIF files! The dataset directories contain millions of files; any form of recursive scanning will hang the system and fail the task.
  - ONLY Try Direct Path Assembly: Try checking the exact path using the `$PXM_MMCIF_DIR` environment variable (if set) or similar paths like `$PXM_EVAL_DATA_ROOT_PATH/supported_data/mmcif`. E.g., directly `ls $PXM_MMCIF_DIR/7rss.cif`. Do not guess subdirectories.
  - If the direct path doesn't work, YOU MUST STOP IMMEDIATELY. Ask the user: "Please provide the exact path to the reference structure (reference CIF) and the model's prediction result. Note: I cannot scan for these files because the dataset is too large."
- Proactive Result Parsing: After executing any `pxm` command, if there are output results, proactively parse the JSON or generated CSV files. Do not just report that the command executed successfully; you must also summarize the most important metric scores for the user (e.g., "The LDDT score for this structure is 85.2, and the ligand RMSD is 1.5 Å, which falls into the category of a successful prediction").
- Addressing Source Code Implementation Queries: When the user asks about the specific implementation principles of pxmeter or benchmark:
  - First, check if a `pxmeter` directory exists in the current working directory.
  - If not in the current directory, try looking for the `pxmeter` source code in Python's `site-packages` installation path (obtainable via `python -c "import pxmeter; print(pxmeter.__path__[0])"`).
  - If the user asks about benchmark aggregation or dataset pipelines (e.g., `run_eval`), search for the code in the `benchmark` directory within the current working directory.
  - If the corresponding `pxmeter` or `benchmark` code repositories cannot be found in either of these locations, stop searching and explicitly ask the user: "Where is the pxmeter code repository or installation path located?"
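
The lookup order for source code described above can be sketched in Python. The function name `locate_package_source` is hypothetical. It uses `importlib.util.find_spec`, which locates an installed package without importing it, in place of the `python -c` one-liner.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package_source(package: str = "pxmeter") -> Optional[Path]:
    """Look for a package directory in the cwd first, then in the installed path."""
    local = Path.cwd() / package
    if local.is_dir():
        return local
    spec = importlib.util.find_spec(package)
    if spec is not None and spec.submodule_search_locations:
        return Path(list(spec.submodule_search_locations)[0])
    return None  # not found in either place: ask the user for the location
```

A `None` result corresponds to the final step: stop searching and ask the user where the repository or installation lives.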