Restore dropped term-info fields; harden License + heavy-term paths#50

Merged
Robbie1977 merged 4 commits into main from term-info-parity-fixes on Jun 24, 2026

@Robbie1977

What

Reconcile term_info_parse_object (vfb_queries.py) with the canonical
dataclass serialiser (term_info_queries.py) so the VFBquery get_term_info
path drops no fields the legacy SOLR term_info-field path renders, and fix two
operational issues found while running the parity harness.

Why

This is the Phase-2 gate for the term-info side-panel migration (legacy SOLR
term_info read → VFBquery get_term_info). Display parity is the gate. Three
reference-bearing fields were silently dropped:

  • Class definition references (def_pubs) — never read. Every class page lost
    its definition citations (e.g. medulla FBbt_00003748 lost FBrf0231227,
    FBrf0224194). The legacy panel appends these inline to the definition
    (VFBProcessTermInfoCachedJson.java:937), so they are restored inline on
    Meta.Description, not as a separate Publications section.
  • Individual synonyms (pub_syn) — the synonym block was gated
    "Class" in SuperTypes, so Individual terms (e.g. VFB_00101385) lost their
    synonyms. Class synonyms were already correct.
  • Publication external content (pub_specific_content) — gated on SuperType
    "Publication", but the SOLR marker is the lowercase "pub"; the block never
    fired, dropping pub title / PubMed / DOI / FlyBase links (e.g. FBrf0242477).
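Both gating bugs above reduce to membership tests on the SuperType list. The following is a minimal sketch, assuming the parsed term exposes its SuperTypes as a list of strings; the helper names `is_publication` and `should_emit_synonyms` are hypothetical and are not taken from vfb_queries.py:

```python
# Hypothetical helpers illustrating the corrected gates; not the real code.
PUBLICATION_MARKERS = {"pub", "Publication"}


def is_publication(supertypes):
    """True if any SuperType marks the term as a publication.

    Accepts both the lowercase SOLR marker and the capitalised form.
    """
    return any(tag in PUBLICATION_MARKERS for tag in supertypes)


def should_emit_synonyms(pub_syn):
    """Synonyms are emitted whenever pub_syn is non-empty.

    The SuperType list is deliberately not consulted, so Individual
    terms are treated the same as Class terms.
    """
    return bool(pub_syn)
```

Checking for either marker keeps the gate working whichever spelling a record happens to carry.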

Changes

  • vfb_queries.py term_info_parse_object:
    • append def_pubs microref links inline to Meta.Description (matching the
      legacy definition render), instead of a structured Publications entry;
    • emit Synonyms from pub_syn for any SuperType, not Class-only (each
      synonym already carries its own publication inline);
    • gate the publication block on "pub"/"Publication" (pub title + links).
  • vfb_queries.py fill_query_results: skip the limit=-1 full re-run used only
    to length-check a query when the preview was not saturated (preview rows < cap
    ⇒ count = preview rows). Cuts cold latency on SuperTypes offering many queries
    and the zero-count grey-out path. Saturated previews still re-run.
  • solr_result_cache.py cache_result: default the cache write to a soft commit
    (commit=false) so a wedged IndexWriter cannot stall a cold-miss request and
    surface as ha_api 503 (the License-term symptom). Override with
    VFBQUERY_SOLR_WRITE_COMMIT=true.
  • src/test/test_term_info_parity.py: parity tests for the three field gaps plus
    a License smoke test.
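The fill_query_results change rests on a simple observation: a preview that returns fewer rows than its cap has already seen every row. A minimal sketch of that rule, assuming the preview rows, the cap, and a callable for the full limit=-1 run are available; `resolve_count` and its parameters are illustrative names only:

```python
def resolve_count(preview_rows, preview_cap, run_full_query):
    """Return the total row count for a query.

    If the preview is not saturated (fewer rows than the cap), the
    preview length is the true count and the full run is skipped.
    Otherwise the full query is executed to measure the real length.
    """
    if len(preview_rows) < preview_cap:
        return len(preview_rows)
    return len(run_full_query())
```

Passing the full run as a callable means the expensive query is only executed on the saturated path, which is where the latency saving comes from.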

How to test

VFBQUERY_CACHE_ENABLED=false python -m pytest src/test/test_term_info_parity.py -v

All six pass. Parity completeness harness
(projects/term-info-vfbquery-migration/parity_harness.py) reports 0 dropped
identifiers across the full SuperType set.
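How parity_harness.py actually measures "dropped identifiers" is not shown in this PR. One plausible reading, sketched here as an assumption, is a set difference over the identifiers found in the two rendered payloads. The pattern covers only the ID prefixes quoted in this description, and `dropped_identifiers` is a hypothetical name:

```python
import re

# Matches only the identifier shapes quoted in this PR description.
ID_PATTERN = re.compile(r"\b(?:FBbt_\d+|FBrf\d+|VFB_\w+)\b")


def dropped_identifiers(legacy_text, new_text):
    """Identifiers present in the legacy render but absent from the new one."""
    legacy_ids = set(ID_PATTERN.findall(legacy_text))
    new_ids = set(ID_PATTERN.findall(new_text))
    return legacy_ids - new_ids
```

An empty result for every sampled term would correspond to the "0 dropped" figure reported above.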

Follow-ups (out of scope)

  • Deeper count optimisation: dedicated count(...) / Owlery count modes per
    heavy query function (this PR only avoids the unnecessary re-run).
  • Pre-warm License / non-query-reachable SuperTypes in owlery-cache-reload.

Release

Minor bump to v1.22.0 (tag-driven). After merge:
owlery-cache-reload --only "V3 term info" --force-refresh.

term_info_parse_object dropped three reference-bearing fields that the
panel renders today, so the VFBquery term-info path showed less than the
legacy SOLR-field path:

- def_pubs (class definition references) were never read. The legacy
  processor appends them inline to the definition
  (VFBProcessTermInfoCachedJson.java:937); restored the same way, as
  microref links on Meta.Description (not a separate Publications entry,
  so the rendered panel is identical and no new section is introduced).
- pub_syn synonyms were gated Class-only, dropping Individual synonyms.
  Each synonym already carries its own publication inline, matching the
  legacy 'synonym (microref)' render; only the Class gate is removed.
- pub_specific_content was gated on the SuperType "Publication" but the
  SOLR marker is the lowercase "pub", so pub title/PubMed/DOI/FlyBase
  never surfaced.
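The inline def_pubs restoration can be pictured as appending one link per reference to the definition text. This is a sketch under stated assumptions: the record keys (`microref`, `short_form`), the markdown link form, and the function name `append_def_pubs` are all illustrative and may not match the real SOLR payload or the renderer in vfb_queries.py:

```python
def append_def_pubs(description, def_pubs):
    """Append microref links for definition references to the description.

    Assumes each entry is a dict with 'microref' (display text) and
    'short_form' (link target); entries without a microref are skipped.
    """
    links = [
        f"[{pub['microref']}]({pub['short_form']})"
        for pub in def_pubs
        if pub.get("microref")
    ]
    if not links:
        return description
    return f"{description} ({', '.join(links)})"
```

Returning the description unchanged when there are no references keeps terms without def_pubs byte-identical to their previous output.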

Also harden two operational paths:

- solr_result_cache.cache_result issued a blocking commit=true write. On a
  wedged IndexWriter a cold-miss term (e.g. a License individual, never
  pre-warmed by owlery-cache-reload) stalled the ha_api worker and the
  request surfaced as HTTP 503. Default to a soft commit (autoSoftCommit
  handles visibility); override with VFBQUERY_SOLR_WRITE_COMMIT=true.
- fill_query_results re-ran each query at limit=-1 purely to length-check
  the result, even when the preview was not saturated. Skip the full
  re-run when the preview returned fewer rows than its cap.
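The commit override could be read from the environment along the following lines. Solr's update handler does accept a `commit` request parameter, but how cache_result assembles its request is not shown here, so the parsing rule and the helper name `solr_commit_param` are assumptions:

```python
import os


def solr_commit_param(environ=os.environ):
    """Build the commit parameter for a cache write.

    Defaults to commit=false so the write does not block on a hard
    commit; VFBQUERY_SOLR_WRITE_COMMIT=true restores the old behaviour.
    """
    flag = environ.get("VFBQUERY_SOLR_WRITE_COMMIT", "false").strip().lower()
    return {"commit": "true" if flag == "true" else "false"}
```

Treating any value other than "true" as false keeps the safe default in place when the variable is unset or misspelled.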

Add test_term_info_parity covering the three field gaps plus a License
smoke test.

Field-coverage sweep against the legacy processor
(VFBProcessTermInfoCachedJson.java) found term_info_parse_object also drops
sections the panel renders today:

- xrefs (external DB cross-references) were dropped entirely — e.g. medulla's
  Insect Brain DB link and gene FlyBase links. Now emitted as a structured
  Xrefs list (site label, accession, external link, icon).
- related_individuals (present on most FBbt classes) were dropped. Now emitted
  as Meta.RelatedIndividuals, grouped like relationships.
- targeting_splits / target_neurons are wired (TargetingSplits/TargetingNeurons)
  to match the legacy model; unpopulated in current SOLR data so a no-op today,
  but no longer at risk of being silently dropped.

Declare the new fields on TermInfoOutputSchema so .load keeps them.

images, downloads and queries are already covered: image/example/domain records
carry the nrrd/obj/wlz/swc URLs and template voxel/extent/centre, so downloads
are recreatable client-side.

Extend test_term_info_parity with xref (anatomy + gene) and related_individuals
cases.
@Robbie1977

Pushed a follow-up commit from a pre-release field-coverage sweep against the legacy processor (VFBProcessTermInfoCachedJson.java). Beyond the three named gaps, term_info_parse_object also dropped:

  • xrefs — external DB links (medulla Insect Brain DB; gene FlyBase). Now a structured Xrefs list.
  • related_individuals — present on most FBbt classes. Now Meta.RelatedIndividuals.
  • targeting_splits / target_neurons — wired (TargetingSplits/TargetingNeurons) to match the legacy model; unpopulated in current data so a no-op today, but no longer droppable.

New fields declared on TermInfoOutputSchema. images/downloads/queries already covered (image records carry nrrd/obj/wlz/swc + template geometry). Tests now 9/9; harness spans 12 SuperTypes, 0 dropped.

Reviewed VFB_json_schema_indexer (the indexer that produces the SOLR
term_info field) — the authoritative per-SuperType clause set lives in its
vfb_query_builder QueryLibrary. Two follow-ups from that review:

- DataSet term.link (e.g. Ito2013's FlyBase reference) and term.logo were
  dropped by term_info_parse_object. Now surfaced as Meta.Link / Meta.Logo,
  matching the panel's link/logo rows.
- targeting_splits (neuron_split clause, Neuron classes) and target_neurons
  (split_neuron clause, Split classes) are Cypher-derived in the indexer.
  They are query data, not static term metadata, so the static fields added
  earlier are removed; they will be reintroduced as proper query types
  (preview + count badge) displayed like the current term-info section.

dataset_counts (images/types) is already covered by the DatasetImages query
count badge. Add a DataSet-link test.
@Robbie1977

Reviewed VFB_json_schema_indexer (the indexer producing the SOLR term_info field) — its vfb_query_builder QueryLibrary *_term_info() methods are the authoritative per-SuperType clause set. Cross-checked every clause against term_info_parse_object:

Now covered (this PR): term/parents/relationships/synonyms(pub_syn)/def_pubs/xrefs/related_individuals(+term_replaced_by)/license/pubs/pub_specific_content/channel+anatomy images/template channel+domains, and newly DataSet term.link/term.logo → Meta.Link/Meta.Logo.

Query-derived, not static fields — removed from this PR, tracked as a separate follow-up:

  • targeting_splits (Neuron classes) / target_neurons (Split classes) — Cypher in the indexer neuron_split/split_neuron clauses; these become query types with a count badge.
  • dataset_counts (images/types) — already served by the DatasetImages query count.

Tests 10/10; harness 12 SuperTypes, 0 dropped.

targeting_splits (Neuron classes) and target_neurons (Split classes) are
Cypher-derived in the indexer (query_roller neuron_split / split_neuron
clauses), not static term metadata. Surface them as proper VFBquery query
types with preview + count badge, displayed like the current term-info
targeting section:

- get_splits_targeting / get_neurons_targeted_by_split — live Neo4j queries
  returning the standard class-row table (id/label/tags/thumbnail) and the
  true count (so fill_query_results needs no re-run).
- SplitsTargeting_to_schema (Class+Neuron) / TargetNeurons_to_schema
  (Class+Split) builders; wired into term_info_parse_object and ha_api
  QUERY_TYPE_MAP.

Verified live: MBON FBbt_00100243 -> 33 targeting splits; split
VFBexp_FBtp0129935FBtp0129968 -> 18 target neurons. Tests cover both the
offered query and the function count/rows.
@Robbie1977

Added targeting_splits/target_neurons as live query types (your call — they're query data, not static fields):

  • get_splits_targeting (Class+Neuron) and get_neurons_targeted_by_split (Class+Split) — live Neo4j queries mirroring the indexer neuron_split/split_neuron clauses, returning the standard class-row table + true count, wired into term_info_parse_object and QUERY_TYPE_MAP with a count badge.
  • Verified live: MBON FBbt_00100243 → 33 targeting splits; split VFBexp_FBtp0129935FBtp0129968 → 18 target neurons.

Also in this PR: DataSet term.link/term.logo → Meta.Link/Meta.Logo (License url/logo too). Tests 14/14; harness 12 SuperTypes, 0 dropped. Holding for your review before merge.

@Robbie1977 Robbie1977 merged commit 419a95b into main Jun 24, 2026
4 checks passed