Restore dropped term-info fields; harden License + heavy-term paths #50
term_info_parse_object dropped three reference-bearing fields that the panel renders today, so the VFBquery term-info path showed less than the legacy SOLR-field path:

- def_pubs (class definition references) were never read. The legacy processor appends them inline to the definition (VFBProcessTermInfoCachedJson.java:937); restored the same way, as microref links on Meta.Description (not a separate Publications entry, so the rendered panel is identical and no new section is introduced).
- pub_syn synonyms were gated Class-only, dropping Individual synonyms. Each synonym already carries its own publication inline, matching the legacy 'synonym (microref)' render; only the Class gate is removed.
- pub_specific_content was gated on the SuperType "Publication" but the SOLR marker is the lowercase "pub", so pub title/PubMed/DOI/FlyBase never surfaced.

Also harden two operational paths:

- solr_result_cache.cache_result issued a blocking commit=true write. On a wedged IndexWriter a cold-miss term (e.g. a License individual, never pre-warmed by owlery-cache-reload) stalled the ha_api worker and the request surfaced as HTTP 503. Default to a soft commit (autoSoftCommit handles visibility); override with VFBQUERY_SOLR_WRITE_COMMIT=true.
- fill_query_results re-ran each query at limit=-1 purely to length-check the result, even when the preview was not saturated. Skip the full re-run when the preview returned fewer rows than its cap.

Add test_term_info_parity covering the three field gaps plus a License smoke test.
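The two gate fixes described in this commit can be illustrated with a minimal sketch. The helper names (`is_publication`, `collect_synonyms`) and the shape of a `pub_syn` entry are assumptions made for illustration; they are not taken from vfb_queries.py.

```python
# Hypothetical helpers; the real term_info_parse_object is not shown here.
PUBLICATION_MARKERS = {"pub", "Publication"}  # SOLR emits the lowercase "pub"


def is_publication(super_types):
    """Return True when any SuperType marks the term as a publication."""
    return any(marker in PUBLICATION_MARKERS for marker in super_types)


def collect_synonyms(term):
    """Return synonym labels for any SuperType (no Class-only gate).

    Assumes each pub_syn entry is a dict with a "label" key; that shape is
    an illustration, not the indexer's actual schema.
    """
    return [entry["label"] for entry in term.get("pub_syn", [])]


# Illustrative inputs only; the SuperType lists are made up for the example.
individual = {
    "SuperTypes": ["Entity", "Individual"],
    "pub_syn": [{"label": "example synonym"}],
}
publication = {"SuperTypes": ["Entity", "Individual", "pub"]}
```

Under these assumptions an Individual keeps its synonyms, and a term carrying the lowercase "pub" marker is treated as a publication, which a strict comparison against "Publication" would have missed.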
Field-coverage sweep against the legacy processor (VFBProcessTermInfoCachedJson.java) found term_info_parse_object also drops sections the panel renders today:

- xrefs (external DB cross-references) were dropped entirely, e.g. medulla's Insect Brain DB link and gene FlyBase links. Now emitted as a structured Xrefs list (site label, accession, external link, icon).
- related_individuals (present on most FBbt classes) were dropped. Now emitted as Meta.RelatedIndividuals, grouped like relationships.
- targeting_splits / target_neurons are wired (TargetingSplits/TargetingNeurons) to match the legacy model; unpopulated in current SOLR data so a no-op today, but no longer at risk of being silently dropped.

Declare the new fields on TermInfoOutputSchema so .load keeps them.

images, downloads and queries are already covered: image/example/domain records carry the nrrd/obj/wlz/swc URLs and template voxel/extent/centre, so downloads are recreatable client-side.

Extend test_term_info_parity with xref (anatomy + gene) and related_individuals cases.
Pushed a follow-up commit from a pre-release field-coverage sweep against the legacy processor (
New fields declared on
Reviewed VFB_json_schema_indexer (the indexer that produces the SOLR term_info field). The authoritative per-SuperType clause set lives in its vfb_query_builder QueryLibrary. Two follow-ups from that review:

- DataSet term.link (e.g. Ito2013's FlyBase reference) and term.logo were dropped by term_info_parse_object. Now surfaced as Meta.Link / Meta.Logo, matching the panel's link/logo rows.
- targeting_splits (neuron_split clause, Neuron classes) and target_neurons (split_neuron clause, Split classes) are Cypher-derived in the indexer. They are query data, not static term metadata, so the static fields added earlier are removed; they will be reintroduced as proper query types (preview + count badge) displayed like the current term-info section.

dataset_counts (images/types) is already covered by the DatasetImages query count badge.

Add a DataSet-link test.
Reviewed Now covered (this PR): term/parents/relationships/synonyms(pub_syn)/def_pubs/xrefs/related_individuals(+term_replaced_by)/license/pubs/pub_specific_content/channel+anatomy images/template channel+domains, and newly DataSet Query-derived, not static fields — removed from this PR, tracked as a separate follow-up:
Tests 10/10; harness 12 SuperTypes, 0 dropped.
targeting_splits (Neuron classes) and target_neurons (Split classes) are Cypher-derived in the indexer (query_roller neuron_split / split_neuron clauses), not static term metadata. Surface them as proper VFBquery query types with preview + count badge, displayed like the current term-info targeting section:

- get_splits_targeting / get_neurons_targeted_by_split: live Neo4j queries returning the standard class-row table (id/label/tags/thumbnail) and the true count (so fill_query_results needs no re-run).
- SplitsTargeting_to_schema (Class+Neuron) / TargetNeurons_to_schema (Class+Split) builders; wired into term_info_parse_object and ha_api QUERY_TYPE_MAP.

Verified live: MBON FBbt_00100243 -> 33 targeting splits; split VFBexp_FBtp0129935FBtp0129968 -> 18 target neurons.

Tests cover both the offered query and the function count/rows.
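The remark that a query returning its true count lets fill_query_results skip the re-run can be sketched as a single decision function. This is a minimal illustration, not the code in vfb_queries.py: `resolve_count` and its parameters are hypothetical names, and `run_full` stands in for whatever re-runs the query at limit=-1.

```python
from typing import Callable, Optional, Sequence


def resolve_count(
    preview_rows: Sequence,
    preview_cap: int,
    known_count: Optional[int],
    run_full: Callable[[], Sequence],
) -> int:
    """Pick the cheapest reliable source for a query's result count."""
    # 1. The query function already reported the true count: trust it.
    if known_count is not None:
        return known_count
    # 2. Preview not saturated: it already holds every row.
    if len(preview_rows) < preview_cap:
        return len(preview_rows)
    # 3. Preview saturated and no count available: the full run is needed.
    return len(run_full())
```

Only the third branch pays for a full query, so a zero-row or short preview never triggers the expensive path.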
Added
Also in this PR: DataSet
What
Reconcile term_info_parse_object (vfb_queries.py) with the canonical dataclass serialiser (term_info_queries.py) so the VFBquery get_term_info path drops no fields the legacy SOLR term_info-field path renders, and fix two operational issues found while running the parity harness.
Why
This is the Phase-2 gate for the term-info side-panel migration (legacy SOLR term_info read → VFBquery get_term_info). Display parity is the gate. Three reference-bearing fields were silently dropped:
- def_pubs (class definition references): never read. Every class page lost its definition citations (e.g. medulla FBbt_00003748 lost FBrf0231227, FBrf0224194). The legacy panel appends these inline to the definition (VFBProcessTermInfoCachedJson.java:937), so they are restored inline on Meta.Description, not as a separate Publications section.
- pub_syn (synonyms): the synonym block was gated on "Class" in SuperTypes, so Individual terms (e.g. VFB_00101385) lost their synonyms. Class synonyms were already correct.
- pub_specific_content: gated on SuperType "Publication", but the SOLR marker is the lowercase "pub"; the block never fired, dropping pub title / PubMed / DOI / FlyBase links (e.g. FBrf0242477).
Changes
- vfb_queries.py term_info_parse_object:
  - append def_pubs microref links inline to Meta.Description (matching the legacy definition render), instead of a structured Publications entry;
  - emit Synonyms from pub_syn for any SuperType, not Class-only (each synonym already carries its own publication inline);
  - fire pub_specific_content on SuperType "pub" / "Publication" (pub title + links).
- vfb_queries.py fill_query_results: skip the limit=-1 full re-run used only to length-check a query when the preview was not saturated (preview rows < cap ⇒ count = preview rows). Cuts cold latency on SuperTypes offering many queries and the zero-count grey-out path. Saturated previews still re-run.
- solr_result_cache.py cache_result: default the cache write to a soft commit (commit=false) so a wedged IndexWriter cannot stall a cold-miss request and surface as ha_api 503 (the License-term symptom). Override with VFBQUERY_SOLR_WRITE_COMMIT=true.
- src/test/test_term_info_parity.py: parity tests for the three field gaps plus a License smoke test.
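The cache-write default can be pictured as a small environment-driven toggle. The sketch below is an assumption about the shape of that logic, not the code in solr_result_cache.py: `cache_write_params` is a hypothetical helper, and only the environment variable name and the `commit` request parameter come from the description above.

```python
import os


def cache_write_params(environ=None):
    """Build the commit parameter for a cache write.

    Defaults to commit=false so the write does not block on the
    IndexWriter; visibility is then left to Solr's autoSoftCommit.
    """
    env = os.environ if environ is None else environ
    raw = env.get("VFBQUERY_SOLR_WRITE_COMMIT", "false")
    # Assumption: only the literal value "true" (any case) opts back in.
    blocking = raw.strip().lower() == "true"
    return {"commit": "true" if blocking else "false"}
```

Passing the environment as an argument keeps the helper easy to test without mutating os.environ.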
How to test
All six pass. Parity completeness harness (projects/term-info-vfbquery-migration/parity_harness.py) reports 0 dropped identifiers across the full SuperType set.
Follow-ups (out of scope)
- count(...) / Owlery count modes per heavy query function (this PR only avoids the unnecessary re-run).
- owlery-cache-reload.
Release
Minor bump to v1.22.0 (tag-driven). After merge:
owlery-cache-reload --only "V3 term info" --force-refresh.