Replies: 1 comment
Design approved retroactively: this is exactly what #935 implements. PR #935 merged yesterday (2026-05-15). Your 8x candidate improvement (75 vs 9 per session) became the headline number in the PR description, a compelling case for locale support. Portuguese and German patterns are both in the initial merge; adding further locales is now a one-YAML-file contribution. Thanks for the careful design-first write-up; it made the implementation review straightforward.
Hey @doobidoo — another design-first proposal per #870 guidelines.
Problem
The harvest extractor uses hardcoded English regex patterns ("decided", "root cause", "convention:", "learned that"). Non-English users get dramatically fewer candidates: in our case (Portuguese), harvest found 9 candidates vs 75 with locale patterns enabled. That's an 8x gap.
This affects any user whose conversations happen in a non-English language.
Proposed Design
YAML-based pattern plugins loaded at startup:
Loading mechanism:
- HARVEST_LOCALE=en,pt_BR env var controls which files to load
- en only by default (backward compatible)
YAML format:
No changes to extraction logic — only the pattern list grows. Same confidence scoring, same dedup.
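A minimal sketch of how the loading and matching described above could fit together, under stated assumptions. The `PATTERN_FILES` dictionary stands in for already-parsed YAML files (for example, what PyYAML's `yaml.safe_load` would return); the `type`/`regex` schema, the Portuguese regexes, and the function names `resolve_locales`, `load_patterns`, and `extract_candidates` are all hypothetical and not taken from the project.

```python
import os
import re

# Hypothetical stand-in for parsed YAML pattern files. The schema and the
# Portuguese regexes are illustrative assumptions, not the project's files.
PATTERN_FILES = {
    "en": [
        {"type": "decision", "regex": r"\bdecided\b"},
        {"type": "bug", "regex": r"\broot cause\b"},
        {"type": "convention", "regex": r"\bconvention:"},
        {"type": "learning", "regex": r"\blearned that\b"},
    ],
    "pt_BR": [
        {"type": "decision", "regex": r"\bdecidimos\b"},
        {"type": "bug", "regex": r"\bcausa raiz\b"},
        {"type": "learning", "regex": r"\baprendi que\b"},
    ],
}


def resolve_locales(env=None):
    """Parse HARVEST_LOCALE; fall back to English only when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("HARVEST_LOCALE", "")
    locales = [part.strip() for part in raw.split(",") if part.strip()]
    return locales or ["en"]


def load_patterns(locales):
    """Merge and compile the pattern lists for the requested locales."""
    compiled = []
    for locale in locales:
        # Unknown locales are silently skipped in this sketch.
        for entry in PATTERN_FILES.get(locale, []):
            compiled.append((entry["type"], re.compile(entry["regex"], re.IGNORECASE)))
    return compiled


def extract_candidates(text, patterns):
    """Return (type, sentence) pairs for each sentence matching any pattern."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for kind, regex in patterns:
            if regex.search(sentence):
                candidates.append((kind, sentence))
                break  # at most one candidate per sentence
    return candidates


text = "Decidimos usar SQLite. A causa raiz era um timeout. We decided to ship."
english_only = extract_candidates(text, load_patterns(resolve_locales({})))
with_pt = extract_candidates(
    text, load_patterns(resolve_locales({"HARVEST_LOCALE": "en,pt_BR"}))
)
print(len(english_only), len(with_pt))  # prints: 1 3
```

The matching loop is identical for both runs; only the list of compiled patterns differs, which is the property the proposal relies on.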
Why not LLM-based?
LLM classification (Phase 2 in harvest) already exists as opt-in. This proposal is for the regex Phase 1 — fast, deterministic, zero-cost. The two are complementary.
Scope
- harvest/patterns/__init__.py: YAML loader
- harvest/patterns/en.yaml: extract existing patterns
- harvest/patterns/pt_BR.yaml: Portuguese example
- harvest/extractor.py: use loaded patterns instead of hardcoded
Thoughts? The implementation is straightforward; the main question is whether YAML plugins are the right format or if you'd prefer something else.