Adds the semantic judgement layer on top of the slice-1 detector + GT wall.
DEV-TIME, propose-only — nothing mutates the library or runtime.
- CandidateJudge interface with two implementations: HeuristicJudge
(deterministic default/fallback, used in tests) and LLMJudge (offline, over the
shared llm.ProviderRegistry via the LLMCompleter adapter). LLMJudge degrades to
"uncertain" on any transport/parse error — it can never break a run.
- BuildJudgePrompt: the ISO 12100 same-vs-distinct prompt, unit-tested
deterministically even though the call is not.
- RenderProposalQueue: markdown human-review queue with a suggested action per
candidate (supersede / keep both / needs review).
On real warewashing output the heuristic punts to "uncertain — needs the LLM
judge" for exactly the two recall-safe near-dupes (HP807/HP033 update,
HP101/HP096 winding-vs-friction), making the LLM's role explicit. All 3 GTs
unaffected (read-only). Live qwen wiring + a CLI/file queue are slice 3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First thin slice of the offline library-improvement proposer. DEV-TIME ONLY,
propose-only — it never mutates the pattern library or the runtime.
- FindDedupCandidates (proposer_dedup.go): structural near-duplicate detection
over the fired patterns (category + measure/zone/scenario overlap). Bakes in
the P1 lesson: only same-category pairs compare, and pairs with different
operational states are never proposed (normal-operation vs maintenance are
legitimately distinct, e.g. HP011 vs HP077).
- ScreenSupersession (proposer_screen.go): the wall. A proposal is safe only if
(1) dropping the hazard does not reduce GT recall AND (2) keep/drop do not
credit DIFFERENT GT entries. Check 2 catches distinct hazards that merely share
measures (HP2201 hot surface GT 1.3 vs HP2202 hot ware GT 1.4) which recall
alone would wave through.
On real warewashing output: 3 candidates -> 1 BLOCKED (distinct GT), 2
RECALL-SAFE for human/LLM review (the update + winding/friction near-dupes).
Nothing auto-applied. All 3 GTs unaffected (read-only). The LLM judgement and a
CLI/file queue are slice 2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 of the commercial white-goods expansion (EN ISO 10472 family). Extend
GT #3 with 8 completeness hazards a Fachmann expects but that were neither in
the GT nor previously questioned: dry-run boiler overheating, residual/stored
electrical energy, sharp-edge cut, tipping, interlock-failure, unexpected
restart, backflow (EN 1717), microbial/legionella. Enrich the UC-M narrative
with the real features so existing library patterns can fire.
Result: 4/8 auto-covered by existing patterns (dry-run, residual voltage,
tipping, interlock-failure) — recall 84% (21/25). Remaining gaps documented:
spray-arm contact (4.3), sharp-edge cut (4.6), backflow (2.3), restart (6.4).
Gate the re-surfaced CNC leak ("spanende Bearbeitung", high_temperature-only)
via dom_cnc. Kistenhub 97.1% and Bremse pinned mappings unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Class D (generic keyword hygiene, GT-guarded). Two over-broad keyword->component
mappings produced ghost components:
- "kuehl"/"cool" -> Kuehlaggregat (C095) matched product variants
("Cool-Ausfuehrung") and outputs ("kuehle Glaeser"). Narrowed to cooling-UNIT
terms (kuehlaggregat, kuehlanlage, kuehler, kaeltemaschine, chiller, rueckkuehl).
- "filter" -> Absauganlage/Oelnebelabscheider (C124) matched any filter
(Laugen-/Wasser-/Oelfilter). Keep "filteranlage" only.
No pattern or GT test depends on these mappings (Kistenhub/Bremse use hand-crafted
inputs). UC-M now parses 6 plausible components (was 8 incl. the two ghosts).
Warewashing GT recall 82.4% and Kistenhub/Bremse pins unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add ground_truth_warewashing.json + TestWarewashing_GTCoverage. The test runs
the UC-M narrative through the SAME chain as production (ParseNarrative ->
engine -> relevance + cyber filter), so keyword/gating fixes are measured on
the real hazard set, and false positives show up as "extra".
Class A (generic keyword hygiene): spuelarm/spuelfeld no longer map to library
component C004 ("Drehtisch" / rotary table) — that mislabelled the spray arm.
Keep the rotating_part tag. Removes the bogus "Drehtisch" hazard.
GT #3 baseline -> after Class A: recall 80% (unchanged), one false positive
(Drehtisch) removed. Kistenhub 97.1% and Bremse pinned mappings unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>