breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	0ce4794767	feat(ai-sdk): pluggable LLM judgment over recall-safe dedup candidates (P2 slice 2) Adds the semantic judgement layer on top of the slice-1 detector + GT wall. DEV-TIME, propose-only — nothing mutates the library or runtime. - CandidateJudge interface with two implementations: HeuristicJudge (deterministic default/fallback, used in tests) and LLMJudge (offline, over the shared llm.ProviderRegistry via the LLMCompleter adapter). LLMJudge degrades to "uncertain" on any transport/parse error — it can never break a run. - BuildJudgePrompt: the ISO 12100 same-vs-distinct prompt, unit-tested deterministically even though the call is not. - RenderProposalQueue: markdown human-review queue with a suggested action per candidate (supersede / keep both / needs review). On real warewashing output the heuristic punts to "uncertain — needs the LLM judge" for exactly the two recall-safe near-dupes (HP807/HP033 update, HP101/HP096 winding-vs-friction), making the LLM's role explicit. All 3 GTs unaffected (read-only). Live qwen wiring + a CLI/file queue are slice 3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 10:27:01 +02:00
Benjamin Admin	8674b2cd9a	feat(ai-sdk): offline dedup-candidate proposer + deterministic GT wall (P2 slice 1) First thin slice of the offline library-improvement proposer. DEV-TIME ONLY, propose-only — it never mutates the pattern library or the runtime. - FindDedupCandidates (proposer_dedup.go): structural near-duplicate detection over the fired patterns (category + measure/zone/scenario overlap). Bakes in the P1 lesson: only same-category pairs compare, and pairs with different operational states are never proposed (normal-operation vs maintenance are legitimately distinct, e.g. HP011 vs HP077). - ScreenSupersession (proposer_screen.go): the wall. A proposal is safe only if (1) dropping the hazard does not reduce GT recall AND (2) keep/drop do not credit DIFFERENT GT entries. Check 2 catches distinct hazards that merely share measures (HP2201 hot surface GT 1.3 vs HP2202 hot ware GT 1.4) which recall alone would wave through. On real warewashing output: 3 candidates -> 1 BLOCKED (distinct GT), 2 RECALL-SAFE for human/LLM review (the update + winding/friction near-dupes). Nothing auto-applied. All 3 GTs unaffected (read-only). The LLM judgement and a CLI/file queue are slice 2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 10:27:01 +02:00
Benjamin Admin	fe5dc59152	test(ai-sdk): GT #3 completeness — 8 shared white-goods hazards + CNC gate Phase 1 of the commercial white-goods expansion (EN ISO 10472 family). Extend GT #3 with 8 completeness hazards a domain expert expects but that were neither in the GT nor previously questioned: dry-run boiler overheating, residual/stored electrical energy, sharp-edge cut, tipping, interlock-failure, unexpected restart, backflow (EN 1717), microbial/legionella. Enrich the UC-M narrative with the real features so existing library patterns can fire. Result: 4/8 auto-covered by existing patterns (dry-run, residual voltage, tipping, interlock-failure) — recall 84% (21/25). Remaining gaps documented: spray-arm contact (4.3), sharp-edge cut (4.6), backflow (2.3), restart (6.4). Gate the re-surfaced CNC leak ("spanende Bearbeitung", high_temperature-only) via dom_cnc. Kistenhub 97.1% and Bremse pinned mappings unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-24 23:46:19 +02:00
Benjamin Admin	bde6e76a57	fix(ai-sdk): keyword precision — stop adjective/generic ghost components Class D (generic keyword hygiene, GT-guarded). Two over-broad keyword->component mappings produced ghost components: - "kuehl"/"cool" -> Kuehlaggregat (C095) matched product variants ("Cool-Ausfuehrung") and outputs ("kuehle Glaeser"). Narrowed to cooling-UNIT terms (kuehlaggregat, kuehlanlage, kuehler, kaeltemaschine, chiller, rueckkuehl). - "filter" -> Absauganlage/Oelnebelabscheider (C124) matched any filter (Laugen-/Wasser-/Oelfilter). Keep "filteranlage" only. No pattern or GT test depends on these mappings (Kistenhub/Bremse use hand-crafted inputs). UC-M now parses 6 plausible components (was 8 incl. the two ghosts). Warewashing GT recall 82.4% and Kistenhub/Bremse pins unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-24 23:01:19 +02:00
Benjamin Admin	cf86dc241b	test(ai-sdk): GT #3 (commercial dishwasher) + fix Drehtisch keyword mislabel Add ground_truth_warewashing.json + TestWarewashing_GTCoverage. The test runs the UC-M narrative through the SAME chain as production (ParseNarrative -> engine -> relevance + cyber filter), so keyword/gating fixes are measured on the real hazard set, and false positives show up as "extra". Class A (generic keyword hygiene): spuelarm/spuelfeld no longer map to library component C004 ("Drehtisch" / rotary table) — that mislabelled the spray arm. Keep the rotating_part tag. Removes the bogus "Drehtisch" hazard. GT #3 baseline -> after Class A: recall 80% (unchanged), one false positive (Drehtisch) removed. Kistenhub 97.1% and Bremse pinned mappings unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-24 21:51:26 +02:00
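
The two-check "wall" described for ScreenSupersession in 8674b2cd9a can be illustrated with a minimal Go sketch. This is not the repository's code: the `Credits` map, the `Screen` function, the `Verdict` type, the pattern `HPX`, and the GT entry "2.1" are all hypothetical names and data chosen for illustration. Only the rule itself (block if recall drops, or if the dropped pattern credits a GT entry the kept one does not) and the HP2201/HP2202 example with GT 1.3 and GT 1.4 come from the commit message.

```go
package main

import "fmt"

// Verdict is a hypothetical result type for this sketch.
type Verdict string

const (
	Blocked    Verdict = "BLOCKED"
	RecallSafe Verdict = "RECALL-SAFE"
)

// Credits maps a pattern ID to the GT entries it credits.
// The shape is an assumption; the real data model is not shown in the log.
type Credits map[string][]string

// recall counts the distinct GT entries covered by the active patterns,
// ignoring the pattern named in skip (pass "" to skip nothing).
func recall(c Credits, active []string, skip string) int {
	seen := map[string]bool{}
	for _, id := range active {
		if id == skip {
			continue
		}
		for _, gt := range c[id] {
			seen[gt] = true
		}
	}
	return len(seen)
}

// Screen applies both checks to a proposal "keep supersedes drop".
func Screen(c Credits, active []string, keep, drop string) Verdict {
	// Check 1: dropping the pattern must not reduce GT recall.
	if recall(c, active, drop) < recall(c, active, "") {
		return Blocked
	}
	// Check 2: drop must not credit a GT entry that keep does not credit.
	kept := map[string]bool{}
	for _, gt := range c[keep] {
		kept[gt] = true
	}
	for _, gt := range c[drop] {
		if !kept[gt] {
			return Blocked
		}
	}
	return RecallSafe
}

func main() {
	c := Credits{
		"HP2201": {"1.3"}, // hot surface (from the commit message)
		"HP2202": {"1.4"}, // hot ware (from the commit message)
		"HPX":    {"1.4"}, // hypothetical second pattern covering GT 1.4
		"HP807":  {"2.1"}, // GT entry "2.1" is made up for this sketch
		"HP033":  {"2.1"},
	}
	active := []string{"HP2201", "HP2202", "HPX", "HP807", "HP033"}

	// Recall alone would allow this, because HPX still covers GT 1.4.
	fmt.Println("HP2201 supersedes HP2202:", Screen(c, active, "HP2201", "HP2202"))
	fmt.Println("HP807 supersedes HP033:", Screen(c, active, "HP807", "HP033"))
}
```

In this sketch the first proposal is blocked by check 2 even though recall is unchanged, which mirrors the case the commit message says recall alone would wave through.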

5 Commits