breakpilot-compliance

Files

T

Benjamin Admin e809d0bc1c feat(cookie): Layer-3 sufficiency-judge — Haiku re-judges embedding/boost rescues

The embedding/boost auto-rescue is intentionally optimistic (finds the topic, not
fulfilment) -> 159 FN over-rescues vs Opus-GT (recall 0.13). Layer-3 re-judges
exactly the rescued passes with the validated Haiku judge (cohort
cookie_sufficiency_v1 P0.89/R0.91) -- NOT the Qwen-first cascade (local is
disproven as a sufficiency judge) -- and un-passes them when the obligation is
not concretely met. Gated to the full check (not skip_llm).

Measured (5-firm Opus-GT, engine+L3): FN 159->12, recall 0.13->0.93,
precision 0.96->0.90 (276 rescues corrected). "Embedding finds, Claude decides."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-22 17:37:48 +02:00

_kb

…

agb

feat(platform): live-wire AGB v2 + DSE v3 + Architektur-Tab (#29 )

2026-06-21 12:58:26 +00:00

cookie_policy

feat(cookie): Layer-3 sufficiency-judge — Haiku re-judges embedding/boost rescues

2026-06-22 17:37:48 +02:00

cross_placement

…

dse