Mark recall-limited obligations in DSE shadow telemetry

Distinguishes three categories in shadow mode instead of a blanket FAILED:
- genuine gap (failed_by_current_checker)
- redundant control false positive (collapses to MET via OR)
- checker reach problem (recall_limited)

obligation_taxonomy.py: decision_method_required=LLM for recipients_disclosed, third_country_transfer_disclosed, safeguards_disclosed, safeguards_accessible (versioned registry artifact until a DB table exists, v1 spec).

Empirical: TeamViewer 0/22 kw+emb despite the obligation being met (cos 0.49-0.57) → CONTENT/LLM class, not a threshold fix.

compute_obligation_shadow segregates FAILED/PARTIAL via requires_llm(): teamviewer 5 findings → 2 genuine + 3 recall_limited.

9 new unit tests (41 total green).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
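The split described above can be sketched in isolation. This is a minimal illustration, not the code from this commit: the string status constants, the `Obligation` dataclass, the `LLM_REQUIRED` set, and `split_findings` are assumptions made for the example. Only the four obligation IDs and the name `requires_llm` come from the commit message and diff, and the body of `requires_llm` here is a guess at its behavior.

```python
from dataclasses import dataclass

# Assumption: statuses are modelled as plain strings for this sketch.
MET, FAILED, PARTIAL, NA = "MET", "FAILED", "PARTIAL", "NA"

# The four obligation IDs the commit message lists as decision_method_required=LLM.
LLM_REQUIRED = frozenset({
    "recipients_disclosed",
    "third_country_transfer_disclosed",
    "safeguards_disclosed",
    "safeguards_accessible",
})


def requires_llm(obligation_id: str) -> bool:
    """Hypothetical stand-in for obligation_taxonomy.requires_llm."""
    return obligation_id in LLM_REQUIRED


@dataclass
class Obligation:
    """Hypothetical minimal shape of an aggregated obligation result."""
    obligation_id: str
    status: str


def split_findings(obls: list[Obligation]) -> dict[str, int]:
    """Count findings, separating genuine gaps from recall-limited ones."""
    counts = {
        "obligation_findings": 0,
        "failed_by_current_checker": 0,
        "recall_limited": 0,
        "na_count": 0,
    }
    for o in obls:
        if o.status == NA:
            counts["na_count"] += 1
        elif o.status in (FAILED, PARTIAL):
            counts["obligation_findings"] += 1
            if requires_llm(o.obligation_id):
                # The current checker cannot verify this obligation at all.
                counts["recall_limited"] += 1
            else:
                counts["failed_by_current_checker"] += 1
    return counts


# Invented input for illustration only; "example_gap_a" etc. are made-up IDs.
sample = [
    Obligation("example_gap_a", FAILED),
    Obligation("example_gap_b", PARTIAL),
    Obligation("recipients_disclosed", FAILED),
    Obligation("safeguards_disclosed", FAILED),
    Obligation("safeguards_accessible", PARTIAL),
    Obligation("example_not_applicable", NA),
    Obligation("example_met", MET),
]
counts = split_findings(sample)
print(counts)
```

By construction, `failed_by_current_checker + recall_limited` always equals `obligation_findings`, so the existing total is unchanged and the two new counters only partition it.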
2026-06-24 13:46:21 +02:00
parent c3542f7dfe
commit 0631a98bdd
4 changed files with 99 additions and 18 deletions
@@ -59,6 +59,7 @@ def compute_obligation_shadow(results: list[dict], text: str,
        FAILED, LM, NA, PARTIAL, CriterionEval, aggregate_obligations,
    )
    from compliance.services.obligation_applicability import applicable
+    from compliance.services.obligation_taxonomy import requires_llm

    legacy = 0
    evals: list[Any] = []
@@ -78,20 +79,34 @@ def compute_obligation_shadow(results: list[dict], text: str,
        return {"status": "no obligation markers on result controls"}

    obls = aggregate_obligations(evals, applicable_fn=applicable, doc_text=text)
-    findings = sum(1 for o in obls if o.status in (FAILED, PARTIAL))
-    na = sum(1 for o in obls if o.status == NA)
+    # Split FAILED/PARTIAL honestly: genuine gap (failed_by_current_checker) vs
+    # RECALL_LIMITED (obligation needs an LLM; the current checker cannot verify it).
+    findings = failed_current = recall_limited = na = 0
+    for o in obls:
+        if o.status == NA:
+            na += 1
+        elif o.status in (FAILED, PARTIAL):
+            findings += 1
+            if requires_llm(o.obligation_id):
+                recall_limited += 1
+            else:
+                failed_current += 1
    top = []
    for o in obls:
        cs = contrib.get(o.obligation_id, [])
        fehlt = sum(1 for _, p in cs if not p)
        if fehlt >= 2:
            top.append({"obligation": o.obligation_id, "fehlt": fehlt,
-                        "total": len(cs), "status": o.status})
+                        "total": len(cs), "status": o.status,
+                        "recall_limited": bool(requires_llm(o.obligation_id)
+                                               and o.status in (FAILED, PARTIAL))})
    top.sort(key=lambda x: -x["fehlt"])
    return {
        "legacy_control_findings": legacy,
        "obligation_shadow_results": len(obls),
        "obligation_findings": findings,
+        "failed_by_current_checker": failed_current,
+        "recall_limited": recall_limited,
        "collapse_factor": round(legacy / findings, 2) if findings else None,
        "na_count": na,
        "met_failed_delta": legacy - findings,