docs+feat(platform): Pruefer-Matrix-Foundation einfrieren (Evidenz, Mapping, Checker-Library, AGB-Kalibrierung)

Know-how-Freeze der Website-Compliance-Runde (DSE/Cookie/Impressum/AGB). docs: platform_evidence_v1 (Evidenz-/Qualitaetsnachweis, echte Zahlen), nutzungsbedingungen_mapping (neues Modul = Mapping, empirisch belegt), platform_checker_matrix (Meta-Modell verification_method x decision_method), verification_method, platform_validation_v1. code: checkers/ (reusable Pruefer-Library base+reference+embedding+llm, im Container validiert), agb/ (decision_method-Routing + Checker-Prototypen, 71% FP -> ~0 validiert). Dev-only, kein Prod-Push; Benchmark-GTs/Korpora im internen Archiv (data-retention).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-06-21 09:23:21 +02:00
parent 6b9c7984b4
commit 9d79cf1576
14 changed files with 900 additions and 0 deletions
@@ -0,0 +1,51 @@
"""CONTENT-Pruefer / decision_method=EMBEDDING.
Ist die Pflicht SEMANTISCH im Text vorhanden? Max-Cosinus (Doc-Chunks x Control-
Paraphrasen) >= per-Control-Schwelle. Deterministisch (festes Embedding-Modell)
und gecacht. Rettet Recall-FP (Klausel da, anders formuliert).
Faellt der Embedding-Service aus, liefert der Checker present=None (unklar) — der
Aufrufer behaelt dann das Keyword-Ergebnis (kein Hang, kein Crash).
(Validiert an AGB: 17 Items, per-Item-Schwelle, 0 Fehl-Rescue.)
"""
from __future__ import annotations
import asyncio
import logging
from .base import CheckResult, ControlSpec, DocContext, VerificationMethod
logger = logging.getLogger(__name__)
# Paraphrasen-Vektoren je Control einmal einbetten + cachen.
_PARA_CACHE: dict[str, list] = {}
class EmbeddingChecker:
verification_method = VerificationMethod.CONTENT
async def check(self, ctrl: ControlSpec, doc: DocContext) -> CheckResult:
text = doc.text or ""
paras = ctrl.paraphrases or []
thr = ctrl.embed_threshold if ctrl.embed_threshold is not None else 0.60
if not paras or len(text) < 100:
return CheckResult(present=None, source="embedding")
try:
from compliance.services.mc_embedding_matcher import (
DIM, _chunk_text, _cosine, _embed_texts,
)
if ctrl.control_id not in _PARA_CACHE:
pv = await _embed_texts(paras)
_PARA_CACHE[ctrl.control_id] = [v for v in pv if v and len(v) == DIM]
pvecs = _PARA_CACHE[ctrl.control_id]
chunks = _chunk_text(text)
cvecs = [v for v in await asyncio.wait_for(
_embed_texts(chunks), timeout=90.0) if v and len(v) == DIM]
except (Exception, asyncio.TimeoutError) as e:
logger.info("embedding checker inaktiv %s: %s", ctrl.control_id, str(e)[:80])
return CheckResult(present=None, source="embedding")
if not pvecs or not cvecs:
return CheckResult(present=None, source="embedding")
best = max((_cosine(p, c) for p in pvecs for c in cvecs), default=0.0)
return CheckResult(present=best >= thr, confidence=round(best, 3),
source="embedding")