feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM plausibility phase

Mail Render V2 (compliance/services/mail_render_v2/): an 11-module subpackage
that produces a unified audit mail output with:
- Header + KPI tiles (Score / Findings / Docs / Vendors)
- TOC + jump links
- 3-bucket split: critical findings / manual review / internal reminders
- Cookie inventory (Name·Vendor·Category·Retention period·Deletion deadline·Country of domicile·Source·Status)
- Immediate-actions aggregator ("Add country of domicile for 11 cookies")
- 24 legacy wrappers: all old build_*_html builders in V2 sections
- Scope filter: FIN/GOV/MED/INS/EDU/LEG dropped from reports when not relevant
- Hint/action dedup: no more duplicate sentences per card
Enabled via env MAIL_RENDER_V2=true (default: legacy renderer).
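A minimal sketch of how such an environment toggle is commonly read. Only the variable name MAIL_RENDER_V2, the value "true", and the legacy default come from the message above; the helper names and the case-insensitive comparison are assumptions, not the project's actual code.

```python
import os


def use_mail_render_v2(env=None):
    """Return True only when MAIL_RENDER_V2 is set to 'true'.

    Hypothetical helper: the case-insensitive comparison is an assumption.
    """
    env = os.environ if env is None else env
    return env.get("MAIL_RENDER_V2", "").strip().lower() == "true"


def pick_renderer(env=None):
    # Unset or any other value falls back to the legacy renderer,
    # which the commit message names as the default.
    return "v2" if use_mail_render_v2(env) else "legacy"
```

Passing a plain dict as `env` keeps the toggle testable without touching the real process environment.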

5 new deterministic findings as Phase D-2b/B4/B5/B6/B7/B8:
B4 vendor_consistency_check: cross-doc provider contradiction
(Elli: the privacy policy (DSE) names Vertex AI for the chatbot, /de/cookies names Iadvize → HIGH).
6 service types: chatbot/analytics/tag_manager/pixel/cdn/cmp.
B5 ai_act_transparency_check: AI Act Art. 50 transparency obligation
(Elli: Vertex AI present without pre-chat disclosure → HIGH).
Plus B5 extension: legal basis Art. 6(1)(f) GDPR (legitimate interests) for AI → MEDIUM
(recommend consent).
B6 cross_doc_dpo_check: DPO named in the DSE but not in the legal notice (Impressum) (LOW).
B7 doc_staleness_check: date extraction from DSE / terms and conditions (AGB) / terms of use (NB).
Cap: AGB/NB 3y, DSE 2y. Older → MEDIUM (Elli NB dated 2018 → HIGH).
B8 cmp_fingerprint_check: banner detected, but CMP provider is generic
(no Usercentrics/OneTrust/Cookiebot/etc. → MEDIUM).
B3 extension detect_intra_doc_contradictions: contradictory
retention periods in the SAME doc (Elli: logfile 7d vs 30d → HIGH).
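The B7 caps listed above can be illustrated with a short sketch. The caps (AGB/NB 3 years, DSE 2 years) come from the message; the function name, the dict keys, the 365-day year, and the example dates are assumptions, since doc_staleness_check itself is not part of the diff shown here. Severity is deliberately left out because the message mentions both MEDIUM and HIGH.

```python
from datetime import date

# Caps from the commit message: AGB/NB 3 years, DSE 2 years.
STALENESS_CAP_YEARS = {"AGB": 3, "NB": 3, "DSE": 2}


def is_stale(doc_type, doc_date, today):
    """Return True when the document's stated date exceeds its cap.

    Hypothetical helper; treats a year as 365 days for simplicity.
    """
    cap_years = STALENESS_CAP_YEARS[doc_type]
    age_days = (today - doc_date).days
    return age_days > cap_years * 365


# Hypothetical dates: terms of use dated 2018, checked in mid-2026.
print(is_stale("NB", date(2018, 1, 1), date(2026, 6, 6)))
```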

LLM plausibility phase (Phase D-2b, finding_plausibility_check.py):
- Runs AFTER the MC pipeline, BEFORE the D3 render
- Prompt with example IDs + 3-phase mapping: exact-ID / position-fallback /
  fuzzy-tail-match
- Stamps llm_title / llm_severity / llm_recommendation / llm_drop onto
  every FAIL CheckItem
- V2 render shows a "🤖 LLM-Plausibility:" box per finding when stamped
- KNOWN ISSUE: qwen3:30b-a3b often returns empty content with format='json' +
  8000-char-excerpt prompts. The pipeline continues with stamped=0. Task #16.
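One possible reading of the 3-phase mapping named above, as a hedged sketch. The phase names come from the message; the function name, the `id` key on response items, and the exact conditions (position fallback only when both lists have the same length, fuzzy tail match only when unambiguous) are assumptions and may differ from finding_plausibility_check.py.

```python
def map_llm_items(finding_ids, llm_items):
    """Map LLM response items onto finding IDs; returns {finding_id: item}."""
    mapped = {}
    leftovers = []
    wanted = {fid.upper(): fid for fid in finding_ids}

    # Phase 1: exact ID match (case-insensitive).
    for pos, item in enumerate(llm_items):
        raw_id = str(item.get("id", "")).strip().upper()
        fid = wanted.get(raw_id)
        if fid is not None and fid not in mapped:
            mapped[fid] = item
        else:
            leftovers.append((pos, raw_id, item))

    same_length = len(llm_items) == len(finding_ids)
    for pos, raw_id, item in leftovers:
        fid = None
        if same_length and finding_ids[pos] not in mapped:
            # Phase 2: position fallback, trusted only when lengths agree.
            fid = finding_ids[pos]
        elif raw_id:
            # Phase 3: fuzzy tail match against still-unmapped IDs.
            hits = [
                f for f in finding_ids
                if f not in mapped
                and (f.upper().endswith(raw_id) or raw_id.endswith(f.upper()))
            ]
            if len(hits) == 1:  # ambiguous tails are left unmapped
                fid = hits[0]
        if fid is not None:
            mapped[fid] = item
    return mapped
```

Running the exact pass first means a mislabelled item can never steal a slot that a later item matches exactly.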

Coverage against the Elli ground truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch agent crawl):
- 4/4 HIGH findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
  VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
- 4/6 MEDIUM ✓
- 2/3 LOW ✓
- Total: 10/13 = 77% (up from 4/13 = 31%)

The remaining 3 gaps are tracked as Task #17: IMPRESSUM-001 (multi-entity VAT ID, USt-IdNr),
TRANSFER-001 (vendor transfer mechanism DPF/SCC), TH-RETENTION-002 (AI retention
per data category).

V2 mail preview in Mailpit: 'v2all@local.test', subject '[V2 ALL] ELLI'.
Backend healthy; B1+B3+B4+B5+B6+B7+B8 are all live in the orchestrator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -128,9 +128,10 @@ _SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")
 # Quick anchor terms for retention sentences.
 _RETENTION_ANCHORS = (
     "speicherdauer", "speicherfrist", "speicher",
-    "aufbewahrungsdauer", "aufbewahrungsfrist",
-    "löschfrist", "löschung",
-    "gespeichert für", "wird gespeichert", "wird für",
+    "aufbewahrungsdauer", "aufbewahrungsfrist", "aufbewahr",
+    "löschfrist", "löschung", "gelöscht",
+    "gespeichert für", "wird gespeichert", "wird für", "werden für",
+    "in der regel", "bis zu",
     "retention", "expires", "expiration", "lifetime",
     "gültigkeit", "laufzeit",
 )
@@ -318,6 +319,74 @@ def compare_retention(
     return out


+def detect_intra_doc_contradictions(
+    dsi_text: str,
+) -> list[dict]:
+    """Find sentences in the SAME doc that claim different retention
+    values for what looks like the same data category.
+
+    Catches the Elli pattern:
+    "Logfiles werden 7 Tage gespeichert" + "Logfiles werden 30 Tage
+    aufbewahrt" → contradiction in one DSE.
+
+    Heuristik: group retention-bearing sentences by a category-anchor
+    keyword (logfile / log / chatverlauf / cookies / nutzungsdaten /
+    server-log) and report when ≥2 different day-values exist for the
+    same group.
+    """
+    if not dsi_text:
+        return []
+    claims = extract_retention_claims(dsi_text)
+    if len(claims) < 2:
+        return []
+
+    anchors = (
+        ("logfile", ("logfile", "log-file", "log file", "server-log")),
+        ("chat", ("chat", "chatverlauf", "konversation")),
+        ("cookie", ("cookie",)),
+        ("session", ("session", "sitzung")),
+        ("nutzungsdaten", ("nutzungsdaten", "usage data")),
+    )
+
+    by_group: dict[str, list[RetentionClaim]] = {}
+    for cl in claims:
+        if cl.days is None:
+            continue
+        sentence_lc = cl.sentence.lower()
+        for group, kws in anchors:
+            if any(k in sentence_lc for k in kws):
+                by_group.setdefault(group, []).append(cl)
+                break
+
+    findings: list[dict] = []
+    for group, group_claims in by_group.items():
+        days_set = {round(c.days, 1) for c in group_claims if c.days}
+        if len(days_set) < 2:
+            continue
+        values = sorted(days_set)
+        delta = values[-1] - values[0]
+        sev = "HIGH" if delta > values[0] * 3 else "MEDIUM"
+        findings.append({
+            "check_id": "TH-RETENTION-INTRA-001",
+            "category": group,
+            "severity": sev,
+            "severity_reason": "factually_wrong",
+            "values_days": values,
+            "claims": [c.sentence[:200] for c in group_claims[:3]],
+            "title": (
+                f"Speicherdauer-Widerspruch in DSE für '{group}': "
+                f"{values} Tage"
+            ),
+            "norm": "DSGVO Art. 5 Abs. 1 lit. a (Transparenz)",
+            "action": (
+                f"In der DSE einheitlichen Wert für '{group}' angeben. "
+                "Aktuell mindestens zwei verschiedene Werte genannt — "
+                "ein Mandant kann die Frist nicht eindeutig erkennen."
+            ),
+        })
+    return findings
+
+
 def build_retention_theme_summary(
     findings: list[dict],
 ) -> dict:
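The severity rule in the hunk above (HIGH when the spread between the smallest and largest value exceeds three times the smallest value, otherwise MEDIUM) can be checked in isolation. This is a simplified stand-in, not the committed function: the helper name is hypothetical, and it skips the rounding and claim grouping that detect_intra_doc_contradictions performs.

```python
def intra_doc_severity(day_values):
    """Apply the delta > 3 * min rule to a list of retention values in days.

    Hypothetical helper; returns None when there is no contradiction.
    """
    values = sorted(set(day_values))
    if len(values) < 2:
        return None  # fewer than two distinct values: nothing to report
    delta = values[-1] - values[0]
    return "HIGH" if delta > values[0] * 3 else "MEDIUM"


# Elli pattern from the commit message: logfiles 7d vs 30d.
# delta = 23, threshold = 21, so the rule yields HIGH.
print(intra_doc_severity([7, 30]))
```

Note the strict comparison: 7d vs 28d gives a delta of exactly 21 and therefore stays MEDIUM.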