feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM plausibility phase

Mail Render V2 (compliance/services/mail_render_v2/): an 11-module subpackage
that produces a unified audit-mail output with:
  - Header + KPI tiles (Score / Findings / Docs / Vendors)
  - TOC + jump links
  - 3-bucket split: critical findings / manual review / internal reminders
  - Cookie inventory (name·vendor·category·storage duration·deletion period·country of establishment·source·status)
  - Immediate-actions aggregator ("Sitzland ergänzen für 11 Cookies", i.e. add the country of establishment for 11 cookies)
  - 24 legacy wrappers: all old build_*_html functions wrapped as V2 sections
  - Scope filter: FIN/GOV/MED/INS/EDU/LEG dropped from reports when not relevant
  - Hint/action dedup: no more duplicate sentences per card
Enabled via env MAIL_RENDER_V2=true (default: legacy renderer).

5 new deterministic findings (B4/B5/B6/B7/B8), run in the orchestrator after phase D-2b:

  B4 vendor_consistency_check: cross-doc provider contradiction
     (Elli: the DSE (privacy policy) names Vertex AI for the chatbot, /de/cookies names Iadvize → HIGH).
     6 service types: chatbot/analytics/tag_manager/pixel/cdn/cmp.

  B5 ai_act_transparency_check: AI Act Art. 50 transparency obligation
     (Elli: Vertex AI present without pre-chat disclosure → HIGH).
     Plus B5 extension: legal basis Art. 6 Abs. 1 lit. f for AI → MEDIUM
     (recommend consent).

  B6 cross_doc_dpo_check: DPO named in the DSE but not in the Impressum (LOW).

  B7 doc_staleness_check: date extraction from DSE/AGB/Nutzungsbedingungen (NB).
     Cap: AGB/NB 3y, DSE 2y. At or past the cap → MEDIUM, past twice the cap
     → HIGH (Elli NB dated 2018 → HIGH).

  B8 cmp_fingerprint_check: banner detected, but CMP provider generic
     (no Usercentrics/OneTrust/Cookiebot/etc. → MEDIUM).

  B3 extension detect_intra_doc_contradictions: contradictory
     retention periods within the SAME doc (Elli: logfile 7d vs 30d → HIGH).

LLM plausibility phase (phase D-2b, finding_plausibility_check.py):
  - Runs AFTER the MC pipeline, BEFORE D3 render
  - Prompt with example IDs + 3-phase mapping: exact-ID / position-fallback /
    fuzzy-tail-match
  - Stamps llm_title / llm_severity / llm_recommendation / llm_drop onto
    every FAIL CheckItem
  - V2 render shows a "🤖 LLM-Plausibility:" box per finding when stamped
  - KNOWN ISSUE: qwen3:30b-a3b often returns empty content with format='json' +
    8000-char-excerpt prompts. The pipeline continues with stamped=0. Task #16.

Coverage against the Elli ground truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch agent crawl):
  - 4/4 HIGH findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
    VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
  - 4/6 MEDIUM ✓
  - 2/3 LOW ✓
  - Total: 10/13 = 77% (up from 4/13 = 31%)

Remaining 3 gaps tracked as Task #17: IMPRESSUM-001 (multi-entity VAT ID, USt-IdNr),
TRANSFER-001 (vendor transfer mechanism DPF/SCC), TH-RETENTION-002 (AI retention
per data category).

V2 mail preview in Mailpit: 'v2all@local.test', subject '[V2 ALL] ELLI'.
Backend healthy; B1+B3+B4+B5+B6+B7+B8 are all live in the orchestrator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-06-06 21:19:49 +02:00
parent c2c8783fee
commit d0e3621192
27 changed files with 4426 additions and 3 deletions
@@ -20,6 +20,7 @@ import time
from compliance.services.retention_comparator import (
build_retention_theme_summary,
compare_retention,
detect_intra_doc_contradictions,
extract_retention_claims,
)
@@ -54,6 +55,11 @@ def run_b3(state: dict) -> None:
if not dsi_text:
return
# Intra-doc contradictions are independent of cmp_vendors — run
# them first so they survive the early-return below.
intra = detect_intra_doc_contradictions(dsi_text)
state["retention_intra_doc"] = intra
cookie_records: list[dict] = []
cookie_names: list[str] = []
vendor_names: list[str] = []
@@ -0,0 +1,78 @@
"""B4 wiring — Cross-Doc Vendor-Consistency check + HTML block.
Activated after B1+B3 in the orchestrator. The check itself is
deterministic (no LLM); it scans DSE + cookie texts for known
service providers per service type and flags every mismatch.
The mail renderer reads `state["vendor_consistency_findings"]` and
`state["vendor_consistency_html"]` directly — no further wiring.
"""
from __future__ import annotations
import html
import logging
from compliance.services.vendor_consistency_check import (
check_vendor_consistency,
)
logger = logging.getLogger(__name__)
def run_b4(state: dict) -> None:
findings = check_vendor_consistency(state)
state["vendor_consistency_findings"] = findings
if not findings:
return
state["vendor_consistency_html"] = _render(findings)
logger.info(
"B4 Vendor-Consistency: %d findings (HIGH=%d, MEDIUM=%d)",
len(findings),
sum(1 for f in findings if (f.get("severity") or "") == "HIGH"),
sum(1 for f in findings if (f.get("severity") or "") == "MEDIUM"),
)
def _render(findings: list[dict]) -> str:
rows = []
for f in findings:
sev = (f.get("severity") or "").upper()
color = "#dc2626" if sev == "HIGH" else "#f59e0b"
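# Provider names originate from scanned page text, so escape them.
dse = ", ".join(html.escape(p) for p in f.get("dse_providers") or []) or "<em></em>"
cookie = ", ".join(html.escape(p) for p in f.get("cookie_providers") or []) or "<em></em>"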
rows.append(
"<tr>"
f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;'>"
f"{html.escape((f.get('service_type') or '').replace('_',' ').title())}"
"</td>"
f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;'>"
f"{dse}</td>"
f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;'>"
f"{cookie}</td>"
f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;"
f"color:{color};font-weight:600;'>"
f"{sev} {html.escape(f.get('severity_reason') or '')}</td>"
"</tr>"
)
return (
"<div style='margin:24px 0;padding:16px;border-left:4px solid #dc2626;"
"background:#fff1f2;border-radius:4px;'>"
"<h2 style='margin:0 0 8px;color:#991b1b;font-size:16px;'>"
"VENDOR-CONSISTENCY-001 — Vendor-Konsistenz DSE ↔ Cookies</h2>"
"<p style='margin:0 0 8px;font-size:14px;color:#3f3f46;'>"
f"<strong>{len(findings)}</strong> Provider-Widersprüche zwischen "
"Datenschutzerklärung und Cookie-Seite. Beispiel Elli: "
"DSE = Vertex AI für Chatbot, Cookies-Seite = Iadvize.</p>"
"<table style='width:100%;border-collapse:collapse;font-size:13px;"
"margin-top:8px;background:#fff;'>"
"<thead><tr style='background:#f1f5f9;'>"
"<th style='text-align:left;padding:6px 10px;'>Service-Typ</th>"
"<th style='text-align:left;padding:6px 10px;'>In DSE</th>"
"<th style='text-align:left;padding:6px 10px;'>Auf Cookies-Seite</th>"
"<th style='text-align:left;padding:6px 10px;'>Severity</th>"
"</tr></thead>"
f"<tbody>{''.join(rows)}</tbody>"
"</table>"
"</div>"
)
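The B4 check itself (`check_vendor_consistency`) is not part of this hunk. As a rough sketch of the idea described in the commit message, the following compares which known providers each document names per service type. The provider table, the function name `find_vendor_mismatches`, and the rule that every disagreement is HIGH are assumptions for illustration, not the shipped implementation. The returned keys mirror what `_render` above reads.

```python
# Hypothetical sketch: provider table and severity rule are assumptions.
KNOWN_PROVIDERS = {
    "chatbot": ("vertex ai", "iadvize", "intercom"),
    "analytics": ("google analytics", "matomo"),
}


def find_vendor_mismatches(dse_text: str, cookie_text: str) -> list:
    """Flag service types where both docs name providers but the sets differ."""
    dse_lc, cookie_lc = dse_text.lower(), cookie_text.lower()
    findings = []
    for service_type, providers in KNOWN_PROVIDERS.items():
        in_dse = sorted(p for p in providers if p in dse_lc)
        in_cookie = sorted(p for p in providers if p in cookie_lc)
        # Only a contradiction if both documents make a claim and disagree.
        if in_dse and in_cookie and in_dse != in_cookie:
            findings.append({
                "service_type": service_type,
                "dse_providers": in_dse,
                "cookie_providers": in_cookie,
                "severity": "HIGH",
            })
    return findings


example = find_vendor_mismatches(
    "Unser Chatbot nutzt Vertex AI. Wir verwenden Matomo.",
    "Chat: iAdvize. Statistik: Matomo.",
)
```

In this example only the chatbot service type is flagged, because both texts agree on Matomo for analytics.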
@@ -0,0 +1,81 @@
"""B5 wiring — AI-Act Art. 50 Transparenzpflicht-Check + HTML block.
Runs after B4 (vendor-consistency). Deterministic detection of
AI-Provider mentions + disclosure-phrase mentions. When an AI is
present but no Art-50-disclosure → HIGH finding; when both present
the renderer flags MEDIUM/manual-review because the LIVE pre-chat
UI hint cannot be verified without a consent-tester DOM scan.
"""
from __future__ import annotations
import html
import logging
from compliance.services.ai_act_transparency_check import (
check_ai_act_transparency,
)
logger = logging.getLogger(__name__)
def run_b5(state: dict) -> None:
findings = check_ai_act_transparency(state)
state["ai_act_findings"] = findings
if not findings:
return
state["ai_act_html"] = _render(findings)
logger.info(
"B5 AI-Act: %d findings (HIGH=%d, MEDIUM=%d)",
len(findings),
sum(1 for f in findings if (f.get("severity") or "") == "HIGH"),
sum(1 for f in findings if (f.get("severity") or "") == "MEDIUM"),
)
def _render(findings: list[dict]) -> str:
cards = []
for f in findings:
sev = (f.get("severity") or "").upper()
color = "#dc2626" if sev == "HIGH" else "#f59e0b"
vendors_html = ""
if f.get("ai_vendors"):
chips = "".join(
f"<span style='display:inline-block;background:#f1f5f9;"
f"padding:2px 8px;border-radius:999px;margin:2px 4px 2px 0;"
f"font-size:11px;'>{html.escape(v.get('vendor') or '')}</span>"
for v in f["ai_vendors"]
)
vendors_html = (
"<div style='margin-top:6px;font-size:13px;'>"
f"<strong>Erkannte AI-Vendors:</strong> {chips}</div>"
)
signals_html = (
f"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>{html.escape(f.get('detected_signals') or '')}</em></div>"
)
cards.append(
f"<div style='margin:12px 0;padding:14px;background:#fff;"
f"border-left:3px solid {color};border-radius:4px;'>"
f"<div style='font-weight:600;color:{color};font-size:14px;'>"
f"{sev} · {html.escape(f.get('check_id') or '')}</div>"
f"<div style='font-size:14px;margin-top:4px;'>"
f"<strong>{html.escape(f.get('title') or '')}</strong></div>"
f"<div style='font-size:12px;color:#64748b;margin-top:2px;'>"
f"{html.escape(f.get('norm') or '')}</div>"
f"{vendors_html}{signals_html}"
f"<div style='font-size:13px;margin-top:8px;background:#dcfce7;"
f"padding:8px 10px;border-radius:4px;'>"
f"<strong>→ Empfehlung:</strong> "
f"{html.escape(f.get('action') or '')}</div>"
"</div>"
)
return (
"<div style='margin:24px 0;padding:16px;border-left:4px solid #dc2626;"
"background:#fef2f2;border-radius:4px;'>"
"<h2 style='margin:0 0 8px;color:#991b1b;font-size:16px;'>"
"🤖 AI-Act Art. 50 — Transparenzpflicht KI-Interaktion"
"</h2>"
+ "".join(cards) +
"</div>"
)
@@ -0,0 +1,97 @@
"""B6 / B7 / B8 wiring — DPO-cross-doc, doc-staleness, CMP-fingerprint.
Three small, deterministic checks added after B5. Each writes one or
more findings into `state["extra_findings"]` and a tiny HTML block
into `state["extra_findings_html"]` that the V2 renderer concatenates
between B5 (AI-Act) and the legacy section block.
"""
from __future__ import annotations
import html
import logging
from compliance.services.cmp_fingerprint_check import check_cmp_fingerprint
from compliance.services.cross_doc_dpo_check import check_dpo_cross_doc
from compliance.services.doc_staleness_check import check_staleness
logger = logging.getLogger(__name__)
def run_b6b7b8(state: dict) -> None:
findings: list[dict] = []
dpo = check_dpo_cross_doc(state)
if dpo:
findings.append(dpo)
stale = check_staleness(state)
findings.extend(stale)
cmp = check_cmp_fingerprint(state)
if cmp:
findings.append(cmp)
state["extra_findings"] = findings
if findings:
state["extra_findings_html"] = _render(findings)
logger.info(
"B6/B7/B8 extra: %d findings (DPO=%d, staleness=%d, CMP=%d)",
len(findings), 1 if dpo else 0, len(stale), 1 if cmp else 0,
)
def _render(findings: list[dict]) -> str:
cards = []
for f in findings:
sev = (f.get("severity") or "").upper()
color = "#dc2626" if sev == "HIGH" else (
"#f59e0b" if sev == "MEDIUM" else "#64748b"
)
evidence_html = ""
if f.get("evidence_dse"):
evidence_html = (
"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>In DSE: {html.escape(', '.join(f['evidence_dse']))}</em>"
"</div>"
)
if f.get("doc_date"):
evidence_html = (
"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>Stand: {html.escape(f['doc_date'])} "
f"({f.get('age_years','?')} Jahre alt, Cap "
f"{f.get('threshold_years','?')} Jahre)</em>"
"</div>"
)
if f.get("detected_provider"):
evidence_html = (
"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>Erkannter Provider: "
f"{html.escape(f['detected_provider'])}</em>"
"</div>"
)
cards.append(
f"<div style='margin:12px 0;padding:14px;background:#fff;"
f"border-left:3px solid {color};border-radius:4px;'>"
f"<div style='font-weight:600;color:{color};font-size:14px;'>"
f"{sev} · {html.escape(f.get('check_id') or '')}</div>"
f"<div style='font-size:14px;margin-top:4px;'>"
f"<strong>{html.escape(f.get('title') or '')}</strong></div>"
f"<div style='font-size:12px;color:#64748b;margin-top:2px;'>"
f"{html.escape(f.get('norm') or '')}</div>"
f"{evidence_html}"
f"<div style='font-size:13px;margin-top:8px;background:#dcfce7;"
f"padding:8px 10px;border-radius:4px;'>"
f"<strong>→ Empfehlung:</strong> "
f"{html.escape(f.get('action') or '')}</div>"
"</div>"
)
return (
"<div style='margin:24px 0;padding:16px;border-left:4px solid #f59e0b;"
"background:#fffbeb;border-radius:4px;'>"
"<h2 style='margin:0 0 8px;color:#92400e;font-size:16px;'>"
"📌 Zusätzliche Cross-Doc-Befunde (DPO / Staleness / CMP-Fingerprint)"
"</h2>"
+ "".join(cards) +
"</div>"
)
@@ -18,12 +18,16 @@ import logging
from ._b1_wiring import run_b1
from ._b3_wiring import run_b3
from ._b4_wiring import run_b4
from ._b5_wiring import run_b5
from ._b6b7b8_wiring import run_b6b7b8
from ._constants import _compliance_check_jobs
from ._phase_a_resolve import run_phase_a
from ._phase_b_profile_check import run_phase_b
from ._phase_c_banner import run_phase_c
from ._phase_d1_vendors_raw import run_phase_d1
from ._phase_d2_vendors_finalize import run_phase_d2
from ._phase_d2b_plausibility import run_phase_d2b
from ._phase_d3_blocks_bot import run_phase_d3_bot
from ._phase_d3_blocks_mid import run_phase_d3_mid
from ._phase_d3_blocks_top import run_phase_d3_top
@@ -49,11 +53,16 @@ async def run_compliance_check(check_id: str, req) -> None:
# Phase D-1/D-2: Step 5 vendor extraction + finalize
await run_phase_d1(state)
await run_phase_d2(state)
# D-2b: LLM Plausibility Re-Eval — stamps llm_* on all FAIL checks
await run_phase_d2b(state)
# B1 + B3: cross-cutting checks that need the finalized vendor
# list + DSI text. Render their own HTML blocks consumed by
# phase D-3 bot's full_html composition.
await run_b1(state)
run_b3(state)
run_b4(state) # Cross-doc vendor-consistency (Elli Vertex↔Iadvize)
run_b5(state) # AI-Act Art. 50 transparency
run_b6b7b8(state) # DPO-cross-doc + Doc-Staleness + CMP-fingerprint
# Phase D-3 top/mid/bot: Step 5 HTML blocks
await run_phase_d3_top(state)
await run_phase_d3_mid(state)
@@ -0,0 +1,41 @@
"""Phase D-2b — LLM Plausibility Re-Eval over all MC findings.
Runs AFTER vendor finalize and BEFORE D3 HTML blocks. Stamps the
`llm_title` / `llm_severity` / `llm_recommendation` / `llm_drop`
fields onto every FAIL CheckItem. The V2 mail renderer reads these
fields automatically — no further wiring needed.
Opt-out via env var `PLAUSIBILITY_DISABLED=true` (e.g. for CI runs
where the LLM endpoint isn't reachable).
"""
from __future__ import annotations
import logging
import os
from ._helpers import _update
logger = logging.getLogger(__name__)
async def run_phase_d2b(state: dict) -> None:
"""Run the plausibility re-eval over state["results"]. Mutates checks."""
if os.environ.get("PLAUSIBILITY_DISABLED", "false").lower() in (
"true", "1", "yes",
):
logger.info("plausibility-check disabled by env")
return
check_id = state["check_id"]
results = state.get("results") or []
doc_texts = state.get("doc_texts") or {}
if not results:
return
_update(check_id, "LLM-Plausibilitäts-Check über alle Findings...", 94)
try:
from compliance.services.finding_plausibility_check import (
verify_plausibility,
)
await verify_plausibility(results, doc_texts)
except Exception as e:
logger.warning("plausibility-phase failed (continuing): %s", e)
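The commit message mentions a 3-phase mapping (exact-ID / position-fallback / fuzzy-tail-match) for matching LLM output items back to the input findings. That code lives in `finding_plausibility_check.py` and is not shown here. The sketch below is one possible reading of those three phases; the function name `map_llm_items` and the exact conditions for each phase are assumptions.

```python
def map_llm_items(input_ids, llm_items):
    """Map LLM output items back to input IDs. Returns {input_id: item}."""
    mapped = {}
    for pos, item in enumerate(llm_items):
        raw = str(item.get("id") or "").strip()
        target = None
        if raw in input_ids and raw not in mapped:
            # Phase 1: the model echoed the ID exactly.
            target = raw
        elif not raw and pos < len(input_ids) and input_ids[pos] not in mapped:
            # Phase 2: no ID at all, so fall back to the item's position.
            target = input_ids[pos]
        elif raw:
            # Phase 3: shortened ID, e.g. "3953-A04" for "mc-DATA-3953-A04".
            tails = [
                i for i in input_ids
                if i not in mapped and i.lower().endswith(raw.lower())
            ]
            if len(tails) == 1:  # accept only an unambiguous tail match
                target = tails[0]
        if target is not None:
            mapped[target] = item
    return mapped


ids = ["mc-DATA-3953-A04", "mc-DATA-1000-B01", "mc-DATA-2000-C02"]
items = [
    {"id": "mc-DATA-3953-A04", "title": "a"},
    {"title": "b"},
    {"id": "2000-C02", "title": "c"},
]
result = map_llm_items(ids, items)
```

Items that cannot be matched in any phase are simply left out, which fits the commit note that the pipeline continues when nothing gets stamped.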
@@ -217,4 +217,17 @@ async def run_phase_d3_bot(state: dict) -> None:
)
state["audit_quality_findings"] = audit_quality_findings
# MAIL_RENDER_V2 — opt-in unified layout. Default keeps the legacy
# composition so we can A/B compare in Mailpit.
try:
from compliance.services.mail_render_v2._compose import (
compose_v2, is_v2_enabled,
)
if is_v2_enabled():
full_html = compose_v2(state)
logger.info("MAIL_RENDER_V2 active: %d bytes", len(full_html))
except Exception as e:
logger.warning("MAIL_RENDER_V2 fallback to legacy: %s", e)
state["full_html"] = full_html
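`is_v2_enabled` is imported from `mail_render_v2._compose`, which is not shown in this excerpt. According to the commit message the renderer is enabled via `MAIL_RENDER_V2=true`. A minimal sketch of such a flag check, assuming it accepts the same truthy values as `PLAUSIBILITY_DISABLED` in phase D-2b; the `environ` parameter is a hypothetical addition for testability.

```python
import os


def is_v2_enabled(environ=None) -> bool:
    """Return True when MAIL_RENDER_V2 holds a truthy value (assumed set)."""
    env = os.environ if environ is None else environ
    value = env.get("MAIL_RENDER_V2", "false")
    return value.strip().lower() in ("true", "1", "yes")
```

With the variable unset the function returns False, which matches the stated default of keeping the legacy renderer.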
@@ -0,0 +1,221 @@
"""AI-Act Art. 50 Transparenzpflicht-Check (vereinfacht).
Art. 50 AI Act requires that users interacting with an AI system
(chatbot, voice assistant, etc.) can recognise that they are talking
to an AI, unless this is obvious from the context.
The check is heuristic (no LLM) and covers three layers:
1. AI-provider detection in the DSE and the vendor list
(Vertex AI, OpenAI, Anthropic, etc.)
2. Disclosure-text detection in the DSE / cookie doc
("KI-System", "Sie chatten mit einer KI", "automatisiert",
"Artificial Intelligence", "Konversations-KI", "GPT", …)
3. Cross-check: AI provider found + no disclosure → HIGH
AI provider found + disclosure present → MEDIUM (a pre-chat hint
is required before the first input, and only a consent-tester
DOM scan can verify it)
Known limitation: without a consent-tester extension the check cannot
decide whether a pre-chat hint appears in the live DOM before the
first user input. This case is therefore flagged as MEDIUM
"verify manually".
"""
from __future__ import annotations
import logging
import re
logger = logging.getLogger(__name__)
# AI vendors / models / frameworks that trigger the "AI" signal.
_AI_KEYWORDS = (
"vertex ai", "google vertex", "openai", "gpt-3", "gpt-4", "chatgpt",
"anthropic", "claude.ai", "claude-3", "mistral ai", "huggingface",
"hugging face", "stable diffusion", "midjourney", "llama-2", "llama-3",
"qwen", "deepseek", "perplexity ai", "azure openai", "copilot",
"konversations-ki", "konversations ki",
"ai assistant", "ai-assistant", "ki-assistent", "ki assistent",
"intelligenter assistent",
"automatisierter chat", "chatbot", "live-chat",
)
# Phrases that count as an Art. 50 disclosure.
_DISCLOSURE_PHRASES = (
"sie chatten mit einer ki", "sie kommunizieren mit einer ki",
"automatisierter chat", "automatisierter assistent",
"ki-gestützter", "ki gestützt", "ki-gestuetzt",
"künstliche intelligenz", "kuenstliche intelligenz",
"artificial intelligence",
"art. 50 ai act", "ai act art. 50", "art. 50 ki-vo",
"ki-verordnung", "ki verordnung",
"automatisiertes system",
"generativ", "generative ki", "generative ai",
"large language model", "llm-",
"machine learning",
)
def _has_any(text: str, phrases) -> list[str]:
text_lc = (text or "").lower()
if not text_lc:
return []
return [p for p in phrases if p in text_lc]
def _find_ai_in_vendors(cmp_vendors: list[dict]) -> list[dict]:
"""Find vendors whose name or category mentions an AI provider."""
hits: list[dict] = []
for v in cmp_vendors or []:
haystack = " ".join([
(v.get("name") or "").lower(),
(v.get("category") or "").lower(),
(v.get("processing_company") or "").lower(),
])
if not haystack.strip():
continue
matched = [k for k in _AI_KEYWORDS if k in haystack]
if matched:
hits.append({
"vendor": v.get("name") or "",
"matched": matched[:3],
})
return hits
def check_ai_act_transparency(state: dict) -> list[dict]:
"""Return findings about AI Art. 50 transparency obligations."""
doc_texts = state.get("doc_texts") or {}
dse_text = doc_texts.get("dse") or ""
cookie_text = doc_texts.get("cookie") or ""
cmp_vendors = state.get("cmp_vendors") or []
if not dse_text and not cookie_text and not cmp_vendors:
return []
ai_mentions_dse = _has_any(dse_text, _AI_KEYWORDS)
ai_mentions_cookie = _has_any(cookie_text, _AI_KEYWORDS)
ai_vendors = _find_ai_in_vendors(cmp_vendors)
has_ai_signal = bool(
ai_mentions_dse or ai_mentions_cookie or ai_vendors
)
if not has_ai_signal:
return []
disc_dse = _has_any(dse_text, _DISCLOSURE_PHRASES)
disc_cookie = _has_any(cookie_text, _DISCLOSURE_PHRASES)
has_disclosure = bool(disc_dse or disc_cookie)
findings: list[dict] = []
summary_signals = (
f"DSE-AI-Hinweise: {len(ai_mentions_dse)} "
f"(z.B. {', '.join(ai_mentions_dse[:3])}); "
f"Cookie-AI-Hinweise: {len(ai_mentions_cookie)}; "
f"AI-Vendors: {len(ai_vendors)}"
)
if not has_disclosure:
findings.append({
"check_id": "AI-ACT-TRANSPARENCY-001",
"severity": "HIGH",
"severity_reason": "missing",
"title": (
"AI-Act Art. 50 Transparenz-Hinweis fehlt — "
"KI-System eingesetzt, aber keine Nutzer-Erklärung"
),
"norm": "AI Act Art. 50 Abs. 1 (Transparenz gegenüber Nutzern)",
"detected_signals": summary_signals,
"ai_vendors": ai_vendors,
"ai_keywords_in_dse": ai_mentions_dse[:5],
"ai_keywords_in_cookie": ai_mentions_cookie[:5],
"action": (
"DSE und Pre-Chat-UI mit ausdrücklichem Hinweis "
"'Sie kommunizieren mit einer KI (System X)' ergänzen. "
"Anbieter offen nennen + Rechtsgrundlage + Speicherdauer."
),
})
else:
# AI detected + DSE disclosure present, but the check cannot verify
# the pre-chat hint in the live DOM.
findings.append({
"check_id": "AI-ACT-TRANSPARENCY-002",
"severity": "MEDIUM",
"severity_reason": "manual_review_required",
"title": (
"AI-Act Art. 50: DSE-Disclosure vorhanden — Pre-Chat-Hinweis "
"im UI manuell verifizieren"
),
"norm": "AI Act Art. 50 Abs. 1",
"detected_signals": summary_signals,
"ai_vendors": ai_vendors,
"disclosure_in_dse": disc_dse[:3],
"disclosure_in_cookie": disc_cookie[:3],
"action": (
"Pre-Chat-UI öffnen: vor der ersten Nutzereingabe muss "
"ein klarer Hinweis erscheinen, dass die Konversation "
"mit einer KI geführt wird. Verifizieren ob Banner/Modal "
"vorhanden oder reine Footnote."
),
})
# Extra check: if AI is present and the legal basis is legitimate
# interest (Art. 6 Abs. 1 lit. f) instead of consent → MEDIUM
if ai_vendors or ai_mentions_dse:
if _legitimate_interest_for_ai(dse_text):
findings.append({
"check_id": "AI-ACT-RISK-001",
"severity": "MEDIUM",
"severity_reason": "misclassified",
"title": (
"Rechtsgrundlage 'berechtigtes Interesse' für "
"KI-Verarbeitung — Einwilligung empfehlen"
),
"norm": "DSGVO Art. 6 Abs. 1 lit. a vs lit. f + AI Act",
"detected_signals": (
"AI-Provider erkannt; Art. 6 Abs. 1 lit. f als "
"Rechtsgrundlage in DSE genannt"
),
"ai_vendors": ai_vendors,
"action": (
"Bei generativer KI (insbesondere mit Drittland-"
"Transfer und Profiling-Verwandtschaft) "
"Rechtsgrundlage auf Einwilligung (Art. 6 Abs. 1 "
"lit. a) umstellen. Interessenabwägung dokumentieren."
),
})
if findings:
logger.info("ai-act-transparency: %d findings", len(findings))
return findings
def _legitimate_interest_for_ai(dse_text: str) -> bool:
    """Detect 'Rechtsgrundlage Art. 6 Abs. 1 lit. f' near AI mentions."""
    text_lc = (dse_text or "").lower()
    if not text_lc:
        return False
    lit_f_patterns = (
        "art. 6 abs. 1 lit. f", "artikel 6 abs. 1 lit. f",
        "art. 6 1 f", "berechtigtes interesse", "berechtigten interesses",
    )
    # Crude proximity check: for every occurrence of an AI keyword, look
    # for a lit-f phrase within ~300 chars on either side (~600 char window).
    for ai_kw in _AI_KEYWORDS:
        idx = text_lc.find(ai_kw)
        while idx != -1:
            window = text_lc[max(0, idx - 300):idx + len(ai_kw) + 300]
            if any(p in window for p in lit_f_patterns):
                return True
            # Jump straight to the next occurrence instead of walking chars.
            idx = text_lc.find(ai_kw, idx + 1)
    return False
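`_has_any` matches by plain substring, so a disclosure phrase such as "generativ" also matches inside "regenerativen" (as in "regenerative Energien"). One option, sketched here under the assumption that prefix-style phrases like "llm-" should keep working, is to require a word boundary only at the start of the phrase. The helper name `has_any_word_start` is hypothetical.

```python
import re


def has_any_word_start(text: str, phrases) -> list:
    """Like substring matching, but a phrase must start at a word boundary."""
    text_lc = (text or "").lower()
    return [
        p for p in phrases
        # (?<!\w) rejects hits that begin in the middle of a longer word.
        if re.search(r"(?<!\w)" + re.escape(p), text_lc)
    ]


phrases = ("generativ", "llm-", "ki-gestützter")
# Plain substring matching produces a false positive here.
plain = [p for p in phrases if p in "strom aus regenerativen quellen"]
strict = has_any_word_start("Strom aus regenerativen Quellen", phrases)
hit = has_any_word_start("Unser LLM-basierter, generativer Assistent", phrases)
```

The trailing side is left open on purpose, so inflected forms such as "generativer" still count as a hit.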
@@ -0,0 +1,55 @@
"""B8 — CMP-Provider-Fingerprint-Check.
Emits a finding when:
- a cookie banner was detected (banner_result.detected=True)
- but the CMP provider/vendor cannot be derived (provider matches none of
_KNOWN_PROVIDERS, e.g. Generic, "", "?")
Under the EDPB taskforce methodology this is MEDIUM: without a clear CMP
identity it is hard to assess which consent-storage format applies, whether
TCF is supported, and how the DPO (DSB) can communicate with the CMP vendor
(audit trail / DPA).
Provider detection already runs in the consent-tester. This check only
reports the gap when a banner is present but the vendor remains
unidentified.
"""
from __future__ import annotations
import logging
logger = logging.getLogger(__name__)
_KNOWN_PROVIDERS = (
"usercentrics", "onetrust", "cookiebot", "cookiepro",
# "klaro" without the trailing "!" so both "Klaro" and "Klaro!" match.
"sourcepoint", "consentmanager", "klaro", "borlabs",
"iubenda", "didomi", "trustarc", "complianz",
)
def check_cmp_fingerprint(state: dict) -> dict | None:
br = state.get("banner_result") or {}
detected = br.get("detected") or br.get("banner_detected")
if not detected:
return None
provider = (br.get("provider") or br.get("banner_provider") or "").lower()
is_known = any(k in provider for k in _KNOWN_PROVIDERS)
if is_known:
return None
# Banner is present, but the CMP provider is generic or empty.
finding = {
"check_id": "COOKIE-CONSENT-UX-002",
"severity": "MEDIUM",
"severity_reason": "incomplete",
"title": "Cookie-Banner erkannt, aber CMP-Provider nicht eindeutig",
"norm": "EDPB Cookie Banner Taskforce-Report (Transparenz CMP)",
"detected_provider": provider or "",
"action": (
"CMP-Provider in der DSE benennen (Auftragsverarbeiter), "
"Consent-Storage-Format dokumentieren (TCF / proprietär), "
"und Audit-Trail-Zugang für den DSB sicherstellen."
),
}
logger.info("B8 CMP-fingerprint: detected_provider=%r is generic", provider)
return finding
@@ -0,0 +1,66 @@
"""B6 — DPO in DSE genannt, im Impressum aber nicht verlinkt.
Best-practice check in the spirit of DSGVO Art. 37 and § 5 TMG:
if the DSE names a data protection officer (DSB), the Impressum should
reference them as well (at least a pointer "DSB siehe DSE"),
otherwise the contact route is lost when the DSE is published
separately.
Severity LOW (not strictly mandatory), but relevant for DPOs.
"""
from __future__ import annotations
import logging
import re
logger = logging.getLogger(__name__)
# Patterns that mark a DSB / DPO as named in a text
_DSB_NAMED_PATTERNS = [
re.compile(r"datenschutzbeauftrag\w+", re.I),
re.compile(r"data\s+protection\s+officer\b", re.I),
re.compile(r"\bdpo\b", re.I),
re.compile(r"privacy@\S+", re.I),
re.compile(r"datenschutz@\S+", re.I),
]
def _names_dsb(text: str) -> list[str]:
if not text:
return []
out: list[str] = []
for pat in _DSB_NAMED_PATTERNS:
for m in pat.finditer(text):
out.append(m.group(0))
if len(out) >= 3:
return out
return out
def check_dpo_cross_doc(state: dict) -> dict | None:
"""Return a finding when DSE names a DPO but Impressum does not."""
doc_texts = state.get("doc_texts") or {}
dse = doc_texts.get("dse") or ""
imp = doc_texts.get("impressum") or ""
if not dse or not imp:
return None
dse_hits = _names_dsb(dse)
imp_hits = _names_dsb(imp)
if dse_hits and not imp_hits:
finding = {
"check_id": "IMPRESSUM-DPO-001",
"severity": "LOW",
"severity_reason": "incomplete",
"title": "DSB im Impressum nicht verlinkt",
"norm": "DSGVO Art. 37 (Best Practice) + § 5 TMG-Geist",
"evidence_dse": dse_hits[:2],
"action": (
"Im Impressum den DSB-Kontakt verlinken oder Verweis "
"auf die Datenschutzerklärung ergänzen, damit Betroffene "
"auch über das Impressum den DSB erreichen."
),
}
logger.info("B6 DPO-cross-doc: DSE has DPO, Impressum doesn't")
return finding
return None
@@ -0,0 +1,133 @@
"""B7 — Doc-Staleness: Datum extrahieren + Aktualität bewerten.
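Emits a finding (DOC-STALENESS-<DOC_TYPE>) when a legal document (AGB,
Nutzungsbedingungen, Widerruf, DSE, cookie page, Impressum) has reached
its age cap. Default caps: 3 years for AGB/Nutzungsbedingungen, 2 years
for DSE, Widerruf and cookie page, 5 years for Impressum.
Date-extraction heuristic, examples matched by _PAT_STAND:
- "Stand: November 2018" / "Stand November 2018" / "Stand: Dezember 2018"
- "Version vom 1.12.2018"
- "Last updated: December 2018"
Not matched: year-first dates such as "Letzte Aktualisierung: 2018-12-01",
because _PAT_STAND expects the order day, month, year.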
"""
from __future__ import annotations
import logging
import re
from datetime import datetime
logger = logging.getLogger(__name__)
_MONTHS_DE = {
"januar": 1, "februar": 2, "märz": 3, "maerz": 3, "april": 4,
"mai": 5, "juni": 6, "juli": 7, "august": 8, "september": 9,
"oktober": 10, "november": 11, "dezember": 12,
}
_MONTHS_EN = {
"january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
"june": 6, "july": 7, "august": 8, "september": 9, "october": 10,
"november": 11, "december": 12,
}
# Match patterns like "Stand: Dezember 2018" / "Stand November 2018"
_PAT_STAND = re.compile(
r"(?:stand|version|letzte\s+aktualisierung|last\s+updated|"
r"last\s+revised)\s*[:\-]?\s*"
r"(?:vom\s+)?"
r"(?:(?P<day>\d{1,2})[.\-/])?"
r"(?P<month>"
r"januar|februar|m[äa]rz|april|mai|juni|juli|august|september|"
r"oktober|november|dezember|"
r"january|february|march|april|may|june|july|august|september|"
r"october|november|december|"
r"\d{1,2}"
r")"
r"[.\s\-/]+"
r"(?P<year>\d{4})",
re.I,
)
_AGE_THRESHOLDS_YEARS = {
"agb": 3,
"nutzungsbedingungen": 3,
"widerruf": 2,
"dse": 2,
"impressum": 5, # less critical
"cookie": 2,
}
def _extract_date(text: str) -> tuple[int, int, int] | None:
"""Return (year, month, day) of the most recent revision date."""
if not text:
return None
candidates: list[tuple[int, int, int]] = []
for m in _PAT_STAND.finditer(text):
try:
year = int(m.group("year"))
mon_str = (m.group("month") or "").lower()
day = int(m.group("day") or 1)
if mon_str.isdigit():
month = int(mon_str)
else:
month = (_MONTHS_DE.get(mon_str)
or _MONTHS_EN.get(mon_str))
if not month or not (1 <= month <= 12):
continue
if year < 2000 or year > 2100:
continue
candidates.append((year, month, day))
except (ValueError, TypeError):
continue
if not candidates:
return None
# newest date wins
candidates.sort(reverse=True)
return candidates[0]
def check_staleness(state: dict) -> list[dict]:
"""Run staleness check across legal docs."""
findings: list[dict] = []
doc_texts = state.get("doc_texts") or {}
today = datetime.utcnow()
for doc_type, text in doc_texts.items():
threshold_years = _AGE_THRESHOLDS_YEARS.get(doc_type)
if not threshold_years:
continue
date = _extract_date(text)
if not date:
continue
year, month, day = date
try:
doc_date = datetime(year, month, min(day, 28))
except ValueError:
continue
age_years = (today - doc_date).days / 365.25
if age_years < threshold_years:
continue
sev = "HIGH" if age_years > threshold_years * 2 else "MEDIUM"
findings.append({
"check_id": f"DOC-STALENESS-{doc_type.upper()}",
"doc_type": doc_type,
"severity": sev,
"severity_reason": "incomplete",
"title": (
f"{doc_type.title()} ist {int(age_years)} Jahre alt "
f"(Stand {year:04d}-{month:02d})"
),
"norm": "Sorgfaltspflicht (laufende Anpassung an Rechtsänderungen)",
"doc_date": f"{year:04d}-{month:02d}-{day:02d}",
"age_years": round(age_years, 1),
"threshold_years": threshold_years,
"action": (
f"{doc_type.title()} überprüfen und an aktuelle "
"Gesetzeslage anpassen (DSGVO-Updates, AI Act, DSA, "
"neue BGH-Rechtsprechung). Stand-Datum aktualisieren."
),
})
if findings:
logger.info("B7 staleness: %d findings", len(findings))
return findings
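Year-first dates such as "Letzte Aktualisierung: 2018-12-01" do not fit `_PAT_STAND`, which expects day, month, year. A minimal sketch of a complementary pattern follows, assuming the same lead-in keywords should apply. `_PAT_ISO` and `extract_iso_date` are hypothetical names and not part of this commit.

```python
import re
from datetime import date

# Hypothetical complement to _PAT_STAND for year-first dates (assumption).
_PAT_ISO = re.compile(
    r"(?:stand|version|letzte\s+aktualisierung|last\s+updated|last\s+revised)"
    r"\s*[:\-]?\s*(?:vom\s+)?"
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})",
    re.I,
)


def extract_iso_date(text: str):
    """Return the newest (year, month, day) found in ISO form, or None."""
    candidates = []
    for m in _PAT_ISO.finditer(text or ""):
        y, mo, d = int(m["year"]), int(m["month"]), int(m["day"])
        try:
            date(y, mo, d)  # reject impossible dates such as 2018-13-40
        except ValueError:
            continue
        candidates.append((y, mo, d))
    return max(candidates) if candidates else None
```

Its candidates could be merged with those of `_extract_date` before the newest date is picked, so that the "newest date wins" rule stays in one place.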
@@ -0,0 +1,329 @@
"""LLM Plausibility Re-Evaluation for MC findings.
Why this exists:
MC-DB labels are historic compliance-officer questions ("Dokumentiert
die DSI alle Datenübermittlungen gemäß Art. 49 Abs. 1 Unterabs. 2
DS-GVO?"). When the deterministic regex+LLM-verify pipeline flags
them as FAIL, the question stays as the title. The reader sees
"we don't know" — unhelpful.
What this does:
AFTER the MC pipeline finished, run a second LLM pass over EVERY
remaining FAIL with the original doc-text. The LLM:
1. Reformulates the question as a STATEMENT-OF-TOPIC
("Drittland-Übermittlungen nach Art. 49 Abs. 1 Unterabs. 2 DS-GVO")
2. Suggests a plausible severity (or DROP if the finding is bogus)
3. Produces a CONCRETE recommendation ("Im Abschnitt 'Drittland'
der DSE Mechanismus pro Empfänger ergänzen")
What this does NOT do:
- Touch the MC-DB. Original label stays in c.label.
- Touch passed/skipped/regulation/matched_text — those are facts.
- Run for non-fails or already-handled checks.
Stamping schema on each Check (CheckItem dataclass):
llm_title: str - reformulated topic statement
llm_severity: str - suggested severity ("HIGH"|"MEDIUM"|"LOW"|"DROP")
llm_recommendation: str - concrete fix recommendation
llm_drop: bool - True if the LLM judged the finding not plausible
llm_plausibility: float - 0..1 confidence (optional)
The mail-render V2 reads these stamps and renders them next to the
original label (🤖 LLM-Plausibility box).
Config:
OLLAMA_URL default "http://host.docker.internal:11434"
PLAUSIBILITY_LLM_MODEL default "qwen3:30b-a3b"
PLAUSIBILITY_BATCH_SIZE default 8
PLAUSIBILITY_TIMEOUT_S default 60.0
"""
from __future__ import annotations
import hashlib
import json
import logging
import os
import httpx
logger = logging.getLogger(__name__)
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
MODEL = os.getenv("PLAUSIBILITY_LLM_MODEL", "qwen3:30b-a3b")
BATCH_SIZE = int(os.getenv("PLAUSIBILITY_BATCH_SIZE", "8"))
TIMEOUT = float(os.getenv("PLAUSIBILITY_TIMEOUT_S", "60.0"))
# In-memory cache: (input_hash) -> result_dict. Survives one run.
_CACHE: dict[str, dict] = {}
def _checksum(check_id: str, label: str, hint: str,
doc_excerpt: str) -> str:
"""Stable hash of the LLM input — avoid re-asking on retries."""
h = hashlib.sha256()
h.update(check_id.encode())
h.update(b"\x00")
h.update(label.encode())
h.update(b"\x00")
h.update(hint.encode())
h.update(b"\x00")
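# Hash the same 4000-char window that _build_user_prompt sends, so two
# excerpts that differ only after char 2000 do not share a cache entry.
h.update(doc_excerpt[:4000].encode())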
return h.hexdigest()[:16]
_SYSTEM_PROMPT = (
"Du bist Compliance-Plausibilitäts-Auditor für deutsche "
"Datenschutz-Prüfberichte. Für jeden Finding-Eintrag bekommst du "
"die MC-Pflichtfrage, den LLM-Hinweis und einen Ausschnitt aus "
"dem geprüften Dokument.\n\n"
"REGELN — sehr wichtig:\n"
"1. Du gibst für JEDEN Finding-Eintrag im Input GENAU EINEN Output-"
"Eintrag zurück (keine ausgelassen, keine zusätzlichen).\n"
"2. Die ID muss BUCHSTABENGENAU vom Input übernommen werden — "
"nicht abgekürzt, nicht umformatiert (Beispiel: \"mc-DATA-3953-A04\" "
"bleibt \"mc-DATA-3953-A04\").\n"
"3. Reihenfolge der Output-Items entspricht der Input-Reihenfolge.\n\n"
"Pro Finding:\n"
"- title: TOPIC-STATEMENT (max 80 Zeichen, ohne Frageton, "
"nennt die Norm wenn sinnvoll). Beispiel: "
"Frage \"Dokumentiert die DSI Drittlandtransfers nach Art. 49?\" "
"→ title \"Drittlandtransfer-Doku Art. 49 DSGVO\".\n"
"- severity: HIGH (klar verletzt), MEDIUM (verletzt, weniger "
"kritisch), LOW (unsicher / manuelle Prüfung), DROP "
"(Auszug zeigt klar dass die Anforderung erfüllt ist).\n"
"- recommendation: KONKRETE Aktion (max 200 Zeichen), nennt "
"WAS und WO. Beispiel: \"Im Abschnitt 'Drittlandtransfer' "
"der DSE pro Empfänger einen Mechanismus nach Art. 49 ergänzen\".\n"
"- drop: true wenn severity=DROP, sonst false.\n\n"
"JSON-Schema (genauso antworten):\n"
"{\"findings\":["
"{\"id\":\"<exakte-id-vom-input>\",\"title\":\"...\","
"\"severity\":\"HIGH|MEDIUM|LOW|DROP\","
"\"recommendation\":\"...\",\"drop\":false}"
"]}\n\n"
"Beispiel-Antwort bei 2 Inputs mit IDs mc-A und mc-B:\n"
"{\"findings\":[{\"id\":\"mc-A\",\"title\":\"Norm X erfüllen\","
"\"severity\":\"MEDIUM\",\"recommendation\":\"In Abschnitt Y "
"ergänzen: Norm X erfüllt\",\"drop\":false},"
"{\"id\":\"mc-B\",\"title\":\"Norm Z geprüft\",\"severity\":\"DROP\","
"\"recommendation\":\"Bereits erfüllt — Hinweis im Doc Z3\","
"\"drop\":true}]}"
)
def _build_user_prompt(items: list[dict], doc_title: str,
doc_excerpt: str) -> str:
findings_block = "\n".join(
f'{i+1}. ID="{it["id"]}" | FRAGE: {it["label"]} | '
f'HINT: {it.get("hint", "")[:200]} | SEV_REGEX: {it.get("severity")}'
for i, it in enumerate(items)
)
return (
f"DOKUMENT: {doc_title}\n\n"
f"DOKUMENT-AUSZUG (max 4000 Zeichen):\n{doc_excerpt[:4000]}\n\n"
f"FINDINGS ZU BEWERTEN:\n{findings_block}"
)
async def _ask_llm_batch(items: list[dict], doc_title: str,
doc_excerpt: str) -> dict[str, dict]:
"""Send a batch of up to BATCH_SIZE findings to the LLM."""
body = {
"model": MODEL,
"messages": [
{"role": "system", "content": _SYSTEM_PROMPT},
{"role": "user", "content": _build_user_prompt(
items, doc_title, doc_excerpt,
)},
],
"format": "json",
"stream": False,
"options": {"temperature": 0.0, "seed": 42, "num_predict": 1500},
}
out: dict[str, dict] = {}
input_ids = [it["id"] for it in items]
try:
async with httpx.AsyncClient(timeout=TIMEOUT) as c:
r = await c.post(f"{OLLAMA_URL}/api/chat", json=body)
r.raise_for_status()
content = (r.json().get("message") or {}).get("content", "")
if not content:
logger.warning("plausibility LLM returned empty content")
return out
try:
data = json.loads(content)
except json.JSONDecodeError as je:
logger.warning(
"plausibility LLM JSON parse failed: %s; raw=%s",
je, content[:300],
)
return out
llm_findings = data.get("findings") or []
if not llm_findings:
logger.warning(
"plausibility LLM returned 0 findings for %d input "
"items; raw=%s", len(items), content[:300],
)
return out
# Phase 1: exact ID match
id_set = set(input_ids)
for entry in llm_findings:
fid = (entry.get("id") or "").strip()
if fid in id_set and fid not in out:
out[fid] = _entry_to_stamp(entry)
# Phase 2: position fallback — for any input item still
# unmapped, use the LLM finding at the same index if it's
# otherwise unclaimed.
if len(out) < len(input_ids):
claimed_indices: set[int] = set()
for idx, entry in enumerate(llm_findings):
fid = (entry.get("id") or "").strip()
if fid in out:
claimed_indices.add(idx)
for idx, input_id in enumerate(input_ids):
if input_id in out:
continue
if idx < len(llm_findings) and idx not in claimed_indices:
out[input_id] = _entry_to_stamp(llm_findings[idx])
claimed_indices.add(idx)
# Phase 3: fuzzy match by ID-tail. Only LLM entries that phase 1
# and phase 2 left unclaimed are eligible, so no entry is used twice.
if len(out) < len(input_ids):
unmapped_ids = [i for i in input_ids if i not in out]
for idx, entry in enumerate(llm_findings):
if idx in claimed_indices:
continue
fid = (entry.get("id") or "").strip().lower()
if not fid:
continue
for inp in unmapped_ids:
if inp in out:
continue
if inp[-8:].lower() in fid or fid in inp.lower():
out[inp] = _entry_to_stamp(entry)
claimed_indices.add(idx)
break
if not out:
logger.warning(
"plausibility could not map any of %d input IDs; "
"raw=%s", len(input_ids), content[:300],
)
else:
logger.info(
"plausibility mapped %d/%d findings", len(out),
len(input_ids),
)
except Exception as e:
logger.warning("plausibility batch failed: %s", e)
return out
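def _entry_to_stamp(entry: dict) -> dict:
"""Normalize one raw LLM finding into the llm_* stamp fields."""
severity = (entry.get("severity") or "").upper()
return {
"llm_title": (entry.get("title") or "")[:200],
"llm_severity": severity,
"llm_recommendation": (entry.get("recommendation") or "")[:400],
# Treat severity=DROP as a drop even if the model omitted the flag;
# a string such as "false" must not count as True.
"llm_drop": entry.get("drop") is True or severity == "DROP",
}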
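async def verify_plausibility(results, doc_texts: dict[str, str]) -> None:
"""Stamp llm_* fields onto every failed, non-skipped MC CheckItem in results.
Only checks whose ID starts with "mc-" are considered. CheckItems are
mutated in place; nothing is returned.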
Args:
results: list of DocCheckResult, each with .checks (list of CheckItem)
and .doc_type
doc_texts: doc_type -> source text excerpt for context
"""
if not results:
return
# Gather candidate fails per doc_type so the prompt can scope the
# excerpt correctly.
by_doc: dict[str, list] = {}
by_doc_meta: dict[str, str] = {}
for r in results:
dt = getattr(r, "doc_type", "")
label = getattr(r, "label", "") or dt
for c in getattr(r, "checks", []) or []:
if getattr(c, "passed", True) or getattr(c, "skipped", False):
continue
# MC checks only — skip the structural P-* placement findings
cid = (getattr(c, "id", "") or "").lower()
if not cid.startswith("mc-"):
continue
by_doc.setdefault(dt, []).append(c)
by_doc_meta[dt] = label
if not by_doc:
return
total = sum(len(v) for v in by_doc.values())
logger.info("plausibility-check: %d findings across %d docs",
total, len(by_doc))
for dt, checks in by_doc.items():
doc_title = by_doc_meta.get(dt) or dt
doc_text = doc_texts.get(dt) or ""
if not doc_text:
# Fall back to DSE excerpt when the doc has no own text
doc_text = doc_texts.get("dse") or ""
for i in range(0, len(checks), BATCH_SIZE):
batch = checks[i:i + BATCH_SIZE]
items = []
for c in batch:
items.append({
"id": getattr(c, "id", ""),
"label": getattr(c, "label", ""),
"hint": getattr(c, "hint", "") or "",
"severity": getattr(c, "severity", ""),
})
# Cache lookup per item — skip those already cached.
uncached_items: list[dict] = []
for it in items:
key = _checksum(it["id"], it["label"], it["hint"], doc_text)
if key in _CACHE:
continue
uncached_items.append(it)
if not uncached_items:
cache_results = {it["id"]: _CACHE[_checksum(
it["id"], it["label"], it["hint"], doc_text,
)] for it in items}
else:
cache_results = await _ask_llm_batch(
uncached_items, doc_title, doc_text,
)
for it in uncached_items:
rid = it["id"]
if rid in cache_results:
key = _checksum(
it["id"], it["label"], it["hint"], doc_text,
)
_CACHE[key] = cache_results[rid]
# add cached ones too
for it in items:
if it["id"] not in cache_results:
key = _checksum(
it["id"], it["label"], it["hint"], doc_text,
)
if key in _CACHE:
cache_results[it["id"]] = _CACHE[key]
# Stamp onto each CheckItem
stamped = 0
for c in batch:
cid = getattr(c, "id", "")
if cid in cache_results:
res = cache_results[cid]
try:
c.llm_title = res.get("llm_title", "") or ""
sev = res.get("llm_severity", "") or ""
c.llm_severity = sev if sev in (
"HIGH", "MEDIUM", "LOW", "DROP") else ""
c.llm_recommendation = res.get(
"llm_recommendation", "") or ""
c.llm_drop = bool(res.get("llm_drop", False))
stamped += 1
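except Exception as e:
# Stamping can fail if the CheckItem rejects the attribute;
# log it and continue with the next check.
logger.debug("plausibility stamp failed for %s: %s", cid, e)
logger.info("plausibility-check %s: batch of %d, %d stamped",
dt, len(batch), stamped)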
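Review note: the three-phase ID mapping inside `_ask_llm_batch` is easier to reason about (and to unit-test) when pulled out into a pure function. The following is a minimal sketch under that assumption, not part of the patch. The name `map_llm_findings` is hypothetical, and for brevity the sketch returns the raw LLM entries instead of calling `_entry_to_stamp`. It tracks claimed entries across all three phases so that no LLM entry is assigned to two inputs.

```python
from __future__ import annotations


def map_llm_findings(input_ids: list[str],
                     llm_findings: list[dict]) -> dict[str, dict]:
    """Map raw LLM findings back to input IDs in three passes."""
    out: dict[str, dict] = {}
    claimed: set[int] = set()  # indices of LLM entries already used

    # Phase 1: exact ID match.
    wanted = set(input_ids)
    for idx, entry in enumerate(llm_findings):
        fid = (entry.get("id") or "").strip()
        if fid in wanted and fid not in out:
            out[fid] = entry
            claimed.add(idx)

    # Phase 2: position fallback for inputs that are still unmapped.
    for idx, input_id in enumerate(input_ids):
        if input_id in out:
            continue
        if idx < len(llm_findings) and idx not in claimed:
            out[input_id] = llm_findings[idx]
            claimed.add(idx)

    # Phase 3: fuzzy match on the ID tail, unclaimed entries only.
    for idx, entry in enumerate(llm_findings):
        if idx in claimed:
            continue
        fid = (entry.get("id") or "").strip().lower()
        if not fid:
            continue
        for input_id in input_ids:
            if input_id in out:
                continue
            if input_id[-8:].lower() in fid or fid in input_id.lower():
                out[input_id] = entry
                claimed.add(idx)
                break
    return out
```

One property worth knowing: because the position fallback runs before the fuzzy pass, an entry with a slightly mangled ID can be assigned by position to a different input before phase 3 gets a chance to match it by tail. Whether that ordering is desirable depends on how often the model reorders its output.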
@@ -0,0 +1,16 @@
"""Mail Render V2 — unified, consistent layout for the audit mail.
The original Step-5 HTML composition grew across 27+ render functions,
each with its own inline styles. Result: inconsistent colors,
typography, and card widths. V2 fixes that with:
- `_style.py` ONE place for colors, fonts, spacing helpers
- `_cookie_inventory.py` SINGLE cookie list merged from DSE / table /
live browser, with per-cookie status
- `_blocks.py` Header / TOC / Critical / Per-Doc /
Per-Theme / Caveats / Footer renderers
- `_compose.py` compose_v2(state) → full_html
Activate via env var `MAIL_RENDER_V2=true`. Default is the legacy
renderer so we can A/B compare in Mailpit.
"""
@@ -0,0 +1,296 @@
"""Mail-V2 Action library — turn findings into 'what to do where'.
Each finding type maps to a concrete action recommendation. The
mapping is intentionally pattern-matched (not LLM-generated): the
audit is deterministic, so the corrective action must be too.
Patterns matched by:
- finding `id` prefix (mc-impressum-handelsregister → impressum/HR)
- severity_reason (factually_wrong / missing / misclassified)
- mismatch_type (dsi_under_actual / table_under_actual / ...)
- cookie field name (country / duration / processing_company)
Fallback: "Manuelle Prüfung beim DSB erforderlich" with finding hint.
Returns an Action (dataclass; `to_dict()` yields the dict form):
- title: short imperative ("Sitzland ergänzen")
- target: where to fix ("DSE / Vendor-Liste")
- detail: extended explanation
- aggregation_key: groupBy key for bulk recommendations
("missing_country" / "long_retention" / ...)
- effort: "low" | "med" | "hi"
"""
from __future__ import annotations
from dataclasses import asdict, dataclass
@dataclass
class Action:
title: str
target: str
detail: str
aggregation_key: str | None
effort: str # low | med | hi
def to_dict(self) -> dict:
return asdict(self)
# ── Field-level actions for cookie inventory ──────────────────────
def cookie_field_missing_action(field: str, cookie_name: str,
vendor: str) -> Action | None:
"""Return action when a cookie field is missing (or unknown)."""
if field == "country":
return Action(
title="Sitzland ergänzen",
target="DSE / Vendor-Tabelle",
detail=(f"Für Cookie '{cookie_name}' (Vendor {vendor or ''}) "
"ist kein Sitzland der verarbeitenden Stelle angegeben. "
"Art. 13 Abs. 1 lit. a DSGVO verlangt die Identität + "
"Anschrift des Verantwortlichen."),
aggregation_key="missing_country",
effort="low",
)
if field == "duration":
return Action(
title="Speicherdauer angeben",
target="DSE / Cookie-Tabelle",
detail=(f"Cookie '{cookie_name}' hat keine deklarierte "
"Speicherdauer. Art. 13 Abs. 2 lit. a DSGVO verlangt "
"die Dauer der Speicherung oder ein Kriterium dafür."),
aggregation_key="missing_duration",
effort="low",
)
if field == "retention_grounds":
return Action(
title="Löschfrist + Rechtsgrundlage angeben",
target="Löschkonzept + DSE",
detail=(f"Für Cookie '{cookie_name}' fehlt eine konkrete "
"Löschfrist. § 35 BDSG + DSK-Standard verlangen ein "
"dokumentiertes Löschkonzept pro Datenkategorie."),
aggregation_key="missing_retention",
effort="med",
)
if field == "processing_company":
return Action(
title="Verantwortliche Stelle nennen",
target="DSE",
detail=(f"Cookie '{cookie_name}' nennt keinen Verantwortlichen "
"(Firma + Adresse). Art. 13 Abs. 1 DSGVO Pflichtangabe."),
aggregation_key="missing_processing_company",
effort="low",
)
if field == "third_country":
return Action(
title="Drittlandtransfer absichern",
target="DSE + AVV-Anhang",
detail=(f"Cookie '{cookie_name}' (Vendor {vendor or ''}) "
"verarbeitet Daten außerhalb EU/EWR. Erforderlich: "
"Angemessenheitsbeschluss, Standardvertragsklauseln "
"oder ausdrückliche Einwilligung (Art. 44 ff. DSGVO)."),
aggregation_key="missing_third_country",
effort="med",
)
if field == "category":
return Action(
title="Kategorie zuordnen",
target="Cookie-Tabelle",
detail=(f"Cookie '{cookie_name}' hat keine Kategorie. EDPB "
"Cookie-Sweep verlangt: technisch notwendig / "
"Statistik / Marketing / Externe Medien."),
aggregation_key="missing_category",
effort="low",
)
return None
# ── Status-level actions (UNDOC / ORPH / MISMATCH) ───────────────
def cookie_status_action(status_code: str, cookie_name: str,
vendor: str) -> Action | None:
if status_code == "UNDOC":
return Action(
title="Cookie deklarieren oder entfernen",
target="CMP-Config + DSE",
detail=(f"Cookie '{cookie_name}' wird im Browser gesetzt, ist "
"aber nicht in DSE/Cookie-Tabelle deklariert. § 25 "
"TDDDG: entweder Deklaration nachholen oder Cookie "
"blockieren (CMP-Trigger prüfen)."),
aggregation_key="undoc_cookies",
effort="med",
)
if status_code == "ORPH":
return Action(
title="Veraltete Cookie-Angabe entfernen",
target="DSE / Cookie-Tabelle",
detail=(f"Cookie '{cookie_name}' ist in DSE deklariert, wird "
"aber im Live-Browser nicht gesetzt. Veraltete Angabe "
"entfernen, um Transparenz zu wahren."),
aggregation_key="orphan_cookies",
effort="low",
)
if status_code == "MISMATCH":
return Action(
title="Cookie-Werte korrigieren",
target="DSE / Cookie-Tabelle",
detail=(f"Cookie '{cookie_name}': deklarierte Werte weichen von "
"tatsächlich gesetzten ab. Tabelle anpassen oder "
"Cookie-Setup korrigieren."),
aggregation_key="mismatch_cookies",
effort="med",
)
return None
# ── Retention-comparison actions ─────────────────────────────────
def retention_action(retention_finding: dict) -> Action | None:
mt = retention_finding.get("mismatch_type")
cookie = retention_finding.get("cookie_name", "")
if mt == "dsi_under_actual":
return Action(
title="DSE-Speicherdauer korrigieren",
target="DSE",
detail=(f"DSE behauptet für '{cookie}' kürzere Speicherdauer als "
"real. Wert in DSE auf reale Dauer anpassen ODER Cookie-"
"Setup auf deklarierte Dauer reduzieren."),
aggregation_key="dsi_too_short",
effort="low",
)
if mt == "table_under_actual":
return Action(
title="Cookie-Tabelle korrigieren",
target="Cookie-Tabelle / CMP",
detail=(f"Cookie-Tabelle behauptet für '{cookie}' kürzere Dauer "
"als real. Wert anpassen oder Cookie-Lifetime reduzieren."),
aggregation_key="table_too_short",
effort="low",
)
if mt == "dsi_vs_table":
return Action(
title="DSE und Cookie-Tabelle synchronisieren",
target="DSE + Cookie-Tabelle",
detail=(f"DSE und Cookie-Tabelle geben unterschiedliche Werte "
f"für '{cookie}' an. Werte abgleichen."),
aggregation_key="dsi_table_mismatch",
effort="low",
)
if mt == "actual_under_table":
return Action(
title="Speicherdauer-Cap dokumentieren (Safari-ITP)",
target="DSE",
detail=(f"Cookie '{cookie}' lebt real kürzer als deklariert — "
"wahrscheinlich Safari ITP 7-Tage-Cap. In DSE ergänzen: "
"'Auf Safari-Geräten kann die Speicherdauer durch ITP "
"verkürzt werden.'"),
aggregation_key="safari_itp",
effort="low",
)
return None
# ── Reachability actions (B1) ────────────────────────────────────
def reachability_action(rb1: dict) -> Action | None:
if rb1.get("passed"):
return None
reason = rb1.get("severity_reason")
if reason == "missing":
return Action(
title="Cookie-Einstellungen-Link im Footer ergänzen",
target="Website-Footer (alle Seiten)",
detail=("Art. 7 Abs. 3 DSGVO: Widerruf muss so einfach wie "
"Erteilung sein. Footer-Link 'Cookie-Einstellungen' "
"ergänzen, der den CMP direkt öffnet (kein neuer Tab, "
"kein Zwischendokument)."),
aggregation_key="footer_reachability",
effort="low",
)
if reason == "misclassified":
return Action(
title="CMP direkt öffnen statt neuer Tab",
target="Footer-Link-Config",
detail=("Bestehender Footer-Link öffnet die CMP nicht direkt. "
"JavaScript-Trigger umstellen: kein target=_blank, "
"keine externe Policy-Seite — CMP-Layer direkt öffnen."),
aggregation_key="footer_reachability",
effort="low",
)
if reason == "factually_wrong":
return Action(
title="Eigenen CMP statt Browser-Verweis",
target="Footer + CMP",
detail=("Nutzer wird auf Browser-Einstellungen verwiesen — das "
"ist nach LfDI BW kein gleichwertiger Widerruf. Eigenen "
"CMP-Re-Open-Mechanismus implementieren."),
aggregation_key="footer_reachability",
effort="med",
)
return None
# ── Generic finding → action ────────────────────────────────────
# § 5 DDG replaced § 5 TMG on 14 May 2024; the numbering is unchanged.
_ID_PATTERNS = {
"handelsregister": ("HR-Eintrag im Impressum ergänzen",
"Impressum",
"§ 5 Abs. 1 Nr. 4 DDG: Registereintrag mit "
"Registergericht + HR-Nr."),
"ust-id": ("USt-IdNr. ergänzen",
"Impressum",
"§ 5 Abs. 1 Nr. 6 DDG: USt-IdNr. falls vorhanden."),
"vertretungsberechtig": ("Vertretungsberechtigte Person nennen",
"Impressum",
"§ 5 Abs. 1 Nr. 1 DDG"),
"aufsichtsbehoerde": ("Aufsichtsbehörde nennen",
"Impressum",
"§ 5 Abs. 1 Nr. 3 DDG (regulierte Branchen)"),
"berufsordnung": ("Berufsrechtliche Angaben ergänzen",
"Impressum",
"§ 5 Abs. 1 Nr. 5 DDG"),
"dsb": ("DSB benennen",
"DSE",
"Art. 37 ff. DSGVO: Datenschutzbeauftragten benennen + DSE "
"ergänzen."),
"odr": ("OS-Link auf EU-Plattform ergänzen",
"Impressum / AGB",
"Art. 14 EU-VO 524/2013 (B2C-Onlineshop)"),
"widerrufsbelehrung": ("Widerrufsbelehrung anpassen",
"Widerruf-Dokument",
"§ 312g BGB + Art. 246a EGBGB Muster-Widerrufs-"
"belehrung."),
}
def derive_generic_action(finding_id: str, label: str,
hint: str) -> Action | None:
"""Pattern-match a generic MC finding ID to an action template."""
fid = (finding_id or "").lower()
# label may be None when the CheckItem carries no label.
haystack = f"{fid} {(label or '').lower()}"
for kw, (title, target, detail) in _ID_PATTERNS.items():
if kw in haystack:
return Action(
title=title,
target=target,
detail=detail + (f" Hinweis: {hint[:200]}" if hint else ""),
aggregation_key=f"mc_{kw}",
effort="low",
)
if hint:
return Action(
title="Manuelle Prüfung beim DSB",
target=label or "Doc",
detail=hint[:400],
aggregation_key=None,
effort="med",
)
return None
def action_for_finding(finding_id: str, severity: str, label: str,
hint: str) -> Action | None:
"""Top-level entry point for MC findings.
`severity` is accepted for interface stability but currently unused.
"""
return derive_generic_action(finding_id, label, hint)
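Review note: the keyword lookup in `derive_generic_action` is a plain substring match over the lowercased finding ID plus label, and the first matching keyword in dict order wins. The sketch below reproduces that behaviour in isolation. It is an illustration with a reduced two-entry table and a hypothetical function name `derive`; the `Action` fields mirror the dataclass in this file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Action:
    title: str
    target: str
    detail: str
    aggregation_key: str | None
    effort: str  # low | med | hi


# Reduced pattern table for illustration: keyword -> (title, target).
PATTERNS = {
    "handelsregister": ("HR-Eintrag im Impressum ergänzen", "Impressum"),
    "dsb": ("DSB benennen", "DSE"),
}


def derive(finding_id: str, label: str, hint: str) -> Action | None:
    """Return a templated action, a manual-review fallback, or None."""
    haystack = f"{(finding_id or '').lower()} {(label or '').lower()}"
    for kw, (title, target) in PATTERNS.items():
        if kw in haystack:  # substring match, first hit wins
            return Action(title, target, hint[:200], f"mc_{kw}", "low")
    if hint:
        # No template matched: fall back to manual review with the hint.
        return Action("Manuelle Prüfung beim DSB", label or "Doc",
                      hint[:400], None, "med")
    return None
```

Because the match is a substring test, short keywords such as "dsb" or "odr" can hit unrelated IDs or labels that happen to contain those letters. Anchoring on ID segments would be stricter, at the cost of missing label-only matches.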
@@ -0,0 +1,248 @@
"""Mail-V2 Bulk-Recommendation Aggregator.
Collects per-item actions (cookie-level, MC-level, retention, B1)
and groups them by `aggregation_key` so the mail can show:
🛠 Sofortmaßnahmen
• Sitzland ergänzen für 12 Cookies: _ga, _gid, _fbp, …
• Drittlandtransfer absichern für 5 US-Vendors: Google, Meta, …
• Speicherdauer > 13mo bei 3 Cookies (CNIL-Cap): IDE, _gcl_au, …
This converts individual fix-recommendations into actionable
"do-this-one-thing-fixes-multiple-cookies" bullets that scale.
"""
from __future__ import annotations
from ._actions import (
Action,
cookie_field_missing_action,
cookie_status_action,
reachability_action,
retention_action,
action_for_finding,
)
# ── Group-label registry ─────────────────────────────────────────
GROUP_LABELS: dict[str, dict] = {
"missing_country": {
"label": "Sitzland ergänzen",
"icon": "🌍",
"norm": "Art. 13 Abs. 1 lit. a DSGVO",
},
"missing_duration": {
"label": "Speicherdauer ergänzen",
"icon": "",
"norm": "Art. 13 Abs. 2 lit. a DSGVO",
},
"missing_retention": {
"label": "Löschfrist + Rechtsgrundlage angeben",
"icon": "🗑",
"norm": "§ 35 BDSG",
},
"missing_processing_company": {
"label": "Verantwortliche Stelle nennen",
"icon": "🏢",
"norm": "Art. 13 Abs. 1 DSGVO",
},
"missing_third_country": {
"label": "Drittlandtransfer absichern",
"icon": "🌐",
"norm": "Art. 44 ff. DSGVO",
},
"missing_category": {
"label": "Cookie-Kategorie zuordnen",
"icon": "🏷",
"norm": "EDPB Cookie-Sweep",
},
"undoc_cookies": {
"label": "Undeklarierte Cookies adressieren",
"icon": "",
"norm": "§ 25 Abs. 1 TDDDG",
},
"orphan_cookies": {
"label": "Veraltete Cookie-Angaben entfernen",
"icon": "👻",
"norm": "Art. 5 Abs. 1 lit. a DSGVO (Transparenz)",
},
"mismatch_cookies": {
"label": "Cookie-Werte mit Realität abgleichen",
"icon": "🔀",
"norm": "Art. 5 Abs. 1 lit. d DSGVO",
},
"dsi_too_short": {
"label": "DSE-Speicherdauer korrigieren (zu kurz angegeben)",
"icon": "📏",
"norm": "Art. 13 Abs. 2 DSGVO",
},
"table_too_short": {
"label": "Cookie-Tabelle-Speicherdauer korrigieren",
"icon": "📏",
"norm": "Art. 13 Abs. 2 DSGVO",
},
"dsi_table_mismatch": {
"label": "DSE ↔ Cookie-Tabelle synchronisieren",
"icon": "🔁",
"norm": "Art. 5 Abs. 2 DSGVO Rechenschaftspflicht",
},
"safari_itp": {
"label": "Safari-ITP-Cap in DSE dokumentieren",
"icon": "🍎",
"norm": "DSGVO Transparenzgebot",
},
"footer_reachability": {
"label": "Footer-Reachability für Widerruf herstellen",
"icon": "🔗",
"norm": "Art. 7 Abs. 3 DSGVO",
},
}
def _generic_group(key: str | None) -> dict:
if not key:
return {"label": "Manuelle Prüfung", "icon": "🔍", "norm": ""}
if key.startswith("mc_"):
kw = key[3:].replace("_", " ").title()
return {"label": f"{kw} ergänzen", "icon": "📝",
"norm": "MC-Prüfung"}
return {"label": key.replace("_", " ").title(), "icon": "", "norm": ""}
# ── Item types collected ────────────────────────────────────────
def _cookie_items(state: dict) -> list[tuple[Action, str]]:
"""Return (action, item_label) pairs for every cookie-level concern.
item_label is what gets aggregated into the bullet list of names.
"""
from ._cookie_inventory import build_cookie_inventory
rows, _ = build_cookie_inventory(state)
items: list[tuple[Action, str]] = []
for r in rows:
name = r.get("name") or ""
vendor = r.get("vendor") or ""
label = f"{name}" + (f" ({vendor})" if vendor and vendor != "" else "")
# Status-level
st_action = cookie_status_action(r["status_code"], name, vendor)
if st_action:
items.append((st_action, label))
# Field-level
for field, value in (
("country", r.get("country")),
("duration", r.get("duration")),
("retention_grounds", r.get("retention_grounds")),
("processing_company", r.get("processing_company")),
("category", r.get("category")),
):
if not value or value in ("", "", ""):
fa = cookie_field_missing_action(field, name, vendor)
if fa:
items.append((fa, label))
if r.get("third_country"):
ta = cookie_field_missing_action("third_country", name, vendor)
if ta:
items.append((ta, label))
return items
def _retention_items(state: dict) -> list[tuple[Action, str]]:
items: list[tuple[Action, str]] = []
for f in (state.get("retention_findings") or []):
if f.get("matches"):
continue
a = retention_action(f)
if a:
label = (f.get("cookie_name") or "")
vendor = f.get("vendor_name") or ""
if vendor:
label += f" ({vendor})"
items.append((a, label))
return items
def _reachability_items(state: dict) -> list[tuple[Action, str]]:
a = reachability_action(state.get("reachability_finding") or {})
if not a:
return []
return [(a, "Footer")]
def _mc_items(state: dict) -> list[tuple[Action, str]]:
items: list[tuple[Action, str]] = []
for r in (state.get("results") or []):
doc = getattr(r, "label", "") or ""
for c in getattr(r, "checks", []) or []:
if getattr(c, "passed", True) or getattr(c, "skipped", False):
continue
sev = (getattr(c, "severity", "") or "").upper()
if sev not in ("CRITICAL", "HIGH", "MEDIUM"):
continue
a = action_for_finding(
getattr(c, "id", ""),
sev,
getattr(c, "label", ""),
getattr(c, "hint", "") or "",
)
if a:
items.append((a, doc))
return items
def collect_actions(state: dict) -> list[dict]:
"""Top-level: collect every item-action across cookie/retention/B1/MC."""
raw = (
_cookie_items(state)
+ _retention_items(state)
+ _reachability_items(state)
+ _mc_items(state)
)
out: list[dict] = []
for action, label in raw:
out.append({**action.to_dict(), "item": label})
return out
def group_by_action(state: dict) -> list[dict]:
"""Aggregate item-actions by aggregation_key.
Returns a list of groups:
{
"key": "missing_country",
"label": "Sitzland ergänzen",
"icon": "🌍",
"norm": "Art. 13 Abs. 1 lit. a DSGVO",
"effort": "low",
"count": 12,
"items": ["_ga (Google)", "_gid (Google)", ...],
"first_detail": "..." (first action.detail in the group),
}
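sorted by effort (low first), then by count desc, then by group label.
The group's effort and first_detail come from the first action seen
for that key.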
"""
actions = collect_actions(state)
buckets: dict[str | None, dict] = {}
for a in actions:
key = a.get("aggregation_key")
bucket = buckets.setdefault(key, {
"key": key,
"label": None, "icon": None, "norm": None,
"effort": a.get("effort", "med"),
"items": [], "count": 0,
"first_detail": a.get("detail", ""),
})
if not bucket["label"]:
meta = GROUP_LABELS.get(key or "") or _generic_group(key)
bucket["label"] = meta["label"]
bucket["icon"] = meta["icon"]
bucket["norm"] = meta["norm"]
item = a.get("item") or ""
if item not in bucket["items"]:
bucket["items"].append(item)
bucket["count"] = len(bucket["items"])
groups = list(buckets.values())
# sort: high-impact (effort=low + many items) first
eff_rank = {"low": 0, "med": 1, "hi": 2}
groups.sort(key=lambda g: (eff_rank.get(g["effort"], 9),
-g["count"], g["label"] or ""))
return groups
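Review note: the grouping and ordering in `group_by_action` can be checked without the cookie inventory or the label registry. The sketch below is a stripped-down illustration that takes already-collected action dicts as input. The name `group_actions` is hypothetical, and labels, icons and norms are left out, so ties are broken by key instead of label.

```python
from __future__ import annotations

EFFORT_RANK = {"low": 0, "med": 1, "hi": 2}


def group_actions(actions: list[dict]) -> list[dict]:
    """Bucket action dicts by aggregation_key and sort the buckets."""
    buckets: dict[str | None, dict] = {}
    for a in actions:
        key = a.get("aggregation_key")
        bucket = buckets.setdefault(key, {
            "key": key,
            "effort": a.get("effort", "med"),  # first action decides
            "items": [],
            "count": 0,
        })
        item = a.get("item") or ""
        if item not in bucket["items"]:  # de-duplicate item labels
            bucket["items"].append(item)
        bucket["count"] = len(bucket["items"])
    groups = list(buckets.values())
    # Low effort first, then larger groups, then key for a stable order.
    groups.sort(key=lambda g: (EFFORT_RANK.get(g["effort"], 9),
                               -g["count"], str(g["key"])))
    return groups
```

Sorting by effort before count means a low-effort group with one item outranks a medium-effort group with twenty. If the mail should lead with the widest-reaching fix, the first two sort keys would need to be swapped.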
@@ -0,0 +1,367 @@
"""Mail-V2 section renderers — one function per top-level block.
Each renderer takes a slice of `state` and returns ready-to-concatenate
HTML using the helpers from `_style`. Every block is full-width, has
the same card shell, and uses the same color palette.
Finding-bucket renderers (critical / manual / internal) live in
`_blocks_findings.py` to keep this file under the LOC cap.
"""
from __future__ import annotations
from html import escape as h
from ._aggregator import group_by_action
from ._blocks_findings import count_critical, count_internal, count_manual
from ._cookie_inventory import (
build_cookie_inventory,
inventory_headers,
render_inventory_rows,
)
from ._style import (
SZ_H3,
SZ_SMALL,
TEXT,
TEXT_MUTED,
card,
chip,
kpi_row,
section,
table,
)
# ── Helpers ──────────────────────────────────────────────────────
def _score_sev(pct: int | None) -> str:
if pct is None:
return "info"
if pct >= 90:
return "pass"
if pct >= 70:
return "info"
if pct >= 40:
return "warn"
return "fail"
# ── 1. Header + KPI row ──────────────────────────────────────────
def render_header(state: dict) -> str:
site = h(state.get("site_name") or "")
dom = h(state.get("domain") or "")
scorecard = state.get("scorecard") or {}
score_pct = (scorecard.get("totals") or {}).get("pct")
doc_count = state.get("doc_count") or 0
docs_total = len(state.get("results") or [])
findings = state.get("total_findings") or 0
vendors = len(state.get("cmp_vendors") or [])
title_html = (
f'<h1 style="font-size:24px;margin:0 0 4px;color:{TEXT};'
f'font-weight:700;">{site}</h1>'
f'<div style="font-size:13px;color:{TEXT_MUTED};margin-bottom:8px;">'
f'{dom} · Compliance-Audit</div>'
)
kpis = [
{"label": "Compliance-Score",
"value": f"{score_pct}%" if score_pct is not None else "",
"sev": _score_sev(score_pct)},
{"label": "Findings", "value": str(findings),
"sev": "fail" if findings > 5 else "warn" if findings > 0 else "pass"},
{"label": "Dokumente",
"value": f"{doc_count}/{docs_total}", "sev": "info"},
{"label": "Vendors", "value": str(vendors),
"sev": "warn" if vendors > 20 else "info"},
]
return title_html + kpi_row(kpis)
# ── 2. Table of contents ────────────────────────────────────────
def render_toc(state: dict) -> str:
rows = [
("#critical", f"Kritische Befunde ({count_critical(state)})"),
("#manual", f"Manuelle Prüfung ({count_manual(state)})"),
("#internal", f"Interne Reminder ({count_internal(state)})"),
("#sofortmassnahmen", "Sofortmaßnahmen"),
("#per-doc",
f"Pro Dokument ({len(state.get('results') or [])})"),
("#per-theme", "Pro Thema"),
("#caveats",
f"Audit-Vorbehalte ({len(state.get('audit_quality_findings') or [])})"),
("#attach",
f"Anhänge ({1 if state.get('cookie_evidence_slices') else 0})"),
]
items = "".join(
f'<li style="margin:6px 0;"><a href="{href}" style="color:#1e40af;'
f'text-decoration:none;">{h(label)}</a></li>'
for href, label in rows
)
return section(
"📋 Inhalt",
f'<ol style="margin:0;padding-left:18px;font-size:14px;">{items}</ol>',
)
# ── 4. Per-document blocks ──────────────────────────────────────
def render_per_doc(state: dict) -> str:
results = state.get("results") or []
if not results:
return ""
cards = []
for r in results:
label = h(getattr(r, "label", "") or "")
url = getattr(r, "url", "") or ""
url_html = (f'<a href="{h(url)}" style="color:#1e40af;font-size:'
f'{SZ_SMALL};">{h(url)}</a>') if url else ""
corr = getattr(r, "correctness_pct", 0) or 0
err = getattr(r, "error", "") or ""
checks = getattr(r, "checks", []) or []
n_total = len(checks)
n_pass = sum(1 for c in checks if c.passed and not c.skipped)
n_fail = sum(1 for c in checks if not c.passed and not c.skipped)
n_skip = sum(1 for c in checks if c.skipped)
score_sev = _score_sev(corr)
head = (
f'<div style="display:flex;justify-content:space-between;'
f'align-items:flex-start;">'
f'<div><span style="font-size:{SZ_H3};font-weight:600;">{label}</span>'
f'<div>{url_html}</div></div>'
f'<div style="text-align:right;">'
f'{chip(f"{corr}%", score_sev)}</div></div>'
)
if err:
body = (f'<p style="margin:8px 0 0;color:{TEXT_MUTED};">'
f'{h(err)}</p>')
else:
counts = (
f'<div style="margin:8px 0;font-size:{SZ_SMALL};'
f'color:{TEXT_MUTED};">'
f'{n_total} MCs · {n_pass} ✓ · {n_fail} ✗ · {n_skip} ?</div>'
)
top = [c for c in checks
if not c.passed and not c.skipped][:3]
top_list = ""
if top:
lis = "".join(
f'<li style="margin:4px 0;">'
f'{h(getattr(c, "label", "")[:120])}</li>'
for c in top
)
top_list = (
f'<ul style="margin:6px 0 0 16px;padding:0;'
f'font-size:13px;color:{TEXT};">{lis}</ul>'
)
body = counts + top_list
cards.append(card(head + body,
sev=score_sev if not err else "info"))
return section(f"📄 4. Pro Dokument ({len(results)})",
"".join(cards), anchor="per-doc")
# ── 5. Per-theme blocks ─────────────────────────────────────────
def render_theme_cookie_banner(state: dict) -> str:
br = state.get("banner_result") or {}
if not br:
return ""
detected = br.get("detected") or br.get("banner_detected")
provider = br.get("provider") or br.get("banner_provider") or ""
violations = br.get("violations") or len(
(br.get("banner_checks") or {}).get("violations") or [])
body = (
f'<div><strong>Provider:</strong> {h(str(provider))} · '
f'<strong>Detected:</strong> '
f'{chip("Ja" if detected else "Nein", "pass" if detected else "fail")} · '
f'<strong>Violations:</strong> {violations}</div>'
)
return card(
f'<h3 style="margin:0 0 6px;font-size:{SZ_H3};">▶ Cookie-Banner</h3>'
+ body,
sev="warn" if violations else "pass",
)
def render_theme_cookie_inventory(state: dict) -> str:
rows, summary = build_cookie_inventory(state)
if summary["total"] == 0:
return ""
head = (
f'<h3 style="margin:0 0 6px;font-size:{SZ_H3};">'
f'▶ Cookie-Inventar ({summary["total"]})</h3>'
f'<div style="font-size:{SZ_SMALL};color:{TEXT_MUTED};'
f'margin-bottom:6px;">'
f'{summary["declared"]} deklariert · '
f'{summary["in_browser"]} im Browser · '
f'<span style="color:#dc2626;">{summary["undoc"]} UNDOC</span> · '
f'<span style="color:#92400e;">{summary["orph"]} ORPH</span> · '
f'<span style="color:#15803d;">{summary["ok"]} OK</span>'
f' · {summary["third_country"]} Drittland'
f'</div>'
f'<div style="font-size:{SZ_SMALL};color:{TEXT_MUTED};'
f'margin-bottom:6px;">'
f'Fehlende Pflichtangaben — Sitzland: {summary["missing_country"]}'
f' · Speicherdauer: {summary["missing_duration"]}'
f'</div>'
)
show_rows = render_inventory_rows(rows[:50])
body = table(inventory_headers(), show_rows)
if len(rows) > 50:
body += (
f'<p style="margin:6px 0 0;font-size:{SZ_SMALL};'
f'color:{TEXT_MUTED};">'
f'… und {len(rows) - 50} weitere</p>'
)
sev = "fail" if summary["undoc"] else "warn" if summary["orph"] else "pass"
return card(head + body, sev=sev)
def render_sofortmassnahmen(state: dict) -> str:
"""Aggregated bulk-recommendations: '1 Aktion fixt N Items'."""
groups = group_by_action(state)
if not groups:
return ""
rows = []
for g in groups:
items = g["items"]
sample = ", ".join(items[:5])
more = f" + {len(items) - 5} weitere" if len(items) > 5 else ""
eff_sev = ("pass" if g["effort"] == "low"
else "warn" if g["effort"] == "med" else "fail")
rows.append([
f'{g.get("icon") or ""} <strong>{h(g["label"])}</strong>'
f'<div style="font-size:11px;color:{TEXT_MUTED};margin-top:2px;">'
f'{h(g.get("norm") or "")}</div>',
f'<strong>{g["count"]}</strong>',
f'<div style="font-size:12px;color:{TEXT};">'
f'{h(sample)}{h(more)}</div>',
chip(g["effort"].upper(), eff_sev),
])
body = table(["Maßnahme", "Anz.", "Betrifft", "Aufwand"], rows)
return section(
f"🛠 Sofortmaßnahmen ({len(groups)})",
'<p style="margin:0 0 8px;color:' + TEXT_MUTED + ';font-size:13px;">'
'Eine Aktion behebt mehrere Findings auf einmal — nach Aufwand sortiert.'
'</p>' + body,
sev="warn",
anchor="sofortmassnahmen",
)
def render_theme_retention(state: dict) -> str:
s = state.get("retention_theme_summary") or {}
findings = state.get("retention_findings") or []
if not s.get("total"):
return ""
head = (
f'<h3 style="margin:0 0 6px;font-size:{SZ_H3};">'
f'▶ Speicherdauer-Konsistenz (TH-RETENTION)</h3>'
f'<div style="font-size:{SZ_SMALL};color:{TEXT_MUTED};'
f'margin-bottom:6px;">'
f'{s["total"]} Cookies · '
f'<span style="color:#15803d;">{s["passed"]} ✓</span> · '
f'<span style="color:#dc2626;">{s["failed"]} ✗</span> · '
f'<span style="color:#64748b;">{s["incomplete"]} ?</span>'
f'</div>'
)
fails = [f for f in findings
if not f.get("matches")
and f.get("severity_reason") != "incomplete"][:5]
if not fails:
return card(head, sev="pass")
rows = []
for f in fails:
sev = (f.get("severity") or "").upper()
sev_key = "fail" if sev == "HIGH" else "warn"
rows.append([
f'<code>{h(f.get("cookie_name") or "")}</code>',
h(f.get("vendor_name") or ""),
h(f.get("mismatch_type") or ""),
chip(sev, sev_key),
])
body = table(["Cookie", "Vendor", "Mismatch", "Sev"], rows)
sev = "fail" if s.get("failed", 0) else "warn"
return card(head + body, sev=sev)
def render_theme_reachability(state: dict) -> str:
f = state.get("reachability_finding") or {}
if not f:
return ""
passed = f.get("passed")
sev_key = "pass" if passed else (
"fail" if (f.get("severity") or "").upper() == "HIGH" else "warn")
notes_html = "".join(
f'<li style="margin:3px 0;">{h(n)}</li>'
for n in (f.get("notes") or [])
)
sub = (
f'<ul style="margin:6px 0 0 16px;font-size:13px;color:{TEXT};">'
f'{notes_html}</ul>' if notes_html else ""
)
head = (
f'<h3 style="margin:0 0 6px;font-size:{SZ_H3};">'
f'▶ Mobile Reachability (COOKIE-CONSENT-UX-001)</h3>'
f'<div>{chip((f.get("severity") or "PASS").upper(), sev_key)} '
f'<span style="margin-left:6px;font-size:{SZ_SMALL};'
f'color:{TEXT_MUTED};">{h(f.get("severity_reason") or "ok")}</span>'
f'</div>'
)
return card(head + sub, sev=sev_key)
def render_per_theme(state: dict) -> str:
parts = [
render_theme_cookie_banner(state),
render_theme_cookie_inventory(state),
render_theme_retention(state),
render_theme_reachability(state),
]
parts = [p for p in parts if p]
if not parts:
return ""
return section("🎯 5. Pro Thema", "".join(parts), anchor="per-theme")
# ── 6. Audit caveats ────────────────────────────────────────────
def render_caveats(state: dict) -> str:
fs = state.get("audit_quality_findings") or []
if not fs:
return ""
items = []
for f in fs:
sev = (f.get("severity") or "INFO").upper()
sev_key = ("fail" if sev == "HIGH"
else "warn" if sev == "MEDIUM" else "info")
title = h(f.get("title") or f.get("label") or "Vorbehalt")
msg = h(f.get("message") or f.get("hint") or "")
items.append(card(
f'<strong>{chip(sev, sev_key)} {title}</strong>'
f'<div style="margin-top:6px;">{msg}</div>',
sev=sev_key,
))
return section(f"⚠️ 6. Audit-Vorbehalte ({len(fs)})",
"".join(items), sev="warn", anchor="caveats")
# ── 7. Attachments ──────────────────────────────────────────────
def render_attachments(state: dict) -> str:
slices = state.get("cookie_evidence_slices") or []
if not slices:
return ""
meta = state.get("cookie_evidence_meta") or {}
n = len(slices)
body = (
f'<p style="margin:0;">'
f'Beweis-ZIP <code>evidence-{h((state.get("check_id") or "")[:8])}.zip</code> '
f'mit <strong>{n}</strong> Slice(s), '
f'manifest.json + audit_metadata.json (SHA256 pro Slice).</p>'
f'<p style="margin:6px 0 0;font-size:{SZ_SMALL};color:{TEXT_MUTED};">'
f'Quelle: {h(meta.get("url") or "")}'
f'</p>'
)
return section("📎 7. Anhänge", body, sev="info", anchor="attach")
@@ -0,0 +1,290 @@
"""Mail-V2 finding-bucket renderers.
Separates FAIL items into three buckets — the user's design constraint:
hard_fail public + evidence → 🔴 Kritische Befunde
manual_review public, no evidence → 🔍 Manuelle Prüfung
internal_reminder internal process → 💼 Reminder (NEVER a fail)
The MC-DB stays as-is. If the LLM-Plausibility phase has already run
it stamps `c.llm_title` / `c.llm_recommendation` / `c.llm_severity`
onto each check; the renderer picks those up when present, otherwise
falls back to the original MC label verbatim. No question-form
rewriting here — that's the LLM-phase's job.
"""
from __future__ import annotations
from html import escape as h
from ._actions import action_for_finding
from ._label_norm import classify_check
from ._scope_filter import (
filter_out_of_scope,
get_last_drop_stats,
)
from ._style import (
SZ_SMALL,
TEXT,
TEXT_MUTED,
card,
chip,
section,
)
def _strip_qmark(s: str) -> str:
"""Normalise a string for dedup comparison."""
return (s or "").strip().rstrip("?").strip().lower()
def _is_dup(a: str, b: str) -> bool:
"""True when a and b carry essentially the same content."""
aa = _strip_qmark(a)
bb = _strip_qmark(b)
if not aa or not bb:
return False
if aa == bb:
return True
short, long = sorted((aa, bb), key=len)
return len(short) > 30 and short in long
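The dedup heuristic above can be exercised in isolation. The following is a minimal standalone sketch, assuming the same normalisation and the same 30-character containment threshold; the names `strip_qmark` and `is_dup` and the sample strings are illustrative and not part of the package.

```python
def strip_qmark(s: str) -> str:
    """Trim, drop trailing question marks, lower-case."""
    return (s or "").strip().rstrip("?").strip().lower()


def is_dup(a: str, b: str) -> bool:
    """True when both normalise to the same text, or when the shorter one
    (more than 30 characters) is contained in the longer one."""
    aa, bb = strip_qmark(a), strip_qmark(b)
    if not aa or not bb:
        return False
    if aa == bb:
        return True
    short, long_ = sorted((aa, bb), key=len)
    return len(short) > 30 and short in long_


# Illustrative strings only.
title = "Cookie-Einstellungen-Link im Footer ergänzen?"
same = "cookie-einstellungen-link im footer ergänzen"
longer = "Bitte umsetzen: " + same + " (alle Seiten)"
```

Short fragments are deliberately not treated as duplicates: `is_dup("Footer", "Footer-Link fehlt")` is false because the shared text is below the length threshold, so a brief hint still renders next to a longer title.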
def _collect_three_buckets(state: dict) -> tuple[list[dict], list[dict],
list[dict]]:
"""Split all FAIL items into the three buckets."""
hard: list[dict] = []
manual: list[dict] = []
internal: list[dict] = []
business_scope = state.get("business_scope") or set()
for r in state.get("results") or []:
# Drop sector-specific MCs that don't apply to this business
scoped = filter_out_of_scope(
getattr(r, "checks", []) or [], business_scope,
)
for c in scoped:
sev = (getattr(c, "severity", "") or "").upper()
# LLM-plausibility may downgrade — read llm_severity if set
llm_sev = (getattr(c, "llm_severity", "") or "").upper()
effective_sev = llm_sev or sev
if effective_sev not in ("CRITICAL", "HIGH", "MEDIUM"):
continue
if getattr(c, "passed", True) or getattr(c, "skipped", False):
continue
# LLM may flag a finding as not plausible → drop
if getattr(c, "llm_drop", False):
continue
bucket = classify_check(c)
raw_label = getattr(c, "label", "") or ""
llm_title = getattr(c, "llm_title", "") or ""
llm_recommendation = getattr(c, "llm_recommendation", "") or ""
title = (llm_title or raw_label)[:200]
hint = (getattr(c, "hint", "") or "")[:500]
matched = (getattr(c, "matched_text", "") or "")[:400]
action = action_for_finding(
getattr(c, "id", ""), effective_sev, raw_label, hint,
)
entry = {
"sev": effective_sev,
"id": getattr(c, "id", ""),
"title": title,
"raw_label": raw_label,
"hint": hint,
"matched": matched,
"llm_recommendation": llm_recommendation,
"doc": getattr(r, "label", ""),
"reg": getattr(c, "regulation", "") or "",
"action": action.to_dict() if action else None,
}
if bucket == "hard_fail" and effective_sev in ("CRITICAL", "HIGH"):
hard.append(entry)
elif bucket == "internal_reminder":
internal.append(entry)
else:
manual.append(entry)
# B1 reachability (always hard if HIGH — directly observed)
rb1 = state.get("reachability_finding") or {}
if (rb1.get("severity") or "").upper() == "HIGH" and not rb1.get("passed"):
notes = " · ".join(rb1.get("notes") or [])
hard.append({
"sev": "HIGH",
"id": rb1.get("check_id", "COOKIE-CONSENT-UX-001"),
"title": "Mobile Consent-Reachability — kein Reopen-Link im Footer",
"raw_label": "Mobile Consent-Reachability",
"hint": notes,
"matched": "Footer-Scan: 0 Reopen-Anchor",
"llm_recommendation": "",
"doc": "Website-Footer",
"reg": "DSGVO Art. 7 Abs. 3",
"action": {"title": "Cookie-Einstellungen-Link im Footer ergänzen",
"target": "Website-Footer (alle Seiten)",
"detail": ("Footer-Link 'Cookie-Einstellungen' "
"ergänzen, der den CMP direkt öffnet."),
"effort": "low"},
})
# B3 retention fails (3-source evidence). Both HIGH and MEDIUM go to the
# hard bucket here, whereas MC checks above need CRITICAL/HIGH for that.
for f in (state.get("retention_findings") or []):
sev = (f.get("severity") or "").upper()
if sev not in ("HIGH", "MEDIUM") or f.get("matches"):
continue
cookie = f.get("cookie_name") or ""
hard.append({
"sev": sev,
"id": "TH-RETENTION",
"title": f"Speicherdauer-Konflikt für {cookie}",
"raw_label": "Cookie-Speicherdauer-Konsistenz",
"hint": (f"DSI {f.get('dsi_days')}d · Tabelle "
f"{f.get('table_days')}d · "
f"Realität {f.get('actual_days')}d"),
"matched": (f.get("dsi_sentence") or "")[:200],
"llm_recommendation": "",
"doc": "Cookie-Richtlinie",
"reg": "DSGVO Art. 13 Abs. 2 lit.a",
"action": {"title": ("DSE / Cookie-Tabelle korrigieren "
if "dsi" in (f.get("mismatch_type") or "")
else "Cookie-Lifetime reduzieren"),
"target": "DSE + Cookie-Tabelle",
"detail": f"Mismatch-Typ: {f.get('mismatch_type')}",
"effort": "low"},
})
sev_rank = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}
hard.sort(key=lambda x: (sev_rank.get(x["sev"], 9), x["title"]))
return hard, manual, internal
def count_critical(state: dict) -> int:
hard, _, _ = _collect_three_buckets(state)
return len(hard)
def count_manual(state: dict) -> int:
_, manual, _ = _collect_three_buckets(state)
return len(manual)
def count_internal(state: dict) -> int:
_, _, internal = _collect_three_buckets(state)
return len(internal)
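The routing rule for MC checks in `_collect_three_buckets` combines the bucket from `classify_check` with the effective severity. A minimal sketch of that decision, assuming checks are plain attribute objects; `route` and `REPORTABLE` are hypothetical names used only for this illustration.

```python
from __future__ import annotations

from types import SimpleNamespace

REPORTABLE = ("CRITICAL", "HIGH", "MEDIUM")


def route(check, bucket: str) -> str | None:
    """Return 'hard', 'manual', 'internal', or None when the check is not shown."""
    sev = (getattr(check, "severity", "") or "").upper()
    llm_sev = (getattr(check, "llm_severity", "") or "").upper()
    effective = llm_sev or sev  # an LLM verdict, when present, wins
    if effective not in REPORTABLE:
        return None
    # A missing `passed` attribute counts as passed, as in the source.
    if getattr(check, "passed", True) or getattr(check, "skipped", False):
        return None
    if getattr(check, "llm_drop", False):
        return None
    if bucket == "hard_fail" and effective in ("CRITICAL", "HIGH"):
        return "hard"
    if bucket == "internal_reminder":
        return "internal"
    return "manual"


high = SimpleNamespace(severity="HIGH", passed=False)
downgraded = SimpleNamespace(severity="HIGH", llm_severity="medium", passed=False)
muted = SimpleNamespace(severity="HIGH", llm_severity="LOW", passed=False)
```

Two consequences are easy to miss: an evidence-backed check that the LLM phase downgrades to MEDIUM moves to manual review, and a downgrade to LOW removes it from the mail entirely.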
def _render_finding_card(it: dict, *, sev_key: str = "fail") -> str:
head = (
f'{chip(it["sev"], sev_key)}'
f'<span style="margin-left:8px;font-weight:600;">{h(it["title"])}</span>'
)
meta = (
f'<div style="font-size:{SZ_SMALL};color:{TEXT_MUTED};margin:4px 0;">'
f'{h(it["id"])} · {h(it["doc"])} · {h(it["reg"])}</div>'
)
evidence = ""
if it.get("matched"):
evidence = (
f'<div style="font-size:13px;color:{TEXT};margin-top:6px;'
f'background:#fef3c7;padding:6px 8px;border-radius:4px;'
f'border-left:2px solid #f59e0b;">'
f'<strong>Beobachtet:</strong> <em>{h(it["matched"])}</em></div>'
)
hint = ""
if it.get("hint") and not _is_dup(it.get("hint"), it.get("title")):
hint = (
f'<div style="font-size:13px;color:{TEXT_MUTED};margin-top:6px;">'
f'{h(it["hint"])}</div>'
)
action_html = ""
a = it.get("action")
if a:
# Skip action.target rendering when it duplicates the title
target = "" if _is_dup(a.get("target", ""), it.get("title")) \
else a.get("target", "")
# Skip action.detail when it duplicates hint or title
detail = a.get("detail", "")
if _is_dup(detail, it.get("hint")) or _is_dup(detail, it.get("title")):
detail = ""
target_html = (
f' <span style="color:{TEXT_MUTED};">({h(target)})</span>'
if target else ""
)
detail_html = (
f'<div style="margin-top:4px;color:{TEXT};">{h(detail)}</div>'
if detail else ""
)
action_html = (
f'<div style="font-size:13px;margin-top:8px;background:#dcfce7;'
f'padding:8px 10px;border-radius:4px;border-left:2px solid #16a34a;">'
f'<strong>→ {h(a["title"])}</strong>{target_html}'
f'{detail_html}'
f'</div>'
)
llm_html = ""
if it.get("llm_recommendation"):
llm_html = (
f'<div style="font-size:13px;margin-top:8px;background:#dbeafe;'
f'padding:8px 10px;border-radius:4px;border-left:2px solid #3b82f6;">'
f'<strong>🤖 LLM-Plausibility:</strong> '
f'{h(it["llm_recommendation"])}</div>'
)
return card(head + meta + evidence + hint + action_html + llm_html,
sev=sev_key)
def render_critical(state: dict) -> str:
hard, _, _ = _collect_three_buckets(state)
if not hard:
body = (
'<p style="margin:0;color:#15803d;">'
'Keine HIGH/CRITICAL-Befunde mit harter Evidenz im aktuellen Lauf.'
'</p>'
)
return section("✅ 1. Kritische Befunde", body, sev="pass",
anchor="critical")
cards = [_render_finding_card(it, sev_key="fail") for it in hard]
intro = ('<p style="margin:0 0 8px;color:' + TEXT_MUTED + ';font-size:13px;">'
'Findings mit direkt beobachtbarer Evidenz (öffentliche Daten). '
'Pro Befund: Was wir geprüft haben · Beobachtung · Was zu tun ist.'
'</p>')
return section(f"🔴 1. Kritische Befunde ({len(hard)})",
intro + "".join(cards), sev="fail", anchor="critical")
def render_manual_review(state: dict) -> str:
_, manual, _ = _collect_three_buckets(state)
drop_stats = get_last_drop_stats()
if not manual:
if drop_stats.get("count"):
note = ('<p style="margin:0;color:#64748b;font-size:13px;">'
f'Keine manuell zu prüfenden Punkte. '
f'Branchen-spezifische MCs ausgefiltert: '
f'{drop_stats["count"]} '
f'({", ".join(f"{k}:{v}" for k,v in drop_stats["by_prefix"].items())})'
'</p>')
return section("✅ 2. Manuelle Prüfung", note, sev="pass",
anchor="manual")
return ""
cards = [_render_finding_card(it, sev_key="warn") for it in manual]
intro = ('<p style="margin:0 0 8px;color:' + TEXT_MUTED + ';font-size:13px;">'
'Diese Punkte sind öffentlich prüfbar, aber unser Audit konnte '
'sie nicht eindeutig feststellen — Hinweis: Original-MC-Frage. '
'Empfehlung: manuell beim Mandanten/DSB klären. '
'Die LLM-Plausibilitätsprüfung hilft Frage→Aussage zu wandeln '
'(siehe 🤖-Block pro Finding falls schon gelaufen).</p>')
return section(f"🔍 2. Manuelle Prüfung erforderlich ({len(manual)})",
intro + "".join(cards), sev="warn", anchor="manual")
def render_internal_reminders(state: dict) -> str:
_, _, internal = _collect_three_buckets(state)
if not internal:
return ""
cards = [_render_finding_card(it, sev_key="info") for it in internal]
intro = ('<p style="margin:0 0 8px;color:' + TEXT_MUTED + ';font-size:13px;">'
'Interne Prozesse (TOM, DSFA, AVV, Löschkonzept, Schulungen, '
'Incident-Response, VVT) sind von außen nicht prüfbar. '
'<strong>Dies sind Reminder — kein Befund über die Website.</strong> '
'Beim Mandanten die Existenz + Aktualität der Dokumente verifizieren.'
'</p>')
return section(f"💼 3. Interne Prozesse — Reminder ({len(internal)})",
intro + "".join(cards), sev="info", anchor="internal")
@@ -0,0 +1,64 @@
"""Mail-V2 compose — single entrypoint that returns the full HTML.
Call `compose_v2(state)` from the email-dispatch phase when
`MAIL_RENDER_V2=true`. Default remains the legacy compose so we can
A/B in Mailpit.
"""
from __future__ import annotations
import os
from ._blocks import (
render_attachments,
render_caveats,
render_header,
render_per_doc,
render_per_theme,
render_sofortmassnahmen,
render_toc,
)
from ._blocks_findings import (
render_critical,
render_internal_reminders,
render_manual_review,
)
from ._legacy_wrappers import render_all_legacy
from ._style import page_close, page_open
def compose_v2(state: dict) -> str:
"""Build the full audit-mail HTML in the V2 layout."""
site = state.get("site_name") or ""
parts = [
page_open(site),
render_header(state),
render_toc(state),
render_critical(state),
render_manual_review(state),
render_internal_reminders(state),
render_sofortmassnahmen(state),
render_per_doc(state),
render_per_theme(state),
# B4 — Cross-Doc Vendor-Consistency (Elli Vertex↔Iadvize pattern)
state.get("vendor_consistency_html", ""),
# B5 — AI-Act Art. 50 Transparenzpflicht
state.get("ai_act_html", ""),
# B6/B7/B8 — DPO-cross-doc + Doc-Staleness + CMP-fingerprint
state.get("extra_findings_html", ""),
# All legacy build_*_html() wrapped in V2 sections — preserves
# every information block from the old renderer (Exec Summary,
# Banner-Screenshot, VVT, Redundancy, Solutions, Diff, etc.)
render_all_legacy(state),
render_caveats(state),
render_attachments(state),
page_close(state.get("check_id", ""),
os.environ.get("BUILD_SHA", "unknown")),
]
return "".join(p for p in parts if p)
def is_v2_enabled() -> bool:
return os.environ.get("MAIL_RENDER_V2", "false").lower() in (
"true", "1", "yes", "on",
)
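A sketch of how a dispatch phase might switch between the two renderers based on the flag. The `compose_mail` helper, its keyword arguments, and the injectable `env` mapping are assumptions for illustration; only the variable name `MAIL_RENDER_V2` and the accepted truthy values come from the code above.

```python
import os

TRUTHY = ("true", "1", "yes", "on")


def is_v2_enabled(env=None) -> bool:
    """Read the flag from `env` (defaults to os.environ) so tests can inject it."""
    env = os.environ if env is None else env
    return env.get("MAIL_RENDER_V2", "false").lower() in TRUTHY


def compose_mail(state: dict, *, compose_v2, compose_legacy, env=None) -> str:
    """Pick a renderer; both callables are hypothetical stand-ins."""
    renderer = compose_v2 if is_v2_enabled(env) else compose_legacy
    return renderer(state)


html = compose_mail(
    {},
    compose_v2=lambda s: "v2",
    compose_legacy=lambda s: "legacy",
    env={"MAIL_RENDER_V2": "on"},
)
```

The value is lower-cased but not stripped, so a value with surrounding whitespace such as `" true "` leaves the legacy renderer active.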
@@ -0,0 +1,267 @@
"""Mail-V2 Cookie-Inventar — single table with per-cookie status + action.
Merges three sources:
- declared in DSE / cookie-table (state["cmp_vendors"][i]["cookies"])
- live in browser (state["banner_result"]["cookies_detailed"])
- cookie_audit comparison (state["cookie_audit"]: declared/undocumented)
Status hierarchy per cookie:
  UNDOC     in browser, NOT in declared list            HIGH
  MISMATCH  declared with different category/duration   MED
            (reserved: `_build_status` never returns it yet)
  ORPH      declared, NOT in browser                    LOW
  OK        declared + in browser (values not compared) PASS
Per-row fields (rendered as `❌` when not ascertainable):
  name, vendor, category, duration, retention_grounds, country,
  third_country (bool), third_country_tag, processing_company,
  sources, status_code, status_sev, declared, in_browser
"""
from __future__ import annotations
from html import escape as h
from ._style import chip
# EU + EWR + CH — no third-country transfer.
EU_EEA_CH = {
"DE", "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
"FR", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
"PL", "PT", "RO", "SK", "SI", "ES", "SE",
"IS", "LI", "NO", "CH",
}
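# Adequacy decisions (limited list, most relevant in cookie context).
# "GB" is the ISO 3166-1 alpha-2 code for the United Kingdom; "UK" stays
# for inputs that use the colloquial form.
ADEQUACY = {"US", "GB", "UK", "JP", "KR", "IL", "CA", "NZ", "AR", "UY", "AD"}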
def _norm(s: str | None) -> str:
return (s or "").strip().lower()
def _missing(value: str | None) -> bool:
if value is None:
return True
v = str(value).strip()
if not v:
return True
return v.lower() in ("?", "unknown", "n/a", "tbd")
def _x_or(value: str | None) -> str:
"""Render `❌` when the value is missing, else escape + return."""
if _missing(value):
return '<span style="color:#dc2626;font-weight:700;" title="fehlend">❌</span>'
return h(str(value))
def _country_third(country: str | None) -> tuple[str, bool, str | None]:
"""Return (display, is_third_country, adequacy_tag).
is_third_country=True when outside EU/EEA/CH.
adequacy_tag is "DPF" for any country in ADEQUACY (not only the US),
"RISK" for other third countries, and None otherwise.
"""
if _missing(country):
return ("", False, None)
code = (country or "").strip().upper()
# accept "Germany" → "DE" via crude mapping for the most common names
name_map = {
"DEUTSCHLAND": "DE", "GERMANY": "DE", "IRELAND": "IE", "IRLAND": "IE",
"USA": "US", "UNITED STATES": "US",
}
code = name_map.get(code, code)
if code in EU_EEA_CH:
return (code, False, None)
tag = "DPF" if code in ADEQUACY else "RISK"
return (code, True, tag)
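The classification above can be tried standalone. This sketch assumes the same sets and name mapping; `country_third` takes the adequacy set as a parameter purely to make one edge case visible, which is an illustrative choice and not the module's signature.

```python
from __future__ import annotations

EU_EEA_CH = {
    "DE", "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
    "FR", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE",
    "IS", "LI", "NO", "CH",
}
NAME_MAP = {
    "DEUTSCHLAND": "DE", "GERMANY": "DE", "IRELAND": "IE", "IRLAND": "IE",
    "USA": "US", "UNITED STATES": "US",
}
PLACEHOLDERS = ("?", "unknown", "n/a", "tbd")


def country_third(country: str | None, adequacy: set[str]) -> tuple[str, bool, str | None]:
    """Return (display_code, is_third_country, tag)."""
    raw = (country or "").strip()
    if not raw or raw.lower() in PLACEHOLDERS:
        return ("", False, None)
    code = NAME_MAP.get(raw.upper(), raw.upper())
    if code in EU_EEA_CH:
        return (code, False, None)
    return (code, True, "DPF" if code in adequacy else "RISK")
```

An adequacy set keyed on `"UK"` alone would tag the ISO code `"GB"` as `RISK`: `country_third("GB", {"US", "UK"})` returns `("GB", True, "RISK")`, while adding `"GB"` to the set yields `"DPF"`.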
def _src_chip(in_dse: bool, in_table: bool, in_browser: bool,
in_ocr: bool) -> str:
parts: list[str] = []
if in_dse:
parts.append("DSE")
if in_table:
parts.append("Tabelle")
if in_ocr:
parts.append("OCR")
if in_browser:
parts.append("Browser")
return " · ".join(parts) if parts else ""
def _build_status(declared: bool, in_browser: bool,
                  cookie_audit_undeclared: set,
                  cookie_audit_compliant: set,
                  name_lc: str) -> tuple[str, str]:
    """Return (status_code, severity_key) for one cookie.

    `cookie_audit_compliant` is accepted but not consulted, and
    MISMATCH is never produced here.
    """
    if name_lc in cookie_audit_undeclared or (in_browser and not declared):
        return "UNDOC", "fail"
    if declared and not in_browser:
        return "ORPH", "warn"
    if declared and in_browser:
        return "OK", "pass"
    return "", "info"
def build_cookie_inventory(state: dict) -> tuple[list[dict], dict]:
"""Build the merged inventory + summary."""
cmp_vendors = state.get("cmp_vendors") or []
banner = state.get("banner_result") or {}
cookies_detailed = banner.get("cookies_detailed") or []
cookie_audit = state.get("cookie_audit") or {}
# 1) Declared
declared: dict[str, dict] = {}
for v in cmp_vendors:
vname = (v.get("name") or "").strip()
vcountry = (v.get("country") or "").strip()
vproc = (v.get("processing_company") or "").strip()
vretention = (v.get("persistence") or "").strip() # vendor-level
src = (v.get("source") or "").lower()
in_dse = "dse" in src or "table_crawled" in src
in_table = ("table" in src or "pasted" in src
or "html_table" in src)
in_ocr = "tesseract" in src or "ocr" in src
for c in (v.get("cookies") or []):
cname = (c.get("name") or "").strip()
if not cname:
continue
key = _norm(cname)
entry = declared.setdefault(key, {
"name": cname,
"vendor": vname,
"category": "",
"duration": "",
"retention_grounds": "",
"country": vcountry,
"processing_company": vproc,
"in_dse": False,
"in_table": False,
"in_ocr": False,
})
entry["category"] = (entry["category"]
or (c.get("category") or "").strip())
entry["duration"] = (entry["duration"]
or (c.get("duration")
or c.get("persistence") or "").strip())
# vendor-level values fill fields that are still empty on the entry
if not entry["country"] and vcountry:
entry["country"] = vcountry
if not entry["processing_company"] and vproc:
entry["processing_company"] = vproc
if not entry["retention_grounds"] and vretention:
entry["retention_grounds"] = vretention
entry["in_dse"] = entry["in_dse"] or in_dse
entry["in_table"] = entry["in_table"] or in_table
entry["in_ocr"] = entry["in_ocr"] or in_ocr
# 2) Browser
browser: dict[str, dict] = {}
for c in cookies_detailed:
cname = (c.get("name") or "").strip()
if not cname:
continue
browser[_norm(cname)] = c
# 3) cookie_audit hints
undeclared_set: set = {
_norm((c.get("name") if isinstance(c, dict) else c) or "")
for c in (cookie_audit.get("undeclared_in_browser") or [])
}
compliant_set: set = {
_norm((c.get("name") if isinstance(c, dict) else c) or "")
for c in (cookie_audit.get("compliant") or [])
}
all_keys = set(declared.keys()) | set(browser.keys())
rows: list[dict] = []
for key in sorted(all_keys):
d = declared.get(key) or {}
b = browser.get(key) or {}
name = d.get("name") or b.get("name") or key
vendor = (d.get("vendor") or b.get("domain") or "").strip()
country = d.get("country", "")
country_display, is_third, adq = _country_third(country)
in_browser = key in browser
is_declared = key in declared
status, sev = _build_status(
is_declared, in_browser, undeclared_set, compliant_set, key,
)
sources = _src_chip(
d.get("in_dse", False),
d.get("in_table", False),
in_browser,
d.get("in_ocr", False),
)
rows.append({
"name": name,
"vendor": vendor,
"category": d.get("category", ""),
"duration": d.get("duration", ""),
"retention_grounds": d.get("retention_grounds", ""),
"country": country_display,
"third_country": is_third,
"third_country_tag": adq,
"processing_company": d.get("processing_company", ""),
"sources": sources,
"status_code": status,
"status_sev": sev,
"declared": is_declared,
"in_browser": in_browser,
})
order = {"UNDOC": 0, "MISMATCH": 1, "ORPH": 2, "OK": 3, "": 4}
rows.sort(key=lambda r: (order.get(r["status_code"], 9),
r["name"].lower()))
summary = {
"total": len(rows),
"ok": sum(1 for r in rows if r["status_code"] == "OK"),
"undoc": sum(1 for r in rows if r["status_code"] == "UNDOC"),
"orph": sum(1 for r in rows if r["status_code"] == "ORPH"),
"mismatch": sum(1 for r in rows if r["status_code"] == "MISMATCH"),
"declared": sum(1 for r in rows if r["declared"]),
"in_browser": sum(1 for r in rows if r["in_browser"]),
"third_country": sum(1 for r in rows if r["third_country"]),
"missing_country": sum(1 for r in rows if _missing(r["country"])),
"missing_duration": sum(1 for r in rows if _missing(r["duration"])),
}
return rows, summary
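The core of the merge is a union over normalised cookie names followed by a status sort. A reduced sketch, assuming names are the only input; `merge_inventory` and `cookie_status` are hypothetical helpers that omit the vendor, country, and `cookie_audit` handling of the real function.

```python
from __future__ import annotations

ORDER = {"UNDOC": 0, "MISMATCH": 1, "ORPH": 2, "OK": 3, "": 4}


def cookie_status(declared: bool, in_browser: bool) -> str:
    if in_browser and not declared:
        return "UNDOC"  # seen live, missing from the documents
    if declared and not in_browser:
        return "ORPH"   # documented, never observed
    if declared and in_browser:
        return "OK"
    return ""


def merge_inventory(declared: set[str], browser: set[str]) -> list[tuple[str, str]]:
    """Union both sources on lower-cased names, worst status first."""
    declared_n = {n.strip().lower() for n in declared if n.strip()}
    browser_n = {n.strip().lower() for n in browser if n.strip()}
    rows = [
        (key, cookie_status(key in declared_n, key in browser_n))
        for key in declared_n | browser_n
    ]
    rows.sort(key=lambda r: (ORDER.get(r[1], 9), r[0]))
    return rows


rows = merge_inventory({"_ga", "Legacy_ID"}, {"_GA", "_fbp"})
```

Because keys are lower-cased before the union, `_ga` and `_GA` collapse into one row with status `OK`.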
def render_inventory_rows(rows: list[dict]) -> list[list[str]]:
"""Cell-rows for `_style.table`.
Columns: Name | Vendor | Kat | Speicherdauer | Löschfrist |
Sitzland | Verantwortlich | Quelle | Status
"""
out: list[list[str]] = []
for r in rows:
country_html = _x_or(r["country"])
if r["third_country"]:
tag = r.get("third_country_tag") or "RISK"
tag_color = "#92400e" if tag == "DPF" else "#dc2626"
country_html += (
f' <span style="font-size:10px;color:{tag_color};'
f'font-weight:700;">[{tag}]</span>'
)
out.append([
f'<code>{h(r["name"])}</code>',
h(r["vendor"]) if r["vendor"] else
'<span style="color:#dc2626;">❌</span>',
_x_or(r["category"]),
_x_or(r["duration"]),
_x_or(r["retention_grounds"]),
country_html,
_x_or(r["processing_company"]),
h(r["sources"]),
chip(r["status_code"], r["status_sev"]),
])
return out
def inventory_headers() -> list[str]:
return ["Name", "Vendor", "Kat.", "Speicherdauer", "Löschfrist",
"Sitzland", "Verantwortlich", "Quelle", "Status"]
@@ -0,0 +1,113 @@
"""Mail-V2 label normalizer — turn MC questions into statements.
Historic MC labels read like compliance-officer checklists:
"Dokumentiert die Datenschutzinformation alle Datenübermittlungen
gemäß Art. 49 Abs. 1 Unterabs. 2 DS-GVO?"
In the audit mail that looks like "we don't know" — unhelpful.
This module rewrites the label as a statement of WHAT WAS CHECKED
so the reader gets a topic, not a question:
"Alle Datenübermittlungen gemäß Art. 49 Abs. 1 Unterabs. 2 DS-GVO"
The transformation is purely textual; the underlying MC stays as is.
"""
from __future__ import annotations
import re
# Question-stem → topic-prefix rewrites, applied in order.
_REWRITES: list[tuple[re.Pattern, str]] = [
(re.compile(r"^Dokumentiert\s+die\s+(.+?)\s+(.+?)\?$", re.IGNORECASE),
r"\2"),
(re.compile(r"^Werden\s+(.+?)\s+dokumentiert\?$", re.IGNORECASE),
r"\1 dokumentieren"),
(re.compile(r"^Wird\s+(.+?)\s+benannt\?$", re.IGNORECASE),
r"\1 benennen"),
(re.compile(r"^Ist\s+(.+?)\s+angegeben\?$", re.IGNORECASE),
r"\1 angeben"),
(re.compile(r"^Enthält\s+(?:die\s+)?(.+?)\s+(.+?)\?$", re.IGNORECASE),
r"\2 in \1"),
(re.compile(r"^Sind\s+(.+?)\s+vorhanden\?$", re.IGNORECASE),
r"\1 prüfen"),
(re.compile(r"^Gibt\s+es\s+(.+?)\?$", re.IGNORECASE),
r"\1 prüfen"),
]
def label_as_statement(label: str) -> str:
"""Rewrite a question-form label as a topic statement."""
if not label:
return label
s = label.strip()
if not s.endswith("?"):
return s
for pat, repl in _REWRITES:
m = pat.match(s)
if m:
out = pat.sub(repl, s).strip()
# First word capitalised
return out[:1].upper() + out[1:] if out else s
# Generic fallback: drop the question mark + leading "Wird/Sind/Ist"
s2 = re.sub(r"^\s*(Wird|Sind|Ist|Werden|Gibt es|Enthält|Hat)\s+",
"", s, flags=re.IGNORECASE).rstrip("?")
return s2[:1].upper() + s2[1:] if s2 else s
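A short usage sketch of the rewrite logic, assuming a reduced rule list (three of the seven patterns) and single-line labels. The sample labels are made up for illustration.

```python
import re

REWRITES = [
    (re.compile(r"^Dokumentiert\s+die\s+(.+?)\s+(.+?)\?$", re.IGNORECASE), r"\2"),
    (re.compile(r"^Werden\s+(.+?)\s+dokumentiert\?$", re.IGNORECASE),
     r"\1 dokumentieren"),
    (re.compile(r"^Gibt\s+es\s+(.+?)\?$", re.IGNORECASE), r"\1 prüfen"),
]
STEM = re.compile(r"^\s*(Wird|Sind|Ist|Werden|Gibt es|Enthält|Hat)\s+",
                  re.IGNORECASE)


def label_as_statement(label: str) -> str:
    if not label:
        return label
    s = label.strip()
    if not s.endswith("?"):
        return s  # already a statement
    for pat, repl in REWRITES:
        if pat.match(s):
            out = pat.sub(repl, s).strip()
            return out[:1].upper() + out[1:] if out else s
    # Fallback: drop the leading verb and the question mark.
    s2 = STEM.sub("", s).rstrip("?")
    return s2[:1].upper() + s2[1:] if s2 else s


a = label_as_statement("Werden Löschfristen dokumentiert?")
b = label_as_statement("Gibt es einen Reopen-Link im Footer?")
c = label_as_statement("Hat die Seite ein Impressum?")
d = label_as_statement("Dokumentiert die Datenschutzinformation alle Datenübermittlungen?")
```

The first "Dokumentiert die ..." rule keeps everything after the first noun, so the result still begins with "Alle ..." and does not condense the topic any further.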
def has_evidence(check) -> bool:
"""Decide whether an MC check has real evidence backing the FAIL.
A FAIL with non-empty `matched_text` (the regex/LLM did find a
string and judged it insufficient) is a hard fail. A FAIL with
empty matched_text is more like 'we could not confirm' → that
belongs in the manual-review bucket, not in critical findings.
"""
matched = getattr(check, "matched_text", "") or ""
return bool(matched.strip())
# Keywords that indicate a check is about an INTERNAL process the
# auditor cannot observe from outside (TOM, DSFA, AVV, training,
# incident response, risk analysis, deletion concept). These are
# never findings — they are reminders that the DPO/DSB must verify
# the document/process exists internally.
_INTERNAL_KEYWORDS = (
"tom", "technisch-organisatorische", "technisch organisatorische",
"dsfa", "datenschutz-folgenabschätzung",
"datenschutzfolgenabschätzung",
"schulung", "training", "awareness",
"avv", "auftragsverarbeitungsvertrag", "auftragsverarbeitung",
"incident", "vorfall", "meldepflicht intern",
"risikoanalyse", "risikobewertung", "risk assessment",
"löschkonzept", "löschfristen-konzept",
"vvt", "verzeichnis der verarbeitungstätigkeiten",
"dsb-bestellung", "dsb bestellung",
"verfahrensverzeichnis", "berichtigungskonzept",
"betroffenenrechte-prozess", "dsr-prozess",
)
def is_internal_process(check) -> bool:
"""Decide whether the MC check is about an internal process."""
label = (getattr(check, "label", "") or "").lower()
cid = (getattr(check, "id", "") or "").lower()
hint = (getattr(check, "hint", "") or "").lower()
# mc_audit_type module may have annotated the check
audit_type = getattr(check, "audit_type", "")
if audit_type and audit_type in ("internal", "process", "documentation"):
return True
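hay = f"{label} {cid} {hint}"
# Short acronyms ("tom", "avv", "vvt", "dsfa") must match as whole words,
# otherwise "tom" fires inside "automatisch". Longer keywords keep the
# substring match so German compounds ("Mitarbeiterschulung") still hit.
return any(
re.search(rf"\b{re.escape(k)}\b", hay) if len(k) <= 4 else k in hay
for k in _INTERNAL_KEYWORDS
)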
def classify_check(check) -> str:
"""Return one of: 'hard_fail' | 'manual_review' | 'internal_reminder'.
Only call on FAIL checks (passed=False, skipped=False). Drives
which bucket the check renders into.
"""
if is_internal_process(check):
return "internal_reminder"
if has_evidence(check):
return "hard_fail"
return "manual_review"
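The precedence in `classify_check` (internal process first, then evidence, then manual review) can be checked with stand-in objects. This is a sketch under assumptions: the keyword tuple is shortened, and the matcher treats keywords of up to four characters as whole words, which is a variant chosen for this example and not necessarily what the module does.

```python
import re
from types import SimpleNamespace

INTERNAL_KEYWORDS = ("tom", "avv", "vvt", "schulung", "löschkonzept")


def is_internal(check) -> bool:
    hay = " ".join(
        (getattr(check, attr, "") or "").lower() for attr in ("label", "id", "hint")
    )
    # Whole-word match for short acronyms, substring match for the rest.
    return any(
        re.search(rf"\b{re.escape(k)}\b", hay) if len(k) <= 4 else k in hay
        for k in INTERNAL_KEYWORDS
    )


def classify(check) -> str:
    if is_internal(check):
        return "internal_reminder"
    if (getattr(check, "matched_text", "") or "").strip():
        return "hard_fail"
    return "manual_review"


tom = SimpleNamespace(label="Sind TOM dokumentiert?", matched_text="Abschnitt 7")
auto = SimpleNamespace(label="Automatische Löschung beschrieben?",
                       matched_text="nach 30 Tagen")
bare = SimpleNamespace(label="Widerrufsrecht genannt?", matched_text="")
```

With a plain substring test, the `auto` check would be classified as an internal reminder because "automatische" contains "tom"; with the whole-word rule it stays an evidence-backed hard fail.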
@@ -0,0 +1,446 @@
"""Mail-V2 legacy wrappers — wrap each existing build_*_html() in V2 shell.
The original step-5 had 24+ render functions, each emitting standalone
HTML with their own styles. V2 keeps all the information by wrapping
each output in a consistent V2 `section()` container with stripe +
palette. The block-level styling normalizes; the inner data tables/
lists keep their legacy markup so we don't lose detail.
Each wrapper is defensive: missing data, import errors, or empty
HTML → return "" so the section disappears rather than crashing.
"""
from __future__ import annotations
import logging
from ._style import section
logger = logging.getLogger(__name__)
def _safe_wrap(label: str, anchor: str, html: str,
*, sev: str = "info") -> str:
if not html or not html.strip():
return ""
return section(label, html, sev=sev, anchor=anchor)
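Every wrapper in this module repeats the same try/except, empty-check, and `_safe_wrap` call. One possible way to factor that out is a decorator; the sketch below is hypothetical (`legacy_section` does not exist in the package, and `section` is a simplified stand-in for `_style.section`).

```python
import functools
import logging

logger = logging.getLogger(__name__)


def section(label: str, html: str, *, sev: str = "info", anchor: str = "") -> str:
    """Stand-in for `_style.section`; the real markup differs."""
    return f'<section id="{anchor}" data-sev="{sev}"><h2>{label}</h2>{html}</section>'


def legacy_section(label: str, anchor: str, *, sev: str = "info"):
    """Wrap a builder so errors and empty output both collapse to ''."""
    def decorate(build):
        @functools.wraps(build)
        def wrapper(state: dict) -> str:
            try:
                html = build(state)
            except Exception as exc:  # same broad catch as the wrappers here
                logger.warning("%s wrapper: %s", anchor, exc)
                return ""
            if not html or not html.strip():
                return ""
            return section(label, html, sev=sev, anchor=anchor)
        return wrapper
    return decorate


@legacy_section("VVT", "vvt")
def render_vvt(state: dict) -> str:
    return state["vvt_html"]  # hypothetical key; raises KeyError when absent
```

The trade-off is that the lazy imports inside each wrapper body would have to stay inside the decorated function so that an `ImportError` is still caught.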
# ── Tier 1 (Sales-critical) ──────────────────────────────────────
def render_executive_summary(state: dict) -> str:
"""P82 GF-1-Pager + P1 Exec-Summary combined as 'Executive Summary'."""
parts: list[str] = []
req = state.get("req")
try:
from compliance.services.gf_one_pager import build_gf_one_pager_html
html = build_gf_one_pager_html(
site_name=state.get("site_name") or "",
scorecard=state.get("scorecard") or {},
previous_scorecard=state.get("prev_scorecard"),
banner_result=state.get("banner_result"),
library_mismatch_findings=state.get("mismatches") or [],
scan_context=getattr(req, "scan_context", None) if req else None,
audit_quality_findings=state.get("audit_quality_findings") or [],
)
if html and html.strip():
parts.append(html)
except Exception as e:
logger.warning("gf_one_pager wrapper: %s", e)
try:
from compliance.api.agent_doc_check_exec_summary import (
build_exec_summary_html,
)
html = build_exec_summary_html(
scorecard=state.get("scorecard") or {},
previous_scorecard=state.get("prev_scorecard"),
cmp_vendors=state.get("cmp_vendors") or [],
redundancy_report=state.get("redundancy_report"),
site_name=state.get("site_name") or "",
)
if html and html.strip():
parts.append(html)
except Exception as e:
logger.warning("exec_summary wrapper: %s", e)
return _safe_wrap("💼 Executive Summary", "exec",
"".join(parts), sev="info")
def render_banner_screenshot(state: dict) -> str:
"""P85 — Banner-Screenshot as visual proof."""
try:
from compliance.services.banner_screenshot_block import (
build_banner_screenshot_html,
)
html = build_banner_screenshot_html(state.get("banner_result"))
return _safe_wrap("📸 Banner-Screenshot", "banner-shot",
html, sev="info")
except Exception as e:
logger.warning("banner_screenshot wrapper: %s", e)
return ""
def render_vvt(state: dict) -> str:
"""VVT table per Art. 30 GDPR (records of processing activities)."""
try:
from compliance.api.agent_doc_check_extras import (
build_vvt_table_html,
)
html = build_vvt_table_html(state.get("cmp_vendors") or [])
return _safe_wrap("📋 VVT — Verarbeitungstätigkeiten (Art. 30 DSGVO)",
"vvt", html, sev="info")
except Exception as e:
logger.warning("vvt wrapper: %s", e)
return ""
def render_redundancy(state: dict) -> str:
"""O4: vendor redundancy + EU alternatives + cost savings."""
try:
from compliance.api.agent_doc_check_redundancy import (
build_redundancy_html,
)
html = build_redundancy_html(state.get("redundancy_report"))
return _safe_wrap("💰 Optimierungspotenzial (Redundanz / EU-Alt.)",
"redundancy", html, sev="warn")
except Exception as e:
logger.warning("redundancy wrapper: %s", e)
return ""
def render_diff(state: dict) -> str:
"""P84: diff mode, changes since the last run."""
try:
from compliance.services.run_diff import (
build_diff_block_html, compute_diff,
)
from database import SessionLocal
db = SessionLocal()
try:
diff = compute_diff(
db, state["check_id"], state.get("domain_for_exec") or "",
state.get("banner_result"), state.get("scorecard"),
)
html = build_diff_block_html(diff) if diff else ""
finally:
db.close()
return _safe_wrap("📊 Veränderung seit letztem Lauf",
"diff", html, sev="info")
except Exception as e:
logger.warning("diff wrapper: %s", e)
return ""
def render_scope_disclaimer(state: dict) -> str:
"""P62: what we check and what we cannot check."""
try:
from compliance.api.scope_disclaimer import build_scope_disclaimer_html
html = build_scope_disclaimer_html()
return _safe_wrap("🔍 Prüfumfang & Methodische Hinweise",
"scope", html, sev="info")
except Exception as e:
logger.warning("scope_disclaimer wrapper: %s", e)
return ""
# ── Tier 2 (Audit-detail) ─────────────────────────────────────────
def render_banner_deep(state: dict) -> str:
"""Banner-Deep: Phases + Quality-Score + Per-Category-Tracker."""
try:
from compliance.api.agent_doc_check_banner import (
build_banner_deep_html,
)
html = build_banner_deep_html(state.get("banner_result"))
return _safe_wrap("🍪 Banner-Tiefenanalyse (Phasen + Kategorien)",
"banner-deep", html, sev="info")
except Exception as e:
logger.warning("banner_deep wrapper: %s", e)
return ""
def render_cookie_audit(state: dict) -> str:
"""Cookie three-source audit (declared ↔ browser ↔ library)."""
try:
from compliance.services.cookie_compliance_audit import (
build_cookie_audit_block_html,
)
html = build_cookie_audit_block_html(state.get("cookie_audit") or {})
return _safe_wrap("🔬 Cookie-Audit (3-Quellen-Vergleich)",
"cookie-audit", html, sev="warn")
except Exception as e:
logger.warning("cookie_audit wrapper: %s", e)
return ""
def render_solutions(state: dict) -> str:
"""P73: LLM-generated remediation suggestions per HIGH fail."""
try:
from compliance.services.mc_solution_generator import (
build_solutions_block_html,
)
html = build_solutions_block_html(state.get("mc_solutions") or [])
return _safe_wrap("🎯 LLM-Lösungsvorschläge (P73)",
"solutions", html, sev="info")
except Exception as e:
logger.warning("solutions wrapper: %s", e)
return ""
def render_cookie_architecture(state: dict) -> str:
"""P10 — Cookie-Policy-Architecture (BMW-Pattern, layered separation)."""
try:
from compliance.services.cookie_policy_architecture import (
build_architecture_html,
)
html = build_architecture_html(state.get("cookie_architecture") or {})
return _safe_wrap("🏗 Cookie-Policy-Architektur",
"cookie-arch", html, sev="info")
except Exception as e:
logger.warning("cookie_architecture wrapper: %s", e)
return ""
def render_library_mismatch(state: dict) -> str:
"""P102: check the cookie classification against the library."""
try:
from compliance.services.cookie_library_mismatch import (
build_mismatch_block_html,
)
html = build_mismatch_block_html(state.get("mismatches") or [])
return _safe_wrap("⚖️ Cookie-Klassifikation gegen Library (P102)",
"lib-mismatch", html, sev="warn")
except Exception as e:
logger.warning("library_mismatch wrapper: %s", e)
return ""
def render_banner_consistency(state: dict) -> str:
"""P92/P94: banner consistency / CMP health."""
try:
from compliance.services.banner_consistency_checks import (
build_consistency_block_html,
)
html = build_consistency_block_html(
state.get("consistency_findings") or [])
return _safe_wrap("🧩 Banner-Konsistenz + CMP-Health",
"banner-consistency", html, sev="warn")
except Exception as e:
logger.warning("banner_consistency wrapper: %s", e)
return ""
def render_signals(state: dict) -> str:
"""P35/P77/P78: save label, cookies in DSE, JC clause."""
try:
from compliance.services.doc_text_signals import (
build_signals_block_html,
)
html = build_signals_block_html(state.get("signal_findings") or [])
return _safe_wrap("🚩 Doc-Text-Signale (P35/P77/P78)",
"signals", html, sev="info")
except Exception as e:
logger.warning("signals wrapper: %s", e)
return ""
def render_scorecard_regulation(state: dict) -> str:
"""MC-Scorecard per Regulation (DSGVO/TDDDG/BGB-Split)."""
try:
from compliance.api.agent_doc_check_scorecard import (
build_scorecard_html,
)
html = build_scorecard_html(
state.get("scorecard") or {},
previous_scorecard=state.get("prev_scorecard"),
)
return _safe_wrap("📊 Compliance-Scorecard pro Regulation",
"scorecard", html, sev="info")
except Exception as e:
logger.warning("scorecard wrapper: %s", e)
return ""
def render_profile_html(state: dict) -> str:
"""Detected business model."""
try:
from compliance.api.agent_doc_check_report import build_profile_html
html = build_profile_html(state.get("profile"))
return _safe_wrap("🏢 Erkanntes Geschäftsmodell",
"profile", html, sev="info")
except Exception as e:
logger.warning("profile wrapper: %s", e)
return ""
def render_input_warnings(state: dict) -> str:
"""Doc input warnings: user pasted text into the wrong field."""
try:
from compliance.services.doc_input_warnings import (
build_warnings_block_html,
)
warns = state.get("input_warnings") or []
html = build_warnings_block_html(warns) if warns else ""
return _safe_wrap("⚠️ Eingabe-Warnungen",
"input-warn", html, sev="warn")
except Exception as e:
logger.warning("input_warnings wrapper: %s", e)
return ""
# ── Tier 3 (Cookie-deep + advisory) ───────────────────────────────
def render_entropy(state: dict) -> str:
"""P103 — Cookie-Value-Entropy."""
try:
from compliance.services.cookie_value_entropy import (
build_entropy_block_html,
)
html = build_entropy_block_html(state.get("entropy_findings") or [])
return _safe_wrap("🎲 Cookie-Entropy-Anomalien (P103)",
"entropy", html, sev="info")
except Exception as e:
logger.warning("entropy wrapper: %s", e)
return ""
def render_network_trace(state: dict) -> str:
"""P104 — Network-Tracing."""
try:
from compliance.services.cookie_network_tracer import (
build_network_trace_block_html,
)
html = build_network_trace_block_html(
state.get("network_findings") or [])
return _safe_wrap("🌐 Network-Tracing (P104)",
"network", html, sev="info")
except Exception as e:
logger.warning("network_trace wrapper: %s", e)
return ""
def render_tcf_authority(state: dict) -> str:
"""P105 — IAB TCF Authority Cross-Reference."""
try:
from compliance.services.tcf_vendor_authority import (
build_tcf_authority_block_html,
)
html = build_tcf_authority_block_html(
state.get("tcf_authority_findings") or [])
return _safe_wrap("🆔 IAB TCF Vendor Authority (P105)",
"tcf-auth", html, sev="info")
except Exception as e:
logger.warning("tcf_authority wrapper: %s", e)
return ""
def render_jc_avv(state: dict) -> str:
"""P71: JC-vs-AVV decision tree."""
try:
from compliance.services.jc_avv_decision import (
build_jc_avv_decision_html,
)
html = build_jc_avv_decision_html(
(state.get("doc_texts") or {}).get("dse"))
return _safe_wrap("⚖️ Joint Controller vs. AVV — Entscheidung (P71)",
"jc-avv", html, sev="info")
except Exception as e:
logger.warning("jc_avv wrapper: %s", e)
return ""
def render_industry_context(state: dict) -> str:
"""P6/53/55: industry context + site history."""
try:
from compliance.services.industry_library import (
build_industry_context_block_html,
)
ind = None
req = state.get("req")
if req and getattr(req, "scan_context", None):
ind = req.scan_context.get("industry")
html = build_industry_context_block_html(
ind, state.get("site_profile"))
return _safe_wrap("🏭 Branchen-Kontext + Historie",
"industry", html, sev="info")
except Exception as e:
logger.warning("industry_context wrapper: %s", e)
return ""
def render_benchmark(state: dict) -> str:
"""P86: industry benchmark."""
try:
from compliance.services.industry_benchmark import (
build_benchmark_html,
)
html = build_benchmark_html(state.get("benchmark") or {})
return _safe_wrap("📈 Branchen-Benchmark (P86)",
"bench", html, sev="info")
except Exception as e:
logger.warning("benchmark wrapper: %s", e)
return ""
def render_scanned_urls(state: dict) -> str:
"""Source transparency: which URLs were crawled."""
try:
from compliance.api.agent_doc_check_report import (
build_scanned_urls_html,
)
html = build_scanned_urls_html(state.get("doc_entries") or [])
return _safe_wrap("🔗 Geprüfte URLs (Quellen-Transparenz)",
"scanned-urls", html, sev="info")
except Exception as e:
logger.warning("scanned_urls wrapper: %s", e)
return ""
def render_management_summary(state: dict) -> str:
"""Concrete tasks for management."""
try:
from compliance.api.agent_doc_check_report import (
build_management_summary,
)
html = build_management_summary(state.get("results") or [])
return _safe_wrap("📝 Management-Zusammenfassung",
"mgmt", html, sev="info")
except Exception as e:
logger.warning("management_summary wrapper: %s", e)
return ""
# ── Render the whole legacy block region ────────────────────────
def render_all_legacy(state: dict) -> str:
"""Render every legacy block in the canonical order."""
return "".join([
# Tier 1 (Sales)
render_executive_summary(state),
render_diff(state),
render_solutions(state),
render_redundancy(state),
render_vvt(state),
render_banner_screenshot(state),
# Tier 2 (Audit-detail)
render_scorecard_regulation(state),
render_banner_deep(state),
render_banner_consistency(state),
render_cookie_audit(state),
render_cookie_architecture(state),
render_library_mismatch(state),
render_signals(state),
render_profile_html(state),
render_input_warnings(state),
# Tier 3 (advisory)
render_entropy(state),
render_network_trace(state),
render_tcf_authority(state),
render_jc_avv(state),
render_industry_context(state),
render_benchmark(state),
render_scanned_urls(state),
render_management_summary(state),
# Scope-Disclaimer last — footer-ish
render_scope_disclaimer(state),
])
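Reviewer note: every wrapper above repeats the same guard (lazy import, build, wrap, swallow and log on failure). A minimal standalone sketch of that guard follows. `guarded_block` and `broken_builder` are hypothetical names used only for illustration; `_safe_wrap` is not reproduced here because its implementation is not part of this hunk.

```python
import logging
from typing import Callable

logger = logging.getLogger("legacy_wrappers_sketch")


def guarded_block(name: str, build: Callable[[], str]) -> str:
    """Run one block builder; on any failure log a warning and return ''."""
    try:
        return build() or ""
    except Exception as exc:  # same broad catch as the wrappers above
        logger.warning("%s wrapper: %s", name, exc)
        return ""


def broken_builder() -> str:
    # Stands in for a legacy build_*_html whose import or render fails.
    raise RuntimeError("builder unavailable")


# One failing block must not take down the rest of the mail.
report = "".join([
    guarded_block("ok", lambda: "<p>ok</p>"),
    guarded_block("broken", broken_builder),
])
```

The broad `except Exception` mirrors the wrappers: a single failing legacy block degrades to an empty string instead of aborting the whole render.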
@@ -0,0 +1,88 @@
"""Mail-V2 scope filter — drop MC findings that don't apply.
Some MC-DB entries are sector-specific (FIN = financial services,
GOV = public authority, MED = healthcare, INS = insurance, EDU =
education, LEG = legal profession, REL = religious bodies, POL =
political parties). They have no business surfacing
for a normal B2C company like Elli (energy/EV charging).
This filter inspects the MC ID prefix and, when the prefix denotes
a sector that doesn't match the detected `business_scope`, drops the
check from the V2 finding renderers.
The MC pipeline itself is unchanged — MCs are still evaluated; we
just suppress them in the report when out of scope. Set
`KEEP_OOS_MCS=true` in the env to disable the filter (useful for
DSB debug runs).
"""
from __future__ import annotations
import os
# Prefix -> sector token expected in business_scope to KEEP the check.
SECTOR_PREFIXES: dict[str, set[str]] = {
"FIN": {"financial_services", "bank", "bafin", "fintech",
"payment_provider"},
"GOV": {"public_authority", "government", "behoerde"},
"MED": {"healthcare", "medical", "pharma", "klinik"},
"INS": {"insurance", "versicherung"},
"EDU": {"education", "schule", "hochschule", "university"},
"LEG": {"legal_profession", "anwaltskammer", "kanzlei"},
"REL": {"church", "religion", "religious"},
"POL": {"political_party", "partei"},
}
# Cheap counter so the renderer can show "X MCs gefiltert (out of scope)".
# "count" holds the total, "by_prefix" a per-sector tally.
_LAST_DROPPED: dict[str, int | dict[str, int]] = {"count": 0, "by_prefix": {}}
def _enabled() -> bool:
return os.environ.get("KEEP_OOS_MCS", "false").lower() not in (
"true", "1", "yes", "on",
)
def _extract_prefix(check_id: str) -> str | None:
"""Return the sector prefix (e.g. 'FIN') from mc-FIN-814-A03."""
if not check_id:
return None
parts = check_id.split("-")
# mc-XXX-NNN-AYY → parts = ["mc", "XXX", "NNN", "AYY"]
if len(parts) >= 2 and parts[0].lower() == "mc":
prefix = parts[1].upper()
if prefix in SECTOR_PREFIXES:
return prefix
return None
def is_out_of_scope(check, business_scope: set[str] | None) -> bool:
"""Decide whether the check is sector-specific AND out of scope."""
if not _enabled():
return False
prefix = _extract_prefix(getattr(check, "id", "") or "")
if not prefix:
return False
required = SECTOR_PREFIXES.get(prefix) or set()
scope_lc = {s.lower() for s in (business_scope or set())}
return not (scope_lc & required)
def filter_out_of_scope(checks, business_scope: set[str] | None) -> list:
"""Return `checks` with out-of-scope items removed; mutates counter."""
_LAST_DROPPED["count"] = 0
_LAST_DROPPED["by_prefix"] = {}
out = []
for c in checks:
if is_out_of_scope(c, business_scope):
_LAST_DROPPED["count"] += 1
prefix = _extract_prefix(getattr(c, "id", "") or "") or "?"
_LAST_DROPPED["by_prefix"][prefix] = (
_LAST_DROPPED["by_prefix"].get(prefix, 0) + 1
)
continue
out.append(c)
return out
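def get_last_drop_stats() -> dict:
    """Return a snapshot of the last filter run (total + per-prefix tally)."""
    # Copy the nested dict too, so callers cannot mutate module state.
    return {
        "count": _LAST_DROPPED["count"],
        "by_prefix": dict(_LAST_DROPPED["by_prefix"]),
    }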
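Reviewer note: the prefix rule is easiest to see on concrete IDs. The following is a standalone sketch that mirrors the logic of this file with a reduced prefix table; the check ID `mc-DSE-001-A01`, the `MED` ID, and the scope tokens are made-up examples, and `SimpleNamespace` stands in for the real check objects.

```python
from __future__ import annotations

from types import SimpleNamespace

# Reduced copy of the prefix table, for illustration only.
SECTOR_PREFIXES = {
    "FIN": {"financial_services", "bank"},
    "MED": {"healthcare", "medical"},
}


def extract_prefix(check_id: str) -> str | None:
    """Return the sector prefix of mc-XXX-NNN-AYY, or None if not sector-specific."""
    parts = (check_id or "").split("-")
    if len(parts) >= 2 and parts[0].lower() == "mc":
        prefix = parts[1].upper()
        if prefix in SECTOR_PREFIXES:
            return prefix
    return None


def is_out_of_scope(check, business_scope: set[str] | None) -> bool:
    """True only for sector-specific checks whose sector is not in scope."""
    prefix = extract_prefix(getattr(check, "id", "") or "")
    if prefix is None:
        return False  # generic checks are always kept
    scope_lc = {s.lower() for s in (business_scope or set())}
    return not (scope_lc & SECTOR_PREFIXES[prefix])


checks = [
    SimpleNamespace(id="mc-FIN-814-A03"),  # sector check, scope lacks finance
    SimpleNamespace(id="mc-DSE-001-A01"),  # hypothetical generic check
    SimpleNamespace(id="mc-MED-020-A02"),  # hypothetical, scope has healthcare
]
kept = [c.id for c in checks if not is_out_of_scope(c, {"energy", "Healthcare"})]
```

Note that an empty or missing `business_scope` drops every sector-specific check, since no token can match.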
@@ -0,0 +1,200 @@
"""Mail-V2 style system — single source of truth for all visual props.
Email rendering = inline styles only (most clients strip <style> tags
or sandbox them). Table-based layouts because flex/grid is unreliable
in Outlook. Font stack = email-safe (no web fonts).
Public helpers:
- `section(title, body_html, *, sev=None, anchor=None)` →
standardized full-width card with optional severity stripe + TOC
anchor
- `card(body_html, *, sev=None)` → smaller card inside a section
- `kpi(label, value, sub=None, sev=None)` → single KPI tile (used
in 4-column header grid)
- `kpi_row(items)` → evenly-sized row of KPIs
- `chip(text, sev)` → inline pill for severity / status
- `table(headers, rows, *, sev_col=None)` → consistent zebra table
`sev` is one of: "pass" | "fail" | "warn" | "info" | None (neutral)
"""
from __future__ import annotations
# ── Color palette ─────────────────────────────────────────────────
PAGE_BG = "#f8fafc"
CARD_BG = "#ffffff"
BORDER = "#e2e8f0"
TEXT = "#1e293b"
TEXT_MUTED = "#64748b"
HEADER_BG = "#0f172a"
HEADER_FG = "#f8fafc"
SEV = {
"pass": {"bg": "#dcfce7", "fg": "#15803d", "stripe": "#16a34a"},
"fail": {"bg": "#fee2e2", "fg": "#991b1b", "stripe": "#dc2626"},
"warn": {"bg": "#fef3c7", "fg": "#92400e", "stripe": "#f59e0b"},
"info": {"bg": "#dbeafe", "fg": "#1e40af", "stripe": "#3b82f6"},
}
NEUTRAL_STRIPE = "#cbd5e1"
# ── Typography ────────────────────────────────────────────────────
FONT = ("-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,Oxygen,"
"Ubuntu,sans-serif")
SZ_TITLE = "24px"
SZ_H2 = "18px"
SZ_H3 = "15px"
SZ_BODY = "14px"
SZ_SMALL = "12px"
# ── Layout ────────────────────────────────────────────────────────
MAX_W = "720px"
PAD_SECTION = "20px"
PAD_CARD = "14px"
RADIUS = "6px"
def _stripe(sev: str | None) -> str:
return SEV[sev]["stripe"] if sev in SEV else NEUTRAL_STRIPE
def section(title: str, body_html: str, *,
sev: str | None = None, anchor: str | None = None) -> str:
"""Top-level audit card — every report section uses this shell."""
stripe = _stripe(sev)
a = f'<a id="{anchor}"></a>' if anchor else ""
return (
f'{a}<table role="presentation" cellpadding="0" cellspacing="0" '
f'border="0" width="100%" style="margin:24px 0;border-collapse:'
f'separate;border-spacing:0;">'
f'<tr><td style="background:{CARD_BG};border:1px solid {BORDER};'
f'border-left:4px solid {stripe};border-radius:{RADIUS};'
f'padding:{PAD_SECTION};">'
f'<h2 style="margin:0 0 12px;font-family:{FONT};font-size:{SZ_H2};'
f'color:{TEXT};font-weight:600;">{title}</h2>'
f'<div style="font-family:{FONT};font-size:{SZ_BODY};color:{TEXT};'
f'line-height:1.5;">{body_html}</div>'
f'</td></tr></table>'
)
def card(body_html: str, *, sev: str | None = None) -> str:
"""Sub-card inside a section."""
stripe = _stripe(sev)
return (
f'<table role="presentation" cellpadding="0" cellspacing="0" '
f'border="0" width="100%" style="margin:8px 0;border-collapse:'
f'separate;border-spacing:0;">'
f'<tr><td style="background:{CARD_BG};border:1px solid {BORDER};'
f'border-left:3px solid {stripe};border-radius:{RADIUS};'
f'padding:{PAD_CARD};font-family:{FONT};font-size:{SZ_BODY};'
f'color:{TEXT};">'
f'{body_html}'
f'</td></tr></table>'
)
def kpi(label: str, value: str, sub: str | None = None,
sev: str | None = None) -> str:
"""One KPI tile. Used 4-in-a-row in the header."""
value_color = SEV[sev]["fg"] if sev in SEV else TEXT
sub_html = (
f'<div style="font-family:{FONT};font-size:{SZ_SMALL};'
f'color:{TEXT_MUTED};margin-top:2px;">{sub}</div>'
if sub else ""
)
return (
f'<td style="background:{CARD_BG};border:1px solid {BORDER};'
f'border-radius:{RADIUS};padding:14px;text-align:center;'
f'width:25%;vertical-align:top;">'
f'<div style="font-family:{FONT};font-size:{SZ_SMALL};'
f'color:{TEXT_MUTED};text-transform:uppercase;letter-spacing:.5px;">'
f'{label}</div>'
f'<div style="font-family:{FONT};font-size:26px;color:{value_color};'
f'font-weight:700;margin-top:6px;">{value}</div>'
f'{sub_html}'
f'</td>'
)
def kpi_row(items: list[dict]) -> str:
    """Render a row of 2-4 KPI tiles, equally sized."""
    # Build one <td> per KPI and join them with 8px spacer cells.
    cells = [
        kpi(it["label"], it["value"], it.get("sub"), it.get("sev"))
        for it in items
    ]
    interleaved = '<td style="width:8px;"></td>'.join(cells)
    return (
        f'<table role="presentation" cellpadding="0" cellspacing="0" '
        f'border="0" width="100%" style="margin:12px 0;border-collapse:'
        f'separate;border-spacing:0;"><tr>{interleaved}</tr></table>'
    )
def chip(text: str, sev: str | None = None) -> str:
"""Inline pill for severity / status."""
pal = SEV.get(sev or "", {"bg": "#f1f5f9", "fg": TEXT_MUTED})
return (
f'<span style="display:inline-block;background:{pal["bg"]};'
f'color:{pal["fg"]};font-family:{FONT};font-size:11px;font-weight:600;'
f'padding:2px 8px;border-radius:999px;'
f'text-transform:uppercase;letter-spacing:.3px;">{text}</span>'
)
def table(headers: list[str], rows: list[list[str]], *,
sev_col: int | None = None) -> str:
"""Render a consistent zebra table.
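`sev_col` is meant to mark the column that already contains a
chip(). It is currently unused: every cell is inserted as raw HTML
without escaping, so callers must escape untrusted text themselves.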
"""
th = "".join(
f'<th style="text-align:left;padding:8px 10px;font-family:{FONT};'
f'font-size:{SZ_SMALL};color:{TEXT_MUTED};text-transform:uppercase;'
f'letter-spacing:.5px;border-bottom:1px solid {BORDER};">{h}</th>'
for h in headers
)
body_rows = []
for i, r in enumerate(rows):
bg = "#ffffff" if i % 2 == 0 else "#f8fafc"
cells = "".join(
f'<td style="padding:8px 10px;font-family:{FONT};font-size:13px;'
f'color:{TEXT};border-bottom:1px solid {BORDER};vertical-align:top;">'
f'{c}</td>' for c in r
)
body_rows.append(f'<tr style="background:{bg};">{cells}</tr>')
body = "".join(body_rows)
return (
f'<table role="presentation" cellpadding="0" cellspacing="0" '
f'border="0" width="100%" style="border-collapse:collapse;'
f'margin:8px 0;background:{CARD_BG};border:1px solid {BORDER};'
f'border-radius:{RADIUS};overflow:hidden;">'
f'<thead><tr>{th}</tr></thead><tbody>{body}</tbody></table>'
)
def page_open(site_name: str) -> str:
return (
f'<div style="background:{PAGE_BG};padding:24px 16px;font-family:{FONT};">'
f'<div style="max-width:{MAX_W};margin:0 auto;">'
)
def page_close(check_id: str, build_sha: str) -> str:
return (
f'<div style="margin-top:32px;padding:16px;font-family:{FONT};'
f'font-size:11px;color:{TEXT_MUTED};text-align:center;">'
f'BreakPilot Compliance · check_id <code>{check_id}</code> · '
f'build <code>{build_sha}</code>'
f'</div>'
f'</div></div>'
)
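Reviewer note: the helpers in this file interpolate `title`, `body_html`, `text` and table cells into the markup as-is. A minimal sketch of escaping untrusted text before it reaches such a helper is shown below, assuming callers are responsible for escaping. `chip_sketch` is a simplified, hypothetical stand-in for `chip()`, not the function from this file.

```python
from html import escape


def chip_sketch(text: str, bg: str = "#f1f5f9", fg: str = "#64748b") -> str:
    """Simplified pill helper that, like chip(), inserts `text` unescaped."""
    return (
        f'<span style="display:inline-block;background:{bg};color:{fg};'
        f'padding:2px 8px;border-radius:999px;">{text}</span>'
    )


# Text taken from a crawled page or user input may contain markup.
untrusted = "<script>alert(1)</script>"

# Escape at the call site; the helper itself stays a pure formatter.
safe_chip = chip_sketch(escape(untrusted))
```

Escaping at the call site keeps the helpers usable for trusted, pre-rendered HTML (for example a `chip()` placed inside a table cell).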
@@ -128,9 +128,10 @@ _SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")
 # Quick anchor terms for retention sentences.
 _RETENTION_ANCHORS = (
     "speicherdauer", "speicherfrist", "speicher",
-    "aufbewahrungsdauer", "aufbewahrungsfrist",
-    "löschfrist", "löschung",
-    "gespeichert für", "wird gespeichert", "wird für",
+    "aufbewahrungsdauer", "aufbewahrungsfrist", "aufbewahr",
+    "löschfrist", "löschung", "gelöscht",
+    "gespeichert für", "wird gespeichert", "wird für", "werden für",
+    "in der regel", "bis zu",
     "retention", "expires", "expiration", "lifetime",
     "gültigkeit", "laufzeit",
 )
@@ -318,6 +319,74 @@ def compare_retention(
return out
def detect_intra_doc_contradictions(
dsi_text: str,
) -> list[dict]:
"""Find sentences in the SAME doc that claim different retention
values for what looks like the same data category.
Catches the Elli pattern:
"Logfiles werden 7 Tage gespeichert" + "Logfiles werden 30 Tage
aufbewahrt" → contradiction in one DSE.
Heuristic: group retention-bearing sentences by a category-anchor
keyword (logfile / chat / cookie / session / nutzungsdaten) and
report when ≥2 different day-values exist for the same group.
Severity is HIGH when the spread exceeds three times the smallest
value, otherwise MEDIUM.
"""
if not dsi_text:
return []
claims = extract_retention_claims(dsi_text)
if len(claims) < 2:
return []
anchors = (
("logfile", ("logfile", "log-file", "log file", "server-log")),
("chat", ("chat", "chatverlauf", "konversation")),
("cookie", ("cookie",)),
("session", ("session", "sitzung")),
("nutzungsdaten", ("nutzungsdaten", "usage data")),
)
by_group: dict[str, list[RetentionClaim]] = {}
for cl in claims:
if cl.days is None:
continue
sentence_lc = cl.sentence.lower()
for group, kws in anchors:
if any(k in sentence_lc for k in kws):
by_group.setdefault(group, []).append(cl)
break
findings: list[dict] = []
for group, group_claims in by_group.items():
days_set = {round(c.days, 1) for c in group_claims if c.days}
if len(days_set) < 2:
continue
values = sorted(days_set)
delta = values[-1] - values[0]
sev = "HIGH" if delta > values[0] * 3 else "MEDIUM"
findings.append({
"check_id": "TH-RETENTION-INTRA-001",
"category": group,
"severity": sev,
"severity_reason": "factually_wrong",
"values_days": values,
"claims": [c.sentence[:200] for c in group_claims[:3]],
"title": (
f"Speicherdauer-Widerspruch in DSE für '{group}': "
f"{values} Tage"
),
"norm": "DSGVO Art. 5 Abs. 1 lit. a (Transparenz)",
"action": (
f"In der DSE einheitlichen Wert für '{group}' angeben. "
"Aktuell mindestens zwei verschiedene Werte genannt — "
"ein Mandant kann die Frist nicht eindeutig erkennen."
),
})
return findings
def build_retention_theme_summary(
findings: list[dict],
) -> dict:
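Reviewer note: the severity rule in `detect_intra_doc_contradictions` is a simple spread test. The sketch below isolates it as a standalone function; `intra_doc_severity` is a hypothetical helper name, and the day values are illustrative apart from the 7-vs-30 case quoted in the docstring.

```python
from __future__ import annotations


def intra_doc_severity(day_values: list[float]) -> str | None:
    """Return HIGH/MEDIUM for conflicting retention values, None if consistent."""
    values = sorted({round(v, 1) for v in day_values if v})
    if len(values) < 2:
        return None  # zero or one distinct value: no contradiction
    delta = values[-1] - values[0]
    # HIGH when the spread is more than three times the smallest value.
    return "HIGH" if delta > values[0] * 3 else "MEDIUM"


# Elli pattern: 30 - 7 = 23, threshold 7 * 3 = 21, so 23 > 21.
elli = intra_doc_severity([7, 30])
# 30 - 14 = 16, threshold 14 * 3 = 42, so the spread stays below it.
mild = intra_doc_severity([14, 30])
```

With this rule, 7 vs 30 days lands just above the threshold, while 7 vs 28 days (spread 21, threshold 21) would stay MEDIUM because the comparison is strict.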
@@ -0,0 +1,192 @@
"""Cross-Doc Vendor-Consistency Check.
Coverage gap discovered against the Elli ground truth (2026-06-06):
the DSE declares "Google Vertex AI" for the customer-service chatbot,
but `/de/cookies` lists "Iadvize" as the chat provider. The
deterministic pipeline missed this direct contradiction.
This check looks for cross-doc provider mismatches per service type:
    service_type      typical wording in DSE / cookie text
    chatbot           "chatbot", "AI assistant", "Konversations", "Live-Chat"
    analytics         "Analytics", "Analyse"
    tag_manager       "Tag Manager", "GTM"
    marketing_pixel   "Pixel", "Tracking-Pixel"
    cdn               "CDN", "Content Delivery"
    consent_mgmt      "Consent Management", "CMP"
Matching is done on the provider keywords in `_PROVIDERS`, not on the
service wording above. For each service type, extract the provider
name(s) mentioned in the DSE and in the cookie/cookie-policy text.
When both documents name a provider but the sets do not overlap, emit
a HIGH finding (transparency contradiction). A provider named in only
one document yields MEDIUM (DSE only) or HIGH (cookie page only).
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass
logger = logging.getLogger(__name__)
# Known providers per service type. Keep generous; we'd rather
# detect Iadvize-vs-Vertex than under-detect.
_PROVIDERS = {
"chatbot": [
("Google Vertex AI", ["vertex ai", "google vertex", "vertex-ai"]),
("OpenAI", ["openai", "gpt-4", "chatgpt"]),
("Anthropic Claude", ["anthropic", "claude.ai"]),
("Iadvize", ["iadvize", "i-advize"]),
("Intercom", ["intercom"]),
("Zendesk", ["zendesk"]),
("Drift", ["drift.com", "drift chat"]),
("Userlike", ["userlike"]),
("Tidio", ["tidio"]),
("LivePerson", ["liveperson"]),
("Salesforce Einstein", ["einstein bot", "salesforce einstein"]),
("HubSpot", ["hubspot chat", "hubspot conversation"]),
("Microsoft Copilot", ["copilot", "azure openai"]),
("Mistral AI", ["mistral ai", "mistral.ai"]),
("Hugging Face", ["hugging face", "huggingface"]),
],
"analytics": [
("Google Analytics", ["google analytics", "ga4", "_ga ", "_ga,",
"_ga\""]),
("Matomo", ["matomo", "piwik"]),
("Plausible", ["plausible"]),
("Etracker", ["etracker"]),
("Adobe Analytics", ["adobe analytics", "omniture"]),
("Mixpanel", ["mixpanel"]),
("Heap", ["heap analytics"]),
("Amplitude", ["amplitude.com", "amplitude analytics"]),
],
"tag_manager": [
("Google Tag Manager", ["google tag manager", "gtm", "googletagmanager"]),
("Matomo Tag Manager", ["matomo tag", "mtm"]),
("Tealium", ["tealium"]),
("Adobe Launch", ["adobe launch"]),
],
"marketing_pixel": [
("Meta Pixel", ["meta pixel", "facebook pixel", "_fbp"]),
("LinkedIn Insight Tag", ["linkedin insight"]),
("TikTok Pixel", ["tiktok pixel"]),
("X Pixel", ["twitter pixel", "x pixel"]),
("Pinterest Tag", ["pinterest tag"]),
],
"cdn": [
("Cloudflare", ["cloudflare"]),
("Akamai", ["akamai"]),
("Fastly", ["fastly"]),
("AWS CloudFront", ["cloudfront"]),
],
"consent_mgmt": [
("Usercentrics", ["usercentrics"]),
("OneTrust", ["onetrust", "cookiepro"]),
("Cookiebot", ["cookiebot"]),
("Sourcepoint", ["sourcepoint"]),
("Klaro", ["klaro!"]),
],
}
@dataclass
class ProviderMatch:
service_type: str
canonical: str
in_dse: bool
in_cookie: bool
def _find_providers(text: str, service_type: str) -> set[str]:
text_lc = (text or "").lower()
if not text_lc:
return set()
out: set[str] = set()
for canonical, kws in _PROVIDERS.get(service_type, []):
for kw in kws:
if kw in text_lc:
out.add(canonical)
break
return out
def check_vendor_consistency(state: dict) -> list[dict]:
"""Compare provider mentions across DSE and cookie-policy text.
Returns a list of finding dicts, one per service_type with a
mismatch. Empty list when there are no contradictions.
"""
doc_texts = state.get("doc_texts") or {}
dse_text = doc_texts.get("dse") or ""
cookie_text = doc_texts.get("cookie") or ""
if not dse_text or not cookie_text:
return []
findings: list[dict] = []
for service_type in _PROVIDERS:
dse_set = _find_providers(dse_text, service_type)
cookie_set = _find_providers(cookie_text, service_type)
if not dse_set and not cookie_set:
continue
# Disagreement when both name a provider but no overlap.
if dse_set and cookie_set and not (dse_set & cookie_set):
findings.append({
"check_id": "VENDOR-CONSISTENCY-001",
"service_type": service_type,
"severity": "HIGH",
"severity_reason": "factually_wrong",
"dse_providers": sorted(dse_set),
"cookie_providers": sorted(cookie_set),
"title": (
f"{service_type.replace('_', '-').title()}: "
f"DSE nennt {', '.join(sorted(dse_set))}, "
f"Cookies-Seite nennt {', '.join(sorted(cookie_set))}"
),
"norm": "DSGVO Art. 13 + Art. 5 Abs. 1 lit. a (Transparenz)",
"action": (
"DSE und Cookie-Richtlinie auf denselben Provider "
"abgleichen — entweder DSE ist veraltet oder die "
"Cookie-Seite nennt einen ausgewechselten Provider."
),
})
elif dse_set and not cookie_set:
findings.append({
"check_id": "VENDOR-CONSISTENCY-002",
"service_type": service_type,
"severity": "MEDIUM",
"severity_reason": "incomplete",
"dse_providers": sorted(dse_set),
"cookie_providers": [],
"title": (
f"{service_type.replace('_', '-').title()}: "
f"DSE nennt {', '.join(sorted(dse_set))} — auf der "
"Cookies-Seite nicht erwähnt"
),
"norm": "DSGVO Art. 13 + EDPB Cookie-Sweep",
"action": (
f"Provider {', '.join(sorted(dse_set))} auf der "
"Cookies-Seite ergänzen — Cookie-Tabelle prüfen."
),
})
elif cookie_set and not dse_set:
findings.append({
"check_id": "VENDOR-CONSISTENCY-003",
"service_type": service_type,
"severity": "HIGH",
"severity_reason": "missing",
"dse_providers": [],
"cookie_providers": sorted(cookie_set),
"title": (
f"{service_type.replace('_', '-').title()}: "
f"Cookies-Seite nennt {', '.join(sorted(cookie_set))} "
"— in DSE nicht deklariert"
),
"norm": "DSGVO Art. 13 Abs. 1 lit. e Empfängerkategorien",
"action": (
f"Provider {', '.join(sorted(cookie_set))} in der DSE "
"als Empfänger benennen + Zweck + Rechtsgrundlage."
),
})
if findings:
logger.info("vendor-consistency: %d findings", len(findings))
return findings
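Reviewer note: the three finding IDs follow directly from set comparisons on the detected providers. A standalone sketch with a reduced provider table is shown below; `classify` is a hypothetical helper and the sample sentences are invented, only the Vertex AI vs Iadvize pairing comes from the module docstring.

```python
from __future__ import annotations

# Reduced copy of the provider table, for illustration only.
PROVIDERS = {
    "chatbot": [
        ("Google Vertex AI", ["vertex ai", "google vertex"]),
        ("Iadvize", ["iadvize", "i-advize"]),
    ],
}


def find_providers(text: str, service_type: str) -> set[str]:
    """Return canonical provider names whose keywords occur in the text."""
    text_lc = (text or "").lower()
    return {
        canonical
        for canonical, kws in PROVIDERS.get(service_type, [])
        if any(kw in text_lc for kw in kws)
    }


def classify(dse_text: str, cookie_text: str, service_type: str) -> str | None:
    """Map the DSE/cookie provider sets to a finding ID, or None if consistent."""
    dse = find_providers(dse_text, service_type)
    cookie = find_providers(cookie_text, service_type)
    if dse and cookie and not (dse & cookie):
        return "VENDOR-CONSISTENCY-001"  # both name providers, no overlap
    if dse and not cookie:
        return "VENDOR-CONSISTENCY-002"  # DSE only
    if cookie and not dse:
        return "VENDOR-CONSISTENCY-003"  # cookie page only
    return None


conflict = classify("Unser Chatbot nutzt Google Vertex AI.",
                    "Live-Chat: Iadvize", "chatbot")
partial = classify("Chat via Vertex AI und Iadvize.",
                   "Live-Chat: Iadvize", "chatbot")
```

A partial overlap, as in `partial`, produces no finding: the check only fires when the two sets are disjoint or one side is empty.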
@@ -0,0 +1,213 @@
"""Smoke tests for mail-render V2."""
from unittest.mock import MagicMock
from compliance.services.mail_render_v2._compose import compose_v2
from compliance.services.mail_render_v2._cookie_inventory import (
build_cookie_inventory,
)
from compliance.services.mail_render_v2._style import (
card, chip, kpi_row, page_close, page_open, section, table,
)
def _mock_check_result(label, doc_type, url, comp, corr, n_checks, n_fail):
r = MagicMock()
r.label = label
r.doc_type = doc_type
r.url = url
r.completeness_pct = comp
r.correctness_pct = corr
r.error = ""
r.scenario = "fix"
checks = []
for i in range(n_checks):
c = MagicMock()
c.id = f"mc-{i}"
c.label = f"Check {i}"
c.passed = i >= n_fail
c.severity = "HIGH" if i < n_fail else "LOW"
c.skipped = False
c.hint = f"Hint {i}" if i < n_fail else ""
c.regulation = "DSGVO"
c.level = 2
c.matched_text = ""
checks.append(c)
r.checks = checks
return r
def _full_state():
return {
"check_id": "abc12345",
"site_name": "Example AG",
"domain": "example.de",
"doc_count": 5,
"results": [
_mock_check_result("Impressum", "impressum",
"https://example.de/impressum", 95, 92, 10, 1),
_mock_check_result("Datenschutzerklärung", "dse",
"https://example.de/dse", 80, 63, 20, 5),
],
"total_findings": 6,
"cmp_vendors": [
{"name": "Google", "source": "table_crawled",
"cookies": [
{"name": "_ga", "category": "Statistik",
"duration": "14 Monate"},
{"name": "_gid", "category": "Statistik",
"duration": "24h"},
]},
{"name": "Meta", "source": "html_table",
"cookies": [
{"name": "_fbp", "category": "Marketing",
"duration": "3 Monate"},
]},
],
"banner_result": {
"banner_detected": True,
"banner_provider": "Cookiebot",
"cookies_detailed": [
{"name": "_ga", "domain": "example.de"},
{"name": "_fbp", "domain": "facebook.net"},
{"name": "undocumented_pixel", "domain": "tracker.com"},
],
"banner_checks": {"violations": [{"id": "v1"}]},
},
"cookie_audit": {
"declared_count": 3,
"browser_count": 3,
"undeclared_in_browser": [{"name": "undocumented_pixel"}],
"compliant": [{"name": "_ga"}, {"name": "_fbp"}],
},
"scorecard": {"totals": {"pct": 72}},
"retention_findings": [
{"matches": False, "cookie_name": "_ga", "vendor_name": "Google",
"severity": "HIGH", "severity_reason": "factually_wrong",
"mismatch_type": "dsi_under_actual",
"dsi_days": 180, "table_days": 420, "actual_days": 420,
"diff_days": 240},
{"matches": True, "cookie_name": "_fbp", "severity": None,
"severity_reason": None},
],
"retention_theme_summary": {
"theme_id": "TH-RETENTION", "total": 2, "passed": 1,
"failed": 1, "incomplete": 0, "pct": 50,
"by_severity": {"HIGH": 1}, "by_mismatch_type": {},
"top_fails": [],
},
"reachability_finding": {
"check_id": "COOKIE-CONSENT-UX-001",
"passed": False, "severity": "HIGH",
"severity_reason": "missing",
"notes": ["no consent-manager link in footer"],
"reopen_anchor": None,
"anchors_total": 0,
},
"cookie_evidence_slices": [{"idx": 0}, {"idx": 1}],
"cookie_evidence_meta": {"url": "https://example.de/cookie"},
"audit_quality_findings": [
{"severity": "MEDIUM", "title": "Cookie-URL nicht erreichbar",
"message": "Auto-Discovery hat keine Alternative gefunden."},
],
}
class TestStyleHelpers:
def test_section_wraps_with_title(self):
out = section("My Section", "<p>body</p>")
assert "My Section" in out
assert "<p>body</p>" in out
def test_chip_renders_text(self):
out = chip("FAIL", "fail")
assert "FAIL" in out
def test_table_basic(self):
out = table(["A", "B"], [["1", "2"], ["3", "4"]])
assert "<thead>" in out
assert ">1<" in out and ">4<" in out
def test_kpi_row_4(self):
out = kpi_row([
{"label": "Score", "value": "92%", "sev": "pass"},
{"label": "Findings", "value": "3"},
{"label": "Docs", "value": "5/7"},
{"label": "Vendors", "value": "12"},
])
assert "Score" in out and "92%" in out
def test_card_with_sev(self):
out = card("inner", sev="warn")
assert "inner" in out
def test_page_wraps(self):
head = page_open("Foo")
tail = page_close("abc123", "deadbee")
assert "Foo" not in head # site_name not in open shell
assert "abc123" in tail and "deadbee" in tail
class TestCookieInventory:
def test_merge_declared_and_browser(self):
st = _full_state()
rows, summary = build_cookie_inventory(st)
assert summary["total"] >= 3
names = [r["name"].lower() for r in rows]
assert "_ga" in names
assert "undocumented_pixel" in names
def test_status_undoc_for_browser_only(self):
st = _full_state()
rows, _ = build_cookie_inventory(st)
undoc = next(r for r in rows
if r["name"].lower() == "undocumented_pixel")
assert undoc["status_code"] == "UNDOC"
assert undoc["status_sev"] == "fail"
def test_status_ok_for_compliant(self):
st = _full_state()
rows, _ = build_cookie_inventory(st)
ga = next(r for r in rows if r["name"].lower() == "_ga")
assert ga["status_code"] == "OK"
def test_empty_state(self):
rows, summary = build_cookie_inventory({})
assert rows == []
assert summary["total"] == 0
class TestComposeV2:
def test_full_render(self):
st = _full_state()
html = compose_v2(st)
# Header
assert "Example AG" in html
assert "example.de" in html
# KPIs
assert "72%" in html
# Critical
assert "1. Kritische Befunde" in html
# Per Doc
assert "Impressum" in html
assert "Datenschutzerklärung" in html
# Per Theme
assert "Cookie-Inventar" in html
assert "UNDOC" in html
# Reachability
assert "Mobile Reachability" in html
# Retention
assert "TH-RETENTION" in html
# Caveats
assert "Cookie-URL nicht erreichbar" in html
# Attachments
assert "evidence-abc12345" in html
def test_no_critical_when_clean(self):
st = _full_state()
st["results"] = []
st["reachability_finding"] = {"passed": True, "severity": None,
"notes": []}
st["retention_findings"] = []
html = compose_v2(st)
assert "Keine HIGH/CRITICAL-Befunde" in html
@@ -0,0 +1,425 @@
{
"site": "elli.eco",
"crawled_at": "2026-06-06",
"crawler": "BreakPilot-Compliance Ground-Truth crawl via WebFetch",
"notes_on_url_structure": [
"Canonical privacy policy lives at /de/datenschutz (NOT /de/datenschutzerklaerung — that path 404s).",
"Canonical cookie page lives at /de/cookies (NOT /de/cookie-richtlinie — that path 404s).",
"Canonical terms page lives at /de/nutzungsbedingungen (NOT /de/agb — that path 404s).",
"Withdrawal page: /de/widerruf and /de/widerrufsbelehrung BOTH 404. No standalone Widerrufsbelehrung discovered on the public-facing footer.",
"Footer has a 'Datenschutz-Einstellungen' label, but /de/datenschutz-einstellungen 404s as a URL — the link is a JS-triggered CMP-reopen, not a page. This is relevant for the reachability finding.",
"/de/ (with trailing slash) returns HTTP 303 redirect; homepage canonical is /de/startseite."
],
"docs": {
"home": {
"url": "https://www.elli.eco/de/startseite",
"exists": true,
"status": 200,
"word_count_approx": 1300,
"key_data": {
"cookie_banner_on_entry": true,
"cmp_provider_visible": "not explicitly identifiable from rendered markup (no Usercentrics/OneTrust/Cookiebot/TrustArc fingerprint exposed to crawler)",
"consent_banner_button_labels": ["ablehnen", "akzeptieren"],
"ai_chatbot_widget_visible": false,
"ai_chatbot_disclosure_near_widget": null,
"footer_links": [
"Impressum",
"Datenschutz",
"EU Data Act",
"Datenschutz-Einstellungen",
"Nutzungsbedingungen",
"Cookies",
"Barrierefreiheit",
"Karriere",
"Newsroom",
"Mission Statement",
"Hinweisgebersystem",
"Kontakt",
"Ladetarife",
"Volkswagen Naturstrom",
"FAQ für Privatkunden",
"Fleet Charging",
"Wallbox",
"Flexpole - Schnellladesäule",
"Charging Site Management",
"Downloads",
"Partner FAQs"
]
}
},
"impressum": {
"url": "https://www.elli.eco/de/impressum",
"exists": true,
"status": 200,
"word_count_approx": 160,
"key_data": {
"entities_listed": [
{
"legal_name": "Volkswagen Group Charging GmbH",
"address": "Karl-Liebknecht-Str. 32, 10178 Berlin",
"register_court": "Amtsgericht Charlottenburg",
"hrb": "HRB 208967 B",
"vat_id": null,
"managing_directors": ["Giovanni Palazzo (CEO)", "Mark Möller (CTO)", "Dr. Tobias Canz (CFO)", "Anja Christmann (CHRO)"],
"phone": "00800 3554 1111",
"email": "info@elli.eco",
"content_responsible_mstv_section_18": "Giovanni Palazzo, Karl-Liebknecht-Str. 32, 10178 Berlin"
},
{
"legal_name": "Elli Mobility GmbH",
"address": "Karl-Liebknecht-Str. 32, 10178 Berlin",
"register_court": "Amtsgericht Charlottenburg",
"hrb": "HRB 274616 B",
"vat_id": "DE814424009",
"managing_directors": ["Joschi Jennermann", "Sebastian Steffen"],
"phone": "00800 00002030",
"email": "ellimobility@elli.eco",
"content_responsible_mstv_section_18": "Joschi Jennermann und Sebastian Steffen, Karl-Liebknecht-Str. 32, 10178 Berlin"
}
],
"dpo_named_in_impressum": false,
"dpo_contact_in_impressum": null,
"vat_id_for_vw_group_charging": null,
"completeness_section_5_tmg": "PARTIAL — VW Group Charging GmbH entry lacks a USt-IdNr."
}
},
"dse": {
"url": "https://www.elli.eco/de/datenschutz",
"exists": true,
"status": 200,
"word_count_approx": 18500,
"alt_paths_tried": {
"/de/datenschutzerklaerung": "404",
"/de/datenschutzerklaerung#cookies": "404"
},
"key_data": {
"controller": [
{"name": "Volkswagen Group Charging GmbH", "address": "Karl-Liebknecht-Str. 32, 10178 Berlin", "hrb": "HRB 208967 B"},
{"name": "Elli Mobility GmbH", "address": "Karl-Liebknecht-Str. 32, 10178 Berlin", "hrb": "HRB 274616 B"}
],
"dsb_named": true,
"dsb_contact": {
"address": "Mollstraße 1, 10178 Berlin",
"email": "privacy@elli.eco",
"note": "DPO disclosed inside the DSE, NOT in the Impressum (separate finding)."
},
"retention_claims": [
{"sentence": "Log files: nach 7 Tagen gelöscht", "days": 7, "subject": "log_files_v1"},
{"sentence": "Log files: IP addresses and access data retained for 30 days", "days": 30, "subject": "log_files_v2", "note": "Inconsistent with the 7-day claim — TWO different retention values appear in the policy."},
{"sentence": "Kontaktanfragen: 6 Monate nach der Beantwortung Ihrer Anfrage", "months": 6, "subject": "contact_inquiries"},
{"sentence": "Google Analytics: 14 Monate nach der Bereitstellung der Daten", "months": 14, "subject": "google_analytics_user_id"},
{"sentence": "Chatbot: Die Speicherdauer der Daten beträgt grundsätzlich 6 Monate", "months": 6, "subject": "chatbot_vertex_ai"},
{"sentence": "B2B-Registrierung (unvollständig): sechs Monate lang", "months": 6, "subject": "b2b_registration_incomplete"},
{"sentence": "Kundenzufriedenheitsumfragen: nach fünf Jahren gelöscht", "years": 5, "subject": "customer_satisfaction_surveys"},
{"sentence": "Pseudonymisierte Produktdaten: nach 5 Jahren", "years": 5, "subject": "pseudonymised_product_data"},
{"sentence": "Bewerbungen (abgelehnt): 6 Monate nach Bekanntgabe der Entscheidung", "months": 6, "subject": "job_applications_rejected"},
{"sentence": "Facebook IP-Adressen: nach 90 Tagen gelöscht", "days": 90, "subject": "facebook_ip"},
{"sentence": "Telefonaufzeichnungen (QS): 30 Tage", "days": 30, "subject": "call_recordings_qa"},
{"sentence": "Pseudonymisierte Aufzeichnungen: 3 Jahre oder gemäß gesetzlicher Aufbewahrungspflicht", "years": 3, "subject": "call_recordings_pseudonymised"},
{"sentence": "Newsletter: bis zum Widerruf der Einwilligung", "days": null, "subject": "newsletter"},
{"sentence": "Produktregistrierung: bis zum Widerruf der Einwilligung", "days": null, "subject": "product_registration"}
],
"vendors_mentioned": [
"Google Analytics",
"Google Tag Manager",
"Google Ads / DoubleClick Floodlight",
"YouTube Analytics",
"Google Vertex AI",
"Google Forms",
"Meta (Facebook Pixel / Ads Manager)",
"Instagram Insights",
"LinkedIn Campaign Manager",
"LinkedIn Insight Tag",
"Hotjar",
"Leadinfo",
"Salesforce",
"Webflow",
"Adyen",
"CRIF GmbH",
"Arvato Distribution GmbH",
"HERE Global B.V.",
"Stadia Maps",
"Iadvize"
],
"third_country_transfers": [
{"vendor": "Google Analytics", "country": "US", "mechanism": "Data Privacy Framework (DPF)"},
{"vendor": "Google Ads / Conversion Tracking", "country": "US", "mechanism": "DPF"},
{"vendor": "YouTube", "country": "US", "mechanism": "DPF"},
{"vendor": "Meta (Facebook/Instagram)", "country": "US", "mechanism": "DPF"},
{"vendor": "Salesforce", "country": "US", "mechanism": "Standard Contractual Clauses (SCCs)"},
{"vendor": "Webflow", "country": "US", "mechanism": "SCCs (implied — vendor seated in San Francisco)"}
],
"ai_chatbot": {
"present": true,
"provider": "Google Vertex AI",
"processor": "Google Ireland Limited, Gordon House, Barrow Street, Dublin 4, Irland",
"retention_claim": "6 Monate",
"literal_quote": "Im Rahmen der Nutzung des Chatbots werden lediglich IT- und pseudonymisierte Nutzungsdaten verarbeitet... Die Speicherdauer der Daten beträgt grundsätzlich 6 Monate.",
"legal_basis": "Art. 6 Abs. 1 lit. f DSGVO (berechtigtes Interesse)",
"consent_collected_pre_interaction": false,
"ai_act_disclosure_in_chat_ui": "unknown — not observable on /de/startseite because the chatbot widget was NOT rendered to the crawler. Live UI may differ."
},
"legal_basis_per_tool": {
"log_files": "Art. 6 Abs. 1 lit. f DSGVO",
"contact_form": "Art. 6 Abs. 1 lit. b + lit. f DSGVO",
"test_project_enrollment": "Art. 6 Abs. 1 lit. a DSGVO",
"webshop_orders": "Art. 6 Abs. 1 lit. b DSGVO",
"payment_adyen": "Art. 6 Abs. 1 lit. b DSGVO",
"chatbot_vertex_ai": "Art. 6 Abs. 1 lit. f DSGVO",
"newsletter": "Art. 6 Abs. 1 lit. a DSGVO",
"satisfaction_surveys": "Art. 6 Abs. 1 lit. f DSGVO",
"technical_cookies": "Art. 6 Abs. 1 lit. f DSGVO",
"non_technical_cookies": "Art. 6 Abs. 1 lit. a DSGVO",
"google_analytics": "Art. 6 Abs. 1 lit. a DSGVO",
"social_media_tracking": "Art. 6 Abs. 1 lit. f DSGVO",
"job_applications": "Art. 6 Abs. 1 lit. b DSGVO"
},
"cookie_categories_in_dse": [
"Technisch notwendige Cookies",
"Technisch nicht erforderliche Cookies (Komfort: Consent, Sprache, Warenkorb, LocalStorage)",
"Analyse-Cookies (Google Analytics)",
"Tracking / Remarketing (Google Ads, LinkedIn, Facebook, Instagram, YouTube)"
]
}
},
"agb": {
"url": "https://www.elli.eco/de/nutzungsbedingungen",
"exists": true,
"status": 200,
"word_count_approx": 520,
"alt_paths_tried": {
"/de/agb": "404",
"/de/agb/": "404",
"/de/agb-privatkunden": "404"
},
"key_data": {
"title": "Allgemeine Nutzungsbedingungen",
"stand": "Dezember 2018",
"contracting_party": "Volkswagen Group Charging GmbH",
"key_sections": ["Haftungsausschlüsse", "Drittinhalte / Verlinkung", "Urheber- und Markenrecht", "Untersagte Nutzung", "Streitbeilegung (Ablehnung Verbraucherschlichtung mit Ausnahme Energieliefervertrag)"],
"references_to_data_protection": false,
"references_to_ai_or_automated_decisions": false,
"withdrawal_rights_mentioned": false,
"staleness_finding": "Dated 'Dezember 2018': about 7.5 years old at crawl time; several rounds of DSGVO practice changes plus the AI Act and the DSA have come into force since."
}
},
"widerruf": {
"url_canonical_attempted": "https://www.elli.eco/de/widerrufsbelehrung",
"exists": false,
"status": 404,
"alt_paths_tried": {
"/de/widerruf": "404",
"/de/widerrufsbelehrung-privatkunden": "404"
},
"reason": "No standalone Widerrufsbelehrung could be found as a public page on the DE domain. The footer contains NO withdrawal link. The Widerrufsbelehrung may appear only in the shop checkout flow or as a PDF, but from a pure site-crawler perspective it is NOT linked.",
"compliance_implication": "For a B2C webshop (Ladetarife, Wallbox, Volkswagen Naturstrom), § 312d BGB + Art. 246a EGBGB require the Widerrufsbelehrung to be permanently and easily accessible. The missing footer link is an indication of a violation."
},
"cookie_richtlinie": {
"url_canonical_attempted": "https://www.elli.eco/de/cookie-richtlinie",
"exists": false,
"status": 404,
"alt_path_found": "https://www.elli.eco/de/cookies",
"reason": "Path /de/cookie-richtlinie returns 404. /de/cookies exists instead (status 200); the footer label reads 'Cookies', not 'Cookie-Richtlinie'."
},
"cookies_page": {
"url": "https://www.elli.eco/de/cookies",
"exists": true,
"status": 200,
"key_data": {
"categories": ["Session-Cookies", "Permanente / Protokoll-Cookies", "Drittanbieter-Cookies"],
"cookie_table_excerpt": [
{"name": "cookie_consent", "type": "Protokoll", "duration": "3 Monate"},
{"name": "_ga", "vendor": "Google Analytics", "duration": "24 Monate"},
{"name": "_gid", "vendor": "Google Analytics", "duration": "24 Stunden"},
{"name": "_gat", "vendor": "Google Analytics", "duration": "10 Minuten"},
{"name": "_hjid / _hjClosedSurveyInvites / _hjDonePolls", "vendor": "Hotjar", "duration": "365 Tage"},
{"name": "fr / fp / _fbp / _fbc", "vendor": "Facebook", "duration": "90 Tage bis 2 Jahre"},
{"name": "sessionStorage chatbot items", "vendor": "Iadvize?", "duration": "Sitzung"}
],
"vendors_on_cookies_page": ["Google Analytics", "Hotjar", "Facebook", "Iadvize"],
"vertex_ai_disclosed_here": false,
"consistency_with_dse": "INCONSISTENT: the cookies page names Iadvize as chat provider; the DSE names Google Vertex AI as chatbot provider. Duplicate / contradictory chat-stack statement."
}
}
},
"expected_findings": [
{
"id": "COOKIE-CONSENT-UX-001",
"severity": "HIGH",
"title": "Mobile footer reachability for consent reopen is insufficient",
"art_dsgvo": "Art. 7 Abs. 3 DSGVO (withdrawing consent must be as easy as giving it)",
"art_ttdsg": "§ 25 TTDSG / TDDDG",
"evidence": "The footer contains 'Datenschutz-Einstellungen' as a label, but /de/datenschutz-einstellungen returns 404, so the link is a JS-triggered CMP reopen. It is not functional in the mobile view, with JS blockers, or from the crawler's view. In addition, the 'Cookies' link opens a static page, NOT a consent reopen.",
"expected_pass": false
},
{
"id": "COOKIE-CONSENT-UX-002",
"severity": "MEDIUM",
"title": "Consent banner button symmetry is fine ('ablehnen' vs 'akzeptieren'), but the CMP provider is not openly identifiable",
"evidence": "Both buttons appear with equal prominence in the markup. The CMP fingerprint (Usercentrics/OneTrust/Cookiebot) cannot be extracted, so the cookies section of the DSE lacks transparency about the CMP in use.",
"expected_pass": "PARTIAL"
},
{
"id": "TH-RETENTION-001",
"severity": "MEDIUM",
"title": "Contradictory retention period for log files in the DSE",
"evidence": "The privacy policy states log-file retention as 7 days in one place and 30 days in another. One of the two is wrong or outdated.",
"expected_pass": false
},
{
"id": "TH-RETENTION-002",
"severity": "LOW",
"title": "Chatbot retention of '6 Monate grundsätzlich' without differentiation by data category",
"evidence": "The Vertex AI chatbot stores 'IT- und pseudonymisierte Nutzungsdaten' for a flat 6 months. No graduation by data category (prompt/output/metadata).",
"expected_pass": "PARTIAL"
},
{
"id": "VENDOR-CONSISTENCY-001",
"severity": "HIGH",
"title": "Chatbot provider contradictory between the cookies page and the DSE",
"evidence": "/de/cookies names Iadvize as chat provider; /de/datenschutz names Google Vertex AI. Either a dual stack without disclosure of both processors, or an outdated cookies page. Violates the transparency requirement of Art. 13 DSGVO.",
"expected_pass": false
},
{
"id": "AI-ACT-TRANSPARENCY-001",
"severity": "HIGH",
"title": "AI Act Art. 50 transparency obligation for the chatbot not recognisably fulfilled",
"evidence": "The chatbot uses Google Vertex AI (LLM). Art. 50 Abs. 1 AI Act requires that end users be informed that they are interacting with an AI system. This is mentioned in the DSE, but a pre-interaction notice in the chat UI is NOT verifiable (widget not active in the crawl).",
"expected_pass": "UNKNOWN-LIKELY-FAIL"
},
{
"id": "AI-ACT-RISK-001",
"severity": "MEDIUM",
"title": "Legal basis for the Vertex AI chatbot is legitimate interest rather than consent",
"evidence": "Art. 6 Abs. 1 lit. f is stated. Given Vertex AI processing with a potential US-transfer component and the profiling-like nature of LLM logging data, legitimate interest is questionable. Consent (lit. a) would be cleaner.",
"expected_pass": false
},
{
"id": "IMPRESSUM-001",
"severity": "MEDIUM",
"title": "USt-IdNr. missing for VW Group Charging GmbH in the Impressum",
"evidence": "Only Elli Mobility GmbH lists 'DE814424009'. VW Group Charging GmbH has no USt-IdNr.; § 5 Abs. 1 Nr. 6 DDG (formerly TMG) requires it to be stated where one exists.",
"expected_pass": false
},
{
"id": "IMPRESSUM-002",
"severity": "LOW",
"title": "DPO not named in the Impressum",
"evidence": "DPO (Mollstr. 1 / privacy@elli.eco) appears only in the DSE. Best practice (not strictly mandatory): link the DPO in the Impressum as well.",
"expected_pass": "BEST-PRACTICE"
},
{
"id": "WIDERRUFSBELEHRUNG-001",
"severity": "HIGH",
"title": "No Widerrufsbelehrung discoverable in the footer / on a public path",
"evidence": "Paths /de/widerruf, /de/widerrufsbelehrung and /de/widerrufsbelehrung-privatkunden all return 404. The footer contains no withdrawal link. For B2C shop components (Wallbox, Naturstrom) this violates § 312d BGB in conjunction with Art. 246a EGBGB (permanently and easily accessible).",
"expected_pass": false
},
{
"id": "TERMS-STALENESS-001",
"severity": "MEDIUM",
"title": "General terms of use (Allgemeine Nutzungsbedingungen) dated December 2018",
"evidence": "More than 7 years old; references neither the DSA nor the AI Act; the AI chatbot exists without any mention in the terms of use.",
"expected_pass": false
},
{
"id": "TRANSFER-001",
"severity": "MEDIUM",
"title": "DPF and SCCs mixed: transfer mechanism not named consistently per vendor",
"evidence": "Google/Meta rely on DPF; Salesforce on SCCs; Webflow is mentioned as US-seated but its mechanism is only implicit. Level of detail per vendor is inconsistent.",
"expected_pass": "PARTIAL"
},
{
"id": "URL-STRUCTURE-001",
"severity": "LOW",
"title": "Path inconsistency between footer label and URL slug",
"evidence": "'Cookie-Richtlinie' expected (404), but /de/cookies is what exists. 'AGB' expected (404), but /de/nutzungsbedingungen is what exists. External links / bookmarks break.",
"expected_pass": "PARTIAL"
}
],
"expected_vendors_in_dse": [
{"name": "Google Analytics", "country": "US", "ai_act_relevance": "none direct; profiling via _ga/_gid"},
{"name": "Google Tag Manager", "country": "US", "ai_act_relevance": "none"},
{"name": "Google Ads / DoubleClick Floodlight", "country": "US", "ai_act_relevance": "none"},
{"name": "Google Vertex AI", "country": "EU frontend / US fallback not ruled out", "ai_act_relevance": "HIGH: GPAI / LLM system under AI Act Art. 50 + 51, transparency obligation"},
{"name": "YouTube Analytics", "country": "US", "ai_act_relevance": "none"},
{"name": "Google Forms", "country": "US", "ai_act_relevance": "none"},
{"name": "Meta (Facebook Pixel)", "country": "US", "ai_act_relevance": "Profiling-relevant"},
{"name": "Instagram Insights", "country": "US", "ai_act_relevance": "none"},
{"name": "LinkedIn Insight Tag", "country": "US/IE", "ai_act_relevance": "none"},
{"name": "Hotjar", "country": "MT/EU", "ai_act_relevance": "Session replay: potential profiling component"},
{"name": "Leadinfo", "country": "DE/NL", "ai_act_relevance": "B2B identification, not AI"},
{"name": "Salesforce", "country": "US", "ai_act_relevance": "Einstein features, if active, require an AI assessment"},
{"name": "Webflow", "country": "US", "ai_act_relevance": "CMS, no AI"},
{"name": "Adyen", "country": "NL", "ai_act_relevance": "Fraud scoring potentially relevant"},
{"name": "CRIF GmbH", "country": "DE", "ai_act_relevance": "Credit scoring: automated decision-making under Art. 22 DSGVO!"},
{"name": "Arvato Distribution GmbH", "country": "DE", "ai_act_relevance": "none"},
{"name": "HERE Global B.V.", "country": "NL", "ai_act_relevance": "none"},
{"name": "Stadia Maps", "country": "US", "ai_act_relevance": "none"},
{"name": "Iadvize", "country": "FR", "ai_act_relevance": "Live chat; AI Act relevant if AI routing is active; conflicts with the Vertex AI statement"}
],
"expected_cookie_categories": [
"Technisch notwendige Cookies",
"Komfort-Cookies (Sprache, Warenkorb, Consent)",
"Statistik / Analyse (Google Analytics, Hotjar)",
"Marketing / Tracking (Google Ads, Meta, LinkedIn, YouTube, Instagram, Facebook Pixel)"
],
"expected_critical_issues": [
"Mobile reachability: the footer has NO hard 'open cookie settings' path; 'Datenschutz-Einstellungen' is only a JS trigger without URL fallback, and 'Cookies' links only to a static page.",
"AI Assistant: the DSE names Google Vertex AI with 6 months retention and legal basis Art. 6 Abs. 1 lit. f; NO pre-interaction consent, and the AI Act Art. 50 disclosure in the chat UI is not verified.",
"Vendor-stack contradiction: the cookies page says Iadvize, the DSE says Vertex AI; either a dual stack without clean separation or an outdated cookies page.",
"Widerrufsbelehrung not linked in the footer as a standalone page; non-compliant where B2C components exist.",
"Terms of use (Nutzungsbedingungen) dated December 2018, about 7.5 years old; reflect neither the AI Act nor the DSA.",
"Log-file retention contradictory (7 vs 30 days) within the same DSE.",
"VW Group Charging GmbH without USt-IdNr. in the Impressum (§ 5 Abs. 1 Nr. 6 DDG, formerly TMG).",
"DPO only in the DSE, not in the Impressum (best practice gap).",
"Footer labels 'Cookie-Richtlinie' / 'AGB' expected, but the real slugs are /de/cookies and /de/nutzungsbedingungen; external bookmarks break, and standard searches lose SEO value."
],
"ai_chatbot_analysis": {
"name_in_ui": "AI Assistant (presumed; widget not actively rendered in the crawl)",
"provider_disclosed_in_dse": "Google Vertex AI (Google Ireland Limited as processor)",
"alt_provider_disclosed_in_cookies_page": "Iadvize (contradictory)",
"retention_claim_in_dse": "6 months (as a rule)",
"legal_basis_in_dse": "Art. 6 Abs. 1 lit. f DSGVO (legitimate interest), not consent",
"user_warning_before_input": "Not observable in the crawl; pre-interaction disclosure status: UNKNOWN. To be checked in a live test of the web frontend.",
"ai_act_article_50_compliance": "PROBABLY INSUFFICIENT: notice only in the DSE rather than at the point of interaction.",
"transparency_score": "5/10: provider and retention are transparent, but the legal basis is weak, pre-chat disclosure is missing, and the vendor statement contradicts the cookies page."
},
"footer_reachability_analysis": {
"has_consent_reopen_link": false,
"has_consent_reopen_label": true,
"label_text": "Datenschutz-Einstellungen",
"label_behaviour": "JS trigger without its own URL; /de/datenschutz-einstellungen returns 404",
"footer_links_relevant": [
"Datenschutz (→ /de/datenschutz, static)",
"Datenschutz-Einstellungen (→ JS CMP reopen, no URL fallback)",
"Cookies (→ /de/cookies, static, NO reopen)",
"Impressum (→ /de/impressum)",
"Nutzungsbedingungen (→ /de/nutzungsbedingungen)"
],
"footer_widerruf_link_present": false,
"browser_deflection_text": "No notice texts observed; banner buttons are purely 'ablehnen' / 'akzeptieren' with no indication that withdrawal is possible later.",
"art_7_abs_3_compliance": "FAIL: withdrawal is not 'as easy as giving consent'; it depends on JavaScript and on finding a footer label in German-language legal jargon. Mobile reachability is weak.",
"ttdsg_section_25_compliance": "PARTIAL FAIL: consent is given via the banner, but the withdrawal mechanism is fragile."
},
"summary_for_breakpilot_audit_comparison": {
"total_pages_crawled_200": 5,
"total_pages_404": 4,
"key_url_corrections_needed_by_audit_tool": {
"datenschutzerklaerung_should_resolve_to": "/de/datenschutz",
"agb_should_resolve_to": "/de/nutzungsbedingungen",
"cookie-richtlinie_should_resolve_to": "/de/cookies",
"widerrufsbelehrung": "not found anywhere on public footer — flag as finding"
},
"high_severity_findings_count": 4,
"medium_severity_findings_count": 6,
"low_severity_findings_count": 3,
"must_detect_to_pass_benchmark": [
"COOKIE-CONSENT-UX-001",
"VENDOR-CONSISTENCY-001",
"AI-ACT-TRANSPARENCY-001",
"WIDERRUFSBELEHRUNG-001",
"TH-RETENTION-001"
]
}
}