d0e3621192
Mail Render V2 (compliance/services/mail_render_v2/): an 11-module subpackage
that produces a uniform audit-mail output with:
- Header + KPI tiles (Score / Findings / Docs / Vendors)
- TOC + jump links
- 3-bucket split: critical findings / manual review / internal reminders
- Cookie inventory (name·vendor·category·storage duration·deletion deadline·country of establishment·source·status)
- Immediate-actions aggregator ("add country of establishment for 11 cookies")
- 24 legacy wrappers: all old build_*_html functions wrapped as V2 sections
- Scope filter: FIN/GOV/MED/INS/EDU/LEG dropped from reports when not relevant
- Hint/action dedup: no more duplicate sentences per card
Enabled via env MAIL_RENDER_V2=true (default: legacy renderer).
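A minimal sketch of how the MAIL_RENDER_V2 switch might be read. Only the variable name and the legacy default come from this commit message; the helper name `use_mail_render_v2` and the accepted truthy spellings are assumptions made for illustration.

```python
import os


def use_mail_render_v2(env=None) -> bool:
    """Return True when MAIL_RENDER_V2 is set to a truthy value.

    Hypothetical helper: the set of accepted spellings is an assumption.
    Anything else, including an unset variable, keeps the legacy renderer.
    """
    env = os.environ if env is None else env
    raw = env.get("MAIL_RENDER_V2", "")
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the process environment.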
5 new deterministic findings as phase D-2b/B4/B5/B6/B7/B8:
B4 vendor_consistency_check: cross-doc provider contradiction
(Elli: the DSE (privacy policy) names Vertex AI for the chatbot, /de/cookies names Iadvize → HIGH).
6 service types: chatbot/analytics/tag_manager/pixel/cdn/cmp.
B5 ai_act_transparency_check: AI Act Art. 50 transparency obligation
(Elli: Vertex AI present without pre-chat disclosure → HIGH).
Plus B5 extension: legal basis Art. 6(1)(f) for AI → MED
(recommend consent instead).
B6 cross_doc_dpo_check: DPO named in the DSE but not in the Impressum (legal notice) (LOW).
B7 doc_staleness_check: date extraction from DSE/AGB/Nutzungsbedingungen
(privacy policy / general terms / terms of use).
Cap: AGB/NB 3y, DSE 2y. Older → MEDIUM (Elli NB dated 2018 → HIGH).
B8 cmp_fingerprint_check: banner detected, but CMP provider generic
(no Usercentrics/OneTrust/Cookiebot/etc. → MED).
B3 extension detect_intra_doc_contradictions: contradictory
storage durations in the SAME doc (Elli: logfile 7d vs 30d → HIGH).
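The B7 age caps can be illustrated with a small sketch. It assumes the document date appears as "Stand: ..." (German for "as of") followed by a year, optionally preceded by day and month; the function name, the doc-type keys and the regex are hypothetical and not taken from doc_staleness_check. The escalation to HIGH mentioned for the Elli NB is not specified in this message, so the sketch only models the MEDIUM case.

```python
import re
from datetime import date

# Caps in years as stated in the commit message; the keys are assumed labels.
_MAX_AGE_YEARS = {"dse": 2, "agb": 3, "nb": 3}

# Assumed date formats: "Stand 2018", "Stand: 05/2018", "Stand: 15.03.2018".
_STAND_RE = re.compile(r"\bStand\b:?\s*(?:\d{1,2}[./]){0,2}(\d{4})\b", re.IGNORECASE)


def doc_age_finding(doc_type, text, today):
    """Return a MEDIUM finding dict when the document is older than its cap."""
    cap = _MAX_AGE_YEARS.get(doc_type)
    match = _STAND_RE.search(text or "")
    if cap is None or match is None:
        return None  # unknown doc type or no recognizable date
    year = int(match.group(1))
    age_years = today.year - year  # coarse: compares calendar years only
    if age_years <= cap:
        return None
    return {
        "doc_type": doc_type,
        "year": year,
        "age_years": age_years,
        "severity": "MEDIUM",
    }
```

Comparing calendar years keeps the example short; a real check would likely use the full date when day and month are available.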
LLM plausibility phase (phase D-2b, finding_plausibility_check.py):
- Runs AFTER the MC pipeline, BEFORE the D3 render
- Prompt with example IDs + 3-phase mapping: exact-ID / position-fallback /
fuzzy-tail-match
- Stamps llm_title / llm_severity / llm_recommendation / llm_drop onto
every FAIL CheckItem
- V2 render shows a "🤖 LLM-Plausibility:" box per finding when stamped
- KNOWN ISSUE: qwen3:30b-a3b often returns empty content for format='json' +
8000-char-excerpt prompts. The pipeline continues with stamped=0. Task #16.
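One way the 3-phase mapping could work is sketched below. The commit message only names the three phases; the function name `map_llm_entries`, the `"id"` key, and the conditions under which each phase fires (position fallback only when the answer has exactly as many entries as there are checks, tail match only when exactly one candidate fits) are assumptions.

```python
def map_llm_entries(check_ids, llm_entries):
    """Map LLM answer entries to check IDs. Returns {check_id: entry}."""
    mapped = {}
    leftovers = []

    # Phase 1: exact ID match.
    for pos, entry in enumerate(llm_entries):
        llm_id = str(entry.get("id") or "").strip()
        if llm_id in check_ids and llm_id not in mapped:
            mapped[llm_id] = entry
        else:
            leftovers.append((pos, llm_id, entry))

    # Phase 2: position fallback, trusted only when the lengths line up.
    same_length = len(llm_entries) == len(check_ids)
    unresolved = []
    for pos, llm_id, entry in leftovers:
        target = check_ids[pos] if same_length else None
        if target is not None and target not in mapped:
            mapped[target] = entry
        else:
            unresolved.append((llm_id, entry))

    # Phase 3: fuzzy tail match, accepted only when it is unambiguous.
    for llm_id, entry in unresolved:
        tail = llm_id.upper()
        if len(tail) < 3:
            continue
        candidates = [
            cid for cid in check_ids
            if cid not in mapped and cid.upper().endswith(tail)
        ]
        if len(candidates) == 1:
            mapped[candidates[0]] = entry
    return mapped
```

Entries that survive none of the phases are simply left unmapped, which mirrors the "pipeline continues with stamped=0" behaviour described above.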
Coverage against the Elli ground truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch agent crawl):
- 4/4 HIGH findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
- 4/6 MEDIUM ✓
- 2/3 LOW ✓
- Total: 10/13 = 77% (up from 4/13 = 31%)
Remaining 3 gaps tracked as Task #17: IMPRESSUM-001 (multi-entity VAT ID),
TRANSFER-001 (vendor transfer mechanism DPF/SCC), TH-RETENTION-002 (AI retention
per data category).
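The coverage figures can be reproduced with a few lines of arithmetic. The per-severity counts are the ones listed above; the function name `coverage_summary` is hypothetical.

```python
def coverage_summary(found, expected):
    """Return (found_total, expected_total, rounded_percent)."""
    found_total = sum(found.values())
    expected_total = sum(expected.values())
    percent = round(100 * found_total / expected_total)
    return found_total, expected_total, percent


# Counts taken from the coverage list in this commit message.
found = {"HIGH": 4, "MEDIUM": 4, "LOW": 2}
expected = {"HIGH": 4, "MEDIUM": 6, "LOW": 3}
print(coverage_summary(found, expected))  # (10, 13, 77)
```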
V2 mail preview in Mailpit: 'v2all@local.test', subject '[V2 ALL] ELLI'.
Backend healthy; B1+B3+B4+B5+B6+B7+B8 are all live in the orchestrator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
193 lines
7.4 KiB
Python
"""Cross-Doc Vendor-Consistency Check.

Coverage gap discovered against the Elli ground truth (2026-06-06):
the DSE declares "Google Vertex AI" for the customer-service chatbot,
but `/de/cookies` lists "Iadvize" as the chat provider, a direct
contradiction the deterministic pipeline missed.

This check looks for cross-doc provider mismatches per service type:

    service_type     keywords searched in DSE / cookie text
    ──────────────   ───────────────────────────────────────
    chatbot          "chatbot", "AI assistant", "Konversations", "Live-Chat"
    analytics        "Analytics", "Analyse"
    tag_manager      "Tag Manager", "GTM"
    marketing_pixel  "Pixel", "Tracking-Pixel"
    cdn              "CDN", "Content Delivery"
    consent_mgmt     "Consent Management", "CMP"

For each service type, extract the provider name(s) mentioned in the
DSE and in the cookie/cookie-policy text. When DSE and cookie text
disagree → finding with severity HIGH (transparency contradiction).
A provider named on only one side is reported too: MEDIUM when it is
missing from the cookie page, HIGH when it is missing from the DSE.
"""

from __future__ import annotations

import logging
import re
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# Known providers per service type. Keep generous; we'd rather
# detect Iadvize-vs-Vertex than under-detect.
_PROVIDERS = {
    "chatbot": [
        ("Google Vertex AI", ["vertex ai", "google vertex", "vertex-ai"]),
        ("OpenAI", ["openai", "gpt-4", "chatgpt"]),
        ("Anthropic Claude", ["anthropic", "claude.ai"]),
        ("Iadvize", ["iadvize", "i-advize"]),
        ("Intercom", ["intercom"]),
        ("Zendesk", ["zendesk"]),
        ("Drift", ["drift.com", "drift chat"]),
        ("Userlike", ["userlike"]),
        ("Tidio", ["tidio"]),
        ("LivePerson", ["liveperson"]),
        ("Salesforce Einstein", ["einstein bot", "salesforce einstein"]),
        ("HubSpot", ["hubspot chat", "hubspot conversation"]),
        ("Microsoft Copilot", ["copilot", "azure openai"]),
        ("Mistral AI", ["mistral ai", "mistral.ai"]),
        ("Hugging Face", ["hugging face", "huggingface"]),
    ],
    "analytics": [
        ("Google Analytics", ["google analytics", "ga4", "_ga ", "_ga,",
                              "_ga\""]),
        ("Matomo", ["matomo", "piwik"]),
        ("Plausible", ["plausible"]),
        ("Etracker", ["etracker"]),
        ("Adobe Analytics", ["adobe analytics", "omniture"]),
        ("Mixpanel", ["mixpanel"]),
        ("Heap", ["heap analytics"]),
        ("Amplitude", ["amplitude.com", "amplitude analytics"]),
    ],
    "tag_manager": [
        ("Google Tag Manager", ["google tag manager", "gtm", "googletagmanager"]),
        ("Matomo Tag Manager", ["matomo tag", "mtm"]),
        ("Tealium", ["tealium"]),
        ("Adobe Launch", ["adobe launch"]),
    ],
    "marketing_pixel": [
        ("Meta Pixel", ["meta pixel", "facebook pixel", "_fbp"]),
        ("LinkedIn Insight Tag", ["linkedin insight"]),
        ("TikTok Pixel", ["tiktok pixel"]),
        ("X Pixel", ["twitter pixel", "x pixel"]),
        ("Pinterest Tag", ["pinterest tag"]),
    ],
    "cdn": [
        ("Cloudflare", ["cloudflare"]),
        ("Akamai", ["akamai"]),
        ("Fastly", ["fastly"]),
        ("AWS CloudFront", ["cloudfront"]),
    ],
    "consent_mgmt": [
        ("Usercentrics", ["usercentrics"]),
        ("OneTrust", ["onetrust", "cookiepro"]),
        ("Cookiebot", ["cookiebot"]),
        ("Sourcepoint", ["sourcepoint"]),
        ("Klaro", ["klaro!"]),
    ],
}


@dataclass
class ProviderMatch:
    service_type: str
    canonical: str
    in_dse: bool
    in_cookie: bool


def _find_providers(text: str, service_type: str) -> set[str]:
    """Return canonical provider names whose keywords occur in ``text``.

    Matching is a case-insensitive substring test against the keyword
    lists in ``_PROVIDERS``; the first hit per provider is enough.
    """
    text_lc = (text or "").lower()
    if not text_lc:
        return set()
    out: set[str] = set()
    for canonical, kws in _PROVIDERS.get(service_type, []):
        for kw in kws:
            if kw in text_lc:
                out.add(canonical)
                break
    return out


def check_vendor_consistency(state: dict) -> list[dict]:
    """Compare provider mentions across DSE and cookie-policy text.

    Returns a list of finding dicts, one per service_type with a
    mismatch. Empty list when there are no contradictions.
    """
    doc_texts = state.get("doc_texts") or {}
    dse_text = doc_texts.get("dse") or ""
    cookie_text = doc_texts.get("cookie") or ""
    if not dse_text or not cookie_text:
        return []

    findings: list[dict] = []
    for service_type in _PROVIDERS:
        dse_set = _find_providers(dse_text, service_type)
        cookie_set = _find_providers(cookie_text, service_type)
        if not dse_set and not cookie_set:
            continue
        # Disagreement when both name a provider but no overlap.
        if dse_set and cookie_set and not (dse_set & cookie_set):
            findings.append({
                "check_id": "VENDOR-CONSISTENCY-001",
                "service_type": service_type,
                "severity": "HIGH",
                "severity_reason": "factually_wrong",
                "dse_providers": sorted(dse_set),
                "cookie_providers": sorted(cookie_set),
                "title": (
                    f"{service_type.replace('_', '-').title()}: "
                    f"DSE nennt {', '.join(sorted(dse_set))} — "
                    f"Cookies-Seite nennt {', '.join(sorted(cookie_set))}"
                ),
                "norm": "DSGVO Art. 13 + Art. 5 Abs. 1 lit. a (Transparenz)",
                "action": (
                    "DSE und Cookie-Richtlinie auf denselben Provider "
                    "abgleichen — entweder DSE ist veraltet oder die "
                    "Cookie-Seite nennt einen ausgewechselten Provider."
                ),
            })
        elif dse_set and not cookie_set:
            findings.append({
                "check_id": "VENDOR-CONSISTENCY-002",
                "service_type": service_type,
                "severity": "MEDIUM",
                "severity_reason": "incomplete",
                "dse_providers": sorted(dse_set),
                "cookie_providers": [],
                "title": (
                    f"{service_type.replace('_', '-').title()}: "
                    f"DSE nennt {', '.join(sorted(dse_set))} — auf der "
                    "Cookies-Seite nicht erwähnt"
                ),
                "norm": "DSGVO Art. 13 + EDPB Cookie-Sweep",
                "action": (
                    f"Provider {', '.join(sorted(dse_set))} auf der "
                    "Cookies-Seite ergänzen — Cookie-Tabelle prüfen."
                ),
            })
        elif cookie_set and not dse_set:
            findings.append({
                "check_id": "VENDOR-CONSISTENCY-003",
                "service_type": service_type,
                "severity": "HIGH",
                "severity_reason": "missing",
                "dse_providers": [],
                "cookie_providers": sorted(cookie_set),
                "title": (
                    f"{service_type.replace('_', '-').title()}: "
                    f"Cookies-Seite nennt {', '.join(sorted(cookie_set))} "
                    "— in DSE nicht deklariert"
                ),
                "norm": "DSGVO Art. 13 Abs. 1 lit. e Empfängerkategorien",
                "action": (
                    f"Provider {', '.join(sorted(cookie_set))} in der DSE "
                    "als Empfänger benennen + Zweck + Rechtsgrundlage."
                ),
            })
    if findings:
        logger.info("vendor-consistency: %d findings", len(findings))
    return findings
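`_find_providers` matches keywords as plain substrings, so short aliases such as "gtm" or "mtm" can also hit inside longer words. The sketch below shows one possible alternative that only accepts a keyword when it is not surrounded by word characters. It is not part of this commit: the function name and the trimmed-down `_DEMO_PROVIDERS` table are hypothetical and exist only to keep the example self-contained.

```python
import re

# Trimmed-down copy of the provider table, for illustration only.
_DEMO_PROVIDERS = {
    "tag_manager": [
        ("Google Tag Manager", ["google tag manager", "gtm"]),
        ("Matomo Tag Manager", ["matomo tag", "mtm"]),
    ],
}


def find_providers_word_boundary(text, service_type, providers=_DEMO_PROVIDERS):
    """Like _find_providers, but a keyword must stand alone as a token."""
    text_lc = (text or "").lower()
    out = set()
    for canonical, kws in providers.get(service_type, []):
        for kw in kws:
            # Lookarounds reject hits that continue into a longer word.
            pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
            if re.search(pattern, text_lc):
                out.add(canonical)
                break
    return out
```

With this variant, "Wir nutzen GTM." still resolves to Google Tag Manager, while a word like "Amtmann" (which contains "mtm") no longer triggers Matomo Tag Manager. Keywords that deliberately rely on trailing punctuation, such as `"_ga "`, would need a second look before switching.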