feat(audit-mail): P58/P59c/P60b/P61/P62 — Mercedes-Cycle Phase 1 abgeschlossen
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped

P58  Anti-Audit-Detection robuster (script-domain + settings-spezifisch —
     war bereits im Code, jetzt sauber als completed dokumentiert).

P59c DACH-Custom-Cookies in compliance.cookie_library: Borlabs,
     etracker, Matomo/Piwik, Userlike, Cookiebot/Cookieyes/Usercentrics,
     Akamai/Cloudflare/Datadome Bot-Manager + HubSpot. 21 neue Eintraege
     (3 von 24 schon via Open-Cookie-Database vorhanden).
     Script: backend-compliance/scripts/seed_dach_cookies.py.

P60b Vendor-Pattern-Dedupe mit Fuzzy-Match (Jaccard >= 0.7) statt exakter
     Tuple-Equality. Vendors mit teilweise befuellten Feldern (z.B.
     Sitzland eingetragen) fallen nicht mehr aus der globalen Notice —
     Bug: Amazon/Psyma/Qualtrics hatten zuvor wiederholte per-row Actions.

P61  "Untergeschobene Cookies"-Erkennung — wenn ein deklarierter Vendor
     (z.B. Google Tag Manager) automatisch weitere mitbringt (GA + GCL_AU
     + DoubleClick), werden diese als separater Mail-Block (gelb) mit
     COOKIE/VENDOR-Badges + Quellen-Doku ausgewiesen. Neuer Service:
     compliance.services.vendor_package_cookies (8 Primary-Vendors mit
     je 2-4 implicit Cookies/Vendors).

P62  Marketing-Manager-Disclaimer "Was wir sehen / nicht sehen" als
     blauer Box-Block direkt unter dem Critical-Findings-Block. Erklaert
     Grenzen unseres Audits (Server-Side-Tracking, Vendor-interne
     Datenweitergabe, Cross-Page-Banner) und Risiko des Falschvertrauens
     in einen 100%-Score. Neuer Renderer: compliance.api.scope_disclaimer.

Architektur: VVT-Tabellen-Renderer aus agent_doc_check_extras.py (552
LOC -> 242 LOC) in compliance.api.vvt_table_renderer ausgelagert, um den
500-LOC-Hardcap einzuhalten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-05-21 08:01:27 +02:00
parent 57c0f940a2
commit 603381a67f
7 changed files with 858 additions and 300 deletions
@@ -760,6 +760,36 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
except Exception:
pass
# P61: "Untergeschobene Cookies" — wenn z.B. Google Tag Manager
# deklariert ist, kommen GA + GCL_AU + DoubleClick automatisch mit.
# Findings landen im banner_result fuer Mail-Render.
if banner_result and cmp_vendors:
try:
from compliance.services.vendor_package_cookies import (
detect_implicit_cookies,
)
declared = [v.get("name", "") for v in cmp_vendors if v.get("name")]
actual_cookies: list[str] = []
for phase_data in (banner_result.get("phases") or {}).values():
if isinstance(phase_data, dict):
for ck in (phase_data.get("cookies") or []):
if isinstance(ck, dict) and ck.get("name"):
actual_cookies.append(ck["name"])
implicit_findings = detect_implicit_cookies(
declared, actual_cookies_set=actual_cookies or None,
)
if implicit_findings:
banner_result["implicit_vendor_findings"] = implicit_findings
logger.info(
"P61: %d implicit vendor-package items detected "
"(%d cookies + %d vendors)",
len(implicit_findings),
sum(1 for f in implicit_findings if f["implicit"]["type"] == "cookie"),
sum(1 for f in implicit_findings if f["implicit"]["type"] == "vendor"),
)
except Exception as p61_err:
logger.warning("P61 implicit-vendor detection failed: %s", p61_err)
if cmp_vendors:
logger.info("VVT: %d vendors extracted, validating links",
len(cmp_vendors))
@@ -932,9 +962,17 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
# 6) redundancy_html (Optimierungspotenzial — direkt nach Compliance-Score)
# 7) providers_html + vvt_html (Vendor-Liste)
# 8) report_html (Doc-Pruefung Details)
# P62: Marketing-Manager-Disclaimer — was wir sehen vs nicht sehen
scope_disclaimer_html = ""
try:
from .scope_disclaimer import build_scope_disclaimer_html
scope_disclaimer_html = build_scope_disclaimer_html()
except Exception as e:
logger.warning("Scope-disclaimer block skipped: %s", e)
full_html = (
critical_html + exec_summary_html + cookie_arch_html
+ summary_html + scanned_html + profile_html
critical_html + scope_disclaimer_html + exec_summary_html
+ cookie_arch_html + summary_html + scanned_html + profile_html
+ scorecard_html + redundancy_html
+ providers_html + banner_deep_html + vvt_html + report_html
)
@@ -231,5 +231,55 @@ def build_banner_deep_html(banner_result: dict | None) -> str:
)
parts.append('</ul></div>')
# 6) P61: Untergeschobene Cookies/Vendors (Vendor-Package)
impl_findings = banner_result.get("implicit_vendor_findings") or []
if impl_findings:
# Gruppiert nach primary_vendor: pro Primary die mitgelaufenen Items
by_primary: dict[str, list[dict]] = {}
for f in impl_findings:
by_primary.setdefault(f["primary_vendor"], []).append(f["implicit"])
parts.append(
'<div style="margin:14px 0 4px;padding:8px 12px;'
'background:#fef3c7;border-left:3px solid #d97706;border-radius:4px">'
'<div style="font-size:12px;color:#92400e;font-weight:600;'
'margin-bottom:6px">Untergeschobene Cookies / Vendors '
'(P61 — mit Hauptanbieter automatisch mitgeladen)</div>'
'<div style="font-size:10px;color:#92400e;margin-bottom:8px">'
'Diese Cookies/Vendors kommen automatisch mit dem deklarierten '
'Hauptanbieter mit — Marketing-Manager waehlen sie oft nicht '
'bewusst aus, sie sind aber zustimmungspflichtig.</div>'
)
for primary, impls in by_primary.items():
parts.append(
f'<div style="font-size:11px;color:#1e293b;margin:6px 0">'
f'<strong>{primary}</strong> bringt automatisch:</div>'
'<ul style="margin:0 0 8px 18px;padding:0;font-size:11px;color:#1e293b">'
)
for impl in impls:
tag = ('<span style="font-size:9px;background:#dc2626;color:#fff;'
'padding:1px 5px;border-radius:3px;margin-right:6px">'
'COOKIE</span>' if impl["type"] == "cookie" else
'<span style="font-size:9px;background:#7c3aed;color:#fff;'
'padding:1px 5px;border-radius:3px;margin-right:6px">'
'VENDOR</span>')
cat_color = {"marketing": "#dc2626", "statistics": "#d97706",
"functional": "#0891b2", "essential": "#16a34a"}.get(
impl.get("category", ""), "#475569")
parts.append(
f'<li style="margin-bottom:5px">{tag}'
f'<code style="font-size:10px;background:#f1f5f9;'
f'padding:1px 4px;border-radius:2px">{impl["name"]}</code> '
f'<span style="font-size:9px;color:{cat_color};'
f'margin-left:4px">[{impl.get("category","?")}]</span>'
f'<div style="font-size:10px;color:#475569;margin-top:2px">'
f'{impl.get("why","")[:240]}</div>'
f'<div style="font-size:9px;color:#94a3b8;font-style:italic">'
f'Quelle: <a href="{impl.get("source_url","")}" '
f'style="color:#94a3b8">{impl.get("source_url","")[:80]}</a>'
f'</div></li>'
)
parts.append('</ul>')
parts.append('</div>')
parts.append('</div>')
return "".join(parts)
@@ -234,302 +234,9 @@ def _category_label(kat: str) -> str:
}.get(kat, kat or "")
def build_vvt_table_html(vendors: list[dict]) -> str:
"""Render the per-vendor VVT-style table for the email report.
# VVT-Tabelle (gruppiert + P60/P60b Pattern-Notice) wurde in
# vvt_table_renderer.py ausgelagert, damit dieses File unter dem
# 500-LOC-Hardcap bleibt. Re-export, damit bestehende Aufrufer (z.B.
# agent_compliance_check_routes) unveraendert weiter funktionieren.
from compliance.api.vvt_table_renderer import build_vvt_table_html # noqa: E402,F401
Splits vendors into 3-4 sections by recipient_type (Art. 30(1)(d)
DSGVO):
1. INTERNAL own departments / own systems
2. GROUP_COMPANY parent/subsidiary (if any)
3. PROCESSOR Auftragsverarbeiter (AVV-pflichtig)
4. CONTROLLER joint/independent controllers (Meta, Google,
LinkedIn they build own profiles)
5. AUTHORITY / OTHER rest
Within each section: rows sorted by compliance_score ascending so
the weakest entries surface first.
"""
if not vendors:
return ""
# Import here to avoid pulling backend service deps at module load
from compliance.services.vendor_classifier import RECIPIENT_TYPE_SECTIONS
# Bucket vendors by recipient_type
by_type: dict[str, list[dict]] = {}
for v in vendors:
rt = (v.get("recipient_type") or "OTHER").upper()
by_type.setdefault(rt, []).append(v)
# Top summary
n_total = len(vendors)
n_internal = sum(1 for v in vendors
if (v.get("recipient_type") or "").upper()
in ("INTERNAL", "GROUP_COMPANY"))
n_external = n_total - n_internal
n_critical = sum(1 for v in vendors if v.get("compliance_score", 0) < 50)
summary_parts = [f"{n_total} Verarbeitungen erfasst"]
if n_internal and n_external:
summary_parts.append(
f"&mdash; {n_internal} eigene + {n_external} externe Empfaenger"
)
if n_critical:
summary_parts.append(
f', <strong style="color:#dc2626">{n_critical} unter 50%</strong>'
)
else:
summary_parts.append("&mdash; alle ueber 50%")
summary = " ".join(summary_parts)
# P60: Wenn viele Vendors die GLEICHEN Flag-Sets haben, einmal
# global hinweisen statt 42x pro Vendor wiederholen.
from collections import Counter
flag_sets = Counter()
for v in vendors:
flags = v.get("compliance_flags") or []
if flags:
flag_sets[tuple(sorted(flags))] += 1
pattern_notice = ""
if flag_sets:
most_common, n_match = flag_sets.most_common(1)[0]
share = n_match / max(1, len(vendors))
if n_match >= 8 and share >= 0.5:
from compliance.services.finding_action_recipes import recipe_for
labels = [_flag_short(f) for f in most_common]
shared_actions = []
for f in most_common:
rec = recipe_for(f)
if rec:
shared_actions.append(
f'<li><strong>{_flag_short(f)}:</strong> '
f'{rec.get("fix_text", "").splitlines()[0][:180]}</li>'
)
pattern_notice = (
f'<div style="margin:8px 0 12px;padding:10px 14px;'
f'background:#fef3c7;border-left:3px solid #d97706;'
f'border-radius:4px;font-size:11px;color:#92400e">'
f'<strong>Wiederkehrendes Muster ({n_match} von {len(vendors)} '
f'Anbietern, {int(share*100)}%):</strong> '
f'Bei diesen Anbietern fehlen jeweils: '
f'<em>{", ".join(labels)}</em>. '
f'Vermutlich systembedingt (z.B. Settings-Export liefert '
f'nur Namen, oder Banner-API blockiert Detail-Extraktion). '
f'Die globalen Empfehlungen unten gelten fuer all diese Eintraege; '
f'in der Tabelle werden sie nicht pro Zeile wiederholt.'
+ (f'<ul style="margin:8px 0 0 0;padding-left:20px">{"".join(shared_actions)}</ul>'
if shared_actions else '')
+ '</div>'
)
# Mark vendors so _render_vendor_row can suppress redundant actions
for v in vendors:
if tuple(sorted(v.get("compliance_flags") or [])) == most_common:
v["_actions_in_global_notice"] = True
out: list[str] = [
'<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
'max-width:760px;margin:0 auto 16px;padding:12px 16px;'
'background:#fafafa;border:1px solid #e5e7eb;border-radius:8px">',
'<h3 style="margin:0 0 4px;font-size:14px;color:#334155">'
'VVT-Vorschlag: Verarbeitungstaetigkeiten und Empfaenger aus der '
'Cookie-Richtlinie</h3>',
f'<p style="margin:0 0 10px;font-size:11px;color:#6b7280">{summary}. '
'Gruppiert nach Empfaengerkategorie (Art. 30(1)(d) DSGVO). Innerhalb '
'jeder Gruppe nach Compliance-Score sortiert. Bei eigenen '
'Verarbeitungen (INTERNAL/GROUP) werden Opt-Out und Privacy-Link '
'NICHT als Pflicht gewertet &mdash; der Widerruf erfolgt ueber das '
'Cookie-Banner, Privacy ist in der Haupt-DSI dokumentiert.</p>',
pattern_notice,
]
for rtype, section_label in RECIPIENT_TYPE_SECTIONS:
rows = by_type.get(rtype) or []
if not rows:
continue
rows = sorted(rows, key=lambda v: v.get("compliance_score", 0))
n = len(rows)
n_bad = sum(1 for v in rows if v.get("compliance_score", 0) < 50)
bad_hint = (f' <span style="color:#dc2626">({n_bad} unter 50%)</span>'
if n_bad else "")
out.append(
f'<h4 style="margin:14px 0 4px;font-size:12px;color:#1e293b;'
f'border-top:1px solid #e2e8f0;padding-top:8px">'
f'{section_label} <span style="color:#94a3b8;font-weight:400">'
f'({n}){bad_hint}</span></h4>'
)
out.append(_render_vendor_section(rows))
out.append('</div>')
return "".join(out)
def _render_vendor_section(rows: list[dict]) -> str:
body: list[str] = [
'<table style="width:100%;border-collapse:collapse;font-size:11px">'
'<thead><tr style="background:#f1f5f9;color:#475569;text-align:left">'
'<th style="padding:5px 8px">Name</th>'
'<th style="padding:5px 8px">Kategorie</th>'
'<th style="padding:5px 8px">Sitz</th>'
'<th style="padding:5px 8px;text-align:center">Cookies</th>'
'<th style="padding:5px 8px;text-align:center">Opt-Out</th>'
'<th style="padding:5px 8px;text-align:center">Privacy</th>'
'<th style="padding:5px 8px;text-align:right">Score</th>'
'</tr></thead><tbody>',
]
for v in rows:
body.append(_render_vendor_row_full(v))
body.append('</tbody></table>')
return "".join(body)
def _render_vendor_row_full(v: dict) -> str:
rtype = (v.get("recipient_type") or "OTHER").upper()
is_own = rtype in ("INTERNAL", "GROUP_COMPANY")
cat = (v.get("category") or "").lower()
is_necessary = cat in ("necessary", "strictlynecessary")
name = v.get("name") or "Unbekannt"
category = _category_label(v.get("category", ""))
country = v.get("country") or ("" if is_own else "")
cookies = v.get("cookies") or []
n_cookies = len(cookies)
score = int(v.get("compliance_score", 0))
flags = v.get("compliance_flags") or []
# Opt-Out: nicht erforderlich fuer eigene Verarbeitung oder
# technisch notwendige Cookies (§25 Abs. 2 TDDDG).
opt_na_reason = ("Nicht erforderlich (eigene Verarbeitung — "
"Widerruf ueber Cookie-Banner)") if is_own else (
"Nicht erforderlich (§25 Abs. 2 TDDDG — technisch notwendig)"
if is_necessary else None
)
opt_status = _link_status_badge(
v.get("opt_out_url"), v.get("opt_out_ok"), v.get("opt_out_status"),
na_label=opt_na_reason,
)
# Privacy: nicht erforderlich fuer eigene Verarbeitung (Haupt-DSI).
privacy_na_reason = (
"Nicht erforderlich (eigene Verarbeitung — durch Haupt-DSI abgedeckt)"
if is_own else None
)
privacy_status = _link_status_badge(
v.get("privacy_policy_url"), v.get("privacy_ok"),
v.get("privacy_status"), na_label=privacy_na_reason,
)
score_color = ("#16a34a" if score >= 80 else
"#d97706" if score >= 50 else "#dc2626")
# Score-Erklaerung: was wurde gewertet, was fehlt
# Annahme: Score = bestandene Kriterien / Gesamtkriterien * 100.
# Typisch 5 Kriterien fuer EXT: country, cookies, opt_out, privacy, scoring.
# Bei INTERNAL/GROUP: opt_out + privacy nicht gewertet (3 Kriterien).
n_criteria = 3 if is_own else 5
n_failed = len(flags) if flags else 0
score_tooltip = (
f"{n_criteria - n_failed} von {n_criteria} Kriterien erfuellt"
+ (f" — fehlt: {', '.join(_flag_short(f) for f in flags[:3])}"
if flags else "")
)
# Inline-Aktions-Anweisungen pro Flag
actions_html = ""
# P60: skip per-row actions when already covered by global pattern notice
skip_actions = bool(v.get("_actions_in_global_notice"))
if flags and not skip_actions:
from compliance.services.finding_action_recipes import recipe_for
action_items = []
for f in flags:
rec = recipe_for(f)
if not rec:
continue
action_items.append(
f'<li style="margin-bottom:6px"><strong>{_flag_short(f)}:</strong> '
f'{rec.get("what", "")}<br/>'
f'<span style="color:#475569"><strong>Was tun:</strong> '
f'{rec.get("fix_text", "").splitlines()[0][:200]}</span><br/>'
f'<span style="color:#94a3b8;font-size:9px">Quelle: '
f'{rec.get("why", "")[:160]}</span></li>'
)
if action_items:
actions_html = (
f'<details style="margin-top:4px"><summary style="cursor:pointer;'
f'color:#dc2626;font-size:10px">Was muss ich tun? '
f'({len(action_items)} Action{"s" if len(action_items) != 1 else ""})</summary>'
f'<ul style="margin:4px 0 0 14px;padding:0;font-size:10px;color:#1e293b">'
+ "".join(action_items)
+ '</ul></details>'
)
flag_str = ""
if flags:
flag_str = (
f'<div style="font-size:10px;color:#94a3b8;margin-top:2px">'
f'{", ".join(flags[:4])}</div>'
f'{actions_html}'
)
risk = v.get("compliance_risk") or {}
risk_label = risk.get("label") or ""
risk_badge = ""
if risk_label and risk_label != "unklar":
rc = {"kritisch": ("#dc2626", "#fff"), "hoch": ("#fecaca", "#991b1b"),
"mittel": ("#fde68a", "#92400e"), "gering": ("#d1fae5", "#065f46")}.get(risk_label, ("#e5e7eb", "#475569"))
risk_badge = (f'<span style="margin-left:6px;padding:1px 5px;border-radius:3px;font-size:9px;'
f'background:{rc[0]};color:{rc[1]}">Risk: {risk_label}</span>')
return (
f'<tr style="border-top:1px solid #e2e8f0">'
f'<td style="padding:6px 8px;color:#1e293b;font-size:11px">'
f'{name}{risk_badge}{flag_str}</td>'
f'<td style="padding:6px 8px;color:#475569;font-size:11px">{category}</td>'
f'<td style="padding:6px 8px;color:#475569;font-size:11px">{country}</td>'
f'<td style="padding:6px 8px;text-align:center;color:#475569;font-size:11px">'
f'{n_cookies}</td>'
f'<td style="padding:6px 8px;text-align:center">{opt_status}</td>'
f'<td style="padding:6px 8px;text-align:center">{privacy_status}</td>'
f'<td style="padding:6px 8px;text-align:right;font-weight:600;'
f'color:{score_color};font-size:11px" title="{score_tooltip}">'
f'{score}%<div style="font-size:9px;font-weight:400;color:#94a3b8">'
f'{n_criteria - n_failed}/{n_criteria}</div></td>'
f'</tr>'
)
def _flag_short(f: str) -> str:
"""Lesbare deutsche Form fuer einen Flag-Token."""
labels = {
"no_cookies_listed": "Cookies fehlen",
"no_country": "Sitzland fehlt",
"no_privacy_url": "Privacy-Link fehlt",
"broken_privacy_url": "Privacy-Link broken",
"no_opt_out_url": "Opt-Out fehlt",
"broken_opt_out": "Opt-Out broken",
}
return labels.get(f, f)
def _link_status_badge(
url: str | None,
ok: bool | None,
status: int | None,
na_label: str | None = None,
) -> str:
"""Render the link-status cell.
- url + ok -> green check
- url + broken -> red cross with status
- no url + na_label -> neutral em-dash with explanation tooltip
(used for INTERNAL/necessary rows where the field isn't required)
- no url + no na_label -> red cross (real gap)
"""
if not url:
if na_label:
return ('<span style="color:#94a3b8;font-size:11px" '
f'title="{na_label}">&mdash;</span>')
return ('<span style="color:#dc2626;font-size:11px" '
'title="Kein Link">&#10007;</span>')
if ok:
return ('<span style="color:#16a34a;font-size:11px" '
f'title="HTTP {status}">&#10003;</span>')
status_str = str(status) if status else "?"
return ('<span style="color:#dc2626;font-size:11px" '
f'title="HTTP {status_str}">&#10007; ({status_str})</span>')
@@ -0,0 +1,97 @@
"""
P62 Marketing-Manager-freundlicher Scope-Disclaimer ("Was wir sehen / nicht sehen").
Erklaert in 30 Sekunden was unser Audit tatsaechlich pruefen kann und wo
die Grenzen sind. Ziel: vermeidet falsches Vertrauen in einen 100%-Score
und macht klar, wo Marketing/IT zusaetzlich pruefen muss.
"""
from __future__ import annotations
def build_scope_disclaimer_html() -> str:
"""Render: was wir sehen + was wir NICHT sehen koennen."""
return (
'<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
'max-width:700px;margin:8px auto 16px;padding:14px 18px;'
'background:#f0f9ff;border:1px solid #bfdbfe;border-radius:8px">'
'<h3 style="margin:0 0 8px;font-size:13px;color:#1e40af">'
'Was diese Pruefung leistet — und wo ihre Grenzen liegen</h3>'
'<div style="font-size:11px;color:#1e293b;margin-bottom:10px">'
'Wir sind ein <strong>technisches Audit-Tool</strong>, kein Anwalt. '
'Ein 100%-Score bedeutet nicht "rechtssicher" — er bedeutet "alle '
'Pruefkriterien automatisch erfuellt". Folgendes koennen wir vs. '
'koennen wir nicht:</div>'
'<table style="width:100%;border-collapse:collapse;font-size:11px;'
'margin-bottom:8px">'
'<thead><tr style="background:#dbeafe;color:#1e40af;text-align:left">'
'<th style="padding:5px 8px;width:50%">Was wir sehen</th>'
'<th style="padding:5px 8px;width:50%">Was wir NICHT sehen</th>'
'</tr></thead>'
'<tbody>'
'<tr style="border-top:1px solid #bfdbfe">'
'<td style="padding:5px 8px;color:#1e293b">'
'✓ Cookies/Storage im Browser nach Klick auf Akzeptieren/Ablehnen'
'</td>'
'<td style="padding:5px 8px;color:#475569">'
'✗ Server-seitiges Tracking (Meta Conversion API, GA4 Measurement '
'Protocol — der Browser sieht nichts davon)'
'</td></tr>'
'<tr style="border-top:1px solid #bfdbfe">'
'<td style="padding:5px 8px;color:#1e293b">'
'✓ Vendor-Listen aus dem Banner (TCF, CMP-Settings, Phase-G Klick-Tour)'
'</td>'
'<td style="padding:5px 8px;color:#475569">'
'✗ Wer die Daten beim Vendor tatsaechlich erhaelt / weiterleitet '
'(z.B. Google verteilt intern an Ads/Marketing-Plattform)'
'</td></tr>'
'<tr style="border-top:1px solid #bfdbfe">'
'<td style="padding:5px 8px;color:#1e293b">'
'✓ Texte und Pflichtangaben in DSE/Cookie-Richtlinie/Impressum'
'</td>'
'<td style="padding:5px 8px;color:#475569">'
'✗ Ob die internen Prozesse (Loeschkonzept, AVV-Pflege, '
'Mitarbeiter-Schulungen) tatsaechlich gelebt werden'
'</td></tr>'
'<tr style="border-top:1px solid #bfdbfe">'
'<td style="padding:5px 8px;color:#1e293b">'
'✓ Banner-UI-Verstoesse (Dark Patterns, ungleichgewichtige Buttons, '
'fehlender Reject-Mechanismus)'
'</td>'
'<td style="padding:5px 8px;color:#475569">'
'✗ Ob das Banner auf <em>jeder</em> Unterseite identisch ist '
'(wir messen die Einstiegsseite)'
'</td></tr>'
'<tr style="border-top:1px solid #bfdbfe">'
'<td style="padding:5px 8px;color:#1e293b">'
'✓ Untergeschobene Cookies (z.B. Google Tag Manager bringt automatisch '
'GA + Ads — siehe P61-Block unten)'
'</td>'
'<td style="padding:5px 8px;color:#475569">'
'✗ Drittland-Transfer auf Vertragsebene — ob ein SCC/DPF wirklich '
'vorliegt, koennen nur Sie selbst pruefen'
'</td></tr>'
'</tbody></table>'
'<div style="font-size:10px;color:#475569;margin-top:8px;'
'padding-top:6px;border-top:1px dashed #bfdbfe">'
'<strong>Hinweis fuer Marketing &amp; Geschaeftsfuehrung:</strong> '
'Selbst wenn dieser Bericht keinen Verstoss findet, kann ein '
'individueller Bescheid einer Aufsichtsbehoerde oder eine Klage '
'(NOYB, Verbraucherschutz, Sammelklage) zu einem anderen Ergebnis '
'kommen — etwa wenn beim Vendor selbst (Server-Side) personenbezogene '
'Daten verarbeitet werden, die wir browser-seitig nicht sehen. '
'Dieser Bericht ersetzt keine anwaltliche Pruefung, hilft aber, '
'<strong>technisch belegbare Verstoesse</strong> sofort zu schliessen.'
'</div>'
'</div>'
)
@@ -0,0 +1,318 @@
"""
VVT-Tabelle fuer den Email-Report pro Vendor eine Zeile, gruppiert
nach Empfaengerkategorie (Art. 30(1)(d) DSGVO).
Ausgelagert aus agent_doc_check_extras.py (LOC-Cap). Enthaelt:
* build_vvt_table_html Haupteinstieg, gruppiert + summary + P60 notice
* _render_vendor_section / _render_vendor_row_full Zeilenrenderer
* _link_status_badge / _flag_short kleine Helper
P60b Fuzzy-Match: Vendors mit teilweise befuellten Feldern (z.B. Sitzland
eingetragen) fallen nicht aus der Pattern-Notice raus, nur weil ihr
Flag-Set um 1-2 Items kleiner ist. Jaccard >= 0.7 deckt das ab.
"""
from __future__ import annotations
def _category_label(kat: str) -> str:
return {
"necessary": "Notwendig", "strictlynecessary": "Notwendig",
"preferences": "Praeferenzen", "functional": "Funktional",
"statistics": "Statistik", "marketing": "Marketing",
"unclassified": "Unklassifiziert",
}.get((kat or "").lower(), kat or "")
def _flag_short(f: str) -> str:
"""Lesbare deutsche Form fuer einen Flag-Token."""
labels = {
"no_cookies_listed": "Cookies fehlen",
"no_country": "Sitzland fehlt",
"no_privacy_url": "Privacy-Link fehlt",
"broken_privacy_url": "Privacy-Link broken",
"no_opt_out_url": "Opt-Out fehlt",
"broken_opt_out": "Opt-Out broken",
}
return labels.get(f, f)
def _link_status_badge(
url: str | None,
ok: bool | None,
status: int | None,
na_label: str | None = None,
) -> str:
if not url:
if na_label:
return ('<span style="color:#94a3b8;font-size:11px" '
f'title="{na_label}">&mdash;</span>')
return ('<span style="color:#dc2626;font-size:11px" '
'title="Kein Link">&#10007;</span>')
if ok:
return ('<span style="color:#16a34a;font-size:11px" '
f'title="HTTP {status}">&#10003;</span>')
status_str = str(status) if status else "?"
return ('<span style="color:#dc2626;font-size:11px" '
f'title="HTTP {status_str}">&#10007; ({status_str})</span>')
def _build_pattern_notice(vendors: list[dict]) -> str:
"""P60 + P60b: globale Notice wenn viele Vendors aehnliche Flag-Sets haben.
Mutiert vendors[].`_actions_in_global_notice` so dass die Zeilenrenderer
redundante per-row-Actions ueberspringen koennen.
"""
from collections import Counter
flag_sets: Counter = Counter()
for v in vendors:
flags = v.get("compliance_flags") or []
if flags:
flag_sets[tuple(sorted(flags))] += 1
if not flag_sets:
return ""
most_common, _ = flag_sets.most_common(1)[0]
most_common_set = set(most_common)
def _similar(flags: tuple) -> bool:
fs = set(flags)
if not fs or not most_common_set:
return False
inter = len(fs & most_common_set)
union = len(fs | most_common_set)
return union > 0 and (inter / union) >= 0.7
n_match = sum(cnt for fs, cnt in flag_sets.items() if _similar(fs))
share = n_match / max(1, len(vendors))
if not (n_match >= 8 and share >= 0.5):
return ""
from compliance.services.finding_action_recipes import recipe_for
labels = [_flag_short(f) for f in most_common]
shared_actions: list[str] = []
for f in most_common:
rec = recipe_for(f)
if rec:
shared_actions.append(
f'<li><strong>{_flag_short(f)}:</strong> '
f'{rec.get("fix_text", "").splitlines()[0][:180]}</li>'
)
for v in vendors:
if _similar(tuple(sorted(v.get("compliance_flags") or []))):
v["_actions_in_global_notice"] = True
return (
f'<div style="margin:8px 0 12px;padding:10px 14px;'
f'background:#fef3c7;border-left:3px solid #d97706;'
f'border-radius:4px;font-size:11px;color:#92400e">'
f'<strong>Wiederkehrendes Muster ({n_match} von {len(vendors)} '
f'Anbietern, {int(share*100)}%):</strong> '
f'Bei diesen Anbietern fehlen jeweils: '
f'<em>{", ".join(labels)}</em>. '
f'Vermutlich systembedingt (z.B. Settings-Export liefert '
f'nur Namen, oder Banner-API blockiert Detail-Extraktion). '
f'Die globalen Empfehlungen unten gelten fuer all diese Eintraege; '
f'in der Tabelle werden sie nicht pro Zeile wiederholt.'
+ (f'<ul style="margin:8px 0 0 0;padding-left:20px">{"".join(shared_actions)}</ul>'
if shared_actions else '')
+ '</div>'
)
def build_vvt_table_html(vendors: list[dict]) -> str:
"""Render per-vendor VVT-style table for the email."""
if not vendors:
return ""
from compliance.services.vendor_classifier import RECIPIENT_TYPE_SECTIONS
by_type: dict[str, list[dict]] = {}
for v in vendors:
rt = (v.get("recipient_type") or "OTHER").upper()
by_type.setdefault(rt, []).append(v)
n_total = len(vendors)
n_internal = sum(
1 for v in vendors
if (v.get("recipient_type") or "").upper() in ("INTERNAL", "GROUP_COMPANY")
)
n_external = n_total - n_internal
n_critical = sum(1 for v in vendors if v.get("compliance_score", 0) < 50)
summary_parts = [f"{n_total} Verarbeitungen erfasst"]
if n_internal and n_external:
summary_parts.append(
f"&mdash; {n_internal} eigene + {n_external} externe Empfaenger"
)
if n_critical:
summary_parts.append(
f', <strong style="color:#dc2626">{n_critical} unter 50%</strong>'
)
else:
summary_parts.append("&mdash; alle ueber 50%")
summary = " ".join(summary_parts)
pattern_notice = _build_pattern_notice(vendors)
out: list[str] = [
'<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
'max-width:760px;margin:0 auto 16px;padding:12px 16px;'
'background:#fafafa;border:1px solid #e5e7eb;border-radius:8px">',
'<h3 style="margin:0 0 4px;font-size:14px;color:#334155">'
'VVT-Vorschlag: Verarbeitungstaetigkeiten und Empfaenger aus der '
'Cookie-Richtlinie</h3>',
f'<p style="margin:0 0 10px;font-size:11px;color:#6b7280">{summary}. '
'Gruppiert nach Empfaengerkategorie (Art. 30(1)(d) DSGVO). Innerhalb '
'jeder Gruppe nach Compliance-Score sortiert. Bei eigenen '
'Verarbeitungen (INTERNAL/GROUP) werden Opt-Out und Privacy-Link '
'NICHT als Pflicht gewertet &mdash; der Widerruf erfolgt ueber das '
'Cookie-Banner, Privacy ist in der Haupt-DSI dokumentiert.</p>',
pattern_notice,
]
for rtype, section_label in RECIPIENT_TYPE_SECTIONS:
rows = by_type.get(rtype) or []
if not rows:
continue
rows = sorted(rows, key=lambda v: v.get("compliance_score", 0))
n = len(rows)
n_bad = sum(1 for v in rows if v.get("compliance_score", 0) < 50)
bad_hint = (f' <span style="color:#dc2626">({n_bad} unter 50%)</span>'
if n_bad else "")
out.append(
f'<h4 style="margin:14px 0 4px;font-size:12px;color:#1e293b;'
f'border-top:1px solid #e2e8f0;padding-top:8px">'
f'{section_label} <span style="color:#94a3b8;font-weight:400">'
f'({n}){bad_hint}</span></h4>'
)
out.append(_render_vendor_section(rows))
out.append('</div>')
return "".join(out)
def _render_vendor_section(rows: list[dict]) -> str:
body: list[str] = [
'<table style="width:100%;border-collapse:collapse;font-size:11px">'
'<thead><tr style="background:#f1f5f9;color:#475569;text-align:left">'
'<th style="padding:5px 8px">Name</th>'
'<th style="padding:5px 8px">Kategorie</th>'
'<th style="padding:5px 8px">Sitz</th>'
'<th style="padding:5px 8px;text-align:center">Cookies</th>'
'<th style="padding:5px 8px;text-align:center">Opt-Out</th>'
'<th style="padding:5px 8px;text-align:center">Privacy</th>'
'<th style="padding:5px 8px;text-align:right">Score</th>'
'</tr></thead><tbody>',
]
for v in rows:
body.append(_render_vendor_row_full(v))
body.append('</tbody></table>')
return "".join(body)
def _render_vendor_row_full(v: dict) -> str:
rtype = (v.get("recipient_type") or "OTHER").upper()
is_own = rtype in ("INTERNAL", "GROUP_COMPANY")
cat = (v.get("category") or "").lower()
is_necessary = cat in ("necessary", "strictlynecessary")
name = v.get("name") or "Unbekannt"
category = _category_label(v.get("category", ""))
country = v.get("country") or ""
cookies = v.get("cookies") or []
n_cookies = len(cookies)
score = int(v.get("compliance_score", 0))
flags = v.get("compliance_flags") or []
opt_na_reason = ("Nicht erforderlich (eigene Verarbeitung — "
"Widerruf ueber Cookie-Banner)") if is_own else (
"Nicht erforderlich (§25 Abs. 2 TDDDG — technisch notwendig)"
if is_necessary else None
)
opt_status = _link_status_badge(
v.get("opt_out_url"), v.get("opt_out_ok"), v.get("opt_out_status"),
na_label=opt_na_reason,
)
privacy_na_reason = (
"Nicht erforderlich (eigene Verarbeitung — durch Haupt-DSI abgedeckt)"
if is_own else None
)
privacy_status = _link_status_badge(
v.get("privacy_policy_url"), v.get("privacy_ok"),
v.get("privacy_status"), na_label=privacy_na_reason,
)
score_color = ("#16a34a" if score >= 80 else
"#d97706" if score >= 50 else "#dc2626")
n_criteria = 3 if is_own else 5
n_failed = len(flags) if flags else 0
score_tooltip = (
f"{n_criteria - n_failed} von {n_criteria} Kriterien erfuellt"
+ (f" — fehlt: {', '.join(_flag_short(f) for f in flags[:3])}"
if flags else "")
)
actions_html = ""
skip_actions = bool(v.get("_actions_in_global_notice"))
if flags and not skip_actions:
from compliance.services.finding_action_recipes import recipe_for
action_items = []
for f in flags:
rec = recipe_for(f)
if not rec:
continue
action_items.append(
f'<li style="margin-bottom:6px"><strong>{_flag_short(f)}:</strong> '
f'{rec.get("what", "")}<br/>'
f'<span style="color:#475569"><strong>Was tun:</strong> '
f'{rec.get("fix_text", "").splitlines()[0][:200]}</span><br/>'
f'<span style="color:#94a3b8;font-size:9px">Quelle: '
f'{rec.get("why", "")[:160]}</span></li>'
)
if action_items:
actions_html = (
f'<details style="margin-top:4px"><summary style="cursor:pointer;'
f'color:#dc2626;font-size:10px">Was muss ich tun? '
f'({len(action_items)} Action{"s" if len(action_items) != 1 else ""})</summary>'
f'<ul style="margin:4px 0 0 14px;padding:0;font-size:10px;color:#1e293b">'
+ "".join(action_items)
+ '</ul></details>'
)
flag_str = ""
if flags:
flag_str = (
f'<div style="font-size:10px;color:#94a3b8;margin-top:2px">'
f'{", ".join(flags[:4])}</div>'
f'{actions_html}'
)
risk = v.get("compliance_risk") or {}
risk_label = risk.get("label") or ""
risk_badge = ""
if risk_label and risk_label != "unklar":
rc = {
"kritisch": ("#dc2626", "#fff"),
"hoch": ("#fecaca", "#991b1b"),
"mittel": ("#fde68a", "#92400e"),
"gering": ("#d1fae5", "#065f46"),
}.get(risk_label, ("#e5e7eb", "#475569"))
risk_badge = (f'<span style="margin-left:6px;padding:1px 5px;border-radius:3px;font-size:9px;'
f'background:{rc[0]};color:{rc[1]}">Risk: {risk_label}</span>')
return (
f'<tr style="border-top:1px solid #e2e8f0">'
f'<td style="padding:6px 8px;color:#1e293b;font-size:11px">'
f'{name}{risk_badge}{flag_str}</td>'
f'<td style="padding:6px 8px;color:#475569;font-size:11px">{category}</td>'
f'<td style="padding:6px 8px;color:#475569;font-size:11px">{country}</td>'
f'<td style="padding:6px 8px;text-align:center;color:#475569;font-size:11px">'
f'{n_cookies}</td>'
f'<td style="padding:6px 8px;text-align:center">{opt_status}</td>'
f'<td style="padding:6px 8px;text-align:center">{privacy_status}</td>'
f'<td style="padding:6px 8px;text-align:right;font-weight:600;'
f'color:{score_color};font-size:11px" title="{score_tooltip}">'
f'{score}%<div style="font-size:9px;font-weight:400;color:#94a3b8">'
f'{n_criteria - n_failed}/{n_criteria}</div></td>'
f'</tr>'
)
@@ -0,0 +1,181 @@
"""
P61 "Untergeschobene Cookies"-Erkennung.
Wenn eine Site einen Vendor einbindet (z.B. "Google Tag Manager"), kommen
oft AUTOMATISCH weitere Cookies/Vendors mit, die der Marketing-Manager
nicht aktiv ausgewaehlt hat (DoubleClick-Werbe-IDs ueber GTM, Facebook-
Conversion-API ueber Meta-Pixel, Hotjar-Recordings ueber HubSpot etc.).
Dieses Modul mappt:
Primary-Vendor (eingebunden) -> Implicit-Cookies/Vendors (mitgekommen)
Mit Quellen-Doku aus offiziellen Anbieter-Pages.
"""
from __future__ import annotations
from typing import TypedDict
class ImplicitItem(TypedDict, total=False):
name: str
type: str # "cookie" | "vendor"
category: str # essential/functional/statistics/marketing
why: str # warum kommt das mit
source_url: str # Anbieter-Doku
# Primary-Vendor (lowercase, substring-match) -> Liste implizit mitgeladener Items
VENDOR_PACKAGE_COOKIES: dict[str, list[ImplicitItem]] = {
# Google Tag Manager — laedt typischerweise Google Analytics + Ads
"google tag manager": [
{"name": "_ga", "type": "cookie", "category": "statistics",
"why": "GTM laedt Google Analytics by default mit, sobald ein "
"GA4-Tag konfiguriert ist.",
"source_url": "https://support.google.com/tagmanager/answer/9442095"},
{"name": "_gid", "type": "cookie", "category": "statistics",
"why": "Google Analytics Session-ID, automatisch mit GA.",
"source_url": "https://support.google.com/analytics/answer/11397207"},
{"name": "_gcl_au", "type": "cookie", "category": "marketing",
"why": "Google Ads Conversion-Linker — kommt mit jedem GTM-Container "
"der ein Conversion-Tag enthaelt (z.B. Floodlight, Ads).",
"source_url": "https://support.google.com/google-ads/answer/7521212"},
{"name": "Google Ads", "type": "vendor", "category": "marketing",
"why": "GTM ist Google-Infrastruktur — Google sieht alle Requests "
"ueber GTM (auch wenn nur Analytics konfiguriert ist).",
"source_url": "https://support.google.com/tagmanager/answer/9323295"},
],
# Google Analytics — implizit oft DoubleClick / Ads-Personalization
"google analytics": [
{"name": "_gcl_au", "type": "cookie", "category": "marketing",
"why": "GA4 mit aktivierter Google-Signals (Werbeberichte) setzt "
"Conversion-Linker — auch ohne Ads-Konfiguration.",
"source_url": "https://support.google.com/analytics/answer/9445345"},
{"name": "DSID", "type": "cookie", "category": "marketing",
"why": "DoubleClick-Cookie ueber doubleclick.net — laeuft mit "
"GA4 + Google-Signals automatisch.",
"source_url": "https://policies.google.com/technologies/cookies"},
{"name": "Google Marketing Platform", "type": "vendor", "category": "marketing",
"why": "Mit Google-Signals fliessen aggregierte Daten in Googles "
"Werbeprofil-Datenbank.",
"source_url": "https://policies.google.com/technologies/cookies"},
],
# Meta-Pixel — kommt typischerweise mit Facebook Login + Conversion-API
"meta pixel": [
{"name": "_fbc", "type": "cookie", "category": "marketing",
"why": "Facebook Click-ID — wird vom Meta-Pixel beim ersten Besuch "
"via Werbe-Klick gesetzt.",
"source_url": "https://developers.facebook.com/docs/marketing-api/conversions-api/parameters/fbp-and-fbc"},
{"name": "fr", "type": "cookie", "category": "marketing",
"why": "Facebook Cross-Site-Tracking — wird ueber facebook.com "
"Subdomain gesetzt, auch ohne aktiven FB-Login.",
"source_url": "https://www.facebook.com/policies/cookies/"},
{"name": "Facebook Conversion API", "type": "vendor", "category": "marketing",
"why": "Server-zu-Server Tracking ergaenzt das Browser-Pixel — wird "
"oft via 'Erweiterte Matching'-Setting automatisch aktiviert.",
"source_url": "https://developers.facebook.com/docs/marketing-api/conversions-api/"},
],
"facebook pixel": [
# Alias-Eintrag — verweist auf gleiche implicits
{"name": "_fbc", "type": "cookie", "category": "marketing",
"why": "siehe Meta-Pixel-Eintrag (Aliase).",
"source_url": "https://www.facebook.com/policies/cookies/"},
{"name": "fr", "type": "cookie", "category": "marketing",
"why": "siehe Meta-Pixel-Eintrag (Aliase).",
"source_url": "https://www.facebook.com/policies/cookies/"},
],
# HubSpot — mit jedem Embed kommt Tracking + Chat + Forms
"hubspot": [
{"name": "__hstc", "type": "cookie", "category": "marketing",
"why": "HubSpot-Analytics-Cookie wird beim ersten HubSpot-Tag "
"automatisch gesetzt.",
"source_url": "https://knowledge.hubspot.com/de/privacy-and-consent/what-cookies-does-hubspot-set-in-a-visitor-s-browser"},
{"name": "hubspotutk", "type": "cookie", "category": "marketing",
"why": "User-Token zur seitenuebergreifenden Identifikation.",
"source_url": "https://knowledge.hubspot.com/de/privacy-and-consent/what-cookies-does-hubspot-set-in-a-visitor-s-browser"},
{"name": "HubSpot Chat (Drift / Conversations)", "type": "vendor",
"category": "functional",
"why": "HubSpot CMS aktiviert oft den Chat-Widget by default.",
"source_url": "https://www.hubspot.com/data-privacy/cookies"},
],
# Akamai (CDN/Security) — Bot-Manager-Cookies sind essential, aber Akamai
# selbst hat Web-Performance-Cookies die als statistics gelten koennen.
"akamai": [
{"name": "AKA_A2", "type": "cookie", "category": "functional",
"why": "Akamai Adaptive-Acceleration Performance-Cookie.",
"source_url": "https://techdocs.akamai.com/"},
],
# Adobe Analytics (Marketing Cloud) — laedt Audience-Manager-Cookies
"adobe analytics": [
{"name": "s_cc", "type": "cookie", "category": "statistics",
"why": "Adobe Analytics Session-Cookie.",
"source_url": "https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/cookies.html"},
{"name": "AAM_uuid", "type": "cookie", "category": "marketing",
"why": "Adobe Audience Manager — kommt mit Adobe Analytics-Tag wenn "
"Audience-Sharing aktiviert ist.",
"source_url": "https://experienceleague.adobe.com/docs/audience-manager.html"},
],
# LinkedIn Insight Tag — laedt LinkedIn + AdvertiserSync Cookies
"linkedin insight": [
{"name": "li_sugr", "type": "cookie", "category": "marketing",
"why": "LinkedIn-Browser-ID — wird vom Insight-Tag gesetzt.",
"source_url": "https://www.linkedin.com/legal/l/cookie-table"},
{"name": "AnalyticsSyncHistory", "type": "cookie", "category": "marketing",
"why": "LinkedIn-Cross-Domain-Tracking ueber Insight-Tag.",
"source_url": "https://www.linkedin.com/legal/l/cookie-table"},
],
}
def detect_implicit_cookies(
declared_vendors: list[str],
actual_cookies_set: list[str] | None = None,
) -> list[dict]:
"""Findet untergeschobene Cookies/Vendors.
Args:
declared_vendors: Liste der vom CMP/Banner deklarierten Vendor-Namen.
actual_cookies_set: Optional Cookie-Namen, die tatsaechlich gesetzt
wurden. Wenn gegeben, wird nur reportiert was nicht in der
declared-Liste UND tatsaechlich gesetzt ist.
Returns:
Liste Finding-Dicts mit:
primary_vendor, implicit (ImplicitItem), present_in_actual (bool)
"""
findings: list[dict] = []
actual_lower = {c.lower() for c in (actual_cookies_set or [])}
declared_lower = {v.lower() for v in declared_vendors}
for primary in declared_vendors:
plower = primary.lower()
implicits = []
for key, items in VENDOR_PACKAGE_COOKIES.items():
if key in plower:
implicits.extend(items)
for impl in implicits:
name_lower = impl["name"].lower()
# Skip if user has explicitly declared this implicit vendor
if impl["type"] == "vendor":
if any(name_lower in d for d in declared_lower):
continue
# If actuals provided: only report if cookie really set
present = True
if actual_cookies_set is not None and impl["type"] == "cookie":
present = impl["name"] in actual_cookies_set or any(
impl["name"].lower() in c.lower() for c in actual_cookies_set
)
if not present:
continue
findings.append({
"primary_vendor": primary,
"implicit": impl,
"present_in_actual": present,
})
return findings
@@ -0,0 +1,167 @@
#!/usr/bin/env python3
"""P59c — Seed DACH-Custom-Cookies in compliance.cookie_library.
Ergaenzt Open-Cookie-Database (~2200 Cookies) um DACH-typische Cookies, die in
der OCD fehlen oder schlecht abgedeckt sind:
- Borlabs Cookie (selbst gehostete CMP, sehr verbreitet im DACH-Mittelstand)
- etracker (in DE haeufige Analytics-Alternative zu Google Analytics)
- Matomo / Piwik (self-hosted Analytics)
- Userlike, Userpilot, ChannelEngine (DACH-Tools)
- OEM-typische Custom-Cookies (Mercedes _abck/Akamai, BMW eus5, VW dxa)
Quellen-Doku: jeder Eintrag hat source_url auf Anbieter-Doku (rechtssicher).
Lizenz: Wir publizieren als eigene Sammlung (CC-BY) Texte sind eigene
Formulierungen, Anbieter-Quellen werden referenziert.
"""
from __future__ import annotations
import os
import sys
import psycopg2
# (cookie_name, domain_pattern, vendor_name, actual_category, purpose_de,
# typical_max_age_seconds, source_name, source_url, confidence)
DACH_COOKIES = [
# ── Borlabs Cookie CMP ─────────────────────────────────────────
("borlabs-cookie", "*", "Borlabs GmbH", "essential",
"Speichert die Cookie-Einwilligung des Nutzers (CMP-Funktion).",
31536000, "Borlabs Cookie Doku",
"https://de.borlabs.io/kb/borlabs-cookie/", 0.95),
("borlabs_consent", "*", "Borlabs GmbH", "essential",
"Aelterer Borlabs-Consent-Cookie (Legacy v1).",
31536000, "Borlabs Cookie Doku",
"https://de.borlabs.io/kb/borlabs-cookie/", 0.90),
# ── etracker (DE-Analytics-Alternative) ───────────────────────
("et_oi_v3", "*", "etracker GmbH, Hamburg", "statistics",
"etracker Opt-Out-Indikator (zeigt ob Nutzer dem Tracking widersprochen hat).",
63072000, "etracker Doku",
"https://www.etracker.com/docs/integration-setup/datenschutz/", 0.95),
("et_cust_v3", "*", "etracker GmbH, Hamburg", "statistics",
"etracker Customer-ID fuer wiederkehrende Besucher.",
31536000, "etracker Doku",
"https://www.etracker.com/docs/integration-setup/datenschutz/", 0.95),
# ── Matomo / Piwik (self-hosted Analytics) ────────────────────
("_pk_id", "*", "Matomo (selbst gehostet)", "statistics",
"Matomo Visitor-ID fuer Wiedererkennung von Besuchern (typisch 13 Monate).",
33696000, "Matomo Doku",
"https://matomo.org/faq/general/faq_146/", 0.95),
("_pk_ses", "*", "Matomo (selbst gehostet)", "statistics",
"Matomo Session-ID fuer aktive Besuchersitzungen (30 Minuten).",
1800, "Matomo Doku",
"https://matomo.org/faq/general/faq_146/", 0.95),
("_pk_ref", "*", "Matomo (selbst gehostet)", "statistics",
"Matomo Referrer-Information (woher kam der Besucher).",
15768000, "Matomo Doku",
"https://matomo.org/faq/general/faq_146/", 0.90),
# ── Userlike (DE-Chat) ────────────────────────────────────────
("uslk_e", "*", "Userlike UG, Koeln", "functional",
"Userlike Live-Chat — speichert ob Nutzer mit Chat interagiert hat.",
31536000, "Userlike Privacy",
"https://www.userlike.com/de/privacy-policy", 0.90),
# ── OEM-typische Akamai-Bot-Manager-Cookies ───────────────────
# (Mercedes, BMW, VW, Audi nutzen alle Akamai)
("_abck", "*", "Akamai Technologies", "essential",
"Akamai Bot-Manager Session-Cookie. Erkennt automatisierte Zugriffe (Bots).",
31536000, "Akamai Bot Manager Doku",
"https://techdocs.akamai.com/bot-manager/docs", 0.95),
("bm_sz", "*", "Akamai Technologies", "essential",
"Akamai Bot-Manager — speichert Geraete-Fingerprint fuer Bot-Erkennung.",
14400, "Akamai Bot Manager Doku",
"https://techdocs.akamai.com/bot-manager/docs", 0.95),
("bm_mi", "*", "Akamai Technologies", "essential",
"Akamai Bot-Manager — Mobile-Integrity-Check.",
7200, "Akamai Bot Manager Doku",
"https://techdocs.akamai.com/bot-manager/docs", 0.90),
("ak_bmsc", "*", "Akamai Technologies", "essential",
"Akamai Bot-Manager Score-Cookie.",
7200, "Akamai Bot Manager Doku",
"https://techdocs.akamai.com/bot-manager/docs", 0.90),
# ── Cloudflare Bot-Management ─────────────────────────────────
("__cf_bm", "*", "Cloudflare, Inc.", "essential",
"Cloudflare Bot-Management Token — unterscheidet menschliche Zugriffe von Bots.",
1800, "Cloudflare Doku",
"https://developers.cloudflare.com/bots/concepts/bot-score/", 0.95),
("cf_clearance", "*", "Cloudflare, Inc.", "essential",
"Cloudflare Challenge-Cookie nach erfolgreichem Captcha/JS-Challenge.",
1800, "Cloudflare Doku",
"https://developers.cloudflare.com/fundamentals/reference/policies-compliances/cloudflare-cookies/", 0.95),
# ── Datadome Bot-Manager ──────────────────────────────────────
("datadome", "*", "DataDome SAS, Paris", "essential",
"DataDome Bot-Detection Cookie. Wird auf vielen OEM-Sites mit Akamai parallel eingesetzt.",
31536000, "DataDome Doku",
"https://datadome.co/learning-center/", 0.90),
# ── Usercentrics (CMP — verbreitet in DACH) ───────────────────
("uc_user_interaction", "*", "Usercentrics GmbH", "essential",
"Usercentrics CMP — speichert Banner-Interaktion (Akzeptiert/Abgelehnt).",
31536000, "Usercentrics Doku",
"https://usercentrics.com/de/knowledge-hub/", 0.95),
("uc_settings", "*", "Usercentrics GmbH", "essential",
"Usercentrics CMP — detaillierte Kategorie-Einstellungen pro Vendor.",
31536000, "Usercentrics Doku",
"https://usercentrics.com/de/knowledge-hub/", 0.95),
# ── Cookiebot (CMP) ───────────────────────────────────────────
("CookieConsent", "*", "Cybot A/S (Cookiebot)", "essential",
"Cookiebot CMP — speichert Einwilligungs-Status des Nutzers.",
31536000, "Cookiebot Doku",
"https://www.cookiebot.com/de/datenschutzerklarung/", 0.95),
# ── Cookieyes ─────────────────────────────────────────────────
("cky-consent", "*", "Cookieyes Ltd, UK", "essential",
"Cookieyes CMP — Einwilligungs-Speicherung.",
31536000, "Cookieyes Doku",
"https://www.cookieyes.com/documentation/", 0.90),
# ── HubSpot (verbreitet in DE-B2B) ─────────────────────────────
("__hstc", "*", "HubSpot, Inc.", "marketing",
"HubSpot Analytics — Hauptcookie zur Besucher-Identifikation.",
34128000, "HubSpot Cookies Doku",
"https://knowledge.hubspot.com/de/privacy-and-consent/what-cookies-does-hubspot-set-in-a-visitor-s-browser", 0.95),
("hubspotutk", "*", "HubSpot, Inc.", "marketing",
"HubSpot User-Token — verfolgt Besucher seitenuebergreifend.",
34128000, "HubSpot Cookies Doku",
"https://knowledge.hubspot.com/de/privacy-and-consent/what-cookies-does-hubspot-set-in-a-visitor-s-browser", 0.95),
]
def main() -> int:
dsn = os.environ.get("DATABASE_URL")
if not dsn:
print("DATABASE_URL missing", file=sys.stderr)
return 1
conn = psycopg2.connect(dsn)
cur = conn.cursor()
inserted = 0
for c in DACH_COOKIES:
(name, domain, vendor, cat, purpose_de, max_age,
source_name, source_url, confidence) = c
cur.execute(
"""
INSERT INTO compliance.cookie_library
(cookie_name, domain_pattern, vendor_name,
actual_category, purpose_de,
typical_max_age_seconds, source_name, source_url,
source_license, confidence)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
ON CONFLICT DO NOTHING
""",
(name, domain, vendor[:200], cat, purpose_de, max_age,
source_name, source_url, "CC-BY-eigene-Sammlung", confidence),
)
inserted += cur.rowcount
conn.commit()
print(f"P59c DACH-Cookies: {inserted}/{len(DACH_COOKIES)} inserted")
return 0
if __name__ == "__main__":
sys.exit(main())