feat(b19): Cookie coherence - 3-layer lookup + vendor cards + CSV

Addresses the BMW example (740 cookies, Salesforce declared "essential"
with a 1-year lifetime, pseudo-purposes such as "Siehe dazugehörige
Datenverarbeitung" ("see the associated data processing")). Implements
the user's "regulation as code" concept.

Step 1: cookie_library_lookup.py (3 layers):
  1. Override = cookie_knowledge_db.py + extended (74 entries) for
     Schrems II / CJEU / EU alternatives: BreakPilot's own legal IP.
  2. Truth base = compliance.cookie_library (2287 entries from the Open
     Cookie Database, CC0). actual_category is treated as ground truth.
  3. Auto-learning = cookie_behavior_audits: cross-site consensus once
     ≥3 sites report the same cookie.
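The layer priority can be sketched in Python as follows. This is a minimal sketch, not the code in cookie_library_lookup.py: the three layers are modelled as plain in-memory dicts (cookie name to category), `layered_lookup` and `CookieInfo` are hypothetical names, and only exact-key lookups are shown.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CookieInfo:
    name: str
    actual_category: str
    source: str  # which layer answered: "override", "library" or "consensus"


def layered_lookup(cookie: str,
                   overrides: Mapping[str, str],
                   library: Mapping[str, str],
                   consensus: Mapping[str, str]) -> Optional[CookieInfo]:
    """Return the answer of the highest-priority layer that knows `cookie`.

    Each mapping is a hypothetical in-memory stand-in (cookie name ->
    category) for the DB-backed layer of the same rank.
    """
    layers = (
        ("override", overrides),   # layer 1: curated legal knowledge
        ("library", library),      # layer 2: Open Cookie Database import
        ("consensus", consensus),  # layer 3: cross-site consensus
    )
    for source, table in layers:
        category = table.get(cookie)
        if category is not None:
            return CookieInfo(cookie, category, source)
    return None
```

Because the loop stops at the first hit, a curated override always shadows the library entry for the same cookie, and the consensus layer is only consulted for cookies neither curated source knows.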

  Matching: exact > prefix (with separator check) > wildcard. Short
  library names ("c", "ID") require an exact match, which prevents a
  false positive on "completely_unknown". A trailing underscore in an
  OCD name ("guest_uuid_essential_") is interpreted as an implicit
  wildcard.
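A sketch of the matching order described above, assuming the library is a flat list of names in which `*` marks an explicit wildcard. `match_cookie`, the separator set and the `MIN_PREFIX_LEN` threshold of 3 are illustrative assumptions, not values taken from the actual module.

```python
import fnmatch
from typing import Iterable, Optional

SEPARATORS = "_-.:"  # assumed set of characters that may follow a prefix
MIN_PREFIX_LEN = 3   # assumed: shorter names ("c", "ID") match exactly only


def match_cookie(cookie: str, library_names: Iterable[str]) -> Optional[str]:
    """Return the library entry matching `cookie`, or None.

    Priority: exact > prefix (with separator check) > wildcard.
    """
    names = list(library_names)
    # 1. Exact match always wins, even for very short names
    if cookie in names:
        return cookie
    long_names = [n for n in names if len(n) >= MIN_PREFIX_LEN]
    # 2. Prefix match: "_ga" matches "_ga_ABC" but not "_gat"
    prefix_hits = [
        n for n in long_names
        if "*" not in n
        and cookie.startswith(n)
        and len(cookie) > len(n)
        and cookie[len(n)] in SEPARATORS
    ]
    if prefix_hits:
        return max(prefix_hits, key=len)  # most specific prefix wins
    # 3. Wildcard: explicit "*" or an implicit one after a trailing "_"
    for n in long_names:
        pattern = n + "*" if n.endswith("_") and "*" not in n else n
        if "*" in pattern and fnmatch.fnmatchcase(cookie, pattern):
            return n
    return None
```

Letting the longest prefix win is a design choice of this sketch: when several library prefixes fit, it prefers the most specific entry.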

Step 2: cookie_coherence_check.py (B19, 6 finding types):
  - MARKETING_AS_ESSENTIAL (HIGH): KB says actual=marketing, but the
    site declares essential/erforderlich ("required") → consent is bypassed
  - LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + lifetime >90 days
  - PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung" or
    <4 words (suppressed if the vendor purpose is substantial)
  - MISSING_COUNTRY (LOW): vendor_country empty despite a KB hit
  - UNKNOWN_VENDOR (LOW): not in KB → auto-learning candidate
  - DUPLICATE_VENDOR (MED): same vendor in N categories = stack
    splitting to smuggle marketing in under "essential"

  Every finding carries a recommended_action (e.g. "Move cookie X out
  of 'erforderlich' into 'Marketing'").
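The cookie-level rules plus DUPLICATE_VENDOR might look roughly like this. It is a hedged sketch, not the B19 module: the `DeclaredCookie` shape, the `B19-<TYPE>` check_id format, the spelling `MEDIUM` for MED, and the rule that a vendor purpose counts as substantial once it has four or more words are all assumptions. MISSING_COUNTRY and UNKNOWN_VENDOR are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ESSENTIAL_LABELS = {"essential", "erforderlich"}
MAX_ESSENTIAL_DAYS = 90
PSEUDO_PHRASES = ("siehe dazugehörige datenverarbeitung",)
MIN_PURPOSE_WORDS = 4


@dataclass
class DeclaredCookie:  # hypothetical shape of one CMP-declared cookie
    name: str
    vendor: str
    declared_category: str
    lifetime_days: Optional[int]
    purpose: str


def _finding(kind: str, severity: str, cookie: str, vendor: str,
             action: str) -> dict:
    # Assumed check_id format "B19-<TYPE>"
    return {"check_id": f"B19-{kind}", "severity": severity,
            "cookie_name": cookie, "vendor": vendor,
            "recommended_action": action}


def _is_thin(text: str) -> bool:
    """True for boilerplate purposes or texts with fewer than 4 words."""
    t = text.strip().lower()
    return (any(p in t for p in PSEUDO_PHRASES)
            or len(t.split()) < MIN_PURPOSE_WORDS)


def check_cookie(c: DeclaredCookie, kb_category: Optional[str],
                 vendor_purpose: str = "") -> list[dict]:
    out = []
    essential = c.declared_category.strip().lower() in ESSENTIAL_LABELS
    if essential and kb_category == "marketing":
        out.append(_finding(
            "MARKETING_AS_ESSENTIAL", "HIGH", c.name, c.vendor,
            f"Move cookie {c.name} from 'erforderlich' to 'Marketing'"))
    if essential and (c.lifetime_days or 0) > MAX_ESSENTIAL_DAYS:
        out.append(_finding(
            "LIFETIME_TOO_LONG_FOR_ESSENTIAL", "MEDIUM", c.name, c.vendor,
            f"Shorten the lifetime of {c.name} or recategorise it"))
    # Suppressed when the vendor-level purpose is substantial
    if _is_thin(c.purpose) and _is_thin(vendor_purpose):
        out.append(_finding(
            "PSEUDO_PURPOSE", "LOW", c.name, c.vendor,
            f"Describe the concrete purpose of {c.name}"))
    return out


def check_duplicate_vendors(cookies: list[DeclaredCookie]) -> list[dict]:
    """Flag vendors whose cookies are spread over several categories."""
    cats: dict[str, set[str]] = defaultdict(set)
    for c in cookies:
        cats[c.vendor.strip()].add(c.declared_category.strip().lower())
    return [_finding("DUPLICATE_VENDOR", "MEDIUM", "", vendor,
                     f"Review why {vendor} appears in {len(cs)} categories")
            for vendor, cs in sorted(cats.items()) if len(cs) > 1]
```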

Step 3: cookie_observation_logger.py:
  After every audit, logs each (cookie, site, declared_purpose) tuple to
  compliance.cookie_behavior_audits → basis for the cross-site consensus
  in layer 3.
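Logging and the layer-3 consensus can be illustrated with SQLite standing in for the real `compliance.cookie_behavior_audits` table. The column names, the UNIQUE constraint that makes re-audits idempotent, and the idea that consensus is reached on the normalised declared purpose are assumptions of this sketch; `init_db`, `log_observations` and `consensus` are hypothetical names.

```python
import sqlite3
from typing import Iterable, Mapping


def init_db(conn: sqlite3.Connection) -> None:
    # Stand-in for compliance.cookie_behavior_audits (column names assumed)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cookie_behavior_audits (
               cookie_name      TEXT NOT NULL,
               site             TEXT NOT NULL,
               declared_purpose TEXT NOT NULL,
               UNIQUE (cookie_name, site, declared_purpose)
           )"""
    )


def log_observations(conn: sqlite3.Connection, site: str,
                     cookies: Iterable[Mapping[str, str]]) -> None:
    """Record one (cookie, site, declared_purpose) row per audited cookie."""
    rows = [(c["name"], site, c["declared_purpose"].strip().lower())
            for c in cookies]
    # OR IGNORE: re-auditing the same site does not inflate the counts
    conn.executemany(
        "INSERT OR IGNORE INTO cookie_behavior_audits VALUES (?, ?, ?)", rows)
    conn.commit()


def consensus(conn: sqlite3.Connection, min_sites: int = 3) -> dict[str, str]:
    """Map cookie -> purpose declared by at least `min_sites` distinct sites."""
    query = """
        SELECT cookie_name, declared_purpose, COUNT(DISTINCT site) AS n
        FROM cookie_behavior_audits
        GROUP BY cookie_name, declared_purpose
        HAVING COUNT(DISTINCT site) >= ?
        ORDER BY cookie_name, n DESC, declared_purpose
    """
    result: dict[str, str] = {}
    for name, purpose, _n in conn.execute(query, (min_sites,)):
        result.setdefault(name, purpose)  # sorted: first row has most sites
    return result
```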

Step 4: cookie_csv_exporter.py:
  cookies-full-{check_id}.csv with 21 columns (name, vendor decl/KB,
  category decl/KB, lifetime decl/KB, country, opt-out, 8× FIND_* flags,
  recommended_action). UTF-8 BOM so that Excel detects the encoding.
  ZIP attachment: audit_walk_zip_builder gains an extra_files=
  parameter; phase_e calls it with cookies-full-...csv.
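A minimal sketch of the export path using only the standard library. `cookies_csv_bytes` and `build_zip` are hypothetical stand-ins for cookie_csv_exporter.py and audit_walk_zip_builder, and the column names in the test are illustrative, not the real 21-column header. The `utf-8-sig` codec is what writes the BOM that Excel uses to detect UTF-8.

```python
import csv
import io
import zipfile
from typing import Iterable, Mapping, Optional, Sequence


def cookies_csv_bytes(rows: Iterable[Mapping[str, object]],
                      columns: Sequence[str]) -> bytes:
    """Serialise cookie rows to CSV bytes with a UTF-8 BOM for Excel."""
    buf = io.StringIO()
    # Missing keys become "" (restval); keys not in `columns` are dropped
    writer = csv.DictWriter(buf, fieldnames=list(columns),
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM b"\xef\xbb\xbf"
    return buf.getvalue().encode("utf-8-sig")


def build_zip(base_files: Mapping[str, bytes],
              extra_files: Optional[Mapping[str, bytes]] = None) -> bytes:
    """Pack the regular audit files plus any extra attachments into one ZIP."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in {**base_files, **(extra_files or {})}.items():
            zf.writestr(name, data)
    return out.getvalue()
```

Merging `extra_files` after the base files means an extra file with the same name replaces the base entry; whether the real builder behaves the same way is not stated in the commit.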

Step 5: mail_render_v2/_vendor_cards.py:
  Instead of 740 cookie rows: one aggregate per vendor with cookie
  count + issue count + 1-2 example cookies + issue-type tags. Top 30
  vendors in the mail, the rest only in the CSV. Sorted by issue score.

Step 6: render_info_box_rechtsrahmen():
  Generic info box in the header citing Art. 13 GDPR + § 25 TDDDG +
  Art. 5 GDPR + § 5 UWG + §§ 30/130 OWiG. Always shown; no explicit
  finding-to-norm mapping (the legal assessment is left to the reader).

Orchestrator + _compose: run_b19 + render_vendor_cards +
  render_info_box_rechtsrahmen wired into the V2 layout.

Tests: 28/28 passing (15 lookup + 13 coherence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-06-07 23:48:04 +02:00
parent 0b29d1fada
commit c908fcd5eb
12 changed files with 1364 additions and 1 deletions
@@ -23,6 +23,10 @@ from ._blocks_findings import (
    render_internal_reminders,
    render_manual_review,
)
from ._vendor_cards import (
    render_info_box_rechtsrahmen,
    render_vendor_cards,
)
from ._legacy_wrappers import render_all_legacy
from ._style import page_close, page_open
@@ -33,7 +37,12 @@ def compose_v2(state: dict) -> str:
    parts = [
        page_open(site),
        render_header(state),
        render_info_box_rechtsrahmen(),
        render_toc(state),
        render_vendor_cards(
            state.get("cmp_vendors") or [],
            state.get("cookie_coherence_findings") or [],
        ),
        render_critical(state),
        render_manual_review(state),
        render_internal_reminders(state),
@@ -60,6 +69,8 @@ def compose_v2(state: dict) -> str:
        state.get("audit_walk_html", ""),
        # B18 Impressum-Specialist-Agent (Pattern + LLM)
        state.get("impressum_agent_html", ""),
        # B19 Cookie-Coherence-Check (Salesforce-as-essential etc.)
        state.get("cookie_coherence_html", ""),
        # Browser-Matrix (Stage 1.c)
        state.get("browser_matrix_html", ""),
        # All legacy build_*_html() wrapped in V2 sections — preserves
@@ -0,0 +1,190 @@
"""Vendor card renderer for the audit mail.

Instead of 740 cookie rows, we aggregate by VENDOR. One card per vendor
with:
- vendor name + country of establishment (declared + KB)
- category declared vs KB
- cookie count + issue count
- 1-2 example cookies (preferring cookies that have findings)
- top issue types as tags
Sorted by issue severity. Top 30 in the mail, the rest in the CSV.
The full 740-cookie table stays in the CSV attachment (cookies-full-*.csv).
"""
from __future__ import annotations
from collections import defaultdict
from html import escape as h
from ._cookie_inventory import _country_third
def _build_vendor_summary(cmp_vendors: list[dict],
                          coherence_findings: list[dict]) -> list[dict]:
    """Aggregate cookies by vendor, score by issue severity."""
    by_vendor: dict[str, dict] = {}
    # Findings index per vendor (same normalisation as the vendor key below)
    findings_per_vendor: dict[str, list[dict]] = defaultdict(list)
    for f in coherence_findings:
        v = (f.get("vendor") or "").strip().lower()
        if v:
            findings_per_vendor[v].append(f)
    for v in cmp_vendors:
        name = (v.get("name") or "").strip() or "Unbekannt"
        key = name.lower()
        entry = by_vendor.setdefault(key, {
            "name": name,
            "country": (v.get("country") or "").strip(),
            "category": (v.get("category") or "").strip(),
            "cookies": [],
        })
        for c in (v.get("cookies") or []):
            entry["cookies"].append(c)
    out: list[dict] = []
    # "MED" is accepted as well, in case findings use the short form
    sev_score = {"HIGH": 3, "MEDIUM": 2, "MED": 2, "LOW": 1, "INFO": 0}
    for key, e in by_vendor.items():
        fs = findings_per_vendor.get(key, [])
        score = sum(sev_score.get((f.get("severity") or "").upper(), 0)
                    for f in fs)
        # Pick up to 2 example cookies: prefer those WITH findings
        finding_cookies = {f.get("cookie_name") for f in fs
                           if f.get("cookie_name")}
        examples = [c for c in e["cookies"]
                    if (c.get("name") or "") in finding_cookies][:2]
        if len(examples) < 2:
            for c in e["cookies"]:
                if len(examples) >= 2:
                    break
                if c not in examples:
                    examples.append(c)
        # Issue types as tags
        issue_types = sorted({
            (f.get("check_id") or "").split("-")[-1]
            for f in fs
            if f.get("check_id")
        })
        out.append({
            "name": e["name"],
            "country": e["country"],
            "category": e["category"],
            "cookie_count": len(e["cookies"]),
            "issue_count": len(fs),
            "issue_score": score,
            "issue_types": issue_types,
            "examples": examples,
        })
    # Sort: issue_score DESC, then cookie_count DESC
    out.sort(key=lambda r: (-r["issue_score"], -r["cookie_count"]))
    return out
def render_vendor_cards(cmp_vendors: list[dict],
                        coherence_findings: list[dict],
                        top_n: int = 30) -> str:
    summary = _build_vendor_summary(cmp_vendors, coherence_findings)
    if not summary:
        return ""
    total_vendors = len(summary)
    total_cookies = sum(s["cookie_count"] for s in summary)
    total_issues = sum(s["issue_count"] for s in summary)
    cards = []
    for s in summary[:top_n]:
        sev_color = ("#dc2626" if s["issue_score"] >= 6 else
                     "#f59e0b" if s["issue_score"] >= 2 else "#64748b")
        # Country comes from CMP data: escape it like every other field
        country_disp = h(s["country"] or "")
        country_tag = ""
        if s["country"]:
            _disp, is_third, _adq = _country_third(s["country"])
            if is_third:
                country_tag = (
                    " <span style='font-size:10px;color:#dc2626;"
                    "font-weight:700;'>[Drittland]</span>"
                )
        issue_chips = "".join(
            f"<span style='display:inline-block;background:#fee2e2;"
            f"color:#7f1d1d;font-size:10px;padding:1px 6px;border-radius:999px;"
            f"margin-right:3px;'>{h(t)}</span>"
            for t in s["issue_types"][:4]
        )
        examples_html = ""
        for c in s["examples"]:
            cname = c.get("name") or "?"
            lifetime = (c.get("duration") or c.get("persistence")
                        or c.get("expiry") or "")
            examples_html += (
                f"<div style='font-size:11px;color:#475569;"
                f"font-family:monospace;'>"
                f"• <code>{h(cname)}</code> "
                f"<span style='color:#94a3b8;'>(Lifetime: {h(str(lifetime))})</span>"
                "</div>"
            )
        cards.append(
            f"<div style='margin:10px 0;padding:12px;background:#fff;"
            f"border-left:3px solid {sev_color};border-radius:4px;'>"
            f"<div style='display:flex;justify-content:space-between;"
            f"align-items:baseline;'>"
            f"<div><strong style='font-size:14px;'>{h(s['name'])}</strong>"
            f" <span style='font-size:11px;color:#64748b;'>"
            f"{country_disp}{country_tag}</span></div>"
            f"<div style='font-size:11px;color:#475569;'>"
            f"{s['cookie_count']} Cookies · "
            f"<strong style='color:{sev_color};'>{s['issue_count']}</strong> "
            f"Issues</div>"
            f"</div>"
            f"<div style='margin-top:4px;'>{issue_chips}</div>"
            f"<div style='margin-top:6px;'>{examples_html}</div>"
            "</div>"
        )
    rest_note = ""
    if len(summary) > top_n:
        rest_note = (
            f"<p style='font-size:12px;color:#64748b;margin-top:8px;'>"
            f"<em>… und {len(summary)-top_n} weitere Vendoren — "
            f"vollständige Liste in <code>cookies-full-*.csv</code> "
            f"im ZIP-Anhang.</em></p>"
        )
    return (
        "<div style='margin:24px 0;padding:16px;border-left:4px solid #0f766e;"
        "background:#f0fdfa;border-radius:4px;'>"
        "<h2 style='margin:0 0 8px;color:#134e4a;font-size:16px;'>"
        f"🏷️ Vendor-Übersicht ({total_vendors} Vendoren · "
        f"{total_cookies} Cookies · {total_issues} Issues)"
        "</h2>"
        "<p style='margin:0 0 8px;font-size:12px;color:#475569;'>"
        "Sortiert nach Issue-Severity. Pro Vendor: 1-2 Beispielcookies + "
        "Issue-Tags. Volle Cookie×Finding-Matrix in CSV."
        "</p>"
        + "".join(cards) + rest_note + "</div>"
    )
def render_info_box_rechtsrahmen() -> str:
    """Generic legal-frame info box. Always shown in V2 mail header."""
    return (
        "<div style='margin:16px 0;padding:14px;border:1px solid #e2e8f0;"
        "background:#f8fafc;border-radius:4px;font-size:12px;"
        "color:#475569;line-height:1.5;'>"
        "<strong style='color:#1e293b;'>Rechtsrahmen dieser Analyse</strong>"
        "<ul style='margin:6px 0 0 18px;padding:0;'>"
        "<li><strong>DSGVO Art. 13 Abs. 1 lit. c</strong> — konkrete "
        "Zweckangabe pro Cookie / Verarbeitung.</li>"
        "<li><strong>§ 25 Abs. 1 TDDDG</strong> — Einwilligung für jeden "
        "nicht-technisch-erforderlichen Cookie.</li>"
        "<li><strong>DSGVO Art. 5 Abs. 1 lit. c</strong> — Datenminimierung "
        "(Lifetime + Reichweite).</li>"
        "<li><strong>§ 5 UWG</strong> — irreführende geschäftliche Handlung "
        "(falsche Kategorisierung als 'erforderlich').</li>"
        "<li><strong>§ 30/130 OWiG</strong> — persönliche Verantwortung "
        "der Geschäftsführung.</li>"
        "</ul>"
        "</div>"
    )