feat(b19): Cookie coherence: 3-layer lookup + vendor cards + CSV

Addresses the BMW example (740 cookies, Salesforce declared as
"essential" with a 1-year lifetime, pseudo-purposes such as "Siehe
dazugehörige Datenverarbeitung", i.e. "see associated data
processing"). Implements the user concept "Regulation as Code".

Step 1: cookie_library_lookup.py (3 layers):
  1. Override = cookie_knowledge_db.py + extended (74) for
     Schrems II / CJEU (EuGH) / EU alternatives; this is BreakPilot's
     legal IP.
  2. Truth base = compliance.cookie_library (2287 entries from the
     Open Cookie Database, CC0). actual_category is treated as the
     ground truth.
  3. Auto-learning = cookie_behavior_audits: cross-site consensus
     once ≥3 sites report the same cookie.
  Match precedence: exact > prefix (with separator check) > wildcard.
  Short library names ("c", "ID") require an exact match, which
  prevents a false positive on "completely_unknown". A trailing
  underscore in OCD entries ("guest_uuid_essential_") is interpreted
  as an implicit wildcard.
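
The match precedence described in Step 1 could look roughly like the sketch below. It is not the code from cookie_library_lookup.py: the function name `match_cookie`, the separator set and the minimum stem length of 3 are assumptions chosen for illustration.

```python
from __future__ import annotations

SEPARATORS = "_-."   # assumption: characters that may follow a prefix match
MIN_STEM_LEN = 3     # assumption: shorter library names only match exactly


def match_cookie(observed: str, library_names: list[str]) -> tuple[str, str] | None:
    """Return (library_name, match_kind) or None. Precedence: exact > prefix > wildcard."""
    # 1. An exact match always wins, including for short names such as "c" or "ID".
    if observed in library_names:
        return observed, "exact"

    prefix_hits: list[str] = []
    wildcard_hits: list[str] = []
    for lib in library_names:
        if lib.endswith("*"):
            # Explicit wildcard: compare against the stem without the "*".
            stem = lib[:-1]
            if len(stem) >= MIN_STEM_LEN and observed.startswith(stem):
                wildcard_hits.append(lib)
        elif lib.endswith("_"):
            # A trailing underscore is read as an implicit wildcard.
            if len(lib) >= MIN_STEM_LEN and observed.startswith(lib):
                wildcard_hits.append(lib)
        elif (len(lib) >= MIN_STEM_LEN
              and observed.startswith(lib)
              and len(observed) > len(lib)
              and observed[len(lib)] in SEPARATORS):
            # Prefix match only counts if a separator follows the prefix.
            prefix_hits.append(lib)

    # 2./3. Prefix beats wildcard; within one kind the longest name wins.
    for hits, kind in ((prefix_hits, "prefix"), (wildcard_hits, "wildcard")):
        if hits:
            return max(hits, key=len), kind
    return None
```

With this sketch, "completely_unknown" does not match the library name "c", because names shorter than the assumed minimum length are only eligible for an exact match.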

Step 2: cookie_coherence_check.py (B19, 6 finding types):
  - MARKETING_AS_ESSENTIAL (HIGH): KB says actual=marketing, site
    declares essential/erforderlich → consent is bypassed
  - LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + >90d
  - PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
    / <4 words (suppressed when the vendor purpose is substantial)
  - MISSING_COUNTRY (LOW): vendor_country empty despite a KB hit
  - UNKNOWN_VENDOR (LOW): not in KB → auto-learning candidate
  - DUPLICATE_VENDOR (MED): same vendor in N categories = stack
    splitting to smuggle marketing in under "essential"
  Every finding carries a recommended_action (e.g. "Cookie X aus
  'erforderlich' raus und in 'Marketing' setzen", i.e. move cookie X
  out of 'essential' and into 'Marketing').
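
Three of the per-cookie finding types from Step 2, plus UNKNOWN_VENDOR, can be sketched as follows. This is an illustration only: the function `check_cookie` and the field names (`declared_category`, `lifetime_days`, `declared_purpose`, `actual_category`) are assumptions, and the vendor-purpose suppression for PSEUDO_PURPOSE is left out. The severity label "MEDIUM" follows the score table in _vendor_cards.py.

```python
from __future__ import annotations

ESSENTIAL_LABELS = {"essential", "erforderlich"}
MAX_ESSENTIAL_DAYS = 90
PSEUDO_PHRASES = {"siehe dazugehörige datenverarbeitung"}


def _finding(kind: str, severity: str, cookie_name: str, action: str) -> dict:
    return {"type": kind, "severity": severity,
            "cookie_name": cookie_name, "recommended_action": action}


def check_cookie(cookie: dict, kb_entry: dict | None) -> list[dict]:
    """Compare one declared cookie against its knowledge-base entry (if any)."""
    name = cookie.get("name") or "?"
    declared = (cookie.get("declared_category") or "").strip().lower()
    is_essential = declared in ESSENTIAL_LABELS
    findings: list[dict] = []

    if kb_entry is None:
        # Not in the KB: candidate for the auto-learning layer.
        findings.append(_finding("UNKNOWN_VENDOR", "LOW", name,
                                 f"Review cookie {name} manually"))
    elif is_essential and kb_entry.get("actual_category") == "marketing":
        findings.append(_finding(
            "MARKETING_AS_ESSENTIAL", "HIGH", name,
            f"Move cookie {name} out of 'essential' and into 'Marketing'"))

    if is_essential and (cookie.get("lifetime_days") or 0) > MAX_ESSENTIAL_DAYS:
        findings.append(_finding(
            "LIFETIME_TOO_LONG_FOR_ESSENTIAL", "MEDIUM", name,
            f"Shorten the lifetime of {name} to at most {MAX_ESSENTIAL_DAYS} days"))

    purpose = (cookie.get("declared_purpose") or "").strip()
    if purpose.lower() in PSEUDO_PHRASES or len(purpose.split()) < 4:
        findings.append(_finding(
            "PSEUDO_PURPOSE", "LOW", name,
            f"State a concrete purpose for cookie {name}"))
    return findings
```

MISSING_COUNTRY and DUPLICATE_VENDOR are omitted here because they operate on vendor-level data rather than on a single cookie.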

Step 3: cookie_observation_logger.py:
  After every audit, logs all (cookie, site, declared_purpose) tuples
  to compliance.cookie_behavior_audits → basis for the cross-site
  consensus in layer 3.
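
A minimal sketch of how a cross-site consensus over the logged tuples might be derived. The commit only states the ≥3-sites threshold; the function name `consensus_purposes`, the in-memory input and the "most frequent declared purpose wins" rule are assumptions, not the actual layer-3 query.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Iterable

MIN_SITES = 3  # threshold from the commit message: at least 3 sites


def consensus_purposes(
    observations: Iterable[tuple[str, str, str]],
    min_sites: int = MIN_SITES,
) -> dict[str, str]:
    """Map cookie name -> most frequently declared purpose across sites.

    Only cookies reported by at least `min_sites` distinct sites are kept.
    """
    sites: dict[str, set[str]] = defaultdict(set)
    purposes: dict[str, Counter] = defaultdict(Counter)
    for cookie, site, purpose in observations:
        if site in sites[cookie]:
            continue  # count every site only once per cookie
        sites[cookie].add(site)
        purposes[cookie][purpose] += 1
    return {
        cookie: purposes[cookie].most_common(1)[0][0]
        for cookie, seen in sites.items()
        if len(seen) >= min_sites
    }
```

Counting each site once per cookie keeps a single site with repeated audits from producing a consensus on its own.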

Step 4: cookie_csv_exporter.py:
  cookies-full-{check_id}.csv with 21 columns (Name, Vendor decl/KB,
  Cat decl/KB, Lifetime decl/KB, Country, Opt-Out, 8x FIND_* flags,
  recommended_action). UTF-8 BOM for Excel.
  ZIP attachment: extends audit_walk_zip_builder with an extra_files=
  parameter; phase_e calls it with cookies-full-...csv.
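
The UTF-8 BOM is what lets Excel detect the encoding when the CSV is opened directly. A minimal sketch using only the standard library; the helper name `cookies_csv_bytes` and the column names in the usage note are placeholders, not the exporter's real 21-column layout.

```python
from __future__ import annotations

import csv
import io


def cookies_csv_bytes(rows: list[dict], columns: list[str]) -> bytes:
    """Serialize rows to CSV bytes with a leading UTF-8 BOM."""
    buf = io.StringIO(newline="")
    # Keys not listed in `columns` are ignored; missing keys become empty cells.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    # The "utf-8-sig" codec prepends the BOM (EF BB BF) when encoding.
    return buf.getvalue().encode("utf-8-sig")
```

For example, `cookies_csv_bytes([{"name": "_ga"}], ["name", "vendor_declared"])` yields bytes that could be passed on as an extra file for the ZIP attachment.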

Step 5: mail_render_v2/_vendor_cards.py:
  Instead of 740 cookie rows: aggregation per vendor with cookie
  count + issue count + 1-2 example cookies + issue-type tags. Top 30
  vendors in the mail, the rest only in the CSV. Sorted by issue
  score.

Step 6: render_info_box_rechtsrahmen():
  Generic header info box with Art. 13 DSGVO (GDPR) + § 25 TDDDG +
  Art. 5 DSGVO + § 5 UWG + § 30/130 OWiG. Always shown, with no
  explicit mapping from findings to provisions (readers are trusted
  to draw the connection themselves).

Orchestrator + _compose: run_b19 + render_vendor_cards +
render_info_box_rechtsrahmen wired into the V2 layout.

Tests: 28/28 green (15 lookup + 13 coherence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -23,6 +23,10 @@ from ._blocks_findings import (
     render_internal_reminders,
     render_manual_review,
 )
+from ._vendor_cards import (
+    render_info_box_rechtsrahmen,
+    render_vendor_cards,
+)
 from ._legacy_wrappers import render_all_legacy
 from ._style import page_close, page_open
 
@@ -33,7 +37,12 @@ def compose_v2(state: dict) -> str:
     parts = [
         page_open(site),
         render_header(state),
+        render_info_box_rechtsrahmen(),
         render_toc(state),
+        render_vendor_cards(
+            state.get("cmp_vendors") or [],
+            state.get("cookie_coherence_findings") or [],
+        ),
         render_critical(state),
         render_manual_review(state),
         render_internal_reminders(state),
@@ -60,6 +69,8 @@ def compose_v2(state: dict) -> str:
         state.get("audit_walk_html", ""),
         # B18 Impressum-Specialist-Agent (Pattern + LLM)
         state.get("impressum_agent_html", ""),
+        # B19 Cookie-Coherence-Check (Salesforce-as-essential etc.)
+        state.get("cookie_coherence_html", ""),
         # Browser-Matrix (Stage 1.c)
         state.get("browser_matrix_html", ""),
         # All legacy build_*_html() wrapped in V2 sections — preserves
@@ -0,0 +1,190 @@
+"""Vendor card renderer for the audit mail.
+
+Instead of 740 cookie rows we aggregate by VENDOR. One card per
+vendor with:
+  - Vendor name + country of establishment (declared + KB)
+  - Category declared vs KB
+  - Cookie count + issue count
+  - 1-2 example cookies (with the most conspicuous lifetime)
+  - Top issue types as tags
+
+Sorted by issue severity. Top 30 in the mail, the rest in the CSV.
+
+The full 740-cookie table stays in the CSV attachment (cookies-full.csv).
+"""
+
+from __future__ import annotations
+
+from collections import defaultdict
+from html import escape as h
+
+from ._cookie_inventory import _country_third
+
+
+def _build_vendor_summary(cmp_vendors: list[dict],
+                          coherence_findings: list[dict]) -> list[dict]:
+    """Aggregate cookies by vendor, score by issue severity."""
+    by_vendor: dict[str, dict] = {}
+    # Findings index per vendor
+    findings_per_vendor: dict[str, list[dict]] = defaultdict(list)
+    for f in coherence_findings:
+        v = (f.get("vendor") or "").lower()
+        if v:
+            findings_per_vendor[v].append(f)
+
+    for v in cmp_vendors:
+        name = (v.get("name") or "").strip() or "Unbekannt"
+        key = name.lower()
+        entry = by_vendor.setdefault(key, {
+            "name": name,
+            "country": (v.get("country") or "").strip(),
+            "category": (v.get("category") or "").strip(),
+            "cookies": [],
+        })
+        for c in (v.get("cookies") or []):
+            entry["cookies"].append(c)
+
+    out: list[dict] = []
+    sev_score = {"HIGH": 3, "MEDIUM": 2, "LOW": 1, "INFO": 0}
+    for key, e in by_vendor.items():
+        fs = findings_per_vendor.get(key, [])
+        score = sum(sev_score.get((f.get("severity") or "").upper(), 0)
+                    for f in fs)
+        # Pick up to 2 example cookies: prefer those WITH findings
+        finding_cookies = {f.get("cookie_name") for f in fs
+                           if f.get("cookie_name")}
+        examples = [c for c in e["cookies"]
+                    if (c.get("name") or "") in finding_cookies][:2]
+        if len(examples) < 2:
+            for c in e["cookies"]:
+                if len(examples) >= 2:
+                    break
+                if c not in examples:
+                    examples.append(c)
+        # Issue-types as tags
+        issue_types = sorted({
+            (f.get("check_id") or "").split("-")[-1]
+            for f in fs
+            if f.get("check_id")
+        })
+        out.append({
+            "name": e["name"],
+            "country": e["country"],
+            "category": e["category"],
+            "cookie_count": len(e["cookies"]),
+            "issue_count": len(fs),
+            "issue_score": score,
+            "issue_types": issue_types,
+            "examples": examples,
+        })
+
+    # Sort: issue_score DESC, then cookie_count DESC
+    out.sort(key=lambda r: (-r["issue_score"], -r["cookie_count"]))
+    return out
+
+
+def render_vendor_cards(cmp_vendors: list[dict],
+                        coherence_findings: list[dict],
+                        top_n: int = 30) -> str:
+    summary = _build_vendor_summary(cmp_vendors, coherence_findings)
+    if not summary:
+        return ""
+
+    total_vendors = len(summary)
+    total_cookies = sum(s["cookie_count"] for s in summary)
+    total_issues = sum(s["issue_count"] for s in summary)
+    cards = []
+    for s in summary[:top_n]:
+        sev_color = ("#dc2626" if s["issue_score"] >= 6 else
+                     "#f59e0b" if s["issue_score"] >= 2 else "#64748b")
+        country_disp = s["country"] or "—"
+        country_tag = ""
+        if s["country"]:
+            _disp, is_third, _adq = _country_third(s["country"])
+            if is_third:
+                country_tag = (
+                    " <span style='font-size:10px;color:#dc2626;"
+                    "font-weight:700;'>[Drittland]</span>"
+                )
+        issue_chips = "".join(
+            f"<span style='display:inline-block;background:#fee2e2;"
+            f"color:#7f1d1d;font-size:10px;padding:1px 6px;border-radius:999px;"
+            f"margin-right:3px;'>{h(t)}</span>"
+            for t in s["issue_types"][:4]
+        )
+        examples_html = ""
+        for c in s["examples"]:
+            cname = c.get("name") or "?"
+            lifetime = (c.get("duration") or c.get("persistence")
+                        or c.get("expiry") or "—")
+            examples_html += (
+                f"<div style='font-size:11px;color:#475569;"
+                f"font-family:monospace;'>"
+                f"• <code>{h(cname)}</code> "
+                f"<span style='color:#94a3b8;'>(Lifetime: {h(str(lifetime))})</span>"
+                "</div>"
+            )
+
+        cards.append(
+            f"<div style='margin:10px 0;padding:12px;background:#fff;"
+            f"border-left:3px solid {sev_color};border-radius:4px;'>"
+            f"<div style='display:flex;justify-content:space-between;"
+            f"align-items:baseline;'>"
+            f"<div><strong style='font-size:14px;'>{h(s['name'])}</strong>"
+            f" <span style='font-size:11px;color:#64748b;'>"
+            f"{country_disp}{country_tag}</span></div>"
+            f"<div style='font-size:11px;color:#475569;'>"
+            f"{s['cookie_count']} Cookies · "
+            f"<strong style='color:{sev_color};'>{s['issue_count']}</strong> "
+            f"Issues</div>"
+            f"</div>"
+            f"<div style='margin-top:4px;'>{issue_chips}</div>"
+            f"<div style='margin-top:6px;'>{examples_html}</div>"
+            "</div>"
+        )
+
+    rest_note = ""
+    if len(summary) > top_n:
+        rest_note = (
+            f"<p style='font-size:12px;color:#64748b;margin-top:8px;'>"
+            f"<em>… und {len(summary)-top_n} weitere Vendoren — "
+            f"vollständige Liste in <code>cookies-full-*.csv</code> "
+            f"im ZIP-Anhang.</em></p>"
+        )
+
+    return (
+        "<div style='margin:24px 0;padding:16px;border-left:4px solid #0f766e;"
+        "background:#f0fdfa;border-radius:4px;'>"
+        "<h2 style='margin:0 0 8px;color:#134e4a;font-size:16px;'>"
+        f"🏷️ Vendor-Übersicht ({total_vendors} Vendoren · "
+        f"{total_cookies} Cookies · {total_issues} Issues)"
+        "</h2>"
+        "<p style='margin:0 0 8px;font-size:12px;color:#475569;'>"
+        "Sortiert nach Issue-Severity. Pro Vendor: 1-2 Beispielcookies + "
+        "Issue-Tags. Volle Cookie×Finding-Matrix in CSV."
+        "</p>"
+        + "".join(cards) + rest_note + "</div>"
+    )
+
+
+def render_info_box_rechtsrahmen() -> str:
+    """Generic legal-frame info box. Always shown in V2 mail header."""
+    return (
+        "<div style='margin:16px 0;padding:14px;border:1px solid #e2e8f0;"
+        "background:#f8fafc;border-radius:4px;font-size:12px;"
+        "color:#475569;line-height:1.5;'>"
+        "<strong style='color:#1e293b;'>Rechtsrahmen dieser Analyse</strong>"
+        "<ul style='margin:6px 0 0 18px;padding:0;'>"
+        "<li><strong>DSGVO Art. 13 Abs. 1 lit. c</strong> — konkrete "
+        "Zweckangabe pro Cookie / Verarbeitung.</li>"
+        "<li><strong>§ 25 Abs. 1 TDDDG</strong> — Einwilligung für jeden "
+        "nicht-technisch-erforderlichen Cookie.</li>"
+        "<li><strong>DSGVO Art. 5 Abs. 1 lit. c</strong> — Datenminimierung "
+        "(Lifetime + Reichweite).</li>"
+        "<li><strong>§ 5 UWG</strong> — irreführende geschäftliche Handlung "
+        "(falsche Kategorisierung als 'erforderlich').</li>"
+        "<li><strong>§ 30/130 OWiG</strong> — persönliche Verantwortung "
+        "der Geschäftsführung.</li>"
+        "</ul>"
+        "</div>"
+    )