c908fcd5eb
Addresses the BMW example (740 cookies, Salesforce declared as
"essential" with a 1-year lifetime, pseudo-purposes such as "Siehe
dazugehörige Datenverarbeitung", i.e. "see associated data
processing"). User concept: "regulation as code".
Step 1 - cookie_library_lookup.py (3 layers):
1. Override = cookie_knowledge_db.py + extended (74) for
Schrems II / ECJ (EuGH) / EU alternatives; BreakPilot's legal IP.
2. Truth base = compliance.cookie_library (2287 entries from the Open
Cookie Database, CC0). actual_category is treated as ground truth.
3. Auto-learning = cookie_behavior_audits: cross-site consensus
once ≥3 sites report the same cookie.
Match: exact > prefix (with separator check) > wildcard. Short
library names ("c", "ID") require an exact match, which prevents a
false positive on "completely_unknown". A trailing underscore
in OCD names ("guest_uuid_essential_") is interpreted as an
implicit wildcard.
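
The matching order described in Step 1 could be sketched roughly as follows. This is an illustration only, not the code in cookie_library_lookup.py: the function name match_cookie, the separator set, the minimum stem length of 3, and the explicit "*" wildcard suffix are all assumptions made for the example.

```python
from __future__ import annotations

SEPARATORS = "_-."    # assumed separator characters
MIN_STEM_LEN = 3      # assumed: shorter library names only match exactly


def match_cookie(name: str, library: list[str]) -> tuple[str, str] | None:
    """Return (library_entry, match_kind), preferring exact > prefix > wildcard."""
    # 1. Exact match always wins and is the only option for short names.
    if name in library:
        return name, "exact"

    candidates = [e for e in library if len(e) >= MIN_STEM_LEN]

    # 2. Prefix match: the character after the library name must be a separator.
    #    name cannot equal e here (exact was handled), so name[len(e)] exists.
    prefix_hits = [
        e for e in candidates
        if not e.endswith(("_", "*"))
        and name.startswith(e)
        and name[len(e)] in SEPARATORS
    ]
    if prefix_hits:
        return max(prefix_hits, key=len), "prefix"

    # 3. Wildcard: explicit "*" suffix, or a trailing underscore (implicit).
    wildcard_hits = [
        e for e in candidates
        if e.endswith(("_", "*")) and name.startswith(e.rstrip("*"))
    ]
    if wildcard_hits:
        return max(wildcard_hits, key=len), "wildcard"
    return None
```

Under these assumptions, "completely_unknown" does not match the library entry "c", while "guest_uuid_essential_1_2" matches "guest_uuid_essential_" as a wildcard.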
Step 2 - cookie_coherence_check.py (B19, 6 finding types):
- MARKETING_AS_ESSENTIAL (HIGH): KB says actual=marketing, site
declares essential/erforderlich → consent is bypassed
- LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + >90d
- PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
/ <4 words (suppressed when the vendor purpose is substantial)
- MISSING_COUNTRY (LOW): vendor_country empty despite KB hit
- UNKNOWN_VENDOR (LOW): not in KB → auto-learning candidate
- DUPLICATE_VENDOR (MED): same vendor in N categories =
stack splitting to smuggle marketing in under "essential"
Each finding carries a recommended_action (e.g. "Cookie X aus
'erforderlich' raus und in 'Marketing' setzen", i.e. move cookie X
out of 'erforderlich' and into 'Marketing').
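
Three of the six finding types can be illustrated with a minimal sketch. The function name coherence_findings, the input dict keys (name, category, lifetime_days, purpose), and the English action texts are hypothetical; only the thresholds (90 days, 4 words) and the finding names come from the list above. The severity label "MEDIUM" follows the priority map in the CSV exporter, and the vendor-purpose suppression is left out for brevity.

```python
from __future__ import annotations

ESSENTIAL_LABELS = {"essential", "erforderlich"}
MAX_ESSENTIAL_DAYS = 90
MIN_PURPOSE_WORDS = 4


def coherence_findings(cookie: dict, kb: dict) -> list[dict]:
    """Compare one declared cookie against its knowledge-base entry."""
    findings: list[dict] = []
    name = cookie.get("name", "")
    declared = (cookie.get("category") or "").strip().lower()
    is_essential = declared in ESSENTIAL_LABELS

    # Declared essential, but the KB classifies the cookie as marketing.
    if is_essential and kb.get("actual_category") == "marketing":
        findings.append({
            "type": "MARKETING_AS_ESSENTIAL",
            "severity": "HIGH",
            "recommended_action": f"Move cookie {name} from '{declared}' to 'marketing'.",
        })

    # Essential cookies should not live longer than 90 days.
    lifetime_days = cookie.get("lifetime_days")
    if is_essential and lifetime_days is not None and lifetime_days > MAX_ESSENTIAL_DAYS:
        findings.append({
            "type": "LIFETIME_TOO_LONG_FOR_ESSENTIAL",
            "severity": "MEDIUM",
            "recommended_action": f"Shorten the lifetime of {name} or justify it.",
        })

    # Purposes with fewer than 4 words say nothing about the processing.
    purpose = (cookie.get("purpose") or "").strip()
    if len(purpose.split()) < MIN_PURPOSE_WORDS:
        findings.append({
            "type": "PSEUDO_PURPOSE",
            "severity": "LOW",
            "recommended_action": f"State a concrete purpose for {name}.",
        })
    return findings
```

The placeholder "Siehe dazugehörige Datenverarbeitung" has three words, so the word-count rule alone already flags it in this sketch.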
Step 3 - cookie_observation_logger.py:
After each audit, logs all (cookie, site, declared_purpose) tuples to
compliance.cookie_behavior_audits → basis for the cross-site
consensus in layer 3.
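
How logged observations might turn into a layer-3 consensus can be sketched as below. This is a guess at the idea, not the actual query: the function name consensus_category is hypothetical, and it assumes each observation also carries a category (the commit only names cookie, site, and declared_purpose). The threshold of 3 distinct sites is taken from Step 1.

```python
from __future__ import annotations

from collections import Counter, defaultdict

MIN_SITES = 3  # consensus requires at least 3 distinct sites


def consensus_category(observations: list[tuple[str, str, str]]) -> dict[str, str]:
    """Map cookie name -> majority category, for cookies seen on >= MIN_SITES sites.

    observations: (cookie, site, category) tuples; the category field is
    an assumption made for this sketch.
    """
    per_cookie: dict[str, dict[str, str]] = defaultdict(dict)
    for cookie, site, category in observations:
        # One vote per site: a later observation replaces an earlier one.
        per_cookie[cookie][site] = category

    result: dict[str, str] = {}
    for cookie, by_site in per_cookie.items():
        if len(by_site) < MIN_SITES:
            continue  # not enough independent sites yet
        category, _count = Counter(by_site.values()).most_common(1)[0]
        result[cookie] = category
    return result
```

Counting one vote per site rather than per audit keeps a single frequently audited site from dominating the consensus.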
Step 4 - cookie_csv_exporter.py:
cookies-full-{check_id}.csv with 22 columns per COLUMNS (name, vendor
decl/KB, KB layer, category decl/KB, lifetime decl/KB, purpose,
country decl/KB, opt-out, 8x FIND_* flags, recommended_action,
source_in_audit). UTF-8 BOM for Excel.
ZIP attachment: extends audit_walk_zip_builder with an extra_files=
parameter; phase_e calls it with cookies-full-...csv.
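
The two mechanics in Step 4, a BOM-prefixed semicolon CSV and an extra file added to a ZIP, can be shown with the standard library alone. Both function names below are hypothetical and do not reflect the real audit_walk_zip_builder signature; only the extra_files idea is taken from the commit. The sketch uses the utf-8-sig codec, which prepends the BOM on encoding.

```python
from __future__ import annotations

import csv
import io
import zipfile


def csv_bytes_with_bom(header: list[str], rows: list[list[str]]) -> bytes:
    """Write a semicolon-delimited CSV and return it as UTF-8 bytes with BOM."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM bytes EF BB BF that Excel looks for.
    return buf.getvalue().encode("utf-8-sig")


def zip_with_extra_files(
    base_files: dict[str, bytes],
    extra_files: dict[str, bytes] | None = None,
) -> bytes:
    """Build an in-memory ZIP from base files plus optional extra files."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in {**base_files, **(extra_files or {})}.items():
            zf.writestr(name, data)
    return out.getvalue()
```

A caller could then pass something like extra_files={"cookies-full-demo.csv": csv_bytes} next to the regular report files; the file name here is made up for the example.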
Step 5 - mail_render_v2/_vendor_cards.py:
Instead of 740 cookie rows: aggregation per vendor with cookie count +
issue count + 1-2 example cookies + issue-type tags. Top 30
vendors in the mail, the rest only in the CSV. Sorted by issue score.
Step 6 - render_info_box_rechtsrahmen():
Generic header info box with Art. 13 GDPR + § 25 TDDDG + Art. 5
+ § 5 UWG + § 30/130 OWiG. Always shown, no explicit finding
mapping (left to the user's own judgement).
Orchestrator + _compose: wire run_b19 + render_vendor_cards +
render_info_box_rechtsrahmen into the V2 layout.
Tests: 28/28 green (15 lookup + 13 coherence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
141 lines
5.3 KiB
Python
"""Full cookie CSV export.

One row per declared cookie, with:
- Name + vendor
- What the site declares (category, lifetime, purpose, country)
- What the 3-layer KB says (actual_category, typical_lifetime,
  vendor_country, kb_source)
- All findings as FIND_* boolean columns
- recommended_action (one-liner from the most severe finding)

Output: bytes (UTF-8 CSV with BOM for Excel compatibility).
"""

from __future__ import annotations

import csv
import io
import logging

from .cookie_library_lookup import lookup as kb_lookup

logger = logging.getLogger(__name__)


COLUMNS = [
    "cookie_name", "vendor_declared", "kb_vendor", "kb_layer",
    "category_declared", "category_kb",
    "lifetime_declared", "lifetime_kb_typical",
    "purpose_declared",
    "country_declared", "country_kb",
    "optout_kb",
    "FIND_marketing_as_essential",
    "FIND_lifetime_too_long_for_essential",
    "FIND_pseudo_purpose",
    "FIND_missing_country",
    "FIND_missing_retention",
    "FIND_unknown_vendor",
    "FIND_duplicate_vendor",
    "FIND_third_country_no_mechanism",
    "recommended_action",
    "source_in_audit",
]


def _action_for(findings_for_cookie: list[dict]) -> str:
    """Pick the action from the highest-severity finding."""
    if not findings_for_cookie:
        return ""
    priority = {"HIGH": 0, "MEDIUM": 1, "LOW": 2, "INFO": 3}
    sorted_f = sorted(
        findings_for_cookie,
        key=lambda f: priority.get((f.get("severity") or "").upper(), 9),
    )
    return sorted_f[0].get("recommended_action", "") or ""


def build_cookie_csv(state: dict) -> bytes:
    """Iterate cmp_vendors + cookies, write CSV bytes."""
    cmp_vendors = state.get("cmp_vendors") or []
    coherence_findings = state.get("cookie_coherence_findings") or []

    # Index findings by cookie_name for fast lookup
    by_cookie: dict[str, list[dict]] = {}
    duplicate_vendors: set[str] = set()
    for f in coherence_findings:
        cname = f.get("cookie_name")
        if cname:
            by_cookie.setdefault(cname, []).append(f)
        if f.get("check_id") == "COOKIE-COHERENCE-DUP-001":
            duplicate_vendors.add((f.get("vendor") or "").lower())

    buf = io.StringIO()
    # Excel-compatible BOM so Umlauts render correctly
    buf.write("\ufeff")
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(COLUMNS)

    written = 0
    for v in cmp_vendors:
        vendor_name = (v.get("name") or "").strip()
        vendor_src = (v.get("source") or "").strip()
        vendor_country = (v.get("country") or "").strip()
        vendor_category = (v.get("category") or "").strip()
        for c in (v.get("cookies") or []):
            cname = (c.get("name") or "").strip()
            if not cname:
                continue
            # Cookie-level declarations fall back to the vendor-level values
            declared_cat = (c.get("category") or vendor_category).strip()
            declared_purpose = (c.get("purpose") or v.get("purpose") or "").strip()
            declared_lifetime = (c.get("duration") or c.get("persistence")
                                 or c.get("expiry") or "").strip()

            kb = kb_lookup(cname)
            kb_vendor = (kb.get("vendor_name") or kb.get("vendor") or "")
            kb_layer = kb.get("_layer") or "unknown"
            kb_category = (kb.get("actual_category")
                           or kb.get("consensus_category") or "")
            kb_country = (kb.get("vendor_country") or "")
            kb_optout = (kb.get("vendor_opt_out_url") or "")
            kb_typical_lifetime = (kb.get("typical_lifetime") or "")
            # Derive a readable lifetime from seconds if no text is stored
            if not kb_typical_lifetime and kb.get("typical_max_age_seconds"):
                secs = kb["typical_max_age_seconds"]
                days = secs / 86400.0
                kb_typical_lifetime = (
                    f"{int(days)} Tage" if days >= 1
                    else f"{int(secs / 3600)} h" if secs >= 3600
                    else f"{int(secs / 60)} min"
                )

            f_cookie = by_cookie.get(cname) or []
            check_ids = {fp.get("check_id") for fp in f_cookie}

            row = [
                cname, vendor_name, kb_vendor, kb_layer,
                declared_cat, kb_category,
                declared_lifetime, kb_typical_lifetime,
                declared_purpose[:300],
                vendor_country, kb_country,
                kb_optout,
                "1" if "COOKIE-COHERENCE-MAE-001" in check_ids else "",
                "1" if "COOKIE-COHERENCE-LIFE-001" in check_ids else "",
                "1" if "COOKIE-COHERENCE-PURP-001" in check_ids else "",
                "1" if "COOKIE-COHERENCE-CTRY-001" in check_ids else "",
                # FIND_missing_retention: no lifetime declared at all
                "1" if not declared_lifetime else "",
                "1" if "COOKIE-COHERENCE-UNK-001" in check_ids else "",
                "1" if vendor_name.lower() in duplicate_vendors else "",
                # FIND_third_country_no_mechanism: KB country outside the
                # listed countries and no transfer mechanism declared
                "1" if (kb_country
                        and kb_country.upper() not in
                        ("DE", "EU", "AT", "FR", "NL", "IT", "ES",
                         "BE", "CH", "IE", "DK", "FI", "SE", "NO")
                        and not c.get("transfer_mechanism")) else "",
                _action_for(f_cookie),
                vendor_src,
            ]
            writer.writerow(row)
            written += 1

    logger.info("cookie-csv export: %d rows", written)
    return buf.getvalue().encode("utf-8")