breakpilot-compliance/backend-compliance/compliance/services/cookie_csv_exporter.py
Benjamin Admin c908fcd5eb feat(b19): Cookie-Coherence (3-layer lookup + vendor cards + CSV)
Addresses the BMW example (740 cookies, Salesforce declared as "essential"
with a 1-year lifetime, pseudo-purposes such as "Siehe dazugehörige
Datenverarbeitung", i.e. "see associated data processing"). User concept:
"regulation as code".

Step 1: cookie_library_lookup.py (3 layers):
  1. Override = cookie_knowledge_db.py + extended (74) for
     Schrems II / CJEU / EU alternatives; BreakPilot's own legal IP.
  2. Truth base = compliance.cookie_library (2287 from the Open Cookie
     Database, CC0). actual_category is treated as ground truth.
  3. Auto-learning = cookie_behavior_audits: cross-site consensus
     once ≥3 sites report the same cookie.

  Match: exact > prefix (with separator check) > wildcard. Short
  library names ("c", "ID") require an exact match, which prevents a
  false positive on "completely_unknown". A trailing underscore
  in OCD ("guest_uuid_essential_") is interpreted as an implicit
  wildcard.
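
The match rules above could be sketched roughly as follows. This is an illustration only, not the code in cookie_library_lookup.py: the names match_kind, best_match, SEPARATORS and MIN_FUZZY_LEN, the separator set, the length threshold of 3, and the explicit "*" wildcard syntax are all assumptions.

```python
from __future__ import annotations

SEPARATORS = "_-."   # assumed separator characters
MIN_FUZZY_LEN = 3    # assumed: shorter library names must match exactly
RANK = {"exact": 0, "prefix": 1, "wildcard": 2}


def match_kind(cookie: str, lib_name: str) -> str | None:
    """Classify how one library entry matches a cookie name, if at all."""
    if cookie == lib_name:
        return "exact"
    # A trailing underscore is treated like an explicit '*' wildcard.
    if lib_name.endswith("*") or lib_name.endswith("_"):
        stem = lib_name.rstrip("*")
        if len(stem) >= MIN_FUZZY_LEN and cookie.startswith(stem):
            return "wildcard"
        return None
    if len(lib_name) < MIN_FUZZY_LEN:
        return None  # "c" must not match "completely_unknown"
    # Prefix match only counts when a separator follows the prefix.
    if cookie.startswith(lib_name) and cookie[len(lib_name)] in SEPARATORS:
        return "prefix"
    return None


def best_match(cookie: str, library: list[str]) -> str | None:
    """Return the library name with the strongest match kind."""
    hits = [(RANK[kind], name) for name in library
            if (kind := match_kind(cookie, name))]
    return min(hits)[1] if hits else None
```

With these assumptions, best_match("completely_unknown", ["c", "ID"]) finds nothing, while "_ga_ABC123" resolves to the "_ga" entry through the separator-checked prefix rule rather than through a looser wildcard entry.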

Step 2: cookie_coherence_check.py (B19, 6 finding types):
  - MARKETING_AS_ESSENTIAL (HIGH): KB says actual=marketing, site
    declares essential/erforderlich → consent is bypassed
  - LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + >90d
  - PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
    / <4 words (suppressed if the vendor purpose is substantial)
  - MISSING_COUNTRY (LOW): vendor_country empty despite a KB hit
  - UNKNOWN_VENDOR (LOW): not in KB → auto-learning candidate
  - DUPLICATE_VENDOR (MED): same vendor in N categories =
    stack splitting to smuggle marketing in under "essential"

  Every finding carries a recommended_action ("move cookie X out of
  'erforderlich' and into 'Marketing'").
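
Three of the finding types above can be illustrated with a small sketch. The function name check_cookie, its parameters, and the finding dict layout are hypothetical. The check IDs are the ones cookie_csv_exporter.py tests for, and the severity string "MEDIUM" follows the priority table in that file, not the "MED" shorthand used above.

```python
from __future__ import annotations

ESSENTIAL_LABELS = {"essential", "erforderlich"}
PSEUDO_PHRASES = ("siehe dazugehörige datenverarbeitung",)
MAX_ESSENTIAL_DAYS = 90


def _finding(check_id: str, severity: str, name: str, action: str) -> dict:
    return {"check_id": check_id, "severity": severity,
            "cookie_name": name, "recommended_action": action}


def check_cookie(name: str, declared_category: str, declared_days: int | None,
                 purpose: str, kb_category: str,
                 vendor_purpose_substantial: bool = False) -> list[dict]:
    """Return coherence findings for one declared cookie (sketch)."""
    findings = []
    essential = declared_category.strip().lower() in ESSENTIAL_LABELS

    if essential and kb_category == "marketing":
        findings.append(_finding(
            "COOKIE-COHERENCE-MAE-001", "HIGH", name,
            f"Move cookie {name} from '{declared_category}' to 'Marketing'."))

    if essential and declared_days is not None and declared_days > MAX_ESSENTIAL_DAYS:
        findings.append(_finding(
            "COOKIE-COHERENCE-LIFE-001", "MEDIUM", name,
            f"Shorten lifetime of {name} to at most {MAX_ESSENTIAL_DAYS} days."))

    text = purpose.strip().lower()
    pseudo = any(p in text for p in PSEUDO_PHRASES) or len(text.split()) < 4
    if pseudo and not vendor_purpose_substantial:
        findings.append(_finding(
            "COOKIE-COHERENCE-PURP-001", "LOW", name,
            f"State a concrete purpose for {name}."))
    return findings
```

A Salesforce-style cookie declared as "erforderlich" with a 365-day lifetime and the placeholder purpose would trigger all three findings in this sketch.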

Step 3: cookie_observation_logger.py:
  After every audit, logs all (cookie, site, declared_purpose) tuples to
  compliance.cookie_behavior_audits → basis for the cross-site consensus
  in layer 3.
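
How a cross-site consensus might be derived from such observations is sketched below. The commit only states the threshold of 3 sites; the function name consensus_category, the assumption that each observation carries a category, and the one-vote-per-site rule are illustrative guesses, not the actual layer-3 logic.

```python
from __future__ import annotations

from collections import Counter

MIN_SITES = 3  # threshold named in the commit message


def consensus_category(observations: list[tuple[str, str, str]],
                       cookie: str) -> str | None:
    """Return the most common category once enough distinct sites agree to report.

    observations: (cookie_name, site, category) tuples (assumed shape).
    """
    per_site: dict[str, str] = {}
    for name, site, category in observations:
        if name == cookie:
            per_site[site] = category  # one vote per site, last one wins
    if len(per_site) < MIN_SITES:
        return None  # not enough independent sites yet
    top, _votes = Counter(per_site.values()).most_common(1)[0]
    return top
```

Counting distinct sites instead of raw rows keeps a single site that is audited repeatedly from producing a "consensus" on its own.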

Step 4: cookie_csv_exporter.py:
  cookies-full-{check_id}.csv with 22 columns (including name, vendor
  decl/KB, category decl/KB, lifetime decl/KB, country, opt-out,
  8x FIND_* flags, recommended_action). UTF-8 BOM for Excel.
  ZIP attachment: extends audit_walk_zip_builder with an extra_files=
  parameter; phase_e calls it with cookies-full-...csv.

Step 5: mail_render_v2/_vendor_cards.py:
  Instead of 740 cookie rows: aggregation per vendor with cookie count +
  issue count + 1-2 example cookies + issue-type tags. Top 30
  vendors in the mail, the rest only in the CSV. Sorted by issue score.
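
The per-vendor aggregation could look roughly like this. The function name build_vendor_cards, the input dict shapes, and the severity weights behind the issue score are assumptions; the commit does not say how the score is computed.

```python
from __future__ import annotations

from collections import defaultdict

SEVERITY_WEIGHT = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}  # assumed weighting


def build_vendor_cards(cookies: list[dict], findings: list[dict],
                       top_n: int = 30) -> list[dict]:
    """Aggregate cookies per vendor and return the top_n cards by issue score.

    cookies:  [{"name": ..., "vendor": ...}]
    findings: [{"cookie_name": ..., "severity": ..., "type": ...}]
    """
    by_cookie: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        by_cookie[f["cookie_name"]].append(f)

    cards: dict[str, dict] = {}
    for c in cookies:
        card = cards.setdefault(c["vendor"], {
            "vendor": c["vendor"], "cookie_count": 0, "issue_count": 0,
            "score": 0, "examples": [], "tags": set(),
        })
        card["cookie_count"] += 1
        if len(card["examples"]) < 2:  # keep 1-2 example cookies
            card["examples"].append(c["name"])
        for f in by_cookie[c["name"]]:
            card["issue_count"] += 1
            card["score"] += SEVERITY_WEIGHT.get(f["severity"], 0)
            card["tags"].add(f["type"])

    # Highest score first; vendor name breaks ties deterministically.
    ranked = sorted(cards.values(), key=lambda c: (-c["score"], c["vendor"]))
    return ranked[:top_n]
```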

Step 6: render_info_box_rechtsrahmen():
  Generic header info box citing Art. 13 DSGVO + § 25 TDDDG + Art. 5
  + § 5 UWG + § 30/130 OWiG. Always shown, with no explicit
  finding mapping (left to the user's own judgment).

Orchestrator + _compose: run_b19 + render_vendor_cards +
  render_info_box_rechtsrahmen wired into the V2 layout.

Tests: 28/28 green (15 lookup + 13 coherence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 23:48:04 +02:00

141 lines
5.3 KiB
Python
"""Full cookie CSV export.

One row per declared cookie, with:
  - Name + vendor
  - What the site declares (category, lifetime, purpose, country)
  - What the 3-layer KB says (actual_category, typical_lifetime,
    vendor_country, kb_source)
  - All findings as FIND_* boolean columns
  - recommended_action (one-liner from the most severe finding)

Output: bytes (UTF-8 CSV with BOM for Excel compatibility).
"""
from __future__ import annotations

import csv
import io
import logging

from .cookie_library_lookup import lookup as kb_lookup

logger = logging.getLogger(__name__)

COLUMNS = [
    "cookie_name", "vendor_declared", "kb_vendor", "kb_layer",
    "category_declared", "category_kb",
    "lifetime_declared", "lifetime_kb_typical",
    "purpose_declared",
    "country_declared", "country_kb",
    "optout_kb",
    "FIND_marketing_as_essential",
    "FIND_lifetime_too_long_for_essential",
    "FIND_pseudo_purpose",
    "FIND_missing_country",
    "FIND_missing_retention",
    "FIND_unknown_vendor",
    "FIND_duplicate_vendor",
    "FIND_third_country_no_mechanism",
    "recommended_action",
    "source_in_audit",
]


def _action_for(findings_for_cookie: list[dict]) -> str:
    """Pick the action from the highest-severity finding."""
    if not findings_for_cookie:
        return ""
    priority = {"HIGH": 0, "MEDIUM": 1, "LOW": 2, "INFO": 3}
    sorted_f = sorted(
        findings_for_cookie,
        key=lambda f: priority.get((f.get("severity") or "").upper(), 9),
    )
    return sorted_f[0].get("recommended_action", "") or ""


def build_cookie_csv(state: dict) -> bytes:
    """Iterate cmp_vendors + cookies, write CSV bytes."""
    cmp_vendors = state.get("cmp_vendors") or []
    coherence_findings = state.get("cookie_coherence_findings") or []

    # Index findings by cookie_name for fast lookup
    by_cookie: dict[str, list[dict]] = {}
    duplicate_vendors: set[str] = set()
    for f in coherence_findings:
        cname = f.get("cookie_name")
        if cname:
            by_cookie.setdefault(cname, []).append(f)
        if f.get("check_id") == "COOKIE-COHERENCE-DUP-001":
            duplicate_vendors.add((f.get("vendor") or "").lower())

    buf = io.StringIO()
    # Excel-compatible BOM so umlauts render correctly. Written as an
    # explicit escape because a literal U+FEFF is invisible in source.
    buf.write("\ufeff")
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(COLUMNS)

    written = 0
    for v in cmp_vendors:
        vendor_name = (v.get("name") or "").strip()
        vendor_src = (v.get("source") or "").strip()
        vendor_country = (v.get("country") or "").strip()
        vendor_category = (v.get("category") or "").strip()
        for c in (v.get("cookies") or []):
            cname = (c.get("name") or "").strip()
            if not cname:
                continue
            # Cookie-level declarations fall back to the vendor-level ones.
            declared_cat = (c.get("category") or vendor_category).strip()
            declared_purpose = (c.get("purpose") or v.get("purpose") or "").strip()
            declared_lifetime = (c.get("duration") or c.get("persistence")
                                 or c.get("expiry") or "").strip()

            kb = kb_lookup(cname)
            kb_vendor = (kb.get("vendor_name") or kb.get("vendor") or "")
            kb_layer = kb.get("_layer") or "unknown"
            kb_category = (kb.get("actual_category")
                           or kb.get("consensus_category") or "")
            kb_country = (kb.get("vendor_country") or "")
            kb_optout = (kb.get("vendor_opt_out_url") or "")
            kb_typical_lifetime = (kb.get("typical_lifetime") or "")
            # Derive a readable lifetime from seconds when no label exists.
            if not kb_typical_lifetime and kb.get("typical_max_age_seconds"):
                secs = kb["typical_max_age_seconds"]
                if secs:
                    days = secs / 86400.0
                    kb_typical_lifetime = (
                        f"{int(days)} Tage" if days >= 1
                        else f"{int(secs / 3600)} h" if secs >= 3600
                        else f"{int(secs / 60)} min"
                    )

            f_cookie = by_cookie.get(cname) or []
            check_ids = {fp.get("check_id") for fp in f_cookie}
            row = [
                cname, vendor_name, kb_vendor, kb_layer,
                declared_cat, kb_category,
                declared_lifetime, kb_typical_lifetime,
                declared_purpose[:300],
                vendor_country, kb_country,
                kb_optout,
                "1" if "COOKIE-COHERENCE-MAE-001" in check_ids else "",
                "1" if "COOKIE-COHERENCE-LIFE-001" in check_ids else "",
                "1" if "COOKIE-COHERENCE-PURP-001" in check_ids else "",
                "1" if "COOKIE-COHERENCE-CTRY-001" in check_ids else "",
                # FIND_missing_retention: no declared lifetime at all
                "1" if not declared_lifetime else "",
                "1" if "COOKIE-COHERENCE-UNK-001" in check_ids else "",
                "1" if vendor_name.lower() in duplicate_vendors else "",
                # FIND_third_country_no_mechanism: KB country outside the
                # allow-list and no transfer mechanism declared
                "1" if (kb_country
                        and kb_country.upper() not in
                        ("DE", "EU", "AT", "FR", "NL", "IT", "ES",
                         "BE", "CH", "IE", "DK", "FI", "SE", "NO")
                        and not c.get("transfer_mechanism")) else "",
                _action_for(f_cookie),
                vendor_src,
            ]
            writer.writerow(row)
            written += 1

    logger.info("cookie-csv export: %d rows", written)
    return buf.getvalue().encode("utf-8")
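
For consumers of the export, a minimal sketch of reading the bytes back in Python: the "utf-8-sig" codec strips the BOM that Excel needs, and the delimiter must be ";" to match the writer. The helper name read_cookie_csv and the sample row are made up for illustration.

```python
from __future__ import annotations

import csv
import io


def read_cookie_csv(data: bytes) -> list[dict]:
    """Parse exporter output into one dict per cookie row."""
    # "utf-8-sig" drops a leading BOM; plain "utf-8" would leave U+FEFF
    # glued to the first header name.
    text = data.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    return list(reader)


# Hypothetical three-column excerpt in the same shape the exporter writes.
sample = (
    "\ufeffcookie_name;vendor_declared;FIND_pseudo_purpose\r\n"
    "_ga;Google;1\r\n"
).encode("utf-8")

rows = read_cookie_csv(sample)
flagged = [r["cookie_name"] for r in rows if r["FIND_pseudo_purpose"] == "1"]
```

Decoding with plain "utf-8" instead would turn the first column key into "\ufeffcookie_name", which is a common source of confusing KeyError failures when processing Excel-oriented CSV files.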