feat(b19): cookie coherence (3-layer lookup + vendor cards + CSV)
Addresses the BMW example (740 cookies, Salesforce declared as "essential"
with a 1-year lifetime, pseudo-purposes such as "Siehe dazugehörige
Datenverarbeitung" ["see associated data processing"]). Implements the
user's "regulation as code" concept.
Step 1: cookie_library_lookup.py (3 layers):
1. Override = cookie_knowledge_db.py + extended (74) for
   Schrems II / CJEU / EU alternatives; BreakPilot's legal IP.
2. Truth base = compliance.cookie_library (2287 entries from the Open
   Cookie Database, CC0). actual_category is treated as ground truth.
3. Auto-learning = cookie_behavior_audits: cross-site consensus
   once ≥3 sites report the same cookie.
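The layer priority can be pictured as a first-hit-wins chain. A minimal sketch, assuming each layer is exposed as a callable that returns a hit or None; `CookieInfo`, `Layer` and `lookup` are hypothetical names, not the actual cookie_library_lookup API:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CookieInfo:
    name: str
    actual_category: str
    source: str  # which layer produced the hit, e.g. "override"


# Hypothetical: a layer maps a cookie name to a hit, or None if it has no entry.
Layer = Callable[[str], Optional[CookieInfo]]


def lookup(cookie_name: str, layers: list[Layer]) -> Optional[CookieInfo]:
    """Try layers in priority order (override, truth base, auto-learning)
    and return the first hit."""
    for layer in layers:
        hit = layer(cookie_name)
        if hit is not None:
            return hit
    return None
```

Because the override layer is consulted first, a curated entry wins even when the Open Cookie Database assigns a different category.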
Matching: exact > prefix (with separator check) > wildcard. Short
library names ("c", "ID") require an exact match, which prevents
false positives such as matching "completely_unknown". A trailing
underscore in OCD names ("guest_uuid_essential_") is interpreted as
an implicit wildcard.
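A minimal sketch of that matching order, assuming `_`, `-` and `.` as separators and a hypothetical 3-character minimum for prefix matches (the real threshold is not stated here); `match_cookie`, `MIN_PREFIX_LEN` and `SEPARATORS` are illustrative names:

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Optional

SEPARATORS = ("_", "-", ".")  # assumption
MIN_PREFIX_LEN = 3            # assumption: shorter names need an exact match


def _as_pattern(lib_name: str) -> Optional[str]:
    """Wildcard pattern for a library name, or None for a plain name."""
    if "*" in lib_name:
        return lib_name
    if lib_name.endswith("_"):
        return lib_name + "*"  # OCD trailing underscore = implicit wildcard
    return None


def match_cookie(observed: str, library: list[str]) -> Optional[tuple[str, str]]:
    """Return (library_name, match_kind) using exact > prefix > wildcard."""
    # 1. Exact match always wins, including for short names like "c" or "ID".
    if observed in set(library):
        return observed, "exact"
    # 2. Prefix match on plain names, only at a separator boundary;
    #    the longest matching library name wins.
    best: Optional[str] = None
    for lib in library:
        if _as_pattern(lib) is not None or len(lib) < MIN_PREFIX_LEN:
            continue
        if (observed.startswith(lib) and len(observed) > len(lib)
                and observed[len(lib)] in SEPARATORS):
            if best is None or len(lib) > len(best):
                best = lib
    if best is not None:
        return best, "prefix"
    # 3. Explicit "*" patterns and implicit trailing-underscore wildcards.
    for lib in library:
        pattern = _as_pattern(lib)
        if pattern is not None and fnmatchcase(observed, pattern):
            return lib, "wildcard"
    return None
```

Taking the longest prefix when several library names match is a further assumption of this sketch.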
Step 2: cookie_coherence_check.py (B19, 6 finding types):
- MARKETING_AS_ESSENTIAL (HIGH): KB says actual=marketing, but the site
  declares it as essential/"erforderlich" → consent is bypassed
- LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + lifetime >90 days
- PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung" or a
  purpose of <4 words (suppressed if the vendor purpose is substantial)
- MISSING_COUNTRY (LOW): vendor_country empty despite a KB hit
- UNKNOWN_VENDOR (LOW): not in the KB → auto-learning candidate
- DUPLICATE_VENDOR (MED): same vendor in N categories, i.e. the stack is
  split to smuggle marketing in under "essential"
Every finding carries a recommended_action (e.g. "Move cookie X out of
'erforderlich' and into 'Marketing'").
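A sketch of three of the six checks (MARKETING_AS_ESSENTIAL, LIFETIME_TOO_LONG_FOR_ESSENTIAL, PSEUDO_PURPOSE) on a single declared cookie. The data classes, the set of essential labels and the rule that a vendor purpose counts as "substantial" once it is not itself a pseudo-purpose are assumptions; the recommended_action wording is illustrative only:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumptions: labels meaning "strictly necessary"; thresholds from the text.
ESSENTIAL_LABELS = {"essential", "erforderlich"}
MAX_ESSENTIAL_LIFETIME_DAYS = 90
MIN_PURPOSE_WORDS = 4
PSEUDO_PURPOSE_PHRASES = ("siehe dazugehörige datenverarbeitung",)


@dataclass
class DeclaredCookie:
    name: str
    declared_category: str
    lifetime_days: Optional[int]
    purpose: str
    vendor_purpose: str = ""


@dataclass
class Finding:
    kind: str
    severity: str
    cookie: str
    recommended_action: str


def _is_pseudo(purpose: str) -> bool:
    """Boilerplate phrase or fewer than MIN_PURPOSE_WORDS words."""
    norm = " ".join(purpose.lower().split())
    return (any(p in norm for p in PSEUDO_PURPOSE_PHRASES)
            or len(norm.split()) < MIN_PURPOSE_WORDS)


def check_cookie(c: DeclaredCookie, kb_category: Optional[str]) -> list[Finding]:
    findings: list[Finding] = []
    essential = c.declared_category.strip().lower() in ESSENTIAL_LABELS
    if essential and kb_category == "marketing":
        findings.append(Finding(
            "MARKETING_AS_ESSENTIAL", "HIGH", c.name,
            f"Move cookie {c.name} out of '{c.declared_category}' into 'Marketing'"))
    if (essential and c.lifetime_days is not None
            and c.lifetime_days > MAX_ESSENTIAL_LIFETIME_DAYS):
        findings.append(Finding(
            "LIFETIME_TOO_LONG_FOR_ESSENTIAL", "MED", c.name,
            f"Shorten the lifetime of {c.name} or move it out of '{c.declared_category}'"))
    # Suppressed when the vendor-level purpose is substantial.
    vendor_ok = bool(c.vendor_purpose) and not _is_pseudo(c.vendor_purpose)
    if _is_pseudo(c.purpose) and not vendor_ok:
        findings.append(Finding(
            "PSEUDO_PURPOSE", "LOW", c.name,
            f"Replace the purpose of {c.name} with a concrete description"))
    return findings
```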
Step 3: cookie_observation_logger.py:
After every audit, logs all (cookie, site, declared_purpose) tuples into
compliance.cookie_behavior_audits → basis for the cross-site consensus
in layer 3.
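The cross-site consensus that layer 3 derives from these rows could look roughly like this. A sketch only, assuming rows of (cookie_name, site_url, declared_category) and promotion once ≥3 distinct sites declare the same category; `consensus` and `MIN_SITES` are hypothetical names, and the real logic lives in `cookie_library_lookup._load_auto_learning`:

```python
from __future__ import annotations

from collections import defaultdict

MIN_SITES = 3  # "≥3 sites" threshold from the commit message


def consensus(rows: list[tuple[str, str, str]]) -> dict[str, str]:
    """Map cookie_name -> category for cookies where at least MIN_SITES
    distinct sites declare the same (normalised) category."""
    sites: dict[tuple[str, str], set[str]] = defaultdict(set)
    for name, site, cat in rows:
        cat = cat.strip().lower()
        if cat:
            sites[(name, cat)].add(site)  # a set, so repeat audits of one site count once
    result: dict[str, str] = {}
    best_count: dict[str, int] = {}
    for (name, cat), site_set in sites.items():
        n = len(site_set)
        if n >= MIN_SITES and n > best_count.get(name, 0):
            result[name] = cat
            best_count[name] = n
    return result
```

Counting distinct sites rather than rows keeps a single site that is audited repeatedly from manufacturing a consensus on its own.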
Step 4: cookie_csv_exporter.py:
cookies-full-{check_id}.csv with 21 columns (name, vendor declared/KB,
category declared/KB, lifetime declared/KB, country, opt-out, 8x FIND_*
flags, recommended_action). Written with a UTF-8 BOM so Excel detects
the encoding.
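Python's `utf-8-sig` codec writes the BOM automatically. A minimal sketch with a hypothetical six-column subset of the 21 columns; `export_csv` and `COLUMNS` are illustrative names:

```python
from __future__ import annotations

import csv
import io

# Hypothetical subset of the export columns; the real exporter has more.
COLUMNS = ["name", "vendor_declared", "vendor_kb", "category_declared",
           "category_kb", "recommended_action"]


def export_csv(rows: list[dict[str, str]]) -> bytes:
    buf = io.StringIO(newline="")
    # Missing keys become "", unknown keys are ignored.
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM (EF BB BF) so Excel reads umlauts correctly.
    return buf.getvalue().encode("utf-8-sig")
```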
ZIP attachment: extends audit_walk_zip_builder with an
extra_files= parameter; phase_e calls it with cookies-full-...csv.
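One way to add an optional extra_files parameter without touching existing callers is a `None` default. A sketch using the standard `zipfile` module; `build_audit_zip` and its signature are hypothetical, not the actual audit_walk_zip_builder API:

```python
from __future__ import annotations

import io
import zipfile
from typing import Mapping, Optional


def build_audit_zip(base_files: Mapping[str, bytes],
                    extra_files: Optional[Mapping[str, bytes]] = None) -> bytes:
    """Zip the base report files plus any caller-supplied extras in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in base_files.items():
            zf.writestr(name, data)
        # Existing callers pass nothing here and get the old archive unchanged.
        for name, data in (extra_files or {}).items():
            zf.writestr(name, data)
    return buf.getvalue()
```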
Step 5: mail_render_v2/_vendor_cards.py:
Instead of 740 cookie rows: one aggregated card per vendor with cookie
count, issue count, 1-2 example cookies and issue-type tags. Top 30
vendors in the mail, the rest only in the CSV. Sorted by issue score.
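A sketch of the per-vendor aggregation, assuming (vendor, cookie) and (vendor, cookie, type, severity) tuples as input and a hypothetical HIGH=3 / MED=2 / LOW=1 weighting for the issue score, which the commit does not specify; `VendorCard` and `build_vendor_cards` are illustrative names:

```python
from __future__ import annotations

from dataclasses import dataclass, field

SEVERITY_WEIGHT = {"HIGH": 3, "MED": 2, "LOW": 1}  # assumption
TOP_N = 30
MAX_EXAMPLES = 2


@dataclass
class VendorCard:
    vendor: str
    cookie_count: int = 0
    issue_count: int = 0
    issue_score: int = 0
    examples: list[str] = field(default_factory=list)
    issue_types: set[str] = field(default_factory=set)


def build_vendor_cards(
    cookies: list[tuple[str, str]],
    findings: list[tuple[str, str, str, str]],
) -> tuple[list[VendorCard], list[VendorCard]]:
    """Return (cards shown in the mail, cards left to the CSV only)."""
    cards: dict[str, VendorCard] = {}
    for vendor, _cookie in cookies:
        cards.setdefault(vendor, VendorCard(vendor)).cookie_count += 1
    for vendor, cookie, ftype, severity in findings:
        card = cards.setdefault(vendor, VendorCard(vendor))
        card.issue_count += 1
        card.issue_score += SEVERITY_WEIGHT.get(severity, 0)
        card.issue_types.add(ftype)
        # Prefer cookies that actually have findings as examples.
        if cookie not in card.examples and len(card.examples) < MAX_EXAMPLES:
            card.examples.append(cookie)
    # Fill any remaining example slots with other cookies of the vendor.
    for vendor, cookie in cookies:
        card = cards[vendor]
        if cookie not in card.examples and len(card.examples) < MAX_EXAMPLES:
            card.examples.append(cookie)
    ranked = sorted(cards.values(),
                    key=lambda c: (-c.issue_score, -c.cookie_count, c.vendor))
    return ranked[:TOP_N], ranked[TOP_N:]
```

Cookies with findings are preferred as examples so the card shows the problem rather than an arbitrary cookie; that choice is also an assumption.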
Step 6: render_info_box_rechtsrahmen():
Generic legal-framework info box in the header citing Art. 13 GDPR +
§ 25 TDDDG + Art. 5 + § 5 UWG + §§ 30/130 OWiG. Always shown, with no
explicit finding-to-law mapping (the legal assessment is left to the
user).
Orchestrator + _compose: run_b19, render_vendor_cards and
render_info_box_rechtsrahmen wired into the V2 layout.
Tests: 28/28 passing (15 lookup + 13 coherence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,92 @@
"""Auto-learning for cookies: after every audit, log all declared and
observed cookies into compliance.cookie_behavior_audits.

Cross-site consensus (≥3 sites with a similar declared_purpose) turns
an unknown cookie into a promotion candidate for the curated
BreakPilot KB. That logic lives in `cookie_library_lookup._load_auto_learning`.

Best effort: every DB error is logged but not propagated; a logging
failure must never abort an audit.
"""

from __future__ import annotations

import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def _site_url_from_state(state: dict) -> str:
    """Return scheme://netloc of the first audited document URL, or ""."""
    req = state.get("req")
    if req is None:
        return ""
    for d in getattr(req, "documents", []) or []:
        url = getattr(d, "url", "") or ""
        if url and "://" in url:
            p = urlparse(url)
            return f"{p.scheme}://{p.netloc}"
    return ""


def log_observations(state: dict) -> dict:
    """Persist every (cookie, site, declared category) tuple into
    cookie_behavior_audits. Returns a stats dict for logging."""
    try:
        from database import SessionLocal
        from sqlalchemy import text
    except Exception:
        return {"logged": 0, "skipped": "no_db"}

    check_id = state.get("check_id") or ""
    site_url = _site_url_from_state(state)
    if not site_url:
        return {"logged": 0, "skipped": "no_site_url"}

    cmp_vendors = state.get("cmp_vendors") or []
    if not cmp_vendors:
        return {"logged": 0, "skipped": "no_cmp_vendors"}

    db = SessionLocal()
    inserted = 0
    skipped = 0
    try:
        for v in cmp_vendors:
            vendor_name = (v.get("name") or "").strip()
            for c in v.get("cookies") or []:
                cname = (c.get("name") or "").strip()
                if not cname:
                    skipped += 1
                    continue
                # Cookie-level category wins; fall back to the vendor's.
                declared_cat = (c.get("category")
                                or v.get("category") or "").strip()[:50]
                try:
                    # One SAVEPOINT per row: on PostgreSQL a failed INSERT
                    # aborts the whole transaction, so without it every
                    # later row and the final commit would fail as well.
                    with db.begin_nested():
                        db.execute(
                            text(
                                "INSERT INTO compliance.cookie_behavior_audits "
                                "(check_id, site_url, cookie_name, "
                                "cookie_domain, declared_category, "
                                "observed_max_age_seconds) "
                                "VALUES (:cid, :site, :name, :dom, :cat, :age)"
                            ),
                            {
                                "cid": check_id,
                                "site": site_url,
                                "name": cname,
                                # Falls back to the vendor name if no domain is known.
                                "dom": (v.get("domain") or vendor_name)[:200],
                                "cat": declared_cat,
                                "age": None,  # observed max-age not populated yet
                            },
                        )
                    inserted += 1
                except Exception as e:
                    logger.info("cookie_observations insert skipped %s: %s",
                                cname, str(e)[:120])
                    skipped += 1
        db.commit()
    except Exception as e:
        logger.warning("cookie_observations commit failed: %s", e)
        inserted = 0  # transaction not committed, nothing was persisted
    finally:
        db.close()  # also rolls back any uncommitted transaction
    return {"logged": inserted, "skipped": skipped, "site_url": site_url}