081e4f057a
CENTRAL USP: cookie_compliance_audit.py compares 3 sources

* DECLARED in the cookie policy (parse_cookie_table + parse_flat)
* ACTUALLY loaded in the browser (banner_result.phases.after_accept)
* LIBRARY metadata (cookie_library lookup)

Returns 3 lists with a compliance verdict:

* compliant (declared AND loaded): green block
* undeclared_in_browser (loaded but NOT declared): red HIGH block → violation of Art. 13(1)(c) GDPR (DSGVO) + § 25 TDDDG
* declared_not_loaded (declared but NOT loaded): yellow notice → table possibly outdated

parse_cookie_table extended with a block format (5 lines per cookie, as in the user's copy-paste from VW). Now finds 35+ cookies in the copy-paste instead of 0.

vendor_normalizer.py: 50+ aliases (Google family, Adobe family, Trade Desk, AdForm, ...) + garbage filter (URLs, empty strings, 'click to select', 'Mehrere OEMs'). Merges the cookies lists during dedup.

_guess_vendor extended: Adobe family (s_ecid/AMCV/demdex/mbox/...), Trade Desk (TDID/TDCPM/TTDOptOut), AdForm (uid/cid/otsid), Salesforce LiveAgent, etracker, Akamai, EDAA.

audit_quality_checks: the vendor-thin threshold now scales with the word count of the cookie document (3k→10 / 6k→20 / 10k→30 / 15k+→40).

VW test fixture: tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt (36-cookie sample for regression tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
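The dynamic vendor-thin threshold can be read as a simple bucket lookup on the document's word count. This is a minimal sketch, not the audit_quality_checks code: the names _BUCKETS, vendor_thin_threshold and is_vendor_thin are hypothetical, each bucket is assumed to be a lower bound (15k+ words → 40), and documents under 3,000 words are assumed to fall back to the smallest threshold of 10.

```python
# Hypothetical reconstruction of the buckets named in the commit message:
# (minimum word count, minimum expected vendor count), in ascending order.
# Assumption: documents below 3,000 words fall back to the first bucket.
_BUCKETS = [(3_000, 10), (6_000, 20), (10_000, 30), (15_000, 40)]


def vendor_thin_threshold(word_count: int) -> int:
    """Return the minimum vendor count expected for a document of this length."""
    threshold = _BUCKETS[0][1]
    for min_words, min_vendors in _BUCKETS:
        # Buckets are ascending, so the last satisfied bucket wins.
        if word_count >= min_words:
            threshold = min_vendors
    return threshold


def is_vendor_thin(vendor_count: int, cookie_doc_text: str) -> bool:
    """True if fewer vendors were extracted than the document length suggests."""
    words = len(cookie_doc_text.split())
    return vendor_count < vendor_thin_threshold(words)
```

Under this reading, a 7,000-word policy from which only 12 vendors were extracted would be flagged, because the 6k-10k bucket expects at least 20.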
222 lines
8.9 KiB
Python
"""
Cookie compliance audit: 3-source comparison.

This is the actual added value of the tool. It compares:

* A. What is DECLARED in the cookie policy (text parse)
* B. What was ACTUALLY LOADED in the browser (after_accept, plus the
  before_consent and after_reject phases)
* C. What our LIBRARY knows about the cookie (vendor, category)

Outcomes:

1. ✓ declared + loaded → compliant (library metadata is attached separately)
2. ❌ loaded but NOT declared → HIGH violation (Art. 13(1)(c) GDPR)
3. ⚠️ declared but NOT loaded → table possibly outdated (LOW)
4. 🔍 declared + library category differs → reason for review
   (not computed by audit_cookie_compliance in this version)
"""

from __future__ import annotations

import logging
import re
from typing import Iterable

from sqlalchemy import text as sa_text
from sqlalchemy.orm import Session

logger = logging.getLogger(__name__)


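def _normalize_cookie_name(name: str) -> str:
    """Reduce wildcard/placeholder cookie names to their prefix.

    E.g. 'AMCV_*' -> 'amcv', 'pm_sess_NNN' -> 'pm_sess' and
    '_ga_<ID>' -> '_ga', so that a wildcard entry in the policy and the
    concrete cookie count as one. Only <...>/[...] placeholders, trailing
    '*'/'_', a '_NNN' suffix and a purely numeric '_<digits>' suffix are
    stripped; an ID such as '_ga_GTM-XXX' is NOT collapsed to '_ga'.
    """
    if not name:
        return ""
    s = name.strip()
    # Remove placeholders such as <ID> or [...]
    s = re.sub(r"[<\[].*?[>\]]", "", s)
    # Remove a trailing wildcard and trailing underscores ("AMCV_*" -> "AMCV")
    s = s.rstrip("*").rstrip("_")
    # Remove a "_NNN" placeholder or a numeric suffix ("pm_sess_7" -> "pm_sess")
    s = re.sub(r"_NNN$|_\d+$", "", s)
    return s.lower()

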
def _extract_declared_cookies(cookie_doc_text: str | None) -> set[str]:
    """Read cookie names from the cookie-policy text.

    Runs parse_cookie_table (block/tab format) and then
    parse_flat_cookie_text (anchor pattern) and merges the names from both.
    """
    if not cookie_doc_text:
        return set()
    declared: set[str] = set()
    try:
        from compliance.services.cookies_table_parser import (
            parse_cookie_table, parse_flat_cookie_text,
        )
        for v in parse_cookie_table(cookie_doc_text):
            for c in (v.get("cookies") or []):
                if isinstance(c, dict) and c.get("name"):
                    declared.add(_normalize_cookie_name(c["name"]))
        for v in parse_flat_cookie_text(cookie_doc_text):
            for c in (v.get("cookies") or []):
                if isinstance(c, dict) and c.get("name"):
                    declared.add(_normalize_cookie_name(c["name"]))
    except Exception as e:
        logger.warning("declared-cookie-extract failed: %s", e)
    return {n for n in declared if n}


def _extract_browser_cookies(banner_result: dict | None) -> set[str]:
    """Read cookie names from banner_result.phases.

    Takes the union of the after_accept, before_consent and after_reject
    phases; each cookie entry may be a plain string or a dict with "name".
    """
    out: set[str] = set()
    if not isinstance(banner_result, dict):
        return out
    phases = banner_result.get("phases") or {}
    for ph_name in ("after_accept", "before_consent", "after_reject"):
        ph = phases.get(ph_name) or {}
        if not isinstance(ph, dict):
            continue
        for c in (ph.get("cookies") or []):
            if isinstance(c, str):
                out.add(_normalize_cookie_name(c))
            elif isinstance(c, dict) and c.get("name"):
                out.add(_normalize_cookie_name(c["name"]))
    return {n for n in out if n}


def _lookup_library(db: Session, names: Iterable[str]) -> dict[str, dict]:
    """Return {normalized_name: {category, vendor}} from cookie_library."""
    nl = [n for n in names if n]
    if not nl:
        return {}
    try:
        # Names are already lower-cased by _normalize_cookie_name, so they
        # can be compared against LOWER(cookie_name) directly.
        rows = db.execute(sa_text(
            "SELECT cookie_name, actual_category, vendor_name "
            "FROM compliance.cookie_library "
            "WHERE LOWER(cookie_name) = ANY(:lc)"
        ), {"lc": nl}).fetchall()
        return {r[0].lower(): {"category": r[1], "vendor": r[2]} for r in rows}
    except Exception as e:
        logger.warning("library lookup failed: %s", e)
        return {}


def audit_cookie_compliance(
    db: Session | None,
    cookie_doc_text: str | None,
    banner_result: dict | None,
) -> dict:
    """Main entry point.

    Returns a dict with the three name lists (compliant,
    undeclared_in_browser, declared_not_loaded), the library metadata for
    all names, and the counts.
    """
    declared = _extract_declared_cookies(cookie_doc_text)
    browser = _extract_browser_cookies(banner_result)

    all_names = declared | browser
    library = _lookup_library(db, all_names) if db is not None else {}

    declared_only = declared - browser
    browser_only = browser - declared
    both = declared & browser

    return {
        "declared_count": len(declared),
        "browser_count": len(browser),
        "library_count": len(library),
        "compliant": sorted(both),
        "undeclared_in_browser": sorted(browser_only),
        "declared_not_loaded": sorted(declared_only),
        "library_metadata": library,
        # Loaded but not declared: HIGH (Art. 13(1)(c) GDPR)
        "high_findings": len(browser_only),
        # Declared but not loaded: LOW (policy possibly outdated)
        "low_findings": len(declared_only),
    }


def build_cookie_audit_block_html(audit: dict) -> str:
    """Render the 3-source comparison block for the e-mail."""
    if not audit:
        return ""
    # Cookie names come from scraped policy text and from the browser, so
    # they are HTML-escaped before being embedded in the mail.
    from html import escape

    n_dec = audit.get("declared_count", 0)
    n_brw = audit.get("browser_count", 0)
    n_undecl = len(audit.get("undeclared_in_browser") or [])
    n_dec_only = len(audit.get("declared_not_loaded") or [])
    n_both = len(audit.get("compliant") or [])

    # Red header if any undeclared cookie was loaded, green otherwise.
    sev_color = "#dc2626" if n_undecl else "#16a34a"

    # Red block: loaded in the browser but not declared (HIGH).
    undecl_html = ""
    if audit.get("undeclared_in_browser"):
        undecl_html = (
            '<div style="margin-top:10px;padding:10px 12px;background:#fee2e2;'
            'border:1px solid #fecaca;border-radius:6px">'
            f'<strong style="color:#991b1b">❌ {n_undecl} Cookie'
            f'{"s" if n_undecl != 1 else ""} im Browser geladen, '
            'aber NICHT in der Cookie-Richtlinie deklariert:</strong>'
            '<div style="font-family:monospace;font-size:10px;color:#7f1d1d;'
            'margin-top:6px;max-height:200px;overflow:auto">'
            + ", ".join(escape(n) for n in audit["undeclared_in_browser"][:50])
            + (f' ... +{n_undecl - 50} weitere'
               if n_undecl > 50 else '') +
            '</div>'
            '<div style="font-size:10px;color:#7f1d1d;margin-top:4px;'
            'font-style:italic">Art. 13(1)(c) DSGVO + § 25 TDDDG: '
            'Die Empfaengerliste muss vollstaendig sein. Diese Cookies '
            'sind potenziell ungenannte Verarbeitungen.</div>'
            '</div>'
        )

    # Yellow block: declared but not seen in this audit (LOW).
    dec_only_html = ""
    if audit.get("declared_not_loaded"):
        dec_only_html = (
            '<div style="margin-top:10px;padding:10px 12px;background:#fef3c7;'
            'border:1px solid #fde68a;border-radius:6px">'
            f'<strong style="color:#92400e">⚠️ {n_dec_only} Cookie'
            f'{"s" if n_dec_only != 1 else ""} in der Richtlinie '
            'deklariert, aber bei diesem Audit NICHT im Browser gesehen:</strong>'
            '<div style="font-family:monospace;font-size:10px;color:#78350f;'
            'margin-top:6px;max-height:200px;overflow:auto">'
            + ", ".join(escape(n) for n in audit["declared_not_loaded"][:50])
            + (f' ... +{n_dec_only - 50} weitere'
               if n_dec_only > 50 else '') +
            '</div>'
            '<div style="font-size:10px;color:#78350f;margin-top:4px;'
            'font-style:italic">Kein direkter Verstoss: Die Cookies '
            'koennen nur in bestimmten User-Journeys / Geo-Regionen / '
            'eingeloggten Zustaenden geladen werden. Empfehlung: '
            'pruefen, ob die Cookie-Richtlinie veraltet ist.</div>'
            '</div>'
        )

    # Green block: declared and loaded.
    compliant_html = ""
    if audit.get("compliant"):
        compliant_html = (
            '<div style="margin-top:10px;padding:10px 12px;background:#dcfce7;'
            'border:1px solid #bbf7d0;border-radius:6px">'
            f'<strong style="color:#166534">✓ {n_both} Cookie'
            f'{"s" if n_both != 1 else ""} sowohl deklariert als auch geladen '
            '(compliant):</strong>'
            '<div style="font-family:monospace;font-size:10px;color:#14532d;'
            'margin-top:6px;max-height:150px;overflow:auto">'
            + ", ".join(escape(n) for n in audit["compliant"][:50])
            + (f' ... +{n_both - 50} weitere'
               if n_both > 50 else '') +
            '</div>'
            '</div>'
        )

    return (
        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
        'max-width:760px;margin:0 auto 16px;padding:14px 18px;'
        'background:#fff;border:1px solid #cbd5e1;border-radius:8px">'
        f'<div style="font-size:11px;color:{sev_color};text-transform:uppercase;'
        'letter-spacing:1.2px;margin-bottom:4px;font-weight:600">'
        'Cookie-Compliance-Audit: 3-Quellen-Vergleich</div>'
        '<h3 style="margin:0 0 6px;font-size:14px;color:#1e293b">'
        f'{n_dec} in Richtlinie · {n_brw} im Browser · '
        f'{n_both} compliant · {n_undecl} undokumentiert · '
        f'{n_dec_only} nicht geladen</h3>'
        '<p style="margin:0 0 8px;font-size:11px;color:#475569;line-height:1.5">'
        'Wir vergleichen die in der Cookie-Richtlinie genannten Cookies '
        'mit dem, was der Browser nach dem Akzeptieren tatsaechlich laedt. '
        'Undokumentierte Cookies im Browser sind ein direkter Verstoss '
        'gegen die DSGVO-Informationspflicht.'
        '</p>'
        + undecl_html + dec_only_html + compliant_html +
        '</div>'
    )
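How the three lists fall out of the name normalization and plain set operations can be shown without the database or the table parser. The sketch below is a standalone illustration only: normalize mirrors the steps of _normalize_cookie_name, compare is a hypothetical stand-in for the set logic in audit_cookie_compliance (library lookup omitted), and all cookie names are made up.

```python
import re


def normalize(name: str) -> str:
    """Same steps as _normalize_cookie_name in the module above."""
    if not name:
        return ""
    s = name.strip()
    s = re.sub(r"[<\[].*?[>\]]", "", s)   # drop <ID> / [...] placeholders
    s = s.rstrip("*").rstrip("_")         # drop trailing wildcard and "_"
    s = re.sub(r"_NNN$|_\d+$", "", s)     # drop "_NNN" or numeric suffix
    return s.lower()


def compare(declared_raw: list[str], browser_raw: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: split normalized names into the three lists."""
    declared = {normalize(n) for n in declared_raw} - {""}
    browser = {normalize(n) for n in browser_raw} - {""}
    return {
        "compliant": sorted(declared & browser),
        "undeclared_in_browser": sorted(browser - declared),
        "declared_not_loaded": sorted(declared - browser),
    }


# Made-up example data, not taken from any real audit.
result = compare(
    declared_raw=["AMCV_*", "_ga_<ID>", "pm_sess_NNN", "old_tracker"],
    browser_raw=["AMCV_42", "_ga", "pm_sess_7", "_fbp"],
)
print(result)
```

The wildcard entries 'AMCV_*', '_ga_<ID>' and 'pm_sess_NNN' only match the concrete browser cookies because normalization strips the placeholders and numeric suffixes; '_fbp' ends up as the HIGH finding and 'old_tracker' as the LOW one.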