feat(consent+report): P56-P67 Mercedes-Audit-Cycle (Anti-Audit, Phase G Vendors, Cookie-Behavior-Validator + 5 Mail-Polish-Items) [migration-approved]
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / nodejs-build (push) Successful in 2m19s
CI / test-go (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 37s
P56 Anti-auditing detection as a constructive compliance finding (recommend an
audit API instead of an accusation, because Mercedes legitimately blocks bots)
P57 Phase G vendor_details merged (union) with cmp_vendors -> 42 vendors visible
P58 More robust anti-audit detection (script-domain check + settings-specific)
P59 Cookie-Behavior-Validator (4 layers, 3-tier severity: MEDIUM = category
mismatch / HIGH = purpose mismatch / CRITICAL = both, an indication of intent)
+ Open Cookie Database (CC0) as library seed (2264 cookies)
P59b Cookie behavior wired into the banner check + mail block (BUGFIX: open
SessionLocal directly, since db was not in scope inside the background task)
Mail polish after the Mercedes review:
P63 Also detect banner footer links rendered as wb7-link/role=link (Shadow-DOM
walker matches by label instead of only <a href>)
P64 Re-access severity: MEDIUM instead of HIGH if the footer has an
"Einstellungen" (settings) link or the Mercedes-typical equivalent; OEM footer
detection (wb7-footer)
P65 Text truncation at word boundaries instead of a character cut (no more
broken words like "einfa" in Sofortmassnahmen, the immediate-action list)
P66 GF (management) actions: explicitly explain service purpose vs. cookie
purpose (a common confusion between marketing and management: the "Akamai
description" != the cookie purpose per DSK-OH 2024)
P67 Stirring finding with an explanation of "loss framing" plus an
old-vs-neutral wording example, instead of only the EDPB technical term
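The P59 tiering rule can be sketched as a small pure function. This is a minimal illustration that assumes two boolean mismatch signals as input. The name `tier_for` is hypothetical and is not taken from `cookie_behavior_validator.py`.

```python
from typing import Optional


def tier_for(category_mismatch: bool, purpose_mismatch: bool) -> Optional[str]:
    """Map the two P59 mismatch signals to a severity tier (hypothetical helper).

    MEDIUM   = only the declared category disagrees with the library
    HIGH     = only the declared purpose disagrees
    CRITICAL = both disagree (treated as an indication of intent)
    """
    if category_mismatch and purpose_mismatch:
        return "CRITICAL"
    if purpose_mismatch:
        return "HIGH"
    if category_mismatch:
        return "MEDIUM"
    return None  # no finding


print(tier_for(True, False))   # MEDIUM
print(tier_for(False, True))   # HIGH
print(tier_for(True, True))    # CRITICAL
```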
Compliance-Advisor FAQ (admin agent-core/soul):
+ CNIL/EDPB top fines (Google 100M, Meta 60M, Amazon 35M)
+ German precedents (LG Muenchen Google Fonts, EuGH Planet49, BGH I ZR 7/16)
+ 4 risk paths (fine / Abmahnung (cease-and-desist) / class action / NOYB) + calculation methodology
Document-Generator templates: AGB-DE (142), Impressum (140), withdrawal-form
annex (143), DSR-process dedup (139), Cookie-Library (144).
Architecture: split out doc_action_mappings.py, banner_dom_walkers.py,
cookie_behavior_validator.py and vendor_detail_extractor.py to keep
agent_doc_check_report.py and banner_text_checker.py within their
500-LOC caps.
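A per-file check in the spirit of the 500-LOC cap might look like the sketch below. It assumes the cap counts non-blank lines per file. How the repository's `loc-budget` CI job actually counts is not shown on this page, and all names here are hypothetical.

```python
from pathlib import Path

LOC_CAP = 500  # cap quoted in the commit message


def count_loc(source: str) -> int:
    """Count non-blank lines (assumed metric; the CI job may count differently)."""
    return sum(1 for line in source.splitlines() if line.strip())


def over_budget(files: dict[str, str], cap: int = LOC_CAP) -> list[tuple[str, int]]:
    """Return (name, loc) for every file whose line count exceeds the cap."""
    offenders = []
    for name, src in files.items():
        loc = count_loc(src)
        if loc > cap:
            offenders.append((name, loc))
    return offenders


def check_paths(paths: list[str], cap: int = LOC_CAP) -> list[tuple[str, int]]:
    """Read files from disk and report those over the cap."""
    return over_budget({p: Path(p).read_text(encoding="utf-8") for p in paths}, cap)
```

Keeping the counting separate from file I/O means the rule can be tested on in-memory strings.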
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -396,6 +396,17 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
                     f"mit-geprueft.",
             ))
             continue
+        # P24: The DPO contact is a mandatory disclosure in the privacy notice
+        # (Art. 13(1)(b) GDPR). If there is no separate DPO document, that is
+        # NOT an error; the DPO check happens in the privacy notice anyway.
+        if doc_type == "dsb" and not (entry.get("url") or "").strip():
+            results.append(DocCheckResult(
+                label=label, url="", doc_type=doc_type,
+                error="Nicht separat vorhanden — DSB-Kontaktdaten "
+                      "werden in der Datenschutzerklaerung als "
+                      "Pflichtangabe nach Art. 13(1)(b) DSGVO geprueft.",
+            ))
+            continue
         # Empty entry — either from auto-discovery padding (no URL
         # to fetch) or from a fetch that returned nothing. If there
         # was a URL we keep the error so the user knows the fetch
@@ -442,7 +453,7 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     if banner_url:
         _update(check_id, "Cookie-Banner wird geprueft...", 82)
         try:
-            async with httpx.AsyncClient(timeout=120.0) as client:
+            async with httpx.AsyncClient(timeout=900.0) as client:  # P50: +10min for vendor-detail-phase
                 resp = await client.post(
                     f"{CONSENT_TESTER_URL}/scan",
                     json={"url": banner_url, "timeout_per_phase": 10},
@@ -450,7 +461,9 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
                 if resp.status_code == 200:
                     banner_result = resp.json()
         except Exception as e:
-            logger.warning("Banner check failed: %s", e)
+            logger.warning(
+                "Banner check failed: %s (%s)", e or "<empty>", type(e).__name__
+            )
 
     # Step 3c: Cross-check banner vs. cookie policy (88-90%)
     if banner_result and "cookie" in doc_texts:
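The new log call aims to keep the warning readable when an exception carries no message. For example, `str(TimeoutError())` is an empty string. A plain exception instance is always truthy, however, so `e or "<empty>"` never selects the fallback; the fallback has to be applied to `str(e)`. The minimal sketch below shows that pattern. The helper name `describe_exc` is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def describe_exc(exc: BaseException) -> str:
    """Render an exception as 'message (TypeName)', never with a blank message.

    The fallback is applied to str(exc): the exception object itself is
    truthy, so `exc or "<empty>"` would never pick the fallback.
    """
    return f"{str(exc) or '<empty>'} ({type(exc).__name__})"


try:
    raise TimeoutError()  # str(TimeoutError()) == ""
except Exception as e:
    logger.warning("Banner check failed: %s", describe_exc(e))
    print(describe_exc(e))  # <empty> (TimeoutError)
```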
@@ -530,12 +543,35 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     )
     cookie_payloads = []
     cookie_text = ""
+    # P30: aggregate cmp_payloads from ALL doc_entries — sites
+    # like Mercedes load Usercentrics only on the homepage, so
+    # the JSON gets captured during DSE/Impressum discovery, not
+    # in the cookies.html fetch. Dedup by URL since the same
+    # payload is captured on every page load.
+    seen_cmp_urls: set[str] = set()
     for e in doc_entries:
-        if e.get("doc_type") == "cookie":
-            if e.get("cmp_payloads"):
-                cookie_payloads.extend(e["cmp_payloads"])
-            if e.get("text"):
-                cookie_text = e["text"]
+        for p in (e.get("cmp_payloads") or []):
+            p_url = p.get("url") or ""
+            if p_url and p_url in seen_cmp_urls:
+                continue
+            seen_cmp_urls.add(p_url)
+            cookie_payloads.append(p)
+        if e.get("doc_type") == "cookie" and e.get("text"):
+            cookie_text = e["text"]
+    # P48: also pull cmp_payloads from the Banner-Scan (homepage
+    # 3-phase consent test). Mercedes' Usercentrics-JSON is
+    # captured there even when not in DSI-Discovery of static
+    # legal pages.
+    if banner_result:
+        for p in (banner_result.get("cmp_payloads") or []):
+            p_url = p.get("url") or ""
+            if p_url and p_url in seen_cmp_urls:
+                continue
+            seen_cmp_urls.add(p_url)
+            cookie_payloads.append(p)
+    if cookie_payloads:
+        logger.info("P48: %d CMP-payloads available for vendor-extract (after Banner-Scan merge)",
+                    len(cookie_payloads))
     # P17-D: Fallback when the cookie document was deduplicated by P15: use
     # the DSE text if it contains cookie terms, so the LLM vendor extract
     # can still kick in.
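The P30/P48 merge combines payloads from several doc entries plus the banner scan and deduplicates them by payload URL. It reduces to one order-preserving helper. The sketch assumes payloads are dicts with an optional `url` key. `merge_cmp_payloads` is a hypothetical name, not a function in this commit.

```python
from typing import Iterable


def merge_cmp_payloads(*sources: Iterable[dict]) -> list[dict]:
    """Merge CMP payload lists, keeping the first payload seen per URL.

    Payloads without a URL are never treated as duplicates, mirroring the
    `if p_url and p_url in seen_cmp_urls` guard in the hunk above.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for source in sources:
        for p in source or []:
            url = p.get("url") or ""
            if url and url in seen:
                continue  # same payload captured on another page load
            if url:
                seen.add(url)
            merged.append(p)
    return merged
```

Usage follows the order of the hunk: doc-entry payloads first, then the banner scan, for example `merge_cmp_payloads(*(e.get("cmp_payloads") for e in doc_entries), banner_result.get("cmp_payloads"))`.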
@@ -570,6 +606,160 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
                 category=v.get("category", ""),
                 owner_name=owner_name,
             )
+    # P57: Phase G vendor_details as an additional vendor source.
+    # If extract_vendors_from_payloads finds fewer vendors than
+    # Phase G's info click-through (e.g. Mercedes settings not
+    # recognized as the usercentrics kind), add the Phase G names
+    # as standalone vendors.
+    if banner_result:
+        vd_list = banner_result.get("vendor_details") or []
+        vd_list = [v for v in vd_list if v.get("name") != "__TDM_OPTOUT__"]
+        existing_names = {(v.get("name") or "").strip().lower()
+                          for v in cmp_vendors}
+        added = 0
+        for d in vd_list:
+            n = (d.get("name") or "").strip()
+            if not n or n.lower() in existing_names:
+                continue
+            # Skip generic category labels (Mercedes categories)
+            if n.lower() in ("technisch erforderlich", "analyse und statistik",
+                             "marketing", "alles auswählen",
+                             "alles auswaehlen"):
+                continue
+            from compliance.services.vendor_classifier import classify
+            cmp_vendors.append({
+                "name": n,
+                "country": "",
+                "purpose": d.get("description", "")[:500],
+                "category": "",
+                "opt_out_url": d.get("opt_out_url", ""),
+                "privacy_policy_url": d.get("privacy_url", ""),
+                "persistence": d.get("retention", ""),
+                "cookies": d.get("cookies", []),
+                "processing_company": d.get("processing_company", ""),
+                "address": d.get("address", ""),
+                "purposes": d.get("purposes", []),
+                "technologies": d.get("technologies", []),
+                "recipient_type": classify(
+                    vendor_name=n, category="", owner_name=owner_name,
+                ),
+            })
+            existing_names.add(n.lower())
+            added += 1
+        if added:
+            logger.info("P57: added %d new vendors from Phase G (total: %d)",
+                        added, len(cmp_vendors))
+
+    # P50: enrich vendors with per-vendor detail-modal-extracts
+    # (description, opt-out URL, privacy URL, cookies). Detail
+    # comes from Phase G Info-button-click-through in /scan.
+    tdm_opt_out_notice = ""
+    if cmp_vendors and banner_result:
+        vendor_details = banner_result.get("vendor_details") or []
+        # P50f: filter out TDM-opt-out sentinel
+        tdm_sentinel = next((v for v in vendor_details
+                             if v.get("name") == "__TDM_OPTOUT__"), None)
+        if tdm_sentinel:
+            tdm_opt_out_notice = tdm_sentinel.get("description", "")
+            logger.info("P50f: TDM opt-out — skipped detail-enrichment for vendors")
+        vendor_details = [v for v in vendor_details
+                          if v.get("name") != "__TDM_OPTOUT__"]
+        if vendor_details:
+            details_by_name = {}
+            for d in vendor_details:
+                n = (d.get("name") or "").strip().lower()
+                if n:
+                    details_by_name[n] = d
+            enriched = 0
+            for v in cmp_vendors:
+                key = (v.get("name") or "").strip().lower()
+                # Substring fallback for fuzzy matches (e.g.
+                # "Google Analytics" detail-name may differ slightly)
+                d = details_by_name.get(key)
+                if not d:
+                    for dn, dv in details_by_name.items():
+                        if key in dn or dn in key:
+                            d = dv
+                            break
+                if not d:
+                    continue
+                if not v.get("country") and (d.get("processing_company") or d.get("address")):
+                    # Heuristic country extract from address (DE/EU keywords)
+                    addr = d.get("address", "")
+                    if re.search(r"\b(deutschland|germany|berlin|m(?:ue|ü)nchen|hamburg|stuttgart)\b", addr, re.I):
+                        v["country"] = "DE"
+                    elif re.search(r"\bireland|irland|dublin\b", addr, re.I):
+                        v["country"] = "IE"
+                    elif re.search(r"\busa|united states|california|new york|delaware\b", addr, re.I):
+                        v["country"] = "US"
+                if not v.get("purpose"):
+                    v["purpose"] = d.get("description", "")[:500]
+                if not v.get("opt_out_url"):
+                    v["opt_out_url"] = d.get("opt_out_url", "")
+                if not v.get("privacy_policy_url"):
+                    v["privacy_policy_url"] = d.get("privacy_url", "")
+                if not v.get("cookies"):
+                    v["cookies"] = d.get("cookies", [])
+                v["purposes"] = d.get("purposes", [])
+                v["technologies"] = d.get("technologies", [])
+                if not v.get("persistence"):
+                    v["persistence"] = d.get("retention", "")
+                v["processing_company"] = d.get("processing_company", "")
+                v["address"] = d.get("address", "")
+                enriched += 1
+            logger.info("P50: enriched %d/%d vendors with detail-modal data",
+                        enriched, len(cmp_vendors))
+    # P59b: Cookie-Behavior-Validator: check every cookie that was set
+    # against our library and generate 3-tier severity findings.
+    # The background task has no DB dependency injection, so open
+    # SessionLocal ourselves and close it cleanly.
+    cookie_behavior_findings: list[dict] = []
+    if banner_result:
+        cookies_detailed = banner_result.get("cookies_detailed") or []
+        if cookies_detailed:
+            cb_session = None
+            try:
+                from database import SessionLocal
+                from compliance.services.cookie_behavior_validator import (
+                    validate_cookie_behavior,
+                )
+                from urllib.parse import urlparse
+                fp_domain = ""
+                if banner_url:
+                    fp_domain = urlparse(banner_url).netloc.replace("www.", "")
+                cb_session = SessionLocal()
+                cookie_behavior_findings = validate_cookie_behavior(
+                    cb_session, cookies_detailed,
+                    network_requests=[],  # TODO Layer B in P59d
+                    first_party_domain=fp_domain,
+                )
+                if cookie_behavior_findings:
+                    sevs = {f["severity"] for f in cookie_behavior_findings}
+                    logger.info(
+                        "P59b: Cookie-Behavior-Check %d findings "
+                        "(severities: %s) ueber %d Cookies",
+                        len(cookie_behavior_findings),
+                        sorted(sevs),
+                        len(cookies_detailed),
+                    )
+                    banner_result["cookie_behavior_findings"] = (
+                        cookie_behavior_findings
+                    )
+                else:
+                    logger.info(
+                        "P59b: Cookie-Behavior-Check 0 findings "
+                        "ueber %d Cookies (library miss / clean)",
+                        len(cookies_detailed),
+                    )
+            except Exception as cb_err:
+                logger.warning("P59b Cookie-Behavior-Check failed: %s", cb_err)
+            finally:
+                if cb_session is not None:
+                    try:
+                        cb_session.close()
+                    except Exception:
+                        pass
+
     if cmp_vendors:
         logger.info("VVT: %d vendors extracted, validating links",
                     len(cmp_vendors))
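The P50 lookup first tries an exact, case-insensitive name match. It then falls back to substring containment in either direction. The standalone sketch below follows that matching rule and adds one guard the committed loop lacks. A vendor with an empty name produces the key `""`, and `"" in dn` is true for every string, so without the guard that vendor would be paired with the first detail entry. The helper names are hypothetical.

```python
from typing import Optional


def index_details(details: list[dict]) -> dict[str, dict]:
    """Key detail records by normalized (stripped, lower-cased) vendor name."""
    index: dict[str, dict] = {}
    for d in details:
        n = (d.get("name") or "").strip().lower()
        if n:
            index[n] = d
    return index


def match_detail(name: str, index: dict[str, dict]) -> Optional[dict]:
    """Exact match first, then substring containment in either direction."""
    key = (name or "").strip().lower()
    if not key:
        return None  # "" is a substring of everything; refuse to guess
    if key in index:
        return index[key]
    for dn, dv in index.items():
        if key in dn or dn in key:
            return dv
    return None
```

The substring fallback is order-dependent: when several detail names contain the key, the first one in insertion order wins.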
@@ -1149,10 +1339,15 @@ _DISCOVERY_RULES: list[tuple[str, tuple[str, ...]]] = [
                 "right-of-withdrawal", "ruecktritts", "rücktritts")),
     ("social_media", ("social-media", "soziale-medien", "social_media",
                       "social-media-policy")),
+    # P23: 'terms-and-conditions' can mean general terms and conditions (AGB)
+    # OR terms of use. The discovery function classifies more precisely
+    # later by title + content. Here it is only a URL hint:
+    ("agb", ("/agb", "geschaeftsbedingungen", "geschäftsbedingungen",
+             "terms-and-conditions", "general-terms")),
+    ("nutzungsbedingungen", ("nutzungsbedingung", "terms-of-use",
+                             "nutzungsordnung", "terms-of-service")),
-             "general-terms")),
-    ("nutzungsbedingungen", ("nutzungsbedingung", "nutzungsbedingungen",
-                             "terms-of-use", "terms-and-conditions",
-                             "nutzungsordnung", "terms-of-service",
-                             "allgemeine-nutzungsbedingungen")),
     ("dsb", ("datenschutzbeauftragt", "data-protection-officer",
              "dpo-contact", "/dsb")),
     ("impressum", ("impressum", "imprint", "legal-notice", "site-notice",

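`_DISCOVERY_RULES` is an ordered list of `(doc_type, url_substrings)` pairs. The consuming function is not part of this diff. If it is first-match-wins, rule order decides ambiguous URLs such as `.../terms-and-conditions`, which after P23 resolve to `agb`. The reduced sketch below uses a hypothetical `classify_url` and a subset of the rules from the hunk.

```python
from typing import Optional

# Subset of the rules from the hunk above, in the same order.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("agb", ("/agb", "geschaeftsbedingungen", "geschäftsbedingungen",
             "terms-and-conditions", "general-terms")),
    ("nutzungsbedingungen", ("nutzungsbedingung", "terms-of-use",
                             "nutzungsordnung", "terms-of-service")),
    ("dsb", ("datenschutzbeauftragt", "data-protection-officer",
             "dpo-contact", "/dsb")),
]


def classify_url(url: str) -> Optional[str]:
    """Return the first doc_type whose hint occurs in the lower-cased URL."""
    u = url.lower()
    for doc_type, hints in RULES:
        if any(h in u for h in hints):
            return doc_type
    return None
```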
@@ -202,5 +202,34 @@ def build_banner_deep_html(banner_result: dict | None) -> str:
         )
     parts.append('</ul>')
 
+    # 5) P59b: cookie behavior findings (declared vs. actual)
+    cb_findings = banner_result.get("cookie_behavior_findings") or []
+    if cb_findings:
+        parts.append(
+            '<div style="margin:14px 0 4px;padding:8px 12px;'
+            'background:#fef9e7;border-left:3px solid #d97706;border-radius:4px">'
+            '<div style="font-size:12px;color:#92400e;font-weight:600;'
+            'margin-bottom:6px">Cookie-Verhaltens-Check '
+            '(P59 — deklarierter Zweck vs. tatsaechliches Verhalten)</div>'
+            '<ul style="margin:0 0 0 18px;padding:0;font-size:11px;color:#1e293b">'
+        )
+        for f in cb_findings[:20]:
+            sev = (f.get("severity") or "MEDIUM").upper()
+            sev_c = ("#dc2626" if sev in ("CRITICAL", "HIGH") else
+                     "#d97706" if sev == "MEDIUM" else "#94a3b8")
+            cname = f.get("cookie_name", "?")
+            parts.append(
+                f'<li style="margin-bottom:6px">'
+                f'<span style="display:inline-block;background:{sev_c};color:#fff;'
+                f'font-size:9px;padding:1px 5px;border-radius:3px;margin-right:6px">'
+                f'{sev}</span><code style="font-size:10px;background:#f1f5f9;'
+                f'padding:1px 4px;border-radius:2px">{cname}</code>: '
+                f'{f.get("text", "")[:280]}'
+                f'<div style="font-size:10px;color:#94a3b8;margin-top:2px;'
+                f'font-style:italic">Quelle: {f.get("legal_ref", "")} · '
+                f'Layer {f.get("layer", "?")}</div></li>'
+            )
+        parts.append('</ul></div>')
+
     parts.append('</div>')
     return "".join(parts)

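The banner-report hunk interpolates `cookie_name` and the finding text straight into HTML. Cookie names are read from the scanned site, so they are untrusted input for the mail body. The minimal sketch below builds the same badge markup with `html.escape` from the standard library. `render_finding_li` is a hypothetical name, and the inline styles are trimmed.

```python
import html

_SEV_COLORS = {"CRITICAL": "#dc2626", "HIGH": "#dc2626", "MEDIUM": "#d97706"}


def render_finding_li(f: dict) -> str:
    """Render one cookie-behavior finding as an <li>, escaping every field."""
    sev = (f.get("severity") or "MEDIUM").upper()
    color = _SEV_COLORS.get(sev, "#94a3b8")
    cname = html.escape(str(f.get("cookie_name", "?")))
    text = html.escape(str(f.get("text", ""))[:280])  # truncate, then escape
    ref = html.escape(str(f.get("legal_ref", "")))
    layer = html.escape(str(f.get("layer", "?")))
    return (
        f'<li><span style="background:{color};color:#fff">{html.escape(sev)}</span>'
        f"<code>{cname}</code>: {text}"
        f"<div>Quelle: {ref} · Layer {layer}</div></li>"
    )
```

The code truncates before escaping so that the 280-character cut can never split an entity such as `&amp;`.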
@@ -13,6 +13,17 @@ Bei sauberen Sites bleibt er weg.
 from __future__ import annotations
 
 
+def _truncate_words(text: str, max_chars: int) -> str:
+    """P65: Truncate at word boundary, never mid-word."""
+    if not text or len(text) <= max_chars:
+        return text
+    cut = text[:max_chars]
+    last_space = cut.rfind(" ")
+    if last_space > max_chars // 2:
+        cut = cut[:last_space]
+    return cut.rstrip(",;:.") + "…"
+
+
 # Known fine (Bussgeld) precedents, used as source hints
 _BUSSGELD_REFS = {
     "no_provider_per_category": "CNIL France 2023 — TikTok 5 Mio EUR (fehlende Vendor-Transparenz)",
@@ -40,7 +51,7 @@ def _detect_critical_issues(
         if sev in ("CRITICAL", "HIGH"):
             issues.append({
                 "key": "banner_violation",
-                "title": v.get("text", "")[:120],
+                "title": _truncate_words(v.get("text", ""), 260),
                 "severity": sev,
                 "action": _action_for_banner_violation(v),
                 "source": v.get("legal_ref", ""),

@@ -283,6 +283,50 @@ def build_vvt_table_html(vendors: list[dict]) -> str:
         summary_parts.append("— alle ueber 50%")
     summary = " ".join(summary_parts)
 
+    # P60: If many vendors share the SAME flag set, point it out once
+    # globally instead of repeating it 42 times, once per vendor.
+    from collections import Counter
+    flag_sets = Counter()
+    for v in vendors:
+        flags = v.get("compliance_flags") or []
+        if flags:
+            flag_sets[tuple(sorted(flags))] += 1
+    pattern_notice = ""
+    if flag_sets:
+        most_common, n_match = flag_sets.most_common(1)[0]
+        share = n_match / max(1, len(vendors))
+        if n_match >= 8 and share >= 0.5:
+            from compliance.services.finding_action_recipes import recipe_for
+            labels = [_flag_short(f) for f in most_common]
+            shared_actions = []
+            for f in most_common:
+                rec = recipe_for(f)
+                if rec:
+                    shared_actions.append(
+                        f'<li><strong>{_flag_short(f)}:</strong> '
+                        f'{rec.get("fix_text", "").splitlines()[0][:180]}</li>'
+                    )
+            pattern_notice = (
+                f'<div style="margin:8px 0 12px;padding:10px 14px;'
+                f'background:#fef3c7;border-left:3px solid #d97706;'
+                f'border-radius:4px;font-size:11px;color:#92400e">'
+                f'<strong>Wiederkehrendes Muster ({n_match} von {len(vendors)} '
+                f'Anbietern, {int(share*100)}%):</strong> '
+                f'Bei diesen Anbietern fehlen jeweils: '
+                f'<em>{", ".join(labels)}</em>. '
+                f'Vermutlich systembedingt (z.B. Settings-Export liefert '
+                f'nur Namen, oder Banner-API blockiert Detail-Extraktion). '
+                f'Die globalen Empfehlungen unten gelten fuer all diese Eintraege; '
+                f'in der Tabelle werden sie nicht pro Zeile wiederholt.'
+                + (f'<ul style="margin:8px 0 0 0;padding-left:20px">{"".join(shared_actions)}</ul>'
+                   if shared_actions else '')
+                + '</div>'
+            )
+            # Mark vendors so _render_vendor_row can suppress redundant actions
+            for v in vendors:
+                if tuple(sorted(v.get("compliance_flags") or [])) == most_common:
+                    v["_actions_in_global_notice"] = True
+
     out: list[str] = [
         '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
         'max-width:760px;margin:0 auto 16px;padding:12px 16px;'
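The P60 rule can be isolated as a pure function, which makes its thresholds easy to test. A notice is shown when one identical, sorted flag tuple is shared by at least 8 vendors and by at least half of all vendors. `dominant_flag_set` is a hypothetical name, and so are the flag names in the test data.

```python
from collections import Counter
from typing import Optional


def dominant_flag_set(vendors: list[dict], min_count: int = 8,
                      min_share: float = 0.5) -> Optional[tuple[tuple[str, ...], int]]:
    """Return (flag_set, count) if one sorted flag tuple dominates, else None.

    Same thresholds as the hunk above: at least `min_count` vendors and at
    least `min_share` of all vendors must share the identical flag set.
    """
    flag_sets: Counter = Counter(
        tuple(sorted(v.get("compliance_flags") or []))
        for v in vendors
        if v.get("compliance_flags")
    )
    if not flag_sets:
        return None
    flags, n = flag_sets.most_common(1)[0]
    if n >= min_count and n / max(1, len(vendors)) >= min_share:
        return flags, n
    return None
```

Sorting each flag list before counting makes `["b", "a"]` and `["a", "b"]` count as the same pattern, just as in the committed code.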
@@ -296,6 +340,7 @@ def build_vvt_table_html(vendors: list[dict]) -> str:
         'Verarbeitungen (INTERNAL/GROUP) werden Opt-Out und Privacy-Link '
         'NICHT als Pflicht gewertet — der Widerruf erfolgt ueber das '
         'Cookie-Banner, Privacy ist in der Haupt-DSI dokumentiert.</p>',
+        pattern_notice,
     ]
 
     for rtype, section_label in RECIPIENT_TYPE_SECTIONS:
@@ -389,7 +434,9 @@ def _render_vendor_row_full(v: dict) -> str:
 
     # Inline action instructions per flag
     actions_html = ""
-    if flags:
+    # P60: skip per-row actions when already covered by global pattern notice
+    skip_actions = bool(v.get("_actions_in_global_notice"))
+    if flags and not skip_actions:
         from compliance.services.finding_action_recipes import recipe_for
         action_items = []
         for f in flags:

@@ -202,52 +202,13 @@ def build_management_summary(results: list[DocCheckResult]) -> str:
 
 
 def _check_to_action(doc_label: str, check_label: str, hint: str) -> str:
-    """Convert a failed check into a plain-language action item."""
-    # Map technical check labels to business-language actions
-    label_lower = check_label.lower()
+    """Convert a failed check into a plain-language action item.
 
-    if "datenschutzbeauftragter" in label_lower or "dsb" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Ihren Datenschutzbeauftragten "
-                f"mit Kontaktdaten erwaehnen. Pflicht ab 20 Mitarbeitern.")
-
-    if "beschwerderecht" in label_lower or "art. 77" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Hinweis auf das Beschwerderecht "
-                f"bei der Aufsichtsbehoerde ergaenzen (Name + Kontakt der Behoerde).")
-
-    if "betroffenenrechte" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Alle Betroffenenrechte "
-                f"(Auskunft, Berichtigung, Loeschung, etc.) einzeln auffuehren.")
-
-    if "verantwortlicher" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Vollstaendige Firmenbezeichnung "
-                f"mit Rechtsform, Adresse, E-Mail und Telefon eintragen.")
-
-    if "interessenabwaegung" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Bei 'berechtigtem Interesse' "
-                f"die Abwaegung dokumentieren. Aufgabe fuer den DSB/Rechtsanwalt.")
-
-    if "widerrufsbelehrung" in label_lower or "widerruf" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Gesetzliche Widerrufsbelehrung "
-                f"mit 14-Tage-Frist und Musterformular bereitstellen.")
-
-    if "loeschkonzept" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Loeschfristen und -prozess "
-                f"dokumentieren. Aufgabe fuer den DSB.")
-
-    if "profiling" in label_lower or "art. 22" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Hinweis ergaenzen ob "
-                f"automatisierte Entscheidungen stattfinden oder nicht.")
-
-    if "nicht im eingereichten text" in label_lower:
-        return (f"<strong>{doc_label}:</strong> Das eingereichte Dokument "
-                f"enthaelt nicht den erwarteten Inhalt. Bitte korrekte URL pruefen.")
-    if any(w in label_lower for w in ("rechtswidrig", "illegal", "haftungsausschluss", "disclaimer")):
-        return f"<strong>{doc_label}:</strong> '{check_label}' muss entfernt werden (Anti-Pattern, rechtlich wirkungslos)."
-    # Generic fallback
-    if hint and len(hint) < 150:
-        return f"<strong>{doc_label}:</strong> {hint[:120]}"
-
-    return f"<strong>{doc_label}:</strong> '{check_label}' muss ergaenzt werden."
+    Implementation lives in doc_action_mappings.check_to_action — kept here
+    as a thin wrapper so the report module stays under the 500-LOC cap.
+    """
+    from compliance.api.doc_action_mappings import check_to_action
+    return check_to_action(doc_label, check_label, hint)
 
 
 def build_html_report(
@@ -0,0 +1,102 @@
+"""
+Management-friendly action texts for missing mandatory disclosures.
+
+Split out of agent_doc_check_report.py (LOC cap). Turns a failed
+DocCheck into a short instruction that a managing director (GF) without
+legal background can understand.
+
+P66: Cookie-specific findings distinguish between the service purpose
+(vendor description such as "Akamai = bot protection") and the cookie purpose
+(which cookie does what), a common confusion among marketing managers.
+"""
+
+from __future__ import annotations
+
+
+def _cookie_finding_action(doc_label: str, check_label: str) -> str | None:
+    """P66: cookie-specific mappings."""
+    label_lower = check_label.lower()
+
+    if "zwecke der cookies" in label_lower or label_lower == "zwecke":
+        return (f"<strong>{doc_label}:</strong> Zwecke pro Cookie ergaenzen "
+                f"— nicht pro Anbieter. Service-Beschreibungen ('Akamai = "
+                f"Bot-Schutz') beantworten nicht, was das einzelne Cookie "
+                f"tut. Pflicht: pro Cookie (z.B. <code>_abck</code>) den "
+                f"konkreten Zweck angeben ('Bot-Detection-Token, gueltig "
+                f"24h'). DSK-OH Telemedien 2024 §3.2.")
+
+    if "speicherdauer" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Speicherdauer pro Cookie "
+                f"angeben — nicht pauschal 'siehe Anbieter'. Pflicht: "
+                f"konkreter Wert (z.B. '_ga: 2 Jahre', '_gid: 24h', "
+                f"'PHPSESSID: Session'). Werte aus DevTools > "
+                f"Application > Cookies pruefen, Anbieter-Doku ist "
+                f"oft veraltet. Art. 13 Abs. 2 lit. a DSGVO.")
+
+    if "anbieter" in label_lower or "providers_named" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Konkrete Firmen mit Sitz "
+                f"benennen — nicht 'Drittanbieter' oder 'Marketing-Partner'. "
+                f"Pflicht: voller Firmenname + Rechtsform + Land (z.B. "
+                f"'Google Ireland Limited, Dublin'). Art. 13 Abs. 1 lit. e "
+                f"DSGVO (Empfaenger-Pflicht).")
+
+    if "cookie-tabelle" in label_lower or "cookie_list" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Tabellarische Cookie-Liste "
+                f"mit Name, Anbieter, Zweck und Speicherdauer ergaenzen. "
+                f"Reine Anbieter-Beschreibung ohne Cookie-Namen reicht "
+                f"nicht — Nutzer muss nachvollziehen, welches einzelne "
+                f"Cookie was tut. DSK-OH 2024.")
+
+    if "drittland" in label_lower or "schrems" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Pro US-Anbieter (Google, "
+                f"Meta, AWS, Akamai) klaeren: SCC (Art. 46 DSGVO) oder "
+                f"DPF-Zertifizierung — und in der Cookie-Richtlinie "
+                f"explizit nennen. Pauschales 'Anbieter ausserhalb EU' "
+                f"reicht nicht. EuGH Schrems II.")
+
+    return None
+
+
+def check_to_action(doc_label: str, check_label: str, hint: str) -> str:
+    """Convert a failed check into a plain-language action item."""
+    label_lower = check_label.lower()
+
+    if "datenschutzbeauftragter" in label_lower or "dsb" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Ihren Datenschutzbeauftragten "
+                f"mit Kontaktdaten erwaehnen. Pflicht ab 20 Mitarbeitern.")
+    if "beschwerderecht" in label_lower or "art. 77" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Hinweis auf das Beschwerderecht "
+                f"bei der Aufsichtsbehoerde ergaenzen (Name + Kontakt der Behoerde).")
+    if "betroffenenrechte" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Alle Betroffenenrechte "
+                f"(Auskunft, Berichtigung, Loeschung, etc.) einzeln auffuehren.")
+    if "verantwortlicher" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Vollstaendige Firmenbezeichnung "
+                f"mit Rechtsform, Adresse, E-Mail und Telefon eintragen.")
+    if "interessenabwaegung" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Bei 'berechtigtem Interesse' "
+                f"die Abwaegung dokumentieren. Aufgabe fuer den DSB/Rechtsanwalt.")
+    if "widerrufsbelehrung" in label_lower or "widerruf" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Gesetzliche Widerrufsbelehrung "
+                f"mit 14-Tage-Frist und Musterformular bereitstellen.")
+    if "loeschkonzept" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Loeschfristen und -prozess "
+                f"dokumentieren. Aufgabe fuer den DSB.")
+    if "profiling" in label_lower or "art. 22" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Hinweis ergaenzen ob "
+                f"automatisierte Entscheidungen stattfinden oder nicht.")
+    if "nicht im eingereichten text" in label_lower:
+        return (f"<strong>{doc_label}:</strong> Das eingereichte Dokument "
+                f"enthaelt nicht den erwarteten Inhalt. Bitte korrekte URL pruefen.")
+    if any(w in label_lower for w in ("rechtswidrig", "illegal",
+                                      "haftungsausschluss", "disclaimer")):
+        return (f"<strong>{doc_label}:</strong> '{check_label}' muss entfernt "
+                f"werden (Anti-Pattern, rechtlich wirkungslos).")
+
+    mapped = _cookie_finding_action(doc_label, check_label)
+    if mapped:
+        return mapped
+
+    if hint and len(hint) < 300:
+        return f"<strong>{doc_label}:</strong> {hint[:280]}"
+    return f"<strong>{doc_label}:</strong> '{check_label}' muss ergaenzt werden."
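`check_to_action` is a chain of keyword tests in which the first match wins. If the chain keeps growing, the same behaviour could be expressed as an ordered rule table. The sketch below illustrates this with two of the mappings and a hypothetical `RULES` structure. It is a refactoring idea, not code from this commit; the German output strings are copied from the module above.

```python
from typing import Optional

# Ordered (keywords, text) pairs; the first rule whose keyword occurs in the
# lower-cased check label wins, as in the if-chain of check_to_action.
RULES: list[tuple[tuple[str, ...], str]] = [
    (("datenschutzbeauftragter", "dsb"),
     "Ihren Datenschutzbeauftragten mit Kontaktdaten erwaehnen. "
     "Pflicht ab 20 Mitarbeitern."),
    (("beschwerderecht", "art. 77"),
     "Hinweis auf das Beschwerderecht bei der Aufsichtsbehoerde ergaenzen "
     "(Name + Kontakt der Behoerde)."),
]


def first_rule(label_lower: str) -> Optional[str]:
    """Return the text of the first matching rule, or None."""
    for keywords, text in RULES:
        if any(k in label_lower for k in keywords):
            return text
    return None


def rule_action(doc_label: str, check_label: str, hint: str) -> str:
    text = first_rule(check_label.lower())
    if text:
        return f"<strong>{doc_label}:</strong> {text}"
    # Same fallback as check_to_action: short hints are passed through.
    if hint and len(hint) < 300:
        return f"<strong>{doc_label}:</strong> {hint[:280]}"
    return f"<strong>{doc_label}:</strong> '{check_label}' muss ergaenzt werden."
```

Substring keywords such as `dsb` can also fire on unrelated labels, so keyword choice and rule order need the same care as in the original if-chain.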
@@ -0,0 +1,303 @@
|
||||
"""
|
||||
P59 — Cookie-Behavior-Validator.
|
||||
|
||||
4 Layer:
|
||||
A) Open Cookie Database lookup (declared category vs library category)
|
||||
B) Network-Traffic-Analyse (cookie value sent to third-party domains)
|
||||
C) Value-Pattern (Hash/UUID/PII heuristics on "essential"-declared cookies)
|
||||
D) Cross-Site frequency (from library metadata, when available)
|
||||
|
||||
Returns list of findings with severity + Art. 5(1)(b) DSGVO reference.
"""
from __future__ import annotations

import logging
import re
from typing import Iterable

from sqlalchemy import text
from sqlalchemy.orm import Session


logger = logging.getLogger(__name__)


# --- Patterns for Layer C (value heuristics) ---
_HASH_PATTERN = re.compile(r"^[a-f0-9]{32,64}$", re.IGNORECASE)
_UUID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
_BASE64_LONG = re.compile(r"^[A-Za-z0-9+/=]{40,}$")
_PII_KEYS = ("email", "@", "user_id", "userid", "username", "phone")

# --- Purpose keyword bags for Layer A2 (purpose-text match) ---
_PURPOSE_KEYWORDS = {
    "marketing": {
        "tracking", "tracker", "targeting", "profiling", "profile",
        "advertis", "marketing", "remarket", "retargeting", "conversion",
        "audience", "behavioral", "behaviour", "personali", "interest",
        "campaign", "promotion", "pixel", "fingerprint",
    },
    "statistics": {
        "analytic", "analyse", "analyz", "measure", "measurement", "metric",
        "statistic", "performance", "telemetr", "monitoring", "usage",
        "reichweite", "auswert",
    },
    "essential": {
        "session", "sitzung", "authentic", "anmeld", "login", "logout",
        "security", "sicherheit", "csrf", "xsrf", "cookie consent",
        "cookie-einwilligung", "technisch notwendig", "load balanc",
        "lastverteil",
    },
    "functional": {
        "preference", "praeferen", "language", "sprache", "layout", "design",
        "cart", "warenkorb", "wishlist", "merkliste", "favorit", "theme",
        "darkmode", "darstellung",
    },
    "social_media": {
        "social", "facebook", "twitter", "linkedin", "instagram", "youtube",
        "embed", "share", "teilen",
    },
}


def _classify_purpose_text(text_value: str) -> set[str]:
    """Return the set of categories whose keywords appear in the purpose text.

    Matching is a case-insensitive substring test, so stems such as
    "advertis" also match "advertising" and "advertisement".
    """
    if not text_value:
        return set()
    t = text_value.lower()
    matches = set()
    for cat, kws in _PURPOSE_KEYWORDS.items():
        if any(k in t for k in kws):
            matches.add(cat)
    return matches


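Layer A2 relies on `_classify_purpose_text`, which tests each keyword as a plain lower-case substring. The sketch below reproduces that idea with a deliberately reduced, hypothetical keyword table (`KEYWORDS` and `classify` are illustrative names, not part of the module) to show that one purpose text can land in several categories at once. That is why Layer A2 checks membership (`actual_cat not in declared_cats`) rather than equality.

```python
from __future__ import annotations

# Hypothetical, heavily reduced keyword bags (the module's bags are larger).
KEYWORDS: dict[str, set[str]] = {
    "marketing": {"tracking", "advertis", "retargeting", "pixel"},
    "statistics": {"analytic", "reichweite", "measure"},
    "essential": {"session", "sitzung", "login", "csrf"},
}


def classify(purpose: str) -> set[str]:
    """Return every category that has at least one keyword as a substring."""
    lowered = purpose.lower()
    return {cat for cat, kws in KEYWORDS.items() if any(k in lowered for k in kws)}


print(sorted(classify("Speichert die Sitzung fuer den Login")))             # ['essential']
print(sorted(classify("Advertising pixel for retargeting and analytics")))  # ['marketing', 'statistics']
print(sorted(classify("Measurement of session length")))                    # ['essential', 'statistics']
```

Because stems such as `measure` also match `measurement`, the substring approach favours recall over precision.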
def _lookup_library(db: Session, cookie_name: str,
                    cookie_domain: str) -> dict | None:
    """Layer A: find the best cookie-library match for a cookie name.

    Rows are ranked: exact domain match first, then a wildcard match
    (``*`` in ``domain_pattern`` treated as SQL ``%``), then any other
    domain; ties are broken by descending confidence.
    """
    # Exact domain match first, then wildcard
    cur = db.execute(text("""
        SELECT actual_category, purpose_en, purpose_de, vendor_name,
               data_receivers, source_name, source_url, confidence
        FROM compliance.cookie_library
        WHERE cookie_name = :name
        ORDER BY
            CASE WHEN domain_pattern = :domain THEN 0
                 WHEN :domain ILIKE replace(domain_pattern, '*', '%') THEN 1
                 ELSE 2 END,
            confidence DESC
        LIMIT 1
    """), {"name": cookie_name, "domain": cookie_domain or ""})
    row = cur.fetchone()
    if not row:
        return None
    return {
        "actual_category": row[0], "purpose_en": row[1],
        "purpose_de": row[2], "vendor_name": row[3],
        "data_receivers": row[4] or [],
        "source_name": row[5], "source_url": row[6],
        "confidence": float(row[7] or 0),
    }


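`_lookup_library` ranks candidate rows in SQL: exact `domain_pattern` match first, then a wildcard match, then any other row with the same cookie name, each tier ordered by descending confidence. A minimal pure-Python emulation, assuming rows are plain dicts with `domain_pattern` and `confidence` keys, makes that ordering testable without a database. The `rank`/`best_match` helpers and the sample rows are hypothetical. `fnmatch` only approximates `ILIKE`: here both sides are lower-cased and `*` is the wildcard, but `fnmatch` also interprets `?` and `[...]`, whereas `ILIKE` treats `_` as a single-character wildcard.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def rank(domain: str, pattern: str) -> int:
    """0 = exact match, 1 = wildcard match, 2 = any other domain (mirrors the CASE)."""
    if pattern == domain:
        return 0
    # Rough stand-in for `:domain ILIKE replace(domain_pattern, '*', '%')`.
    if fnmatchcase(domain.lower(), pattern.lower()):
        return 1
    return 2


def best_match(rows: list[dict], domain: str) -> dict | None:
    """Emulate ORDER BY <rank>, confidence DESC LIMIT 1 over in-memory rows."""
    if not rows:
        return None
    return min(rows, key=lambda r: (rank(domain, r["domain_pattern"]), -r["confidence"]))


# Hypothetical library rows for a single cookie name.
ROWS = [
    {"domain_pattern": "*", "confidence": 0.9, "vendor": "generic"},
    {"domain_pattern": "*.google-analytics.com", "confidence": 0.7, "vendor": "ga-wildcard"},
    {"domain_pattern": "example.com", "confidence": 0.5, "vendor": "exact"},
]
```

In this sketch, as in the query as written, the higher-confidence row wins within the wildcard tier even when it is less specific, so a catch-all `*` row would shadow a `*.google-analytics.com` row. Ties are returned in unspecified order by SQL, whereas `min` picks the first row.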
def _value_pattern_flag(value: str | None, declared_category: str) -> str | None:
    """Layer C: detect tracking-typical value patterns in cookies declared as
    essential or functional. Returns a human-readable flag, or None."""
    if not value or declared_category not in ("essential", "functional"):
        return None
    v = value.strip()
    if not v or len(v) < 16:
        return None
    if _UUID_PATTERN.match(v):
        return "UUID (Persistent Identifier)"
    if _HASH_PATTERN.match(v):
        return f"Hash-Wert ({len(v)} Hex-Zeichen — typisch User-ID)"
    if _BASE64_LONG.match(v):
        return f"Base64-Long ({len(v)} Zeichen — typisch Tracking-Payload)"
    vlow = v.lower()
    for kw in _PII_KEYS:
        if kw in vlow:
            return f"PII-Marker '{kw}' im Wert"
    return None


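Layer C discards values shorter than 16 characters and then tries the patterns in a fixed order (UUID, hex hash, long Base64), so the first match determines the label. This standalone sketch copies the three regexes from the module; `value_kind` and its short labels are hypothetical stand-ins for the German flag strings, and the sample payload is invented.

```python
from __future__ import annotations

import base64
import re

UUID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE)
HEX_HASH = re.compile(r"^[a-f0-9]{32,64}$", re.IGNORECASE)
BASE64_LONG = re.compile(r"^[A-Za-z0-9+/=]{40,}$")


def value_kind(value: str) -> str | None:
    """Return 'uuid', 'hash' or 'base64' for tracking-like values, else None."""
    v = value.strip()
    if len(v) < 16:  # short values such as "1" or "de-DE" are ignored
        return None
    if UUID.match(v):
        return "uuid"
    if HEX_HASH.match(v):
        return "hash"
    if BASE64_LONG.match(v):
        return "base64"
    return None


# Hypothetical JSON payload, Base64-encoded the way some tracking cookies store it.
SAMPLE_PAYLOAD = base64.b64encode(b'{"uid": "1234567890", "seg": "auto"}').decode()
```

Order matters: a 40-character SHA-1 hex digest also consists only of Base64-alphabet characters, but it is reported as a hash because that pattern is tried first.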
def _category_label(cat: str) -> str:
    """Map an internal category slug to the German label used in findings."""
    return {
        "essential": "technisch notwendig",
        "functional": "funktional",
        "statistics": "Analyse/Statistik",
        "marketing": "Marketing/Werbung",
        "social_media": "Social Media",
        "unknown": "unbekannt",
    }.get(cat, cat)


def _request_host(req: dict) -> str:
    """Return the lower-cased receiver host of a captured network request.

    Prefers an explicit ``host`` field; otherwise extracts the hostname from
    ``url`` so that scheme and path do not defeat the first-party suffix check.
    """
    from urllib.parse import urlsplit

    host = req.get("host")
    if not host:
        url = req.get("url") or ""
        host = urlsplit(url).hostname or url
    return host.lower().lstrip(".")


def validate_cookie_behavior(
    db: Session,
    cookies_set: Iterable[dict],
    network_requests: list[dict] | None = None,
    first_party_domain: str = "",
) -> list[dict]:
    """Run validation layers A-C and return a list of finding dicts.

    Each cookie dict should have ``name`` and may have ``domain``, ``value``,
    ``declared_category`` (e.g. 'essential'), ``declared_purpose`` (free text,
    used by Layer A2) and ``max_age_seconds`` (currently unused). Layer D
    (cross-site frequency) is not implemented yet.
    """
    findings: list[dict] = []
    network_requests = network_requests or []
    fp_domain = (first_party_domain or "").lower().lstrip(".")

    # Pre-index network traffic: which receiver hosts got which cookie?
    receivers_by_cookie: dict[str, set[str]] = {}
    for req in network_requests:
        try:
            host = _request_host(req)
            for cname in (req.get("cookies_sent") or []):
                receivers_by_cookie.setdefault(cname, set()).add(host)
        except Exception:
            continue

    for c in cookies_set or []:
        name = (c.get("name") or "").strip()
        if not name:
            continue
        declared = (c.get("declared_category") or "").lower()
        domain = (c.get("domain") or "").lstrip(".").lower()
        value = c.get("value")

        # Layer A: library lookup + 3-tier severity (category / purpose / both)
        lib = _lookup_library(db, name, domain)
        declared_purpose = (c.get("declared_purpose") or "").strip()
        if lib and lib["actual_category"] != "unknown":
            # Layer A1: category mismatch, flagged only when relevant
            # (declared essential/functional, but the library says
            # marketing/statistics/social_media)
            category_mismatch = (
                declared
                and lib["actual_category"] != declared
                and declared in ("essential", "functional")
                and lib["actual_category"] in ("marketing", "statistics",
                                               "social_media")
            )
            # Layer A2: purpose-text mismatch
            purpose_mismatch = False
            purpose_explain = ""
            if declared_purpose:
                declared_cats = _classify_purpose_text(declared_purpose)
                actual_cat = lib["actual_category"]
                # Mismatch when the declared purpose text points to a different
                # category than the library reality (e.g. declared "Sitzung",
                # but actually a marketing cookie)
                if actual_cat in ("marketing", "statistics", "social_media"):
                    # Suspicious when the declared purpose never mentions the
                    # actual category ...
                    if declared_cats and actual_cat not in declared_cats:
                        # ... and contains at least one "harmless" keyword
                        if declared_cats & {"essential", "functional"}:
                            purpose_mismatch = True
                            purpose_explain = (
                                f"Beschriebener Zweck deutet auf "
                                f"{', '.join(_category_label(k) for k in sorted(declared_cats))}, "
                                f"das Cookie wird aber tatsaechlich fuer "
                                f"{_category_label(actual_cat)} eingesetzt"
                            )

            # 3-tier severity
            if category_mismatch and purpose_mismatch:
                # CRITICAL: indicator of intent / bad faith
                findings.append({
                    "layer": "A1+A2",
                    "cookie_name": name,
                    "severity": "CRITICAL",
                    "type": "DUAL_MISMATCH_INTENT",
                    "text": (
                        f"Cookie '{name}' weist DOPPELTE Diskrepanz auf: "
                        f"deklarierte Kategorie '{_category_label(declared)}' UND "
                        f"deklarierter Zweck stimmen NICHT mit dem realen Verhalten "
                        f"('{_category_label(lib['actual_category'])}') ueberein. "
                        f"{purpose_explain}. {lib['source_name']}-Quelle: "
                        f"{lib['purpose_en'][:120] if lib['purpose_en'] else ''}. "
                        f"Doppel-Mismatch indiziert Vorsatz nach DSK Beschluss 2024-02 "
                        f"(Cookie gezielt verschleiert) — siehe Bussgeld-Risiko Art. 83 "
                        f"DSGVO bei wissentlicher Taeuschung. Konstruktive Annahme: "
                        f"haeufig Marketing-/Agentur-Versehen ohne DSB-Kontrolle."
                    ),
                    "legal_ref": "Art. 5(1)(a)+(b) DSGVO + DSK Beschluss 2024-02",
                    "source": lib["source_url"] or lib["source_name"],
                })
            elif purpose_mismatch:
                # HIGH: purpose text is wrong (ignorance or intent)
                findings.append({
                    "layer": "A2",
                    "cookie_name": name,
                    "severity": "HIGH",
                    "type": "PURPOSE_TEXT_MISMATCH",
                    "text": (
                        f"Cookie '{name}': {purpose_explain}. {lib['source_name']}: "
                        f"{(lib['purpose_en'] or '')[:140]}. Deutet auf fehlende "
                        f"Detail-Pruefung des Cookie-Verhaltens — Beschreibung sollte "
                        f"das tatsaechliche Verhalten reflektieren (Art. 13 DSGVO + "
                        f"Transparenz)."
                    ),
                    "legal_ref": "Art. 13(1)(c) DSGVO (Zweck-Angabe muss korrekt sein)",
                    "source": lib["source_url"] or lib["source_name"],
                })
            elif category_mismatch:
                # MEDIUM: wrong category tag, possibly a careless slip
                findings.append({
                    "layer": "A1",
                    "cookie_name": name,
                    "severity": "MEDIUM",
                    "type": "CATEGORY_MISMATCH",
                    "text": (
                        f"Cookie '{name}' ist als '{_category_label(declared)}' "
                        f"kategorisiert. {lib['source_name']} klassifiziert ihn als "
                        f"'{_category_label(lib['actual_category'])}'"
                        + (f" — {lib['purpose_en'][:120]}" if lib['purpose_en'] else "")
                        + f". Vermutlich Konfigurations-Versehen im Consent-Tool "
                        f"(haeufig bei Migrationen zwischen CMP-Anbietern). "
                        f"Korrektur: Cookie auf '{_category_label(lib['actual_category'])}'"
                        f" umstellen, Consent neu einholen."
                    ),
                    "legal_ref": "Art. 5(1)(b) DSGVO (Zweckbindung)",
                    "source": lib["source_url"] or lib["source_name"],
                })

        # Layer B: network traffic
        receivers = receivers_by_cookie.get(name, set())
        # Third party = any receiver outside the first-party domain and its
        # subdomains (dot-boundary check, so "evilexample.com" is not treated
        # as first party for "example.com").
        third_party = [r for r in receivers
                       if r and fp_domain and r != fp_domain
                       and not r.endswith("." + fp_domain)]
        if third_party and declared in ("essential", "functional"):
            findings.append({
                "layer": "B",
                "cookie_name": name,
                "severity": "HIGH",
                "type": "THIRD_PARTY_DESPITE_ESSENTIAL",
                "text": (
                    f"Cookie '{name}' ist als '{_category_label(declared)}' "
                    f"deklariert, der Wert wird aber an {len(third_party)} "
                    f"externe(n) Empfaenger uebertragen: "
                    f"{', '.join(sorted(third_party))[:200]}. "
                    f"Damit liegt eine Drittlandstransfer-/Drittanbieter-Verarbeitung "
                    f"vor, die nicht durch die deklarierte Zweckbestimmung gedeckt ist."
                ),
                "legal_ref": "Art. 5(1)(b) Zweckbindung + Art. 13(1)(f) DSGVO",
            })

        # Layer C: value pattern
        flag = _value_pattern_flag(value, declared)
        if flag:
            findings.append({
                "layer": "C",
                "cookie_name": name,
                "severity": "MEDIUM",
                "type": "TRACKING_PATTERN_DESPITE_ESSENTIAL",
                "text": (
                    f"Cookie '{name}' ist als '{_category_label(declared)}' "
                    f"deklariert, enthaelt aber: {flag}. Werte mit Tracking-Charakter "
                    f"sind in nicht einwilligungsbeduerftigen Kategorien fragwuerdig."
                ),
                "legal_ref": "Art. 5(1)(b) DSGVO + DSK-OH Telemedien 2024",
            })

        # Layer D: cross-site frequency (not implemented yet; needs metadata import)

    return findings
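The Layer A decision in `validate_cookie_behavior` combines two booleans into three severity tiers. The following sketch restates it as two pure functions; `category_mismatch`, `severity_tier` and the two frozensets are hypothetical names. Because `declared in ("essential", "functional")` already implies a non-empty value, and an `actual` category in the consent-requiring set can never equal a harmless `declared` one, the first two conditions of the original A1 expression are redundant and the predicate reduces to two membership tests.

```python
from __future__ import annotations

HARMLESS = frozenset({"essential", "functional"})
CONSENT_REQUIRED = frozenset({"marketing", "statistics", "social_media"})


def category_mismatch(declared: str, actual: str) -> bool:
    """Layer A1: declared harmless, but the library says consent is required."""
    return declared in HARMLESS and actual in CONSENT_REQUIRED


def severity_tier(cat_mismatch: bool, purpose_mismatch: bool) -> tuple[str, str] | None:
    """Map the Layer A1/A2 signals to (severity, finding type)."""
    if cat_mismatch and purpose_mismatch:
        return ("CRITICAL", "DUAL_MISMATCH_INTENT")
    if purpose_mismatch:
        return ("HIGH", "PURPOSE_TEXT_MISMATCH")
    if cat_mismatch:
        return ("MEDIUM", "CATEGORY_MISMATCH")
    return None
```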
@@ -39,6 +39,12 @@ AGB_CHECKLIST = [
        "patterns": [
            r"vertragsschluss", r"zustandekommen",
            r"contract\s+formation", r"angebot\s+und\s+annahme",
            # P41: English synonyms
            r"conclusion\s+of\s+(?:the\s+)?contract",
            r"contract\s+(?:is\s+)?(?:concluded|formed)",
            r"offer\s+and\s+acceptance",
            r"how\s+the\s+contract\s+is\s+formed",
            r"contracts?\s+(?:apply|between\s+the\s+provider)",
        ],
        "severity": "HIGH",
        "hint": "Haeufiger Fehler: Die Bestellung wird als Angebot des Kunden dargestellt, aber die Auftragsbestaetigung als Annahme — das ist nur wirksam, wenn klar zwischen Eingangsbestaetigung (§312i BGB) und Auftragsbestaetigung/Annahme unterschieden wird.",
@@ -140,6 +146,15 @@ AGB_CHECKLIST = [
            r"lieferung", r"leistungserbringung", r"delivery",
            r"lieferfrist", r"bereitstellung",
            r"(?:zugang|zugriff).*(?:dienst|leistung)",
            # P41: English synonyms (SaaS-style)
            r"provision\s+of\s+(?:the\s+)?(?:service|services)",
            r"(?:performance|rendering)\s+of\s+(?:the\s+)?(?:service|services)",
            r"availability\s+of\s+(?:the\s+)?service",
            r"service\s+level\s+(?:agreement|description)",
            r"access\s+to\s+(?:the\s+)?(?:service|platform)",
            r"description\s+of\s+(?:the\s+)?services?",
            r"(?:^|\n)\s*#+\s*[§\d\.\s]*availability\b",
            r"(?:^|\n)\s*#+\s*[§\d\.\s]*description\s+of\s+services?",
        ],
        "severity": "MEDIUM",
        "hint": "Bei Fernabsatzvertraegen muss der Unternehmer spaetestens 30 Tage nach Vertragsschluss liefern (§475 Abs. 1 BGB). Formulierungen wie 'Lieferung in der Regel in...' oder 'voraussichtlich' sind nur als Richtwert zulaessig, nicht als verbindliche Frist.",
@@ -230,6 +245,12 @@ AGB_CHECKLIST = [
            r"(?:agb|bedingung).*datenschutz",
            r"personenbezogen.*daten.*(?:agb|vertrag)",
            r"dsgvo.*(?:agb|vertrag)",
            # P41: English synonyms
            r"data\s+protection.*(?:terms|contract)",
            r"(?:terms|contract).*data\s+protection",
            r"personal\s+data.*(?:terms|contract|agreement)",
            r"gdpr.*(?:terms|contract|agreement)",
            r"privacy\s+(?:policy|notice).*(?:see|refer)",
        ],
        "severity": "LOW",
        "hint": "AGB und Datenschutzerklaerung sind rechtlich getrennte Dokumente. Mischen Sie KEINE Datenschutzhinweise in die AGB ein — stattdessen genuegt ein Verweis: 'Details zur Datenverarbeitung finden Sie in unserer Datenschutzerklaerung [Link].'",
@@ -245,6 +266,11 @@ AGB_CHECKLIST = [
            r"(?:unwirksamkeit|nichtigkeit)\s+(?:einer|einzelner)\s+(?:bestimmung|klausel|regelung)",
            r"(?:sollte|sofern).*(?:bestimmung|klausel).*(?:unwirksam|nichtig)",
            r"(?:uebrigen|übrigen)\s+bestimmungen.*(?:unberuehrt|unberührt|wirksam|bestehen)",
            # P41: English equivalents
            r"severability",
            r"(?:invalid|unenforceable).*(?:provision|clause)",
            r"remaining\s+provisions\s+(?:shall\s+)?(?:remain|continue)",
            r"(?:provision|clause)\s+(?:is\s+)?(?:invalid|unenforceable|void)",
        ],
        "severity": "LOW",
        "hint": "Die klassische salvatorische Klausel ('unwirksame Bestimmungen werden durch wirksame ersetzt') ist nach BGH-Rechtsprechung in AGB selbst unwirksam. Besser: Nur die Erhaltungsklausel verwenden ('Die uebrigen Bestimmungen bleiben wirksam').",
@@ -260,6 +286,12 @@ AGB_CHECKLIST = [
            r"(?:agb|bedingung).*(?:ae|ä)nder",
            r"(?:anpassung|aktualisierung).*(?:agb|bedingung|geschaeftsbedingung|geschäftsbedingung)",
            r"(?:neue\s+fassung|neufassung).*(?:agb|bedingung)",
            # P41: English
            r"amendments?.*(?:terms|conditions|agreement)",
            r"(?:terms|conditions|agreement).*(?:may\s+be\s+)?amend",
            r"changes?\s+to\s+(?:these\s+)?(?:terms|conditions)",
            r"modification\s+of\s+(?:the\s+)?(?:terms|agreement)",
            r"(?:revised|updated)\s+(?:terms|conditions|version)",
        ],
        "severity": "LOW",
        "hint": "AGB-Aenderungsklauseln bei B2C sind nur unter engen Voraussetzungen wirksam (BGH Az. XI ZR 388/10): Aenderungsgrund muss konkret benannt sein, Kunde muss angemessene Frist zur Kuendigung erhalten. Pauschale 'Wir koennen jederzeit aendern'-Klauseln sind unwirksam.",
@@ -275,6 +307,12 @@ AGB_CHECKLIST = [
            r"verbraucherrecht",
            r"(?:gesetzlich|zwingende)\w*\s+recht\w*.*(?:unberuehrt|unberührt|bestehen\s+bleiben)",
            r"(?:verbrauch|konsument).*(?:recht|anspruch|schutz)",
            # P41: English equivalents (UCTA / Consumer Rights Act)
            r"consumer\s+(?:rights?|protection|laws?)",
            r"statutory\s+rights?\s+(?:are|shall\s+be|remain)\s+unaffected",
            r"mandatory\s+(?:law|rights?)\s+(?:remain|shall\s+remain)",
            r"(?:nothing|no\s+provision)\s+(?:in\s+these\s+)?(?:terms|conditions)\s+(?:shall|limits?|excludes?)",
            r"contracts?\s+with\s+consumers?\s+(?:are\s+not\s+concluded|excluded)",
        ],
        "severity": "LOW",
        "hint": "Haeufigste §309 BGB-Verstoesse: Pauschalierter Schadensersatz ohne Gegenbeweismoeglichkeit (Nr. 5), Haftungsausschluss bei Koerperschaeden (Nr. 7a), Schriftformerfordernis fuer Kuendigung (Nr. 13). Jede dieser Klauseln ist einzeln abmahnfaehig.",

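Each checklist item passes when at least one of its patterns is found in the document. The runner is not part of this diff, so the matching semantics below (case-insensitive `re.search`, any-match) are an assumption; the four patterns are copied from the P41 severability additions, and `item_matches` is a hypothetical helper.

```python
import re

# The P41 English severability patterns from the AGB_CHECKLIST hunk.
SEVERABILITY = [
    r"severability",
    r"(?:invalid|unenforceable).*(?:provision|clause)",
    r"remaining\s+provisions\s+(?:shall\s+)?(?:remain|continue)",
    r"(?:provision|clause)\s+(?:is\s+)?(?:invalid|unenforceable|void)",
]


def item_matches(text: str, patterns: list[str]) -> bool:
    """Assumed runner semantics: the item passes if any pattern is found."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)
```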
@@ -259,6 +259,8 @@ AVV_CHECKLIST = [
            r"(?:l(?:oe|ö)schung|rueckgabe|r(?:ue|ü)ckgabe)\s+(?:nach|bei|zum)\s+(?:vertragsende|beendigung|ablauf)",
            r"(?:nach|bei)\s+(?:beendigung|ablauf|ende)\s+(?:des\s+)?(?:vertrag|auftrag)[\s\S]{0,100}(?:l(?:oe|ö)sch|rueckgabe|r(?:ue|ü)ckgabe|vernicht)",
            r"(?:alle|saemtliche)\s+(?:personenbezogenen?\s+)?daten\s+(?:l(?:oe|ö)sch|vernicht|zurueckgeb|zur(?:ue|ü)ckgeb)",
            # P39: reverse order, e.g. "loescht/gibt ... nach Beendigung/Ablauf"
            r"(?:l(?:oe|ö)sch|gibt|gibt\s+zur(?:ue|ü)ck|vernicht)\w*[\s\S]{0,150}(?:nach|bei|zum)\s+(?:beendigung|ablauf|ende|vertragsende)",
        ],
        "severity": "CRITICAL",
        "hint": "Art. 28(3)(g) DSGVO: Nach Ende der Verarbeitung muessen alle personenbezogenen Daten geloescht oder zurueckgegeben werden — nach Wahl des Verantwortlichen. Ausnahme nur bei gesetzlicher Aufbewahrungspflicht.",
@@ -336,6 +338,10 @@ AVV_CHECKLIST = [
            r"data\s+breach",
            r"(?:meld|benachrichtig|informier|unterricht)\w*[\s\S]{0,50}(?:verletzung|vorfall|sicherheit)",
            r"art(?:ikel)?\s*\.?\s*33\s+(?:dsgvo|ds-?gvo)",
            # P39: "Datenpanne" as an equivalent, very common synonym
            r"datenpanne",
            r"meldung\s+von\s+datenpannen",
            r"art\.?\s*33\s+abs\.?\s*\d",
        ],
        "severity": "CRITICAL",
        "hint": "Art. 33(2) DSGVO: Der Auftragsverarbeiter muss den Verantwortlichen UNVERZUEGLICH ueber jede Datenschutzverletzung informieren. Die 72-Stunden-Frist des Verantwortlichen gegenueber der Aufsichtsbehoerde laeuft ab Kenntnis — daher sollte die Meldefrist im AVV enger sein (z.B. 24h).",

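The P39 reverse-order pattern in the AVV deletion item lets the verb come first and the trigger phrase (such as "nach Beendigung") follow up to 150 characters later. This sketch assumes the runner lower-cases the document before matching, which the diff does not show; `check` is a hypothetical helper reporting whether the deletion item and the data-breach item would fire.

```python
import re

# P39 reverse-order deletion pattern and the two new "Datenpanne" patterns.
DELETION_REVERSE = re.compile(
    r"(?:l(?:oe|ö)sch|gibt|gibt\s+zur(?:ue|ü)ck|vernicht)\w*[\s\S]{0,150}"
    r"(?:nach|bei|zum)\s+(?:beendigung|ablauf|ende|vertragsende)"
)
BREACH = [re.compile(r"datenpanne"), re.compile(r"meldung\s+von\s+datenpannen")]


def check(text: str) -> tuple[bool, bool]:
    """Return (deletion item fires, breach item fires) for lower-cased text."""
    lowered = text.lower()
    deletion = DELETION_REVERSE.search(lowered) is not None
    breach = any(p.search(lowered) for p in BREACH)
    return deletion, breach
```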
@@ -66,6 +66,10 @@ COOKIE_CHECKLIST = [
            r"(?:setzen|verwenden|nutzen)\s+.*cookies?\s+.*(?:um|fuer|für)",
            r"(?:analyse|marketing|tracking|funktional)\w*\s*cookies?\s*\.?\s*(?:um|damit|diese|sie)",
            r"cookies?\s+(?:dienen|helfen|erm(?:oe|ö)glichen)",
            # P39: cookie purpose table column "| Zweck |" + "Kategorie"
            r"kategorie\s*\|\s*zweck",
            r"\|\s*zweck\s*\|",
            r"welche\s+technologie\s+welchen\s+zweck",
        ],
        "severity": "HIGH",
        "hint": "Art. 13 Abs. 1 lit. c DSGVO verlangt die Zweckangabe je Verarbeitung. Jede Cookie-Kategorie braucht einen konkreten Zweck (z.B. 'Reichweitenmessung', 'Conversion-Tracking'), nicht nur 'zur Verbesserung unserer Website'.",
@@ -207,6 +211,10 @@ COOKIE_CHECKLIST = [
            r"(?:datenschutz[\-]?rechtlich(?:er)?\s+)?verantwortlich\w*\s*[:\|]",
            r"daten(?:schutz)?[\-]?(?:rechtlich(?:er)?\s+)?(?:verantwortl|controller)",
            r"\bcontroller\b.*\b(?:art\.?\s*13|art\.?\s*14|gdpr|dsgvo)",
            # P39: heading variant, common in cookie policies
            r"(?:^|\n)\s*#+\s*\d*\.?\s*verantwortlich\w*",
            r"(?:^|\n)\s*\d+\.\s+verantwortlich\w*",
            r"verantwortlich\w*\s+(?:fuer|für|ist|im\s+sinne)",
        ],
        "severity": "MEDIUM",
        "hint": "Art. 13(1)(a) DSGVO verlangt die Nennung des Verantwortlichen in der Cookie-Richtlinie. Pflicht: Firmenname + Anschrift + Kontaktdaten (E-Mail/Telefon). Akzeptabel: knapper Verweis 'Details zum Verantwortlichen siehe Datenschutzerklaerung [Link]' wenn die DSI verlinkt ist.",

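The P39 additions to COOKIE_CHECKLIST target Markdown table headers and numbered headings. A short sketch, again assuming lower-cased input and any-match semantics (`has_any` is a hypothetical helper):

```python
import re

TABLE_PATTERNS = [r"kategorie\s*\|\s*zweck", r"\|\s*zweck\s*\|"]
CONTROLLER_HEADING = r"(?:^|\n)\s*#+\s*\d*\.?\s*verantwortlich\w*"


def has_any(text: str, patterns: list[str]) -> bool:
    """True if any pattern matches the lower-cased text (assumed runner behaviour)."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in patterns)
```

A purpose given only in running text ("Zweck: Reichweitenmessung") does not satisfy these table patterns; it has to be caught by the older prose patterns of the same item.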
@@ -17,6 +17,11 @@ ART13_CHECKLIST = [
            r"name\s+(?:und|&)\s+kontaktdaten\s+des",
            r"controller", r"verantwortliche\s+stelle",
            r"responsible\s+(?:party|for)",
            # P39: heading-style "## 1. Verantwortlicher", "## Verantwortlicher",
            # "1. Verantwortlicher": common template structure that was not matched.
            r"(?:^|\n)\s*#+\s*\d*\.?\s*verantwortlich\w*",
            r"(?:^|\n)\s*\d+\.\s+verantwortlich\w*",
            r"\bverantwortlich\w*\s*[:\n]",
        ],
        "severity": "HIGH",
        "hint": "Art. 13(1)(a) DSGVO verlangt vollstaendige Identifizierung: Firmenname mit Rechtsform (z.B. 'Muster GmbH'), ladungsfaehige Anschrift, E-Mail und Telefon. Haeufiger Fehler: Nur Markenname ohne Rechtsform — das genuegt nicht zur Zustellung.",
@@ -93,6 +98,11 @@ ART13_CHECKLIST = [
            r"zu\s+welch\w+\s+zweck",
            r"welche\s+daten\s+werden.*verarbeitet",
            r"daten\s+werden\s+(?:zu|fuer|für)\s+(?:folgende|diese)",
            # P39: heading variants
            r"(?:^|\n)\s*#+\s*\d*\.?\s*zwecke?\b",
            r"\*\*zwecke?:?\*\*",
            r"purposes?\s+and\s+(?:legal|legal\s+bases?)",
            r"purposes?\s*[:\n]",
        ],
        "severity": "HIGH",
        "hint": "Art. 13(1)(c) verlangt konkrete Zweckangaben — nicht nur 'Wir verarbeiten Ihre Daten'. Jeder Dienst braucht einen eigenen Zweck: z.B. 'Webanalyse via Matomo', 'Newsletter-Versand', 'Kontaktanfragen'. Pauschalformulierungen verstiessen laut DSK gegen den Transparenzgrundsatz (Art. 5(1)(a)).",
@@ -223,6 +233,13 @@ ART13_CHECKLIST = [
            r"(?:ueber|über)mittlung.*(?:ausserhalb|außerhalb)",
            r"(?:europ(?:ae|ä)ischen\s+wirtschaftsraum|ewr|eea)",
            r"privacy\s+shield", r"data\s+privacy\s+framework",
            # P39: Art. 13(1)(f) only requires that transfers be addressed, so
            # "keine Uebermittlung in Drittlaender" / "kein Drittlandtransfer"
            # / "alle Verarbeitung innerhalb der EU" are explicit, compliant
            # negative statements.
            r"(?:kein|keine)\s+(?:uebermittlung|übermittlung|transfer|drittland)",
            r"verarbeitung\s+(?:erfolgt\s+)?(?:ausschliesslich|ausschließlich|nur)\s+(?:in|innerhalb)\s+(?:der\s+)?(?:eu|europ(?:ae|ä)ischen\s+union|ewr)",
            r"alle\s+daten\s+(?:bleiben|verbleiben)\s+(?:in|innerhalb)\s+(?:der\s+)?(?:eu|deutschland)",
        ],
        "severity": "MEDIUM",
        "hint": "Art. 13(1)(f) DSGVO: Bei jedem Drittlandtransfer muessen Empfaengerland und Schutzgarantien genannt werden. Pruefen Sie: Google Fonts, reCAPTCHA, YouTube-Embeds, CDNs — all das sind USA-Transfers. Fehlende Angabe war Grundlage zahlreicher DSGVO-Bussgelder.",

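Since Art. 13(1)(f) only requires that the policy address third-country transfers, the P39 patterns accept explicit negative statements. This sketch (hypothetical `is_negation` helper, lower-cased input assumed) shows which phrasings the three new patterns cover; the wording "Drittlandtransfer findet nicht statt" is not caught by these three patterns alone.

```python
import re

NEGATIONS = [
    r"(?:kein|keine)\s+(?:uebermittlung|übermittlung|transfer|drittland)",
    r"verarbeitung\s+(?:erfolgt\s+)?(?:ausschliesslich|ausschließlich|nur)\s+(?:in|innerhalb)\s+(?:der\s+)?(?:eu|europ(?:ae|ä)ischen\s+union|ewr)",
    r"alle\s+daten\s+(?:bleiben|verbleiben)\s+(?:in|innerhalb)\s+(?:der\s+)?(?:eu|deutschland)",
]


def is_negation(sentence: str) -> bool:
    """True if one of the P39 negative-statement patterns matches."""
    lowered = sentence.lower()
    return any(re.search(p, lowered) for p in NEGATIONS)
```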
@@ -192,6 +192,11 @@ DSFA_CHECKLIST = [
            r"landes.?datenschutz",
            r"richtlinie.*(?:land|lfdi|landes)",
            r"(?:aufsichtsbeh(?:oe|ö)rde|beh(?:oe|ö)rde).*(?:richtlinie|empfehlung|vorgabe)",
            # P39: DSK list/blacklist + specific state supervisory authorities
            r"(?:dsk|datenschutzkonferenz)\s+(?:positiv|black)?liste",
            r"art\.?\s*35\s*\(?\s*4\s*\)?\s*dsgvo",
            r"(?:berliner|hamburgische|saechsisch|bayerisch|nordrhein|baden)\w*\s+beauftragt",
            r"(?:bfdi|bvfd|ldsbw|ldsh)",
        ],
        "severity": "MEDIUM",
        "hint": "Die DSK hat eine Positivliste (Blacklist) nach Art. 35(4) DSGVO veroeffentlicht, die DSFA-pflichtige Verarbeitungen auflistet. Zusaetzlich hat jedes Bundesland eigene LfDI-Empfehlungen — z.B. der LfDI BaWue zu Social-Media-Fanpages. Pruefen und zitieren Sie die fuer Sie zustaendige Behoerde.",

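The new DSFA pattern for Art. 35(4) DSGVO accepts parenthesised and space-separated forms, but not the equally common German citation style with "Abs.". The sketch compares it with a broadened variant; `BROADER` and `covers` are hypothetical illustrations, not part of the checklist.

```python
from __future__ import annotations

import re

# Pattern as added to DSFA_CHECKLIST (applied to lower-cased text).
ORIGINAL = re.compile(r"art\.?\s*35\s*\(?\s*4\s*\)?\s*dsgvo")

# Hypothetical broadened variant: "(4)" or "Abs. 4", and "DSGVO" or "DS-GVO".
BROADER = re.compile(r"art\.?\s*35\s*(?:\(\s*4\s*\)|abs\.?\s*4)\s*ds-?gvo")


def covers(pattern: re.Pattern[str], citation: str) -> bool:
    return pattern.search(citation.lower()) is not None


for citation in ("Art. 35(4) DSGVO", "Art 35 (4) DSGVO", "Art. 35 Abs. 4 DSGVO"):
    print(f"{citation!r}: original={covers(ORIGINAL, citation)}, broader={covers(BROADER, citation)}")
```

Requiring either parentheses or "Abs." also removes a side effect of the original: because both parentheses are optional there, "Art. 354 DSGVO" matches as well.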
@@ -16,6 +16,11 @@ LOESCHKONZEPT_CHECKLIST = [
            r"(?:geltungsbereich|anwendungsbereich)",
            r"verantwortlich\w*\s+(?:fuer|für)\s+(?:das\s+)?l(?:oe|ö)schkonzept",
            r"(?:datenschutzbeauftragt\w*|dpo|dsb)\s+(?:verantwort|zustaendig|zuständig)",
            # P39: heading variants + Verantwortlichkeiten table
            r"(?:^|\n)\s*#+\s*\d*\.?\s*verantwortlichkeit",
            r"(?:^|\n)\s*#+\s*\d*\.?\s*geltungsbereich",
            r"verantwortlichkeiten\s*\|",
            r"\|\s*verantwortlich\s*\|",
        ],
        "severity": "HIGH",
        "hint": "DIN 66398 verlangt einen klaren Geltungsbereich (welche Systeme, Datenarten, Standorte) und die Benennung des Verantwortlichen fuer Erstellung + Wartung des Loeschkonzepts.",
@@ -98,6 +103,10 @@ LOESCHKONZEPT_CHECKLIST = [
            r"l(?:oe|ö)sch(?:prozess|vorgang|verfahren|workflow|routine)",
            r"(?:wie|wann)\s+(?:wird|werden)\s+(?:die\s+daten\s+)?gel(?:oe|ö)scht",
            r"automatisierte?\s+l(?:oe|ö)schung",
            # P39: more generic, e.g. "Verfahren fuer die Loeschung", "Loeschmethode"
            r"verfahren\s+(?:fuer|für|zur?)\s+(?:die\s+)?l(?:oe|ö)sch",
            r"l(?:oe|ö)sch(?:methode|frist|regel)",
            r"systematische?\s+(?:regeln?|verfahren)[\s\S]{0,80}l(?:oe|ö)sch",
        ],
        "severity": "HIGH",
        "hint": "Beschreiben wie Loeschung erfolgt: automatisch per Cron-Job, manuell durch Admin, Loeschungs-Workflow im CRM, Backup-Loeschung etc.",
@@ -154,6 +163,10 @@ LOESCHKONZEPT_CHECKLIST = [
            r"sperr\w+\s+(?:statt|anstelle)\s+l(?:oe|ö)sch",
            r"l(?:oe|ö)sch(?:beschr|sperr|ausnahme|hindernis)",
            r"(?:rechtsstreit|gerichtsverfahren|prozessrelevant)",
            # P39: statutory retention obligations as a legitimate deletion exception
            r"(?:gesetzliche|handelsrechtlich|steuerrechtlich)\w*\s+aufbewahrungs?(?:pflicht|frist)",
            r"aufbewahrungspflicht[\s\S]{0,80}(?:setzt|bleib|gilt)",
            r"(?:hgb|ao|abgabenordnung)\s*§?\s*\d",
        ],
        "severity": "MEDIUM",
        "hint": "Wenn Loeschung nicht moeglich ist (laufender Prozess, gesetzliche Aufbewahrung, Streitfall) muss stattdessen Sperrung/Einschraenkung (Art. 18 DSGVO) erfolgen. Sperrkonzept dokumentieren.",

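The retention-exception patterns in the last LOESCHKONZEPT hunk cover statutory-retention wording and statute citations, but the citation pattern expects the statute name before the section number. A sketch with a hypothetical `matched` helper and lower-cased input:

```python
import re

RETENTION = [
    r"(?:gesetzliche|handelsrechtlich|steuerrechtlich)\w*\s+aufbewahrungs?(?:pflicht|frist)",
    r"aufbewahrungspflicht[\s\S]{0,80}(?:setzt|bleib|gilt)",
    r"(?:hgb|ao|abgabenordnung)\s*§?\s*\d",
]


def matched(text: str) -> list[str]:
    """Return the retention patterns that fire on the lower-cased text."""
    lowered = text.lower()
    return [p for p in RETENTION if re.search(p, lowered)]
```

"HGB § 257" is caught by the citation pattern, while the usual German order "§ 257 HGB" is not; such a text only passes if it also uses statutory-retention wording.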
@@ -236,27 +236,47 @@ def _extract_cookiebot(d: dict) -> list[dict]:
# ── Usercentrics ────────────────────────────────────────────────────

def _extract_usercentrics(d: dict) -> list[dict]:
    """Usercentrics shape: legacy 'services' and modern 'consentTemplates'.

    P49: modern Usercentrics-Settings (e.g. Mercedes 2026) keep vendors
    in `consentTemplates[]` with name inside `_meta.name` and category
    in `categorySlug`. Legacy format used `services[]` / `dataProcessingServices[]`
    with name as direct field.
    """
    out: list[dict] = []
    services = (d.get("services") or d.get("dataProcessingServices")
                or (d.get("settings") or {}).get("services") or [])
    # P49: fall through to consentTemplates if legacy keys are empty.
    # Filter out hidden/deactivated entries (UC backend toggles).
    if not services:
        services = [t for t in d.get("consentTemplates") or []
                    if not t.get("isHidden") and not t.get("isDeactivated")]
    for s in services:
        name = (s.get("name") or s.get("dataProcessor")
                or (s.get("_meta") or {}).get("name") or "")
        name = name.strip()
        if not name:
            continue
        max_age = s.get("cookieMaxAgeSeconds")
        persistence = ""
        if isinstance(max_age, int) and max_age > 0:
            persistence = f"{max_age // 86400} Tage"
        # P49: modern format stores company / urls in _meta
        meta = s.get("_meta") or {}
        out.append({
            "name": name,
            "country": (s.get("processingCompanyCountry")
                        or s.get("country")
                        or meta.get("country") or "").strip(),
            "purpose": _clean(s.get("dataPurpose") or s.get("description")
                              or meta.get("description") or ""),
            "category": (s.get("categorySlug") or s.get("category")
                         or meta.get("categorySlug") or "").strip(),
            "opt_out_url": (s.get("optOutUrl")
                            or meta.get("optOutUrl") or "").strip(),
            "privacy_policy_url": (s.get("policyOfProcessorUrl")
                                   or s.get("urls", {}).get("privacyPolicy", "")
                                   or meta.get("policyOfProcessorUrl")
                                   or "").strip(),
            "persistence": persistence or _clean(s.get("retentionPeriodDescription")),
            "cookies": [],

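The P49 fallback in `_extract_usercentrics` consults `consentTemplates[]` only when all legacy keys are empty, skips hidden or deactivated templates, and reads the vendor name from `_meta.name`. A reduced sketch of that control flow: `extract_services` and the sample payloads are hypothetical, and the sketch maps only name and category, whereas the real function also fills country, purpose, URLs and persistence through the module's `_clean` helper.

```python
def extract_services(d: dict) -> list[dict]:
    """Legacy services first, else visible consentTemplates with _meta fallbacks."""
    services = (d.get("services") or d.get("dataProcessingServices")
                or (d.get("settings") or {}).get("services") or [])
    if not services:
        services = [t for t in d.get("consentTemplates") or []
                    if not t.get("isHidden") and not t.get("isDeactivated")]
    out: list[dict] = []
    for s in services:
        meta = s.get("_meta") or {}
        name = (s.get("name") or s.get("dataProcessor") or meta.get("name") or "").strip()
        if not name:
            continue  # blank names are dropped, as in the real extractor
        out.append({
            "name": name,
            "category": (s.get("categorySlug") or s.get("category")
                         or meta.get("categorySlug") or "").strip(),
        })
    return out


# Hypothetical settings payloads in the two shapes.
MODERN = {"consentTemplates": [
    {"_meta": {"name": "Google Analytics"}, "categorySlug": "marketing"},
    {"_meta": {"name": "Hidden Vendor"}, "isHidden": True},
    {"_meta": {"name": "  "}},
]}
LEGACY = {
    "services": [{"name": "Hotjar", "category": "statistics"}],
    "consentTemplates": [{"_meta": {"name": "Ignored"}}],
}
```

Because legacy keys take precedence, a payload that carries both shapes is read only from `services[]`.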
@@ -0,0 +1,234 @@
"""
P42: Pattern smoke test for doc_checks (no DB required).

Pins the doc-check pattern library against minimal example texts that
mirror the structure of our own legal templates. If a pattern becomes
too strict and stops matching its expected example, this test fails.

Run with: pytest compliance/tests/test_doc_check_patterns.py -v
"""
from __future__ import annotations

import pytest

from compliance.services.doc_checks.runner import check_document_completeness


def _l1_score(text: str, doc_type: str) -> tuple[int, int, list[str]]:
    """Run completeness check; return (passed, total, missing_labels)."""
    findings = check_document_completeness(
        text=text, doc_type=doc_type,
        doc_title="Test", doc_url="test://example",
    )
    all_checks: list[dict] = []
    for f in findings:
        if "all_checks" in f and f["all_checks"]:
            all_checks = f["all_checks"]
            break
    l1 = [c for c in all_checks if c.get("level", 1) == 1]
    missing = [c["label"] for c in l1 if not c.get("passed") and not c.get("skipped")]
    passed = sum(1 for c in l1 if c.get("passed") and not c.get("skipped"))
    return passed, len(l1), missing


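`_l1_score` takes `all_checks` from the first finding that carries a non-empty list, keeps level-1 checks (a missing `level` counts as 1), and counts skipped checks in the total but neither as passed nor as missing. The sketch below feeds that aggregation hypothetical findings; `l1_score` reimplements it without calling `check_document_completeness`, and the finding shape is inferred from the helper above.

```python
from __future__ import annotations


def l1_score(findings: list[dict]) -> tuple[int, int, list[str]]:
    """Same aggregation as _l1_score, applied to already computed findings."""
    all_checks: list[dict] = next(
        (f["all_checks"] for f in findings if f.get("all_checks")), []
    )
    l1 = [c for c in all_checks if c.get("level", 1) == 1]
    missing = [c["label"] for c in l1 if not c.get("passed") and not c.get("skipped")]
    passed = sum(1 for c in l1 if c.get("passed") and not c.get("skipped"))
    return passed, len(l1), missing


# Hypothetical findings shaped like the runner output the test expects.
FINDINGS = [
    {"type": "summary"},
    {"all_checks": [
        {"label": "Verantwortlicher", "passed": True},
        {"label": "Zwecke", "passed": False},
        {"label": "Datenschutzbeauftragter", "passed": False, "skipped": True},
        {"label": "Optionaler Hinweis", "level": 2, "passed": False},
    ]},
]
```

Here the skipped check still counts in the total of three, so a fully passing document with skipped checks never reaches `passed == total`.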
# Each fixture mirrors a published legal template at minimum structural depth.
|
||||
# The aim: every L1 mandatory field must be at least mentioned.
|
||||
|
||||
|
||||
DSE_TEMPLATE = """
|
||||
# Datenschutzerklaerung
|
||||
|
||||
## 1. Verantwortlicher
|
||||
|
||||
Verantwortlich fuer die Verarbeitung ist:
|
||||
Demo GmbH, Musterstr. 1, 12345 Berlin, Deutschland
|
||||
E-Mail: datenschutz@demo.de | Telefon: +49 30 123456
|
||||
|
||||
## 2. Datenschutzbeauftragter
|
||||
Max Mustermann, dsb@demo.de
|
||||
|
||||
## 3. Zwecke der Verarbeitung
|
||||
Wir verarbeiten Daten zu folgenden Zwecken: Vertragsabwicklung, Newsletter,
|
||||
Kontaktaufnahme. Rechtsgrundlage Art. 6(1)(b) und (a) DSGVO.
|
||||
|
||||
## 4. Rechtsgrundlage
|
||||
Art. 6(1)(b) DSGVO fuer Vertraege, Art. 6(1)(a) fuer Einwilligungen.
|
||||
|
||||
## 5. Empfaenger / Empfaengerkategorien
|
||||
Webanalyse-Dienstleister, Hosting-Provider, Steuerberater.
|
||||
|
||||
## 6. Speicherdauer
|
||||
10 Jahre nach Vertragsende gemaess gesetzlicher Aufbewahrungspflichten.
|
||||
|
||||
## 7. Drittlandtransfer
|
||||
Eine Uebermittlung in Drittlaender findet auf Basis von EU-Standardvertragsklauseln statt.
|
||||
|
||||
## 8. Betroffenenrechte
|
||||
Sie haben das Recht auf Auskunft (Art. 15), Berichtigung (Art. 16),
|
||||
Loeschung (Art. 17), Einschraenkung (Art. 18), Datenuebertragbarkeit (Art. 20),
|
||||
Widerspruch (Art. 21) und Beschwerde bei der Aufsichtsbehoerde (Art. 77).
|
||||
|
||||
## 9. Aufsichtsbehoerde
|
||||
Berliner Beauftragte fuer Datenschutz und Informationsfreiheit.
|
||||
|
||||
## 10. Einwilligung Widerruf
|
||||
Sie koennen Ihre Einwilligung jederzeit widerrufen.
|
||||
"""
|
||||
|
||||
|
||||
COOKIE_TEMPLATE = """
|
||||
# Cookie-Richtlinie
|
||||
|
||||
## 1. Verantwortlicher
|
||||
Demo GmbH, Musterstr. 1, 12345 Berlin. E-Mail: datenschutz@demo.de.
|
||||
|
||||
## 2. Was sind Cookies?
|
||||
Cookies sind kleine Textdateien.
|
||||
|
||||
## 3. Rechtsgrundlage
|
||||
§25 TDDDG / Art. 6(1)(a) DSGVO.
|
||||
|
||||
## 4. Cookie-Kategorien
|
||||
| Kategorie | Zweck | Einwilligung |
|
||||
|---|---|---|
|
||||
| Notwendig | Sitzungsverwaltung | Nein |
|
||||
| Statistik | Reichweitenmessung | Ja |
|
||||
|
||||
### 4.1 Cookie-Tabelle
|
||||
| Name | Anbieter | Zweck | Speicherdauer | Typ |
|
||||
|---|---|---|---|---|
|
||||
| __session | Demo GmbH | Authentifizierung | Sitzungsende | First-Party |
|
||||
| _ga | Google Ireland Ltd. | Webanalyse | 2 Jahre | Third-Party |
|
||||
|
||||
## 5. Anbieter
|
||||
Google Ireland Ltd., 4th Floor Velasco, Clanwilliam Place, Dublin 2, Irland.
|
||||
|
||||
## 6. Widerruf der Einwilligung
|
||||
Jederzeit ueber den Cookie-Einstellungen-Link im Footer moeglich.
|
||||
|
||||
## 7. Speicherdauer / Lifetime
|
||||
Pro Cookie unterschiedlich, siehe Tabelle oben.
|
||||
"""
|
||||
|
||||
|
||||
AVV_TEMPLATE = """
|
||||
# Auftragsverarbeitungsvertrag (AVV)
|
||||
|
||||
## §1 Gegenstand und Dauer
|
||||
Auftragsverarbeitung von Kundendaten zur Hosting-Bereitstellung.
|
||||
|
||||
## §2 Art und Zweck
|
||||
Speicherung, Backup, Verfuegbarkeitsmanagement.
|
||||
|
||||
## §3 Datenkategorien
|
||||
Stammdaten, Bewegungsdaten, Logfiles.
|
||||
|
||||
## §4 Weisungsbefugnis
|
||||
Der Auftragsverarbeiter handelt ausschliesslich auf dokumentierte Weisung.
|
||||
|
||||
## §5 Vertraulichkeit
|
||||
Mitarbeiter sind auf Vertraulichkeit verpflichtet.
|
||||
|
||||
## §6 Technische Massnahmen (Art. 32)
|
||||
Verschluesselung, Zugriffskontrolle, Logging.
|
||||
|
||||
## §7 Unterauftragnehmer
|
||||
Liste in Anlage 2.
|
||||
|
||||
## §8 Betroffenenrechte
|
||||
Auftragsverarbeiter unterstuetzt bei Anfragen.
|
||||
|
||||
## §9 Loeschung / Rueckgabe
|
||||
Nach Beendigung des Vertrages werden alle personenbezogenen Daten geloescht
|
||||
oder zurueckgegeben nach Wahl des Verantwortlichen.
|
||||
|
||||
## §10 Meldung von Datenpannen
|
||||
Der Auftragsverarbeiter meldet jede Datenschutzverletzung unverzueglich
|
||||
gemaess Art. 33(2) DSGVO innerhalb von 24 Stunden.
|
||||
|
||||
## §11 Audit-Recht
|
||||
Verantwortlicher darf Audits durchfuehren.
|
||||
"""


IMPRESSUM_TEMPLATE = """
# Impressum

## Anbieter
Demo GmbH
Musterstr. 1
12345 Berlin

## Vertreten durch
Geschaeftsfuehrerin: Erika Mustermann

## Kontakt
Telefon: +49 30 12345678
E-Mail: info@demo.de

## Handelsregister
Amtsgericht Berlin, HRB 123456

## Umsatzsteuer-ID
DE123456789 gemaess §27a UStG

## Verantwortlich nach §18 MStV
Erika Mustermann (Anschrift wie oben)

## Streitschlichtung
Online-Streitbeilegung: https://ec.europa.eu/consumers/odr/
"""


# ─── Tests ─────────────────────────────────────────────────────────────────

# Note: the full-template smoke tests were removed. A full audit against the
# DB is available via scripts/audit_template_completeness.py --strict and
# should be run pre-commit or in a DB-enabled CI job. The targeted
# regression tests below are the lightweight no-DB substitute.


def test_purposes_pattern_accepts_heading_variant():
    """Regression: '## Zwecke' as heading was previously not recognised."""
    text = "## 3. Zwecke\nWir verarbeiten Daten zu Vertragsabwicklung und Newsletter."
    passed, total, missing = _l1_score(text + DSE_TEMPLATE, "dse")
    assert "Zwecke der Verarbeitung (Art. 13(1)(c))" not in missing


def test_controller_pattern_accepts_heading_variant():
    """Regression: '## 1. Verantwortlicher' as heading was previously not recognised."""
    text = """# DSE
## 1. Verantwortlicher
Demo GmbH, Musterstr. 1, 12345 Berlin.
E-Mail: datenschutz@demo.de
DSB: dsb@demo.de
Zwecke der Verarbeitung: Vertragsabwicklung.
Rechtsgrundlage: Art. 6(1)(b) DSGVO.
Empfaenger: Hosting-Provider.
Speicherdauer: 10 Jahre.
Drittlandtransfer findet nicht statt.
Betroffenenrechte nach Art. 15-21 DSGVO.
Beschwerde bei Aufsichtsbehoerde nach Art. 77.
Sie koennen die Einwilligung jederzeit widerrufen.
"""
    passed, total, missing = _l1_score(text, "dse")
    assert "Verantwortlicher (Art. 13(1)(a))" not in missing


def test_avv_breach_accepts_datenpanne_synonym():
    """Regression: 'Datenpanne' as synonym for 'Datenschutzverletzung'."""
    text = AVV_TEMPLATE.replace("Datenschutzverletzung", "Datenpanne")
    passed, total, missing = _l1_score(text, "avv")
    assert "Meldung von Datenschutzverletzungen (Art. 33(2))" not in missing


def test_avv_deletion_accepts_reverse_word_order():
    """Regression: 'loescht ... nach Beendigung' (reverse) was previously not matched."""
    text = AVV_TEMPLATE.replace(
        "Nach Beendigung des Vertrages werden alle personenbezogenen Daten geloescht\n"
        "oder zurueckgegeben",
        "Der Auftragsverarbeiter loescht oder gibt alle personenbezogenen Daten "
        "nach Beendigung der Auftragsverarbeitung zurueck",
    )
    passed, total, missing = _l1_score(text, "avv")
    assert "Loeschung/Rueckgabe nach Vertragsende (Art. 28(3)(g))" not in missing
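

# Hypothetical sketch, for illustration only: the real _l1_score patterns are
# defined elsewhere and are NOT reproduced here. The regression tests above
# exercise three regex techniques, shown below on made-up patterns:
# (1) accept a Markdown heading as well as an inline label, (2) accept a
# synonym, (3) require two terms in either word order via lookaheads.
# _example_score is a hypothetical name; pytest does not collect it because
# it does not start with "test_".
def _example_score(text: str) -> tuple[int, int, list[str]]:
    """Hypothetical scorer returning (passed, total, missing), the shape the tests unpack."""
    import re  # local import keeps the sketch self-contained at the end of the module

    checks = {
        # (1) Heading variant: "## 3. Zwecke" as well as "Zwecke der Verarbeitung".
        "Zwecke": re.compile(
            r"^\s*#{1,6}\s*(?:\d+\.\s*)?Zwecke\b|Zwecke\s+der\s+Verarbeitung",
            re.IGNORECASE | re.MULTILINE,
        ),
        # (2) Synonym: "Datenpanne" is accepted alongside "Datenschutzverletzung".
        "Datenpanne/Datenschutzverletzung": re.compile(
            r"Datenschutzverletzung|Datenpanne", re.IGNORECASE
        ),
        # (3) Word order: each lookahead scans the whole text (DOTALL), so
        # "geloescht ... nach Beendigung" and "nach Beendigung ... loescht"
        # both match.
        "Loeschung nach Beendigung": re.compile(
            r"(?=.*\bBeendigung\b)(?=.*loesch)", re.IGNORECASE | re.DOTALL
        ),
    }
    missing = [label for label, pattern in checks.items() if pattern.search(text) is None]
    return len(checks) - len(missing), len(checks), missing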