fix(cookie-inventory): fuzzy prefix match + BMW GT file

The BMW mail showed 738 declared / 31 browser / **0 OK**: every
browser cookie ended up as UNDOC and every declared cookie as ORPH.
Cause: exact string matching fails for cookies with a runtime suffix.

_norm_for_match() + _matches() helpers:
  - Strip wildcards (`*`, `.*`, `<id>`, `{var}`) and lower-case the name
  - Keep leading underscores (`__cf_bm`, `_ga` are meaningful)
  - Prefix match in BOTH directions, min. 3 chars (no "_" garbage)
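
The helper rules above can be sketched roughly as follows. This is a
minimal illustration, not the code from this commit: the names
`norm_for_match` and `matches`, the exact wildcard regex, and the
`MIN_PREFIX_LEN` constant are assumptions derived from the bullet list.

```python
import re

# Assumed wildcard tokens, taken from the list above: `.*`, `*`, `<id>`, `{var}`.
_WILDCARD_RE = re.compile(r"\.\*|\*|<[^>]*>|\{[^}]*\}")

# Assumed minimum length; shorter names (e.g. "_") never match anything.
MIN_PREFIX_LEN = 3


def norm_for_match(name: str) -> str:
    """Strip wildcard placeholders and lower-case; leading underscores are kept."""
    return _WILDCARD_RE.sub("", name).strip().lower()


def matches(declared: str, observed: str) -> bool:
    """True if one normalized name is a prefix of the other (either direction)."""
    a, b = norm_for_match(declared), norm_for_match(observed)
    if len(a) < MIN_PREFIX_LEN or len(b) < MIN_PREFIX_LEN:
        return False
    return a.startswith(b) or b.startswith(a)
```

With these definitions `matches("_ga", "_ga_K8YL3M9T")` holds, while
`matches("_", "_ga")` is rejected by the length guard.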

build_cookie_inventory():
  - For each browser cookie, pick the longest prefix match among the
    declared cookies
  - browser-to-decl index + decl-match index, O(N×M) → O(N+M)
  - Matched browser keys are removed from all_keys, so nothing is
    counted twice (previously: ORPH + UNDOC in parallel)
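
A rough sketch of the longest-prefix selection and the single-bucket
bookkeeping described above. It is an illustration under assumptions,
not the function from this commit: `build_inventory_sketch`, its return
shape, and the dict-based index are hypothetical, and the reverse
direction uses a plain scan rather than a second index.

```python
import re

# Assumed wildcard tokens (`.*`, `*`, `<id>`, `{var}`) and minimum length of 3.
_WILDCARDS = re.compile(r"\.\*|\*|<[^>]*>|\{[^}]*\}")
MIN_LEN = 3


def _norm(name: str) -> str:
    return _WILDCARDS.sub("", name).strip().lower()


def build_inventory_sketch(declared, browser):
    """Return (ok, undoc, orph); ok maps each matched browser cookie to a declared entry."""
    # Index declared names by normalized form; too-short names never match.
    index = {}
    for d in declared:
        key = _norm(d)
        if len(key) >= MIN_LEN:
            index.setdefault(key, d)

    ok, undoc, used = {}, [], set()
    for b in browser:
        nb = _norm(b)
        hit = None
        if len(nb) >= MIN_LEN:
            # Declared name is a prefix of the browser name: probe longest prefix first.
            for end in range(len(nb), MIN_LEN - 1, -1):
                hit = index.get(nb[:end])
                if hit is not None:
                    break
            # Reverse direction: browser name is a prefix of a declared name.
            if hit is None:
                longer = [k for k in index if k.startswith(nb)]
                if longer:
                    hit = index[min(longer, key=len)]
        if hit is None:
            undoc.append(b)
        else:
            ok[b] = hit
            used.add(hit)

    # A declared entry is an orphan only if no browser cookie matched it.
    orph = [d for d in declared if d not in used]
    return ok, undoc, orph
```

Each name lands in exactly one bucket, which is the property that avoids
the earlier double count.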

Realistic BMW match test:
  declared=[_ga, _gid, __cf_bm, AMP_TOKEN, _fbp, intercom-session,
            _pk_id.*, OptanonConsent]
  browser= [_ga_K8YL3M9T, _gid_xyz, __cf_bm_actual_hash,
            AMP_TOKEN_runtime, _fbp_123, intercom-session-2026,
            _pk_id.5.7d8, OptanonConsent]
  → 8 OK (previously 0)

BMW GT file (zeroclaw/docs/ground-truth/bmw_de_2026-06-07.json):
  - OneTrust CMP + 14 expected vendors
  - Cookie count ranges (browser 80-250, declared 300-800)
  - 7 expected findings incl. the new COOKIE-INVENTORY-MATCH-001 as a
    benchmark against the fuzzy-match bug

Tests: 14/14 green (4 _norm_for_match + 5 _matches + 5
build_cookie_inventory incl. realistic_bmw_pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-06-07 21:29:21 +02:00
parent b16130369a
commit 0b29d1fada
3 changed files with 289 additions and 2 deletions
@@ -0,0 +1,107 @@
{
"site": "bmw.de",
"crawled_at": "2026-06-07",
"crawler": "BreakPilot-Compliance Audit-Run + web research",
"notes": [
"BMW Group DE site, group stack: BMW, MINI, BMW M, BMW i, Connected Drive, Financial Services, Performance.",
"Controller: Bayerische Motoren Werke Aktiengesellschaft (Munich).",
"CMP: OneTrust (the most common stack in the automotive-group segment).",
"The DSE (privacy policy) typically lists several hundred cookies (aggregated across all brands/regions).",
"Connected Drive AI assistant: check whether the chat UI shows an AI Act Art. 50 notice."
],
"expected_url_layout": {
"impressum": "/de/footer/footer-section/imprint.html",
"dse": "/de/footer/datenschutz-cookies/datenschutz-bmw-website.html",
"cookie": "/de/footer/datenschutz-cookies/cookie-richtlinie-de.html",
"agb_or_nutzungsbedingungen": "/de/footer/footer-section/terms-of-use.html",
"widerrufsbelehrung": "unknown; required for online-shop components (M Performance Parts Onlineshop)"
},
"expected_vendors_in_dse": [
{"name": "OneTrust", "country": "US", "category": "CMP"},
{"name": "Google Analytics", "country": "US", "category": "Analytics"},
{"name": "Google Tag Manager", "country": "US", "category": "Tag-Mgmt"},
{"name": "Google Ads / DoubleClick", "country": "US", "category": "Marketing"},
{"name": "Meta Pixel", "country": "US", "category": "Marketing"},
{"name": "Adobe Analytics", "country": "US", "category": "Analytics"},
{"name": "Adobe Target", "country": "US", "category": "Personalization"},
{"name": "Salesforce Marketing Cloud", "country": "US", "category": "CRM/Marketing"},
{"name": "Sitecore", "country": "US", "category": "CMS"},
{"name": "Cloudflare", "country": "US", "category": "CDN/Bot"},
{"name": "Microsoft Clarity", "country": "US", "category": "Session-Replay"},
{"name": "LinkedIn Insight Tag", "country": "US/IE", "category": "Marketing"},
{"name": "YouTube", "country": "US", "category": "Embed/Marketing"},
{"name": "BMW Connected Drive AI", "country": "DE", "category": "AI-Assistant (suspected)"}
],
"expected_cookie_count_ranges": {
"im_browser_nach_accept": "80-250 (BMW.de only, excluding subdomains)",
"deklariert_in_dse": "300-800 (the group privacy policy covers several brands)",
"match_quote_OK_in_browser": ">85%; standard cookies (_ga, __cf_bm, OptanonConsent) must match",
"third_country_cookies": "60-90% (US vendors dominate)"
},
"expected_findings": [
{
"id": "AI-ACT-TRANSPARENCY-001",
"severity": "HIGH",
"title": "AI Act Art. 50 pre-interaction disclosure for the Connected Drive AI cannot be verified without a live test",
"evidence": "BMW Connected Drive uses AI assistants. The privacy policy mentions the use of AI, but the pre-chat disclosure on the widget must be verified live.",
"expected_pass": "UNKNOWN-LIKELY-PARTIAL"
},
{
"id": "TH-RETENTION-001",
"severity": "MEDIUM",
"title": "Retention period per cookie incomplete: the group privacy policy lists many cookies without a storage duration",
"evidence": "In a cookie list of 300+ cookies, experience shows that 40-60% lack an explicit storage duration (Art. 13(2)(a) GDPR).",
"expected_pass": "PARTIAL"
},
{
"id": "TRANSFER-001",
"severity": "MEDIUM",
"title": "US transfer mechanism named inconsistently per vendor",
"evidence": "Google/Meta mostly rely on the DPF, Salesforce on SCCs, Cloudflare only implicitly. The level of detail varies per vendor (typical large-corporation pattern).",
"expected_pass": "PARTIAL"
},
{
"id": "IMPRESSUM-001",
"severity": "LOW",
"title": "Group imprint presumably complete: single legal entity (BMW AG)",
"evidence": "BMW AG is the main controller. Group setup: HRB München, VAT ID (USt-IdNr), management board (several persons); multi-entity bug trigger not expected.",
"expected_pass": "PASS"
},
{
"id": "URL-STRUCTURE-001",
"severity": "LOW",
"title": "Probable standard-slug drift (standard slugs such as /impressum return 404)",
"evidence": "BMW uses subpaths under /footer/. Requesting /impressum directly probably yields a 404 or a redirect.",
"expected_pass": false
},
{
"id": "COOKIE-INVENTORY-MATCH-001",
"severity": "HIGH",
"title": "Match rate between privacy-policy cookies and browser cookies must be >85%",
"evidence": "The engine must fold standard cookies such as _ga (declared) ↔ _ga_K8YL3M9T (browser) and __cf_bm ↔ __cf_bm_<hash> via prefix match. <85% = fuzzy-match bug.",
"expected_pass": "BENCHMARK"
},
{
"id": "COOKIE-CONSENT-UX-001",
"severity": "MEDIUM",
"title": "Mobile reachability for reopening consent via OneTrust",
"evidence": "The OneTrust footer link 'Cookie-Einstellungen' must have a tap target ≥ 44×44 px (Apple HIG / WCAG 2.5.5).",
"expected_pass": "UNKNOWN"
}
],
"expected_b17_walk_behaviour": {
"footer_links_min": 6,
"accordion_expansion_on_dse": "probably >5 (the BMW privacy policy uses accordions for cookie tables)",
"banner_tour_clicks": "10-30 (OneTrust has many tab/category toggles)"
},
"summary_for_breakpilot_audit_comparison": {
"high_severity_findings_count": 2,
"medium_severity_findings_count": 3,
"low_severity_findings_count": 2,
"must_detect_to_pass_benchmark": [
"AI-ACT-TRANSPARENCY-001",
"URL-STRUCTURE-001",
"COOKIE-INVENTORY-MATCH-001"
]
}
}
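
The summary block duplicates information that is already present in
expected_findings, so the two can drift apart. A minimal consistency
check might look like the sketch below. `check_summary` is a
hypothetical helper, not part of this commit; it only relies on the key
names visible in the file above and assumes the file has already been
parsed into a dict.

```python
from collections import Counter


def check_summary(gt: dict) -> list:
    """Return a list of inconsistencies between findings and summary; empty means consistent."""
    problems = []
    findings = gt.get("expected_findings", [])
    summary = gt.get("summary_for_breakpilot_audit_comparison", {})

    # Severity counts in the summary must equal the counts in the findings list.
    by_severity = Counter(f["severity"] for f in findings)
    for sev in ("HIGH", "MEDIUM", "LOW"):
        key = f"{sev.lower()}_severity_findings_count"
        actual = by_severity.get(sev, 0)
        if summary.get(key) != actual:
            problems.append(f"{key}: summary says {summary.get(key)}, findings have {actual}")

    # Every benchmark id must refer to an existing finding.
    known_ids = {f["id"] for f in findings}
    for fid in summary.get("must_detect_to_pass_benchmark", []):
        if fid not in known_ids:
            problems.append(f"unknown benchmark id: {fid}")
    return problems
```

For the file above, which lists 2 HIGH, 3 MEDIUM and 2 LOW findings and
three benchmark ids that all exist, such a check would report nothing.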