c908fcd5eb
Addresses the BMW example (740 cookies, Salesforce declared "essential"
with a 1-year lifetime, pseudo-purposes such as "Siehe dazugehörige
Datenverarbeitung", i.e. "see associated data processing").
User concept: "Regulation as Code".
Step 1 - cookie_library_lookup.py (3 layers):
1. Override = cookie_knowledge_db.py + extended (74 entries) for
Schrems II / CJEU / EU alternatives; BreakPilot's own legal IP.
2. Truth base = compliance.cookie_library (2287 entries from the Open
Cookie Database, CC0). actual_category is treated as ground truth.
3. Auto-learning = cookie_behavior_audits: cross-site consensus
once ≥3 sites report the same cookie.
Match priority: exact > prefix (with separator check) > wildcard.
Short library names ("c", "ID") require an exact match, which
prevents a false positive on "completely_unknown". A trailing
underscore in OCD entries ("guest_uuid_essential_") is interpreted
as an implicit wildcard.
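
The match priority described above can be sketched roughly as follows. This
is a minimal illustration, not the code in cookie_library_lookup.py: the
function names, the separator set and the length threshold of 4 are
assumptions chosen to agree with the cases named in this message.

```python
from typing import Optional

SEPARATORS = "_-."      # assumed set of separator characters
MIN_SPECIFIC_LEN = 4    # assumed threshold for "specific enough"


def strip_wildcards(name: str) -> str:
    """Lowercase and drop trailing wildcard markers ("*", ".", "_")."""
    return name.strip().lower().rstrip("*._")


def is_specific_enough(name: str) -> bool:
    """Short names qualify for prefix matching only if they contain a separator."""
    return len(name) >= MIN_SPECIFIC_LEN or any(ch in SEPARATORS for ch in name)


def match_kind(library_name: str, cookie_name: str) -> Optional[str]:
    """Return "exact", "prefix", "wildcard" or None."""
    lib = library_name.strip().lower()
    cookie = cookie_name.strip().lower()
    if lib == cookie:
        return "exact"
    base = strip_wildcards(library_name)
    if not base or not is_specific_enough(base):
        return None  # unspecific names such as "c" or "ID" are exact-only
    if not cookie.startswith(base):
        return None
    rest = cookie[len(base):]
    # A stripped trailing "*", "." or "_" marks the entry as a wildcard entry.
    has_wildcard = lib != base
    if rest and rest[0] not in SEPARATORS:
        return None  # the prefix must be followed by a separator
    return "wildcard" if has_wildcard else "prefix"
```

Returning the kind of match rather than a boolean lets a caller rank
candidates by exact > prefix > wildcard when several library entries fit.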
Step 2 - cookie_coherence_check.py (B19, 6 finding types):
- MARKETING_AS_ESSENTIAL (HIGH): KB says actual=marketing, site
declares essential/erforderlich → consent is bypassed
- LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + lifetime >90d
- PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
or fewer than 4 words (suppressed when the vendor purpose is substantial)
- MISSING_COUNTRY (LOW): vendor_country empty despite a KB hit
- UNKNOWN_VENDOR (LOW): not in KB → auto-learning candidate
- DUPLICATE_VENDOR (MED): same vendor in N categories = stack
splitting to smuggle marketing in under "essential"
Every finding carries a recommended_action (e.g. "Move cookie X out
of 'erforderlich' and into 'Marketing'").
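
A rough sketch of how the per-cookie rules above could be evaluated. The
CookieRow shape, the field names and the reading of "substantial" as "not
itself a pseudo-purpose" are assumptions for illustration, not the actual
cookie_coherence_check.py. DUPLICATE_VENDOR is left out because it needs a
grouping over all rows of a vendor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ESSENTIAL_LABELS = {"essential", "erforderlich"}
MAX_ESSENTIAL_LIFETIME_DAYS = 90
PLACEHOLDER_PURPOSES = {"siehe dazugehörige datenverarbeitung"}
MIN_PURPOSE_WORDS = 4


@dataclass
class CookieRow:
    """Hypothetical merged view of one declared cookie and its KB lookup."""
    name: str
    declared_category: str
    declared_purpose: str
    lifetime_days: Optional[int] = None
    kb_category: Optional[str] = None  # None means: no KB hit
    vendor_country: str = ""
    vendor_purpose: str = ""


def is_pseudo_purpose(text: str) -> bool:
    """True for the known placeholder phrase or for fewer than 4 words."""
    cleaned = text.strip().lower()
    return cleaned in PLACEHOLDER_PURPOSES or len(cleaned.split()) < MIN_PURPOSE_WORDS


def check_cookie(row: CookieRow) -> List[Tuple[str, str]]:
    """Return (finding_type, severity) pairs for a single cookie row."""
    findings: List[Tuple[str, str]] = []
    essential = row.declared_category.strip().lower() in ESSENTIAL_LABELS
    if essential and row.kb_category == "marketing":
        findings.append(("MARKETING_AS_ESSENTIAL", "HIGH"))
    if (essential and row.lifetime_days is not None
            and row.lifetime_days > MAX_ESSENTIAL_LIFETIME_DAYS):
        findings.append(("LIFETIME_TOO_LONG_FOR_ESSENTIAL", "MED"))
    # Assumption: a vendor-level purpose that is not itself a pseudo-purpose
    # counts as "substantial" and suppresses the finding.
    if is_pseudo_purpose(row.declared_purpose) and is_pseudo_purpose(row.vendor_purpose):
        findings.append(("PSEUDO_PURPOSE", "LOW"))
    if row.kb_category is None:
        findings.append(("UNKNOWN_VENDOR", "LOW"))
    elif not row.vendor_country.strip():
        findings.append(("MISSING_COUNTRY", "LOW"))
    return findings
```

A row modelled on the BMW case (declared "essential", KB category
"marketing", 365-day lifetime, placeholder purpose) would collect the HIGH,
MED and PSEUDO_PURPOSE findings in one pass.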
Step 3 - cookie_observation_logger.py:
After every audit, logs all (cookie, site, declared_purpose) tuples
to compliance.cookie_behavior_audits → basis for the cross-site
consensus in layer 3.
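
One way the logged observations could be turned into a consensus, as a
sketch only. What exactly the sites must agree on is not stated here; this
version assumes agreement on the normalised declared_purpose and counts
distinct sites, so repeated audits of one site do not inflate the count. The
function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

MIN_SITES = 3  # threshold from the commit message: at least 3 sites

Observation = Tuple[str, str, str]  # (cookie, site, declared_purpose)


def cross_site_consensus(observations: Iterable[Observation]) -> Dict[str, str]:
    """Map each cookie to the purpose that >= MIN_SITES distinct sites declare."""
    sites_by_key: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cookie, site, purpose in observations:
        sites_by_key[(cookie, purpose.strip().lower())].add(site)

    best: Dict[str, Tuple[str, int]] = {}
    for (cookie, purpose), sites in sites_by_key.items():
        if len(sites) < MIN_SITES:
            continue
        current = best.get(cookie)
        # Keep the purpose backed by the most distinct sites.
        if current is None or len(sites) > current[1]:
            best[cookie] = (purpose, len(sites))
    return {cookie: purpose for cookie, (purpose, _count) in best.items()}
```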
Step 4 - cookie_csv_exporter.py:
cookies-full-{check_id}.csv with 21 columns (Name, Vendor decl/KB,
Cat decl/KB, Lifetime decl/KB, Country, Opt-Out, 8x FIND_* flags,
recommended_action). UTF-8 BOM for Excel.
ZIP attachment: extends audit_walk_zip_builder with an extra_files=
parameter; phase_e calls it with cookies-full-...csv.
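
The BOM detail can be illustrated with the standard library alone. This is a
sketch with a shortened, hypothetical column list, not the 21-column layout
of cookie_csv_exporter.py; the "utf-8-sig" codec prepends the byte order
mark that Excel uses to detect UTF-8.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical subset of columns, for illustration only.
COLUMNS = ["name", "vendor_declared", "vendor_kb", "recommended_action"]


def render_csv(rows: Iterable[Dict[str, str]]) -> bytes:
    """Serialise rows to CSV bytes that start with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" writes the BOM (EF BB BF) before the encoded text.
    return buffer.getvalue().encode("utf-8-sig")
```

Producing bytes instead of writing to disk keeps the result easy to hand to
a ZIP builder as an extra file.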
Step 5 - mail_render_v2/_vendor_cards.py:
Instead of 740 cookie rows: aggregation per vendor with cookie count +
issue count + 1-2 example cookies + issue-type tags. Top 30
vendors in the mail, the rest only in the CSV. Sorted by issue score.
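
The per-vendor aggregation might look like the sketch below. The input
shape (dicts with "vendor", "name" and a "findings" list of
(type, severity) pairs) and the severity weights behind the issue score are
assumptions; the actual scoring in _vendor_cards.py is not described here.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Assumed weights for the issue score.
SEVERITY_WEIGHT = {"HIGH": 3, "MED": 2, "LOW": 1}


def build_vendor_cards(cookies: List[Dict[str, Any]],
                       top_n: int = 30,
                       max_examples: int = 2) -> List[Dict[str, Any]]:
    """Group cookies by vendor and return the top_n cards by issue score."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for cookie in cookies:
        grouped[cookie["vendor"]].append(cookie)

    cards = []
    for vendor, items in grouped.items():
        findings = [f for item in items for f in item.get("findings", [])]
        cards.append({
            "vendor": vendor,
            "cookie_count": len(items),
            "issue_count": len(findings),
            "issue_types": sorted({ftype for ftype, _sev in findings}),
            "examples": [item["name"] for item in items[:max_examples]],
            "score": sum(SEVERITY_WEIGHT.get(sev, 0) for _ftype, sev in findings),
        })
    # Highest score first; vendor name breaks ties for a stable order.
    cards.sort(key=lambda card: (-card["score"], card["vendor"]))
    return cards[:top_n]
```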
Step 6 - render_info_box_rechtsrahmen():
Generic header info box citing Art. 13 GDPR (DSGVO) + § 25 TDDDG +
Art. 5 + § 5 UWG + § 30/130 OWiG. Always shown, with no explicit
finding mapping (left to the user's own judgment).
Orchestrator + _compose: run_b19 + render_vendor_cards +
render_info_box_rechtsrahmen wired into the V2 layout.
Tests: 28/28 green (15 lookup + 13 coherence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
70 lines
2.2 KiB
Python
"""Tests for the 3-Layer Cookie-Lookup-Service."""

from compliance.services.cookie_library_lookup import (
    _is_specific_enough,
    _name_matches,
    _strip_wildcards,
)


class TestStripWildcards:
    def test_lowercase(self):
        assert _strip_wildcards("_GA") == "_ga"

    def test_strip_star(self):
        assert _strip_wildcards("_ga*") == "_ga"

    def test_strip_dotstar(self):
        assert _strip_wildcards("_pk_id.*") == "_pk_id"

    def test_strip_trailing_underscore(self):
        # OCD pattern: a trailing _ is an implicit wildcard
        assert _strip_wildcards("guest_uuid_essential_") == "guest_uuid_essential"

    def test_strip_trailing_dot(self):
        assert _strip_wildcards("_pk_id.") == "_pk_id"


class TestIsSpecificEnough:
    def test_long_name(self):
        assert _is_specific_enough("OptanonConsent")

    def test_short_with_separator(self):
        assert _is_specific_enough("_ga")

    def test_short_no_separator_rejected(self):
        assert not _is_specific_enough("c")
        assert not _is_specific_enough("ID")
        assert not _is_specific_enough("abc")


class TestNameMatches:
    def test_exact(self):
        assert _name_matches("OptanonConsent", "OptanonConsent")

    def test_prefix_with_separator(self):
        # _ga library + browser _ga_K8YL3M9T
        assert _name_matches("_ga", "_ga_K8YL3M9T")
        # __cf_bm library + browser __cf_bm_hash
        assert _name_matches("__cf_bm", "__cf_bm_hash")

    def test_short_unspecific_rejected(self):
        # Short, unspecific library entries must not match arbitrary queries
        assert not _name_matches("c", "completely_unknown")
        assert not _name_matches("ID", "IDcharger")

    def test_prefix_no_separator_rejected(self):
        # Even with a longer library name, a separator must follow the prefix.
        # "Compl" is a real prefix of the query, so only the separator check
        # can reject it ("Compa" would fail on the prefix alone).
        assert not _name_matches("Compl", "Completely_unknown")

    def test_wildcard_match(self):
        # _pk_id.* matches _pk_id.5.7d8
        assert _name_matches("_pk_id.*", "_pk_id.5.7d8")

    def test_trailing_underscore_match(self):
        # guest_uuid_essential_ matches guest_uuid_essential_xyz
        assert _name_matches("guest_uuid_essential_", "guest_uuid_essential_xyz")

    def test_unrelated(self):
        assert not _name_matches("_ga", "intercom-session")