fix(cookie): prefix matcher no longer over-matches short generic bases

The declaration-vs-library view immediately exposed a false match:
'cct_chatSessionToken' (Genesys web chat) hit the library base 'cct'
(actual_category Marketing, purpose 'shopping cart'), producing a false
'necessary→Marketing' finding. Cause: a truncated 3-character base without
a leading '_'.

_is_distinctive_base: accept a truncated prefix base only if it has ≥4
characters OR has ≥3 characters and a leading '_' (canonical cookies such as
'_ga'). GTM/AdobeOrg/hash suffix stripping is preserved (tests green); generic
'cct'/'sid'/'gtm' no longer over-match.
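
The effect of the rule can be illustrated with a minimal sketch. It assumes a
simplified candidate generator: `is_distinctive_base` mirrors the predicate
from this commit, while `candidate_keys_sketch` is a hypothetical stand-in
that omits the wildcard stripping done by the real `_candidate_keys`.

```python
import re

# Separator set copied from the module touched by this commit.
_SEP_RE = re.compile(r"[_\-.:$%@\[]")


def is_distinctive_base(s: str) -> bool:
    # Same predicate as _is_distinctive_base: at least 4 characters,
    # or at least 3 characters with a leading underscore.
    return len(s) >= 4 or (len(s) >= 3 and s.startswith("_"))


def candidate_keys_sketch(name: str) -> list[str]:
    # Hypothetical simplification: the full name first, then ever shorter
    # prefixes cut at the last separator, as long as each is distinctive.
    keys = [name]
    cur = name
    while True:
        seps = list(_SEP_RE.finditer(cur))
        if not seps:
            break
        cur = cur[:seps[-1].start()].rstrip("_-.:$%@")
        if is_distinctive_base(cur) and cur not in keys:
            keys.append(cur)
        else:
            break
    return keys


print(candidate_keys_sketch("cct_chatSessionToken"))
print(candidate_keys_sketch("_ga_GTM-ABC123"))
```

In this sketch 'cct_chatSessionToken' yields only its full name, so the
library row 'cct' is never consulted, while '_ga_GTM-ABC123' still reduces
to '_ga'.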

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-11 21:26:47 +02:00
parent 403e3c66d2
commit 99901bba0a
2 changed files with 23 additions and 2 deletions
@@ -57,10 +57,22 @@ _HINWEIS_TYPES = {"third_country", "eu_alternative"}
 _SEP_RE = re.compile(r"[_\-.:$%@\[]")
+def _is_distinctive_base(s: str) -> bool:
+    """Is `s` distinctive enough to serve as a truncated prefix base for a match?
+    Short generic abbreviations without a leading '_' (e.g. 'cct', 'sid', 'gtm')
+    would otherwise over-match unrelated cookie families: 'cct_chatSessionToken'
+    (Genesys web chat) would hit the library row 'cct' (actual_category Marketing,
+    purpose 'shopping cart') and yield a false finding. Canonical tracking cookies,
+    by contrast, start with '_' ('_ga', '_gid', '_pk_id') and remain allowed from
+    3 characters; all others only from 4."""
+    return len(s) >= 4 or (len(s) >= 3 and s.startswith("_"))
 def _candidate_keys(name: str) -> list[str]:
     """Library match candidates: full (de-wildcarded) name plus prefixes cut at
     separators. Catches per-instance suffixes (GTM container, @AdobeOrg, hash IDs)
-    without over-matching short generic names (minimum length 3)."""
+    without over-matching short generic names (see _is_distinctive_base)."""
     from compliance.services.cookie_library_lookup import _strip_wildcards
     base = _strip_wildcards(name)
     keys: list[str] = []
@@ -72,7 +84,7 @@ def _candidate_keys(name: str) -> list[str]:
         if not seps:
             break
         cur = cur[:seps[-1].start()].rstrip("_-.:$%@")
-        if len(cur) >= 3 and cur not in keys:
+        if _is_distinctive_base(cur) and cur not in keys:
             keys.append(cur)
         else:
             break
@@ -15,6 +15,15 @@ def test_candidate_keys_strips_runtime_suffix():
     assert "_pk_id" in _candidate_keys("_pk_id.5.7d8f")
+def test_candidate_keys_rejects_short_generic_base():
+    # 'cct' (3 characters, no leading _) must NOT serve as a prefix base;
+    # otherwise every 'cct_*' cookie (Genesys web chat) matches the unrelated
+    # library row 'cct' (Marketing). Canonical bases with a leading _ stay allowed.
+    assert "cct" not in _candidate_keys("cct_chatSessionToken")
+    assert "sid" not in _candidate_keys("sid_tracker_x")
+    assert "_ga" in _candidate_keys("_ga_GTM-ABC123")
 def test_match_lib_prefix_and_exact():
     lib = {"_ga": {"actual_category": "statistics"},
            "phpsessid": {"actual_category": "essential"}}