fix(cookie): prefix matcher no longer over-matches short generic bases
CI / detect-changes (push) Successful in 15s
CI / guardrail-integrity (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 10s
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Successful in 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Successful in 59s
CI / iace-gt-coverage (push) Successful in 28s
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
The declaration-vs-library view immediately exposed a false match: 'cct_chatSessionToken' (Genesys web chat) hit the library base 'cct' (actual_category Marketing, purpose 'shopping cart') → a wrong 'necessary→Marketing' finding. Cause: a truncated 3-character base without a leading '_'.

_is_distinctive_base: accept a truncated prefix base only if it has ≥4 characters, OR ≥3 characters with a leading '_' (canonical cookies such as '_ga'). GTM/AdobeOrg/hash suffix stripping is preserved (tests green); generic 'cct'/'sid'/'gtm' no longer over-match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
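To make the rule change concrete, here is a minimal sketch comparing the previous acceptance check (length ≥ 3) with the new one on the bases mentioned above. The names `old_rule`, `new_rule` and `samples` are hypothetical and exist only for this illustration; `new_rule` simply mirrors the expression used in `_is_distinctive_base`.

```python
def old_rule(base: str) -> bool:
    # Previous behaviour per the diff: any truncated base of 3+ characters.
    return len(base) >= 3


def new_rule(base: str) -> bool:
    # Same expression as _is_distinctive_base in this commit.
    return len(base) >= 4 or (len(base) >= 3 and base.startswith("_"))


# Illustrative sample bases (hypothetical list, taken from the commit message).
samples = ["cct", "sid", "gtm", "_ga", "_gid", "_pk_id", "ab"]

# Bases the old rule accepted but the new rule rejects.
newly_rejected = [b for b in samples if old_rule(b) and not new_rule(b)]
# Bases the new rule still accepts.
still_accepted = [b for b in samples if new_rule(b)]

if __name__ == "__main__":
    print("newly rejected:", newly_rejected)
    print("still accepted:", still_accepted)
```

Under these assumptions only the three short generic bases ('cct', 'sid', 'gtm') change status; the underscore-prefixed canonical bases remain accepted.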
@@ -57,10 +57,22 @@ _HINWEIS_TYPES = {"third_country", "eu_alternative"}
 _SEP_RE = re.compile(r"[_\-.:$%@\[]")


+def _is_distinctive_base(s: str) -> bool:
+    """Is `s`, as a truncated prefix base, distinctive enough for a match?
+
+    Short generic abbreviations without a leading '_' (e.g. 'cct', 'sid', 'gtm')
+    would otherwise over-match unrelated cookie families: 'cct_chatSessionToken'
+    (Genesys web chat) would hit the library row 'cct' (actual_category Marketing,
+    purpose 'shopping cart') → false finding. Canonical tracking cookies, by
+    contrast, start with '_' ('_ga', '_gid', '_pk_id') and remain allowed from
+    3 characters; all others only from 4."""
+    return len(s) >= 4 or (len(s) >= 3 and s.startswith("_"))
+
+
 def _candidate_keys(name: str) -> list[str]:
     """Library match candidates: full (wildcard-stripped) name + prefixes at
     separators. Catches per-instance suffixes (GTM container, @AdobeOrg, hash IDs)
-    without over-matching short generic names (minimum length 3)."""
+    without over-matching short generic names (see _is_distinctive_base)."""
     from compliance.services.cookie_library_lookup import _strip_wildcards
     base = _strip_wildcards(name)
     keys: list[str] = []
@@ -72,7 +84,7 @@ def _candidate_keys(name: str) -> list[str]:
         if not seps:
             break
         cur = cur[:seps[-1].start()].rstrip("_-.:$%@")
-        if len(cur) >= 3 and cur not in keys:
+        if _is_distinctive_base(cur) and cur not in keys:
             keys.append(cur)
         else:
             break
@@ -15,6 +15,15 @@ def test_candidate_keys_strips_runtime_suffix():
     assert "_pk_id" in _candidate_keys("_pk_id.5.7d8f")


+def test_candidate_keys_rejects_short_generic_base():
+    # 'cct' (3 characters, no leading _) must NOT serve as a prefix base;
+    # otherwise every 'cct_*' (Genesys web chat) matches the unrelated library
+    # row 'cct' (Marketing). Canonical bases with a leading _ remain allowed.
+    assert "cct" not in _candidate_keys("cct_chatSessionToken")
+    assert "sid" not in _candidate_keys("sid_tracker_x")
+    assert "_ga" in _candidate_keys("_ga_GTM-ABC123")
+
+
 def test_match_lib_prefix_and_exact():
     lib = {"_ga": {"actual_category": "statistics"},
            "phpsessid": {"actual_category": "essential"}}
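Because the diff shows `_candidate_keys` only in fragments, the following self-contained sketch may help to follow the prefix-shortening loop end to end. It is not the project's implementation: `_strip_wildcards_stub` is a hypothetical stand-in for the real `_strip_wildcards` (whose behaviour is not shown in this commit), and the lines that seed `keys` and compute `seps` are assumptions filling the part elided between the two hunks.

```python
import re

# Separator set copied from the diff.
_SEP_RE = re.compile(r"[_\-.:$%@\[]")


def _strip_wildcards_stub(name: str) -> str:
    # ASSUMPTION: hypothetical stand-in for
    # compliance.services.cookie_library_lookup._strip_wildcards.
    return name.rstrip("*")


def _is_distinctive_base(s: str) -> bool:
    # Same expression as in the commit.
    return len(s) >= 4 or (len(s) >= 3 and s.startswith("_"))


def candidate_keys_sketch(name: str) -> list[str]:
    base = _strip_wildcards_stub(name)
    keys: list[str] = []
    # ASSUMPTION: the elided lines seed the list with the full name
    # and start shortening from it.
    if base:
        keys.append(base)
    cur = base
    while True:
        # ASSUMPTION: separators are located with _SEP_RE on each pass.
        seps = list(_SEP_RE.finditer(cur))
        if not seps:
            break
        # Cut at the last separator and drop trailing separator characters.
        cur = cur[:seps[-1].start()].rstrip("_-.:$%@")
        if _is_distinctive_base(cur) and cur not in keys:
            keys.append(cur)
        else:
            break
    return keys


if __name__ == "__main__":
    for cookie in ("_ga_GTM-ABC123", "cct_chatSessionToken", "sid_tracker_x"):
        print(cookie, "->", candidate_keys_sketch(cookie))
```

Under these assumptions, `_ga_GTM-ABC123` yields `_ga_GTM` and `_ga` as additional candidates, `sid_tracker_x` stops at `sid_tracker`, and `cct_chatSessionToken` yields only the full name, so a library row keyed `cct` can no longer be reached through prefix shortening.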