fix: 4 bugs from IHK scan — false positives + missing etracker

1. GA regex: G-\w{5} matched CSS classes (g-7031048). Now requires
   G-[A-Z0-9]{8,12} (uppercase letters or digits after G-, 8-12 chars =
   real GA4 ID). Also tightened gtag to gtag\( and UA-\d to UA-\d{4,}.
2. External page scanning: links found inside the DSE are now followed
   on the SAME DOMAIN only. Previously the scanner followed links to
   etracker.com, google.de/policies etc. and reported services detected
   on THOSE sites as IHK services.
3. Added etracker to service registry (DE, ePrivacy-certified)
4. CSS/JS/image files excluded from page scanning
5. Navigation-pattern links for deeper DSE sub-pages
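
Items 2 and 4 boil down to a link filter applied before a page is fetched. The sketch below is one possible shape for it, not the code in this commit: `should_scan`, `ASSET_SUFFIXES`, and the example URLs are hypothetical names chosen for illustration, and an exact hostname comparison is assumed as the meaning of "same domain".

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Hypothetical list of static-asset extensions that should never be scanned as pages.
ASSET_SUFFIXES = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico"}


def should_scan(link: str, base_url: str) -> bool:
    """Return True if a link found on base_url is worth scanning as a page."""
    base = urlparse(base_url)
    # Resolve relative links against the page they were found on.
    target = urlparse(urljoin(base_url, link))

    # Skip mailto:, tel:, javascript: and similar non-page schemes.
    if target.scheme not in ("http", "https"):
        return False

    # Item 2: stay on the scanned site (assumption: exact hostname match).
    if target.hostname != base.hostname:
        return False

    # Item 4: skip CSS/JS/image files.
    suffix = PurePosixPath(target.path).suffix.lower()
    return suffix not in ASSET_SUFFIXES


base = "https://www.example.org/datenschutz"
print(should_scan("/datenschutz/cookies", base))
print(should_scan("https://www.etracker.com/privacy", base))
print(should_scan("/assets/site.css", base))
```

With an exact hostname match, `example.org` and `www.example.org` count as different sites; whether that is desirable depends on how the scanned site redirects.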

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-29 19:08:07 +02:00
parent 891fc5bea0
commit 5eeef3a9c3
3 changed files with 15 additions and 6 deletions
+1 -1
@@ -6,7 +6,7 @@ import re
 from dataclasses import dataclass
 SERVICE_PATTERNS: dict[str, dict] = {
-    r"google.?analytics|gtag|UA-\d|G-\w{5}": {
+    r"google.?analytics|gtag\(|UA-\d{4,}|G-[A-Z0-9]{8,12}": {
         "name": "Google Analytics", "requires_consent": True,
         "legal_ref": "§25 TDDDG, Art. 44-49 DSGVO",
     },
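
A quick way to see the effect of the pattern change is to run the old and new patterns against the false positive from the commit message. This is a sketch under assumptions: the patterns are taken from the hunk above, but `re.IGNORECASE` is assumed (the lowercase `g-7031048` could not have matched otherwise), and `G-ABCDE12345` is a made-up ID, not a real property.

```python
import re

# Patterns copied from the diff; the IGNORECASE flag is an assumption about
# how the scanner compiles them.
OLD_GA = re.compile(r"google.?analytics|gtag|UA-\d|G-\w{5}", re.IGNORECASE)
NEW_GA = re.compile(r"google.?analytics|gtag\(|UA-\d{4,}|G-[A-Z0-9]{8,12}", re.IGNORECASE)


def hits(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


css_class = '<div class="g-7031048">'  # false positive named in the commit message
ga4_id = "G-ABCDE12345"                # hypothetical ID, 10 characters after "G-"

for label, text in [("css class", css_class), ("GA4 id", ga4_id)]:
    print(label, "old:", hits(OLD_GA, text), "new:", hits(NEW_GA, text))
```

Under that flag it is the 8-character minimum, not the uppercase character class, that rejects the 7-character CSS suffix.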