feat: browser matrix stage 1.a + 2 more GT findings + plausibility-LLM hardening

Stage 1.a browser matrix (Task #15), multi-engine scaffolding:
- consent-tester/Dockerfile: firefox + webkit + Xvfb deps
- playwright install chromium firefox webkit
- services/browser_profiles.py: registry with DEFAULT_PROFILES (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) + EXTRA_PROFILES (Chrome-Channel, Edge, Brave)
- services/multi_browser_scanner.py: run_matrix() orchestrates N parallel scans + worst-of aggregation + 3 sub-scores (pre-consent 50%, reject respect 30%, banner design 20%) + hard-fail cap to <60% on a pre-consent/reject violation
- routes_matrix.py: POST /scan-matrix endpoint (separate module so that main.py stays under 500 LOC)

KNOWN: the Stage 1.a shim runs all profiles on the same Chromium; real engine diversity follows in Stage 1.b (consent_scanner.py param).

Coverage gap 3 (Task #17), 2 of the 3 remaining GT gaps closed:
- B9 impressum_multi_entity_check (IMPRESSUM-001): detects a missing VAT ID (USt-IdNr), commercial register entry (HR), or managing director (GF) per entity in multi-entity legal notices (Elli: USt-IdNr given only for Elli Mobility, missing for VW Group Charging)
- B10 transfer_mechanism_check (TRANSFER-001): for each non-EU vendor in cmp_vendors, checks the privacy policy (DSE) for DPF/SCCs/BCRs/consent within a ±400-char window. Finds vendors without a named mechanism.
- TH-RETENTION-002 (AI data-category differentiation) remains semantically deep; planned for the specialist agents in Task #18.

Plausibility-LLM empty-response hardening (Task #16):
- BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
- Single retry with a halved batch when the LLM returns empty content; qwen3:30b-a3b sometimes rejects ≥6-item prompts under format='json'. If the half batch is also empty: log + skip.
- The pipeline no longer spends 10 min in timeouts.

GT coverage jump: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓, 2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
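The scoring rule named for run_matrix() (three weighted sub-scores, a hard-fail cap, worst-of aggregation) can be illustrated with a small sketch. Only the 50/30/20 weights and the "<60%" cap come from the commit message. The `ProfileResult` type, its field names, the 0..100 scale, and reading "<60%" as "at most 59" are assumptions made for illustration; the actual code in services/multi_browser_scanner.py is not shown here and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

# Weights as stated in the commit message.
WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
# Assumption: "cap to <60%" is read as "at most 59".
HARD_FAIL_CAP = 59.0


@dataclass
class ProfileResult:
    """Hypothetical per-profile result; sub-scores are on a 0..100 scale."""
    profile: str
    pre_consent: float
    reject_respect: float
    banner_design: float
    pre_consent_violation: bool = False
    reject_violation: bool = False


def profile_score(r: ProfileResult) -> float:
    """Weighted sum of the three sub-scores, capped on a hard fail."""
    score = (WEIGHTS["pre_consent"] * r.pre_consent
             + WEIGHTS["reject_respect"] * r.reject_respect
             + WEIGHTS["banner_design"] * r.banner_design)
    if r.pre_consent_violation or r.reject_violation:
        score = min(score, HARD_FAIL_CAP)
    return score


def matrix_score(results: list[ProfileResult]) -> float:
    """Worst-of aggregation: the weakest profile decides the result."""
    if not results:
        raise ValueError("no profile results")
    return min(profile_score(r) for r in results)
```

Under these assumptions, a single profile with a pre-consent violation pulls the whole matrix down to the cap, even when every other profile scores perfectly.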
@@ -0,0 +1,98 @@
"""B10: third-country transfer mechanism consistency per vendor.

GDPR Art. 44 ff. requires ONE clear mechanism for third-country
transfers: adequacy decision / EU-US DPF / SCCs / BCRs / explicit
consent. If a vendor in cmp_vendors is identified as a third-country
processor, the privacy policy (DSE) text must clearly name a
mechanism per vendor (or per vendor category).

GT pattern Elli (TRANSFER-001):
  - Google/Meta → DPF named in DSE ✓
  - Salesforce → SCCs ✓
  - Webflow mentioned as US-based but no mechanism → MEDIUM

Heuristic:
  1. Pick the third-country vendors from cmp_vendors (any vendor whose
     country code is not in the EU/EEA/CH skip list).
  2. Search the DSE text for a mechanism mentioned per vendor.
  3. If a third-country vendor has no mechanism → MEDIUM.
"""

from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

_MECHANISM_KEYWORDS = (
    ("DPF / Data Privacy Framework",
     ["data privacy framework", "dpf-", "eu-us dpf",
      "angemessenheitsbeschluss"]),
    ("Standardvertragsklauseln (SCCs)",
     ["standardvertragsklauseln", "scc-", "scc ", "standard contractual",
      "art. 46 abs. 2 lit. c"]),
    ("Binding Corporate Rules",
     ["binding corporate rules", "bcr-", "verbindliche unternehmensregeln"]),
    ("Ausdrückliche Einwilligung",
     ["ausdrückliche einwilligung nach art. 49",
      "explicit consent under art. 49"]),
)


def _mechanism_for_vendor(vendor_name: str, dse_text: str) -> str | None:
    if not vendor_name or not dse_text:
        return None
    name_lc = vendor_name.lower()
    text_lc = dse_text.lower()
    # Find the first vendor mention in the DSE and scan a ±400-char
    # window around it for mechanism keywords.
    idx = text_lc.find(name_lc)
    if idx < 0:
        # Vendor not mentioned at all: reported like a missing mechanism.
        return None
    window = text_lc[max(0, idx - 400): idx + 400]
    for mech_label, kws in _MECHANISM_KEYWORDS:
        if any(k in window for k in kws):
            return mech_label
    return None


def check_transfer_mechanism(state: dict) -> list[dict]:
    cmp_vendors = state.get("cmp_vendors") or []
    doc_texts = state.get("doc_texts") or {}
    dse = doc_texts.get("dse") or ""
    if not cmp_vendors or not dse:
        return []
    findings: list[dict] = []
    for v in cmp_vendors:
        country = (v.get("country") or "").upper().strip()
        name = (v.get("name") or "").strip()
        if not name:
            continue
        # Skip EU/EEA members, plus CH (covered by an EU adequacy decision)
        if country in ("DE", "AT", "BE", "BG", "HR", "CY", "CZ", "DK",
                       "EE", "FI", "FR", "GR", "HU", "IE", "IT", "LV",
                       "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
                       "SI", "ES", "SE", "IS", "LI", "NO", "CH"):
            continue
        # Anything else, including an empty or unknown country, is treated
        # as a third country; the third_country flag is not consulted here.
        mech = _mechanism_for_vendor(name, dse)
        if mech is None:
            findings.append({
                "check_id": "TRANSFER-MECH-001",
                "vendor": name,
                "country": country or "UNKNOWN",
                "severity": "MEDIUM",
                "severity_reason": "missing",
                "title": (
                    f"Drittland-Transfer-Mechanismus für {name} "
                    f"({country or 'Drittland'}) fehlt in DSE"
                ),
                "norm": "DSGVO Art. 44 + Art. 46 / Art. 49",
                "action": (
                    f"Im DSE-Abschnitt zu {name} den Transfermechanismus "
                    "angeben (DPF / SCCs / BCRs / Einwilligung) und ggf. "
                    "Vertragsdokument referenzieren."
                ),
            })
    if findings:
        logger.info("B10 transfer-mechanism: %d findings", len(findings))
    return findings
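To make the window heuristic concrete, here is a condensed, self-contained stand-in for `_mechanism_for_vendor` applied to a synthetic policy text. The helper name `mechanism_near`, the shortened keyword table, and the sample text are made up for illustration and are not part of the commit. The sketch mirrors the Elli pattern from the docstring: one vendor with DPF named nearby, one vendor mentioned without any mechanism.

```python
from __future__ import annotations

# Shortened, illustrative keyword table (the real one is _MECHANISM_KEYWORDS).
KEYWORDS = (
    ("DPF", ["data privacy framework", "eu-us dpf"]),
    ("SCCs", ["standardvertragsklauseln", "standard contractual"]),
)


def mechanism_near(vendor: str, text: str, radius: int = 400) -> str | None:
    """Return the first mechanism label found near the first vendor mention."""
    text_lc = text.lower()
    idx = text_lc.find(vendor.lower())
    if idx < 0:
        return None  # vendor not mentioned at all
    window = text_lc[max(0, idx - radius): idx + radius]
    for label, kws in KEYWORDS:
        if any(k in window for k in kws):
            return label
    return None


# 500 filler characters keep the second vendor outside the first one's window.
filler = "x" * 500
dse = (
    "Google Analytics: transfer to the USA under the EU-US DPF. "
    + filler
    + " Webflow Inc. is based in the USA."
)

print(mechanism_near("Google", dse))      # mechanism named next to the vendor
print(mechanism_near("Webflow", dse))     # mentioned, but no mechanism nearby
print(mechanism_near("Salesforce", dse))  # not mentioned in the text
```

Two properties of the heuristic show up here: only the first mention of a vendor is inspected, and a vendor that never appears in the text yields the same `None` as a vendor that appears without a mechanism, so both cases end up as the same MEDIUM finding.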