feat: browser matrix stage 1.a + 2 more GT findings + plausibility-LLM hardening
Stage 1.a browser matrix (Task #15), multi-engine scaffolding:
- consent-tester/Dockerfile: firefox + webkit + Xvfb deps
- playwright install chromium firefox webkit
- services/browser_profiles.py: registry with DEFAULT_PROFILES (Chromium headed / Firefox headed / WebKit headed / Mobile Safari) + EXTRA_PROFILES (Chrome channel, Edge, Brave)
- services/multi_browser_scanner.py: run_matrix() orchestrates N parallel scans + worst-of aggregation + 3 sub-scores (pre-consent 50%, reject respect 30%, banner design 20%) + hard-fail cap to <60% on a pre-consent or reject violation
- routes_matrix.py: POST /scan-matrix endpoint (separate module so that main.py stays under 500 LOC)

KNOWN: the stage 1.a shim runs all profiles on the same Chromium; real engine diversity follows in stage 1.b (consent_scanner.py param).

Coverage gap 3 (Task #17): 2 of the 3 remaining GT gaps closed:
- B9 impressum_multi_entity_check (IMPRESSUM-001): detects a missing VAT ID (USt-IdNr), commercial register entry (HR), or managing director (GF) per entity in multi-entity imprints (Elli: USt-IdNr only for Elli Mobility, missing for VW Group Charging)
- B10 transfer_mechanism_check (TRANSFER-001): for each non-EU vendor in cmp_vendors, checks the privacy policy (DSE) for DPF/SCCs/BCRs/consent within a ±400-char window. Finds vendors without a named mechanism.
- TH-RETENTION-002 (AI data-category differentiation) remains semantically deep; planned for specialist agents, Task #18.

Plausibility-LLM empty-response hardening (Task #16):
- BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
- Single retry with a halved batch when the LLM returns empty content; qwen3:30b-a3b sometimes rejects prompts with ≥6 items under format='json'. If the half batch is also empty: log + skip.
- The pipeline no longer spends 10 min running into timeouts.

GT coverage jump: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓, 2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
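
The scoring rule named in the message (three weighted sub-scores, a hard-fail cap, worst-of across profiles) can be sketched roughly as below. This is not the code from services/multi_browser_scanner.py, which this diff does not show. The function names, the 0-100 scale, the severity ordering, and the reading of "cap to <60%" as a ceiling of 59 are all assumptions made for illustration.

```python
# Weights are taken from the commit message; everything else is hypothetical.
WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
HARD_FAIL_CEILING = 59.0  # assumed interpretation of "cap to <60%"
SEVERITY_ORDER = ["NONE", "LOW", "MEDIUM", "HIGH"]  # assumed ordering


def profile_score(sub_scores: dict, hard_fail: bool) -> float:
    """Weighted sum of the sub-scores (0-100 each), capped on a hard fail."""
    total = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
    return min(total, HARD_FAIL_CEILING) if hard_fail else total


def worst_severity(per_profile: dict) -> str:
    """Worst-of rule: the highest severity seen on any profile wins."""
    return max(per_profile.values(), key=SEVERITY_ORDER.index, default="NONE")
```

With these assumptions, a site that scores perfectly on banner design but leaks trackers before consent can never exceed the ceiling, and a single HIGH finding on any one browser marks the whole matrix as HIGH.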
@@ -0,0 +1,138 @@
"""Browser-matrix stage-1 profile registry.

Each profile is a deterministic recipe for a Playwright BrowserContext.
The orchestrator runs the scan once per profile and aggregates the
results with the worst-of-rule (a HIGH on any browser → HIGH overall).

Keep this module dependency-light so it can be imported in unit tests
without spawning Playwright. The Playwright glue lives in
`services/multi_browser_scanner.py`.

Profile schema:
    {
      "id":         str   canonical identifier shown in the audit report
      "label":      str   human-readable name
      "engine":     str   blink | gecko | webkit
      "channel":    str?  Playwright channel ('chrome' / 'msedge')
      "device":     str?  Playwright devices preset for mobile emulation
      "headless":   bool
      "viewport":   {"width": int, "height": int}  (ignored when `device` set)
      "locale":     str
      "timezone":   str
      "user_agent": str?  overridden UA when not derived from device
    }
"""

from __future__ import annotations

DEFAULT_PROFILES: list[dict] = [
    {
        "id": "chromium-headed-de",
        "label": "Chromium (Headed) · de-DE",
        "engine": "blink",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    {
        "id": "firefox-headed-de",
        "label": "Firefox (Headed, ETP-Standard) · de-DE",
        "engine": "gecko",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    {
        "id": "webkit-headed-de",
        "label": "WebKit (Headed) · de-DE",
        "engine": "webkit",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    {
        "id": "iphone-mobile-safari-de",
        "label": "Mobile Safari (iPhone 15) · de-DE",
        "engine": "webkit",
        "channel": None,
        "device": "iPhone 15",
        "headless": False,
        "viewport": None,
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
]


# Optional profiles enabled via env var BROWSER_PROFILES_EXTRA
EXTRA_PROFILES: dict[str, dict] = {
    "chrome-channel-desktop-de": {
        "id": "chrome-channel-desktop-de",
        "label": "Chrome Channel (Google Build) · de-DE",
        "engine": "blink",
        "channel": "chrome",
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    "edge-channel-desktop-de": {
        "id": "edge-channel-desktop-de",
        "label": "Edge Channel · de-DE",
        "engine": "blink",
        "channel": "msedge",
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    "brave-default-de": {
        "id": "brave-default-de",
        "label": "Brave Default-Shields · de-DE",
        "engine": "blink",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
        "executable_path": "/usr/bin/brave-browser",
    },
}


def resolve_profiles(requested: list[str] | None) -> list[dict]:
    """Map requested ids to profile dicts. Falls back to all defaults
    when `requested` is None or empty."""
    if not requested:
        return list(DEFAULT_PROFILES)
    by_id = {p["id"]: p for p in DEFAULT_PROFILES}
    by_id.update(EXTRA_PROFILES)
    out: list[dict] = []
    for r in requested:
        prof = by_id.get(r)
        if prof:
            out.append(prof)
    return out or list(DEFAULT_PROFILES)


def default_ids() -> list[str]:
    """Return the ids of the default profiles, in registry order."""
    return [p["id"] for p in DEFAULT_PROFILES]
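
As a rough illustration of how one of these profile dicts might be turned into Playwright arguments, the sketch below maps the schema fields to keyword arguments. The real glue lives in `services/multi_browser_scanner.py` and is not part of this diff, so the helper names `launch_kwargs` and `context_kwargs`, the engine-to-browser-type mapping, and the `devices` parameter (a stand-in for a device-preset lookup) are assumptions. The keyword names themselves (`headless`, `channel`, `executable_path`, `viewport`, `locale`, `timezone_id`, `user_agent`) are the ones Playwright for Python accepts on `BrowserType.launch()` and `Browser.new_context()`.

```python
from typing import Optional

# Assumed mapping from the profile's "engine" field to the Playwright
# browser-type attribute name (playwright.chromium / .firefox / .webkit).
ENGINE_TO_BROWSER_TYPE = {"blink": "chromium", "gecko": "firefox", "webkit": "webkit"}


def launch_kwargs(profile: dict) -> dict:
    """Keyword arguments for BrowserType.launch(), omitting unset fields."""
    kwargs = {"headless": profile["headless"]}
    for key in ("channel", "executable_path"):
        if profile.get(key):
            kwargs[key] = profile[key]
    return kwargs


def context_kwargs(profile: dict, devices: Optional[dict] = None) -> dict:
    """Keyword arguments for Browser.new_context()."""
    kwargs = {"locale": profile["locale"], "timezone_id": profile["timezone"]}
    if profile.get("device"):
        # A device preset takes precedence over the explicit viewport,
        # matching the "(ignored when `device` set)" note in the schema.
        descriptor = dict((devices or {})[profile["device"]])
        # The engine is chosen by the profile, so drop any browser-type hint.
        descriptor.pop("default_browser_type", None)
        kwargs.update(descriptor)
    elif profile.get("viewport"):
        kwargs["viewport"] = profile["viewport"]
    if profile.get("user_agent"):
        # An explicit UA override wins over the one from the device preset.
        kwargs["user_agent"] = profile["user_agent"]
    return kwargs
```

Under these assumptions a caller would pick the browser type with `getattr(playwright, ENGINE_TO_BROWSER_TYPE[profile["engine"]])`, launch it with `**launch_kwargs(profile)`, and open a context with `**context_kwargs(profile, playwright.devices)`. Keeping both helpers free of Playwright imports preserves the dependency-light goal stated in the module docstring.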