feat: Browser matrix stage 1.a + 2 more GT findings + plausibility-LLM hardening

Stage 1.a browser matrix (Task #15), multi-engine scaffolding:
  - consent-tester/Dockerfile: firefox + webkit + Xvfb deps
  - playwright install chromium firefox webkit
  - services/browser_profiles.py: registry with DEFAULT_PROFILES
    (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) +
    EXTRA_PROFILES (Chrome-Channel, Edge, Brave)
  - services/multi_browser_scanner.py: run_matrix() orchestrates N
    parallel scans + worst-of aggregation + 3 sub-scores
    (pre-consent 50%, reject-respect 30%, banner design 20%) +
    hard-fail cap below 60% on a pre-consent or reject violation
  - routes_matrix.py: POST /scan-matrix endpoint (separate module
    so that main.py stays under 500 LOC)
  KNOWN: the stage 1.a shim runs all profiles on the same Chromium;
    real engine diversity follows in stage 1.b (consent_scanner.py param)
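
The sub-score weighting and hard-fail cap listed above can be illustrated with a small worked example. This is a minimal sketch, not the module itself: the weights and the cap value of 55 come from this commit, while the function name `robustness_score` and the sample inputs are made up for illustration.

```python
# Weights and cap as stated in this commit; everything else is illustrative.
WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
HARD_FAIL_CAP = 55  # keeps a hard-failing profile below 60%


def robustness_score(dimensions: dict) -> int:
    """Weighted sum of the three sub-scores (each 0.0..1.0) as a percentage."""
    base = sum(dimensions[name] * weight for name, weight in WEIGHTS.items())
    pct = int(round(base * 100))
    # A weak pre-consent or reject-respect sub-score caps the overall result.
    if dimensions["pre_consent"] < 0.5 or dimensions["reject_respect"] < 0.5:
        pct = min(pct, HARD_FAIL_CAP)
    return pct


clean = robustness_score(
    {"pre_consent": 1.0, "reject_respect": 1.0, "banner_design": 1.0})
capped = robustness_score(
    {"pre_consent": 0.4, "reject_respect": 1.0, "banner_design": 1.0})
print(clean, capped)
```

Without the cap the second profile would reach 70%; the weak pre-consent sub-score pulls it down to 55%.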

Coverage gap 3 (Task #17): 2 of the 3 remaining GT gaps closed:
  - B9 impressum_multi_entity_check (IMPRESSUM-001): detects a missing
    VAT ID (USt-IdNr), commercial-register entry (HR), or managing
    director (GF) per entity in multi-entity legal notices (Impressum)
    (Elli: VAT ID only for Elli Mobility, missing for VW Group Charging)
  - B10 transfer_mechanism_check (TRANSFER-001): for each non-EU vendor
    in cmp_vendors, checks the privacy policy (DSE) for
    DPF/SCCs/BCRs/consent within a ±400-char window. Finds vendors
    without a named mechanism.
  - TH-RETENTION-002 (AI data-category differentiation) remains
    semantically deep; earmarked for specialist agents, Task #18.
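
The ±400-char window check described for B10 could look roughly like the sketch below. It is an assumption-laden illustration, not the shipped check: the function name `vendors_without_mechanism`, the keyword patterns, the vendor names, and the choice to flag vendors that are never mentioned at all are all hypothetical.

```python
import re

# Keyword patterns are illustrative assumptions, not the check's real list.
MECHANISM_PATTERNS = [
    re.compile(r"data privacy framework|\bDPF\b", re.IGNORECASE),
    re.compile(r"standard contractual clauses|standardvertragsklauseln|\bSCCs?\b",
               re.IGNORECASE),
    re.compile(r"binding corporate rules|\bBCRs?\b", re.IGNORECASE),
    re.compile(r"einwilligung|\bconsent\b", re.IGNORECASE),
]
WINDOW = 400  # characters inspected on each side of a vendor mention


def vendors_without_mechanism(policy_text: str, non_eu_vendors: list) -> list:
    """Return vendors for which no transfer mechanism is named near a mention."""
    missing = []
    for vendor in non_eu_vendors:
        found = False
        for match in re.finditer(re.escape(vendor), policy_text, re.IGNORECASE):
            start = max(0, match.start() - WINDOW)
            window = policy_text[start:match.end() + WINDOW]
            if any(p.search(window) for p in MECHANISM_PATTERNS):
                found = True
                break
        if not found:
            missing.append(vendor)  # also covers vendors never mentioned
    return missing


# Made-up policy text with two fictional vendors, far apart from each other.
policy = (
    "Wir nutzen Acme Analytics (USA). Die Übermittlung erfolgt auf "
    "Grundlage von Standardvertragsklauseln. " + "." * 1000 +
    " Zusätzlich setzen wir Globex Ads ein."
)
flagged = vendors_without_mechanism(policy, ["Acme Analytics", "Globex Ads"])
print(flagged)
```

Only the second vendor is flagged here, because the only named mechanism sits more than 400 characters away from its mention.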

Plausibility-LLM empty-response hardening (Task #16):
  - BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
  - Single retry with a halved batch when the LLM returns empty
    content: qwen3:30b-a3b sometimes rejects prompts with ≥6 items
    under format='json'. If the half batch is also empty: log + skip.
  - The pipeline no longer spends 10 min running into timeouts.
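
The single-retry behaviour can be sketched as follows. This is a hedged illustration only: `check_batch`, `call_llm`, and `flaky_llm` are hypothetical names, and the sketch assumes that both halves of the batch are each retried exactly once.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def check_batch(items: Sequence[str],
                call_llm: Callable[[Sequence[str]], list]) -> list:
    """Send one batch; on an empty response retry once with halved batches."""
    results = call_llm(items)
    if results:
        return results
    if len(items) < 2:
        logger.warning("empty LLM response for %d item(s), skipping", len(items))
        return []
    mid = len(items) // 2
    collected: list = []
    for half in (items[:mid], items[mid:]):
        half_results = call_llm(half)
        if half_results:
            collected.extend(half_results)
        else:
            # Second empty response: log and skip, never retry again.
            logger.warning("half batch of %d still empty, skipping", len(half))
    return collected


def flaky_llm(batch: Sequence[str]) -> list:
    # Stand-in for the real model call: refuses batches above 2 items.
    return [] if len(batch) > 2 else [f"ok:{item}" for item in batch]


recovered = check_batch(["a", "b", "c", "d"], flaky_llm)
print(recovered)
```

Bounding the retry depth at one keeps the worst case at three model calls per batch, which fits the goal of not stalling the pipeline in timeouts.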

GT coverage jump: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓,
2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-06-06 21:42:27 +02:00
parent d0e3621192
commit e1dadc8027
10 changed files with 687 additions and 4 deletions
+9 -1
@@ -8,6 +8,13 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 \
libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2 \
curl \
# Browser-matrix stage 1: Firefox + WebKit deps + Xvfb (headed runs)
xvfb \
libdbus-glib-1-2 libxt6 \
libwoff1 libvpx7 libevent-2.1-7 libopus0 libgstreamer-plugins-base1.0-0 \
libgstreamer-gl1.0-0 libgstreamer1.0-0 libwebpdemux2 libharfbuzz-icu0 \
libenchant-2-2 libsecret-1-0 libhyphen0 libmanette-0.2-0 libflite1 \
libgles2 libx264-164 \
&& rm -rf /var/lib/apt/lists/*
# Create user BEFORE installing Playwright (so browsers are in user's cache)
@@ -17,8 +24,9 @@ COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install Playwright browsers AS appuser (so they land in /home/appuser/.cache/)
# Stage 1: chromium + firefox + webkit (Mobile-Safari = WebKit + devices preset)
USER appuser
-RUN playwright install chromium
+RUN playwright install chromium firefox webkit
USER root
COPY . .
+4
@@ -60,6 +60,10 @@ class ScanResponse(BaseModel):
    banner_screenshot_b64: str = ""  # P85: base64 PNG of the banner (initial view)
from routes_matrix import router as matrix_router
app.include_router(matrix_router)
@app.get("/health")
async def health():
    return {"status": "healthy", "service": "consent-tester"}
+61
@@ -0,0 +1,61 @@
"""POST /scan-matrix — browser-matrix stage-1 endpoint.
Runs the existing consent_scanner once per browser profile and
returns the aggregated robustness-score per browser plus a
worst-of/best-of summary. Kept in its own module so main.py stays
under the 500-LOC cap.

KNOWN LIMITATION (stage 1.a):
    The underlying `run_consent_test` does not yet accept a
    `browser_profile` kwarg; all profiles currently execute on the
    same Chromium instance. Engine diversity (real Firefox/WebKit
    contexts) ships in stage 1.b once consent_scanner is split.
"""
from __future__ import annotations
import logging
from datetime import datetime, timezone
from fastapi import APIRouter
from pydantic import BaseModel
from services.consent_scanner import run_consent_test
from services.multi_browser_scanner import run_matrix
logger = logging.getLogger(__name__)
router = APIRouter()
class MatrixScanRequest(BaseModel):
    url: str
    timeout_per_phase: int = 10
    categories: list[str] = []
    # Resolved against browser_profiles.resolve_profiles. None or
    # empty list → default 4 profiles (chromium/firefox/webkit/iphone).
    browser_profiles: list[str] | None = None


async def _scanner_shim(url: str, browser_profile: dict | None = None,
                        timeout_per_phase: int = 10,
                        categories: list[str] | None = None):
    """Shim that ignores `browser_profile` until consent_scanner accepts it."""
    return await run_consent_test(url, timeout_per_phase,
                                  categories or [])


@router.post("/scan-matrix")
async def scan_matrix(req: MatrixScanRequest):
    """Run consent-scan across the resolved browser-profile matrix."""
    logger.info("Matrix scan for %s profiles=%s", req.url,
                req.browser_profiles or "default")
    matrix = await run_matrix(
        _scanner_shim,
        req.url,
        requested_profiles=req.browser_profiles,
        timeout_per_phase=req.timeout_per_phase,
        categories=req.categories,
    )
    matrix["url"] = req.url
    matrix["scanned_at"] = datetime.now(timezone.utc).isoformat()
    return matrix
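
A client call against the new endpoint might be prepared as in the sketch below. The field names follow `MatrixScanRequest` above; the base URL, the helper name `build_scan_matrix_request`, and the target site are assumptions for illustration, and the request is only built, not sent.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical base URL; adjust to wherever consent-tester is deployed.
BASE_URL = "http://localhost:8000"


def build_scan_matrix_request(url: str, profiles: list[str] | None = None,
                              timeout_per_phase: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST /scan-matrix request."""
    payload = {
        "url": url,
        "timeout_per_phase": timeout_per_phase,
        "categories": [],
        "browser_profiles": profiles,  # None selects the default profiles
    }
    return urllib.request.Request(
        f"{BASE_URL}/scan-matrix",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scan_matrix_request(
    "https://example.com", ["chromium-headed-de", "firefox-headed-de"])
# With a running service the request could be sent like this:
#   with urllib.request.urlopen(request) as resp:
#       matrix = json.load(resp)
```

The response would then carry the `browser_matrix` and `aggregate` blocks plus the `url` and `scanned_at` fields added by the route.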
+138
@@ -0,0 +1,138 @@
"""Browser-matrix stage-1 profile registry.

Each profile is a deterministic recipe for a Playwright BrowserContext.
The orchestrator runs the scan once per profile and aggregates the
results with the worst-of-rule (a HIGH on any browser → HIGH overall).

Keep this module dependency-light so it can be imported in unit tests
without spawning Playwright. The Playwright glue lives in
`services/multi_browser_scanner.py`.

Profile schema:
    {
      "id":         str    canonical identifier shown in the audit report
      "label":      str    human-readable name
      "engine":     str    blink | gecko | webkit
      "channel":    str?   Playwright channel ('chrome' / 'msedge')
      "device":     str?   Playwright devices preset for mobile emulation
      "headless":   bool
      "viewport":   {"width": int, "height": int}  (ignored when `device` set)
      "locale":     str
      "timezone":   str
      "user_agent": str?   overridden UA when not derived from device
    }
"""
from __future__ import annotations
DEFAULT_PROFILES: list[dict] = [
    {
        "id": "chromium-headed-de",
        "label": "Chromium (Headed) · de-DE",
        "engine": "blink",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    {
        "id": "firefox-headed-de",
        "label": "Firefox (Headed, ETP-Standard) · de-DE",
        "engine": "gecko",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    {
        "id": "webkit-headed-de",
        "label": "WebKit (Headed) · de-DE",
        "engine": "webkit",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    {
        "id": "iphone-mobile-safari-de",
        "label": "Mobile Safari (iPhone 15) · de-DE",
        "engine": "webkit",
        "channel": None,
        "device": "iPhone 15",
        "headless": False,
        "viewport": None,
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
]
# Optional profiles enabled via env var BROWSER_PROFILES_EXTRA
EXTRA_PROFILES: dict[str, dict] = {
    "chrome-channel-desktop-de": {
        "id": "chrome-channel-desktop-de",
        "label": "Chrome Channel (Google Build) · de-DE",
        "engine": "blink",
        "channel": "chrome",
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    "edge-channel-desktop-de": {
        "id": "edge-channel-desktop-de",
        "label": "Edge Channel · de-DE",
        "engine": "blink",
        "channel": "msedge",
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
    },
    "brave-default-de": {
        "id": "brave-default-de",
        "label": "Brave Default-Shields · de-DE",
        "engine": "blink",
        "channel": None,
        "device": None,
        "headless": False,
        "viewport": {"width": 1920, "height": 1080},
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "user_agent": None,
        "executable_path": "/usr/bin/brave-browser",
    },
}
def resolve_profiles(requested: list[str] | None) -> list[dict]:
    """Map requested ids to profile dicts.

    Unknown ids are skipped. Falls back to all defaults when `requested`
    is None or empty, or when none of the requested ids is known.
    """
    if not requested:
        return list(DEFAULT_PROFILES)
    by_id = {p["id"]: p for p in DEFAULT_PROFILES}
    by_id.update(EXTRA_PROFILES)
    out: list[dict] = []
    for r in requested:
        prof = by_id.get(r)
        if prof:
            out.append(prof)
    return out or list(DEFAULT_PROFILES)


def default_ids() -> list[str]:
    return [p["id"] for p in DEFAULT_PROFILES]
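
Since each profile is meant as a recipe for a Playwright BrowserContext, the mapping that stage 1.b will need could be sketched roughly as below. This is an assumption, not code from this commit: `playwright_kwargs` and `ENGINE_TO_BROWSER_TYPE` are hypothetical names, the keyword names follow Playwright's Python `launch()` / `new_context()` parameters, and the `devices` argument stands in for a mapping like `playwright.devices`, which is assumed to hold ready-made context kwargs.

```python
from __future__ import annotations

# Engine names from the profile schema mapped to Playwright browser types.
ENGINE_TO_BROWSER_TYPE = {"blink": "chromium", "gecko": "firefox", "webkit": "webkit"}


def playwright_kwargs(profile: dict,
                      devices: dict | None = None) -> tuple[str, dict, dict]:
    """Split a profile into (browser type, launch kwargs, context kwargs)."""
    launch: dict = {"headless": profile["headless"]}
    if profile.get("channel"):
        launch["channel"] = profile["channel"]
    if profile.get("executable_path"):
        launch["executable_path"] = profile["executable_path"]

    context: dict = {}
    if profile.get("device"):
        # Device presets bring their own viewport and user agent.
        context.update((devices or {})[profile["device"]])
    elif profile.get("viewport"):
        context["viewport"] = profile["viewport"]
    context["locale"] = profile["locale"]
    context["timezone_id"] = profile["timezone"]
    if profile.get("user_agent"):
        context["user_agent"] = profile["user_agent"]
    return ENGINE_TO_BROWSER_TYPE[profile["engine"]], launch, context


desktop = {
    "id": "firefox-headed-de", "engine": "gecko", "channel": None,
    "device": None, "headless": False,
    "viewport": {"width": 1920, "height": 1080},
    "locale": "de-DE", "timezone": "Europe/Berlin", "user_agent": None,
}
browser_type, launch_kwargs, context_kwargs = playwright_kwargs(desktop)
print(browser_type, launch_kwargs, context_kwargs)
```

Keeping the function free of Playwright imports preserves the module's goal of being importable in unit tests without spawning a browser.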
@@ -0,0 +1,158 @@
"""Multi-browser consent-scan orchestrator (browser-matrix stage 1).

Runs the existing single-browser `consent_scanner.run_consent_test`
once per profile from `browser_profiles.resolve_profiles` and
aggregates the per-browser results with the worst-of rule:

  * any HIGH-violation on any browser → robustness_score capped to <60
  * pre-consent + reject-respect are weighted 80% combined
  * banner design only contributes if the banner was detected at all

Returns a unified ScanResponse-compatible dict plus a fresh
`browser_matrix` block (one entry per profile) so the backend mail
renderer can show "Chrome 95% · Firefox 92% · WebKit 78% · Mobile-Safari 65%".

Heuristic only: the real per-test scoring (T1..T7 from the EDPB
taskforce report) is mocked here as a placeholder until the consent
scanner emits structured per-test results.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any, Callable, Awaitable
from .browser_profiles import resolve_profiles
logger = logging.getLogger(__name__)
# Worst-of capping: if pre-consent or reject-respect has ANY hard fail,
# overall robustness can never exceed this value.
_HARD_FAIL_CAP = 55
# Per-dimension weights — Sales/Risk-tuned (see strategy doc):
# Pre-Consent-Compliance 50%
# Reject-Respekt 30%
# Banner-Design / Dark 20%
_WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
def _extract_dimensions(banner_result: dict) -> dict[str, float]:
    """Best-effort: derive 3 sub-scores from the existing scan output.

    Falls back to neutral 0.5 when the input is too sparse.
    """
    if not banner_result:
        return {"pre_consent": 0.5, "reject_respect": 0.5,
                "banner_design": 0.5}
    phases = banner_result.get("phases") or {}
    before = phases.get("before_consent") or phases.get("before") or {}
    after_reject = phases.get("after_reject") or {}
    bv = (banner_result.get("banner_checks") or {}).get("violations") or []
    pre_cookies = len(before.get("cookies") or [])
    rej_cookies = len(after_reject.get("cookies") or [])
    pre_consent = max(0.0, 1.0 - min(1.0, pre_cookies / 10.0))
    reject_respect = max(0.0, 1.0 - min(1.0, rej_cookies / 5.0))
    banner_design = max(0.0, 1.0 - min(1.0, len(bv) / 5.0))
    return {
        "pre_consent": round(pre_consent, 3),
        "reject_respect": round(reject_respect, 3),
        "banner_design": round(banner_design, 3),
    }


def _score(dimensions: dict[str, float]) -> int:
    base = (
        dimensions["pre_consent"] * _WEIGHTS["pre_consent"]
        + dimensions["reject_respect"] * _WEIGHTS["reject_respect"]
        + dimensions["banner_design"] * _WEIGHTS["banner_design"]
    )
    pct = int(round(base * 100))
    if (dimensions["pre_consent"] < 0.5
            or dimensions["reject_respect"] < 0.5):
        pct = min(pct, _HARD_FAIL_CAP)
    return pct
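def _verbal(score: int) -> str:
    # The returned strings are German; English glosses are given as comments.
    if score >= 95:
        # "No material defects within the audit scope"
        return "Im Prüfumfang keine wesentlichen Mängel"
    if score >= 80:
        # "Low risk, correction recommended"
        return "Niedriges Risiko, Korrektur empfohlen"
    if score >= 60:
        # "Moderate defects, correct in the short term"
        return "Mittlere Mängel, kurzfristige Korrektur"
    if score >= 30:
        # "Severe defects, correct immediately"
        return "Schwere Mängel, sofortige Korrektur"
    # "Violations that may incur fines"
    return "Bußgeldrelevante Verstöße"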
async def run_matrix(
    scanner: Callable[..., Awaitable[Any]],
    url: str,
    requested_profiles: list[str] | None = None,
    **scanner_kwargs: Any,
) -> dict:
    """Run `scanner(url, browser_profile=…, **kw)` once per profile in parallel.

    `scanner` must be the existing consent_scanner.run_consent_test
    or a shim with the same signature; it should accept a
    `browser_profile` kwarg (a scanner without it is called without it).
    Returns:
        {
          "browser_matrix": [
            {"profile_id": ..., "label": ..., "scan": <raw scan dict>,
             "dimensions": {...}, "score": int, "verbal": str},
            ...
          ],
          "aggregate": {
            "worst_score": int, "worst_profile": "...",
            "best_score": int, "best_profile": "...",
            "verbal": "...", "profiles_run": int,
          },
        }
    """
    profiles = resolve_profiles(requested_profiles)
    if not profiles:
        return {"browser_matrix": [], "aggregate": {}}

    async def _run_one(prof: dict) -> dict:
        try:
            try:
                scan = await scanner(
                    url, browser_profile=prof, **scanner_kwargs,
                )
            except TypeError:
                # Backward-compat: scanner that doesn't accept the kwarg.
                # Nested so that a failure of this fallback call is also
                # caught below instead of aborting the whole gather().
                scan = await scanner(url, **scanner_kwargs)
        except Exception as e:
            logger.warning("matrix profile %s failed: %s", prof["id"], e)
            return {
                "profile_id": prof["id"], "label": prof["label"],
                "scan": None, "error": str(e)[:200],
                "dimensions": {"pre_consent": 0, "reject_respect": 0,
                               "banner_design": 0},
                # German for "scan failed"
                "score": 0, "verbal": "Scan fehlgeschlagen",
            }
        dims = _extract_dimensions(scan or {})
        score = _score(dims)
        return {
            "profile_id": prof["id"], "label": prof["label"],
            "scan": scan, "dimensions": dims, "score": score,
            "verbal": _verbal(score),
        }

    results = await asyncio.gather(*[_run_one(p) for p in profiles])
    sorted_by_score = sorted(results, key=lambda r: r["score"])
    worst = sorted_by_score[0]
    best = sorted_by_score[-1]
    return {
        "browser_matrix": results,
        "aggregate": {
            "worst_score": worst["score"],
            "worst_profile": worst["profile_id"],
            "best_score": best["score"],
            "best_profile": best["profile_id"],
            "verbal": worst["verbal"],
            "profiles_run": len(results),
        },
    }
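
The parallel fan-out with per-profile error isolation and worst-of aggregation can be tried in isolation with a stub scanner. The sketch below is a simplified stand-in, not the module above: `scan_profile`, `run_one`, and `aggregate` are hypothetical names, and the scores are invented.

```python
import asyncio


async def scan_profile(profile_id: str, scores: dict) -> dict:
    """Stand-in scanner: looks the score up and raises for unknown profiles."""
    await asyncio.sleep(0)  # yield control like a real I/O-bound scan would
    return {"profile_id": profile_id, "score": scores[profile_id]}


async def run_one(profile_id: str, scores: dict) -> dict:
    try:
        return await scan_profile(profile_id, scores)
    except Exception as exc:
        # A failed profile is reported with score 0 and does not abort the rest.
        return {"profile_id": profile_id, "score": 0, "error": repr(exc)}


async def aggregate(profile_ids: list, scores: dict) -> dict:
    results = await asyncio.gather(*(run_one(p, scores) for p in profile_ids))
    worst = min(results, key=lambda r: r["score"])
    best = max(results, key=lambda r: r["score"])
    return {
        "worst_profile": worst["profile_id"], "worst_score": worst["score"],
        "best_profile": best["profile_id"], "best_score": best["score"],
        "profiles_run": len(results),
    }


summary = asyncio.run(aggregate(
    ["chromium", "firefox", "webkit"],
    {"chromium": 95, "firefox": 92},  # no entry for webkit: simulated failure
))
print(summary)
```

Because every coroutine handles its own exception, `asyncio.gather` never sees a failure, and a failed profile simply becomes the worst entry with score 0.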