feat: Browser matrix stage 1.a + 2 more GT findings + plausibility-LLM hardening
Stage 1.a browser matrix (Task #15), multi-engine scaffolding:
- consent-tester/Dockerfile: firefox + webkit + Xvfb deps
- playwright install chromium firefox webkit
- services/browser_profiles.py: registry with DEFAULT_PROFILES (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) + EXTRA_PROFILES (Chrome-Channel, Edge, Brave)
- services/multi_browser_scanner.py: run_matrix() orchestrates N parallel scans + worst-of aggregation + 3 sub-scores (pre-consent 50%, reject respect 30%, banner design 20%) + a hard-fail cap below 60% on any pre-consent or reject violation
- routes_matrix.py: POST /scan-matrix endpoint (separate module so main.py stays under 500 LOC)

KNOWN: the stage 1.a shim runs every profile on the same Chromium instance; real engine diversity follows in stage 1.b (new consent_scanner.py parameter).

Coverage gap 3 (Task #17): closed 2 of the 3 remaining GT gaps:
- B9 impressum_multi_entity_check (IMPRESSUM-001): detects a missing VAT ID (USt-IdNr), commercial-register entry (HR) or managing director (GF) per entity in multi-entity imprints (Elli: VAT ID given only for Elli Mobility, missing for VW Group Charging)
- B10 transfer_mechanism_check (TRANSFER-001): for each non-EU vendor in cmp_vendors, checks the privacy policy (DSE) for DPF/SCCs/BCRs/consent within a ±400-char window. Flags vendors with no named transfer mechanism.
- TH-RETENTION-002 (AI data-category differentiation) stays open because it needs deep semantic analysis; planned for the specialist agents in Task #18.

Plausibility-LLM empty-response hardening (Task #16):
- BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
- Single retry with a halved batch when the LLM returns empty content: qwen3:30b-a3b sometimes rejects prompts with ≥6 items under format='json'. If the half batch also comes back empty: log + skip.
- The pipeline no longer spends 10 min running into timeouts.

GT coverage jump: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓, 2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
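The empty-response hardening for the plausibility LLM can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions, not the pipeline's actual code: `check_batch`, `check_all` and the `LLMCall` type are hypothetical names, the real qwen3 client call is replaced by a plain function that returns the raw response text, and "retry with a halved batch" is read as splitting the batch into two halves that are each retried exactly once.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger(__name__)

# From the commit message: batch size reduced from 8 to 4.
BATCH_SIZE = 4

# Hypothetical LLM call: takes a batch of excerpts and returns the raw
# response text ("" or whitespace when the model returns empty content).
LLMCall = Callable[[list[str]], str]


def check_batch(items: list[str], llm: LLMCall) -> list[str]:
    """Ask the LLM once; on empty content, retry once per half-batch."""
    content = llm(items)
    if content.strip():
        return [content]
    if len(items) < 2:
        # Nothing left to halve: log and skip (assumed behavior).
        logger.warning("empty LLM response for 1 item, skipping")
        return []
    mid = len(items) // 2
    responses: list[str] = []
    for half in (items[:mid], items[mid:]):
        half_content = llm(half)
        if half_content.strip():
            responses.append(half_content)
        else:
            # Second empty response: log + skip, no further retries.
            logger.warning("empty LLM response for half-batch of %d items, skipping",
                           len(half))
    return responses


def check_all(items: list[str], llm: LLMCall) -> list[str]:
    """Split items into BATCH_SIZE chunks and collect the non-empty responses."""
    responses: list[str] = []
    for start in range(0, len(items), BATCH_SIZE):
        responses.extend(check_batch(items[start:start + BATCH_SIZE], llm))
    return responses


# Demo: a stand-in model that returns empty content for batches of 3+ items.
call_sizes: list[int] = []


def flaky_llm(batch: list[str]) -> str:
    call_sizes.append(len(batch))
    return "" if len(batch) >= 3 else '{"ok": true}'


demo = check_all([f"excerpt {i}" for i in range(6)], flaky_llm)
```

With six excerpts the first batch of four comes back empty and is retried as two batches of two, while the trailing batch of two succeeds directly, so the stand-in model is called four times and three responses survive. Capping the work at one retry level bounds the extra latency per batch, which matters given the 45s timeout.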
@@ -0,0 +1,61 @@
"""POST /scan-matrix — browser-matrix stage-1 endpoint.

Runs the existing consent_scanner once per browser profile and
returns the aggregated robustness-score per browser plus a
worst-of/best-of summary. Kept in its own module so main.py stays
under the 500-LOC cap.

KNOWN LIMITATION (stage 1.a):
The underlying `run_consent_test` does not yet accept a
`browser_profile` kwarg — all profiles currently execute on the
same Chromium instance. Engine diversity (real Firefox/WebKit
contexts) ships in stage 1.b once consent_scanner is split.
"""

from __future__ import annotations

import logging
from datetime import datetime, timezone

from fastapi import APIRouter
from pydantic import BaseModel

from services.consent_scanner import run_consent_test
from services.multi_browser_scanner import run_matrix

logger = logging.getLogger(__name__)
router = APIRouter()


class MatrixScanRequest(BaseModel):
    url: str
    timeout_per_phase: int = 10
    categories: list[str] = []
    # Resolved against browser_profiles.resolve_profiles. None or
    # empty list → default 4 profiles (chromium/firefox/webkit/iphone).
    browser_profiles: list[str] | None = None


async def _scanner_shim(url: str, browser_profile: dict | None = None,
                        timeout_per_phase: int = 10,
                        categories: list[str] | None = None):
    """Shim that ignores `browser_profile` until consent_scanner accepts it."""
    return await run_consent_test(url, timeout_per_phase,
                                  categories or [])


@router.post("/scan-matrix")
async def scan_matrix(req: MatrixScanRequest):
    """Run consent-scan across the resolved browser-profile matrix."""
    logger.info("Matrix scan for %s profiles=%s", req.url,
                req.browser_profiles or "default")
    matrix = await run_matrix(
        _scanner_shim,
        req.url,
        requested_profiles=req.browser_profiles,
        timeout_per_phase=req.timeout_per_phase,
        categories=req.categories,
    )
    matrix["url"] = req.url
    matrix["scanned_at"] = datetime.now(timezone.utc).isoformat()
    return matrix
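The endpoint above delegates everything to `run_matrix`, whose body is not part of this diff; the commit message only says it runs N scans in parallel, aggregates worst-of, and weights three sub-scores 50/30/20 with a hard-fail cap below 60%. The following is a minimal sketch under those assumptions, not the contents of services/multi_browser_scanner.py: `run_matrix_sketch`, `robustness`, `PROFILES`, the profile ids, the sub-score field names and the 59-point ceiling are all hypothetical. It calls the scanner with `browser_profile` as a keyword argument, matching the signature of `_scanner_shim`.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical registry; the real one lives in services/browser_profiles.py.
PROFILES: dict[str, dict[str, Any]] = {
    "chromium": {"engine": "chromium"},
    "firefox": {"engine": "firefox"},
    "webkit": {"engine": "webkit"},
    "iphone": {"engine": "webkit", "device": "iPhone"},
}
DEFAULT_PROFILE_IDS = ["chromium", "firefox", "webkit", "iphone"]

# Weights from the commit message; sub-scores assumed to be on a 0-100 scale.
WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
HARD_FAIL_CAP = 59.0  # assumption: "below 60%" implemented as a 59 ceiling

Scanner = Callable[..., Awaitable[dict[str, Any]]]


def resolve_profiles(requested: list[str] | None) -> list[str]:
    """None or [] falls back to the defaults; unknown ids are dropped."""
    return [p for p in (requested or DEFAULT_PROFILE_IDS) if p in PROFILES]


def robustness(result: dict[str, Any]) -> float:
    """Weighted sum of the sub-scores, capped on a pre-consent/reject violation."""
    score = sum(w * result[name] for name, w in WEIGHTS.items())
    if result.get("pre_consent_violation") or result.get("reject_violation"):
        score = min(score, HARD_FAIL_CAP)
    return round(score, 1)


async def run_matrix_sketch(scanner: Scanner, url: str,
                            requested_profiles: list[str] | None = None,
                            **scan_kwargs: Any) -> dict[str, Any]:
    """Run one scan per profile concurrently and aggregate worst-of/best-of."""
    ids = resolve_profiles(requested_profiles)
    outcomes = await asyncio.gather(
        *(scanner(url, browser_profile=PROFILES[p], **scan_kwargs) for p in ids),
        return_exceptions=True,  # one failing profile must not sink the matrix
    )
    scores: dict[str, float] = {}
    errors: dict[str, str] = {}
    for pid, outcome in zip(ids, outcomes):
        if isinstance(outcome, BaseException):
            errors[pid] = repr(outcome)
        else:
            scores[pid] = robustness(outcome)
    summary: dict[str, Any] = {"per_browser": scores, "errors": errors}
    if scores:
        summary["worst_of"] = min(scores.values())
        summary["best_of"] = max(scores.values())
    return summary


# Demo with a stand-in for _scanner_shim that reacts to browser_profile
# (the stage 1.a shim itself still ignores it).
async def fake_scanner(url: str, browser_profile: dict | None = None,
                       timeout_per_phase: int = 10,
                       categories: list[str] | None = None) -> dict[str, Any]:
    webkit = (browser_profile or {}).get("engine") == "webkit"
    return {"pre_consent": 100, "reject_respect": 100, "banner_design": 80,
            "reject_violation": webkit}


demo = asyncio.run(run_matrix_sketch(fake_scanner, "https://example.com",
                                     timeout_per_phase=10, categories=[]))
```

In the demo, Chromium and Firefox score 0.5·100 + 0.3·100 + 0.2·80 = 96, while both WebKit-based profiles hit the reject-violation cap and drop to 59, so the worst-of summary surfaces the engine-specific failure even though the unweighted sub-scores are identical. Using `return_exceptions=True` keeps a single crashing engine from discarding the other profiles' results; whether the real `run_matrix` behaves this way is not stated in the commit.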