feat: browser matrix stage 1.a + 2 more GT findings + plausibility LLM hardening

Stage 1.a browser matrix (Task #15), multi-engine scaffolding:
- consent-tester/Dockerfile: firefox + webkit + Xvfb deps
- playwright install chromium firefox webkit
- services/browser_profiles.py: registry with DEFAULT_PROFILES (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) + EXTRA_PROFILES (Chrome-Channel, Edge, Brave)
- services/multi_browser_scanner.py: run_matrix() orchestrates N parallel scans + worst-of aggregation + 3 sub-scores (pre-consent 50%, reject respect 30%, banner design 20%) + hard-fail cap at <60% on a pre-consent or reject violation
- routes_matrix.py: POST /scan-matrix endpoint (separate module so that main.py stays under 500 LOC)

KNOWN: the stage 1.a shim runs all profiles on the same Chromium; real engine diversity follows in stage 1.b (consent_scanner.py param).

Coverage gap 3 (Task #17): 2 of the 3 remaining GT gaps closed:
- B9 impressum_multi_entity_check (IMPRESSUM-001): detects a missing VAT ID (USt-IdNr), commercial register entry (HR), or managing director (GF) per entity in multi-entity imprints (Elli: USt-IdNr present only for Elli Mobility, missing for VW Group Charging)
- B10 transfer_mechanism_check (TRANSFER-001): for each non-EU vendor in cmp_vendors, checks the privacy policy (DSE) for DPF/SCCs/BCRs/consent within a ±400-char window. Flags vendors with no named mechanism.
- TH-RETENTION-002 (AI data category differentiation) remains semantically deep; planned for the specialist agents in Task #18.

Plausibility LLM empty-response hardening (Task #16):
- BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
- Single retry with a halved batch when the LLM returns empty content; qwen3:30b-a3b sometimes rejects ≥6-item prompts under format='json'. If the half batch is also empty: log + skip.
- The pipeline no longer spends 10 min in timeouts.

GT coverage jump: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓, 2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
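
The scoring described for run_matrix() (three weighted sub-scores, a hard-fail cap, worst-of aggregation across profiles) could look roughly like the sketch below. Only the 50/30/20 weights and the "<60%" cap come from the commit message; `ProfileResult`, `profile_score`, `worst_of`, the 0..100 scale, and the cap value of 59.0 are assumptions for illustration, not the code in services/multi_browser_scanner.py.

```python
from dataclasses import dataclass

# Weights as stated in the commit message.
WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
# Assumption: "<60%" is read as a ceiling just below 60 on a 0..100 scale.
HARD_FAIL_CAP = 59.0


@dataclass
class ProfileResult:
    """Hypothetical per-browser-profile result; sub-scores are 0..100."""
    profile: str
    pre_consent: float
    reject_respect: float
    banner_design: float
    pre_consent_violation: bool = False
    reject_violation: bool = False


def profile_score(r: ProfileResult) -> float:
    """Weighted sum of the three sub-scores, capped on a hard fail."""
    score = (
        WEIGHTS["pre_consent"] * r.pre_consent
        + WEIGHTS["reject_respect"] * r.reject_respect
        + WEIGHTS["banner_design"] * r.banner_design
    )
    if r.pre_consent_violation or r.reject_violation:
        score = min(score, HARD_FAIL_CAP)
    return score


def worst_of(results: list[ProfileResult]) -> float:
    """Worst-of aggregation: the matrix score is the lowest profile score."""
    return min(profile_score(r) for r in results)


matrix = [
    ProfileResult("Chromium-Headed", 100, 100, 100),
    ProfileResult("Firefox-Headed", 100, 100, 100, reject_violation=True),
]
matrix_score = worst_of(matrix)
```

With these inputs the Firefox profile is capped by its reject violation, so the worst-of result is the cap value rather than the clean Chromium score.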
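
The B10 check (look for a named transfer mechanism within ±400 characters) can be sketched as follows. The commit message does not say what the window is centred on; this sketch assumes it is each mention of the vendor name in the privacy policy text. The function name, the keyword pattern, and the rule that an unmentioned vendor counts as missing are all hypothetical.

```python
import re

WINDOW = 400  # characters on each side of a vendor mention

# Illustrative keyword list; the real check may use different terms.
MECHANISM_RE = re.compile(
    r"Data Privacy Framework|\bDPF\b"
    r"|Standardvertragsklauseln|standard contractual clauses|\bSCCs?\b"
    r"|Binding Corporate Rules|\bBCRs?\b"
    r"|Einwilligung|consent",
    re.IGNORECASE,
)


def vendors_without_mechanism(policy_text: str, non_eu_vendors: list[str]) -> list[str]:
    """Return vendors with no mechanism keyword near any of their mentions."""
    missing = []
    for vendor in non_eu_vendors:
        found = False
        for m in re.finditer(re.escape(vendor), policy_text, re.IGNORECASE):
            start = max(0, m.start() - WINDOW)
            end = min(len(policy_text), m.end() + WINDOW)
            if MECHANISM_RE.search(policy_text[start:end]):
                found = True
                break
        # Assumption: a vendor that is never mentioned also counts as missing.
        if not found:
            missing.append(vendor)
    return missing


policy = (
    "Wir nutzen Acme Analytics (USA) auf Grundlage von Standardvertragsklauseln. "
    + "x" * 1000
    + " Beta Ads Inc. verarbeitet Daten."
)
flagged = vendors_without_mechanism(policy, ["Acme Analytics", "Beta Ads Inc.", "Gamma"])
```

The vendor names and policy text here are made up; in the commit the vendor list comes from cmp_vendors.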
@@ -51,8 +51,13 @@ logger = logging.getLogger(__name__)
 
 OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
 MODEL = os.getenv("PLAUSIBILITY_LLM_MODEL", "qwen3:30b-a3b")
-BATCH_SIZE = int(os.getenv("PLAUSIBILITY_BATCH_SIZE", "8"))
-TIMEOUT = float(os.getenv("PLAUSIBILITY_TIMEOUT_S", "60.0"))
+# Reduced from 8 → 4 to fight qwen3 empty-response-on-large-prompts bug.
+# 4 items × ~500 token/item + 2000 system + 1500 excerpt = ~5500 token total,
+# well within qwen3's safe range for format='json'.
+BATCH_SIZE = int(os.getenv("PLAUSIBILITY_BATCH_SIZE", "4"))
+TIMEOUT = float(os.getenv("PLAUSIBILITY_TIMEOUT_S", "45.0"))
+# Reduced excerpt 4000 → 1500 chars (same reason).
+DOC_EXCERPT_CHARS = int(os.getenv("PLAUSIBILITY_DOC_EXCERPT", "1500"))
 
 # In-memory cache: (input_hash) -> result_dict. Survives one run.
 _CACHE: dict[str, dict] = {}
@@ -121,7 +126,8 @@ def _build_user_prompt(items: list[dict], doc_title: str,
     )
     return (
         f"DOKUMENT: {doc_title}\n\n"
-        f"DOKUMENT-AUSZUG (max 4000 Zeichen):\n{doc_excerpt[:4000]}\n\n"
+        f"DOKUMENT-AUSZUG (max {DOC_EXCERPT_CHARS} Zeichen):\n"
+        f"{doc_excerpt[:DOC_EXCERPT_CHARS]}\n\n"
         f"FINDINGS ZU BEWERTEN:\n{findings_block}"
     )
 
@@ -149,6 +155,23 @@ async def _ask_llm_batch(items: list[dict], doc_title: str,
         r.raise_for_status()
         content = (r.json().get("message") or {}).get("content", "")
         if not content:
+            # Single retry with smaller batch — qwen3 sometimes
+            # rejects ≥6-item prompts under format='json'.
+            if len(items) > 2:
+                half = len(items) // 2
+                logger.info(
+                    "plausibility empty → retry split %d → %dx2",
+                    len(items), half,
+                )
+                first = await _ask_llm_batch(
+                    items[:half], doc_title, doc_excerpt,
+                )
+                second = await _ask_llm_batch(
+                    items[half:], doc_title, doc_excerpt,
+                )
+                out.update(first)
+                out.update(second)
+                return out
             logger.warning("plausibility LLM returned empty content")
             return out
         try:
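
The split-and-retry behaviour in the last hunk can be exercised in isolation with a stand-in for the LLM call. The sketch below is not the code from the diff: `ask_with_split_retry` and `fake_llm` are hypothetical names, and the halves are sent straight to the `ask` callable instead of recursing, so there is exactly one retry level as the commit message describes (the recursive version in the diff can split again when a half still has more than two items).

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: takes a batch of findings, returns {finding_id: verdict}.
AskFn = Callable[[list[dict]], Awaitable[dict]]


async def ask_with_split_retry(items: list[dict], ask: AskFn) -> dict:
    """Ask once; on an empty result, retry once with the batch split in half."""
    out = await ask(items)
    if out:
        return out
    if len(items) <= 2:
        # Too small to split further: give up and let the caller skip.
        return {}
    half = len(items) // 2
    merged: dict = {}
    for chunk in (items[:half], items[half:]):
        # No further retry here: an empty half simply contributes nothing.
        merged.update(await ask(chunk))
    return merged


async def fake_llm(items: list[dict]) -> dict:
    """Stand-in model that returns nothing for batches of three or more items."""
    if len(items) >= 3:
        return {}
    return {item["id"]: "ok" for item in items}


items = [{"id": f"F{i}"} for i in range(4)]
result = asyncio.run(ask_with_split_retry(items, fake_llm))
```

With a batch of four, the first call comes back empty and the two halves of two items each succeed, so every finding still gets a verdict.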