feat(b17): stage 4 banner tour + stage 5 annotated screenshots + V2 default

Stage 4 (cookie banner tour before the accept click):
- audit_walk_banner_tour.tour_cookie_banner(): opens the settings view
  (16 phrase variants), scrolls vertically, activates every [role=tab],
  and expands every [aria-expanded=false] / details / summary plus
  14 CMP-specific selectors. At most 35 clicks, best effort.
- audit_walk_recorder calls tour_cookie_banner() BEFORE
  _try_accept_banner, so the reviewer sees the full consent catalogue
  in the video (vendor list, categories, purposes).
- Recorder stays under 500 LOC (412+155 split).

Stage 5 (annotated screenshots per finding):
- finding_annotator.annotate_url(): headless WebKit, JS injection of a
  red banner label at the top plus a red outline around the element
  (selector or text match).
- finding_annotator.annotate_findings(): dispatches 3 cases:
  B1 tap target (anchor marked with "Tap-Target X×Y px"),
  B16 URL slug drift (404 page marked with "/<slug> 404"),
  B13 Widerruf (footer marked with "Widerruf-Link fehlt").
- routes_audit_walk.POST /annotate-findings (consent-tester).
- _b17_wiring calls annotate-findings after record_audit_walk and
  stores the annotations in walk.annotations.
- audit_walk_zip_builder packs the PNGs into the ZIP as
  findings/<name>.png, so the reviewer has evidence images in the inbox.

Plausibility circuit breaker:
- After 6 consecutive empty batches (PLAUSIBILITY_EMPTY_BUDGET=6) the
  whole phase is aborted instead of waiting for 200 calls. Fix for
  qwen3 being down combined with large DSE sites (BMW: 21 min without
  the breaker, ~3 min with it).

audit_walk_zip_builder picks up walk.annotations and places them under
findings/<fname>.png in the ZIP attachment.

V2 default:
- docker-compose.yml backend-compliance.environment.MAIL_RENDER_V2:
  default 'true'. Without this override the engine keeps rendering the
  old legacy mail layout, in which the B-wiring blocks are not visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 20:44:42 +02:00
parent e8ff75cbfe
commit b16130369a
8 changed files with 540 additions and 0 deletions
@@ -333,13 +333,27 @@ async def verify_plausibility(results, doc_texts: dict[str, str]) -> None:
    logger.info("plausibility-check: %d findings across %d docs",
                total, len(by_doc))

+    # Circuit breaker against a total Ollama outage: after N consecutive
+    # batches with 0 stamped → abort the whole phase (instead of waiting
+    # for 200 calls). Conservative value: 6 consecutive empties = qwen3 is
+    # clearly unable to respond.
+    consecutive_empty_budget = int(
+        os.getenv("PLAUSIBILITY_EMPTY_BUDGET", "6"),
+    )
+    consecutive_empty = 0
+    breaker_tripped = False
+
    for dt, checks in by_doc.items():
+        if breaker_tripped:
+            break
        doc_title = by_doc_meta.get(dt) or dt
        doc_text = doc_texts.get(dt) or ""
        if not doc_text:
            # Fall back to DSE excerpt when the doc has no own text
            doc_text = doc_texts.get("dse") or ""
        for i in range(0, len(checks), BATCH_SIZE):
+            if breaker_tripped:
+                break
            batch = checks[i:i + BATCH_SIZE]
            items = []
            for c in batch:
@@ -396,5 +410,19 @@ async def verify_plausibility(results, doc_texts: dict[str, str]) -> None:
                        stamped += 1
                    except Exception:
                        pass
+            # Circuit breaker: stamped=0 counts as consecutive_empty.
+            # Exception: if ALL items came from the _CACHE, 0 is fine
+            # (no new LLM call was made).
+            if uncached_items and stamped == 0:
+                consecutive_empty += 1
+                if consecutive_empty >= consecutive_empty_budget:
+                    logger.warning(
+                        "plausibility circuit-breaker tripped after "
+                        "%d consecutive empty batches — aborting phase",
+                        consecutive_empty,
+                    )
+                    breaker_tripped = True
+            elif stamped > 0:
+                consecutive_empty = 0
            logger.info("plausibility-check %s: batch %d → %d stamped",
                        dt, len(batch), stamped)
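
The breaker logic in the hunks above can be read in isolation as the following minimal sketch. It is not the code from this commit: `run_batches_with_breaker` and `process_batch` are hypothetical names introduced for illustration, and `process_batch` is assumed to return a pair `(uncached_count, stamped_count)` per batch. Only the `PLAUSIBILITY_EMPTY_BUDGET` variable and its default of 6 come from the diff.

```python
import os
from typing import Callable, Iterable, Optional, Sequence, Tuple


def run_batches_with_breaker(
    batches: Iterable[Sequence[str]],
    process_batch: Callable[[Sequence[str]], Tuple[int, int]],
    empty_budget: Optional[int] = None,
) -> Tuple[int, bool]:
    """Process batches until done or until the breaker trips.

    process_batch (hypothetical) returns (uncached_count, stamped_count).
    Returns (total_stamped, breaker_tripped).
    """
    if empty_budget is None:
        # Same env variable and default as in the diff above.
        empty_budget = int(os.getenv("PLAUSIBILITY_EMPTY_BUDGET", "6"))

    consecutive_empty = 0
    total_stamped = 0
    for batch in batches:
        uncached, stamped = process_batch(batch)
        total_stamped += stamped
        if uncached and stamped == 0:
            # A batch that needed the backend but produced nothing.
            consecutive_empty += 1
            if consecutive_empty >= empty_budget:
                return total_stamped, True
        elif stamped > 0:
            # Any successful batch resets the streak.
            consecutive_empty = 0
        # Fully cached batches with 0 stamped leave the counter unchanged.
    return total_stamped, False
```

Returning early from a single loop avoids the `breaker_tripped` flag that the nested loops in the diff need, at the cost of flattening the per-document iteration into one stream of batches.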