fix: 4 bugs in one commit: B22 PDF + B17 walk fallback + company_name + plausibility fallback

(1) B22 cross-domain (fix #59): The Elli test did NOT find the AGB (terms and conditions) on logpay.de even though the URL in doc_entries was correct. Suspected cause: discovery phase A drops or overwrites the original URL when the PDF fetch fails (word_count=0). Fix: _collect_audit_urls() iterates over state.doc_entries + rejected_url + req.documents, because cross-domain hosting is independent of the text content. Also adds trace logging for future diagnosis. Dedup by (doc_type, host_sld).

(2) B17 audit-walk fail fallback (fix #60): BMW v5 had audit_walk=None with no notice in the mail. Probably the 180s timeout during the OneTrust CMP banner tour. Fix: timeout 180s → 300s. In addition, on failure a notice stub with the error reason is written to state["audit_walk"] and to the HTML block, so the reviewer sees the failure instead of a silent skip.

(3) company_name + origin_domain in the backend (fix #61): The frontend has been sending these two fields since ec03317, but the backend ignored them. Fix: the ComplianceCheckRequest schema is extended with company_name + origin_domain. phase_e_email prefers user input over the URL heuristic for site_name. If origin_domain is set and no domain can be derived from doc_entries, the user input is used as domain.

(4) Plausibility LLM fallback model (fix #62): On large privacy policies (DSEs; BMW 122 FAIL), qwen3:30b-a3b frequently returns empty format='json' responses. The circuit breaker tripped, but the phase still produced nothing useful. Fix: default model switched to qwen2.5:7b (4× smaller, more reliable with format=json, sufficient reasoning for the PASS/MODIFY/DROP classification). Also introduces Strategy C: a fallback model (llama3.2:3b) when the primary stays empty. BATCH_SIZE 4 → 3. ENV switches PLAUSIBILITY_LLM_MODEL + PLAUSIBILITY_FALLBACK_MODEL for tuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
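
Illustration of the dedup rule from (1). This is a minimal sketch, not the code from this commit: host_sld() and dedup_audit_urls() are hypothetical helpers, the URLs are placeholders, and taking the last two host labels as the registrable domain is a simplifying assumption that misjudges suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def host_sld(url: str) -> str:
    """Hypothetical helper: naive registrable-domain guess (last two host labels)."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def dedup_audit_urls(candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first URL per (doc_type, host_sld) key, preserving input order."""
    seen: set[tuple[str, str]] = set()
    kept: list[tuple[str, str]] = []
    for doc_type, url in candidates:
        key = (doc_type, host_sld(url))
        if key in seen:
            continue  # same document type already collected for this domain
        seen.add(key)
        kept.append((doc_type, url))
    return kept


# Placeholder data: the same doc_type may appear once per hosting domain.
candidates = [
    ("agb", "https://www.example.org/agb.pdf"),
    ("agb", "https://example.org/agb"),
    ("dse", "https://shop.example.com/datenschutz"),
    ("agb", "https://shop.example.com/agb"),
]
result = dedup_audit_urls(candidates)
```

With this key, a cross-domain AGB and an AGB on the origin domain both survive, while a second AGB candidate on the same domain is dropped.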
2026-06-08 16:39:33 +02:00
parent ec03317170
commit d6b8bf87c2
5 changed files with 138 additions and 35 deletions
@@ -50,11 +50,19 @@ import httpx
 logger = logging.getLogger(__name__)

 OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
-MODEL = os.getenv("PLAUSIBILITY_LLM_MODEL", "qwen3:30b-a3b")
-# Reduced from 8 → 4 to fight qwen3 empty-response-on-large-prompts bug.
-# 4 items × ~500 token/item + 2000 system + 1500 excerpt = ~5500 token total,
-# well within qwen3's safe range for format='json'.
-BATCH_SIZE = int(os.getenv("PLAUSIBILITY_BATCH_SIZE", "4"))
+# Default model, configurable via ENV. qwen3:30b-a3b reasons best
+# but tends to return empty responses on large DSEs under
+# format='json'. qwen2.5:7b is 4× smaller and considerably more
+# reliable; its reasoning is slightly weaker but sufficient for the
+# simple plausibility classification (PASS/MODIFY/DROP).
+MODEL = os.getenv("PLAUSIBILITY_LLM_MODEL", "qwen2.5:7b")
+# Fallback model, used when the primary returns nothing for both
+# strategy A and strategy B (this is strategy C). The default is a
+# small, robust model.
+FALLBACK_MODEL = os.getenv("PLAUSIBILITY_FALLBACK_MODEL", "llama3.2:3b")
+# A smaller model could handle larger batches, but stay conservative
+# so that a single model failure does not kill the whole phase.
+BATCH_SIZE = int(os.getenv("PLAUSIBILITY_BATCH_SIZE", "3"))
 TIMEOUT = float(os.getenv("PLAUSIBILITY_TIMEOUT_S", "45.0"))
 # Reduced excerpt 4000 → 1500 chars (same reason).
 DOC_EXCERPT_CHARS = int(os.getenv("PLAUSIBILITY_DOC_EXCERPT", "1500"))
@@ -173,33 +181,46 @@ async def _ask_llm_batch(items: list[dict], doc_title: str,
    """Send a batch of up to BATCH_SIZE findings to the LLM.

    Resilience strategy (P125 fix for empty-response bug):
-      A. format='json' (strict) — current default
-      B. If A returns empty: format='' (loose), extract JSON manually
-      C. If B also empty AND batch >2: split batch + recurse
-      D. Else: give up, return {} (callers stamp llm_skipped=true)
+      A. primary MODEL + format='json' (strict)
+      B. primary MODEL + format='' (loose), parse JSON manually
+      C. FALLBACK_MODEL + format='json' (smaller, more robust model)
+      D. If batch >2: split + recurse
+      E. Else: give up, return {} (callers stamp llm_skipped=true)
    """
    user_prompt = _build_user_prompt(items, doc_title, doc_excerpt)
-    base_body = {
-        "model": MODEL,
-        "messages": [
-            {"role": "system", "content": _SYSTEM_PROMPT},
-            {"role": "user", "content": user_prompt},
-        ],
-        "stream": False,
-        "options": {"temperature": 0.0, "seed": 42, "num_predict": 1500},
-    }
+
+    def _body(model: str) -> dict:
+        return {
+            "model": model,
+            "messages": [
+                {"role": "system", "content": _SYSTEM_PROMPT},
+                {"role": "user", "content": user_prompt},
+            ],
+            "stream": False,
+            "options": {"temperature": 0.0, "seed": 42, "num_predict": 1500},
+        }
+
    out: dict[str, dict] = {}
    input_ids = [it["id"] for it in items]
    try:
-        # Strategy A: format='json'
-        content = await _post_llm({**base_body, "format": "json"})
+        # Strategy A: primary + format='json'
+        content = await _post_llm({**_body(MODEL), "format": "json"})
        if not content:
-            # Strategy B: format-free, parse-on-our-side
+            # Strategy B: primary + format-free
            logger.info(
                "plausibility A→empty, trying B (format-free) batch=%d",
                len(items),
            )
-            content = await _post_llm(base_body)
+            content = await _post_llm(_body(MODEL))
+        if not content and FALLBACK_MODEL and FALLBACK_MODEL != MODEL:
+            # Strategy C: fallback-model + format='json'
+            logger.info(
+                "plausibility A+B empty, trying C (fallback=%s) batch=%d",
+                FALLBACK_MODEL, len(items),
+            )
+            content = await _post_llm(
+                {**_body(FALLBACK_MODEL), "format": "json"},
+            )

        if not content:
            # Strategy C: split + recurse