feat(cra): datasheet extraction on local 35B + llm_status fix

llm_cascade is now model-aware, as a purely additive change: it accepts an optional model parameter, and the cache key includes model_hint, so entries for different models cannot collide. The default is unchanged for all other callers.

The datasheet extractor now uses qwen3.5:35b-a3b (CRA_DATASHEET_MODEL, the same model as the Compliance Advisor) for better semantic mapping.

Also adds llm_status (ok|empty|unavailable) and logging in place of a silent except. On 'unavailable' the frontend shows a notice instead of empty fields. This matters on prod, which has no local Ollama: either the cascade fallback applies or the notice is shown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-16 19:53:48 +02:00
parent 6ca085ffc5
commit b217429d39
3 changed files with 31 additions and 9 deletions
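
Reviewer sketch (not part of the commit): the body of `_cache_key` is not visible in the hunks below, so the following only illustrates how a model hint could be folded into a cache key without changing keys for callers that pass no model. The function name `cache_key_sketch` and the SHA-256 scheme are assumptions for illustration, not the repository's implementation.

```python
import hashlib


def cache_key_sketch(system: str, user: str, model_hint: str = "") -> str:
    """Hypothetical cache key: hash of the prompts plus an optional model hint."""
    h = hashlib.sha256()
    for part in (system, user):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    if model_hint:
        # Only mixed in when a model is given, so default callers keep their key.
        h.update(b"model:" + model_hint.encode("utf-8"))
    return h.hexdigest()


default_key = cache_key_sketch("sys", "user")
override_key = cache_key_sketch("sys", "user", "qwen3.5:35b-a3b")
print(default_key != override_key)
```

With a scheme like this, a response cached for the default model is never returned for a request that names a different model, while existing cache entries for default callers stay valid.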
@@ -104,9 +104,10 @@ def _heuristic_confidence(response_text: str, input_len: int) -> float:

 async def _call_ollama(system: str, user: str,
                        max_tokens: int = 6000,
-                        timeout: float = 90.0) -> str:
+                        timeout: float = 90.0,
+                        model: str = "") -> str:
    base = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
-    model = os.getenv("CMP_LLM_MODEL", "qwen3:30b-a3b")
+    model = model or os.getenv("CMP_LLM_MODEL", "qwen3:30b-a3b")
    payload = {
        "model": model, "stream": False, "format": "json",
        "messages": [{"role": "system", "content": system},
@@ -188,10 +189,11 @@ async def call_with_cascade(
    user: str,
    min_confidence: float = 0.6,
    max_tokens: int = 6000,
+    model: str = "",
 ) -> dict:
    """Returns {'text': str, 'confidence': float, 'source': str,
-    'cached': bool}."""
-    key = _cache_key(system, user)
+    'cached': bool}. `model` overrides the local Tier-1 (Ollama) model only."""
+    key = _cache_key(system, user, model)
    cached = _cache_get(key)
    if cached:
        cached["cached"] = True
@@ -211,7 +213,7 @@ async def call_with_cascade(
                "or ANTHROPIC_API_KEY to enable fallbacks."
            )
    # Tier 1: local Qwen
-    text = await _call_ollama(system, user, max_tokens=max_tokens)
+    text = await _call_ollama(system, user, max_tokens=max_tokens, model=model)
    conf = _heuristic_confidence(text, input_len)
    if text and conf >= min_confidence:
        out = {"text": text, "confidence": conf,