perf(audit): vendor_llm_extractor + mc_solution_generator nutzen P31 LLM-Cascade
CI / guardrail-integrity (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Beide rufen jetzt llm_cascade.call_with_cascade() statt direkter Qwen/OVH- Aufrufe. Damit: * Cache-Hit auf identische Eingaben (Valkey, 7d TTL) → ~50ms statt 4-6min beim Re-Run derselben Cookie-Doc. * Tiered Cascade automatisch: Qwen → OVH 120B → Anthropic Claude Haiku wenn lower-tier under confidence-threshold. * Confidence-Scoring (JSON-parse + items_per_input_size) entscheidet ob weiter delegiert wird. Fallback auf alte _call_ollama/_call_ovh bleibt bestehen wenn der Cascade-Aufruf scheitert. Erwartete Wirkung beim 2. VW-Lauf: ~10min statt ~25min (Cache-Hit auf identische Cookie-Doc + MC-Solutions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -172,6 +172,21 @@ async def generate_solution(
|
||||
"Liefere die Loesung als JSON."
|
||||
)
|
||||
|
||||
# P31: tiered Cascade (Qwen → OVH → Anthropic) mit Valkey-Cache.
|
||||
try:
|
||||
from compliance.services.llm_cascade import call_with_cascade
|
||||
res = await call_with_cascade(
|
||||
system=_SYSTEM_PROMPT, user=prompt,
|
||||
min_confidence=0.5, max_tokens=600,
|
||||
)
|
||||
parsed = _parse(res.get("text", ""))
|
||||
if parsed:
|
||||
_cache_put(cache_key, parsed)
|
||||
return parsed
|
||||
except Exception:
|
||||
# fall through to legacy direct calls
|
||||
pass
|
||||
|
||||
content = await _call_ollama(prompt)
|
||||
parsed = _parse(content)
|
||||
if not parsed:
|
||||
|
||||
Reference in New Issue
Block a user