fix(cascade): give OVH/gpt-oss reasoning headroom so Tier-2 isn't silently dead

gpt-oss-120b is a reasoning model: it spends output tokens on chain-of-thought
before the answer. deep_check called _call_ovh with max_tokens=400, which
length-capped it mid-reasoning -> content=null -> the OVH tier returned nothing
and the cascade always skipped Tier-2. Floor the OVH budget to >=2000, fall back
to reasoning_content when content is null, and raise the client timeout to 90s
for the slower reasoning path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-21 22:50:43 +02:00
parent b9c00574b1
commit 067118b12d
2 changed files with 88 additions and 4 deletions
@@ -142,19 +142,26 @@ async def _call_ovh(system: str, user: str, max_tokens: int = 6000) -> str:
     headers = {"Content-Type": "application/json"}
     if key:
         headers["Authorization"] = f"Bearer {key}"
+    # gpt-oss-120b is a REASONING model: it spends output tokens on
+    # chain-of-thought before emitting the answer. A low cap (e.g. deep_check's
+    # max_tokens=400) makes it hit the length limit mid-reasoning and return
+    # content=null; the whole tier then silently yields nothing. Floor the
+    # budget so the reasoning AND the JSON answer fit.
     payload = {
-        "model": model, "temperature": 0.05, "max_tokens": max_tokens,
+        "model": model, "temperature": 0.05, "max_tokens": max(max_tokens, 2000),
         "messages": [{"role": "system", "content": system},
                      {"role": "user", "content": user}],
         "response_format": {"type": "json_object"},
     }
     try:
-        async with httpx.AsyncClient(timeout=45.0) as c:
+        async with httpx.AsyncClient(timeout=90.0) as c:
             r = await c.post(f"{base.rstrip('/')}/v1/chat/completions",
                              json=payload, headers=headers)
             r.raise_for_status()
-            choice = (r.json().get("choices") or [{}])[0]
-            return (choice.get("message") or {}).get("content", "") or ""
+            msg = (r.json().get("choices") or [{}])[0].get("message") or {}
+            # Answer is normally in content; if the model was length-capped the
+            # JSON can land in reasoning_content instead, so fall back to it.
+            return (msg.get("content") or "") or (msg.get("reasoning_content") or "")
     except Exception as e:
         logger.warning("ovh cascade tier 2 failed: %s", e)
         return ""
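
The two behavioural changes in this hunk (the budget floor and the content fallback) can be exercised without a network call. The following is a minimal sketch, not code from the commit: the helper names `floor_budget`, `extract_answer`, and `looks_like_json_object`, the constant `OVH_MIN_TOKENS`, and the sample response bodies are all hypothetical and exist only for illustration. The extraction expression itself is copied from the diff above.

```python
import json
from typing import Any, Dict

OVH_MIN_TOKENS = 2000  # floor value taken from the diff; the constant name is hypothetical


def floor_budget(max_tokens: int, floor: int = OVH_MIN_TOKENS) -> int:
    """Return the token budget that would be sent: never below the floor."""
    return max(max_tokens, floor)


def extract_answer(body: Dict[str, Any]) -> str:
    """Prefer message.content; fall back to message.reasoning_content."""
    msg = (body.get("choices") or [{}])[0].get("message") or {}
    return (msg.get("content") or "") or (msg.get("reasoning_content") or "")


def looks_like_json_object(text: str) -> bool:
    """Hypothetical guard: True only if text parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except (ValueError, TypeError):
        return False


# Hypothetical response bodies shaped like a chat-completions reply.
capped = {"choices": [{"finish_reason": "length",
                       "message": {"content": None,
                                   "reasoning_content": '{"verdict": "ok"}'}}]}
normal = {"choices": [{"message": {"content": '{"verdict": "fail"}',
                                   "reasoning_content": "thinking..."}}]}
empty = {"choices": []}

print(floor_budget(400))       # deep_check's old cap is raised to the floor
print(floor_budget(6000))      # the default budget passes through unchanged
print(extract_answer(capped))  # fallback path
print(extract_answer(normal))  # normal path
print(repr(extract_answer(empty)))
```

One caveat worth checking against the real endpoint: when a response is length-capped, `reasoning_content` may hold unfinished free-form reasoning rather than the JSON answer, so a caller might want a guard such as `looks_like_json_object` before treating the fallback text as a result.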