feat(consent-tester): Phase C+D — LLM cascade fallback (Qwen → OVH)
New module consent-tester/services/cmp_llm_fallback.py:
- LLMCookieExtractor: single-endpoint adapter (Ollama OR OpenAI-compat)
- LLMCascade: tries Qwen (local Mac Mini Ollama) first; falls through to
OVH (managed 120B) when Qwen returns no usable strategy
- LLMCascade.from_env(): reads OLLAMA_URL/CMP_LLM_MODEL + OVH_LLM_URL/
OVH_LLM_KEY/OVH_LLM_MODEL from environment
- LLM returns JSON {strategy: url|selector|text, value: ...}
- Valkey-backed cache per netloc (cmp:hint:<netloc>, 7-day TTL) — next run
against the same domain skips the LLM entirely
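
The fall-through described above can be sketched roughly as follows. This is a minimal illustration, not the code in cmp_llm_fallback.py: `parse_hint` and `run_cascade` are hypothetical names, and each extractor is assumed to be a plain callable that takes a prompt and returns the model's raw text. A reply counts as "usable" here only if it is a JSON object with a known strategy and a non-empty string value, which is an assumption about what the commit means by "no usable strategy".

```python
import json
from typing import Callable, Optional

# Strategies named in the commit message: url | selector | text
VALID_STRATEGIES = {"url", "selector", "text"}


def parse_hint(raw: str) -> Optional[dict]:
    """Return {"strategy", "value"} if raw is a usable hint, else None."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        # Not JSON at all (json.JSONDecodeError is a ValueError subclass).
        return None
    if not isinstance(data, dict):
        return None
    strategy, value = data.get("strategy"), data.get("value")
    if not isinstance(strategy, str) or strategy not in VALID_STRATEGIES:
        return None
    if not isinstance(value, str) or not value.strip():
        return None
    return {"strategy": strategy, "value": value.strip()}


def run_cascade(prompt: str, extractors: list[Callable[[str], str]]) -> Optional[dict]:
    """Ask each extractor in order; return the first usable hint, else None."""
    for ask in extractors:
        try:
            hint = parse_hint(ask(prompt))
        except Exception:
            # Assumption: an endpoint error is treated like an unusable answer.
            continue
        if hint is not None:
            return hint
    return None
```

With a first extractor standing in for Qwen and a second for OVH, `run_cascade(prompt, [qwen, ovh])` only calls the second when the first fails or returns nothing usable.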
dsi_discovery.py:
- Wired network_log collector (URL/status/content-type/size of every JSON
response on the page) — passed to LLM prompt as observation
- After Named CMP (Phase B) + Heuristic (Phase A) both fail AND DOM
< 300 words: invoke LLMCascade.analyze(...)
- _apply_llm_hint executes the LLM's strategy: refetch URL via Playwright
request context, query DOM selector, or use text directly
- Cache HIT path: apply cached hint, only fall back to LLM if cache is stale
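
As a rough sketch of the gating and dispatch described above (not the actual dsi_discovery.py code): `should_invoke_llm` and `apply_hint` are hypothetical names, "DOM < 300 words" is assumed to mean a whitespace-split word count of the page text, and the two callables stand in for the Playwright request-context fetch and the DOM selector query.

```python
from typing import Callable, Optional

MIN_DOM_WORDS = 300  # threshold stated in the commit message


def should_invoke_llm(named_cmp_text: Optional[str],
                      heuristic_text: Optional[str],
                      dom_text: str) -> bool:
    """True only when Phase B and Phase A both found nothing and the DOM is thin."""
    if named_cmp_text or heuristic_text:
        return False
    # Assumption: "words" means whitespace-separated tokens of the DOM text.
    return len(dom_text.split()) < MIN_DOM_WORDS


def apply_hint(hint: dict,
               fetch_url: Callable[[str], Optional[str]],
               query_selector: Callable[[str], Optional[str]]) -> Optional[str]:
    """Execute one LLM hint and return the extracted policy text, if any."""
    strategy, value = hint.get("strategy"), hint.get("value")
    if strategy == "url":
        return fetch_url(value)        # refetch the URL the LLM pointed at
    if strategy == "selector":
        return query_selector(value)   # read text from the DOM node
    if strategy == "text":
        return value                   # the LLM already returned the text
    return None                        # unknown strategy: treat as a miss
```

Passing the I/O steps in as callables keeps the dispatch testable without a browser; the real helper may be structured differently.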
docker-compose.yml:
- consent-tester gets env vars + cmp-data volume (for Phase E)
- All LLM endpoints configurable via env, sensible defaults
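
For illustration, the environment-driven setup might look like the sketch below. The variable names and default values are the ones in the docker-compose.yml hunk of this commit; `Endpoint` and `endpoints_from_env` are hypothetical names, and skipping OVH when OVH_LLM_URL is empty is an assumption, not confirmed behaviour of LLMCascade.from_env().

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Endpoint:
    url: str
    model: str
    api_key: str = ""


def endpoints_from_env(env: Mapping[str, str] = os.environ) -> list[Endpoint]:
    """Build the ordered endpoint list: local Ollama first, OVH second if set."""
    endpoints = [
        Endpoint(
            # Defaults mirror the compose file in this commit.
            url=env.get("OLLAMA_URL") or "http://bp-core-ollama:11434",
            model=env.get("CMP_LLM_MODEL") or "qwen3:30b-a3b",
        )
    ]
    ovh_url = env.get("OVH_LLM_URL", "")
    if ovh_url:
        # Assumption: an empty OVH_LLM_URL (the compose default) disables OVH.
        endpoints.append(
            Endpoint(
                url=ovh_url,
                model=env.get("OVH_LLM_MODEL", ""),
                api_key=env.get("OVH_LLM_KEY", ""),
            )
        )
    return endpoints
```

Taking the mapping as a parameter makes it easy to exercise the defaults with a plain dict instead of the process environment.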
consent-tester/requirements.txt:
- redis>=5.0 (asyncio client, Valkey-compatible)
- httpx>=0.27
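
The per-netloc hint cache could be sketched as below. The key format (cmp:hint:<netloc>) and the 7-day TTL come from the commit message; everything else is an assumption for illustration. `hint_key`, `load_hint`, `store_hint` and `InMemoryClient` are hypothetical names, lowercasing the netloc is an added choice, and the client is assumed to expose async `get(name)` and `set(name, value, ex=...)` in the style of the redis asyncio client. The in-memory stand-in lets the sketch run without a Valkey server.

```python
import asyncio
import json
from typing import Optional
from urllib.parse import urlsplit

HINT_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL from the commit message


def hint_key(page_url: str) -> str:
    """Cache key for a page: cmp:hint:<netloc> (lowercased, an assumption)."""
    return f"cmp:hint:{urlsplit(page_url).netloc.lower()}"


async def load_hint(client, page_url: str) -> Optional[dict]:
    """Return the cached hint for this page's netloc, or None on a miss."""
    raw = await client.get(hint_key(page_url))
    return json.loads(raw) if raw else None


async def store_hint(client, page_url: str, hint: dict) -> None:
    """Cache a hint for this page's netloc with the 7-day expiry."""
    await client.set(hint_key(page_url), json.dumps(hint), ex=HINT_TTL_SECONDS)


class InMemoryClient:
    """Hypothetical stand-in for an async key-value client (no real expiry)."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.ttl: dict = {}

    async def get(self, name: str):
        return self.data.get(name)

    async def set(self, name: str, value: str, ex: Optional[int] = None) -> None:
        self.data[name] = value
        self.ttl[name] = ex


async def demo() -> Optional[dict]:
    client = InMemoryClient()
    await store_hint(client, "https://shop.example.com/a",
                     {"strategy": "selector", "value": "#cookie-policy"})
    # A different path on the same netloc hits the same cache entry.
    return await load_hint(client, "https://shop.example.com/b")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```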
@@ -12,6 +12,7 @@ networks:

 volumes:
   dsms_data:
+  cmp-data: # consent-tester: CMP discovery log + auto-promoted modules

 services:

@@ -256,6 +257,18 @@ services:
     depends_on:
       core-health-check:
         condition: service_completed_successfully
+    environment:
+      # LLM fallback for cookie-policy extraction (Phase C+D of cascade)
+      OLLAMA_URL: "${OLLAMA_URL:-http://bp-core-ollama:11434}"
+      CMP_LLM_MODEL: "${CMP_LLM_MODEL:-qwen3:30b-a3b}"
+      OVH_LLM_URL: "${OVH_LLM_URL:-}"
+      OVH_LLM_KEY: "${OVH_LLM_KEY:-}"
+      OVH_LLM_MODEL: "${OVH_LLM_MODEL:-}"
+      VALKEY_URL: "${VALKEY_URL:-redis://bp-core-valkey:6379}"
+      # CMP discovery log (Phase E auto-promote)
+      CMP_DISCOVERY_DB: "/data/cmp_discoveries.db"
+    volumes:
+      - cmp-data:/data
     healthcheck:
       test: ["CMD", "curl", "-f", "http://127.0.0.1:8094/health"]
       interval: 30s