feat(consent-tester): Phase E — self-improving CMP library
cmp_discovery_log.py:
- SQLite log at /data/cmp_discoveries.db: every LLM-discovered CMP
pattern is recorded with domain, strategy, value, and sample text
- Auto-promote (user-chosen 'voll automatisch', i.e. fully automatic, mode):
when the LLM returns strategy=url AND the extracted text is >= 800 words,
write a new module /data/auto_cmp/auto_<slug>.py with a derived regex
matcher + reconstruct
- record_discovery() called from dsi_discovery._try_llm_cascade on success
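
A minimal sketch of the logging and eligibility check described above, not the actual contents of cmp_discovery_log.py. The table layout, the helper names (open_log, is_promotable, slugify), the explicit conn parameter, and the 500-character sample cut-off are all assumptions made for illustration; only the keyword names (domain, llm_used, strategy, value, extracted_text) and the strategy=url / 800-word rule come from this commit.

```python
import re
import sqlite3

MIN_WORDS_FOR_PROMOTION = 800  # threshold quoted in the commit message


def open_log(path=":memory:"):
    """Open the discovery log and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS discoveries (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               domain TEXT NOT NULL,
               llm_used TEXT,
               strategy TEXT,
               value TEXT,
               sample_text TEXT,
               word_count INTEGER,
               promoted INTEGER DEFAULT 0
           )"""
    )
    return conn


def is_promotable(strategy, extracted_text):
    """Auto-promote only URL strategies that yielded enough text."""
    return strategy == "url" and len(extracted_text.split()) >= MIN_WORDS_FOR_PROMOTION


def slugify(domain):
    """Turn a domain into a file-name-safe slug (assumed naming scheme)."""
    return re.sub(r"[^a-z0-9]+", "_", domain.lower()).strip("_")


def record_discovery(conn, domain, llm_used, strategy, value, extracted_text):
    """Insert one discovery row; return (row id, whether it is promotable)."""
    promoted = is_promotable(strategy, extracted_text)
    cur = conn.execute(
        "INSERT INTO discoveries "
        "(domain, llm_used, strategy, value, sample_text, word_count, promoted) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            domain,
            llm_used,
            strategy,
            value,
            extracted_text[:500],  # keep only a short sample (assumed limit)
            len(extracted_text.split()),
            int(promoted),
        ),
    )
    conn.commit()
    return cur.lastrowid, promoted
```

Writing the auto_<slug>.py module itself is left out here, since the commit message does not spell out how the regex matcher is derived.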
cmp_library/_registry.py:
- Loads both hand-written modules from services/cmp_library/ AND
auto-promoted modules from /data/auto_cmp/ (CMP_AUTO_DIR env)
- Auto modules use importlib.util.spec_from_file_location, no package
install needed; restart consent-tester to pick up new ones
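
As a rough illustration of the file-based loading described above, the sketch below imports every auto_*.py file from a directory with importlib.util.spec_from_file_location. The function name load_auto_modules and the decision to skip modules that fail to import are assumptions, not necessarily what _registry.py does; the CMP_AUTO_DIR variable and the /data/auto_cmp default come from this commit.

```python
import importlib.util
import os
from pathlib import Path


def load_auto_modules(auto_dir=None):
    """Import each auto_*.py file in auto_dir and return {module name: module}."""
    directory = Path(auto_dir or os.environ.get("CMP_AUTO_DIR", "/data/auto_cmp"))
    modules = {}
    if not directory.is_dir():
        return modules  # nothing promoted yet
    for path in sorted(directory.glob("auto_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # runs the file's top-level code
        except Exception:
            # Assumed policy: one broken generated module must not
            # prevent the rest of the registry from loading.
            continue
        modules[path.stem] = module
    return modules
```

Because the modules are loaded once when this function runs, a file written later is only seen after the next call, which matches the restart requirement noted above.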
dsi_discovery.py:
- _try_llm_cascade now calls record_discovery() on every successful
LLM analysis (cached AND fresh)
main.py:
- GET /cmp-discoveries — admin endpoint listing all logged discoveries
- DELETE /cmp-discoveries/{id} — rollback (unlinks auto_*.py)
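
The rollback behind the DELETE endpoint could look roughly like the sketch below, shown without the web framework. It assumes a discoveries table with id and slug columns and assumes that the log row is deleted together with the file; the commit message only states that the auto_*.py file is unlinked, so rollback_discovery and its schema are hypothetical.

```python
import sqlite3
from pathlib import Path


def rollback_discovery(conn, auto_dir, discovery_id):
    """Remove the auto-promoted module for one discovery and drop its row.

    Returns False when the id is unknown, True otherwise.
    """
    row = conn.execute(
        "SELECT slug FROM discoveries WHERE id = ?", (discovery_id,)
    ).fetchone()
    if row is None:
        return False
    module_path = Path(auto_dir) / f"auto_{row[0]}.py"
    # missing_ok (Python 3.8+) makes the rollback safe to repeat
    # when the file was never written or is already gone.
    module_path.unlink(missing_ok=True)
    conn.execute("DELETE FROM discoveries WHERE id = ?", (discovery_id,))
    conn.commit()
    return True
```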
This closes the self-improving loop: the first encounter with a new CMP
triggers a paid LLM call → the discovery is auto-promoted → all future runs
against the same vendor pattern hit Phase B (Named CMP) in <50ms with no
LLM call.
@@ -836,6 +836,18 @@ async def _try_llm_cascade(
     if wc >= 300:
         await cache_set(netloc, hint)
         logger.info("LLM cached for %s (%s): %d words", netloc, hint.get("_tier"), wc)
+        # Phase E: log discovery + (if eligible) auto-promote to named CMP
+        try:
+            from services.cmp_discovery_log import record_discovery
+            record_discovery(
+                domain=netloc,
+                llm_used=hint.get("_tier", "unknown"),
+                strategy=hint.get("strategy", ""),
+                value=hint.get("value", ""),
+                extracted_text=text,
+            )
+        except Exception as e:
+            logger.debug("CMP discovery log failed: %s", e)
     return text, wc