feat(knowledge-intake): classify a document + assess its impact before extraction
Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake classifies a new document (no content extraction) and intersects its signals with an index of the existing knowledge to emit a Knowledge Package (an impact analysis). - compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios, obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic, NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns / reference scenarios / (injected) obligations, whether it is a new domain, and a triage level (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation. - ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package -> Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review. - reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high, 14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section lives in _helpers.py to keep generate.py under the 500-LOC budget. - Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act). 10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0. Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,23 @@
|
||||
"""Knowledge Intake — classify an incoming document and assess its impact on existing knowledge.
|
||||
|
||||
The stage BEFORE the parser: no content extraction, only Einordnung. Intersects a document's signals
|
||||
(regulations + keywords) with an index of the existing knowledge to emit a `KnowledgePackage` — which
|
||||
capabilities / playbooks / patterns / reference scenarios / obligations it probably touches, whether
|
||||
it is a new domain, and how much review it warrants. Deterministic, no LLM, no new corpus (freeze v1.0).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from .engine import assess_document_impact, build_knowledge_index
|
||||
from .schemas import (
|
||||
DocumentDescriptor, ImpactLevel, KnowledgeIndex, KnowledgePackage,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"build_knowledge_index",
|
||||
"assess_document_impact",
|
||||
"DocumentDescriptor",
|
||||
"KnowledgeIndex",
|
||||
"KnowledgePackage",
|
||||
"ImpactLevel",
|
||||
]
|
||||
Reference in New Issue
Block a user