feat(knowledge-intake): classify a document + assess its impact before extraction
Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake classifies a new document (no content extraction) and intersects its signals with an index of the existing knowledge to emit a Knowledge Package (an impact analysis). - compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios, obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic, NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns / reference scenarios / (injected) obligations, whether it is a new domain, and a triage level (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation. - ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package -> Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review. - reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high, 14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section lives in _helpers.py to keep generate.py under the 500-LOC budget. - Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act). 10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0. Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -46,6 +46,7 @@ import yaml
|
||||
|
||||
from _helpers import ( # noqa: E402 (script-dir module; keeps generate.py under the LOC budget)
|
||||
OUT, ROLLUP, Row, w, coverage_table, reg_map_block, unsupported_block, interp_status,
|
||||
knowledge_intake_section,
|
||||
)
|
||||
|
||||
ISO_MAP = {"ISO27001": CapabilityMappingEntry(
|
||||
@@ -463,6 +464,8 @@ coverage_table([
|
||||
("Draft-Generatoren neue Domänen (Phase A)", "TODO", "Transition-/Reference-Scenario-Drafts"),
|
||||
])
|
||||
|
||||
knowledge_intake_section(os.path.dirname(__file__)) # Knowledge Intake (impact triage) — kept in _helpers for LOC
|
||||
|
||||
# ── Epics + roll-up ───────────────────────────────────────────────────────
|
||||
w("## Gaps → Epics (Backlog — nur erfasst, NICHT implementiert)")
|
||||
w("")
|
||||
|
||||
Reference in New Issue
Block a user