07e392913f
Phase A1. The real knowledge production is not writing; it is TARGETED UPDATING: when 20 documents arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake classifies a new document (no content extraction) and intersects its signals with an index of the existing knowledge to emit a Knowledge Package (an impact analysis).

- compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios, obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic, NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns / reference scenarios / (injected) obligations, whether it is a new domain, and a triage level (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation.
- ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package -> Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review.
- reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high, 14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section lives in _helpers.py to keep generate.py under the 500-LOC budget.
- Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act).

10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0. Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
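The regulation-ID normalization named above as a known refinement (CRA vs Cyber Resilience Act) could look roughly like the following. This is a minimal sketch, not code from this commit: the alias table, its entries, and the function name `normalize_regulation_id` are all hypothetical.

```python
from __future__ import annotations

# Hypothetical alias table; the entries are illustrative, not taken from the repository.
_ALIASES: dict[str, str] = {
    "cra": "CRA",
    "cyber resilience act": "CRA",
}


def normalize_regulation_id(raw: str) -> str:
    """Map a free-form regulation name to a canonical ID.

    Whitespace is collapsed and case is ignored for the lookup.
    Names without a known alias are returned stripped but otherwise unchanged.
    """
    key = " ".join(raw.split()).casefold()
    return _ALIASES.get(key, raw.strip())
```

With a step like this applied to both the document descriptor and the index keys, "CRA" and "Cyber Resilience Act" would land on the same index entry instead of being treated as two different regulations.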
24 lines
879 B
Python
"""Knowledge Intake — classify an incoming document and assess its impact on existing knowledge.

The stage BEFORE the parser: no content extraction, only classification. Intersects a document's signals
(regulations + keywords) with an index of the existing knowledge to emit a `KnowledgePackage`: which
capabilities / playbooks / patterns / reference scenarios / obligations it probably touches, whether
it is a new domain, and how much review it warrants. Deterministic, no LLM, no new corpus (freeze v1.0).
"""

from __future__ import annotations

from .engine import assess_document_impact, build_knowledge_index
from .schemas import (
    DocumentDescriptor, ImpactLevel, KnowledgeIndex, KnowledgePackage,
)

__all__ = [
    "build_knowledge_index",
    "assess_document_impact",
    "DocumentDescriptor",
    "KnowledgeIndex",
    "KnowledgePackage",
    "ImpactLevel",
]
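The module docstring describes intersecting a document's signals with an index and deriving a triage level. A self-contained sketch of that idea follows. None of it is the actual `engine` or `schemas` code: the class names (`Impact`, `Descriptor`, `Index`, `Package`), the function `assess`, the decision rules, and the identifier "ENV-GUIDE" are assumptions made for illustration. Only the four level names come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Impact(Enum):
    """Hypothetical triage levels, named after the four in the commit message."""
    HIGH = "high"
    LOW = "low"
    NONE = "none"
    NEW_DOMAIN = "new_domain"


@dataclass(frozen=True)
class Descriptor:
    """Document metadata only; the body text is never read."""
    title: str
    regulations: frozenset[str] = frozenset()
    keywords: frozenset[str] = frozenset()


@dataclass
class Index:
    """Maps each signal to the knowledge items that mention it."""
    by_regulation: dict[str, set[str]] = field(default_factory=dict)
    by_keyword: dict[str, set[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Package:
    impact: Impact
    affected: frozenset[str]


def assess(doc: Descriptor, index: Index) -> Package:
    """Intersect the document's signals with the index and pick a level."""
    reg_hits: set[str] = set()
    for reg in doc.regulations:
        reg_hits |= index.by_regulation.get(reg, set())
    kw_hits: set[str] = set()
    for kw in doc.keywords:
        kw_hits |= index.by_keyword.get(kw, set())

    if reg_hits:
        # A known regulation is touched: assumed to warrant full review.
        return Package(Impact.HIGH, frozenset(reg_hits | kw_hits))
    if doc.regulations:
        # Names a regulation the index has never seen: assumed new domain.
        return Package(Impact.NEW_DOMAIN, frozenset(kw_hits))
    if kw_hits:
        return Package(Impact.LOW, frozenset(kw_hits))
    return Package(Impact.NONE, frozenset())


# Illustrative data loosely modelled on the three example documents.
index = Index(
    by_regulation={"CRA": {"capability:sbom", "playbook:vuln-handling"}},
    by_keyword={"sbom": {"capability:sbom"}},
)
faq = Descriptor("CRA SBOM FAQ", frozenset({"CRA"}), frozenset({"sbom"}))
env = Descriptor("Environmental guidance", frozenset({"ENV-GUIDE"}))
blog = Descriptor("Marketing blog")

print(assess(faq, index).impact.value)   # high
print(assess(env, index).impact.value)   # new_domain
print(assess(blog, index).impact.value)  # none
```

The sketch is deterministic in the same sense as the docstring: the result depends only on set lookups over the descriptor and the index, so the same inputs always yield the same package.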