Files
breakpilot-compliance/backend-compliance/compliance/knowledge_intake/__init__.py
T
Benjamin Admin 07e392913f feat(knowledge-intake): classify a document + assess its impact before extraction
Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents
arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake
classifies a new document (no content extraction) and intersects its signals with an index of the
existing knowledge to emit a Knowledge Package (an impact analysis).

- compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios,
  obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic,
  NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns /
  reference scenarios / (injected) obligations, whether it is a new domain, and a triage level
  (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation.
- ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package ->
  Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review.
- reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high,
  14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section
  lives in _helpers.py to keep generate.py under the 500-LOC budget.
- Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act).

10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 13:58:59 +02:00

24 lines
879 B
Python

"""Knowledge Intake — classify an incoming document and assess its impact on existing knowledge.
The stage BEFORE the parser: no content extraction, only Einordnung. Intersects a document's signals
(regulations + keywords) with an index of the existing knowledge to emit a `KnowledgePackage` — which
capabilities / playbooks / patterns / reference scenarios / obligations it probably touches, whether
it is a new domain, and how much review it warrants. Deterministic, no LLM, no new corpus (freeze v1.0).
"""
from __future__ import annotations
from .engine import assess_document_impact, build_knowledge_index
from .schemas import (
DocumentDescriptor, ImpactLevel, KnowledgeIndex, KnowledgePackage,
)
__all__ = [
"build_knowledge_index",
"assess_document_impact",
"DocumentDescriptor",
"KnowledgeIndex",
"KnowledgePackage",
"ImpactLevel",
]