breakpilot-compliance/backend-compliance/compliance/knowledge_production/schemas.py
Benjamin Admin b6cfc0a503 feat(knowledge-production): Playbook Draft Generator — prepare the corpus deterministically
The bottleneck is not content, it is knowledge PRODUCTION. Instead of writing 200 playbooks by
hand, generate drafts deterministically from data the software already owns, then have an expert
review them. Mirrors the legal pipeline (Gesetz [law] -> Parser -> Obligation -> Review) for BreakPilot's
own knowledge: new Capability -> Registry -> Transition Pattern -> Playbook Draft Generator ->
Expert Review -> versioned Playbook.

- compliance/knowledge_production/: generate_playbook_draft(capability, requirement, control_links)
  + drafts_from_pattern(pattern) -> one PlaybookDraft per delta capability. Owned fields (why /
  closes_regulations / expected_evidence / typical_controls) are assembled with per-field provenance;
  the practitioner know-how (tools / process_steps / how_others) is left as an explicit TODO.
- DraftStatus lifecycle (Freigabestatus, i.e. approval status): draft_generated -> in_review -> reviewed -> validated ->
  proven. Deterministic, NO LLM in the core (any model enrichment stays offline/advisory/propose-only).
- ADR-005: extends "the engine does not change, the corpus grows" with "and the corpus is not written
  by hand — it is deterministically prepared, then curated".
- reference suite: "Knowledge Production" section turns the convergence pattern into 12 auto-assembled
  drafts (why/closes/evidence filled, tools/steps TODO) -> review 12 drafts, don't write 12 playbooks.
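
The assembly described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: only the function name, its three parameters, and the split between owned fields and TODO fields come from the commit message. The input shapes (plain dicts with the keys `id`, `why_asked`, `covers_targets`, `expected_evidence`), the provenance labels, and the `DraftSketch` stdlib dataclass standing in for the pydantic model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Practitioner know-how that the generator never fills in.
TODO_FIELDS = ["tools", "process_steps", "how_others"]


@dataclass
class DraftSketch:
    """Stdlib stand-in for PlaybookDraft, reduced to the fields used below."""

    capability_id: str
    why: str = ""
    closes_regulations: List[str] = field(default_factory=list)
    expected_evidence: List[str] = field(default_factory=list)
    typical_controls: List[str] = field(default_factory=list)
    provenance: Dict[str, str] = field(default_factory=dict)
    todo: List[str] = field(default_factory=list)


def generate_playbook_draft(
    capability: Dict[str, Any],
    requirement: Dict[str, Any],
    control_links: List[str],
) -> DraftSketch:
    """Assemble a draft from owned data only: same inputs, same output, no model call."""
    draft = DraftSketch(capability_id=capability["id"])

    # (field, value, source label); the input keys and labels are assumptions.
    owned = [
        ("why", requirement.get("why_asked", ""), "transition_pattern"),
        ("closes_regulations", sorted(set(requirement.get("covers_targets", []))), "leverage"),
        ("expected_evidence", list(requirement.get("expected_evidence", [])), "transition_pattern"),
        ("typical_controls", list(control_links), "execution"),
    ]
    for name, value, source in owned:
        setattr(draft, name, value)
        if value:  # record provenance only for fields that were actually filled
            draft.provenance[name] = source

    draft.todo = list(TODO_FIELDS)
    return draft


# Hypothetical inputs, for illustration only.
draft = generate_playbook_draft(
    {"id": "cap-example"},
    {
        "why_asked": "Audit trail is missing",
        "covers_targets": ["REG-B", "REG-A", "REG-B"],
        "expected_evidence": ["log retention policy"],
    },
    control_links=[],
)
print(draft.provenance)
print(draft.todo)
```

Sorting and de-duplicating the regulation list is one way to keep the output stable across runs; whether the actual generator orders its lists this way is not stated in the commit.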

10 tests (50 with playbook/optimization/transition/company), mypy --strict clean, check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 13:31:31 +02:00

47 lines
2.4 KiB
Python

"""Schemas for Knowledge Production — deterministic draft assembly + lifecycle.
The corpus is no longer written by hand: it is deterministically PREPARED from data the software
already owns (Capability, Transition Pattern, Controls, Evidence, leverage), then curated by an
expert. A `PlaybookDraft` is a machine-assembled skeleton with per-field provenance and an explicit
TODO list of what still needs human (or offline-propose) input. No LLM in the deterministic core.
Python 3.9 compatible (no `|` unions).
"""

from __future__ import annotations

from enum import Enum
from typing import Dict, List

from pydantic import BaseModel, Field


class DraftStatus(str, Enum):
    """Freigabestatus: the knowledge lifecycle from machine draft to proven (mirrors the
    transition-pattern / playbook maturity, with a machine-assembled pre-stage)."""

    DRAFT_GENERATED = "draft_generated"  # machine-assembled, NOT yet expert-touched
    IN_REVIEW = "in_review"  # an expert is curating it
    REVIEWED = "reviewed"  # internally reviewed
    VALIDATED = "validated"  # domain expert confirmed
    PROVEN = "proven"  # confirmed in the field


class PlaybookDraft(BaseModel):
    """A deterministically assembled playbook draft for one capability.

    Owned fields (why / closes_regulations / expected_evidence / typical_controls) are filled from
    existing data with provenance; the practitioner know-how (tools / process_steps / how_others)
    is left as TODO. The expert reviews a draft instead of writing from a blank page.
    """

    capability_id: str
    status: DraftStatus = DraftStatus.DRAFT_GENERATED
    title: str = ""
    why: str = ""  # from the transition pattern (why_asked/missing_because)
    closes_regulations: List[str] = Field(default_factory=list)  # from leverage (covers_targets)
    expected_evidence: List[str] = Field(default_factory=list)  # from the transition pattern
    typical_controls: List[str] = Field(default_factory=list)  # injected from Execution (may be empty)
    provenance: Dict[str, str] = Field(default_factory=dict)  # field -> source it was assembled from
    todo: List[str] = Field(default_factory=list)  # fields the expert/offline-propose must still add
    disclaimer: str = ""  # machine draft, requires expert curation
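
The schema above defines the lifecycle stages but not how a draft moves between them. One possible guard is sketched below, assuming that a draft may only advance one stage at a time in the order the enum lists them. That rule, and the helper names `LIFECYCLE`, `can_transition`, and `advance`, are assumptions for illustration and are not part of this file.

```python
from enum import Enum
from typing import List


class DraftStatus(str, Enum):
    """Copy of the enum from the schema so the sketch is self-contained."""

    DRAFT_GENERATED = "draft_generated"
    IN_REVIEW = "in_review"
    REVIEWED = "reviewed"
    VALIDATED = "validated"
    PROVEN = "proven"


# Enum iteration follows definition order, which matches the documented lifecycle.
LIFECYCLE: List[DraftStatus] = list(DraftStatus)


def can_transition(current: DraftStatus, target: DraftStatus) -> bool:
    """Hypothetical rule: only a single forward step is allowed."""
    return LIFECYCLE.index(target) - LIFECYCLE.index(current) == 1


def advance(current: DraftStatus) -> DraftStatus:
    """Return the next stage, or raise if the draft is already at the final stage."""
    index = LIFECYCLE.index(current)
    if index == len(LIFECYCLE) - 1:
        raise ValueError("no stage after " + current.value)
    return LIFECYCLE[index + 1]


next_status = advance(DraftStatus.DRAFT_GENERATED)
print(next_status.value)
```

A stricter or looser policy (for example allowing a reviewer to send a draft back to `in_review`) would only change `can_transition`; the enum itself stays as defined in the schema.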