breakpilot-compliance/backend-compliance/compliance/completeness/schemas.py
Benjamin Admin aa99111a87 feat(completeness): Regulatory Completeness Engine — auditable coverage, not confidence
Phase A½. The move from feature to product development: for every assessment, answer "how sure are
we that this answer is COMPLETE?" — different from confidence. The product never claims full coverage;
it makes its own knowledge state transparent and auditable. Shows what we do NOT know and why.

- compliance/completeness/: assess_completeness(identified, corpus_status, uncertain, assumptions,
  assessed_obligations) -> CompletenessReport. Separates IDENTIFIED from ASSESSED (validated corpus
  AND determined applicability) and justifies every gap. Two kinds of open: corpus gap (future_corpus)
  and applicability uncertainty (query_required + deciding question, e.g. Data Act / generates_usage_data).
- The metric is COUNTS, never a single percentage: "Identifiziert N · bewertet M · offen K ·
  Unsicherheiten U · Begründung ja" (identified N, assessed M, open K, uncertainties U, justification
  yes) + an honest audit statement.
- ADR-007: auditable honesty; phase order A factory -> A½ Completeness -> B new domains; the
  transparency selling point. Deterministic, no LLM; corpus status + obligation count injected.
- reference suite: "Regulatory Completeness" section runs an industrial-dishwasher assessment
  (assessed CRA/MaschinenVO; open EMV/Environmental=future_corpus, Data Act=query_required) and notes
  Environmental flips open->validated automatically once the corpus lands.

11 completeness tests (54 with adjacent modules), mypy --strict clean (15 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 14:16:12 +02:00

63 lines
2.7 KiB
Python

"""Schemas for the Regulatory Completeness Engine — auditable knowledge-coverage, not confidence.
For an assessment it answers "how sure are we that this answer is COMPLETE?" by separating
IDENTIFIED regulations from ASSESSED ones (those in the validated corpus) and listing every open or
excluded domain WITH a reason. The metric is counts, never a single "87%". This is an internal quality
machine: the product never claims full coverage; it makes its own knowledge state transparent.
Deterministic, computed-not-stored, no new meta-model class (freeze v1.0). Python 3.9 compatible.
"""
from __future__ import annotations

from enum import Enum
from typing import List

from pydantic import BaseModel, Field


class CorpusStatus(str, Enum):
    """The maturity of our knowledge corpus for a regulation/domain."""

    VALIDATED = "validated"  # we can fully assess this
    DRAFT = "draft"  # partial / under review
    UNSUPPORTED = "unsupported"  # triggered but no corpus yet
    UNKNOWN = "unknown"  # not in our registry at all


class DomainCoverage(BaseModel):
    regulation: str
    status: CorpusStatus = CorpusStatus.UNKNOWN
    note: str = ""


class Exclusion(BaseModel):
    """A domain/regulation DELIBERATELY not assessed, always with a reason (the heart of the engine)."""

    subject: str
    reason: str
    deciding_question: str = ""  # what would resolve it (if a query)
    resolution: str = "future_corpus"  # query_required | future_corpus | not_applicable


class Assumption(BaseModel):
    key: str
    value: str = ""
    note: str = ""


class CompletenessReport(BaseModel):
    """The auditable coverage report for one assessment: counts + justification, NO single percentage."""

    identified_regulations: List[str] = Field(default_factory=list)
    assessed_regulations: List[str] = Field(default_factory=list)  # in the validated corpus
    open_regulations: List[str] = Field(default_factory=list)  # identified but not validated
    open_corpora: List[str] = Field(default_factory=list)  # missing domains worth building
    coverage: List[DomainCoverage] = Field(default_factory=list)
    assumptions: List[Assumption] = Field(default_factory=list)
    exclusions: List[Exclusion] = Field(default_factory=list)
    uncertainties_count: int = 0
    assessed_obligations: int = 0  # injected (Execution-owned)
    justification_present: bool = False
    completeness_summary: str = ""  # "Identifiziert N · bewertet M · offen K · ..."
    audit_statement: str = ""  # the honest narrative sentence
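
The sketch below shows one way these schemas might be populated. It is a rough illustration under stated assumptions, not the actual assess_completeness implementation. `sketch_assess` and the trimmed `Report` dataclass are hypothetical stand-ins (plain dataclasses, so the example runs without pydantic), the corpus statuses and the deciding question are illustrative values based on the dishwasher example in the commit message, and the "Begründung nein" variant of the summary is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Exclusion:  # stand-in for the pydantic Exclusion model
    subject: str
    reason: str
    deciding_question: str = ""
    resolution: str = "future_corpus"


@dataclass
class Report:  # hypothetical, trimmed stand-in for CompletenessReport
    identified_regulations: List[str] = field(default_factory=list)
    assessed_regulations: List[str] = field(default_factory=list)
    open_regulations: List[str] = field(default_factory=list)
    exclusions: List[Exclusion] = field(default_factory=list)
    uncertainties_count: int = 0
    justification_present: bool = False
    completeness_summary: str = ""


def sketch_assess(
    identified: List[str],
    corpus_status: Dict[str, str],  # regulation -> CorpusStatus value
    uncertain: Dict[str, str],  # regulation -> deciding question
) -> Report:
    """Hypothetical helper: split identified regulations into assessed and open."""
    report = Report(identified_regulations=list(identified))
    for reg in identified:
        status = corpus_status.get(reg, "unknown")
        if reg in uncertain:
            # Applicability uncertainty: open until the deciding question is answered.
            report.open_regulations.append(reg)
            report.uncertainties_count += 1
            report.exclusions.append(Exclusion(
                subject=reg,
                reason="applicability not yet determined",
                deciding_question=uncertain[reg],
                resolution="query_required",
            ))
        elif status == "validated":
            # Validated corpus AND determined applicability: counts as assessed.
            report.assessed_regulations.append(reg)
        else:
            # Corpus gap: open until a validated corpus exists.
            report.open_regulations.append(reg)
            report.exclusions.append(Exclusion(
                subject=reg,
                reason=f"corpus status is '{status}'",
                resolution="future_corpus",
            ))
    # Every open regulation must carry a non-empty reason.
    report.justification_present = (
        len(report.exclusions) == len(report.open_regulations)
        and all(e.reason for e in report.exclusions)
    )
    justified = "ja" if report.justification_present else "nein"  # "nein" is assumed
    report.completeness_summary = (
        f"Identifiziert {len(report.identified_regulations)} · "
        f"bewertet {len(report.assessed_regulations)} · "
        f"offen {len(report.open_regulations)} · "
        f"Unsicherheiten {report.uncertainties_count} · "
        f"Begründung {justified}"
    )
    return report


# Illustrative input modelled on the industrial-dishwasher example.
identified = ["CRA", "MaschinenVO", "EMV", "Environmental", "Data Act"]
corpus = {"CRA": "validated", "MaschinenVO": "validated",
          "EMV": "unsupported", "Environmental": "unsupported"}
uncertain = {"Data Act": "generates_usage_data?"}

report = sketch_assess(identified, corpus, uncertain)
print(report.completeness_summary)

# Once the Environmental corpus is validated, the regulation moves to assessed.
corpus["Environmental"] = "validated"
later = sketch_assess(identified, corpus, uncertain)
print(later.assessed_regulations)
```

Because the report is recomputed from the injected corpus status on every call, nothing has to be migrated when a corpus lands: the same input simply yields one more assessed regulation and one fewer exclusion.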