breakpilot-compliance/backend-compliance/reference_scenarios/mission_machine_builder.py
Benjamin Admin 98f67e75d9 feat(mission): Customer Mission #1 — the platform as one connected expert system (end-to-end)
Turn the architecture inside-out: instead of refining classes/registries/journeys, force the whole
platform to behave as ONE expert system and run a real consulting project end-to-end — measuring how
often the consultant has to "jump" (special-case glue instead of a clean engine-to-engine handoff). A
Reference Scenario asks "is the knowledge correct?"; a Customer Mission asks "can a customer WORK with
it?". This is the last big architecture test before broad corpus expansion.

- reference_scenarios/mission_machine_builder.py: a synthetic machine builder (ISO9001 + ISMS + CE +
  PLC + remote maintenance + cloud + 80 devs + EU; no real names) asks "what must I do in the next 6
  months?". Runs the REAL engines: Regulatory Map -> Journey selection -> Capability Delta (RS-005) ->
  Roadmap (leverage) -> Playbooks -> Evidence -> Verification -> Completeness, and produces the 6-month
  consulting answer ("the top-5 measures close 9/16 = 56%, starting with the ones that satisfy CRA AND
  MaschinenVO at once").
- Flow-Continuity audit (the actual test): 5 CLEAN, 2 JUMPS, 2 deliberate DEPENDENCIES. The two real
  seams: (1) Scope -> Journey (no `certs x targets -> journeys` selector engine; the data exists in
  transitions.yaml, only the selection is glue); (2) Evidence -> Verification (parked, Vision V2). The
  two dependencies (cert->capability map @Execution, corpus_status curation) are intended ownership
  boundaries, not architecture breaks.
- Finding: the platform carries the WHOLE consulting flow end-to-end. Once the Scope->Journey selector
  exists, the foundation is essentially done — from there the work is knowledge, not architecture.
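The missing Scope -> Journey selector can be made concrete with a small sketch. It is hypothetical. `select_journeys`, `Journey` and the transition record shape (`id`, `source`, `targets`) are assumptions, because the real schema of `knowledge/programs/transitions.yaml` is not shown here. The sketch keeps transitions whose source certificate the company holds and whose targets intersect the applicable regulations. It then ranks convergence journeys (several targets at once) first, which is the choice the mission currently makes by hand.

```python
"""Hypothetical Scope -> Journey selector (sketch only, not repository code).

ASSUMPTION: each transition record has an "id", a "source" certification id and
a list of "targets"; the real schema of transitions.yaml may differ.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    transition_id: str
    source: str
    targets: tuple[str, ...]  # applicable targets this journey reaches


def select_journeys(certs: set[str], applicable: set[str],
                    transitions: list[dict]) -> list[Journey]:
    """Keep transitions that start from a held certificate and reach an applicable target."""
    picked = []
    for t in transitions:
        if t["source"] not in certs:
            continue  # the company does not hold the starting certificate
        hit = tuple(sorted(set(t["targets"]) & applicable))
        if hit:
            picked.append(Journey(t["id"], t["source"], hit))
    # Convergence journeys (more targets at once) first; ties broken by id.
    return sorted(picked, key=lambda j: (-len(j.targets), j.transition_id))


if __name__ == "__main__":
    # Hypothetical example data, not read from transitions.yaml.
    transitions = [
        {"id": "iso27001_to_cra_maschinenvo", "source": "ISO27001", "targets": ["CRA", "MaschinenVO"]},
        {"id": "iso9001_to_maschinenvo", "source": "ISO9001", "targets": ["MaschinenVO"]},
        {"id": "iso27001_to_nis2", "source": "ISO27001", "targets": ["NIS2"]},
    ]
    for j in select_journeys({"ISO27001", "ISO9001"}, {"CRA", "MaschinenVO", "EMV"}, transitions):
        print(j.transition_id, j.targets)
```

Ranking by the number of covered targets mirrors the mission's preference for measures that satisfy CRA and MaschinenVO at once. A real engine might weight targets differently.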

4 end-to-end tests (mission runs, exactly two known jumps, full flow present, no real company names).
check-loc 0. Non-runtime harness -> no deploy (ADR-001).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-28 08:39:26 +02:00
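A small standalone tally can illustrate the flow-continuity audit and the "exactly two known jumps" test mentioned in the commit message. This is a sketch, not the repository's test. The handoff list mirrors the nine `step(...)` calls of the mission script, with labels rendered in English. `tally` and `test_exactly_two_known_jumps` are hypothetical names. The tally reproduces the 5 CLEAN / 2 JUMP / 2 DEPENDENCY split reported above.

```python
"""Standalone sketch of the flow-continuity tally (not the repository's test)."""
from collections import Counter

ALLOWED = {"CLEAN", "JUMP", "DEPENDENCY"}

# Mirrors the nine step(...) calls of the mission script (labels rendered in English).
HANDOFFS = [
    ("Onboarding → Scope", "CLEAN"),
    ("Scope → Journey", "JUMP"),
    ("Journey → Capability Delta", "CLEAN"),
    ("Certificates → Capabilities (Dependency)", "DEPENDENCY"),
    ("Capability Delta → Roadmap", "CLEAN"),
    ("Roadmap → Playbook", "CLEAN"),
    ("Playbook → Evidence", "CLEAN"),
    ("Evidence → Verification", "JUMP"),
    ("Completeness (Dependency)", "DEPENDENCY"),
]


def tally(handoffs):
    """Count handoffs per status and reject unknown statuses early."""
    unknown = [h for h, s in handoffs if s not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown status for: {unknown}")
    return Counter(s for _, s in handoffs)


def test_exactly_two_known_jumps():
    assert tally(HANDOFFS) == Counter(CLEAN=5, JUMP=2, DEPENDENCY=2)
    jumps = [h for h, s in HANDOFFS if s == "JUMP"]
    assert jumps == ["Scope → Journey", "Evidence → Verification"]


if __name__ == "__main__":
    test_exactly_two_known_jumps()
    print(dict(tally(HANDOFFS)))
```

The test asserts the names of the jumps as well as their count. A newly introduced jump therefore fails the test even if another jump is fixed in the same change.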

# ruff: noqa
# mypy: ignore-errors
"""Customer Mission #1 — a full consulting simulation, NOT another architecture artifact.
A Reference Scenario asks "is the knowledge correct?". A Customer Mission asks "can a customer actually
WORK with it?": it forces the whole platform to behave as ONE connected expert system, from the first
question to a prioritised 6-month plan, and MEASURES how often the consultant had to "jump" (special-case
glue instead of a clean engine-to-engine handoff). If Mission #1 runs without jumps, the architecture is
probably done; the remaining work is knowledge, not foundation.
Synthetic machine builder (NO real company names). Runs the REAL engines end-to-end.
Run: cd backend-compliance && PYTHONPATH=. python3 reference_scenarios/mission_machine_builder.py
Not product code; not imported by the app. Non-runtime -> no deploy.
"""
from __future__ import annotations
import os
import yaml
from compliance.profile.canonical import (
    CanonicalProductRegulatoryProfile as P, CanonicalProductType as PT,
    EconomicOperatorRole as Role, CanonicalLifecyclePhase as LP,
)
from compliance.regulatory_map.renderer import render_regulatory_map
from compliance.company import CompanyContext, Certification, CapabilityMappingEntry, build_company_profile
from compliance.reasoning.enums import Confidence
from compliance.transition_reasoning import (
    TransitionContext, TransitionGoal, TargetRequirement, assess_transition, CoverageStatus,
)
from compliance.optimization import roadmap_from_delta, select_within_budget
from compliance.playbook import playbooks_for_plan
from compliance.completeness import assess_completeness
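OUT = []    # Markdown report lines, printed at the end
JUMPS = []  # (handoff, status, note): the flow-continuity audit


def w(s=""):
    """Append one line to the Markdown report."""
    OUT.append(s)


def step(handoff, status, note):
    """Record one engine-to-engine handoff as CLEAN, JUMP or DEPENDENCY."""
    JUMPS.append((handoff, status, note))


_HERE = os.path.dirname(__file__)
_K = os.path.join(_HERE, "..", "knowledge")  # knowledge base (transition patterns, playbooks, ...)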
w('# Customer Mission #1: machine builder, "What must I do in the next 6 months?"')
w("")
w('_NOT a demo, NOT a Reference Scenario: a complete simulation of a consulting project with the REAL engines. It measures how often the consultant has to "jump" (special-case logic instead of a clean engine flow). Synthetic customer, no real names._')
w("")
w("## The customer (synthetic)")
w("> ISO 9001 · ISMS (ISO 27001) · CE process · PLC · remote maintenance · cloud · 80 developers · EU export")
w('> **One question:** "What must I do in the next six months?"')
w("")
# ── 1. Scope: what applies? (regulatory map) ─────────────────────────────
# Synthetic product profile; generates_usage_data=None (unknown) keeps the Data Act question open.
prod = P(name="mb", product_type=PT.MACHINERY, markets=["EU"], economic_operator_role=Role.MANUFACTURER,
         lifecycle_phase=LP.PLACING_ON_MARKET, is_machine=True, is_component=False, has_embedded_software=True,
         connected_to_internet=True, has_remote_access=True, generates_usage_data=None)
rm = render_regulatory_map(prod)
appl = [v.regulation_id for v in rm.applicable_regulations]
unc = [v.regulation_id for v in rm.uncertain_regulations]
w("## 1. Scope: what applies? _(Regulatory Map)_")
w("- **Applies:** %s" % ", ".join(appl))
w("- **Uncertain (follow-up question):** %s" % ", ".join(unc))
w("- **Overlaps:** %s" % ", ".join(ov.overlap_group_id for ov in rm.overlaps))
w("")
step("Onboarding → Scope", "CLEAN", "The Regulatory Map derives CRA/MaschinenVO/EMV from the product profile; RED/DataAct/NIS2 remain uncertain.")
# ── 2. Journey: which transitions? (certs + targets -> transitions) ─────
# the company HAS ISO 27001 + ISO 9001; the product triggers CRA + MaschinenVO.
# THERE IS NO ENGINE that selects the journeys from (certs x targets) — we do it by hand here.
w("## 2. Journey: which transitions? _(from certificates + targets)_")
w("- Holds **ISO 27001 + ISO 9001**; product = connected machine → target **CRA + MaschinenVO**.")
w("- Selected journey: **ISO 27001 → CRA + MaschinenVO** (convergence pattern) plus the QM side ISO 9001 → MaschinenVO.")
w("- ⚠️ The transitions exist as DATA in `knowledge/programs/transitions.yaml`, but **no engine selects them from certificates + targets**; they are selected manually here.")
w("")
step("Scope → Journey", "JUMP", "No selector engine `certs × applicable-targets → journeys`; the journey choice is glue (the data exists in transitions.yaml).")
# Hand-picked convergence pattern (this manual choice is the jump recorded above).
with open(os.path.join(_K, "transition_patterns", "transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml"), encoding="utf-8") as fh:
    CP = yaml.safe_load(fh)
# ── 3. Capability Delta: what is missing? (Company 2A + RS-005) ─────────
have = [a["capability"] for a in CP["likely_covered"]]
cmap = {"ISO27001": CapabilityMappingEntry(capability_ids=have, confidence=Confidence.MEDIUM)}
prof = build_company_profile(CompanyContext(company_id="mb", certifications=[Certification(certification_id="ISO27001")]), cmap)
reqs = [TargetRequirement(capability_id=a["capability"]) for a in CP["likely_covered"]]
reqs += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"))
         for d in CP["delta_requirements"]]
assess = assess_transition(TransitionContext(company_id="mb", target=TransitionGoal(target_id="CRA+MaschinenVO")), reqs, prof)
missing = sorted({c.capability_id for c in assess.coverage if c.status == CoverageStatus.MISSING})
w("## 3. Capability Delta: what is missing? _(Company 2A + RS-005)_")
w("> %s" % assess.summary.headline)
w("- Probably in place (from the ISMS, World 1): %s" % ", ".join(assess.summary.probably_covered[:4]) + " …")
w("- Missing (delta): %d capabilities, e.g. %s …" % (len(missing), ", ".join(missing[:4])))
w("")
step("Journey → Capability Delta", "CLEAN", "assess_transition(company profile, required) → coverage + delta; clean engine handoff.")
step("Certificates → Capabilities (Dependency)", "DEPENDENCY", "The cert→capability map is Execution-owned and injected (mocked here): a deliberate ownership boundary, not an architecture break.")
# ── 4. Roadmap: what first? (Optimization / Leverage) ───────────────────
delta_t = {d["capability"]: d.get("covers_targets", []) for d in CP["delta_requirements"]}
opt = roadmap_from_delta(assess, delta_t)
bud = select_within_budget({m.capability_id: m.covers for m in opt.ranked_measures}, 5)
w("## 4. Roadmap: what first? _(Optimization, largest leverage)_")
w("> %s" % opt.headline)
w("- **Top measures:** %s" % ", ".join("`%s`(%d)" % (m.capability_id, m.leverage) for m in opt.ranked_measures[:5]))
w("")
step("Capability Delta → Roadmap", "CLEAN", "roadmap_from_delta(assessment, covers_targets) → measures ranked by leverage; clean.")
# ── 5. Playbooks: how to implement? ─────────────────────────────────────
kb = {}  # capability_id -> playbook document
_PB = os.path.join(_K, "implementation_playbooks")
for f in sorted(os.listdir(_PB)):
    if f.endswith(".yaml"):
        with open(os.path.join(_PB, f), encoding="utf-8") as fh:
            d = yaml.safe_load(fh)
        kb[d["capability_id"]] = d
pbs = playbooks_for_plan(opt, kb)
have_pb = [p for p in pbs if p.status != "missing"]
w("## 5. Playbooks: how to implement? _(consultant renderer)_")
w("- %d of %d measures have a playbook; %d still need content (machine-safety playbooks delegated to @IACE)." % (len(have_pb), len(pbs), len(pbs) - len(have_pb)))
w("")
step("Roadmap → Playbook", "CLEAN", "playbooks_for_plan(plan, knowledge) → one journey per measure; missing content = honest `missing` stubs.")
# ── 6. Evidence: what must be proven? ───────────────────────────────────
ev = sorted({e for d in CP["delta_requirements"] for e in d.get("expected_evidence", [])})
w("## 6. Evidence: what must be documented? _(expected_evidence)_")
w("- Required evidence (excerpt): %s …" % ", ".join(ev[:6]))
w("")
step("Playbook → Evidence", "CLEAN", "expected_evidence carries through from pattern/playbook: a data field, no break.")
# ── 7. Verification: can I prove it? ───────────────────────────────────
w("## 7. Verification: can I PROVE it?")
w("- ⚠️ **Not built**: the verification layer (evidence × reality → proven) is Vision V2 (parked, task #45).")
w("")
step("Evidence → Verification", "JUMP", "Verification layer missing (deliberately parked, Vision V2 / Requirements Verification Platform).")
# ── 8. Completeness: how certain/complete? ──────────────────────────────
corpus = {r: ("validated" if r in ("CRA", "MaschinenVO", "DataAct") else "unsupported") for r in appl + unc}
rep = assess_completeness(appl + unc, corpus,
                          uncertain=[{"regulation": "DataAct", "deciding_question": "generates_usage_data",
                                      "reason": "generates_usage_data unknown"}])
w("## 8. Completeness: how certain/complete? _(auditable)_")
w("> %s" % rep.completeness_summary)
w("- Open/justified: %s" % ", ".join("`%s`(%s)" % (e.subject, e.resolution) for e in rep.exclusions))
w("")
step("Completeness (Dependency)", "DEPENDENCY", "corpus_status (which regulations are validated) is curated/injected, not derived from the corpus.")
# ── The 6-month answer ──────────────────────────────────────────────────
w("## The 6-month answer (consulting narrative)")
w("")
w('> "As a machine builder you are affected by **CRA + MaschinenVO** (and EMV); RED/Data Act/NIS2 can only be settled after **one follow-up question** (`generates_usage_data`). Your ISMS *probably* covers the information-security side (to be confirmed). **%d measures** remain open. **If you implement the top 5 by regulatory leverage in the next 6 months, you close %d of %d identified requirements (%d%%)**, starting with the measures that satisfy CRA AND MaschinenVO at the same time. Each measure comes with an implementation playbook (or an explicit `missing` stub) and the required evidence; what we could NOT yet assess (EMV/RED/NIS2) we report transparently."' % (
    len(missing), bud.requirements_closed, bud.total_requirements, int(round(bud.coverage_ratio * 100))))
w("")
# ── Flow-continuity audit (the actual test) ─────────────────────────────
clean = sum(1 for _, s, _ in JUMPS if s == "CLEAN")
jumps = sum(1 for _, s, _ in JUMPS if s == "JUMP")
deps = sum(1 for _, s, _ in JUMPS if s == "DEPENDENCY")
w("## Flow-continuity audit: the actual test")
w("")
w("| Handoff | Status | Finding |")
w("|---|---|---|")
for h, s, n in JUMPS:
    icon = {"CLEAN": "✅ clean", "JUMP": "⚠️ JUMP", "DEPENDENCY": "🔌 dependency"}[s]
    w("| %s | %s | %s |" % (h, icon, n))
w("")
w("**%d clean · %d jumps · %d deliberate dependencies.**" % (clean, jumps, deps))
w("")
w("**Finding:** The platform carries the **entire consulting flow** end-to-end, from the customer's question to a prioritised 6-month list of measures with playbooks, evidence and honest completeness reporting. **Exactly TWO real jumps:** (1) **Scope → Journey**: a selector engine `certificates × targets → journeys` is missing (the data exists, only the selection is glue); (2) **Evidence → Verification**: a deliberately parked layer (Vision V2). The two dependencies (cert→capability map @Execution, corpus_status curation) are intended ownership boundaries, not architecture breaks. → **Once the Scope→Journey selector exists, the foundation is essentially done; from then on the work is knowledge, not architecture.**")
w("")
print("\n".join(OUT))
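The 6-month answer reports how many identified requirements the top-5 measures close. Greedy maximum coverage is one plausible way to compute such a figure. This is an assumption, not the implementation of `select_within_budget`. `greedy_within_budget` and `BudgetResult` are hypothetical. Each measure is assumed to map to the set of requirement ids it closes, and the budget is counted in measures. The percentage uses the same rounding as the narrative, `int(round(ratio * 100))`.

```python
"""Hypothetical greedy budget selection (NOT the implementation of select_within_budget)."""
from dataclasses import dataclass, field


@dataclass
class BudgetResult:
    chosen: list = field(default_factory=list)
    requirements_closed: int = 0
    total_requirements: int = 0

    @property
    def coverage_ratio(self) -> float:
        """Share of all identified requirements that the chosen measures close."""
        if not self.total_requirements:
            return 0.0
        return self.requirements_closed / self.total_requirements


def greedy_within_budget(measures: dict, budget: int) -> BudgetResult:
    """Pick up to `budget` measures, each time the one closing the most still-open requirements."""
    reqs = {m: set(r) for m, r in measures.items()}  # ASSUMPTION: measure -> requirement ids
    universe = set().union(*reqs.values())
    closed, chosen = set(), []
    for _ in range(budget):
        # max() keeps the first maximum, so iterating sorted ids breaks ties deterministically.
        best = max(sorted(reqs), key=lambda m: len(reqs[m] - closed), default=None)
        if best is None or not reqs[best] - closed:
            break  # no remaining measure would close a new requirement
        chosen.append(best)
        closed |= reqs[best]
    return BudgetResult(chosen, len(closed), len(universe))


if __name__ == "__main__":
    # Hypothetical measures and requirement ids, for illustration only.
    demo = {
        "vulnerability_handling": {"CRA-2", "CRA-3", "MVO-1"},
        "sbom": {"CRA-1", "CRA-2"},
        "risk_assessment": {"MVO-1", "MVO-2"},
        "secure_update": {"CRA-4", "MVO-3"},
    }
    r = greedy_within_budget(demo, 2)
    print(r.chosen, r.requirements_closed, r.total_requirements, int(round(r.coverage_ratio * 100)))
```

With the figures quoted in the commit message, 9 of 16 closed requirements gives `int(round(9 / 16 * 100))`, i.e. 56. Greedy selection is a standard heuristic for maximum coverage under a cardinality budget and is guaranteed to reach at least a (1 - 1/e) fraction of the optimum. The source does not say whether the real engine uses it.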