From 1a9439d013b0b21753c78e7ace305dd4985e9d1d Mon Sep 17 00:00:00 2001
From: Benjamin Admin
Date: Sat, 27 Jun 2026 18:49:06 +0200
Subject: [PATCH] =?UTF-8?q?feat(programs):=20open=20Domain=20Knowledge=20P?=
 =?UTF-8?q?rogram=20v1=20=E2=80=94=207-stage=20production=20line=20+=20per?=
 =?UTF-8?q?-domain=20KPI?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The real bottleneck is domain MODELLING. Phase B is organized as one program with sub-programs
per domain, each run through the SAME 7-stage production line. No new runtime framework, no new
module (ADR-009, Freeze v1.0) — only program data + a derived reporting view.

- Customer enters by INDUSTRY, not regulation: Industry -> Domain Model -> Requirement Sources ->
  Requirements -> Capabilities -> ... -> Completeness.
- 7-stage checklist identical for every domain (Domain Model / Requirement Sources / Capability
  Registry / Transition Patterns / Playbooks / Reference Scenarios / Completeness) with per-stage
  ownership. README generalized to the framework.
- Each domain lists typical_requirement_sources + typical_certifications -> pre-onboarding
  capability HYPOTHESIS (the ETO insight; feeds Company 2A as inferred, never confirmed).
- Backlog v1 (by customer value): 1 Industrial Automation, 2 Environmental, 3 Automotive,
  4 Medical, 5 Energy. Five domain-definition shells (environmental restructured to the unified
  shape, law-first preserved).
- Per-domain KPI is DERIVED from the real corpus (computed-not-stored; sources modelled /
  transition patterns / playbooks / reference scenarios), NOT a curated number. Reference suite
  renders maturity bars: Industrial Automation 43% (3/7 sources) leads, Environmental 0% (work
  ahead). Backlog (value) and KPI (corpus state) are deliberately separated.
- ADR-009: Domain Knowledge Program framework. Honest known refinement: regulation-ID
  normalization (CRA vs Cyber Resilience Act) aliased in the KPI.
7 program-contract tests (backlog order + industry-first + derived-not-stored), check-loc 0.
Knowledge data + ADR + reference harness = non-runtime -> no deploy (ADR-001).

Co-Authored-By: Claude Opus 4.7
---
 .../knowledge/programs/README.md              | 87 ++++++++++---------
 .../knowledge/programs/automotive.yaml        | 19 ++++
 .../knowledge/programs/energy.yaml            | 18 ++++
 .../knowledge/programs/environmental.yaml     | 74 ++++++----------
 .../programs/industrial_automation.yaml       | 22 +++++
 .../knowledge/programs/medical.yaml           | 18 ++++
 .../reference_scenarios/_helpers.py           | 70 ++++++++++-----
 .../reference_scenario_suite_v1.md            | 30 +++----
 .../tests/test_domain_programs.py             | 80 ++++++++++-------
 .../adr/ADR-009-domain-knowledge-program.md   | 53 +++++++++++
 10 files changed, 312 insertions(+), 159 deletions(-)
 create mode 100644 backend-compliance/knowledge/programs/automotive.yaml
 create mode 100644 backend-compliance/knowledge/programs/energy.yaml
 create mode 100644 backend-compliance/knowledge/programs/industrial_automation.yaml
 create mode 100644 backend-compliance/knowledge/programs/medical.yaml
 create mode 100644 docs-src/architecture/adr/ADR-009-domain-knowledge-program.md

diff --git a/backend-compliance/knowledge/programs/README.md b/backend-compliance/knowledge/programs/README.md
index 4d0cd003..39dfdc60 100644
--- a/backend-compliance/knowledge/programs/README.md
+++ b/backend-compliance/knowledge/programs/README.md
@@ -1,51 +1,58 @@
-# Domain Knowledge Programs — from architecture to domains
+# Domain Knowledge Program — the production line for every domain
 
-**The architecture is stable. From here the value comes from DOMAINS, not more software.**
-The runtime architecture (scope, regulatory map, capability delta, optimization, playbooks, intake,
-production, completeness) is frozen. A new regulatory domain is a **data + knowledge** project that
-extends every existing view automatically — no new runtime framework (see ADR-008, Freeze v1.0).
+**The architecture is stable. From here the value comes from DOMAIN MODELLING, not more software.**
+The real bottleneck is no longer architecture or controls or even „knowledge" — it is **domain
+modelling**. Phase B is therefore organised as ONE program with sub-programs per domain, each run
+through the SAME production line. No new runtime framework (ADR-008/009, Freeze v1.0).
 
-## The production line (reusable for EVERY domain)
+## The customer enters by INDUSTRY, not by regulation
+
+A customer never says „explain ISO 9001". They say „I build packaging machines" / „I'm an automotive
+supplier" / „I build parking systems". So the pipeline starts at the industry:
 
 ```
-Regulatory Corpus → Obligations → Capabilities → Transition Patterns → Playbooks → Reference Scenarios → Completeness
+Industry → Domain Model → Requirement Sources → Requirements → Capabilities → … → Reality / Verification
 ```
 
-Each domain delivers the SAME artifacts. Once they land, the existing engines automatically extend:
-Scope · Gap · Capability Delta · Optimization · Playbooks · Reference Scenarios · Regulatory Completeness.
+## The 7-stage checklist (identical for EVERY domain)
 
-## Law-first (a deliberate ordering)
-
-Start from the **law**, not from a management system. A management system (e.g. ISO 14001) is NOT
-the domain — it is one possible source state. The customer asks *„welche Anforderungen gelten für mein
-Produkt?"*, not *„wie komme ich von ISO 14001 weg?"*. So the order is:
-
-```
-Recht → Obligations → Capabilities → (Managementsystem) → Delta
-```
-
-A management-system→corpus transition pattern is built LAST, once BOTH sides are known.
-
-## Ownership per stage (coordinate via the board, do not duplicate)
-
-| Stage | Artifact | Owner |
+| # | Stage | Owner |
 |---|---|---|
-| B1 Corpus + Obligations | legal sources, obligations registry | **Legal Knowledge / Obligation Registry** |
-| B2 Capability Model | new capabilities in the Master Capability Registry | **Compliance Execution** |
-| B3 Transition Patterns | source-state → corpus delta patterns | **Reasoning (Knowledge Acquisition)** |
-| Playbooks | implementation playbooks per capability | **Reasoning** |
-| Reference Scenarios | canonical regression + expected outcomes | **Reasoning** |
-| Completeness | corpus-status registry per domain | **Reasoning / curation** |
+| 1 | **Domain Model** (industry → what world is this?) | Reasoning / curation |
+| 2 | **Requirement Sources** (which regulations/standards/specs apply) | Legal Knowledge |
+| 3 | **Capability Registry** (capabilities the sources require) | Compliance Execution |
+| 4 | **Transition Patterns** (source-state → domain delta) | Reasoning |
+| 5 | **Playbooks** (how to implement each capability) | Reasoning |
+| 6 | **Reference Scenarios** (canonical regression + expected outcomes) | Reasoning |
+| 7 | **Completeness** (auditable coverage per domain) | Reasoning / curation |
 
-## Programs (planned)
+This is the scaling mechanism: every new domain reuses the same production line; the existing engines
+(Scope, Gap, Capability Delta, Optimization, Playbooks, Reference, Completeness) extend automatically.
 
-| Program | File | Status |
-|---|---|---|
-| Environmental Knowledge Program | `environmental.yaml` | started (B1 handed off) |
-| Automotive Knowledge Program | _planned_ | — |
-| OT / IEC 62443 Knowledge Program | _planned_ | — |
-| Functional Safety Knowledge Program | _planned_ | — |
+## A domain knows its typical sources → pre-onboarding HYPOTHESIS (the ETO insight)
 
-Each program is a machine-readable definition (`*.yaml`) consumed by the reference suite to track its
-progress; future sessions flip stage `status` as artifacts land, and the Completeness engine reports
-the domain flipping `unsupported → validated` automatically.
+Each domain definition lists `typical_requirement_sources` and `typical_certifications`. So before
+onboarding, BreakPilot can say „this process world is *probably* present" — as a **hypothesis, not a
+truth**. We don't want to know whether an automotive supplier has ISO 9001 (everyone does); we want
+to know **which company capabilities are therefore probably already present** (feeds Company 2A as
+`inferred`, never `confirmed`).
+
+## Per-domain KPI — reproducible, not marketing
+
+Progress per domain is **derived from the Regulatory Completeness Engine + the actual corpus**
+(computed-not-stored): identified requirement sources · modelled capabilities · transition patterns ·
+playbooks · passed reference scenarios · consciously declared corpus gaps. Rendered as a bar
+(`Industrial ███████░░░ 70 %`). These are reproducible quality metrics — no curated numbers.
+
+## Domain Knowledge Program v1 — backlog (by current customer value)
+
+| Rank | Domain | File | Typical sources |
+|---|---|---|---|
+| 1 | **Industrial Automation** | `industrial_automation.yaml` | CRA · MaschinenVO · EMV · RED · Data Act · IEC 62443 · NIS2 |
+| 2 | Environmental | `environmental.yaml` | Wasser · Chemikalien · Luft · Energie · Abfall · Produktverantwortung |
+| 3 | Automotive | `automotive.yaml` | IATF · TISAX · UNECE R155/R156 · ASPICE · OEM-Lastenhefte |
+| 4 | Medical | `medical.yaml` | MDR · IEC 62304 · ISO 14971 |
+| 5 | Energy | `energy.yaml` | je nach Zielmarkt |
+
+The work shifts decisively from software development to knowledge production; the competitive
+advantage now comes from the quality and breadth of the modelled domains.
diff --git a/backend-compliance/knowledge/programs/automotive.yaml b/backend-compliance/knowledge/programs/automotive.yaml
new file mode 100644
index 00000000..3305f4d3
--- /dev/null
+++ b/backend-compliance/knowledge/programs/automotive.yaml
@@ -0,0 +1,19 @@
+# Domain Knowledge Program — Automotive (backlog rank 3). A domain DEFINITION, not corpus content.
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
+
+id: PROG-automotive
+name: "Automotive Domain"
+industry: "Automobilzulieferer, OEM-Zulieferkette"
+customer_entry: "Ich bin Automobilzulieferer."
+backlog_rank: 3
+rationale: "großer Markt; OEM-Lastenhefte = früher Business-Requirement-Anwendungsfall."
+status: planned
+
+typical_requirement_sources: [IATF16949, TISAX, "UNECE R155", "UNECE R156", ASPICE, OEM_Lastenheft]
+typical_certifications: [ISO9001, IATF16949, TISAX, ISO27001]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  ISMS→TISAX-Transition-Pattern existiert bereits (Vorarbeit). UNECE R155 (Cybersecurity Management
+  System) ↔ CRA = quellenübergreifender Convergence-Kandidat. OEM-Lastenheft = erster Business
+  Requirement (siehe Vision V2 / Requirements Verification, NICHT jetzt).
diff --git a/backend-compliance/knowledge/programs/energy.yaml b/backend-compliance/knowledge/programs/energy.yaml
new file mode 100644
index 00000000..247d0570
--- /dev/null
+++ b/backend-compliance/knowledge/programs/energy.yaml
@@ -0,0 +1,18 @@
+# Domain Knowledge Program — Energy (backlog rank 5). A domain DEFINITION, not corpus content.
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
+
+id: PROG-energy
+name: "Energy Domain"
+industry: "Energieerzeugung/-verteilung, Anlagen kritischer Infrastruktur"
+customer_entry: "Ich baue Anlagen für Energieerzeugung / kritische Infrastruktur."
+backlog_rank: 5
+rationale: "Zielmarkt-abhängig; nach den klareren Industrie-/Produkt-Domänen."
+status: planned
+
+typical_requirement_sources: [NIS2, IEC62443, CRA, "netzcode/marktabhängig"]
+typical_certifications: [ISO27001, IEC62443]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  Stark zielmarkt-abhängig (Netzcodes, nationale Vorgaben). NIS2/IEC62443 teilen sich Capabilities
+  mit Industrial Automation → Wiederverwendung wahrscheinlich hoch.
diff --git a/backend-compliance/knowledge/programs/environmental.yaml b/backend-compliance/knowledge/programs/environmental.yaml
index 19209419..02ffdb66 100644
--- a/backend-compliance/knowledge/programs/environmental.yaml
+++ b/backend-compliance/knowledge/programs/environmental.yaml
@@ -1,57 +1,33 @@
-# Environmental Knowledge Program — a regulatory DOMAIN, not an ISO-14001 project.
-# Machine-readable program definition consumed by the reference suite to track progress.
+# Domain Knowledge Program — Environmental (backlog rank 2). A domain DEFINITION, not corpus content.
 # LAW-FIRST: Umweltrecht -> Obligations -> Capabilities -> ISO 14001 -> Delta (never the reverse).
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
 
 id: PROG-environmental
-name: "Environmental Knowledge Program"
-customer_question: "Welche Umweltanforderungen gelten für mein Produkt (z. B. Industriespülmaschine)?"
-status: started  # planned | started | in_progress | complete
+name: "Environmental Domain"
+industry: "Hersteller mit Umweltpflichten (z. B. Industriespülmaschinen, Anlagenbau)"
+customer_entry: "Welche Umweltanforderungen gelten für mein Produkt (z. B. Industriespülmaschine)?"
+backlog_rank: 2
+rationale: "konkreter Kundenbezug (Abwasser/Chemikalien) — direkt nach Industrial Automation."
+status: started
 
 principle: >
-  ISO 14001 ist KEIN Umweltrecht, sondern ein Managementsystem. Wir starten beim Recht und fragen
-  erst danach, welche vorhandenen Managementsysteme davon wahrscheinlich schon etwas abdecken.
+  ISO 14001 ist KEIN Umweltrecht, sondern ein Managementsystem (= ein Quellzustand). LAW-FIRST:
+  erst das Recht, dann welche vorhandenen Managementsysteme davon wahrscheinlich schon etwas abdecken.
 
-# the reusable production line, instantiated for this domain
-blueprint: [corpus, obligations, capabilities, transition_patterns, playbooks, reference_scenarios, completeness]
+# Stage 2 — the requirement-source areas of this domain (each becomes laws/obligations at Stage 2-3)
+typical_requirement_sources: [water, chemicals, emissions, energy, waste, product_responsibility]
+typical_certifications: [ISO14001, ISO9001]  # pre-onboarding capability HYPOTHESIS (nicht Wahrheit)
 
-stages:
-  - id: B1
-    name: "Environmental Regulatory Corpus"
-    owner: "Legal Knowledge / Obligation Registry"
-    status: open  # handed off — not built here
-    note: "Zunächst NUR Rechtsquellen + Pflichten (noch keine ISO, keine Capabilities)."
-    areas:  # the six environmental obligation areas the customer actually faces
-      - water  # Wasser / Abwasser
-      - chemicals  # Chemikalien (REACH/CLP)
-      - emissions  # Emissionen
-      - energy  # Energie
-      - waste  # Abfall
-      - product_responsibility  # Produktverantwortung
+# Reasoning capabilities to be modelled (Stage 3, @Execution) once the corpus lands
+target_capabilities:
+  - chemical_management
+  - wastewater_management
+  - emissions_monitoring
+  - hazardous_substance_management
+  - energy_data_capture
+  - environmental_incident_management
 
-  - id: B2
-    name: "Environmental Capability Model"
-    owner: "Compliance Execution"
-    status: open  # depends on B1; Registry grows here
-    depends_on: [B1]
-    capabilities:
-      - chemical_management
-      - wastewater_management
-      - emissions_monitoring
-      - hazardous_substance_management
-      - energy_data_capture
-      - environmental_incident_management
-
-  - id: B3
-    name: "Transition Patterns (ISO 14001 -> Environmental Corpus)"
-    owner: "Reasoning (Knowledge Acquisition)"
-    status: blocked  # LAST — only once both sides (corpus + capabilities) are known
-    depends_on: [B1, B2]
-    note: "Erst jetzt sinnvoll: ISO 14001 als Quellzustand gegen den Umwelt-Korpus (User: law-first)."
-
-# once B1-B3 land these extend AUTOMATICALLY via the existing engines (no new runtime architecture)
-downstream_auto: [playbooks, reference_scenarios, optimization, scope, capability_delta, completeness]
-
-acceptance: >
-  Regulatory Completeness kippt `Environmental` von unsupported/open auf assessed; die sechs Bereiche
-  sind als Obligations + Capabilities im validierten Korpus, das ISO-14001-Delta als Transition Pattern.
-  Messbar an der Reference Suite (Environmental-Zelle UNSUPPORTED -> PASS).
+ownership: "Stufe 1 Reasoning · 2 Legal-KG (HANDOFF, nur Recht+Pflichten) · 3 Execution (HANDOFF) · 4-7 Reasoning"
+note: >
+  B3 (ISO 14001 -> Korpus-Transition) entsteht ZULETZT, erst wenn Recht + Capabilities bekannt sind.
+  Acceptance: Regulatory Completeness kippt `Environmental` von unsupported/open auf assessed.
diff --git a/backend-compliance/knowledge/programs/industrial_automation.yaml b/backend-compliance/knowledge/programs/industrial_automation.yaml
new file mode 100644
index 00000000..9c8a7110
--- /dev/null
+++ b/backend-compliance/knowledge/programs/industrial_automation.yaml
@@ -0,0 +1,22 @@
+# Domain Knowledge Program — Industrial Automation (backlog rank 1, highest current customer value).
+# A domain DEFINITION, not corpus content. The 7-stage progress is DERIVED from the corpus by the
+# reference suite (computed-not-stored), never stored here. See programs/README.md for the checklist.
+
+id: PROG-industrial-automation
+name: "Industrial Automation Domain"
+industry: "Maschinen-/Anlagenbau, Industrieautomation, Parksysteme, Verpackungsmaschinen"
+customer_entry: "Ich baue Verpackungsmaschinen / Parksysteme / Industrieanlagen."
+backlog_rank: 1
+rationale: "höchster aktueller Kundennutzen — bereits am weitesten modelliert (CRA + MaschinenVO)."
+status: in_progress
+
+# Stage 2 — the requirement sources that typically apply to this industry
+typical_requirement_sources: [CRA, MaschinenVO, EMV, RED, DataAct, IEC62443, NIS2]
+
+# pre-onboarding capability HYPOTHESIS (nicht Wahrheit, vgl. ETO): feeds Company 2A as `inferred`
+typical_certifications: [ISO9001, ISO27001]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  Diese Domäne hat den Vorlauf: CRA + MaschinenVO sind als Convergence-Pattern, RTS und Playbooks
+  bereits (teilweise) im Korpus. EMV/RED/IEC62443/NIS2 sind identifiziert, aber noch nicht modelliert.
diff --git a/backend-compliance/knowledge/programs/medical.yaml b/backend-compliance/knowledge/programs/medical.yaml
new file mode 100644
index 00000000..6d85eebe
--- /dev/null
+++ b/backend-compliance/knowledge/programs/medical.yaml
@@ -0,0 +1,18 @@
+# Domain Knowledge Program — Medical (backlog rank 4). A domain DEFINITION, not corpus content.
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
+
+id: PROG-medical
+name: "Medical Domain"
+industry: "Medizinprodukte-Hersteller, Medizintechnik"
+customer_entry: "Ich baue Medizinprodukte / Medizintechnik."
+backlog_rank: 4
+rationale: "hoher Leidensdruck (MDR), aber spezialisierter Markt → nach Industrial/Automotive."
+status: planned
+
+typical_requirement_sources: [MDR, IEC62304, ISO14971, IEC60601, CRA]
+typical_certifications: [ISO13485, ISO14971]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  IEC 62304 (Software-Lebenszyklus) ↔ CRA secure-development = quellenübergreifender Convergence-
+  Kandidat. ISO 14971 (Risikomanagement) ↔ Produkt-Risikoanalyse. Erst nach Industrial/Automotive.
diff --git a/backend-compliance/reference_scenarios/_helpers.py b/backend-compliance/reference_scenarios/_helpers.py
index c0744466..86d813a0 100644
--- a/backend-compliance/reference_scenarios/_helpers.py
+++ b/backend-compliance/reference_scenarios/_helpers.py
@@ -142,35 +142,61 @@ def completeness_section() -> None:
 
 
 def domain_programs_section(base_dir) -> None:
-    """Render the Domain Knowledge Programs section (kept here so generate.py stays under the LOC budget)."""
+    """Domain Knowledge Program v1 — per-domain maturity KPI DERIVED from the corpus (computed-not-stored)."""
     import os
     import yaml
-    from compliance.completeness import assess_completeness
+    from compliance.knowledge_intake import build_knowledge_index
 
+    def _load(sub):
+        d = os.path.join(base_dir, "..", "knowledge", sub)
+        return [yaml.safe_load(open(os.path.join(d, f), encoding="utf-8"))
+                for f in sorted(os.listdir(d)) if f.endswith(".yaml")]
+
+    idx = build_knowledge_index(_load("transition_patterns"), _load("implementation_playbooks"),
+                                _load("reference_transition_scenarios"))
     pdir = os.path.join(base_dir, "..", "knowledge", "programs")
-    progs = [yaml.safe_load(open(os.path.join(pdir, f), encoding="utf-8"))
-             for f in sorted(os.listdir(pdir)) if f.endswith(".yaml")]
-    w("## Domain Knowledge Programs — ab jetzt Domänen, nicht Architektur")
+    progs = sorted((yaml.safe_load(open(os.path.join(pdir, f), encoding="utf-8"))
+                    for f in sorted(os.listdir(pdir)) if f.endswith(".yaml")), key=lambda p: p.get("backlog_rank", 99))
+
+    _ALIAS = {"cyber resilience act": "cra", "maschinenverordnung": "maschinenvo", "iatf": "iatf16949"}
+
+    def _canon(r):
+        k = str(r).strip().lower()
+        return _ALIAS.get(k, k)
+
+    def _hits(reg_lists, src):
+        cs = {_canon(s) for s in src}
+        return [k for k, regs in reg_lists.items() if cs & {_canon(x) for x in regs}]
+
+    def _source_modeled(index, source, canon):
+        c = canon(source)
+        in_tp = any(c in {canon(x) for x in regs} for regs in index.transition_patterns.values())
+        in_rts = any(c in {canon(x) for x in regs} for regs in index.reference_scenarios.values())
+        in_pb = any(c in {canon(x) for x in index.capability_regulations.get(cap, [])} for cap in index.playbook_capabilities)
+        return in_tp or in_rts or in_pb
+
+    w("## Domain Knowledge Program v1 — Reifegrad je Domäne (reproduzierbarer KPI)")
     w("")
-    w('_Die Runtime-Architektur ist eingefroren. Eine neue Domäne = Daten + Wissen, die jede Sicht automatisch erweitern. Produktionsstraße: Corpus→Obligations→Capabilities→Transition→Playbooks→Reference→Completeness. **Law-first: Recht → Pflichten → Capabilities → Managementsystem → Delta.**_')
+    w('_Engpass = Domänenmodellierung. Jede Domäne läuft durch DIESELBE 7-Stufen-Produktionsstraße (Domain Model → Requirement Sources → Capability Registry → Transition Patterns → Playbooks → Reference Scenarios → Completeness). Reifegrad aus dem ECHTEN Korpus abgeleitet (computed-not-stored), keine Marketingzahl. Einstieg über Industry, nicht Regelwerk._')
     w("")
+    w("| Rank | Domäne | Reifegrad (Sources modelliert) | modelliert/total | Korpus TP·PB·RTS |")
+    w("|---|---|---|---|---|")
     for p in progs:
-        w("**%s** — _%s_ (status: `%s`)" % (p["name"], p["customer_question"], p["status"]))
-        w("")
-        w("| Stufe | Artefakt | Owner | Status |")
-        w("|---|---|---|---|")
-        for s in p.get("stages", []):
-            w("| %s | %s | %s | **%s** |" % (s["id"], s["name"], s["owner"], s["status"]))
-        w("")
-        areas = next((s.get("areas", []) for s in p.get("stages", []) if s.get("id") == "B1"), [])
-        if areas:
-            rep = assess_completeness(identified_regulations=areas, corpus_status={})  # all unknown -> open baseline
-            w("- **Baseline (Completeness):** %s — die 6 Bereiche: %s" % (rep.completeness_summary, ", ".join(areas)))
-        w("")
-    w("_Jedes Programm liefert dieselben Artefakte; Status `open/blocked` kippt automatisch, wenn die Stufen landen — Reference Suite + Completeness dokumentieren den Fortschritt je Domäne._")
+        src = p.get("typical_requirement_sources", [])
+        tp, rts = _hits(idx.transition_patterns, src), _hits(idx.reference_scenarios, src)
+        cs = {_canon(s) for s in src}
+        pb = [c for c in idx.playbook_capabilities if cs & {_canon(x) for x in idx.capability_regulations.get(c, [])}]
+        modeled = [s for s in src if _source_modeled(idx, s, _canon)]  # sources with >=1 corpus artifact
+        breadth = (len(modeled) / len(src)) if src else 0.0  # honest differentiator (not CRA-shared depth)
+        filled = int(round(breadth * 10))
+        w("| %d | **%s** | `%s` %d%% | %d/%d | %d·%d·%d |" % (
+            p.get("backlog_rank", 99), p["name"], "█" * filled + "░" * (10 - filled),
+            int(round(breadth * 100)), len(modeled), len(src), len(tp), len(pb), len(rts)))
+    w("")
+    w('_Industry-Einstieg + ETO-Hypothese: jede Domäne kennt ihre typischen Sources + Zertifikate → vor dem Onboarding „diese Prozesswelt ist wahrscheinlich vorhanden" (Hypothese, nie Wahrheit; speist Company 2A als `inferred`). Backlog nach Kundennutzen, KPI nach echtem Korpusstand — beides bewusst getrennt._')
     w("")
     coverage_table([
-        ("Domain Program Blueprint (wiederverwendbar)", "PASS", "Corpus→…→Completeness, law-first, Ownership je Stufe"),
-        ("Environmental Program (Daten)", "PASS", "B1@Legal-KG · B2@Execution · B3@Reasoning (blocked)"),
-        ("Phase B = Domänen, keine Architektur", "PASS", "kein neues Runtime-Framework (Freeze, ADR-008)"),
+        ("Domain Knowledge Program (7-Stufen-Produktionsstraße)", "PASS", "%d Domänen im Backlog, Industrial Automation #1" % len(progs)),
+        ("Reifegrad-KPI (computed-not-stored)", "PASS", "aus echtem Korpus abgeleitet (TP/PB/RTS je Domäne)"),
+        ("Regelwerk-ID-Normalisierung", "TODO", "Alias CRA/MaschinenVO im KPI — kanonische IDs ausstehend"),
     ])
diff --git a/backend-compliance/reference_scenarios/reference_scenario_suite_v1.md b/backend-compliance/reference_scenarios/reference_scenario_suite_v1.md
index 3b8e7b52..c9411f64 100644
--- a/backend-compliance/reference_scenarios/reference_scenario_suite_v1.md
+++ b/backend-compliance/reference_scenarios/reference_scenario_suite_v1.md
@@ -365,29 +365,27 @@ _Sobald der Umwelt-Korpus (ISO 14001 etc.) landet, kippt `Environmental` automat
 | Begründete Ausschlüsse (Korpus/Anwendbarkeit) | **PASS** | 3 Ausschlüsse, alle mit Grund |
 | Fortschritts-Doku je Domäne | **PASS** | Environmental offen→validated bei Korpus-Landung |
 
-## Domain Knowledge Programs — ab jetzt Domänen, nicht Architektur
+## Domain Knowledge Program v1 — Reifegrad je Domäne (reproduzierbarer KPI)
 
-_Die Runtime-Architektur ist eingefroren. Eine neue Domäne = Daten + Wissen, die jede Sicht automatisch erweitern. Produktionsstraße: Corpus→Obligations→Capabilities→Transition→Playbooks→Reference→Completeness. **Law-first: Recht → Pflichten → Capabilities → Managementsystem → Delta.**_
+_Engpass = Domänenmodellierung. Jede Domäne läuft durch DIESELBE 7-Stufen-Produktionsstraße (Domain Model → Requirement Sources → Capability Registry → Transition Patterns → Playbooks → Reference Scenarios → Completeness). Reifegrad aus dem ECHTEN Korpus abgeleitet (computed-not-stored), keine Marketingzahl. Einstieg über Industry, nicht Regelwerk._
 
-**Environmental Knowledge Program** — _Welche Umweltanforderungen gelten für mein Produkt (z. B. Industriespülmaschine)?_ (status: `started`)
+| Rank | Domäne | Reifegrad (Sources modelliert) | modelliert/total | Korpus TP·PB·RTS |
+|---|---|---|---|---|
+| 1 | **Industrial Automation Domain** | `████░░░░░░` 43% | 3/7 | 3·2·3 |
+| 2 | **Environmental Domain** | `░░░░░░░░░░` 0% | 0/6 | 0·0·0 |
+| 3 | **Automotive Domain** | `██░░░░░░░░` 17% | 1/6 | 1·0·0 |
+| 4 | **Medical Domain** | `██░░░░░░░░` 20% | 1/5 | 3·2·3 |
+| 5 | **Energy Domain** | `██░░░░░░░░` 25% | 1/4 | 3·2·3 |
 
-| Stufe | Artefakt | Owner | Status |
-|---|---|---|---|
-| B1 | Environmental Regulatory Corpus | Legal Knowledge / Obligation Registry | **open** |
-| B2 | Environmental Capability Model | Compliance Execution | **open** |
-| B3 | Transition Patterns (ISO 14001 -> Environmental Corpus) | Reasoning (Knowledge Acquisition) | **blocked** |
-
-- **Baseline (Completeness):** Identifiziert 6 · bewertet 0 · offen 6 · Unsicherheiten 0 · Begründung ja — die 6 Bereiche: water, chemicals, emissions, energy, waste, product_responsibility
-
-_Jedes Programm liefert dieselben Artefakte; Status `open/blocked` kippt automatisch, wenn die Stufen landen — Reference Suite + Completeness dokumentieren den Fortschritt je Domäne._
+_Industry-Einstieg + ETO-Hypothese: jede Domäne kennt ihre typischen Sources + Zertifikate → vor dem Onboarding „diese Prozesswelt ist wahrscheinlich vorhanden" (Hypothese, nie Wahrheit; speist Company 2A als `inferred`). Backlog nach Kundennutzen, KPI nach echtem Korpusstand — beides bewusst getrennt._
 
 **Architecture Coverage**
 
 | Layer | Status | Hinweis |
 |---|---|---|
-| Domain Program Blueprint (wiederverwendbar) | **PASS** | Corpus→…→Completeness, law-first, Ownership je Stufe |
-| Environmental Program (Daten) | **PASS** | B1@Legal-KG · B2@Execution · B3@Reasoning (blocked) |
-| Phase B = Domänen, keine Architektur | **PASS** | kein neues Runtime-Framework (Freeze, ADR-008) |
+| Domain Knowledge Program (7-Stufen-Produktionsstraße) | **PASS** | 5 Domänen im Backlog, Industrial Automation #1 |
+| Reifegrad-KPI (computed-not-stored) | **PASS** | aus echtem Korpus abgeleitet (TP/PB/RTS je Domäne) |
+| Regelwerk-ID-Normalisierung | **TODO** | Alias CRA/MaschinenVO im KPI — kanonische IDs ausstehend |
 
 ## Gaps → Epics (Backlog — nur erfasst, NICHT implementiert)
 
@@ -401,5 +399,5 @@ _Jedes Programm liefert dieselben Artefakte; Status `open/blocked` kippt automat
 ## Suite-Status (Roll-up)
 
 - Coverage-Zellen gesamt: **47**
-- PASS: **36** · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 6 · N/A: 1 · NEEDS_FACTS: 0
+- PASS: **35** · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 7 · N/A: 1 · NEEDS_FACTS: 0
 - Fortschritt = PASS-Anteil steigt, wenn Epics RS-001…004 landen (objektiver Maßstab, kein LOC).
diff --git a/backend-compliance/tests/test_domain_programs.py b/backend-compliance/tests/test_domain_programs.py
index 202f15f8..40ff71ec 100644
--- a/backend-compliance/tests/test_domain_programs.py
+++ b/backend-compliance/tests/test_domain_programs.py
@@ -1,8 +1,9 @@
-"""Characterization test for the Environmental Knowledge Program definition (data, not code).
+"""Characterization tests for the Domain Knowledge Program v1 backlog (data, not code).
 
-Pins the LAW-FIRST contract: the domain is ordered Corpus(B1) -> Capabilities(B2) -> Transition(B3),
-not the reverse; ownership is assigned per stage; B3 (ISO 14001 -> corpus) is blocked until both sides
-exist. If a future edit reverses the order or drops an owner, this test fails.
+Pins the program FRAMEWORK contract: a ranked backlog of domain definitions, each entered by INDUSTRY
+with its typical requirement sources + a pre-onboarding capability hypothesis (typical_certifications).
+Industrial Automation is rank 1. Environmental stays law-first. If a future edit reorders the backlog,
+drops a source list, or reverts environmental to an ISO-first framing, these tests fail.
 """
 
 from __future__ import annotations
@@ -11,45 +12,60 @@ import os
 
 import yaml
 
-_PROG = os.path.join(os.path.dirname(__file__), "..", "knowledge", "programs", "environmental.yaml")
+_DIR = os.path.join(os.path.dirname(__file__), "..", "knowledge", "programs")
 
 
-def _program():
-    with open(_PROG, encoding="utf-8") as f:
-        return yaml.safe_load(f)
+def _programs():
+    out = {}
+    for f in sorted(os.listdir(_DIR)):
+        if f.endswith(".yaml"):
+            with open(os.path.join(_DIR, f), encoding="utf-8") as h:
+                p = yaml.safe_load(h)
+            out[p["id"]] = p
+    return out
 
 
-def test_blueprint_is_the_reusable_production_line():
-    p = _program()
-    assert p["blueprint"] == ["corpus", "obligations", "capabilities", "transition_patterns",
-                              "playbooks", "reference_scenarios", "completeness"]
+def test_five_domains_ranked_backlog():
+    ranks = sorted(p["backlog_rank"] for p in _programs().values())
+    assert ranks == [1, 2, 3, 4, 5]
 
 
-def test_stages_are_law_first_in_order():
-    stages = _program()["stages"]
-    assert [s["id"] for s in stages] == ["B1", "B2", "B3"]  # corpus -> capabilities -> transition
-    assert "Corpus" in stages[0]["name"] and "Transition" in stages[2]["name"]
+def test_industrial_automation_is_rank_1():
+    progs = _programs()
+    rank1 = [p for p in progs.values() if p["backlog_rank"] == 1]
+    assert len(rank1) == 1 and rank1[0]["id"] == "PROG-industrial-automation"
+    assert {"CRA", "MaschinenVO"} <= set(rank1[0]["typical_requirement_sources"])
 
 
-def test_ownership_assigned_per_stage():
-    by = {s["id"]: s for s in _program()["stages"]}
-    assert "Legal Knowledge" in by["B1"]["owner"]  # corpus + obligations
-    assert "Compliance Execution" in by["B2"]["owner"]  # capability model
-    assert "Reasoning" in by["B3"]["owner"]  # transition patterns
+def test_every_domain_entered_by_industry_with_sources_and_hypothesis():
+    for p in _programs().values():
+        assert p.get("industry") and p.get("customer_entry")  # industry-first entry
+        assert p["typical_requirement_sources"]  # stage 2 defined
+        assert p["typical_certifications"]  # pre-onboarding capability hypothesis (ETO)
 
 
-def test_transition_is_blocked_until_both_sides_known():
-    b3 = {s["id"]: s for s in _program()["stages"]}["B3"]
-    assert b3["status"] == "blocked"
-    assert b3["depends_on"] == ["B1", "B2"]  # built LAST (law-first)
+def test_no_stored_stage_status_progress_is_derived():
+    # the 7-stage progress is computed-not-stored: program shells must NOT hard-code stage status
+    for p in _programs().values():
+        assert "stages" not in p
 
 
-def test_b1_covers_the_six_environmental_areas():
-    b1 = {s["id"]: s for s in _program()["stages"]}["B1"]
-    assert set(b1["areas"]) == {"water", "chemicals", "emissions", "energy", "waste", "product_responsibility"}
+def test_environmental_stays_law_first():
+    env = _programs()["PROG-environmental"]
+    assert "ISO 14001 ist KEIN Umweltrecht" in env["principle"]
+    assert set(env["typical_requirement_sources"]) == {"water", "chemicals", "emissions", "energy", "waste", "product_responsibility"}
 
 
-def test_program_is_a_domain_not_an_iso_project():
-    p = _program()
-    assert "Umweltanforderungen" in p["customer_question"]  # starts from the law, not ISO 14001
-    assert "ISO 14001 ist KEIN Umweltrecht" in p["principle"]
+def test_automotive_and_medical_present():
+    progs = _programs()
+    assert "TISAX" in progs["PROG-automotive"]["typical_requirement_sources"]
+    assert "MDR" in progs["PROG-medical"]["typical_requirement_sources"]
+
+
+def test_readme_documents_seven_stage_checklist():
+    with open(os.path.join(_DIR, "README.md"), encoding="utf-8") as h:
+        readme = h.read()
+    for stage in ["Domain Model", "Requirement Sources", "Capability Registry",
+                  "Transition Patterns", "Playbooks", "Reference Scenarios", "Completeness"]:
+        assert stage in readme
+    assert "Industrial Automation" in readme  # backlog #1 documented
diff --git a/docs-src/architecture/adr/ADR-009-domain-knowledge-program.md b/docs-src/architecture/adr/ADR-009-domain-knowledge-program.md
new file mode 100644
index 00000000..f07c00f0
--- /dev/null
+++ b/docs-src/architecture/adr/ADR-009-domain-knowledge-program.md
@@ -0,0 +1,53 @@
+# ADR-009: Domain Knowledge Program — one 7-stage production line per domain
+
+- **Status:** Accepted
+- **Datum:** 2026-06-27
+- **Typ:** Architektur- / Organisations-Entscheidung
+- **Bezug:** [ADR-008](ADR-008-from-architecture-to-domains.md), [ADR-007](ADR-007-regulatory-completeness.md), [ADR-005](ADR-005-knowledge-production-pipeline.md), Architektur-Freeze v1.0, [[company-intelligence-2a]]
+
+## Kontext
+
+Der Engpass ist nicht mehr Architektur, Controls oder „Wissen" allgemein, sondern präzise:
+**Domänenmodellierung.** Phase B (ADR-008) wird daher nicht als Einzel-Regelwerk-Features
+organisiert, sondern als EIN Arbeitsprogramm mit Unterprogrammen je Domäne — alle durch dieselbe
+Produktionsstraße. Kein weiteres Architektur-Epic, keine neue Runtime-Architektur.
+
+## Entscheidung
+
+1. **Einstieg über die INDUSTRIE, nicht über das Regelwerk.** Der Kunde sagt „ich baue
+   Verpackungsmaschinen / bin Automobilzulieferer / baue Parksysteme", nicht „erklär mir ISO 9001".
+   Die Pipeline beginnt davor: `Industry → Domain Model → Requirement Sources → Requirements →
+   Capabilities → … → Completeness`.
+
+2. **Eine 7-Stufen-Checkliste, identisch für JEDE Domäne:**
+   1 Domain Model · 2 Requirement Sources · 3 Capability Registry · 4 Transition Patterns ·
+   5 Playbooks · 6 Reference Scenarios · 7 Completeness. Ownership je Stufe (1 Reasoning · 2 Legal-KG ·
+   3 Execution · 4–7 Reasoning). Das ist der Skalierungsmechanismus: jede neue Domäne nutzt dieselbe
+   Straße, die bestehenden Engines erweitern sich automatisch.
+
+3. **Domänen tragen `typical_requirement_sources` + `typical_certifications` → Pre-Onboarding-HYPOTHESE
+   (ETO-Einsicht).** Vor dem Onboarding: „diese Prozesswelt ist *wahrscheinlich* vorhanden" — als
+   Hypothese, nie Wahrheit. Speist Company 2A als `inferred`, nie `confirmed`. Wir wollen nicht wissen,
+   OB ein Automobilzulieferer ISO 9001 hat (das hat jeder), sondern welche Fähigkeiten dadurch
+   wahrscheinlich schon vorhanden sind.
+
+4. **Per-Domain-KPI, reproduzierbar (computed-not-stored).** Reifegrad wird aus dem ECHTEN Korpus
+   abgeleitet (modellierte Sources / Transition Patterns / Playbooks / Reference Scenarios / bewusst
+   ausgewiesene Lücken — auf Basis der Regulatory Completeness Engine), NICHT als kuratierte Zahl.
+   Programm-Shells speichern KEINEN Stufen-Status. Keine Marketingzahl.
+
+5. **Domain Knowledge Program v1 — Backlog nach Kundennutzen** (getrennt vom KPI nach Korpusstand):
+   1 Industrial Automation · 2 Environmental · 3 Automotive · 4 Medical · 5 Energy.
+
+## Konsequenzen
+
+- **Programme statt Features:** jede Domäne ist eine maschinenlesbare Definition (`programs/*.yaml`);
+  der Reifegrad-KPI im Reference-Suite ist aus dem Korpus abgeleitet und differenziert ehrlich
+  (Industrial Automation führt, Environmental 0 % — die Arbeit liegt vor uns).
+- **Backlog ≠ KPI:** der Backlog ordnet nach Kundennutzen, der KPI misst den echten Korpusstand —
+  bewusst getrennt (z. B. eine Domäne kann hoch im Backlog, aber niedrig im KPI stehen).
+- **Arbeit verschiebt sich endgültig von Software- zu Wissensproduktion.** Wettbewerbsvorteil =
+  Qualität und Breite der modellierten Domänen.
+- **Freeze-konform:** kein neues Metamodell, kein Graph, kein neues `compliance/`-Modul.
Nur + Programm-Daten (`knowledge/programs/`) + abgeleitete Reporting-Sicht im Reference-Suite. +- Diese ADR ist non-runtime → kein Deploy (siehe [ADR-001](ADR-001-runtime-deploy-policy.md)).