Merge pull request 'feat: open Domain Knowledge Program v1 (7-stage production line + per-domain KPI)' (#23) from feat/domain-knowledge-program-v1 into main

2026-06-27 18:50:12 +02:00
parent c737e1ad7d 1a9439d013
commit a2403eaed9
10 changed files with 312 additions and 159 deletions
@@ -1,51 +1,58 @@
-# Domain Knowledge Programs — from architecture to domains
+# Domain Knowledge Program — the production line for every domain

-**The architecture is stable. From here the value comes from DOMAINS, not more software.**
-The runtime architecture (scope, regulatory map, capability delta, optimization, playbooks, intake,
-production, completeness) is frozen. A new regulatory domain is a **data + knowledge** project that
-extends every existing view automatically — no new runtime framework (see ADR-008, Freeze v1.0).
+**The architecture is stable. From here the value comes from DOMAIN MODELLING, not more software.**
+The real bottleneck is no longer architecture, controls, or even "knowledge" in general; it is **domain
+modelling**. Phase B is therefore organised as ONE program with sub-programs per domain, each run
+through the SAME production line. No new runtime framework is introduced (ADR-008/009, Freeze v1.0).

-## The production line (reusable for EVERY domain)
+## The customer enters by INDUSTRY, not by regulation
+
+A customer never says "explain ISO 9001". They say "I build packaging machines", "I'm an automotive
+supplier", or "I build parking systems". So the pipeline starts at the industry:

 ```
-Regulatory Corpus → Obligations → Capabilities → Transition Patterns → Playbooks → Reference Scenarios → Completeness
+Industry → Domain Model → Requirement Sources → Requirements → Capabilities → … → Reality / Verification
 ```

-Each domain delivers the SAME artifacts. Once they land, the existing engines automatically extend:
-Scope · Gap · Capability Delta · Optimization · Playbooks · Reference Scenarios · Regulatory Completeness.
+## The 7-stage checklist (identical for EVERY domain)

-## Law-first (a deliberate ordering)
-
-Start from the **law**, not from a management system. A management system (e.g. ISO 14001) is NOT
-the domain — it is one possible source state. The customer asks *„welche Anforderungen gelten für mein
-Produkt?"*, not *„wie komme ich von ISO 14001 weg?"*. So the order is:
-
-```
-Recht → Obligations → Capabilities → (Managementsystem) → Delta
-```
-
-A management-system→corpus transition pattern is built LAST, once BOTH sides are known.
-
-## Ownership per stage (coordinate via the board, do not duplicate)
-
-| Stage | Artifact | Owner |
+| # | Stage | Owner |
 |---|---|---|
-| B1 Corpus + Obligations | legal sources, obligations registry | **Legal Knowledge / Obligation Registry** |
-| B2 Capability Model | new capabilities in the Master Capability Registry | **Compliance Execution** |
-| B3 Transition Patterns | source-state → corpus delta patterns | **Reasoning (Knowledge Acquisition)** |
-| Playbooks | implementation playbooks per capability | **Reasoning** |
-| Reference Scenarios | canonical regression + expected outcomes | **Reasoning** |
-| Completeness | corpus-status registry per domain | **Reasoning / curation** |
+| 1 | **Domain Model** (industry → what world is this?) | Reasoning / curation |
+| 2 | **Requirement Sources** (which regulations/standards/specs apply) | Legal Knowledge |
+| 3 | **Capability Registry** (capabilities the sources require) | Compliance Execution |
+| 4 | **Transition Patterns** (source-state → domain delta) | Reasoning |
+| 5 | **Playbooks** (how to implement each capability) | Reasoning |
+| 6 | **Reference Scenarios** (canonical regression + expected outcomes) | Reasoning |
+| 7 | **Completeness** (auditable coverage per domain) | Reasoning / curation |

-## Programs (planned)
+This is the scaling mechanism: every new domain reuses the same production line; the existing engines
+(Scope, Gap, Capability Delta, Optimization, Playbooks, Reference, Completeness) extend automatically.

-| Program | File | Status |
-|---|---|---|
-| Environmental Knowledge Program | `environmental.yaml` | started (B1 handed off) |
-| Automotive Knowledge Program | _planned_ | — |
-| OT / IEC 62443 Knowledge Program | _planned_ | — |
-| Functional Safety Knowledge Program | _planned_ | — |
+## A domain knows its typical sources → pre-onboarding HYPOTHESIS (the ETO insight)

-Each program is a machine-readable definition (`*.yaml`) consumed by the reference suite to track its
-progress; future sessions flip stage `status` as artifacts land, and the Completeness engine reports
-the domain flipping `unsupported → validated` automatically.
+Each domain definition lists `typical_requirement_sources` and `typical_certifications`. So before
+onboarding, BreakPilot can say "this process world is *probably* present", as a **hypothesis, not a
+truth**. The question is not whether an automotive supplier has ISO 9001 (everyone does), but
+**which company capabilities are therefore probably already present**. The result feeds Company 2A
+as `inferred`, never `confirmed`.
+
+## Per-domain KPI — reproducible, not marketing
+
+Progress per domain is **derived from the Regulatory Completeness Engine + the actual corpus**
+(computed-not-stored): identified requirement sources · modelled capabilities · transition patterns ·
+playbooks · passed reference scenarios · consciously declared corpus gaps. It is rendered as a bar
+(illustrative example: `Industrial ███████░░░ 70 %`). These are reproducible quality metrics, not curated numbers.
+
+## Domain Knowledge Program v1 — backlog (by current customer value)
+
+| Rank | Domain | File | Typical sources |
+|---|---|---|---|
+| 1 | **Industrial Automation** | `industrial_automation.yaml` | CRA · MaschinenVO · EMV · RED · Data Act · IEC 62443 · NIS2 |
+| 2 | Environmental | `environmental.yaml` | Wasser · Chemikalien · Luft · Energie · Abfall · Produktverantwortung |
+| 3 | Automotive | `automotive.yaml` | IATF · TISAX · UNECE R155/R156 · ASPICE · OEM-Lastenhefte |
+| 4 | Medical | `medical.yaml` | MDR · IEC 62304 · ISO 14971 |
+| 5 | Energy | `energy.yaml` | je nach Zielmarkt |
+
+The work shifts decisively from software development to knowledge production; the competitive
+advantage now comes from the quality and breadth of the modelled domains.
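The per-domain KPI bar described in the README can be sketched as a pure function of two counts. This is a minimal sketch, assuming maturity is the share of a domain's typical requirement sources that have at least one corpus artifact; `maturity_bar` and its parameters are hypothetical names for illustration, not part of the repository.

```python
def maturity_bar(name: str, modeled: int, total: int, width: int = 10) -> str:
    """Render a maturity bar from counts; nothing here is stored, only computed."""
    if total:
        # Multiply before dividing so that exact ratios (e.g. 7/10) stay exact.
        filled = round(width * modeled / total)
        percent = round(100 * modeled / total)
    else:
        # A domain without declared sources has no measurable breadth.
        filled, percent = 0, 0
    bar = "█" * filled + "░" * (width - filled)
    return f"{name} {bar} {percent} %"


if __name__ == "__main__":
    # Counts taken from the generated report in this PR (3 of 7 sources modelled).
    print(maturity_bar("Industrial Automation", 3, 7))
```

Because Python's `round` rounds halves to the nearest even integer, a ratio of exactly 25 % fills two of ten cells, not three.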
@@ -0,0 +1,19 @@
+# Domain Knowledge Program — Automotive (backlog rank 3). A domain DEFINITION, not corpus content.
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
+
+id: PROG-automotive
+name: "Automotive Domain"
+industry: "Automobilzulieferer, OEM-Zulieferkette"
+customer_entry: "Ich bin Automobilzulieferer."
+backlog_rank: 3
+rationale: "großer Markt; OEM-Lastenhefte = früher Business-Requirement-Anwendungsfall."
+status: planned
+
+typical_requirement_sources: [IATF16949, TISAX, "UNECE R155", "UNECE R156", ASPICE, OEM_Lastenheft]
+typical_certifications: [ISO9001, IATF16949, TISAX, ISO27001]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  ISMS→TISAX-Transition-Pattern existiert bereits (Vorarbeit). UNECE R155 (Cybersecurity Management
+  System) ↔ CRA = quellenübergreifender Convergence-Kandidat. OEM-Lastenheft = erster Business
+  Requirement (siehe Vision V2 / Requirements Verification, NICHT jetzt).
@@ -0,0 +1,18 @@
+# Domain Knowledge Program — Energy (backlog rank 5). A domain DEFINITION, not corpus content.
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
+
+id: PROG-energy
+name: "Energy Domain"
+industry: "Energieerzeugung/-verteilung, Anlagen kritischer Infrastruktur"
+customer_entry: "Ich baue Anlagen für Energieerzeugung / kritische Infrastruktur."
+backlog_rank: 5
+rationale: "Zielmarkt-abhängig; nach den klareren Industrie-/Produkt-Domänen."
+status: planned
+
+typical_requirement_sources: [NIS2, IEC62443, CRA, "netzcode/marktabhängig"]
+typical_certifications: [ISO27001, IEC62443]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  Stark zielmarkt-abhängig (Netzcodes, nationale Vorgaben). NIS2/IEC62443 teilen sich Capabilities
+  mit Industrial Automation → Wiederverwendung wahrscheinlich hoch.
@@ -1,39 +1,25 @@
-# Environmental Knowledge Program — a regulatory DOMAIN, not an ISO-14001 project.
-# Machine-readable program definition consumed by the reference suite to track progress.
+# Domain Knowledge Program — Environmental (backlog rank 2). A domain DEFINITION, not corpus content.
 # LAW-FIRST: Umweltrecht -> Obligations -> Capabilities -> ISO 14001 -> Delta (never the reverse).
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.

 id: PROG-environmental
-name: "Environmental Knowledge Program"
-customer_question: "Welche Umweltanforderungen gelten für mein Produkt (z. B. Industriespülmaschine)?"
-status: started                 # planned | started | in_progress | complete
+name: "Environmental Domain"
+industry: "Hersteller mit Umweltpflichten (z. B. Industriespülmaschinen, Anlagenbau)"
+customer_entry: "Welche Umweltanforderungen gelten für mein Produkt (z. B. Industriespülmaschine)?"
+backlog_rank: 2
+rationale: "konkreter Kundenbezug (Abwasser/Chemikalien) — direkt nach Industrial Automation."
+status: started

 principle: >
-  ISO 14001 ist KEIN Umweltrecht, sondern ein Managementsystem. Wir starten beim Recht und fragen
-  erst danach, welche vorhandenen Managementsysteme davon wahrscheinlich schon etwas abdecken.
+  ISO 14001 ist KEIN Umweltrecht, sondern ein Managementsystem (= ein Quellzustand). LAW-FIRST:
+  erst das Recht, dann welche vorhandenen Managementsysteme davon wahrscheinlich schon etwas abdecken.

-# the reusable production line, instantiated for this domain
-blueprint: [corpus, obligations, capabilities, transition_patterns, playbooks, reference_scenarios, completeness]
+# Stage 2 — the requirement-source areas of this domain (each becomes laws/obligations at Stage 2-3)
+typical_requirement_sources: [water, chemicals, emissions, energy, waste, product_responsibility]
+typical_certifications: [ISO14001, ISO9001]   # pre-onboarding capability HYPOTHESIS (not a truth)

-stages:
-  - id: B1
-    name: "Environmental Regulatory Corpus"
-    owner: "Legal Knowledge / Obligation Registry"
-    status: open                # handed off — not built here
-    note: "Zunächst NUR Rechtsquellen + Pflichten (noch keine ISO, keine Capabilities)."
-    areas:                      # the six environmental obligation areas the customer actually faces
-      - water                   # Wasser / Abwasser
-      - chemicals               # Chemikalien (REACH/CLP)
-      - emissions               # Emissionen
-      - energy                  # Energie
-      - waste                   # Abfall
-      - product_responsibility  # Produktverantwortung
-
-  - id: B2
-    name: "Environmental Capability Model"
-    owner: "Compliance Execution"
-    status: open                # depends on B1; Registry grows here
-    depends_on: [B1]
-    capabilities:
+# Target capabilities to be modelled at Stage 3 (owner: Compliance Execution) once the corpus lands
+target_capabilities:
  - chemical_management
  - wastewater_management
  - emissions_monitoring
@@ -41,17 +27,7 @@ stages:
  - energy_data_capture
  - environmental_incident_management

-  - id: B3
-    name: "Transition Patterns (ISO 14001 -> Environmental Corpus)"
-    owner: "Reasoning (Knowledge Acquisition)"
-    status: blocked             # LAST — only once both sides (corpus + capabilities) are known
-    depends_on: [B1, B2]
-    note: "Erst jetzt sinnvoll: ISO 14001 als Quellzustand gegen den Umwelt-Korpus (User: law-first)."
-
-# once B1-B3 land these extend AUTOMATICALLY via the existing engines (no new runtime architecture)
-downstream_auto: [playbooks, reference_scenarios, optimization, scope, capability_delta, completeness]
-
-acceptance: >
-  Regulatory Completeness kippt `Environmental` von unsupported/open auf assessed; die sechs Bereiche
-  sind als Obligations + Capabilities im validierten Korpus, das ISO-14001-Delta als Transition Pattern.
-  Messbar an der Reference Suite (Environmental-Zelle UNSUPPORTED -> PASS).
+ownership: "Stufe 1 Reasoning · 2 Legal-KG (HANDOFF, nur Recht+Pflichten) · 3 Execution (HANDOFF) · 4-7 Reasoning"
+note: >
+  B3 (ISO 14001 -> Korpus-Transition) entsteht ZULETZT, erst wenn Recht + Capabilities bekannt sind.
+  Acceptance: Regulatory Completeness kippt `Environmental` von unsupported/open auf assessed.
@@ -0,0 +1,22 @@
+# Domain Knowledge Program — Industrial Automation (backlog rank 1, highest current customer value).
+# A domain DEFINITION, not corpus content. The 7-stage progress is DERIVED from the corpus by the
+# reference suite (computed-not-stored), never stored here. See programs/README.md for the checklist.
+
+id: PROG-industrial-automation
+name: "Industrial Automation Domain"
+industry: "Maschinen-/Anlagenbau, Industrieautomation, Parksysteme, Verpackungsmaschinen"
+customer_entry: "Ich baue Verpackungsmaschinen / Parksysteme / Industrieanlagen."
+backlog_rank: 1
+rationale: "höchster aktueller Kundennutzen — bereits am weitesten modelliert (CRA + MaschinenVO)."
+status: in_progress
+
+# Stage 2 — the requirement sources that typically apply to this industry
+typical_requirement_sources: [CRA, MaschinenVO, EMV, RED, DataAct, IEC62443, NIS2]
+
+# pre-onboarding capability HYPOTHESIS (not a truth, cf. ETO): feeds Company 2A as `inferred`
+typical_certifications: [ISO9001, ISO27001]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  Diese Domäne hat den Vorlauf: CRA + MaschinenVO sind als Convergence-Pattern, RTS und Playbooks
+  bereits (teilweise) im Korpus. EMV/RED/IEC62443/NIS2 sind identifiziert, aber noch nicht modelliert.
@@ -0,0 +1,18 @@
+# Domain Knowledge Program — Medical (backlog rank 4). A domain DEFINITION, not corpus content.
+# 7-stage progress is DERIVED from the corpus (computed-not-stored). See programs/README.md.
+
+id: PROG-medical
+name: "Medical Domain"
+industry: "Medizinprodukte-Hersteller, Medizintechnik"
+customer_entry: "Ich baue Medizinprodukte / Medizintechnik."
+backlog_rank: 4
+rationale: "hoher Leidensdruck (MDR), aber spezialisierter Markt → nach Industrial/Automotive."
+status: planned
+
+typical_requirement_sources: [MDR, IEC62304, ISO14971, IEC60601, CRA]
+typical_certifications: [ISO13485, ISO14971]
+
+ownership: "Stufe 1 Reasoning · 2 Legal-KG · 3 Execution · 4-7 Reasoning"
+note: >
+  IEC 62304 (Software-Lebenszyklus) ↔ CRA secure-development = quellenübergreifender Convergence-
+  Kandidat. ISO 14971 (Risikomanagement) ↔ Produkt-Risikoanalyse. Erst nach Industrial/Automotive.
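The five program shells share one shape, which a small validator can make explicit. The following is a sketch under stated assumptions: `validate_program`, `REQUIRED_KEYS` and `ALLOWED_STATUS` are hypothetical names, and the allowed status values are taken from the status comment in the previous version of `environmental.yaml`. The dict mirrors what `yaml.safe_load` would return for `medical.yaml`.

```python
from __future__ import annotations

REQUIRED_KEYS = ("id", "name", "industry", "customer_entry", "backlog_rank",
                 "status", "typical_requirement_sources", "typical_certifications")
# Assumption: value set copied from the old status comment, not from a schema.
ALLOWED_STATUS = {"planned", "started", "in_progress", "complete"}


def validate_program(program: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shell is well-formed."""
    problems = [f"missing or empty key: {k}" for k in REQUIRED_KEYS if not program.get(k)]
    if "stages" in program:
        # Progress is computed-not-stored, so a stored stage list is an error.
        problems.append("stage status must be derived from the corpus, not stored")
    if program.get("status") not in ALLOWED_STATUS:
        problems.append(f"unknown status: {program.get('status')!r}")
    if not isinstance(program.get("backlog_rank"), int):
        problems.append("backlog_rank must be an integer")
    return problems


# The medical definition from this PR as a plain dict.
medical = {
    "id": "PROG-medical",
    "name": "Medical Domain",
    "industry": "Medizinprodukte-Hersteller, Medizintechnik",
    "customer_entry": "Ich baue Medizinprodukte / Medizintechnik.",
    "backlog_rank": 4,
    "status": "planned",
    "typical_requirement_sources": ["MDR", "IEC62304", "ISO14971", "IEC60601", "CRA"],
    "typical_certifications": ["ISO13485", "ISO14971"],
}

if __name__ == "__main__":
    print(validate_program(medical) or "ok")
```

A check like this would complement the characterization tests, which pin the same contract per field rather than per file.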
@@ -142,35 +142,61 @@ def completeness_section() -> None:


 def domain_programs_section(base_dir) -> None:
-    """Render the Domain Knowledge Programs section (kept here so generate.py stays under the LOC budget)."""
+    """Domain Knowledge Program v1 — per-domain maturity KPI DERIVED from the corpus (computed-not-stored)."""
    import os
    import yaml
-    from compliance.completeness import assess_completeness
+    from compliance.knowledge_intake import build_knowledge_index

+    def _load(sub):
+        d = os.path.join(base_dir, "..", "knowledge", sub)
+        return [yaml.safe_load(open(os.path.join(d, f), encoding="utf-8"))
+                for f in sorted(os.listdir(d)) if f.endswith(".yaml")]
+
+    idx = build_knowledge_index(_load("transition_patterns"), _load("implementation_playbooks"),
+                                _load("reference_transition_scenarios"))
    pdir = os.path.join(base_dir, "..", "knowledge", "programs")
-    progs = [yaml.safe_load(open(os.path.join(pdir, f), encoding="utf-8"))
-             for f in sorted(os.listdir(pdir)) if f.endswith(".yaml")]
-    w("## Domain Knowledge Programs — ab jetzt Domänen, nicht Architektur")
+    progs = sorted((yaml.safe_load(open(os.path.join(pdir, f), encoding="utf-8"))
+                    for f in sorted(os.listdir(pdir)) if f.endswith(".yaml")), key=lambda p: p.get("backlog_rank", 99))
+
+    _ALIAS = {"cyber resilience act": "cra", "maschinenverordnung": "maschinenvo", "iatf": "iatf16949"}
+
+    def _canon(r):
+        k = str(r).strip().lower()
+        return _ALIAS.get(k, k)
+
+    def _hits(reg_lists, src):
+        cs = {_canon(s) for s in src}
+        return [k for k, regs in reg_lists.items() if cs & {_canon(x) for x in regs}]
+
+    def _source_modeled(index, source, canon):
+        c = canon(source)
+        in_tp = any(c in {canon(x) for x in regs} for regs in index.transition_patterns.values())
+        in_rts = any(c in {canon(x) for x in regs} for regs in index.reference_scenarios.values())
+        in_pb = any(c in {canon(x) for x in index.capability_regulations.get(cap, [])} for cap in index.playbook_capabilities)
+        return in_tp or in_rts or in_pb
+
+    w("## Domain Knowledge Program v1 — Reifegrad je Domäne (reproduzierbarer KPI)")
    w("")
-    w('_Die Runtime-Architektur ist eingefroren. Eine neue Domäne = Daten + Wissen, die jede Sicht automatisch erweitern. Produktionsstraße: Corpus→Obligations→Capabilities→Transition→Playbooks→Reference→Completeness. **Law-first: Recht → Pflichten → Capabilities → Managementsystem → Delta.**_')
+    w('_Engpass = Domänenmodellierung. Jede Domäne läuft durch DIESELBE 7-Stufen-Produktionsstraße (Domain Model → Requirement Sources → Capability Registry → Transition Patterns → Playbooks → Reference Scenarios → Completeness). Reifegrad aus dem ECHTEN Korpus abgeleitet (computed-not-stored), keine Marketingzahl. Einstieg über Industry, nicht Regelwerk._')
    w("")
+    w("| Rank | Domäne | Reifegrad (Sources modelliert) | modelliert/total | Korpus TP·PB·RTS |")
+    w("|---|---|---|---|---|")
    for p in progs:
-        w("**%s** — _%s_ (status: `%s`)" % (p["name"], p["customer_question"], p["status"]))
+        src = p.get("typical_requirement_sources", [])
+        tp, rts = _hits(idx.transition_patterns, src), _hits(idx.reference_scenarios, src)
+        cs = {_canon(s) for s in src}
+        pb = [c for c in idx.playbook_capabilities if cs & {_canon(x) for x in idx.capability_regulations.get(c, [])}]
+        modeled = [s for s in src if _source_modeled(idx, s, _canon)]   # sources with >=1 corpus artifact
+        breadth = (len(modeled) / len(src)) if src else 0.0             # honest differentiator (not CRA-shared depth)
+        filled = int(round(breadth * 10))
+        w("| %d | **%s** | `%s` %d%% | %d/%d | %d·%d·%d |" % (
+            p.get("backlog_rank", 99), p["name"], "█" * filled + "░" * (10 - filled),
+            int(round(breadth * 100)), len(modeled), len(src), len(tp), len(pb), len(rts)))
    w("")
-        w("| Stufe | Artefakt | Owner | Status |")
-        w("|---|---|---|---|")
-        for s in p.get("stages", []):
-            w("| %s | %s | %s | **%s** |" % (s["id"], s["name"], s["owner"], s["status"]))
-        w("")
-        areas = next((s.get("areas", []) for s in p.get("stages", []) if s.get("id") == "B1"), [])
-        if areas:
-            rep = assess_completeness(identified_regulations=areas, corpus_status={})   # all unknown -> open baseline
-            w("- **Baseline (Completeness):** %s — die 6 Bereiche: %s" % (rep.completeness_summary, ", ".join(areas)))
-        w("")
-    w("_Jedes Programm liefert dieselben Artefakte; Status `open/blocked` kippt automatisch, wenn die Stufen landen — Reference Suite + Completeness dokumentieren den Fortschritt je Domäne._")
+    w('_Industry-Einstieg + ETO-Hypothese: jede Domäne kennt ihre typischen Sources + Zertifikate → vor dem Onboarding „diese Prozesswelt ist wahrscheinlich vorhanden" (Hypothese, nie Wahrheit; speist Company 2A als `inferred`). Backlog nach Kundennutzen, KPI nach echtem Korpusstand — beides bewusst getrennt._')
    w("")
    coverage_table([
-        ("Domain Program Blueprint (wiederverwendbar)", "PASS", "Corpus→…→Completeness, law-first, Ownership je Stufe"),
-        ("Environmental Program (Daten)", "PASS", "B1@Legal-KG · B2@Execution · B3@Reasoning (blocked)"),
-        ("Phase B = Domänen, keine Architektur", "PASS", "kein neues Runtime-Framework (Freeze, ADR-008)"),
+        ("Domain Knowledge Program (7-Stufen-Produktionsstraße)", "PASS", "%d Domänen im Backlog, Industrial Automation #1" % len(progs)),
+        ("Reifegrad-KPI (computed-not-stored)", "PASS", "aus echtem Korpus abgeleitet (TP/PB/RTS je Domäne)"),
+        ("Regelwerk-ID-Normalisierung", "TODO", "Alias CRA/MaschinenVO im KPI — kanonische IDs ausstehend"),
    ])
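The alias handling and source matching in `domain_programs_section` can be exercised in isolation. This is a minimal standalone sketch, assuming each corpus index maps an artifact id to the list of regulation ids it references (which is how the hunk above iterates it); the artifact ids `TP-001` and `TP-002` and the function names are hypothetical.

```python
from __future__ import annotations

# Same alias table as in the hunk above.
ALIASES = {"cyber resilience act": "cra", "maschinenverordnung": "maschinenvo", "iatf": "iatf16949"}


def canon(regulation_id: object) -> str:
    """Normalise a regulation id: trim, lower-case, then resolve known aliases."""
    key = str(regulation_id).strip().lower()
    return ALIASES.get(key, key)


def hits(artifact_regs: dict[str, list[str]], sources: list[str]) -> list[str]:
    """Artifact ids whose regulation list shares at least one canonical id with `sources`."""
    wanted = {canon(s) for s in sources}
    return [aid for aid, regs in artifact_regs.items() if wanted & {canon(r) for r in regs}]


def modeled_sources(sources: list[str], *artifact_maps: dict[str, list[str]]) -> list[str]:
    """Sources that appear in at least one artifact of any of the given maps."""
    covered = {canon(r) for m in artifact_maps for regs in m.values() for r in regs}
    return [s for s in sources if canon(s) in covered]


# Hypothetical corpus slice for illustration only.
transition_patterns = {
    "TP-001": ["Cyber Resilience Act", "MaschinenVO"],
    "TP-002": ["TISAX"],
}

if __name__ == "__main__":
    print(hits(transition_patterns, ["CRA", "EMV"]))
    print(modeled_sources(["CRA", "EMV", "TISAX"], transition_patterns))
```

Matching on canonical ids is what lets a source written as `CRA` in a program file count against an artifact that spells out the full name. The TODO row in the coverage table marks this alias table as a stopgap until canonical ids exist.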
@@ -365,29 +365,27 @@ _Sobald der Umwelt-Korpus (ISO 14001 etc.) landet, kippt `Environmental` automat
 | Begründete Ausschlüsse (Korpus/Anwendbarkeit) | **PASS** | 3 Ausschlüsse, alle mit Grund |
 | Fortschritts-Doku je Domäne | **PASS** | Environmental offen→validated bei Korpus-Landung |

-## Domain Knowledge Programs — ab jetzt Domänen, nicht Architektur
+## Domain Knowledge Program v1 — Reifegrad je Domäne (reproduzierbarer KPI)

-_Die Runtime-Architektur ist eingefroren. Eine neue Domäne = Daten + Wissen, die jede Sicht automatisch erweitern. Produktionsstraße: Corpus→Obligations→Capabilities→Transition→Playbooks→Reference→Completeness. **Law-first: Recht → Pflichten → Capabilities → Managementsystem → Delta.**_
+_Engpass = Domänenmodellierung. Jede Domäne läuft durch DIESELBE 7-Stufen-Produktionsstraße (Domain Model → Requirement Sources → Capability Registry → Transition Patterns → Playbooks → Reference Scenarios → Completeness). Reifegrad aus dem ECHTEN Korpus abgeleitet (computed-not-stored), keine Marketingzahl. Einstieg über Industry, nicht Regelwerk._

-**Environmental Knowledge Program** — _Welche Umweltanforderungen gelten für mein Produkt (z. B. Industriespülmaschine)?_ (status: `started`)
+| Rank | Domäne | Reifegrad (Sources modelliert) | modelliert/total | Korpus TP·PB·RTS |
+|---|---|---|---|---|
+| 1 | **Industrial Automation Domain** | `████░░░░░░` 43% | 3/7 | 3·2·3 |
+| 2 | **Environmental Domain** | `░░░░░░░░░░` 0% | 0/6 | 0·0·0 |
+| 3 | **Automotive Domain** | `██░░░░░░░░` 17% | 1/6 | 1·0·0 |
+| 4 | **Medical Domain** | `██░░░░░░░░` 20% | 1/5 | 3·2·3 |
+| 5 | **Energy Domain** | `██░░░░░░░░` 25% | 1/4 | 3·2·3 |

-| Stufe | Artefakt | Owner | Status |
-|---|---|---|---|
-| B1 | Environmental Regulatory Corpus | Legal Knowledge / Obligation Registry | **open** |
-| B2 | Environmental Capability Model | Compliance Execution | **open** |
-| B3 | Transition Patterns (ISO 14001 -> Environmental Corpus) | Reasoning (Knowledge Acquisition) | **blocked** |
-
-- **Baseline (Completeness):** Identifiziert 6 · bewertet 0 · offen 6 · Unsicherheiten 0 · Begründung ja — die 6 Bereiche: water, chemicals, emissions, energy, waste, product_responsibility
-
-_Jedes Programm liefert dieselben Artefakte; Status `open/blocked` kippt automatisch, wenn die Stufen landen — Reference Suite + Completeness dokumentieren den Fortschritt je Domäne._
+_Industry-Einstieg + ETO-Hypothese: jede Domäne kennt ihre typischen Sources + Zertifikate → vor dem Onboarding „diese Prozesswelt ist wahrscheinlich vorhanden" (Hypothese, nie Wahrheit; speist Company 2A als `inferred`). Backlog nach Kundennutzen, KPI nach echtem Korpusstand — beides bewusst getrennt._

 **Architecture Coverage**

 | Layer | Status | Hinweis |
 |---|---|---|
-| Domain Program Blueprint (wiederverwendbar) | **PASS** | Corpus→…→Completeness, law-first, Ownership je Stufe |
-| Environmental Program (Daten) | **PASS** | B1@Legal-KG · B2@Execution · B3@Reasoning (blocked) |
-| Phase B = Domänen, keine Architektur | **PASS** | kein neues Runtime-Framework (Freeze, ADR-008) |
+| Domain Knowledge Program (7-Stufen-Produktionsstraße) | **PASS** | 5 Domänen im Backlog, Industrial Automation #1 |
+| Reifegrad-KPI (computed-not-stored) | **PASS** | aus echtem Korpus abgeleitet (TP/PB/RTS je Domäne) |
+| Regelwerk-ID-Normalisierung | **TODO** | Alias CRA/MaschinenVO im KPI — kanonische IDs ausstehend |

 ## Gaps → Epics (Backlog — nur erfasst, NICHT implementiert)

@@ -401,5 +399,5 @@ _Jedes Programm liefert dieselben Artefakte; Status `open/blocked` kippt automat
 ## Suite-Status (Roll-up)

 - Coverage-Zellen gesamt: **47**
-- PASS: **36** · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 6 · N/A: 1 · NEEDS_FACTS: 0
+- PASS: **35** · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 7 · N/A: 1 · NEEDS_FACTS: 0
 - Fortschritt = PASS-Anteil steigt, wenn Epics RS-001…004 landen (objektiver Maßstab, kein LOC).
@@ -1,8 +1,9 @@
-"""Characterization test for the Environmental Knowledge Program definition (data, not code).
+"""Characterization tests for the Domain Knowledge Program v1 backlog (data, not code).

-Pins the LAW-FIRST contract: the domain is ordered Corpus(B1) -> Capabilities(B2) -> Transition(B3),
-not the reverse; ownership is assigned per stage; B3 (ISO 14001 -> corpus) is blocked until both sides
-exist. If a future edit reverses the order or drops an owner, this test fails.
+Pins the program FRAMEWORK contract: a ranked backlog of domain definitions, each entered by INDUSTRY
+with its typical requirement sources + a pre-onboarding capability hypothesis (typical_certifications).
+Industrial Automation is rank 1. Environmental stays law-first. If a future edit reorders the backlog,
+drops a source list, or reverts environmental to an ISO-first framing, these tests fail.
 """

 from __future__ import annotations
@@ -11,45 +12,60 @@ import os

 import yaml

-_PROG = os.path.join(os.path.dirname(__file__), "..", "knowledge", "programs", "environmental.yaml")
+_DIR = os.path.join(os.path.dirname(__file__), "..", "knowledge", "programs")


-def _program():
-    with open(_PROG, encoding="utf-8") as f:
-        return yaml.safe_load(f)
+def _programs():
+    out = {}
+    for f in sorted(os.listdir(_DIR)):
+        if f.endswith(".yaml"):
+            with open(os.path.join(_DIR, f), encoding="utf-8") as h:
+                p = yaml.safe_load(h)
+            out[p["id"]] = p
+    return out


-def test_blueprint_is_the_reusable_production_line():
-    p = _program()
-    assert p["blueprint"] == ["corpus", "obligations", "capabilities", "transition_patterns",
-                              "playbooks", "reference_scenarios", "completeness"]
+def test_five_domains_ranked_backlog():
+    ranks = sorted(p["backlog_rank"] for p in _programs().values())
+    assert ranks == [1, 2, 3, 4, 5]


-def test_stages_are_law_first_in_order():
-    stages = _program()["stages"]
-    assert [s["id"] for s in stages] == ["B1", "B2", "B3"]          # corpus -> capabilities -> transition
-    assert "Corpus" in stages[0]["name"] and "Transition" in stages[2]["name"]
+def test_industrial_automation_is_rank_1():
+    progs = _programs()
+    rank1 = [p for p in progs.values() if p["backlog_rank"] == 1]
+    assert len(rank1) == 1 and rank1[0]["id"] == "PROG-industrial-automation"
+    assert {"CRA", "MaschinenVO"} <= set(rank1[0]["typical_requirement_sources"])


-def test_ownership_assigned_per_stage():
-    by = {s["id"]: s for s in _program()["stages"]}
-    assert "Legal Knowledge" in by["B1"]["owner"]                   # corpus + obligations
-    assert "Compliance Execution" in by["B2"]["owner"]             # capability model
-    assert "Reasoning" in by["B3"]["owner"]                        # transition patterns
+def test_every_domain_entered_by_industry_with_sources_and_hypothesis():
+    for p in _programs().values():
+        assert p.get("industry") and p.get("customer_entry")           # industry-first entry
+        assert p["typical_requirement_sources"]                         # stage 2 defined
+        assert p["typical_certifications"]                             # pre-onboarding capability hypothesis (ETO)


-def test_transition_is_blocked_until_both_sides_known():
-    b3 = {s["id"]: s for s in _program()["stages"]}["B3"]
-    assert b3["status"] == "blocked"
-    assert b3["depends_on"] == ["B1", "B2"]                         # built LAST (law-first)
+def test_no_stored_stage_status_progress_is_derived():
+    # the 7-stage progress is computed-not-stored: program shells must NOT hard-code stage status
+    for p in _programs().values():
+        assert "stages" not in p


-def test_b1_covers_the_six_environmental_areas():
-    b1 = {s["id"]: s for s in _program()["stages"]}["B1"]
-    assert set(b1["areas"]) == {"water", "chemicals", "emissions", "energy", "waste", "product_responsibility"}
+def test_environmental_stays_law_first():
+    env = _programs()["PROG-environmental"]
+    assert "ISO 14001 ist KEIN Umweltrecht" in env["principle"]
+    assert set(env["typical_requirement_sources"]) == {"water", "chemicals", "emissions", "energy", "waste", "product_responsibility"}


-def test_program_is_a_domain_not_an_iso_project():
-    p = _program()
-    assert "Umweltanforderungen" in p["customer_question"]          # starts from the law, not ISO 14001
-    assert "ISO 14001 ist KEIN Umweltrecht" in p["principle"]
+def test_automotive_and_medical_present():
+    progs = _programs()
+    assert "TISAX" in progs["PROG-automotive"]["typical_requirement_sources"]
+    assert "MDR" in progs["PROG-medical"]["typical_requirement_sources"]
+
+
+def test_readme_documents_seven_stage_checklist():
+    with open(os.path.join(_DIR, "README.md"), encoding="utf-8") as h:
+        readme = h.read()
+    for stage in ["Domain Model", "Requirement Sources", "Capability Registry",
+                  "Transition Patterns", "Playbooks", "Reference Scenarios", "Completeness"]:
+        assert stage in readme
+    assert "Industrial Automation" in readme                            # backlog #1 documented
@@ -0,0 +1,53 @@
+# ADR-009: Domain Knowledge Program — one 7-stage production line per domain
+
+- **Status:** Accepted
+- **Date:** 2026-06-27
+- **Type:** Architecture / organisational decision
+- **Related:** [ADR-008](ADR-008-from-architecture-to-domains.md), [ADR-007](ADR-007-regulatory-completeness.md), [ADR-005](ADR-005-knowledge-production-pipeline.md), Architecture Freeze v1.0, [[company-intelligence-2a]]
+
+## Context
+
+The bottleneck is no longer architecture, controls, or "knowledge" in general, but precisely
+**domain modelling.** Phase B (ADR-008) is therefore not organised as single-regulation features
+but as ONE work program with sub-programs per domain, all running through the same
+production line. No further architecture epic, no new runtime architecture.
+
+## Decision
+
+1. **Entry by INDUSTRY, not by regulation.** The customer says "I build packaging
+   machines / I am an automotive supplier / I build parking systems", not "explain ISO 9001 to me".
+   The pipeline starts one step earlier: `Industry → Domain Model → Requirement Sources → Requirements →
+   Capabilities → … → Completeness`.
+
+2. **One 7-stage checklist, identical for EVERY domain:**
+   1 Domain Model · 2 Requirement Sources · 3 Capability Registry · 4 Transition Patterns ·
+   5 Playbooks · 6 Reference Scenarios · 7 Completeness. Ownership per stage (1 Reasoning · 2 Legal-KG ·
+   3 Execution · 4–7 Reasoning). This is the scaling mechanism: every new domain uses the same
+   line, and the existing engines extend automatically.
+
+3. **Domains carry `typical_requirement_sources` + `typical_certifications` → pre-onboarding HYPOTHESIS
+   (ETO insight).** Before onboarding: "this process world is *probably* present", as a
+   hypothesis, never a truth. Feeds Company 2A as `inferred`, never `confirmed`. We do not want to know
+   WHETHER an automotive supplier has ISO 9001 (everyone does), but which capabilities are
+   therefore probably already present.
+
+4. **Per-domain KPI, reproducible (computed-not-stored).** Maturity is derived from the ACTUAL corpus
+   (modelled sources / transition patterns / playbooks / reference scenarios / consciously
+   declared gaps, based on the Regulatory Completeness Engine), NOT stored as a curated number.
+   Program shells store NO stage status. No marketing number.
+
+5. **Domain Knowledge Program v1: backlog by customer value** (separate from the KPI by corpus state):
+   1 Industrial Automation · 2 Environmental · 3 Automotive · 4 Medical · 5 Energy.
+
+## Consequences
+
+- **Programs instead of features:** every domain is a machine-readable definition (`programs/*.yaml`);
+  the maturity KPI in the reference suite is derived from the corpus and differentiates honestly
+  (Industrial Automation leads, Environmental is at 0 %; the work lies ahead of us).
+- **Backlog ≠ KPI:** the backlog orders by customer value, the KPI measures the actual corpus state.
+  The two are deliberately separate (e.g. a domain can rank high in the backlog but low in the KPI).
+- **The work shifts for good from software production to knowledge production.** Competitive advantage =
+  quality and breadth of the modelled domains.
+- **Freeze-compliant:** no new metamodel, no graph, no new `compliance/` module. Only
+  program data (`knowledge/programs/`) + a derived reporting view in the reference suite.
+- This ADR is non-runtime → no deploy (see [ADR-001](ADR-001-runtime-deploy-policy.md)).
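The separation of backlog and KPI stated in the ADR can be illustrated with the figures from the generated report in this PR. This is a small sketch, not repository code; `ROWS`, `by_backlog` and `by_maturity` are hypothetical names, and the tuples hold (backlog rank, domain, sources modelled, sources total) as shown in the report table.

```python
# (backlog_rank, domain, modeled_sources, total_sources) from the report table.
ROWS = [
    (1, "Industrial Automation", 3, 7),
    (2, "Environmental", 0, 6),
    (3, "Automotive", 1, 6),
    (4, "Medical", 1, 5),
    (5, "Energy", 1, 4),
]


def by_backlog(rows):
    """Order by customer value: the curated backlog rank."""
    return [name for _rank, name, _m, _t in sorted(rows)]


def by_maturity(rows):
    """Order by measured corpus state: share of modelled sources, highest first."""
    def key(row):
        rank, _name, modeled, total = row
        breadth = modeled / total if total else 0.0
        return (-breadth, rank)  # ties fall back to backlog rank
    return [name for _rank, name, _m, _t in sorted(rows, key=key)]


if __name__ == "__main__":
    print("backlog :", by_backlog(ROWS))
    print("maturity:", by_maturity(ROWS))
```

Environmental is second in the backlog and last by maturity, which is the case the ADR calls out: high customer value, no corpus yet.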