feat(vocabulary): Domain Vocabulary — identity vs representation; regulation aliases fix the KPI normalization

Before the next Journey, settle the LANGUAGE. With 5 knowledge objects but no vocabulary, the same journey
gets named four different ways (ISO9001->MaschinenVO vs Quality Management->Product Safety vs ...). The
spec answers ONE question: which terms are IDENTITIES, and which are only REPRESENTATIONS of the same meaning?
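Here is a minimal sketch of the identity/representation split. It assumes a hypothetical two-entry alias table (`ALIASES`). That table is a hand-picked subset for illustration, not the real vocabulary file. Once both endpoints resolve to identities, differently spelled transitions collapse onto one journey key.

```python
from __future__ import annotations

# Minimal sketch: representations (spellings) resolve to identities (stable ids).
# ALIASES is a hypothetical, hand-picked subset of the vocabulary, not the real file.
ALIASES = {
    "iso9001": ["ISO9001", "ISO 9001", "QMS"],
    "maschinenvo": ["MaschinenVO", "Maschinenverordnung", "Machinery Regulation"],
}


def norm(text: str) -> str:
    """Lowercase and keep only alphanumerics, so 'ISO 9001' and 'iso9001' compare equal."""
    return "".join(c for c in text.lower() if c.isalnum())


# Invert the table: every normalized representation points at its identity.
LOOKUP = {norm(alias): ident for ident, aliases in ALIASES.items() for alias in aliases}


def journey_key(source: str, target: str) -> tuple[str, str]:
    """Identity of a journey instance = (source identity, target identity); KeyError if unknown."""
    return LOOKUP[norm(source)], LOOKUP[norm(target)]


# Two different spellings of the same journey yield one key.
assert journey_key("ISO9001", "MaschinenVO") == journey_key("QMS", "Machinery Regulation")
print(journey_key("ISO 9001", "Maschinenverordnung"))  # ('iso9001', 'maschinenvo')
```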

- spec docs-src/architecture/domain-vocabulary-spec-v1.md (PROPOSAL): identity hierarchy
  (Requirement RQ / Capability MCAP [Registry 2C] / regulation as source or target / Journey Class MJRN
  [PROVISIONAL] / Journey instance / Playbook MPLB); every identity has a canonical name + aliases; the
  capability vocabulary IS the Capability Registry (not rebuilt); new order: Vocabulary -> Transition #2
  -> #3 -> Rule of Three.
- knowledge/vocabulary/regulations.yaml: regulation/standard IDENTITIES (id + canonical + aliases).
  SOLVES the regulation-ID normalization the KPIs flagged: CRA == "Cyber Resilience Act" == "Regulation
  (EU) 2024/2847" all resolve to `cra`; ISO9001/QMS -> iso9001; etc. Shared artifact (@Legal-KG/@Execution
  please adopt).
- knowledge/vocabulary/journey_classes.yaml (PROVISIONAL): clusters our transitions into classes
  (Information Security -> Product Cybersecurity; Quality Management -> Product Compliance/Safety).
  Finding: ISO9001->MaschinenVO is an INSTANCE of an existing class (like ISO9001->CRA, ISO13485->MDR),
  not a new kind -> avoids duplication. Journey Class is a new abstraction -> its own Rule of Three (no
  MJRN minting yet).
- reference suite: both KPIs now read aliases from regulations.yaml instead of hard-coded maps; the
  "Regelwerk-ID-Normalisierung" (regulation-ID normalization) row flips TODO -> PASS. KPI numbers are
  unchanged because the vocabulary is a superset of the old maps.
- Side effect (Requirements Intelligence): a tender requirement phrased as "Security Patch Procedure"
  resolves through the capability vocabulary to MCAP-0017.
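The superset claim can be checked mechanically. This is a hedged sketch with three parts:

- `VOCAB` copies five entries from the `regulations.yaml` in this commit.
- The two `OLD_*` dicts are the hard-coded `_ALIAS` maps that this commit removes.
- `lost_mappings` is a hypothetical helper.

An empty result only shows that no old mapping is lost. It does not by itself prove that the KPI numbers are unchanged. Unmapped strings are now normalized more aggressively (alphanumerics only) than the old domain-programs `_canon` did (`strip().lower()`).

```python
# Hypothetical regression check: the vocabulary must reproduce every mapping that the
# two hard-coded _ALIAS dicts (removed in this commit) used to provide.
VOCAB = [  # subset copied from regulations.yaml in this commit
    {"id": "cra", "canonical": "Cyber Resilience Act",
     "aliases": ["CRA", "Cyber Resilience Act", "Regulation (EU) 2024/2847"]},
    {"id": "maschinenvo", "canonical": "Maschinenverordnung",
     "aliases": ["MaschinenVO", "Maschinenverordnung", "Machinery Regulation", "Regulation (EU) 2023/1230"]},
    {"id": "iso27001", "canonical": "ISO/IEC 27001",
     "aliases": ["ISO27001", "ISO 27001", "ISO/IEC 27001", "ISMS", "Information Security Management System"]},
    {"id": "iec62443", "canonical": "IEC 62443", "aliases": ["IEC62443", "IEC 62443", "ISO/IEC 62443"]},
    {"id": "iatf16949", "canonical": "IATF 16949", "aliases": ["IATF16949", "IATF 16949", "IATF"]},
]

OLD_DOMAIN_PROGRAMS = {"cyber resilience act": "cra", "maschinenverordnung": "maschinenvo", "iatf": "iatf16949"}
OLD_TRANSITION_COVERAGE = {"isoiec27001": "iso27001", "isoiec62443": "iec62443",
                           "cyberresilienceact": "cra", "maschinenverordnung": "maschinenvo"}


def norm(s):
    """Same normalization the reference suite now uses: lowercase, alphanumerics only."""
    return "".join(c for c in str(s).lower() if c.isalnum())


def alias_map(entries):
    """Normalized canonical name / alias -> regulation id."""
    return {norm(n): e["id"] for e in entries for n in [e["canonical"], *e["aliases"]]}


def lost_mappings(old, amap):
    """Old alias -> id pairs that the vocabulary resolves differently (empty = nothing lost)."""
    return {k: v for k, v in old.items() if amap.get(norm(k), norm(k)) != v}


amap = alias_map(VOCAB)
assert lost_mappings(OLD_DOMAIN_PROGRAMS, amap) == {}
assert lost_mappings(OLD_TRANSITION_COVERAGE, amap) == {}
```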

7 vocabulary tests (17 with domain programs), check-loc 0. Knowledge data + spec + reference harness =
non-runtime -> no deploy (ADR-001). No new module, no runtime change, no minting (Freeze).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-28 08:11:30 +02:00
parent 23a6f02ec2
commit ecae5bc7f1
6 changed files with 239 additions and 10 deletions
@@ -0,0 +1,30 @@
+# Domain Vocabulary: Journey CLASSES (PROVISIONAL). A class clusters journey instances that are
+# "the same journey", so we do NOT write a new journey for every certification when many share a class.
+# PROVISIONAL: Journey Class is a NEW abstraction -> its OWN Rule of Three (>= 3 instances per class
+# before minting MJRN ids). Endpoints reference regulation vocabulary ids (see regulations.yaml).
+id: VOCAB-journey-classes-v1
+status: provisional
+classes:
+  - id: infosec-to-product-cyber   # provisional id, NOT a minted MJRN
+    name: "Information Security → Product Cybersecurity"
+    from_kind: information_security
+    to_kind: product_cybersecurity
+    instances:
+      - {from: iso27001, to: cra}      # ✅ modelled (TP-ISO27001-CRA-v1)
+      - {from: tisax, to: cra}         # ⏳ Rule-of-Three transition #3
+      - {from: iec62443, to: cra}      # ⏳
+  - id: qm-to-product-compliance
+    name: "Quality Management → Product Compliance/Safety"
+    from_kind: quality_management
+    to_kind: product_compliance_safety
+    instances:
+      - {from: iso9001, to: cra}           # ✅ modelled (TP-ISO9001-CRA-v1)
+      - {from: iso9001, to: maschinenvo}   # ⏳ Rule-of-Three transition #2: an INSTANCE of this class, not a new kind
+      - {from: iso13485, to: mdr}          # same CLASS, different domain (medical); proves the class generalises
+note: >
+  Finding: ISO9001→MaschinenVO is NOT a new kind of journey but an INSTANCE of the class
+  "Quality Management → Product Compliance/Safety" (like ISO9001→CRA, ISO13485→MDR). This is exactly
+  the duplication the vocabulary prevents.
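The class file exists to answer two questions: is a transition a new kind, and may a class mint an MJRN id yet? Here is a sketch of how it could answer them, resting on three assumptions:

- `CLASSES` is a hypothetical in-memory copy of the file.
- The `modelled` flag is read off the ✅/⏳ comments, because the YAML itself does not store it.
- Only modelled instances count toward the Rule of Three.

```python
from __future__ import annotations

# Hypothetical in-memory copy of journey_classes.yaml. The YAML marks "modelled" only in
# comments (✅ vs ⏳); the boolean below is an assumption made for this sketch.
CLASSES = {
    "infosec-to-product-cyber": [
        ("iso27001", "cra", True), ("tisax", "cra", False), ("iec62443", "cra", False),
    ],
    "qm-to-product-compliance": [
        ("iso9001", "cra", True), ("iso9001", "maschinenvo", False), ("iso13485", "mdr", False),
    ],
}
RULE_OF_THREE = 3  # >= 3 instances per class before an MJRN id may be minted


def class_of(source: str, target: str) -> str | None:
    """Return the class a (source, target) transition belongs to, or None if it is a new kind."""
    for class_id, instances in CLASSES.items():
        if any((s, t) == (source, target) for s, t, _ in instances):
            return class_id
    return None


def ready_to_mint(class_id: str) -> bool:
    """Assumption: only modelled instances count toward the Rule of Three."""
    return sum(1 for *_, modelled in CLASSES[class_id] if modelled) >= RULE_OF_THREE


print(class_of("iso9001", "maschinenvo"))         # qm-to-product-compliance
print(ready_to_mint("qm-to-product-compliance"))  # False: 1 of 3 instances modelled
```

If the Rule of Three instead counts listed instances, both classes would already qualify. That is why the counting rule is worth pinning down before any MJRN is minted.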
@@ -0,0 +1,21 @@
+# Domain Vocabulary: regulation/standard IDENTITIES (Requirement Sources + Targets).
+# Each has a stable id + a canonical name + every alias/spelling. SOLVES the regulation-ID
+# normalization that the Transition Coverage KPI + Knowledge Intake flagged (CRA vs "Cyber Resilience
+# Act"). Reasoning seeds this; @Legal-KG / @Execution please adopt as the SHARED vocabulary.
+# Not runtime, no minting; a shared knowledge artifact.
+id: VOCAB-regulations-v1
+regulations:
+  - {id: cra, canonical: "Cyber Resilience Act", aliases: [CRA, "Cyber Resilience Act", "Regulation (EU) 2024/2847"]}
+  - {id: maschinenvo, canonical: "Maschinenverordnung", aliases: [MaschinenVO, Maschinenverordnung, "Machinery Regulation", "Regulation (EU) 2023/1230"]}
+  - {id: iso9001, canonical: "ISO 9001", aliases: [ISO9001, "ISO 9001", "ISO/IEC 9001", QMS, "Quality Management System"]}
+  - {id: iso27001, canonical: "ISO/IEC 27001", aliases: [ISO27001, "ISO 27001", "ISO/IEC 27001", ISMS, "Information Security Management System"]}
+  - {id: tisax, canonical: "TISAX", aliases: [TISAX, "Trusted Information Security Assessment Exchange"]}
+  - {id: iec62443, canonical: "IEC 62443", aliases: [IEC62443, "IEC 62443", "ISO/IEC 62443"]}
+  - {id: nis2, canonical: "NIS2", aliases: [NIS2, "NIS 2", "Directive (EU) 2022/2555"]}
+  - {id: dataact, canonical: "Data Act", aliases: [DataAct, "Data Act", "Regulation (EU) 2023/2854"]}
+  - {id: iso13485, canonical: "ISO 13485", aliases: [ISO13485, "ISO 13485"]}
+  - {id: mdr, canonical: "MDR", aliases: [MDR, "Medical Device Regulation", "Regulation (EU) 2017/745"]}
+  - {id: iec62304, canonical: "IEC 62304", aliases: [IEC62304, "IEC 62304"]}
+  - {id: iso14001, canonical: "ISO 14001", aliases: [ISO14001, "ISO 14001"]}
+  - {id: iatf16949, canonical: "IATF 16949", aliases: [IATF16949, "IATF 16949", IATF]}
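Beyond id lookup, the same table can give reports a canonical display name for any spelling. This is a minimal sketch, assuming `REGS` is a hypothetical two-entry subset of the file above. Unknown strings pass through unchanged, so a missing vocabulary entry appears verbatim in the output instead of failing the run. That fallback is a design choice of this sketch, similar in spirit to the `amap.get(k, k)` fallback in the reference suite.

```python
# Sketch: resolve any spelling to the canonical display name, e.g. for KPI report rows.
# REGS is a hypothetical subset copied from regulations.yaml; real code would load the file.
REGS = [
    {"id": "maschinenvo", "canonical": "Maschinenverordnung",
     "aliases": ["MaschinenVO", "Maschinenverordnung", "Machinery Regulation", "Regulation (EU) 2023/1230"]},
    {"id": "nis2", "canonical": "NIS2", "aliases": ["NIS2", "NIS 2", "Directive (EU) 2022/2555"]},
]


def norm(s: str) -> str:
    """Lowercase and keep only alphanumerics."""
    return "".join(c for c in s.lower() if c.isalnum())


# Normalized spelling -> full vocabulary entry (id + canonical name + aliases).
BY_ALIAS = {norm(n): r for r in REGS for n in [r["canonical"], *r["aliases"]]}


def canonical_name(text: str) -> str:
    """Canonical name for a known spelling; unknown strings are returned unchanged."""
    entry = BY_ALIAS.get(norm(text))
    return entry["canonical"] if entry else text


print(canonical_name("Regulation (EU) 2023/1230"))  # Maschinenverordnung
print(canonical_name("nis-2"))                      # NIS2
print(canonical_name("EU AI Act"))                  # EU AI Act (not in this subset)
```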
@@ -141,6 +141,25 @@ def completeness_section() -> None:
     ])
+
+
+def _regulation_aliases(base_dir):
+    """Build a normalized alias -> canonical-id map from the Domain Vocabulary (regulations.yaml)."""
+    import os
+    import yaml
+    path = os.path.join(base_dir, "..", "knowledge", "vocabulary", "regulations.yaml")
+    amap = {}
+    with open(path, encoding="utf-8") as h:
+        for r in (yaml.safe_load(h) or {}).get("regulations", []):
+            for name in [r["canonical"]] + list(r.get("aliases", [])):
+                amap["".join(c for c in str(name).lower() if c.isalnum())] = r["id"]
+    return amap
+
+
+def _canon_reg(s, amap):
+    """Canonicalize a regulation string via the vocabulary (replaces the old hard-coded alias maps)."""
+    return amap.get("".join(c for c in str(s).lower() if c.isalnum()),
+                    "".join(c for c in str(s).lower() if c.isalnum()))
+
+
 def domain_programs_section(base_dir) -> None:
     """Domain Knowledge Program v1: per-domain maturity KPI DERIVED from the corpus (computed-not-stored)."""
     import os
@@ -159,11 +178,10 @@ def domain_programs_section(base_dir) -> None:
             for f in sorted(os.listdir(pdir)) if f.endswith(".yaml")]
     progs = sorted((p for p in _all if "backlog_rank" in p), key=lambda p: p["backlog_rank"])  # domain programs only
-    _ALIAS = {"cyber resilience act": "cra", "maschinenverordnung": "maschinenvo", "iatf": "iatf16949"}
+    _amap = _regulation_aliases(base_dir)  # Domain Vocabulary (regulations.yaml)
     def _canon(r):
-        k = str(r).strip().lower()
-        return _ALIAS.get(k, k)
+        return _canon_reg(r, _amap)
     def _hits(reg_lists, src):
         cs = {_canon(s) for s in src}
@@ -199,7 +217,7 @@ def domain_programs_section(base_dir) -> None:
     coverage_table([
         ("Domain Knowledge Program (7-stage production line)", "PASS", "%d domains in the backlog, Industrial Automation #1" % len(progs)),
         ("Maturity KPI (computed-not-stored)", "PASS", "derived from the real corpus (TP/PB/RTS per domain)"),
-        ("Regulation-ID normalization", "TODO", "CRA/MaschinenVO alias in the KPI; canonical IDs pending"),
+        ("Regulation-ID normalization (Domain Vocabulary)", "PASS", "Aliases from `vocabulary/regulations.yaml`, no longer hard-coded"),
     ])
@@ -224,12 +242,10 @@ def transition_coverage_section(base_dir) -> None:
                 for it in (to if isinstance(to, list) else [to]) if isinstance(it, dict)]
         pats.append((frm, [t for t in tos if t], str(d.get("status", "draft"))))
-    _ALIAS = {"isoiec27001": "iso27001", "isoiec62443": "iec62443",
-              "cyberresilienceact": "cra", "maschinenverordnung": "maschinenvo"}
+    _amap = _regulation_aliases(base_dir)  # Domain Vocabulary (regulations.yaml)
     def _c(s):
-        k = "".join(ch for ch in str(s).lower() if ch.isalnum())
-        return _ALIAS.get(k, k)
+        return _canon_reg(s, _amap)
     _RANK = {"draft": 1, "reviewed": 2, "validated": 3, "proven": 4}
     _ICON = {0: "⚪ not started", 1: "🟡 Draft", 2: "✅ reviewed", 3: "✅ validated", 4: "✅ Gold"}
@@ -385,7 +385,7 @@ _Industry entry point + ETO hypothesis: every domain knows its typical sources +
 |---|---|---|
 | Domain Knowledge Program (7-stage production line) | **PASS** | 5 domains in the backlog, Industrial Automation #1 |
 | Maturity KPI (computed-not-stored) | **PASS** | derived from the real corpus (TP/PB/RTS per domain) |
-| Regulation-ID normalization | **TODO** | CRA/MaschinenVO alias in the KPI; canonical IDs pending |
+| Regulation-ID normalization (Domain Vocabulary) | **PASS** | Aliases from `vocabulary/regulations.yaml`, no longer hard-coded |
 ## Transition Coverage: the transition is the unit of knowledge (Operational Knowledge)
@@ -424,5 +424,5 @@ _The customer does not buy "EMV domain" but "we have ISO 9001, help
 ## Suite Status (Roll-up)
 - Coverage cells total: **50**
-- PASS: **38** · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 7 · N/A: 1 · NEEDS_FACTS: 0
+- PASS: **39** · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 6 · N/A: 1 · NEEDS_FACTS: 0
 - Progress = the PASS share rises as epics RS-001…004 land (objective measure, not LOC).
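The roll-up line is a plain tally. The diff moves exactly one cell from TODO to PASS, so the total stays at 50: 38 + 3 + 1 + 7 + 1 before, and 39 + 3 + 1 + 6 + 1 after. This is a minimal sketch, assuming a flat list of cell statuses. The real suite's cell data is not part of this diff, so `before` is reconstructed from the published counts only.

```python
from collections import Counter

# Status order as printed in the Suite Status line.
ORDER = ["PASS", "PARTIAL", "UNSUPPORTED", "TODO", "N/A", "NEEDS_FACTS"]


def roll_up(statuses):
    """Format the roll-up line from a flat list of coverage-cell statuses."""
    counts = Counter(statuses)  # missing statuses count as 0
    return " · ".join("%s: %d" % (s, counts[s]) for s in ORDER)


# Reconstructed from the published counts, not from the real suite: 50 cells before this commit.
before = ["PASS"] * 38 + ["PARTIAL"] * 3 + ["UNSUPPORTED"] + ["TODO"] * 7 + ["N/A"]
after = list(before)
after[after.index("TODO")] = "PASS"  # the regulation-ID normalization cell flips TODO -> PASS

assert len(before) == len(after) == 50
print(roll_up(after))  # PASS: 39 · PARTIAL: 3 · UNSUPPORTED: 1 · TODO: 6 · N/A: 1 · NEEDS_FACTS: 0
```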
@@ -0,0 +1,86 @@
+"""Characterization tests for the Domain Vocabulary (data, not code).
+Pins the IDENTITY-vs-REPRESENTATION contract: regulations have a stable id + canonical name + aliases
+(so CRA and "Cyber Resilience Act" resolve to the SAME identity, which is the normalization the KPIs
+flagged). Journey classes cluster transition instances so we do not duplicate the same journey; they
+are PROVISIONAL (no MJRN minting) and reference regulation ids that exist in the vocabulary.
+"""
+from __future__ import annotations
+
+import os
+import yaml
+
+_VOCAB = os.path.join(os.path.dirname(__file__), "..", "knowledge", "vocabulary")
+
+
+def _regs():
+    with open(os.path.join(_VOCAB, "regulations.yaml"), encoding="utf-8") as h:
+        return yaml.safe_load(h)["regulations"]
+
+
+def _classes():
+    with open(os.path.join(_VOCAB, "journey_classes.yaml"), encoding="utf-8") as h:
+        return yaml.safe_load(h)
+
+
+def _norm(s):
+    return "".join(c for c in str(s).lower() if c.isalnum())
+
+
+def _alias_map():
+    amap = {}
+    for r in _regs():
+        for name in [r["canonical"]] + list(r.get("aliases", [])):
+            amap[_norm(name)] = r["id"]
+    return amap
+
+
+def test_every_regulation_has_id_canonical_aliases():
+    for r in _regs():
+        assert r["id"] and r["canonical"] and r["aliases"]
+        assert r["id"] == r["id"].lower()  # ids are lowercase stable keys
+
+
+def test_cra_spellings_resolve_to_one_identity():
+    amap = _alias_map()
+    # the exact normalization the KPIs needed: CRA == Cyber Resilience Act
+    assert amap[_norm("CRA")] == "cra" and amap[_norm("Cyber Resilience Act")] == "cra"
+    assert amap[_norm("Regulation (EU) 2024/2847")] == "cra"
+
+
+def test_iso_and_management_system_aliases_resolve():
+    amap = _alias_map()
+    assert amap[_norm("ISO9001")] == "iso9001" and amap[_norm("QMS")] == "iso9001"
+    assert amap[_norm("ISO/IEC 27001")] == "iso27001" and amap[_norm("ISMS")] == "iso27001"
+    assert amap[_norm("Maschinenverordnung")] == "maschinenvo" and amap[_norm("MaschinenVO")] == "maschinenvo"
+
+
+def test_aliases_are_unambiguous():
+    # no normalized alias maps to two different regulation identities
+    seen = {}
+    for r in _regs():
+        for name in [r["canonical"]] + list(r.get("aliases", [])):
+            k = _norm(name)
+            assert seen.get(k, r["id"]) == r["id"], "ambiguous alias %r" % name
+            seen[k] = r["id"]
+
+
+def test_journey_classes_are_provisional():
+    assert _classes()["status"] == "provisional"  # new abstraction -> own Rule of Three
+
+
+def test_iso9001_maschinenvo_is_an_instance_not_a_new_kind():
+    classes = _classes()["classes"]
+    qm = [c for c in classes if c["id"] == "qm-to-product-compliance"][0]
+    pairs = {(i["from"], i["to"]) for i in qm["instances"]}
+    assert ("iso9001", "maschinenvo") in pairs  # same CLASS as iso9001->cra, iso13485->mdr
+    assert ("iso13485", "mdr") in pairs  # class generalises across domains
+
+
+def test_class_endpoints_reference_known_regulations():
+    reg_ids = {r["id"] for r in _regs()}
+    for c in _classes()["classes"]:
+        for inst in c["instances"]:
+            assert inst["from"] in reg_ids and inst["to"] in reg_ids  # vocabulary is internally consistent