breakpilot-compliance/backend-compliance/reference_scenarios/journey_matcher_demo.py
Benjamin Admin 80bf1993e0 feat: Journey Matcher — the delta explains the journey (Delta -> Journey, ADR-011)
The sanctioned last architectural building block. Reverses the order: not Goal -> Journey -> Delta
but Goal -> Required -> Delta -> Journey. A Journey is the EXPLANATION of the Capability Delta, not
its cause — so this is a Matcher/Explainer, not a Selector.
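
The reversed order can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: `capability_delta` and the capability IDs are hypothetical, and capabilities are modelled as plain string sets.

```python
from __future__ import annotations


def capability_delta(required: set[str], available: set[str]) -> set[str]:
    """Capability -> Delta: what the goal requires that the company lacks."""
    return required - available


# Hypothetical capability IDs, for illustration only.
required = {"secure_update", "vulnerability_handling", "risk_assessment"}
available = {"risk_assessment"}

delta = capability_delta(required, available)
# Only at this point would a journey be looked up, as an explanation of `delta`.
print(sorted(delta))
```

The point of the sketch is the ordering: the delta is computed from the required state alone, and a journey is attached afterwards rather than chosen up front.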

New module compliance/journey_matcher/ = the third independent, interchangeable function of the
pipeline, beside Company 2A (Evidence -> Capability) and RS-005 (Capability -> Delta):

  match_journeys(delta, journeys, context) -> ranked, auditable explanation

- Looks ONLY at the Capability Delta — never at certificates, regulation, tenders or the goal.
  Journey signatures are certificate-agnostic capability clusters (Input -> Output pattern).
- score = share of the delta a journey explains (recall over the missing capabilities); journey_only
  documents where a journey reaches beyond the delta so a broad journey is not silently preferred.
- Deliberately dumb + deterministic (pure set overlap; NO ML/embeddings/LLM), fully auditable
  (matched / unexplained / journey_only / context signals); a learning ranker can sit on top later.
- Signatures injected, engine hermetic. mypy --strict clean.
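
The scoring described in the list above can be sketched as plain set overlap. This is a hedged illustration, not the actual `compliance.journey_matcher` implementation: `Explanation`, `explain`, the tie-breaking rule and the sample data are assumptions made for the example, and context signals are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Explanation:
    journey_id: str
    score: float              # share of the delta explained (recall)
    matched: list[str]        # delta capabilities the journey covers
    unexplained: list[str]    # delta capabilities the journey misses
    journey_only: list[str]   # journey capabilities outside the delta


def explain(delta: set[str], signatures: dict[str, set[str]]) -> list[Explanation]:
    """Rank journeys by the share of `delta` their capability cluster covers."""
    results = []
    for journey_id, pattern in signatures.items():
        matched = delta & pattern
        results.append(Explanation(
            journey_id=journey_id,
            score=len(matched) / len(delta) if delta else 0.0,
            matched=sorted(matched),
            unexplained=sorted(delta - pattern),
            journey_only=sorted(pattern - delta),
        ))
    # Highest score first; ties broken by journey_id so the order is deterministic.
    return sorted(results, key=lambda e: (-e.score, e.journey_id))


# Hypothetical delta and journey clusters, for illustration only.
delta = {"a", "b", "c", "d"}
signatures = {
    "broad": {"a", "b", "c", "d", "x"},
    "narrow": {"a", "b"},
    "unrelated": {"z"},
}
ranked = explain(delta, signatures)
for e in ranked:
    print(e.journey_id, round(e.score * 100), e.unexplained, e.journey_only)
```

In this sketch the broad journey still ranks first, but its `journey_only` list records the extra capability `x`, so a reader can see where it reaches beyond the delta.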

Validated on the real patterns (demo): a CRA+MaschinenVO delta ranks the convergence journey 100%,
"ISO27001 -> CRA" 56% (misses the machine-safety caps), "ISMS -> TISAX" 0%. This resolves the
"Scope -> Journey" jump from Customer Mission #1. Freeze exception explicitly authorised; non-runtime
-> no deploy. 12 tests pass, check-loc 0.
2026-06-28 10:36:43 +02:00

109 lines
6.0 KiB
Python

# ruff: noqa
# mypy: ignore-errors
"""Journey Matcher demo — Delta -> Journey on the REAL transition patterns.
Validates the new matcher end-to-end: take a real Capability Delta (a multi-certified company that
wants CRA + MaschinenVO), then rank the KNOWN journeys purely by how much of THAT delta each explains.
The matcher never looks at the certificates, the regulation or the goal — only at the delta. The
journey is the EXPLANATION of the delta, not its cause (order: Goal -> Required -> Delta -> Journey).
Journey signatures are derived from the transition-pattern YAMLs here (non-core), then injected into the
hermetic engine. Synthetic company (NO real names). Non-runtime -> no deploy.
Run: cd backend-compliance && PYTHONPATH=. python3 reference_scenarios/journey_matcher_demo.py
"""
from __future__ import annotations
import os
import yaml
from compliance.company import (
CompanyContext, Certification, CapabilityMappingEntry, build_company_profile,
)
from compliance.reasoning.enums import Confidence
from compliance.transition_reasoning import (
TransitionContext, TransitionGoal, TargetRequirement, assess_transition, CoverageStatus,
)
from compliance.journey_matcher import JourneySignature, MatchContext, match_journeys
OUT = []
def w(s=""):
    # Collect one output line; everything is printed in one go at the end.
    OUT.append(s)
_K = os.path.join(os.path.dirname(__file__), "..", "knowledge", "transition_patterns")
_PATTERNS = {
"transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml": ("ISO27001 -> CRA + MaschinenVO", "regulation"),
"transition_pattern_iso27001_to_cra_v1.yaml": ("ISO27001 -> CRA", "regulation"),
"transition_pattern_iso9001_to_cra_v1.yaml": ("ISO9001 -> CRA", "regulation"),
"transition_pattern_isms_to_tisax_v1.yaml": ("ISMS -> TISAX", "certification"),
}
def _load(name):
    # Read one transition-pattern YAML; the context manager closes the file handle.
    with open(os.path.join(_K, name), encoding="utf-8") as fh:
        return yaml.safe_load(fh)
# ── Journey library: signatures = capability CLUSTERS (the matcher never reads the IDs) ──────
journeys = []
for fname, (label, ttype) in _PATTERNS.items():
    p = _load(fname)
    journeys.append(JourneySignature(
        journey_id=p.get("id", fname),
        label=label,
        capability_pattern=[d["capability"] for d in p["delta_requirements"]],  # OUTPUT cluster
        assumed_capabilities=[a["capability"] for a in p["likely_covered"]],  # INPUT cluster
        target_type=ttype,
    ))
# ── A real Capability Delta: multi-certified company that wants CRA + MaschinenVO ────────────
CP = _load("transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml")
infosec = [a["capability"] for a in CP["likely_covered"]]
cmap = {
"ISO27001": CapabilityMappingEntry(capability_ids=infosec, confidence=Confidence.MEDIUM),
"PSIRT": CapabilityMappingEntry(capability_ids=["coordinated_vulnerability_disclosure",
"exploited_vuln_and_incident_reporting"], confidence=Confidence.HIGH),
"ISO9001": CapabilityMappingEntry(capability_ids=["ce_conformity_assessment_and_technical_documentation"],
confidence=Confidence.MEDIUM),
}
profile = build_company_profile(
CompanyContext(company_id="d", certifications=[Certification(certification_id=k) for k in cmap]), cmap)
reqs = [TargetRequirement(capability_id=a["capability"]) for a in CP["likely_covered"]]
reqs += [TargetRequirement(capability_id=d["capability"]) for d in CP["delta_requirements"]]
assess = assess_transition(TransitionContext(company_id="d", target=TransitionGoal(target_id="CRA+MaschinenVO")), reqs, profile)
delta = sorted({c.capability_id for c in assess.coverage if c.status == CoverageStatus.MISSING})
# ── Delta -> Journey: rank the known journeys that EXPLAIN this delta ────────────────────────
result = match_journeys(delta, journeys, MatchContext(target_type="regulation"))
w("# Journey Matcher: Delta -> Journey (validated on real patterns)")
w("")
w('_The matcher does NOT ask "which journey fits?" but "which known journeys EXPLAIN this capability delta?". It sees only the delta: no certificates, no regulation, no goal. Journey = explanation, not cause. Deterministic, no ML/embeddings/LLM. Synthetic customer, no real names._')
w("")
w("## Input: a real Capability Delta")
w("- A multi-certified company wants **CRA + MaschinenVO** → **%d missing capabilities** (from RS-005)." % len(delta))
w("- The matcher receives **only these %d capabilities** and nothing else." % len(delta))
w("")
w("## Delta -> Journey: ranking (share of the delta each journey explains)")
w("> %s" % result.headline)
w("")
w("| Journey (capability cluster) | explains | share |")
w("|---|---|---|")
for m in result.matches:
    w("| **%s** | %s | %d%% |" % (m.label, m.explains, round(m.score * 100)))
w("")
b = result.best
w('## Why "%s"? Auditable, not a black box' % b.label)
w("- **Explained capabilities (%d):** %s" % (len(b.reason.matched_capabilities), ", ".join("`%s`" % c for c in b.reason.matched_capabilities[:6]) + (" …" if len(b.reason.matched_capabilities) > 6 else "")))
w("- **Not explained (residual delta):** %s" % (", ".join("`%s`" % c for c in b.reason.unexplained_delta) or "none (the journey explains the ENTIRE delta)"))
w("- **Journey reaches beyond the delta:** %s" % (", ".join("`%s`" % c for c in b.reason.journey_only) or "none"))
w("- **Context signals:** %s" % (", ".join(b.reason.context_signals) or "none"))
w("")
w("## The paradigm shift")
w("")
w('> The order is now **`Goal → Required → Delta → Journey`**, no longer `Goal → Journey → Delta`. The journey is the **explanation** of the delta. The matcher is deliberately **dumb + deterministic** (pure set overlap) and therefore auditable; a learning ranker can be placed IN FRONT of it later. Three interchangeable functions: `Evidence→Capability` (Company 2A) · `Capability→Delta` (RS-005) · **`Delta→Journey` (this matcher)**. "Regulation" is not a special case in any of them: CRA, TISAX, tender, OEM spec and environmental goal are just different sources of the Required State.')
w("")
print("\n".join(OUT))