feat: Journey Matcher — the delta explains the journey (Delta -> Journey, ADR-011)

The sanctioned last architectural building block. Reverses the order: not Goal -> Journey -> Delta but Goal -> Required -> Delta -> Journey. A Journey is the EXPLANATION of the Capability Delta, not its cause, so this is a Matcher/Explainer, not a Selector.

New module compliance/journey_matcher/ = the third independent, interchangeable function of the pipeline, beside Company 2A (Evidence -> Capability) and RS-005 (Capability -> Delta):

    match_journeys(delta, journeys, context) -> ranked, auditable explanation

- Looks ONLY at the Capability Delta, never at certificates, regulation, tenders or the goal. Journey signatures are certificate-agnostic capability clusters (Input -> Output pattern).
- score = share of the delta a journey explains (recall over the missing capabilities); journey_only documents where a journey reaches beyond the delta so a broad journey is not silently preferred.
- Deliberately dumb + deterministic (pure set overlap; NO ML/embeddings/LLM), fully auditable (matched / unexplained / journey_only / context signals); a learning ranker can sit on top later.
- Signatures injected, engine hermetic. mypy --strict clean.

Validated on the real patterns (demo): a CRA+MaschinenVO delta ranks the convergence journey 100%, "ISO27001 -> CRA" 56% (misses the machine-safety caps), "ISMS -> TISAX" 0%. This resolves the "Scope -> Journey" jump from Customer Mission #1.

Freeze exception explicitly authorised; non-runtime -> no deploy. 12 tests pass, check-loc 0.
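The score rule described in this message can be sketched with plain Python sets. This is a minimal, self-contained illustration, not the module's code: the helper name `explain` and all capability IDs are hypothetical, and the 9-capability delta is an assumption chosen only so that the arithmetic reproduces the quoted percentages (5 of 9 rounds to 56%).

```python
from typing import Dict, List, Sequence, Set


def explain(delta: Sequence[str], pattern: Sequence[str]) -> Dict[str, object]:
    """Recall of one journey pattern over the delta, plus the audit trail."""
    delta_set: Set[str] = set(delta)
    pattern_set: Set[str] = set(pattern)
    matched: List[str] = sorted(delta_set & pattern_set)
    # An empty delta has nothing to explain, so the score is defined as 0.0.
    score = len(matched) / len(delta_set) if delta_set else 0.0
    return {
        "score": round(score, 2),
        "matched": matched,
        "unexplained": sorted(delta_set - pattern_set),
        "journey_only": sorted(pattern_set - delta_set),
    }


# Hypothetical capability IDs: 5 cyber-security caps + 4 machine-safety caps.
cyber = ["cyber_%d" % i for i in range(1, 6)]
safety = ["safety_%d" % i for i in range(1, 5)]
delta = cyber + safety

convergence = explain(delta, cyber + safety)             # covers the whole delta
cyber_only = explain(delta, cyber + ["cyber_extra"])     # misses the safety caps
unrelated = explain(delta, ["tisax_1", "tisax_2"])       # no overlap at all

print(convergence["score"], cyber_only["score"], unrelated["score"])
print(cyber_only["unexplained"])   # the delta items this journey does not explain
print(cyber_only["journey_only"])  # where the journey reaches beyond the delta
```

The `journey_only` list is what keeps a broad journey honest: it does not lower the score, but it shows a reviewer how much of the journey lies outside this customer's delta.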
2026-06-28 10:36:43 +02:00
parent 3c6e2a2acc
commit 80bf1993e0
8 changed files with 511 additions and 0 deletions
@@ -0,0 +1,30 @@
+"""Journey Matcher — the Delta -> Journey function of the Capability Delta Engine.
+
+The third independent function of the pipeline (after Company 2A `Evidence -> Capability` and RS-005
+`Capability -> Delta`): given ONLY the Capability Delta, rank the known journeys that best EXPLAIN it.
+A Journey is an EXPLANATION of the delta, not its cause — order is `Goal -> Required -> Delta -> Journey`.
+
+Deliberately dumb + deterministic (pure set overlap; no ML/embeddings/LLM), fully auditable, signatures
+INJECTED (certificate-agnostic capability clusters). No new corpus, no graph (freeze v1.0). The Matcher
+is sanctioned as the last architectural building block; everything after is knowledge work.
+"""
+
+from __future__ import annotations
+
+from .engine import match_journeys
+from .schemas import (
+    JourneyMatch,
+    JourneyMatchReason,
+    JourneyMatchResult,
+    JourneySignature,
+    MatchContext,
+)
+
+__all__ = [
+    "match_journeys",
+    "JourneySignature",
+    "MatchContext",
+    "JourneyMatch",
+    "JourneyMatchReason",
+    "JourneyMatchResult",
+]
@@ -0,0 +1,94 @@
+"""Journey Matcher — the Delta -> Journey function of the Capability Delta Engine.
+
+Three INDEPENDENT functions now compose the pipeline, each a different problem, all interchangeable:
+  1. Evidence   -> Capability   (Company 2A)
+  2. Capability -> Delta        (RS-005, transition_reasoning)
+  3. Delta      -> Journey      (THIS module)
+
+The paradigm shift: a Journey is no longer the CAUSE (Goal -> Journey -> Delta) but the EXPLANATION
+(Goal -> Required -> Delta -> Journey). The matcher does NOT look at certifications, regulations,
+tenders, OEM specs or the goal — it looks ONLY at the Capability Delta and asks: which known journeys
+describe exactly this delta? Output is a ranked, auditable explanation ("Journey A explains 80% of the
+delta, because 8 of 10 missing capabilities are identical, same target type, ...").
+
+Deliberately DUMB and deterministic: pure set overlap, NO ML, NO embeddings, NO LLM. A learning ranker
+can be layered ON TOP later; this core stays auditable. Journey signatures are INJECTED (certificate-
+agnostic capability clusters), never loaded here — the engine stays hermetic. No new corpus, no
+graph/meta-model class (freeze v1.0). Python 3.9 compatible.
+
+Honesty: `score` is the share of the DELTA a journey explains (recall over the customer's missing
+capabilities), never a "fit" or a compliance verdict. `journey_only` documents where a journey reaches
+BEYOND this delta, so a broad journey that explains everything is not silently preferred.
+"""
+
+from __future__ import annotations
+
+from typing import List, Optional, Sequence
+
+from .schemas import (
+    JourneyMatch,
+    JourneyMatchReason,
+    JourneyMatchResult,
+    JourneySignature,
+    MatchContext,
+)
+
+
+def _context_signals(journey: JourneySignature, context: Optional[MatchContext]) -> List[str]:
+    """Corroborating reasons only — these are documented, they never change the score."""
+    if context is None:
+        return []
+    signals: List[str] = []
+    if context.target_type and journey.target_type and context.target_type == journey.target_type:
+        signals.append("gleiche Zielart")  # German: "same target type"
+    if context.industry and journey.industry and context.industry == journey.industry:
+        signals.append("gleiche Branche")  # German: "same industry"
+    if context.product_type and journey.product_type and context.product_type == journey.product_type:
+        signals.append("gleicher Produkttyp")  # German: "same product type"
+    return signals
+
+
+def match_journeys(
+    delta: Sequence[str],
+    journeys: Sequence[JourneySignature],
+    context: Optional[MatchContext] = None,
+) -> JourneyMatchResult:
+    """Rank known journeys by the share of the Capability Delta they EXPLAIN.
+
+    `delta` = the customer's MISSING capabilities (from RS-005). `journeys` = injected, certificate-
+    agnostic signatures. score = |delta INTERSECT pattern| / |delta|. Ranking is deterministic:
+    score desc, then context-signal count desc (corroboration only), then journey_id asc. Context
+    never changes the score — only the documented reasons. Pure; no I/O; computed-not-stored.
+    """
+    delta_set = set(delta)
+    n = len(delta_set)
+    matches: List[JourneyMatch] = []
+    for j in journeys:
+        pattern = set(j.capability_pattern)
+        matched = sorted(delta_set & pattern)
+        score = (len(matched) / n) if n else 0.0
+        signals = _context_signals(j, context)
+        reason = JourneyMatchReason(
+            matched_capabilities=matched,
+            unexplained_delta=sorted(delta_set - pattern),
+            journey_only=sorted(pattern - delta_set),
+            context_signals=signals,
+        )
+        matches.append(
+            JourneyMatch(
+                journey_id=j.journey_id,
+                label=j.label,
+                score=round(score, 2),
+                explains="%d von %d fehlenden Capabilities" % (len(matched), n),  # "<k> of <n> missing capabilities"
+                reason=reason,
+            )
+        )
+    matches.sort(key=lambda m: (-m.score, -len(m.reason.context_signals), m.journey_id))
+    best = matches[0] if matches and matches[0].score > 0.0 else None
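+    # Headline (German): "<k> journeys explain the delta; best: <label> (<pct>% of the delta)" or,
+    # when nothing matches, "No known journey explains this delta (new journey candidate)".
+    headline = (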
+        "%d Journeys erklaeren das Delta; beste: %s (%d%% des Deltas)"
+        % (sum(1 for m in matches if m.score > 0.0), best.label, round(best.score * 100))
+        if best
+        else "Keine bekannte Journey erklaert dieses Delta (neue Journey-Kandidatin)"
+    )
+    return JourneyMatchResult(delta_size=n, matches=matches, best=best, headline=headline)
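The ranking rule used by `match_journeys` (score descending, then context-signal count descending, then `journey_id` ascending) can be illustrated in isolation. The sketch below uses a hypothetical `Row` tuple in place of `JourneyMatch`; the journey IDs, scores and signal lists are made up for illustration.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """Hypothetical stand-in for JourneyMatch: only the fields the sort key reads."""
    journey_id: str
    score: float
    context_signals: List[str]


rows = [
    Row("journey-c", 0.50, []),
    Row("journey-b", 0.50, ["gleiche Branche"]),
    Row("journey-a", 0.50, []),
    Row("journey-d", 0.80, []),
]

# Same key shape as the engine: negate the numeric fields for descending order,
# leave journey_id as-is for an ascending, fully deterministic tie-break.
ranked = sorted(rows, key=lambda r: (-r.score, -len(r.context_signals), r.journey_id))

print([r.journey_id for r in ranked])
```

Because the context signals enter only as the second sort key, they can reorder journeys with equal scores but can never lift a lower-scoring journey above a higher-scoring one.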
@@ -0,0 +1,66 @@
+"""Schemas for the Journey Matcher — the Delta -> Journey function of the Capability Delta Engine.
+
+Derived views (computed-not-stored): nothing here is persisted; every match is recomputed from the
+input delta + injected journey signatures each call. No new corpus, no graph (freeze v1.0).
+Python 3.9 compatible (no `|` unions).
+"""
+
+from __future__ import annotations
+
+from typing import List, Optional
+
+from pydantic import BaseModel, Field
+
+
+class JourneySignature(BaseModel):
+    """A known journey described ONLY by its capability pattern (Input cluster -> Output cluster).
+
+    Deliberately certificate-/regulation-agnostic: the match uses `capability_pattern` alone. `label`
+    and the context fields exist for the human-auditable explanation, NEVER for the score. (Today the
+    signatures are derived from the transition patterns; the IDs like "ISO27001->CRA" are just one way
+    to describe the clusters — the matcher never reads them.)
+    """
+
+    journey_id: str
+    label: str
+    capability_pattern: List[str] = Field(default_factory=list)     # OUTPUT cluster: the delta this journey is about
+    assumed_capabilities: List[str] = Field(default_factory=list)   # INPUT cluster: typically already present
+    industry: Optional[str] = None
+    product_type: Optional[str] = None
+    target_type: Optional[str] = None        # context only: regulation / certification / contract / environmental
+
+
+class MatchContext(BaseModel):
+    """Optional corroborating context — surfaced as documented reasons, never part of the score."""
+
+    industry: Optional[str] = None
+    product_type: Optional[str] = None
+    target_type: Optional[str] = None
+
+
+class JourneyMatchReason(BaseModel):
+    """The auditable WHY behind one match — everything a reviewer needs, no opaque score."""
+
+    matched_capabilities: List[str] = Field(default_factory=list)   # delta INTERSECT pattern (what it explains)
+    unexplained_delta: List[str] = Field(default_factory=list)      # delta - pattern (what it does NOT explain)
+    journey_only: List[str] = Field(default_factory=list)           # pattern - delta (journey covers, not needed here)
+    context_signals: List[str] = Field(default_factory=list)        # e.g. "gleiche Zielart" (same target type), "gleiche Branche" (same industry)
+
+
+class JourneyMatch(BaseModel):
+    """One known journey, ranked by how much of the delta it EXPLAINS (not how well it 'fits')."""
+
+    journey_id: str
+    label: str
+    score: float = 0.0                       # |delta INTERSECT pattern| / |delta|, 0..1: share of the delta explained
+    explains: str = ""                       # e.g. "8 von 10 fehlenden Capabilities" (8 of 10 missing capabilities)
+    reason: JourneyMatchReason
+
+
+class JourneyMatchResult(BaseModel):
+    """Ranked known journeys that EXPLAIN a Capability Delta. Journey = explanation, not cause."""
+
+    delta_size: int = 0
+    matches: List[JourneyMatch] = Field(default_factory=list)       # ranked: score desc, context-signal count desc, journey_id asc
+    best: Optional[JourneyMatch] = None                             # top match, or None if no journey scores above 0
+    headline: str = ""
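The three capability lists in `JourneyMatchReason` partition the delta and the pattern between them, which gives a reviewer a cheap consistency check. A sketch of that check follows, using a stdlib dataclass as a hypothetical stand-in for the pydantic model; the helper name `check_reason` and the capability names are assumptions, not part of the module.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Reason:
    """Hypothetical stand-in mirroring the capability lists of JourneyMatchReason."""
    matched_capabilities: List[str] = field(default_factory=list)
    unexplained_delta: List[str] = field(default_factory=list)
    journey_only: List[str] = field(default_factory=list)


def check_reason(reason: Reason, delta: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the reason is consistent with the delta and pattern it was built from."""
    matched = set(reason.matched_capabilities)
    unexplained = set(reason.unexplained_delta)
    extra = set(reason.journey_only)
    return (
        matched | unexplained == set(delta)      # every delta item is accounted for
        and matched | extra == set(pattern)      # every pattern item is accounted for
        and not (matched & unexplained)          # the three lists are pairwise disjoint
        and not (matched & extra)
        and not (unexplained & extra)
    )


delta = ["cap_a", "cap_b", "cap_c"]
pattern = ["cap_b", "cap_c", "cap_d"]
good = Reason(["cap_b", "cap_c"], ["cap_a"], ["cap_d"])
bad = Reason(["cap_b"], ["cap_a"], ["cap_d"])  # cap_c silently dropped

print(check_reason(good, delta, pattern), check_reason(bad, delta, pattern))
```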