Merge pull request 'Smart Onboarding Advisor (ADR-012) — orchestration over existing engines' (#41) from feat/smart-onboarding-advisor into main

pilotadmin
2026-06-28 12:46:23 +02:00
7 changed files with 487 additions and 0 deletions
@@ -0,0 +1,30 @@
"""Smart Onboarding Advisor — the onboarding runtime step (orchestration over existing engines).
Turns (company + products + certifications + target) into inferred assumptions, the next best questions
(<=5, each self-explaining), the capability delta, top measures, evidence requests and completeness —
with NO sales interpretation and NO regulation picking. Orchestrator only: no new engine/registry/
meta-model; certificate->capability hypotheses and target requirements are INJECTED.
"""
from __future__ import annotations

from .engine import advisor_start, apply_answer
from .schemas import (
    AdvisorMeasure,
    AdvisorQuestion,
    AdvisorResult,
    InferredAssumption,
    OnboardingInput,
    RejectedAssumption,
)

__all__ = [
    "advisor_start",
    "apply_answer",
    "OnboardingInput",
    "AdvisorResult",
    "AdvisorQuestion",
    "AdvisorMeasure",
    "InferredAssumption",
    "RejectedAssumption",
]
@@ -0,0 +1,144 @@
"""Smart Onboarding Advisor — orchestration over the existing engines (the onboarding runtime step).
The point of the whole platform, made usable: the user types company + products + certifications +
target, and the system does the rest — no sales interpretation, no regulation picking. This is an
ORCHESTRATOR, not a new engine: it wires Company 2A (Evidence -> Capability), RS-005 (Capability ->
Delta), optimization (Delta -> Roadmap) and completeness into one onboarding flow.
Three principles it must honour (acceptance criteria):
- Multi-cert works; a profile is built from ALL certificates.
- relevance(evidence, target): ISO 14001 is NOT falsely relevant to the CRA; ISO 27001/TISAX REDUCE
questions but satisfy NOTHING automatically (Welt-1 -> verification_required).
- Only the NEXT BEST questions (<= 5), each explaining WHY; every answer updates the profile.
Certificate -> probable-capability hypotheses and the target's required capabilities are INJECTED (the
hypotheses are curated knowledge, not in this code). No corpus loaded here. Python 3.9 compatible.
"""
from __future__ import annotations

from typing import Dict, List, Optional, Sequence

from ..company import (
    CapabilityMappingEntry,
    Certification,
    CompanyCapabilityProfile,
    CompanyContext,
    build_company_profile,
)
from ..completeness import assess_completeness
from ..optimization import roadmap_from_delta
from ..reasoning.enums import Confidence
from ..transition_reasoning import (
    CoverageStatus,
    TargetRequirement,
    TransitionContext,
    TransitionGoal,
    assess_transition,
)
from .schemas import (
    AdvisorMeasure,
    AdvisorQuestion,
    AdvisorResult,
    InferredAssumption,
    OnboardingInput,
    RejectedAssumption,
)

_GAIN = {"high": 3, "medium": 2, "low": 1}
_RISK = {"high": 2, "medium": 1, "low": 0}


def _profile(inp: OnboardingInput, cert_hypotheses: Dict[str, List[str]]) -> CompanyCapabilityProfile:
    cmap = {
        cert: CapabilityMappingEntry(capability_ids=list(caps), confidence=Confidence.MEDIUM)
        for cert, caps in cert_hypotheses.items()
        if cert in inp.certifications and caps
    }
    ctx = CompanyContext(company_id=inp.company or "company",
                         certifications=[Certification(certification_id=c) for c in cmap])
    return build_company_profile(ctx, cmap)


def advisor_start(
    inp: OnboardingInput,
    cert_hypotheses: Dict[str, List[str]],
    target_requirements: Sequence[TargetRequirement],
    target_id: str = "target",
    covers_targets: Optional[Dict[str, List[str]]] = None,
    corpus_status: Optional[Dict[str, str]] = None,
    uncertain: Optional[List[Dict[str, str]]] = None,
) -> AdvisorResult:
    """Run the onboarding flow: certs -> profile -> delta -> ranked next-best questions + measures.

    Pure orchestration; deterministic. `cert_hypotheses` (cert -> probable cap ids) and
    `target_requirements` are INJECTED. `covers_targets` (cap -> targets it closes) drives leverage.
    """
    covers_targets = covers_targets or {}
    required = {r.capability_id for r in target_requirements}
    profile = _profile(inp, cert_hypotheses)
    assess = assess_transition(
        TransitionContext(company_id=inp.company or "company", target=TransitionGoal(target_id=target_id)),
        list(target_requirements), profile)

    # inferred (Welt-1): per cert, the caps it probably provides that are RELEVANT to this target
    inferred: List[InferredAssumption] = []
    rejected: List[RejectedAssumption] = []
    for cert in inp.certifications:
        caps = set(cert_hypotheses.get(cert, []))
        relevant = sorted(caps & required)
        if relevant:
            inferred.append(InferredAssumption(
                certification=cert, capabilities=relevant,
                statement="%s legt %d relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt"
                % (cert, len(relevant))))
        elif caps:
            rejected.append(RejectedAssumption(
                certification=cert,
                statement="%s ist für dieses Ziel nicht relevant" % cert,
                reason="relevance(evidence, target) = 0 — keine geforderte Fähigkeit abgedeckt"))

    # next best questions (<=5): re-rank the RS-005 requests by info gain + leverage + risk + evidence-gap
    known_ev = set(inp.known_evidence)
    scored = []
    for q in assess.question_requests:
        lev = len(covers_targets.get(q.capability_id, []))
        ev_missing = 1 if (q.expected_evidence and not (set(q.expected_evidence) & known_ev)) else 0
        score = _GAIN.get(q.information_gain.value, 1) + lev + _RISK.get(q.priority.value, 0) + ev_missing
        scored.append((score, q))
    scored.sort(key=lambda x: (-x[0], x[1].capability_id))
    next_q = [
        AdvisorQuestion(capability_id=q.capability_id, question_intent=q.question_intent, why=q.reason,
                        information_value=float(s), priority=q.priority.value)
        for s, q in scored[:5]
    ]

    delta = sorted({c.capability_id for c in assess.coverage if c.status == CoverageStatus.MISSING})
    plan = roadmap_from_delta(assess, {c: covers_targets.get(c, []) for c in delta})
    measures = [AdvisorMeasure(capability_id=m.capability_id, leverage=m.leverage, closes=m.covers)
                for m in plan.ranked_measures[:5]]
    evidence = sorted({e for q in assess.question_requests for e in q.expected_evidence})

    applicable = list(inp.target) or [target_id]
    rep = assess_completeness(applicable, corpus_status or {}, uncertain=uncertain or [])
    unsupported = [e.subject for e in rep.exclusions]
    probably = assess.summary.probably_covered

    return AdvisorResult(
        inferred_assumptions=inferred, rejected_assumptions=rejected, next_best_questions=next_q,
        capability_delta=delta, top_measures=measures, evidence_requests=evidence,
        unsupported_domains=unsupported, completeness_summary=rep.completeness_summary,
        headline="%d Anforderungen erkannt · %d wahrscheinlich abgedeckt · %d zu klären"
        % (len(assess.coverage), len(probably), len(next_q)))


def apply_answer(known_capabilities: Sequence[str], capability_id: str, answer: str) -> List[str]:
    """Update the known-capability set from one answer. `answer` in {confirmed, rejected, unknown}.

    A confirmed answer adds the capability to the known set (shrinking the delta on the next run);
    rejected/unknown leave it open. This is how every answer updates the profile (criterion 6).
    """
    known = list(dict.fromkeys(known_capabilities))
    if answer == "confirmed" and capability_id not in known:
        known.append(capability_id)
    return known
@@ -0,0 +1,62 @@
"""Schemas for the Smart Onboarding Advisor — the onboarding RUNTIME step.
DTOs only. The Advisor ORCHESTRATES the existing engines (Company 2A, RS-005, optimization,
completeness) — no new reasoning engine, no new capability registry, no new meta-model. Welt-1
discipline: a certificate yields PROBABLE capabilities (verification required), never "erfüllt".
Python 3.9 compatible (no `|` unions).
"""
from __future__ import annotations

from typing import List, Optional

from pydantic import BaseModel, Field


class OnboardingInput(BaseModel):
    company: str = ""
    industry: Optional[str] = None
    products: List[str] = Field(default_factory=list)
    markets: List[str] = Field(default_factory=list)
    certifications: List[str] = Field(default_factory=list)
    known_evidence: List[str] = Field(default_factory=list)
    target: List[str] = Field(default_factory=list)  # informational; the delta uses injected requirements


class InferredAssumption(BaseModel):
    certification: str
    capabilities: List[str] = Field(default_factory=list)  # RELEVANT-to-target caps the cert probably provides
    verification_required: bool = True  # Welt-1: never auto-satisfied
    statement: str = ""


class RejectedAssumption(BaseModel):
    certification: Optional[str] = None
    statement: str = ""
    reason: str = ""  # e.g. "relevance(evidence, target) = 0"


class AdvisorQuestion(BaseModel):
    capability_id: str
    question_intent: str
    why: str  # every question explains itself
    information_value: float = 0.0  # deterministic rank score
    priority: str = "medium"


class AdvisorMeasure(BaseModel):
    capability_id: str
    leverage: int = 0
    closes: List[str] = Field(default_factory=list)


class AdvisorResult(BaseModel):
    inferred_assumptions: List[InferredAssumption] = Field(default_factory=list)
    rejected_assumptions: List[RejectedAssumption] = Field(default_factory=list)
    next_best_questions: List[AdvisorQuestion] = Field(default_factory=list)  # max 5
    capability_delta: List[str] = Field(default_factory=list)
    top_measures: List[AdvisorMeasure] = Field(default_factory=list)
    evidence_requests: List[str] = Field(default_factory=list)
    unsupported_domains: List[str] = Field(default_factory=list)
    completeness_summary: str = ""
    headline: str = ""  # "N erkannt, M wahrscheinlich abgedeckt, K zu klären"
@@ -0,0 +1,44 @@
# Smart Onboarding Advisor — was der Nutzer sieht (automatisch, ohne Vertrieb)
_Eingabe: Unternehmen + Produkte + Zertifizierungen + Ziel. Den Rest macht die Orchestrierung über die bestehenden Engines (Company 2A · RS-005 · Optimization · Completeness). Synthetisch, keine echten Namen._
## Eingabe
> Zertifizierungen: **ISO9001, ISO27001, ISO14001, TISAX** · Produkt: **Parkschein-/Schrankensystem** · Ziel: **CRA**
## Was wir erkannt haben
> 17 Anforderungen erkannt · 6 wahrscheinlich abgedeckt · 5 zu klären
**Aus Ihren Zertifizierungen abgeleitet (zu bestätigen, nicht automatisch erfüllt):**
- ISO9001 legt 1 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- ISO27001 legt 5 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- TISAX legt 5 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- _ISO14001 ist für dieses Ziel nicht relevant — relevance(evidence, target) = 0 — keine geforderte Fähigkeit abgedeckt_
## Die wenigen offenen Punkte — nur die nächsten besten Fragen
**Frage 1 von 5** _(Informationswert 8)_
> product cyber risk assessment? — _Warum fragen wir das: Keine Anhaltspunkte im Unternehmensprofil — klären._
**Frage 2 von 5** _(Informationswert 8)_
> protection against corruption of safety functions? — _Warum fragen wir das: Keine Anhaltspunkte im Unternehmensprofil — klären._
**Frage 3 von 5** _(Informationswert 8)_
> secure signed update distribution? — _Warum fragen wir das: Keine Anhaltspunkte im Unternehmensprofil — klären._
**Frage 4 von 5** _(Informationswert 7)_
> coordinated vulnerability disclosure? — _Warum fragen wir das: Keine Anhaltspunkte im Unternehmensprofil — klären._
**Frage 5 von 5** _(Informationswert 7)_
> exploited vuln and incident reporting? — _Warum fragen wir das: Keine Anhaltspunkte im Unternehmensprofil — klären._
## Womit zuerst anfangen (größter Hebel)
- `product_cyber_risk_assessment` — schließt 2 Anforderung(en): CRA, MaschinenVO
- `protection_against_corruption_of_safety_functions` — schließt 2 Anforderung(en): CRA, MaschinenVO
- `secure_signed_update_distribution` — schließt 2 Anforderung(en): CRA, MaschinenVO
- `coordinated_vulnerability_disclosure` — schließt 1 Anforderung(en): CRA
- `exploited_vuln_and_incident_reporting` — schließt 1 Anforderung(en): CRA
## Vollständigkeit (ehrlich)
> Identifiziert 1 · bewertet 1 · offen 0 · Unsicherheiten 0 · Begründung ja
---
_Der Vertrieb wählt KEIN Regelwerk und interpretiert nichts — er sieht nur dieses Ergebnis. Jede beantwortete Frage aktualisiert das Capability Profile und verkleinert das Delta._
@@ -0,0 +1,72 @@
# ruff: noqa
# mypy: ignore-errors
"""Smart Onboarding Advisor demo — what the frontend shows, automatically (no sales interpretation).
The user types company + products + certifications + target. The Advisor orchestrates the existing
engines and returns the next best questions, assumptions and measures. Sales sees only the result.
Synthetic, no real names. Non-runtime demo of a runtime step.
Run: cd backend-compliance && PYTHONPATH=. python3 reference_scenarios/onboarding_advisor_demo.py
"""
from __future__ import annotations

import os

import yaml

from compliance.onboarding import OnboardingInput, advisor_start
from compliance.transition_reasoning import TargetRequirement

OUT = []


def w(s=""):
    OUT.append(s)


CRA = yaml.safe_load(open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "transition_patterns",
                                       "transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml"), encoding="utf-8"))
infosec = [a["capability"] for a in CRA["likely_covered"]]
req = [TargetRequirement(capability_id=a["capability"]) for a in CRA["likely_covered"]]
req += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
                          expected_evidence=d.get("expected_evidence", [])) for d in CRA["delta_requirements"]]
covers = {d["capability"]: d.get("covers_targets", []) for d in CRA["delta_requirements"]}
hyp = {"ISO27001": infosec, "TISAX": infosec,
       "ISO9001": ["ce_conformity_assessment_and_technical_documentation"],
       "ISO14001": ["environmental_management_documentation"]}
inp = OnboardingInput(company="synthetisch", industry="machine_builder",
                      products=["Parkschein-/Schrankensystem"], markets=["EU", "DE"],
                      certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
                      known_evidence=["CE process"], target=["CRA"])
res = advisor_start(inp, hyp, req, target_id="CRA", covers_targets=covers, corpus_status={"CRA": "validated"})

w("# Smart Onboarding Advisor — was der Nutzer sieht (automatisch, ohne Vertrieb)")
w("")
w("_Eingabe: Unternehmen + Produkte + Zertifizierungen + Ziel. Den Rest macht die Orchestrierung über die bestehenden Engines (Company 2A · RS-005 · Optimization · Completeness). Synthetisch, keine echten Namen._")
w("")
w("## Eingabe")
w("> Zertifizierungen: **%s** · Produkt: **%s** · Ziel: **%s**" % (", ".join(inp.certifications), inp.products[0], ", ".join(inp.target)))
w("")
w("## Was wir erkannt haben")
w("> %s" % res.headline)
w("")
w("**Aus Ihren Zertifizierungen abgeleitet (zu bestätigen, nicht automatisch erfüllt):**")
for a in res.inferred_assumptions:
    w("- %s" % a.statement)
for r in res.rejected_assumptions:
    # separate statement and reason, matching the rendered demo output
    w("- _%s — %s_" % (r.statement, r.reason))
w("")
w("## Die wenigen offenen Punkte — nur die nächsten besten Fragen")
for n, q in enumerate(res.next_best_questions, 1):
    w("**Frage %d von %d** _(Informationswert %.0f)_" % (n, len(res.next_best_questions), q.information_value))
    w("> %s? — _Warum fragen wir das: %s_" % (q.capability_id.replace("_", " "), q.why))
w("")
w("## Womit zuerst anfangen (größter Hebel)")
for m in res.top_measures[:5]:
    w("- `%s` — schließt %d Anforderung(en): %s" % (m.capability_id, m.leverage, ", ".join(m.closes) or ""))
w("")
w("## Vollständigkeit (ehrlich)")
w("> %s" % res.completeness_summary)
w("")
w("---")
w("_Der Vertrieb wählt KEIN Regelwerk und interpretiert nichts — er sieht nur dieses Ergebnis. Jede beantwortete Frage aktualisiert das Capability Profile und verkleinert das Delta._")
print("\n".join(OUT))
@@ -0,0 +1,90 @@
"""Smart Onboarding Advisor — acceptance tests (the 7 criteria).
A synthetic multi-certified company (ISO 9001 + ISO 27001 + ISO 14001 + TISAX) onboards toward the CRA.
The Advisor orchestrates the existing engines and must satisfy: multi-cert works; ISO 14001 is not
falsely relevant; certs reduce questions but satisfy nothing automatically (Welt-1); <=5 self-explaining
next-best questions; answers update the profile (delta shrinks); sales selects/interprets nothing.
"""
from __future__ import annotations

import os

import yaml

from compliance.onboarding import OnboardingInput, advisor_start, apply_answer
from compliance.transition_reasoning import TargetRequirement

_CRA = yaml.safe_load(open(os.path.join(
    os.path.dirname(__file__), "..", "knowledge", "transition_patterns",
    "transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml"), encoding="utf-8"))
_INFOSEC = [a["capability"] for a in _CRA["likely_covered"]]
_REQ = [TargetRequirement(capability_id=a["capability"]) for a in _CRA["likely_covered"]]
_REQ += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
                           expected_evidence=d.get("expected_evidence", []))
         for d in _CRA["delta_requirements"]]
_COVERS = {d["capability"]: d.get("covers_targets", []) for d in _CRA["delta_requirements"]}
_HYP = {
    "ISO27001": _INFOSEC,
    "TISAX": _INFOSEC,
    "ISO9001": ["ce_conformity_assessment_and_technical_documentation"],  # a CRA delta cap (relevant)
    "ISO14001": ["environmental_management_documentation"],  # NOT in the CRA required set
}
_INPUT = OnboardingInput(
    company="synthetic", industry="machine_builder", products=["parking payment system"],
    markets=["EU"], certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
    known_evidence=["CE process"], target=["CRA"])


def _run(inp=_INPUT, hyp=_HYP):
    return advisor_start(inp, hyp, _REQ, target_id="CRA", covers_targets=_COVERS,
                         corpus_status={"CRA": "validated"})


def test_1_multi_certification_works():
    res = _run()
    certs = {a.certification for a in res.inferred_assumptions}
    assert {"ISO27001", "ISO9001"} <= certs  # several certs contribute inferred capabilities


def test_2_iso14001_not_falsely_relevant_for_cra():
    res = _run()
    assert any(r.certification == "ISO14001" for r in res.rejected_assumptions)
    assert all(a.certification != "ISO14001" for a in res.inferred_assumptions)


def test_3_certs_reduce_questions_but_satisfy_nothing_automatically():
    res = _run()
    for a in res.inferred_assumptions:
        assert a.verification_required is True
        assert "nicht automatisch erfüllt" in a.statement


def test_4_at_most_five_next_best_questions():
    res = _run()
    assert 0 < len(res.next_best_questions) <= 5


def test_5_every_question_explains_why():
    res = _run()
    assert all(q.why.strip() for q in res.next_best_questions)


def test_6_each_answer_updates_the_profile():
    res = _run()
    open_cap = res.capability_delta[0]
    # the answer "confirmed" adds the capability; re-running shrinks the delta
    confirmed = apply_answer([], open_cap, "confirmed")
    assert confirmed == [open_cap]
    hyp2 = {**_HYP, "ANSWERED": confirmed}
    inp2 = _INPUT.model_copy(update={"certifications": _INPUT.certifications + ["ANSWERED"]})
    res2 = advisor_start(inp2, hyp2, _REQ, target_id="CRA", covers_targets=_COVERS, corpus_status={"CRA": "validated"})
    assert len(res2.capability_delta) < len(res.capability_delta)


def test_7_sales_selects_nothing_engine_produces_everything():
    res = _run()
    # from plain inputs the engine produced the whole advisory payload
    assert res.headline and res.capability_delta and res.top_measures and res.evidence_requests
    assert res.completeness_summary
@@ -0,0 +1,45 @@
# ADR-012: Smart Onboarding Advisor — make the knowledge usable in onboarding (orchestration)
- **Status:** Accepted
- **Date:** 2026-06-28
- **Type:** Architecture decision (runtime step: orchestration, NOT a new engine)
- **Related:** [ADR-003](ADR-003-capability-delta-engine-with-renderers.md), [ADR-011](ADR-011-journey-matcher-delta-explains-journey.md), [[strategy-knowledge-layers-and-hypotheses]], [[evidence-attributed-to-origin]], [[transition-reasoning]]
## Context
The knowledge base is built; the next step is to make it usable **automatically during onboarding**.
Sales cannot be trained on this and must NOT select or interpret regulations. At the same time there is
a hard real-world limit: **proprietary standards (ISO/TISAX/PCI…) must not be ingested**. The
certificates therefore feed a **curated hypothesis** (Welt-1) that infers *probably present*
capabilities, never "erfüllt" (fulfilled).
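
The inference from certificates to probable capabilities can be sketched as a plain set intersection, mirroring the loop in `advisor_start`. This is a minimal illustration, not the engine itself: `split_by_relevance` is a hypothetical helper, and the capability ids `access_control` and `incident_handling` are made up for the example.

```python
from typing import Dict, List, Set, Tuple


def split_by_relevance(
    certifications: List[str],
    cert_hypotheses: Dict[str, List[str]],
    required: Set[str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split certificates into (inferred, rejected) for one target.

    inferred: cert -> probable capabilities that the target actually requires.
    rejected: certs whose hypothesised capabilities cover no required capability.
    """
    inferred: Dict[str, List[str]] = {}
    rejected: List[str] = []
    for cert in certifications:
        caps = set(cert_hypotheses.get(cert, []))
        relevant = sorted(caps & required)
        if relevant:
            inferred[cert] = relevant  # still only probable: verification required
        elif caps:
            rejected.append(cert)  # relevance(evidence, target) = 0
    return inferred, rejected


# Hypothetical curated hypotheses and target requirements (illustrative ids only).
hypotheses = {
    "ISO27001": ["access_control", "incident_handling"],
    "ISO14001": ["environmental_management_documentation"],
}
required = {"access_control", "incident_handling", "secure_signed_update_distribution"}

inferred, rejected = split_by_relevance(["ISO27001", "ISO14001"], hypotheses, required)
print(inferred)
print(rejected)
```

A certificate that appears in `inferred` only narrows down what needs to be asked; nothing in this split marks a requirement as satisfied.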
## Decision
1. **`compliance/onboarding/` is an ORCHESTRATOR, not a new engine.** It wires the existing building
   blocks into one flow: Company 2A (`Evidence→Capability`) → RS-005 (`Capability→Delta`) →
   Optimization (`Delta→Roadmap`) → Completeness. No new reasoning engine, capability registry, or
   meta-model (freeze).
2. **`advisor_start(input, cert_hypotheses, target_requirements, …)`** returns: `inferred_assumptions`,
   `rejected_assumptions`, `next_best_questions` (≤5), `capability_delta`, `top_measures`,
   `evidence_requests`, `unsupported_domains`, `completeness_summary`.
3. **Welt-1 discipline:** certificates **reduce questions but satisfy NOTHING automatically**
   (`verification_required`). **`relevance(evidence, target)`** keeps ISO 14001 out of the CRA result
   (non-relevant certificates go to `rejected_assumptions`, reason "relevance = 0").
4. **Only the next best questions** (≤5), ranked deterministically by
   `information_gain + leverage (regulatory+business) + unknown_high_risk + evidence_missing`; **every
   question explains itself** (`why`). **Every answer updates the profile** (`apply_answer` → the delta shrinks).
5. **Certificate→capability hypotheses and target requirements are INJECTED**: curated knowledge,
   NOT in the code ([[evidence-attributed-to-origin]]). Curating the hypotheses is a separate knowledge
   task handled outside this code.
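
The deterministic ranking from point 4 can be sketched as an additive score with a stable tie-break on the capability id. This is a simplified stand-in for the engine code: `QuestionRequest` is a hypothetical dataclass replacing the RS-005 request objects, and the gain, priority, and evidence values below are assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

GAIN = {"high": 3, "medium": 2, "low": 1}
RISK = {"high": 2, "medium": 1, "low": 0}


@dataclass
class QuestionRequest:
    """Hypothetical stand-in for an RS-005 question request."""
    capability_id: str
    information_gain: str = "medium"
    priority: str = "medium"
    expected_evidence: List[str] = field(default_factory=list)


def rank_questions(
    requests: List[QuestionRequest],
    covers_targets: Dict[str, List[str]],
    known_evidence: Set[str],
    limit: int = 5,
) -> List[Tuple[int, str]]:
    """Return at most `limit` (score, capability_id) pairs, best first."""
    scored = []
    for q in requests:
        leverage = len(covers_targets.get(q.capability_id, []))
        # +1 when evidence is expected but none of it is already known
        evidence_missing = 1 if q.expected_evidence and not (set(q.expected_evidence) & known_evidence) else 0
        score = GAIN.get(q.information_gain, 1) + leverage + RISK.get(q.priority, 0) + evidence_missing
        scored.append((score, q.capability_id))
    # highest score first; ties broken alphabetically so the order is deterministic
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]


requests = [
    QuestionRequest("coordinated_vulnerability_disclosure", "high", "high", ["CVD policy"]),
    QuestionRequest("product_cyber_risk_assessment", "high", "high", ["risk assessment report"]),
]
covers = {
    "product_cyber_risk_assessment": ["CRA", "MaschinenVO"],
    "coordinated_vulnerability_disclosure": ["CRA"],
}
ranked = rank_questions(requests, covers, known_evidence={"CE process"})
print(ranked)
```

Because every term is an integer and ties fall back to the capability id, the same inputs always yield the same question order.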
## Consequences
- **First "app caller" that connects the engines into one product flow**: the "right next runtime
  step" named by the user. Still WITHOUT an endpoint or DB persistence → currently **no runtime effect →
  no deploy** ([ADR-001](ADR-001-runtime-deploy-policy.md)); deploy once `POST /onboarding/advisor-start`
  + persistence + frontend are wired up (follow-up step).
- **7 acceptance criteria met and tested** (multi-cert · ISO 14001 not falsely relevant ·
  Welt-1 · ≤5 questions · each question explains itself · each answer updates the profile · sales interprets nothing).
- **Long-term moat = EMPIRICAL DATA:** the hypotheses' `confidence` will later come from OBSERVATIONS
  (confirmed/refuted per customer), not from the LLM; three knowledge layers
  ([[strategy-knowledge-layers-and-hypotheses]]).
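
The "each answer updates the profile" criterion can be illustrated with a small loop. `apply_answer` below repeats the logic from this PR; `open_delta` is a hypothetical simplification, since in the PR the delta comes from `assess_transition` rather than from a plain set difference.

```python
from typing import List, Sequence, Set


def apply_answer(known_capabilities: Sequence[str], capability_id: str, answer: str) -> List[str]:
    """Add the capability only when the answer is 'confirmed'; keep order, drop duplicates."""
    known = list(dict.fromkeys(known_capabilities))
    if answer == "confirmed" and capability_id not in known:
        known.append(capability_id)
    return known


def open_delta(required: Set[str], known: Sequence[str]) -> List[str]:
    """Hypothetical simplification: required capabilities not yet confirmed."""
    return sorted(required - set(known))


required = {
    "product_cyber_risk_assessment",
    "coordinated_vulnerability_disclosure",
    "secure_signed_update_distribution",
}
known: List[str] = []
before = open_delta(required, known)

known = apply_answer(known, "product_cyber_risk_assessment", "confirmed")
known = apply_answer(known, "coordinated_vulnerability_disclosure", "unknown")  # stays open
after = open_delta(required, known)
print(before)
print(after)
```

Only a confirmed answer shrinks the delta; `rejected` and `unknown` leave the capability open for the next run.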