breakpilot-compliance/backend-compliance/reference_scenarios/onboarding_advisor_demo.py
Benjamin Admin 2d2cb2a244 feat: Certification Capability Hypotheses — capability-centric library + empirical confidence
The bottleneck is knowledge, not the endpoint. This builds the knowledge the Onboarding Advisor needs,
restructured per the user's key insight: NOT "ISO27001 -> 30 capabilities" but each hypothesis as its
own object "capability -> supported_by: [certs]". A capability is written ONCE with all supporting
certs, so the shared management-system core (document control, incident, supplier, audit, access,
asset, monitoring, training, crypto, release, risk) covers most certifications with ~18 hypotheses
instead of ~300 — and multi-certification merges AUTOMATICALLY (a company's inferred caps = every
hypothesis whose supported_by intersects its certs).
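
The merge rule described above can be illustrated with a minimal sketch. The `Hypothesis` class, the `inferred_capabilities` helper and the certificate-to-capability pairs below are hypothetical stand-ins chosen for illustration; they are not the curated library or the signatures in compliance/onboarding/hypotheses.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One capability, written once, with every certificate that suggests it."""
    capability: str
    supported_by: frozenset[str]


def inferred_capabilities(certs: list[str], library: list[Hypothesis]) -> list[str]:
    """Return each capability whose supported_by set intersects the held certs."""
    held = set(certs)
    return [h.capability for h in library if h.supported_by & held]


# Illustrative entries only (assumption): not taken from the real hypotheses.yaml.
LIBRARY = [
    Hypothesis("document_control", frozenset({"ISO9001", "ISO27001", "ISO14001"})),
    Hypothesis("incident_management", frozenset({"ISO27001", "TISAX"})),
    Hypothesis("environmental_aspects", frozenset({"ISO14001"})),
]

# Two certificates merge without any per-certificate capability list.
print(inferred_capabilities(["ISO9001", "TISAX"], LIBRARY))
```

Because each capability appears once, adding a certificate to a company only widens the intersection; nothing has to be de-duplicated afterwards.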

Welt-1 throughout: "IF cert present, EXPECT capability (verification required)", never "fulfilled".
Capabilities NO cert suggests (SBOM, signed updates, CVD, support period) have no hypothesis -> they
stay in the delta and get asked. confidence is EMPIRICAL: computed from real-onboarding observations
(confirmed/(confirmed+refuted)), None until calibrated — never an LLM/expert score (record_observation
+ empirical_confidence). The long-term moat: knowledge that learns from reality, not from a norm.
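
The confidence rule stated above, confirmed/(confirmed+refuted) with None until calibrated, can be sketched as follows. `ObservationCounts`, `record` and `confidence` are hypothetical names used for illustration; the actual record_observation and empirical_confidence signatures are not shown in this commit and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationCounts:
    """Tally of real-onboarding outcomes for one hypothesis (hypothetical type)."""
    confirmed: int = 0
    refuted: int = 0


def record(counts: ObservationCounts, was_confirmed: bool) -> ObservationCounts:
    """Return a new tally with one more observation; the input is not mutated."""
    return ObservationCounts(
        confirmed=counts.confirmed + int(was_confirmed),
        refuted=counts.refuted + int(not was_confirmed),
    )


def confidence(counts: ObservationCounts) -> float | None:
    """confirmed / (confirmed + refuted), or None while nothing was observed."""
    total = counts.confirmed + counts.refuted
    return None if total == 0 else counts.confirmed / total


counts = ObservationCounts()
print(confidence(counts))  # None: not calibrated yet
for outcome in (True, True, False, True):
    counts = record(counts, outcome)
print(confidence(counts))  # 0.75 after 3 confirmations and 1 refutation
```

Returning None instead of a default score keeps an uncalibrated hypothesis distinguishable from one that was observed and found unreliable.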

compliance/onboarding/hypotheses.py (resolve_for_certifications / inferred_hypotheses /
empirical_confidence / record_observation) feeds the existing advisor_start unchanged; the demo now
runs on the curated library. Pure, mypy --strict clean, library is DATA (no norm text, no real names).
Non-runtime -> no deploy. 12 tests pass, check-loc 0.
2026-06-28 13:16:45 +02:00


# ruff: noqa
# mypy: ignore-errors
"""Smart Onboarding Advisor demo — what the frontend shows, automatically (no sales interpretation).
The user types company + products + certifications + target. The Advisor orchestrates the existing
engines and returns the next best questions, assumptions and measures. Sales sees only the result.
Synthetic, no real names. Non-runtime demo of a runtime step.
Run: cd backend-compliance && PYTHONPATH=. python3 reference_scenarios/onboarding_advisor_demo.py
"""
from __future__ import annotations
import os
import yaml
from compliance.onboarding import CapabilityHypothesis, OnboardingInput, advisor_start, resolve_for_certifications
from compliance.transition_reasoning import TargetRequirement
OUT = []
def w(s=""):
    OUT.append(s)
CRA = yaml.safe_load(open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "transition_patterns",
                                       "transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml"), encoding="utf-8"))
infosec = [a["capability"] for a in CRA["likely_covered"]]
req = [TargetRequirement(capability_id=a["capability"]) for a in CRA["likely_covered"]]
req += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
                          expected_evidence=d.get("expected_evidence", [])) for d in CRA["delta_requirements"]]
covers = {d["capability"]: d.get("covers_targets", []) for d in CRA["delta_requirements"]}
# certificate hypotheses come from the CURATED, capability-centric library (multi-cert merges automatically)
_lib = [CapabilityHypothesis(**h) for h in yaml.safe_load(
    open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "certification_hypotheses", "hypotheses.yaml"), encoding="utf-8"))["hypotheses"]]
inp = OnboardingInput(company="synthetic", industry="machine_builder",
                      products=["Parking ticket / barrier system"], markets=["EU", "DE"],
                      certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
                      known_evidence=["CE process"], target=["CRA"])
hyp = resolve_for_certifications(inp.certifications, _lib)
res = advisor_start(inp, hyp, req, target_id="CRA", covers_targets=covers, corpus_status={"CRA": "validated"})
w("# Smart Onboarding Advisor: what the user sees (automatic, no sales involvement)")
w("")
w("_Input: company + products + certifications + target. The orchestration across the existing engines (Company 2A · RS-005 · Optimization · Completeness) does the rest. Synthetic, no real names._")
w("")
w("## Input")
w("> Certifications: **%s** · Product: **%s** · Target: **%s**" % (", ".join(inp.certifications), inp.products[0], ", ".join(inp.target)))
w("")
w("## What we recognized")
w("> %s" % res.headline)
w("")
w("**Inferred from your certifications (to be confirmed, not automatically fulfilled):**")
for a in res.inferred_assumptions:
    w("- %s" % a.statement)
for r in res.rejected_assumptions:
    w("- _%s%s_" % (r.statement, r.reason))
w("")
w("## The few open points: only the next best questions")
for n, q in enumerate(res.next_best_questions, 1):
    w("**Question %d of %d** _(information value %.0f)_" % (n, len(res.next_best_questions), q.information_value))
    w("> %s? _Why we ask this: %s_" % (q.capability_id.replace("_", " "), q.why))
w("")
w("## Where to start (greatest leverage)")
for m in res.top_measures[:5]:
    w("- `%s` closes %d requirement(s): %s" % (m.capability_id, m.leverage, ", ".join(m.closes) or "—"))
w("")
w("## Completeness (honest)")
w("> %s" % res.completeness_summary)
w("")
w("---")
w("_Sales does NOT select a rulebook and interprets nothing; it only sees this result. Every answered question updates the Capability Profile and shrinks the delta._")
print("\n".join(OUT))