breakpilot-compliance/backend-compliance/reference_scenarios/onboarding_advisor_demo.py
Benjamin Admin c2c8f7e424 feat: Signal Producer interface + Normalizer — one signal language for all sources (before #58)
These are not scanner stubs: the scanners already exist. The Silent Pass needs only their UNIFIED
output. This adds the small common DATA FORMAT (not a new module or framework) the user asked for,
following the Requirement-Source / MCAP / regulation-alias pattern: many inputs, one language.

  Producer A / B / C  ->  normalize_signals (vocabulary: id + aliases)  ->  canonical IntakeSignal  ->  Silent Pass

- ProducedSignal {signal_id, source_type, confidence, evidence, provenance} = what ANY source emits
  (website scanner, repo scanner, PDF parser, tender parser, API, the user).
- knowledge/onboarding/signal_vocabulary.yaml reduces producer dialects to a canonical signal: "SBOM
  present" arrives as cyclonedx_found / spdx_found / sbom_uploaded / requires_sbom (tender) — all become
  `sbom_file_found`. The Silent Pass cannot tell where it came from -> no per-scanner special logic, ever.
- Unknown signals pass through (a new producer stays visible). confidence/evidence/provenance flow to
  the detected capability for the audit trail.
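
The alias reduction described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `compliance.onboarding`: the `ProducedSignal` fields follow the list above, while `VocabularyEntry`, `build_alias_index`, `normalize`, the default values, and the shape of the vocabulary are hypothetical stand-ins. The aliases are the ones named in the SBOM example.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProducedSignal:
    """What any producer emits (field names taken from the commit message)."""
    signal_id: str
    source_type: str
    confidence: float = 1.0  # default value is an assumption
    evidence: str = ""
    provenance: str = ""


@dataclass(frozen=True)
class VocabularyEntry:
    """Hypothetical vocabulary row: one canonical id plus its producer aliases."""
    id: str
    aliases: tuple[str, ...] = ()


def build_alias_index(vocab: list[VocabularyEntry]) -> dict[str, str]:
    """Map every alias, and every canonical id itself, to the canonical id."""
    index: dict[str, str] = {}
    for entry in vocab:
        index[entry.id] = entry.id
        for alias in entry.aliases:
            index[alias] = entry.id
    return index


def normalize(produced: list[ProducedSignal], vocab: list[VocabularyEntry]) -> list[ProducedSignal]:
    """Rewrite signal_id to its canonical form; unknown ids pass through unchanged."""
    index = build_alias_index(vocab)
    return [replace(s, signal_id=index.get(s.signal_id, s.signal_id)) for s in produced]


VOCAB = [
    VocabularyEntry(
        id="sbom_file_found",
        aliases=("cyclonedx_found", "spdx_found", "sbom_uploaded", "requires_sbom"),
    )
]
signals = normalize(
    [
        ProducedSignal("cyclonedx_found", "repository", provenance="sbom.cdx.json"),
        ProducedSignal("requires_sbom", "tender"),
        ProducedSignal("plc_detected", "product"),  # not in the vocabulary
    ],
    VOCAB,
)
print([s.signal_id for s in signals])
```

In this sketch the repository signal and the tender signal end up with the same `signal_id`, while `source_type` and `provenance` are left untouched, so a downstream consumer needs no per-producer branch but the origin is still recorded.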

A tender that "requires SBOM" now produces the same effect as a repo that HAS one — fits Vision V2
(Requirement Source over Regulation). Endpoint (#58) then has its final shape: POST -> Producers ->
Normalizer -> Silent Pass -> Profile -> Delta -> Questions -> Roadmap. Non-runtime -> no deploy. mypy
--strict clean, 14 onboarding tests pass, check-loc 0.
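
For the Normalizer -> Silent Pass step of that pipeline, the following sketch shows one way canonical signals could be folded into detected capabilities while keeping confidence and provenance for the audit trail. Everything here is an assumption made for illustration: `CanonicalSignal`, `DetectedCapability`, `silent_pass`, the capability id `sbom_available`, and the choice to keep the highest confidence per capability are hypothetical and are not taken from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalSignal:
    """Hypothetical normalized signal, already in the canonical vocabulary."""
    signal_id: str
    source_type: str
    confidence: float = 1.0
    evidence: str = ""
    provenance: str = ""


@dataclass(frozen=True)
class DetectedCapability:
    """Hypothetical result row: a capability plus where the claim came from."""
    capability: str
    confidence: float
    audit_trail: tuple[str, ...]


def silent_pass(
    signals: list[CanonicalSignal], signal_to_capability: dict[str, str]
) -> tuple[list[DetectedCapability], list[str]]:
    """Group signals by mapped capability; return detections and unmapped signal ids."""
    grouped: dict[str, list[CanonicalSignal]] = {}
    unmapped: list[str] = []
    for s in signals:
        capability = signal_to_capability.get(s.signal_id)
        if capability is None:
            unmapped.append(s.signal_id)  # stays visible instead of being dropped
            continue
        grouped.setdefault(capability, []).append(s)
    detected = [
        DetectedCapability(
            capability=capability,
            # Assumption: the strongest supporting signal sets the confidence.
            confidence=max(s.confidence for s in hits),
            audit_trail=tuple(f"{s.source_type}:{s.provenance or '-'}" for s in hits),
        )
        for capability, hits in grouped.items()
    ]
    return detected, unmapped


detected, unmapped = silent_pass(
    [
        CanonicalSignal("sbom_file_found", "repository", 0.9, "sbom", "sbom.cdx.json"),
        CanonicalSignal("sbom_file_found", "tender", 0.6),
        CanonicalSignal("plc_detected", "product"),
    ],
    {"sbom_file_found": "sbom_available"},
)
print(detected, unmapped)
```

The function never inspects `source_type` to decide what to detect; it only copies it into the audit trail, which is the property the commit message asks of the Silent Pass.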
2026-06-28 14:49:57 +02:00


# ruff: noqa
# mypy: ignore-errors
"""Smart Onboarding Advisor demo — what the frontend shows, automatically (no sales interpretation).
The user types company + products + certifications + target. The Advisor orchestrates the existing
engines and returns the next best questions, assumptions and measures. Sales sees only the result.
Synthetic, no real names. Non-runtime demo of a runtime step.
Run: cd backend-compliance && PYTHONPATH=. python3 reference_scenarios/onboarding_advisor_demo.py
"""
from __future__ import annotations
import os
import yaml
from compliance.onboarding import (
    CapabilityHypothesis, OnboardingInput, ProducedSignal, SignalMapping, SignalVocabularyEntry,
    advisor_start, normalize_signals, resolve_for_certifications, silent_intake,
)
from compliance.transition_reasoning import TargetRequirement
OUT = []
def w(s=""):
    OUT.append(s)
CRA = yaml.safe_load(open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "transition_patterns",
    "transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml"), encoding="utf-8"))
infosec = [a["capability"] for a in CRA["likely_covered"]]
req = [TargetRequirement(capability_id=a["capability"]) for a in CRA["likely_covered"]]
req += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
    expected_evidence=d.get("expected_evidence", [])) for d in CRA["delta_requirements"]]
covers = {d["capability"]: d.get("covers_targets", []) for d in CRA["delta_requirements"]}
# certificate hypotheses come from the CURATED, capability-centric library (multi-cert merges automatically)
_lib = [CapabilityHypothesis(**h) for h in yaml.safe_load(
    open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "certification_hypotheses", "hypotheses.yaml"), encoding="utf-8"))["hypotheses"]]
inp = OnboardingInput(company="synthetisch", industry="machine_builder",
    products=["Parkschein-/Schrankensystem"], markets=["EU", "DE"],
    certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
    known_evidence=["CE process"], target=["CRA"])
hyp = resolve_for_certifications(inp.certifications, _lib)
# Phase 0 — Signal Producers emit raw dialects -> Normalizer -> one canonical stream -> Silent Pass.
_K = os.path.join(os.path.dirname(__file__), "..", "knowledge", "onboarding")
_vocab = [SignalVocabularyEntry(**v) for v in yaml.safe_load(open(os.path.join(_K, "signal_vocabulary.yaml"), encoding="utf-8"))["signals"]]
_smap = [SignalMapping(**m) for m in yaml.safe_load(open(os.path.join(_K, "intake_signal_map.yaml"), encoding="utf-8"))["mappings"]]
_produced = [
    ProducedSignal(signal_id="vdp_found", source_type="website", provenance="/.well-known/security.txt"),
    ProducedSignal(signal_id="cyclonedx_found", source_type="repository", evidence="sbom", provenance="sbom.cdx.json"),
    ProducedSignal(signal_id="cosign_found", source_type="repository", provenance="cosign.pub"),
    ProducedSignal(signal_id="risk_assessment_pdf", source_type="document", provenance="risk_assessment.pdf"),
    ProducedSignal(signal_id="cloud_hosted", source_type="product"),
    ProducedSignal(signal_id="plc_detected", source_type="product"),
]
_signals = normalize_signals(_produced, _vocab) # raw producer dialects -> ONE canonical signal language
si = silent_intake(_signals, _smap)
res = advisor_start(inp, hyp, req, target_id="CRA", covers_targets=covers, corpus_status={"CRA": "validated"},
    detected_capabilities=si.capability_ids())
w("# Smart Onboarding Advisor — what the user sees (automatically, without sales)")
w("")
w("_Input: company + products + certifications + target. The orchestration handles the rest via the existing engines (Company 2A · RS-005 · Optimization · Completeness). Synthetic, no real names._")
w("")
w("## Input")
w("> Certifications: **%s** · Product: **%s** · Target: **%s**" % (", ".join(inp.certifications), inp.products[0], ", ".join(inp.target)))
w("")
w("## Phase 0 — Silent pre-fill (BEFORE any question appears)")
w("- **Signal Producers (different dialects → one canonical signal):** %s" % ", ".join("`%s`(%s)" % (p.signal_id, p.source_type) for p in _produced))
w("> %s" % si.summary)
w("- **Automatically detected capabilities:** %s" % ", ".join("`%s`" % d.capability for d in si.detected_capabilities))
w("- **Product facts (drive the scope):** %s" % ", ".join("`%s=%s`" % (f.key, f.value) for f in si.product_facts))
w("- **Evidence already in hand (no upload needed):** %s" % ", ".join(si.evidence_found))
w("")
w("## What we detected")
w("> %s" % res.headline)
w("")
w("**Derived from your certifications (to be confirmed, not automatically fulfilled):**")
for a in res.inferred_assumptions:
    w("- %s" % a.statement)
for r in res.rejected_assumptions:
    w("- _%s%s_" % (r.statement, r.reason))
w("")
w("## The few open points — only the next best questions")
for n, q in enumerate(res.next_best_questions, 1):
    w("**Question %d of %d** _(information value %.0f)_" % (n, len(res.next_best_questions), q.information_value))
    w("> %s? — _Why we ask this: %s_" % (q.capability_id.replace("_", " "), q.why))
w("")
w("## Where to start first (greatest leverage)")
for m in res.top_measures[:5]:
    w("- `%s` — closes %d requirement(s): %s" % (m.capability_id, m.leverage, ", ".join(m.closes) or "—"))
w("")
w("## Completeness (honest)")
w("> %s" % res.completeness_summary)
w("")
w("---")
w("_Sales does NOT choose a rule set and interprets nothing; they only see this result. Every answered question updates the Capability Profile and shrinks the delta._")
print("\n".join(OUT))