feat: Certification Capability Hypotheses — capability-centric library + empirical confidence

The bottleneck is knowledge, not the endpoint. This commit builds the knowledge the Onboarding Advisor needs, restructured per the user's key insight: NOT "ISO27001 -> 30 capabilities", but each hypothesis as its own object, "capability -> supported_by: [certs]".

A capability is written ONCE with all of its supporting certs, so the shared management-system core (document control, incident, supplier, audit, access, asset, monitoring, training, crypto, release, risk) covers most certifications with ~18 hypotheses instead of ~300. Multi-certification merges AUTOMATICALLY: a company's inferred capabilities are every hypothesis whose supported_by intersects its certs.

Welt-1 throughout: "IF cert present, EXPECT capability (verification required)", never "erfüllt" (fulfilled). Capabilities that NO cert suggests (SBOM, signed updates, CVD, support period) have no hypothesis, so they stay in the delta and get asked.

confidence is EMPIRICAL: computed from real-onboarding observations as confirmed/(confirmed+refuted), None until calibrated, never an LLM/expert score (see record_observation + empirical_confidence). The long-term moat: knowledge that learns from reality, not from a norm.

compliance/onboarding/hypotheses.py (resolve_for_certifications / inferred_hypotheses / empirical_confidence / record_observation) feeds the existing advisor_start unchanged; the demo now runs on the curated library. Pure, mypy --strict clean, library is DATA (no norm text, no real names). Non-runtime -> no deploy. 12 tests pass, check-loc 0.
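
For illustration only, the merge rule and the confidence formula described above can be sketched as follows. This is not the code in compliance/onboarding/hypotheses.py: the class HypothesisSketch, its field names other than supported_by, the *_sketch function names, and the capability ids are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class HypothesisSketch:
    """Hypothetical stand-in for CapabilityHypothesis; field names are assumed."""
    capability: str
    supported_by: list[str]
    confirmed: int = 0  # onboardings where verification confirmed the capability
    refuted: int = 0    # onboardings where verification refuted it


def inferred_sketch(certs: list[str], library: list[HypothesisSketch]) -> list[HypothesisSketch]:
    """Keep every hypothesis whose supported_by intersects the held certs."""
    held = set(certs)
    return [h for h in library if held.intersection(h.supported_by)]


def record_observation_sketch(h: HypothesisSketch, confirmed: bool) -> None:
    """Count one real-onboarding outcome for a hypothesis."""
    if confirmed:
        h.confirmed += 1
    else:
        h.refuted += 1


def empirical_confidence_sketch(h: HypothesisSketch) -> Optional[float]:
    """confirmed / (confirmed + refuted); None until at least one observation."""
    total = h.confirmed + h.refuted
    return h.confirmed / total if total else None


# Illustrative library: each capability is written once with all supporting certs.
library = [
    HypothesisSketch("access_control", ["ISO27001", "TISAX"]),
    HypothesisSketch("document_control", ["ISO9001", "ISO27001", "ISO14001"]),
]

# A company holding two certs gets the union of matching hypotheses, each once.
merged = [h.capability for h in inferred_sketch(["ISO9001", "TISAX"], library)]

# Confidence stays None until observations are recorded.
access = library[0]
before = empirical_confidence_sketch(access)
record_observation_sketch(access, confirmed=True)
record_observation_sketch(access, confirmed=False)
after = empirical_confidence_sketch(access)
```

Because a capability with no supporting cert simply has no entry in the library, it can never appear in the inferred set, which matches the "stays in the delta and gets asked" behaviour described above.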
2026-06-28 13:16:45 +02:00
parent 02c9fdb18e
commit 2d2cb2a244
6 changed files with 260 additions and 7 deletions
@@ -6,12 +6,12 @@ _Eingabe: Unternehmen + Produkte + Zertifizierungen + Ziel. Den Rest macht die O
 > Zertifizierungen: **ISO9001, ISO27001, ISO14001, TISAX** · Produkt: **Parkschein-/Schrankensystem** · Ziel: **CRA**

 ## Was wir erkannt haben
-> 17 Anforderungen erkannt · 6 wahrscheinlich abgedeckt · 5 zu klären
+> 17 Anforderungen erkannt · 5 wahrscheinlich abgedeckt · 5 zu klären

 **Aus Ihren Zertifizierungen abgeleitet (zu bestätigen, nicht automatisch erfüllt):**
 - ISO9001 legt 1 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
-- ISO27001 legt 5 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
-- TISAX legt 5 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
+- ISO27001 legt 4 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
+- TISAX legt 4 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
 - _ISO14001 ist für dieses Ziel nicht relevant — relevance(evidence, target) = 0 — keine geforderte Fähigkeit abgedeckt_

 ## Die wenigen offenen Punkte — nur die nächsten besten Fragen
@@ -12,7 +12,7 @@ from __future__ import annotations
 import os
 import yaml

-from compliance.onboarding import OnboardingInput, advisor_start
+from compliance.onboarding import CapabilityHypothesis, OnboardingInput, advisor_start, resolve_for_certifications
 from compliance.transition_reasoning import TargetRequirement

 OUT = []
@@ -29,13 +29,14 @@ req = [TargetRequirement(capability_id=a["capability"]) for a in CRA["likely_cov
 req += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
                          expected_evidence=d.get("expected_evidence", [])) for d in CRA["delta_requirements"]]
 covers = {d["capability"]: d.get("covers_targets", []) for d in CRA["delta_requirements"]}
-hyp = {"ISO27001": infosec, "TISAX": infosec,
-       "ISO9001": ["ce_conformity_assessment_and_technical_documentation"],
-       "ISO14001": ["environmental_management_documentation"]}
+# certificate hypotheses come from the CURATED, capability-centric library (multi-cert merges automatically)
+_lib = [CapabilityHypothesis(**h) for h in yaml.safe_load(
+    open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "certification_hypotheses", "hypotheses.yaml"), encoding="utf-8"))["hypotheses"]]
 inp = OnboardingInput(company="synthetisch", industry="machine_builder",
                      products=["Parkschein-/Schrankensystem"], markets=["EU", "DE"],
                      certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
                      known_evidence=["CE process"], target=["CRA"])
+hyp = resolve_for_certifications(inp.certifications, _lib)
 res = advisor_start(inp, hyp, req, target_id="CRA", covers_targets=covers, corpus_status={"CRA": "validated"})

 w("# Smart Onboarding Advisor — was der Nutzer sieht (automatisch, ohne Vertrieb)")