feat: Certification Capability Hypotheses — capability-centric library + empirical confidence

The bottleneck is knowledge, not the endpoint. This builds the knowledge the Onboarding Advisor needs,
restructured per the user's key insight: NOT "ISO27001 -> 30 capabilities" but each hypothesis as its
own object "capability -> supported_by: [certs]". A capability is written ONCE with all supporting
certs, so the shared management-system core (document control, incident, supplier, audit, access,
asset, monitoring, training, crypto, release, risk) covers most certifications with ~18 hypotheses
instead of ~300 — and multi-certification merges AUTOMATICALLY (a company's inferred caps = every
hypothesis whose supported_by intersects its certs).

Welt-1 throughout: "IF cert present, EXPECT capability (verification required)", never "erfüllt".
Capabilities NO cert suggests (SBOM, signed updates, CVD, support period) have no hypothesis -> they
stay in the delta and get asked. confidence is EMPIRICAL: computed from real-onboarding observations
(confirmed/(confirmed+refuted)), None until calibrated — never an LLM/expert score (record_observation
+ empirical_confidence). The long-term moat: knowledge that learns from reality, not from a norm.

compliance/onboarding/hypotheses.py (resolve_for_certifications / inferred_hypotheses / empirical_
confidence / record_observation) feeds the existing advisor_start unchanged; the demo now runs on the
curated library. Pure, mypy --strict clean, library is DATA (no norm text, no real names). Non-runtime
-> no deploy. 12 tests pass, check-loc 0.
This commit is contained in:
Benjamin Admin
2026-06-28 13:16:45 +02:00
parent 02c9fdb18e
commit 2d2cb2a244
6 changed files with 260 additions and 7 deletions
@@ -6,12 +6,12 @@ _Eingabe: Unternehmen + Produkte + Zertifizierungen + Ziel. Den Rest macht die O
> Zertifizierungen: **ISO9001, ISO27001, ISO14001, TISAX** · Produkt: **Parkschein-/Schrankensystem** · Ziel: **CRA**
## Was wir erkannt haben
> 17 Anforderungen erkannt · 6 wahrscheinlich abgedeckt · 5 zu klären
> 17 Anforderungen erkannt · 5 wahrscheinlich abgedeckt · 5 zu klären
**Aus Ihren Zertifizierungen abgeleitet (zu bestätigen, nicht automatisch erfüllt):**
- ISO9001 legt 1 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- ISO27001 legt 5 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- TISAX legt 5 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- ISO27001 legt 4 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- TISAX legt 4 relevante Fähigkeit(en) nahe — Verifikation erforderlich, nicht automatisch erfüllt
- _ISO14001 ist für dieses Ziel nicht relevant — relevance(evidence, target) = 0 — keine geforderte Fähigkeit abgedeckt_
## Die wenigen offenen Punkte — nur die nächsten besten Fragen
@@ -12,7 +12,7 @@ from __future__ import annotations
import os
import yaml
from compliance.onboarding import OnboardingInput, advisor_start
from compliance.onboarding import CapabilityHypothesis, OnboardingInput, advisor_start, resolve_for_certifications
from compliance.transition_reasoning import TargetRequirement
OUT = []
@@ -29,13 +29,14 @@ req = [TargetRequirement(capability_id=a["capability"]) for a in CRA["likely_cov
req += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
expected_evidence=d.get("expected_evidence", [])) for d in CRA["delta_requirements"]]
covers = {d["capability"]: d.get("covers_targets", []) for d in CRA["delta_requirements"]}
hyp = {"ISO27001": infosec, "TISAX": infosec,
"ISO9001": ["ce_conformity_assessment_and_technical_documentation"],
"ISO14001": ["environmental_management_documentation"]}
# certificate hypotheses come from the CURATED, capability-centric library (multi-cert merges automatically)
_lib = [CapabilityHypothesis(**h) for h in yaml.safe_load(
open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "certification_hypotheses", "hypotheses.yaml"), encoding="utf-8"))["hypotheses"]]
inp = OnboardingInput(company="synthetisch", industry="machine_builder",
products=["Parkschein-/Schrankensystem"], markets=["EU", "DE"],
certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
known_evidence=["CE process"], target=["CRA"])
hyp = resolve_for_certifications(inp.certifications, _lib)
res = advisor_start(inp, hyp, req, target_id="CRA", covers_targets=covers, corpus_status={"CRA": "validated"})
w("# Smart Onboarding Advisor — was der Nutzer sieht (automatisch, ohne Vertrieb)")