feat: Observation Model — the empirical learning unit, defined BEFORE persistence (Task 59a)

The learning unit is not the hypothesis but the QUESTION, and confirmed/refuted is too coarse. Answers such as "partial, only critical suppliers" or "certified but not practiced" are not "wrong"; they are valuable knowledge. The chain is therefore Hypothesis -> Question -> Observation -> (Review) -> Hypothesis, and the observation model must be defined cleanly before any store/API exists (otherwise thousands of too-coarse observations would have to be migrated later).

compliance/onboarding/observations.py:
- ObservationType: confirmed / partial / refuted / not_applicable / unknown (richer than binary).
- Observation: {hypothesis_id, capability, question, answer (free text), observation_type, scope_note ("only critical suppliers"), evidence_uploaded, reviewed, reviewed_by}.
- empirical_distribution() -> a DISTRIBUTION (e.g. confirmed 61 / partial 31 / refuted 8), not a single %.
- empirical_confidence() -> (confirmed + 0.5*partial) / (confirmed + partial + refuted); not_applicable/unknown are excluded from the base; None until calibrated.
- REVIEW GATE: only reviewed observations calibrate; a raw answer never changes a hypothesis (no learning from outliers).

Refactor: the hypothesis is now PURE curated knowledge. The binary observation counter and any confidence value are removed from CapabilityHypothesis and from the YAML; confidence is COMPUTED from the separate stream of reviewed observations.

Pure, mypy --strict clean. Persistence/aggregation/calibration follow in 59b/c/d. Non-runtime -> no deploy. 12 tests pass, check-loc 0.
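The module described in this message, `compliance/onboarding/observations.py`, is not part of the diff shown here. A minimal sketch of how such a model could look follows. It assumes frozen dataclasses, defaults for every field except `hypothesis_id` and `observation_type` (the test constructs observations with only those two), and a caller that passes the observations of a single hypothesis. Applying the review gate inside `empirical_distribution()` as well is an assumption, and `_reviewed` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class ObservationType(str, Enum):
    """Outcome of asking one question; deliberately richer than yes/no."""
    CONFIRMED = "confirmed"
    PARTIAL = "partial"
    REFUTED = "refuted"
    NOT_APPLICABLE = "not_applicable"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Observation:
    """One answered question about one hypothesis (field order/defaults assumed)."""
    hypothesis_id: str
    observation_type: ObservationType
    capability: str = ""
    question: str = ""
    answer: str = ""                    # free text
    scope_note: str = ""                # e.g. "only critical suppliers"
    evidence_uploaded: bool = False
    reviewed: bool = False
    reviewed_by: Optional[str] = None


def _reviewed(observations: Iterable[Observation]) -> list[Observation]:
    # Hypothetical helper for the REVIEW GATE: raw answers never reach the statistics.
    return [o for o in observations if o.reviewed]


def empirical_distribution(observations: Iterable[Observation]) -> dict[str, int]:
    """Count reviewed observations per type; every type appears, even with 0."""
    counts = Counter(o.observation_type for o in _reviewed(observations))
    return {t.value: counts[t] for t in ObservationType}


def empirical_confidence(observations: Iterable[Observation]) -> Optional[float]:
    """(confirmed + 0.5*partial) / (confirmed + partial + refuted), or None."""
    dist = empirical_distribution(observations)
    # not_applicable and unknown are excluded from the base.
    base = dist["confirmed"] + dist["partial"] + dist["refuted"]
    if base == 0:
        return None  # nothing reviewed in the base yet -> not calibrated
    return (dist["confirmed"] + 0.5 * dist["partial"]) / base
```

With the example distribution from the message (61 / 31 / 8, all reviewed), this gives (61 + 0.5 * 31) / 100 = 0.765: partial answers count half, so a scoped "yes" neither inflates nor sinks the hypothesis.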
2026-06-28 13:31:43 +02:00
parent 59b7006e5a
commit 98d616d82b
5 changed files with 143 additions and 76 deletions
@@ -14,12 +14,13 @@ import yaml

 from compliance.onboarding import (
    CapabilityHypothesis,
-    HypothesisObservations,
+    Observation,
+    ObservationType,
    OnboardingInput,
    advisor_start,
    empirical_confidence,
+    empirical_distribution,
    inferred_hypotheses,
-    record_observation,
    resolve_for_certifications,
 )
 from compliance.transition_reasoning import TargetRequirement
@@ -47,13 +48,21 @@ def test_multi_certification_merges_automatically():
    assert "sbom_creation" not in caps and "secure_signed_update_distribution" not in caps


-def test_empirical_confidence_is_computed_not_assigned():
-    obs = HypothesisObservations()
-    assert empirical_confidence(obs) is None               # null until observed
-    obs = record_observation(obs, True)
-    obs = record_observation(obs, True)
-    obs = record_observation(obs, False)
-    assert empirical_confidence(obs) == 0.67               # 2 / 3, from observations only
+def test_observations_are_richer_than_binary_and_review_gated():
+    # the learning unit is the QUESTION; an answer can be partial with a scope note, not just yes/no
+    raw = [Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.CONFIRMED)]
+    assert empirical_confidence(raw) is None                # unreviewed -> does NOT calibrate (review gate)
+    obs = [
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.CONFIRMED, reviewed=True),
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.PARTIAL,
+                    scope_note="only critical suppliers", reviewed=True),
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.REFUTED, reviewed=True),
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.NOT_APPLICABLE, reviewed=True),
+    ]
+    dist = empirical_distribution(obs)                      # a DISTRIBUTION, not a single percentage
+    assert dist["confirmed"] == 1 and dist["partial"] == 1 and dist["refuted"] == 1 and dist["not_applicable"] == 1
+    # confidence = (confirmed + 0.5*partial) / (confirmed+partial+refuted); n.a. excluded from the base
+    assert empirical_confidence(obs) == 0.5


 def test_resolve_adapts_to_advisor_input():