feat: Observation Model — the empirical learning unit, defined BEFORE persistence (Task 59a)

The unit of learning is not the hypothesis but the QUESTION, and confirmed/refuted is too coarse a
verdict. Answers such as "partial, only critical suppliers" or "certified but not lived" are not
"wrong"; they are valuable knowledge. The chain is therefore Hypothesis -> Question -> Observation ->
(Review) -> Hypothesis, and the observation model must be defined cleanly before any store or API
exists; otherwise thousands of too-coarse observations would have to be migrated later.

compliance/onboarding/observations.py:
  - ObservationType: confirmed / partial / refuted / not_applicable / unknown (richer than binary).
  - Observation: {hypothesis_id, capability, question, answer (free text), observation_type,
    scope_note ("only critical suppliers"), evidence_uploaded, reviewed, reviewed_by}.
  - empirical_distribution() -> a DISTRIBUTION (e.g. confirmed 61 / partial 31 / refuted 8), not a
    single percentage.
  - empirical_confidence() -> (confirmed + 0.5*partial) / (confirmed + partial + refuted);
    not_applicable and unknown are excluded from the base; returns None while no reviewed
    observation exists.
  - REVIEW GATE: only reviewed observations calibrate; a raw answer never changes a hypothesis (no
    learning from outliers).
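
The model and the two functions listed above could fit together roughly as follows. This is a
minimal sketch under assumptions, not the contents of observations.py: the field defaults, the
string values of the enum, the choice to count only reviewed observations in the distribution, and
the absence of rounding are all assumed for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class ObservationType(str, Enum):
    CONFIRMED = "confirmed"
    PARTIAL = "partial"
    REFUTED = "refuted"
    NOT_APPLICABLE = "not_applicable"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Observation:
    # Field names follow the commit message; the defaults are assumptions.
    hypothesis_id: str
    observation_type: ObservationType
    capability: str = ""
    question: str = ""
    answer: str = ""          # free text
    scope_note: str = ""      # e.g. "only critical suppliers"
    evidence_uploaded: bool = False
    reviewed: bool = False
    reviewed_by: Optional[str] = None


def empirical_distribution(observations: Iterable[Observation]) -> Dict[str, int]:
    """Count reviewed observations per type (review gate applied here by assumption)."""
    counts = Counter(o.observation_type.value for o in observations if o.reviewed)
    return {t.value: counts.get(t.value, 0) for t in ObservationType}


def empirical_confidence(observations: Iterable[Observation]) -> Optional[float]:
    """(confirmed + 0.5*partial) / (confirmed + partial + refuted), or None if the base is empty."""
    dist = empirical_distribution(observations)
    base = dist["confirmed"] + dist["partial"] + dist["refuted"]
    if base == 0:
        return None  # nothing reviewed yet, or only not_applicable/unknown
    return (dist["confirmed"] + 0.5 * dist["partial"]) / base
```

With one reviewed observation each of confirmed, partial, refuted and not_applicable, the base is 3
and the sketch yields (1 + 0.5) / 3 = 0.5; a single unreviewed "confirmed" yields None.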

Refactor: the hypothesis is now PURE curated knowledge — the binary observations counter and any
confidence are removed from CapabilityHypothesis and the YAML; confidence is COMPUTED from the separate
reviewed observation stream. Pure, mypy --strict clean. Persistence/aggregation/calibration are 59b/c/d.
Non-runtime -> no deploy. 12 tests pass, check-loc 0.
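
The separation described in the refactor paragraph can be pictured as below. This is a hedged
sketch: CuratedHypothesis, StreamEntry, confidence_for and the capability name are hypothetical
stand-ins, not the real CapabilityHypothesis or Observation types. The point is only that the
hypothesis record stores no counter and no confidence, and that the value is derived on demand from
reviewed entries of a separate stream.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CuratedHypothesis:
    """Hypothetical stand-in: curated knowledge only, no counter, no stored confidence."""
    hypothesis_id: str
    capability: str
    statement: str


@dataclass(frozen=True)
class StreamEntry:
    """Hypothetical minimal entry of the observation stream."""
    hypothesis_id: str
    observation_type: str  # "confirmed", "partial", "refuted", "not_applicable", "unknown"
    reviewed: bool = False


def confidence_for(hypothesis: CuratedHypothesis,
                   stream: Iterable[StreamEntry]) -> Optional[float]:
    """Derive confidence from reviewed stream entries that belong to this hypothesis."""
    kinds = [
        e.observation_type
        for e in stream
        if e.hypothesis_id == hypothesis.hypothesis_id and e.reviewed
    ]
    confirmed = kinds.count("confirmed")
    partial = kinds.count("partial")
    refuted = kinds.count("refuted")
    base = confirmed + partial + refuted
    if base == 0:
        return None  # no reviewed evidence for this hypothesis yet
    return (confirmed + 0.5 * partial) / base
```

Because nothing is stored on the hypothesis, curating or rewording it cannot silently carry an
outdated confidence along; the number always reflects the current reviewed stream.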
Benjamin Admin
2026-06-28 13:31:43 +02:00
parent 59b7006e5a
commit 98d616d82b
5 changed files with 143 additions and 76 deletions
@@ -14,12 +14,13 @@ import yaml
 from compliance.onboarding import (
     CapabilityHypothesis,
-    HypothesisObservations,
+    Observation,
+    ObservationType,
     OnboardingInput,
     advisor_start,
     empirical_confidence,
+    empirical_distribution,
     inferred_hypotheses,
-    record_observation,
     resolve_for_certifications,
 )
 from compliance.transition_reasoning import TargetRequirement
@@ -47,13 +48,21 @@ def test_multi_certification_merges_automatically():
     assert "sbom_creation" not in caps and "secure_signed_update_distribution" not in caps


-def test_empirical_confidence_is_computed_not_assigned():
-    obs = HypothesisObservations()
-    assert empirical_confidence(obs) is None  # null until observed
-    obs = record_observation(obs, True)
-    obs = record_observation(obs, True)
-    obs = record_observation(obs, False)
-    assert empirical_confidence(obs) == 0.67  # 2 / 3, from observations only
+def test_observations_are_richer_than_binary_and_review_gated():
+    # the learning unit is the QUESTION; an answer can be partial with a scope note, not just yes/no
+    raw = [Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.CONFIRMED)]
+    assert empirical_confidence(raw) is None  # unreviewed -> does NOT calibrate (review gate)
+    obs = [
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.CONFIRMED, reviewed=True),
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.PARTIAL,
+                    scope_note="nur kritische Lieferanten", reviewed=True),
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.REFUTED, reviewed=True),
+        Observation(hypothesis_id="HYP-supplier", observation_type=ObservationType.NOT_APPLICABLE, reviewed=True),
+    ]
+    dist = empirical_distribution(obs)  # a DISTRIBUTION, not a single percentage
+    assert dist["confirmed"] == 1 and dist["partial"] == 1 and dist["refuted"] == 1 and dist["not_applicable"] == 1
+    # confidence = (confirmed + 0.5*partial) / (confirmed+partial+refuted); n.a. excluded from the base
+    assert empirical_confidence(obs) == 0.5


 def test_resolve_adapts_to_advisor_input():