feat(onboarding): Observation Log — append-only JSONL calibration store (Task 59b/c v1) · 7df15010ff - breakpilot-compliance

Per the user's decision (2026-06-28): observations are CALIBRATION data for the knowledge base, NOT
business data and NOT product-DB data. So they live with the other versioned knowledge artifacts as an
append-only JSONL log under knowledge/observations/ — NO migration, NO DB. (A real persistence layer is
only warranted once thousands of onboardings exist; not before.)

  - ObservationRecord = Observation + log metadata (observation_id, timestamp [caller-stamped, no hidden
    clock], customer_archetype [anonymised — NEVER a real name], evidence, provenance, knowledge_version).
  - append_observation() writes one JSON line; append-only, lines are never rewritten. A later review is a
    NEW line with the same observation_id; load_observations(reconcile=True) keeps the latest per id.
  - load_observations() reads a single .jsonl or a directory of monthly .jsonl files.
  - aggregate_by_hypothesis() (59c) -> per-hypothesis distribution + confidence, COMPUTED from the log
    (computed-not-stored); the review gate (reviewed-only) is enforced in empirical_distribution/confidence.
  - review_queue() -> the unreviewed worklist. Observation -> Review -> Accepted -> recompute, never
    Observation -> confidence++. Nothing is ever written back to a hypothesis.
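
The append/reconcile behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the signatures, the dict-based record shape, and the rule that "latest" means "last line in file order" (with monthly files read in sorted name order) are all assumptions.

```python
import json
from pathlib import Path
from typing import Any


def append_observation(path: Path, record: dict[str, Any]) -> None:
    """Append one record as a single JSON line; existing lines are never touched."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_observations(path: Path, reconcile: bool = False) -> list[dict[str, Any]]:
    """Read one .jsonl file, or every *.jsonl file in a directory in name order."""
    files = sorted(path.glob("*.jsonl")) if path.is_dir() else [path]
    records: list[dict[str, Any]] = []
    for file in files:
        with file.open(encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    if not reconcile:
        return records
    # Assumption: a later line supersedes earlier lines with the same observation_id.
    latest: dict[str, dict[str, Any]] = {}
    for rec in records:
        latest[rec["observation_id"]] = rec
    return list(latest.values())
```

Under these assumptions a review never edits the original line: it is appended as a second line with the same observation_id, and only the reconciled view collapses the two.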

You can `rm` the log and recompute, `git diff` it over months, or rebuild confidence under a new policy —
fully consistent with computed-not-stored and the product/knowledge data separation.
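
To make the computed-not-stored idea concrete, here is a hedged sketch of deriving per-hypothesis statistics from already-loaded records. The field names `hypothesis_id`, `outcome`, and `reviewed` are hypothetical, and the real `HypothesisStats` and confidence policy are not shown in this commit view, so the sketch reports only a reviewed count and an outcome distribution.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class HypothesisStats:
    hypothesis_id: str
    reviewed_count: int
    distribution: dict[str, float]


def review_queue(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the records that still await review (the worklist)."""
    return [r for r in records if not r.get("reviewed", False)]


def aggregate_by_hypothesis(records: list[dict[str, Any]]) -> dict[str, HypothesisStats]:
    """Compute statistics from the log on every call; nothing is written back."""
    outcomes: dict[str, Counter[str]] = defaultdict(Counter)
    for r in records:
        # Review gate: unreviewed observations never contribute to statistics.
        if r.get("reviewed", False):
            outcomes[r["hypothesis_id"]][r["outcome"]] += 1
    stats: dict[str, HypothesisStats] = {}
    for hid, counts in outcomes.items():
        total = sum(counts.values())
        dist = {outcome: n / total for outcome, n in counts.items()}
        stats[hid] = HypothesisStats(hid, total, dist)
    return stats
```

Because the statistics are a pure function of the records, deleting the log or changing the policy only requires calling the function again on whatever log exists.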

Non-runtime (module + tests only, no endpoint) -> origin/main, NO dev deploy. 5 new tests (append-only,
review supersession, review-gate statistics, queue, monthly-file load); 27 onboarding tests pass, mypy
--strict clean (9 modules), check-loc 0. 59d (surface computed confidence at runtime) stays a later step.

Benjamin Admin

2026-06-28 16:29:54 +02:00

parent e54f3cde94

commit 7df15010ff

4 changed files with 197 additions and 0 deletions

									
backend-compliance/compliance/onboarding/__init__.py (+14)

@@ -21,6 +21,14 @@ from .observations import (
     empirical_distribution,
     reviewed,
 )
+from .observation_log import (
+    HypothesisStats,
+    ObservationRecord,
+    aggregate_by_hypothesis,
+    append_observation,
+    load_observations,
+    review_queue,
+)
 from .signals import (
     ProducedSignal,
     SignalVocabularyEntry,
@@ -69,4 +77,10 @@ __all__ = [
     "ProducedSignal",
     "SignalVocabularyEntry",
     "normalize_signals",
+    "ObservationRecord",
+    "HypothesisStats",
+    "append_observation",
+    "load_observations",
+    "aggregate_by_hypothesis",
+    "review_queue",
 ]