breakpilot-compliance/backend-compliance/compliance/onboarding/__init__.py
Benjamin Admin 7df15010ff feat(onboarding): Observation Log — append-only JSONL calibration store (Task 59b/c v1)
Per the user's decision (2026-06-28): observations are CALIBRATION data for the knowledge base, NOT
business data and NOT product-DB data. So they live with the other versioned knowledge artifacts as an
append-only JSONL log under knowledge/observations/ — NO migration, NO DB. (A real persistence layer is
only warranted once thousands of onboardings exist; not before.)

  - ObservationRecord = Observation + log metadata (observation_id, timestamp [caller-stamped, no hidden
    clock], customer_archetype [anonymised — NEVER a real name], evidence, provenance, knowledge_version).
  - append_observation() writes one JSON line; append-only, lines are never rewritten. A later review is a
    NEW line with the same observation_id; load_observations(reconcile=True) keeps the latest per id.
  - load_observations() reads a single .jsonl or a directory of monthly .jsonl files.
  - aggregate_by_hypothesis() (59c) -> per-hypothesis distribution + confidence, COMPUTED from the log
    (computed-not-stored); the review gate (reviewed-only) is enforced in empirical_distribution/confidence.
  - review_queue() -> the unreviewed worklist. Observation -> Review -> Accepted -> recompute, never
    Observation -> confidence++. Nothing is ever written back to a hypothesis.
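
The append-only and supersession behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names `append_record` and `load_records`, the `reviewed` field, and the rule that "latest" means last in file-name order and then line order are all assumptions made for the example. Only `observation_id`, `timestamp`, and the monthly `.jsonl` layout come from the description.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def append_record(path: Path, record: dict[str, Any]) -> None:
    """Append one record as a single JSON line; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_records(path: Path, reconcile: bool = True) -> list[dict[str, Any]]:
    """Read one .jsonl file, or every *.jsonl file in a directory in name order."""
    files = sorted(path.glob("*.jsonl")) if path.is_dir() else [path]
    records: list[dict[str, Any]] = []
    for file in files:
        with file.open(encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    if not reconcile:
        return records
    # Assumption: a later line with the same observation_id supersedes earlier ones.
    latest: dict[str, dict[str, Any]] = {}
    for rec in records:
        latest[rec["observation_id"]] = rec
    return list(latest.values())


def demo() -> tuple[int, int, bool]:
    """Write an observation, then its review as a NEW line, and read both views."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "observations"
        append_record(
            log_dir / "2026-06.jsonl",
            {"observation_id": "obs-1", "timestamp": "2026-06-28T10:00:00+02:00", "reviewed": False},
        )
        # The review does not edit the June line; it is appended to the July file.
        append_record(
            log_dir / "2026-07.jsonl",
            {"observation_id": "obs-1", "timestamp": "2026-07-02T09:00:00+02:00", "reviewed": True},
        )
        append_record(
            log_dir / "2026-07.jsonl",
            {"observation_id": "obs-2", "timestamp": "2026-07-03T09:00:00+02:00", "reviewed": False},
        )
        raw = load_records(log_dir, reconcile=False)
        current = load_records(log_dir)
        return len(raw), len(current), current[0]["reviewed"]
```

In this sketch the raw log keeps all three lines, while the reconciled view holds two records and `obs-1` carries its reviewed state. The timestamps are passed in by the caller, matching the "no hidden clock" rule.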

You can `rm` the log and recompute, `git diff` it over months, or rebuild confidence under a new policy —
fully consistent with computed-not-stored and the product/knowledge data separation.
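
One way to picture "computed-not-stored" with a swappable policy is the sketch below. It is only an illustration under stated assumptions: the record fields `hypothesis_id`, `outcome`, and `reviewed`, the `"confirmed"` outcome label, and both policy functions are hypothetical, and the real `aggregate_by_hypothesis()` and `review_queue()` signatures are not shown in this commit.

```python
from collections import Counter, defaultdict
from typing import Any, Callable

# Hypothetical policy type: (confirmed_count, reviewed_total) -> confidence.
Policy = Callable[[int, int], float]


def raw_frequency(confirmed: int, total: int) -> float:
    """Plain share of confirmed outcomes among reviewed observations."""
    return confirmed / total


def laplace_smoothed(confirmed: int, total: int) -> float:
    """Add-one smoothing, so small samples are pulled towards 0.5."""
    return (confirmed + 1) / (total + 2)


def aggregate(records: list[dict[str, Any]], policy: Policy) -> dict[str, dict[str, Any]]:
    """Recompute per-hypothesis statistics from the log; nothing is written back."""
    counts: dict[str, Counter[str]] = defaultdict(Counter)
    for rec in records:
        if rec.get("reviewed"):  # review gate: unreviewed records never count
            counts[rec["hypothesis_id"]][rec["outcome"]] += 1
    stats: dict[str, dict[str, Any]] = {}
    for hyp, outcomes in counts.items():
        total = sum(outcomes.values())
        stats[hyp] = {
            "n": total,
            "distribution": {name: n / total for name, n in outcomes.items()},
            "confidence": policy(outcomes["confirmed"], total),
        }
    return stats


def worklist(records: list[dict[str, Any]]) -> list[str]:
    """Ids of observations still waiting for review."""
    return [rec["observation_id"] for rec in records if not rec.get("reviewed")]


RECORDS = [
    {"observation_id": "o1", "hypothesis_id": "h1", "outcome": "confirmed", "reviewed": True},
    {"observation_id": "o2", "hypothesis_id": "h1", "outcome": "confirmed", "reviewed": True},
    {"observation_id": "o3", "hypothesis_id": "h1", "outcome": "refuted", "reviewed": True},
    {"observation_id": "o4", "hypothesis_id": "h1", "outcome": "confirmed", "reviewed": False},
]

raw_stats = aggregate(RECORDS, raw_frequency)
smoothed_stats = aggregate(RECORDS, laplace_smoothed)
pending = worklist(RECORDS)
```

Because the statistics are derived on every call, switching from `raw_frequency` to `laplace_smoothed` needs no migration: the same log yields a different confidence, and the unreviewed `o4` affects neither.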

Non-runtime (module + tests only, no endpoint) -> origin/main, NO dev deploy. 5 new tests (append-only,
review supersession, review-gate statistics, queue, monthly-file load); 27 onboarding tests pass, mypy
--strict clean (9 modules), check-loc 0. 59d (surface computed confidence at runtime) stays a later step.
2026-06-28 16:29:54 +02:00


"""Smart Onboarding Advisor — the onboarding runtime step (orchestration over existing engines).
Turns (company + products + certifications + target) into inferred assumptions, the next best questions
(<=5, each self-explaining), the capability delta, top measures, evidence requests and completeness —
with NO sales interpretation and NO regulation picking. Orchestrator only: no new engine/registry/
meta-model; certificate->capability hypotheses and target requirements are INJECTED.
"""
from __future__ import annotations

from .engine import advisor_start, apply_answer
from .hypotheses import (
    CapabilityHypothesis,
    inferred_hypotheses,
    resolve_for_certifications,
)
from .observations import (
    Observation,
    ObservationType,
    empirical_confidence,
    empirical_distribution,
    reviewed,
)
from .observation_log import (
    HypothesisStats,
    ObservationRecord,
    aggregate_by_hypothesis,
    append_observation,
    load_observations,
    review_queue,
)
from .signals import (
    ProducedSignal,
    SignalVocabularyEntry,
    normalize_signals,
)
from .silent_intake import (
    DetectedCapability,
    IntakeSignal,
    ProductFact,
    SignalMapping,
    SilentIntakeResult,
    silent_intake,
)
from .schemas import (
    AdvisorMeasure,
    AdvisorQuestion,
    AdvisorResult,
    InferredAssumption,
    OnboardingInput,
    RejectedAssumption,
)

__all__ = [
    "advisor_start",
    "apply_answer",
    "OnboardingInput",
    "AdvisorResult",
    "AdvisorQuestion",
    "AdvisorMeasure",
    "InferredAssumption",
    "RejectedAssumption",
    "CapabilityHypothesis",
    "inferred_hypotheses",
    "resolve_for_certifications",
    "Observation",
    "ObservationType",
    "empirical_distribution",
    "empirical_confidence",
    "reviewed",
    "silent_intake",
    "IntakeSignal",
    "SignalMapping",
    "DetectedCapability",
    "ProductFact",
    "SilentIntakeResult",
    "ProducedSignal",
    "SignalVocabularyEntry",
    "normalize_signals",
    "ObservationRecord",
    "HypothesisStats",
    "append_observation",
    "load_observations",
    "aggregate_by_hypothesis",
    "review_queue",
]