Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built — otherwise the
Observation Store would learn from already-misclassified signals. The Silent Pass conflated two kinds of
signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an
SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and
suppressed the very question that should have been asked.
Fix — make the kind explicit and authoritative (no new architecture, data + thin wiring):
- `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical
SignalVocabularyEntry (AUTHORITATIVE — a mislabelled producer cannot collapse the two).
- Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req);
security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required.
requirement signals are intentionally UNMAPPED in intake_signal_map (they describe a target, not state).
- silent_intake() consumes ONLY kind==observation; requirement signals are preserved in
`requirements_seen` (visible/auditable) but NEVER become a detected capability.
- normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through.
This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on:
observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec /
law now produces requirement signals; scanners / repos / documents produce observation signals.
Tests: rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) to pin
the correct split; regression — `requires_sbom` yields no capability + stays in requirements_seen while
`cyclonedx_found` still detects sbom_creation; endpoint-level regression that a tender requirement does not
auto-detect and the gap stays asked; vocabulary-kind-overrides-mislabelled-producer. 25 onboarding tests
pass, mypy --strict clean, demo runs, check-loc 0. Runtime effect → deploy + smoke. (Fix A; partial-vs-
detected decoupling follows as Fix B before #59.)
This exposes the existing Smart Onboarding Advisor through a runtime endpoint; it does not add new
reasoning logic. Tightly scoped: adapter boundary + endpoint, no big frontend, no persistence, no
empirical learning, no new scanners, no LLM.
POST /onboarding/advisor-start : (company + certifications + target + scanner_findings[ProducedSignal])
-> Normalizer -> Silent Knowledge Pass -> Advisor -> { silent_intake_summary, inferred_assumptions,
rejected_assumptions, top_5_questions, capability_delta, top_measures, evidence_requests,
completeness_summary, auto_detected, headline }
GET /onboarding/targets : the supported target ids (CRA, TISAX, MDR, Environmental)
compliance/services/onboarding_service.py is the app-caller: it loads the curated knowledge (hypothesis
library, signal vocabulary + map, the target's required capabilities) once and calls the pure, tested
orchestration (normalize_signals -> silent_intake -> advisor_start). The scanner ADAPTER boundary is the
ProducedSignal format the request carries — existing scanners emit it, no new scanners. Thin handler
(<30 LOC), registered in the auto-load list. No DB. Additive to the OpenAPI contract (contract test is
additive-friendly; baseline regenerates on CI/py3.12). First deployable runtime feature -> dev deploy +
smoke. mypy --strict clean, 22 onboarding tests pass, check-loc 0.