Commit Graph

2 Commits

Author SHA1 Message Date
Benjamin Admin c39787ad96 fix(onboarding): separate observation vs requirement signals — a demanded SBOM is not a present SBOM
Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built — otherwise the
Observation Store would learn from already-misclassified signals. The Silent Pass conflated two kinds of
signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an
SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and
suppressed the very question that should have been asked.

Fix — make the kind explicit and authoritative (no new architecture, data + thin wiring):
  - `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical
    SignalVocabularyEntry (AUTHORITATIVE — a mislabelled producer cannot collapse the two).
  - Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req);
    security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required.
    requirement signals are intentionally UNMAPPED in intake_signal_map (they describe a target, not state).
  - silent_intake() consumes ONLY kind==observation; requirement signals are preserved in
    `requirements_seen` (visible/auditable) but NEVER become a detected capability.
  - normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through.

This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on:
observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec /
law now produces requirement signals; scanners / repos / documents produce observation signals.

Tests: rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) to pin
the correct split; regression — `requires_sbom` yields no capability + stays in requirements_seen while
`cyclonedx_found` still detects sbom_creation; endpoint-level regression that a tender requirement does not
auto-detect and the gap stays asked; vocabulary-kind-overrides-mislabelled-producer. 25 onboarding tests
pass, mypy --strict clean, demo runs, check-loc 0. Runtime effect → deploy + smoke. (Fix A; partial-vs-
detected decoupling follows as Fix B before #59.)
2026-06-28 15:52:50 +02:00
Benjamin Admin c2c8f7e424 feat: Signal Producer interface + Normalizer — one signal language for all sources (before #58)
Not scanner stubs — the scanners exist. The Silent Pass needs only their UNIFIED output. This adds the
small common DATA FORMAT (not a new module/framework) the user asked for, exactly the Requirement-
Source / MCAP / regulation-alias pattern: many inputs, one language.

  Producer A / B / C  ->  normalize_signals (vocabulary: id + aliases)  ->  canonical IntakeSignal  ->  Silent Pass

- ProducedSignal {signal_id, source_type, confidence, evidence, provenance} = what ANY source emits
  (website scanner, repo scanner, PDF parser, tender parser, API, the user).
- knowledge/onboarding/signal_vocabulary.yaml reduces producer dialects to a canonical signal: "SBOM
  present" arrives as cyclonedx_found / spdx_found / sbom_uploaded / requires_sbom (tender) — all become
  `sbom_file_found`. The Silent Pass cannot tell where it came from -> no per-scanner special logic, ever.
- Unknown signals pass through (a new producer stays visible). confidence/evidence/provenance flow to
  the detected capability for the audit trail.

A tender that "requires SBOM" now produces the same effect as a repo that HAS one — fits Vision V2
(Requirement Source over Regulation). Endpoint (#58) then has its final shape: POST -> Producers ->
Normalizer -> Silent Pass -> Profile -> Delta -> Questions -> Roadmap. Non-runtime -> no deploy. mypy
--strict clean, 14 onboarding tests pass, check-loc 0.
2026-06-28 14:49:57 +02:00