breakpilot-compliance/backend-compliance/knowledge/onboarding/intake_signal_map.yaml
Benjamin Admin c39787ad96 fix(onboarding): separate observation vs requirement signals — a demanded SBOM is not a present SBOM
Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built — otherwise the
Observation Store would learn from already-misclassified signals. The Silent Pass conflated two kinds of
signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an
SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and
suppressed the very question that should have been asked.

Fix — make the kind explicit and authoritative (no new architecture, data + thin wiring):
  - `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical
    SignalVocabularyEntry (AUTHORITATIVE — a mislabelled producer cannot collapse the two).
  - Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req);
    security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required.
    Requirement signals are intentionally UNMAPPED in intake_signal_map (they describe a target, not the current state).
  - silent_intake() consumes ONLY kind==observation; requirement signals are preserved in
    `requirements_seen` (visible/auditable) but NEVER become a detected capability.
  - normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through.
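
The wiring in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's code. The class and function names come from this message, but every field, every signature and the `SignalKind` enum are hypothetical. The aliases `requires_sbom` → `sbom_required` and `cyclonedx_found` → `sbom_present` are inferred from the test names. The sketch shows the two properties the fix relies on:
  - the vocabulary's kind overrides whatever a producer declares;
  - only observations can become capabilities.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class SignalKind(str, Enum):  # hypothetical enum
    OBSERVATION = "observation"  # reality: something was seen
    REQUIREMENT = "requirement"  # target: something is demanded


@dataclass(frozen=True)
class SignalVocabularyEntry:  # fields are assumptions
    canonical_id: str
    kind: SignalKind  # authoritative
    aliases: tuple[str, ...] = ()


@dataclass(frozen=True)
class ProducedSignal:  # fields are assumptions
    signal_id: str
    kind: SignalKind | None = None  # producer may declare; ignored for known ids


@dataclass(frozen=True)
class IntakeSignal:  # fields are assumptions
    signal_id: str
    kind: SignalKind | None


def normalize_signals(
    produced: list[ProducedSignal], vocabulary: list[SignalVocabularyEntry]
) -> list[IntakeSignal]:
    """Resolve aliases to canonical ids and stamp the vocabulary's kind on each signal."""
    by_name = {name: e for e in vocabulary for name in (e.canonical_id, *e.aliases)}
    result: list[IntakeSignal] = []
    for sig in produced:
        entry = by_name.get(sig.signal_id)
        if entry is None:
            # Unknown id: passes through with whatever kind the producer declared.
            result.append(IntakeSignal(sig.signal_id, sig.kind))
        else:
            # Known id: the vocabulary's kind wins over the producer's label.
            result.append(IntakeSignal(entry.canonical_id, entry.kind))
    return result


def silent_intake(
    signals: list[IntakeSignal], capability_map: dict[str, str]
) -> tuple[set[str], list[str]]:
    """Detect capabilities from observations only; keep requirements visible but inert."""
    detected: set[str] = set()
    requirements_seen: list[str] = []
    for sig in signals:
        if sig.kind is SignalKind.REQUIREMENT:
            requirements_seen.append(sig.signal_id)
        elif sig.kind is SignalKind.OBSERVATION and sig.signal_id in capability_map:
            detected.add(capability_map[sig.signal_id])
    return detected, requirements_seen


# Illustrative vocabulary; the alias names are assumptions inferred from the test names.
VOCAB = [
    SignalVocabularyEntry("sbom_present", SignalKind.OBSERVATION, ("cyclonedx_found",)),
    SignalVocabularyEntry("sbom_required", SignalKind.REQUIREMENT, ("requires_sbom",)),
]
CAPABILITIES = {"sbom_present": "sbom_creation"}

signals = normalize_signals(
    [
        ProducedSignal("requires_sbom", SignalKind.OBSERVATION),  # tender clause, mislabelled
        ProducedSignal("cyclonedx_found"),  # repo finding, kind left undeclared
    ],
    VOCAB,
)
detected, requirements_seen = silent_intake(signals, CAPABILITIES)
# detected == {"sbom_creation"}, requirements_seen == ["sbom_required"]
```

Resolving the kind from the vocabulary rather than from the producer is what makes a mislabelled tender clause harmless. The clause still ends up in `requirements_seen`, where it stays auditable.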

This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on:
observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec /
law now produces requirement signals; scanners / repos / documents produce observation signals.
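
The observation-vs-requirement delta can be made concrete with a small sketch. Everything in it is hypothetical:
  - `REQUIREMENT_TARGETS` and `requirement_delta` are illustrative names;
  - pairing each requirement signal with a capability (for example `psirt_required` with `coordinated_vulnerability_disclosure`) is a guess, since intake_signal_map deliberately leaves requirement signals unmapped.
The inputs are a list of requirement ids and a set of detected capabilities, in the shape a Silent Pass might plausibly return them.

```python
from __future__ import annotations

# Hypothetical target map: the capability each requirement signal demands.
# The pairings are illustrative guesses, not taken from the knowledge base.
REQUIREMENT_TARGETS: dict[str, str] = {
    "sbom_required": "sbom_creation",
    "psirt_required": "coordinated_vulnerability_disclosure",
    "signed_updates_required": "secure_signed_update_distribution",
}


def requirement_delta(requirements_seen: list[str], detected: set[str]) -> dict[str, list[str]]:
    """Compare targets (requirement signals) with reality (detected capabilities)."""
    delta: dict[str, list[str]] = {"met": [], "gaps": [], "unmapped": []}
    for req in requirements_seen:
        target = REQUIREMENT_TARGETS.get(req)
        if target is None:
            delta["unmapped"].append(req)  # no known target capability
        elif target in detected:
            delta["met"].append(req)  # an observation already covers the demand
        else:
            delta["gaps"].append(req)  # demanded but not observed: keep asking
    return delta


delta = requirement_delta(["sbom_required", "signed_updates_required"], {"sbom_creation"})
# delta == {"met": ["sbom_required"], "gaps": ["signed_updates_required"], "unmapped": []}
```

A requirement whose target capability was never observed lands in `gaps`. That is the question the old aliasing suppressed.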

Tests:
  - Rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) so they
    pin the correct split.
  - Regression: `requires_sbom` yields no capability and stays in requirements_seen, while `cyclonedx_found`
    still detects sbom_creation.
  - Endpoint-level regression: a tender requirement does not auto-detect, and the gap is still asked.
  - Vocabulary kind overrides a mislabelled producer.
25 onboarding tests pass, mypy --strict clean, demo runs, check-loc 0. Has a runtime effect → deploy + smoke
test. (Fix A; the partial-vs-detected decoupling follows as Fix B before #59.)
2026-06-28 15:52:50 +02:00

# Silent Knowledge Pass — signal -> conclusion map (curated DATA, injected).
#
# What a scanner finding lets us conclude WITHOUT asking the user. A signal yields either a capability
# the company demonstrably has (with the evidence already in hand) or a product fact that drives scope.
# `relationship: detected` = a concrete artifact (strong, no question); `partial` = indicative (still
# verify, but lower priority). The scanners (website crawler, repo scanner, doc parser, product intake)
# are UPSTREAM and produce the signals; this file only interprets them. No norm text, no real names.
mappings:
# Only OBSERVATION-kind signals appear here. Requirement-kind signals (sbom_required, psirt_required,
# signed_updates_required) are intentionally ABSENT — they describe a target, never the present state,
# and the Silent Pass would never consume them anyway (it filters on kind == observation).
# ── website ───────────────────────────────────────────────────────────────────────────────
- {signal: cvd_policy_present, capability: coordinated_vulnerability_disclosure, relationship: detected, evidence: cvd_policy}
- {signal: ce_marking_on_site, capability: ce_conformity_assessment_and_technical_documentation, relationship: partial, evidence: ce_declaration}
- {signal: support_lifecycle_page, capability: security_update_support_period, relationship: partial, evidence: support_policy}
- {signal: security_policy_page, capability: information_security_management, relationship: partial}
# ── repository ────────────────────────────────────────────────────────────────────────────
- {signal: sbom_present, capability: sbom_creation, relationship: detected, evidence: sbom}
- {signal: signed_releases, capability: secure_signed_update_distribution, relationship: detected, evidence: signing_config}
- {signal: github_actions_ci, capability: secure_development_lifecycle, relationship: partial, evidence: ci_pipeline}
- {signal: dependency_scanning, capability: technical_vulnerability_management, relationship: partial, evidence: vuln_scanning_config}
# ── documents ─────────────────────────────────────────────────────────────────────────────
- {signal: ce_conformity_doc, capability: ce_conformity_assessment_and_technical_documentation, relationship: detected, evidence: technical_documentation}
- {signal: product_risk_assessment_doc, capability: product_cyber_risk_assessment, relationship: detected, evidence: product_risk_assessment}
- {signal: patch_policy_doc, capability: secure_signed_update_distribution, relationship: partial, evidence: patch_policy}
- {signal: incident_response_plan_doc, capability: incident_management, relationship: detected, evidence: incident_procedure}
# ── product facts (drive scope / target applicability) ──────────────────────────────────────
- {signal: cloud_connectivity, product_fact: connected_to_internet}
- {signal: plc_sps, product_fact: is_machine}
- {signal: embedded_software, product_fact: has_embedded_software}
- {signal: wireless_radio, product_fact: has_radio_equipment}
- {signal: remote_access, product_fact: has_remote_access}
- {signal: generates_usage_data, product_fact: generates_usage_data}
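
The following is a minimal sketch of how a consumer might interpret this map. It assumes the file is parsed into plain dicts, for example with PyYAML's `yaml.safe_load`. Here a few rows are inlined so the code runs with the standard library only. `apply_signal_map` and `SilentPassResult` are hypothetical names. The sketch handles the three row kinds as follows:
  - `detected` rows are recorded with their evidence and not asked;
  - `partial` rows are still queued for verification;
  - `product_fact` rows feed scope.
The sketch does not reconcile a capability that appears both as detected and as partial (e.g. `secure_signed_update_distribution`), because that policy is not specified in this file.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# A few rows of the map above, in the shape a YAML loader would return them.
MAPPINGS: list[dict[str, str]] = [
    {"signal": "sbom_present", "capability": "sbom_creation",
     "relationship": "detected", "evidence": "sbom"},
    {"signal": "github_actions_ci", "capability": "secure_development_lifecycle",
     "relationship": "partial", "evidence": "ci_pipeline"},
    {"signal": "security_policy_page", "capability": "information_security_management",
     "relationship": "partial"},
    {"signal": "cloud_connectivity", "product_fact": "connected_to_internet"},
]


@dataclass
class SilentPassResult:  # hypothetical result container
    detected: dict[str, str | None] = field(default_factory=dict)   # capability -> evidence; not asked
    to_verify: dict[str, str | None] = field(default_factory=dict)  # capability -> evidence; asked, lower priority
    product_facts: set[str] = field(default_factory=set)            # drive scope / applicability


def apply_signal_map(observed: set[str], mappings: list[dict[str, str]]) -> SilentPassResult:
    """Interpret observation signals; each row yields either a capability or a product fact."""
    result = SilentPassResult()
    for row in mappings:
        if row["signal"] not in observed:
            continue
        if "product_fact" in row:
            result.product_facts.add(row["product_fact"])
        elif row["relationship"] == "detected":
            result.detected[row["capability"]] = row.get("evidence")
        elif row["relationship"] == "partial":
            result.to_verify[row["capability"]] = row.get("evidence")
        else:
            raise ValueError(f"unknown relationship {row['relationship']!r} for {row['signal']}")
    return result


result = apply_signal_map({"sbom_present", "github_actions_ci", "cloud_connectivity"}, MAPPINGS)
# result.detected == {"sbom_creation": "sbom"}
# result.to_verify == {"secure_development_lifecycle": "ci_pipeline"}
# result.product_facts == {"connected_to_internet"}
```

Rows without an `evidence` key, such as `security_policy_page`, map to `None`. The capability is still queued, just without an artifact already in hand.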