fix(onboarding): separate observation vs requirement signals — a demanded SBOM is not a present SBOM

Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built — otherwise the
Observation Store would learn from already-misclassified signals. The Silent Pass conflated two kinds of
signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an
SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and
suppressed the very question that should have been asked.

Fix — make the kind explicit and authoritative (no new architecture, data + thin wiring):
  - `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical
    SignalVocabularyEntry (AUTHORITATIVE — a mislabelled producer cannot collapse the two).
  - Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req);
    security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required.
    Requirement signals are intentionally UNMAPPED in intake_signal_map.yaml (they describe a target, not
    the present state).
  - silent_intake() consumes ONLY kind==observation; requirement signals are preserved in
    `requirements_seen` (visible/auditable) but NEVER become a detected capability.
  - normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through.
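
A minimal, self-contained sketch of the rule described in the list above. It uses plain dataclasses instead of the project's pydantic models, and the names `Entry`, `Produced` and `resolve` are illustrative stand-ins, not the repository's API:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:  # hypothetical stand-in for SignalVocabularyEntry
    id: str
    kind: str = "observation"
    aliases: List[str] = field(default_factory=list)


@dataclass
class Produced:  # hypothetical stand-in for ProducedSignal
    signal_id: str
    kind: str = ""  # a producer may declare a kind, but the vocabulary wins


def resolve(produced: List[Produced], vocab: List[Entry]) -> List[Tuple[str, str]]:
    """Return (canonical_id, kind) pairs; the vocabulary's kind is authoritative."""
    alias: Dict[str, str] = {}
    kind_of: Dict[str, str] = {}
    for v in vocab:
        alias[v.id] = v.id
        kind_of[v.id] = v.kind
        for a in v.aliases:
            alias[a] = v.id
    out: List[Tuple[str, str]] = []
    for p in produced:
        canonical = alias.get(p.signal_id, p.signal_id)  # unknown ids pass through
        # vocabulary kind first, then the producer's claim, then the default
        out.append((canonical, kind_of.get(canonical) or p.kind or "observation"))
    return out


VOCAB = [
    Entry("sbom_present", aliases=["cyclonedx_found", "spdx_found"]),
    Entry("sbom_required", kind="requirement", aliases=["requires_sbom"]),
]

resolved = resolve(
    [
        Produced("cyclonedx_found"),
        Produced("requires_sbom", kind="observation"),  # mislabelled on purpose
        Produced("new_thing"),  # not in the vocabulary
    ],
    VOCAB,
)
# Only observations would be consumed; requirements are kept aside, visible.
observations = [sid for sid, kind in resolved if kind == "observation"]
requirements_seen = sorted(sid for sid, kind in resolved if kind != "observation")
```

The mislabelled `requires_sbom` still resolves to a requirement because the lookup consults the vocabulary before the producer's own claim; an id the vocabulary does not know keeps whatever the producer declared, defaulting to observation.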

This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on:
observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec /
law now produces requirement signals; scanners / repos / documents produce observation signals.
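
One way to picture "their comparison is the delta" is the sketch below. It is an illustration only, not the platform's implementation: the `SATISFIED_BY` pairing table and the `open_gaps` helper are assumptions made up for this example.

```python
from typing import Dict, List, Set

# Hypothetical pairing of each requirement signal with the observation
# that would satisfy it. This table is an assumption, not project data.
SATISFIED_BY: Dict[str, str] = {
    "sbom_required": "sbom_present",
    "psirt_required": "cvd_policy_present",
    "signed_updates_required": "signed_releases",
}


def open_gaps(requirements_seen: List[str], observed: Set[str]) -> List[str]:
    """Requirements with no matching observation, i.e. the delta."""
    return sorted(r for r in requirements_seen if SATISFIED_BY.get(r) not in observed)


# A tender demands an SBOM and a PSIRT; only a CVD policy and signed releases were observed.
gaps = open_gaps(
    ["sbom_required", "psirt_required"],
    {"cvd_policy_present", "signed_releases"},
)
```

A requirement with no entry in the pairing table is treated as an open gap, which errs on the side of asking rather than silently passing.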

Tests: rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) so they
pin the correct split. Added regressions: `requires_sbom` yields no capability and stays in
requirements_seen, while `cyclonedx_found` still detects sbom_creation; at endpoint level, a tender
requirement does not auto-detect and the gap stays open (asked or in the delta); the vocabulary kind
overrides a mislabelled producer. 25 onboarding tests pass, mypy --strict clean, demo runs, check-loc 0.
Runtime behaviour changes, so this needs a deploy + smoke test. (Fix A; the partial-vs-detected decoupling
follows as Fix B before #59.)
Benjamin Admin
2026-06-28 15:52:50 +02:00
parent b5b6cdddb3
commit c39787ad96
7 changed files with 121 additions and 42 deletions
@@ -1,16 +1,21 @@
-"""Signal Producer interface + Normalizer — one signal language for all sources (NOT new architecture).
+"""Signal Producer interface + Normalizer — one signal language, but TWO signal KINDS.
 The platform already HAS scanners (website, repo/code, SBOM, security headers, TLS, SPF/DKIM/DMARC,
 document analysis, RAG over uploads, product classification). The Silent Pass does not want a
 WebsiteScanner or a RepoScanner — it wants their UNIFIED output. So every source (a scanner, a PDF
-parser, a tender parser, an API, or the user) emits the SAME `ProducedSignal`
-{signal_id, source_type, confidence, evidence, provenance}, and `normalize_signals` reduces producer-
-specific signal ids to ONE canonical signal id via a vocabulary (id + aliases) — exactly the
+parser, a tender parser, an OEM spec, an API, or the user) emits the SAME `ProducedSignal`
+{signal_id, source_type, kind, confidence, evidence, provenance}, and `normalize_signals` reduces
+producer-specific ids to ONE canonical signal via a vocabulary (id + aliases + kind) — exactly the
 Requirement-Source / MCAP / regulation-alias pattern. The Silent Pass then never gets per-scanner logic.
-A common DATA FORMAT, not a new module/framework. Later a tender (`requires_sbom`) or an OEM spec
-(`supplier_requires_psirt`) produces the same stream as a website — the Silent Pass cannot tell the
-difference. Pure, deterministic, no I/O. Python 3.9 compatible.
+CRITICAL — a signal is one of two KINDS, and they NEVER substitute for each other:
+observation = "I SAW X" — a repo with an SBOM, a published security.txt, a risk-assessment PDF.
+requirement = "someone DEMANDS X" — a tender clause `requires_sbom`, an OEM spec `supplier_requires_psirt`.
+A demanded SBOM is NOT a present SBOM. `kind` is carried on the canonical VOCABULARY entry (authoritative),
+so even a mislabelled producer signal cannot collapse the two. The Silent Pass consumes ONLY observations;
+requirement signals are preserved and feed the required-set / prioritisation later. This Observation-vs-
+Requirement split is the very one the Requirements Verification Platform rests on: Observations (reality)
+vs Requirements (targets); their comparison IS the delta. Pure, deterministic, no I/O. Python 3.9 compatible.
 """
 from __future__ import annotations
@@ -27,15 +32,17 @@ class ProducedSignal(BaseModel):
     signal_id: str  # raw or canonical id the producer used
     source_type: str = ""  # website / repository / document / product / tender / oem / user / api
+    kind: str = ""  # "observation" | "requirement"; empty -> resolved from the vocabulary
     confidence: float = 1.0
     evidence: Optional[str] = None  # the artifact found (already in hand)
     provenance: str = ""  # url / filename / tender clause / "customer statement"
 class SignalVocabularyEntry(BaseModel):
-    """One canonical signal + the producer-specific aliases that mean the same thing."""
+    """One canonical signal + its aliases + its KIND (the authoritative observation/requirement label)."""
     id: str
+    kind: str = "observation"  # "observation" (I saw X) | "requirement" (someone DEMANDS X)
     aliases: List[str] = Field(default_factory=list)
@@ -44,18 +51,23 @@ def normalize_signals(
 ) -> List[IntakeSignal]:
     """Reduce heterogeneous producer signals to the canonical IntakeSignal stream (alias resolution).
-    Unknown signal ids pass through unchanged (a new producer's signal stays visible, not silently
-    dropped). Deterministic; carries confidence/evidence/provenance for the audit trail.
+    The canonical vocabulary entry's `kind` is AUTHORITATIVE — a producer cannot relabel a requirement as
+    an observation (that is what stops a demanded SBOM from masquerading as a present one). Unknown signal
+    ids pass through unchanged (a new producer's signal stays visible, not silently dropped) and keep the
+    producer-declared kind (default observation). Deterministic; carries confidence/evidence/provenance.
     """
     alias: Dict[str, str] = {}
+    kind_of: Dict[str, str] = {}
     for v in vocabulary:
         alias[v.id] = v.id
+        kind_of[v.id] = v.kind
         for a in v.aliases:
             alias[a] = v.id
     out: List[IntakeSignal] = []
     for p in produced:
         canonical = alias.get(p.signal_id, p.signal_id)
+        kind = kind_of.get(canonical) or p.kind or "observation"
         out.append(IntakeSignal(
-            source=p.source_type, signal=canonical, confidence=p.confidence,
+            source=p.source_type, signal=canonical, kind=kind, confidence=p.confidence,
             evidence=p.evidence, provenance=p.provenance))
     return out
@@ -24,7 +24,8 @@ class IntakeSignal(BaseModel):
     from a website, a repo, a PDF, a tender or the user — normalize_signals() unified them (see signals.py)."""
     source: str  # source_type: website / repository / document / product / tender / user
-    signal: str  # CANONICAL signal id, e.g. "sbom_file_found"
+    signal: str  # CANONICAL signal id, e.g. "sbom_present"
+    kind: str = "observation"  # "observation" (I saw X) | "requirement" (someone DEMANDS X)
     confidence: float = 1.0  # carried from the producer
     evidence: Optional[str] = None  # the artifact already in hand
     provenance: str = ""  # where it came from (url / filename / tender clause) — audit trail
@@ -61,10 +62,13 @@ class SilentIntakeResult(BaseModel):
     detected_capabilities: List[DetectedCapability] = Field(default_factory=list)
     product_facts: List[ProductFact] = Field(default_factory=list)
     evidence_found: List[str] = Field(default_factory=list)
+    requirements_seen: List[str] = Field(default_factory=list)  # requirement-kind signals — preserved, NOT present
     summary: str = ""
     def capability_ids(self) -> List[str]:
-        """The detected capability ids — fed into the Advisor as already-present (delta-reducing)."""
+        """The detected capability ids — fed into the Advisor as already-present (delta-reducing).
+        ONLY observation-kind signals reach here (requirements never become a present capability)."""
         return sorted({d.capability for d in self.detected_capabilities})
@@ -83,7 +87,11 @@ def silent_intake(
     caps: Dict[str, DetectedCapability] = {}
     facts: Dict[str, ProductFact] = {}
     evidence: Set[str] = set()
+    requirements: Set[str] = set()
     for s in signals:
+        if s.kind != "observation":  # a requirement describes a TARGET, never the present state
+            requirements.add(s.signal)  # preserved + visible, but NEVER turned into a capability
+            continue
         for m in by_signal.get(s.signal, []):
             if m.capability and m.capability not in caps:
                 caps[m.capability] = DetectedCapability(
@@ -97,10 +105,12 @@ def silent_intake(
     detected = [caps[k] for k in sorted(caps)]
     product_facts = [facts[k] for k in sorted(facts)]
+    requirements_seen = sorted(requirements)
     summary = (
-        "Stille Vorbefüllung: %d Fähigkeit(en) automatisch erkannt, %d Produktfakt(en), %d Nachweis(e) bereits vorhanden."
-        % (len(detected), len(product_facts), len(evidence))
+        "Stille Vorbefüllung: %d Fähigkeit(en) automatisch erkannt, %d Produktfakt(en), %d Nachweis(e) "
+        "bereits vorhanden, %d Anforderung(en) erkannt (nicht als vorhanden gewertet)."
+        % (len(detected), len(product_facts), len(evidence), len(requirements_seen))
     )
     return SilentIntakeResult(
         detected_capabilities=detected, product_facts=product_facts,
-        evidence_found=sorted(evidence), summary=summary)
+        evidence_found=sorted(evidence), requirements_seen=requirements_seen, summary=summary)
@@ -7,13 +7,16 @@
 # are UPSTREAM and produce the signals; this file only interprets them. No norm text, no real names.
 mappings:
+  # Only OBSERVATION-kind signals appear here. requirement-kind signals (sbom_required, psirt_required,
+  # signed_updates_required) are intentionally ABSENT — they describe a target, never the present state,
+  # and the Silent Pass would never consume them anyway (it filters on kind == observation).
   # ── website ───────────────────────────────────────────────────────────────────────────────
-  - {signal: security_txt_or_cvd_policy, capability: coordinated_vulnerability_disclosure, relationship: detected, evidence: cvd_policy}
+  - {signal: cvd_policy_present, capability: coordinated_vulnerability_disclosure, relationship: detected, evidence: cvd_policy}
   - {signal: ce_marking_on_site, capability: ce_conformity_assessment_and_technical_documentation, relationship: partial, evidence: ce_declaration}
   - {signal: support_lifecycle_page, capability: security_update_support_period, relationship: partial, evidence: support_policy}
   - {signal: security_policy_page, capability: information_security_management, relationship: partial}
   # ── repository ────────────────────────────────────────────────────────────────────────────
-  - {signal: sbom_file_found, capability: sbom_creation, relationship: detected, evidence: sbom}
+  - {signal: sbom_present, capability: sbom_creation, relationship: detected, evidence: sbom}
   - {signal: signed_releases, capability: secure_signed_update_distribution, relationship: detected, evidence: signing_config}
   - {signal: github_actions_ci, capability: secure_development_lifecycle, relationship: partial, evidence: ci_pipeline}
   - {signal: dependency_scanning, capability: technical_vulnerability_management, relationship: partial, evidence: vuln_scanning_config}
@@ -1,14 +1,23 @@
-# Signal Vocabulary — canonical signal id + the producer-specific aliases that mean the same thing.
+# Signal Vocabulary — canonical signal id + aliases + KIND. One language, but TWO kinds of signal.
 #
 # The same fact ("SBOM present") can arrive as CycloneDX, SPDX, a GitHub Action, a Maven plugin, a
-# document upload, a customer statement, a tender clause or a repo file. For the Silent Pass they are
-# ALL identical: `sbom_file_found`. This file reduces them to one canonical signal — same pattern as the
-# regulation-alias vocabulary, MCAPs and Requirement Sources: many inputs, one language. No scanner-
-# specific logic ever reaches the Silent Pass. Pure DATA, injected into normalize_signals(). No real names.
+# document upload or a customer statement — for the Silent Pass they are ALL `sbom_present`. This file
+# reduces producer dialects to one canonical signal — same pattern as the regulation-alias vocabulary,
+# MCAPs and Requirement Sources: many inputs, one language. No scanner-specific logic reaches the Silent
+# Pass. Pure DATA, injected into normalize_signals(). No real names.
+#
+# KIND is the load-bearing distinction (default: observation):
+# observation = "I SAW X" — a repo with an SBOM, a published security.txt, a risk-assessment PDF.
+# requirement = "someone DEMANDS X" — a tender clause `requires_sbom`, an OEM spec `supplier_requires_psirt`.
+# A DEMANDED SBOM is NOT a PRESENT SBOM. `kind` lives on the canonical entry (AUTHORITATIVE), so even a
+# mislabelled producer signal cannot collapse the two. The Silent Pass consumes ONLY observations;
+# requirement signals are preserved (requirements_seen) and drive the required-set / prioritisation later
+# (Requirement Source). This is the Observation-vs-Requirement split the Verification Platform rests on.
 signals:
-  - {id: sbom_file_found, aliases: [cyclonedx_found, spdx_found, sbom_in_repo, sbom_present, sbom_uploaded, requires_sbom, sbom_in_tender]}
-  - {id: security_txt_or_cvd_policy, aliases: [security_txt, vdp_found, cvd_policy_pdf, psirt_page, coordinated_disclosure_policy, supplier_requires_psirt]}
+  # ── OBSERVATIONS — "I saw X" (kind: observation, the default) ────────────────────────────────
+  - {id: sbom_present, aliases: [cyclonedx_found, spdx_found, sbom_in_repo, sbom_uploaded]}
+  - {id: cvd_policy_present, aliases: [security_txt, vdp_found, cvd_policy_pdf, psirt_page, coordinated_disclosure_policy]}
   - {id: signed_releases, aliases: [signed_artifacts, cosign_found, gpg_signed_releases, code_signing_cert, secure_boot]}
   - {id: github_actions_ci, aliases: [ci_pipeline, gitlab_ci, jenkins_pipeline, build_automation]}
   - {id: dependency_scanning, aliases: [dependabot, renovate, snyk_found, trivy_in_ci, sca_tool]}
@@ -19,10 +28,17 @@ signals:
   - {id: product_risk_assessment_doc, aliases: [risk_assessment_pdf, hazard_analysis_doc, tara_doc]}
   - {id: patch_policy_doc, aliases: [patch_management_policy, update_policy_pdf]}
   - {id: incident_response_plan_doc, aliases: [irp_doc, incident_playbook]}
-  # product facts
+  # product facts (also observations: an observed product property that drives scope)
   - {id: cloud_connectivity, aliases: [cloud_hosted, saas, internet_facing, connected_product]}
   - {id: plc_sps, aliases: [plc_detected, sps_steuerung, industrial_controller]}
   - {id: embedded_software, aliases: [firmware_present, embedded_device]}
   - {id: wireless_radio, aliases: [bluetooth, wifi_module, radio_equipment, funkmodul]}
   - {id: remote_access, aliases: [remote_maintenance, vpn_access, teleservice, fernwartung]}
   - {id: generates_usage_data, aliases: [telemetry_collected, usage_analytics]}
+  # ── REQUIREMENTS — "someone DEMANDS X" (kind: requirement; NEVER read as present) ─────────────
+  # Preserved + visible, but the Silent Pass does NOT turn them into detected capabilities. A tender /
+  # OEM spec / law lands here; a scanner / repo / document lands above. Intentionally UNMAPPED in
+  # intake_signal_map.yaml — they describe the target, not the present state.
+  - {id: sbom_required, kind: requirement, aliases: [requires_sbom, sbom_in_tender, tender_requires_sbom]}
+  - {id: psirt_required, kind: requirement, aliases: [supplier_requires_psirt, requires_psirt, requires_cvd, oem_requires_psirt]}
+  - {id: signed_updates_required, kind: requirement, aliases: [requires_signed_updates, supplier_requires_signed_updates]}
@@ -47,6 +47,20 @@ def test_advisor_start_returns_full_payload():
     assert "sbom_creation" not in {q["capability_id"] for q in d["top_5_questions"]}  # detected -> not asked
+def test_requirement_signal_does_not_auto_detect_capability():
+    # a tender that DEMANDS an SBOM (requirement) must NOT be read as "SBOM present": sbom_creation stays
+    # open (asked / in the delta), unlike a real cyclonedx_found observation.
+    body = dict(_BODY, scanner_findings=[
+        {"signal_id": "requires_sbom", "source_type": "tender", "provenance": "tender §4.2"},
+    ])
+    r = _client.post("/onboarding/advisor-start", json=body)
+    assert r.status_code == 200, r.text
+    d = r.json()
+    assert "sbom_creation" not in d["auto_detected"]  # demanded != present
+    asked = {q["capability_id"] for q in d["top_5_questions"]}
+    assert "sbom_creation" in asked or "sbom_creation" in d["capability_delta"]  # still an open gap
 def test_unknown_target_is_404():
     body = dict(_BODY, target="NOPE")
     r = _client.post("/onboarding/advisor-start", json=body)
@@ -1,9 +1,10 @@
-"""Signal Producer + Normalizer — one signal language for all sources.
-Pins the abstraction the user asked for: every source emits the same ProducedSignal, and the Normalizer
-reduces producer-specific signal ids to ONE canonical signal via a vocabulary. The Silent Pass therefore
-cannot tell whether "SBOM present" came from a website, a repo, a PDF, a tender or the user — and gets no
-per-scanner logic.
+"""Signal Producer + Normalizer — one signal language, but TWO signal KINDS.
+Pins the abstraction: every source emits the same ProducedSignal, and the Normalizer reduces
+producer-specific ids to ONE canonical signal via a vocabulary. CRITICAL: an OBSERVATION ("I saw an
+SBOM") and a REQUIREMENT ("a tender DEMANDS an SBOM") must NEVER collapse to the same signal — a
+demanded SBOM is not a present one. kind is authoritative on the canonical vocabulary entry, and the
+Silent Pass consumes only observations.
 """
 from __future__ import annotations
@@ -27,24 +28,47 @@ _MAP = [SignalMapping(**m) for m in yaml.safe_load(
     open(os.path.join(_DIR, "..", "knowledge", "onboarding", "intake_signal_map.yaml"), encoding="utf-8"))["mappings"]]
-def test_different_producers_yield_the_same_canonical_signal():
-    # the SAME fact, emitted by four totally different producers with different raw ids
+def test_observation_producers_yield_one_canonical_signal():
+    # the SAME OBSERVATION, emitted by three different producers with different raw ids
     produced = [
         ProducedSignal(signal_id="cyclonedx_found", source_type="repository", provenance="sbom.cdx.json"),
         ProducedSignal(signal_id="spdx_found", source_type="repository", provenance="sbom.spdx"),
         ProducedSignal(signal_id="sbom_uploaded", source_type="document", provenance="customer_upload.pdf"),
-        ProducedSignal(signal_id="requires_sbom", source_type="tender", provenance="tender §4.2"),
     ]
     normalized = normalize_signals(produced, _VOCAB)
-    assert {s.signal for s in normalized} == {"sbom_file_found"}  # all reduced to ONE canonical signal
-    assert {s.source for s in normalized} == {"repository", "document", "tender"}  # provenance preserved
-def test_silent_pass_consumes_normalized_signals_source_agnostic():
-    # a tender that "requires SBOM" produces the same effect as a repo that HAS one
-    from_repo = normalize_signals([ProducedSignal(signal_id="cyclonedx_found", source_type="repository", evidence="sbom")], _VOCAB)
+    assert {s.signal for s in normalized} == {"sbom_present"}  # all reduced to ONE canonical observation
+    assert {s.kind for s in normalized} == {"observation"}  # all observations
+    assert {s.source for s in normalized} == {"repository", "document"}  # provenance preserved
+def test_requirement_and_observation_never_collapse():
+    # a tender that DEMANDS an SBOM must NOT become the same signal as a repo that HAS one
+    normalized = normalize_signals([
+        ProducedSignal(signal_id="cyclonedx_found", source_type="repository"),  # observation
+        ProducedSignal(signal_id="requires_sbom", source_type="tender", provenance="tender §4.2"),  # requirement
+    ], _VOCAB)
+    by_kind = {s.kind: s.signal for s in normalized}
+    assert by_kind["observation"] == "sbom_present"
+    assert by_kind["requirement"] == "sbom_required"
+    assert by_kind["observation"] != by_kind["requirement"]
+def test_requirement_signal_produces_no_capability():
+    # the regression the whole fix is about: a DEMANDED SBOM yields NO detected capability,
+    # but is preserved as a requirement; a real SBOM in the repo still IS detected.
     from_tender = normalize_signals([ProducedSignal(signal_id="requires_sbom", source_type="tender")], _VOCAB)
-    assert silent_intake(from_repo, _MAP).capability_ids() == silent_intake(from_tender, _MAP).capability_ids() == ["sbom_creation"]
+    res_tender = silent_intake(from_tender, _MAP)
+    assert res_tender.capability_ids() == []  # NOT read as present
+    assert res_tender.requirements_seen == ["sbom_required"]  # but preserved + visible
+    from_repo = normalize_signals([ProducedSignal(signal_id="cyclonedx_found", source_type="repository", evidence="sbom")], _VOCAB)
+    assert silent_intake(from_repo, _MAP).capability_ids() == ["sbom_creation"]
+def test_vocabulary_kind_overrides_a_mislabelled_producer():
+    # even if a producer wrongly tags a requirement as observation, the vocabulary is authoritative
+    norm = normalize_signals([ProducedSignal(signal_id="requires_sbom", source_type="tender", kind="observation")], _VOCAB)
+    assert norm[0].signal == "sbom_required" and norm[0].kind == "requirement"
 def test_unknown_signal_passes_through_not_dropped():
@@ -36,8 +36,8 @@ _REQ += [TargetRequirement(capability_id=d["capability"], expected_evidence=d.ge
 # scanner findings (injected): a machine builder with a public CVD policy, an SBOM + signed releases in
 # the repo, a product risk-assessment doc, and a cloud-connected PLC product.
 _SIGNALS = [
-    IntakeSignal(source="website", signal="security_txt_or_cvd_policy", detail="/.well-known/security.txt"),
-    IntakeSignal(source="repository", signal="sbom_file_found", detail="sbom.cdx.json"),
+    IntakeSignal(source="website", signal="cvd_policy_present", detail="/.well-known/security.txt"),
+    IntakeSignal(source="repository", signal="sbom_present", detail="sbom.cdx.json"),
     IntakeSignal(source="repository", signal="signed_releases"),
     IntakeSignal(source="document", signal="product_risk_assessment_doc"),
     IntakeSignal(source="product", signal="cloud_connectivity"),