fix(onboarding): separate observation vs requirement signals — a demanded SBOM is not a present SBOM
Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built; otherwise the Observation Store would learn from already-misclassified signals.

The Silent Pass conflated two kinds of signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and suppressed the very question that should have been asked.

Fix: make the kind explicit and authoritative (no new architecture, data + thin wiring):
- `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical SignalVocabularyEntry (AUTHORITATIVE: a mislabelled producer cannot collapse the two).
- Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req); security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required. Requirement signals are intentionally UNMAPPED in intake_signal_map (they describe a target, not state).
- silent_intake() consumes ONLY kind==observation; requirement signals are preserved in `requirements_seen` (visible/auditable) but NEVER become a detected capability.
- normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through.

This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on: observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec / law now produces requirement signals; scanners / repos / documents produce observation signals.
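To illustrate the "vocabulary kind is authoritative" rule described above, here is a minimal sketch, not the code from this commit. `VocabEntry`, `Produced`, `Normalized`, `VOCABULARY` and `normalize` are hypothetical stand-ins for SignalVocabularyEntry, ProducedSignal, IntakeSignal, the signal vocabulary and normalize_signals(); their real fields are not shown in this commit view. The fallback kind for unknown ids is also an assumption (it mirrors the `"observation"` default on IntakeSignal).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VocabEntry:
    """Hypothetical stand-in for SignalVocabularyEntry."""
    canonical: str
    kind: str  # "observation" | "requirement"


@dataclass(frozen=True)
class Produced:
    """Hypothetical stand-in for ProducedSignal; the producer MAY declare a kind."""
    raw_id: str
    source: str
    kind: Optional[str] = None


@dataclass(frozen=True)
class Normalized:
    """Hypothetical stand-in for IntakeSignal (only the fields needed here)."""
    signal: str
    source: str
    kind: str


# Assumed alias table: raw producer id -> canonical entry.
VOCABULARY: Dict[str, VocabEntry] = {
    "cyclonedx_found": VocabEntry("sbom_present", "observation"),
    "requires_sbom": VocabEntry("sbom_required", "requirement"),
}


def normalize(produced: List[Produced]) -> List[Normalized]:
    out: List[Normalized] = []
    for p in produced:
        entry = VOCABULARY.get(p.raw_id)
        if entry is None:
            # Unknown ids pass through unchanged. Falling back to the producer's
            # kind (or "observation") is an assumption of this sketch.
            out.append(Normalized(p.raw_id, p.source, p.kind or "observation"))
        else:
            # The vocabulary's kind wins, even if the producer mislabelled it.
            out.append(Normalized(entry.canonical, p.source, entry.kind))
    return out
```

With this shape, a tender producer that wrongly declares `kind="observation"` for `requires_sbom` still ends up as `sbom_required` with kind `requirement`, so the two signals cannot collapse into one canonical id.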
Tests:
- Rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) to pin the correct split.
- Regression: `requires_sbom` yields no capability and stays in requirements_seen, while `cyclonedx_found` still detects sbom_creation.
- Endpoint-level regression: a tender requirement does not auto-detect, and the gap stays asked.
- The vocabulary kind overrides a mislabelled producer.

25 onboarding tests pass, mypy --strict clean, demo runs, check-loc 0. Runtime effect: deploy + smoke. (Fix A; partial-vs-detected decoupling follows as Fix B before #59.)
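The regression described above can be sketched in isolation. This is a simplified illustration, not the project's test code: `SIGNAL_TO_CAPABILITY` is a hypothetical stand-in for intake_signal_map, `partition` is a reduced version of the loop in silent_intake(), and signals are modelled as plain `(canonical_id, kind)` tuples rather than IntakeSignal objects.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for intake_signal_map: only observation ids are mapped,
# requirement ids (e.g. "sbom_required") are intentionally absent.
SIGNAL_TO_CAPABILITY: Dict[str, str] = {"sbom_present": "sbom_creation"}


def partition(signals: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (canonical_id, kind) pairs into (detected capabilities, requirements_seen)."""
    caps: Set[str] = set()
    requirements: Set[str] = set()
    for signal, kind in signals:
        if kind != "observation":
            # A requirement describes a target: keep it visible, never detect from it.
            requirements.add(signal)
            continue
        capability = SIGNAL_TO_CAPABILITY.get(signal)
        if capability:
            caps.add(capability)
    return sorted(caps), sorted(requirements)


def test_tender_requirement_is_not_a_present_capability() -> None:
    caps, seen = partition([("sbom_required", "requirement")])
    assert caps == []
    assert seen == ["sbom_required"]


def test_repo_observation_still_detects_capability() -> None:
    caps, seen = partition([("sbom_present", "observation")])
    assert caps == ["sbom_creation"]
    assert seen == []
```

Both test functions are written in pytest style but can also be called directly; the first pins that a demanded SBOM leaves the gap open, the second that an observed SBOM still closes it.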
@@ -24,7 +24,8 @@ class IntakeSignal(BaseModel):
     from a website, a repo, a PDF, a tender or the user — normalize_signals() unified them (see signals.py)."""

     source: str # source_type: website / repository / document / product / tender / user
-    signal: str # CANONICAL signal id, e.g. "sbom_file_found"
+    signal: str # CANONICAL signal id, e.g. "sbom_present"
+    kind: str = "observation" # "observation" (I saw X) | "requirement" (someone DEMANDS X)
     confidence: float = 1.0 # carried from the producer
     evidence: Optional[str] = None # the artifact already in hand
     provenance: str = "" # where it came from (url / filename / tender clause) — audit trail
@@ -61,10 +62,13 @@ class SilentIntakeResult(BaseModel):
     detected_capabilities: List[DetectedCapability] = Field(default_factory=list)
     product_facts: List[ProductFact] = Field(default_factory=list)
     evidence_found: List[str] = Field(default_factory=list)
+    requirements_seen: List[str] = Field(default_factory=list) # requirement-kind signals — preserved, NOT present
     summary: str = ""

     def capability_ids(self) -> List[str]:
-        """The detected capability ids — fed into the Advisor as already-present (delta-reducing)."""
+        """The detected capability ids — fed into the Advisor as already-present (delta-reducing).
+
+        ONLY observation-kind signals reach here (requirements never become a present capability)."""
         return sorted({d.capability for d in self.detected_capabilities})

@@ -83,7 +87,11 @@ def silent_intake(
     caps: Dict[str, DetectedCapability] = {}
     facts: Dict[str, ProductFact] = {}
     evidence: Set[str] = set()
+    requirements: Set[str] = set()
     for s in signals:
+        if s.kind != "observation": # a requirement describes a TARGET, never the present state
+            requirements.add(s.signal) # preserved + visible, but NEVER turned into a capability
+            continue
         for m in by_signal.get(s.signal, []):
             if m.capability and m.capability not in caps:
                 caps[m.capability] = DetectedCapability(
@@ -97,10 +105,12 @@ def silent_intake(

     detected = [caps[k] for k in sorted(caps)]
     product_facts = [facts[k] for k in sorted(facts)]
+    requirements_seen = sorted(requirements)
     summary = (
-        "Stille Vorbefüllung: %d Fähigkeit(en) automatisch erkannt, %d Produktfakt(en), %d Nachweis(e) bereits vorhanden."
-        % (len(detected), len(product_facts), len(evidence))
+        "Stille Vorbefüllung: %d Fähigkeit(en) automatisch erkannt, %d Produktfakt(en), %d Nachweis(e) "
+        "bereits vorhanden, %d Anforderung(en) erkannt (nicht als vorhanden gewertet)."
+        % (len(detected), len(product_facts), len(evidence), len(requirements_seen))
     )
     return SilentIntakeResult(
         detected_capabilities=detected, product_facts=product_facts,
-        evidence_found=sorted(evidence), summary=summary)
+        evidence_found=sorted(evidence), requirements_seen=requirements_seen, summary=summary)