feat: Signal Producer interface + Normalizer — one signal language for all sources (before #58)

These are not scanner stubs: the scanners already exist. The Silent Pass needs only their unified output.
This adds the small common data format (not a new module or framework) the user asked for, following exactly
the Requirement-Source / MCAP / regulation-alias pattern: many inputs, one language.

  Producer A / B / C  ->  normalize_signals (vocabulary: id + aliases)  ->  canonical IntakeSignal  ->  Silent Pass

- ProducedSignal {signal_id, source_type, confidence, evidence, provenance} = what ANY source emits
  (website scanner, repo scanner, PDF parser, tender parser, API, the user).
- knowledge/onboarding/signal_vocabulary.yaml reduces producer dialects to a canonical signal: "SBOM
  present" arrives as cyclonedx_found / spdx_found / sbom_uploaded / requires_sbom (tender) — all become
  `sbom_file_found`. The Silent Pass cannot tell where it came from -> no per-scanner special logic, ever.
- Unknown signals pass through (a new producer stays visible). confidence/evidence/provenance flow to
  the detected capability for the audit trail.
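
The normalization described in the bullets above can be sketched roughly as follows. This is a minimal
illustration, not the code in this commit: `VocabularyEntry`, `normalize_sketch`, and the field defaults are
hypothetical stand-ins, and the sketch keeps the ProducedSignal shape instead of converting to the real
IntakeSignal.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ProducedSignal:
    # Field names follow the commit message; the defaults are assumptions.
    signal_id: str
    source_type: str
    confidence: float = 1.0
    evidence: Optional[str] = None
    provenance: Optional[str] = None


@dataclass(frozen=True)
class VocabularyEntry:
    # Hypothetical stand-in for one entry of signal_vocabulary.yaml (id + aliases).
    id: str
    aliases: Tuple[str, ...] = ()


def normalize_sketch(produced: List[ProducedSignal],
                     vocabulary: List[VocabularyEntry]) -> List[ProducedSignal]:
    """Rewrite each producer dialect to its canonical id; unknown ids pass through."""
    alias_to_id: Dict[str, str] = {}
    for entry in vocabulary:
        alias_to_id[entry.id] = entry.id
        for alias in entry.aliases:
            alias_to_id[alias] = entry.id
    # replace() copies confidence/evidence/provenance unchanged, so the audit trail survives.
    return [replace(s, signal_id=alias_to_id.get(s.signal_id, s.signal_id)) for s in produced]


vocab = [VocabularyEntry(id="sbom_file_found",
                         aliases=("cyclonedx_found", "spdx_found", "sbom_uploaded", "requires_sbom"))]
raw = [
    ProducedSignal("cyclonedx_found", "repository", provenance="sbom.cdx.json"),
    ProducedSignal("requires_sbom", "tender"),
    ProducedSignal("brand_new_signal", "api"),  # not in the vocabulary: stays visible as-is
]
canonical = normalize_sketch(raw, vocab)
print([s.signal_id for s in canonical])
```

Building the alias lookup once keeps the per-signal cost constant, and falling back to the raw id is what
lets a new producer show up downstream before the vocabulary knows about it.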

A tender that "requires SBOM" now produces the same effect as a repo that HAS one — fits Vision V2
(Requirement Source over Regulation). Endpoint (#58) then has its final shape: POST -> Producers ->
Normalizer -> Silent Pass -> Profile -> Delta -> Questions -> Roadmap. Non-runtime -> no deploy. mypy
--strict clean, 14 onboarding tests pass, check-loc 0.
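
To make the "tender equals repo" claim concrete, here is a small sketch of the two stages back to back. The
alias table and the signal-to-capability table are assumptions standing in for signal_vocabulary.yaml and
intake_signal_map.yaml (whose contents are not shown here), and `detect_capabilities_sketch` is a
hypothetical helper, not the real silent_intake.

```python
from typing import Dict, List

# Assumed excerpts of the two knowledge files; the real YAML contents may differ.
ALIASES: Dict[str, str] = {
    "cyclonedx_found": "sbom_file_found",  # emitted by a repository scanner
    "requires_sbom": "sbom_file_found",    # emitted by a tender parser
}
SIGNAL_TO_CAPABILITY: Dict[str, str] = {
    "sbom_file_found": "sbom_creation",
}


def detect_capabilities_sketch(raw_signal_ids: List[str]) -> List[str]:
    """Normalize raw ids, then map canonical signals to capabilities."""
    canonical = [ALIASES.get(raw, raw) for raw in raw_signal_ids]
    # The mapping stage only ever sees canonical ids, so it has no per-source branch.
    found = {SIGNAL_TO_CAPABILITY[c] for c in canonical if c in SIGNAL_TO_CAPABILITY}
    return sorted(found)


from_repo = detect_capabilities_sketch(["cyclonedx_found"])
from_tender = detect_capabilities_sketch(["requires_sbom"])
print(from_repo, from_tender)
```

Both calls yield the same capability list, which is the property that lets the endpoint treat every producer
uniformly.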
Benjamin Admin
2026-06-28 14:49:57 +02:00
parent 9c33582412
commit c2c8f7e424
7 changed files with 184 additions and 16 deletions
@@ -6,6 +6,7 @@ _Eingabe: Unternehmen + Produkte + Zertifizierungen + Ziel. Den Rest macht die O
> Zertifizierungen: **ISO9001, ISO27001, ISO14001, TISAX** · Produkt: **Parkschein-/Schrankensystem** · Ziel: **CRA**
## Phase 0 — Stille Vorbefüllung (BEVOR eine Frage erscheint)
+- **Signal Producer (verschiedene Dialekte → ein kanonisches Signal):** `vdp_found`(website), `cyclonedx_found`(repository), `cosign_found`(repository), `risk_assessment_pdf`(document), `cloud_hosted`(product), `plc_detected`(product)
> Stille Vorbefüllung: 4 Fähigkeit(en) automatisch erkannt, 2 Produktfakt(en), 4 Nachweis(e) bereits vorhanden.
- **Automatisch erkannte Fähigkeiten:** `coordinated_vulnerability_disclosure`, `product_cyber_risk_assessment`, `sbom_creation`, `secure_signed_update_distribution`
- **Produktfakten (steuern den Scope):** `connected_to_internet=true`, `is_machine=true`
@@ -13,8 +13,8 @@ import os
import yaml
from compliance.onboarding import (
-    CapabilityHypothesis, IntakeSignal, OnboardingInput, SignalMapping,
-    advisor_start, resolve_for_certifications, silent_intake,
+    CapabilityHypothesis, OnboardingInput, ProducedSignal, SignalMapping, SignalVocabularyEntry,
+    advisor_start, normalize_signals, resolve_for_certifications, silent_intake,
)
from compliance.transition_reasoning import TargetRequirement
@@ -40,15 +40,17 @@ inp = OnboardingInput(company="synthetisch", industry="machine_builder",
certifications=["ISO9001", "ISO27001", "ISO14001", "TISAX"],
known_evidence=["CE process"], target=["CRA"])
hyp = resolve_for_certifications(inp.certifications, _lib)
-# Phase 0 — Silent Knowledge Pass: recognise everything possible from scanner signals BEFORE asking.
-_smap = [SignalMapping(**m) for m in yaml.safe_load(
-    open(os.path.join(os.path.dirname(__file__), "..", "knowledge", "onboarding", "intake_signal_map.yaml"), encoding="utf-8"))["mappings"]]
-_signals = [IntakeSignal(source="website", signal="security_txt_or_cvd_policy", detail="/.well-known/security.txt"),
-            IntakeSignal(source="repository", signal="sbom_file_found", detail="sbom.cdx.json"),
-            IntakeSignal(source="repository", signal="signed_releases"),
-            IntakeSignal(source="document", signal="product_risk_assessment_doc"),
-            IntakeSignal(source="product", signal="cloud_connectivity"),
-            IntakeSignal(source="product", signal="plc_sps")]
+# Phase 0 — Signal Producers emit raw dialects -> Normalizer -> one canonical stream -> Silent Pass.
+_K = os.path.join(os.path.dirname(__file__), "..", "knowledge", "onboarding")
+_vocab = [SignalVocabularyEntry(**v) for v in yaml.safe_load(open(os.path.join(_K, "signal_vocabulary.yaml"), encoding="utf-8"))["signals"]]
+_smap = [SignalMapping(**m) for m in yaml.safe_load(open(os.path.join(_K, "intake_signal_map.yaml"), encoding="utf-8"))["mappings"]]
+_produced = [ProducedSignal(signal_id="vdp_found", source_type="website", provenance="/.well-known/security.txt"),
+             ProducedSignal(signal_id="cyclonedx_found", source_type="repository", evidence="sbom", provenance="sbom.cdx.json"),
+             ProducedSignal(signal_id="cosign_found", source_type="repository", provenance="cosign.pub"),
+             ProducedSignal(signal_id="risk_assessment_pdf", source_type="document", provenance="risk_assessment.pdf"),
+             ProducedSignal(signal_id="cloud_hosted", source_type="product"),
+             ProducedSignal(signal_id="plc_detected", source_type="product")]
+_signals = normalize_signals(_produced, _vocab)  # raw producer dialects -> ONE canonical signal language
si = silent_intake(_signals, _smap)
res = advisor_start(inp, hyp, req, target_id="CRA", covers_targets=covers, corpus_status={"CRA": "validated"},
detected_capabilities=si.capability_ids())
@@ -61,6 +63,7 @@ w("## Eingabe")
w("> Zertifizierungen: **%s** · Produkt: **%s** · Ziel: **%s**" % (", ".join(inp.certifications), inp.products[0], ", ".join(inp.target)))
w("")
w("## Phase 0 — Stille Vorbefüllung (BEVOR eine Frage erscheint)")
+w("- **Signal Producer (verschiedene Dialekte → ein kanonisches Signal):** %s" % ", ".join("`%s`(%s)" % (p.signal_id, p.source_type) for p in _produced))
w("> %s" % si.summary)
w("- **Automatisch erkannte Fähigkeiten:** %s" % ", ".join("`%s`" % d.capability for d in si.detected_capabilities))
w("- **Produktfakten (steuern den Scope):** %s" % ", ".join("`%s=%s`" % (f.key, f.value) for f in si.product_facts))