feat(onboarding): surface curated expert text + human capability labels (advisor was showing snake_case)

The advisor was structurally correct but unusable: every question showed a snake_case capability id plus a
single generic fallback reason ("Keine Anhaltspunkte im Unternehmensprofil — klären"). The expert text
already EXISTED in the transition patterns (why_asked / reviewable_claim) — the pipeline just dropped it.

  - transition_reasoning: TargetRequirement gains `rationale`; assess_transition uses it as the request
    reason when present, else the generic fallback (additive, backward-compatible for all consumers).
  - onboarding_service._target carries the pattern's why_asked (delta) and reviewable_claim (likely_covered)
    into the requirement rationale -> the question's `why`.
  - knowledge/onboarding/capability_labels.yaml: curated DE labels (id -> human), reusable across targets;
    labels_for() + response.capability_labels expose them; the frontend renders label || prettified id.

Now ISO27001->TISAX reads "Auftragsverarbeitung (Art. 28 DSGVO) — If a TISAX data label is in scope, you
must show Art. 28 GDPR processing-on-behalf controls; ISO 27001 does not establish these." instead of
"data_protection_processing_on_behalf — klären". why_asked text is still EN (existing knowledge; translation
is curation). 34 onboarding+transition tests pass, mypy --strict clean (13 modules), check-loc 0.
This commit is contained in:
Benjamin Admin
2026-06-28 18:46:56 +02:00
parent 5beb5a319a
commit 807a7002b2
7 changed files with 89 additions and 13 deletions
@@ -9,7 +9,7 @@ It adds NO new reasoning logic — it only exposes what exists. No DB, no persis
from __future__ import annotations
import os
from typing import Any, Dict, List, Sequence, Tuple
from typing import Any, Dict, Iterable, List, Sequence, Tuple
import yaml
@@ -37,6 +37,13 @@ def _load(*parts: str) -> Any:
_HYP_LIB = [CapabilityHypothesis(**h) for h in _load("certification_hypotheses", "hypotheses.yaml")["hypotheses"]]
_VOCAB = [SignalVocabularyEntry(**v) for v in _load("onboarding", "signal_vocabulary.yaml")["signals"]]
_SIGNAL_MAP = [SignalMapping(**m) for m in _load("onboarding", "intake_signal_map.yaml")["mappings"]]
_LABELS: Dict[str, str] = _load("onboarding", "capability_labels.yaml")["labels"]
def labels_for(capability_ids: Iterable[str]) -> Dict[str, str]:
"""Human labels (DE) for the given capability ids — presentation only. Ids without a curated label
are omitted (the frontend falls back to a prettified id). Deduped, deterministic."""
return {c: _LABELS[c] for c in dict.fromkeys(capability_ids) if c in _LABELS}
# target id -> transition pattern that defines its required capabilities (curated registry)
_TARGET_PATTERNS = {
@@ -53,9 +60,10 @@ def supported_targets() -> List[str]:
def _target(target_id: str) -> Tuple[List[TargetRequirement], Dict[str, List[str]]]:
pat = _load("transition_patterns", _TARGET_PATTERNS[target_id])
reqs = [TargetRequirement(capability_id=a["capability"]) for a in pat["likely_covered"]]
reqs = [TargetRequirement(capability_id=a["capability"], rationale=a.get("reviewable_claim", "")) for a in pat["likely_covered"]]
reqs += [TargetRequirement(capability_id=d["capability"], question_intent=d.get("needed_information", "verify_existence"),
expected_evidence=d.get("expected_evidence", [])) for d in pat["delta_requirements"]]
rationale=d.get("why_asked", ""), expected_evidence=d.get("expected_evidence", []))
for d in pat["delta_requirements"]]
covers = {d["capability"]: d.get("covers_targets", []) for d in pat["delta_requirements"]}
return reqs, covers