docs(architecture): RAG retrieval engine architecture set (01-09)
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
9 docs + index in docs-src/architecture/ documenting the deterministic retrieval engine: retrieval pipeline, authority rerank, source_class, source_role, control-intent + diversity, assessment, confidence, explainability + supersede, framework_* layer. Each doc carries the exact constants, the rationale behind them, code refs, and the failure class it addresses. Audit/onboarding reference. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,51 @@
|
||||
# 05 — Control-Intent + Diversity
|
||||
|
||||
**Zweck:** Bei einer **Umsetzungsfrage** („Welche Controls/Maßnahmen passen?") den Control-Pool ([04](04-source-role.md)) über die abstrakte Pflicht heben — und sicherstellen, dass die Ergebnisliste **verschiedene Quellenarten** zeigt, statt dass eine Rolle sie flutet. Bei einer **Rechtsfrage** bleibt alles beim Authority-Rerank ([02](02-authority.md)).
|
||||
|
||||
## Intent-Erkennung
|
||||
|
||||
`queryWantsControls()` (`authority_rerank.go`) — Keyword-Match (`controlIntentSignals`):
|
||||
|
||||
> control, controls, maßnahme, schutzmaßnahme, best practice, umsetzen, implementier, absicher, härt, hardening, nist, owasp, grundschutz, ccm, iso 27001, isms
|
||||
|
||||
Nur wenn dieser Gate `true` ist, feuern `applyControlRoles()` und `ensureControlDiversity()`.
|
||||
|
||||
## Rollen-Boost (`applyControlRoles`)
|
||||
|
||||
Jeder Control-Pool-Treffer bekommt `controlPoolGain + controlRoleBonus[role]` auf den Score:
|
||||
|
||||
| Größe | Wert | Warum |
|
||||
|-------|------|-------|
|
||||
| `controlPoolGain` | `0.15` | hebt **jede** Control-Pool-Rolle über die Nicht-Control-Rollen (obligation/interpretation/definition) — sonst gewinnt die bindende abstrakte `obligation` per Autorität allein |
|
||||
| `controlRoleBonus[operational_requirement]` | `0.100` | weicher Intra-Pool-Vorrang (User 2026-06-24): op_req zuerst |
|
||||
| `controlRoleBonus[procedural_requirement]` | `0.075` | … dann Prozess-Pflichten |
|
||||
| `controlRoleBonus[control_standard]` | `0.050` | … dann Standard-Kataloge |
|
||||
| `controlRoleBonus[implementation_guidance]` | `0.000` | guidance als Basis, kein Bonus |
|
||||
|
||||
> **Bewusst weich, keine harte Hierarchie:** Eine semantisch dominante `implementation_guidance` (z.B. ENISA bei einer EU-Cyber-Umsetzungsfrage) **darf Top-1 bleiben** — das ist fachlich korrekt. Der Boost demoted nur die abstrakte Pflicht, er erzwingt keine Reihenfolge.
|
||||
|
||||
## Control-Diversity-Regel (`ensureControlDiversity`)
|
||||
|
||||
**Problem:** Selbst mit Boost kann eine dichte Wolke gleicher Rolle (viele ENISA-Chunks) `operational_requirement` und `control_standard` aus der Top-K verdrängen — die Quellenarten werden unsichtbar.
|
||||
|
||||
**Lösung (statt harter `+0.30`-Rollenkeule):** Wenn die Top-K nur `implementation_guidance` enthält, **injiziere** den besten `operational_requirement` + besten `control_standard` aus dem Pool, indem der niedrigst-platzierte redundante guidance-Slot verdrängt wird. Algorithmus:
|
||||
|
||||
1. Rolle jedes Treffers bestimmen (`roleAt`).
|
||||
2. Prüfen, welche Rollen in der Top-K vertreten sind.
|
||||
3. Für jede fehlende Wunsch-Rolle (`operational_requirement`, `control_standard`): besten Treffer dieser Rolle unterhalb der Top-K finden, niedrigste `implementation_guidance` in der Top-K überschreiben.
|
||||
4. Truncate auf `topK` (das ursprüngliche Duplikat fällt im Tail weg).
|
||||
|
||||
**Ergebnis live:** Umsetzungsfrage → `1.–4. ENISA · 5. NIST SP 800-82r3 (control_standard) · 6. MaschinenVO Anhang-III (op_req)`. ENISA behält Top-1, die anderen Quellenarten sind sichtbar.
|
||||
|
||||
> **Prinzip:** Nicht raten, nicht erzwingen, sondern relevante Quellenarten sichtbar machen.
|
||||
|
||||
## Code
|
||||
|
||||
- `authority_rerank.go` → `queryWantsControls()`
|
||||
- `control_role.go` → `applyControlRoles()`, `ensureControlDiversity()`
|
||||
|
||||
## Adressierte Fehlerklassen
|
||||
|
||||
- **„abstrakte Pflicht dominiert Umsetzungsfrage"** → `controlPoolGain`.
|
||||
- **„eine Rolle flutet die Top-K, Quellenarten unsichtbar"** → `ensureControlDiversity`.
|
||||
- **„harte Tier-Ordnung overfittet auf eine Frage"** → weicher Boost statt Keule.
|
||||
Reference in New Issue
Block a user