docs(architecture): RAG retrieval engine architecture set (01-09)
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
9 docs + index in docs-src/architecture/ documenting the deterministic retrieval engine: retrieval pipeline, authority rerank, source_class, source_role, control-intent + diversity, assessment, confidence, explainability + supersede, framework_* layer. Each doc carries the exact constants, the rationale behind them, code refs, and the failure class it addresses. Audit/onboarding reference. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,51 @@
|
||||
# 02 — Authority-Re-Ranking
|
||||
|
||||
**Zweck:** Bindendes Recht der passenden Jurisdiktion/Domäne nach oben, Guidance/Fremdrecht/Off-Domain nach unten — **Reihenfolge only, nichts wird gelöscht**. Der `Score` trägt nach dem Rerank den Authority-Score, damit nachgelagerte Multi-Collection-Merges (Advisor) die Ordnung bewahren.
|
||||
|
||||
## Mechanik
|
||||
|
||||
`authorityScore()` (`authority_rerank.go`) berechnet pro Treffer einen normativen Relevanz-Score aus dem rohen Semantik-Score + gewichteter Autorität + Kontext-Bonus/Penalty:
|
||||
|
||||
```
|
||||
score = rawSemantic
|
||||
+ authorityCoef · weight/100 (Autorität, siehe 03)
|
||||
+ jurisdictionGain (DE/EU-Match)
|
||||
− foreignPenalty (CH bei DE/EU-Frage)
|
||||
− unknownPenalty (unbekannte Klasse)
|
||||
+ domainMatchGain (Chunk-Domäne == Query-Domäne)
|
||||
− offDomainPenalty (bindend, aber off-domain)
|
||||
− scopePenalty (BDSG Teil 3 bei allgemeiner DS-Frage)
|
||||
+ topicGain (bevorzugte kanonische Norm)
|
||||
− supersededPenalty (status="superseded")
|
||||
```
|
||||
|
||||
`rerankByAuthority()` sortiert stabil nach diesem Score und schreibt ihn zurück. `liftAboveBinding()` hebt bei **Auslegungs-Intent** eine semantisch konkurrenzfähige Guidance knapp über das bindende Recht — mit Margin-Guard, damit off-topic-Guidance das Gesetz nicht überholt.
|
||||
|
||||
## Konstanten + Warum
|
||||
|
||||
| Konstante | Wert | Warum |
|
||||
|-----------|------|-------|
|
||||
| `authorityCoef` | `0.40` | Gewicht→Score-Multiplikator. Konservativ kalibriert gegen die Offline-Golden-Harness (Phase A): hoch genug, dass bindendes Recht gewinnt, niedrig genug, dass starke Semantik nicht erschlagen wird |
|
||||
| `jurisdictionGain` | `0.05` | leichter Vorzug für DE/EU-Quellen bei DE/EU-Frage |
|
||||
| `foreignPenalty` | `0.60` | Fremdrecht (CH) bei DE/EU-Frage klar demoten — aber **nicht** entfernen (Vergleichsfälle bleiben auffindbar) |
|
||||
| `unknownPenalty` | `0.08` | unklassifizierte Quellen leicht zurückstufen |
|
||||
| `domainMatchGain` | `0.15` | Domänen-Treffer (data_protection / cyber / ai / product_safety) belohnen |
|
||||
| `offDomainPenalty` | `0.10` | bindende, aber fachfremde Norm demoten (z.B. DSGVO bei reiner Cyber-Frage) |
|
||||
| `scopePenalty` | `0.25` | BDSG §45–84 (Justiz/Strafverfolgung) bei allgemeiner DS-Frage zurückstufen — häufige Scope-Verwechslung |
|
||||
| `topicGain` | `0.18` | Verstärker für bevorzugte kanonische Normen (z.B. Art. 37 DSGVO bei DSB-Fragen) |
|
||||
| `supersededPenalty` | `0.50` | abgelöste Alt-Quelle demoten, „damit Default-Fragen die eu-v1-Norm sehen, History aber auffindbar bleibt" |
|
||||
| `intentLiftGain` | `0.10` | Epsilon-Lift einer Guidance über das beste bindende Recht bei Auslegungs-Intent |
|
||||
| `intentLiftMargin` | `0.05` | Guard: Lift nur, wenn die Semantik innerhalb von 0.05 zum besten bindenden Treffer liegt |
|
||||
|
||||
**Auslegungs-Intent-Signale** (`guidanceIntentSignals`): `edpb`, `dsk`, `enisa`, `bsi`, `leitlinie`, `guideline`, `orientierungshilfe`, `auslegung`, `empfiehlt`, `empfehlung`, `sagt`, `laut`, …
|
||||
|
||||
## Code
|
||||
|
||||
- `authority_rerank.go` → `authorityScore()`, `rerankByAuthority()`, `bestBindingSemantic()`, `liftAboveBinding()`
|
||||
|
||||
## Adressierte Fehlerklassen
|
||||
|
||||
- **„Guidance verdrängt Gesetz"** → `authorityCoef`·weight hebt bindendes Recht; `liftAboveBinding` nur mit Margin-Guard.
|
||||
- **„Fremdrecht Top-1"** → `foreignPenalty`.
|
||||
- **„Off-Domain-Gesetz dominiert"** → `domainMatchGain` / `offDomainPenalty` / `scopePenalty`.
|
||||
- **„Veraltete Norm gewinnt"** → `supersededPenalty` (siehe [08](08-explainability.md)).
|
||||
Reference in New Issue
Block a user