docs(architecture): RAG retrieval engine architecture set (01-09)
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
9 docs + index in docs-src/architecture/ documenting the deterministic retrieval engine: retrieval pipeline, authority rerank, source_class, source_role, control-intent + diversity, assessment, confidence, explainability + supersede, framework_* layer. Each doc carries the exact constants, the rationale behind them, code refs, and the failure class it addresses. Audit/onboarding reference. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,49 @@
|
||||
# 03 — `source_class` (Rechtsnatur / Autorität)
|
||||
|
||||
**Zweck:** Die Autoritäts-Achse, die den **Rang** bestimmt (siehe [02](02-authority.md)). Deterministisch abgeleitet — der noch nicht re-ingestierte (ungetaggte) Korpus wird trotzdem klassifiziert, ohne Re-Tagging des Bestands.
|
||||
|
||||
## Mechanik
|
||||
|
||||
`classifyAuthority()` (`authority.go`) entscheidet in dieser Reihenfolge:
|
||||
|
||||
1. **Standard-NAME-Override** — erkannter Standard-Name (NIST/OWASP/ISO 27001/CIS/CSA CCM/Grundschutz) erzwingt `technical_standard` (Gewicht 80), **auch wenn die Payload `supervisory_guidance` sagt**. Grund: der Korpus taggt viele Standards mit generischem guidance-`source_class`; der Name ist autoritativer. `binding_law` bleibt unangetastet.
|
||||
2. **Explizite Payload-Werte** — gesetztes `source_class` / `authority_weight` gewinnen.
|
||||
3. **Marker-Fallback** — foreign → standard → guidance → regulation → unknown.
|
||||
|
||||
`inferJurisdiction()`: Fremd-Marker → `CH`; enthält `§` oder DE-Marker → `DE`; sonst → `EU`.
|
||||
|
||||
## Konstanten + Warum
|
||||
|
||||
**Gewichte je Klasse** (`sourceClassFromWeight()`):
|
||||
|
||||
| `source_class` | Gewicht | Schwelle | Bedeutung |
|
||||
|----------------|---------|----------|-----------|
|
||||
| `binding_law` | `100` | w ≥ 100 | bindendes Recht (Gesetz/VO) |
|
||||
| `technical_standard` | `80` | 80 ≤ w < 100 | Best-Practice-Control-Katalog (NIST/OWASP/ISO) |
|
||||
| `supervisory_guidance` | `70` | 70 ≤ w < 80 | Aufsichts-/Auslegungs-Guidance (ENISA/BSI/EDPB) |
|
||||
| `unknown` | `50` | default | unklassifiziert |
|
||||
| `foreign_law` | `0` | w ≤ 0 | Fremdrecht (CH) |
|
||||
|
||||
**Marker-Listen** (Substring-Match):
|
||||
|
||||
| Liste | Einträge (Auszug) | Wirkung |
|
||||
|-------|-------------------|---------|
|
||||
| `standardMarkers` *(vor guidance geprüft)* | NIST, OWASP, Grundschutz, ISO 27001, ISO/IEC 27001, CSA CCM, Cloud Controls Matrix, CIS Benchmark, CIS Control | → `technical_standard` (80) |
|
||||
| `guidanceMarkers` | DSK, EDPB, BfDI, ENISA, BSI, EUCC, Standards Mapping, Orientierungshilfe, Handreichung, Leitlinie, Empfehlung, OECD, CISA, Blue Guide, … | → `supervisory_guidance` (70) |
|
||||
| `foreignMarkers` | RevDSG, fedlex, (CH) | → `foreign_law` (0) |
|
||||
| `deMarkers` | BDSG, DSK, BfDI, BayLfD, BSI | Signal **DE**-Jurisdiktion |
|
||||
|
||||
## Der Standard-Name-Override (Fix 2026-06-25)
|
||||
|
||||
**Problem:** Der CE-Korpus taggt z.B. `NIST SP 800-82r3` als `source_class=supervisory_guidance` (Gewicht 70), **nicht** technical_standard. `classifyAuthority` vertraute dem Payload-Tag → NIST landete als guidance, **kein `control_standard`** im Pool → die Diversity-Regel ([05](05-control-intent.md)) konnte nichts injizieren.
|
||||
|
||||
**Fix:** Erkannter Standard-Name überschreibt ein fehl-getaggtes guidance/unknown-`source_class` → `technical_standard`. Code-Fix, **kein Re-Ingest** nötig. Bindendes Recht bleibt unangetastet (Sanity geprüft: Rechtsfrage liefert weiterhin binding Top-1).
|
||||
|
||||
## Code
|
||||
|
||||
- `authority.go` → `classifyAuthority()`, `sourceClassFromWeight()`, `inferJurisdiction()`
|
||||
|
||||
## Adressierte Fehlerklassen
|
||||
|
||||
- **„Standard als guidance mistagged → kein control_standard"** → Standard-Name-Override.
|
||||
- **„Fremdrecht falsch eingeordnet"** → `foreignMarkers` + `foreign_law`-Gewicht 0.
|
||||
Reference in New Issue
Block a user