docs(architecture): RAG retrieval engine architecture set (01-09)
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Successful in 23s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
9 docs + index in docs-src/architecture/ documenting the deterministic retrieval engine: retrieval pipeline, authority rerank, source_class, source_role, control-intent + diversity, assessment, confidence, explainability + supersede, framework_* layer. Each doc carries the exact constants, the rationale behind them, code refs, and the failure class it addresses. Audit/onboarding reference. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,51 @@
|
||||
# 09 — `framework_*`-Layer (Control-Mapping-Brücke)
|
||||
|
||||
**Zweck:** Einen **konkreten Control adressierbar** machen (z.B. `V14.2.4`), damit das System vom „welches Dokument passt?" zum „welcher konkrete Control erfüllt CRA Annex I?" übergeht. Das ist die Brücke zur nächsten Stufe — **Control → Evidence** — und der eigentliche Burggraben.
|
||||
|
||||
> **Ehrlicher Status:** Dieser Layer lebt **heute in der Qdrant-Payload**, nicht im Retrieval-Code. Die `ucca`-Engine liest/routet `framework_*` (noch) nicht — sie ist die **Datengrundlage**, auf der Prio 4 aufsetzt. `framework_control` reist aktuell im Feld `article` mit und ist daher bereits in den Antworten sichtbar.
|
||||
|
||||
## Schema (pro Chunk)
|
||||
|
||||
| Feld | Beispiel (OWASP) | Bedeutung |
|
||||
|------|------------------|-----------|
|
||||
| `framework` | `OWASP ASVS` | Rahmenwerk |
|
||||
| `framework_version` | `5.0` | Version (mit `superseded`-Mechanik historisierbar, [08](08-explainability.md)) |
|
||||
| `framework_section` | `V6` | Kapitel/Sektion |
|
||||
| `framework_control` | `V6.2.4` | konkrete Requirement-ID — der adressierbare Control |
|
||||
| `framework_section_name` | `Password Security` | menschenlesbarer Kontext |
|
||||
| `asvs_level` | `L1`/`L2`/`L3` | (OWASP-spezifisch) Stufe |
|
||||
|
||||
Analog für NIST geplant: `framework="NIST SP 800-53"`, `framework_family="SI"`, `framework_control="SI-2"`, `framework_revision="5"`.
|
||||
|
||||
## OWASP ASVS 5.0 — die erste Referenz (Parser-4-Muster)
|
||||
|
||||
- **Quelle:** `OWASP/ASVS` GitHub, `5.0/docs_en/...flat.json` (345 Requirements). Lizenz **CC-BY-SA-4.0** (zulässig; nur CC-BY-NC ist geblockt), Attribution `OWASP`.
|
||||
- **Ingestion = per-Requirement Direct-Upsert** (nicht der RAG-Chunker, der `framework_control` zerschneiden würde): 1 Qdrant-Punkt pro Requirement, `id = uuid5("owasp_asvs_5.0_"+req_id)` (idempotent), `source_class=technical_standard` / `authority_weight=80`, bge-m3-Vektor.
|
||||
- **Stand:** 345 Punkte auf macmini-qdrant **und** qdrant-dev, live verifiziert (`„OWASP … Authentifizierung"` → Top-OWASP mit `V`-Codes).
|
||||
- **Lehre:** Künftige Standards (NIST-Re-Tag, BSI Grundschutz) **immer** mit `source_class=technical_standard` + `framework_*` direkt setzen — das NIST-Altskript ließ `source_class` leer, daher der guidance-Mistag ([03](03-source-class.md)).
|
||||
|
||||
## Brücke zu Prio 4 — Control → Evidence
|
||||
|
||||
```
|
||||
Regulation
|
||||
↓ (legal obligation layer)
|
||||
Obligation
|
||||
↓ (source_role: operational_requirement)
|
||||
Operational Requirement ── CRA Annex I
|
||||
↓ (Control-Mapping über framework_control)
|
||||
Control ── OWASP V6.x · NIST SI-2 · BSI OPS.1.1
|
||||
↓
|
||||
Evidence ── der Nachweis, den ein Auditor sehen will
|
||||
```
|
||||
|
||||
Der nächste Schritt verdrahtet `framework_control` in eine **Control-Mapping-Tabelle** (welcher konkrete Control erfüllt welche Obligation) und darunter die **Evidence-Schicht**. NIST + BSI ziehen im selben `framework_*`-Muster nach.
|
||||
|
||||
## Code / Daten
|
||||
|
||||
- Daten: Qdrant `bp_compliance_ce` (Payload-Felder oben), Ingestion-Skripte (`ingest_owasp.py` u.a.)
|
||||
- Retrieval-Verdrahtung: **offen** (Prio 4)
|
||||
|
||||
## Adressierte Fehlerklassen
|
||||
|
||||
- **„nur Dokument-Treffer, kein adressierbarer Control"** → `framework_control` pro Chunk.
|
||||
- **„Control-Katalog ohne Stand"** → `framework_version` + Supersede.
|
||||
Reference in New Issue
Block a user