docs: add Chunk-Browser documentation
- Document Chunk-Browser tab functionality and API
- Cover scroll endpoint, text search, pagination
- Document Originalquelle links and low-chunk warnings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New file: `docs-src/services/klausur-service/Chunk-Browser.md`

# Chunk-Browser

## Overview

The Chunk-Browser lets you page sequentially through all chunks in a Qdrant collection. It is available as the "Chunk-Browser" tab on the RAG page (`/ai/rag`).

**URL:** `https://macmini:3002/ai/rag` → "Chunk-Browser" tab


---

## Features

### Collection Selection

A dropdown lists all available compliance collections:

- `bp_compliance_gesetze`
- `bp_compliance_ce`
- `bp_compliance_datenschutz`
- `bp_dsfa_corpus`
- `bp_compliance_recht`
- `bp_legal_templates`
- `bp_compliance_gdpr`
- `bp_compliance_schulrecht`
- `bp_dsfa_templates`
- `bp_dsfa_risks`

### Page-by-Page Navigation

- 20 chunks per page
- Previous/Next buttons
- Page number and chunk counter
- Cursor-based pagination via the Qdrant Scroll API

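Qdrant's scroll cursor only moves forward, so a Previous button requires the client to remember the cursors of pages it has already visited. Below is a minimal sketch of that bookkeeping. It assumes the UI keeps a stack of offsets. The `CursorPager` class and its methods are hypothetical and are not taken from the actual page code.

```typescript
// Hypothetical helper that remembers Qdrant scroll cursors so the UI can page backwards.
// A cursor is a point ID (string UUID or integer); null means "start of the collection".
type Cursor = string | number | null;

class CursorPager {
  readonly pageSize: number;
  // history[i] is the offset used to load page i; page 0 always starts at null.
  private history: Cursor[] = [null];
  // next_offset of the most recently loaded page (null = last page or not yet loaded).
  private nextCursor: Cursor = null;

  constructor(pageSize = 20) {
    this.pageSize = pageSize;
  }

  /** Zero-based index of the page currently shown. */
  get pageIndex(): number {
    return this.history.length - 1;
  }

  /** Offset to send when loading the current page. */
  get currentOffset(): Cursor {
    return this.history[this.history.length - 1];
  }

  /** 1-based number of the first chunk on this page, for the #1, #2, ... labels. */
  get firstChunkNumber(): number {
    return this.pageIndex * this.pageSize + 1;
  }

  get hasNext(): boolean {
    return this.nextCursor !== null;
  }

  get hasPrevious(): boolean {
    return this.history.length > 1;
  }

  /** Store the next_offset returned for the current page. */
  onPageLoaded(nextOffset: Cursor): void {
    this.nextCursor = nextOffset;
  }

  /** Move forward; returns the offset to request, or undefined on the last page. */
  next(): Cursor | undefined {
    if (this.nextCursor === null) return undefined;
    this.history.push(this.nextCursor);
    this.nextCursor = null; // unknown until the new page has been loaded
    return this.currentOffset;
  }

  /** Move back; returns the offset to request, or undefined on the first page. */
  previous(): Cursor | undefined {
    if (this.history.length <= 1) return undefined;
    // The cursor we leave is, by definition, the next page of the one we return to.
    const left = this.history.pop();
    this.nextCursor = left ?? null;
    return this.currentOffset;
  }
}
```

`next()` and `previous()` return the `offset` to send to the scroll endpoint. A return value of `null` means "start of the collection", and `undefined` means there is no page in that direction.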
### Text Search

- Filters the chunks on the currently loaded page only
- Matches are highlighted in yellow
- Searches the chunk text (`payload.text`, `payload.content`, `payload.chunk_text`)

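The filtering and highlighting can be sketched as pure functions over the chunks that are already loaded. This sketch assumes each chunk exposes its Qdrant `payload` and that the first non-empty field among `text`, `content` and `chunk_text` holds the text. The function names are hypothetical.

```typescript
// Hypothetical client-side search over the chunks of the currently loaded page.
// Assumes each chunk carries its Qdrant payload; the field order is an assumption.
type Payload = Record<string, unknown>;

const TEXT_FIELDS = ["text", "content", "chunk_text"];

/** Return the first non-empty string among the known text fields, or "". */
function chunkText(payload: Payload): string {
  for (const key of TEXT_FIELDS) {
    const value = payload[key];
    if (typeof value === "string" && value.length > 0) return value;
  }
  return "";
}

/** Keep only chunks whose text contains the query (case-insensitive). */
function filterChunks<T extends { payload: Payload }>(chunks: T[], query: string): T[] {
  const q = query.trim().toLowerCase();
  if (q === "") return chunks;
  return chunks.filter((c) => chunkText(c.payload).toLowerCase().includes(q));
}

interface Segment {
  text: string;
  match: boolean;
}

/** Split text into matching and non-matching segments for yellow highlighting. */
function highlightSegments(text: string, query: string): Segment[] {
  const q = query.trim();
  if (q === "") return [{ text, match: false }];
  // Escape regex metacharacters so the query is matched literally.
  const escaped = q.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // A capturing group makes split() keep the matched substrings in the result.
  return text
    .split(new RegExp(`(${escaped})`, "gi"))
    .filter((part) => part !== "")
    .map((part) => ({ text: part, match: part.toLowerCase() === q.toLowerCase() }));
}
```

`highlightSegments` returns segments rather than HTML. A renderer can then wrap matching segments in a yellow `<mark>` without injecting raw markup.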
### Chunk Details

- Clicking a chunk expands all of its metadata
- Shows fields such as `regulation_code`, `article`, `language`, `source`, and `licence`
- Chunks are numbered sequentially (#1, #2, ...)

### Integration with the Regulations Tab

The "In Chunks suchen" ("search in chunks") button next to each regulation switches to the Chunk-Browser with:

- the matching collection preselected
- the search term prefilled with the regulation name


---

## API

### Scroll Endpoint (API Proxy)

```
GET /api/legal-corpus?action=scroll&collection=bp_compliance_ce&limit=20&offset={cursor}
```

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `collection` | string | Qdrant collection name |
| `limit` | number | Chunks per page (max. 100) |
| `offset` | string | Cursor for the next page (optional) |
| `text_search` | string | Text search filter (optional) |

**Response:**

```json
{
  "chunks": [
    {
      "id": "uuid",
      "text": "...",
      "regulation_code": "GDPR",
      "article": "Art. 5",
      "language": "de"
    }
  ],
  "next_offset": "uuid-or-null",
  "total_in_page": 20
}
```

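A minimal typed client for the scroll action could look like the following. The interfaces mirror the response above. `buildScrollUrl`, `fetchScrollPage` and the default `limit` of 20 are assumptions for illustration. The `limit` is capped at the documented maximum of 100.

```typescript
// Hypothetical typed client for the proxy's scroll action; interfaces mirror the response above.
interface ScrollChunk {
  id: string;
  text: string;
  regulation_code?: string;
  article?: string;
  language?: string;
  [field: string]: unknown; // further payload fields (source, licence, ...)
}

interface ScrollResponse {
  chunks: ScrollChunk[];
  next_offset: string | null;
  total_in_page: number;
}

interface ScrollParams {
  collection: string;
  limit?: number;
  offset?: string | null;
  textSearch?: string;
}

/** Build the proxy URL, omitting optional parameters that are not set. */
function buildScrollUrl(base: string, p: ScrollParams): string {
  const url = new URL("/api/legal-corpus", base);
  url.searchParams.set("action", "scroll");
  url.searchParams.set("collection", p.collection);
  // Default of 20 matches the UI page size; the proxy documents a maximum of 100.
  const limit = Math.max(1, Math.min(p.limit ?? 20, 100));
  url.searchParams.set("limit", String(limit));
  if (p.offset) url.searchParams.set("offset", p.offset);
  if (p.textSearch) url.searchParams.set("text_search", p.textSearch);
  return url.toString();
}

/** Fetch one page; fetchFn defaults to the global fetch (browsers, Node 18+). */
async function fetchScrollPage(
  base: string,
  p: ScrollParams,
  fetchFn: typeof fetch = fetch,
): Promise<ScrollResponse> {
  const res = await fetchFn(buildScrollUrl(base, p));
  if (!res.ok) throw new Error(`scroll request failed: HTTP ${res.status}`);
  return (await res.json()) as ScrollResponse;
}
```

To walk the collection page by page, pass `next_offset` from one response as `offset` in the next request. A `null` `next_offset` marks the last page.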
### Collection Count Endpoint

```
GET /api/legal-corpus?action=collection-count&collection=bp_compliance_ce
```

**Response:**

```json
{
  "count": 12345
}
```

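Given the fixed page size of 20, one possible use of the count is to derive the number of pages. The page only states that a chunk counter is shown, so `totalPages` and `buildCountUrl` are hypothetical helpers.

```typescript
// Hypothetical helpers around the collection-count action.
interface CountResponse {
  count: number;
}

/** Build the proxy URL for the collection-count action. */
function buildCountUrl(base: string, collection: string): string {
  const url = new URL("/api/legal-corpus", base);
  url.searchParams.set("action", "collection-count");
  url.searchParams.set("collection", collection);
  return url.toString();
}

/** Number of pages at a fixed page size (assumed 20); an empty collection still has one page. */
function totalPages(res: CountResponse, pageSize = 20): number {
  if (!(pageSize > 0)) throw new RangeError("pageSize must be positive");
  return Math.max(1, Math.ceil(res.count / pageSize));
}
```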

---

## Technical Details

- The API proxy talks directly to Qdrant (port 6333) through Qdrant's `POST /collections/{name}/points/scroll` endpoint
- Neither embeddings nor the rag-service are required
- Text search runs client-side (no embedding needed)
- Pagination is cursor-based (Qdrant's `next_page_offset`)

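On the proxy side, the scroll action can be sketched as a direct call to Qdrant's REST scroll endpoint. Each point's payload is then flattened into the response shape documented for the scroll endpoint. The request fields `limit`, `offset`, `with_payload` and `with_vector` belong to Qdrant's scroll API. `QDRANT_URL`, the helper names and the flattening rules are assumptions, and the sketch assumes UUID point IDs as in the response example.

```typescript
// Hypothetical sketch of the proxy's scroll handler talking to Qdrant's REST API.
// QDRANT_URL, the type names and the flattening rules are assumptions;
// UUID point IDs are assumed, as in the documented response example.
const QDRANT_URL = "http://localhost:6333"; // hypothetical Qdrant host

type PointId = string | number;

interface QdrantScrollResult {
  result: {
    points: { id: PointId; payload?: Record<string, unknown> | null }[];
    next_page_offset?: PointId | null;
  };
}

interface ScrollBody {
  limit: number;
  with_payload: boolean;
  with_vector: boolean;
  offset?: PointId;
}

interface ProxyChunk {
  id: string;
  text: string;
  [field: string]: unknown;
}

/** Body for POST /collections/{name}/points/scroll. */
function scrollBody(limit: number, offset?: PointId | null): ScrollBody {
  const body: ScrollBody = {
    limit: Math.max(1, Math.min(limit, 100)), // proxy-side cap from the parameter table
    with_payload: true,
    with_vector: false, // vectors are not needed for browsing
  };
  if (offset != null) body.offset = offset; // omitted for the first page
  return body;
}

/** Flatten Qdrant points into the proxy's { chunks, next_offset, total_in_page } shape. */
function toProxyResponse(q: QdrantScrollResult) {
  const chunks: ProxyChunk[] = q.result.points.map((pt) => {
    const payload: Record<string, unknown> = pt.payload ?? {};
    const text =
      [payload.text, payload.content, payload.chunk_text].find(
        (v): v is string => typeof v === "string" && v.length > 0,
      ) ?? "";
    return { ...payload, id: String(pt.id), text };
  });
  const next = q.result.next_page_offset;
  return {
    chunks,
    next_offset: next == null ? null : String(next),
    total_in_page: chunks.length,
  };
}

/** Forward one scroll request to Qdrant and return the flattened page. */
async function scrollQdrant(collection: string, limit: number, offset?: string | null) {
  const res = await fetch(
    `${QDRANT_URL}/collections/${encodeURIComponent(collection)}/points/scroll`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(scrollBody(limit, offset)),
    },
  );
  if (!res.ok) throw new Error(`Qdrant scroll failed: HTTP ${res.status}`);
  return toProxyResponse((await res.json()) as QdrantScrollResult);
}
```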

---

## Other Features on the RAG Page

### Originalquelle Links

Each regulation in the table has an "Originalquelle" (original source) link to the official document (EUR-Lex, gesetze-im-internet.de, etc.). The links are defined in `REGULATION_SOURCES` (88 entries).

### Low-Chunk Warning

Regulations with fewer than 10 chunks but an expected count of 10 or more are marked with an amber warning icon. This helps spot failed or incomplete ingestions.

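The warning rule reduces to a simple predicate. In the sketch below, `LOW_CHUNK_THRESHOLD`, `RegulationStats`, its fields and both helper functions are hypothetical.

```typescript
// Hypothetical sketch of the low-chunk rule: < 10 chunks while >= 10 are expected.
const LOW_CHUNK_THRESHOLD = 10;

interface RegulationStats {
  code: string;
  chunkCount: number; // chunks actually found in Qdrant
  expectedChunks: number; // expected value for this regulation
}

function isLowChunk(chunkCount: number, expectedChunks: number): boolean {
  return chunkCount < LOW_CHUNK_THRESHOLD && expectedChunks >= LOW_CHUNK_THRESHOLD;
}

/** Codes of regulations that should get the amber warning icon. */
function lowChunkRegulations(regs: RegulationStats[]): string[] {
  return regs.filter((r) => isLowChunk(r.chunkCount, r.expectedChunks)).map((r) => r.code);
}
```

Regulations that are expected to be small (fewer than 10 chunks) are never flagged, so short texts do not trigger false warnings.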

@@ -70,6 +70,7 @@ nav:
- OCR Vergleich: services/klausur-service/OCR-Compare.md
- RAG Admin: services/klausur-service/RAG-Admin-Spec.md
- Worksheet Editor: services/klausur-service/Worksheet-Editor-Architecture.md
- Chunk-Browser: services/klausur-service/Chunk-Browser.md
- Voice-Service:
- Uebersicht: services/voice-service/index.md
- Agent-Core: