fix: Restore all files lost during destructive rebase
A previous `git pull --rebase origin main` dropped 177 local commits,
losing 3400+ files across admin-v2, backend, studio-v2, website,
klausur-service, and many other services. The partial restore attempt
(660295e2) only recovered some files.
This commit restores all missing files from pre-rebase ref 98933f5e
while preserving post-rebase additions (night-scheduler, night-mode UI,
NightModeWidget dashboard integration).
Restored features include:
- AI Module Sidebar (FAB), OCR Labeling, OCR Compare
- GPU Dashboard, RAG Pipeline, Magic Help
- Klausur-Korrektur (8 files), Abitur-Archiv (5+ files)
- Companion, Zeugnisse-Crawler, Screen Flow
- Full backend, studio-v2, website, klausur-service
- All compliance SDKs, agent-core, voice-service
- CI/CD configs, documentation, scripts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
58
docs-src/Dockerfile
Normal file
@@ -0,0 +1,58 @@
# ============================================
# Breakpilot Documentation - MkDocs Build
# Multi-stage build for minimal image size
# ============================================

# Stage 1: Build MkDocs Site
FROM python:3.11-slim AS builder

WORKDIR /docs

# Install MkDocs with Material theme and plugins
RUN pip install --no-cache-dir \
    mkdocs==1.6.1 \
    mkdocs-material==9.5.47 \
    pymdown-extensions==10.12

# Copy configuration and source files
COPY mkdocs.yml /docs/
COPY docs-src/ /docs/docs-src/

# Build static site
RUN mkdocs build

# Stage 2: Serve with Nginx
FROM nginx:alpine

# Copy built site from builder stage
COPY --from=builder /docs/docs-site /usr/share/nginx/html

# Custom nginx config for SPA routing
RUN echo 'server { \
    listen 80; \
    server_name localhost; \
    root /usr/share/nginx/html; \
    index index.html; \
    \
    location / { \
        try_files $uri $uri/ /index.html; \
    } \
    \
    # Enable gzip compression \
    gzip on; \
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml; \
    gzip_min_length 1000; \
    \
    # Cache static assets \
    location ~* \.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2)$ { \
        expires 1y; \
        add_header Cache-Control "public, immutable"; \
    } \
}' > /etc/nginx/conf.d/default.conf

EXPOSE 80

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost/ || exit 1

CMD ["nginx", "-g", "daemon off;"]
263
docs-src/api/backend-api.md
Normal file
@@ -0,0 +1,263 @@
# BreakPilot Backend API Documentation

## Overview

Base URL: `http://localhost:8000/api`

All endpoints require authentication via a JWT in the Authorization header:
```
Authorization: Bearer <token>
```

---

## Worksheets API

Generates learning materials (multiple-choice tests, cloze texts, mind maps, quizzes).

### POST /worksheets/generate/multiple-choice

Generates multiple-choice questions from a source text.

**Request Body:**
```json
{
  "source_text": "The text from which questions should be generated...",
  "num_questions": 5,
  "difficulty": "medium",
  "topic": "Topic",
  "subject": "Deutsch"
}
```

**Response (200):**
```json
{
  "success": true,
  "content": {
    "type": "multiple_choice",
    "data": {
      "questions": [
        {
          "question": "What is...?",
          "options": ["A", "B", "C", "D"],
          "correct": 0,
          "explanation": "Explanation..."
        }
      ]
    }
  }
}
```
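
A client for this endpoint could be sketched as follows. This is only an illustration using the Python standard library: the helper names `build_mc_request` and `correct_options` are hypothetical, and reading `correct` as a zero-based index into `options` is an assumption based on the example response, not a documented guarantee.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api"  # base URL as documented in the overview


def build_mc_request(token: str, source_text: str, num_questions: int = 5) -> urllib.request.Request:
    """Build (but do not send) the POST request for the multiple-choice endpoint."""
    payload = {
        "source_text": source_text,
        "num_questions": num_questions,
        "difficulty": "medium",
        "topic": "Topic",
        "subject": "Deutsch",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/worksheets/generate/multiple-choice",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def correct_options(response_body: dict) -> list:
    """Return the text of the correct option for every question in a 200 response.

    Assumption: "correct" is a zero-based index into "options".
    """
    questions = response_body["content"]["data"]["questions"]
    return [q["options"][q["correct"]] for q in questions]
```

The request object can be passed to `urllib.request.urlopen` once a valid token is available; building and sending are kept separate so that the payload can be inspected or tested without a running backend.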

### POST /worksheets/generate/cloze

Generates cloze (fill-in-the-blank) texts.

### POST /worksheets/generate/mindmap

Generates a mind map as a Mermaid diagram.

### POST /worksheets/generate/quiz

Generates a mix of different question types.

---

## Corrections API

OCR-based exam correction with automatic grading.

### POST /corrections/

Creates a new correction session.

### POST /corrections/{id}/upload

Uploads a scanned exam and starts OCR in the background.

### GET /corrections/{id}

Retrieves the correction status.

**Status values:**
- `uploaded` - File uploaded
- `processing` - OCR running
- `ocr_complete` - OCR finished
- `analyzing` - Analysis running
- `analyzed` - Analysis complete
- `completed` - Done
- `error` - Error
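
Because OCR runs in the background, a client typically polls `GET /corrections/{id}` until the status it needs is reached. The sketch below models the status values listed above as an enum and shows one possible polling loop. The names `CorrectionStatus` and `wait_for_status` are hypothetical, `fetch_status` stands in for whatever function performs the actual HTTP call, and stopping early on `error` is an assumption.

```python
import time
from enum import Enum
from typing import Callable, Set


class CorrectionStatus(str, Enum):
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    OCR_COMPLETE = "ocr_complete"
    ANALYZING = "analyzing"
    ANALYZED = "analyzed"
    COMPLETED = "completed"
    ERROR = "error"


def wait_for_status(
    fetch_status: Callable[[], str],
    targets: Set[CorrectionStatus],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
) -> CorrectionStatus:
    """Poll until one of the target statuses (or "error") is reported."""
    for _ in range(max_polls):
        status = CorrectionStatus(fetch_status())  # raises ValueError on unknown values
        if status in targets or status is CorrectionStatus.ERROR:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("correction did not reach the requested status in time")
```

A caller would, for example, wait for `ocr_complete` before triggering the analysis step and treat a returned `error` as a failed upload.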

### POST /corrections/{id}/analyze

Analyzes the extracted text and grades the answers.

### GET /corrections/{id}/export-pdf

Exports the corrected exam as a PDF.

---

## Letters API

Parent letters with GFK integration and PDF export.

### POST /letters/

Creates a new parent letter.

**letter_type values:**
- `general` - General information
- `halbjahr` - Half-year information
- `fehlzeiten` - Absence notification
- `elternabend` - Invitation to a parents' evening
- `lob` - Positive feedback
- `custom` - User-defined

### POST /letters/improve

Improves text according to GFK principles.

---

## State Engine API

Companion mode with phase management and anticipation.

### GET /state/dashboard

Complete dashboard for companion mode.

### GET /state/suggestions

Retrieves suggestions for teachers.

### POST /state/milestone

Completes a milestone.

---

## Klausur-Korrektur API (Abitur)

Abitur exam correction with a 15-point system, a first/second examiner workflow, and AI-assisted grading.

### Exam modes

| Mode | Description |
|------|-------------|
| `landes_abitur` | NiBiS Niedersachsen - legally cleared tasks |
| `vorabitur` | Teacher-created exams with Rights-Gate |

### POST /klausur-korrektur/klausuren

Creates a new Abitur exam.

### POST /klausur-korrektur/students/{id}/evaluate

Starts AI grading.

**Response (200):**
```json
{
  "criteria_scores": {
    "rechtschreibung": {"score": 85, "weight": 0.15},
    "grammatik": {"score": 90, "weight": 0.15},
    "inhalt": {"score": 75, "weight": 0.40},
    "struktur": {"score": 80, "weight": 0.15},
    "stil": {"score": 85, "weight": 0.15}
  },
  "raw_points": 81,
  "grade_points": 12,
  "grade_label": "2+"
}
```

### 15-point grade key

| Points | Percent | Grade |
|--------|---------|-------|
| 15 | ≥95% | 1+ |
| 14 | ≥90% | 1 |
| 13 | ≥85% | 1- |
| 12 | ≥80% | 2+ |
| 11 | ≥75% | 2 |
| 10 | ≥70% | 2- |
| 9 | ≥65% | 3+ |
| 8 | ≥60% | 3 |
| 7 | ≥55% | 3- |
| 6 | ≥50% | 4+ |
| 5 | ≥45% | 4 |
| 4 | ≥40% | 4- |
| 3 | ≥33% | 5+ |
| 2 | ≥27% | 5 |
| 1 | ≥20% | 5- |
| 0 | <20% | 6 |

### Grading criteria

| Criterion | Weight | Description |
|-----------|--------|-------------|
| `rechtschreibung` | 15% | Spelling |
| `grammatik` | 15% | Grammar & syntax |
| `inhalt` | 40% | Content quality |
| `struktur` | 15% | Structure & organization |
| `stil` | 15% | Expression & style |
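
The two tables can be combined into a small grading helper. The sketch below assumes that the overall percentage is the weighted average of the per-criterion scores and that the grade key is applied to that percentage; the service may compute `raw_points` differently. The names `GRADE_KEY`, `weighted_percent` and `to_grade` are hypothetical.

```python
# Grade key from the table above: (minimum percent, points, grade label),
# ordered from best to worst so the first matching threshold wins.
GRADE_KEY = [
    (95, 15, "1+"), (90, 14, "1"), (85, 13, "1-"),
    (80, 12, "2+"), (75, 11, "2"), (70, 10, "2-"),
    (65, 9, "3+"), (60, 8, "3"), (55, 7, "3-"),
    (50, 6, "4+"), (45, 5, "4"), (40, 4, "4-"),
    (33, 3, "5+"), (27, 2, "5"), (20, 1, "5-"),
    (0, 0, "6"),
]


def weighted_percent(criteria_scores: dict) -> float:
    """Weighted average of criterion scores (assumed to be percentages)."""
    total = sum(c["score"] * c["weight"] for c in criteria_scores.values())
    return round(total, 2)  # rounding hides float noise from weights like 0.15


def to_grade(percent: float) -> tuple:
    """Map a percentage to (grade points, grade label) using the 15-point key."""
    for minimum, points, label in GRADE_KEY:
        if percent >= minimum:
            return points, label
    return 0, "6"
```

Feeding the `criteria_scores` object of an evaluate response into `weighted_percent` and the result into `to_grade` reproduces the lookup described by the grade key.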

---

## Security API (DevSecOps Dashboard)

API for the Security Dashboard with DevSecOps tool integration.

### GET /v1/security/tools

Returns the status of all DevSecOps tools.

### GET /v1/security/findings

Returns all security findings.

### GET /v1/security/sbom

Returns the SBOM (Software Bill of Materials).

### POST /v1/security/scan/{type}

Starts a security scan.

**Path parameter:**
- `type`: Scan type (secrets, sast, deps, containers, sbom, all)

---

## Error responses

### 400 Bad Request
```json
{
  "detail": "Description of the error"
}
```

### 401 Unauthorized
```json
{
  "detail": "Not authenticated"
}
```

### 404 Not Found
```json
{
  "detail": "Ressource nicht gefunden"
}
```

### 500 Internal Server Error
```json
{
  "detail": "Interner Serverfehler"
}
```
294
docs-src/architecture/auth-system.md
Normal file
@@ -0,0 +1,294 @@
# BreakPilot Authentication & Authorization

## Overview

BreakPilot uses a **hybrid approach** for authentication and authorization:

```
┌─────────────────────────────────────────────────────────────────────────┐
│                             AUTHENTICATION                              │
│                             "Who are you?"                              │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                         HybridAuthenticator                         │ │
│ │  ┌─────────────────────┐      ┌─────────────────────────────────┐   │ │
│ │  │      Keycloak       │      │            Local JWT            │   │ │
│ │  │    (Production)     │  OR  │          (Development)          │   │ │
│ │  │    RS256 + JWKS     │      │         HS256 + Secret          │   │ │
│ │  └─────────────────────┘      └─────────────────────────────────┘   │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              AUTHORIZATION                              │
│                      "What are you allowed to do?"                      │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                       rbac.py (custom-built)                        │ │
│ │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────┐  │ │
│ │  │ Role hierarchy  │  │ PolicySet       │  │ DEFAULT_PERMISSIONS │  │ │
│ │  │ 15+ roles       │  │ State-specific  │  │ Matrix              │  │ │
│ │  │ - Erstkorrektor │  │ - Niedersachsen │  │ Role→Resource→      │  │ │
│ │  │ - Klassenlehrer │  │ - Bayern        │  │ Action              │  │ │
│ │  │ - Schulleitung  │  │                 │  │                     │  │ │
│ │  └─────────────────┘  └─────────────────┘  └─────────────────────┘  │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```

## Why this approach?

### Alternative solutions (rejected)

| Tool | Problem for BreakPilot |
|------|------------------------|
| **Casbin** | Too generic for federal-state-specific policies |
| **Cerbos** | Overhead: an external PDP service is oversized for ~15 roles |
| **OpenFGA** | Zanzibar model is optimized for graph relationships, not hierarchies |
| **Keycloak RBAC** | Cannot express resource-specific assignments (user X is Erstkorrektor for package Y) |

### Advantages of the hybrid approach

1. **Keycloak for authentication:**
   - Proven IAM system
   - SSO, federation, MFA
   - Apache-2.0 license

2. **Custom rbac.py for authorization:**
   - Domain-specific logic (correction chain, report-card workflow)
   - Federal-state-specific rules
   - Time-limited assignments
   - Key sharing for encrypted exams

---

## Authentication (auth/keycloak_auth.py)

### Configuration

```bash
# Development: local JWT (default)
JWT_SECRET=your-secret-key

# Production: Keycloak
KEYCLOAK_SERVER_URL=https://keycloak.breakpilot.app
KEYCLOAK_REALM=breakpilot
KEYCLOAK_CLIENT_ID=breakpilot-backend
KEYCLOAK_CLIENT_SECRET=your-client-secret
```

### Token detection

The `HybridAuthenticator` automatically detects the token type:

```python
# Keycloak token (RS256)
{
    "iss": "https://keycloak.breakpilot.app/realms/breakpilot",
    "sub": "user-uuid",
    "realm_access": {"roles": ["teacher", "admin"]},
    # ...
}

# Local JWT (HS256)
{
    "iss": "breakpilot",
    "user_id": "user-uuid",
    "role": "admin",
    # ...
}
```
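
One way such a detection step could work is to inspect the unverified JWT header and payload and route the token to the matching verifier. The following is a minimal sketch under that assumption, not the actual `HybridAuthenticator` logic. The function name `classify_token` is hypothetical, and the check only classifies a token; it does not validate its signature.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the stripped padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def classify_token(token: str) -> str:
    """Return "keycloak" or "local" based on UNVERIFIED header and payload fields.

    The result only selects the verifier (RS256 + JWKS or HS256 + secret);
    the signature must still be checked afterwards.
    """
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))

    issuer = str(payload.get("iss", ""))
    if header.get("alg") == "RS256" and "/realms/" in issuer:
        return "keycloak"
    if header.get("alg") == "HS256":
        return "local"
    raise ValueError("unsupported token type")
```

Pinning the expected algorithm per token type before verification avoids accepting a token whose `alg` header was tampered with.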

### FastAPI integration

```python
from fastapi import Depends, FastAPI

from auth import get_current_user

app = FastAPI()


@app.get("/api/protected")
async def protected_endpoint(user: dict = Depends(get_current_user)):
    # user contains: user_id, email, role, realm_roles, tenant_id
    return {"user_id": user["user_id"]}
```

---

## Authorization (klausur-service/backend/rbac.py)

### Roles (15+)

| Role | Description | Scope |
|------|-------------|-------|
| `erstkorrektor` | First examiner | Exam |
| `zweitkorrektor` | Second examiner | Exam |
| `drittkorrektor` | Third examiner | Exam |
| `klassenlehrer` | Class teacher | Report card |
| `fachlehrer` | Subject teacher | Grades |
| `fachvorsitz` | Head of the subject conference | Subject department |
| `schulleitung` | School principal | School |
| `zeugnisbeauftragter` | Report-card coordination | Report card |
| `sekretariat` | Administration | School |
| `data_protection_officer` | Data protection officer (DSB) | GDPR (DSGVO) |
| ... | | |

### Resource types (25+)

```python
from enum import Enum


class ResourceType(str, Enum):
    EXAM_PACKAGE = "exam_package"  # exam package
    STUDENT_SUBMISSION = "student_submission"
    CORRECTION = "correction"
    ZEUGNIS = "zeugnis"
    FACHNOTE = "fachnote"
    KOPFNOTE = "kopfnote"
    BEMERKUNG = "bemerkung"
    ...
```

### Actions (17)

```python
class Action(str, Enum):
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"
    SIGN_OFF = "sign_off"        # approval
    BREAK_GLASS = "break_glass"  # emergency access
    SHARE_KEY = "share_key"      # share a key
    ...
```

### Permission check

```python
from klausur_service.backend.rbac import Action, PolicyEngine, ResourceType

engine = PolicyEngine()

# Check whether user X may correct exam Y
allowed = engine.check_permission(
    user_id="user-uuid",
    action=Action.UPDATE,
    resource_type=ResourceType.CORRECTION,
    resource_id="klausur-uuid"
)
```
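
To illustrate how a role → resource → action matrix and resource-specific assignments can work together, here is a deliberately small, self-contained sketch. It is not the real `rbac.py`: the matrix contents, the `ASSIGNMENTS` table and the standalone `check_permission` function are hypothetical, and the enums are trimmed to the members needed for the example.

```python
from enum import Enum


class Action(str, Enum):
    READ = "read"
    UPDATE = "update"
    SIGN_OFF = "sign_off"


class ResourceType(str, Enum):
    CORRECTION = "correction"
    ZEUGNIS = "zeugnis"


# Hypothetical excerpt of a permission matrix: role -> resource type -> allowed actions
DEFAULT_PERMISSIONS = {
    "erstkorrektor": {ResourceType.CORRECTION: {Action.READ, Action.UPDATE}},
    "zweitkorrektor": {ResourceType.CORRECTION: {Action.READ}},
    "klassenlehrer": {ResourceType.ZEUGNIS: {Action.READ, Action.SIGN_OFF}},
}

# Hypothetical resource-specific assignments: (user_id, resource_id) -> role
ASSIGNMENTS = {
    ("user-1", "klausur-1"): "erstkorrektor",
    ("user-2", "klausur-1"): "zweitkorrektor",
}


def check_permission(user_id: str, action: Action,
                     resource_type: ResourceType, resource_id: str) -> bool:
    """Allow an action only if the user's role for this resource grants it."""
    role = ASSIGNMENTS.get((user_id, resource_id))
    if role is None:
        return False  # no assignment for this resource means no access
    allowed = DEFAULT_PERMISSIONS.get(role, {}).get(resource_type, set())
    return action in allowed
```

Keying assignments on the pair of user and resource is what distinguishes this from plain realm roles: the same user can be first examiner for one exam package and have no access to another.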

---

## Federal-state-specific policies

```python
from dataclasses import dataclass


@dataclass
class PolicySet:
    bundesland: str
    abitur_type: str  # "landesabitur" | "zentralabitur"

    # Correction chain
    korrektoren_anzahl: int  # 2 or 3
    anonyme_erstkorrektur: bool

    # Visibility
    zk_visibility_mode: ZKVisibilityMode  # BLIND | SEMI | FULL
    eh_visibility_mode: EHVisibilityMode

    # Report card
    kopfnoten_enabled: bool
    ...
```

### Example: Niedersachsen

```python
NIEDERSACHSEN_POLICY = PolicySet(
    bundesland="niedersachsen",
    abitur_type="landesabitur",
    korrektoren_anzahl=2,
    anonyme_erstkorrektur=True,
    zk_visibility_mode=ZKVisibilityMode.BLIND,
    eh_visibility_mode=EHVisibilityMode.SUMMARY_ONLY,
    kopfnoten_enabled=True,
)
```

---

## Workflow examples

### Exam correction workflow

```
1. Teacher uploads exams
   └── Role: "lehrer" + Action.CREATE on EXAM_PACKAGE

2. First examiner corrects
   └── Role: "erstkorrektor" (resource-specific) + Action.UPDATE on CORRECTION

3. Second examiner reviews
   └── Role: "zweitkorrektor" + Action.READ on CORRECTION
   └── Policy: zk_visibility_mode determines visibility

4. Third examiner (in case of deviation)
   └── Role: "drittkorrektor" + Action.SIGN_OFF
```

### Report-card workflow

```
1. Subject teacher enters grades
   └── Role: "fachlehrer" + Action.CREATE on FACHNOTE

2. Class teacher reviews
   └── Role: "klassenlehrer" + Action.READ on ZEUGNIS
   └── Approves via Action.SIGN_OFF

3. Report-card coordinator gives final approval
   └── Role: "zeugnisbeauftragter" + Action.SIGN_OFF

4. School principal signs
   └── Role: "schulleitung" + Action.SIGN_OFF
```

---

## Files

| File | Description |
|------|-------------|
| `backend/auth/__init__.py` | Auth module exports |
| `backend/auth/keycloak_auth.py` | Hybrid authentication |
| `klausur-service/backend/rbac.py` | Authorization engine |
| `backend/rbac_api.py` | REST API for role management |

---

## Configuration

### Development (without Keycloak)

```bash
# .env
ENVIRONMENT=development
JWT_SECRET=dev-secret-32-chars-minimum-here
```

### Production (with Keycloak)

```bash
# .env
ENVIRONMENT=production
JWT_SECRET=<openssl rand -hex 32>
KEYCLOAK_SERVER_URL=https://keycloak.breakpilot.app
KEYCLOAK_REALM=breakpilot
KEYCLOAK_CLIENT_ID=breakpilot-backend
KEYCLOAK_CLIENT_SECRET=<from keycloak admin console>
```

---

## Security notes

1. **Never put secrets in code** - Always use environment variables
2. **JWT_SECRET in production** - At least 32 bytes, generated with `openssl rand -hex 32`
3. **Keycloak HTTPS** - Set KEYCLOAK_VERIFY_SSL=true in production
4. **Token expiration** - Keep Keycloak token lifetimes short (5-15 minutes)
5. **Audit trail** - All permission checks are logged
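
For environments without `openssl`, an equivalent secret can be generated with Python's standard library. The sketch below also shows a startup check that a configured secret meets the 32-byte minimum from the notes above. The function names are hypothetical and the check is an illustration, not part of the documented backend.

```python
import secrets

MIN_SECRET_BYTES = 32  # minimum length from the security notes above


def generate_jwt_secret(num_bytes: int = MIN_SECRET_BYTES) -> str:
    """Return num_bytes of randomness as hex, comparable to `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


def is_long_enough(secret: str) -> bool:
    """Check that the configured secret is at least MIN_SECRET_BYTES long."""
    return len(secret.encode("utf-8")) >= MIN_SECRET_BYTES
```

A length check like this catches placeholder values such as `your-secret-key` before the service starts, but it says nothing about how random the secret actually is.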
215
docs-src/architecture/devsecops.md
Normal file
@@ -0,0 +1,215 @@
# BreakPilot DevSecOps Architecture

## Overview

BreakPilot implements a comprehensive DevSecOps approach with security by design for developing and operating the education platform.

```
┌────────────────────────────────────────────────────────────────────────────┐
│                             DEVSECOPS PIPELINE                             │
│                                                                            │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
│  │ Pre-Commit  │───►│    CI/CD    │───►│    Build    │───►│   Deploy    │  │
│  │    Hooks    │    │  Pipeline   │    │   & Scan    │    │  & Monitor  │  │
│  └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘  │
│         │                  │                  │                  │         │
│         ▼                  ▼                  ▼                  ▼         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
│  │  Gitleaks   │    │   Semgrep   │    │    Trivy    │    │    Falco    │  │
│  │   Bandit    │    │  OWASP DC   │    │    Grype    │    │ (optional)  │  │
│  │   Secrets   │    │  SAST/SCA   │    │    SBOM     │    │   Runtime   │  │
│  └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
```

## Security Tools Stack

### 1. Secrets Detection

| Tool | Version | License | Usage |
|------|---------|---------|-------|
| **Gitleaks** | 8.18.x | MIT | Pre-commit hook, CI/CD |
| **detect-secrets** | 1.4.x | Apache-2.0 | Additional baseline check |

**Configuration:** `.gitleaks.toml`

```bash
# Run locally
gitleaks detect --source . -v

# Pre-commit (automatic)
gitleaks protect --staged -v
```

### 2. Static Application Security Testing (SAST)

| Tool | Version | License | Languages |
|------|---------|---------|-----------|
| **Semgrep** | 1.52.x | LGPL-2.1 | Python, Go, JavaScript, TypeScript |
| **Bandit** | 1.7.x | Apache-2.0 | Python (specialized) |

**Configuration:** `.semgrep.yml`

```bash
# Run Semgrep
semgrep scan --config auto --config .semgrep.yml

# Run Bandit
bandit -r backend/ -ll
```

### 3. Software Composition Analysis (SCA)

| Tool | Version | License | Usage |
|------|---------|---------|-------|
| **Trivy** | 0.48.x | Apache-2.0 | Filesystem, container, IaC |
| **Grype** | 0.74.x | Apache-2.0 | Vulnerability scanning |
| **OWASP Dependency-Check** | 9.x | Apache-2.0 | CVE/NVD matching |

**Configuration:** `.trivy.yaml`

```bash
# Filesystem scan
trivy fs . --severity HIGH,CRITICAL

# Container scan
trivy image breakpilot-pwa-backend:latest
```

### 4. SBOM (Software Bill of Materials)

| Tool | Version | License | Formats |
|------|---------|---------|---------|
| **Syft** | 0.100.x | Apache-2.0 | CycloneDX, SPDX |

```bash
# Generate SBOM
syft dir:. -o cyclonedx-json=sbom.json
syft dir:. -o spdx-json=sbom-spdx.json
```

### 5. Dynamic Application Security Testing (DAST)

| Tool | Version | License | Usage |
|------|---------|---------|-------|
| **OWASP ZAP** | 2.14.x | Apache-2.0 | Staging scans (nightly) |

```bash
# ZAP scan against staging
# The working directory is mounted at /zap/wrk so that the report lands on the host
docker run -v "$(pwd):/zap/wrk/:rw" -t owasp/zap2docker-stable zap-baseline.py \
    -t http://staging.breakpilot.app -r zap-report.html
```

## Pre-Commit Hooks

The pre-commit configuration (`.pre-commit-config.yaml`) automatically runs the following on every commit:

1. **Fast checks** (< 10 seconds):
   - Gitleaks (secrets)
   - Trailing whitespace
   - YAML/JSON validation

2. **Code quality** (< 30 seconds):
   - Black/Ruff (Python formatting)
   - Go fmt/vet
   - ESLint (JavaScript)

3. **Security checks** (< 60 seconds):
   - Bandit (Python security)
   - Semgrep (error severity)

### Installation

```bash
# Install pre-commit
pip install pre-commit

# Enable hooks
pre-commit install

# Run all checks manually
pre-commit run --all-files
```

## Severity Gates

| Phase | Severity | Action |
|-------|----------|--------|
| Pre-Commit | ERROR | Commit blocked |
| PR/CI | CRITICAL, HIGH | Pipeline blocked |
| Nightly Scan | MEDIUM+ | Report generated |
| Production Deploy | CRITICAL | Deploy blocked |
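
The gate table can be read as a lookup from pipeline phase to the severities that trigger an action. The sketch below encodes the three vulnerability-based gates that way. The names `Severity`, `GATES` and `gate_action` are hypothetical, and the pre-commit gate is left out because its `ERROR` level appears to belong to a different (rule-severity) scale.

```python
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Phase -> (severities that trigger the gate, resulting action), per the table above
GATES = {
    "pr_ci": ({Severity.CRITICAL, Severity.HIGH}, "pipeline blocked"),
    "nightly_scan": ({Severity.MEDIUM, Severity.HIGH, Severity.CRITICAL}, "report generated"),
    "production_deploy": ({Severity.CRITICAL}, "deploy blocked"),
}


def gate_action(phase: str, findings: List[Severity]) -> Optional[str]:
    """Return the gate's action if any finding triggers it, otherwise None."""
    triggering, action = GATES[phase]
    if any(finding in triggering for finding in findings):
        return action
    return None
```

Note that under this reading a HIGH finding blocks a pull request but not a production deploy, which is exactly what the table states.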

## Security Dashboard

The BreakPilot admin panel includes an integrated Security Dashboard under **Verwaltung > Security**.

### Features

**For developers:**
- Scan results at a glance
- Pre-commit hook status
- Quick-fix suggestions
- SBOM viewer with search

**For security experts:**
- Vulnerability severity distribution (Critical/High/Medium/Low)
- CVE tracking with fix availability
- Compliance status (OWASP Top 10, GDPR)
- Secrets detection history

**For ops:**
- Container image scan results
- Dependency update status
- Security scan scheduling
- Auto-refresh every 30 seconds

### API Endpoints

```
GET  /api/v1/security/tools          - Tool status
GET  /api/v1/security/findings       - All findings
GET  /api/v1/security/summary        - Severity summary
GET  /api/v1/security/sbom           - SBOM data
GET  /api/v1/security/history        - Scan history
GET  /api/v1/security/reports/{tool} - Tool-specific report
POST /api/v1/security/scan/{type}    - Start a scan
GET  /api/v1/security/health         - Health check
```

## Compliance

The DevSecOps pipeline supports the following compliance requirements:

- **DSGVO/GDPR**: Automatic detection of PII leaks
- **OWASP Top 10**: SAST/DAST scans against known vulnerabilities
- **Supply Chain Security**: SBOM generation for audit trails
- **CVE Tracking**: Automatic matching against NVD/CVE databases

## Tool Installation

### macOS (Homebrew)

```bash
# Security tools
brew install gitleaks
brew install trivy
brew install syft
brew install grype

# Python tools
pip install semgrep bandit pre-commit
```

### Linux (apt/snap)

```bash
# Gitleaks
sudo snap install gitleaks

# Trivy
sudo apt-get install trivy

# Python tools
pip install semgrep bandit pre-commit
```
197
docs-src/architecture/environments.md
Normal file
@@ -0,0 +1,197 @@
# Environment Architecture

## Overview

BreakPilot uses a three-environment strategy for safe development and deployment:

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Development   │────▶│     Staging     │────▶│   Production    │
│    (develop)    │     │    (staging)    │     │     (main)      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
 Daily development          Tested code          Production-ready
```

## Environments

### Development (Dev)

**Purpose:** Day-to-day development work

| Property | Value |
|----------|-------|
| Git Branch | `develop` |
| Compose File | `docker-compose.yml` + `docker-compose.override.yml` (auto) |
| Env File | `.env.dev` |
| Database | `breakpilot_dev` |
| Debug | Enabled |
| Hot-Reload | Enabled |

**Start:**
```bash
./scripts/start.sh dev
# or simply:
docker compose up -d
```

### Staging

**Purpose:** Tested, approved code before production

| Property | Value |
|----------|-------|
| Git Branch | `staging` |
| Compose File | `docker-compose.yml` + `docker-compose.staging.yml` |
| Env File | `.env.staging` |
| Database | `breakpilot_staging` (separate volume) |
| Debug | Disabled |
| Hot-Reload | Disabled |

**Start:**
```bash
./scripts/start.sh staging
# or:
docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d
```

### Production (Prod)

**Purpose:** Live system for end users (from launch)

| Property | Value |
|----------|-------|
| Git Branch | `main` |
| Compose File | `docker-compose.yml` + `docker-compose.prod.yml` |
| Env File | `.env.prod` (NOT in the repository!) |
| Database | `breakpilot_prod` (separate volume) |
| Debug | Disabled |
| Vault | Mandatory (no env fallbacks) |

## Database separation

Each environment uses separate Docker volumes for complete data isolation:

```
┌─────────────────────────────────────────────────────────────┐
│                     PostgreSQL Volumes                      │
├─────────────────────────────────────────────────────────────┤
│ breakpilot-dev_postgres_data   │ Development Database       │
│ breakpilot_staging_postgres    │ Staging Database           │
│ breakpilot_prod_postgres       │ Production Database        │
└─────────────────────────────────────────────────────────────┘
```

## Port mapping

So that several environments can run at the same time, they use different ports:

| Service | Dev Port | Staging Port | Prod Port |
|---------|----------|--------------|-----------|
| Backend | 8000 | 8001 | 8000 |
| PostgreSQL | 5432 | 5433 | - (internal) |
| MinIO | 9000/9001 | 9002/9003 | - (internal) |
| Qdrant | 6333/6334 | 6335/6336 | - (internal) |
| Mailpit | 8025/1025 | 8026/1026 | - (disabled) |

## Git branching strategy

```
main (Prod)     ← Release merges only, protected
    │
    ▼
staging         ← Tested code, review required
    │
    ▼
develop (Dev)   ← Daily work, default branch
    │
    ▼
feature/*       ← Feature branches (optional)
```

### Workflow

1. **Development:** Work on `develop`
2. **Code review:** Open a PR from the feature branch → `develop`
3. **Staging:** Promote `develop` → `staging` with tests
4. **Release:** Promote `staging` → `main` after approval

### Promotion commands

```bash
# develop → staging
./scripts/promote.sh dev-to-staging

# staging → main (Production)
./scripts/promote.sh staging-to-prod
```

## Secrets Management

### Development
- `.env.dev` contains development credentials
- Vault optional (dev token)
- Mailpit for email tests

### Staging
- `.env.staging` contains test credentials
- Vault recommended
- Mailpit for email safety

### Production
- `.env.prod` is NOT in the repository
- Vault is MANDATORY
- Real SMTP configuration

See also: [Secrets Management](./secrets-management.md)

## Docker Compose architecture

```
docker-compose.yml              ← Base configuration
    │
    ├── docker-compose.override.yml   ← Dev (loaded automatically)
    │
    ├── docker-compose.staging.yml    ← Staging (explicit)
    │
    └── docker-compose.prod.yml       ← Production (explicit)
```

### Automatic loading

Docker Compose automatically loads:
1. `docker-compose.yml`
2. `docker-compose.override.yml` (if present)

This is why `docker compose up` starts the dev environment automatically.
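
The file-selection rule can be captured in a small helper that assembles the Compose command for each environment. This is a sketch of what a wrapper might do, not the contents of `scripts/start.sh`; the names `COMPOSE_OVERLAYS` and `compose_command` are hypothetical.

```python
from typing import List

# Overlay file per environment; None means "rely on automatic override loading"
COMPOSE_OVERLAYS = {
    "dev": None,
    "staging": "docker-compose.staging.yml",
    "prod": "docker-compose.prod.yml",
}


def compose_command(environment: str) -> List[str]:
    """Build the `docker compose ... up -d` argument list for an environment."""
    overlay = COMPOSE_OVERLAYS[environment]  # KeyError for unknown environments
    command = ["docker", "compose"]
    if overlay is not None:
        # Explicit -f flags disable the automatic override.yml loading,
        # so the dev overlay is not mixed into staging or production.
        command += ["-f", "docker-compose.yml", "-f", overlay]
    return command + ["up", "-d"]
```

The resulting list can be handed to `subprocess.run`. For staging it matches the explicit command shown in the staging section.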

## Helper scripts

| Script | Description |
|--------|-------------|
| `scripts/env-switch.sh` | Switches between environments |
| `scripts/start.sh` | Starts the services for an environment |
| `scripts/stop.sh` | Stops the services |
| `scripts/promote.sh` | Promotes code between branches |
| `scripts/status.sh` | Shows the current status |

## Verification

After setup, check:

```bash
# Show status
./scripts/status.sh

# Check branches
git branch -v

# Check volumes
docker volume ls | grep breakpilot
```

## Related documentation

- [Secrets Management](./secrets-management.md) - Vault & secrets
- [DevSecOps](./devsecops.md) - CI/CD & security
- [System architecture](./system-architecture.md) - Overall architecture
215
docs-src/architecture/mail-rbac-architecture.md
Normal file
@@ -0,0 +1,215 @@
|
||||
# Mail RBAC Architecture with Employee Anonymization

**Version:** 1.0.0
**Status:** Architecture planning

---

## Executive Summary

This document describes a novel architecture that combines email, calendar, and video conferencing with role-based access control (RBAC). Its core concept allows **employee data to be fully anonymized** when someone leaves the organization, while the business communication history is preserved.

---

## 1. The Problem

### Traditional email systems
```
max.mustermann@firma.de → bound to a person
→ GDPR: data must be deleted
→ business history is lost
```

### BreakPilot solution: role-based email
```
klassenlehrer.5a@schule.breakpilot.app → bound to a role
→ the person can be anonymized
→ communication history is preserved
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ BreakPilot Groupware │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
||||
│ │ Webmail │ │ Kalender │ │ Jitsi │ │
|
||||
│ │ (SOGo) │ │ (SOGo) │ │ Meeting │ │
|
||||
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
|
||||
│ │ │ │ │
|
||||
│ └────────────────┼────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌───────────┴───────────┐ │
|
||||
│ │ RBAC-Mail-Bridge │ ◄─── Neue Komponente │
|
||||
│ │ (Python/Go) │ │
|
||||
│ └───────────┬───────────┘ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────┼─────────────────────┐ │
|
||||
│ │ │ │ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │
|
||||
│ │PostgreSQL│ │ Mail Server │ │ MinIO │ │
|
||||
│ │(RBAC DB) │ │ (Stalwart) │ │ (Backups) │ │
|
||||
│ └──────────┘ └──────────────┘ └────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Component Selection

### 3.1 Email server: Stalwart Mail Server

**Recommendation:** [Stalwart Mail Server](https://stalw.art/)

| Criterion | Assessment |
|-----------|------------|
| License | AGPL-3.0 (open source) |
| Language | Rust (performant, safe) |
| Features | IMAP, SMTP, JMAP, WebSocket |
| Calendar | CalDAV built in |
| Contacts | CardDAV built in |
| Spam/virus | Built in |
| API | REST API for administration |

### 3.2 Webmail client: SOGo or Roundcube

**Option A: SOGo** (recommended)
- License: GPL-2.0 / LGPL-2.1
- Calendar, contacts, and mail in one
- ActiveSync support
- Outlook-like interface

**Option B: Roundcube**
- License: GPL-3.0
- Webmail only
- Requires a separate calendar
|
||||
|
||||
---
|
||||
|
||||
## 4. Anonymization Workflow

```
Employee resigns
            │
            ▼
┌───────────────────────────┐
│ 1. Functional mailboxes   │
│    → reassign or          │
│    → deactivate           │
└───────────┬───────────────┘
            │
            ▼
┌───────────────────────────┐
│ 2. Personal email account │
│    → anonymize:           │
│      max.mustermann@...   │
│    → mitarbeiter_a7x2@... │
└───────────────────────────┘
            │
            ▼
┌───────────────────────────┐
│ 3. Users table            │
│    → pseudonymize:        │
│    name: "Max Mustermann" │
│    → "Ehem. Mitarbeiter"  │
└───────────────────────────┘
            │
            ▼
┌───────────────────────────┐
│ 4. Mailbox assignments    │
│    → kept for audit       │
│    → user reference       │
│      points to anonymized │
│      data                 │
└───────────────────────────┘
            │
            ▼
┌───────────────────────────┐
│ 5. Email archive          │
│    → anonymize headers    │
│    → optionally delete    │
│      content              │
└───────────────────────────┘
```
|
||||
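Steps 2 and 3 of the workflow can be illustrated with a short sketch. It is not the planned implementation: the `User` fields, the helper name `anonymize_user`, and the use of a random hex suffix for the alias are assumptions made for illustration only.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    """Minimal stand-in for a row in the users table (fields are assumed)."""
    user_id: int
    name: str
    email: str


def anonymize_user(user: User) -> User:
    """Replace personal fields; keep user_id so mailbox assignments stay linked."""
    domain = user.email.split("@", 1)[1]
    alias = uuid.uuid4().hex[:4]  # short random suffix, e.g. "a7f2"
    return replace(
        user,
        name="Ehem. Mitarbeiter",
        email=f"mitarbeiter_{alias}@{domain}",
    )


if __name__ == "__main__":
    before = User(42, "Max Mustermann", "max.mustermann@firma.de")
    print(anonymize_user(before))
```

Because only the personal fields change and `user_id` is preserved, audit records that reference the user keep working but resolve to anonymized data, which is the property step 4 relies on.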
|
||||
---
|
||||
|
||||
## 5. Unified Inbox Implementation

### Implemented components

The Unified Inbox was implemented as part of the klausur-service:

| Component | Path | Description |
|-----------|------|-------------|
| **Models** | `klausur-service/backend/mail/models.py` | Pydantic models for accounts, emails, tasks |
| **Database** | `klausur-service/backend/mail/mail_db.py` | PostgreSQL operations with asyncpg |
| **Credentials** | `klausur-service/backend/mail/credentials.py` | Vault integration for IMAP/SMTP passwords |
| **Aggregator** | `klausur-service/backend/mail/aggregator.py` | Multi-account IMAP sync |
| **AI Service** | `klausur-service/backend/mail/ai_service.py` | AI analysis (senders, deadlines, categories) |
| **Task Service** | `klausur-service/backend/mail/task_service.py` | Work backlog management |
| **API** | `klausur-service/backend/mail/api.py` | FastAPI router with 30+ endpoints |

### API endpoints (port 8086)

```
# Account management
POST   /api/v1/mail/accounts            - Add a new account
GET    /api/v1/mail/accounts            - List all accounts
DELETE /api/v1/mail/accounts/{id}       - Remove an account
POST   /api/v1/mail/accounts/{id}/test  - Test the connection

# Unified Inbox
GET    /api/v1/mail/inbox               - Aggregated inbox
GET    /api/v1/mail/inbox/{id}          - Single email
POST   /api/v1/mail/send                - Send an email

# AI features
POST   /api/v1/mail/analyze/{id}        - Analyze an email
GET    /api/v1/mail/suggestions/{id}    - Reply suggestions

# Work backlog
GET    /api/v1/mail/tasks               - All tasks
POST   /api/v1/mail/tasks               - Create a task manually
PATCH  /api/v1/mail/tasks/{id}          - Update a task
GET    /api/v1/mail/tasks/dashboard     - Dashboard statistics
```

### Lower Saxony-specific sender recognition

```python
KNOWN_AUTHORITIES_NI = {
    "@mk.niedersachsen.de": "Kultusministerium Niedersachsen",
    "@rlsb.de": "Regionales Landesamt für Schule und Bildung",
    "@landesschulbehoerde-nds.de": "Landesschulbehörde",
    "@nibis.de": "NiBiS",
}
```
|
||||
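How such a mapping might be applied to an incoming message can be sketched as follows. The lookup function `recognize_authority` is hypothetical and not taken from `ai_service.py`; it assumes a plain suffix match on the sender address, so subdomains of the listed domains would not match.

```python
from email.utils import parseaddr
from typing import Optional

KNOWN_AUTHORITIES_NI = {
    "@mk.niedersachsen.de": "Kultusministerium Niedersachsen",
    "@rlsb.de": "Regionales Landesamt für Schule und Bildung",
    "@landesschulbehoerde-nds.de": "Landesschulbehörde",
    "@nibis.de": "NiBiS",
}


def recognize_authority(sender: str) -> Optional[str]:
    """Return the authority name for a From header value, or None."""
    # parseaddr handles both "Name <addr>" and bare addresses.
    _, address = parseaddr(sender)
    address = address.lower()
    for suffix, authority in KNOWN_AUTHORITIES_NI.items():
        if address.endswith(suffix):
            return authority
    return None


if __name__ == "__main__":
    print(recognize_authority("Poststelle <poststelle@mk.niedersachsen.de>"))
```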
|
||||
---
|
||||
|
||||
## 6. License Overview

| Component | License | Commercial use | Disclosure obligation |
|-----------|---------|----------------|-----------------------|
| Stalwart Mail | AGPL-3.0 | Yes | Only for code changes |
| SOGo | GPL-2.0/LGPL | Yes | Only for code changes |
| Roundcube | GPL-3.0 | Yes | Only for code changes |
| RBAC-Mail-Bridge | Own | N/A | May remain proprietary |
| BreakPilot Backend | Own | N/A | Proprietary |

---

## 7. References

- [Stalwart Mail Server](https://stalw.art/)
- [SOGo Groupware](https://www.sogo.nu/)
- [Roundcube Webmail](https://roundcube.net/)
- [CalDAV Standard](https://tools.ietf.org/html/rfc4791)
- [GDPR Art. 17 - Right to erasure](https://dsgvo-gesetz.de/art-17-dsgvo/)
|
||||
286
docs-src/architecture/multi-agent.md
Normal file
@@ -0,0 +1,286 @@
|
||||
# Multi-Agent Architecture - Developer Documentation

**Status:** Implemented
**Module:** `/agent-core/`

---

## 1. Overview

The multi-agent architecture extends Breakpilot with a distributed agent system based on Mission Control concepts.

### Core components

| Component | Path | Description |
|-----------|------|-------------|
| Session Management | `/agent-core/sessions/` | Lifecycle & recovery |
| Shared Brain | `/agent-core/brain/` | Long-term memory |
| Orchestrator | `/agent-core/orchestrator/` | Coordination |
| SOUL Files | `/agent-core/soul/` | Agent personalities |

---

## 2. Agent Types

| Agent | Task | SOUL file |
|-------|------|-----------|
| **TutorAgent** | Learning support, answering questions | `tutor-agent.soul.md` |
| **GraderAgent** | Exam correction, grading | `grader-agent.soul.md` |
| **QualityJudge** | BQAS quality checks | `quality-judge.soul.md` |
| **AlertAgent** | Monitoring, notifications | `alert-agent.soul.md` |
| **Orchestrator** | Task coordination | `orchestrator.soul.md` |
|
||||
|
||||
---
|
||||
|
||||
## 3. Key Files
|
||||
|
||||
### Session Management
|
||||
```
|
||||
agent-core/sessions/
|
||||
├── session_manager.py # AgentSession, SessionManager, SessionState
|
||||
├── heartbeat.py # HeartbeatMonitor, HeartbeatClient
|
||||
└── checkpoint.py # CheckpointManager
|
||||
```
|
||||
|
||||
### Shared Brain
|
||||
```
|
||||
agent-core/brain/
|
||||
├── memory_store.py # MemoryStore, Memory (with TTL)
|
||||
├── context_manager.py # ConversationContext, ContextManager
|
||||
└── knowledge_graph.py # KnowledgeGraph, Entity, Relationship
|
||||
```
|
||||
|
||||
### Orchestrator
|
||||
```
|
||||
agent-core/orchestrator/
|
||||
├── message_bus.py # MessageBus, AgentMessage, MessagePriority
|
||||
├── supervisor.py # AgentSupervisor, AgentInfo, AgentStatus
|
||||
└── task_router.py # TaskRouter, RoutingRule, RoutingResult
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Database Schema

The migration is located at:
`/backend/migrations/add_agent_core_tables.sql`

### Tables

1. **agent_sessions** - Session data with checkpoints
2. **agent_memory** - Long-term memory with TTL
3. **agent_messages** - Audit trail for inter-agent communication

### Helper functions

```sql
-- Clean up expired memories
SELECT cleanup_expired_agent_memory();

-- Clean up inactive sessions
SELECT cleanup_stale_agent_sessions(48); -- 48 hours
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Voice-Service Integration

The `EnhancedTaskOrchestrator` extends the existing `TaskOrchestrator`:

```python
# voice-service/services/enhanced_task_orchestrator.py

from agent_core.sessions import SessionManager
from agent_core.orchestrator import MessageBus

# TaskOrchestrator is the existing base class; its import is omitted in this outline.
class EnhancedTaskOrchestrator(TaskOrchestrator):
    """Outline only; method bodies are omitted here.

    - Uses session checkpoints for recovery
    - Routes complex tasks to specialized agents
    - Runs quality checks via BQAS
    """
```

**Important:** The enhanced orchestrator is backward compatible and can be used in parallel with the original.

---

## 6. BQAS Integration

The `QualityJudgeAgent` integrates BQAS with the multi-agent system:

```python
# voice-service/bqas/quality_judge_agent.py

from bqas.judge import LLMJudge
from agent_core.orchestrator import MessageBus

class QualityJudgeAgent:
    """Outline only; method bodies are omitted here.

    - Evaluates responses in real time
    - Uses memory for consistent ratings
    - Receives evaluation requests via the message bus
    """
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Code Examples

The snippets below assume existing `redis` and `pool` client objects and run inside an async function.

### Creating a session

```python
from agent_core.sessions import SessionManager

manager = SessionManager(redis_client=redis, db_pool=pool)
session = await manager.create_session(
    agent_type="tutor-agent",
    user_id="user-123"
)
```

### Storing a memory

```python
from agent_core.brain import MemoryStore

store = MemoryStore(redis_client=redis, db_pool=pool)
await store.remember(
    key="student:123:progress",
    value={"level": 5, "score": 85},
    agent_id="tutor-agent",
    ttl_days=30
)
```

### Sending a message

```python
from agent_core.orchestrator import MessageBus, AgentMessage

bus = MessageBus(redis_client=redis)
await bus.publish(AgentMessage(
    sender="orchestrator",
    receiver="grader-agent",
    message_type="grade_request",
    payload={"exam_id": "exam-1"}
))
```
|
||||
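The TTL behaviour behind `remember(..., ttl_days=30)` can be illustrated with a purely in-memory stand-in. This is a sketch of the expiry idea only: the class `InMemoryStore` and its `recall` method are hypothetical, and the real `MemoryStore` is backed by Redis/Valkey and PostgreSQL rather than a dictionary.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SECONDS_PER_DAY = 86400


class InMemoryStore:
    """Dictionary-backed store whose entries expire after ttl_days."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, Tuple[Any, float]] = {}

    def remember(self, key: str, value: Any, ttl_days: float) -> None:
        expires_at = self._clock() + ttl_days * SECONDS_PER_DAY
        self._items[key] = (value, expires_at)

    def recall(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry lazily on read.
            del self._items[key]
            return None
        return value


if __name__ == "__main__":
    store = InMemoryStore()
    store.remember("student:123:progress", {"level": 5, "score": 85}, ttl_days=30)
    print(store.recall("student:123:progress"))
```

Lazy expiry on read is only one option; a periodic cleanup job, as hinted at by `cleanup_expired_agent_memory()`, serves the same purpose on the database side.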
|
||||
---
|
||||
|
||||
## 8. Running the Tests

```bash
# All agent-core tests
cd agent-core && pytest -v

# With coverage report
pytest --cov=. --cov-report=html

# Individual modules
pytest tests/test_session_manager.py -v
pytest tests/test_message_bus.py -v
```

---

## 9. Deployment Steps

### 1. Run the migration

```bash
psql -h localhost -U breakpilot -d breakpilot \
  -f backend/migrations/add_agent_core_tables.sql
```

### 2. Update the voice-service

```bash
# Sync to the server
rsync -avz --exclude 'node_modules' --exclude '.git' \
  /path/to/breakpilot-pwa/ server:/path/to/breakpilot-pwa/

# Rebuild the container
docker compose build --no-cache voice-service

# Start
docker compose up -d voice-service
```

### 3. Verify

```bash
# Check the session table
psql -c "SELECT COUNT(*) FROM agent_sessions;"

# Check the memory table
psql -c "SELECT COUNT(*) FROM agent_memory;"
```
|
||||
|
||||
---
|
||||
|
||||
## 10. Monitoring

### Metrics

| Metric | Description |
|--------|-------------|
| `agent_session_count` | Number of active sessions |
| `agent_heartbeat_delay_ms` | Time since the last heartbeat |
| `agent_message_latency_ms` | Message latency |
| `agent_memory_count` | Stored memories |
| `agent_routing_success_rate` | Share of successful routings |

### Health check endpoints

```
GET /api/v1/agents/health        # Supervisor status
GET /api/v1/agents/sessions      # Active sessions
GET /api/v1/agents/memory/stats  # Memory statistics
```

---

## 11. Troubleshooting

### Problem: Session not found

1. Check that Valkey is running: `redis-cli ping` (Valkey speaks the Redis protocol, so `redis-cli` works)
2. Check the session timeout (default 24h)
3. Check the heartbeat status

### Problem: Message bus timeout

1. Check the Redis Pub/Sub status
2. Is the target agent registered?
3. Increase the timeout (default 30s)

### Problem: Memory not found

1. Is the namespace correct?
2. Has the TTL expired?
3. Has the cleanup job run?

---

## 12. Extensions

### Adding a new agent

1. Create a SOUL file in `/agent-core/soul/`
2. Add a routing rule in `task_router.py`
3. Register the handler with the supervisor
4. Write tests

### Adding a new memory type

1. Define the key schema (e.g. `student:*:progress`)
2. Set the TTL
3. Document the access pattern
|
||||
|
||||
---
|
||||
|
||||
## 13. References

- **Agent-Core README:** `/agent-core/README.md`
- **Migration:** `/backend/migrations/add_agent_core_tables.sql`
- **Voice-Service integration:** `/voice-service/services/enhanced_task_orchestrator.py`
- **BQAS integration:** `/voice-service/bqas/quality_judge_agent.py`
- **Tests:** `/agent-core/tests/`
|
||||
251
docs-src/architecture/secrets-management.md
Normal file
@@ -0,0 +1,251 @@
|
||||
# BreakPilot Secrets Management

## Overview

BreakPilot uses **HashiCorp Vault** as its central secrets management system.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ SECRETS MANAGEMENT │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ HashiCorp Vault │ │
|
||||
│ │ Port 8200 │ │
|
||||
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
|
||||
│ │ │ KV v2 Engine │ │ AppRole Auth │ │ Audit Logging │ │ │
|
||||
│ │ │ secret/ │ │ Token Auth │ │ Verschluesselung │ │ │
|
||||
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────┼─────────────────────┐ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
|
||||
│ │ Python Backend │ │ Go Services │ │ Frontend │ │
|
||||
│ │ (hvac client) │ │ (vault-client) │ │ (via Backend) │ │
|
||||
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Why Vault?

| Alternative | Assessment |
|-------------|------------|
| Environment variables | No audit logs, no encryption, no rotation |
| Docker Secrets | Docker Swarm only, no central management |
| AWS Secrets Manager | Cloud lock-in, cost |
| Kubernetes Secrets | Not encrypted by default, Kubernetes only |
| **HashiCorp Vault** | Source-available (BSL 1.1), self-hosted, enterprise features |

## Architecture

### Secret hierarchy
|
||||
|
||||
```
|
||||
secret/breakpilot/
|
||||
├── api_keys/
|
||||
│ ├── anthropic # Anthropic Claude API Key
|
||||
│ ├── vast # vast.ai GPU API Key
|
||||
│ ├── stripe # Stripe Payment Key
|
||||
│ ├── stripe_webhook
|
||||
│ └── tavily # Tavily Search API Key
|
||||
├── database/
|
||||
│ ├── postgres # username, password, url
|
||||
│ └── synapse # Matrix Synapse DB
|
||||
├── auth/
|
||||
│ ├── jwt # secret, refresh_secret
|
||||
│ └── keycloak # client_secret
|
||||
├── communication/
|
||||
│ ├── matrix # access_token, db_password
|
||||
│ └── jitsi # app_secret, jicofo, jvb passwords
|
||||
├── storage/
|
||||
│ └── minio # access_key, secret_key
|
||||
└── infra/
|
||||
└── vast # api_key, instance_id, control_key
|
||||
```
|
||||
|
||||
### Python integration

```python
# `secrets` here is the project package under backend/secrets/,
# not Python's standard-library module of the same name.
from secrets import get_secret

# Fetch a single secret
api_key = get_secret("ANTHROPIC_API_KEY")

# With a default value
debug = get_secret("DEBUG", default="false")

# As a required secret
db_url = get_secret("DATABASE_URL", required=True)
```

### Fallback order

```
1. HashiCorp Vault (if VAULT_ADDR is set)
   ↓ if unavailable
2. Environment variables
   ↓ if not set
3. Docker Secrets (/run/secrets/)
   ↓ if not present
4. Default value (if provided)
   ↓ otherwise
5. SecretNotFoundError (if required=True)
```
|
||||
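The fallback chain can be sketched as a single resolver function. This is an illustration under assumptions, not the code in `backend/secrets/`: the name `resolve_secret`, the injected `vault_lookup` callable, and the lower-cased file name under `/run/secrets/` are all hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, Optional


class SecretNotFoundError(KeyError):
    """Raised when a required secret cannot be resolved."""


def resolve_secret(
    name: str,
    default: Optional[str] = None,
    required: bool = False,
    vault_lookup: Optional[Callable[[str], Optional[str]]] = None,
    secrets_dir: Path = Path("/run/secrets"),
) -> Optional[str]:
    # 1. Vault, only when VAULT_ADDR is set and a lookup was supplied.
    if vault_lookup is not None and os.environ.get("VAULT_ADDR"):
        try:
            value = vault_lookup(name)
        except Exception:
            value = None  # Vault unavailable: fall through to the next source
        if value is not None:
            return value
    # 2. Environment variable.
    if name in os.environ:
        return os.environ[name]
    # 3. Docker secret file (file naming is an assumption).
    secret_file = secrets_dir / name.lower()
    if secret_file.is_file():
        return secret_file.read_text().strip()
    # 4. Default value, if one was given.
    if default is not None:
        return default
    # 5. Fail loudly only for required secrets.
    if required:
        raise SecretNotFoundError(name)
    return None
```

Passing the Vault lookup in as a callable keeps the order of sources testable without a running Vault instance.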
|
||||
## Setup

### Development (dev mode)

```bash
# Start Vault (dev mode - NOT for production!)
docker-compose -f docker-compose.vault.yml up -d vault

# Wait until healthy
docker-compose -f docker-compose.vault.yml up vault-init

# Set the environment
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN=breakpilot-dev-token
```

### Setting secrets

```bash
# Anthropic API key
vault kv put secret/breakpilot/api_keys/anthropic value='sk-ant-api03-...'

# vast.ai credentials
vault kv put secret/breakpilot/infra/vast \
  api_key='xxx' \
  instance_id='123' \
  control_key='yyy'

# Database
vault kv put secret/breakpilot/database/postgres \
  username='breakpilot' \
  password='supersecret' \
  url='postgres://breakpilot:supersecret@localhost:5432/breakpilot_db'
```

### Reading secrets

```bash
# List all secrets
vault kv list secret/breakpilot/

# Show a secret
vault kv get secret/breakpilot/api_keys/anthropic

# Value only
vault kv get -field=value secret/breakpilot/api_keys/anthropic
```
|
||||
|
||||
## Production

### AppRole authentication

In production, services use AppRole instead of token auth:

```bash
# 1. Enable AppRole (once)
vault auth enable approle

# 2. Create the policy
vault policy write breakpilot-backend - <<EOF
path "secret/data/breakpilot/*" {
  capabilities = ["read", "list"]
}
# KV v2 serves list operations from the metadata path
path "secret/metadata/breakpilot/*" {
  capabilities = ["list"]
}
EOF

# 3. Create the role
vault write auth/approle/role/breakpilot-backend \
  token_policies="breakpilot-backend" \
  token_ttl=1h \
  token_max_ttl=4h

# 4. Fetch the role ID (static)
vault read -field=role_id auth/approle/role/breakpilot-backend/role-id

# 5. Generate a secret ID (a new one on every deploy)
vault write -f auth/approle/role/breakpilot-backend/secret-id
```

### Environment for services

```bash
# Docker Compose / Kubernetes
VAULT_ADDR=https://vault.breakpilot.app:8200
VAULT_AUTH_METHOD=approle
VAULT_ROLE_ID=<role-id>
VAULT_SECRET_ID=<secret-id>
VAULT_SECRETS_PATH=breakpilot
```
|
||||
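On the service side, these variables could be consumed with the `hvac` client roughly as follows. This is a minimal sketch rather than the contents of `vault_client.py`: the helper names `login_with_approle` and `read_field` are hypothetical, and it assumes the KV v2 engine is mounted at `secret/`.

```python
import os
from typing import Any


def login_with_approle() -> Any:
    """Return an hvac client authenticated via AppRole (requires `pip install hvac`)."""
    import hvac  # imported lazily so read_field works without hvac installed

    client = hvac.Client(url=os.environ["VAULT_ADDR"])
    client.auth.approle.login(
        role_id=os.environ["VAULT_ROLE_ID"],
        secret_id=os.environ["VAULT_SECRET_ID"],
    )
    return client


def read_field(client: Any, path: str, field: str = "value",
               mount_point: str = "secret") -> str:
    """Read one field of a KV v2 secret, e.g. path='breakpilot/api_keys/anthropic'."""
    response = client.secrets.kv.v2.read_secret_version(
        path=path, mount_point=mount_point
    )
    # KV v2 nests the payload under data -> data.
    return response["data"]["data"][field]
```

A call such as `read_field(login_with_approle(), "breakpilot/api_keys/anthropic")` corresponds to `vault kv get -field=value secret/breakpilot/api_keys/anthropic` on the CLI.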
|
||||
## Security Checklist

### Must be met

- [ ] No real secrets in `.env` files
- [ ] `.env` listed in `.gitignore`
- [ ] Vault sealed when not in use
- [ ] TLS for Vault in production
- [ ] AppRole instead of token auth in production
- [ ] Audit logging enabled
- [ ] Minimal policies (least privilege)

### Should be met

- [ ] Automatic secret rotation
- [ ] Separate Vault instance for production
- [ ] HSM-based auto-unseal
- [ ] Disaster recovery plan

## Files

| File | Description |
|------|-------------|
| `backend/secrets/__init__.py` | Secrets module exports |
| `backend/secrets/vault_client.py` | Vault client implementation |
| `docker-compose.vault.yml` | Vault Docker configuration |
| `vault/init-secrets.sh` | Initialization of development secrets |
| `vault/policies/` | Vault policy files |
|
||||
|
||||
## Troubleshooting

### Vault unreachable

```bash
# Check the status
vault status

# If sealed
vault operator unseal <unseal-key>
```

### Secret not found

```bash
# Check the path
vault kv list secret/breakpilot/
```

```python
# Clear the cache (Python, inside the running service process)
from secrets import get_secrets_manager
get_secrets_manager().clear_cache()
```

### Token expired

```bash
# Fetch a new token (AppRole)
vault write auth/approle/login \
  role_id="$VAULT_ROLE_ID" \
  secret_id="$VAULT_SECRET_ID"
```

---

## References

- [HashiCorp Vault Documentation](https://developer.hashicorp.com/vault/docs)
- [hvac Python Client](https://hvac.readthedocs.io/)
- [Vault Best Practices](https://developer.hashicorp.com/vault/tutorials/recommended-patterns)
|
||||
311
docs-src/architecture/system-architecture.md
Normal file
@@ -0,0 +1,311 @@
|
||||
# BreakPilot PWA - System Architecture

## Overview

BreakPilot is a modular education platform for teachers with the following main components:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Browser │
|
||||
│ ┌───────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Frontend (Studio UI) │ │
|
||||
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
|
||||
│ │ │Dashboard │ │Worksheets│ │Correction│ │Letters/Companion │ │ │
|
||||
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ │
|
||||
│ └───────────────────────────────────────────────────────────────┘ │
|
||||
└───────────────────────────┬─────────────────────────────────────────┘
|
||||
│ HTTP/REST
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Python Backend (FastAPI) │
|
||||
│ Port 8000 │
|
||||
│ ┌────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ API Layer │ │
|
||||
│ │ /api/worksheets /api/corrections /api/letters /api/state │ │
|
||||
│ │ /api/school /api/certificates /api/messenger /api/jitsi │ │
|
||||
│ └────────────────────────────────────────────────────────────────┘ │
|
||||
│ ┌────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Service Layer │ │
|
||||
│ │ FileProcessor │ PDFService │ ContentGenerators │ StateEngine │ │
|
||||
│ └────────────────────────────────────────────────────────────────┘ │
|
||||
└───────────────────────────┬─────────────────────────────────────────┘
|
||||
│
|
||||
┌─────────────┼─────────────┐
|
||||
▼ ▼ ▼
|
||||
┌─────────────────┐ ┌───────────────┐ ┌──────────────┐ ┌──────────────┐
|
||||
│ Go Consent │ │ PostgreSQL │ │ LLM Gateway │ │ HashiCorp │
|
||||
│ Service │ │ Database │ │ (optional) │ │ Vault │
|
||||
│ Port 8081 │ │ Port 5432 │ │ │ │ Port 8200 │
|
||||
└─────────────────┘ └───────────────┘ └──────────────┘ └──────────────┘
|
||||
```
|
||||
|
||||
## Components

### 1. Admin Frontend (Next.js Website)

The **Admin Frontend** is a full Next.js 15 application for developers and administrators:

**Technology:** Next.js 15, React 18, TypeScript, Tailwind CSS

**Container:** `breakpilot-pwa-website` on **port 3000**

**Directory:** `/website`

| Module | Route | Description |
|--------|-------|-------------|
| Dashboard | `/admin` | Overview & statistics |
| GPU Infrastructure | `/admin/gpu` | vast.ai GPU management |
| Consent Management | `/admin/consent` | Legal documents & versions |
| Data Subject Requests | `/admin/dsr` | GDPR Art. 15-21 requests |
| DSMS | `/admin/dsms` | Data protection management system |
| Education Search | `/admin/edu-search` | Educational sources & crawler |
| Staff Search | `/admin/staff-search` | University staff & publications |
| Uni-Crawler | `/admin/uni-crawler` | University crawling orchestrator |
| LLM Comparison | `/admin/llm-compare` | AI provider comparison |
| PCA Platform | `/admin/pca-platform` | Bot detection & monetization |
| Production Backlog | `/admin/backlog` | Go-live checklist |
| Developer Docs | `/admin/docs` | API & architecture documentation |
| Communication | `/admin/communication` | Matrix & Jitsi monitoring |
| **Security** | `/admin/security` | DevSecOps dashboard, scans, findings |
| **SBOM** | `/admin/sbom` | Software Bill of Materials |
|
||||
|
||||
### 2. Teacher Frontend (Studio UI)

The **Teacher Frontend** is a single-page-application-like system for teachers, organized into Python modules:

| Module | File | Description |
|--------|------|-------------|
| Base | `frontend/modules/base.py` | TopBar, sidebar, theme, login |
| Dashboard | `frontend/modules/dashboard.py` | Overview page |
| Worksheets | `frontend/modules/worksheets.py` | Learning unit generator |
| Correction | `frontend/modules/correction.py` | OCR exam correction |
| Letters | `frontend/modules/letters.py` | Parent communication |
| Companion | `frontend/modules/companion.py` | Companion mode with state engine |
| School | `frontend/modules/school.py` | School administration |
| Gradebook | `frontend/modules/gradebook.py` | Gradebook |
| ContentCreator | `frontend/modules/content_creator.py` | H5P content creator |
| ContentFeed | `frontend/modules/content_feed.py` | Content discovery |
| Messenger | `frontend/modules/messenger.py` | Matrix messenger |
| Jitsi | `frontend/modules/jitsi.py` | Video conferencing |
| **KlausurKorrektur** | `frontend/modules/klausur_korrektur.py` | **Abitur exam correction (15-point system)** |
| **AbiturDocsAdmin** | `frontend/modules/abitur_docs_admin.py` | **Admin for Abitur documents (NiBiS)** |

Each module exports:
- `get_css()` - CSS styles
- `get_html()` - HTML template
- `get_js()` - JavaScript logic
|
||||
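The three-function module contract can be illustrated with a toy page assembler. How the real `base.py` combines modules is not documented here, so the `render_page` helper and the stand-in `dashboard` object are assumptions for illustration only.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def render_page(modules: Iterable[Any]) -> str:
    """Concatenate the CSS, HTML and JS of all modules into one page fragment."""
    modules = list(modules)
    css = "\n".join(m.get_css() for m in modules)
    html = "\n".join(m.get_html() for m in modules)
    js = "\n".join(m.get_js() for m in modules)
    return f"<style>\n{css}\n</style>\n{html}\n<script>\n{js}\n</script>"


# Stand-in for a module such as frontend/modules/dashboard.py.
dashboard = SimpleNamespace(
    get_css=lambda: ".dashboard { display: grid; }",
    get_html=lambda: '<section class="dashboard"></section>',
    get_js=lambda: "console.log('dashboard ready');",
)

if __name__ == "__main__":
    print(render_page([dashboard]))
```

Because every module exposes the same three functions, new modules can be added to the list without changing the assembler.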
|
||||
### 3. Python Backend (FastAPI)

#### API routers

| Router | Prefix | Description |
|--------|--------|-------------|
| `worksheets_api` | `/api/worksheets` | Content generators (MC, cloze, mind map, quiz) |
| `correction_api` | `/api/corrections` | OCR pipeline for exam correction |
| `letters_api` | `/api/letters` | Parent letters with GFK integration |
| `state_engine_api` | `/api/state` | Companion mode phases & suggestions |
| `school_api` | `/api/school` | School administration (proxy to school-service) |
| `certificates_api` | `/api/certificates` | Report card generation |
| `messenger_api` | `/api/messenger` | Matrix messenger integration |
| `jitsi_api` | `/api/jitsi` | Jitsi meeting invitations |
| `consent_api` | `/api/consent` | GDPR consent management |
| `gdpr_api` | `/api/gdpr` | GDPR export |
| **`klausur_korrektur_api`** | `/api/klausur-korrektur` | **Abitur exams (15-point scale, written assessments, fairness)** |
| **`abitur_docs_api`** | `/api/abitur-docs` | **NiBiS document management for RAG** |

#### Services

| Service | File | Description |
|---------|------|-------------|
| FileProcessor | `services/file_processor.py` | OCR with PaddleOCR |
| PDFService | `services/pdf_service.py` | PDF generation |
| ContentGenerators | `services/content_generators/` | MC, cloze, mind map, quiz |
| StateEngine | `state_engine/` | Phase management & anticipation |
|
||||
|
||||
### 4. Exam Correction System (Abitur)

The exam correction system (Klausur-Korrektur) implements the complete Abitur grading pipeline:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Klausur-Korrektur Modul │
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────┐ │
|
||||
│ │ Modus-Wahl │───►│ Text-Quellen & │───►│ Erwartungs- │ │
|
||||
│ │ LandesAbi/ │ │ Rights-Gate │ │ horizont │ │
|
||||
│ │ Vorabitur │ └──────────────────┘ └─────────────────┘ │
|
||||
│ └─────────────┘ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Schülerarbeiten-Pipeline │ │
|
||||
│ │ Upload → OCR → KI-Bewertung → Gutachten → 15-Punkte-Note │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌────────────────────┐ ┌──────────────────────────────────┐ │
|
||||
│ │ Erst-/Zweitprüfer │───►│ Fairness-Analyse & PDF-Export │ │
|
||||
│ └────────────────────┘ └──────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
#### 15-Point Grading System

The system uses the German Abitur grading key:

| Points | Percent | Grade |
|--------|---------|-------|
| 15-13 | 95-85% | 1+/1/1- |
| 12-10 | 80-70% | 2+/2/2- |
| 9-7 | 65-55% | 3+/3/3- |
| 6-4 | 50-40% | 4+/4/4- |
| 3-1 | 33-20% | 5+/5/5- |
| 0 | <20% | 6 |
|
||||
|
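The table lists only the band edges. The following is a minimal sketch of the lookup in Python, not the backend's actual implementation. It assumes one threshold per point value, where the intermediate steps (90, 75, 60, 45, 27) are illustrative assumptions; the names `POINT_THRESHOLDS`, `percent_to_points` and `points_to_grade` are hypothetical.

```python
# Lower percentage bound for each point value, highest first.
# Band edges (95, 85, 80, 70, ...) come from the table; the intermediate
# steps (90, 75, 60, 45, 27) are ASSUMED for illustration.
POINT_THRESHOLDS = [
    (15, 95), (14, 90), (13, 85),
    (12, 80), (11, 75), (10, 70),
    (9, 65), (8, 60), (7, 55),
    (6, 50), (5, 45), (4, 40),
    (3, 33), (2, 27), (1, 20),
]


def percent_to_points(percent: float) -> int:
    """Map a percentage (0-100) to Abitur points (0-15)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    for points, lower_bound in POINT_THRESHOLDS:
        if percent >= lower_bound:
            return points
    return 0  # below 20%


def points_to_grade(points: int) -> str:
    """Map Abitur points to the grade notation used in the table (e.g. '2+')."""
    if not 0 <= points <= 15:
        raise ValueError(f"points out of range: {points}")
    if points == 0:
        return "6"
    grade = 6 - (points + 2) // 3              # 15..13 -> 1, 12..10 -> 2, ...
    suffix = ("-", "", "+")[(points - 1) % 3]  # lowest point in a band gets "-"
    return f"{grade}{suffix}"
```

A call such as `points_to_grade(percent_to_points(82.5))` chains both steps.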
||||
#### Grading Criteria

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Spelling | 15% | Orthography |
| Grammar | 15% | Grammar and syntax |
| **Content** | **40%** | Quality of content (highest weighting) |
| Structure | 15% | Organization and outline |
| Style | 15% | Expression and style |
|
||||
|
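The weights sum to 100%, so a combined score is a plain weighted sum. The sketch below assumes each criterion is scored on the same scale (for example 0-100); the dictionary keys and the function name `weighted_score` are hypothetical and not taken from the codebase.

```python
# Weights from the criteria table. Keys are illustrative English names.
CRITERIA_WEIGHTS = {
    "spelling": 0.15,   # Rechtschreibung
    "grammar": 0.15,    # Grammatik
    "content": 0.40,    # Inhalt (highest weighting)
    "structure": 0.15,  # Struktur
    "style": 0.15,      # Stil
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (all on the same scale) into one score."""
    missing = CRITERIA_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(CRITERIA_WEIGHTS[name] * scores[name] for name in CRITERIA_WEIGHTS)
```

Because content carries 40%, a weak content score lowers the result more than a weak score in any other single criterion.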
||||
### 5. Go Consent Service

Manages GDPR (DSGVO) consent records:

```
consent-service/
├── cmd/server/          # Main entry point
├── internal/
│   ├── handlers/        # HTTP handlers
│   ├── services/        # Business logic
│   ├── models/          # Data models
│   └── middleware/      # Auth middleware
└── migrations/          # SQL migrations
```
|
||||
|
||||
### 6. LLM Gateway (Optional)

Active when `LLM_GATEWAY_ENABLED=true`:

```
llm_gateway/
├── routes/
│   ├── chat.py                 # Chat completion API
│   ├── communication.py        # GFK validation
│   ├── edu_search_seeds.py     # Education search
│   └── legal_crawler.py        # School law crawler
└── services/
    └── communication_service.py
```
|
||||
|
||||
## Data Flow

### Worksheet Generation

```
User Input → Frontend (worksheets.py)
    ↓
POST /api/worksheets/generate/multiple-choice
    ↓
worksheets_api.py → MCGenerator (services/content_generators/)
    ↓
Optional: LLM for extended generation
    ↓
Response: WorksheetContent → Frontend renders result
```
|
||||
|
||||
### Exam Correction

```
File Upload → Frontend (correction.py)
    ↓
POST /api/corrections/ (create)
POST /api/corrections/{id}/upload (file)
    ↓
Background Task: OCR via FileProcessor
    ↓
Poll GET /api/corrections/{id} until status="ocr_complete"
    ↓
POST /api/corrections/{id}/analyze
    ↓
Review Interface → PUT /api/corrections/{id} (adjustments)
    ↓
GET /api/corrections/{id}/export-pdf
```
|
||||
|
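The polling step of this flow can be sketched as a small helper. This is an illustration under assumptions, not the frontend's actual code: `wait_for_status` and its parameters are hypothetical, and the caller supplies `fetch_status`, which would wrap `GET /api/corrections/{id}` and return the `status` value.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "ocr_complete",
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() repeatedly until it returns `target`.

    Raises TimeoutError once roughly `timeout_s` seconds of waiting have passed.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == target:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"status is still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the helper testable without real delays. Only after it returns would the client continue with `POST /api/corrections/{id}/analyze`.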
||||
## Security

### Authentication & Authorization

BreakPilot uses a **hybrid approach**:

| Layer | Component | Description |
|-------|-----------|-------------|
| **Authentication** | Keycloak (prod) / local JWT (dev) | Token validation via JWKS or HS256 |
| **Authorization** | rbac.py (in-house) | Domain-specific permissions |

See: [Auth System](auth-system.md)
|
||||
|
||||
### Base Roles

| Role | Description |
|------|-------------|
| `user` | Regular user |
| `teacher` / `lehrer` | Teacher |
| `admin` | Administrator |
| `data_protection_officer` | Data protection officer |

### Extended Roles (rbac.py)

15+ domain-specific roles for exam correction and certificates:

- `erstkorrektor`, `zweitkorrektor`, `drittkorrektor`
- `klassenlehrer`, `fachlehrer`, `fachvorsitz`
- `schulleitung`, `zeugnisbeauftragter`, `sekretariat`

### Security Features

- JWT-based authentication (RS256/HS256)
- CORS configured for frontend access
- GDPR-compliant (DSGVO) consent management
- **HashiCorp Vault** for secrets management (no hardcoded secrets)
- Policy sets specific to each federal state (Bundesland)
- **DevSecOps pipeline** with automated security scans (SAST, SCA, secrets detection)

See:

- [Secrets Management](secrets-management.md)
- [DevSecOps](devsecops.md)
|
||||
|
||||
## Deployment
|
||||
|
||||
```yaml
services:
  backend:
    build: ./backend
    ports: ["8000:8000"]
    environment:
      - DATABASE_URL=postgresql://...
      - LLM_GATEWAY_ENABLED=false

  consent-service:
    build: ./consent-service
    ports: ["8081:8081"]

  postgres:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data
```
|
||||
|
||||
## Extending

To add a new frontend module:

1. Create the module: `frontend/modules/new_module.py`
2. Implement a class with `get_css()`, `get_html()`, `get_js()`
3. Import and export it in `frontend/modules/__init__.py`
4. Optional: create the corresponding API in `new_module_api.py`
5. Register the router in `main.py`
|
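As an illustration of step 2, a module class might look like the sketch below. Only the three method names come from the list. The class name `NewModule`, the use of static methods, the returned markup and the helper `render_module` are assumptions made for this example.

```python
class NewModule:
    """Hypothetical frontend module exposing its CSS, HTML, and JS as strings."""

    @staticmethod
    def get_css() -> str:
        return ".new-module { padding: 1rem; }"

    @staticmethod
    def get_html() -> str:
        return '<section id="panel-new-module" class="new-module"></section>'

    @staticmethod
    def get_js() -> str:
        return "console.log('new module loaded');"


def render_module(module) -> str:
    """Assemble one module's three parts into a single HTML fragment."""
    return (
        f"<style>{module.get_css()}</style>\n"
        f"{module.get_html()}\n"
        f"<script>{module.get_js()}</script>"
    )
```

Keeping the three parts as separate string-returning methods lets the page assembler decide where styles and scripts are placed.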
||||
169
docs-src/architecture/zeugnis-system.md
Normal file
@@ -0,0 +1,169 @@
|
||||
# Zeugnis-System - Architecture Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
The Zeugnis (Certificate) System enables schools to generate official school certificates with grades, attendance data, and remarks. It extends the existing School-Service with comprehensive grade management and certificate generation workflows.
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ Python Backend (Port 8000) │
|
||||
│ backend/frontend/modules/school.py │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────┐ │
|
||||
│ │ panel-school-certificates │ │
|
||||
│ │ - Klassenauswahl │ │
|
||||
│ │ - Notenspiegel │ │
|
||||
│ │ - Zeugnis-Wizard (5 Steps) │ │
|
||||
│ │ - Workflow-Status │ │
|
||||
│ └─────────────────────────────────┘ │
|
||||
└──────────────────┬──────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ School-Service (Go, Port 8084) │
|
||||
├─────────────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────────────────┐ │
|
||||
│ │ Grade Handlers │ │ Statistics Handlers │ │ Certificate Handlers │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ GetClassGrades │ │ GetClassStatistics │ │ GetCertificateTemplates │ │
|
||||
│ │ GetStudentGrades │ │ GetSubjectStatistics│ │ GetClassCertificates │ │
|
||||
│ │ UpdateOralGrade │ │ GetStudentStatistics│ │ GenerateCertificate │ │
|
||||
│ │ CalculateFinalGrades│ │ GetNotenspiegel │ │ BulkGenerateCertificates │ │
|
||||
│ │ LockFinalGrade │ │ │ │ FinalizeCertificate │ │
|
||||
│ │ UpdateGradeWeights │ │ │ │ GetCertificatePDF │ │
|
||||
│ └─────────────────────┘ └─────────────────────┘ └─────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────┐
|
||||
│ PostgreSQL Database │
|
||||
│ │
|
||||
│ Tables: │
|
||||
│ - grade_overview │
|
||||
│ - exam_results │
|
||||
│ - students │
|
||||
│ - classes │
|
||||
│ - subjects │
|
||||
│ - certificates │
|
||||
│ - attendance │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Zeugnis Workflow (Role Chain)
|
||||
|
||||
The certificate workflow follows a strict approval chain from subject teachers to school principal:
|
||||
|
||||
```
|
||||
┌──────────────────┐ ┌──────────────────┐ ┌────────────────────────┐ ┌────────────────────┐ ┌──────────────────┐
|
||||
│ FACHLEHRER │───▶│ KLASSENLEHRER │───▶│ ZEUGNISBEAUFTRAGTER │───▶│ SCHULLEITUNG │───▶│ SEKRETARIAT │
|
||||
│ (Subject │ │ (Class │ │ (Certificate │ │ (Principal) │ │ (Secretary) │
|
||||
│ Teacher) │ │ Teacher) │ │ Coordinator) │ │ │ │ │
|
||||
└──────────────────┘ └──────────────────┘ └────────────────────────┘ └────────────────────┘ └──────────────────┘
|
||||
│ │ │ │ │
|
||||
▼ ▼ ▼ ▼ ▼
|
||||
Grades Entry Approve Quality Check Sign-off & Lock Print & Archive
|
||||
(Oral/Written) Grades & Review
|
||||
```
|
||||
|
||||
### Workflow States
|
||||
|
||||
```
┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│     DRAFT     │────▶│   SUBMITTED   │────▶│   REVIEWED    │────▶│    SIGNED     │────▶│    PRINTED    │
│   (Entwurf)   │     │ (Eingereicht) │     │   (Geprüft)   │     │(Unterzeichnet)│     │  (Gedruckt)   │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
        │                     │                     │                     │
        ▼                     ▼                     ▼                     ▼
   Fachlehrer           Klassenlehrer      Zeugnisbeauftragter      Schulleitung
```
|
||||
|
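The strict ordering of these states can be enforced with a small state machine. The sketch below is written in Python for illustration only (the School-Service itself is Go) and covers just the ordering, not the role checks. `CertificateState`, `next_state` and `transition` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class CertificateState(Enum):
    DRAFT = "draft"          # Entwurf
    SUBMITTED = "submitted"  # Eingereicht
    REVIEWED = "reviewed"    # Geprüft
    SIGNED = "signed"        # Unterzeichnet
    PRINTED = "printed"      # Gedruckt


# Enum members iterate in definition order, which is the workflow order.
WORKFLOW_ORDER = list(CertificateState)


def next_state(state: CertificateState) -> Optional[CertificateState]:
    """Return the state that follows `state`, or None for the final state."""
    index = WORKFLOW_ORDER.index(state)
    if index == len(WORKFLOW_ORDER) - 1:
        return None
    return WORKFLOW_ORDER[index + 1]


def transition(current: CertificateState, requested: CertificateState) -> CertificateState:
    """Allow only a move to the directly following state (no skipping, no going back)."""
    if requested is not next_state(current):
        raise ValueError(f"cannot move from {current.name} to {requested.name}")
    return requested
```

A real implementation would additionally check that the acting user holds the role responsible for the current step.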
||||
## RBAC Integration

### Certificate-Related Roles

| Role | German | Description |
|------|--------|-------------|
| `FACHLEHRER` | Fachlehrer | Subject teacher - enters grades |
| `KLASSENLEHRER` | Klassenlehrer | Class teacher - approves class grades |
| `ZEUGNISBEAUFTRAGTER` | Zeugnisbeauftragter | Certificate coordinator - quality control |
| `SCHULLEITUNG` | Schulleitung | Principal - final sign-off |
| `SEKRETARIAT` | Sekretariat | Secretary - printing & archiving |

### Certificate Resource Types

| ResourceType | Description |
|--------------|-------------|
| `ZEUGNIS` | Final certificate document |
| `ZEUGNIS_VORLAGE` | Certificate template (per Bundesland) |
| `ZEUGNIS_ENTWURF` | Draft certificate (before approval) |
| `FACHNOTE` | Subject grade |
| `KOPFNOTE` | Conduct grade (Kopfnote: Arbeits-/Sozialverhalten) |
| `BEMERKUNG` | Certificate remarks |
| `STATISTIK` | Class/subject statistics |
| `NOTENSPIEGEL` | Grade distribution chart |

## German Grading System

| Grade | Meaning | Points |
|-------|---------|--------|
| 1 | sehr gut (excellent) | 15-13 |
| 2 | gut (good) | 12-10 |
| 3 | befriedigend (satisfactory) | 9-7 |
| 4 | ausreichend (adequate) | 6-4 |
| 5 | mangelhaft (poor) | 3-1 |
| 6 | ungenügend (inadequate) | 0 |
|
||||
|
||||
### Grade Calculation
|
||||
|
||||
```
Final Grade = (Written Weight * Written Avg) + (Oral Weight * Oral Avg)

Default weights:
- Written (Klassenarbeiten): 50%
- Oral (mündliche Note): 50%

Customizable per subject/student via UpdateGradeWeights endpoint.
```
|
||||
|
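The formula can be written as a short function. This is a sketch for illustration in Python (the School-Service itself is Go); the function name `final_grade` and the check that the weights sum to 1 are assumptions.

```python
def final_grade(
    written_avg: float,
    oral_avg: float,
    written_weight: float = 0.5,
    oral_weight: float = 0.5,
) -> float:
    """Weighted final grade from the written and oral averages."""
    if abs(written_weight + oral_weight - 1.0) > 1e-9:
        raise ValueError("written_weight and oral_weight must sum to 1")
    return written_weight * written_avg + oral_weight * oral_avg
```

With the default weights, a written average of 2.0 and an oral average of 3.0 give 0.5 * 2.0 + 0.5 * 3.0 = 2.5.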
||||
## API Routes (School-Service)

### Grade Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/school/grades/:classId` | Get class grades |
| GET | `/api/v1/school/grades/student/:studentId` | Get student grades |
| PUT | `/api/v1/school/grades/:studentId/:subjectId/oral` | Update oral grade |
| POST | `/api/v1/school/grades/calculate` | Calculate final grades |
| PUT | `/api/v1/school/grades/:studentId/:subjectId/lock` | Lock final grade |

### Statistics

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/school/statistics/:classId` | Class statistics |
| GET | `/api/v1/school/statistics/:classId/subject/:subjectId` | Subject statistics |
| GET | `/api/v1/school/statistics/student/:studentId` | Student statistics |
| GET | `/api/v1/school/statistics/:classId/notenspiegel` | Grade distribution |

### Certificates

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/school/certificates/templates` | List templates |
| GET | `/api/v1/school/certificates/class/:classId` | Class certificates |
| POST | `/api/v1/school/certificates/generate` | Generate single |
| POST | `/api/v1/school/certificates/generate-bulk` | Generate bulk |
| GET | `/api/v1/school/certificates/detail/:id/pdf` | Download PDF |

## Security Considerations

1. **RBAC Enforcement**: All certificate operations check user role permissions
2. **Tenant Isolation**: Teachers only see their own classes/students
3. **Audit Trail**: All grade changes and approvals are logged
4. **Lock Mechanism**: Finalized certificates cannot be modified
5. **Workflow Enforcement**: Approval steps cannot be skipped
|
||||
402
docs-src/development/ci-cd-pipeline.md
Normal file
@@ -0,0 +1,402 @@
|
||||
# CI/CD Pipeline

Overview of the deployment process for Breakpilot.

## Overview

| Component | Build Tool | Deployment |
|-----------|------------|------------|
| Frontend (Next.js) | Docker | Mac Mini |
| Backend (FastAPI) | Docker | Mac Mini |
| Go Services | Docker (multi-stage) | Mac Mini |
| Documentation | MkDocs | Docker (Nginx) |

## Deployment Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Entwickler-MacBook │
|
||||
│ │
|
||||
│ breakpilot-pwa/ │
|
||||
│ ├── studio-v2/ (Next.js Frontend) │
|
||||
│ ├── admin-v2/ (Next.js Admin) │
|
||||
│ ├── backend/ (Python FastAPI) │
|
||||
│ ├── consent-service/ (Go Service) │
|
||||
│ ├── klausur-service/ (Python FastAPI) │
|
||||
│ ├── voice-service/ (Python FastAPI) │
|
||||
│ ├── ai-compliance-sdk/ (Go Service) │
|
||||
│ └── docs-src/ (MkDocs) │
|
||||
│ │
|
||||
│ $ ./sync-and-deploy.sh │
|
||||
└───────────────────────────────┬─────────────────────────────────┘
|
||||
│
|
||||
│ rsync + SSH
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Mac Mini Server │
|
||||
│ │
|
||||
│ Docker Compose │
|
||||
│ ├── website (Port 3000) │
|
||||
│ ├── studio-v2 (Port 3001) │
|
||||
│ ├── admin-v2 (Port 3002) │
|
||||
│ ├── backend (Port 8000) │
|
||||
│ ├── consent-service (Port 8081) │
|
||||
│ ├── klausur-service (Port 8086) │
|
||||
│ ├── voice-service (Port 8082) │
|
||||
│ ├── ai-compliance-sdk (Port 8090) │
|
||||
│ ├── docs (Port 8009) │
|
||||
│ ├── postgres │
|
||||
│ ├── valkey (Redis) │
|
||||
│ ├── qdrant │
|
||||
│ └── minio │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Sync & Deploy Workflow

### 1. Synchronize files

```bash
# Sync all relevant directories to the Mac Mini
rsync -avz --delete \
  --exclude 'node_modules' \
  --exclude '.next' \
  --exclude '.git' \
  --exclude '__pycache__' \
  --exclude 'venv' \
  --exclude '.pytest_cache' \
  /Users/benjaminadmin/Projekte/breakpilot-pwa/ \
  macmini:/Users/benjaminadmin/Projekte/breakpilot-pwa/
```

### 2. Build containers

```bash
# Build a single service
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  build --no-cache <service-name>"

# Examples:
# studio-v2, admin-v2, website, backend, klausur-service, docs
```

### 3. Deploy containers

```bash
# Restart the container
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  up -d <service-name>"
```

### 4. Check logs

```bash
# Show container logs
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  logs -f <service-name>"
```
|
||||
|
||||
## Service-Specific Deployments

### Next.js Frontend (studio-v2, admin-v2, website)

```bash
# 1. Sync
rsync -avz --delete \
  --exclude 'node_modules' --exclude '.next' --exclude '.git' \
  /Users/benjaminadmin/Projekte/breakpilot-pwa/studio-v2/ \
  macmini:/Users/benjaminadmin/Projekte/breakpilot-pwa/studio-v2/

# 2. Build & deploy
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  build --no-cache studio-v2 && \
  /usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  up -d studio-v2"
```

### Python Services (backend, klausur-service, voice-service)

```bash
# Build with requirements.txt
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  build klausur-service && \
  /usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  up -d klausur-service"
```

### Go Services (consent-service, ai-compliance-sdk)

```bash
# Multi-stage build (Go → Alpine)
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  build --no-cache consent-service && \
  /usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  up -d consent-service"
```

### MkDocs Documentation

```bash
# Build & deploy
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  build --no-cache docs && \
  /usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  up -d docs"

# Available at: http://macmini:8009
```
|
||||
|
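The build-and-deploy commands above differ only in the service name and the `--no-cache` flag. As an illustration, a small Python helper could assemble them. `build_and_deploy_command` is a hypothetical name and not part of the repository; the paths are copied from the commands above.

```python
import shlex

COMPOSE = "/usr/local/bin/docker compose"
COMPOSE_FILE = "/Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml"


def build_and_deploy_command(
    service: str, no_cache: bool = True, host: str = "macmini"
) -> list[str]:
    """Return the argv for one ssh call that builds and then starts `service`."""
    compose = f"{COMPOSE} -f {shlex.quote(COMPOSE_FILE)}"
    cache_flag = "--no-cache " if no_cache else ""
    build = f"{compose} build {cache_flag}{shlex.quote(service)}"
    up = f"{compose} up -d {shlex.quote(service)}"
    # "&&" ensures the container is only restarted if the build succeeded.
    return ["ssh", host, f"{build} && {up}"]
```

The result can be passed to `subprocess.run(build_and_deploy_command("docs"), check=True)`.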
||||
## Health Checks

### Check service status

```bash
# Status of all containers
ssh macmini "docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'"

# Check health endpoints
curl -s http://macmini:8000/health
curl -s http://macmini:8081/health
curl -s http://macmini:8086/health
curl -s http://macmini:8090/health
```
|
||||
|
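The same health checks can be scripted. The sketch below uses only the Python standard library. The port-to-service mapping is taken from the deployment architecture; `HEALTH_PORTS`, `http_status` and `check_health` are hypothetical names, and treating only HTTP 200 as healthy is an assumption.

```python
from typing import Callable
from urllib.request import urlopen

# Ports as listed in the deployment architecture.
HEALTH_PORTS = {
    "backend": 8000,
    "consent-service": 8081,
    "klausur-service": 8086,
    "ai-compliance-sdk": 8090,
}


def http_status(url: str, timeout_s: float = 3.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urlopen(url, timeout=timeout_s) as response:
        return response.status


def check_health(
    host: str = "macmini",
    get_status: Callable[[str], int] = http_status,
) -> dict[str, bool]:
    """Query each service's /health endpoint; True means it answered with 200."""
    results = {}
    for service, port in HEALTH_PORTS.items():
        url = f"http://{host}:{port}/health"
        try:
            results[service] = get_status(url) == 200
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            results[service] = False
    return results
```

A failed connection is reported as `False` instead of aborting the loop, so one unreachable service does not hide the state of the others.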
||||
### Analyze logs

```bash
# Last 100 lines
ssh macmini "docker logs --tail 100 breakpilot-pwa-backend-1"

# Follow live logs
ssh macmini "docker logs -f breakpilot-pwa-backend-1"
```
|
||||
|
||||
## Rollback

### Roll a container back to the previous version

```bash
# 1. Tag the current image as a backup (do this before building a new version)
ssh macmini "docker tag breakpilot-pwa-backend:latest breakpilot-pwa-backend:backup"

# 2. Recreate the container from the image currently tagged :latest
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
  up -d backend"

# 3. If problems occur: point :latest back at the backup image, then repeat step 2
ssh macmini "docker tag breakpilot-pwa-backend:backup breakpilot-pwa-backend:latest"
```
|
||||
|
||||
## Troubleshooting

### Container does not start

```bash
# 1. Check logs
ssh macmini "docker logs breakpilot-pwa-<service>-1"

# 2. Start the container manually for debug output
ssh macmini "docker compose -f .../docker-compose.yml run --rm <service>"

# 3. Open a shell inside the container (ssh -t allocates the TTY that docker exec -it needs)
ssh -t macmini "docker exec -it breakpilot-pwa-<service>-1 /bin/sh"
```

### Port already in use

```bash
# Check what is using the port
ssh macmini "lsof -i :8000"

# Find the container publishing the port
ssh macmini "docker ps --filter publish=8000"
```

### Build errors

```bash
# Clear the build cache completely
ssh macmini "docker builder prune -a"

# Build without cache
ssh macmini "docker compose build --no-cache <service>"
```
|
||||
|
||||
## Monitoring

### Resource usage

```bash
# CPU/memory of all containers
ssh macmini "docker stats --no-stream"

# Disk usage
ssh macmini "docker system df"
```

### Cleanup

```bash
# Remove unused images, containers, networks, and volumes.
# Caution: --volumes also deletes unused volumes, including any data stored in them.
ssh macmini "docker system prune -a --volumes"

# Dangling images only
ssh macmini "docker image prune"
```
|
||||
|
||||
## Environment Variables

Environment variables are managed via `.env` files and docker-compose.yml:

```yaml
# docker-compose.yml
services:
  backend:
    environment:
      - DATABASE_URL=postgresql://...
      - REDIS_URL=redis://valkey:6379
      - SECRET_KEY=${SECRET_KEY}
```

**Important**: Never commit sensitive values to Git. Instead:

- Maintain the `.env` file on the server
- Manage secrets via HashiCorp Vault (see below)
|
||||
|
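On the application side, these variables are typically read once at startup. The following is a minimal sketch, not the backend's actual configuration code: `Settings` and `load_settings` are hypothetical names, and treating `DATABASE_URL` and `SECRET_KEY` as required (with `REDIS_URL` falling back to the value from the compose example) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    secret_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment and fail fast if a required one is missing."""
    missing = [name for name in ("DATABASE_URL", "SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env.get("REDIS_URL", "redis://valkey:6379"),
        secret_key=env["SECRET_KEY"],
    )
```

Failing at startup when `SECRET_KEY` is absent avoids running with an empty or default secret.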
||||
## Woodpecker CI - Automated OAuth Integration

### Overview

The OAuth integration between Woodpecker CI and Gitea is **fully automated**. Credentials are stored in HashiCorp Vault and regenerated automatically when needed.

!!! info "Why automated?"
    This automation is a DevSecOps best practice:

    - **Infrastructure as Code**: Everything is reproducible
    - **Disaster Recovery**: Lost credentials can be regenerated automatically
    - **Security**: Secrets are managed centrally in Vault
    - **Onboarding**: New developers do not need to configure anything manually

### Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Mac Mini Server │
|
||||
│ │
|
||||
│ ┌───────────────┐ OAuth 2.0 ┌───────────────┐ │
|
||||
│ │ Gitea │ ←─────────────────────────→│ Woodpecker │ │
|
||||
│ │ (Port 3003) │ Client ID + Secret │ (Port 8090) │ │
|
||||
│ └───────────────┘ └───────────────┘ │
|
||||
│ │ │ │
|
||||
│ │ OAuth App │ Env Vars│
|
||||
│ │ (DB: oauth2_application) │ │
|
||||
│ │ │ │
|
||||
│ ▼ ▼ │
|
||||
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||
│ │ HashiCorp Vault (Port 8200) │ │
|
||||
│ │ │ │
|
||||
│ │ secret/cicd/woodpecker: │ │
|
||||
│ │ - gitea_client_id │ │
|
||||
│ │ - gitea_client_secret │ │
|
||||
│ │ │ │
|
||||
│ │ secret/cicd/api-tokens: │ │
|
||||
│ │ - gitea_token (für API-Zugriff) │ │
|
||||
│ │ - woodpecker_token (für Pipeline-Trigger) │ │
|
||||
│ └───────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Credential Storage Locations

| Location | Path | Contents |
|----------|------|----------|
| **HashiCorp Vault** | `secret/cicd/woodpecker` | Client ID + secret (source of truth) |
| **.env file** | `WOODPECKER_GITEA_CLIENT/SECRET` | For Docker Compose (loaded from Vault) |
| **Gitea PostgreSQL** | `oauth2_application` table | OAuth app registration (hashed secret) |
|
||||
|
||||
### Troubleshooting: OAuth Errors

If the error "Client ID not registered" or "user does not exist [uid: 0]" occurs:

```bash
# Option 1: Automatic regeneration (recommended)
./scripts/sync-woodpecker-credentials.sh --regenerate

# Option 2: Manual procedure
# 1. Load credentials from Vault
vault kv get secret/cicd/woodpecker

# 2. Update .env (set these two entries in the file; they are not shell commands)
# WOODPECKER_GITEA_CLIENT=<client_id>
# WOODPECKER_GITEA_SECRET=<client_secret>

# 3. Sync to the Mac Mini
rsync .env macmini:~/Projekte/breakpilot-pwa/

# 4. Restart Woodpecker
ssh macmini "cd ~/Projekte/breakpilot-pwa && \
  docker compose up -d --force-recreate woodpecker-server"
```
|
||||
|
||||
### The Sync Script

The script `scripts/sync-woodpecker-credentials.sh` automates the entire process:

```bash
# Load credentials from Vault and update .env
./scripts/sync-woodpecker-credentials.sh

# Generate new credentials (OAuth app in Gitea + Vault + .env)
./scripts/sync-woodpecker-credentials.sh --regenerate
```

What the script does:

1. **Reads** the current credentials from Vault
2. **Updates** the .env file automatically
3. **With `--regenerate`**:
    - Deletes old OAuth apps in Gitea
    - Creates a new OAuth app with a new client ID/secret
    - Stores the credentials in Vault
    - Updates .env
|
||||
|
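The ".env update" step can be illustrated with a small pure function. This is a Python sketch of the idea, not the shell script's actual logic; `update_env_text` is a hypothetical name. Existing keys are replaced in place, new keys are appended, and comments and unrelated lines are kept.

```python
def update_env_text(env_text: str, updates: dict[str, str]) -> str:
    """Return the .env content with `updates` applied."""
    remaining = dict(updates)
    lines = []
    for line in env_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the value of an existing key in place.
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    # Keys that were not present yet are appended at the end.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"
```

The values passed in would be the `gitea_client_id` and `gitea_client_secret` read from Vault, written to `WOODPECKER_GITEA_CLIENT` and `WOODPECKER_GITEA_SECRET`.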
||||
### Vault Access

```bash
# Vault token (development)
export VAULT_TOKEN=breakpilot-dev-token

# Read credentials
docker exec -e VAULT_TOKEN=$VAULT_TOKEN breakpilot-pwa-vault \
  vault kv get secret/cicd/woodpecker

# Write credentials
docker exec -e VAULT_TOKEN=$VAULT_TOKEN breakpilot-pwa-vault \
  vault kv put secret/cicd/woodpecker \
  gitea_client_id="..." \
  gitea_client_secret="..."
```

### Restart services after a credentials change

```bash
# Important: --force-recreate is needed to load the new env vars
cd /Users/benjaminadmin/Projekte/breakpilot-pwa
docker compose up -d --force-recreate woodpecker-server

# Check logs
docker logs breakpilot-pwa-woodpecker-server --tail 50
```
|
||||
159
docs-src/development/documentation.md
Normal file
@@ -0,0 +1,159 @@
|
||||
# Documentation Rules

## Automatic Documentation Updates

**IMPORTANT:** With EVERY code change, the corresponding documentation must be updated!

## When to update documentation?

### API changes

When you change, add, or remove an endpoint:

- Update the [Backend API documentation](../api/backend-api.md)
- Update service-specific API docs

### New functions/classes

When you create new functions, classes, or modules:

- Update the corresponding service documentation
- Add code examples

### Architecture changes

When you change the system architecture:

- Update the [System Architecture](../architecture/system-architecture.md)
- Update the data model documentation for DB changes

### New configuration options

When you add new environment variables or configuration:

- Update the corresponding README
- Add them to the [Environment Setup](../getting-started/environment-setup.md)

## Documentation Format

### Documenting API endpoints
|
||||
|
||||
````markdown
### METHOD /path/to/endpoint

Short description.

**Request Body:**
```json
{
  "field": "value"
}
```

**Response (200):**
```json
{
  "result": "value"
}
```

**Errors:**
- `400`: Description
- `401`: Description
````
|
||||
|
||||
### Documenting functions

````markdown
### FunctionName (file.go:123)

```go
func FunctionName(param Type) ReturnType
```

**Description:** What does the function do?

**Parameters:**
- `param`: Description

**Returns:** Description
````
|
||||
|
||||
## Checklist after code changes

Before completing a task, check:

- [ ] Were new API endpoints added? → Update the API docs
- [ ] Were data models changed? → Update the architecture docs
- [ ] Were new configuration options added? → Update the README
- [ ] Were new dependencies added? → Update requirements.txt/go.mod AND the docs
- [ ] Was the architecture changed? → Update architecture/

## Example: Complete documentation of a new function

For example, when you add `GetUserStats()` to the Go service:

1. **Write the code** in `internal/services/stats_service.go`
2. **Update the API doc** in the API documentation
3. **Update the service doc** in the service README
4. **Write a test** (see [Testing](./testing.md))

## Documentation Structure

The central documentation lives under `docs-src/`:
|
||||
|
||||
```
docs-src/
├── index.md                    # Home page
├── getting-started/            # Getting started
│   ├── environment-setup.md
│   └── mac-mini-setup.md
├── architecture/               # Architecture documentation
│   ├── system-architecture.md
│   ├── auth-system.md
│   └── ...
├── api/                        # API documentation
│   └── backend-api.md
├── services/                   # Service documentation
│   ├── klausur-service/
│   ├── agent-core/
│   └── ...
├── development/                # Developer guides
│   ├── testing.md
│   └── documentation.md
└── guides/                     # Further guides
```
|
||||
|
||||
## MkDocs Conventions

This documentation is generated with MkDocs + Material theme:

- **Admonitions** for notes:

```markdown
!!! note "Note"
    Important information here.

!!! warning "Warning"
    Be careful with this action.
```

- **Code tabs** for multiple languages:

````markdown
=== "Python"
    ```python
    print("Hello")
    ```

=== "Go"
    ```go
    fmt.Println("Hello")
    ```
````

- **Mermaid diagrams** for visualizations:

````markdown
```mermaid
graph LR
    A --> B --> C
```
````
|
||||
211
docs-src/development/testing.md
Normal file
@@ -0,0 +1,211 @@
|
||||
# Testing Rules

## Automatic Test Extension

**IMPORTANT:** With EVERY code change, corresponding tests must be created or updated!

## When to write tests?

### ALWAYS when you:

1. Create **new functions** → unit test
2. Add **new API endpoints** → handler test
3. **Fix bugs** → regression test (the bug should never recur)
4. **Change existing code** → adapt the existing tests

## Test Structure

### Go Tests (Consent Service)

**Location:** In the same directory as the code
|
||||
|
||||
```
internal/
├── services/
│   ├── auth_service.go
│   └── auth_service_test.go    ← test here
├── handlers/
│   ├── handlers.go
│   └── handlers_test.go        ← test here
└── middleware/
    ├── auth.go
    └── middleware_test.go      ← test here
```
|
||||
|
||||
**Test naming convention:**

```go
func TestFunctionName_Scenario_ExpectedResult(t *testing.T)

// Examples:
func TestHashPassword_ValidPassword_ReturnsHash(t *testing.T)
func TestLogin_InvalidCredentials_Returns401(t *testing.T)
func TestCreateDocument_MissingTitle_ReturnsError(t *testing.T)
```

**Test template:**

```go
func TestFunctionName(t *testing.T) {
	// Arrange
	service := &MyService{}
	input := "test-input"
	expected := "expected-output" // placeholder: the value DoSomething should return

	// Act
	result, err := service.DoSomething(input)

	// Assert
	if err != nil {
		t.Fatalf("Expected no error, got %v", err)
	}
	if result != expected {
		t.Errorf("Expected %v, got %v", expected, result)
	}
}
```

**Prefer table-driven tests:**

```go
func TestValidateEmail(t *testing.T) {
	tests := []struct {
		name     string
		email    string
		expected bool
	}{
		{"valid email", "test@example.com", true},
		{"missing @", "testexample.com", false},
		{"empty", "", false},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := ValidateEmail(tt.email)
			if result != tt.expected {
				t.Errorf("Expected %v, got %v", tt.expected, result)
			}
		})
	}
}
```

### Python Tests (Backend)

**Location:** `/backend/tests/`

```
backend/
├── consent_client.py
├── gdpr_api.py
└── tests/
    ├── __init__.py
    ├── test_consent_client.py   ← tests for consent_client.py
    └── test_gdpr_api.py         ← tests for gdpr_api.py
```

**Test naming convention:**

```python
class TestClassName:
    def test_method_scenario_expected_result(self):
        pass

# Examples:
class TestConsentClient:
    def test_check_consent_valid_token_returns_status(self):
        pass

    def test_check_consent_expired_token_raises_error(self):
        pass
```

**Test template:**

```python
import pytest
from unittest.mock import AsyncMock, patch, MagicMock

# my_function, MyClient and the expected value are placeholders
# for the code under test.

class TestMyFeature:
    def test_sync_function(self):
        # Arrange
        input_data = "test"
        expected = "expected-output"

        # Act
        result = my_function(input_data)

        # Assert
        assert result == expected

    @pytest.mark.asyncio
    async def test_async_function(self):
        # Arrange
        client = MyClient()

        # Act
        with patch("httpx.AsyncClient") as mock:
            mock_instance = AsyncMock()
            mock.return_value = mock_instance
            result = await client.fetch_data()

        # Assert
        assert result is not None
```

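
The table-driven style recommended for Go carries over to Python. The following is a minimal sketch, assuming a hypothetical `validate_email` helper that is not part of the Breakpilot codebase; it mirrors the Go table above with a plain list of cases. In a pytest suite the same table would typically be handed to `pytest.mark.parametrize` instead of a loop.

```python
import re

# Hypothetical function under test; a stand-in for real validation logic.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(email: str) -> bool:
    """Return True if the string looks like an e-mail address."""
    return bool(_EMAIL_RE.match(email))


# Each row: (case name, input, expected result).
CASES = [
    ("valid email", "test@example.com", True),
    ("missing @", "testexample.com", False),
    ("empty", "", False),
]


def test_validate_email_table():
    for name, email, expected in CASES:
        # The case name in the message shows which row failed.
        assert validate_email(email) == expected, f"case {name!r} failed"
```

Adding a new scenario then only means adding one row to `CASES`.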
## Test Categories

### 1. Unit Tests (Highest Priority)

- Test individual functions/methods
- No external dependencies (use mocks)
- Fast to run

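
As an illustration of replacing an external dependency with a mock, here is a minimal sketch. The function `consent_status` and the client method `get_consent` are hypothetical names chosen for this example, not the real backend API.

```python
from unittest.mock import Mock


def consent_status(client, user_id: str) -> str:
    """Hypothetical function under test: maps a client response to a status string."""
    response = client.get_consent(user_id)
    return "granted" if response.get("active") else "missing"


def test_consent_status_active_consent_returns_granted():
    # Arrange: the mock replaces the real HTTP client, so no network is needed
    client = Mock()
    client.get_consent.return_value = {"active": True}

    # Act
    result = consent_status(client, "user-123")

    # Assert: check both the result and how the dependency was called
    assert result == "granted"
    client.get_consent.assert_called_once_with("user-123")
```

Because the dependency is injected as a parameter, the test needs no patching and runs without a network connection.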
### 2. Integration Tests

- Test the interaction of multiple components
- May use a real database (test DB)

### 3. Security Tests

- Auth/JWT validation
- Password hashing
- Permission checks

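
A password-hashing security test could look roughly like the sketch below. It uses PBKDF2 from the Python standard library purely for illustration; the hashing scheme actually used by the services is not stated here, and `hash_password` / `verify_password` are hypothetical helpers.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Hypothetical helper: derive a PBKDF2-HMAC-SHA256 digest with a random salt."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


def test_hash_password_same_password_uses_different_salts():
    salt_a, digest_a = hash_password("s3cret")
    salt_b, digest_b = hash_password("s3cret")
    assert salt_a != salt_b
    assert digest_a != digest_b


def test_verify_password_wrong_password_returns_false():
    salt, digest = hash_password("s3cret")
    assert verify_password("s3cret", salt, digest)
    assert not verify_password("wrong", salt, digest)
```

The two tests cover the properties that usually matter: hashes are salted, and a wrong password never verifies.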
## Checklist Before Completion

Before completing a task:

- [ ] Are there tests for all new functions?
- [ ] Are there tests for all edge cases?
- [ ] Are there tests for error cases?
- [ ] Do all existing tests still pass? (`go test ./...` / `pytest`)
- [ ] Is the test coverage adequate?

## Running Tests

```bash
# Go - all tests
cd consent-service && go test -v ./...

# Go - with coverage
cd consent-service && go test -cover ./...

# Python - all tests
cd backend && source venv/bin/activate && pytest -v

# Python - with coverage
cd backend && pytest --cov=. --cov-report=html
```

## Example: Complete Test Workflow

If, for example, you add a new `GetUserStats()` function to the Go service:

1. **Write the function** in `internal/services/stats_service.go`
2. **Create the tests** in `internal/services/stats_service_test.go`:

    ```go
    func TestGetUserStats_ValidUser_ReturnsStats(t *testing.T) {...}
    func TestGetUserStats_InvalidUser_ReturnsError(t *testing.T) {...}
    func TestGetUserStats_NoConsents_ReturnsEmptyStats(t *testing.T) {...}
    ```

3. **Run the tests**: `go test -v ./internal/services/...`
4. **Update the documentation** (see [Documentation](./documentation.md))
258
docs-src/getting-started/environment-setup.md
Normal file
@@ -0,0 +1,258 @@
# Developer Guide: Environment Setup

This guide explains day-to-day work with the dev, staging, and prod environments.

## Quick Start

```bash
# 1. Change into the project directory
cd /Users/benjaminadmin/Projekte/breakpilot-pwa

# 2. Start the development environment
./scripts/start.sh dev

# 3. Check the status
./scripts/status.sh
```

## Daily Workflow

### Morning: Start Development

```bash
# Switch to the develop branch
git checkout develop

# Pull the latest changes (if a remote is configured)
git pull origin develop

# Start the environment
./scripts/start.sh dev
```

### While Working

```bash
# Show the logs of a service
docker compose logs -f backend

# Restart a service
docker compose restart backend

# Check the status
./scripts/status.sh
```

### Committing Changes

```bash
# Show changes
git status

# Stage files
git add .

# Create a commit
git commit -m "Feature: description of the change"
```

### Evening: Stop the Environment

```bash
./scripts/stop.sh dev
```

## Switching Environments

### From Dev to Staging

```bash
# Stop dev
./scripts/stop.sh dev

# Start staging
./scripts/start.sh staging
```

### Back to Dev

```bash
./scripts/stop.sh staging
./scripts/start.sh dev
```

## Promoting Code

### Dev → Staging (after successful testing)

```bash
# Make sure all changes are committed
git status

# Promote to staging
./scripts/promote.sh dev-to-staging

# Push to the remote (if configured)
git push origin staging
```

### Staging → Production (Release)

```bash
# Only after complete testing on staging!
./scripts/promote.sh staging-to-prod

# Push to the remote
git push origin main
```

## Useful Commands

### Docker

```bash
# List all containers
docker compose ps

# Follow logs
docker compose logs -f [service]

# Open a shell inside a container
docker compose exec backend bash
docker compose exec postgres psql -U breakpilot -d breakpilot_dev

# Restart a container
docker compose restart [service]

# Stop and remove all containers
docker compose down

# Also delete volumes (CAUTION!)
docker compose down -v
```

### Git

```bash
# Show the current branch
git branch --show-current

# Show all branches
git branch -v

# Show changes between branches
git diff develop..staging
```

### Database

```bash
# Connect directly to PostgreSQL (dev)
docker compose exec postgres psql -U breakpilot -d breakpilot_dev

# Create a backup
./scripts/backup.sh

# Restore a backup
./scripts/restore.sh backup-file.sql.gz
```

## Common Problems

### "Port already in use"

Another process or container is using the port.

```bash
# Check running containers
docker ps

# Stop old containers
docker compose down

# Find the process on a port (e.g. 8000)
lsof -i :8000
```

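
To check several ports at once before starting an environment, a small script can help. This is a minimal sketch using only the Python standard library; `port_in_use` is a hypothetical helper, and the port list is taken from the development ports table of this guide.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


# Development ports as listed in this guide
DEV_PORTS = {"Backend": 8000, "Website": 3000, "Consent Service": 8081, "PostgreSQL": 5432}

if __name__ == "__main__":
    for name, port in DEV_PORTS.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"{name:<16} {port:>5}  {state}")
```

The script only reports whether a port is occupied; `lsof -i :<port>` is still needed to find out which process holds it.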
### Container does not start

```bash
# Check the logs
docker compose logs backend

# Rebuild the container
docker compose build backend
docker compose up -d backend
```

### Database connection errors

```bash
# Check whether PostgreSQL is running
docker compose ps postgres

# Check the PostgreSQL logs
docker compose logs postgres

# Restart
docker compose restart postgres
```

### Wrong environment active

```bash
# Check the status
./scripts/status.sh

# Switch to the correct environment
./scripts/env-switch.sh dev
```

## Environment Files

| File | Description | In Git? |
|------|-------------|---------|
| `.env` | Active environment | No |
| `.env.dev` | Development values | Yes |
| `.env.staging` | Staging values | Yes |
| `.env.prod` | Production values | **NO** |
| `.env.example` | Template | Yes |

## Ports Overview

### Development

| Service | Port | URL |
|---------|------|-----|
| Backend | 8000 | http://localhost:8000 |
| Website | 3000 | http://localhost:3000 |
| Consent Service | 8081 | http://localhost:8081 |
| PostgreSQL | 5432 | localhost:5432 |
| Mailpit UI | 8025 | http://localhost:8025 |
| MinIO Console | 9001 | http://localhost:9001 |

### Staging

| Service | Port | URL |
|---------|------|-----|
| Backend | 8001 | http://localhost:8001 |
| PostgreSQL | 5433 | localhost:5433 |
| Mailpit UI | 8026 | http://localhost:8026 |
| MinIO Console | 9003 | http://localhost:9003 |

## Help

```bash
# Status and overview
./scripts/status.sh

# Script help
./scripts/env-switch.sh --help
./scripts/promote.sh --help
```

## Related Documentation

- [Architecture: Environments](../architecture/environments.md)
- [Secrets Management](../architecture/secrets-management.md)
- [System Architecture](../architecture/system-architecture.md)
109
docs-src/getting-started/mac-mini-setup.md
Normal file
@@ -0,0 +1,109 @@
# Mac Mini Headless Setup - Fully Automatic

## Connection Details

- **IP (LAN):** 192.168.178.100
- **User:** benjaminadmin
- **SSH:** `ssh benjaminadmin@192.168.178.100`

## After a Reboot - Everything Starts Automatically!

| Service | Auto-Start | Port |
|---------|------------|------|
| SSH | Yes | 22 |
| Docker Desktop | Yes | - |
| Docker containers | Yes (after ~2 min) | 8000, 8081, etc. |
| Ollama Server | Yes | 11434 |
| Unity Hub | Yes | - |
| VS Code | Yes | - |

**No action needed after a reboot!** Just wait 2-3 minutes.

## Checking the Status

```bash
./scripts/mac-mini/status.sh
```

## Services & Ports

| Service | Port | URL |
|---------|------|-----|
| Backend API | 8000 | http://192.168.178.100:8000/admin |
| Consent Service | 8081 | - |
| PostgreSQL | 5432 | - |
| Valkey/Redis | 6379 | - |
| MinIO | 9000/9001 | http://192.168.178.100:9001 |
| Mailpit | 8025 | http://192.168.178.100:8025 |
| Ollama | 11434 | http://192.168.178.100:11434/api/tags |
| Documentation | 8008 | http://192.168.178.100:8008 |

## LLM Models

- **Qwen 2.5 14B** (14.8 billion parameters)

## Scripts (on the MacBook)

```bash
./scripts/mac-mini/status.sh    # Check status
./scripts/mac-mini/sync.sh      # Synchronize code
./scripts/mac-mini/docker.sh    # Docker commands
./scripts/mac-mini/backup.sh    # Create a backup
```

## Docker Commands

```bash
./scripts/mac-mini/docker.sh ps             # List containers
./scripts/mac-mini/docker.sh logs backend   # Logs
./scripts/mac-mini/docker.sh restart        # Restart
./scripts/mac-mini/docker.sh build          # Build image
```

## LaunchAgents (Auto-Start)

Path on the Mac Mini: `~/Library/LaunchAgents/`

| Agent | Function |
|-------|----------|
| `com.docker.desktop.plist` | Docker Desktop |
| `com.breakpilot.docker-containers.plist` | Container auto-start |
| `com.ollama.serve.plist` | Ollama Server |
| `com.unity.hub.plist` | Unity Hub |
| `com.microsoft.vscode.plist` | VS Code |

## Project Paths

- **MacBook:** `/Users/benjaminadmin/Projekte/breakpilot-pwa/`
- **Mac Mini:** `/Users/benjaminadmin/Projekte/breakpilot-pwa/`

## Troubleshooting

### Docker onboarding appears again

The Docker settings are backed up in `~/docker-settings-backup/`.

```bash
# Restore:
cp -r ~/docker-settings-backup/* ~/Library/Group\ Containers/group.com.docker/
```

### Containers do not start automatically

Check the log:

```bash
ssh benjaminadmin@192.168.178.100 "cat /tmp/docker-autostart.log"
```

Start manually:

```bash
./scripts/mac-mini/docker.sh up
```

### SSH not reachable

- Check whether the Mac Mini is powered on (ping: `ping 192.168.178.100`)
- Wait 1-2 minutes after boot
- Check the network connection
124
docs-src/index.md
Normal file
@@ -0,0 +1,124 @@
# Breakpilot Documentation

Welcome to the central documentation of the Breakpilot project.

## What Is Breakpilot?

Breakpilot is a GDPR-compliant education platform for teachers with the following core features:

- **Consent management** - Privacy-compliant management of consents
- **AI-assisted exam grading** - Automatic grading suggestions for Abitur exams
- **Report card generation** - Workflow-based creation of report cards with a role concept
- **Learning material generator** - Multiple-choice tests, cloze texts, mind maps, quizzes
- **Parent letters** - GFK-based communication with PDF export

## Quick Start

<div class="grid cards" markdown>

-   :material-rocket-launch:{ .lg .middle } **First Steps**

    ---

    Set up the development environment and start the project.

    [:octicons-arrow-right-24: Set up the environment](getting-started/environment-setup.md)

-   :material-server:{ .lg .middle } **Mac Mini Setup**

    ---

    Headless server configuration for the development server.

    [:octicons-arrow-right-24: Mac Mini Setup](getting-started/mac-mini-setup.md)

</div>

## Architecture

<div class="grid cards" markdown>

-   :material-sitemap:{ .lg .middle } **System Architecture**

    ---

    Overview of all components and how they interact.

    [:octicons-arrow-right-24: Architecture](architecture/system-architecture.md)

-   :material-shield-lock:{ .lg .middle } **Auth System**

    ---

    Hybrid authentication with Keycloak and local JWT.

    [:octicons-arrow-right-24: Auth System](architecture/auth-system.md)

-   :material-robot:{ .lg .middle } **Multi-Agent System**

    ---

    Distributed agent architecture for AI features.

    [:octicons-arrow-right-24: Multi-Agent](architecture/multi-agent.md)

-   :material-key-chain:{ .lg .middle } **Secrets Management**

    ---

    HashiCorp Vault integration for secure credentials.

    [:octicons-arrow-right-24: Secrets](architecture/secrets-management.md)

</div>

## Services

| Service | Port | Description |
|---------|------|-------------|
| [Backend (Python)](api/backend-api.md) | 8000 | FastAPI backend with panel UI |
| [Consent Service (Go)](architecture/auth-system.md) | 8081 | GDPR-compliant consent management |
| [Klausur Service](services/klausur-service/index.md) | 8086 | AI-assisted exam grading |
| [Agent Core](services/agent-core/index.md) | - | Multi-agent infrastructure |
| PostgreSQL | 5432 | Relational database |
| Qdrant | 6333 | Vector database for RAG |
| MinIO | 9000 | Object storage |
| Vault | 8200 | Secrets management |

## Development

- [Testing](development/testing.md) - Test standards and execution
- [Documentation](development/documentation.md) - Documentation guidelines
- [DevSecOps](architecture/devsecops.md) - Security pipeline
- [Environments](architecture/environments.md) - Dev/Staging/Prod

## Further Resources

- **Repository**: Internal GitLab
- **Issue Tracker**: GitLab Issues
- **API Playground**: [http://macmini:8000/docs](http://macmini:8000/docs)

---

## Project Structure

```
breakpilot-pwa/
├── backend/                # Python FastAPI backend
├── consent-service/        # Go consent service
├── klausur-service/        # Exam grading service
├── agent-core/             # Multi-agent infrastructure
├── voice-service/          # Voice/audio processing
├── website/                # Next.js frontend
├── studio-v2/              # Admin dashboard (Next.js)
├── docs-src/               # This documentation
└── docker-compose.yml      # Container orchestration
```

## Support

If you have questions or problems:

1. First check the relevant documentation
2. Search the issue tracker for similar problems
3. Create a new issue with a detailed description
420
docs-src/services/agent-core/index.md
Normal file
@@ -0,0 +1,420 @@
# Breakpilot Agent Core

Multi-agent architecture infrastructure for Breakpilot.

## Overview

The `agent-core` module provides the shared infrastructure for Breakpilot's multi-agent system:

- **Session Management**: Agent sessions with checkpoints and recovery
- **Shared Brain**: Long-term memory and context management
- **Orchestration**: Message bus, supervisor, and task routing

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       Breakpilot Services                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │Voice Service│  │Klausur Svc  │  │  Admin-v2 / AlertAgent  │  │
│  └──────┬──────┘  └──────┬──────┘  └───────────┬─────────────┘  │
│         │                │                     │                │
│         └────────────────┼─────────────────────┘                │
│                          │                                      │
│  ┌───────────────────────▼───────────────────────────────────┐  │
│  │                        Agent Core                         │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────────┐  │  │
│  │  │  Sessions   │  │Shared Brain │  │   Orchestrator    │  │  │
│  │  │ - Manager   │  │ - Memory    │  │ - Message Bus     │  │  │
│  │  │ - Heartbeat │  │ - Context   │  │ - Supervisor      │  │  │
│  │  │ - Checkpoint│  │ - Knowledge │  │ - Task Router     │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                          │                                      │
│  ┌───────────────────────▼───────────────────────────────────┐  │
│  │                      Infrastructure                       │  │
│  │     Valkey (Redis)       PostgreSQL       Qdrant          │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

## Directory Structure

```
agent-core/
├── __init__.py                 # Module exports
├── README.md                   # This file
├── requirements.txt            # Python dependencies
├── pytest.ini                  # Test configuration
│
├── soul/                       # Agent SOUL files (personalities)
│   ├── tutor-agent.soul.md
│   ├── grader-agent.soul.md
│   ├── quality-judge.soul.md
│   ├── alert-agent.soul.md
│   └── orchestrator.soul.md
│
├── brain/                      # Shared Brain implementation
│   ├── __init__.py
│   ├── memory_store.py         # Long-term memory
│   ├── context_manager.py      # Conversation context
│   └── knowledge_graph.py      # Entity relationships
│
├── sessions/                   # Session management
│   ├── __init__.py
│   ├── session_manager.py      # Session lifecycle
│   ├── heartbeat.py            # Liveness monitoring
│   └── checkpoint.py           # Recovery checkpoints
│
├── orchestrator/               # Multi-agent orchestration
│   ├── __init__.py
│   ├── message_bus.py          # Inter-agent communication
│   ├── supervisor.py           # Agent supervision
│   └── task_router.py          # Intent-based routing
│
└── tests/                      # Unit tests
    ├── conftest.py
    ├── test_session_manager.py
    ├── test_heartbeat.py
    ├── test_message_bus.py
    ├── test_memory_store.py
    └── test_task_router.py
```

## Components

### 1. Session Management

Manages agent sessions with a state machine and recovery capabilities.

```python
from agent_core.sessions import SessionManager, AgentSession

# Create the session manager
manager = SessionManager(
    redis_client=redis,
    db_pool=pg_pool,
    namespace="breakpilot"
)

# Create a session
session = await manager.create_session(
    agent_type="tutor-agent",
    user_id="user-123",
    context={"subject": "math"}
)

# Set a checkpoint
session.checkpoint("task_started", {"task_id": "abc"})

# End the session
session.complete({"result": "success"})
```

**Session states:**

- `ACTIVE` - Session is running
- `PAUSED` - Session is paused
- `COMPLETED` - Session finished successfully
- `FAILED` - Session failed

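
The four session states can be modelled as a small state machine. The sketch below is illustrative only: the set of allowed transitions is an assumption (terminal states cannot be left, a paused session can resume or fail), and `SessionState`, `ALLOWED`, and `transition` are hypothetical names rather than the `agent_core` API.

```python
from enum import Enum


class SessionState(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions: COMPLETED and FAILED are terminal.
ALLOWED = {
    SessionState.ACTIVE: {SessionState.PAUSED, SessionState.COMPLETED, SessionState.FAILED},
    SessionState.PAUSED: {SessionState.ACTIVE, SessionState.FAILED},
    SessionState.COMPLETED: set(),
    SessionState.FAILED: set(),
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the new state, or raise ValueError for an illegal transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the transition table in one dictionary makes illegal state changes fail loudly instead of silently corrupting a session.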
### 2. Heartbeat Monitoring

Monitors agent liveness and triggers recovery on timeout.

```python
from agent_core.sessions import HeartbeatMonitor, HeartbeatClient

# Start the monitor
monitor = HeartbeatMonitor(
    timeout_seconds=30,
    check_interval_seconds=5,
    max_missed_beats=3
)
await monitor.start_monitoring()

# Register an agent
monitor.register("agent-1", "tutor-agent")

# Send heartbeats
async with HeartbeatClient("agent-1", monitor) as client:
    # Agent work...
    pass
```

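
How `timeout_seconds` and `max_missed_beats` interact is not spelled out here. One plausible reading, sketched below under that assumption, is that each check cycle counts a missed beat when the last heartbeat is older than the timeout, and an agent is reported once the counter reaches the limit. `MiniHeartbeatMonitor` is a hypothetical, synchronous stand-in, not the real `HeartbeatMonitor`.

```python
import time
from typing import Callable, Dict, List


class MiniHeartbeatMonitor:
    """Illustrative liveness tracker with an injectable clock for testing."""

    def __init__(self, timeout_seconds: float, max_missed_beats: int,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout_seconds
        self.max_missed = max_missed_beats
        self.clock = clock
        self.last_beat: Dict[str, float] = {}
        self.missed: Dict[str, int] = {}

    def register(self, agent_id: str) -> None:
        self.last_beat[agent_id] = self.clock()
        self.missed[agent_id] = 0

    def beat(self, agent_id: str) -> None:
        # A heartbeat refreshes the timestamp and resets the miss counter.
        self.register(agent_id)

    def check(self) -> List[str]:
        """Run one check cycle; return agents that reached max_missed_beats."""
        now = self.clock()
        dead = []
        for agent_id, last in self.last_beat.items():
            if now - last > self.timeout:
                self.missed[agent_id] += 1
                if self.missed[agent_id] >= self.max_missed:
                    dead.append(agent_id)
            else:
                self.missed[agent_id] = 0
        return dead
```

In a real monitor, `check()` would run every `check_interval_seconds`, and the returned agent IDs would be handed to the recovery logic.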
### 3. Memory Store

Long-term memory for agents with TTL and access tracking.

```python
from agent_core.brain import MemoryStore

store = MemoryStore(redis_client=redis, db_pool=pg_pool)

# Store a memory
await store.remember(
    key="evaluation:math:student-1",
    value={"score": 85, "feedback": "Gut gemacht!"},
    agent_id="grader-agent",
    ttl_days=30
)

# Retrieve a memory
result = await store.recall("evaluation:math:student-1")

# Search by pattern
similar = await store.search("evaluation:math:*")
```

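
To make the TTL and access-tracking semantics concrete, here is a minimal in-memory sketch. It is not the Redis/PostgreSQL-backed `MemoryStore`; `InMemoryStore` and its synchronous methods are hypothetical, and glob-style matching via `fnmatch` is an assumption about how key patterns behave.

```python
import fnmatch
import time
from typing import Any, Callable, Dict, List, Optional


class InMemoryStore:
    """Illustrative key-value memory with expiry and per-key access counts."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._items: Dict[str, Dict[str, Any]] = {}

    def _expired(self, item: Dict[str, Any]) -> bool:
        return item["expires_at"] is not None and self._clock() >= item["expires_at"]

    def remember(self, key: str, value: Any, ttl_days: Optional[float] = None) -> None:
        expires_at = None if ttl_days is None else self._clock() + ttl_days * 86400
        self._items[key] = {"value": value, "expires_at": expires_at, "access_count": 0}

    def recall(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None or self._expired(item):
            self._items.pop(key, None)  # lazily evict expired entries
            return None
        item["access_count"] += 1
        return item["value"]

    def access_count(self, key: str) -> int:
        item = self._items.get(key)
        return item["access_count"] if item else 0

    def search(self, pattern: str) -> List[str]:
        # Returns matching, non-expired keys; searching does not count as access.
        return sorted(key for key, item in self._items.items()
                      if fnmatch.fnmatchcase(key, pattern) and not self._expired(item))
```

Injecting the clock keeps expiry testable without waiting for real time to pass.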
### 4. Context Manager

Manages conversation context with automatic compression.

```python
from agent_core.brain import ContextManager, MessageRole

ctx_manager = ContextManager(redis_client=redis)

# Create a context
context = ctx_manager.create_context(
    session_id="session-123",
    system_prompt="Du bist ein hilfreicher Tutor...",
    max_messages=50
)

# Add messages
context.add_message(MessageRole.USER, "Was ist Photosynthese?")
context.add_message(MessageRole.ASSISTANT, "Photosynthese ist...")

# Format for the LLM API
messages = context.get_messages_for_llm()
```

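
The compression strategy itself is not described here. The simplest variant, shown below as an assumption, keeps the system prompt and only the most recent `max_messages` entries; a real implementation might summarize older messages instead. `MiniContext` is a hypothetical class for illustration.

```python
from collections import deque
from typing import Deque, Dict, List


class MiniContext:
    """Illustrative context that drops the oldest messages once full."""

    def __init__(self, system_prompt: str, max_messages: int):
        self.system_prompt = system_prompt
        # deque(maxlen=...) silently discards the oldest entry when full
        self.messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def get_messages_for_llm(self) -> List[Dict[str, str]]:
        # The system prompt is always first and never evicted.
        return [{"role": "system", "content": self.system_prompt}, *self.messages]
```

Keeping the system prompt outside the bounded queue guarantees that trimming never removes the agent's instructions.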
### 5. Message Bus

Inter-agent communication via Redis Pub/Sub.

```python
from agent_core.orchestrator import MessageBus, AgentMessage, MessagePriority

bus = MessageBus(redis_client=redis)
await bus.start()

# Register a handler
async def handle_message(msg):
    return {"status": "processed"}

await bus.subscribe("grader-agent", handle_message)

# Send a message
await bus.publish(AgentMessage(
    sender="orchestrator",
    receiver="grader-agent",
    message_type="grade_request",
    payload={"exam_id": "exam-1"},
    priority=MessagePriority.HIGH
))

# Request-response pattern
response = await bus.request(message, timeout=30.0)
```

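
A request-response pattern on top of a publish/subscribe channel is commonly built with correlation IDs: the sender parks a future under a fresh ID, and the reply resolves it. The sketch below shows the idea in-process with `asyncio` only; `InProcessBus` is hypothetical and does not use Redis or the real `MessageBus` API.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict

Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


class InProcessBus:
    """Illustrative bus: one handler per receiver, replies matched by correlation ID."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._pending: Dict[str, asyncio.Future] = {}

    def subscribe(self, receiver: str, handler: Handler) -> None:
        self._handlers[receiver] = handler

    async def _deliver(self, receiver: str, payload: Dict[str, Any], correlation_id: str) -> None:
        reply = await self._handlers[receiver](payload)
        future = self._pending.pop(correlation_id, None)
        if future is not None and not future.done():
            future.set_result(reply)

    async def request(self, receiver: str, payload: Dict[str, Any],
                      timeout: float = 30.0) -> Dict[str, Any]:
        if receiver not in self._handlers:
            raise LookupError(f"no handler subscribed for {receiver!r}")
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = future
        task = asyncio.create_task(self._deliver(receiver, payload, correlation_id))
        try:
            # Raises asyncio.TimeoutError if no reply arrives in time
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(correlation_id, None)
            task.cancel()  # no-op if the handler already finished
```

The `correlation_id` column in the `agent_messages` table suggests the real bus follows a similar scheme, though that is an inference rather than something this page confirms.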
### 6. Agent Supervisor

Monitors and coordinates all agents.

```python
from agent_core.orchestrator import AgentSupervisor, RestartPolicy

supervisor = AgentSupervisor(message_bus=bus, heartbeat_monitor=monitor)

# Register an agent
await supervisor.register_agent(
    agent_id="tutor-1",
    agent_type="tutor-agent",
    restart_policy=RestartPolicy.ON_FAILURE,
    max_restarts=3,
    capacity=10
)

# Start the agent
await supervisor.start_agent("tutor-1")

# Load balancing
available = supervisor.get_available_agent("tutor-agent")
```

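
How `get_available_agent` chooses among several agents of the same type is not documented here. A common approach, sketched below as an assumption, is to pick the agent with the lowest utilization that still has free capacity. `AgentInfo` and `pick_least_loaded` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AgentInfo:
    agent_id: str
    agent_type: str
    capacity: int      # maximum number of concurrent tasks
    active_tasks: int  # tasks currently running


def pick_least_loaded(agents: Iterable[AgentInfo], agent_type: str) -> Optional[AgentInfo]:
    """Return the agent of the given type with the lowest utilization, or None."""
    candidates = [
        agent for agent in agents
        if agent.agent_type == agent_type and agent.active_tasks < agent.capacity
    ]
    if not candidates:
        return None
    # Utilization is the fraction of capacity in use; capacity > 0 holds for all candidates.
    return min(candidates, key=lambda agent: agent.active_tasks / agent.capacity)
```

Comparing utilization rather than raw task counts keeps the choice fair when agents have different capacities.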
### 7. Task Router

Intent-based routing with fallback chains.

```python
from agent_core.orchestrator import TaskRouter, RoutingRule, RoutingStrategy

router = TaskRouter(supervisor=supervisor)

# Add a custom rule
router.add_rule(RoutingRule(
    intent_pattern="learning_*",
    agent_type="tutor-agent",
    priority=10,
    fallback_agent="orchestrator"
))

# Route a task
result = await router.route(
    intent="learning_math",
    context={"grade": 10},
    strategy=RoutingStrategy.LEAST_LOADED
)

if result.success:
    print(f"Routed to {result.agent_id}")
```

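
The matching step behind a rule such as `learning_*` can be sketched as follows. Two things are assumptions here: that patterns are glob-style, and that a higher `priority` value wins. `Rule` and `resolve` are hypothetical stand-ins for `RoutingRule` and the router's internal lookup.

```python
import fnmatch
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Rule:
    intent_pattern: str
    agent_type: str
    priority: int = 0
    fallback_agent: Optional[str] = None


def resolve(intent: str, rules: List[Rule], available: Set[str]) -> Optional[str]:
    """Return the agent type that should handle the intent, or None if nothing matches."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if not fnmatch.fnmatchcase(intent, rule.intent_pattern):
            continue
        if rule.agent_type in available:
            return rule.agent_type
        # Preferred agent is unavailable: try the rule's fallback before the next rule.
        if rule.fallback_agent in available:
            return rule.fallback_agent
    return None
```

For example, `resolve("learning_math", rules, {"orchestrator"})` falls back to the orchestrator when no tutor agent is available.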
## SOUL Files

SOUL files define the personality and behavioral rules of each agent.

| Agent | SOUL File | Responsibility |
|-------|-----------|----------------|
| TutorAgent | tutor-agent.soul.md | Learning support, answering questions |
| GraderAgent | grader-agent.soul.md | Exam grading, assessment |
| QualityJudge | quality-judge.soul.md | BQAS quality checks |
| AlertAgent | alert-agent.soul.md | Monitoring, notifications |
| Orchestrator | orchestrator.soul.md | Task coordination |

## Database Schema

### agent_sessions

```sql
CREATE TABLE agent_sessions (
    id UUID PRIMARY KEY,
    agent_type VARCHAR(50) NOT NULL,
    user_id UUID REFERENCES users(id),
    state VARCHAR(20) NOT NULL DEFAULT 'active',
    context JSONB DEFAULT '{}',
    checkpoints JSONB DEFAULT '[]',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    last_heartbeat TIMESTAMPTZ DEFAULT NOW()
);
```

### agent_memory

```sql
CREATE TABLE agent_memory (
    id UUID PRIMARY KEY,
    namespace VARCHAR(100) NOT NULL,
    key VARCHAR(500) NOT NULL,
    value JSONB NOT NULL,
    agent_id VARCHAR(50) NOT NULL,
    access_count INTEGER DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ,
    UNIQUE(namespace, key)
);
```

### agent_messages

```sql
CREATE TABLE agent_messages (
    id UUID PRIMARY KEY,
    sender VARCHAR(50) NOT NULL,
    receiver VARCHAR(50) NOT NULL,
    message_type VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    priority INTEGER DEFAULT 1,
    correlation_id UUID,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

## Integration

### With the Voice Service

```python
from services.enhanced_task_orchestrator import EnhancedTaskOrchestrator

orchestrator = EnhancedTaskOrchestrator(
    redis_client=redis,
    db_pool=pg_pool
)

await orchestrator.start()

# Session for a voice interaction
session = await orchestrator.create_session(
    voice_session_id="voice-123",
    user_id="teacher-1"
)

# Process a task (uses multi-agent when needed)
await orchestrator.process_task(task)
```

### With BQAS

```python
from bqas.quality_judge_agent import QualityJudgeAgent

judge = QualityJudgeAgent(
    message_bus=bus,
    memory_store=memory
)

await judge.start()

# Direct evaluation
result = await judge.evaluate(
    response="Der Satz des Pythagoras...",
    task_type="learning_math",
    context={"user_input": "Was ist Pythagoras?"}
)

if result["verdict"] == "production_ready":
    # Response is OK
    pass
```

## Tests
|
||||
|
||||
```bash
|
||||
# In agent-core Verzeichnis
|
||||
cd agent-core
|
||||
|
||||
# Alle Tests ausfuehren
|
||||
pytest -v
|
||||
|
||||
# Mit Coverage
|
||||
pytest --cov=. --cov-report=html
|
||||
|
||||
# Einzelnes Test-Modul
|
||||
pytest tests/test_session_manager.py -v
|
||||
|
||||
# Async-Tests
|
||||
pytest tests/test_message_bus.py -v
|
||||
```
|
||||
|
||||
## Metrics

agent-core exports the following metrics:

| Metric | Description |
|--------|-------------|
| `agent_session_duration_seconds` | Duration of agent sessions |
| `agent_heartbeat_delay_seconds` | Time since the last heartbeat |
| `agent_message_latency_ms` | Latency of inter-agent communication |
| `agent_memory_access_total` | Memory accesses per agent |
| `agent_error_total` | Errors per agent type |
|
||||
## Next Steps

1. **Run the migration**: `psql -f backend/migrations/add_agent_core_tables.sql`
2. **Extend the Voice Service**: enable the Enhanced Orchestrator
3. **Integrate BQAS**: start the Quality Judge Agent
4. **Set up monitoring**: integrate the metrics into Grafana
947
docs-src/services/ai-compliance-sdk/ARCHITECTURE.md
Normal file
@@ -0,0 +1,947 @@
|
||||
# UCCA - Use-Case Compliance & Feasibility Advisor
|
||||
|
||||
## System Architecture

### 1. Overview

The UCCA system is a **deterministic compliance assessment system** for AI use cases. It combines rule-based evaluation with optional LLM explanations and semantic search over legal texts.
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ UCCA System │
|
||||
├─────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Frontend │───>│ SDK API │───>│ PostgreSQL │ │
|
||||
│ │ (Next.js) │ │ (Go) │ │ Database │ │
|
||||
│ └──────────────┘ └──────┬───────┘ └──────────────┘ │
|
||||
│ │ │
|
||||
│ ┌────────────────────┼────────────────────┐ │
|
||||
│ │ │ │ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Policy │ │ Escalation │ │ Legal RAG │ │
|
||||
│ │ Engine │ │ Workflow │ │ (Qdrant) │ │
|
||||
│ │ (45 Regeln) │ │ (E0-E3) │ │ 2,274 Chunks │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
│ │ │ │ │
|
||||
│ └────────────────────┴────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌──────────────┐ │
|
||||
│ │ LLM Provider │ │
|
||||
│ │ (Ollama/API) │ │
|
||||
│ └──────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 2. Core Principle

> **"The LLM is NOT the source of truth. Truth = rules + evidence. LLM = translator + subsumption aid."**

The system follows a strict **human-in-the-loop** approach:

1. **Deterministic rules** make all compliance decisions
2. The **LLM** only explains results and never overrides BLOCK decisions
3. **Humans** (data protection officer "DSB", Legal) make the final decisions in critical cases
|
||||
---
|
||||
|
||||
## 3. Components

### 3.1 Policy Engine (`internal/ucca/rules.go`)

The Policy Engine evaluates use cases against ~45 deterministic rules.
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Policy Engine │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ UseCaseIntake ──────────────────────────────────────────────> │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Regelkategorien (A-J) │ │
|
||||
│ ├─────────────────────────────────────────────────────────────┤ │
|
||||
│ │ A. Datenklassifikation │ R-001 bis R-006 │ 6 Regeln │ │
|
||||
│ │ B. Zweck & Kontext │ R-010 bis R-013 │ 4 Regeln │ │
|
||||
│ │ C. Automatisierung │ R-020 bis R-025 │ 6 Regeln │ │
|
||||
│ │ D. Training vs Nutzung │ R-030 bis R-035 │ 6 Regeln │ │
|
||||
│ │ E. Speicherung │ R-040 bis R-042 │ 3 Regeln │ │
|
||||
│ │ F. Hosting │ R-050 bis R-052 │ 3 Regeln │ │
|
||||
│ │ G. Transparenz │ R-060 bis R-062 │ 3 Regeln │ │
|
||||
│ │ H. Domain-spezifisch │ R-070 bis R-074 │ 5 Regeln │ │
|
||||
│ │ I. Aggregation │ R-090 bis R-092 │ 3 Regeln │ │
|
||||
│ │ J. Erklärung │ R-100 │ 1 Regel │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ AssessmentResult │
|
||||
│ ├── feasibility: YES | CONDITIONAL | NO │
|
||||
│ ├── risk_score: 0-100 │
|
||||
│ ├── risk_level: MINIMAL | LOW | MEDIUM | HIGH | CRITICAL │
|
||||
│ ├── triggered_rules: []TriggeredRule │
|
||||
│ ├── required_controls: []RequiredControl │
|
||||
│ ├── recommended_architecture: []PatternRecommendation │
|
||||
│ └── forbidden_patterns: []ForbiddenPattern │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Rule severities:**
- `INFO`: informational, no impact on the risk score
- `WARN`: warning, increases the risk score
- `BLOCK`: critical, results in `feasibility=NO`
|
||||
### 3.2 Escalation Workflow (`internal/ucca/escalation_*.go`)
|
||||
|
||||
The escalation system routes critical assessments to human review.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Escalation Workflow │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ AssessmentResult ─────────────────────────────────────────────> │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Escalation Level Determination │ │
|
||||
│ ├─────────────────────────────────────────────────────────────┤ │
|
||||
│ │ │ │
|
||||
│ │ E0: Nur INFO-Regeln, Risk < 20 │ │
|
||||
│ │ → Auto-Approve, keine menschliche Prüfung │ │
|
||||
│ │ │ │
|
||||
│ │ E1: WARN-Regeln, Risk 20-39 │ │
|
||||
│ │ → Team-Lead Review (SLA: 24h) │ │
|
||||
│ │ │ │
|
||||
│ │ E2: Art.9 Daten ODER Risk 40-59 ODER DSFA empfohlen │ │
|
||||
│ │ → DSB Consultation (SLA: 8h) │ │
|
||||
│ │ │ │
|
||||
│ │ E3: BLOCK-Regel ODER Risk ≥60 ODER Art.22 Risiko │ │
|
||||
│ │ → DSB + Legal Review (SLA: 4h) │ │
|
||||
│ │ │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ DSB Pool Assignment │ │
|
||||
│ ├─────────────────────────────────────────────────────────────┤ │
|
||||
│ │ Role │ Level │ Max Concurrent │ Auto-Assign │ │
|
||||
│ │ ──────────────┼───────┼────────────────┼────────────────── │ │
|
||||
│ │ team_lead │ E1 │ 10 │ Round-Robin │ │
|
||||
│ │ dsb │ E2,E3 │ 5 │ Workload-Based │ │
|
||||
│ │ legal │ E3 │ 3 │ Workload-Based │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ Escalation Status Flow: │
|
||||
│ │
|
||||
│ pending → assigned → in_review → approved/rejected/returned │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
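
The SLA targets in the diagram above can be turned into a review deadline such as the `due_date` column of an escalation. The following is a minimal sketch in Python (the SDK itself is written in Go); the names `SLA_HOURS` and `sla_due_date` are illustrative assumptions, only the hour values are taken from the diagram.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# SLA per escalation level in hours, as listed in the diagram above.
# E0 is auto-approved and therefore has no review deadline.
SLA_HOURS = {"E1": 24, "E2": 8, "E3": 4}


def sla_due_date(level: str, created_at: datetime) -> Optional[datetime]:
    """Return the review deadline for an escalation, or None for E0."""
    if level == "E0":
        return None
    return created_at + timedelta(hours=SLA_HOURS[level])


# Usage: an E2 escalation created now is due eight hours later.
due = sla_due_date("E2", datetime.now(timezone.utc))
```

Keeping the hours in one lookup table means an SLA change touches a single place; an unknown level raises a `KeyError` instead of silently receiving a default deadline.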
|
||||
|
||||
### 3.3 Legal RAG (`internal/llm/legal_rag.go`)
|
||||
|
||||
Semantic search across 19 EU regulations for context-based explanations.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Legal RAG System │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Explain Request ──────────────────────────────────────────────> │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Qdrant Vector DB │ │
|
||||
│ │ Collection: bp_legal_corpus │ │
|
||||
│ │ 2,274 Chunks, 1024-dim BGE-M3 │ │
|
||||
│ ├─────────────────────────────────────────────────────────────┤ │
|
||||
│ │ │ │
|
||||
│ │ EU-Verordnungen: │ │
|
||||
│ │ ├── DSGVO (128) ├── AI Act (96) ├── NIS2 (128) │ │
|
||||
│ │ ├── CRA (256) ├── Data Act (256) ├── DSA (256) │ │
|
||||
│ │ ├── DGA (32) ├── EUCSA (32) ├── DPF (714) │ │
|
||||
│ │ └── ... │ │
|
||||
│ │ │ │
|
||||
│ │ Deutsche Gesetze: │ │
|
||||
│ │ ├── TDDDG (1) ├── SCC (32) ├── ... │ │
|
||||
│ │ │ │
|
||||
│ │ BSI-Standards: │ │
|
||||
│ │ ├── TR-03161-1 (6) ├── TR-03161-2 (6) ├── TR-03161-3 │ │
|
||||
│ │ │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ │ Hybrid Search (Dense + Sparse) │
|
||||
│ │ Re-Ranking (Cross-Encoder) │
|
||||
│ ▼ │
|
||||
│ Top-K Relevant Passages ─────────────────────────────────────> │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ LLM Explanation │ │
|
||||
│ │ Provider: Ollama (local) / Anthropic (fallback) │ │
|
||||
│ │ Prompt: Assessment + Legal Context → Erklärung │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Data Flow

### 4.1 Assessment Creation
|
||||
```
|
||||
User Input (Frontend)
|
||||
│
|
||||
▼
|
||||
POST /sdk/v1/ucca/assess
|
||||
│
|
||||
├──────────────────────────────────────────┐
|
||||
│ │
|
||||
▼ ▼
|
||||
┌──────────────┐ ┌──────────────┐
|
||||
│ Policy │ │ Escalation │
|
||||
│ Engine │ │ Trigger │
|
||||
│ Evaluation │ │ Check │
|
||||
└──────┬───────┘ └──────┬───────┘
|
||||
│ │
|
||||
│ AssessmentResult │ EscalationLevel
|
||||
│ │
|
||||
▼ ▼
|
||||
┌──────────────────────────────────────────────────────┐
|
||||
│ PostgreSQL │
|
||||
│ ├── ucca_assessments (Assessment + Result) │
|
||||
│ └── ucca_escalations (wenn Level > E0) │
|
||||
└──────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ If Level > E0
|
||||
▼
|
||||
┌──────────────┐
|
||||
│ DSB Pool │
|
||||
│ Auto-Assign │
|
||||
└──────────────┘
|
||||
│
|
||||
▼
|
||||
Notification (E-Mail/Webhook)
|
||||
```
|
||||
|
||||
### 4.2 Explanation with Legal RAG
|
||||
|
||||
```
|
||||
POST /sdk/v1/ucca/assessments/:id/explain
|
||||
│
|
||||
▼
|
||||
┌──────────────┐
|
||||
│ Load │
|
||||
│ Assessment │
|
||||
└──────┬───────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────┐ Query Vector ┌──────────────┐
|
||||
│ Extract │ ──────────────────>│ Qdrant │
|
||||
│ Keywords │ │ bp_legal_ │
|
||||
│ from Rules │<───────────────────│ corpus │
|
||||
└──────┬───────┘ Top-K Docs └──────────────┘
|
||||
│
|
||||
│ Assessment + Legal Context
|
||||
▼
|
||||
┌──────────────┐
|
||||
│ LLM │
|
||||
│ Provider │
|
||||
│ Registry │
|
||||
└──────┬───────┘
|
||||
│
|
||||
▼
|
||||
Explanation (DE) + Legal References
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Decision Diagrams

### 5.1 Feasibility Decision
|
||||
```
|
||||
UseCaseIntake
|
||||
│
|
||||
▼
|
||||
┌─────────────────────┐
|
||||
│ Hat BLOCK-Regeln? │
|
||||
└──────────┬──────────┘
|
||||
│ │
|
||||
Ja Nein
|
||||
│ │
|
||||
▼ ▼
|
||||
┌───────────┐ ┌─────────────────────┐
|
||||
│ NO │ │ Hat WARN-Regeln? │
|
||||
│ (blocked) │ └──────────┬──────────┘
|
||||
└───────────┘ │ │
|
||||
Ja Nein
|
||||
│ │
|
||||
▼ ▼
|
||||
┌───────────┐ ┌───────────┐
|
||||
│CONDITIONAL│ │ YES │
|
||||
│(mit │ │(grünes │
|
||||
│Auflagen) │ │Licht) │
|
||||
└───────────┘ └───────────┘
|
||||
```
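
The decision tree above reduces to two membership checks over the severities of all triggered rules. A minimal Python sketch follows; the function name `decide_feasibility` and the plain list of severity strings are assumptions for illustration, not the Go implementation in `internal/ucca/rules.go`.

```python
def decide_feasibility(severities):
    """Map the severities of all triggered rules to a feasibility verdict."""
    if "BLOCK" in severities:
        return "NO"           # any BLOCK rule blocks the use case
    if "WARN" in severities:
        return "CONDITIONAL"  # allowed, but only with conditions
    return "YES"              # only INFO rules, or no rules at all


# Usage: one WARN rule among INFO rules yields a conditional verdict.
verdict = decide_feasibility(["INFO", "WARN", "INFO"])
```

The BLOCK check comes first on purpose: a single BLOCK rule wins regardless of how many WARN or INFO rules were triggered.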
|
||||
|
||||
### 5.2 Escalation Level Decision
|
||||
|
||||
```
|
||||
AssessmentResult
|
||||
│
|
||||
▼
|
||||
┌─────────────────────┐
|
||||
│ BLOCK-Regel oder │
|
||||
│ Art.22 Risiko? │
|
||||
└──────────┬──────────┘
|
||||
│ │
|
||||
Ja Nein
|
||||
│ │
|
||||
▼ │
|
||||
┌───────────┐ │
|
||||
│ E3 │ │
|
||||
│ DSB+Legal │ │
|
||||
└───────────┘ ▼
|
||||
┌─────────────────────┐
|
||||
│ Risk ≥40 oder │
|
||||
│ Art.9 Daten oder │
|
||||
│ DSFA empfohlen? │
|
||||
└──────────┬──────────┘
|
||||
│ │
|
||||
Ja Nein
|
||||
│ │
|
||||
▼ │
|
||||
┌───────────┐ │
|
||||
│ E2 │ │
|
||||
│ DSB │ │
|
||||
└───────────┘ ▼
|
||||
┌─────────────────────┐
|
||||
│ Risk ≥20 oder │
|
||||
│ WARN-Regeln? │
|
||||
└──────────┬──────────┘
|
||||
│ │
|
||||
Ja Nein
|
||||
│ │
|
||||
▼ ▼
|
||||
┌───────────┐ ┌───────────┐
|
||||
│ E1 │ │ E0 │
|
||||
│ Team-Lead │ │ Auto-OK │
|
||||
└───────────┘ └───────────┘
|
||||
```
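
The cascade above can be sketched as a chain of checks from the strictest level down. This is a hedged Python illustration with an assumed function name and keyword arguments; it also includes the `Risk ≥60` trigger for E3 that section 3.2 lists but the first box of this diagram omits.

```python
def escalation_level(risk_score, has_block=False, has_warn=False,
                     art9_data=False, art22_risk=False,
                     dsfa_recommended=False):
    """Return the escalation level E0..E3 for an assessment result."""
    # E3: BLOCK rule, Art. 22 risk, or risk score of 60 and above.
    if has_block or art22_risk or risk_score >= 60:
        return "E3"
    # E2: Art. 9 data, DSFA recommended, or risk score of 40 and above.
    if art9_data or dsfa_recommended or risk_score >= 40:
        return "E2"
    # E1: WARN rules or risk score of 20 and above.
    if has_warn or risk_score >= 20:
        return "E1"
    # E0: auto-approve, no human review.
    return "E0"
```

Checking from E3 downwards guarantees that the highest applicable level wins when several triggers apply at once.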
|
||||
|
||||
---
|
||||
|
||||
## 6. Database Schema
|
||||
|
||||
### 6.1 ucca_assessments
|
||||
|
||||
```sql
CREATE TABLE ucca_assessments (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    namespace_id UUID,
    title VARCHAR(500),
    policy_version VARCHAR(50) NOT NULL,
    status VARCHAR(50) DEFAULT 'completed',

    -- Input
    intake JSONB NOT NULL,
    use_case_text_stored BOOLEAN DEFAULT FALSE,
    use_case_text_hash VARCHAR(64),
    domain VARCHAR(50),

    -- Result
    feasibility VARCHAR(20) NOT NULL,
    risk_level VARCHAR(20) NOT NULL,
    risk_score INT NOT NULL DEFAULT 0,
    triggered_rules JSONB DEFAULT '[]',
    required_controls JSONB DEFAULT '[]',
    recommended_architecture JSONB DEFAULT '[]',
    forbidden_patterns JSONB DEFAULT '[]',
    example_matches JSONB DEFAULT '[]',

    -- Flags
    dsfa_recommended BOOLEAN DEFAULT FALSE,
    art22_risk BOOLEAN DEFAULT FALSE,
    training_allowed VARCHAR(50),

    -- Explanation
    explanation_text TEXT,
    explanation_generated_at TIMESTAMPTZ,
    explanation_model VARCHAR(100),

    -- Audit
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    created_by UUID NOT NULL
);
```
|
||||
|
||||
### 6.2 ucca_escalations
|
||||
|
||||
```sql
CREATE TABLE ucca_escalations (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    assessment_id UUID NOT NULL REFERENCES ucca_assessments(id),

    -- Level & Status
    escalation_level VARCHAR(10) NOT NULL,
    escalation_reason TEXT,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',

    -- Assignment
    assigned_to UUID,
    assigned_role VARCHAR(50),
    assigned_at TIMESTAMPTZ,

    -- Review
    reviewer_id UUID,
    reviewer_notes TEXT,
    reviewed_at TIMESTAMPTZ,

    -- Decision
    decision VARCHAR(50),
    decision_notes TEXT,
    decision_at TIMESTAMPTZ,
    conditions JSONB DEFAULT '[]',

    -- SLA
    due_date TIMESTAMPTZ,
    notification_sent BOOLEAN DEFAULT FALSE,
    notification_sent_at TIMESTAMPTZ,

    -- Audit
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```
|
||||
|
||||
### 6.3 ucca_dsb_pool
|
||||
|
||||
```sql
CREATE TABLE ucca_dsb_pool (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    user_id UUID NOT NULL,
    user_name VARCHAR(255) NOT NULL,
    user_email VARCHAR(255) NOT NULL,
    role VARCHAR(50) NOT NULL,
    is_active BOOLEAN DEFAULT TRUE,
    max_concurrent_reviews INT DEFAULT 10,
    current_reviews INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```
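
The columns `is_active`, `max_concurrent_reviews` and `current_reviews` suggest how a workload-based auto-assignment could pick a reviewer. The sketch below is one plausible reading, not the documented algorithm: it assumes that "workload-based" means "the active member of the requested role with the fewest open reviews, skipping members at their limit". `PoolMember` and `pick_reviewer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoolMember:
    """In-memory mirror of the relevant ucca_dsb_pool columns."""
    user_id: str
    role: str
    is_active: bool = True
    max_concurrent_reviews: int = 10
    current_reviews: int = 0


def pick_reviewer(pool: List[PoolMember], role: str) -> Optional[PoolMember]:
    """Pick the least-loaded active member of a role, or None if all are full."""
    candidates = [
        m for m in pool
        if m.is_active and m.role == role
        and m.current_reviews < m.max_concurrent_reviews
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.current_reviews)
```

Returning `None` rather than raising leaves the caller free to keep the escalation in `pending` until capacity frees up.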
|
||||
|
||||
---
|
||||
|
||||
## 7. API Endpoints

### 7.1 Assessment

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/sdk/v1/ucca/assess` | Create an assessment |
| GET | `/sdk/v1/ucca/assessments` | List assessments |
| GET | `/sdk/v1/ucca/assessments/:id` | Retrieve an assessment |
| DELETE | `/sdk/v1/ucca/assessments/:id` | Delete an assessment |
| POST | `/sdk/v1/ucca/assessments/:id/explain` | Generate an LLM explanation |
| GET | `/sdk/v1/ucca/export/:id` | Export an assessment |
|
||||
### 7.2 Catalogs

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/sdk/v1/ucca/patterns` | Architecture patterns |
| GET | `/sdk/v1/ucca/examples` | Didactic examples |
| GET | `/sdk/v1/ucca/rules` | All rules |
| GET | `/sdk/v1/ucca/controls` | Required controls |
| GET | `/sdk/v1/ucca/problem-solutions` | Problem solutions |
|
||||
### 7.3 Escalation

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/sdk/v1/ucca/escalations` | List escalations |
| GET | `/sdk/v1/ucca/escalations/:id` | Retrieve an escalation |
| POST | `/sdk/v1/ucca/escalations` | Manual escalation |
| POST | `/sdk/v1/ucca/escalations/:id/assign` | Assign |
| POST | `/sdk/v1/ucca/escalations/:id/review` | Start a review |
| POST | `/sdk/v1/ucca/escalations/:id/decide` | Make a decision |
| GET | `/sdk/v1/ucca/escalations/stats` | Statistics |
|
||||
### 7.4 DSB Pool

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/sdk/v1/ucca/dsb-pool` | List pool members |
| POST | `/sdk/v1/ucca/dsb-pool` | Add a member |
|
||||
---
|
||||
|
||||
## 8. Security

### 8.1 Authentication

- JWT-based authentication
- Headers: `X-User-ID`, `X-Tenant-ID`
- Multi-tenant isolation

### 8.2 Authorization

- RBAC (role-based access control)
- Permissions: `ucca:assess`, `ucca:review`, `ucca:admin`
- Namespace-level isolation

### 8.3 Data Protection

- Use-case text is optional (opt-in)
- SHA-256 hash instead of plaintext
- Audit trail for all operations
- Legal RAG: `training_allowed: false`
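
The opt-in and hashing rules map onto the `use_case_text_stored` and `use_case_text_hash` columns of `ucca_assessments`. A minimal Python sketch, assuming UTF-8 encoding and a hex digest (which fits the `VARCHAR(64)` column); the helper name and the `use_case_text` key are hypothetical.

```python
import hashlib


def use_case_text_fields(text, store_text=False):
    """Build the text-related fields for an assessment record.

    The SHA-256 hash is always computed; the plaintext is only kept
    when the user has opted in via store_text.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "use_case_text_stored": store_text,
        "use_case_text_hash": digest,
        "use_case_text": text if store_text else None,  # hypothetical field
    }
```

Storing only the hash still allows detecting that two assessments were created from the identical text without retaining the text itself.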
|
||||
|
||||
---
|
||||
|
||||
## 9. Deployment
|
||||
|
||||
### 9.1 Container
|
||||
|
||||
```yaml
ai-compliance-sdk:
  build: ./ai-compliance-sdk
  ports:
    - "8090:8090"
  environment:
    - DATABASE_URL=postgres://...
    - OLLAMA_URL=http://ollama:11434
    - QDRANT_URL=http://qdrant:6333
  depends_on:
    - postgres
    - qdrant
```
|
||||
|
||||
### 9.2 Dependencies

- PostgreSQL 15+
- Qdrant 1.12+
- Embedding service (BGE-M3)
- Ollama (optional, for the LLM)
|
||||
---
|
||||
|
||||
## 10. Monitoring
|
||||
|
||||
### 10.1 Health Check
|
||||
|
||||
```
|
||||
GET /sdk/v1/health
|
||||
→ {"status": "ok"}
|
||||
```
|
||||
|
||||
### 10.2 Metrics

- Assessment throughput
- Escalation SLA compliance
- LLM latency
- RAG hit quality
|
||||
---
|
||||
|
||||
## 11. Wizard & Legal Assistant
|
||||
|
||||
### 11.1 Wizard Architecture

The UCCA wizard guides users through 9 steps to capture all relevant compliance facts.
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ UCCA Wizard v1.1 │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Step 1: Grundlegende Informationen │
|
||||
│ Step 2: Datenarten (Personal Data, Art. 9, etc.) │
|
||||
│ Step 3: Verarbeitungszweck (Profiling, Scoring) │
|
||||
│ Step 4: Hosting & Provider │
|
||||
│ Step 5: Internationaler Datentransfer (SCC, TIA) │
|
||||
│ Step 6: KI-Modell und Training │
|
||||
│ Step 7: Verträge & Compliance (AVV, DSFA) │
|
||||
│ Step 8: Automatisierung & Human Oversight │
|
||||
│ Step 9: Standards & Normen (für Maschinenbauer) ← NEU │
|
||||
│ │
|
||||
│ Features: │
|
||||
│ ├── Adaptive Subflows (visible_if Conditions) │
|
||||
│ ├── Simple/Expert Mode Toggle │
|
||||
│ ├── Legal Assistant Chat pro Step │
|
||||
│ └── simple_explanation für Nicht-Juristen │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 11.2 Legal Assistant (Wizard Chat)
|
||||
|
||||
Integrated legal assistant for real-time help with wizard questions.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Legal Assistant Flow │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ User Question ─────────────────────────────────────────────────>│
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────┐ │
|
||||
│ │ Build RAG Query │ │
|
||||
│ │ + Step Context │ │
|
||||
│ └────────┬─────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────┐ Search ┌──────────────────┐ │
|
||||
│ │ Legal RAG │ ────────────>│ Qdrant │ │
|
||||
│ │ Client │ │ bp_legal_corpus │ │
|
||||
│ │ │<────────────│ + SCC Corpus │ │
|
||||
│ └────────┬─────────┘ Top-5 └──────────────────┘ │
|
||||
│ │ │
|
||||
│ │ Question + Legal Context │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────┐ │
|
||||
│ │ Internal 32B LLM │ │
|
||||
│ │ (Ollama) │ │
|
||||
│ │ temp=0.3 │ │
|
||||
│ └────────┬─────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ Answer + Sources + Related Fields │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**API endpoints:**

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/sdk/v1/ucca/wizard/schema` | Retrieve the wizard schema |
| POST | `/sdk/v1/ucca/wizard/ask` | Ask the Legal Assistant a question |
|
||||
---
|
||||
|
||||
## 12. License Policy Engine (Standards Compliance)
|
||||
|
||||
### 12.1 Overview

The License Policy Engine manages license and copyright compliance for standards and norms (DIN, ISO, VDI, etc.).
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ License Policy Engine │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ LicensedContentFacts ─────────────────────────────────────────>│
|
||||
│ │ │
|
||||
│ │ ├── present: bool │
|
||||
│ │ ├── publisher: DIN_MEDIA | VDI | ISO | ... │
|
||||
│ │ ├── license_type: SINGLE | NETWORK | ENTERPRISE | AI │
|
||||
│ │ ├── ai_use_permitted: YES | NO | UNKNOWN │
|
||||
│ │ ├── operation_mode: LINK | NOTES | FULLTEXT | TRAINING │
|
||||
│ │ └── proof_uploaded: bool │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐│
|
||||
│ │ Operation Mode Evaluation ││
|
||||
│ ├─────────────────────────────────────────────────────────────┤│
|
||||
│ │ ││
|
||||
│ │ LINK_ONLY ──────────── Always Allowed ───────────> OK ││
|
||||
│ │ NOTES_ONLY ─────────── Usually Allowed ──────────> OK ││
|
||||
│ │ FULLTEXT_RAG ────┬──── ai_use=YES + proof ───────> OK ││
|
||||
│ │ └──── else ─────────────────────> BLOCK ││
|
||||
│ │ TRAINING ────────┬──── AI_LICENSE + proof ───────> OK ││
|
||||
│ │ └──── else ─────────────────────> BLOCK ││
|
||||
│ │ ││
|
||||
│ └─────────────────────────────────────────────────────────────┘│
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ LicensePolicyResult │
|
||||
│ ├── allowed: bool │
|
||||
│ ├── effective_mode: string (may be downgraded) │
|
||||
│ ├── gaps: []LicenseGap │
|
||||
│ ├── required_controls: []LicenseControl │
|
||||
│ ├── stop_line: *StopLine (if hard blocked) │
|
||||
│ └── output_restrictions: *OutputRestrictions │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 12.2 Operation Modes

| Mode | Description | License requirement | Ingest | Output |
|------|-------------|---------------------|--------|--------|
| **LINK_ONLY** | References and checklists only | None | Metadata only | No quotes |
| **NOTES_ONLY** | Customer-authored summaries | Standard | Notes only | Paraphrases |
| **EXCERPT_ONLY** | Short quotes (right of quotation) | Standard + right of quotation | Notes | Max. 150 characters |
| **FULLTEXT_RAG** | Full text indexed | AI license + proof | Fulltext | Max. 500 characters |
| **TRAINING** | Model training | AI training license | Fulltext | N/A |
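
The output limits in the table can be enforced at the point where a passage is quoted back to the user. The following Python sketch is an illustration only: `MAX_QUOTE_CHARS` and `limit_quote` are assumed names, and treating every mode without a numeric limit (LINK_ONLY, NOTES_ONLY, TRAINING) as "no verbatim quote" is an interpretation of the table, not a documented rule.

```python
# Maximum verbatim quote length per operation mode, from the table above.
MAX_QUOTE_CHARS = {"EXCERPT_ONLY": 150, "FULLTEXT_RAG": 500}


def limit_quote(mode, quote):
    """Truncate a verbatim quote to the limit of the given operation mode.

    Modes without a numeric limit yield an empty string, i.e. no
    verbatim quote is emitted (assumption).
    """
    limit = MAX_QUOTE_CHARS.get(mode, 0)
    return quote[:limit]
```

A real implementation would probably cut at a sentence or word boundary; plain slicing is used here to keep the limit itself obvious.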
|
||||
|
||||
### 12.3 Publisher-Specific Rules

**DIN Media (formerly Beuth):**
- AI use currently prohibited (without explicit permission)
- AI license model planned from Q4/2025
- Crawlers/scrapers prohibited (terms and conditions)
- TDM reservation under §44b UrhG
|
||||
### 12.4 Stop-Lines (Hard Deny)
|
||||
|
||||
```
STOP_DIN_FULLTEXT_AI_NOT_ALLOWED
  IF:       publisher=DIN_MEDIA AND operation_mode in [FULLTEXT_RAG, TRAINING]
            AND ai_use_permitted in [NO, UNKNOWN]
  THEN:     BLOCKED
  FALLBACK: LINK_ONLY

STOP_TRAINING_WITHOUT_PROOF
  IF:   operation_mode=TRAINING AND proof_uploaded=false
  THEN: BLOCKED
```
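
The two stop-lines above translate directly into guard conditions. A hedged Python sketch follows; the function name `check_stop_line` and the returned dictionary shape are assumptions, while the conditions and identifiers are taken from the block above.

```python
from typing import Optional


def check_stop_line(publisher, operation_mode, ai_use_permitted,
                    proof_uploaded) -> Optional[dict]:
    """Return the first stop-line that applies, or None if none does."""
    if (publisher == "DIN_MEDIA"
            and operation_mode in ("FULLTEXT_RAG", "TRAINING")
            and ai_use_permitted in ("NO", "UNKNOWN")):
        # Hard deny, with a downgrade to LINK_ONLY as fallback.
        return {"id": "STOP_DIN_FULLTEXT_AI_NOT_ALLOWED",
                "fallback": "LINK_ONLY"}
    if operation_mode == "TRAINING" and not proof_uploaded:
        # Hard deny, no fallback mode is listed for this stop-line.
        return {"id": "STOP_TRAINING_WITHOUT_PROOF", "fallback": None}
    return None
```

Because the DIN check runs first, a DIN training request without permission reports the DIN stop-line and its `LINK_ONLY` fallback rather than the generic proof stop-line.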
|
||||
|
||||
---
|
||||
|
||||
## 13. SCC & Transfer Impact Assessment
|
||||
|
||||
### 13.1 Third-Country Transfer Assessment

The system supports the full assessment of international data transfers.
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ SCC/Transfer Assessment Flow │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ hosting.region ─────────────────────────────────────────────── │
|
||||
│ │ │
|
||||
│ ├── EU/EWR ────────────────────────────────> OK (no SCC) │
|
||||
│ │ │
|
||||
│ ├── Adequacy Country (UK, CH, JP) ─────────> OK (no SCC) │
|
||||
│ │ │
|
||||
│ └── Third Country (US, etc.) ──────────────────────────── │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐│
|
||||
│ │ USA: DPF-Zertifizierung prüfen ││
|
||||
│ │ ├── Zertifiziert ───> OK (SCC empfohlen als Backup) ││
|
||||
│ │ └── Nicht zertifiziert ───> SCC + TIA erforderlich ││
|
||||
│ └─────────────────────────────────────────────────────────┘│
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐│
|
||||
│ │ Transfer Impact Assessment (TIA) ││
|
||||
│ │ ├── Adequate ─────────────> Transfer OK ││
|
||||
│ │ ├── Adequate + Measures ──> + Technical Supplementary ││
|
||||
│ │ ├── Inadequate ───────────> Fix required ││
|
||||
│ │ └── Not Feasible ─────────> Transfer NOT allowed ││
|
||||
│ └─────────────────────────────────────────────────────────┘│
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
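
The first half of the flow above (up to the point where a TIA is started) can be sketched as a small classification function. This is a Python illustration under stated assumptions: the labels `EU_EEA`, `ADEQUACY` and `THIRD_COUNTRY`, the function name, and the return shape are hypothetical, and treating every non-US third country like a non-certified US recipient is an interpretation of the diagram.

```python
def transfer_requirements(region_class, country=None, dpf_certified=False):
    """Return which transfer safeguards the flow above asks for."""
    if region_class in ("EU_EEA", "ADEQUACY"):
        # EU/EEA and adequacy countries need no SCC.
        return {"scc": "not_needed", "tia_required": False}
    if region_class != "THIRD_COUNTRY":
        raise ValueError(f"unknown region class: {region_class}")
    if country == "US" and dpf_certified:
        # DPF-certified US recipient: OK, SCC recommended as a backup.
        return {"scc": "recommended", "tia_required": False}
    # Assumption: all other third-country transfers need SCC plus a TIA.
    return {"scc": "required", "tia_required": True}
```

The TIA outcome itself (adequate, adequate with measures, inadequate, not feasible) is a human assessment and is deliberately left out of the sketch.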
|
||||
|
||||
### 13.2 SCC Versions

- New SCCs (EU 2021/914): **mandatory** since 27 December 2022
- Old SCCs (pre-2021): **no longer valid**
|
||||
---
|
||||
|
||||
## 14. Controls Catalog
|
||||
|
||||
### 14.1 Overview

The Controls Catalog contains ~30 control building blocks with detailed instructions for action.
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Controls Catalog v1.0 │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Kategorien: │
|
||||
│ ├── DSGVO (Rechtsgrundlagen, Betroffenenrechte, Dokumentation) │
|
||||
│ ├── AI_Act (Transparenz, HITL, Risikoeinstufung) │
|
||||
│ ├── Technical (Verschlüsselung, Anonymisierung, PII-Gateway) │
|
||||
│ └── Contractual (AVV, SCC, TIA) │
|
||||
│ │
|
||||
│ Struktur pro Control: │
|
||||
│ ├── id: CTRL-xxx │
|
||||
│ ├── title: Kurztitel │
|
||||
│ ├── when_applicable: Wann erforderlich? │
|
||||
│ ├── what_to_do: Konkrete Handlungsschritte │
|
||||
│ ├── evidence_needed: Erforderliche Nachweise │
|
||||
│ ├── effort: low | medium | high │
|
||||
│ └── gdpr_ref: Rechtsgrundlage │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 14.2 Example Controls

| ID | Title | Category |
|----|-------|----------|
| CTRL-CONSENT-EXPLICIT | Explicit consent | DSGVO |
| CTRL-AI-TRANSPARENCY | AI transparency notice | AI_Act |
| CTRL-DSFA | Data protection impact assessment | DSGVO |
| CTRL-SCC | Standard contractual clauses | Contractual |
| CTRL-TIA | Transfer Impact Assessment | Contractual |
| CTRL-LICENSE-PROOF | License/rights proof | License |
| CTRL-LINK-ONLY-MODE | Evidence Navigator | License |
| CTRL-PII-GATEWAY | PII-Redaction Gateway | Technical |
|
||||
---
|
||||
|
||||
## 15. Policy Files

### 15.1 File Structure

```
policies/
├── ucca_policy_v1.yaml           # Main policy (rules, controls)
├── controls_catalog.yaml         # Detailed controls catalog
├── gap_mapping.yaml              # Facts → Gaps → Controls
├── wizard_schema_v1.yaml         # Wizard questions (9 steps)
├── scc_legal_corpus.yaml         # SCC/transfer legal texts
└── licensed_content_policy.yaml  # Standards license compliance (NEW)
```
|
||||
|
||||
### 15.2 Version Management

- Every assessment stores its `policy_version`
- Rule changes produce a new version
- The audit trail shows which policy version was used
|
||||
---
|
||||
|
||||
## 16. Generic Obligations Framework
|
||||
|
||||
### 16.1 Overview

The Generic Obligations Framework automatically derives regulatory obligations from multiple regulations, based on facts about the company.
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Generic Obligations Framework │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ UnifiedFacts ───────────────────────────────────────────────── │
|
||||
│ │ │
|
||||
│ │ ├── organization: EmployeeCount, Revenue, Country │
|
||||
│ │ ├── sector: PrimarySector, IsKRITIS, SpecialServices │
|
||||
│ │ ├── data_protection: ProcessesPersonalData │
|
||||
│ │ └── ai_usage: UsesAI, HighRiskCategories, IsGPAI │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐│
|
||||
│ │ Obligations Registry ││
|
||||
│ │ (Module Registration & Evaluation) ││
|
||||
│ └──────────────────────────┬──────────────────────────────────┘│
|
||||
│ │ │
|
||||
│ ┌───────────────────┼───────────────────┐ │
|
||||
│ │ │ │ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
|
||||
│ │ NIS2 │ │ DSGVO │ │ AI Act │ │
|
||||
│ │ Module │ │ Module │ │ Module │ │
|
||||
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
|
||||
│ │ │ │ │
|
||||
│ └───────────────────┴───────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐│
|
||||
│ │ ManagementObligationsOverview ││
|
||||
│ │ ├── ApplicableRegulations[] ││
|
||||
│ │ ├── Obligations[] (sortiert nach Priorität) ││
|
||||
│ │ ├── RequiredControls[] ││
|
||||
│ │ ├── IncidentDeadlines[] ││
|
||||
│ │ ├── SanctionsSummary ││
|
||||
│ │ └── ExecutiveSummary ││
|
||||
│ └─────────────────────────────────────────────────────────────┘│
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
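
The registry pattern in the diagram (modules register themselves, the registry evaluates all of them against the same facts and merges the results) can be sketched as follows. This Python sketch uses hypothetical class names, a simplified module shape, and assumes that a lower `priority` number means higher priority; the real modules live in the Go files of `internal/ucca/`.

```python
class RegulationModule:
    """A regulation module: an id, an applicability check, and obligations."""

    def __init__(self, module_id, applies, obligations):
        self.module_id = module_id
        self.applies = applies          # callable: facts dict -> bool
        self.obligations = obligations  # list of dicts with "id", "priority"


class ObligationsRegistry:
    """Holds registered modules and evaluates them against unified facts."""

    def __init__(self):
        self._modules = {}

    def register(self, module):
        self._modules[module.module_id] = module

    def evaluate_all(self, facts):
        applicable, obligations = [], []
        for module in self._modules.values():
            if module.applies(facts):
                applicable.append(module.module_id)
                obligations.extend(module.obligations)
        # Assumption: priority 1 is the most urgent obligation.
        obligations.sort(key=lambda o: o["priority"])
        return {"applicable_regulations": applicable,
                "obligations": obligations}
```

Adding a further regulation then only requires registering one more module; the evaluation loop stays unchanged.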
|
||||
|
||||
### 16.2 Regulation Modules

Each regulation is implemented as a standalone module:

**Implemented modules:**

| Module | ID | File | Obligations | Controls |
|--------|-----|------|-------------|----------|
| NIS2 | `nis2` | `nis2_module.go` | ~15 | ~8 |
| DSGVO | `dsgvo` | `dsgvo_module.go` | ~12 | ~6 |
| AI Act | `ai_act` | `ai_act_module.go` | ~15 | ~6 |
|
||||
---
|
||||
|
||||
## 17. Obligations API Endpoints

### 17.1 Assessment

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/sdk/v1/ucca/obligations/assess` | Create an obligations assessment |
| GET | `/sdk/v1/ucca/obligations/:id` | Retrieve an assessment |
| GET | `/sdk/v1/ucca/obligations` | List assessments |
|
||||
### 17.2 Export

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/sdk/v1/ucca/obligations/export/memo` | Export a memo (stored) |
| POST | `/sdk/v1/ucca/obligations/export/direct` | Direct export without storage |
|
||||
### 17.3 Regulations
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| GET | `/sdk/v1/ucca/regulations` | Liste aller Regulierungsmodule |
|
||||
| GET | `/sdk/v1/ucca/regulations/:id/decision-tree` | Decision Tree für Regulierung |
|
||||
|
||||
---
|
||||
|
||||
## 18. Dateien des Obligations Framework
|
||||
|
||||
### 18.1 Backend (Go)
|
||||
|
||||
```
|
||||
internal/ucca/
|
||||
├── obligations_framework.go # Interfaces, Typen, Konstanten
|
||||
├── obligations_registry.go # Modul-Registry, EvaluateAll()
|
||||
├── nis2_module.go # NIS2 Decision Tree + Pflichten
|
||||
├── nis2_module_test.go # NIS2 Tests
|
||||
├── dsgvo_module.go # DSGVO Pflichten
|
||||
├── dsgvo_module_test.go # DSGVO Tests
|
||||
├── ai_act_module.go # AI Act Risk Classification
|
||||
├── ai_act_module_test.go # AI Act Tests
|
||||
├── pdf_export.go # PDF/Markdown Export
|
||||
└── pdf_export_test.go # Export Tests
|
||||
```
|
||||
|
||||
### 18.2 Policy-Dateien (YAML)
|
||||
|
||||
```
|
||||
policies/obligations/
|
||||
├── nis2_obligations.yaml # ~15 NIS2-Pflichten
|
||||
├── dsgvo_obligations.yaml # ~12 DSGVO-Pflichten
|
||||
└── ai_act_obligations.yaml # ~15 AI Act-Pflichten
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
*Dokumentation erstellt: 2026-01-29*
|
||||
*Version: 2.1.0*
|
||||
387
docs-src/services/ai-compliance-sdk/AUDITOR_DOCUMENTATION.md
Normal file
@@ -0,0 +1,387 @@
|
||||
# UCCA - Dokumentation für externe Auditoren
|
||||
|
||||
## Systemdokumentation nach Art. 30 DSGVO
|
||||
|
||||
**Verantwortlicher:** [Name des Unternehmens]
|
||||
**Datenschutzbeauftragter:** [Kontakt]
|
||||
**Dokumentationsstand:** 2026-01-29
|
||||
**Version:** 1.1.0
|
||||
|
||||
---
|
||||
|
||||
## 1. Zweck und Funktionsweise des Systems
|
||||
|
||||
### 1.1 Systembezeichnung
|
||||
|
||||
**UCCA - Use-Case Compliance & Feasibility Advisor**
|
||||
|
||||
### 1.2 Zweckbeschreibung
|
||||
|
||||
The UCCA system is a **compliance review tool** that helps organizations assess whether planned AI use cases are permissible under data protection law.

**Core functions:**

- Automated preliminary review of AI use cases against EU regulations
- Identification of required technical and organizational measures
- Escalation of critical cases for human review
- Documentation and traceability of all review decisions
|
||||
|
||||
### 1.3 Rechtsgrundlage
|
||||
|
||||
Die Verarbeitung erfolgt auf Basis von:
|
||||
- **Art. 6 Abs. 1 lit. c DSGVO** - Erfüllung rechtlicher Verpflichtungen
|
||||
- **Art. 6 Abs. 1 lit. f DSGVO** - Berechtigte Interessen (Compliance-Management)
|
||||
|
||||
---
|
||||
|
||||
## 2. Verarbeitete Datenkategorien
|
||||
|
||||
### 2.1 Eingabedaten (Use-Case-Beschreibungen)
|
||||
|
||||
| Datenkategorie | Beschreibung | Speicherung |
|
||||
|----------------|--------------|-------------|
|
||||
| Use-Case-Text | Freitextbeschreibung des geplanten Anwendungsfalls | Optional (Opt-in), ansonsten nur Hash |
|
||||
| Domain | Branchenkategorie (z.B. "education", "healthcare") | Ja |
|
||||
| Datentyp-Flags | Angaben zu verarbeiteten Datenarten | Ja |
|
||||
| Automatisierungsgrad | assistiv/teil-/vollautomatisch | Ja |
|
||||
| Hosting-Informationen | Region, Provider | Ja |
|
||||
|
||||
**Important:** By default, the system stores **no free text**, only:

- SHA-256 hash of the text (for deduplication)
- Structured metadata (checkboxes, dropdowns)
|
||||
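
The hash-instead-of-text storage described above could look roughly like this. It is a sketch under assumptions: the `StoredIntake` type, the `storeText` opt-in flag, and the whitespace trimming before hashing are illustrative choices, not confirmed details of the system.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// StoredIntake is a hypothetical persistence record: it always keeps the hash
// and structured metadata, and the raw text only when the user opted in.
type StoredIntake struct {
	TextHash string // hex-encoded SHA-256 of the trimmed text
	Text     string // empty unless storing the text was requested
	Domain   string
}

// hashUseCaseText returns the hex SHA-256 of the trimmed text.
// Trimming surrounding whitespace is an assumption made here so that
// trivially different submissions deduplicate to the same hash.
func hashUseCaseText(text string) string {
	sum := sha256.Sum256([]byte(strings.TrimSpace(text)))
	return hex.EncodeToString(sum[:])
}

// newStoredIntake builds the record and drops the free text unless opted in.
func newStoredIntake(text, domain string, storeText bool) StoredIntake {
	rec := StoredIntake{TextHash: hashUseCaseText(text), Domain: domain}
	if storeText {
		rec.Text = text
	}
	return rec
}

func main() {
	rec := newStoredIntake("Chatbot für Kundenservice", "utilities", false)
	// A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
	fmt.Println(len(rec.TextHash), rec.Text == "")
}
```

Two submissions with identical text produce the same hash, which is what makes deduplication possible without retaining the text itself.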
|
||||
### 2.2 Bewertungsergebnisse
|
||||
|
||||
| Datenkategorie | Beschreibung | Aufbewahrung |
|
||||
|----------------|--------------|--------------|
|
||||
| Risk Score | Numerischer Wert 0-100 | Dauerhaft |
|
||||
| Triggered Rules | Ausgelöste Compliance-Regeln | Dauerhaft |
|
||||
| Required Controls | Empfohlene Maßnahmen | Dauerhaft |
|
||||
| Explanation | KI-generierte Erklärung | Dauerhaft |
|
||||
|
||||
### 2.3 Audit-Trail-Daten
|
||||
|
||||
| Datenkategorie | Beschreibung | Aufbewahrung |
|
||||
|----------------|--------------|--------------|
|
||||
| Benutzer-ID | UUID des ausführenden Benutzers | 10 Jahre |
|
||||
| Timestamp | Zeitpunkt der Aktion | 10 Jahre |
|
||||
| Aktionstyp | created/reviewed/decided | 10 Jahre |
|
||||
| Entscheidungsnotizen | Begründungen bei Eskalationen | 10 Jahre |
|
||||
|
||||
---
|
||||
|
||||
## 3. Entscheidungslogik und Automatisierung
|
||||
|
||||
### 3.1 Regelbasierte Bewertung (Deterministische Logik)
|
||||
|
||||
Das System verwendet **ausschließlich deterministische Regeln** für Compliance-Entscheidungen. Diese Regeln sind:
|
||||
|
||||
1. **Transparent** - Alle Regeln sind im Quellcode einsehbar
|
||||
2. **Nachvollziehbar** - Jede ausgelöste Regel wird dokumentiert
|
||||
3. **Überprüfbar** - Regellogik basiert auf konkreten DSGVO-Artikeln
|
||||
|
||||
**Beispiel-Regel R-F001:**
|
||||
```
IF:
  - Domain = "education" AND
  - Automation = "fully_automated" AND
  - Output contains "rankings_or_scores"
THEN:
  - Severity = BLOCK
  - GDPR reference = Art. 22(1)
  - Reason = "Fully automated assessment of students without human review"
```
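
Expressed as code, example rule R-F001 is a plain conjunction of three checks. The sketch below is illustrative only: the `Intake` and `RuleHit` types and their field names are assumptions, and the real engine loads its rules from policy files rather than hard-coding them.

```go
package main

import "fmt"

// Intake holds only the fields rule R-F001 looks at; names are illustrative.
type Intake struct {
	Domain     string
	Automation string
	Outputs    []string
}

// RuleHit records why a rule fired so the decision stays traceable.
type RuleHit struct {
	RuleID   string
	Severity string
	GDPRRef  string
	Reason   string
}

// contains reports whether want occurs in list.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

// evalRF001 returns the hit and true only when all three conditions hold.
func evalRF001(in Intake) (RuleHit, bool) {
	if in.Domain == "education" &&
		in.Automation == "fully_automated" &&
		contains(in.Outputs, "rankings_or_scores") {
		return RuleHit{
			RuleID:   "R-F001",
			Severity: "BLOCK",
			GDPRRef:  "Art. 22 Abs. 1",
			Reason:   "Fully automated assessment of students without human review",
		}, true
	}
	return RuleHit{}, false
}

func main() {
	hit, ok := evalRF001(Intake{
		Domain:     "education",
		Automation: "fully_automated",
		Outputs:    []string{"rankings_or_scores"},
	})
	fmt.Println(ok, hit.Severity, hit.GDPRRef)
}
```

Because the function has no hidden state and no model call, the same intake always yields the same result, which is the property the section relies on.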
|
||||
|
||||
### 3.2 Keine autonomen KI-Entscheidungen
|
||||
|
||||
**The system makes NO autonomous AI decisions regarding:**

- Permissibility of a use case (always rule-based)
- Approval or rejection (always by a human)
- Legal assessments (always by the DPO/Legal)

**AI is used exclusively for:**

- Explaining rule decisions that have already been made
- Summarizing legal texts
- Wording of notices
|
||||
|
||||
### 3.3 Human-in-the-Loop
|
||||
|
||||
A **human reviewer** is involved in all critical decisions:

| Escalation level | Trigger | Reviewer | SLA |
|------------------|---------|----------|-----|
| E0 | Informational rules only | Automatic | - |
| E1 | Warnings, low risk | Team lead | 24h |
| E2 | Art. 9 data, DPIA recommended | DPO | 8h |
| E3 | BLOCK rules, high risk | DPO + Legal | 4h |
|
||||
|
||||
**BLOCK-Entscheidungen können NICHT durch KI überschrieben werden.**
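
A possible way to derive the escalation level from the rule findings is sketched below. The `Findings` type and its field names are hypothetical, and checking E3 before E2 before E1 is an assumption about precedence when several triggers are present at once.

```go
package main

import "fmt"

// Findings summarizes what the rule evaluation produced. Field names are
// illustrative and not taken from the SDK.
type Findings struct {
	HasBlockRule    bool
	HighRisk        bool
	Art9Data        bool
	DSFARecommended bool
	HasWarning      bool
}

// escalationLevel picks the highest level whose trigger is present.
// The order of the cases encodes the assumed precedence E3 > E2 > E1 > E0.
func escalationLevel(f Findings) string {
	switch {
	case f.HasBlockRule || f.HighRisk:
		return "E3"
	case f.Art9Data || f.DSFARecommended:
		return "E2"
	case f.HasWarning:
		return "E1"
	default:
		return "E0"
	}
}

func main() {
	fmt.Println(escalationLevel(Findings{HasWarning: true, Art9Data: true}))
	fmt.Println(escalationLevel(Findings{HasBlockRule: true}))
	fmt.Println(escalationLevel(Findings{}))
}
```

The level is computed from rule output alone; nothing in this path consults a language model, which is consistent with BLOCK decisions not being overridable by AI.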
|
||||
|
||||
---
|
||||
|
||||
## 4. Technische und organisatorische Maßnahmen (Art. 32 DSGVO)
|
||||
|
||||
### 4.1 Vertraulichkeit
|
||||
|
||||
| Maßnahme | Umsetzung |
|
||||
|----------|-----------|
|
||||
| Zugriffskontrolle | RBAC mit Tenant-Isolation |
|
||||
| Verschlüsselung in Transit | TLS 1.3 |
|
||||
| Verschlüsselung at Rest | AES-256 (PostgreSQL, Qdrant) |
|
||||
| Authentifizierung | JWT-basiert, Token-Expiry |
|
||||
|
||||
### 4.2 Integrität
|
||||
|
||||
| Maßnahme | Umsetzung |
|
||||
|----------|-----------|
|
||||
| Audit-Trail | Unveränderlicher Verlauf aller Aktionen |
|
||||
| Versionierung | Policy-Version in jedem Assessment |
|
||||
| Input-Validierung | Schema-Validierung aller API-Eingaben |
|
||||
|
||||
### 4.3 Verfügbarkeit
|
||||
|
||||
| Maßnahme | Umsetzung |
|
||||
|----------|-----------|
|
||||
| Backup | Tägliche PostgreSQL-Backups |
|
||||
| Redundanz | Container-Orchestrierung mit Auto-Restart |
|
||||
| Monitoring | Health-Checks, SLA-Überwachung |
|
||||
|
||||
### 4.4 Belastbarkeit
|
||||
|
||||
| Maßnahme | Umsetzung |
|
||||
|----------|-----------|
|
||||
| Rate Limiting | API-Anfragenbegrenzung |
|
||||
| Graceful Degradation | LLM-Fallback bei Ausfall |
|
||||
| Ressourcenlimits | Container-Memory-Limits |
|
||||
|
||||
---
|
||||
|
||||
## 5. Datenschutz-Folgenabschätzung (Art. 35 DSGVO)
|
||||
|
||||
### 5.1 Risikobewertung
|
||||
|
||||
| Risiko | Bewertung | Mitigierung |
|
||||
|--------|-----------|-------------|
|
||||
| Fehleinschätzung durch KI | Mittel | Deterministische Regeln, Human Review |
|
||||
| Datenverlust | Niedrig | Backup, Verschlüsselung |
|
||||
| Unbefugter Zugriff | Niedrig | RBAC, Audit-Trail |
|
||||
| Bias in Regellogik | Niedrig | Transparente Regeln, Review-Prozess |
|
||||
|
||||
### 5.2 DSFA-Trigger im System
|
||||
|
||||
Das System erkennt automatisch, wann eine DSFA erforderlich ist:
|
||||
- Verarbeitung besonderer Kategorien (Art. 9 DSGVO)
|
||||
- Systematische Bewertung natürlicher Personen
|
||||
- Neue Technologien mit hohem Risiko
|
||||
|
||||
---
|
||||
|
||||
## 6. Betroffenenrechte (Art. 15-22 DSGVO)
|
||||
|
||||
### 6.1 Auskunftsrecht (Art. 15)
|
||||
|
||||
Betroffene können Auskunft erhalten über:
|
||||
- Gespeicherte Assessments mit ihren Daten
|
||||
- Audit-Trail ihrer Interaktionen
|
||||
- Regelbasierte Entscheidungsbegründungen
|
||||
|
||||
### 6.2 Recht auf Berichtigung (Art. 16)
|
||||
|
||||
Betroffene können die Korrektur fehlerhafter Eingabedaten verlangen.
|
||||
|
||||
### 6.3 Recht auf Löschung (Art. 17)
|
||||
|
||||
Assessments können gelöscht werden, sofern:
|
||||
- Keine gesetzlichen Aufbewahrungspflichten bestehen
|
||||
- Keine laufenden Eskalationsverfahren existieren
|
||||
|
||||
### 6.4 Recht auf Einschränkung (Art. 18)
|
||||
|
||||
Die Verarbeitung kann eingeschränkt werden durch:
|
||||
- Archivierung statt Löschung
|
||||
- Sperrung des Datensatzes
|
||||
|
||||
### 6.5 Automatisierte Entscheidungen (Art. 22)
|
||||
|
||||
**The system makes no automated individual decisions** within the meaning of Art. 22 GDPR, because:

1. Rule evaluation is **not a legally binding decision**
2. All critical cases are **reviewed by a human** (E1-E3)
3. BLOCK decisions **always require human approval**
4. Data subjects can **contest** a result via escalation
|
||||
|
||||
---
|
||||
|
||||
## 7. Auftragsverarbeitung
|
||||
|
||||
### 7.1 Unterauftragnehmer
|
||||
|
||||
| Dienst | Anbieter | Standort | Zweck |
|
||||
|--------|----------|----------|-------|
|
||||
| Embedding-Service | Lokal (Self-Hosted) | EU | Vektorisierung |
|
||||
| Vector-DB (Qdrant) | Lokal (Self-Hosted) | EU | Ähnlichkeitssuche |
|
||||
| LLM (Ollama) | Lokal (Self-Hosted) | EU | Erklärungsgenerierung |
|
||||
|
||||
**Hinweis:** Das System kann vollständig on-premise betrieben werden ohne externe Dienste.
|
||||
|
||||
### 7.2 Internationale Transfers
|
||||
|
||||
Bei Nutzung von Cloud-LLM-Anbietern:
|
||||
- Anthropic Claude: US (DPF-zertifiziert)
|
||||
- OpenAI: US (DPF-zertifiziert)
|
||||
|
||||
**Empfehlung:** Nutzung des lokalen Ollama-Providers für sensible Daten.
|
||||
|
||||
---
|
||||
|
||||
## 8. Audit-Trail und Nachvollziehbarkeit
|
||||
|
||||
### 8.1 Protokollierte Ereignisse
|
||||
|
||||
| Ereignis | Protokollierte Daten |
|
||||
|----------|---------------------|
|
||||
| Assessment erstellt | Benutzer, Timestamp, Intake-Hash, Ergebnis |
|
||||
| Eskalation erstellt | Level, Grund, SLA |
|
||||
| Zuweisung | Benutzer, Rolle |
|
||||
| Review gestartet | Benutzer, Timestamp |
|
||||
| Entscheidung | Benutzer, Entscheidung, Begründung |
|
||||
|
||||
### 8.2 Aufbewahrungsfristen
|
||||
|
||||
| Datenart | Aufbewahrung | Rechtsgrundlage |
|
||||
|----------|--------------|-----------------|
|
||||
| Assessments | 10 Jahre | § 147 AO |
|
||||
| Audit-Trail | 10 Jahre | § 147 AO |
|
||||
| Eskalationen | 10 Jahre | § 147 AO |
|
||||
| Löschprotokolle | 3 Jahre | Art. 17 DSGVO |
|
||||
|
||||
---
|
||||
|
||||
## 9. Lizenzierte Inhalte & Normen-Compliance (§44b UrhG)
|
||||
|
||||
### 9.1 Zweck
|
||||
|
||||
Das System enthält einen spezialisierten **License Policy Engine** zur Compliance-Prüfung bei der Verarbeitung urheberrechtlich geschützter Inhalte, insbesondere:
|
||||
|
||||
- **DIN-Normen** (DIN Media / Beuth Verlag)
|
||||
- **VDI-Richtlinien**
|
||||
- **ISO/IEC-Standards**
|
||||
- **VDE-Normen**
|
||||
|
||||
### 9.2 Rechtlicher Hintergrund
|
||||
|
||||
**§44b UrhG - Text and Data Mining** (paraphrased):
> Reproductions of lawfully accessible works for the purpose of text and data mining are permitted.

**HOWEVER:** Rights holders may reserve TDM use under §44b(3) UrhG:
- **DIN Media:** Explicit reservation in its terms and conditions; no AI/TDM use without a special license
- **Planned AI license models:** From Q4/2025 (DIN Media)
|
||||
|
||||
### 9.3 Operationsmodi im System
|
||||
|
||||
| Modus | Beschreibung | Lizenzanforderung |
|
||||
|-------|--------------|-------------------|
|
||||
| `LINK_ONLY` | Nur Verlinkung zum Original | Keine |
|
||||
| `NOTES_ONLY` | Eigene Notizen/Zusammenfassungen | Keine (§51 UrhG) |
|
||||
| `EXCERPT_ONLY` | Kurze Zitate (<100 Wörter) | Standard-Lizenz |
|
||||
| `FULLTEXT_RAG` | Volltextsuche mit Embedding | Explizite KI-Lizenz |
|
||||
| `TRAINING` | Modell-Training | Enterprise-Lizenz + Vertrag |
|
||||
|
||||
### 9.4 Stop-Lines (Automatische Sperren)
|
||||
|
||||
Das System **blockiert automatisch** folgende Kombinationen:
|
||||
|
||||
| Stop-Line ID | Bedingung | Aktion |
|
||||
|--------------|-----------|--------|
|
||||
| `STOP_DIN_FULLTEXT_AI_NOT_ALLOWED` | DIN Media + FULLTEXT_RAG + keine KI-Lizenz | Ablehnung |
|
||||
| `STOP_LICENSE_UNKNOWN_FULLTEXT` | Lizenz unbekannt + FULLTEXT_RAG | Warnung + Eskalation |
|
||||
| `STOP_TRAINING_WITHOUT_ENTERPRISE` | Beliebig + TRAINING + keine Enterprise-Lizenz | Ablehnung |
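
The three stop-lines in the table can be read as independent checks over the license facts. The sketch below is not the engine's implementation: the `LicenseFacts` type, the `"UNKNOWN"` and `"ENTERPRISE"` license type values, and the action labels are assumptions chosen for illustration.

```go
package main

import "fmt"

// LicenseFacts carries the inputs the stop-lines depend on.
type LicenseFacts struct {
	Publisher      string // e.g. "DIN_MEDIA"
	LicenseType    string // "UNKNOWN" and "ENTERPRISE" are assumed values
	AIUsePermitted string // "YES" or "NO"
	OperationMode  string // e.g. "FULLTEXT_RAG", "TRAINING"
}

// StopLine names a triggered stop-line and the resulting action.
type StopLine struct {
	ID     string
	Action string // "DENY" or "WARN_AND_ESCALATE" (labels chosen for this sketch)
}

// checkStopLines returns every stop-line whose condition is met.
func checkStopLines(f LicenseFacts) []StopLine {
	var hits []StopLine
	if f.Publisher == "DIN_MEDIA" && f.OperationMode == "FULLTEXT_RAG" && f.AIUsePermitted != "YES" {
		hits = append(hits, StopLine{ID: "STOP_DIN_FULLTEXT_AI_NOT_ALLOWED", Action: "DENY"})
	}
	if f.LicenseType == "UNKNOWN" && f.OperationMode == "FULLTEXT_RAG" {
		hits = append(hits, StopLine{ID: "STOP_LICENSE_UNKNOWN_FULLTEXT", Action: "WARN_AND_ESCALATE"})
	}
	if f.OperationMode == "TRAINING" && f.LicenseType != "ENTERPRISE" {
		hits = append(hits, StopLine{ID: "STOP_TRAINING_WITHOUT_ENTERPRISE", Action: "DENY"})
	}
	return hits
}

func main() {
	hits := checkStopLines(LicenseFacts{
		Publisher:      "DIN_MEDIA",
		LicenseType:    "SINGLE_WORKSTATION",
		AIUsePermitted: "NO",
		OperationMode:  "FULLTEXT_RAG",
	})
	for _, h := range hits {
		fmt.Println(h.ID, h.Action)
	}
}
```

Returning all hits rather than stopping at the first one keeps the audit trail complete when more than one condition applies.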
|
||||
|
||||
### 9.5 License Policy Engine - Entscheidungslogik
|
||||
|
||||
```
|
||||
INPUT:
|
||||
├── licensed_content.present = true
|
||||
├── licensed_content.publisher = "DIN_MEDIA"
|
||||
├── licensed_content.license_type = "SINGLE_WORKSTATION"
|
||||
├── licensed_content.ai_use_permitted = "NO"
|
||||
└── licensed_content.operation_mode = "FULLTEXT_RAG"
|
||||
|
||||
REGEL-EVALUATION:
|
||||
├── Prüfe Publisher-spezifische Regeln
|
||||
├── Prüfe Lizenztyp vs. gewünschter Modus
|
||||
├── Prüfe AI-Use-Flag
|
||||
└── Bestimme maximal zulässigen Modus
|
||||
|
||||
OUTPUT:
|
||||
├── allowed: false
|
||||
├── max_allowed_mode: "NOTES_ONLY"
|
||||
├── required_controls: ["CTRL-LICENSE-PROOF", "CTRL-NO-CRAWLING-DIN"]
|
||||
├── gaps: ["GAP_DIN_MEDIA_WITHOUT_AI_LICENSE"]
|
||||
├── stop_lines: ["STOP_DIN_FULLTEXT_AI_NOT_ALLOWED"]
|
||||
└── explanation: "DIN Media verbietet KI-Nutzung ohne explizite Lizenz..."
|
||||
```
|
||||
|
||||
### 9.6 Erforderliche Controls bei lizenzierten Inhalten
|
||||
|
||||
| Control ID | Beschreibung | Evidence |
|
||||
|------------|--------------|----------|
|
||||
| `CTRL-LICENSE-PROOF` | Lizenznachweis dokumentieren | Lizenzvertrag, Rechnung |
|
||||
| `CTRL-LICENSE-GATED-INGEST` | Technische Sperre vor Ingest | Konfiguration, Logs |
|
||||
| `CTRL-NO-CRAWLING-DIN` | Kein automatisches Crawling | System-Konfiguration |
|
||||
| `CTRL-OUTPUT-GUARD` | Ausgabe-Beschränkung (Zitatlimit) | API-Logs |
|
||||
|
||||
### 9.7 Audit-relevante Protokollierung
|
||||
|
||||
Bei jeder Verarbeitung lizenzierter Inhalte wird dokumentiert:
|
||||
|
||||
| Feld | Beschreibung | Aufbewahrung |
|
||||
|------|--------------|--------------|
|
||||
| `license_check_timestamp` | Zeitpunkt der Prüfung | 10 Jahre |
|
||||
| `license_decision` | Ergebnis (allowed/denied) | 10 Jahre |
|
||||
| `license_proof_hash` | Hash des Lizenznachweises | 10 Jahre |
|
||||
| `operation_mode_requested` | Angefragter Modus | 10 Jahre |
|
||||
| `operation_mode_granted` | Erlaubter Modus | 10 Jahre |
|
||||
| `publisher` | Rechteinhaber | 10 Jahre |
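
The logged fields map naturally onto a record type. The following sketch uses the field names from the table as JSON keys; the Go type name, the example values, and the way the retention end is computed are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// LicenseAuditRecord mirrors the audit fields listed in the table above.
type LicenseAuditRecord struct {
	LicenseCheckTimestamp  time.Time `json:"license_check_timestamp"`
	LicenseDecision        string    `json:"license_decision"` // "allowed" or "denied"
	LicenseProofHash       string    `json:"license_proof_hash"`
	OperationModeRequested string    `json:"operation_mode_requested"`
	OperationModeGranted   string    `json:"operation_mode_granted"`
	Publisher              string    `json:"publisher"`
}

// retentionEnd returns the date until which the record is kept,
// using the 10-year period from the table.
func retentionEnd(r LicenseAuditRecord) time.Time {
	return r.LicenseCheckTimestamp.AddDate(10, 0, 0)
}

// exampleRecord returns a fixed record so the demo output is reproducible.
func exampleRecord() LicenseAuditRecord {
	return LicenseAuditRecord{
		LicenseCheckTimestamp:  time.Date(2026, 1, 29, 12, 0, 0, 0, time.UTC),
		LicenseDecision:        "denied",
		LicenseProofHash:       "", // no proof uploaded in this example
		OperationModeRequested: "FULLTEXT_RAG",
		OperationModeGranted:   "NOTES_ONLY",
		Publisher:              "DIN_MEDIA",
	}
}

func main() {
	rec := exampleRecord()
	out, err := json.Marshal(rec)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
	fmt.Println("retain until:", retentionEnd(rec).Format("2006-01-02"))
}
```

Storing both the requested and the granted mode makes downgrades visible in the audit trail without having to replay the evaluation.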
|
||||
|
||||
### 9.8 On-Premise-Deployment für sensible Normen
|
||||
|
||||
Für Unternehmen mit strengen Compliance-Anforderungen:
|
||||
|
||||
| Komponente | Deployment | Isolation |
|
||||
|------------|------------|-----------|
|
||||
| Normen-Datenbank | Lokaler Mac Studio | Air-gapped |
|
||||
| Embedding-Service | Lokal (bge-m3) | Keine Cloud |
|
||||
| Vector-DB (Qdrant) | Lokaler Container | Tenant-Isolation |
|
||||
| LLM (Ollama) | Lokal (Qwen2.5-Coder) | Keine API-Calls |
|
||||
|
||||
---
|
||||
|
||||
## 10. Kontakt und Verantwortlichkeiten
|
||||
|
||||
### 10.1 Verantwortlicher
|
||||
|
||||
[Name und Adresse des Unternehmens]
|
||||
|
||||
### 10.2 Datenschutzbeauftragter
|
||||
|
||||
Name: [Name]
|
||||
E-Mail: [E-Mail]
|
||||
Telefon: [Telefon]
|
||||
|
||||
### 10.3 Technischer Ansprechpartner
|
||||
|
||||
Name: [Name]
|
||||
E-Mail: [E-Mail]
|
||||
|
||||
---
|
||||
|
||||
## 11. Änderungshistorie
|
||||
|
||||
| Version | Datum | Änderung | Autor |
|
||||
|---------|-------|----------|-------|
|
||||
| 1.1.0 | 2026-01-29 | License Policy Engine & Standards-Compliance (§44b UrhG) | [Autor] |
|
||||
| 1.0.0 | 2026-01-29 | Erstversion | [Autor] |
|
||||
|
||||
---
|
||||
|
||||
*Diese Dokumentation erfüllt die Anforderungen nach Art. 30 DSGVO (Verzeichnis von Verarbeitungstätigkeiten) und dient als Grundlage für Audits nach Art. 32 DSGVO (Sicherheit der Verarbeitung).*
|
||||
746
docs-src/services/ai-compliance-sdk/DEVELOPER.md
Normal file
@@ -0,0 +1,746 @@
|
||||
# AI Compliance SDK - Entwickler-Dokumentation
|
||||
|
||||
## Inhaltsverzeichnis
|
||||
|
||||
1. [Schnellstart](#1-schnellstart)
|
||||
2. [Architektur-Übersicht](#2-architektur-übersicht)
|
||||
3. [Policy Engine](#3-policy-engine)
|
||||
4. [License Policy Engine](#4-license-policy-engine)
|
||||
5. [Legal RAG Integration](#5-legal-rag-integration)
|
||||
6. [Wizard & Legal Assistant](#6-wizard--legal-assistant)
|
||||
7. [Eskalations-System](#7-eskalations-system)
|
||||
8. [API-Endpoints](#8-api-endpoints)
|
||||
9. [Policy-Dateien](#9-policy-dateien)
|
||||
10. [Tests ausführen](#10-tests-ausführen)
|
||||
|
||||
---
|
||||
|
||||
## 1. Schnellstart
|
||||
|
||||
### Voraussetzungen
|
||||
|
||||
- Go 1.21+
|
||||
- PostgreSQL (für Eskalations-Store)
|
||||
- Qdrant (für Legal RAG)
|
||||
- Ollama oder Anthropic API Key (für LLM)
|
||||
|
||||
### Build & Run
|
||||
|
||||
```bash
|
||||
# Build
|
||||
cd ai-compliance-sdk
|
||||
go build -o server ./cmd/server
|
||||
|
||||
# Run
|
||||
./server --config config.yaml
|
||||
|
||||
# Alternativ: mit Docker
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
### Erste Anfrage
|
||||
|
||||
```bash
|
||||
# UCCA Assessment erstellen
|
||||
curl -X POST http://localhost:8080/sdk/v1/ucca/assess \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"use_case_text": "Chatbot für Kundenservice mit FAQ-Suche",
|
||||
"domain": "utilities",
|
||||
"data_types": {
|
||||
"personal_data": false,
|
||||
"public_data": true
|
||||
},
|
||||
"automation": "assistive",
|
||||
"model_usage": {
|
||||
"rag": true
|
||||
},
|
||||
"hosting": {
|
||||
"region": "eu"
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Architektur-Übersicht
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ API Layer (Gin) │
|
||||
│ internal/api/handlers/ │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||
│ │ UCCA │ │ License │ │ Eskalation │ │
|
||||
│ │ Handler │ │ Handler │ │ Handler │ │
|
||||
│ └──────┬──────┘ └──────┬──────┘ └───────────┬─────────────┘ │
|
||||
│ │ │ │ │
|
||||
├─────────┼────────────────┼──────────────────────┼────────────────┤
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||
│ │ Policy │ │ License │ │ Escalation │ │
|
||||
│ │ Engine │ │ Policy │ │ Store │ │
|
||||
│ │ │ │ Engine │ │ │ │
|
||||
│ └──────┬──────┘ └──────┬──────┘ └───────────┬─────────────┘ │
|
||||
│ │ │ │ │
|
||||
│ └────────┬───────┴──────────────────────┘ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────┐ │
|
||||
│ │ Legal RAG System │ │
|
||||
│ │ (Qdrant + LLM Integration) │ │
|
||||
│ └─────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Kernprinzip
|
||||
|
||||
**The LLM is NOT the source of truth!** All decisions come from deterministic components; the LLM only phrases the explanation:
|
||||
|
||||
| Komponente | Entscheidet | LLM-Nutzung |
|
||||
|------------|-------------|-------------|
|
||||
| Policy Engine | Feasibility, Risk Level | Nein |
|
||||
| License Engine | Operation Mode, Stop-Lines | Nein |
|
||||
| Gap Mapping | Facts → Gaps → Controls | Nein |
|
||||
| Legal RAG | Erklärung generieren | Ja (nur Output) |
|
||||
|
||||
---
|
||||
|
||||
## 3. Policy Engine
|
||||
|
||||
### Übersicht
|
||||
|
||||
The Policy Engine (`internal/ucca/policy_engine.go`) evaluates use cases against deterministic rules.
|
||||
|
||||
### Verwendung
|
||||
|
||||
```go
|
||||
import (
	"fmt"
	"log"

	"ai-compliance-sdk/internal/ucca"
)
|
||||
|
||||
// Engine erstellen
|
||||
engine, err := ucca.NewPolicyEngineFromPath("policies/ucca_policy_v1.yaml")
|
||||
if err != nil {
|
||||
log.Fatal(err)
|
||||
}
|
||||
|
||||
// Intake erstellen
|
||||
intake := &ucca.UseCaseIntake{
|
||||
UseCaseText: "Chatbot für Kundenservice",
|
||||
Domain: ucca.DomainUtilities,
|
||||
DataTypes: ucca.DataTypes{
|
||||
PersonalData: false,
|
||||
PublicData: true,
|
||||
},
|
||||
Automation: ucca.AutomationAssistive,
|
||||
ModelUsage: ucca.ModelUsage{
|
||||
RAG: true,
|
||||
},
|
||||
Hosting: ucca.Hosting{
|
||||
Region: "eu",
|
||||
},
|
||||
}
|
||||
|
||||
// Evaluieren
|
||||
result := engine.Evaluate(intake)
|
||||
|
||||
// Ergebnis auswerten
|
||||
fmt.Println("Feasibility:", result.Feasibility) // YES, NO, CONDITIONAL
|
||||
fmt.Println("Risk Level:", result.RiskLevel) // MINIMAL, LOW, MEDIUM, HIGH
|
||||
fmt.Println("Risk Score:", result.RiskScore) // 0-100
|
||||
```
|
||||
|
||||
### Ergebnis-Struktur
|
||||
|
||||
```go
|
||||
type EvaluationResult struct {
|
||||
Feasibility Feasibility // YES, NO, CONDITIONAL
|
||||
RiskLevel RiskLevel // MINIMAL, LOW, MEDIUM, HIGH
|
||||
RiskScore int // 0-100
|
||||
TriggeredRules []TriggeredRule // Ausgelöste Regeln
|
||||
RequiredControls []Control // Erforderliche Maßnahmen
|
||||
RecommendedArchitecture []Pattern // Empfohlene Patterns
|
||||
DSFARecommended bool // DSFA erforderlich?
|
||||
Art22Risk bool // Art. 22 Risiko?
|
||||
TrainingAllowed TrainingAllowed // YES, NO, CONDITIONAL
|
||||
PolicyVersion string // Version der Policy
|
||||
}
|
||||
```
|
||||
|
||||
### Regeln hinzufügen
|
||||
|
||||
Neue Regeln werden in `policies/ucca_policy_v1.yaml` definiert:
|
||||
|
||||
```yaml
|
||||
rules:
|
||||
- id: R-CUSTOM-001
|
||||
code: R-CUSTOM-001
|
||||
category: custom
|
||||
title: Custom Rule
|
||||
title_de: Benutzerdefinierte Regel
|
||||
description: Custom rule description
|
||||
severity: WARN # INFO, WARN, BLOCK
|
||||
gdpr_ref: "Art. 6 DSGVO"
|
||||
condition:
|
||||
all_of:
|
||||
- field: domain
|
||||
equals: custom_domain
|
||||
- field: data_types.personal_data
|
||||
equals: true
|
||||
controls:
|
||||
- C_CUSTOM_CONTROL
|
||||
```
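
How an `all_of` condition like the one above might be evaluated is sketched below. This is an assumption-laden illustration, not the engine's code: facts are flattened into a map with dotted keys such as `data_types.personal_data`, and only scalar values (strings, booleans) are compared.

```go
package main

import "fmt"

// Condition is one `field` / `equals` pair from the YAML rule.
type Condition struct {
	Field  string
	Equals interface{}
}

// Rule keeps only the parts needed for matching in this sketch.
type Rule struct {
	ID       string
	Severity string
	AllOf    []Condition
}

// matches reports whether every condition in AllOf holds for the facts.
// A missing field counts as a mismatch. A rule without conditions matches
// trivially. Values are compared with ==, so they must be comparable
// scalars such as strings or booleans.
func matches(r Rule, facts map[string]interface{}) bool {
	for _, c := range r.AllOf {
		v, ok := facts[c.Field]
		if !ok || v != c.Equals {
			return false
		}
	}
	return true
}

func main() {
	rule := Rule{
		ID:       "R-CUSTOM-001",
		Severity: "WARN",
		AllOf: []Condition{
			{Field: "domain", Equals: "custom_domain"},
			{Field: "data_types.personal_data", Equals: true},
		},
	}
	facts := map[string]interface{}{
		"domain":                   "custom_domain",
		"data_types.personal_data": true,
	}
	fmt.Println(matches(rule, facts))
}
```

With this shape, adding a rule is a data change in the policy file; the matching code does not need to know about individual rules.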
|
||||
|
||||
---
|
||||
|
||||
## 4. License Policy Engine
|
||||
|
||||
### Übersicht
|
||||
|
||||
The License Policy Engine (`internal/ucca/license_policy.go`) checks license compliance for standards and norms.
|
||||
|
||||
### Operationsmodi
|
||||
|
||||
| Modus | Beschreibung | Lizenzanforderung |
|
||||
|-------|--------------|-------------------|
|
||||
| `LINK_ONLY` | Nur Verweise | Keine |
|
||||
| `NOTES_ONLY` | Eigene Notizen | Keine |
|
||||
| `EXCERPT_ONLY` | Kurzzitate (<150 Zeichen) | Standard-Lizenz |
|
||||
| `FULLTEXT_RAG` | Volltext-Embedding | Explizite KI-Lizenz |
|
||||
| `TRAINING` | Modell-Training | Enterprise + Vertrag |
|
||||
|
||||
### Verwendung
|
||||
|
||||
```go
|
||||
import "ai-compliance-sdk/internal/ucca"
|
||||
|
||||
engine := ucca.NewLicensePolicyEngine()
|
||||
|
||||
facts := &ucca.LicensedContentFacts{
|
||||
Present: true,
|
||||
Publisher: "DIN_MEDIA",
|
||||
LicenseType: "SINGLE_WORKSTATION",
|
||||
AIUsePermitted: "NO",
|
||||
ProofUploaded: false,
|
||||
OperationMode: "FULLTEXT_RAG",
|
||||
}
|
||||
|
||||
result := engine.Evaluate(facts)
|
||||
|
||||
if !result.Allowed {
|
||||
fmt.Println("Blockiert:", result.StopLine.Message)
|
||||
fmt.Println("Effektiver Modus:", result.EffectiveMode)
|
||||
}
|
||||
```
|
||||
|
||||
### Ingest-Entscheidung
|
||||
|
||||
```go
|
||||
// Prüfen ob Volltext-Ingest erlaubt ist
|
||||
canIngest := engine.CanIngestFulltext(facts)
|
||||
|
||||
// Oder detaillierte Entscheidung
|
||||
decision := engine.DecideIngest(facts)
|
||||
fmt.Println("Fulltext:", decision.AllowFulltext)
|
||||
fmt.Println("Notes:", decision.AllowNotes)
|
||||
fmt.Println("Metadata:", decision.AllowMetadata)
|
||||
```
|
||||
|
||||
### Audit-Logging
|
||||
|
||||
```go
|
||||
// Audit-Entry erstellen
|
||||
entry := engine.FormatAuditEntry("tenant-123", "doc-456", facts, result)
|
||||
|
||||
// Human-readable Summary
|
||||
summary := engine.FormatHumanReadableSummary(result)
|
||||
fmt.Println(summary)
|
||||
```
|
||||
|
||||
### Publisher-spezifische Regeln
|
||||
|
||||
DIN Media hat explizite Restriktionen:
|
||||
|
||||
```go
|
||||
// DIN Media blockiert FULLTEXT_RAG ohne AI-Lizenz
|
||||
if facts.Publisher == "DIN_MEDIA" && facts.AIUsePermitted != "YES" {
|
||||
// → STOP_DIN_FULLTEXT_AI_NOT_ALLOWED
|
||||
// → Downgrade auf LINK_ONLY
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Legal RAG Integration
|
||||
|
||||
### Übersicht
|
||||
|
||||
The Legal RAG system (`internal/ucca/legal_rag.go`) generates explanations with legal context.
|
||||
|
||||
### Verwendung
|
||||
|
||||
```go
|
||||
import (
	"fmt"
	"log"

	"ai-compliance-sdk/internal/ucca"
)
|
||||
|
||||
rag := ucca.NewLegalRAGService(qdrantClient, llmClient, "bp_legal_corpus")
|
||||
|
||||
// Erklärung generieren
|
||||
explanation, err := rag.Explain(ctx, result, intake)
|
||||
if err != nil {
	// The standard library's log package has no Error function; Printf is used instead.
	log.Printf("Explain failed: %v", err)
}
|
||||
|
||||
fmt.Println("Erklärung:", explanation.Text)
|
||||
fmt.Println("Rechtsquellen:", explanation.Sources)
|
||||
```
|
||||
|
||||
### Rechtsquellen im RAG
|
||||
|
||||
| Quelle | Chunks | Beschreibung |
|
||||
|--------|--------|--------------|
|
||||
| DSGVO | 128 | EU Datenschutz-Grundverordnung |
|
||||
| AI Act | 96 | EU AI-Verordnung |
|
||||
| NIS2 | 128 | Netzwerk-Informationssicherheit |
|
||||
| SCC | 32 | Standardvertragsklauseln |
|
||||
| DPF | 714 | Data Privacy Framework |
|
||||
|
||||
---
|
||||
|
||||
## 6. Wizard & Legal Assistant
|
||||
|
||||
### Wizard-Schema
|
||||
|
||||
Das Wizard-Schema (`policies/wizard_schema_v1.yaml`) definiert die Fragen für das Frontend.
|
||||
|
||||
### Legal Assistant verwenden
|
||||
|
||||
```go
|
||||
// Wizard-Frage an Legal Assistant stellen
|
||||
type WizardAskRequest struct {
|
||||
Question string `json:"question"`
|
||||
StepNumber int `json:"step_number"`
|
||||
FieldID string `json:"field_id,omitempty"`
|
||||
CurrentData map[string]interface{} `json:"current_data,omitempty"`
|
||||
}
|
||||
|
||||
// POST /sdk/v1/ucca/wizard/ask
|
||||
```
|
||||
|
||||
### Beispiel API-Call
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:8080/sdk/v1/ucca/wizard/ask \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"question": "Was sind personenbezogene Daten?",
|
||||
"step_number": 2,
|
||||
"field_id": "data_types.personal_data"
|
||||
}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Eskalations-System
|
||||
|
||||
### Eskalationsstufen
|
||||
|
||||
| Level | Auslöser | Prüfer | SLA |
|
||||
|-------|----------|--------|-----|
|
||||
| E0 | Nur INFO | Automatisch | - |
|
||||
| E1 | WARN, geringes Risiko | Team-Lead | 24h |
|
||||
| E2 | Art. 9, DSFA empfohlen | DSB | 8h |
|
||||
| E3 | BLOCK, hohes Risiko | DSB + Legal | 4h |
|
||||
|
||||
### Eskalation erstellen
|
||||
|
||||
```go
|
||||
import "ai-compliance-sdk/internal/ucca"
|
||||
|
||||
store := ucca.NewEscalationStore(db)
|
||||
|
||||
escalation := &ucca.Escalation{
|
||||
AssessmentID: "assess-123",
|
||||
Level: ucca.EscalationE2,
|
||||
TriggerReason: "Art. 9 Daten betroffen",
|
||||
RequiredReviews: 1,
|
||||
}
|
||||
|
||||
err := store.CreateEscalation(ctx, escalation)
|
||||
```
|
||||
|
||||
### SLA-Monitor
|
||||
|
||||
```go
|
||||
monitor := ucca.NewSLAMonitor(store, notificationService)
|
||||
|
||||
// Im Hintergrund starten
|
||||
go monitor.Start(ctx)
|
||||
```
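
What an SLA check inside such a monitor might look like is sketched below. The `OpenEscalation` type and the helper names are hypothetical, and counting the SLA from the creation time is an assumption; the durations are taken from the escalation level table.

```go
package main

import (
	"fmt"
	"time"
)

// OpenEscalation is a hypothetical view of an unresolved escalation.
type OpenEscalation struct {
	ID        string
	Level     string
	CreatedAt time.Time
}

// slaByLevel holds the review deadlines per level; E0 has no SLA.
var slaByLevel = map[string]time.Duration{
	"E1": 24 * time.Hour,
	"E2": 8 * time.Hour,
	"E3": 4 * time.Hour,
}

// breached reports whether the SLA has expired at the given time.
// Levels without an SLA never count as breached.
func breached(e OpenEscalation, now time.Time) bool {
	sla, ok := slaByLevel[e.Level]
	if !ok {
		return false
	}
	return now.After(e.CreatedAt.Add(sla))
}

// exampleStart is a fixed reference time so the demo is reproducible.
var exampleStart = time.Date(2026, 1, 29, 9, 0, 0, 0, time.UTC)

// hoursLater returns the reference time shifted by h hours.
func hoursLater(h int) time.Time {
	return exampleStart.Add(time.Duration(h) * time.Hour)
}

func main() {
	e := OpenEscalation{ID: "esc-1", Level: "E3", CreatedAt: exampleStart}
	// After 3 hours the 4-hour SLA still holds; after 5 hours it is breached.
	fmt.Println(breached(e, hoursLater(3)), breached(e, hoursLater(5)))
}
```

Passing `now` as a parameter instead of calling `time.Now()` inside the check keeps the logic testable with fixed timestamps.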
|
||||
|
||||
---
|
||||
|
||||
## 8. API-Endpoints
|
||||
|
||||
### UCCA Endpoints
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| POST | `/sdk/v1/ucca/assess` | Assessment erstellen |
|
||||
| GET | `/sdk/v1/ucca/assess/:id` | Assessment abrufen |
|
||||
| POST | `/sdk/v1/ucca/explain` | Erklärung generieren |
|
||||
| GET | `/sdk/v1/ucca/wizard/schema` | Wizard-Schema abrufen |
|
||||
| POST | `/sdk/v1/ucca/wizard/ask` | Legal Assistant fragen |
|
||||
|
||||
### License Endpoints
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| POST | `/sdk/v1/license/evaluate` | Lizenz-Prüfung |
|
||||
| POST | `/sdk/v1/license/decide-ingest` | Ingest-Entscheidung |
|
||||
|
||||
### Eskalations-Endpoints
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| GET | `/sdk/v1/escalations` | Offene Eskalationen |
|
||||
| GET | `/sdk/v1/escalations/:id` | Eskalation abrufen |
|
||||
| POST | `/sdk/v1/escalations/:id/decide` | Entscheidung treffen |
|
||||
|
||||
---
|
||||
|
||||
## 9. Policy-Dateien
|
||||
|
||||
### Dateistruktur
|
||||
|
||||
```
|
||||
policies/
|
||||
├── ucca_policy_v1.yaml # Haupt-Policy (Regeln, Controls, Patterns)
|
||||
├── wizard_schema_v1.yaml # Wizard-Fragen und Legal Assistant
|
||||
├── controls_catalog.yaml # Detaillierte Control-Beschreibungen
|
||||
├── gap_mapping.yaml # Facts → Gaps → Controls
|
||||
├── licensed_content_policy.yaml # Standards/Normen Compliance
|
||||
└── scc_legal_corpus.yaml # SCC Rechtsquellen
|
||||
```
|
||||
|
||||
### Policy-Version
|
||||
|
||||
Jede Policy hat eine Version:
|
||||
|
||||
```yaml
|
||||
metadata:
|
||||
version: "1.0.0"
|
||||
effective_date: "2025-01-01"
|
||||
author: "Compliance Team"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 10. Tests ausführen
|
||||
|
||||
### Alle Tests
|
||||
|
||||
```bash
|
||||
cd ai-compliance-sdk
|
||||
go test -v ./...
|
||||
```
|
||||
|
||||
### Spezifische Tests
|
||||
|
||||
```bash
# Policy Engine tests (run at package level so the non-test files are compiled too;
# the -run patterns assume that the test names contain these words)
go test -v ./internal/ucca/ -run 'PolicyEngine'

# License Policy tests
go test -v ./internal/ucca/ -run 'License'

# Escalation tests
go test -v ./internal/ucca/ -run 'Escalation'
```
|
||||
|
||||
### Test-Coverage
|
||||
|
||||
```bash
|
||||
go test -cover ./...
|
||||
|
||||
# HTML-Report
|
||||
go test -coverprofile=coverage.out ./...
|
||||
go tool cover -html=coverage.out
|
||||
```
|
||||
|
||||
### Beispiel: Neuen Test hinzufügen
|
||||
|
||||
```go
|
||||
func TestMyNewFeature(t *testing.T) {
|
||||
engine := NewLicensePolicyEngine()
|
||||
|
||||
facts := &LicensedContentFacts{
|
||||
Present: true,
|
||||
Publisher: "DIN_MEDIA",
|
||||
OperationMode: "FULLTEXT_RAG",
|
||||
}
|
||||
|
||||
result := engine.Evaluate(facts)
|
||||
|
||||
if result.Allowed {
|
||||
t.Error("Expected blocked for DIN_MEDIA FULLTEXT_RAG")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 11. Generic Obligations Framework
|
||||
|
||||
### Übersicht
|
||||
|
||||
The Obligations Framework automatically derives regulatory obligations from NIS2, DSGVO (GDPR), and the AI Act.
|
||||
|
||||
### Verwendung
|
||||
|
||||
```go
|
||||
import "ai-compliance-sdk/internal/ucca"
|
||||
|
||||
// Registry erstellen (lädt alle Module)
|
||||
registry := ucca.NewObligationsRegistry()
|
||||
|
||||
// UnifiedFacts aufbauen
|
||||
facts := &ucca.UnifiedFacts{
|
||||
Organization: ucca.OrganizationFacts{
|
||||
EmployeeCount: 150,
|
||||
AnnualRevenue: 30000000,
|
||||
Country: "DE",
|
||||
EUMember: true,
|
||||
},
|
||||
Sector: ucca.SectorFacts{
|
||||
PrimarySector: "digital_infrastructure",
|
||||
SpecialServices: []string{"cloud", "msp"},
|
||||
IsKRITIS: false,
|
||||
},
|
||||
DataProtection: ucca.DataProtectionFacts{
|
||||
ProcessesPersonalData: true,
|
||||
},
|
||||
AIUsage: ucca.AIUsageFacts{
|
||||
UsesAI: true,
|
||||
HighRiskCategories: []string{"employment"},
|
||||
IsGPAIProvider: false,
|
||||
},
|
||||
}
|
||||
|
||||
// Alle anwendbaren Pflichten evaluieren
|
||||
overview := registry.EvaluateAll(facts, "Muster GmbH")
|
||||
|
||||
// Ergebnis auswerten
|
||||
fmt.Println("Anwendbare Regulierungen:", len(overview.ApplicableRegulations))
|
||||
fmt.Println("Gesamtzahl Pflichten:", len(overview.Obligations))
|
||||
fmt.Println("Kritische Pflichten:", overview.ExecutiveSummary.CriticalObligations)
|
||||
```
|
||||
|
||||
### Creating a new regulation module

```go
// 1. Implement the module interface
type MyRegulationModule struct {
    obligations       []ucca.Obligation
    controls          []ucca.ObligationControl
    incidentDeadlines []ucca.IncidentDeadline
}

func (m *MyRegulationModule) ID() string   { return "my_regulation" }
func (m *MyRegulationModule) Name() string { return "My Regulation" }

func (m *MyRegulationModule) IsApplicable(facts *ucca.UnifiedFacts) bool {
    // Implement the applicability check
    return facts.Organization.Country == "DE"
}

func (m *MyRegulationModule) DeriveObligations(facts *ucca.UnifiedFacts) []ucca.Obligation {
    // Derive obligations based on the facts
    return m.obligations
}

// 2. Register it in the registry
func NewMyRegulationModule() (*MyRegulationModule, error) {
    m := &MyRegulationModule{}
    // Load YAML or define hardcoded obligations
    return m, nil
}

// In obligations_registry.go (the constructor returns two values, so
// unpack it before registering):
// if m, err := NewMyRegulationModule(); err == nil {
//     r.Register(m)
// }
```

### YAML-based obligations

```yaml
# policies/obligations/my_regulation_obligations.yaml
regulation: my_regulation
name: "My Regulation"

obligations:
  - id: "MYREG-OBL-001"
    title: "Compliance-Pflicht"
    description: "Beschreibung der Pflicht"
    applies_when: "classification != 'nicht_betroffen'"
    legal_basis:
      - norm: "§ 1 MyReg"
    category: "Governance"
    responsible: "Geschäftsführung"
    deadline:
      type: "relative"
      duration: "12 Monate"
    sanctions:
      max_fine: "1 Mio. EUR"
    priority: "high"

controls:
  - id: "MYREG-CTRL-001"
    name: "Kontrollmaßnahme"
    category: "Technical"
    when_applicable: "immer"
    what_to_do: "Maßnahme implementieren"
    evidence_needed:
      - "Dokumentation"
```

### PDF export

```go
import (
    "encoding/base64"
    "fmt"
    "log"
    "os"

    "ai-compliance-sdk/internal/ucca"
)

// overview is the result of registry.EvaluateAll.

// Create the exporter
exporter := ucca.NewPDFExporter("de")

// Generate the PDF
response, err := exporter.ExportManagementMemo(overview)
if err != nil {
    log.Fatal(err)
}

// The PDF content is base64-encoded
fmt.Println("Content-Type:", response.ContentType) // application/pdf
fmt.Println("Filename:", response.Filename)

// Save the PDF
decoded, err := base64.StdEncoding.DecodeString(response.Content)
if err != nil {
    log.Fatal(err)
}
if err := os.WriteFile("memo.pdf", decoded, 0644); err != nil {
    log.Fatal(err)
}

// Alternative: Markdown
mdResponse, err := exporter.ExportMarkdown(overview)
if err != nil {
    log.Fatal(err)
}
fmt.Println(mdResponse.Content) // Markdown text
```

### API endpoints

```bash
# Create an assessment
curl -X POST http://localhost:8090/sdk/v1/ucca/obligations/assess \
  -H "Content-Type: application/json" \
  -d '{
    "facts": {
      "organization": {"employee_count": 150, "country": "DE"},
      "sector": {"primary_sector": "healthcare"},
      "data_protection": {"processes_personal_data": true},
      "ai_usage": {"uses_ai": false}
    },
    "organization_name": "Test GmbH"
  }'

# PDF export (direct)
curl -X POST http://localhost:8090/sdk/v1/ucca/obligations/export/direct \
  -H "Content-Type: application/json" \
  -d '{
    "overview": { ... },
    "format": "pdf",
    "language": "de"
  }'
```

---

## 12. Tests for the Obligations Framework

```bash
# All obligations module tests (-run matches test names containing "Module",
# such as TestNIS2Module_... and TestAIActModule_... below)
go test -v ./internal/ucca/ -run 'Module'

# NIS2 module tests
go test -v ./internal/ucca/nis2_module_test.go

# DSGVO module tests
go test -v ./internal/ucca/dsgvo_module_test.go

# AI Act module tests
go test -v ./internal/ucca/ai_act_module_test.go

# PDF export tests
go test -v ./internal/ucca/pdf_export_test.go
```

### Example tests

```go
func TestNIS2Module_LargeCompanyInAnnexISector(t *testing.T) {
    module, err := ucca.NewNIS2Module()
    if err != nil {
        t.Fatalf("NewNIS2Module: %v", err)
    }

    facts := &ucca.UnifiedFacts{
        Organization: ucca.OrganizationFacts{
            EmployeeCount: 500,
            AnnualRevenue: 100000000,
            Country:       "DE",
        },
        Sector: ucca.SectorFacts{
            PrimarySector: "energy",
        },
    }

    if !module.IsApplicable(facts) {
        t.Error("Expected NIS2 to apply to large energy company")
    }

    classification := module.Classify(facts)
    if classification != "besonders_wichtige_einrichtung" {
        t.Errorf("Expected 'besonders_wichtige_einrichtung', got '%s'", classification)
    }
}

func TestAIActModule_HighRiskEmploymentAI(t *testing.T) {
    module, err := ucca.NewAIActModule()
    if err != nil {
        t.Fatalf("NewAIActModule: %v", err)
    }

    facts := &ucca.UnifiedFacts{
        AIUsage: ucca.AIUsageFacts{
            UsesAI:             true,
            HighRiskCategories: []string{"employment"},
        },
    }

    if !module.IsApplicable(facts) {
        t.Error("Expected AI Act to apply")
    }

    riskLevel := module.ClassifyRisk(facts)
    if riskLevel != ucca.AIActHighRisk {
        t.Errorf("Expected 'high_risk', got '%s'", riskLevel)
    }
}
```

---

## Appendix: Important files

| File | Description |
|------|-------------|
| `internal/ucca/policy_engine.go` | Main policy engine |
| `internal/ucca/license_policy.go` | License policy engine |
| `internal/ucca/obligations_framework.go` | Obligations interfaces and types |
| `internal/ucca/obligations_registry.go` | Module registry |
| `internal/ucca/nis2_module.go` | NIS2 decision tree |
| `internal/ucca/dsgvo_module.go` | GDPR (DSGVO) obligations |
| `internal/ucca/ai_act_module.go` | AI Act risk classification |
| `internal/ucca/pdf_export.go` | PDF/Markdown export |
| `internal/api/handlers/obligations_handlers.go` | Obligations API |
| `policies/obligations/*.yaml` | Obligation catalogs |

---

*Documentation as of: 2026-01-29*
220
docs-src/services/ai-compliance-sdk/SBOM.md
Normal file
@@ -0,0 +1,220 @@
# AI Compliance SDK - Software Bill of Materials (SBOM)

**Created:** 2026-01-29
**Go version:** 1.24.0

---

## Summary

| Category | Count | Status |
|----------|-------|--------|
| Direct dependencies | 7 | ✅ All commercially usable |
| Indirect dependencies | ~45 | ✅ All commercially usable |
| **Total** | ~52 | ✅ **All open source, commercially usable** |

---

## Direct dependencies

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/gin-gonic/gin` | v1.10.1 | **MIT** | ✅ Yes |
| `github.com/gin-contrib/cors` | v1.7.6 | **MIT** | ✅ Yes |
| `github.com/google/uuid` | v1.6.0 | **BSD-3-Clause** | ✅ Yes |
| `github.com/jackc/pgx/v5` | v5.5.3 | **MIT** | ✅ Yes |
| `github.com/joho/godotenv` | v1.5.1 | **MIT** | ✅ Yes |
| `github.com/xuri/excelize/v2` | v2.9.1 | **BSD-3-Clause** | ✅ Yes |
| `gopkg.in/yaml.v3` | v3.0.1 | **MIT / Apache-2.0** | ✅ Yes |

---

## Indirect (transitive) dependencies

### JSON / serialization

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/bytedance/sonic` | v1.13.3 | **Apache-2.0** | ✅ Yes |
| `github.com/goccy/go-json` | v0.10.5 | **MIT** | ✅ Yes |
| `github.com/json-iterator/go` | v1.1.12 | **MIT** | ✅ Yes |
| `github.com/pelletier/go-toml/v2` | v2.2.4 | **MIT** | ✅ Yes |
| `gopkg.in/yaml.v3` | v3.0.1 | **MIT / Apache-2.0** | ✅ Yes |
| `github.com/ugorji/go/codec` | v1.3.0 | **MIT** | ✅ Yes |

### Web framework (Gin ecosystem)

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/gin-contrib/sse` | v1.1.0 | **MIT** | ✅ Yes |
| `github.com/go-playground/validator/v10` | v10.26.0 | **MIT** | ✅ Yes |
| `github.com/go-playground/locales` | v0.14.1 | **MIT** | ✅ Yes |
| `github.com/go-playground/universal-translator` | v0.18.1 | **MIT** | ✅ Yes |
| `github.com/leodido/go-urn` | v1.4.0 | **MIT** | ✅ Yes |

### Database (PostgreSQL)

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/jackc/pgpassfile` | v1.0.0 | **MIT** | ✅ Yes |
| `github.com/jackc/pgservicefile` | v0.0.0-... | **MIT** | ✅ Yes |
| `github.com/jackc/puddle/v2` | v2.2.1 | **MIT** | ✅ Yes |

### Excel processing

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/xuri/excelize/v2` | v2.9.1 | **BSD-3-Clause** | ✅ Yes |
| `github.com/xuri/efp` | v0.0.1 | **BSD-3-Clause** | ✅ Yes |
| `github.com/xuri/nfp` | v0.0.2-... | **BSD-3-Clause** | ✅ Yes |
| `github.com/richardlehane/mscfb` | v1.0.4 | **Apache-2.0** | ✅ Yes |
| `github.com/richardlehane/msoleps` | v1.0.4 | **Apache-2.0** | ✅ Yes |

### PDF generation

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/jung-kurt/gofpdf` | v1.16.2 | **MIT** | ✅ Yes |

### Utilities

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `github.com/gabriel-vasile/mimetype` | v1.4.9 | **MIT** | ✅ Yes |
| `github.com/mattn/go-isatty` | v0.0.20 | **MIT** | ✅ Yes |
| `github.com/modern-go/concurrent` | v0.0.0-... | **Apache-2.0** | ✅ Yes |
| `github.com/modern-go/reflect2` | v1.0.2 | **Apache-2.0** | ✅ Yes |
| `github.com/klauspost/cpuid/v2` | v2.2.10 | **MIT** | ✅ Yes |
| `github.com/tiendc/go-deepcopy` | v1.7.1 | **MIT** | ✅ Yes |
| `github.com/twitchyliquid64/golang-asm` | v0.15.1 | **MIT** | ✅ Yes |
| `github.com/cloudwego/base64x` | v0.1.5 | **Apache-2.0** | ✅ Yes |

### Go standard library extensions

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `golang.org/x/arch` | v0.18.0 | **BSD-3-Clause** | ✅ Yes |
| `golang.org/x/crypto` | v0.43.0 | **BSD-3-Clause** | ✅ Yes |
| `golang.org/x/net` | v0.46.0 | **BSD-3-Clause** | ✅ Yes |
| `golang.org/x/sync` | v0.17.0 | **BSD-3-Clause** | ✅ Yes |
| `golang.org/x/sys` | v0.37.0 | **BSD-3-Clause** | ✅ Yes |
| `golang.org/x/text` | v0.30.0 | **BSD-3-Clause** | ✅ Yes |

### Protocol libraries

| Package | Version | License | Commercially usable |
|---------|---------|---------|---------------------|
| `google.golang.org/protobuf` | v1.36.6 | **BSD-3-Clause** | ✅ Yes |

---

## License overview

| License | Package count | Commercially usable | Copyleft |
|---------|---------------|---------------------|----------|
| **MIT** | ~25 | ✅ Yes | ❌ No |
| **Apache-2.0** | ~8 | ✅ Yes | ❌ No |
| **BSD-3-Clause** | ~12 | ✅ Yes | ❌ No |
| **BSD-2-Clause** | 0 | ✅ Yes | ❌ No |

### No problematic licenses

| License | Status |
|---------|--------|
| GPL-2.0 | ❌ **Not used** |
| GPL-3.0 | ❌ **Not used** |
| AGPL | ❌ **Not used** |
| LGPL | ❌ **Not used** |
| SSPL | ❌ **Not used** |
| Commons Clause | ❌ **Not used** |

---

## In-house components (no additional external dependencies)

The following components were developed as part of the AI Compliance SDK and have **no additional dependencies**:

| Component | Files | External deps |
|-----------|-------|---------------|
| Policy Engine | `internal/ucca/policy_engine.go` | None |
| License Policy Engine | `internal/ucca/license_policy.go` | None |
| Legal RAG | `internal/ucca/legal_rag.go` | None |
| Escalation System | `internal/ucca/escalation_*.go` | None |
| SLA Monitor | `internal/ucca/sla_monitor.go` | None |
| UCCA Handlers | `internal/api/handlers/ucca_handlers.go` | Gin (MIT) |
| **Obligations Framework** | `internal/ucca/obligations_framework.go` | None |
| **Obligations Registry** | `internal/ucca/obligations_registry.go` | None |
| **NIS2 Module** | `internal/ucca/nis2_module.go` | None |
| **DSGVO Module** | `internal/ucca/dsgvo_module.go` | None |
| **AI Act Module** | `internal/ucca/ai_act_module.go` | None |
| **PDF Export** | `internal/ucca/pdf_export.go` | gofpdf (MIT) |
| **Obligations Handlers** | `internal/api/handlers/obligations_handlers.go` | Gin (MIT) |
| **Funding Models** | `internal/funding/models.go` | None |
| **Funding Store** | `internal/funding/store.go`, `postgres_store.go` | pgx (MIT) |
| **Funding Export** | `internal/funding/export.go` | gofpdf (MIT), excelize (BSD-3) |
| **Funding Handlers** | `internal/api/handlers/funding_handlers.go` | Gin (MIT) |

### Policy files (pure YAML/JSON)

| File | Format | Dependencies |
|------|--------|--------------|
| `ucca_policy_v1.yaml` | YAML | None |
| `wizard_schema_v1.yaml` | YAML | None |
| `controls_catalog.yaml` | YAML | None |
| `gap_mapping.yaml` | YAML | None |
| `licensed_content_policy.yaml` | YAML | None |
| `financial_regulations_policy.yaml` | YAML | None |
| `financial_regulations_corpus.yaml` | YAML | None |
| `scc_legal_corpus.yaml` | YAML | None |
| **`obligations/nis2_obligations.yaml`** | YAML | None |
| **`obligations/dsgvo_obligations.yaml`** | YAML | None |
| **`obligations/ai_act_obligations.yaml`** | YAML | None |
| **`funding/foerderantrag_wizard_v1.yaml`** | YAML | None |
| **`funding/bundesland_profiles.yaml`** | YAML | None |

---

## Compliance statement

### Suitable for commercial use: ✅ YES

All dependencies use **permissive open-source licenses**:

1. **MIT License**: Permits commercial use, modification, and distribution. Only the license notice is required.

2. **Apache-2.0 License**: Permits commercial use, modification, and distribution. Includes a patent grant.

3. **BSD-3-Clause**: Permits commercial use, modification, and distribution. Only the license notice is required.

### No copyleft licenses

**No** copyleft licenses (GPL, AGPL, LGPL) are used that would require disclosing your own source code.

### Recommended measures

1. **Maintain a NOTICE file**: Collect all license texts in a NOTICE file
2. **Regular updates**: Check dependencies for known vulnerabilities
3. **License scanner**: Use a tool such as `go-licenses` or `fossa` for automated checks
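
The automated license check can be sketched as a small script over the output of `go-licenses csv`. This is an illustration only, written in Python as build tooling rather than as part of the Go SDK. It assumes the CSV has three columns (package, license URL, license name); the `PERMISSIVE` allowlist and the `find_violations` helper are hypothetical names.

```python
import csv
import io

# Allowlist taken from the license overview in this SBOM (assumption: any
# other license name should be flagged for manual review).
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause", "BSD-2-Clause"}


def find_violations(csv_text: str) -> list[tuple[str, str]]:
    """Return (package, license) pairs whose license is not on the allowlist."""
    violations = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        package, license_name = row[0], row[2]
        if license_name not in PERMISSIVE:
            violations.append((package, license_name))
    return violations
```

In a CI job, the script could read the CSV from standard input and fail the build when `find_violations` returns a non-empty list.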

---

## Generating the SBOM

```bash
# Generate the SBOM in SPDX format
go install github.com/spdx/tools-golang/cmd/spdx-tvwriter@latest
go mod download
# Manual step: create the SPDX document

# Alternative: CycloneDX format (-json selects JSON instead of the default XML)
go install github.com/CycloneDX/cyclonedx-gomod/cmd/cyclonedx-gomod@latest
cyclonedx-gomod mod -json -output sbom.json

# License check (run from the module root)
go install github.com/google/go-licenses@latest
go-licenses csv ./...
```

---

*Documentation as of: 2026-01-29*
97
docs-src/services/ai-compliance-sdk/index.md
Normal file
@@ -0,0 +1,97 @@
# AI Compliance SDK

The AI Compliance SDK is a Go-based service for assessing the compliance of AI use cases.

## Overview

| Property | Value |
|----------|-------|
| **Port** | 8090 |
| **Framework** | Go (Gin) |
| **Database** | PostgreSQL |
| **Vector DB** | Qdrant (Legal RAG) |

## Core components

```
┌────────────────────────────────────────────────────────────┐
│                        UCCA System                         │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │   Frontend   │───>│   SDK API    │───>│  PostgreSQL  │  │
│  │  (Next.js)   │    │    (Go)      │    │   Database   │  │
│  └──────────────┘    └──────┬───────┘    └──────────────┘  │
│                             │                              │
│         ┌───────────────────┼───────────────────┐          │
│         │                   │                   │          │
│         ▼                   ▼                   ▼          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │    Policy    │    │  Escalation  │    │  Legal RAG   │  │
│  │    Engine    │    │   Workflow   │    │   (Qdrant)   │  │
│  │  (45 rules)  │    │   (E0-E3)    │    │ 2,274 chunks │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

## Features

- **UCCA (Use-Case Compliance Advisor)**: Deterministic assessment of AI use cases
- **Policy Engine**: 45 rule-based compliance checks
- **License Policy Engine**: Compliance for standards and norms (DIN, ISO, VDI)
- **Legal RAG**: Semantic search across EU legislation (GDPR/DSGVO, AI Act, NIS2)
- **Escalation workflow**: Levels E0-E3 with human-in-the-loop
- **Wizard & Legal Assistant**: Guided input with a legal assistant
- **Generic Obligations Framework**: NIS2, GDPR (DSGVO), and AI Act modules

## Core principle

> **"The LLM is NOT the source of truth. Truth = rules + evidence. LLM = translator + subsumption aid"**

The system follows a strict **human-in-the-loop** approach:

1. **Deterministic rules** make all compliance decisions
2. **The LLM** only explains results and never overrides BLOCK decisions
3. **Humans** (DPO, Legal) make the final decision in critical cases
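
The separation between rule outcome and LLM explanation can be shown in a few lines. This is a language-neutral sketch written in Python; the SDK itself is written in Go, and the `Decision` type, both functions, and the single rule are hypothetical and exist only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    outcome: str               # "ALLOW" or "BLOCK", set by deterministic rules only
    explanation: str = ""      # free text, may be produced by an LLM
    needs_human: bool = False  # True when a person must make the final call


def evaluate_rules(facts: dict) -> Decision:
    """Deterministic check; the rule below is invented for illustration."""
    if facts.get("processes_personal_data") and not facts.get("legal_basis"):
        return Decision(outcome="BLOCK", needs_human=True)
    return Decision(outcome="ALLOW")


def attach_explanation(decision: Decision, llm_text: str) -> Decision:
    """Add an explanation while copying the rule outcome unchanged."""
    return Decision(decision.outcome, llm_text, decision.needs_human)
```

Because `attach_explanation` only copies `outcome`, no LLM text can turn a BLOCK into an ALLOW, whatever it says.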

## API endpoints

### Assessment

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/sdk/v1/ucca/assess` | Create an assessment |
| GET | `/sdk/v1/ucca/assessments` | List assessments |
| GET | `/sdk/v1/ucca/assessments/:id` | Retrieve an assessment |
| POST | `/sdk/v1/ucca/assessments/:id/explain` | Generate an LLM explanation |

### Escalation

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/sdk/v1/ucca/escalations` | List escalations |
| POST | `/sdk/v1/ucca/escalations/:id/decide` | Record a decision |

### Obligations Framework

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/sdk/v1/ucca/obligations/assess` | Obligations assessment |
| POST | `/sdk/v1/ucca/obligations/export/memo` | PDF export |

## Further documentation

- [Architecture](./ARCHITECTURE.md) - Detailed system architecture
- [Developer guide](./DEVELOPER.md) - Developer documentation
- [Auditor documentation](./AUDITOR_DOCUMENTATION.md) - Documentation for external auditors

## Tests

```bash
cd ai-compliance-sdk
go test -v ./...

# With coverage
go test -cover ./...
```
353
docs-src/services/ki-daten-pipeline/architecture.md
Normal file
@@ -0,0 +1,353 @@
# KI-Daten-Pipeline Architecture

This page documents the technical architecture of the KI-Daten-Pipeline (AI data pipeline) in detail.

## System overview

```mermaid
graph TB
    subgraph Users["Users"]
        U1[Developers]
        U2[Data Scientists]
        U3[Teachers]
    end

    subgraph Frontend["Frontend (admin-v2)"]
        direction TB
        F1["OCR-Labeling<br/>/ai/ocr-labeling"]
        F2["RAG Pipeline<br/>/ai/rag-pipeline"]
        F3["Daten & RAG<br/>/ai/rag"]
        F4["Klausur-Korrektur<br/>/ai/klausur-korrektur"]
    end

    subgraph Backend["Backend Services"]
        direction TB
        B1["klausur-service<br/>Port 8086"]
        B2["embedding-service<br/>Port 8087"]
    end

    subgraph Storage["Persistence"]
        direction TB
        D1[(PostgreSQL<br/>Metadata)]
        D2[(Qdrant<br/>Vectors)]
        D3[(MinIO<br/>Images/PDFs)]
    end

    subgraph External["External APIs"]
        E1[OpenAI API]
        E2[Ollama]
    end

    U1 --> F1
    U2 --> F2
    U3 --> F4

    F1 --> B1
    F2 --> B1
    F3 --> B1
    F4 --> B1

    B1 --> D1
    B1 --> D2
    B1 --> D3
    B1 --> B2

    B2 --> E1
    B1 --> E2
```

## Component details

### OCR-Labeling module

```mermaid
flowchart TB
    subgraph Upload["Upload process"]
        U1[Upload images] --> U2[Store in MinIO]
        U2 --> U3[Create session]
    end

    subgraph OCR["OCR processing"]
        O1[Load image] --> O2{Choose model}
        O2 -->|llama3.2-vision| O3a[Vision LLM]
        O2 -->|trocr| O3b[Transformer]
        O2 -->|paddleocr| O3c[PaddleOCR]
        O2 -->|donut| O3d[Document AI]
        O3a --> O4[OCR text]
        O3b --> O4
        O3c --> O4
        O3d --> O4
    end

    subgraph Labeling["Labeling process"]
        L1[Load queue] --> L2[Show item]
        L2 --> L3{Decision}
        L3 -->|correct| L4[Confirm]
        L3 -->|wrong| L5[Correct]
        L3 -->|unclear| L6[Skip]
        L4 --> L7[PostgreSQL]
        L5 --> L7
        L6 --> L7
    end

    subgraph Export["Export"]
        E1[Labeled items] --> E2{Format}
        E2 -->|TrOCR| E3a[Transformer Format]
        E2 -->|Llama| E3b[Vision Format]
        E2 -->|Generic| E3c[JSON]
    end

    Upload --> OCR
    OCR --> Labeling
    Labeling --> Export
```

### RAG Pipeline module

```mermaid
flowchart TB
    subgraph Sources["Data sources"]
        S1[NiBiS PDFs]
        S2[Uploads]
        S3[Legal corpus]
        S4[School regulations]
    end

    subgraph Processing["Processing"]
        direction TB
        P1[PDF Parser] --> P2[OCR if needed]
        P2 --> P3[Text Cleaning]
        P3 --> P4[Chunking<br/>1000 chars, 200 overlap]
        P4 --> P5[Metadata Extraction]
    end

    subgraph Embedding["Embedding"]
        E1[embedding-service] --> E2[OpenAI API]
        E2 --> E3[1536-dim vector]
    end

    subgraph Indexing["Indexing"]
        I1{Choose collection}
        I1 -->|EH| I2a[bp_nibis_eh]
        I1 -->|Custom| I2b[bp_eh]
        I1 -->|Legal| I2c[bp_legal_corpus]
        I1 -->|School| I2d[bp_schulordnungen]
        I2a --> I3[Qdrant upsert]
        I2b --> I3
        I2c --> I3
        I2d --> I3
    end

    Sources --> Processing
    Processing --> Embedding
    Embedding --> Indexing
```

### Daten & RAG module

```mermaid
flowchart TB
    subgraph Query["Search query"]
        Q1[User Query] --> Q2[Query Embedding]
        Q2 --> Q3[1536-dim vector]
    end

    subgraph Search["Qdrant search"]
        S1[Choose collection] --> S2[Vector Search]
        S2 --> S3[Top-k Results]
        S3 --> S4[Score Filtering]
    end

    subgraph Results["Results"]
        R1[Chunks] --> R2[Enrich metadata]
        R2 --> R3[Source URLs]
        R3 --> R4[Response]
    end

    Query --> Search
    Search --> Results
```
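
The search steps in this diagram (vector search, top-k, score filtering) can be illustrated without a running Qdrant instance. The following is a minimal in-memory sketch that assumes cosine similarity; `cosine`, `search`, and the `min_score` threshold are illustrative names, and in the real service the ranking happens inside Qdrant.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, points, top_k=5, min_score=0.0):
    """Rank (vector, payload) points by similarity, keep top_k, drop low scores."""
    scored = [(cosine(query_vec, vec), payload) for vec, payload in points]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(score, payload) for score, payload in scored[:top_k] if score >= min_score]
```

Filtering after the top-k cut mirrors the order in the diagram: a high `min_score` can therefore return fewer than `top_k` results.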

## Data models

### OCR-Labeling

```typescript
interface OCRSession {
  id: string
  name: string
  source_type: 'klausur' | 'handwriting_sample' | 'scan'
  ocr_model: 'llama3.2-vision:11b' | 'trocr' | 'paddleocr' | 'donut'
  total_items: number
  labeled_items: number
  status: 'active' | 'completed' | 'archived'
  created_at: string
}

interface OCRItem {
  id: string
  session_id: string
  image_path: string
  ocr_text: string | null
  ocr_confidence: number | null
  ground_truth: string | null
  status: 'pending' | 'confirmed' | 'corrected' | 'skipped'
  label_time_seconds: number | null
}
```
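
How a labeling decision could map onto the `status` and `ground_truth` fields of an `OCRItem` is sketched below. This is an assumption-laden illustration, not the klausur-service implementation: `apply_label` and the decision names are hypothetical, and the sketch assumes that confirming an item copies `ocr_text` into `ground_truth`.

```python
from typing import Optional


def apply_label(item: dict, decision: str, corrected_text: Optional[str] = None) -> dict:
    """Return a copy of `item` with status and ground_truth set for the decision."""
    updated = dict(item)
    if decision == "confirm":
        updated["status"] = "confirmed"
        updated["ground_truth"] = item["ocr_text"]  # assumption: OCR text is accepted as-is
    elif decision == "correct":
        if not corrected_text:
            raise ValueError("a correction needs the corrected text")
        updated["status"] = "corrected"
        updated["ground_truth"] = corrected_text
    elif decision == "skip":
        updated["status"] = "skipped"
        updated["ground_truth"] = None
    else:
        raise ValueError(f"unknown decision: {decision}")
    return updated
```

Returning a copy keeps the original item untouched, which makes the transition easy to test before it is written to PostgreSQL.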

### RAG Pipeline

```typescript
interface TrainingJob {
  id: string
  name: string
  status: 'queued' | 'preparing' | 'training' | 'validating' | 'completed' | 'failed' | 'paused'
  progress: number
  current_epoch: number
  total_epochs: number
  documents_processed: number
  total_documents: number
  config: {
    batch_size: number
    bundeslaender: string[]
    mixed_precision: boolean
  }
}

interface DataSource {
  id: string
  name: string
  collection: string
  document_count: number
  chunk_count: number
  status: 'active' | 'pending' | 'error'
  last_updated: string | null
}
```

### Legal Corpus

```typescript
interface RegulationStatus {
  code: string
  name: string
  fullName: string
  type: 'eu_regulation' | 'eu_directive' | 'de_law' | 'bsi_standard'
  chunkCount: number
  status: 'ready' | 'empty' | 'error'
}

interface SearchResult {
  text: string
  regulation_code: string
  regulation_name: string
  article: string | null
  paragraph: string | null
  source_url: string
  score: number
}
```

## Qdrant Collections

### Configuration

| Collection | Vector dimension | Distance metric | Payload |
|------------|------------------|-----------------|---------|
| `bp_nibis_eh` | 1536 | COSINE | bundesland, fach, aufgabe |
| `bp_eh` | 1536 | COSINE | user_id, klausur_id |
| `bp_legal_corpus` | 1536 | COSINE | regulation, article, source_url |
| `bp_schulordnungen` | 1536 | COSINE | bundesland, typ, datum |
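
The table can be turned into a pre-upsert sanity check. The sketch below is illustrative: `validate_point` is a hypothetical helper, and treating every listed payload field as required is an assumption, since the table only names the fields.

```python
VECTOR_SIZE = 1536

# Payload fields per collection, copied from the configuration table.
PAYLOAD_KEYS = {
    "bp_nibis_eh": {"bundesland", "fach", "aufgabe"},
    "bp_eh": {"user_id", "klausur_id"},
    "bp_legal_corpus": {"regulation", "article", "source_url"},
    "bp_schulordnungen": {"bundesland", "typ", "datum"},
}


def validate_point(collection: str, vector: list[float], payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the point looks valid."""
    if collection not in PAYLOAD_KEYS:
        return [f"unknown collection: {collection}"]
    problems = []
    if len(vector) != VECTOR_SIZE:
        problems.append(f"expected {VECTOR_SIZE} dimensions, got {len(vector)}")
    missing = sorted(PAYLOAD_KEYS[collection] - set(payload))
    if missing:
        problems.append("missing payload keys: " + ", ".join(missing))
    return problems
```

Checking the dimension before the upsert gives a clearer error than a rejected request, because a collection only accepts vectors of its configured size.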

### Chunking strategy

```
┌────────────────────────────────────────────────────────────┐
│                     Original document                      │
│ Lorem ipsum dolor sit amet, consectetur adipiscing elit... │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ Chunk 1              │ │ Chunk 2              │ │ Chunk 3              │
│ 0-1000 chars         │ │ 800-1800 chars       │ │ 1600-2600 chars      │
│                      │ │ (200 overlap)        │ │ (200 overlap)        │
└──────────────────────┘ └──────────────────────┘ └──────────────────────┘
```
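
The sliding window in the diagram (1000 characters per chunk, 200 characters of overlap, so each chunk starts 800 characters after the previous one) can be sketched as follows. `chunk_text` is a hypothetical helper, not the service's actual chunker, which may also respect sentence or paragraph boundaries.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[tuple[int, int, str]]:
    """Split text into (start, end, chunk) windows of `size` chars sharing `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap  # 800 with the defaults
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break  # the last window reached the end of the document
        start += step
    return chunks
```

For a 2600-character document this yields the three windows from the diagram: 0-1000, 800-1800, and 1600-2600.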

## API authentication

All endpoints use the central auth middleware:

```mermaid
sequenceDiagram
    participant C as Client
    participant A as API Gateway
    participant S as klausur-service
    participant D as Database

    C->>A: Request + JWT Token
    A->>A: Validate token
    A->>S: Forwarded Request
    S->>D: Query data
    D->>S: Response
    S->>C: JSON Response
```

## Monitoring & metrics

### Available metrics

| Metric | Description | Endpoint |
|--------|-------------|----------|
| `ocr_items_total` | Total number of OCR items | `/api/v1/ocr-label/stats` |
| `ocr_accuracy_rate` | OCR accuracy | `/api/v1/ocr-label/stats` |
| `rag_chunk_count` | Number of indexed chunks | `/api/legal-corpus/status` |
| `rag_collection_status` | Collection status | `/api/legal-corpus/status` |
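
The table does not define how `ocr_accuracy_rate` is computed. One plausible reading, shown here purely as an assumption, is the share of reviewed items that were confirmed without correction; `ocr_accuracy_rate` below is a hypothetical helper over the item status values.

```python
from collections import Counter
from typing import Iterable, Optional


def ocr_accuracy_rate(statuses: Iterable[str]) -> Optional[float]:
    """confirmed / (confirmed + corrected); None when nothing has been reviewed.

    Assumption: pending and skipped items carry no accuracy signal and are ignored.
    """
    counts = Counter(statuses)
    reviewed = counts["confirmed"] + counts["corrected"]
    if reviewed == 0:
        return None
    return counts["confirmed"] / reviewed
```

Returning `None` instead of `0.0` for an unreviewed session avoids reporting a misleading accuracy of zero.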

### Logging

```python
# Structured logging in klausur-service
logger.info("OCR processing started", extra={
    "session_id": session_id,
    "item_count": item_count,
    "model": ocr_model
})
```

## Error handling

### Retry strategies

| Operation | Max retries | Backoff |
|-----------|-------------|---------|
| OCR processing | 3 | Exponential (1s, 2s, 4s) |
| Embedding API | 5 | Exponential with jitter |
| Qdrant upsert | 3 | Linear (1s) |
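
The three backoff variants in the table can be expressed with one delay generator. This is a sketch under stated assumptions: `backoff_delays` and `call_with_retry` are hypothetical helpers, "Linear (1s)" is read as a fixed 1-second delay, and the jitter is modeled as a random extra fraction of each delay.

```python
import random
import time


def backoff_delays(max_retries, base=1.0, factor=2.0, jitter=0.0, rng=random.random):
    """Seconds to wait before each retry.

    factor=2.0 gives exponential growth (1s, 2s, 4s); factor=1.0 gives a
    constant delay. jitter adds up to that fraction of random extra delay.
    """
    return [base * factor**i * (1.0 + jitter * rng()) for i in range(max_retries)]


def call_with_retry(func, delays, sleep=time.sleep):
    """Call func once, then once more after each delay until it succeeds."""
    last_error = None
    for delay in [0.0] + list(delays):  # the first attempt is immediate
        if delay:
            sleep(delay)
        try:
            return func()
        except Exception as exc:
            last_error = exc
    raise last_error
```

With these helpers, OCR processing would use `backoff_delays(3)`, the embedding API `backoff_delays(5, jitter=0.5)` (the jitter fraction is an arbitrary example value), and the Qdrant upsert `backoff_delays(3, factor=1.0)`.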

### Fallback behavior

```mermaid
flowchart TD
    A[Embedding Request] --> B{OpenAI available?}
    B -->|Yes| C[OpenAI API]
    B -->|No| D{Local model?}
    D -->|Yes| E[Ollama Embedding]
    D -->|No| F[Error + Queue]
```
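
The decision flow above amounts to trying providers in order and queueing the request when none is available. A minimal sketch, assuming each provider is a plain callable that raises on failure; `embed_with_fallback`, `EmbeddingUnavailable`, and the list-based queue are hypothetical stand-ins.

```python
class EmbeddingUnavailable(RuntimeError):
    """Raised when no provider could embed the text."""


def embed_with_fallback(text, providers, retry_queue):
    """Try (name, embed) providers in order; queue the text if all of them fail."""
    for name, embed in providers:
        try:
            return name, embed(text)
        except Exception:
            continue  # provider unavailable, try the next one
    retry_queue.append(text)  # keep the request so it can be retried later
    raise EmbeddingUnavailable("no embedding provider available; request queued")
```

One caveat to check before relying on the local fallback: a vector can only be stored in a collection if its dimension matches the collection's configured size (1536 in the Qdrant configuration documented on this page), so the local model has to produce vectors of that size.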

## Scaling

### Current state

- **Single Node**: All services run on a Mac Mini
- **Qdrant**: Standalone, ~50k chunks
- **PostgreSQL**: Shared with other services

### Planned extensions

1. **Qdrant Cluster**: For > 1M chunks
2. **Worker Queue**: Redis-based, for batch jobs
3. **GPU-Offloading**: OCR on vast.ai GPU instances
215
docs-src/services/ki-daten-pipeline/index.md
Normal file
@@ -0,0 +1,215 @@
# KI-Daten-Pipeline

The KI-Daten-Pipeline (AI data pipeline) is a cohesive system of three modules that covers the data flow from capture to semantic search.

## Overview

```mermaid
flowchart LR
    subgraph OCR["OCR-Labeling"]
        A[Exam scans] --> B[OCR recognition]
        B --> C[Ground Truth Labels]
    end

    subgraph RAG["RAG Pipeline"]
        D[PDF documents] --> E[Text extraction]
        E --> F[Chunking]
        F --> G[Embedding]
    end

    subgraph SEARCH["Daten & RAG"]
        H[Qdrant Collections]
        I[Semantic search]
    end

    C -->|Export| D
    G -->|Indexing| H
    H --> I
    I -->|Results| J[Klausur-Korrektur]
```

## Modules

| Module | Path | Function | Backend |
|--------|------|----------|---------|
| **OCR-Labeling** | `/ai/ocr-labeling` | Ground truth for handwriting OCR | klausur-service:8086 |
| **RAG Pipeline** | `/ai/rag-pipeline` | Document indexing | klausur-service:8086 |
| **Daten & RAG** | `/ai/rag` | Vector search & collection mapping | klausur-service:8086 |

## Data Flow

### 1. OCR-Labeling (Input)

The OCR-Labeling module collects ground truth data for training handwriting recognition models:

- **Upload**: Exam scans (PDF/images) are uploaded
- **OCR processing**: Several OCR models recognize the text
    - `llama3.2-vision:11b` - Vision LLM (best quality)
    - `trocr` - Microsoft Transformer (fast)
    - `paddleocr` - PaddleOCR + LLM (4x faster)
    - `donut` - Document Understanding (structured)
- **Labeling**: Manual review and correction of the OCR results
- **Export**: Labeled data can be exported for:
    - TrOCR fine-tuning
    - Llama Vision fine-tuning
    - Generic JSON

### 2. RAG Pipeline (Processing)

The RAG Pipeline processes documents and makes them searchable:

```mermaid
flowchart TD
    A[Data sources] --> B[OCR/Text extraction]
    B --> C[Chunking]
    C --> D[Embedding]
    D --> E[Qdrant indexing]

    subgraph sources["Data sources"]
        S1[NiBiS PDFs]
        S2[Custom EH]
        S3[Legal corpus]
        S4[School regulations]
    end
```

**Processing steps:**

1. **Document extraction**: PDFs and images are converted to text via OCR
2. **Chunking**: Long texts are split into sections
    - Chunk size: 1000 characters
    - Overlap: 200 characters
3. **Embedding**: Each chunk is converted into a vector
    - Model: `text-embedding-3-small`
    - Dimensions: 1536
4. **Indexing**: Vectors are stored in Qdrant
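
The chunking step above (1000 characters with a 200-character overlap) can be sketched as a sliding window. This is an illustrative version, not the pipeline's own code: `chunk_text` is a hypothetical helper, and it splits purely by character count without regard to sentence or paragraph boundaries.

```python
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # last window reached; avoids a tail that is pure overlap
        start = end - overlap  # step forward by size - overlap
    return chunks
```

Each chunk starts `size - overlap` characters after the previous one, so the last 200 characters of a chunk reappear at the start of the next and context at the boundaries is not lost.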

### 3. Daten & RAG (Output)

The Daten & RAG module provides management and search:

- **Collection overview**: Status of all Qdrant collections
- **Semantic search**: Questions are converted into vectors and similar documents are retrieved
- **Regulation mapping**: Shows which regulations are indexed
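
To make the semantic search step concrete, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a brute-force illustration with hypothetical names (`cosine_similarity`, `top_k`) and toy 2-dimensional vectors; Qdrant itself uses an approximate index and 1536-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k document IDs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, with `index = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}` the query `[1, 0]` ranks `a` first and `c` second, because `c` points partly in the same direction while `b` is orthogonal.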

## Qdrant Collections

| Collection | Content | Status |
|------------|---------|--------|
| `bp_nibis_eh` | Official NiBiS Erwartungshorizonte | Active |
| `bp_eh` | User-defined Erwartungshorizonte | Active |
| `bp_schulordnungen` | School regulations of all federal states | In progress |
| `bp_legal_corpus` | Legal corpus (GDPR/DSGVO, AI Act, BSI, etc.) | Active |

## Technical Architecture

### Services

```mermaid
graph TB
    subgraph Frontend["Admin-v2 (Next.js)"]
        F1["/ai/ocr-labeling"]
        F2["/ai/rag-pipeline"]
        F3["/ai/rag"]
    end

    subgraph Backend["klausur-service (Python)"]
        B1[OCR Endpoints]
        B2[Indexing jobs]
        B3[Search API]
    end

    subgraph Storage["Databases"]
        D1[(PostgreSQL)]
        D2[(Qdrant)]
        D3[(MinIO)]
    end

    F1 --> B1
    F2 --> B2
    F3 --> B3

    B1 --> D1
    B1 --> D3
    B2 --> D2
    B3 --> D2
```

### Backend Endpoints

#### OCR-Labeling (`/api/v1/ocr-label/`)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/sessions` | GET/POST | Session management |
| `/sessions/{id}/upload` | POST | Upload images |
| `/queue` | GET | Labeling queue |
| `/confirm` | POST | Confirm OCR result |
| `/correct` | POST | Correct OCR result |
| `/skip` | POST | Skip item |
| `/stats` | GET | Statistics |
| `/export` | POST | Export training data |

#### RAG Pipeline (`/api/ai/rag-pipeline`)

| Action | Description |
|--------|-------------|
| `jobs` | List indexing jobs |
| `dataset-stats` | Dataset statistics |
| `create-job` | Start a new indexing job |
| `pause` | Pause job |
| `resume` | Resume job |
| `cancel` | Cancel job |

#### Legal Corpus (`/api/legal-corpus/`)

| Endpoint | Description |
|----------|-------------|
| `/status` | Collection status |
| `/search` | Semantic search |
| `/ingest` | Index documents |

## Integration with Klausur-Korrektur

The KI-Daten-Pipeline supplies Erwartungshorizont suggestions for Klausur-Korrektur:

```mermaid
sequenceDiagram
    participant L as Teacher
    participant K as Klausur-Korrektur
    participant R as RAG search
    participant Q as Qdrant

    L->>K: Review student answer
    K->>R: Load EH suggestions
    R->>Q: Semantic search
    Q->>R: Top-k chunks
    R->>K: Relevant EH passages
    K->>L: Grading suggestions
```

## Deployment

The modules are deployed as part of the admin-v2 container:

```bash
# 1. Sync
rsync -avz --delete --exclude 'node_modules' --exclude '.next' --exclude '.git' \
    /Users/benjaminadmin/Projekte/breakpilot-pwa/admin-v2/ \
    macmini:/Users/benjaminadmin/Projekte/breakpilot-pwa/admin-v2/

# 2. Build & Deploy
ssh macmini "/usr/local/bin/docker compose \
    -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
    build --no-cache admin-v2 && \
    /usr/local/bin/docker compose \
    -f /Users/benjaminadmin/Projekte/breakpilot-pwa/docker-compose.yml \
    up -d admin-v2"
```

## Related Documentation

- [OCR Labeling Specification](../klausur-service/OCR-Labeling-Spec.md)
- [RAG Admin Specification](../klausur-service/RAG-Admin-Spec.md)
- [NiBiS Ingestion Pipeline](../klausur-service/NiBiS-Ingestion-Pipeline.md)
- [Multi-Agent Architecture](../../architecture/multi-agent.md)

322
docs-src/services/klausur-service/BYOEH-Architecture.md
Normal file
@@ -0,0 +1,322 @@
|
||||
# BYOEH (Bring-Your-Own-Expectation-Horizon) - Architecture Documentation

## Overview

The BYOEH module enables teachers to upload their own Erwartungshorizonte (expectation horizons/grading rubrics) and use them for RAG-assisted grading suggestions. Key design principles:

- **Tenant Isolation**: Each teacher/school has an isolated namespace
- **No Training Guarantee**: EH content is only used for RAG, never for model training
- **Operator Blindness**: Client-side encryption ensures Breakpilot cannot view plaintext
- **Rights Confirmation**: Required legal acknowledgment at upload time

## Architecture Diagram
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ klausur-service (Port 8086) │
|
||||
├─────────────────────────────────────────────────────────────────────────┤
|
||||
│ ┌────────────────────┐ ┌─────────────────────────────────────────┐ │
|
||||
│ │ BYOEH REST API │ │ BYOEH Service Layer │ │
|
||||
│ │ │ │ │ │
|
||||
│ │ POST /api/v1/eh │───▶│ - Upload Wizard Logic │ │
|
||||
│ │ GET /api/v1/eh │ │ - Rights Confirmation │ │
|
||||
│ │ DELETE /api/v1/eh │ │ - Chunking Pipeline │ │
|
||||
│ │ POST /rag-query │ │ - Encryption Service │ │
|
||||
│ └────────────────────┘ └────────────────────┬────────────────────┘ │
|
||||
└─────────────────────────────────────────────────┼────────────────────────┘
|
||||
│
|
||||
┌───────────────────────────────────────┼───────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌──────────────────────┐ ┌──────────────────────────┐ ┌──────────────────────┐
|
||||
│ PostgreSQL │ │ Qdrant │ │ Encrypted Storage │
|
||||
│ (Metadata + Audit) │ │ (Vector Search) │ │ /app/eh-uploads/ │
|
||||
│ │ │ │ │ │
|
||||
│ In-Memory Storage: │ │ Collection: bp_eh │ │ {tenant}/{eh_id}/ │
|
||||
│ - erwartungshorizonte│ │ - tenant_id (filter) │ │ encrypted.bin │
|
||||
│ - eh_chunks │ │ - eh_id │ │ salt.txt │
|
||||
│ - eh_key_shares │ │ - embedding[1536] │ │ │
|
||||
│ - eh_klausur_links │ │ - encrypted_content │ └──────────────────────┘
|
||||
│ - eh_audit_log │ │ │
|
||||
└──────────────────────┘ └──────────────────────────┘
|
||||
```
|
||||
|
||||
## Data Flow
|
||||
|
||||
### 1. Upload Flow
|
||||
|
||||
```
|
||||
Browser Backend Storage
|
||||
│ │ │
|
||||
│ 1. User selects PDF │ │
|
||||
│ 2. User enters passphrase │ │
|
||||
│ 3. PBKDF2 key derivation │ │
|
||||
│ 4. AES-256-GCM encryption │ │
|
||||
│ 5. SHA-256 key hash │ │
|
||||
│ │ │
|
||||
│──────────────────────────────▶│ │
|
||||
│ POST /api/v1/eh/upload │ │
|
||||
│ (encrypted blob + key_hash) │ │
|
||||
│ │──────────────────────────────▶│
|
||||
│ │ Store encrypted.bin + salt │
|
||||
│ │◀──────────────────────────────│
|
||||
│ │ │
|
||||
│ │ Save metadata to DB │
|
||||
│◀──────────────────────────────│ │
|
||||
│ Return EH record │ │
|
||||
```
|
||||
|
||||
### 2. Indexing Flow (RAG Preparation)
|
||||
|
||||
```
|
||||
Browser Backend Qdrant
|
||||
│ │ │
|
||||
│──────────────────────────────▶│ │
|
||||
│ POST /api/v1/eh/{id}/index │ │
|
||||
│ (passphrase for decryption) │ │
|
||||
│ │ │
|
||||
│ │ 1. Verify key hash │
|
||||
│ │ 2. Decrypt content │
|
||||
│ │ 3. Extract text (PDF) │
|
||||
│ │ 4. Chunk text │
|
||||
│ │ 5. Generate embeddings │
|
||||
│ │ 6. Re-encrypt each chunk │
|
||||
│ │──────────────────────────────▶│
|
||||
│ │ Index vectors + encrypted │
|
||||
│ │ chunks with tenant filter │
|
||||
│◀──────────────────────────────│ │
|
||||
│ Return chunk count │ │
|
||||
```
|
||||
|
||||
### 3. RAG Query Flow
|
||||
|
||||
```
|
||||
Browser Backend Qdrant
|
||||
│ │ │
|
||||
│──────────────────────────────▶│ │
|
||||
│ POST /api/v1/eh/rag-query │ │
|
||||
│ (query + passphrase) │ │
|
||||
│ │ │
|
||||
│ │ 1. Generate query embedding │
|
||||
│ │──────────────────────────────▶│
|
||||
│ │ 2. Semantic search │
|
||||
│ │ (tenant-filtered) │
|
||||
│ │◀──────────────────────────────│
|
||||
│ │ 3. Decrypt matched chunks │
|
||||
│◀──────────────────────────────│ │
|
||||
│ Return decrypted context │ │
|
||||
```
|
||||
|
||||
## Security Architecture
|
||||
|
||||
### Client-Side Encryption
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Browser (Client-Side) │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ 1. User enters passphrase (NEVER sent to server) │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ 2. Key Derivation: PBKDF2-SHA256(passphrase, salt, 100k iter) │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ 3. Encryption: AES-256-GCM(key, iv, file_content) │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ 4. Key-Hash: SHA-256(derived_key) → server verification only │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ 5. Upload: encrypted_blob + key_hash + salt (NOT key!) │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
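
Steps 2 and 4 of the diagram (key derivation and the key hash used for verification) can be reproduced with the Python standard library, for example in backend tests. This is a sketch under assumptions: the real client code lives in `encryption.ts` and runs in the browser, the function names here are hypothetical, and the hex encoding of the key hash is assumed rather than confirmed by this document. The AES-256-GCM step is omitted because it needs a third-party library.

```python
import hashlib
import hmac

PBKDF2_ITERATIONS = 100_000  # matches the "100k iter" in the diagram
KEY_LENGTH = 32              # 32 bytes = 256-bit key for AES-256-GCM


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """PBKDF2-SHA256 key derivation from a passphrase and salt."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                               PBKDF2_ITERATIONS, dklen=KEY_LENGTH)


def key_hash(key: bytes) -> str:
    """SHA-256 of the derived key; this is the only key material stored."""
    return hashlib.sha256(key).hexdigest()  # assumption: hex-encoded


def verify_passphrase(passphrase: str, salt: bytes, expected_hash: str) -> bool:
    """Re-derive the key and compare hashes in constant time."""
    candidate = key_hash(derive_key(passphrase, salt))
    return hmac.compare_digest(candidate, expected_hash)
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of the hash matched.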
|
||||
|
||||
### Security Guarantees

| Guarantee | Implementation |
|-----------|----------------|
| **No Training** | `training_allowed: false` on all Qdrant points |
| **Operator Blindness** | Passphrase never leaves browser; server only sees key hash |
| **Tenant Isolation** | Every query filtered by `tenant_id` |
| **Audit Trail** | All actions logged with timestamps |

## Key Sharing System
|
||||
|
||||
The key sharing system enables first examiners to grant access to their EH to second examiners and supervisors.
|
||||
|
||||
### Share Flow
|
||||
|
||||
```
|
||||
First Examiner Backend Second Examiner
|
||||
│ │ │
|
||||
│ 1. Encrypt passphrase for │ │
|
||||
│ recipient (client-side) │ │
|
||||
│ │ │
|
||||
│─────────────────────────────▶ │
|
||||
│ POST /eh/{id}/share │ │
|
||||
│ (encrypted_passphrase, role)│ │
|
||||
│ │ │
|
||||
│ │ Store EHKeyShare │
|
||||
│◀───────────────────────────── │
|
||||
│ │ │
|
||||
│ │ │
|
||||
│ │◀────────────────────────────│
|
||||
│ │ GET /eh/shared-with-me │
|
||||
│ │ │
|
||||
│ │─────────────────────────────▶
|
||||
│ │ Return shared EH list │
|
||||
│ │ │
|
||||
│ │◀────────────────────────────│
|
||||
│ │ RAG query with decrypted │
|
||||
│ │ passphrase │
|
||||
```
|
||||
|
||||
### Data Structures

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EHKeyShare:
    id: str
    eh_id: str
    user_id: str                 # Recipient
    encrypted_passphrase: str    # Client-encrypted for recipient
    passphrase_hint: str         # Optional hint
    granted_by: str              # Grantor user ID
    granted_at: datetime
    role: str                    # second_examiner, third_examiner, supervisor
    klausur_id: Optional[str]    # Link to specific Klausur
    active: bool


@dataclass
class EHKlausurLink:
    id: str
    eh_id: str
    klausur_id: str
    linked_by: str
    linked_at: datetime
```

## API Endpoints

### Core EH Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/upload` | Upload encrypted EH |
| GET | `/api/v1/eh` | List user's EH |
| GET | `/api/v1/eh/{id}` | Get single EH |
| DELETE | `/api/v1/eh/{id}` | Soft delete EH |
| POST | `/api/v1/eh/{id}/index` | Index EH for RAG |
| POST | `/api/v1/eh/rag-query` | Query EH content |

### Key Sharing Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/{id}/share` | Share EH with examiner |
| GET | `/api/v1/eh/{id}/shares` | List shares (owner) |
| DELETE | `/api/v1/eh/{id}/shares/{shareId}` | Revoke share |
| GET | `/api/v1/eh/shared-with-me` | List EH shared with user |

### Klausur Integration Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/{id}/link-klausur` | Link EH to Klausur |
| DELETE | `/api/v1/eh/{id}/link-klausur/{klausurId}` | Unlink EH |
| GET | `/api/v1/klausuren/{id}/linked-eh` | Get linked EH for Klausur |

### Audit & Admin Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/eh/audit-log` | Get audit log |
| GET | `/api/v1/eh/rights-text` | Get rights confirmation text |
| GET | `/api/v1/eh/qdrant-status` | Get Qdrant status (admin) |

## Frontend Components

### EHUploadWizard

5-step wizard for uploading Erwartungshorizonte:

1. **File Selection** - Choose PDF file
2. **Metadata** - Title, Subject, Niveau, Year
3. **Rights Confirmation** - Legal acknowledgment
4. **Encryption** - Set passphrase (2x confirmation)
5. **Summary** - Review and upload

### Integration Points

- **KorrekturPage**: Shows EH prompt after first student upload
- **GutachtenGeneration**: Uses RAG context from linked EH
- **Sidebar Badge**: Shows linked EH count

## File Structure

```
klausur-service/
├── backend/
│   ├── main.py                 # API endpoints + data structures
│   ├── qdrant_service.py       # Vector database operations
│   ├── eh_pipeline.py          # Chunking, embedding, encryption
│   └── requirements.txt        # Python dependencies
├── frontend/
│   └── src/
│       ├── components/
│       │   └── EHUploadWizard.tsx
│       ├── services/
│       │   ├── api.ts              # API client
│       │   └── encryption.ts       # Client-side crypto
│       ├── pages/
│       │   └── KorrekturPage.tsx   # EH integration
│       └── styles/
│           └── eh-wizard.css
└── docs/
    ├── BYOEH-Architecture.md
    └── BYOEH-Developer-Guide.md
```

## Configuration

### Environment Variables

```env
QDRANT_URL=http://qdrant:6333
OPENAI_API_KEY=sk-...  # For embeddings
BYOEH_ENCRYPTION_ENABLED=true
EH_UPLOAD_DIR=/app/eh-uploads
```

### Docker Services

```yaml
# docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
```

## Audit Events

| Action | Description |
|--------|-------------|
| `upload` | EH uploaded |
| `index` | EH indexed for RAG |
| `rag_query` | RAG query executed |
| `delete` | EH soft deleted |
| `share` | EH shared with examiner |
| `revoke_share` | Share revoked |
| `link_klausur` | EH linked to Klausur |
| `unlink_klausur` | EH unlinked from Klausur |
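
The audit actions in the table map naturally onto an enumeration, which rejects unknown action names at the point of logging. The sketch below is illustrative only: `AuditAction` and `AuditEntry` are hypothetical names, and the fields beyond the action and timestamp are assumptions rather than the actual `eh_audit_log` schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class AuditAction(str, Enum):
    """Action names exactly as listed in the audit events table."""
    UPLOAD = "upload"
    INDEX = "index"
    RAG_QUERY = "rag_query"
    DELETE = "delete"
    SHARE = "share"
    REVOKE_SHARE = "revoke_share"
    LINK_KLAUSUR = "link_klausur"
    UNLINK_KLAUSUR = "unlink_klausur"


@dataclass(frozen=True)
class AuditEntry:
    action: AuditAction
    eh_id: str     # assumption: every event refers to one EH
    user_id: str   # assumption: the acting user is recorded
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```

`AuditAction("rag_query")` resolves to `AuditAction.RAG_QUERY`, while an unlisted name such as `"export"` raises `ValueError`.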

## See Also

- [Zeugnis-System Architektur](../../architecture/zeugnis-system.md)
- [Klausur-Service Index](./index.md)

481
docs-src/services/klausur-service/BYOEH-Developer-Guide.md
Normal file
@@ -0,0 +1,481 @@
|
||||
# BYOEH Developer Guide

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- Docker & Docker Compose
- OpenAI API Key (for embeddings)

### Setup

1. **Start services:**

    ```bash
    docker-compose up -d qdrant
    ```

2. **Configure environment:**

    ```env
    QDRANT_URL=http://localhost:6333
    OPENAI_API_KEY=sk-your-key
    BYOEH_ENCRYPTION_ENABLED=true
    ```

3. **Run klausur-service:**

    ```bash
    cd klausur-service/backend
    pip install -r requirements.txt
    uvicorn main:app --reload --port 8086
    ```

4. **Run frontend:**

    ```bash
    cd klausur-service/frontend
    npm install
    npm run dev
    ```

## Client-Side Encryption

The encryption service (`encryption.ts`) handles all cryptographic operations in the browser.

### Encrypting a File

```typescript
import { encryptFile, generateSalt } from '../services/encryption'

// Cast is needed: getElementById returns HTMLElement | null, which has no `files`
const input = document.getElementById('fileInput') as HTMLInputElement
const file = input.files![0]
const passphrase = 'user-secret-password'

const encrypted = await encryptFile(file, passphrase)
// Result:
// {
//   encryptedData: ArrayBuffer,
//   keyHash: string,  // SHA-256 hash for verification
//   salt: string,     // Hex-encoded salt
//   iv: string        // Hex-encoded initialization vector
// }
```

### Decrypting Content

```typescript
import { decryptText, verifyPassphrase } from '../services/encryption'

// First verify the passphrase
const isValid = await verifyPassphrase(passphrase, salt, expectedKeyHash)

if (isValid) {
  const decrypted = await decryptText(encryptedBase64, passphrase, salt)
}
```

## Backend API Usage

### Upload an Erwartungshorizont

The upload endpoint accepts FormData with:

- `file`: encrypted binary blob
- `metadata_json`: JSON string with metadata

```http
POST /api/v1/eh/upload
Content-Type: multipart/form-data

{
  "file": <encrypted_blob>,
  "metadata_json": {
    "metadata": {
      "title": "Deutsch LK 2025",
      "subject": "deutsch",
      "niveau": "eA",
      "year": 2025,
      "aufgaben_nummer": "Aufgabe 1"
    },
    "encryption_key_hash": "abc123...",
    "salt": "def456...",
    "rights_confirmed": true,
    "original_filename": "erwartungshorizont.pdf"
  }
}
```

### Index for RAG

```http
POST /api/v1/eh/{eh_id}/index
Content-Type: application/json

{
  "passphrase": "user-secret-password"
}
```

The backend will:

1. Verify the passphrase against the stored key hash
2. Decrypt the file
3. Extract text from the PDF
4. Chunk the text (1000 chars, 200 overlap)
5. Generate OpenAI embeddings
6. Re-encrypt each chunk
7. Index in Qdrant with tenant filter

### RAG Query

```http
POST /api/v1/eh/rag-query
Content-Type: application/json

{
  "query_text": "Wie sollte die Einleitung strukturiert sein?",
  "passphrase": "user-secret-password",
  "subject": "deutsch",
  "limit": 5
}
```

`subject` is an optional filter; `limit` sets the maximum number of results.

Response:

```json
{
  "context": "Die Einleitung sollte...",
  "sources": [
    {
      "text": "Die Einleitung sollte...",
      "eh_id": "uuid",
      "eh_title": "Deutsch LK 2025",
      "chunk_index": 2,
      "score": 0.89
    }
  ],
  "query": "Wie sollte die Einleitung strukturiert sein?"
}
```

## Key Sharing Implementation

### Invitation Flow (Recommended)

The invitation flow provides a two-phase sharing process: Invite -> Accept

```typescript
import { ehApi } from '../services/api'

// 1. First examiner sends invitation to second examiner
const invitation = await ehApi.inviteToEH(ehId, {
  invitee_email: 'zweitkorrektor@school.de',
  role: 'second_examiner',
  klausur_id: 'klausur-uuid',  // Optional: link to specific Klausur
  message: 'Bitte fuer Zweitkorrektur nutzen',
  expires_in_days: 14          // Default: 14 days
})
// Returns: { invitation_id, eh_id, invitee_email, role, expires_at, eh_title }

// 2. Second examiner sees pending invitation
const pending = await ehApi.getPendingInvitations()
// [{ invitation: {...}, eh: { id, title, subject, niveau, year } }]

// 3. Second examiner accepts invitation
const accepted = await ehApi.acceptInvitation(
  invitationId,
  encryptedPassphrase  // Passphrase encrypted for recipient
)
// Returns: { status: 'accepted', share_id, eh_id, role, klausur_id }
```

### Invitation Management

```typescript
// Get invitations sent by current user
const sent = await ehApi.getSentInvitations()

// Decline an invitation (as invitee)
await ehApi.declineInvitation(invitationId)

// Revoke a pending invitation (as inviter)
await ehApi.revokeInvitation(invitationId)

// Get complete access chain for an EH
const chain = await ehApi.getAccessChain(ehId)
// Returns: { eh_id, eh_title, owner, active_shares, pending_invitations, revoked_shares }
```

### Direct Sharing (Legacy)

For immediate sharing without invitation:

```typescript
// First examiner shares directly with second examiner
await ehApi.shareEH(ehId, {
  user_id: 'second-examiner-uuid',
  role: 'second_examiner',
  encrypted_passphrase: encryptedPassphrase,  // Encrypted for recipient
  passphrase_hint: 'Das uebliche Passwort',
  klausur_id: 'klausur-uuid'                  // Optional
})
```

### Accessing Shared EH

```typescript
// Second examiner gets shared EH
const shared = await ehApi.getSharedWithMe()
// [{ eh: {...}, share: {...} }]

// Query using provided passphrase
const result = await ehApi.ragQuery({
  query_text: 'search query',
  passphrase: decryptedPassphrase,
  subject: 'deutsch'
})
```

### Revoking Access

```typescript
// List all shares for an EH
const shares = await ehApi.listShares(ehId)

// Revoke a share
await ehApi.revokeShare(ehId, shareId)
```

## Klausur Integration

### Automatic EH Prompt

The `KorrekturPage` shows an EH upload prompt after the first student work is uploaded:

```typescript
// In KorrekturPage.tsx
useEffect(() => {
  if (
    currentKlausur?.students.length === 1 &&
    linkedEHs.length === 0 &&
    !ehPromptDismissed
  ) {
    setShowEHPrompt(true)
  }
  // All values read inside the effect are listed as dependencies
}, [currentKlausur?.students.length, linkedEHs.length, ehPromptDismissed])
```

### Linking EH to Klausur

```typescript
// After EH upload, auto-link to Klausur
await ehApi.linkToKlausur(ehId, klausurId)

// Get linked EH for a Klausur
const linked = await klausurEHApi.getLinkedEH(klausurId)
```

## Frontend Components

### EHUploadWizard Props

```typescript
interface EHUploadWizardProps {
  onClose: () => void
  onComplete?: (ehId: string) => void
  defaultSubject?: string  // Pre-fill subject
  defaultYear?: number     // Pre-fill year
  klausurId?: string       // Auto-link after upload
}

// Usage
<EHUploadWizard
  onClose={() => setShowWizard(false)}
  onComplete={(ehId) => console.log('Uploaded:', ehId)}
  defaultSubject={klausur.subject}
  defaultYear={klausur.year}
  klausurId={klausur.id}
/>
```

### Wizard Steps

1. **file** - PDF file selection with drag & drop
2. **metadata** - Form for title, subject, niveau, year
3. **rights** - Rights confirmation checkbox
4. **encryption** - Passphrase input with strength meter
5. **summary** - Review and confirm upload

## Qdrant Operations

### Collection Schema

```python
# Collection: bp_eh
collection_config = {
    "vectors": {
        "size": 1536,  # OpenAI text-embedding-3-small
        "distance": "Cosine",
    }
}

# Point payload
point_payload = {
    "tenant_id": "school-uuid",
    "eh_id": "eh-uuid",
    "chunk_index": 0,
    "encrypted_content": "base64...",
    "training_allowed": False,  # ALWAYS False
}
```

### Tenant-Isolated Search

```python
from qdrant_service import search_eh

results = await search_eh(
    query_embedding=embedding,
    tenant_id="school-uuid",
    subject="deutsch",
    limit=5,
)
```
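
How `search_eh` builds its filter is not shown here. As a hedged sketch, a tenant-isolating filter in the JSON shape that Qdrant's filtering API accepts (`must` conditions with `key` and `match`) could be assembled like this. `build_tenant_filter` is a hypothetical helper, and the `subject` payload field is an assumption, since the point payload above does not list it.

```python
from typing import Any, Dict, Optional


def build_tenant_filter(tenant_id: str,
                        subject: Optional[str] = None) -> Dict[str, Any]:
    """Build a Qdrant-style filter that always restricts results to one tenant."""
    if not tenant_id:
        # Refuse to build an unfiltered query: isolation must never be optional.
        raise ValueError("tenant_id is required for every query")
    must = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    if subject is not None:
        # Assumption: chunks carry a "subject" payload field.
        must.append({"key": "subject", "match": {"value": subject}})
    return {"must": must}
```

Failing on an empty `tenant_id` instead of silently dropping the condition keeps a missing tenant from turning into a cross-tenant search.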

## Testing

### Unit Tests

```bash
cd klausur-service/backend
pytest tests/test_byoeh.py -v
```

### Test Structure

```python
# tests/test_byoeh.py
class TestBYOEH:
    def test_upload_eh(self, client, auth_headers):
        """Test EH upload with encryption"""
        pass

    def test_index_eh(self, client, auth_headers, uploaded_eh):
        """Test EH indexing for RAG"""
        pass

    def test_rag_query(self, client, auth_headers, indexed_eh):
        """Test RAG query returns relevant chunks"""
        pass

    def test_share_eh(self, client, auth_headers, uploaded_eh):
        """Test sharing EH with another user"""
        pass
```

### Frontend Tests

```typescript
// EHUploadWizard.test.tsx
describe('EHUploadWizard', () => {
  it('completes all steps successfully', async () => {
    // ...
  })

  it('validates passphrase strength', async () => {
    // ...
  })

  it('auto-links to klausur when klausurId provided', async () => {
    // ...
  })
})
```

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Passphrase verification failed` | Wrong passphrase | Ask user to re-enter |
| `EH not found` | Invalid ID or deleted | Check ID, reload list |
| `Access denied` | User not owner/shared | Check permissions |
| `Qdrant connection failed` | Service unavailable | Check Qdrant container |

### Error Response Format

```json
{
  "detail": "Passphrase verification failed"
}
```

## Security Considerations

### Do's

- Store key hash, never the key itself
- Always filter by tenant_id
- Log all access in audit trail
- Use HTTPS in production

### Don'ts

- Never log passphrase or decrypted content
- Never store passphrase in localStorage
- Never send passphrase as URL parameter
- Never return decrypted content without auth

## Performance Tips

### Chunking Configuration

```python
CHUNK_SIZE = 1000     # Characters per chunk
CHUNK_OVERLAP = 200   # Overlap for context continuity
```

### Embedding Batching

```python
# Generate embeddings in batches of 20
EMBEDDING_BATCH_SIZE = 20
```
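
Batching means splitting the list of chunks into groups of at most 20 before each embedding request. A minimal sketch, assuming a hypothetical `batched` helper (the pipeline's own batching code is not shown in this guide):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

EMBEDDING_BATCH_SIZE = 20  # same value as in the configuration above


def batched(items: Sequence[T],
            batch_size: int = EMBEDDING_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

For 45 chunks this produces batches of 20, 20, and 5, so three embedding requests replace 45 single-chunk requests.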

### Qdrant Optimization

```python
# Use HNSW index for fast approximate search
# Collection is automatically optimized on creation
```

## Debugging

### Enable Debug Logging

```python
import logging
logging.getLogger('byoeh').setLevel(logging.DEBUG)
```

### Check Qdrant Status

```bash
curl http://localhost:6333/collections/bp_eh
```

### Verify Encryption

```typescript
import { isEncryptionSupported } from '../services/encryption'

if (!isEncryptionSupported()) {
  console.error('Web Crypto API not available')
}
```

## Migration Notes

### From v1.0 to v1.1

1. Added key sharing system
2. Added Klausur linking
3. EH prompt after student upload

No database migrations required - all data structures are additive.

227
docs-src/services/klausur-service/NiBiS-Ingestion-Pipeline.md
Normal file
@@ -0,0 +1,227 @@
# NiBiS Ingestion Pipeline

## Overview

The NiBiS Ingestion Pipeline processes Abitur expectation horizons (Erwartungshorizonte) from Lower Saxony (Niedersachsen) and indexes them in Qdrant for RAG-based exam grading (Klausurkorrektur).

## Supported Data

### Directories

| Directory | Years | Naming convention |
|-----------|-------|-------------------|
| `docs/za-download` | 2024, 2025 | `{Jahr}_{Fach}_{niveau}_{Nr}_EWH.pdf` |
| `docs/za-download-2` | 2016 | `{Jahr}{Fach}{Niveau}Lehrer/{Jahr}{Fach}{Niveau}A{Nr}L.pdf` |
| `docs/za-download-3` | 2017 | `{Jahr}{Fach}{Niveau}Lehrer/{Jahr}{Fach}{Niveau}A{Nr}L.pdf` |
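
As a rough illustration of the 2024/2025 convention in the table, the following sketch parses a file name of the form `{Jahr}_{Fach}_{niveau}_{Nr}_EWH.pdf`. The regular expression and the `parse_new_style` helper are assumptions for this example; the actual parser in `nibis_ingestion.py` is not shown here and may handle more cases (for instance subjects containing underscores, or the older 2016/2017 layout).

```python
import re
from typing import Optional, TypedDict


class ParsedName(TypedDict):
    year: int
    subject: str
    niveau: str
    task_number: int
    doc_type: str


# Assumed pattern for {Jahr}_{Fach}_{niveau}_{Nr}_EWH.pdf
NEW_STYLE = re.compile(
    r"^(?P<year>\d{4})_(?P<subject>[^_]+)_(?P<niveau>[^_]+)_(?P<nr>\d+)_EWH\.pdf$"
)


def parse_new_style(filename: str) -> Optional[ParsedName]:
    """Return the parsed fields, or None if the name does not match the pattern."""
    match = NEW_STYLE.match(filename)
    if match is None:
        return None
    return ParsedName(
        year=int(match["year"]),
        subject=match["subject"],
        niveau=match["niveau"],
        task_number=int(match["nr"]),
        doc_type="EWH",
    )
```

Returning `None` for non-matching names lets a caller fall through to a second parser for the older naming scheme.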

### Document Types

- **EWH** - Erwartungshorizont (expectation horizon; primary target)
- **Aufgabe** - exam tasks
- **Material** - supplementary material
- **GBU** - Gefährdungsbeurteilung (hazard assessment; chemistry/biology)
- **Bewertungsbogen** - standardized assessment sheets

### Subjects

Deutsch, Englisch, Mathematik, Informatik, Biologie, Chemie, Physik, Geschichte, Erdkunde, Kunst, Musik, Sport, Latein, Griechisch, Französisch, Spanisch, Katholische Religion, Evangelische Religion, Werte und Normen, BRC, BVW, Gesundheit-Pflege

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    NiBiS Ingestion Pipeline                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. ZIP Extraction                                              │
│     └── Extracts 2024.zip, 2025.zip, etc.                       │
│                                                                 │
│  2. Document Discovery                                          │
│     ├── Parses old naming convention (2016/2017)                │
│     └── Parses new naming convention (2024/2025)                │
│                                                                 │
│  3. PDF Processing                                              │
│     ├── Text extraction (PyPDF2)                                │
│     └── Chunking (1000 chars, 200 overlap)                      │
│                                                                 │
│  4. Embedding Generation                                        │
│     └── OpenAI text-embedding-3-small (1536 dim)                │
│                                                                 │
│  5. Qdrant Indexing                                             │
│     └── Collection: bp_nibis_eh                                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Usage

### Via API (recommended)

```bash
# 1. Preview the available documents
curl http://localhost:8086/api/v1/admin/nibis/discover

# 2. Extract the ZIP files
curl -X POST http://localhost:8086/api/v1/admin/nibis/extract-zips

# 3. Start ingestion
curl -X POST http://localhost:8086/api/v1/admin/nibis/ingest \
  -H "Content-Type: application/json" \
  -d '{"ewh_only": true}'

# 4. Check status
curl http://localhost:8086/api/v1/admin/nibis/status

# 5. Test semantic search
curl -X POST http://localhost:8086/api/v1/admin/nibis/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Analyse literarischer Texte", "subject": "Deutsch", "limit": 5}'
```

### Via CLI

```bash
# Dry run (analyze only)
cd klausur-service/backend
python nibis_ingestion.py --dry-run

# Full ingestion
python nibis_ingestion.py

# Only a specific year
python nibis_ingestion.py --year 2024

# Only a specific subject
python nibis_ingestion.py --subject Deutsch

# Create a manifest
python nibis_ingestion.py --manifest /tmp/nibis_manifest.json
```

### Via Shell Script

```bash
./klausur-service/scripts/run_nibis_ingestion.sh --dry-run
./klausur-service/scripts/run_nibis_ingestion.sh --year 2024 --subject Deutsch
```

## Qdrant Schema

### Collection: `bp_nibis_eh`

```json
{
  "id": "nibis_2024_deutsch_ea_1_abc123_chunk_0",
  "vector": [1536 dimensions],
  "payload": {
    "doc_id": "nibis_2024_deutsch_ea_1_abc123",
    "chunk_index": 0,
    "text": "Der Erwartungshorizont...",
    "year": 2024,
    "subject": "Deutsch",
    "niveau": "eA",
    "task_number": 1,
    "doc_type": "EWH",
    "bundesland": "NI",
    "variant": null,
    "source": "nibis",
    "training_allowed": true
  }
}
```
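
The `doc_id` and `id` values in the schema look like metadata fields joined with a short hash and a chunk index. The sketch below shows one way such deterministic identifiers could be derived so that re-running ingestion on the same file yields the same IDs. All helper names, the choice of SHA-256, and the 6-character truncation are assumptions, not the pipeline's confirmed scheme. Because Qdrant point IDs must be unsigned integers or UUIDs, the sketch also derives a stable UUID from the readable string.

```python
import hashlib
import uuid


def make_doc_id(year: int, subject: str, niveau: str, task_number: int, pdf_bytes: bytes) -> str:
    """Build a readable, deterministic document id from metadata plus a short content hash."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()[:6]  # truncation length is an assumption
    return f"nibis_{year}_{subject.lower()}_{niveau.lower()}_{task_number}_{digest}"


def make_chunk_id(doc_id: str, chunk_index: int) -> str:
    """Append the chunk index, mirroring the `id` format shown in the schema."""
    return f"{doc_id}_chunk_{chunk_index}"


def to_point_uuid(chunk_id: str) -> str:
    """Map the readable chunk id to a stable UUID (name-based, version 5)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, chunk_id))
```

Since every step is a pure function of the file content and metadata, a second run can check whether a point with the same ID already exists and skip the embedding call.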

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/admin/nibis/status` | Ingestion status |
| POST | `/api/v1/admin/nibis/extract-zips` | Extract ZIP files |
| GET | `/api/v1/admin/nibis/discover` | Discover documents |
| POST | `/api/v1/admin/nibis/ingest` | Start ingestion |
| POST | `/api/v1/admin/nibis/search` | Semantic search |
| GET | `/api/v1/admin/nibis/stats` | Statistics |
| GET | `/api/v1/admin/nibis/collections` | Qdrant collections |
| DELETE | `/api/v1/admin/nibis/collection` | Delete collection |

## Extending to Other Federal States

The pipeline is designed to be easy to extend:

### 1. Add a New Federal State

```python
# In nibis_ingestion.py
from pathlib import Path
from typing import Dict, Optional

# Federal state code (ISO 3166-2:DE)
BUNDESLAND_CODES = {
    "NI": "Niedersachsen",
    "BE": "Berlin",
    "BY": "Bayern",
    # ...
}

# Parsing function for the new format
def parse_filename_berlin(filename: str, file_path: Path) -> Optional[Dict]:
    # Berlin-specific naming convention
    pass
```

### 2. Register a New Directory

```python
# Add docs/za-download-berlin/
ZA_DOWNLOAD_DIRS = [
    "za-download",
    "za-download-2",
    "za-download-3",
    "za-download-berlin",  # NEW
]
```

### 3. Document Type Extension

For report card generation (Zeugnisse) or other document types:

```python
DOC_TYPES = {
    "EWH": "Erwartungshorizont",
    "ZEUGNIS_VORLAGE": "Zeugnisvorlage",
    "NOTENSPIEGEL": "Notenspiegel",
    "BEMERKUNG": "Bemerkungstexte",
}
```

## Legal Notes

- NiBiS data may be used freely under the [NiBiS terms of use](https://nibis.de)
- `training_allowed: true` - structural knowledge may be used for AI training
- For teacher-owned expectation horizons (BYOEH): `training_allowed: false`

## Troubleshooting

### Qdrant Not Reachable

```bash
# Check whether Qdrant is running
curl http://localhost:6333/healthz

# Start via Docker
docker-compose up -d qdrant
```

### OpenAI API Errors

```bash
# Set the API key
export OPENAI_API_KEY=sk-...
```

### PDF Extraction Failed

Some PDFs can be problematic (for example, scanned documents without OCR). These are skipped and recorded in the error log.

## Performance

- ~500-1000 chunks per minute (depending on the OpenAI API)
- ~2-3 GB of Qdrant storage for all NiBiS data (2016-2025)
- Embeddings are generated only once (idempotent via hash)

@@ -1,366 +1,235 @@
|
||||
# OCR Compare Tool - Dokumentation
|
||||
# OCR Compare - Block Review Feature
|
||||
|
||||
**Status:** Produktiv
|
||||
**Version:** 4.0
|
||||
**Letzte Aktualisierung:** 2026-02-08
|
||||
**URL:** https://macmini:3002/ai/ocr-compare
|
||||
|
||||
---
|
||||
|
||||
## Übersicht
|
||||
## Uebersicht
|
||||
|
||||
Das OCR Compare Tool ermöglicht die automatische Analyse von gescannten Vokabeltabellen mit:
|
||||
- Grid-basierter OCR-Erkennung
|
||||
- Automatischer Spalten-Erkennung (Englisch/Deutsch/Beispiel)
|
||||
- mm-Koordinatensystem für präzise Positionierung
|
||||
- Deskew-Korrektur für schiefe Scans
|
||||
- Export zum Worksheet-Editor
|
||||
Das OCR Compare Tool ermoeglicht den Vergleich verschiedener OCR-Methoden zur Texterkennung aus gescannten Dokumenten. Die Block Review Funktion erlaubt eine zellenweise Ueberpruefung und Korrektur der OCR-Ergebnisse.
|
||||
|
||||
### Hauptfunktionen
|
||||
|
||||
| Feature | Beschreibung |
|
||||
|---------|--------------|
|
||||
| **Multi-Method OCR** | Vergleich von Vision LLM, Tesseract, PaddleOCR und Claude Vision |
|
||||
| **Grid Detection** | Automatische Erkennung von Tabellenstrukturen |
|
||||
| **Block Review** | Zellenweise Ueberpruefung und Korrektur |
|
||||
| **Session Persistence** | Sessions bleiben bei Seitenwechsel erhalten |
|
||||
| **High-Resolution Display** | Hochaufloesende Bildanzeige (zoom=2.0) |
|
||||
|
||||
---
|
||||
|
||||
## Architektur
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Frontend (admin-v2) │
|
||||
│ /admin-v2/app/(admin)/ai/ocr-compare/page.tsx │
|
||||
│ - Bild-Upload │
|
||||
│ - Grid-Overlay Visualisierung │
|
||||
│ - Cell-Edit Popup │
|
||||
│ - Export zum Worksheet-Editor │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ admin-v2 (Next.js) │
|
||||
│ /app/(admin)/ai/ocr-compare/page.tsx │
|
||||
│ - PDF Upload & Session Management │
|
||||
│ - Grid Visualization mit SVG Overlay │
|
||||
│ - Block Review Panel │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ klausur-service (FastAPI) │
|
||||
│ Port 8086 - /klausur-service/backend/ │
|
||||
│ - /api/v1/ocr/analyze-grid (Grid-Analyse) │
|
||||
│ - services/grid_detection_service.py (v4) │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ PaddleOCR Service │
|
||||
│ Port 8088 - OCR-Erkennung │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ klausur-service (FastAPI) │
|
||||
│ Port 8086 │
|
||||
│ - /api/v1/vocab/sessions (Session CRUD) │
|
||||
│ - /api/v1/vocab/sessions/{id}/pdf-thumbnail (Bild-Export) │
|
||||
│ - /api/v1/vocab/sessions/{id}/detect-grid (Grid-Erkennung) │
|
||||
│ - /api/v1/vocab/sessions/{id}/run-ocr (OCR-Ausfuehrung) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Features (Version 4)
|
||||
## Komponenten
|
||||
|
||||
### 1. mm-Koordinatensystem
|
||||
### GridOverlay
|
||||
|
||||
Alle Koordinaten werden im A4-Format (210x297mm) ausgegeben:
|
||||
SVG-Overlay zur Visualisierung der erkannten Grid-Struktur.
|
||||
|
||||
| Feld | Beschreibung |
|
||||
|------|--------------|
|
||||
| `x_mm` | X-Position in mm (0-210) |
|
||||
| `y_mm` | Y-Position in mm (0-297) |
|
||||
| `width_mm` | Breite in mm |
|
||||
| `height_mm` | Höhe in mm |
|
||||
**Datei:** `/admin-v2/components/ocr/GridOverlay.tsx`
|
||||
|
||||
**Konvertierung:**
|
||||
```typescript
|
||||
// Prozent zu mm
|
||||
const x_mm = (x_percent / 100) * 210
|
||||
const y_mm = (y_percent / 100) * 297
|
||||
|
||||
// mm zu Pixel (für Canvas bei 96 DPI)
|
||||
const MM_TO_PX = 3.7795275591
|
||||
const x_px = x_mm * MM_TO_PX
|
||||
```
|
||||
|
||||
### 2. Deskew-Korrektur
|
||||
|
||||
Automatische Ausrichtung schiefer Scans basierend auf der ersten Spalte:
|
||||
|
||||
1. **Erkennung:** Alle Wörter in der ersten Spalte (x < 33%) werden analysiert
|
||||
2. **Berechnung:** Lineare Regression auf den linken Kanten
|
||||
3. **Korrektur:** Rotation aller Koordinaten um den berechneten Winkel
|
||||
4. **Limitierung:** Maximal ±5° Korrektur
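
The four deskew steps above can be sketched as follows. This is a dependency-free illustration, not the `grid_detection_service.py` implementation: it assumes the left edges of the first-column words are available as `(x, y)` points in mm, fits `x = slope * y + b` by least squares, and clamps the result to ±5°. The function names and the sign convention are assumptions.

```python
import math
from typing import List, Tuple

MAX_DESKEW_DEG = 5.0  # correction limit from step 4


def estimate_deskew_angle(left_edges: List[Tuple[float, float]]) -> float:
    """Least-squares fit of x = slope * y + b; returns the clamped skew angle in degrees."""
    n = len(left_edges)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in left_edges) / n
    mean_y = sum(y for _, y in left_edges) / n
    var_y = sum((y - mean_y) ** 2 for _, y in left_edges)
    if var_y == 0:
        return 0.0  # all points on one row: no slope can be estimated
    cov = sum((x - mean_x) * (y - mean_y) for x, y in left_edges)
    angle = math.degrees(math.atan(cov / var_y))
    return max(-MAX_DESKEW_DEG, min(MAX_DESKEW_DEG, angle))


def rotate_point(x: float, y: float, angle_deg: float,
                 cx: float = 0.0, cy: float = 0.0) -> Tuple[float, float]:
    """Rotate (x, y) around (cx, cy); with the estimated angle this straightens the column."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))
```

Applying `rotate_point` with the estimated angle to every coordinate makes the fitted left edge vertical again, as long as the skew stays within the clamp.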
|
||||
|
||||
```python
|
||||
# Deskew-Winkel im Response
|
||||
{
|
||||
"deskew_angle_deg": -1.2, # Negativer Wert = nach links geneigt
|
||||
...
|
||||
interface GridOverlayProps {
|
||||
grid: GridData
|
||||
imageUrl?: string
|
||||
onCellClick?: (cell: GridCell) => void
|
||||
selectedCell?: GridCell | null
|
||||
showEmpty?: boolean // Leere Zellen anzeigen
|
||||
showLabels?: boolean // Spaltenlabels (EN, DE, Ex)
|
||||
showNumbers?: boolean // Block-Nummern anzeigen
|
||||
highlightedBlockNumber?: number | null // Hervorgehobener Block
|
||||
className?: string
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Spalten-Erkennung mit 1mm Margin
|
||||
**Zellenstatus-Farben:**
|
||||
|
||||
Spalten werden automatisch erkannt und beginnen 1mm vor dem ersten Wort:
|
||||
| Status | Farbe | Bedeutung |
|
||||
|--------|-------|-----------|
|
||||
| `recognized` | Gruen | Text erfolgreich erkannt |
|
||||
| `problematic` | Orange | Niedriger Confidence-Wert |
|
||||
| `manual` | Blau | Manuell korrigiert |
|
||||
| `empty` | Transparent | Keine Erkennung |
|
||||
|
||||
```json
|
||||
{
|
||||
"detected_columns": [
|
||||
{
|
||||
"column_type": "english",
|
||||
"x_start": 9.52, // Prozent
|
||||
"x_end": 35.0,
|
||||
"x_start_mm": 20.0, // mm (1mm vor erstem Wort)
|
||||
"x_end_mm": 73.5,
|
||||
"word_count": 15
|
||||
},
|
||||
{
|
||||
"column_type": "german",
|
||||
"x_start_mm": 74.0,
|
||||
"x_end_mm": 140.0,
|
||||
"word_count": 15
|
||||
},
|
||||
{
|
||||
"column_type": "example",
|
||||
"x_start_mm": 141.0,
|
||||
"x_end_mm": 200.0,
|
||||
"word_count": 12
|
||||
}
|
||||
]
|
||||
### BlockReviewPanel
|
||||
|
||||
Panel zur Block-fuer-Block Ueberpruefung der OCR-Ergebnisse.
|
||||
|
||||
**Datei:** `/admin-v2/components/ocr/BlockReviewPanel.tsx`
|
||||
|
||||
```typescript
|
||||
interface BlockReviewPanelProps {
|
||||
grid: GridData
|
||||
methodResults: Record<string, { vocabulary: Array<...> }>
|
||||
currentBlockNumber: number
|
||||
onBlockChange: (blockNumber: number) => void
|
||||
onApprove: (blockNumber: number, methodId: string, text: string) => void
|
||||
onCorrect: (blockNumber: number, correctedText: string) => void
|
||||
onSkip: (blockNumber: number) => void
|
||||
reviewData: Record<number, BlockReviewData>
|
||||
className?: string
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Zellen-Status
|
||||
**Review-Status:**
|
||||
|
||||
| Status | Beschreibung |
|
||||
|--------|--------------|
|
||||
| `empty` | Keine OCR-Erkennung in dieser Zelle |
|
||||
| `recognized` | Text erkannt mit Confidence ≥ 50% |
|
||||
| `problematic` | Text erkannt mit Confidence < 50% |
|
||||
| `manual` | Manuell korrigiert |
|
||||
| `pending` | Noch nicht ueberprueft |
|
||||
| `approved` | OCR-Ergebnis akzeptiert |
|
||||
| `corrected` | Manuell korrigiert |
|
||||
| `skipped` | Uebersprungen |
|
||||
|
||||
---
|
||||
### BlockReviewSummary
|
||||
|
||||
## API-Endpoints
|
||||
Zusammenfassung aller ueberprueften Bloecke.
|
||||
|
||||
### POST /api/v1/ocr/analyze-grid
|
||||
|
||||
Analysiert ein Bild und erkennt die Vokabeltabellen-Struktur.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"image_base64": "data:image/jpeg;base64,...",
|
||||
"min_confidence": 0.5,
|
||||
"padding": 2.0
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"cells": [
|
||||
[
|
||||
{
|
||||
"row": 0,
|
||||
"col": 0,
|
||||
"x": 10.0,
|
||||
"y": 15.0,
|
||||
"width": 25.0,
|
||||
"height": 3.0,
|
||||
"x_mm": 21.0,
|
||||
"y_mm": 44.55,
|
||||
"width_mm": 52.5,
|
||||
"height_mm": 8.91,
|
||||
"text": "house",
|
||||
"confidence": 0.95,
|
||||
"status": "recognized",
|
||||
"column_type": "english",
|
||||
"logical_row": 0,
|
||||
"logical_col": 0
|
||||
}
|
||||
]
|
||||
],
|
||||
"detected_columns": [...],
|
||||
"page_dimensions": {
|
||||
"width_mm": 210.0,
|
||||
"height_mm": 297.0,
|
||||
"format": "A4"
|
||||
},
|
||||
"deskew_angle_deg": -0.5,
|
||||
"statistics": {
|
||||
"total_cells": 45,
|
||||
"recognized_cells": 42,
|
||||
"problematic_cells": 3,
|
||||
"empty_cells": 0
|
||||
}
|
||||
```typescript
|
||||
interface BlockReviewSummaryProps {
|
||||
reviewData: Record<number, BlockReviewData>
|
||||
totalBlocks: number
|
||||
onBlockClick: (blockNumber: number) => void
|
||||
className?: string
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Frontend-Komponenten
|
||||
## OCR-Methoden
|
||||
|
||||
### GridOverlay.tsx
|
||||
|
||||
Zeigt die erkannten Zellen als farbiges Overlay über dem Bild.
|
||||
|
||||
**Props:**
|
||||
```typescript
|
||||
interface GridOverlayProps {
|
||||
cells: GridCell[][]
|
||||
imageWidth: number
|
||||
imageHeight: number
|
||||
showLabels?: boolean
|
||||
onCellClick?: (cell: GridCell) => void
|
||||
}
|
||||
```
|
||||
|
||||
**Farbkodierung:**
|
||||
- Grün: `recognized` (gut erkannt)
|
||||
- Gelb: `problematic` (niedrige Confidence)
|
||||
- Grau: `empty`
|
||||
- Blau: `manual` (manuell korrigiert)
|
||||
|
||||
### CellEditPopup.tsx
|
||||
|
||||
Popup zum Bearbeiten einer Zelle.
|
||||
|
||||
**Features:**
|
||||
- Text bearbeiten
|
||||
- Spaltentyp ändern (English/German/Example)
|
||||
- Confidence anzeigen
|
||||
- mm-Koordinaten anzeigen
|
||||
- Keyboard-Shortcuts: Ctrl+Enter (Speichern), Esc (Abbrechen)
|
||||
| ID | Name | Beschreibung |
|
||||
|----|------|--------------|
|
||||
| `vision_llm` | Vision LLM | Qwen VL 32B ueber Ollama |
|
||||
| `tesseract` | Tesseract | Klassisches OCR (lokal) |
|
||||
| `paddleocr` | PaddleOCR | PaddleOCR Engine |
|
||||
| `claude_vision` | Claude Vision | Anthropic Claude Vision API |
|
||||
|
||||
---
|
||||
|
||||
## Worksheet-Editor Integration
|
||||
## API Endpoints
|
||||
|
||||
### Export
|
||||
### Session Management
|
||||
|
||||
Der "Zum Editor exportieren" Button speichert die OCR-Daten in localStorage:
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| POST | `/api/v1/vocab/upload-pdf-info` | PDF hochladen |
|
||||
| GET | `/api/v1/vocab/sessions/{id}` | Session-Details |
|
||||
| DELETE | `/api/v1/vocab/sessions/{id}` | Session loeschen |
|
||||
|
||||
```typescript
|
||||
interface OCRExportData {
|
||||
version: '1.0'
|
||||
source: 'ocr-compare'
|
||||
exported_at: string
|
||||
session_id: string
|
||||
page_number: number
|
||||
page_dimensions: {
|
||||
width_mm: number
|
||||
height_mm: number
|
||||
format: string
|
||||
}
|
||||
words: OCRWord[]
|
||||
detected_columns: DetectedColumn[]
|
||||
### Bildexport
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| GET | `/api/v1/vocab/sessions/{id}/pdf-thumbnail/{page}` | Thumbnail (zoom=0.5) |
|
||||
| GET | `/api/v1/vocab/sessions/{id}/pdf-thumbnail/{page}?hires=true` | High-Res (zoom=2.0) |
|
||||
|
||||
### Grid-Erkennung
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| POST | `/api/v1/vocab/sessions/{id}/detect-grid` | Grid-Struktur erkennen |
|
||||
| POST | `/api/v1/vocab/sessions/{id}/run-ocr` | OCR auf Grid ausfuehren |
|
||||
|
||||
---
|
||||
|
||||
## Session Persistence
|
||||
|
||||
Die aktive Session wird im localStorage gespeichert:
|
||||
|
||||
```javascript
|
||||
// Speichern
|
||||
localStorage.setItem('ocr-compare-active-session', sessionId)
|
||||
|
||||
// Wiederherstellen beim Seitenladen
|
||||
const lastSessionId = localStorage.getItem('ocr-compare-active-session')
|
||||
if (lastSessionId) {
|
||||
// Session-Daten laden
|
||||
}
|
||||
```
|
||||
|
||||
**localStorage Keys:**
|
||||
- `ocr_export_{session_id}_{page_number}`: Export-Daten
|
||||
- `ocr_export_latest`: Referenz zum neuesten Export
|
||||
---
|
||||
|
||||
### Import im Worksheet-Editor
|
||||
## Block Review Workflow
|
||||
|
||||
1. Öffnen Sie den Worksheet-Editor: https://macmini/worksheet-editor
|
||||
2. Klicken Sie auf den OCR-Import Button (grünes Icon)
|
||||
3. Die Wörter werden auf dem Canvas platziert
|
||||
1. **PDF hochladen** - Dokument in das System laden
|
||||
2. **Grid erkennen** - Automatische Tabellenerkennung
|
||||
3. **OCR ausfuehren** - Alle Methoden parallel ausfuehren
|
||||
4. **Block Review starten** - "Block Review" Button klicken
|
||||
5. **Bloecke pruefen** - Fuer jeden Block:
|
||||
- Ergebnisse aller Methoden vergleichen
|
||||
- Bestes Ergebnis waehlen oder manuell korrigieren
|
||||
6. **Zusammenfassung** - Uebersicht der Korrekturen
|
||||
|
||||
---
|
||||
|
||||
## High-Resolution Bilder
|
||||
|
||||
Fuer die Anzeige werden hochaufloesende Bilder verwendet:
|
||||
|
||||
**Konvertierung mm → Pixel:**
|
||||
```typescript
|
||||
const MM_TO_PX = 3.7795275591
|
||||
const x_px = word.x_mm * MM_TO_PX
|
||||
const y_px = word.y_mm * MM_TO_PX
|
||||
// Thumbnail URL mit High-Resolution Parameter
|
||||
const imageUrl = `${KLAUSUR_API}/api/v1/vocab/sessions/${sessionId}/pdf-thumbnail/${pageNumber}?hires=true`
|
||||
```
|
||||
|
||||
| Parameter | Zoom | Verwendung |
|
||||
|-----------|------|------------|
|
||||
| Ohne `hires` | 0.5 | Vorschau/Thumbnails |
|
||||
| Mit `hires=true` | 2.0 | Anzeige/OCR |
|
||||
|
||||
---
|
||||
|
||||
## Dateien
|
||||
|
||||
### Backend (klausur-service)
|
||||
|
||||
| Datei | Beschreibung |
|
||||
|-------|--------------|
|
||||
| `services/grid_detection_service.py` | Grid-Erkennung v4 mit Deskew |
|
||||
| `tests/test_grid_detection.py` | Unit Tests |
|
||||
|
||||
### Frontend (admin-v2)
|
||||
|
||||
| Datei | Beschreibung |
|
||||
|-------|--------------|
|
||||
| `app/(admin)/ai/ocr-compare/page.tsx` | Haupt-UI |
|
||||
| `components/ocr/GridOverlay.tsx` | Grid-Visualisierung |
|
||||
| `components/ocr/CellEditPopup.tsx` | Zellen-Editor |
|
||||
| `components/ocr/GridOverlay.tsx` | SVG Grid-Overlay |
|
||||
| `components/ocr/BlockReviewPanel.tsx` | Review-Panel |
|
||||
| `components/ocr/CellCorrectionDialog.tsx` | Korrektur-Dialog |
|
||||
| `components/ocr/index.ts` | Exports |
|
||||
|
||||
### Frontend (studio-v2)
|
||||
### Backend (klausur-service)
|
||||
|
||||
| Datei | Beschreibung |
|
||||
|-------|--------------|
|
||||
| `lib/worksheet-editor/ocr-integration.ts` | OCR Import/Export Utility |
|
||||
| `app/worksheet-editor/page.tsx` | Editor mit OCR-Import |
|
||||
| `components/worksheet-editor/EditorToolbar.tsx` | Toolbar mit OCR-Button |
|
||||
| `vocab_worksheet_api.py` | API-Router |
|
||||
| `hybrid_vocab_extractor.py` | OCR-Extraktion |
|
||||
|
||||
---
|
||||
|
||||
## Deployment
|
||||
## Aenderungshistorie
|
||||
|
||||
```bash
|
||||
# 1. Backend synchronisieren
|
||||
scp grid_detection_service.py macmini:.../klausur-service/backend/services/
|
||||
|
||||
# 2. Tests synchronisieren
|
||||
scp test_grid_detection.py macmini:.../klausur-service/backend/tests/
|
||||
|
||||
# 3. klausur-service neu bauen
|
||||
ssh macmini "docker compose build --no-cache klausur-service"
|
||||
|
||||
# 4. Container starten
|
||||
ssh macmini "docker compose up -d klausur-service"
|
||||
|
||||
# 5. Frontend (admin-v2) deployen
|
||||
ssh macmini "docker compose build --no-cache admin-v2 && docker compose up -d admin-v2"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Verwendete Open-Source-Bibliotheken
|
||||
|
||||
| Bibliothek | Version | Lizenz | Verwendung |
|
||||
|------------|---------|--------|------------|
|
||||
| NumPy | ≥1.24 | BSD-3-Clause | Deskew-Berechnung (polyfit) |
|
||||
| OpenCV | ≥4.8 | Apache-2.0 | Bildverarbeitung (optional) |
|
||||
| PaddleOCR | 2.7 | Apache-2.0 | OCR-Erkennung |
|
||||
| Fabric.js | 6.x | MIT | Canvas-Rendering (Frontend) |
|
||||
|
||||
---
|
||||
|
||||
## Fehlerbehandlung
|
||||
|
||||
### Häufige Probleme
|
||||
|
||||
| Problem | Lösung |
|
||||
|---------|--------|
|
||||
| "Grid analysieren" lädt nicht | klausur-service Container prüfen |
|
||||
| Keine Zellen erkannt | Min. Confidence reduzieren |
|
||||
| Falsche Spalten-Zuordnung | Manuell im CellEditPopup korrigieren |
|
||||
| Export funktioniert nicht | Browser-Console auf Fehler prüfen |
|
||||
|
||||
### Logging
|
||||
|
||||
```bash
|
||||
# klausur-service Logs
|
||||
docker logs breakpilot-pwa-klausur-service --tail=100
|
||||
|
||||
# Grid Detection spezifisch
|
||||
docker logs breakpilot-pwa-klausur-service 2>&1 | grep "grid_detection"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Änderungshistorie
|
||||
|
||||
| Version | Datum | Änderungen |
|
||||
|---------|-------|------------|
|
||||
| 4.0 | 2026-02-08 | Deskew-Korrektur, 1mm Column Margin |
|
||||
| 3.0 | 2026-02-07 | mm-Koordinatensystem |
|
||||
| 2.0 | 2026-02-06 | Spalten-Erkennung |
|
||||
| 1.0 | 2026-02-05 | Initiale Implementierung |
|
||||
|
||||
---
|
||||
|
||||
## Referenzen
|
||||
|
||||
- [Worksheet-Editor Architektur](Worksheet-Editor-Architecture.md)
|
||||
- [OCR Labeling Spec](OCR-Labeling-Spec.md)
|
||||
- [SBOM](/infrastructure/sbom)
|
||||
| Datum | Aenderung |
|
||||
|-------|-----------|
|
||||
| 2026-02-08 | Block Review Feature hinzugefuegt |
|
||||
| 2026-02-08 | High-Resolution Bilder aktiviert |
|
||||
| 2026-02-08 | Session Persistence implementiert |
|
||||
| 2026-02-07 | Grid Detection und Multi-Method OCR |
|
||||
|
||||
445
docs-src/services/klausur-service/OCR-Labeling-Spec.md
Normal file
@@ -0,0 +1,445 @@
# OCR Labeling System Specification

**Version:** 1.1.0
**Status:** In production (Mac Mini)

## Overview

The OCR Labeling System creates training data for handwriting OCR models from scanned exams (Klausuren). It supports the following OCR models:

| Model | Description | Speed | Recommended for |
|-------|-------------|-------|-----------------|
| **llama3.2-vision:11b** | Vision LLM (default) | Slow | Handwriting, best quality |
| **TrOCR** | Microsoft Transformer | Fast | Printed text |
| **PaddleOCR + LLM** | Hybrid approach (new) | Very fast (4x) | Mixed documents |
| **Donut** | Document Understanding (new) | Medium | Tables, forms |
| **qwen2.5:14b** | Correction LLM | - | Exam grading |

### New OCR Options (v1.1.0)

#### PaddleOCR + LLM (recommended for speed)

PaddleOCR + LLM is a two-stage approach:
1. **PaddleOCR** - fast, accurate text recognition with bounding boxes
2. **qwen2.5:14b** - semantic structuring of the recognized text

**Advantages:**
- About 4x faster than the Vision LLM (~7-15 s vs. 30-60 s per page)
- Higher accuracy on printed text (95-99%)
- Fewer hallucinations (the LLM only corrects; it does not invent text)
- Position-based column detection is possible

**Files:**
- `/klausur-service/backend/hybrid_vocab_extractor.py` - PaddleOCR integration

#### Donut (Document Understanding Transformer)

Donut is optimized for structured documents:
- Tables and forms
- Invoices and receipts
- Multi-column layouts

**Files:**
- `/klausur-service/backend/services/donut_ocr_service.py` - Donut service

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│ OCR-Labeling System │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────────┐ ┌────────────────────────┐ │
│ │ Frontend │◄──►│ Klausur-Service │◄──►│ PostgreSQL │ │
│ │ (Next.js) │ │ (FastAPI) │ │ - ocr_labeling_sessions│ │
│ │ Port 3000 │ │ Port 8086 │ │ - ocr_labeling_items │ │
│ └─────────────┘ └────────┬─────────┘ │ - ocr_training_samples │ │
│ │ └────────────────────────┘ │
│ │ │
│ ┌──────────┼──────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌─────────┐ ┌───────────────┐ │
│ │ MinIO │ │ Ollama │ │ Export Service │ │
│ │ (Images) │ │ (OCR) │ │ (Training) │ │
│ │ Port 9000 │ │ :11434 │ │ │ │
│ └───────────┘ └─────────┘ └───────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────┘
```

## Data Model

### PostgreSQL Tables

```sql
-- Labeling sessions (group related images)
CREATE TABLE ocr_labeling_sessions (
    id VARCHAR(36) PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    source_type VARCHAR(50) NOT NULL,  -- 'klausur', 'handwriting_sample', 'scan'
    description TEXT,
    ocr_model VARCHAR(100),  -- e.g. 'llama3.2-vision:11b'
    total_items INTEGER DEFAULT 0,
    labeled_items INTEGER DEFAULT 0,
    confirmed_items INTEGER DEFAULT 0,
    corrected_items INTEGER DEFAULT 0,
    skipped_items INTEGER DEFAULT 0,
    teacher_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Individual labeling items (image + OCR + ground truth)
CREATE TABLE ocr_labeling_items (
    id VARCHAR(36) PRIMARY KEY,
    session_id VARCHAR(36) REFERENCES ocr_labeling_sessions(id),
    image_path TEXT NOT NULL,  -- MinIO path or local path
    image_hash VARCHAR(64),  -- SHA256 for deduplication
    ocr_text TEXT,  -- text recognized by the LLM
    ocr_confidence FLOAT,  -- confidence (0-1)
    ocr_model VARCHAR(100),
    ground_truth TEXT,  -- corrected/confirmed text
    status VARCHAR(20) DEFAULT 'pending',  -- pending/confirmed/corrected/skipped
    labeled_by VARCHAR(100),
    labeled_at TIMESTAMP,
    label_time_seconds INTEGER,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Exported training samples
CREATE TABLE ocr_training_samples (
    id VARCHAR(36) PRIMARY KEY,
    item_id VARCHAR(36) REFERENCES ocr_labeling_items(id),
    image_path TEXT NOT NULL,
    ground_truth TEXT NOT NULL,
    export_format VARCHAR(50) NOT NULL,  -- 'generic', 'trocr', 'llama_vision'
    exported_at TIMESTAMP DEFAULT NOW(),
    training_batch VARCHAR(100),
    used_in_training BOOLEAN DEFAULT FALSE
);
```
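
The `image_hash` column is described as a SHA256 used for deduplication. A minimal sketch of that idea is shown below; the `register_image` helper and the in-memory dictionary are illustrative assumptions, whereas the service itself would presumably look the hash up in `ocr_labeling_items`.

```python
import hashlib
from typing import Dict, Optional


def image_sha256(data: bytes) -> str:
    """Hex digest of the raw image bytes (64 characters, matching VARCHAR(64))."""
    return hashlib.sha256(data).hexdigest()


def register_image(seen: Dict[str, str], item_id: str, data: bytes) -> Optional[str]:
    """Return the id of an earlier item with identical bytes, else record this one and return None."""
    digest = image_sha256(data)
    if digest in seen:
        return seen[digest]
    seen[digest] = item_id
    return None
```

Hashing the raw bytes only catches exact duplicates; re-scans or re-encoded copies of the same page would get different hashes.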

## API Reference

Base URL: `http://macmini:8086/api/v1/ocr-label`

### Sessions

#### POST /sessions
Create a new labeling session.

**Request:**
```json
{
  "name": "Klausur Deutsch 12a Q1",
  "source_type": "klausur",
  "description": "Gedichtanalyse Expressionismus",
  "ocr_model": "llama3.2-vision:11b"
}
```

**Response:**
```json
{
  "id": "abc-123-def",
  "name": "Klausur Deutsch 12a Q1",
  "source_type": "klausur",
  "total_items": 0,
  "labeled_items": 0,
  "created_at": "2026-01-21T10:30:00Z"
}
```

#### GET /sessions
List sessions.

**Query parameters:**
- `limit` (int, default: 50) - maximum number of sessions

#### GET /sessions/{session_id}
Retrieve a single session.

### Upload

#### POST /sessions/{session_id}/upload
Upload images to a session.

**Request:** Multipart form data
- `files` (File[]) - PNG/JPG/PDF files
- `run_ocr` (bool, default: true) - run OCR immediately
- `metadata` (JSON string) - optional metadata

**Response:**
```json
{
  "session_id": "abc-123-def",
  "uploaded_count": 5,
  "items": [
    {
      "id": "item-1",
      "filename": "scan_001.png",
      "image_path": "ocr-labeling/abc-123/item-1.png",
      "ocr_text": "Die Lösung der Aufgabe...",
      "ocr_confidence": 0.87,
      "status": "pending"
    }
  ]
}
```

### Labeling Queue
|
||||
|
||||
#### GET /queue
|
||||
Nächste zu labelnde Items abrufen.
|
||||
|
||||
**Query Parameter:**
|
||||
- `session_id` (str, optional) - Nach Session filtern
|
||||
- `status` (str, default: "pending") - Status-Filter
|
||||
- `limit` (int, default: 10) - Maximale Anzahl
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
[
|
||||
{
|
||||
"id": "item-456",
|
||||
"session_id": "abc-123",
|
||||
"session_name": "Klausur Deutsch",
|
||||
"image_path": "/app/ocr-labeling/abc-123/item-456.png",
|
||||
"image_url": "/api/v1/ocr-label/images/abc-123/item-456.png",
|
||||
"ocr_text": "Erkannter Text...",
|
||||
"ocr_confidence": 0.87,
|
||||
"ground_truth": null,
|
||||
"status": "pending",
|
||||
"metadata": {"page": 1}
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### Labeling Actions
|
||||
|
||||
#### POST /confirm
|
||||
Confirm the OCR text as correct.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"item_id": "item-456",
|
||||
"label_time_seconds": 5
|
||||
}
|
||||
```
|
||||
|
||||
**Effect:** `ground_truth = ocr_text`, `status = 'confirmed'`
|
||||
|
||||
#### POST /correct
|
||||
Correct the ground truth.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"item_id": "item-456",
|
||||
"ground_truth": "Korrigierter Text hier",
|
||||
"label_time_seconds": 15
|
||||
}
|
||||
```
|
||||
|
||||
**Effect:** `ground_truth = <input>`, `status = 'corrected'`
|
||||
|
||||
#### POST /skip
|
||||
Skip an item (unusable).
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"item_id": "item-456"
|
||||
}
|
||||
```
|
||||
|
||||
**Effect:** `status = 'skipped'` (the item is not exported)
|
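The three labeling actions above amount to a small state transition on each item. The following is a minimal sketch of those effects only; `LabelItem` and the function names are hypothetical stand-ins for illustration, not the service's actual code, and the export rule in `is_exportable` is an assumption derived from the "not exported" note on `/skip`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelItem:
    """Hypothetical in-memory stand-in for one labeling item."""
    id: str
    ocr_text: str
    ground_truth: Optional[str] = None
    status: str = "pending"


def confirm(item: LabelItem) -> LabelItem:
    # POST /confirm: accept the OCR output unchanged as ground truth.
    item.ground_truth = item.ocr_text
    item.status = "confirmed"
    return item


def correct(item: LabelItem, ground_truth: str) -> LabelItem:
    # POST /correct: store the reviewer's text as ground truth.
    item.ground_truth = ground_truth
    item.status = "corrected"
    return item


def skip(item: LabelItem) -> LabelItem:
    # POST /skip: mark the item as unusable; ground_truth stays untouched.
    item.status = "skipped"
    return item


def is_exportable(item: LabelItem) -> bool:
    # Assumption: only confirmed or corrected items reach an export.
    return item.status in ("confirmed", "corrected")
```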
||||
|
||||
### Statistics
|
||||
|
||||
#### GET /stats
|
||||
Retrieve labeling statistics.
|
||||
|
||||
**Query parameters:**
|
||||
- `session_id` (str, optional) - Restrict the stats to one session
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"total_items": 100,
|
||||
"labeled_items": 75,
|
||||
"confirmed_items": 60,
|
||||
"corrected_items": 15,
|
||||
"pending_items": 25,
|
||||
"accuracy_rate": 0.80,
|
||||
"avg_label_time_seconds": 8.5
|
||||
}
|
||||
```
|
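The numbers in the sample response are consistent with one simple reading of the fields, sketched below. The formulas are an assumption inferred from the example (60 confirmed out of 75 labeled gives 0.80), not a confirmed description of the backend; `labeling_stats` is a hypothetical helper, and skipped items are left out because the sample does not show them.

```python
def labeling_stats(confirmed: int, corrected: int, pending: int) -> dict:
    """Derive the aggregate fields of GET /stats from per-status counts."""
    labeled = confirmed + corrected
    total = labeled + pending
    # Assumption: accuracy is the share of labeled items whose OCR text
    # was confirmed without correction.
    accuracy = confirmed / labeled if labeled else 0.0
    return {
        "total_items": total,
        "labeled_items": labeled,
        "confirmed_items": confirmed,
        "corrected_items": corrected,
        "pending_items": pending,
        "accuracy_rate": round(accuracy, 2),
    }


stats = labeling_stats(confirmed=60, corrected=15, pending=25)
```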
||||
|
||||
### Training Export
|
||||
|
||||
#### POST /export
|
||||
Export training data.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"export_format": "trocr",
|
||||
"session_id": "abc-123",
|
||||
"batch_id": "batch_20260121"
|
||||
}
|
||||
```
|
||||
|
||||
**Export formats:**
|
||||
|
||||
| Format | Description | Output |
|
||||
|--------|-------------|--------|
|
||||
| `generic` | Generic JSONL | `{"id", "image_path", "ground_truth", ...}` |
|
||||
| `trocr` | Microsoft TrOCR | `{"file_name", "text", "id"}` |
|
||||
| `llama_vision` | Llama 3.2 Vision | OpenAI-style messages with `image_url` |
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"export_format": "trocr",
|
||||
"batch_id": "batch_20260121",
|
||||
"exported_count": 75,
|
||||
"export_path": "/app/ocr-exports/trocr/batch_20260121",
|
||||
"manifest_path": "/app/ocr-exports/trocr/batch_20260121/manifest.json",
|
||||
"samples": [...]
|
||||
}
|
||||
```
|
||||
|
||||
#### GET /exports
|
||||
List available exports.
|
||||
|
||||
**Query parameters:**
|
||||
- `export_format` (str, optional) - Filter by format
|
||||
|
||||
## Export Formats in Detail
|
||||
|
||||
### TrOCR Format
|
||||
|
||||
```
|
||||
batch_20260121/
|
||||
├── manifest.json
|
||||
├── train.jsonl
|
||||
└── images/
|
||||
├── item-1.png
|
||||
└── item-2.png
|
||||
```
|
||||
|
||||
**train.jsonl:**
|
||||
```jsonl
|
||||
{"file_name": "images/item-1.png", "text": "Ground truth text", "id": "item-1"}
|
||||
{"file_name": "images/item-2.png", "text": "Another text", "id": "item-2"}
|
||||
```
|
||||
|
||||
### Llama Vision Format
|
||||
|
||||
```jsonl
|
||||
{
|
||||
"id": "item-1",
|
||||
"messages": [
|
||||
{"role": "system", "content": "Du bist ein OCR-Experte für deutsche Handschrift..."},
|
||||
{"role": "user", "content": [
|
||||
{"type": "image_url", "image_url": {"url": "images/item-1.png"}},
|
||||
{"type": "text", "text": "Lies den handgeschriebenen Text in diesem Bild."}
|
||||
]},
|
||||
{"role": "assistant", "content": "Ground truth text"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Generic Format
|
||||
|
||||
```jsonl
|
||||
{
|
||||
"id": "item-1",
|
||||
"image_path": "images/item-1.png",
|
||||
"ground_truth": "Ground truth text",
|
||||
"ocr_text": "OCR recognized text",
|
||||
"ocr_confidence": 0.87,
|
||||
"metadata": {"page": 1, "session": "Deutsch 12a"}
|
||||
}
|
||||
```
|
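The three formats above carry the same labeled item in different shapes. As a rough sketch of the mapping (not the actual `training_export_service.py`), the hypothetical helpers below turn one generic record into the TrOCR and Llama Vision records shown earlier. The prompt strings are copied from the sample, including its truncated system prompt.

```python
import json

# Copied from the sample above; the system prompt is truncated there too.
SYSTEM_PROMPT = "Du bist ein OCR-Experte für deutsche Handschrift..."
USER_PROMPT = "Lies den handgeschriebenen Text in diesem Bild."


def to_trocr(item: dict) -> dict:
    """Map a generic record to the TrOCR shape."""
    return {
        "file_name": item["image_path"],
        "text": item["ground_truth"],
        "id": item["id"],
    }


def to_llama_vision(item: dict) -> dict:
    """Map a generic record to OpenAI-style chat messages."""
    return {
        "id": item["id"],
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": item["image_path"]}},
                {"type": "text", "text": USER_PROMPT},
            ]},
            {"role": "assistant", "content": item["ground_truth"]},
        ],
    }


def to_jsonl(records: list) -> str:
    """Serialize records as JSONL: one compact JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


item = {
    "id": "item-1",
    "image_path": "images/item-1.png",
    "ground_truth": "Ground truth text",
}
```

Note that real JSONL files hold one object per line; the pretty-printed objects in the format samples are expanded for readability.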
||||
|
||||
## Frontend Integration
|
||||
|
||||
The OCR labeling UI is available at `/admin/ocr-labeling`.
|
||||
|
||||
### Keyboard Shortcuts
|
||||
|
||||
| Key | Action |
|
||||
|-----|--------|
|
||||
| `Enter` | Confirm (OCR is correct) |
|
||||
| `Tab` | Jump to the correction field |
|
||||
| `Escape` | Skip |
|
||||
| `←` / `→` | Navigation (prev/next) |
|
||||
|
||||
### Workflow
|
||||
|
||||
1. **Create a session** - Choose name, type, and OCR model
|
||||
2. **Upload images** - Drag & drop or file browser
|
||||
3. **Label** - Compare the image with the OCR text
|
||||
   - Correct → Confirm (Enter)
|
||||
   - Wrong → Correct + save
|
||||
   - Unusable → Skip
|
||||
4. **Export** - Choose a format (TrOCR, Llama Vision, Generic)
|
||||
5. **Start training** - Use the export folder for fine-tuning
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```bash
|
||||
# PostgreSQL
|
||||
DATABASE_URL=postgres://user:pass@postgres:5432/breakpilot_db
|
||||
|
||||
# MinIO (S3-compatible)
|
||||
MINIO_ENDPOINT=minio:9000
|
||||
MINIO_ACCESS_KEY=breakpilot
|
||||
MINIO_SECRET_KEY=breakpilot123
|
||||
MINIO_BUCKET=breakpilot-rag
|
||||
MINIO_SECURE=false
|
||||
|
||||
# Ollama (Vision-LLM)
|
||||
OLLAMA_BASE_URL=http://host.docker.internal:11434
|
||||
OLLAMA_VISION_MODEL=llama3.2-vision:11b
|
||||
OLLAMA_CORRECTION_MODEL=qwen2.5:14b
|
||||
|
||||
# Export
|
||||
OCR_EXPORT_PATH=/app/ocr-exports
|
||||
OCR_STORAGE_PATH=/app/ocr-labeling
|
||||
```
|
||||
|
||||
## Security & Data Protection
|
||||
|
||||
- **100% local processing** - All data stays on the Mac Mini
|
||||
- **No cloud uploads** - Ollama runs entirely offline
|
||||
- **GDPR-compliant** - No student data leaves the school network
|
||||
- **Deduplication** - A SHA256 hash prevents duplicate images
|
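The deduplication bullet above can be illustrated with a content hash over the raw image bytes. This is a minimal sketch assuming an in-memory set of digests; the real service presumably persists hashes elsewhere, and `ImageDeduplicator` is a hypothetical name.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of the image bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ImageDeduplicator:
    """Remembers digests of images already seen (in memory only)."""

    def __init__(self) -> None:
        self._seen = set()

    def add(self, data: bytes) -> bool:
        """Return True if the image is new, False if it is a duplicate."""
        digest = sha256_hex(data)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Because the hash covers the exact bytes, two scans of the same page that differ in even a single pixel count as different images.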
||||
|
||||
## Files
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `klausur-service/backend/ocr_labeling_api.py` | FastAPI router with OCR model dispatcher |
|
||||
| `klausur-service/backend/training_export_service.py` | Export service for TrOCR/Llama |
|
||||
| `klausur-service/backend/metrics_db.py` | PostgreSQL CRUD functions |
|
||||
| `klausur-service/backend/minio_storage.py` | MinIO OCR image storage |
|
||||
| `klausur-service/backend/hybrid_vocab_extractor.py` | PaddleOCR integration |
|
||||
| `klausur-service/backend/services/donut_ocr_service.py` | Donut OCR service (new) |
|
||||
| `klausur-service/backend/services/trocr_service.py` | TrOCR service (new) |
|
||||
| `website/app/admin/ocr-labeling/page.tsx` | Frontend UI with model selection |
|
||||
| `website/app/admin/ocr-labeling/types.ts` | TypeScript interfaces incl. the `OCRModel` type |
|
||||
|
||||
## Tests
|
||||
|
||||
```bash
|
||||
# Run the backend tests
|
||||
cd klausur-service/backend
|
||||
pytest tests/test_ocr_labeling.py -v
|
||||
|
||||
# With coverage
|
||||
pytest tests/test_ocr_labeling.py --cov=. --cov-report=html
|
||||
```
|
||||
472
docs-src/services/klausur-service/RAG-Admin-Spec.md
Normal file
@@ -0,0 +1,472 @@
|
||||
# RAG & Data Management Specification
|
||||
|
||||
## Overview
|
||||
|
||||
Admin frontend for managing training data and RAG systems in BreakPilot.
|
||||
|
||||
**Location**: `/admin/docs` → Tab "Daten & RAG"
|
||||
**Backend**: `klausur-service` (Port 8086)
|
||||
**Storage**: MinIO (persistent Docker volume `minio_data`)
|
||||
**Vector DB**: Qdrant (Port 6333)
|
||||
|
||||
## Data Model
|
||||
|
||||
### Two data types with different rules
|
||||
|
||||
| Type | Source | Training allowed | Isolation | Collection |
|
||||
|------|--------|------------------|-----------|------------|
|
||||
| **State data** (Landes-Daten) | NiBiS, other federal states | ✅ Yes | Per federal state | `bp_{bundesland}_{usecase}` |
|
||||
| **Teacher data** (Lehrer-Daten) | Teacher upload (BYOEH) | ❌ No | Per tenant (school/teacher) | `bp_eh` (encrypted) |
|
||||
|
||||
### Federal State Codes (ISO 3166-2:DE)
|
||||
|
||||
```
|
||||
NI = Niedersachsen BY = Bayern BW = Baden-Württemberg
|
||||
NW = Nordrhein-Westf. HE = Hessen SN = Sachsen
|
||||
BE = Berlin HH = Hamburg SH = Schleswig-Holstein
|
||||
BB = Brandenburg MV = Meckl.-Vorp. ST = Sachsen-Anhalt
|
||||
TH = Thüringen RP = Rheinland-Pfalz SL = Saarland
|
||||
HB = Bremen
|
||||
```
|
||||
|
||||
### Use Cases (RAG Collections)
|
||||
|
||||
| Use Case | Collection Pattern | Description |
|
||||
|----------|-------------------|-------------|
|
||||
| Exam grading (Klausurkorrektur) | `bp_{bl}_klausur` | Expected-answer rubrics (Erwartungshorizonte) for the Abitur |
|
||||
| Report card generator (Zeugnisgenerator) | `bp_{bl}_zeugnis` | Text modules for report cards |
|
||||
| Curriculum (Lehrplan) | `bp_{bl}_lehrplan` | Core curricula, framework guidelines |
|
||||
|
||||
Example: `bp_ni_klausur` = Niedersachsen (Lower Saxony), exam grading
|
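The naming pattern above can be captured in a small helper. This is a hedged sketch: `collection_name` is a hypothetical function, and validating against the state codes and the three use cases listed in this spec is an assumption about how strict the backend is.

```python
# The 16 state codes and three use cases listed in this specification.
BUNDESLAND_CODES = {
    "NI", "BY", "BW", "NW", "HE", "SN", "BE", "HH",
    "SH", "BB", "MV", "ST", "TH", "RP", "SL", "HB",
}
USE_CASES = {"klausur", "zeugnis", "lehrplan"}


def collection_name(bundesland: str, use_case: str) -> str:
    """Build a Qdrant collection name of the form bp_{bl}_{usecase}."""
    code = bundesland.upper()
    if code not in BUNDESLAND_CODES:
        raise ValueError(f"unknown Bundesland code: {bundesland!r}")
    if use_case not in USE_CASES:
        raise ValueError(f"unknown use case: {use_case!r}")
    # Collection names use the lowercase state code, as in bp_ni_klausur.
    return f"bp_{code.lower()}_{use_case}"
```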
||||
|
||||
## MinIO Bucket Structure
|
||||
|
||||
```
|
||||
breakpilot-rag/
|
||||
├── landes-daten/
|
||||
│ ├── ni/ # Niedersachsen
|
||||
│ │ ├── klausur/
|
||||
│ │ │ ├── 2016/
|
||||
│ │ │ │ ├── manifest.json
|
||||
│ │ │ │ └── *.pdf
|
||||
│ │ │ ├── 2017/
|
||||
│ │ │ ├── ...
|
||||
│ │ │ └── 2025/
|
||||
│ │ └── zeugnis/
|
||||
│ ├── by/ # Bayern
|
||||
│ └── .../
|
||||
│
|
||||
└── lehrer-daten/ # BYOEH - verschlüsselt
|
||||
└── {tenant_id}/
|
||||
└── {lehrer_id}/
|
||||
└── *.pdf.enc
|
||||
```
|
||||
|
||||
## Qdrant Schema
|
||||
|
||||
### State Data Collection (e.g. `bp_ni_klausur`)
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "uuid-v5-from-string",
|
||||
"vector": [384 dimensions],
|
||||
"payload": {
|
||||
"original_id": "nibis_2024_deutsch_ea_1_abc123_chunk_0",
|
||||
"doc_id": "nibis_2024_deutsch_ea_1_abc123",
|
||||
"chunk_index": 0,
|
||||
"text": "Der Erwartungshorizont...",
|
||||
"year": 2024,
|
||||
"subject": "Deutsch",
|
||||
"niveau": "eA",
|
||||
"task_number": 1,
|
||||
"doc_type": "EWH",
|
||||
"bundesland": "NI",
|
||||
"source": "nibis",
|
||||
"training_allowed": true,
|
||||
"minio_path": "landes-daten/ni/klausur/2024/2024_Deutsch_eA_I_EWH.pdf"
|
||||
}
|
||||
}
|
||||
```
|
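The `"id": "uuid-v5-from-string"` field suggests that point IDs are derived deterministically from `original_id`, so re-ingesting the same chunk overwrites its point instead of duplicating it. A minimal sketch with Python's standard `uuid` module follows; the namespace chosen here is an assumption for illustration, and the ingestion pipeline may use a different one.

```python
import uuid

# Assumption: any fixed namespace works as long as it never changes;
# NAMESPACE_DNS is used here purely for illustration.
POINT_NAMESPACE = uuid.NAMESPACE_DNS


def point_id(original_id: str) -> str:
    """Derive a stable UUIDv5 point ID from the chunk's original_id."""
    return str(uuid.uuid5(POINT_NAMESPACE, original_id))


chunk_0 = point_id("nibis_2024_deutsch_ea_1_abc123_chunk_0")
chunk_1 = point_id("nibis_2024_deutsch_ea_1_abc123_chunk_1")
```

The same input always yields the same ID, while different chunks of one document get different IDs.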
||||
|
||||
### Teacher Data Collection (`bp_eh`)
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "uuid",
|
||||
"vector": [384 dimensions],
|
||||
"payload": {
|
||||
"tenant_id": "schule_123",
|
||||
"eh_id": "eh_abc",
|
||||
"chunk_index": 0,
|
||||
"subject": "deutsch",
|
||||
"encrypted_content": "base64...",
|
||||
"training_allowed": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Frontend Components
|
||||
|
||||
### 1. Collections Overview (`/admin/rag/collections`)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Daten & RAG │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Sammlungen [+ Neu] │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ 📚 Niedersachsen - Klausurkorrektur │ │
|
||||
│ │ bp_ni_klausur | 630 Docs | 4.521 Chunks | 2016-2025 │ │
|
||||
│ │ [Suchen] [Indexieren] [Details] │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ 📚 Niedersachsen - Zeugnisgenerator │ │
|
||||
│ │ bp_ni_zeugnis | 0 Docs | Leer │ │
|
||||
│ │ [Suchen] [Indexieren] [Details] │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 2. Upload Area (`/admin/rag/upload`)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Dokumente hochladen │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Ziel-Sammlung: [Niedersachsen - Klausurkorrektur ▼] │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ │ │
|
||||
│ │ 📁 ZIP-Datei oder Ordner hierher ziehen │ │
|
||||
│ │ │ │
|
||||
│ │ oder [Dateien auswählen] │ │
|
||||
│ │ │ │
|
||||
│ │ Unterstützt: .zip, .pdf, Ordner │ │
|
||||
│ │ │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Upload-Queue: │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ ✅ 2018.zip - 45 PDFs erkannt │ │
|
||||
│ │ ⏳ 2019.zip - Wird analysiert... │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ [Hochladen & Indexieren] │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 3. Ingestion Status (`/admin/rag/ingestion`)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Ingestion Status │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Aktueller Job: Niedersachsen Klausur 2024 │
|
||||
│ ████████████████████░░░░░░░░░░ 65% (412/630 Docs) │
|
||||
│ Chunks: 2.891 | Fehler: 3 | ETA: 4:32 │
|
||||
│ [Pausieren] [Abbrechen] │
|
||||
│ │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Letzte Jobs: │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ ✅ 09.01.2025 15:30 - NI Klausur 2024 - 128 Chunks │ │
|
||||
│ │ ✅ 09.01.2025 14:00 - NI Klausur 2017 - 890 Chunks │ │
|
||||
│ │ ❌ 08.01.2025 10:15 - BY Klausur - Fehler: Timeout │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 4. Search & Quality Test (`/admin/rag/search`)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ RAG Suche & Qualitätstest │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Sammlung: [Niedersachsen - Klausurkorrektur ▼] │
|
||||
│ │
|
||||
│ Query: [Analyse eines Gedichts von Rilke ] │
|
||||
│ │
|
||||
│ Filter: │
|
||||
│ Jahr: [Alle ▼] Fach: [Deutsch ▼] Niveau: [eA ▼] │
|
||||
│ │
|
||||
│ [🔍 Suchen] │
|
||||
│ │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Ergebnisse (3): Latenz: 45ms │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ #1 | Score: 0.847 | 2024 Deutsch eA Aufgabe 2 │ │
|
||||
│ │ │ │
|
||||
│ │ "...Die Analyse des Rilke-Gedichts soll folgende │ │
|
||||
│ │ Aspekte berücksichtigen: Aufbau, Bildsprache..." │ │
|
||||
│ │ │ │
|
||||
│ │ Relevanz: [⭐⭐⭐⭐⭐] [⭐⭐⭐⭐] [⭐⭐⭐] [⭐⭐] [⭐] │ │
|
||||
│ │ Notizen: [Optional: Warum relevant/nicht relevant? ] │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 5. Metrics Dashboard (`/admin/rag/metrics`)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ RAG Qualitätsmetriken │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Zeitraum: [Letzte 7 Tage ▼] Sammlung: [Alle ▼] │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Precision@5 │ │ Recall@10 │ │ MRR │ │
|
||||
│ │ 0.78 │ │ 0.85 │ │ 0.72 │ │
|
||||
│ │ ↑ +5% │ │ ↑ +3% │ │ ↓ -2% │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Avg Latency │ │ Bewertungen │ │ Fehlerrate │ │
|
||||
│ │ 52ms │ │ 127 │ │ 0.3% │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
│ │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Score-Verteilung: │
|
||||
│ 0.9+ ████████████████ 23% │
|
||||
│ 0.7+ ████████████████████████████ 41% │
|
||||
│ 0.5+ ████████████████████ 28% │
|
||||
│ <0.5 ██████ 8% │
|
||||
│ │
|
||||
│ [Export CSV] [Detailbericht] │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
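For readers unfamiliar with the three ranking metrics in the dashboard mock-up, the sketch below shows their standard definitions. How the service decides which results count as relevant (for example, a star rating at or above some threshold) is not specified here and is treated as an input; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def precision_at_k(relevant: Set[str], ranked: List[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(relevant: Set[str], ranked: List[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked[:k])) / len(relevant)


def mean_reciprocal_rank(queries: List[Tuple[Set[str], List[str]]]) -> float:
    """Average of 1/rank of the first relevant result per query."""
    if not queries:
        return 0.0
    total = 0.0
    for relevant, ranked in queries:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(queries)
```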
||||
|
||||
## API Endpoints
|
||||
|
||||
### Collections API
|
||||
|
||||
```
|
||||
GET /api/v1/admin/rag/collections
|
||||
POST /api/v1/admin/rag/collections
|
||||
GET /api/v1/admin/rag/collections/{id}
|
||||
DELETE /api/v1/admin/rag/collections/{id}
|
||||
GET /api/v1/admin/rag/collections/{id}/stats
|
||||
```
|
||||
|
||||
### Upload API
|
||||
|
||||
```
|
||||
POST /api/v1/admin/rag/upload
|
||||
Content-Type: multipart/form-data
|
||||
- file: ZIP or PDF
|
||||
- collection_id: string
|
||||
- metadata: JSON (optional)
|
||||
|
||||
POST /api/v1/admin/rag/upload/folder
|
||||
- For folder upload (WebKitDirectory)
|
||||
```
|
||||
|
||||
### Ingestion API
|
||||
|
||||
```
|
||||
POST /api/v1/admin/rag/ingest
|
||||
- collection_id: string
|
||||
- filters: {year?, subject?, doc_type?}
|
||||
|
||||
GET /api/v1/admin/rag/ingest/status
|
||||
GET /api/v1/admin/rag/ingest/history
|
||||
POST /api/v1/admin/rag/ingest/cancel
|
||||
```
|
||||
|
||||
### Search API
|
||||
|
||||
```
|
||||
POST /api/v1/admin/rag/search
|
||||
- query: string
|
||||
- collection_id: string
|
||||
- filters: {year?, subject?, niveau?}
|
||||
- limit: int
|
||||
|
||||
POST /api/v1/admin/rag/search/feedback
|
||||
- result_id: string
|
||||
- rating: 1-5
|
||||
- notes: string (optional)
|
||||
```
|
||||
|
||||
### Metrics API
|
||||
|
||||
```
|
||||
GET /api/v1/admin/rag/metrics
|
||||
- collection_id?: string
|
||||
- from_date?: date
|
||||
- to_date?: date
|
||||
|
||||
GET /api/v1/admin/rag/metrics/export
|
||||
- format: csv|json
|
||||
```
|
||||
|
||||
## Embedding Configuration
|
||||
|
||||
```python
|
||||
# Default: local embeddings (no API key required)
|
||||
EMBEDDING_BACKEND = "local"
|
||||
LOCAL_EMBEDDING_MODEL = "all-MiniLM-L6-v2"
|
||||
VECTOR_DIMENSIONS = 384
|
||||
|
||||
# Optional: OpenAI (for production); uncomment to use instead of the local default
|
||||
# EMBEDDING_BACKEND = "openai"
|
||||
# EMBEDDING_MODEL = "text-embedding-3-small"
|
||||
# VECTOR_DIMENSIONS = 1536
|
||||
```
|
||||
|
||||
## Data Persistence
|
||||
|
||||
### Docker Volumes (IMPORTANT - do not delete!)
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
  minio_data: # All uploaded documents
|
||||
  qdrant_data: # All vectors and embeddings
|
||||
  postgres_data: # Metadata, ratings, history
|
||||
```
|
||||
|
||||
### Backup Strategy
|
||||
|
||||
```bash
|
||||
# MinIO Backup
|
||||
docker exec breakpilot-pwa-minio mc mirror /data /backup
|
||||
|
||||
# Qdrant Backup
|
||||
curl -X POST http://localhost:6333/collections/bp_ni_klausur/snapshots
|
||||
|
||||
# Postgres backup (already implemented)
|
||||
# Runs automatically every day at 2:00
|
||||
```
|
||||
|
||||
## Implementation Order
|
||||
|
||||
1. ✅ Backend: Basic ingestion (nibis_ingestion.py)
|
||||
2. ✅ Backend: Local embeddings (sentence-transformers)
|
||||
3. ✅ Backend: MinIO integration (minio_storage.py)
|
||||
4. ✅ Backend: Collections API (admin_api.py)
|
||||
5. ✅ Backend: Upload API with ZIP support
|
||||
6. ✅ Backend: Metrics API with PostgreSQL (metrics_db.py)
|
||||
7. ✅ Frontend: Collections overview
|
||||
8. ✅ Frontend: Upload area (drag & drop)
|
||||
9. ✅ Frontend: Ingestion status
|
||||
10. ✅ Frontend: Search & quality test (with star ratings)
|
||||
11. ✅ Frontend: Metrics dashboard
|
||||
|
||||
## Technology Stack
|
||||
|
||||
- **Frontend**: Next.js 15 (`/website/app/admin/rag/page.tsx`)
|
||||
- **Backend**: FastAPI (`klausur-service/backend/`)
|
||||
- **Vector DB**: Qdrant v1.7.4 (384-dim vectors)
|
||||
- **Object Storage**: MinIO (S3-compatible)
|
||||
- **Embeddings**: sentence-transformers `all-MiniLM-L6-v2`
|
||||
- **Metrics DB**: PostgreSQL 16
|
||||
|
||||
## Developer Documentation
|
||||
|
||||
### Project Structure
|
||||
|
||||
```
|
||||
klausur-service/
|
||||
├── backend/
|
||||
│ ├── main.py # FastAPI App + BYOEH Endpoints
|
||||
│ ├── admin_api.py # RAG Admin API (Upload, Search, Metrics)
|
||||
│ ├── nibis_ingestion.py # NiBiS Dokument-Ingestion Pipeline
|
||||
│ ├── eh_pipeline.py # Chunking, Embeddings, Encryption
|
||||
│ ├── qdrant_service.py # Qdrant Client + Search
|
||||
│ ├── minio_storage.py # MinIO S3 Storage
|
||||
│ ├── metrics_db.py # PostgreSQL Metrics
|
||||
│ ├── requirements.txt # Python Dependencies
|
||||
│ └── tests/
|
||||
│ └── test_rag_admin.py
|
||||
└── docs/
|
||||
└── RAG-Admin-Spec.md # Diese Datei
|
||||
```
|
||||
|
||||
### Quick Start for Developers
|
||||
|
||||
```bash
|
||||
# 1. Start the services
|
||||
cd /path/to/breakpilot-pwa
|
||||
docker-compose up -d qdrant minio postgres
|
||||
|
||||
# 2. Install the dependencies
|
||||
cd klausur-service/backend
|
||||
pip install -r requirements.txt
|
||||
|
||||
# 3. Start the service
|
||||
python -m uvicorn main:app --port 8086 --reload
|
||||
|
||||
# 4. Initialize the RAG services (creates the bucket and tables)
|
||||
curl -X POST http://localhost:8086/api/v1/admin/rag/init
|
||||
```
|
||||
|
||||
### API Reference (Implemented)
|
||||
|
||||
#### NiBiS Ingestion
|
||||
```
|
||||
GET /api/v1/admin/nibis/discover # Find documents
|
||||
POST /api/v1/admin/nibis/ingest # Start indexing
|
||||
GET /api/v1/admin/nibis/status # Query status
|
||||
GET /api/v1/admin/nibis/stats # Statistics
|
||||
POST /api/v1/admin/nibis/search # Semantic search
|
||||
GET /api/v1/admin/nibis/collections # Qdrant collections
|
||||
```
|
||||
|
||||
#### RAG Upload & Storage
|
||||
```
|
||||
POST /api/v1/admin/rag/upload # Upload ZIP/PDF
|
||||
GET /api/v1/admin/rag/upload/history # Upload history
|
||||
GET /api/v1/admin/rag/storage/stats # MinIO statistics
|
||||
```
|
||||
|
||||
#### Metrics & Feedback
|
||||
```
|
||||
GET /api/v1/admin/rag/metrics # Quality metrics
|
||||
POST /api/v1/admin/rag/search/feedback # Submit a rating
|
||||
POST /api/v1/admin/rag/init # Initialize services
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Qdrant
|
||||
QDRANT_URL=http://localhost:6333
|
||||
|
||||
# MinIO
|
||||
MINIO_ENDPOINT=localhost:9000
|
||||
MINIO_ACCESS_KEY=breakpilot
|
||||
MINIO_SECRET_KEY=breakpilot123
|
||||
MINIO_BUCKET=breakpilot-rag
|
||||
|
||||
# PostgreSQL
|
||||
DATABASE_URL=postgres://breakpilot:breakpilot123@localhost:5432/breakpilot_db
|
||||
|
||||
# Embeddings
|
||||
EMBEDDING_BACKEND=local
|
||||
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
|
||||
```
|
||||
|
||||
### Current Indexing Statistics
|
||||
|
||||
- **Documents**: 579 expected-answer rubrics (Erwartungshorizonte, NiBiS)
|
||||
- **Chunks**: 7,352
|
||||
- **Years**: 2016, 2017, 2024, 2025
|
||||
- **Subjects**: German, English, Mathematics, Physics, Chemistry, Biology, History, Politics-Economics (Politik-Wirtschaft), Geography, Physical Education, Art, Music, Latin, Computer Science, Protestant Religion, Catholic Religion, Values and Norms (Werte und Normen), etc.
|
||||
- **Collection**: `bp_nibis_eh`
|
||||
- **Vector dimensions**: 384
|
||||
@@ -0,0 +1,409 @@
|
||||
# Visual Worksheet Editor - Architecture Documentation
|
||||
|
||||
**Version:** 1.0
|
||||
**Status:** Implemented
|
||||
|
||||
## 1. Overview
|
||||
|
||||
The Visual Worksheet Editor is a canvas-based editor for creating and editing worksheets. It lets teachers faithfully reconstruct scanned worksheets or design new worksheets visually.
|
||||
|
||||
### 1.1 Main Features
|
||||
|
||||
- **Canvas-based editing** with Fabric.js
|
||||
- **Free positioning** of text, images, and shapes
|
||||
- **Typography control** (fonts, sizes, styles)
|
||||
- **Images & graphics** can be uploaded and inserted
|
||||
- **AI-generated images** via Ollama/Stable Diffusion
|
||||
- **PDF/image export** for print and digital use
|
||||
- **Multi-page documents** with page navigation
|
||||
|
||||
### 1.2 Technology Stack
|
||||
|
||||
| Component | Technology | License |
|
||||
|-----------|------------|---------|
|
||||
| Canvas library | Fabric.js 6.x | MIT |
|
||||
| PDF export | pdf-lib 1.17.x | MIT |
|
||||
| Frontend | Next.js / React | MIT |
|
||||
| Backend API | FastAPI | MIT |
|
||||
| AI images | Ollama + Stable Diffusion | Apache 2.0 / MIT |
|
||||
|
||||
## 2. Architecture
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Frontend (studio-v2 / Next.js) │
|
||||
│ /studio-v2/app/worksheet-editor/page.tsx │
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌────────────────────────────┐ ┌────────────────┐ │
|
||||
│ │ Toolbar │ │ Fabric.js Canvas │ │ Properties │ │
|
||||
│ │ (Links) │ │ (Mitte - 60%) │ │ Panel │ │
|
||||
│ │ │ │ │ │ (Rechts) │ │
|
||||
│ │ - Select │ │ ┌──────────────────────┐ │ │ │ │
|
||||
│ │ - Text │ │ │ │ │ │ - Schriftart │ │
|
||||
│ │ - Formen │ │ │ A4 Arbeitsfläche │ │ │ - Größe │ │
|
||||
│ │ - Bilder │ │ │ mit Grid │ │ │ - Farbe │ │
|
||||
│ │ - KI-Bild │ │ │ │ │ │ - Position │ │
|
||||
│ │ - Tabelle │ │ └──────────────────────┘ │ │ - Ebene │ │
|
||||
│ └─────────────┘ └────────────────────────────┘ └────────────────┘ │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Seiten-Navigation | Zoom | Grid | Export PDF │ │
|
||||
│ └────────────────────────────────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ klausur-service (FastAPI - Port 8086) │
|
||||
│ POST /api/v1/worksheet/ai-image → Bild via Ollama generieren │
|
||||
│ POST /api/v1/worksheet/save → Worksheet speichern │
|
||||
│ GET /api/v1/worksheet/{id} → Worksheet laden │
|
||||
│ POST /api/v1/worksheet/{id}/export-pdf → PDF generieren │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Ollama (Port 11434) │
|
||||
│ Model: stable-diffusion oder kompatibles Text-to-Image Modell │
|
||||
│ Text-to-Image für KI-generierte Grafiken │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 3. File Structure
|
||||
|
||||
### 3.1 Frontend (studio-v2)
|
||||
|
||||
```
|
||||
/studio-v2/
|
||||
├── app/
|
||||
│ └── worksheet-editor/
|
||||
│ ├── page.tsx # Haupt-Editor-Seite
|
||||
│ └── types.ts # TypeScript Interfaces
|
||||
│
|
||||
├── components/
|
||||
│ └── worksheet-editor/
|
||||
│ ├── index.ts # Exports
|
||||
│ ├── FabricCanvas.tsx # Fabric.js Canvas Wrapper
|
||||
│ ├── EditorToolbar.tsx # Werkzeugleiste (links)
|
||||
│ ├── PropertiesPanel.tsx # Eigenschaften-Panel (rechts)
|
||||
│ ├── AIImageGenerator.tsx # KI-Bild Generator Modal
|
||||
│ ├── CanvasControls.tsx # Zoom, Grid, Seiten
|
||||
│ ├── ExportPanel.tsx # PDF/Bild Export
|
||||
│ └── PageNavigator.tsx # Mehrseitige Dokumente
|
||||
│
|
||||
├── lib/
|
||||
│ └── worksheet-editor/
|
||||
│ ├── index.ts # Exports
|
||||
│ └── WorksheetContext.tsx # State Management
|
||||
```
|
||||
|
||||
### 3.2 Backend (klausur-service)
|
||||
|
||||
```
|
||||
/klausur-service/backend/
|
||||
├── worksheet_editor_api.py # API Endpoints
|
||||
└── main.py # Router-Registrierung
|
||||
```
|
||||
|
||||
## 4. API Endpoints
|
||||
|
||||
### 4.1 Generate an AI Image
|
||||
|
||||
```http
|
||||
POST /api/v1/worksheet/ai-image
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"prompt": "Ein freundlicher Cartoon-Hund der ein Buch liest",
|
||||
"style": "cartoon",
|
||||
"width": 512,
|
||||
"height": 512
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"image_base64": "data:image/png;base64,...",
|
||||
"prompt_used": "...",
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
**Styles:**
|
||||
- `realistic` - Photorealistic
|
||||
- `cartoon` - Cartoon/comic
|
||||
- `sketch` - Hand-drawn sketch
|
||||
- `clipart` - Simple clipart graphics
|
||||
- `educational` - Educational illustrations
|
||||
|
||||
### 4.2 Save a Worksheet
|
||||
|
||||
```http
|
||||
POST /api/v1/worksheet/save
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"id": "optional-existing-id",
|
||||
"title": "Englisch Vokabeln Unit 3",
|
||||
"pages": [
|
||||
{ "id": "page_1", "index": 0, "canvasJSON": "{...}" }
|
||||
],
|
||||
"pageFormat": {
|
||||
"width": 210,
|
||||
"height": 297,
|
||||
"orientation": "portrait"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 Load a Worksheet
|
||||
|
||||
```http
|
||||
GET /api/v1/worksheet/{id}
|
||||
```
|
||||
|
||||
### 4.4 Export a PDF
|
||||
|
||||
```http
|
||||
POST /api/v1/worksheet/{id}/export-pdf
|
||||
```
|
||||
|
||||
**Response:** PDF file as a download
|
||||
|
||||
### 4.5 List Worksheets
|
||||
|
||||
```http
|
||||
GET /api/v1/worksheet/list/all
|
||||
```
|
||||
|
||||
## 5. Components
|
||||
|
||||
### 5.1 FabricCanvas
|
||||
|
||||
The core component for the canvas area:
|
||||
|
||||
- **A4 format**: 794 x 1123 pixels (96 DPI)
|
||||
- **Grid overlay**: Optional grid with snap function
|
||||
- **Zoom/pan**: Mouse wheel and controls
|
||||
- **Selection**: Single and multiple selection
|
||||
- **Keyboard shortcuts**: Del, Ctrl+C/V/Z/D
|
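The A4 pixel size in the list above follows from converting millimetres to pixels at 96 DPI (25.4 mm per inch). The sketch below shows the arithmetic in Python; the editor itself is TypeScript, and `mm_to_px` is a hypothetical helper name.

```python
MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: int = 96) -> int:
    """Convert a length in millimetres to whole pixels at the given DPI."""
    return round(mm / MM_PER_INCH * dpi)


# A4 is 210 mm x 297 mm.
a4_width_px = mm_to_px(210)
a4_height_px = mm_to_px(297)
```

At 96 DPI the exact values are about 793.7 and 1122.5 pixels, which round to the 794 x 1123 canvas size.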
||||
|
||||
### 5.2 EditorToolbar
|
||||
|
||||
Editing tools:
|
||||
|
||||
| Icon | Tool | Description |
|
||||
|------|------|-------------|
|
||||
| 🖱️ | Select | Select/move elements |
|
||||
| T | Text | Add text (IText) |
|
||||
| ▭ | Rectangle | Draw a rectangle |
|
||||
| ○ | Circle | Draw a circle/ellipse |
|
||||
| ― | Line | Draw a line |
|
||||
| → | Arrow | Draw an arrow |
|
||||
| 🖼️ | Image | Upload an image |
|
||||
| ✨ | AI image | Generate an image with AI |
|
||||
| ⊞ | Table | Insert a table |
|
||||
|
||||
### 5.3 PropertiesPanel
|
||||
|
||||
Property editor for the selected objects:
|
||||
|
||||
**Text properties:**
|
||||
- Font family (Arial, Times, Georgia, OpenDyslexic, Schulschrift)
|
||||
- Font size (8-120pt)
|
||||
- Font style (normal, bold, italic)
|
||||
- Line height, letter spacing
|
||||
- Text alignment
|
||||
- Text color
|
||||
|
||||
**Shape properties:**
|
||||
- Fill color
|
||||
- Border color and width
|
||||
- Corner radius
|
||||
|
||||
**General:**
|
||||
- Opacity
|
||||
- Delete button
|
||||
|
||||
### 5.4 WorksheetContext
|
||||
|
||||
React context for global state:
|
||||
|
||||
```typescript
|
||||
interface WorksheetContextType {
|
||||
canvas: Canvas | null
|
||||
document: WorksheetDocument | null
|
||||
activeTool: EditorTool
|
||||
selectedObjects: FabricObject[]
|
||||
zoom: number
|
||||
showGrid: boolean
|
||||
snapToGrid: boolean
|
||||
currentPageIndex: number
|
||||
canUndo: boolean
|
||||
canRedo: boolean
|
||||
isDirty: boolean
|
||||
  // ... methods
|
||||
}
|
||||
```
|
||||
|
||||
## 6. Data Models
|
||||
|
||||
### 6.1 WorksheetDocument
|
||||
|
||||
```typescript
|
||||
interface WorksheetDocument {
|
||||
id: string
|
||||
title: string
|
||||
description?: string
|
||||
pages: WorksheetPage[]
|
||||
pageFormat: PageFormat
|
||||
createdAt: string
|
||||
updatedAt: string
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 WorksheetPage
|
||||
|
||||
```typescript
|
||||
interface WorksheetPage {
|
||||
id: string
|
||||
index: number
|
||||
  canvasJSON: string // Serialized Fabric.js canvas
|
||||
thumbnail?: string
|
||||
}
|
||||
```
|
||||
|
||||
### 6.3 PageFormat
|
||||
|
||||
```typescript
|
||||
interface PageFormat {
|
||||
  width: number // in mm (default: 210)
|
||||
  height: number // in mm (default: 297)
|
||||
  orientation: 'portrait' | 'landscape'
|
||||
  margins: { top: number; right: number; bottom: number; left: number }
|
||||
}
|
||||
```
|
||||
|
||||
## 7. Features
|
||||
|
||||
### 7.1 Undo/Redo
|
||||
|
||||
- History stack with at most 50 entries
|
||||
- A snapshot is saved automatically on every change
|
||||
- Keyboard: Ctrl+Z (Undo), Ctrl+Y (Redo)
|
||||
|
||||
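A bounded history stack of this kind can be sketched in a few lines. The Python below is an illustration only: the editor itself is a TypeScript/Fabric.js application, and the class name `History`, its methods, and the use of plain strings as snapshots are assumptions, not the editor's actual implementation.

```python
from collections import deque


class History:
    """Bounded undo/redo history (sketch; snapshots stand in for canvas JSON)."""

    def __init__(self, initial, max_entries=50):
        # deque(maxlen=...) drops the oldest snapshot once the cap is reached.
        self._undo = deque([initial], maxlen=max_entries)
        self._redo = []

    def push(self, snapshot):
        """Record a new state; a fresh change invalidates the redo branch."""
        self._undo.append(snapshot)
        self._redo.clear()

    @property
    def can_undo(self):
        # The last remaining entry is the current state and cannot be undone.
        return len(self._undo) > 1

    @property
    def can_redo(self):
        return bool(self._redo)

    def undo(self):
        """Step back one state and return the state that is now current."""
        if self.can_undo:
            self._redo.append(self._undo.pop())
        return self._undo[-1]

    def redo(self):
        """Re-apply the most recently undone state and return it."""
        if self.can_redo:
            self._undo.append(self._redo.pop())
        return self._undo[-1]
```

The `can_undo` and `can_redo` properties correspond to the `canUndo` and `canRedo` flags exposed by `WorksheetContextType`.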
### 7.2 Grid & Snap

- Configurable grid (5mm, 10mm, 15mm, 20mm)
- Snap-to-grid while moving objects
- Can be shown or hidden

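Snapping amounts to rounding each coordinate to the nearest multiple of the grid size. A minimal Python sketch under that assumption is shown here. The function names are hypothetical, and the editor may snap in pixels rather than millimetres.

```python
import math


def snap_to_grid(value_mm: float, grid_mm: float = 10.0) -> float:
    """Round a coordinate to the nearest grid line (exact halves round up)."""
    if grid_mm <= 0:
        raise ValueError("grid_mm must be positive")
    return float(math.floor(value_mm / grid_mm + 0.5) * grid_mm)


def snap_point(x_mm: float, y_mm: float, grid_mm: float = 10.0) -> tuple:
    """Snap both coordinates of an object's position."""
    return snap_to_grid(x_mm, grid_mm), snap_to_grid(y_mm, grid_mm)
```

`math.floor(x + 0.5)` is used instead of the built-in `round` because `round` rounds exact halves to the nearest even number, which would make an object sitting halfway between two grid lines jump in inconsistent directions.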
### 7.3 Export

- **PDF**: Multi-page, A4 format
- **PNG**: High resolution (2x multiplier)
- **JPG**: With quality setting

### 7.4 Storage

- **Backend**: REST API with JSON persistence
- **Fallback**: localStorage when offline

## 8. AI Image Generation

### 8.1 Ollama Integration

The editor uses Ollama for AI image generation:

```python
OLLAMA_URL = "http://host.docker.internal:11434"
```

### 8.2 Placeholder System

If Ollama is unavailable, a placeholder image is generated:
- Color-coded by style
- Prompt text as the description
- "KI-Bild (Platzhalter)" badge ("AI image (placeholder)")

### 8.3 Style Prompts

Each style automatically appends modifiers to the prompt:

```python
STYLE_PROMPTS = {
    "realistic": "photorealistic, high detail",
    "cartoon": "cartoon style, colorful, child-friendly",
    "sketch": "pencil sketch, hand-drawn",
    "clipart": "clipart style, flat design",
    "educational": "educational illustration, textbook style"
}
```

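How the modifiers are combined with the user's prompt is not shown in the source. One plausible approach, sketched below, appends them after a comma. `build_prompt` is a hypothetical helper, and the comma-joined format is an assumption.

```python
STYLE_PROMPTS = {
    "realistic": "photorealistic, high detail",
    "cartoon": "cartoon style, colorful, child-friendly",
    "sketch": "pencil sketch, hand-drawn",
    "clipart": "clipart style, flat design",
    "educational": "educational illustration, textbook style",
}


def build_prompt(prompt: str, style: str) -> str:
    """Append the style modifiers to the user prompt (assumed format)."""
    base = prompt.strip()
    modifiers = STYLE_PROMPTS.get(style)
    if not modifiers:
        # Unknown style: pass the prompt through unchanged.
        return base
    return f"{base}, {modifiers}"
```

For example, `build_prompt("a red apple", "sketch")` yields `"a red apple, pencil sketch, hand-drawn"`.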
## 9. Glassmorphism Design

The editor follows the glassmorphism design of Studio v2:

```typescript
// Dark Theme
'backdrop-blur-xl bg-white/10 border border-white/20'

// Light Theme
'backdrop-blur-xl bg-white/70 border border-black/10 shadow-xl'
```

## 10. Internationalization

Supported languages:
- 🇩🇪 Deutsch
- 🇬🇧 English
- 🇹🇷 Türkçe
- 🇸🇦 العربية (RTL)
- 🇷🇺 Русский
- 🇺🇦 Українська
- 🇵🇱 Polski

Translation Key: `nav_worksheet_editor`

## 11. Security

### 11.1 Image Upload

- Image formats only (image/*)
- Client-side validation
- Base64 conversion

### 11.2 CORS

Enabled for local development and the Docker environment.

## 12. Deployment

### 12.1 Frontend

```bash
cd studio-v2
npm install
npm run dev  # Port 3001
```

### 12.2 Backend

The klausur-service runs on port 8086:

```bash
cd klausur-service/backend
python main.py
```

### 12.3 Docker

The service is part of docker-compose.yml.

## 13. Future Extensions

- [ ] Table tool with cell editing
- [ ] Template library
- [ ] Collaborative editing
- [ ] Drag & drop from the document library
- [ ] Integration with Vocab-Worksheet

173
docs-src/services/klausur-service/index.md
Normal file
@@ -0,0 +1,173 @@
# Klausur-Service

The Klausur-Service is a FastAPI-based microservice for AI-assisted grading of Abitur exams (Klausuren).

## Overview

| Property | Value |
|----------|-------|
| **Port** | 8086 |
| **Framework** | FastAPI (Python) |
| **Database** | PostgreSQL + Qdrant (vector DB) |
| **Storage** | MinIO (file storage) |

## Features

- **OCR recognition**: Automatic text recognition from scanned exams
- **AI grading**: Automatic grading suggestions based on the expectation horizon (Erwartungshorizont)
- **BYOEH**: Bring-Your-Own-Expectation-Horizon with client-side encryption
- **Fairness analysis**: Statistical analysis of grading consistency
- **PDF export**: Assessment reports (Gutachten) and grade overviews as PDF
- **Second grading**: Complete workflow for first, second, and third grading

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                      │
│  /website/app/admin/klausur-korrektur/                      │
│  - Exam list                                                │
│  - Student list                                             │
│  - Grading workspace (2/3-1/3 layout)                       │
│  - Fairness dashboard                                       │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                  klausur-service (FastAPI)                  │
│  Port 8086 - /klausur-service/backend/main.py               │
│  - Klausur CRUD (/api/v1/klausuren)                         │
│  - Student Work (/api/v1/students)                          │
│  - Annotations (/api/v1/annotations)                        │
│  - BYOEH (/api/v1/eh)                                       │
│  - PDF Export                                               │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                       Infrastructure                        │
│  - Qdrant (vector DB for RAG)                               │
│  - MinIO (file storage)                                     │
│  - PostgreSQL (metadata)                                    │
│  - Embedding-Service (Port 8087)                            │
└─────────────────────────────────────────────────────────────┘
```

## API Endpoints

### Exam Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/klausuren` | List all exams |
| POST | `/api/v1/klausuren` | Create a new exam |
| GET | `/api/v1/klausuren/{id}` | Exam details |
| DELETE | `/api/v1/klausuren/{id}` | Delete an exam |

### Student Submissions

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/klausuren/{id}/students` | Upload a submission |
| GET | `/api/v1/klausuren/{id}/students` | List students |
| GET | `/api/v1/students/{id}` | Single submission |
| PUT | `/api/v1/students/{id}/criteria` | Score criteria |
| PUT | `/api/v1/students/{id}/gutachten` | Save the assessment report (Gutachten) |

### AI Functions

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/students/{id}/gutachten/generate` | Generate an assessment report |
| GET | `/api/v1/klausuren/{id}/fairness` | Fairness analysis |
| POST | `/api/v1/students/{id}/eh-suggestions` | Expectation-horizon (EH) suggestions via RAG |

### PDF Export

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/students/{id}/export/gutachten` | Single assessment report PDF |
| GET | `/api/v1/students/{id}/export/annotations` | Annotations PDF |
| GET | `/api/v1/klausuren/{id}/export/overview` | Grade overview PDF |
| GET | `/api/v1/klausuren/{id}/export/all-gutachten` | All assessment reports PDF |

## Grading Scale

The system uses the German 15-point scale for Abitur exams:

| Points | Percent | Grade |
|--------|---------|-------|
| 15 | >= 95% | 1+ |
| 14 | >= 90% | 1 |
| 13 | >= 85% | 1- |
| 12 | >= 80% | 2+ |
| 11 | >= 75% | 2 |
| 10 | >= 70% | 2- |
| 9 | >= 65% | 3+ |
| 8 | >= 60% | 3 |
| 7 | >= 55% | 3- |
| 6 | >= 50% | 4+ |
| 5 | >= 45% | 4 |
| 4 | >= 40% | 4- |
| 3 | >= 33% | 5+ |
| 2 | >= 27% | 5 |
| 1 | >= 20% | 5- |
| 0 | < 20% | 6 |

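The table translates directly into a threshold lookup. The Python sketch below encodes the thresholds exactly as listed. `GRADE_THRESHOLDS` and `percent_to_grade` are hypothetical names, and the service's real mapping function is not shown in the source.

```python
# (minimum percent, points, grade), taken from the table, highest first.
GRADE_THRESHOLDS = [
    (95, 15, "1+"), (90, 14, "1"), (85, 13, "1-"),
    (80, 12, "2+"), (75, 11, "2"), (70, 10, "2-"),
    (65, 9, "3+"), (60, 8, "3"), (55, 7, "3-"),
    (50, 6, "4+"), (45, 5, "4"), (40, 4, "4-"),
    (33, 3, "5+"), (27, 2, "5"), (20, 1, "5-"),
]


def percent_to_grade(percent: float):
    """Return (points, grade) for a percentage score between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for minimum, points, grade in GRADE_THRESHOLDS:
        if percent >= minimum:
            return points, grade
    return 0, "6"  # below 20 %
```

Because the list is ordered from the highest threshold down, the first match is always the best grade the score qualifies for.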
## Grading Criteria

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Spelling (Rechtschreibung) | 15% | Orthography |
| Grammar (Grammatik) | 15% | Grammar & syntax |
| Content (Inhalt) | 40% | Quality of content |
| Structure (Struktur) | 15% | Organization & outline |
| Style (Stil) | 15% | Expression & style |

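The weights sum to 100%, so an overall percentage can be computed as a weighted average. The source does not say how the service combines the criteria, so the sketch below assumes each criterion is scored from 0 to 100. The dictionary keys and the function name are illustrative.

```python
# Weights in percent, as listed in the table (keys are illustrative).
CRITERIA_WEIGHTS = {
    "rechtschreibung": 15,  # spelling
    "grammatik": 15,        # grammar
    "inhalt": 40,           # content
    "struktur": 15,         # structure
    "stil": 15,             # style
}


def weighted_percent(scores: dict) -> float:
    """Combine per-criterion scores (0-100 each) into one overall percentage."""
    missing = CRITERIA_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total = sum(CRITERIA_WEIGHTS[name] * scores[name] for name in CRITERIA_WEIGHTS)
    return total / 100
```

With full marks everywhere except 50 for content, the result is 80.0, which shows how strongly the 40% content weight dominates the outcome.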
## Directory Structure

```
klausur-service/
├── backend/
│   ├── main.py              # API endpoints + data models
│   ├── qdrant_service.py    # Vector database operations
│   ├── eh_pipeline.py       # BYOEH processing
│   ├── hybrid_search.py     # Hybrid search (BM25 + semantic)
│   └── requirements.txt     # Python dependencies
├── frontend/
│   └── src/
│       ├── components/      # React components
│       ├── pages/           # Pages
│       └── services/        # API client
└── docs/
    ├── BYOEH-Architecture.md
    └── BYOEH-Developer-Guide.md
```

## Configuration

### Environment Variables

```env
# Klausur-Service
KLAUSUR_SERVICE_PORT=8086
QDRANT_URL=http://qdrant:6333
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=...
MINIO_SECRET_KEY=...

# Embedding-Service
EMBEDDING_SERVICE_URL=http://embedding:8087
OPENAI_API_KEY=sk-...

# BYOEH
BYOEH_ENCRYPTION_ENABLED=true
EH_UPLOAD_DIR=/app/eh-uploads
```

## Further Documentation

- [BYOEH Architecture](./BYOEH-Architecture.md) - Client-side encryption
- [OCR Compare](./OCR-Compare.md) - Block review feature for OCR comparison
- [Zeugnis-System](../../architecture/zeugnis-system.md) - Report card generation
- [Backend API](../../api/backend-api.md) - General API documentation

160
docs-src/services/voice-service/index.md
Normal file
@@ -0,0 +1,160 @@
# Voice Service

The Voice Service is a voice-first interface for the Breakpilot platform with a GDPR-compliant (DSGVO) design.

## Overview

| Property | Value |
|----------|-------|
| **Port** | 8082 |
| **Framework** | FastAPI (Python) |
| **Streaming** | WebSocket |
| **GDPR (DSGVO)** | Privacy-by-Design |

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    Voice Service (Port 8082)                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │   Sessions   │───>│     Task     │───>│     BQAS     │       │
│  │     API      │    │ Orchestrator │    │  (Quality)   │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                             │                                   │
│         ┌───────────────────┼───────────────────┐               │
│         │                   │                   │               │
│         ▼                   ▼                   ▼               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │  WebSocket   │    │  Encryption  │    │   Logging    │       │
│  │  Streaming   │    │   Service    │    │ (structlog)  │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Core Components

### PersonaPlex + TaskOrchestrator

- Voice-first interface for Breakpilot
- Real-time voice processing
- Multi-agent integration

### GDPR Compliance (Privacy-by-Design)

| Feature | Description |
|---------|-------------|
| **No audio persistence** | RAM only, no permanent storage |
| **Namespace encryption** | Keys exist only on the teacher's device |
| **TTL-based deletion** | Automatic data deletion after the TTL expires |
| **Transcript encryption** | Encrypted transcripts |

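TTL-based deletion of RAM-only data can be illustrated with a small in-memory store. This is a sketch under stated assumptions: `TTLStore` and its methods are hypothetical, the source does not describe the service's real storage layer, and the 60-minute default mirrors the `TTL_MINUTES=60` setting from the configuration section.

```python
import time


class TTLStore:
    """In-memory key/value store whose entries expire after a fixed TTL (sketch)."""

    def __init__(self, ttl_seconds=60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without waiting
        self._items = {}      # key -> (expires_at, value)

    def put(self, key, value):
        """Store a value; it becomes unreadable once the TTL has elapsed."""
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        """Return the value, or None if it is missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # delete on access so stale data never leaks
            return None
        return value

    def purge_expired(self):
        """Remove all expired entries and return how many were deleted."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._items.items() if now >= expires_at]
        for key in expired:
            del self._items[key]
        return len(expired)
```

Calling `purge_expired` periodically, for example from a background task, ensures that data is removed even when it is never read again.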
## API Endpoints

### Sessions

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/sessions` | Create a session |
| GET | `/api/v1/sessions/:id` | Retrieve a session |
| DELETE | `/api/v1/sessions/:id` | End a session |

### Task Orchestration

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/tasks` | Create a task |
| GET | `/api/v1/tasks/:id` | Retrieve task status |

### BQAS (Quality Assessment)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/bqas/evaluate` | Quality evaluation |
| GET | `/api/v1/bqas/metrics` | Retrieve metrics |

### WebSocket

| Endpoint | Description |
|----------|-------------|
| `/ws/voice` | Real-time voice streaming |

### Health

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check |
| GET | `/ready` | Readiness check |

## Directory Structure

```
voice-service/
├── main.py                      # FastAPI application
├── config.py                    # Configuration
├── pyproject.toml               # Project metadata
├── requirements.txt             # Dependencies
├── api/
│   ├── sessions.py              # Session management
│   ├── streaming.py             # WebSocket voice streaming
│   ├── tasks.py                 # Task orchestration
│   └── bqas.py                  # Quality assessment
├── services/
│   ├── task_orchestrator.py     # Task routing
│   └── encryption.py            # Encryption
├── bqas/
│   ├── judge.py                 # LLM judge
│   └── quality_judge_agent.py   # Agent integration
├── models/                      # Data models
├── scripts/                     # Utility scripts
└── tests/                       # Test suite
```

## Configuration

```env
# .env
VOICE_SERVICE_PORT=8082
REDIS_URL=redis://localhost:6379
DATABASE_URL=postgresql://...
ENCRYPTION_KEY=...
TTL_MINUTES=60
```

## Development

```bash
# Install dependencies
cd voice-service
pip install -r requirements.txt

# Start the server
uvicorn main:app --reload --port 8082

# Run the tests
pytest -v
```

## Docker

The service runs as part of docker-compose.yml:

```yaml
voice-service:
  build:
    context: ./voice-service
  ports:
    - "8082:8082"
  environment:
    - REDIS_URL=redis://valkey:6379
  depends_on:
    - valkey
    - postgres
```

## Further Documentation

- [Multi-Agent Architecture](../../architecture/multi-agent.md)
- [BQAS Quality System](../../architecture/bqas.md)