# Klausur Module - Complete Developer Specification

**Version:** 2.0
**As of:** January 2025
**Author:** BreakPilot Development Team

---

## Table of Contents

1. [Overview](#1-overview)
2. [System Architecture](#2-system-architecture)
3. [Data Models](#3-data-models)
4. [API Specification](#4-api-specification)
5. [Frontend Architecture](#5-frontend-architecture)
6. [Security & Compliance](#6-security--compliance)
7. [Testing Strategy](#7-testing-strategy)
8. [Deployment & Operations](#8-deployment--operations)
9. [Development Guidelines](#9-development-guidelines)

---

## 1. Overview

### 1.1 Module Description

The Klausur module is a system for digitally marking, grading, and managing Abitur and pre-Abitur (Vorabitur) exams. It consists of the following core components:

| Component | Description | Technology |
|-----------|-------------|------------|
| Klausur-Service Backend | Main service for all Klausur (exam) operations | Python FastAPI |
| BYOEH (Bring Your Own EH) | Management of the Erwartungshorizont (EH, the expected-answer rubric) with RAG | Qdrant, MinIO |
| Zeugnisse module | Regulations and AI assistant for report cards (Zeugnisse) | Crawler, Embeddings |
| Training module | AI model training & monitoring | Background Tasks |
| Frontend | Admin and teacher interfaces | Next.js, React |

### 1.2 Technology Stack

```
┌─────────────────────────────────────────────────────────────┐
│                       Frontend Layer                        │
│  Next.js 15 │ React 18 │ TypeScript │ Tailwind CSS          │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                      │
│  Next.js API Routes │ Server-Side Proxy                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Backend Services                       │
│  Klausur-Service (FastAPI) │ Port 8086                      │
│  ├── main.py         (Klausur CRUD, BYOEH)                  │
│  ├── admin_api.py    (NiBiS Ingestion)                      │
│  ├── zeugnis_api.py  (Zeugnisse Crawler)                    │
│  ├── training_api.py (Training Management)                  │
│  └── metrics_db.py   (PostgreSQL Operations)                │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│   PostgreSQL    │   │     Qdrant      │   │      MinIO      │
│   (Metadata)    │   │    (Vectors)    │   │   (Documents)   │
│    Port 5432    │   │    Port 6333    │   │    Port 9000    │
└─────────────────┘   └─────────────────┘   └─────────────────┘
```

### 1.3 Core Features

1. **Exam management (Klausurverwaltung)**
   - Create and edit exams
   - Upload student papers
   - Criteria-based grading
   - Generation of the Gutachten (written assessment)

2. **BYOEH - Erwartungshorizont**
   - Upload and encryption of EH documents
   - Chunking and embedding generation
   - RAG-based search
   - Tenant isolation

3. **Zeugnisse (report cards)**
   - Rights-aware crawler for regulations
   - AI assistant for teachers
   - Search scoped to a Bundesland (federal state)

4. **Training**
   - Model training with monitoring
   - Hyperparameter configuration
   - Version management

---

## 2. System Architecture

### 2.1 Microservice Architecture

```
                    ┌───────────────────┐
                    │   Load Balancer   │
                    │      (Nginx)      │
                    └─────────┬─────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│    Website    │     │    Backend    │     │  Klausur-Svc  │
│   (Next.js)   │     │   (FastAPI)   │     │   (FastAPI)   │
│   Port 3000   │     │   Port 8000   │     │   Port 8086   │
└───────────────┘     └───────────────┘     └───────────────┘
        │                     │                     │
        └─────────────────────┴─────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   Service Mesh    │
                    │   (Docker Net)    │
                    └─────────┬─────────┘
                              │
        ┌─────────────┬───────┴───────┬─────────────┐
        ▼             ▼               ▼             ▼
   PostgreSQL       Qdrant          MinIO        Mailpit
```

### 2.2 Data Flow

#### 2.2.1 Exam Marking Flow

```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Upload  │────▶│   OCR    │────▶│ Analysis │────▶│ Scoring  │
│  Paper   │     │(external)│     │  (LLM)   │     │ Criteria │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                        │
                                                        ▼
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Export  │◀────│ Finalize │◀────│Gutachten │◀────│  RAG-EH  │
│   PDF    │     │          │     │ Generate │     │  Query   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
```

#### 2.2.2 Zeugnis Crawler Flow

```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Seed URL │────▶│  Fetch   │────▶│ Extract  │────▶│  Check   │
│ (Config) │     │   HTTP   │     │ PDF/HTML │     │  Rights  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                        │
                                        ┌───────────────┤
                                        ▼               ▼
                                  ┌──────────┐     ┌──────────┐
                                  │  MinIO   │     │  Qdrant  │
                                  │ (Store)  │     │ (Index)  │
                                  └──────────┘     └──────────┘
```

### 2.3 Component Details

#### 2.3.1 Klausur-Service (main.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/klausuren` | GET | List all exams |
| `/api/v1/klausuren` | POST | Create a new exam |
| `/api/v1/klausuren/{id}` | GET | Exam details |
| `/api/v1/klausuren/{id}` | PUT | Update an exam |
| `/api/v1/klausuren/{id}` | DELETE | Delete an exam |
| `/api/v1/klausuren/{id}/students` | POST | Add a student paper |
| `/api/v1/students/{id}/criteria` | PUT | Score the criteria |
| `/api/v1/students/{id}/gutachten` | PUT | Save the Gutachten |
| `/api/v1/students/{id}/gutachten/generate` | POST | Generate the Gutachten |

#### 2.3.2 BYOEH (eh_pipeline.py, qdrant_service.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/eh/upload` | POST | Upload an EH |
| `/api/v1/eh/{id}/index` | POST | Index an EH |
| `/api/v1/eh/rag-query` | POST | RAG search |
| `/api/v1/eh/{id}/share` | POST | Share an EH |
| `/api/v1/eh/{id}/link-klausur` | POST | Link an EH to an exam |

#### 2.3.3 Zeugnis Module (zeugnis_api.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/admin/zeugnis/sources` | GET | Sources per Bundesland |
| `/api/v1/admin/zeugnis/crawler/start` | POST | Start the crawler |
| `/api/v1/admin/zeugnis/crawler/stop` | POST | Stop the crawler |
| `/api/v1/admin/zeugnis/documents` | GET | Retrieve documents |
| `/api/v1/admin/zeugnis/stats` | GET | Statistics |

#### 2.3.4 Training Module (training_api.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/admin/training/jobs` | GET | List training jobs |
| `/api/v1/admin/training/jobs` | POST | Start a training job |
| `/api/v1/admin/training/jobs/{id}/pause` | POST | Pause a job |
| `/api/v1/admin/training/jobs/{id}/resume` | POST | Resume a job |
| `/api/v1/admin/training/models` | GET | Model versions |

---

## 3. Data Models

### 3.1 PostgreSQL Schema

#### 3.1.1 Core Tables (metrics_db.py)

```sql
-- RAG Feedback
CREATE TABLE rag_search_feedback (
    id SERIAL PRIMARY KEY,
    result_id VARCHAR(255) NOT NULL,
    query_text TEXT,
    collection_name VARCHAR(100),
    score FLOAT,
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    notes TEXT,
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- RAG Search Logs
CREATE TABLE rag_search_logs (
    id SERIAL PRIMARY KEY,
    query_text TEXT NOT NULL,
    collection_name VARCHAR(100),
    result_count INTEGER,
    latency_ms INTEGER,
    top_score FLOAT,
    filters JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Relevance judgments for precision/recall
CREATE TABLE rag_relevance_judgments (
    id SERIAL PRIMARY KEY,
    query_id VARCHAR(255) NOT NULL,
    query_text TEXT NOT NULL,
    result_id VARCHAR(255) NOT NULL,
    result_rank INTEGER,
    is_relevant BOOLEAN NOT NULL,
    collection_name VARCHAR(100),
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);
```

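
The `rag_relevance_judgments` table is described as the basis for precision/recall metrics. A minimal sketch of that calculation follows, assuming the rows for one `query_id` have already been loaded as `(result_rank, is_relevant)` pairs. The function name `precision_recall_at_k` and the `total_relevant` parameter are illustrative and not taken from `metrics_db.py`.

```python
from typing import Iterable, Tuple


def precision_recall_at_k(
    judgments: Iterable[Tuple[int, bool]],
    k: int,
    total_relevant: int,
) -> Tuple[float, float]:
    """Compute precision@k and recall@k for a single query.

    judgments: (result_rank, is_relevant) pairs, ranks starting at 1.
    total_relevant: number of relevant documents known for the query
                    (hypothetical input, not a column in the schema).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Count the relevant results among the top-k ranks.
    hits = sum(1 for rank, relevant in judgments if rank <= k and relevant)
    precision = hits / k
    recall = hits / total_relevant if total_relevant > 0 else 0.0
    return precision, recall


rows = [(1, True), (2, False), (3, True), (4, False), (5, False)]
precision, recall = precision_recall_at_k(rows, k=3, total_relevant=4)
print(f"precision@3={precision:.2f} recall@3={recall:.2f}")
```

Recall needs the number of relevant documents per query, which the table does not store directly; here it is passed in as an assumption.
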
#### 3.1.2 Zeugnis Tables

```sql
-- Bundesland (federal state) sources
CREATE TABLE zeugnis_sources (
    id VARCHAR(36) PRIMARY KEY,
    bundesland VARCHAR(10) NOT NULL,
    name VARCHAR(255) NOT NULL,
    base_url TEXT,
    license_type VARCHAR(50) NOT NULL,
    training_allowed BOOLEAN DEFAULT FALSE,
    verified_by VARCHAR(100),
    verified_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Seed URLs
CREATE TABLE zeugnis_seed_urls (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    url TEXT NOT NULL,
    doc_type VARCHAR(50),
    status VARCHAR(20) DEFAULT 'pending',
    last_crawled TIMESTAMP,
    error_message TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Documents
CREATE TABLE zeugnis_documents (
    id VARCHAR(36) PRIMARY KEY,
    seed_url_id VARCHAR(36) REFERENCES zeugnis_seed_urls(id),
    title VARCHAR(500),
    url TEXT NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    training_allowed BOOLEAN DEFAULT FALSE,
    indexed_in_qdrant BOOLEAN DEFAULT FALSE,
    file_size INTEGER,
    content_type VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Document versions
CREATE TABLE zeugnis_document_versions (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    version INTEGER NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    change_summary TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Usage Events (Audit Trail)
CREATE TABLE zeugnis_usage_events (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    event_type VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    details JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Crawler Queue
CREATE TABLE zeugnis_crawler_queue (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    priority INTEGER DEFAULT 5,
    status VARCHAR(20) DEFAULT 'pending',
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    documents_found INTEGER DEFAULT 0,
    documents_indexed INTEGER DEFAULT 0,
    error_count INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);
```

### 3.2 In-Memory Models (Python Dataclasses)

#### 3.2.1 Klausur Models

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class StudentKlausur:
    id: str
    klausur_id: str
    student_name: str
    status: StudentKlausurStatus  # Enum
    criteria_scores: Dict[str, Dict]
    gutachten: Optional[Dict]
    file_path: Optional[str]
    ocr_text: Optional[str]
    created_at: datetime
    updated_at: datetime


@dataclass
class Klausur:
    id: str
    title: str
    subject: str
    modus: KlausurModus  # LANDES_ABITUR, VORABITUR
    year: int
    semester: str
    erwartungshorizont: Optional[Dict]
    students: List[StudentKlausur]
    created_at: datetime
    tenant_id: str
```

#### 3.2.2 Zeugnis Models (zeugnis_models.py)

```python
from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class LicenseType(str, Enum):
    PUBLIC_DOMAIN = "public_domain"
    CC_BY = "cc_by"
    CC_BY_SA = "cc_by_sa"
    CC_BY_NC = "cc_by_nc"
    GOV_STATUTE_FREE_USE = "gov_statute"
    ALL_RIGHTS_RESERVED = "all_rights"
    UNKNOWN_REQUIRES_REVIEW = "unknown"


class CrawlStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PAUSED = "paused"


class ZeugnisSource(BaseModel):
    id: str
    bundesland: str
    name: str
    base_url: Optional[str]
    license_type: LicenseType
    training_allowed: bool
    verified_by: Optional[str]
    verified_at: Optional[datetime]
```

### 3.3 Qdrant Collections

#### 3.3.1 BYOEH Collection (bp_eh)

```python
from qdrant_client.models import Distance

# Collection Config
collection_name = "bp_eh"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload Schema
{
    "tenant_id": str,           # Tenant isolation
    "eh_id": str,               # Erwartungshorizont ID
    "chunk_index": int,         # Position in the document
    "subject": str,             # Subject
    "encrypted_content": str,   # AES-256-GCM encrypted
    "training_allowed": bool,   # ALWAYS False for EH
}
```

#### 3.3.2 Zeugnis Collection (bp_zeugnis)

```python
from qdrant_client.models import Distance

# Collection Config
collection_name = "bp_zeugnis"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload Schema
{
    "document_id": str,
    "chunk_index": int,
    "chunk_text": str,          # Preview (max 500 chars)
    "bundesland": str,
    "doc_type": str,
    "title": str,
    "source_url": str,
    "training_allowed": bool,   # Inherited from the source
    "indexed_at": str,
}
```

### 3.4 MinIO Bucket Structure

```
breakpilot-rag/
├── landes-daten/
│   ├── {bundesland}/
│   │   └── zeugnis/
│   │       └── {year}/
│   │           └── {filename}.pdf
│   └── klausur/
│       └── {year}/
│           └── {subject}/
│               └── {filename}.pdf
│
└── lehrer-daten/
    └── {tenant_id}/
        └── {teacher_id}/
            └── {filename}.pdf.enc
```

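
The bucket layout above can be turned into object keys with a small helper. The following is a sketch only: the function names `zeugnis_object_key` and `teacher_object_key` and the segment validation are assumptions for illustration, not code from the service.

```python
from pathlib import PurePosixPath


def _safe_segment(value: str) -> str:
    """Reject segments that could escape the intended prefix."""
    if not value or "/" in value or value in {".", ".."}:
        raise ValueError(f"invalid path segment: {value!r}")
    return value


def zeugnis_object_key(bundesland: str, year: int, filename: str) -> str:
    """Build landes-daten/{bundesland}/zeugnis/{year}/{filename}.pdf"""
    return str(PurePosixPath(
        "landes-daten",
        _safe_segment(bundesland),
        "zeugnis",
        str(year),
        _safe_segment(filename) + ".pdf",
    ))


def teacher_object_key(tenant_id: str, teacher_id: str, filename: str) -> str:
    """Build lehrer-daten/{tenant_id}/{teacher_id}/{filename}.pdf.enc"""
    return str(PurePosixPath(
        "lehrer-daten",
        _safe_segment(tenant_id),
        _safe_segment(teacher_id),
        _safe_segment(filename) + ".pdf.enc",
    ))


print(zeugnis_object_key("ni", 2024, "verordnung"))
print(teacher_object_key("school-id", "user-id", "eh-deutsch"))
```

Validating each segment keeps a user-supplied value such as `..` from producing a key outside the tenant's prefix.
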
---

## 4. API Specification

### 4.1 Authentication

All API endpoints require JWT authentication:

```http
Authorization: Bearer <jwt_token>
```

JWT payload:
```json
{
  "sub": "user-id",
  "tenant_id": "school-id",
  "roles": ["teacher", "admin"],
  "exp": 1704067200
}
```

### 4.2 Error Handling

#### Standard Error Format

```json
{
  "detail": "Description of the error",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T10:00:00Z"
}
```

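
A response body in this format could be produced by a small helper like the one below. This is a sketch under assumptions: the helper name `error_body` and the example code `KLAUSUR_NOT_FOUND` are hypothetical, and the timestamp is rendered in UTC with a trailing `Z` to match the sample above.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def error_body(detail: str, code: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build an error payload with detail, code, and a UTC timestamp."""
    # `now` should be timezone-aware; it defaults to the current UTC time.
    now = now or datetime.now(timezone.utc)
    return {
        "detail": detail,
        "code": code,
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


fixed = datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
print(error_body("Klausur not found", "KLAUSUR_NOT_FOUND", now=fixed))
```
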
#### HTTP Status Codes

| Code | Meaning | Usage |
|------|---------|-------|
| 200 | OK | Successful request |
| 201 | Created | Resource created |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Authentication missing |
| 403 | Forbidden | No permission |
| 404 | Not Found | Resource not found |
| 409 | Conflict | Resource conflict |
| 422 | Unprocessable | Validation error |
| 500 | Internal Error | Server error |
| 503 | Unavailable | Service unavailable |

### 4.3 Pagination

```http
GET /api/v1/resource?limit=20&offset=0
```

Response:
```json
{
  "items": [...],
  "total": 100,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
```

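
The relationship between `total`, `limit`, `offset`, and `has_more` can be sketched as follows. The helper `paginate` is illustrative and works on an in-memory sequence; a real endpoint would more likely apply `LIMIT`/`OFFSET` in the database query.

```python
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], limit: int = 20, offset: int = 0) -> Dict[str, Any]:
    """Return one page plus the metadata fields of the response envelope."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    page = list(items[offset:offset + limit])
    total = len(items)
    return {
        "items": page,
        "total": total,
        "limit": limit,
        "offset": offset,
        # More results exist if this page does not reach the end.
        "has_more": offset + len(page) < total,
    }


first = paginate(range(100), limit=20, offset=0)
print(first["total"], first["has_more"])
```
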
### 4.4 Rate Limiting

| Endpoint type | Limit |
|---------------|-------|
| Standard API | 100/min |
| RAG Query | 30/min |
| Upload | 10/min |
| Training Start | 5/hour |

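
The specification lists the limits but not the algorithm or the key they apply to. As one possible reading, the sketch below enforces them with an in-memory sliding window per client and endpoint type. The class name, the `client_id` key, and the endpoint-type labels are assumptions; a multi-instance deployment would need shared storage instead of a process-local dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

# (max requests, window in seconds) per endpoint type, taken from the table.
LIMITS: Dict[str, Tuple[int, int]] = {
    "standard": (100, 60),
    "rag_query": (30, 60),
    "upload": (10, 60),
    "training_start": (5, 3600),
}


class SlidingWindowLimiter:
    """Allow at most N requests per window for each (client, endpoint type)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, endpoint_type: str) -> bool:
        max_requests, window = LIMITS[endpoint_type]
        now = self._clock()
        hits = self._hits[(client_id, endpoint_type)]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= window:
            hits.popleft()
        if len(hits) >= max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
print(limiter.allow("user-id", "upload"))
```

A rejected request would typically be answered with HTTP 429, which the status code table in section 4.2 does not list.
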
---

## 5. Frontend Architecture

### 5.1 Directory Structure

```
website/
├── app/
│   ├── admin/
│   │   ├── training/
│   │   │   └── page.tsx          # Training Dashboard
│   │   ├── zeugnisse-crawler/
│   │   │   └── page.tsx          # Crawler Admin
│   │   ├── rag/
│   │   │   └── page.tsx          # RAG Admin
│   │   └── uni-crawler/
│   │       └── page.tsx          # Uni Crawler
│   │
│   ├── zeugnisse/
│   │   └── page.tsx              # Teacher frontend
│   │
│   └── api/
│       └── admin/
│           ├── zeugnisse-crawler/
│           │   └── route.ts      # API Proxy
│           └── training/
│               └── route.ts      # API Proxy
│
├── components/
│   ├── ui/                       # Base components
│   └── shared/                   # Shared components
│
└── lib/
    ├── api.ts                    # API Client
    └── utils.ts                  # Helper functions
```

### 5.2 Component Hierarchy

```
App
├── Layout
│   ├── Header
│   │   ├── Navigation
│   │   └── UserMenu
│   └── Sidebar
│
├── Training Dashboard Page
│   ├── StatsCards
│   ├── TrainingJobCard
│   │   ├── ProgressRing
│   │   ├── MetricCards
│   │   └── LossChart
│   ├── DatasetOverview
│   └── NewTrainingModal (Wizard)
│
├── Zeugnisse Crawler Page
│   ├── StatsCards
│   ├── BundeslandTable
│   ├── DocumentList
│   └── CrawlerControls
│
└── Lehrer Zeugnisse Page
    ├── OnboardingWizard
    ├── ChatInterface
    │   └── MessageList
    ├── SearchInterface
    │   └── SearchResults
    └── DocumentBrowser
```

### 5.3 State Management

#### Local State (useState)
- UI state (modals, tabs)
- Form inputs
- Local filters

#### Server State (SWR/Fetch)
- API data with polling
- Caching
- Revalidation

#### Persisted State (localStorage)
- User preferences
- Recent searches
- Wizard status

### 5.4 Styling Conventions

```typescript
// Tailwind CSS class names
const buttonPrimary = "px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition"
const buttonSecondary = "px-4 py-2 bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200 transition"
const card = "bg-white dark:bg-gray-800 rounded-xl shadow-lg border border-gray-200 dark:border-gray-700"
const input = "px-3 py-2 bg-gray-100 dark:bg-gray-900 border-0 rounded-lg focus:ring-2 focus:ring-blue-500"
```

---

## 6. Security & Compliance

### 6.1 Authentication & Authorization

#### JWT Validation

```python
import os

import jwt  # PyJWT
from fastapi import HTTPException

JWT_SECRET = os.environ["JWT_SECRET"]


def verify_jwt(token: str) -> dict:
    """Decode and validate a JWT; expired or invalid tokens yield HTTP 401."""
    try:
        payload = jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")
```

#### RBAC Roles

| Role | Permissions |
|------|-------------|
| teacher | Create exams, upload EH documents, use the Zeugnis assistant |
| admin | + control the crawler, start training |
| superadmin | + system configuration |

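
The `+` entries in the role table suggest that each role inherits the permissions of the one before it. A minimal sketch of such a check follows; the permission strings (for example `training:start`) and the function `has_permission` are invented for illustration, since the specification names the capabilities only in prose.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical permission identifiers derived from the role table.
_TEACHER: FrozenSet[str] = frozenset({"klausur:create", "eh:upload", "zeugnis:assistant"})
_ADMIN: FrozenSet[str] = _TEACHER | {"crawler:control", "training:start"}
_SUPERADMIN: FrozenSet[str] = _ADMIN | {"system:configure"}

ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "teacher": _TEACHER,
    "admin": _ADMIN,
    "superadmin": _SUPERADMIN,
}


def has_permission(roles: Iterable[str], permission: str) -> bool:
    """Return True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, frozenset())
        for role in roles
    )


# `roles` would come from the "roles" claim of the verified JWT payload.
print(has_permission(["teacher"], "training:start"))
print(has_permission(["teacher", "admin"], "training:start"))
```

Unknown roles grant nothing, so a typo in a token's role list fails closed.
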
### 6.2 Data Encryption

#### AES-256-GCM for EH Documents

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_text(plaintext: str, passphrase: str) -> tuple:
    """Encrypt with AES-256-GCM; returns (base64(salt + iv + ciphertext), key hash)."""
    salt = os.urandom(16)  # random salt for key derivation
    iv = os.urandom(12)    # 96-bit nonce for GCM
    key = derive_key(passphrase, salt)  # derive_key and hash_key are defined elsewhere
    cipher = AESGCM(key)
    ciphertext = cipher.encrypt(iv, plaintext.encode(), None)
    return base64.b64encode(salt + iv + ciphertext).decode(), hash_key(key)
```

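
`encrypt_text` relies on two helpers, `derive_key` and `hash_key`, that this specification does not show. The sketch below is one plausible shape for them using only the standard library. PBKDF2-HMAC-SHA256, the iteration count, and SHA-256 for the key hash are all assumptions, not confirmed details of the service.

```python
import hashlib

# Assumed work factor; the real service may use a different KDF or count.
PBKDF2_ITERATIONS = 600_000


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte (256-bit) AES key from a passphrase and salt."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        PBKDF2_ITERATIONS,
        dklen=32,
    )


def hash_key(key: bytes) -> str:
    """Return a hex digest that identifies the key without revealing it."""
    return hashlib.sha256(key).hexdigest()


key = derive_key("correct horse battery staple", b"\x00" * 16)
print(len(key), len(hash_key(key)))
```

Because the salt is stored in the first 16 bytes of the encoded blob, decryption can re-derive the same key from the passphrase alone.
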
### 6.3 Tenant Isolation

- All database queries filter by `tenant_id`
- Qdrant searches use a `tenant_id` filter
- MinIO paths for teacher data (`lehrer-daten/`) include the `tenant_id`

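
For the Qdrant part, tenant isolation comes down to attaching a payload filter to every search. The sketch below builds that filter as a plain dictionary in the JSON shape Qdrant uses for filters (`must` conditions with `key` and `match`). The helper name `tenant_filter` and the optional `subject` narrowing are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional


def tenant_filter(tenant_id: str, subject: Optional[str] = None) -> Dict[str, Any]:
    """Build a Qdrant payload filter that always pins the tenant."""
    if not tenant_id:
        # Refuse to build an unscoped filter rather than search all tenants.
        raise ValueError("tenant_id is required")
    must: List[Dict[str, Any]] = [
        {"key": "tenant_id", "match": {"value": tenant_id}},
    ]
    if subject is not None:
        must.append({"key": "subject", "match": {"value": subject}})
    return {"must": must}


print(tenant_filter("school-id", subject="Deutsch"))
```

Taking `tenant_id` from the verified JWT rather than from request parameters keeps a client from searching another tenant's chunks.
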
### 6.4 Audit Trail

```python
async def log_event(event_type: str, resource_id: str, user_id: str, details: dict):
    await log_zeugnis_event(
        document_id=resource_id,
        event_type=event_type,
        user_id=user_id,
        details=details,
    )
```

### 6.5 GDPR (DSGVO) Compliance

- Data export function
- Deletion requests
- Consent tracking
- Logging of all access

---

## 7. Testing Strategy

### 7.1 Test Pyramid

```
        /\
       /  \
      / E2E\        <- 5% (Critical Paths)
     /------\
    / Integ  \      <- 25% (API, DB)
   /----------\
  /    Unit    \    <- 70% (Functions)
 /--------------\
```

### 7.2 Unit Tests (Python)

**Location:** `klausur-service/backend/tests/`

```python
# tests/test_zeugnis_models.py
import pytest
from zeugnis_models import (
    LicenseType, get_training_allowed, get_bundesland_name
)


class TestTrainingPermissions:
    def test_niedersachsen_allows_training(self):
        assert get_training_allowed("ni") is True

    def test_berlin_disallows_training(self):
        assert get_training_allowed("be") is False

    def test_unknown_bundesland_disallows(self):
        assert get_training_allowed("xx") is False


class TestBundeslandNames:
    def test_valid_code_returns_name(self):
        assert get_bundesland_name("ni") == "Niedersachsen"

    def test_invalid_code_returns_code(self):
        assert get_bundesland_name("xx") == "xx"
```

```python
# tests/test_zeugnis_crawler.py
import pytest
from zeugnis_crawler import chunk_text, compute_hash, extract_text_from_pdf


class TestChunking:
    def test_short_text_single_chunk(self):
        text = "Dies ist ein kurzer Text."
        chunks = chunk_text(text, chunk_size=100)
        assert len(chunks) == 1

    def test_long_text_multiple_chunks(self):
        text = "A" * 2000
        chunks = chunk_text(text, chunk_size=500, overlap=50)
        assert len(chunks) > 1

    def test_overlap_preserved(self):
        text = "ABCDE" * 200
        chunks = chunk_text(text, chunk_size=100, overlap=20)
        for i in range(1, len(chunks)):
            assert chunks[i][:20] == chunks[i-1][-20:]


class TestHashing:
    def test_same_content_same_hash(self):
        content = b"Hello World"
        assert compute_hash(content) == compute_hash(content)

    def test_different_content_different_hash(self):
        assert compute_hash(b"Hello") != compute_hash(b"World")
```

### 7.3 Integration Tests

```python
# tests/test_zeugnis_api_integration.py
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient
from main import app


@pytest_asyncio.fixture
async def client():
    # Recent httpx versions take the ASGI app via an explicit transport.
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as ac:
        yield ac


@pytest.mark.asyncio
class TestZeugnisAPI:
    async def test_get_sources_returns_list(self, client):
        response = await client.get("/api/v1/admin/zeugnis/sources")
        assert response.status_code == 200
        assert isinstance(response.json(), list)

    async def test_start_crawler_without_running(self, client):
        response = await client.post(
            "/api/v1/admin/zeugnis/crawler/start",
            json={"bundesland": "ni"}
        )
        assert response.status_code == 200

    async def test_start_crawler_while_running_fails(self, client):
        # First start
        await client.post("/api/v1/admin/zeugnis/crawler/start")
        # Second start should fail
        response = await client.post("/api/v1/admin/zeugnis/crawler/start")
        assert response.status_code == 409
```

### 7.4 E2E Tests (Playwright)

```typescript
// tests/e2e/zeugnisse.spec.ts
import { test, expect } from '@playwright/test'

test.describe('Zeugnis-Assistent', () => {
  test('onboarding wizard completes successfully', async ({ page }) => {
    await page.goto('/zeugnisse')

    // Step 1: Welcome
    await expect(page.locator('h2')).toContainText('Willkommen')
    await page.click('button:has-text("Weiter")')

    // Step 2: Select Bundesland
    await page.click('button:has-text("Niedersachsen")')
    await page.click('button:has-text("Weiter")')

    // Step 3: Select Schulform
    await page.click('button:has-text("Gymnasium")')
    await page.click('button:has-text("Weiter")')

    // Step 4: Complete
    await page.click('button:has-text("Loslegen")')

    // Verify main interface
    await expect(page.locator('h1')).toContainText('Zeugnis-Assistent')
  })

  test('chat interface responds to questions', async ({ page }) => {
    // Skip wizard (set localStorage)
    await page.goto('/zeugnisse')
    await page.evaluate(() => {
      localStorage.setItem('zeugnis-preferences', JSON.stringify({
        bundesland: 'ni',
        schulform: 'gymnasium',
        hasSeenWizard: true,
      }))
    })
    await page.reload()

    // Send message
    await page.fill('textarea', 'Wie schreibe ich Bemerkungen?')
    await page.click('button[type="submit"]')

    // Wait for response
    await expect(page.locator('.bg-white.rounded-2xl').last())
      .toContainText('Bemerkung', { timeout: 10000 })
  })
})
```

### 7.5 Running the Tests

```bash
# Unit Tests (Python)
cd klausur-service/backend
pytest -v tests/

# With coverage
pytest --cov=. --cov-report=html tests/

# E2E Tests (Playwright)
cd website
npx playwright test

# All tests
./run-tests.sh
```

---

## 8. Deployment & Operations

### 8.1 Docker Compose

```yaml
# docker-compose.yml (excerpt)
services:
  klausur-service:
    build:
      context: ./klausur-service
      dockerfile: Dockerfile
    ports:
      - "8086:8086"
    environment:
      - JWT_SECRET=${JWT_SECRET}
      - QDRANT_URL=http://qdrant:6333
      - MINIO_ENDPOINT=minio:9000
      - DATABASE_URL=postgres://breakpilot:breakpilot123@postgres:5432/breakpilot_db
    depends_on:
      - qdrant
      - minio
      - postgres
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

### 8.2 Dockerfile (Klausur-Service)

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code
COPY backend/ .

# Create directories
RUN mkdir -p /app/uploads /app/eh-uploads

EXPOSE 8086

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8086"]
```

### 8.3 Monitoring

#### Health Checks

```python
@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "service": "klausur-service",
        "version": "2.0",
        "timestamp": datetime.now().isoformat(),
    }


@app.get("/health/detailed")
async def health_detailed():
    # Check dependencies
    qdrant_ok = await check_qdrant()
    postgres_ok = await check_postgres()
    minio_ok = await check_minio()

    return {
        "status": "healthy" if all([qdrant_ok, postgres_ok, minio_ok]) else "degraded",
        "dependencies": {
            "qdrant": "ok" if qdrant_ok else "error",
            "postgres": "ok" if postgres_ok else "error",
            "minio": "ok" if minio_ok else "error",
        }
    }
```

#### Prometheus Metrics

```python
from prometheus_client import Counter, Histogram

# Metrics
request_count = Counter('klausur_requests_total', 'Total requests', ['endpoint', 'method'])
request_latency = Histogram('klausur_request_latency_seconds', 'Request latency', ['endpoint'])
training_jobs = Counter('klausur_training_jobs_total', 'Training jobs', ['status'])
```

### 8.4 Logging

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger("klausur-service")

# Structured logging: fields passed via `extra` become LogRecord attributes.
# The format string above does not print them, so a formatter that emits
# these attributes (for example a JSON formatter) is needed to see them.
logger.info("Training started", extra={
    "job_id": job_id,
    "bundeslaender": config.bundeslaender,
    "epochs": config.epochs,
})
```

---

## 9. Development Guidelines

### 9.1 Code Style

#### Python

```python
# Imports: Standard → Third-Party → Local
import os
from datetime import datetime
from typing import Any, Dict, List, Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from zeugnis_models import LicenseType


# Docstrings: Google Style
def process_document(content: bytes, doc_type: str) -> dict:
    """Process a document for indexing.

    Args:
        content: Raw document bytes.
        doc_type: Type of document (pdf, html).

    Returns:
        Dict with extracted text and metadata.

    Raises:
        ValueError: If doc_type is not supported.
    """
    pass


# Type hints: always use them
async def get_sources(
    bundesland: Optional[str] = None,
    limit: int = 100,
) -> List[Dict[str, Any]]:
    pass
```

#### TypeScript

```typescript
// Prefer interfaces over type aliases
interface TrainingJob {
  id: string
  name: string
  status: TrainingStatus
}

// Props interface for components
interface TrainingCardProps {
  job: TrainingJob
  onPause: () => void
  onResume: () => void
}

// Function components with explicit types
export function TrainingCard({ job, onPause, onResume }: TrainingCardProps) {
  return (...)
}
```

### 9.2 Git Workflow

```
main
│
├── develop
│   │
│   ├── feature/zeugnis-crawler
│   ├── feature/training-dashboard
│   └── fix/crawler-retry
│
└── release/v2.0
```

#### Commit Messages

```
feat(zeugnis): add rights-aware crawler

- Implement PDF/HTML text extraction
- Add training_allowed flag per bundesland
- Create audit trail for document access

Closes #123
```

### 9.3 Review Checklist

- [ ] Tests exist and pass
- [ ] Documentation updated
- [ ] Type hints/interfaces complete
- [ ] No hardcoded credentials
- [ ] Error handling implemented
- [ ] Logging in place
- [ ] Performance acceptable

### 9.4 Versioning

Semantic Versioning: `MAJOR.MINOR.PATCH`

- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes

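
The bump rules above can be written down in a few lines. This sketch uses a hypothetical helper `bump` and assumes a plain three-part version string without pre-release or build suffixes.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: lower parts reset to zero.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: patch resets to zero.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("2.0.0", "minor"))
print(bump("2.1.3", "major"))
```
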
---

## Appendix A: Environment Variables

```env
# Authentication
JWT_SECRET=your-super-secret-key

# Databases
DATABASE_URL=postgres://user:pass@host:5432/db
QDRANT_URL=http://qdrant:6333
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=breakpilot
MINIO_SECRET_KEY=breakpilot123
MINIO_BUCKET=breakpilot-rag

# Embeddings
EMBEDDING_BACKEND=local  # or "openai"
OPENAI_API_KEY=sk-...    # only if openai

# Services
BACKEND_URL=http://backend:8000
SCHOOL_SERVICE_URL=http://school-service:8084

# Feature Flags
BYOEH_ENCRYPTION_ENABLED=true
BYOEH_CHUNK_SIZE=1000
BYOEH_CHUNK_OVERLAP=200
```

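
A service would typically read these variables once at startup and fail fast when a required one is missing. The sketch below shows that pattern for a subset of the variables. The `Settings` class, the `load_settings` function, and the choice of which variables are required or defaulted are assumptions; the defaults simply mirror the sample values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    jwt_secret: str
    database_url: str
    qdrant_url: str
    minio_endpoint: str
    minio_bucket: str
    byoeh_encryption_enabled: bool
    byoeh_chunk_size: int
    byoeh_chunk_overlap: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and validate configuration from environment variables."""

    def required(name: str) -> str:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"missing required environment variable: {name}")
        return value

    chunk_size = int(env.get("BYOEH_CHUNK_SIZE", "1000"))
    chunk_overlap = int(env.get("BYOEH_CHUNK_OVERLAP", "200"))
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("BYOEH_CHUNK_OVERLAP must be smaller than BYOEH_CHUNK_SIZE")

    return Settings(
        jwt_secret=required("JWT_SECRET"),
        database_url=required("DATABASE_URL"),
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
        minio_endpoint=env.get("MINIO_ENDPOINT", "minio:9000"),
        minio_bucket=env.get("MINIO_BUCKET", "breakpilot-rag"),
        byoeh_encryption_enabled=env.get("BYOEH_ENCRYPTION_ENABLED", "true").lower() == "true",
        byoeh_chunk_size=chunk_size,
        byoeh_chunk_overlap=chunk_overlap,
    )


settings = load_settings({"JWT_SECRET": "dev-only", "DATABASE_URL": "postgres://user:pass@host:5432/db"})
print(settings.minio_bucket, settings.byoeh_chunk_size)
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.
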
---

## Appendix B: Quick Reference

### API Base URLs

| Environment | URL |
|-------------|-----|
| Local | http://localhost:8086 |
| Development | https://dev.breakpilot.app |
| Production | https://api.breakpilot.app |

### Important Commands

```bash
# Start the service
docker-compose up -d klausur-service

# Show logs
docker logs -f breakpilot-pwa-klausur-service

# Run tests
docker exec klausur-service pytest tests/

# DB migration
docker exec postgres psql -U breakpilot -d breakpilot_db -f /migration.sql
```

---

*Last updated: January 2025*