# Klausur Module - Complete Developer Specification

**Version:** 2.0

**Date:** January 2025

**Author:** BreakPilot Development Team

---

## Table of Contents

1. [Overview](#1-overview)
2. [System Architecture](#2-system-architecture)
3. [Data Models](#3-data-models)
4. [API Specification](#4-api-specification)
5. [Frontend Architecture](#5-frontend-architecture)
6. [Security & Compliance](#6-security--compliance)
7. [Testing Strategy](#7-testing-strategy)
8. [Deployment & Operations](#8-deployment--operations)
9. [Development Guidelines](#9-development-guidelines)

---

## 1. Overview

### 1.1 Module Description

The Klausur module is a system for the digital correction, grading, and management of Abitur and pre-Abitur (Vorabitur) written exams (Klausuren). It consists of the following core components:

| Component | Description | Technology |
|-----------|-------------|------------|
| Klausur-Service Backend | Main service for all Klausur operations | Python, FastAPI |
| BYOEH (Bring Your Own EH) | Management of the Erwartungshorizont (EH, the expected-answer rubric) with RAG | Qdrant, MinIO |
| Zeugnisse module | Regulations (Verordnungen) and AI assistant for report cards (Zeugnisse) | Crawler, embeddings |
| Training module | AI model training and monitoring | Background tasks |
| Frontend | Admin and teacher interfaces | Next.js, React |

### 1.2 Technology Stack

```
┌─────────────────────────────────────────────────────────────┐
│                       Frontend Layer                        │
│  Next.js 15 │ React 18 │ TypeScript │ Tailwind CSS          │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                      │
│  Next.js API Routes │ Server-Side Proxy                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Backend Services                       │
│  Klausur-Service (FastAPI) │ Port 8086                      │
│  ├── main.py (Klausur CRUD, BYOEH)                          │
│  ├── admin_api.py (NiBiS Ingestion)                         │
│  ├── zeugnis_api.py (Zeugnisse Crawler)                     │
│  ├── training_api.py (Training Management)                  │
│  └── metrics_db.py (PostgreSQL Operations)                  │
└─────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
  ┌──────────────────┐  ┌──────────────┐  ┌────────────────┐
  │   PostgreSQL     │  │   Qdrant     │  │     MinIO      │
  │   (Metadata)     │  │  (Vectors)   │  │  (Documents)   │
  │   Port 5432      │  │  Port 6333   │  │   Port 9000    │
  └──────────────────┘  └──────────────┘  └────────────────┘
```

### 1.3 Core Features

1. **Klausur management**
   - Create and edit Klausuren
   - Upload student papers
   - Criteria-based grading
   - Generation of the Gutachten (written assessment)

2. **BYOEH - Erwartungshorizont**
   - Upload and encryption of EH documents
   - Chunking and embedding generation
   - RAG-based search
   - Tenant isolation

3. **Zeugnisse**
   - Rights-aware crawler for regulations
   - AI assistant for teachers
   - Search scoped to a Bundesland (federal state)

4. **Training**
   - Model training with monitoring
   - Hyperparameter configuration
   - Version management

---

## 2. System Architecture

### 2.1 Microservice Architecture

```
                    ┌───────────────────┐
                    │   Load Balancer   │
                    │      (Nginx)      │
                    └─────────┬─────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│    Website    │     │    Backend    │     │  Klausur-Svc  │
│   (Next.js)   │     │   (FastAPI)   │     │   (FastAPI)   │
│   Port 3000   │     │   Port 8000   │     │   Port 8086   │
└───────────────┘     └───────────────┘     └───────────────┘
        │                     │                     │
        └─────────────────────┴─────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   Service Mesh    │
                    │   (Docker Net)    │
                    └─────────┬─────────┘
                              │
        ┌─────────────┬───────┴───────┬─────────────┐
        ▼             ▼               ▼             ▼
   PostgreSQL      Qdrant           MinIO        Mailpit
```

### 2.2 Data Flow

#### 2.2.1 Klausur Correction Flow

```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Upload  │────▶│   OCR    │────▶│ Analysis │────▶│ Criteria │
│  paper   │     │(external)│     │  (LLM)   │     │ grading  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                         │
                                                         ▼
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Export  │◀────│ Finalize │◀────│ Generate │◀────│  RAG-EH  │
│   PDF    │     │          │     │Gutachten │     │  query   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
```

#### 2.2.2 Zeugnis Crawler Flow

```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Seed URL │────▶│  Fetch   │────▶│ Extract  │────▶│  Check   │
│ (Config) │     │   HTTP   │     │ PDF/HTML │     │  Rights  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                         │
                                         ┌───────────────┤
                                         ▼               ▼
                                   ┌──────────┐     ┌──────────┐
                                   │  MinIO   │     │  Qdrant  │
                                   │ (Store)  │     │ (Index)  │
                                   └──────────┘     └──────────┘
```

### 2.3 Component Details

#### 2.3.1 Klausur-Service (main.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/klausuren` | GET | List all Klausuren |
| `/api/v1/klausuren` | POST | Create a new Klausur |
| `/api/v1/klausuren/{id}` | GET | Klausur details |
| `/api/v1/klausuren/{id}` | PUT | Update a Klausur |
| `/api/v1/klausuren/{id}` | DELETE | Delete a Klausur |
| `/api/v1/klausuren/{id}/students` | POST | Add a student paper |
| `/api/v1/students/{id}/criteria` | PUT | Grade criteria |
| `/api/v1/students/{id}/gutachten` | PUT | Save the Gutachten |
| `/api/v1/students/{id}/gutachten/generate` | POST | Generate the Gutachten |

#### 2.3.2 BYOEH (eh_pipeline.py, qdrant_service.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/eh/upload` | POST | Upload an EH |
| `/api/v1/eh/{id}/index` | POST | Index an EH |
| `/api/v1/eh/rag-query` | POST | RAG search |
| `/api/v1/eh/{id}/share` | POST | Share an EH |
| `/api/v1/eh/{id}/link-klausur` | POST | Link an EH to a Klausur |

#### 2.3.3 Zeugnis Module (zeugnis_api.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/admin/zeugnis/sources` | GET | Sources per Bundesland |
| `/api/v1/admin/zeugnis/crawler/start` | POST | Start the crawler |
| `/api/v1/admin/zeugnis/crawler/stop` | POST | Stop the crawler |
| `/api/v1/admin/zeugnis/documents` | GET | Retrieve documents |
| `/api/v1/admin/zeugnis/stats` | GET | Statistics |

#### 2.3.4 Training Module (training_api.py)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/admin/training/jobs` | GET | List training jobs |
| `/api/v1/admin/training/jobs` | POST | Start a training job |
| `/api/v1/admin/training/jobs/{id}/pause` | POST | Pause a job |
| `/api/v1/admin/training/jobs/{id}/resume` | POST | Resume a job |
| `/api/v1/admin/training/models` | GET | Model versions |

---

## 3. Data Models

### 3.1 PostgreSQL Schema

#### 3.1.1 Core Tables (metrics_db.py)

```sql
-- RAG feedback
CREATE TABLE rag_search_feedback (
    id SERIAL PRIMARY KEY,
    result_id VARCHAR(255) NOT NULL,
    query_text TEXT,
    collection_name VARCHAR(100),
    score FLOAT,
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    notes TEXT,
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- RAG search logs
CREATE TABLE rag_search_logs (
    id SERIAL PRIMARY KEY,
    query_text TEXT NOT NULL,
    collection_name VARCHAR(100),
    result_count INTEGER,
    latency_ms INTEGER,
    top_score FLOAT,
    filters JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Relevance judgments for precision/recall
CREATE TABLE rag_relevance_judgments (
    id SERIAL PRIMARY KEY,
    query_id VARCHAR(255) NOT NULL,
    query_text TEXT NOT NULL,
    result_id VARCHAR(255) NOT NULL,
    result_rank INTEGER,
    is_relevant BOOLEAN NOT NULL,
    collection_name VARCHAR(100),
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);
```

#### 3.1.2 Zeugnis Tables

```sql
-- Bundesland sources
CREATE TABLE zeugnis_sources (
    id VARCHAR(36) PRIMARY KEY,
    bundesland VARCHAR(10) NOT NULL,
    name VARCHAR(255) NOT NULL,
    base_url TEXT,
    license_type VARCHAR(50) NOT NULL,
    training_allowed BOOLEAN DEFAULT FALSE,
    verified_by VARCHAR(100),
    verified_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Seed URLs
CREATE TABLE zeugnis_seed_urls (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    url TEXT NOT NULL,
    doc_type VARCHAR(50),
    status VARCHAR(20) DEFAULT 'pending',
    last_crawled TIMESTAMP,
    error_message TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Documents
CREATE TABLE zeugnis_documents (
    id VARCHAR(36) PRIMARY KEY,
    seed_url_id VARCHAR(36) REFERENCES zeugnis_seed_urls(id),
    title VARCHAR(500),
    url TEXT NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    training_allowed BOOLEAN DEFAULT FALSE,
    indexed_in_qdrant BOOLEAN DEFAULT FALSE,
    file_size INTEGER,
    content_type VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Document versions
CREATE TABLE zeugnis_document_versions (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    version INTEGER NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    change_summary TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Usage events (audit trail)
CREATE TABLE zeugnis_usage_events (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    event_type VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    details JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Crawler queue
CREATE TABLE zeugnis_crawler_queue (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    priority INTEGER DEFAULT 5,
    status VARCHAR(20) DEFAULT 'pending',
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    documents_found INTEGER DEFAULT 0,
    documents_indexed INTEGER DEFAULT 0,
    error_count INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);
```

### 3.2 In-Memory Models (Python Dataclasses)

#### 3.2.1 Klausur Models

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# StudentKlausurStatus and KlausurModus are enums defined elsewhere in the service.


@dataclass
class StudentKlausur:
    id: str
    klausur_id: str
    student_name: str
    status: StudentKlausurStatus  # Enum
    criteria_scores: Dict[str, Dict]
    gutachten: Optional[Dict]
    file_path: Optional[str]
    ocr_text: Optional[str]
    created_at: datetime
    updated_at: datetime


@dataclass
class Klausur:
    id: str
    title: str
    subject: str
    modus: KlausurModus  # LANDES_ABITUR, VORABITUR
    year: int
    semester: str
    erwartungshorizont: Optional[Dict]
    students: List[StudentKlausur]
    created_at: datetime
    tenant_id: str
```

#### 3.2.2 Zeugnis Models (zeugnis_models.py)

```python
from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class LicenseType(str, Enum):
    PUBLIC_DOMAIN = "public_domain"
    CC_BY = "cc_by"
    CC_BY_SA = "cc_by_sa"
    CC_BY_NC = "cc_by_nc"
    GOV_STATUTE_FREE_USE = "gov_statute"
    ALL_RIGHTS_RESERVED = "all_rights"
    UNKNOWN_REQUIRES_REVIEW = "unknown"


class CrawlStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PAUSED = "paused"


class ZeugnisSource(BaseModel):
    id: str
    bundesland: str
    name: str
    base_url: Optional[str]
    license_type: LicenseType
    training_allowed: bool
    verified_by: Optional[str]
    verified_at: Optional[datetime]
```

### 3.3 Qdrant Collections

#### 3.3.1 BYOEH Collection (bp_eh)

```python
from qdrant_client.models import Distance

# Collection config
collection_name = "bp_eh"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload schema (field name -> type)
payload_schema = {
    "tenant_id": str,          # Tenant isolation
    "eh_id": str,              # Erwartungshorizont ID
    "chunk_index": int,        # Position in the document
    "subject": str,            # Subject
    "encrypted_content": str,  # AES-256-GCM encrypted
    "training_allowed": bool,  # ALWAYS False for EH
}
```

#### 3.3.2 Zeugnis Collection (bp_zeugnis)

```python
from qdrant_client.models import Distance

# Collection config
collection_name = "bp_zeugnis"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload schema (field name -> type)
payload_schema = {
    "document_id": str,
    "chunk_index": int,
    "chunk_text": str,         # Preview (max 500 chars)
    "bundesland": str,
    "doc_type": str,
    "title": str,
    "source_url": str,
    "training_allowed": bool,  # Inherited from the source
    "indexed_at": str,
}
```

### 3.4 MinIO Bucket Structure

```
breakpilot-rag/
├── landes-daten/
│   ├── {bundesland}/
│   │   └── zeugnis/
│   │       └── {year}/
│   │           └── {filename}.pdf
│   └── klausur/
│       └── {year}/
│           └── {subject}/
│               └── {filename}.pdf
│
└── lehrer-daten/
    └── {tenant_id}/
        └── {teacher_id}/
            └── {filename}.pdf.enc
```

---

## 4. API Specification

### 4.1 Authentication

All API endpoints require JWT authentication:

```http
Authorization: Bearer <token>
```

JWT payload:

```json
{
  "sub": "user-id",
  "tenant_id": "school-id",
  "roles": ["teacher", "admin"],
  "exp": 1704067200
}
```

### 4.2 Error Handling

#### Standard Error Format

```json
{
  "detail": "Beschreibung des Fehlers",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T10:00:00Z"
}
```

#### HTTP Status Codes

| Code | Meaning | Usage |
|------|---------|-------|
| 200 | OK | Successful request |
| 201 | Created | Resource created |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Authentication missing, expired, or invalid |
| 403 | Forbidden | Insufficient permissions |
| 404 | Not Found | Resource not found |
| 409 | Conflict | Resource conflict |
| 422 | Unprocessable Entity | Validation error |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Service not available |

### 4.3 Pagination

```http
GET /api/v1/resource?limit=20&offset=0
```

Response:

```json
{
  "items": [...],
  "total": 100,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
```

### 4.4 Rate Limiting

| Endpoint type | Limit |
|---------------|-------|
| Standard API | 100/min |
| RAG Query | 30/min |
| Upload | 10/min |
| Training Start | 5/hour |

---

## 5. Frontend Architecture

### 5.1 Directory Structure

```
website/
├── app/
│   ├── admin/
│   │   ├── training/
│   │   │   └── page.tsx          # Training Dashboard
│   │   ├── zeugnisse-crawler/
│   │   │   └── page.tsx          # Crawler Admin
│   │   ├── rag/
│   │   │   └── page.tsx          # RAG Admin
│   │   └── uni-crawler/
│   │       └── page.tsx          # Uni Crawler
│   │
│   ├── zeugnisse/
│   │   └── page.tsx              # Teacher frontend
│   │
│   └── api/
│       └── admin/
│           ├── zeugnisse-crawler/
│           │   └── route.ts      # API Proxy
│           └── training/
│               └── route.ts      # API Proxy
│
├── components/
│   ├── ui/                       # Base components
│   └── shared/                   # Shared components
│
└── lib/
    ├── api.ts                    # API Client
    └── utils.ts                  # Helper functions
```

### 5.2 Component Hierarchy

```
App
├── Layout
│   ├── Header
│   │   ├── Navigation
│   │   └── UserMenu
│   └── Sidebar
│
├── Training Dashboard Page
│   ├── StatsCards
│   ├── TrainingJobCard
│   │   ├── ProgressRing
│   │   ├── MetricCards
│   │   └── LossChart
│   ├── DatasetOverview
│   └── NewTrainingModal (Wizard)
│
├── Zeugnisse Crawler Page
│   ├── StatsCards
│   ├── BundeslandTable
│   ├── DocumentList
│   └── CrawlerControls
│
└── Lehrer Zeugnisse Page
    ├── OnboardingWizard
    ├── ChatInterface
    │   └── MessageList
    ├── SearchInterface
    │   └── SearchResults
    └── DocumentBrowser
```

### 5.3 State Management

#### Local State (useState)

- UI state (modals, tabs)
- Form inputs
- Local filters

#### Server State (SWR/Fetch)

- API data with polling
- Caching
- Revalidation

#### Persisted State (localStorage)

- User preferences
- Recent searches
- Wizard status

### 5.4 Styling Conventions

```typescript
// Tailwind CSS class names
const buttonPrimary = "px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition"
const buttonSecondary = "px-4 py-2 bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200 transition"
const card = "bg-white dark:bg-gray-800 rounded-xl shadow-lg border border-gray-200 dark:border-gray-700"
const input = "px-3 py-2 bg-gray-100 dark:bg-gray-900 border-0 rounded-lg focus:ring-2 focus:ring-blue-500"
```

---

## 6. Security & Compliance

### 6.1 Authentication & Authorization

#### JWT Validation

```python
import jwt  # PyJWT
from fastapi import HTTPException

# JWT_SECRET is read from the environment (see Appendix A).


def verify_jwt(token: str) -> dict:
    """Decode and validate a JWT; raise 401 on expiry or any other defect."""
    try:
        payload = jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")
```

#### RBAC Roles

| Role | Permissions |
|------|-------------|
| teacher | Create Klausuren, upload EH, use the Zeugnis assistant |
| admin | All teacher permissions, plus: control the crawler, start training |
| superadmin | All admin permissions, plus: system configuration |

### 6.2 Data Encryption

#### AES-256-GCM for EH Documents

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# derive_key and hash_key are project helpers defined elsewhere.


def encrypt_text(plaintext: str, passphrase: str) -> tuple:
    salt = os.urandom(16)  # random salt for key derivation
    iv = os.urandom(12)    # 96-bit nonce for GCM
    key = derive_key(passphrase, salt)  # must yield 32 bytes for AES-256
    cipher = AESGCM(key)
    ciphertext = cipher.encrypt(iv, plaintext.encode(), None)
    # Blob layout: salt (16 bytes) + IV (12 bytes) + ciphertext incl. GCM tag
    return base64.b64encode(salt + iv + ciphertext).decode(), hash_key(key)
```

### 6.3 Tenant Isolation

- All database queries filter by `tenant_id`
- Qdrant searches use a `tenant_id` filter
- MinIO paths for teacher data contain the `tenant_id`

### 6.4 Audit Trail

```python
async def log_event(event_type: str, resource_id: str, user_id: str, details: dict):
    # Delegates to the Zeugnis audit log (table zeugnis_usage_events)
    await log_zeugnis_event(
        document_id=resource_id,
        event_type=event_type,
        user_id=user_id,
        details=details,
    )
```

### 6.5 GDPR (DSGVO) Compliance

- Data export function
- Deletion requests
- Consent tracking
- Logging of all access

---

## 7. Testing Strategy

### 7.1 Test Pyramid

```
        /\
       /  \
      /E2E \        <- 5%  (Critical Paths)
     /------\
    / Integ  \      <- 25% (API, DB)
   /----------\
  /    Unit    \    <- 70% (Functions)
 /--------------\
```

### 7.2 Unit Tests (Python)

**Location:** `klausur-service/backend/tests/`

```python
# tests/test_zeugnis_models.py
import pytest
from zeugnis_models import (
    LicenseType,
    get_training_allowed,
    get_bundesland_name,
)


class TestTrainingPermissions:
    def test_niedersachsen_allows_training(self):
        assert get_training_allowed("ni") is True

    def test_berlin_disallows_training(self):
        assert get_training_allowed("be") is False

    def test_unknown_bundesland_disallows(self):
        assert get_training_allowed("xx") is False


class TestBundeslandNames:
    def test_valid_code_returns_name(self):
        assert get_bundesland_name("ni") == "Niedersachsen"

    def test_invalid_code_returns_code(self):
        assert get_bundesland_name("xx") == "xx"
```

```python
# tests/test_zeugnis_crawler.py
import pytest
from zeugnis_crawler import chunk_text, compute_hash, extract_text_from_pdf


class TestChunking:
    def test_short_text_single_chunk(self):
        text = "Dies ist ein kurzer Text."
        chunks = chunk_text(text, chunk_size=100)
        assert len(chunks) == 1

    def test_long_text_multiple_chunks(self):
        text = "A" * 2000
        chunks = chunk_text(text, chunk_size=500, overlap=50)
        assert len(chunks) > 1

    def test_overlap_preserved(self):
        text = "ABCDE" * 200
        chunks = chunk_text(text, chunk_size=100, overlap=20)
        # Each chunk must start with the last 20 characters of its predecessor
        for i in range(1, len(chunks)):
            assert chunks[i][:20] == chunks[i-1][-20:]


class TestHashing:
    def test_same_content_same_hash(self):
        content = b"Hello World"
        assert compute_hash(content) == compute_hash(content)

    def test_different_content_different_hash(self):
        assert compute_hash(b"Hello") != compute_hash(b"World")
```

### 7.3 Integration Tests

```python
# tests/test_zeugnis_api_integration.py
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from main import app


@pytest_asyncio.fixture
async def client():
    # The AsyncClient(app=...) shortcut was deprecated in httpx 0.27 and
    # removed in 0.28; pass the ASGI app through an explicit transport.
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac


@pytest.mark.asyncio
class TestZeugnisAPI:
    async def test_get_sources_returns_list(self, client):
        response = await client.get("/api/v1/admin/zeugnis/sources")
        assert response.status_code == 200
        assert isinstance(response.json(), list)

    async def test_start_crawler_without_running(self, client):
        response = await client.post(
            "/api/v1/admin/zeugnis/crawler/start",
            json={"bundesland": "ni"},
        )
        assert response.status_code == 200

    async def test_start_crawler_while_running_fails(self, client):
        # First start
        await client.post("/api/v1/admin/zeugnis/crawler/start")
        # Second start should fail
        response = await client.post("/api/v1/admin/zeugnis/crawler/start")
        assert response.status_code == 409
```

### 7.4 E2E Tests (Playwright)

```typescript
// tests/e2e/zeugnisse.spec.ts
import { test, expect } from '@playwright/test'

test.describe('Zeugnis-Assistent', () => {
  test('onboarding wizard completes successfully', async ({ page }) => {
    await page.goto('/zeugnisse')

    // Step 1: Welcome
    await expect(page.locator('h2')).toContainText('Willkommen')
    await page.click('button:has-text("Weiter")')

    // Step 2: Select Bundesland
    await page.click('button:has-text("Niedersachsen")')
    await page.click('button:has-text("Weiter")')

    // Step 3: Select Schulform
    await page.click('button:has-text("Gymnasium")')
    await page.click('button:has-text("Weiter")')

    // Step 4: Complete
    await page.click('button:has-text("Loslegen")')

    // Verify main interface
    await expect(page.locator('h1')).toContainText('Zeugnis-Assistent')
  })

  test('chat interface responds to questions', async ({ page }) => {
    // Skip wizard (set localStorage)
    await page.goto('/zeugnisse')
    await page.evaluate(() => {
      localStorage.setItem('zeugnis-preferences', JSON.stringify({
        bundesland: 'ni',
        schulform: 'gymnasium',
        hasSeenWizard: true,
      }))
    })
    await page.reload()

    // Send message
    await page.fill('textarea', 'Wie schreibe ich Bemerkungen?')
    await page.click('button[type="submit"]')

    // Wait for response
    await expect(page.locator('.bg-white.rounded-2xl').last())
      .toContainText('Bemerkung', { timeout: 10000 })
  })
})
```

### 7.5 Running the Tests

```bash
# Unit tests (Python)
cd klausur-service/backend
pytest -v tests/

# With coverage
pytest --cov=. --cov-report=html tests/

# E2E tests (Playwright)
cd website
npx playwright test

# All tests
./run-tests.sh
```

---

## 8. Deployment & Operations

### 8.1 Docker Compose

```yaml
# docker-compose.yml (excerpt)
services:
  klausur-service:
    build:
      context: ./klausur-service
      dockerfile: Dockerfile
    ports:
      - "8086:8086"
    environment:
      - JWT_SECRET=${JWT_SECRET}
      - QDRANT_URL=http://qdrant:6333
      - MINIO_ENDPOINT=minio:9000
      - DATABASE_URL=postgres://breakpilot:breakpilot123@postgres:5432/breakpilot_db
    depends_on:
      - qdrant
      - minio
      - postgres
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

### 8.2 Dockerfile (Klausur-Service)

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code
COPY backend/ .

# Create directories
RUN mkdir -p /app/uploads /app/eh-uploads

EXPOSE 8086

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8086"]
```

### 8.3 Monitoring

#### Health Checks

```python
from datetime import datetime, timezone


@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "service": "klausur-service",
        "version": "2.0",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


@app.get("/health/detailed")
async def health_detailed():
    # Check dependencies
    qdrant_ok = await check_qdrant()
    postgres_ok = await check_postgres()
    minio_ok = await check_minio()

    return {
        "status": "healthy" if all([qdrant_ok, postgres_ok, minio_ok]) else "degraded",
        "dependencies": {
            "qdrant": "ok" if qdrant_ok else "error",
            "postgres": "ok" if postgres_ok else "error",
            "minio": "ok" if minio_ok else "error",
        },
    }
```

#### Prometheus Metrics

```python
from prometheus_client import Counter, Histogram

# Metrics
request_count = Counter('klausur_requests_total', 'Total requests', ['endpoint', 'method'])
request_latency = Histogram('klausur_request_latency_seconds', 'Request latency', ['endpoint'])
training_jobs = Counter('klausur_training_jobs_total', 'Training jobs', ['status'])
```

### 8.4 Logging

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger("klausur-service")

# Structured logging: the extra fields are attached to the log record
logger.info("Training started", extra={
    "job_id": job_id,
    "bundeslaender": config.bundeslaender,
    "epochs": config.epochs,
})
```

---

## 9. Development Guidelines

### 9.1 Code Style

#### Python

```python
# Imports: standard library → third-party → local
import os
from datetime import datetime
from typing import Any, Dict, List, Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from zeugnis_models import LicenseType


# Docstrings: Google style
def process_document(content: bytes, doc_type: str) -> dict:
    """Process a document for indexing.

    Args:
        content: Raw document bytes.
        doc_type: Type of document (pdf, html).

    Returns:
        Dict with extracted text and metadata.

    Raises:
        ValueError: If doc_type is not supported.
    """
    pass


# Type hints: always use them
async def get_sources(
    bundesland: Optional[str] = None,
    limit: int = 100,
) -> List[Dict[str, Any]]:
    pass
```

#### TypeScript

```typescript
// Prefer interfaces over type aliases
interface TrainingJob {
  id: string
  name: string
  status: TrainingStatus
}

// Props interface for components
interface TrainingCardProps {
  job: TrainingJob
  onPause: () => void
  onResume: () => void
}

// Function components with explicit types
export function TrainingCard({ job, onPause, onResume }: TrainingCardProps) {
  return (...)
}
```

### 9.2 Git Workflow

```
main
│
├── develop
│   │
│   ├── feature/zeugnis-crawler
│   ├── feature/training-dashboard
│   └── fix/crawler-retry
│
└── release/v2.0
```

#### Commit Messages

```
feat(zeugnis): add rights-aware crawler

- Implement PDF/HTML text extraction
- Add training_allowed flag per bundesland
- Create audit trail for document access

Closes #123
```

### 9.3 Review Checklist

- [ ] Tests exist and pass
- [ ] Documentation updated
- [ ] Type hints/interfaces complete
- [ ] No hardcoded credentials
- [ ] Error handling implemented
- [ ] Logging in place
- [ ] Performance acceptable

### 9.4 Versioning

Semantic Versioning: `MAJOR.MINOR.PATCH`

- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes

---

## Appendix A: Environment Variables

```env
# Authentication
JWT_SECRET=your-super-secret-key

# Databases
DATABASE_URL=postgres://user:pass@host:5432/db
QDRANT_URL=http://qdrant:6333
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=breakpilot
MINIO_SECRET_KEY=breakpilot123
MINIO_BUCKET=breakpilot-rag

# Embeddings
EMBEDDING_BACKEND=local  # or "openai"
OPENAI_API_KEY=sk-...    # only if EMBEDDING_BACKEND is "openai"

# Services
BACKEND_URL=http://backend:8000
SCHOOL_SERVICE_URL=http://school-service:8084

# Feature flags
BYOEH_ENCRYPTION_ENABLED=true
BYOEH_CHUNK_SIZE=1000
BYOEH_CHUNK_OVERLAP=200
```

---

## Appendix B: Quick Reference

### API Base URLs

| Environment | URL |
|-------------|-----|
| Local | http://localhost:8086 |
| Development | https://dev.breakpilot.app |
| Production | https://api.breakpilot.app |

### Important Commands

```bash
# Start the service
docker-compose up -d klausur-service

# Show logs
docker logs -f breakpilot-pwa-klausur-service

# Run tests
docker exec klausur-service pytest tests/

# DB migration
docker exec postgres psql -U breakpilot -d breakpilot_db -f /migration.sql
```

---

*Last updated: January 2025*
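
---

## Appendix C: Chunking Sketch

The unit tests in section 7.2 call `chunk_text(text, chunk_size=..., overlap=...)` and expect each chunk to begin with the last `overlap` characters of the previous one, but the specification does not show the function itself. The following is a minimal sketch of one way to satisfy those tests, assuming fixed-size character windows. It is not the implementation in `zeugnis_crawler.py`; the default values and the error handling are assumptions made for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 0) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical sketch: the defaults and the validation rules are assumptions,
    not taken from the real zeugnis_crawler module.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Short (or empty) input: nothing to split.
    if len(text) <= chunk_size:
        return [text] if text else []

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Usage with the values from BYOEH_CHUNK_SIZE / BYOEH_CHUNK_OVERLAP in Appendix A
sample = "ABCDE" * 500
parts = chunk_text(sample, chunk_size=1000, overlap=200)
print(len(parts), [len(p) for p in parts])
```

Because the window advances by `chunk_size - overlap` and stops as soon as a window reaches the end of the text, every chunk after the first repeats exactly `overlap` characters of its predecessor, and the final chunk is always longer than `overlap`. Dropping the first `overlap` characters of each later chunk and concatenating the rest therefore reproduces the original text.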