Klausur Module - Complete Developer Specification

Version: 2.0 | As of: January 2025 | Author: BreakPilot Development Team


Table of Contents

  1. Overview
  2. System Architecture
  3. Data Models
  4. API Specification
  5. Frontend Architecture
  6. Security & Compliance
  7. Testing Strategy
  8. Deployment & Operations
  9. Development Guidelines

1. Overview

1.1 Module Description

The Klausur module is a system for the digital grading, assessment, and management of Abitur and pre-Abitur (Vorabitur) exams. It consists of the following core components:

| Component | Description | Technology |
| --- | --- | --- |
| Klausur-Service Backend | Main service for all exam (Klausur) operations | Python FastAPI |
| BYOEH (Bring Your Own EH) | Management of the Erwartungshorizont (EH, the expected-answer rubric) with RAG | Qdrant, MinIO |
| Zeugnisse module | Regulations on report cards (Zeugnisse) and AI assistant | Crawler, Embeddings |
| Training module | AI model training & monitoring | Background Tasks |
| Frontend | Admin and teacher interfaces | Next.js, React |

1.2 Technology Stack

┌─────────────────────────────────────────────────────────────┐
│                      Frontend Layer                          │
│   Next.js 15 │ React 18 │ TypeScript │ Tailwind CSS         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     API Gateway Layer                        │
│        Next.js API Routes │ Server-Side Proxy               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Backend Services                          │
│  Klausur-Service (FastAPI) │ Port 8086                      │
│  ├── main.py (Klausur CRUD, BYOEH)                          │
│  ├── admin_api.py (NiBiS Ingestion)                         │
│  ├── zeugnis_api.py (Zeugnisse Crawler)                     │
│  ├── training_api.py (Training Management)                  │
│  └── metrics_db.py (PostgreSQL Operations)                  │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
┌──────────────────┐ ┌──────────────┐ ┌────────────────┐
│    PostgreSQL    │ │    Qdrant    │ │     MinIO      │
│    (Metadata)    │ │  (Vectors)   │ │  (Documents)   │
│    Port 5432     │ │  Port 6333   │ │  Port 9000     │
└──────────────────┘ └──────────────┘ └────────────────┘

1.3 Core Features

  1. Exam management

    • Create and edit exams
    • Upload student papers
    • Criteria-based grading
    • Generation of the Gutachten (written assessment)
  2. BYOEH - Erwartungshorizont

    • Upload & encryption of EH documents
    • Chunking & embedding generation
    • RAG-based search
    • Tenant isolation
  3. Zeugnisse

    • Rights-aware crawler for regulations
    • AI assistant for teachers
    • Bundesland-specific search
  4. Training

    • Model training with monitoring
    • Hyperparameter configuration
    • Version management

2. System Architecture

2.1 Microservice Architecture

                    ┌───────────────────┐
                    │   Load Balancer   │
                    │     (Nginx)       │
                    └─────────┬─────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Website     │    │   Backend     │    │ Klausur-Svc   │
│  (Next.js)    │    │  (FastAPI)    │    │  (FastAPI)    │
│   Port 3000   │    │  Port 8000    │    │  Port 8086    │
└───────────────┘    └───────────────┘    └───────────────┘
        │                     │                     │
        └─────────────────────┴─────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   Service Mesh    │
                    │   (Docker Net)    │
                    └─────────┬─────────┘
                              │
        ┌─────────────┬───────┴───────┬─────────────┐
        ▼             ▼               ▼             ▼
   PostgreSQL      Qdrant          MinIO       Mailpit

2.2 Data Flow

2.2.1 Exam Grading Flow

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Upload   │────▶│   OCR    │────▶│ Analysis │────▶│ Criteria │
│ Paper    │     │(external)│     │   (LLM)  │     │ Scoring  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                         │
                                                         ▼
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Export  │◀────│ Finalize │◀────│Gutachten │◀────│ RAG-EH   │
│   PDF    │     │          │     │ Generate │     │  Query   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘

2.2.2 Zeugnis Crawler Flow

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Seed URL │────▶│  Fetch   │────▶│ Extract  │────▶│ Check    │
│ (Config) │     │  HTTP    │     │ PDF/HTML │     │ Rights   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                         │
                                         ┌───────────────┤
                                         ▼               ▼
                                   ┌──────────┐   ┌──────────┐
                                   │  MinIO   │   │  Qdrant  │
                                   │ (Store)  │   │ (Index)  │
                                   └──────────┘   └──────────┘

2.3 Component Details

2.3.1 Klausur-Service (main.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/klausuren | GET | List all exams |
| /api/v1/klausuren | POST | Create a new exam |
| /api/v1/klausuren/{id} | GET | Exam details |
| /api/v1/klausuren/{id} | PUT | Update an exam |
| /api/v1/klausuren/{id} | DELETE | Delete an exam |
| /api/v1/klausuren/{id}/students | POST | Add a student paper |
| /api/v1/students/{id}/criteria | PUT | Score criteria |
| /api/v1/students/{id}/gutachten | PUT | Save the Gutachten |
| /api/v1/students/{id}/gutachten/generate | POST | Generate the Gutachten |

2.3.2 BYOEH (eh_pipeline.py, qdrant_service.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/eh/upload | POST | Upload an EH |
| /api/v1/eh/{id}/index | POST | Index an EH |
| /api/v1/eh/rag-query | POST | RAG search |
| /api/v1/eh/{id}/share | POST | Share an EH |
| /api/v1/eh/{id}/link-klausur | POST | Link an EH to an exam |

2.3.3 Zeugnis Module (zeugnis_api.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/admin/zeugnis/sources | GET | Sources per federal state (Bundesland) |
| /api/v1/admin/zeugnis/crawler/start | POST | Start the crawler |
| /api/v1/admin/zeugnis/crawler/stop | POST | Stop the crawler |
| /api/v1/admin/zeugnis/documents | GET | Retrieve documents |
| /api/v1/admin/zeugnis/stats | GET | Statistics |

2.3.4 Training Module (training_api.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/admin/training/jobs | GET | List training jobs |
| /api/v1/admin/training/jobs | POST | Start training |
| /api/v1/admin/training/jobs/{id}/pause | POST | Pause |
| /api/v1/admin/training/jobs/{id}/resume | POST | Resume |
| /api/v1/admin/training/models | GET | Model versions |

3. Data Models

3.1 PostgreSQL Schema

3.1.1 Core Tables (metrics_db.py)

-- RAG Feedback
CREATE TABLE rag_search_feedback (
    id SERIAL PRIMARY KEY,
    result_id VARCHAR(255) NOT NULL,
    query_text TEXT,
    collection_name VARCHAR(100),
    score FLOAT,
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    notes TEXT,
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- RAG Search Logs
CREATE TABLE rag_search_logs (
    id SERIAL PRIMARY KEY,
    query_text TEXT NOT NULL,
    collection_name VARCHAR(100),
    result_count INTEGER,
    latency_ms INTEGER,
    top_score FLOAT,
    filters JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Relevance judgments for precision/recall
CREATE TABLE rag_relevance_judgments (
    id SERIAL PRIMARY KEY,
    query_id VARCHAR(255) NOT NULL,
    query_text TEXT NOT NULL,
    result_id VARCHAR(255) NOT NULL,
    result_rank INTEGER,
    is_relevant BOOLEAN NOT NULL,
    collection_name VARCHAR(100),
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);
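
The judgments stored in rag_relevance_judgments are meant for precision/recall evaluation. A minimal sketch of that calculation follows, assuming the rows of one query_id have been loaded as (result_rank, is_relevant) tuples and that ranks start at 1. Both function names are hypothetical, and the source does not specify how the metrics are actually computed.

```python
from typing import Iterable, List, Tuple

Judgment = Tuple[int, bool]  # (result_rank, is_relevant); ranks assumed to start at 1


def precision_at_k(judgments: Iterable[Judgment], k: int) -> float:
    """Share of the top-k positions that hold a result judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for rank, relevant in judgments if relevant and 1 <= rank <= k)
    # Positions without a judgment count as not relevant.
    return hits / k


def recall_at_k(judgments: Iterable[Judgment], k: int) -> float:
    """Share of all judged-relevant results that appear within the top k."""
    rows: List[Judgment] = list(judgments)
    total_relevant = sum(1 for _, relevant in rows if relevant)
    if total_relevant == 0:
        return 0.0
    hits = sum(1 for rank, relevant in rows if relevant and 1 <= rank <= k)
    return hits / total_relevant
```

Recall here is relative to the judged results only, so it overestimates true recall whenever relevant documents were never shown to a reviewer.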

3.1.2 Zeugnis Tables

-- Sources per federal state (Bundesland)
CREATE TABLE zeugnis_sources (
    id VARCHAR(36) PRIMARY KEY,
    bundesland VARCHAR(10) NOT NULL,
    name VARCHAR(255) NOT NULL,
    base_url TEXT,
    license_type VARCHAR(50) NOT NULL,
    training_allowed BOOLEAN DEFAULT FALSE,
    verified_by VARCHAR(100),
    verified_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Seed URLs
CREATE TABLE zeugnis_seed_urls (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    url TEXT NOT NULL,
    doc_type VARCHAR(50),
    status VARCHAR(20) DEFAULT 'pending',
    last_crawled TIMESTAMP,
    error_message TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Documents
CREATE TABLE zeugnis_documents (
    id VARCHAR(36) PRIMARY KEY,
    seed_url_id VARCHAR(36) REFERENCES zeugnis_seed_urls(id),
    title VARCHAR(500),
    url TEXT NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    training_allowed BOOLEAN DEFAULT FALSE,
    indexed_in_qdrant BOOLEAN DEFAULT FALSE,
    file_size INTEGER,
    content_type VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Document versions
CREATE TABLE zeugnis_document_versions (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    version INTEGER NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    change_summary TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
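
The content_hash columns allow a crawler to detect whether a re-fetched document has changed before it writes a new row to zeugnis_document_versions. The sketch below assumes SHA-256 as the hash (its hex digest has 64 characters, which fits VARCHAR(64), but the source does not name the algorithm). The function names are hypothetical.

```python
import hashlib
from typing import Optional


def compute_content_hash(content: bytes) -> str:
    """Return the SHA-256 hex digest of the raw document bytes (64 characters)."""
    return hashlib.sha256(content).hexdigest()


def next_version(previous_hash: Optional[str], previous_version: int, content: bytes) -> Optional[int]:
    """Return the version number to store, or None when the content is unchanged."""
    if previous_hash == compute_content_hash(content):
        return None
    return previous_version + 1
```

For a document that has never been stored, passing previous_hash=None and previous_version=0 yields version 1.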

-- Usage Events (Audit Trail)
CREATE TABLE zeugnis_usage_events (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    event_type VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    details JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Crawler Queue
CREATE TABLE zeugnis_crawler_queue (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    priority INTEGER DEFAULT 5,
    status VARCHAR(20) DEFAULT 'pending',
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    documents_found INTEGER DEFAULT 0,
    documents_indexed INTEGER DEFAULT 0,
    error_count INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

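3.2 In-Memory Models (Python Dataclasses)

3.2.1 Klausur Models

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# StudentKlausurStatus and KlausurModus are Enums defined elsewhere in the service.

@dataclass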
class StudentKlausur:
    id: str
    klausur_id: str
    student_name: str
    status: StudentKlausurStatus  # Enum
    criteria_scores: Dict[str, Dict]
    gutachten: Optional[Dict]
    file_path: Optional[str]
    ocr_text: Optional[str]
    created_at: datetime
    updated_at: datetime

@dataclass
class Klausur:
    id: str
    title: str
    subject: str
    modus: KlausurModus  # LANDES_ABITUR, VORABITUR
    year: int
    semester: str
    erwartungshorizont: Optional[Dict]
    students: List[StudentKlausur]
    created_at: datetime
    tenant_id: str

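3.2.2 Zeugnis Models (zeugnis_models.py)

from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel

class LicenseType(str, Enum):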
    PUBLIC_DOMAIN = "public_domain"
    CC_BY = "cc_by"
    CC_BY_SA = "cc_by_sa"
    CC_BY_NC = "cc_by_nc"
    GOV_STATUTE_FREE_USE = "gov_statute"
    ALL_RIGHTS_RESERVED = "all_rights"
    UNKNOWN_REQUIRES_REVIEW = "unknown"

class CrawlStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PAUSED = "paused"

class ZeugnisSource(BaseModel):
    id: str
    bundesland: str
    name: str
    base_url: Optional[str]
    license_type: LicenseType
    training_allowed: bool
    verified_by: Optional[str]
    verified_at: Optional[datetime]

3.3 Qdrant Collections

3.3.1 BYOEH Collection (bp_eh)

# Collection Config
collection_name = "bp_eh"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload Schema
{
    "tenant_id": str,           # Tenant isolation
    "eh_id": str,               # Erwartungshorizont ID
    "chunk_index": int,         # Position in the document
    "subject": str,             # Subject
    "encrypted_content": str,   # AES-256-GCM encrypted
    "training_allowed": bool,   # ALWAYS False for EH
}

3.3.2 Zeugnis Collection (bp_zeugnis)

# Collection Config
collection_name = "bp_zeugnis"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload Schema
{
    "document_id": str,
    "chunk_index": int,
    "chunk_text": str,          # Preview (max 500 chars)
    "bundesland": str,
    "doc_type": str,
    "title": str,
    "source_url": str,
    "training_allowed": bool,   # Inherited from source
    "indexed_at": str,
}
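
As an illustration of the schema above, the following sketch assembles one bp_zeugnis payload. It assumes the document and source records are available as plain dicts with the keys shown. The function name and the input shapes are hypothetical, and indexed_at is assumed to be a UTC ISO-8601 string.

```python
from datetime import datetime, timezone
from typing import Any, Dict

PREVIEW_MAX_CHARS = 500  # preview limit stated in the payload schema


def build_zeugnis_payload(
    document: Dict[str, Any],
    source: Dict[str, Any],
    chunk_index: int,
    chunk: str,
) -> Dict[str, Any]:
    """Build the payload for one chunk; training_allowed is taken from the source."""
    return {
        "document_id": document["id"],
        "chunk_index": chunk_index,
        "chunk_text": chunk[:PREVIEW_MAX_CHARS],  # store only a preview
        "bundesland": source["bundesland"],
        "doc_type": document["doc_type"],
        "title": document["title"],
        "source_url": document["url"],
        "training_allowed": bool(source["training_allowed"]),
        "indexed_at": datetime.now(timezone.utc).isoformat(),
    }
```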

3.4 MinIO Bucket Structure

breakpilot-rag/
├── landes-daten/
│   ├── {bundesland}/
│   │   └── zeugnis/
│   │       └── {year}/
│   │           └── {filename}.pdf
│   └── klausur/
│       └── {year}/
│           └── {subject}/
│               └── {filename}.pdf
│
└── lehrer-daten/
    └── {tenant_id}/
        └── {teacher_id}/
            └── {filename}.pdf.enc
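
Object keys for this layout could be built by small helpers such as the ones below. This is a sketch with hypothetical function names. The segment check is an added assumption: it keeps a crafted tenant_id or filename from escaping its directory level.

```python
from pathlib import PurePosixPath


def _segment(value: str) -> str:
    """Reject values that could escape their directory level."""
    if not value or value in {".", ".."} or "/" in value or "\\" in value:
        raise ValueError(f"invalid path segment: {value!r}")
    return value


def landes_zeugnis_key(bundesland: str, year: int, filename: str) -> str:
    """Key below landes-daten/{bundesland}/zeugnis/{year}/."""
    return str(PurePosixPath(
        "landes-daten", _segment(bundesland), "zeugnis", str(year),
        f"{_segment(filename)}.pdf",
    ))


def lehrer_key(tenant_id: str, teacher_id: str, filename: str) -> str:
    """Key below lehrer-daten/{tenant_id}/{teacher_id}/ for an encrypted upload."""
    return str(PurePosixPath(
        "lehrer-daten", _segment(tenant_id), _segment(teacher_id),
        f"{_segment(filename)}.pdf.enc",
    ))
```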

4. API Specification

4.1 Authentication

All API endpoints require JWT authentication:

Authorization: Bearer <jwt_token>

JWT-Payload:

{
  "sub": "user-id",
  "tenant_id": "school-id",
  "roles": ["teacher", "admin"],
  "exp": 1704067200
}

4.2 Error Handling

Standard Error Format

{
  "detail": "Description of the error",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T10:00:00Z"
}
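
A helper that produces this error body might look like the sketch below. The function name is hypothetical, and the timestamp is assumed to be UTC with a trailing Z, as in the example above.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def error_body(detail: str, code: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build the standard error body; `now` should be timezone-aware if given."""
    moment = now or datetime.now(timezone.utc)
    return {
        "detail": detail,
        "code": code,
        "timestamp": moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```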

HTTP Status Codes

| Code | Meaning | Usage |
| --- | --- | --- |
| 200 | OK | Successful request |
| 201 | Created | Resource created |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Authentication missing |
| 403 | Forbidden | No permission |
| 404 | Not Found | Resource not found |
| 409 | Conflict | Resource conflict |
| 422 | Unprocessable | Validation error |
| 500 | Internal Error | Server error |
| 503 | Unavailable | Service unavailable |

4.3 Pagination

GET /api/v1/resource?limit=20&offset=0

Response:

{
  "items": [...],
  "total": 100,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
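
The response envelope can be derived from limit, offset and the total count. A minimal in-memory sketch follows. The function name is hypothetical, and a real endpoint would more likely apply LIMIT/OFFSET in the database query.

```python
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], limit: int = 20, offset: int = 0) -> Dict[str, Any]:
    """Slice `items` and wrap the page in the documented envelope."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    page = list(items[offset:offset + limit])
    total = len(items)
    return {
        "items": page,
        "total": total,
        "limit": limit,
        "offset": offset,
        # More pages exist if the end of this page is before the end of the data.
        "has_more": offset + len(page) < total,
    }
```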

4.4 Rate Limiting

| Endpoint type | Limit |
| --- | --- |
| Standard API | 100/min |
| RAG Query | 30/min |
| Upload | 10/min |
| Training Start | 5/hour |
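
The source does not say how these limits are enforced. One possibility is an in-memory sliding-window limiter, sketched below. The class name, the endpoint-type keys, and the choice to count per user are all assumptions, and a deployment with several workers would need shared storage instead of process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

# (max requests, window in seconds), taken from the limits listed above
LIMITS: Dict[str, Tuple[int, float]] = {
    "standard": (100, 60.0),
    "rag_query": (30, 60.0),
    "upload": (10, 60.0),
    "training_start": (5, 3600.0),
}


class SlidingWindowLimiter:
    """Counts requests per (user, endpoint type) within a moving time window."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, endpoint_type: str) -> bool:
        max_requests, window = LIMITS[endpoint_type]
        now = self._clock()
        hits = self._hits[(user_id, endpoint_type)]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= window:
            hits.popleft()
        if len(hits) >= max_requests:
            return False
        hits.append(now)
        return True
```

A rejected request would typically be answered with HTTP 429, which the status code table in 4.2 does not list yet.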

5. Frontend Architecture

5.1 Directory Structure

website/
├── app/
│   ├── admin/
│   │   ├── training/
│   │   │   └── page.tsx          # Training Dashboard
│   │   ├── zeugnisse-crawler/
│   │   │   └── page.tsx          # Crawler Admin
│   │   ├── rag/
│   │   │   └── page.tsx          # RAG Admin
│   │   └── uni-crawler/
│   │       └── page.tsx          # Uni Crawler
│   │
│   ├── zeugnisse/
│   │   └── page.tsx              # Teacher frontend
│   │
│   └── api/
│       └── admin/
│           ├── zeugnisse-crawler/
│           │   └── route.ts      # API Proxy
│           └── training/
│               └── route.ts      # API Proxy
│
├── components/
│   ├── ui/                       # Base components
│   └── shared/                   # Shared components
│
└── lib/
    ├── api.ts                    # API client
    └── utils.ts                  # Helper functions

5.2 Component Hierarchy

App
├── Layout
│   ├── Header
│   │   ├── Navigation
│   │   └── UserMenu
│   └── Sidebar
│
├── Training Dashboard Page
│   ├── StatsCards
│   ├── TrainingJobCard
│   │   ├── ProgressRing
│   │   ├── MetricCards
│   │   └── LossChart
│   ├── DatasetOverview
│   └── NewTrainingModal (Wizard)
│
├── Zeugnisse Crawler Page
│   ├── StatsCards
│   ├── BundeslandTable
│   ├── DocumentList
│   └── CrawlerControls
│
└── Teacher Zeugnisse Page
    ├── OnboardingWizard
    ├── ChatInterface
    │   └── MessageList
    ├── SearchInterface
    │   └── SearchResults
    └── DocumentBrowser

5.3 State Management

Local State (useState)

  • UI state (modals, tabs)
  • Form inputs
  • Local filters

Server State (SWR/Fetch)

  • API data with polling
  • Caching
  • Revalidation

Persisted State (localStorage)

  • User preferences
  • Recent searches
  • Wizard status

5.4 Styling Conventions

// Tailwind CSS class names
const buttonPrimary = "px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition"
const buttonSecondary = "px-4 py-2 bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200 transition"
const card = "bg-white dark:bg-gray-800 rounded-xl shadow-lg border border-gray-200 dark:border-gray-700"
const input = "px-3 py-2 bg-gray-100 dark:bg-gray-900 border-0 rounded-lg focus:ring-2 focus:ring-blue-500"

6. Security & Compliance

6.1 Authentication & Authorization

JWT Validation

import os

import jwt  # PyJWT
from fastapi import HTTPException

JWT_SECRET = os.environ["JWT_SECRET"]

def verify_jwt(token: str) -> dict:
    try:
        payload = jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

RBAC Roles

| Role | Permissions (+ = in addition to the role above) |
| --- | --- |
| teacher | Create exams, upload EH, Zeugnis assistant |
| admin | + Control crawler, start training |
| superadmin | + System configuration |
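
Because each role adds to the one before it, a permission check can walk the role order. The sketch below shows one way to do that. The permission strings are hypothetical labels for the table entries and are not identifiers from the codebase.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Roles in ascending order; each role inherits from all roles before it.
ROLE_ORDER: Tuple[str, ...] = ("teacher", "admin", "superadmin")

# Hypothetical permission names for the entries of the role table.
OWN_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "teacher": frozenset({"klausur:create", "eh:upload", "zeugnis:assistant"}),
    "admin": frozenset({"crawler:control", "training:start"}),
    "superadmin": frozenset({"system:configure"}),
}


def permissions_for(role: str) -> FrozenSet[str]:
    """Own permissions plus everything inherited from lower roles."""
    if role not in ROLE_ORDER:
        return frozenset()  # unknown roles grant nothing
    inherited = ROLE_ORDER[: ROLE_ORDER.index(role) + 1]
    return frozenset().union(*(OWN_PERMISSIONS[r] for r in inherited))


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """True if any role from the JWT `roles` claim grants the permission."""
    return any(permission in permissions_for(role) for role in roles)
```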

6.2 Data Encryption

AES-256-GCM for EH documents

import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# derive_key() and hash_key() are project helpers that are not shown here.

def encrypt_text(plaintext: str, passphrase: str) -> tuple:
    salt = os.urandom(16)
    iv = os.urandom(12)
    key = derive_key(passphrase, salt)
    cipher = AESGCM(key)
    ciphertext = cipher.encrypt(iv, plaintext.encode(), None)
    return base64.b64encode(salt + iv + ciphertext).decode(), hash_key(key)
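
The function above relies on derive_key, which the source does not show, and it packs salt, IV and ciphertext into a single Base64 string. The sketch below shows one plausible key derivation and the matching unpacking step. PBKDF2-HMAC-SHA256 and the iteration count are assumptions, and split_blob is a hypothetical name.

```python
import base64
import hashlib
from typing import Tuple

SALT_LEN = 16  # matches os.urandom(16) in encrypt_text
IV_LEN = 12    # matches os.urandom(12) in encrypt_text
TAG_LEN = 16   # AES-GCM appends a 16-byte authentication tag to the ciphertext


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (AES-256) key; the KDF and iteration count are assumed."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32)


def split_blob(encoded: str) -> Tuple[bytes, bytes, bytes]:
    """Split base64(salt + iv + ciphertext) back into its three parts."""
    raw = base64.b64decode(encoded)
    if len(raw) < SALT_LEN + IV_LEN + TAG_LEN:
        raise ValueError("encrypted blob is too short")
    salt = raw[:SALT_LEN]
    iv = raw[SALT_LEN:SALT_LEN + IV_LEN]
    ciphertext = raw[SALT_LEN + IV_LEN:]
    return salt, iv, ciphertext
```

Decryption would then re-derive the key from the passphrase and the extracted salt and call AESGCM(key).decrypt(iv, ciphertext, None), which raises an exception if the passphrase is wrong or the data was modified.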

6.3 Tenant Isolation

  • All database queries filter by tenant_id
  • Qdrant searches use a tenant_id filter
  • MinIO paths contain the tenant_id
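
For the Qdrant part, the tenant restriction can be expressed as a payload filter in Qdrant's JSON filter format. The sketch below builds such a filter as a plain dict. The function name is hypothetical, and the optional subject condition is an illustration based on the bp_eh payload schema.

```python
from typing import Any, Dict, List, Optional


def tenant_filter(tenant_id: str, subject: Optional[str] = None) -> Dict[str, List[Dict[str, Any]]]:
    """Build a Qdrant filter that only matches points of the given tenant."""
    if not tenant_id:
        # Never fall back to an unfiltered search.
        raise ValueError("tenant_id is required")
    must: List[Dict[str, Any]] = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    if subject is not None:
        must.append({"key": "subject", "match": {"value": subject}})
    return {"must": must}
```

Taking tenant_id from the verified JWT rather than from request parameters keeps callers from searching another tenant's data.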

6.4 Audit Trail

async def log_event(event_type: str, resource_id: str, user_id: str, details: dict):
    await log_zeugnis_event(
        document_id=resource_id,
        event_type=event_type,
        user_id=user_id,
        details=details,
    )

6.5 GDPR (DSGVO) Compliance

  • Data export function
  • Deletion requests
  • Consent tracking
  • Logging of all access

7. Testing Strategy

7.1 Test Pyramid

           /\
          /  \
         / E2E \       <- 5% (Critical Paths)
        /------\
       /  Integ  \     <- 25% (API, DB)
      /----------\
     /    Unit    \    <- 70% (Functions)
    /--------------\

7.2 Unit Tests (Python)

Location: klausur-service/backend/tests/

# tests/test_zeugnis_models.py
import pytest
from zeugnis_models import (
    LicenseType, get_training_allowed, get_bundesland_name
)

class TestTrainingPermissions:
    def test_niedersachsen_allows_training(self):
        assert get_training_allowed("ni") == True

    def test_berlin_disallows_training(self):
        assert get_training_allowed("be") == False

    def test_unknown_bundesland_disallows(self):
        assert get_training_allowed("xx") == False


class TestBundeslandNames:
    def test_valid_code_returns_name(self):
        assert get_bundesland_name("ni") == "Niedersachsen"

    def test_invalid_code_returns_code(self):
        assert get_bundesland_name("xx") == "xx"

# tests/test_zeugnis_crawler.py
import pytest
from zeugnis_crawler import chunk_text, compute_hash, extract_text_from_pdf

class TestChunking:
    def test_short_text_single_chunk(self):
        text = "Dies ist ein kurzer Text."
        chunks = chunk_text(text, chunk_size=100)
        assert len(chunks) == 1

    def test_long_text_multiple_chunks(self):
        text = "A" * 2000
        chunks = chunk_text(text, chunk_size=500, overlap=50)
        assert len(chunks) > 1

    def test_overlap_preserved(self):
        text = "ABCDE" * 200
        chunks = chunk_text(text, chunk_size=100, overlap=20)
        for i in range(1, len(chunks)):
            assert chunks[i][:20] == chunks[i-1][-20:]


class TestHashing:
    def test_same_content_same_hash(self):
        content = b"Hello World"
        assert compute_hash(content) == compute_hash(content)

    def test_different_content_different_hash(self):
        assert compute_hash(b"Hello") != compute_hash(b"World")
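
The chunking tests above pin down the expected behaviour: short text yields a single chunk, and each chunk starts with the last `overlap` characters of its predecessor. A character-based sliding window satisfies them, as in the sketch below. It is not the project's actual chunk_text, and the default values are assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 0) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks: List[str] = []
    step = chunk_size - overlap  # distance between two chunk starts
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```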

7.3 Integration Tests

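# tests/test_zeugnis_api_integration.py
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient
from main import app

# Async fixtures must be declared with pytest_asyncio.fixture in strict mode.
@pytest_asyncio.fixture
async def client():
    # Current httpx versions reach the ASGI app through an explicit transport.
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac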

@pytest.mark.asyncio
class TestZeugnisAPI:
    async def test_get_sources_returns_list(self, client):
        response = await client.get("/api/v1/admin/zeugnis/sources")
        assert response.status_code == 200
        assert isinstance(response.json(), list)

    async def test_start_crawler_without_running(self, client):
        response = await client.post(
            "/api/v1/admin/zeugnis/crawler/start",
            json={"bundesland": "ni"}
        )
        assert response.status_code == 200

    async def test_start_crawler_while_running_fails(self, client):
        # First start
        await client.post("/api/v1/admin/zeugnis/crawler/start")
        # Second start should fail
        response = await client.post("/api/v1/admin/zeugnis/crawler/start")
        assert response.status_code == 409

7.4 E2E Tests (Playwright)

// tests/e2e/zeugnisse.spec.ts
import { test, expect } from '@playwright/test'

test.describe('Zeugnis-Assistent', () => {
  test('onboarding wizard completes successfully', async ({ page }) => {
    await page.goto('/zeugnisse')

    // Step 1: Welcome
    await expect(page.locator('h2')).toContainText('Willkommen')
    await page.click('button:has-text("Weiter")')

    // Step 2: Select Bundesland
    await page.click('button:has-text("Niedersachsen")')
    await page.click('button:has-text("Weiter")')

    // Step 3: Select Schulform
    await page.click('button:has-text("Gymnasium")')
    await page.click('button:has-text("Weiter")')

    // Step 4: Complete
    await page.click('button:has-text("Loslegen")')

    // Verify main interface
    await expect(page.locator('h1')).toContainText('Zeugnis-Assistent')
  })

  test('chat interface responds to questions', async ({ page }) => {
    // Skip wizard (set localStorage)
    await page.goto('/zeugnisse')
    await page.evaluate(() => {
      localStorage.setItem('zeugnis-preferences', JSON.stringify({
        bundesland: 'ni',
        schulform: 'gymnasium',
        hasSeenWizard: true,
      }))
    })
    await page.reload()

    // Send message
    await page.fill('textarea', 'Wie schreibe ich Bemerkungen?')
    await page.click('button[type="submit"]')

    // Wait for response
    await expect(page.locator('.bg-white.rounded-2xl').last())
      .toContainText('Bemerkung', { timeout: 10000 })
  })
})

7.5 Running Tests

# Unit Tests (Python)
cd klausur-service/backend
pytest -v tests/

# With coverage
pytest --cov=. --cov-report=html tests/

# E2E Tests (Playwright)
cd website
npx playwright test

# All tests
./run-tests.sh

8. Deployment & Operations

8.1 Docker Compose

# docker-compose.yml (excerpt)
services:
  klausur-service:
    build:
      context: ./klausur-service
      dockerfile: Dockerfile
    ports:
      - "8086:8086"
    environment:
      - JWT_SECRET=${JWT_SECRET}
      - QDRANT_URL=http://qdrant:6333
      - MINIO_ENDPOINT=minio:9000
      # Injected from the environment so that no credentials are hardcoded
      - DATABASE_URL=${DATABASE_URL}
    depends_on:
      - qdrant
      - minio
      - postgres
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 30s
      timeout: 10s
      retries: 3

8.2 Dockerfile (Klausur-Service)

FROM python:3.11-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code
COPY backend/ .

# Create directories
RUN mkdir -p /app/uploads /app/eh-uploads

EXPOSE 8086

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8086"]

8.3 Monitoring

Health Checks

@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "service": "klausur-service",
        "version": "2.0",
        "timestamp": datetime.now().isoformat(),
    }

@app.get("/health/detailed")
async def health_detailed():
    # Check dependencies
    qdrant_ok = await check_qdrant()
    postgres_ok = await check_postgres()
    minio_ok = await check_minio()

    return {
        "status": "healthy" if all([qdrant_ok, postgres_ok, minio_ok]) else "degraded",
        "dependencies": {
            "qdrant": "ok" if qdrant_ok else "error",
            "postgres": "ok" if postgres_ok else "error",
            "minio": "ok" if minio_ok else "error",
        }
    }

Prometheus Metrics

from prometheus_client import Counter, Histogram

# Metrics
request_count = Counter('klausur_requests_total', 'Total requests', ['endpoint', 'method'])
request_latency = Histogram('klausur_request_latency_seconds', 'Request latency', ['endpoint'])
training_jobs = Counter('klausur_training_jobs_total', 'Training jobs', ['status'])

8.4 Logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger("klausur-service")

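# Structured logging: fields passed via `extra` become attributes of the LogRecord.
# The format string above does not print them; that requires a custom formatter.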
logger.info("Training started", extra={
    "job_id": job_id,
    "bundeslaender": config.bundeslaender,
    "epochs": config.epochs,
})

9. Development Guidelines

9.1 Code Style

Python

# Imports: Standard → Third-Party → Local
import os
from datetime import datetime
from typing import Any, Dict, List, Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from zeugnis_models import LicenseType

# Docstrings: Google Style
def process_document(content: bytes, doc_type: str) -> dict:
    """Process a document for indexing.

    Args:
        content: Raw document bytes.
        doc_type: Type of document (pdf, html).

    Returns:
        Dict with extracted text and metadata.

    Raises:
        ValueError: If doc_type is not supported.
    """
    pass

# Type hints: always use them
async def get_sources(
    bundesland: Optional[str] = None,
    limit: int = 100,
) -> List[Dict[str, Any]]:
    pass

TypeScript

// Prefer interfaces over type aliases
interface TrainingJob {
  id: string
  name: string
  status: TrainingStatus
}

// Props interface for components
interface TrainingCardProps {
  job: TrainingJob
  onPause: () => void
  onResume: () => void
}

// Function components with explicit types
export function TrainingCard({ job, onPause, onResume }: TrainingCardProps) {
  return (...)
}

9.2 Git Workflow

main
  │
  ├── develop
  │     │
  │     ├── feature/zeugnis-crawler
  │     ├── feature/training-dashboard
  │     └── fix/crawler-retry
  │
  └── release/v2.0

Commit Messages

feat(zeugnis): add rights-aware crawler

- Implement PDF/HTML text extraction
- Add training_allowed flag per bundesland
- Create audit trail for document access

Closes #123

9.3 Review Checklist

  • Tests present and passing
  • Documentation updated
  • Type hints/interfaces complete
  • No hardcoded credentials
  • Error handling implemented
  • Logging in place
  • Performance acceptable

9.4 Versioning

Semantic Versioning: MAJOR.MINOR.PATCH

  • MAJOR: Breaking changes
  • MINOR: New features (backward compatible)
  • PATCH: Bug fixes
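
The bump rules can be made explicit in a few lines. This is a sketch with hypothetical function names that handles only plain MAJOR.MINOR.PATCH strings, without pre-release or build suffixes.

```python
import re
from typing import Tuple

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":
        return f"{major + 1}.0.0"  # breaking change resets minor and patch
    if change == "minor":
        return f"{major}.{minor + 1}.0"  # new feature resets patch
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Comparing the parsed tuples rather than the raw strings ensures that 2.10.0 sorts after 2.9.5.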

Appendix A: Environment Variables

# Authentication
JWT_SECRET=your-super-secret-key

# Databases
DATABASE_URL=postgres://user:pass@host:5432/db
QDRANT_URL=http://qdrant:6333
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=breakpilot
MINIO_SECRET_KEY=breakpilot123
MINIO_BUCKET=breakpilot-rag

# Embeddings
EMBEDDING_BACKEND=local  # or "openai"
OPENAI_API_KEY=sk-...    # only if openai

# Services
BACKEND_URL=http://backend:8000
SCHOOL_SERVICE_URL=http://school-service:8084

# Feature Flags
BYOEH_ENCRYPTION_ENABLED=true
BYOEH_CHUNK_SIZE=1000
BYOEH_CHUNK_OVERLAP=200
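
The feature flags above could be read and validated once at startup. The sketch below uses a hypothetical settings class and loader. Treating the listed values as defaults is an assumption, as is the set of strings accepted as true.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ByoehSettings:
    encryption_enabled: bool
    chunk_size: int
    chunk_overlap: int


def load_byoeh_settings(env: Mapping[str, str] = os.environ) -> ByoehSettings:
    """Read the BYOEH_* variables and fail early on inconsistent values."""
    enabled = env.get("BYOEH_ENCRYPTION_ENABLED", "true").strip().lower() in {"1", "true", "yes"}
    chunk_size = int(env.get("BYOEH_CHUNK_SIZE", "1000"))
    chunk_overlap = int(env.get("BYOEH_CHUNK_OVERLAP", "200"))
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("BYOEH_CHUNK_OVERLAP must be >= 0 and smaller than BYOEH_CHUNK_SIZE")
    return ByoehSettings(enabled, chunk_size, chunk_overlap)
```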

Appendix B: Quick Reference

API Base URLs

| Environment | URL |
| --- | --- |
| Local | http://localhost:8086 |
| Development | https://dev.breakpilot.app |
| Production | https://api.breakpilot.app |

Important Commands

# Start the service
docker-compose up -d klausur-service

# Show logs
docker logs -f breakpilot-pwa-klausur-service

# Run tests
docker exec klausur-service pytest tests/

# DB migration
docker exec postgres psql -U breakpilot -d breakpilot_db -f /migration.sql

Last updated: January 2025