Klausur Module (Exam Module) - Complete Developer Specification

Version: 2.0 | Date: January 2025 | Author: BreakPilot Development Team

1. Overview

1.1 Module Description

The Klausur module is a comprehensive system for the digital correction, grading, and management of Abitur and pre-Abitur (Vorabitur) exams. It consists of the following core components:

| Component | Description | Technology |
| --- | --- | --- |
| Klausur-Service Backend | Main service for all exam operations | Python FastAPI |
| BYOEH (Bring Your Own EH) | Management of expected-answer documents (Erwartungshorizont, EH) with RAG | Qdrant, MinIO |
| Zeugnis module | Report-card (Zeugnis) regulations and AI assistant | Crawler, embeddings |
| Training module | AI model training and monitoring | Background tasks |
| Frontend | Admin and teacher interfaces | Next.js, React |

1.2 Technology Stack

┌─────────────────────────────────────────────────────────────┐
│                      Frontend Layer                          │
│   Next.js 15 │ React 18 │ TypeScript │ Tailwind CSS         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     API Gateway Layer                        │
│        Next.js API Routes │ Server-Side Proxy               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Backend Services                          │
│  Klausur-Service (FastAPI) │ Port 8086                      │
│  ├── main.py (Klausur CRUD, BYOEH)                          │
│  ├── admin_api.py (NiBiS Ingestion)                         │
│  ├── zeugnis_api.py (Zeugnisse Crawler)                     │
│  ├── training_api.py (Training Management)                  │
│  └── metrics_db.py (PostgreSQL Operations)                  │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
┌──────────────────┐ ┌──────────────┐ ┌────────────────┐
│    PostgreSQL    │ │    Qdrant    │ │     MinIO      │
│    (Metadata)    │ │  (Vectors)   │ │  (Documents)   │
│    Port 5432     │ │  Port 6333   │ │  Port 9000     │
└──────────────────┘ └──────────────┘ └────────────────┘

1.3 Core Features

Exam management
- Create and edit exams
- Upload student papers
- Criteria-based grading
- Generate assessment reports (Gutachten)
BYOEH - expected-answer documents (Erwartungshorizont)
- Upload and encrypt EH documents
- Chunking and embedding generation
- RAG-based search
- Tenant isolation
Zeugnisse (report cards)
- Rights-aware crawler for regulations
- AI assistant for teachers
- Search specific to each federal state (Bundesland)
Training
- Model training with monitoring
- Hyperparameter configuration
- Version management

2. System Architecture

2.1 Microservice Architecture

                    ┌───────────────────┐
                    │   Load Balancer   │
                    │     (Nginx)       │
                    └─────────┬─────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Website     │    │   Backend     │    │ Klausur-Svc   │
│  (Next.js)    │    │  (FastAPI)    │    │  (FastAPI)    │
│   Port 3000   │    │  Port 8000    │    │  Port 8086    │
└───────────────┘    └───────────────┘    └───────────────┘
        │                     │                     │
        └─────────────────────┴─────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   Service Mesh    │
                    │   (Docker Net)    │
                    └─────────┬─────────┘
                              │
        ┌─────────────┬───────┴───────┬─────────────┐
        ▼             ▼               ▼             ▼
   PostgreSQL      Qdrant          MinIO       Mailpit

2.2 Data Flow

2.2.1 Exam Correction Flow

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Upload   │────▶│   OCR    │────▶│ Analysis │────▶│ Criteria │
│ Paper    │     │(external)│     │   (LLM)  │     │ Grading  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                         │
                                                         ▼
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Export  │◀────│ Finalize │◀────│ Generate │◀────│ RAG-EH   │
│   PDF    │     │          │     │  Report  │     │  Query   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘

2.2.2 Zeugnis Crawler Flow

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Seed URL │────▶│  Fetch   │────▶│ Extract  │────▶│ Check    │
│ (Config) │     │  HTTP    │     │ PDF/HTML │     │ Rights   │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
                                                         │
                                         ┌───────────────┤
                                         ▼               ▼
                                   ┌──────────┐   ┌──────────┐
                                   │  MinIO   │   │  Qdrant  │
                                   │ (Store)  │   │ (Index)  │
                                   └──────────┘   └──────────┘

2.3 Component Details

2.3.1 Klausur-Service (main.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/v1/klausuren` | GET | List all exams |
| `/api/v1/klausuren` | POST | Create a new exam |
| `/api/v1/klausuren/{id}` | GET | Exam details |
| `/api/v1/klausuren/{id}` | PUT | Update an exam |
| `/api/v1/klausuren/{id}` | DELETE | Delete an exam |
| `/api/v1/klausuren/{id}/students` | POST | Add a student paper |
| `/api/v1/students/{id}/criteria` | PUT | Grade criteria |
| `/api/v1/students/{id}/gutachten` | PUT | Save the assessment report (Gutachten) |
| `/api/v1/students/{id}/gutachten/generate` | POST | Generate the assessment report |

2.3.2 BYOEH (eh_pipeline.py, qdrant_service.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/v1/eh/upload` | POST | Upload an EH |
| `/api/v1/eh/{id}/index` | POST | Index an EH |
| `/api/v1/eh/rag-query` | POST | RAG search |
| `/api/v1/eh/{id}/share` | POST | Share an EH |
| `/api/v1/eh/{id}/link-klausur` | POST | Link an EH to an exam |

2.3.3 Zeugnis Module (zeugnis_api.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/v1/admin/zeugnis/sources` | GET | Sources per federal state |
| `/api/v1/admin/zeugnis/crawler/start` | POST | Start the crawler |
| `/api/v1/admin/zeugnis/crawler/stop` | POST | Stop the crawler |
| `/api/v1/admin/zeugnis/documents` | GET | Retrieve documents |
| `/api/v1/admin/zeugnis/stats` | GET | Statistics |

2.3.4 Training Module (training_api.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/v1/admin/training/jobs` | GET | List training jobs |
| `/api/v1/admin/training/jobs` | POST | Start training |
| `/api/v1/admin/training/jobs/{id}/pause` | POST | Pause |
| `/api/v1/admin/training/jobs/{id}/resume` | POST | Resume |
| `/api/v1/admin/training/models` | GET | Model versions |

3. Data Models

3.1 PostgreSQL Schema

3.1.1 Core Tables (metrics_db.py)

-- RAG Feedback
CREATE TABLE rag_search_feedback (
    id SERIAL PRIMARY KEY,
    result_id VARCHAR(255) NOT NULL,
    query_text TEXT,
    collection_name VARCHAR(100),
    score FLOAT,
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    notes TEXT,
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- RAG Search Logs
CREATE TABLE rag_search_logs (
    id SERIAL PRIMARY KEY,
    query_text TEXT NOT NULL,
    collection_name VARCHAR(100),
    result_count INTEGER,
    latency_ms INTEGER,
    top_score FLOAT,
    filters JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Relevance judgments for precision/recall
CREATE TABLE rag_relevance_judgments (
    id SERIAL PRIMARY KEY,
    query_id VARCHAR(255) NOT NULL,
    query_text TEXT NOT NULL,
    result_id VARCHAR(255) NOT NULL,
    result_rank INTEGER,
    is_relevant BOOLEAN NOT NULL,
    collection_name VARCHAR(100),
    user_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);
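
The `rag_relevance_judgments` table stores one relevance label per query result. As a rough sketch of how precision and recall at a cutoff `k` could be derived from such rows: the function names are hypothetical, rows are assumed to be dicts with the column names above, ranks are assumed to start at 1, and recall is measured only against the results that were actually judged.

```python
from typing import Mapping, Sequence


def precision_at_k(judgments: Sequence[Mapping], k: int) -> float:
    """Share of the judged results ranked within the top k that are relevant."""
    top = [j for j in judgments if j["result_rank"] <= k]
    if not top:
        return 0.0
    return sum(1 for j in top if j["is_relevant"]) / len(top)


def recall_at_k(judgments: Sequence[Mapping], k: int) -> float:
    """Share of all judged-relevant results that appear within the top k."""
    relevant = [j for j in judgments if j["is_relevant"]]
    if not relevant:
        return 0.0
    return sum(1 for j in relevant if j["result_rank"] <= k) / len(relevant)


# Example rows for a single query_id (illustrative values only)
rows = [
    {"result_rank": 1, "is_relevant": True},
    {"result_rank": 2, "is_relevant": False},
    {"result_rank": 3, "is_relevant": True},
    {"result_rank": 4, "is_relevant": True},
]
```

With these rows, one of the top two results is relevant, and one of the three relevant results is found in the top two.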

3.1.2 Zeugnis Tables

-- Federal state (Bundesland) sources
CREATE TABLE zeugnis_sources (
    id VARCHAR(36) PRIMARY KEY,
    bundesland VARCHAR(10) NOT NULL,
    name VARCHAR(255) NOT NULL,
    base_url TEXT,
    license_type VARCHAR(50) NOT NULL,
    training_allowed BOOLEAN DEFAULT FALSE,
    verified_by VARCHAR(100),
    verified_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Seed URLs
CREATE TABLE zeugnis_seed_urls (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    url TEXT NOT NULL,
    doc_type VARCHAR(50),
    status VARCHAR(20) DEFAULT 'pending',
    last_crawled TIMESTAMP,
    error_message TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Documents
CREATE TABLE zeugnis_documents (
    id VARCHAR(36) PRIMARY KEY,
    seed_url_id VARCHAR(36) REFERENCES zeugnis_seed_urls(id),
    title VARCHAR(500),
    url TEXT NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    training_allowed BOOLEAN DEFAULT FALSE,
    indexed_in_qdrant BOOLEAN DEFAULT FALSE,
    file_size INTEGER,
    content_type VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Document versions
CREATE TABLE zeugnis_document_versions (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    version INTEGER NOT NULL,
    content_hash VARCHAR(64),
    minio_path TEXT,
    change_summary TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
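
The `content_hash` columns make it possible to detect whether a re-crawled document has changed before writing a new row to `zeugnis_document_versions`. A minimal sketch, assuming the hash is a hex-encoded SHA-256 digest (which matches the 64-character column width, but is not confirmed here); `next_version` is a hypothetical helper name.

```python
import hashlib
from typing import Optional


def compute_hash(content: bytes) -> str:
    """Hex SHA-256 digest; 64 characters, fitting VARCHAR(64)."""
    return hashlib.sha256(content).hexdigest()


def next_version(content: bytes, last_hash: Optional[str], last_version: int) -> Optional[dict]:
    """Return the fields for a new version row, or None if nothing changed."""
    new_hash = compute_hash(content)
    if new_hash == last_hash:
        return None  # identical content: no new version needed
    return {"version": last_version + 1, "content_hash": new_hash}
```

A caller would pass the hash and version number of the most recent stored version and insert a new row only when the result is not `None`.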

-- Usage Events (Audit Trail)
CREATE TABLE zeugnis_usage_events (
    id VARCHAR(36) PRIMARY KEY,
    document_id VARCHAR(36) REFERENCES zeugnis_documents(id),
    event_type VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    details JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Crawler Queue
CREATE TABLE zeugnis_crawler_queue (
    id VARCHAR(36) PRIMARY KEY,
    source_id VARCHAR(36) REFERENCES zeugnis_sources(id),
    priority INTEGER DEFAULT 5,
    status VARCHAR(20) DEFAULT 'pending',
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    documents_found INTEGER DEFAULT 0,
    documents_indexed INTEGER DEFAULT 0,
    error_count INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

3.2 In-Memory Models (Python Dataclasses)

3.2.1 Exam Models

@dataclass
class StudentKlausur:
    id: str
    klausur_id: str
    student_name: str
    status: StudentKlausurStatus  # Enum
    criteria_scores: Dict[str, Dict]
    gutachten: Optional[Dict]
    file_path: Optional[str]
    ocr_text: Optional[str]
    created_at: datetime
    updated_at: datetime

@dataclass
class Klausur:
    id: str
    title: str
    subject: str
    modus: KlausurModus  # LANDES_ABITUR, VORABITUR
    year: int
    semester: str
    erwartungshorizont: Optional[Dict]
    students: List[StudentKlausur]
    created_at: datetime
    tenant_id: str

3.2.2 Zeugnis Models (zeugnis_models.py)

class LicenseType(str, Enum):
    PUBLIC_DOMAIN = "public_domain"
    CC_BY = "cc_by"
    CC_BY_SA = "cc_by_sa"
    CC_BY_NC = "cc_by_nc"
    GOV_STATUTE_FREE_USE = "gov_statute"
    ALL_RIGHTS_RESERVED = "all_rights"
    UNKNOWN_REQUIRES_REVIEW = "unknown"

class CrawlStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PAUSED = "paused"

class ZeugnisSource(BaseModel):
    id: str
    bundesland: str
    name: str
    base_url: Optional[str]
    license_type: LicenseType
    training_allowed: bool
    verified_by: Optional[str]
    verified_at: Optional[datetime]

3.3 Qdrant Collections

3.3.1 BYOEH Collection (bp_eh)

# Collection Config
collection_name = "bp_eh"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload Schema
{
    "tenant_id": str,           # Tenant isolation
    "eh_id": str,               # Erwartungshorizont (EH) ID
    "chunk_index": int,         # Position in the document
    "subject": str,             # School subject
    "encrypted_content": str,   # AES-256-GCM encrypted
    "training_allowed": bool,   # ALWAYS False for EH
}

3.3.2 Zeugnis Collection (bp_zeugnis)

# Collection Config
collection_name = "bp_zeugnis"
vector_size = 384  # all-MiniLM-L6-v2
distance = Distance.COSINE

# Payload Schema
{
    "document_id": str,
    "chunk_index": int,
    "chunk_text": str,          # Preview (max 500 chars)
    "bundesland": str,
    "doc_type": str,
    "title": str,
    "source_url": str,
    "training_allowed": bool,   # Inherited from the source
    "indexed_at": str,
}

3.4 MinIO Bucket Structure

breakpilot-rag/
├── landes-daten/
│   ├── {bundesland}/
│   │   └── zeugnis/
│   │       └── {year}/
│   │           └── {filename}.pdf
│   └── klausur/
│       └── {year}/
│           └── {subject}/
│               └── {filename}.pdf
│
└── lehrer-daten/
    └── {tenant_id}/
        └── {teacher_id}/
            └── {filename}.pdf.enc
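
Object keys that follow this layout could be built by small helper functions. The sketch below is an assumption about how such helpers might look (the function names and the allowed-character rule are hypothetical); it rejects path segments that could escape the tenant prefix.

```python
import re

# Assumption: identifiers consist only of letters, digits, dot, underscore, hyphen
_SAFE_SEGMENT = re.compile(r"^[A-Za-z0-9._-]+$")


def _segment(value: str) -> str:
    """Validate a single path segment so it cannot contain '/' or '..'."""
    if value in {".", ".."} or not _SAFE_SEGMENT.match(value):
        raise ValueError(f"unsafe path segment: {value!r}")
    return value


def teacher_object_key(tenant_id: str, teacher_id: str, filename: str) -> str:
    """Key for an encrypted teacher upload under lehrer-daten/."""
    return f"lehrer-daten/{_segment(tenant_id)}/{_segment(teacher_id)}/{_segment(filename)}.pdf.enc"


def zeugnis_object_key(bundesland: str, year: int, filename: str) -> str:
    """Key for a crawled Zeugnis document under landes-daten/."""
    return f"landes-daten/{_segment(bundesland)}/zeugnis/{int(year)}/{_segment(filename)}.pdf"
```

Validating each segment keeps the tenant prefix trustworthy, which matters because the path itself is part of the tenant isolation described in section 6.3.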

4. API Specification

4.1 Authentication

All API endpoints require JWT authentication:

Authorization: Bearer <jwt_token>

JWT payload:

{
  "sub": "user-id",
  "tenant_id": "school-id",
  "roles": ["teacher", "admin"],
  "exp": 1704067200
}
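
To make the token structure concrete, the following sketch signs and verifies a payload like the one above using only the standard library. It is for illustration: the service itself relies on a JWT library (see `jwt.decode` in section 6.1), the function names are hypothetical, and the header `alg` field is not checked here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Check signature and expiry; return the payload or raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload.get("exp", float("inf")) < (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

The `now` parameter exists only so that expiry handling can be tested without waiting for real time to pass.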

4.2 Error Handling

Standard error format

{
  "detail": "Description of the error",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T10:00:00Z"
}
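
A response body in this format could be produced by a single helper so that every endpoint reports errors consistently. This is a sketch under the assumption that timestamps are UTC with a trailing `Z`, as in the example above; `error_body` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional


def error_body(detail: str, code: str, now: Optional[datetime] = None) -> dict:
    """Build the standard error payload with a UTC ISO-8601 timestamp."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "detail": detail,
        "code": code,
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

In a FastAPI service, such a dict would typically be returned from an exception handler together with the matching HTTP status code.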

HTTP Status Codes

| Code | Meaning | Usage |
| --- | --- | --- |
| 200 | OK | Successful request |
| 201 | Created | Resource created |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Authentication missing |
| 403 | Forbidden | Insufficient permissions |
| 404 | Not Found | Resource not found |
| 409 | Conflict | Resource conflict |
| 422 | Unprocessable | Validation error |
| 500 | Internal Error | Server error |
| 503 | Unavailable | Service unavailable |

4.3 Pagination

GET /api/v1/resource?limit=20&offset=0

Response:

{
  "items": [...],
  "total": 100,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
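
The response envelope can be derived from `limit`, `offset`, and the total row count. A minimal in-memory sketch (the `paginate` name is hypothetical; a real endpoint would apply `LIMIT`/`OFFSET` in the database query instead of slicing a list):

```python
from typing import Sequence


def paginate(rows: Sequence, limit: int = 20, offset: int = 0) -> dict:
    """Slice rows and build the pagination envelope."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset >= 0")
    total = len(rows)
    items = list(rows[offset:offset + limit])
    return {
        "items": items,
        "total": total,
        "limit": limit,
        "offset": offset,
        # More pages exist if the end of this page is before the end of the data
        "has_more": offset + len(items) < total,
    }
```

Clients can request the next page by adding `limit` to `offset` for as long as `has_more` is true.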

4.4 Rate Limiting

| Endpoint type | Limit |
| --- | --- |
| Standard API | 100/min |
| RAG Query | 30/min |
| Upload | 10/min |
| Training Start | 5/hour |
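
The specification does not say which algorithm enforces these limits. One simple option is a fixed-window counter per client and endpoint type, sketched below; the class name, the limit keys, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# (max calls, window length in seconds), taken from the table above
LIMITS: Dict[str, Tuple[int, int]] = {
    "standard": (100, 60),
    "rag_query": (30, 60),
    "upload": (10, 60),
    "training_start": (5, 3600),
}


class FixedWindowLimiter:
    """Counts calls per (client, endpoint type) within fixed time windows."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._windows: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def allow(self, client_id: str, kind: str) -> bool:
        max_calls, period = LIMITS[kind]
        window = int(self._clock() // period)
        key = (client_id, kind)
        start, count = self._windows.get(key, (window, 0))
        if start != window:  # a new window has begun: reset the counter
            start, count = window, 0
        if count >= max_calls:
            return False
        self._windows[key] = (start, count + 1)
        return True
```

A fixed window is easy to reason about but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.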

5. Frontend Architecture

5.1 Directory Structure

website/
├── app/
│   ├── admin/
│   │   ├── training/
│   │   │   └── page.tsx          # Training Dashboard
│   │   ├── zeugnisse-crawler/
│   │   │   └── page.tsx          # Crawler Admin
│   │   ├── rag/
│   │   │   └── page.tsx          # RAG Admin
│   │   └── uni-crawler/
│   │       └── page.tsx          # Uni Crawler
│   │
│   ├── zeugnisse/
│   │   └── page.tsx              # Teacher frontend
│   │
│   └── api/
│       └── admin/
│           ├── zeugnisse-crawler/
│           │   └── route.ts      # API Proxy
│           └── training/
│               └── route.ts      # API Proxy
│
├── components/
│   ├── ui/                       # Base components
│   └── shared/                   # Shared components
│
└── lib/
    ├── api.ts                    # API client
    └── utils.ts                  # Helper functions

5.2 Component Hierarchy

App
├── Layout
│   ├── Header
│   │   ├── Navigation
│   │   └── UserMenu
│   └── Sidebar
│
├── Training Dashboard Page
│   ├── StatsCards
│   ├── TrainingJobCard
│   │   ├── ProgressRing
│   │   ├── MetricCards
│   │   └── LossChart
│   ├── DatasetOverview
│   └── NewTrainingModal (Wizard)
│
├── Zeugnisse Crawler Page
│   ├── StatsCards
│   ├── BundeslandTable
│   ├── DocumentList
│   └── CrawlerControls
│
└── Teacher Zeugnisse Page
    ├── OnboardingWizard
    ├── ChatInterface
    │   └── MessageList
    ├── SearchInterface
    │   └── SearchResults
    └── DocumentBrowser

5.3 State Management

Local State (useState)

- UI state (modals, tabs)
- Form inputs
- Local filters

Server State (SWR/Fetch)

- API data with polling
- Caching
- Revalidation

Persisted State (localStorage)

- User preferences
- Recent searches
- Wizard status

5.4 Styling Conventions

// Tailwind CSS class names
const buttonPrimary = "px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition"
const buttonSecondary = "px-4 py-2 bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200 transition"
const card = "bg-white dark:bg-gray-800 rounded-xl shadow-lg border border-gray-200 dark:border-gray-700"
const input = "px-3 py-2 bg-gray-100 dark:bg-gray-900 border-0 rounded-lg focus:ring-2 focus:ring-blue-500"

6. Security & Compliance

6.1 Authentication & Authorization

JWT Validation

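import jwt  # PyJWT
from fastapi import HTTPException

def verify_jwt(token: str) -> dict:
    """Decode and validate an HS256-signed JWT; raise HTTP 401 on failure."""
    try:
        # JWT_SECRET is the signing secret (see the JWT_SECRET environment variable)
        return jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")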

RBAC Roles

| Role | Permissions |
| --- | --- |
| teacher | Create exams, upload EH documents, use the Zeugnis assistant |
| admin | + control the crawler, start training |
| superadmin | + system configuration |
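
Because each role adds to the permissions of the one before it, the check can be modeled as a role hierarchy. The sketch below is one possible reading of the table; the permission identifiers and the `is_allowed` function are hypothetical and not taken from the service code.

```python
from typing import Iterable

# Higher level includes all permissions of the lower levels
ROLE_LEVEL = {"teacher": 1, "admin": 2, "superadmin": 3}

# Hypothetical permission names mapped to the minimum role from the table
REQUIRED_ROLE = {
    "klausur:create": "teacher",
    "eh:upload": "teacher",
    "zeugnis:assistant": "teacher",
    "crawler:control": "admin",
    "training:start": "admin",
    "system:configure": "superadmin",
}


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """True if any of the user's roles reaches the level the permission needs."""
    needed = ROLE_LEVEL[REQUIRED_ROLE[permission]]
    return any(ROLE_LEVEL.get(role, 0) >= needed for role in roles)
```

The `roles` argument corresponds to the `roles` claim of the JWT payload; unknown role names are treated as having no permissions.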

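6.2 Data Encryption

AES-256-GCM for EH documents

import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_text(plaintext: str, passphrase: str) -> tuple:
    """Encrypt text; return (base64-encoded blob, hash of the derived key)."""
    salt = os.urandom(16)  # random salt for key derivation
    iv = os.urandom(12)    # 96-bit nonce, the recommended size for GCM
    key = derive_key(passphrase, salt)
    cipher = AESGCM(key)
    ciphertext = cipher.encrypt(iv, plaintext.encode(), None)
    # Blob layout: salt (16 bytes) + IV (12 bytes) + ciphertext incl. GCM tag
    return base64.b64encode(salt + iv + ciphertext).decode(), hash_key(key)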
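
The helpers `derive_key` and `hash_key` are not shown in this specification. The following sketch is an assumption of how they might look, using PBKDF2-HMAC-SHA256 for key derivation and SHA-256 for the key hash; the iteration count is a placeholder, and `split_blob` is a hypothetical helper that reverses the blob layout used for the return value.

```python
import base64
import hashlib
from typing import Tuple

SALT_LEN = 16  # bytes, matches os.urandom(16)
IV_LEN = 12    # bytes, matches os.urandom(12)
TAG_LEN = 16   # bytes, GCM authentication tag appended to the ciphertext


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Assumption: PBKDF2-HMAC-SHA256 producing a 32-byte key for AES-256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000, dklen=32)


def hash_key(key: bytes) -> str:
    """Assumption: hex SHA-256 of the key, usable to check a passphrase later."""
    return hashlib.sha256(key).hexdigest()


def split_blob(encoded: str) -> Tuple[bytes, bytes, bytes]:
    """Split a base64 blob back into (salt, iv, ciphertext_with_tag)."""
    raw = base64.b64decode(encoded)
    if len(raw) < SALT_LEN + IV_LEN + TAG_LEN:
        raise ValueError("blob too short")
    return raw[:SALT_LEN], raw[SALT_LEN:SALT_LEN + IV_LEN], raw[SALT_LEN + IV_LEN:]
```

Decryption would then re-derive the key from the passphrase and the recovered salt and pass the recovered IV and ciphertext to `AESGCM.decrypt`.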

6.3 Tenant Isolation

- All database queries filter by tenant_id
- Qdrant searches use a tenant_id filter
- MinIO paths contain the tenant_id
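
For the Qdrant part, tenant isolation amounts to attaching a `must` condition on the `tenant_id` payload field to every search. The sketch below builds such a request body as a plain dict in the shape of Qdrant's REST search API; the helper names are hypothetical, and how the service actually issues the query is not shown here.

```python
from typing import Optional, Sequence


def tenant_filter(tenant_id: str, subject: Optional[str] = None) -> dict:
    """Filter that restricts results to one tenant (and optionally one subject)."""
    must = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    if subject is not None:
        must.append({"key": "subject", "match": {"value": subject}})
    return {"must": must}


def search_body(vector: Sequence[float], tenant_id: str, limit: int = 5,
                subject: Optional[str] = None) -> dict:
    """Search request body; refuses to build a query without a tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every search")
    return {
        "vector": list(vector),
        "limit": limit,
        "filter": tenant_filter(tenant_id, subject),
        "with_payload": True,
    }
```

Building the filter in one place makes it harder to issue an unfiltered search against the `bp_eh` collection by accident.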

6.4 Audit Trail

async def log_event(event_type: str, resource_id: str, user_id: str, details: dict):
    await log_zeugnis_event(
        document_id=resource_id,
        event_type=event_type,
        user_id=user_id,
        details=details,
    )

6.5 GDPR (DSGVO) Compliance

- Data export function
- Deletion requests
- Consent tracking
- Logging of all access

7. Testing Strategy

7.1 Test Pyramid

           /\
          /  \
         / E2E \       <- 5% (Critical Paths)
        /------\
       /  Integ  \     <- 25% (API, DB)
      /----------\
     /    Unit    \    <- 70% (Functions)
    /--------------\

7.2 Unit Tests (Python)

Location: klausur-service/backend/tests/

# tests/test_zeugnis_models.py
import pytest
from zeugnis_models import (
    LicenseType, get_training_allowed, get_bundesland_name
)

class TestTrainingPermissions:
    def test_niedersachsen_allows_training(self):
        assert get_training_allowed("ni") is True

    def test_berlin_disallows_training(self):
        assert get_training_allowed("be") is False

    def test_unknown_bundesland_disallows(self):
        assert get_training_allowed("xx") is False


class TestBundeslandNames:
    def test_valid_code_returns_name(self):
        assert get_bundesland_name("ni") == "Niedersachsen"

    def test_invalid_code_returns_code(self):
        assert get_bundesland_name("xx") == "xx"

# tests/test_zeugnis_crawler.py
import pytest
from zeugnis_crawler import chunk_text, compute_hash, extract_text_from_pdf

class TestChunking:
    def test_short_text_single_chunk(self):
        text = "Dies ist ein kurzer Text."
        chunks = chunk_text(text, chunk_size=100)
        assert len(chunks) == 1

    def test_long_text_multiple_chunks(self):
        text = "A" * 2000
        chunks = chunk_text(text, chunk_size=500, overlap=50)
        assert len(chunks) > 1

    def test_overlap_preserved(self):
        text = "ABCDE" * 200
        chunks = chunk_text(text, chunk_size=100, overlap=20)
        for i in range(1, len(chunks)):
            assert chunks[i][:20] == chunks[i-1][-20:]


class TestHashing:
    def test_same_content_same_hash(self):
        content = b"Hello World"
        assert compute_hash(content) == compute_hash(content)

    def test_different_content_different_hash(self):
        assert compute_hash(b"Hello") != compute_hash(b"World")
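
The tests above import `chunk_text` from `zeugnis_crawler`, whose implementation is not shown in this specification. As a minimal sketch of a chunker that would satisfy the three chunking tests, assuming a character-based sliding window; the default values are placeholders, not the service's actual settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window reached the end of the text
        start += step
    return chunks
```

Stopping as soon as a window reaches the end of the text avoids a trailing chunk that would consist only of already-covered overlap.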

7.3 Integration Tests

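# tests/test_zeugnis_api_integration.py
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient
from main import app

# Async fixtures need pytest-asyncio's own fixture decorator in strict mode
@pytest_asyncio.fixture
async def client():
    # Recent httpx versions expect the ASGI app to be passed via a transport
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac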

@pytest.mark.asyncio
class TestZeugnisAPI:
    async def test_get_sources_returns_list(self, client):
        response = await client.get("/api/v1/admin/zeugnis/sources")
        assert response.status_code == 200
        assert isinstance(response.json(), list)

    async def test_start_crawler_without_running(self, client):
        response = await client.post(
            "/api/v1/admin/zeugnis/crawler/start",
            json={"bundesland": "ni"}
        )
        assert response.status_code == 200

    async def test_start_crawler_while_running_fails(self, client):
        # First start
        await client.post("/api/v1/admin/zeugnis/crawler/start")
        # Second start should fail
        response = await client.post("/api/v1/admin/zeugnis/crawler/start")
        assert response.status_code == 409

7.4 E2E Tests (Playwright)

// tests/e2e/zeugnisse.spec.ts
import { test, expect } from '@playwright/test'

test.describe('Zeugnis-Assistent', () => {
  test('onboarding wizard completes successfully', async ({ page }) => {
    await page.goto('/zeugnisse')

    // Step 1: Welcome
    await expect(page.locator('h2')).toContainText('Willkommen')
    await page.click('button:has-text("Weiter")')

    // Step 2: Select Bundesland
    await page.click('button:has-text("Niedersachsen")')
    await page.click('button:has-text("Weiter")')

    // Step 3: Select Schulform
    await page.click('button:has-text("Gymnasium")')
    await page.click('button:has-text("Weiter")')

    // Step 4: Complete
    await page.click('button:has-text("Loslegen")')

    // Verify main interface
    await expect(page.locator('h1')).toContainText('Zeugnis-Assistent')
  })

  test('chat interface responds to questions', async ({ page }) => {
    // Skip wizard (set localStorage)
    await page.goto('/zeugnisse')
    await page.evaluate(() => {
      localStorage.setItem('zeugnis-preferences', JSON.stringify({
        bundesland: 'ni',
        schulform: 'gymnasium',
        hasSeenWizard: true,
      }))
    })
    await page.reload()

    // Send message
    await page.fill('textarea', 'Wie schreibe ich Bemerkungen?')
    await page.click('button[type="submit"]')

    // Wait for response
    await expect(page.locator('.bg-white.rounded-2xl').last())
      .toContainText('Bemerkung', { timeout: 10000 })
  })
})

7.5 Running Tests

# Unit Tests (Python)
cd klausur-service/backend
pytest -v tests/

# With coverage
pytest --cov=. --cov-report=html tests/

# E2E Tests (Playwright)
cd website
npx playwright test

# All tests
./run-tests.sh

8. Deployment & Operations

8.1 Docker Compose

# docker-compose.yml (excerpt)
services:
  klausur-service:
    build:
      context: ./klausur-service
      dockerfile: Dockerfile
    ports:
      - "8086:8086"
    environment:
      - JWT_SECRET=${JWT_SECRET}
      - QDRANT_URL=http://qdrant:6333
      - MINIO_ENDPOINT=minio:9000
      - DATABASE_URL=postgres://breakpilot:breakpilot123@postgres:5432/breakpilot_db
    depends_on:
      - qdrant
      - minio
      - postgres
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 30s
      timeout: 10s
      retries: 3

8.2 Dockerfile (Klausur-Service)

FROM python:3.11-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code
COPY backend/ .

# Create directories
RUN mkdir -p /app/uploads /app/eh-uploads

EXPOSE 8086

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8086"]

8.3 Monitoring

Health Checks

@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "service": "klausur-service",
        "version": "2.0",
        "timestamp": datetime.now().isoformat(),
    }

@app.get("/health/detailed")
async def health_detailed():
    # Check dependencies
    qdrant_ok = await check_qdrant()
    postgres_ok = await check_postgres()
    minio_ok = await check_minio()

    return {
        "status": "healthy" if all([qdrant_ok, postgres_ok, minio_ok]) else "degraded",
        "dependencies": {
            "qdrant": "ok" if qdrant_ok else "error",
            "postgres": "ok" if postgres_ok else "error",
            "minio": "ok" if minio_ok else "error",
        }
    }

Prometheus Metrics

from prometheus_client import Counter, Histogram

# Metrics
request_count = Counter('klausur_requests_total', 'Total requests', ['endpoint', 'method'])
request_latency = Histogram('klausur_request_latency_seconds', 'Request latency', ['endpoint'])
training_jobs = Counter('klausur_training_jobs_total', 'Training jobs', ['status'])

8.4 Logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger("klausur-service")

# Structured logging: the fields in `extra` are attached to the LogRecord;
# the format string above does not print them.
logger.info("Training started", extra={
    "job_id": job_id,
    "bundeslaender": config.bundeslaender,
    "epochs": config.epochs,
})
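
Fields passed via `extra` become attributes of the log record but only appear in the output if the formatter emits them. One way to do that, sketched here with the standard library only, is a formatter that serializes each record as JSON; `JsonFormatter` is a hypothetical name and not part of the service.

```python
import json
import logging

# Attribute names every LogRecord has; anything else was supplied via `extra`
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render a log record, including `extra` fields, as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Append the custom fields that were passed via `extra`
        payload.update({k: v for k, v in record.__dict__.items() if k not in _STANDARD_ATTRS})
        return json.dumps(payload, default=str)
```

The formatter would be attached with `handler.setFormatter(JsonFormatter())` on whichever handler the service uses.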

9. Development Guidelines

9.1 Code Style

Python

# Imports: Standard → Third-Party → Local
import os
from datetime import datetime
from typing import Any, Dict, List, Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from zeugnis_models import LicenseType

# Docstrings: Google Style
def process_document(content: bytes, doc_type: str) -> dict:
    """Process a document for indexing.

    Args:
        content: Raw document bytes.
        doc_type: Type of document (pdf, html).

    Returns:
        Dict with extracted text and metadata.

    Raises:
        ValueError: If doc_type is not supported.
    """
    pass

# Type hints: always use them
async def get_sources(
    bundesland: Optional[str] = None,
    limit: int = 100,
) -> List[Dict[str, Any]]:
    pass

TypeScript

// Prefer interfaces over type aliases
interface TrainingJob {
  id: string
  name: string
  status: TrainingStatus
}

// Props interface for components
interface TrainingCardProps {
  job: TrainingJob
  onPause: () => void
  onResume: () => void
}

// Function components with explicit types
export function TrainingCard({ job, onPause, onResume }: TrainingCardProps) {
  return (...)
}

9.2 Git Workflow

main
  │
  ├── develop
  │     │
  │     ├── feature/zeugnis-crawler
  │     ├── feature/training-dashboard
  │     └── fix/crawler-retry
  │
  └── release/v2.0

Commit Messages

feat(zeugnis): add rights-aware crawler

- Implement PDF/HTML text extraction
- Add training_allowed flag per bundesland
- Create audit trail for document access

Closes #123
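
The header line of this example follows a `type(scope): subject` pattern. A check for that pattern could look like the sketch below; the list of allowed types is an assumption (only `feat` and `fix` appear in this document), as is the function name.

```python
import re

# Assumption: allowed types; only "feat" and "fix" are confirmed by this document
_HEADER = re.compile(r"^(feat|fix|docs|refactor|test|chore)(\([a-z0-9-]+\))?: \S.*$")


def is_valid_header(line: str) -> bool:
    """True if the first line of a commit message matches type(scope): subject."""
    return bool(_HEADER.match(line))
```

Such a check could run in a commit-msg hook or in CI.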

9.3 Review Checklist

- Tests exist and pass
- Documentation updated
- Type hints/interfaces complete
- No hardcoded credentials
- Error handling implemented
- Logging in place
- Performance acceptable

9.4 Versioning

Semantic Versioning: MAJOR.MINOR.PATCH

- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes
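
The three rules translate directly into how a version number is incremented. A small sketch with hypothetical helper names:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":
        return f"{major + 1}.0.0"  # breaking change resets minor and patch
    if change == "minor":
        return f"{major}.{minor + 1}.0"  # new feature resets patch
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Parsing into integer tuples also gives correct ordering, so `parse_version("2.10.0")` compares greater than `parse_version("2.9.0")`.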

Appendix A: Environment Variables

# Authentication
JWT_SECRET=your-super-secret-key

# Databases
DATABASE_URL=postgres://user:pass@host:5432/db
QDRANT_URL=http://qdrant:6333
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=breakpilot
MINIO_SECRET_KEY=breakpilot123
MINIO_BUCKET=breakpilot-rag

# Embeddings
EMBEDDING_BACKEND=local  # or "openai"
OPENAI_API_KEY=sk-...    # only if EMBEDDING_BACKEND is "openai"

# Services
BACKEND_URL=http://backend:8000
SCHOOL_SERVICE_URL=http://school-service:8084

# Feature Flags
BYOEH_ENCRYPTION_ENABLED=true
BYOEH_CHUNK_SIZE=1000
BYOEH_CHUNK_OVERLAP=200
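
The BYOEH feature flags could be read and validated once at startup. The sketch below assumes the values listed above as defaults; the `ByoehSettings` class and `load_byoeh_settings` function are hypothetical and not taken from the service code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ByoehSettings:
    encryption_enabled: bool
    chunk_size: int
    chunk_overlap: int


def load_byoeh_settings(env: Mapping[str, str] = os.environ) -> ByoehSettings:
    """Read the BYOEH_* variables, falling back to the values from Appendix A."""
    settings = ByoehSettings(
        encryption_enabled=env.get("BYOEH_ENCRYPTION_ENABLED", "true").strip().lower()
        in {"1", "true", "yes"},
        chunk_size=int(env.get("BYOEH_CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("BYOEH_CHUNK_OVERLAP", "200")),
    )
    # An overlap at or above the chunk size would make chunking loop forever
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("BYOEH_CHUNK_OVERLAP must be >= 0 and < BYOEH_CHUNK_SIZE")
    return settings
```

Passing the environment as a mapping keeps the loader testable without modifying `os.environ`.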

Appendix B: Quick Reference

API Base URLs

| Environment | URL |
| --- | --- |
| Local | http://localhost:8086 |
| Development | https://dev.breakpilot.app |
| Production | https://api.breakpilot.app |

Important Commands

# Start the service
docker-compose up -d klausur-service

# Show logs
docker logs -f breakpilot-pwa-klausur-service

# Run tests
docker exec klausur-service pytest tests/

# DB migration
docker exec postgres psql -U breakpilot -d breakpilot_db -f /migration.sql

Last updated: January 2025
