Switch Vision-LLM Fusion to llama3.2-vision:11b

qwen2.5vl:32b needs ~100GB RAM and crashes Ollama. llama3.2-vision:11b is already installed and fits in memory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix: _merge_paddle_tesseract takes 2 args not 4
2026-04-24 00:44:59 +02:00 · 2026-04-24 00:33:49 +02:00 · 2026-04-24 00:24:22 +02:00 · 2026-04-23 16:55:01 +02:00 · 2026-04-23 16:40:39 +02:00 · 2026-04-23 16:18:44 +02:00
145 changed files with 23690 additions and 15136 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -256,3 +256,45 @@ ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && git push all
 | `website/app/admin/klausur-korrektur/` | Grading workspace |
 | `backend-lehrer/classroom_api.py` | Classroom Engine |
 | `backend-lehrer/state_engine_api.py` | State Engine |
 ---
 ## Code Quality Guardrails (NON-NEGOTIABLE)
 > Full details: `.claude/rules/architecture.md`
 > Exceptions: `.claude/rules/loc-exceptions.txt`
 ### File Size Budget
 - **Hard cap: 500 LOC** per file
 - If a change would push a file above 500 LOC: **split first, then change**
 - Exceptions only with a justification in `loc-exceptions.txt` plus the `[guardrail-change]` commit marker
 ### Architecture
 - **Python:** thin routes → business logic in services → persistence in repositories
 - **Go:** handlers ≤40 LOC → service layer → repository pattern
 - **TypeScript/Next.js:** thin page.tsx → move Server Actions, queries, and components out
 - **Types:** split a monolithic types.ts early; avoid types.ts + types/ shadowing
 ### Workflow (for every change)
 1. Read the file and check its LOC
 2. If it is close to the budget → split first
 3. Make the minimal coherent change
 4. Verify (tests + lint)
 5. Summarize: what changed, what was verified, residual risk
 ### Commit Markers
 - `[migration-approved]`: schema/migration changes
 - `[guardrail-change]`: changes to .claude/**, scripts/check-loc.sh
 - `[split-required]`: the change starts with a file split
 - `[interface-change]`: public API contracts changed
 ### Running the LOC Check
 ```bash
 bash scripts/check-loc.sh --changed   # changed files only
 bash scripts/check-loc.sh --all       # all files (shows every violation)
 ```
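The script `scripts/check-loc.sh` itself is not shown in this diff. As a rough illustration of the file size budget, the following Python sketch reports files above the 500 LOC hard cap and skips paths matched by an exception glob. The function names, the `fnmatch`-based glob matching, and the input shapes are assumptions for illustration and may differ from what the real script does.

```python
from fnmatch import fnmatch

HARD_CAP = 500  # hard cap from the guardrails above


def parse_exceptions(text):
    """Return the glob patterns from loc-exceptions-style text.

    Assumes the format `<glob> | owner=... | reason=... | review=...`;
    blank lines and `#` comments are ignored.
    """
    globs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        globs.append(line.split("|", 1)[0].strip())
    return globs


def find_violations(loc_by_path, exception_globs, cap=HARD_CAP):
    """Return (path, loc) pairs above the cap that match no exception glob."""
    violations = []
    for path, loc in sorted(loc_by_path.items()):
        if loc <= cap:
            continue
        if any(fnmatch(path, pattern) for pattern in exception_globs):
            continue
        violations.append((path, loc))
    return violations
```

`loc_by_path` is a plain mapping from file path to line count, so the counting step (for example via `git diff --name-only` plus a line count) stays outside this sketch.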
--- a/.claude/rules/architecture.md
+++ b/.claude/rules/architecture.md
@@ -0,0 +1,46 @@
 # Architecture Rule — BreakPilot Lehrer
 ## File Size Budget
 Hard default: **500 LOC max** per file.
 Soft targets:
 - Handler/Router/Service: 300-400 LOC
 - Models/Schemas/Types: 200-300 LOC
 - Utilities: 100-200 LOC
 Exceptions only in `.claude/rules/loc-exceptions.txt`, each with a justification.
 ## Split Triggers
 Split immediately when:
 - The file exceeds 500 LOC
 - The file would exceed 500 LOC after the change
 - The file mixes transport, business logic, and persistence
 - The file contains several independently testable responsibilities
 ## Python (backend-lehrer, klausur-service, voice-service)
 - Keep routes thin; business logic goes in services
 - Persistence goes in repositories/data-access modules
 - Split Pydantic schemas by domain
 - Avoid circular imports
 ## Go (school-service, edu-search-service)
 - Keep handlers thin (≤40 LOC)
 - Business logic goes in services/use cases
 - Keep transport/request decoding separate from domain logic
 ## TypeScript / Next.js (admin-lehrer, studio-v2, website)
 - Keep page.tsx thin; move Server Actions, queries, and forms out
 - Split a monolithic types.ts early
 - Avoid types.ts + types/ shadowing
 - Explicitly separate shared client/server types
 ## Decision Order
 1. Reuse an existing small, cohesive module
 2. Create a new module nearby
 3. Split the overcrowded file and put the new behavior in the appropriate split module
 4. Only as a last resort: extend a large existing file
--- a/.claude/rules/loc-exceptions.txt
+++ b/.claude/rules/loc-exceptions.txt
@@ -0,0 +1,20 @@
 # LOC Exceptions — BreakPilot Lehrer
 # Format: <glob> | owner=<person> | reason=<why> | review=<date>
 #
 # Every exception needs a justification and a review date.
 # Temporary exceptions must carry the [guardrail-change] commit marker.
 # Generated / Build Artifacts
 **/node_modules/** | owner=infra | reason=npm packages | review=permanent
 **/.next/** | owner=infra | reason=Next.js build output | review=permanent
 **/__pycache__/** | owner=infra | reason=Python bytecode | review=permanent
 **/venv/** | owner=infra | reason=Python virtualenv | review=permanent
 # Test files (may be larger to accommodate table-driven tests)
 **/tests/test_cv_vocab_pipeline.py | owner=klausur | reason=extensive OCR pipeline tests | review=2026-07-01
 **/tests/test_rbac.py | owner=klausur | reason=RBAC test matrix | review=2026-07-01
 **/tests/test_grid_editor_api.py | owner=klausur | reason=Grid Editor integration tests | review=2026-07-01
 # Legacy: TEMPORARY until the refactoring is complete
 # Files listed here are worked through phase by phase and then removed.
 # NO new exceptions without the [guardrail-change] commit marker!
--- a/.claude/rules/ocr-pipeline-extensions.md
+++ b/.claude/rules/ocr-pipeline-extensions.md
@@ -0,0 +1,237 @@
 # OCR Pipeline Extensions - Developer Documentation
 **Status:** Production
 **Last updated:** 2026-04-15
 **URL:** https://macmini:3002/ai/ocr-kombi
 ---
 ## Overview
 Extensions to the OCR Kombi pipeline (14 steps, 0-13):
 - **SmartSpellChecker**: LLM-free OCR correction with language detection
 - **Box-Grid-Review** (Step 11): processes embedded boxes
 - **Ansicht/Spreadsheet** (Step 12): Fortune Sheet Excel editor
 ---
 ## Pipeline Steps
 | Step | ID | Name | Component |
 |------|----|------|------------|
 | 0 | upload | Upload | StepUpload |
 | 1 | orientation | Orientierung | StepOrientation |
 | 2 | page-split | Seitentrennung | StepPageSplit |
 | 3 | deskew | Begradigung | StepDeskew |
 | 4 | dewarp | Entzerrung | StepDewarp |
 | 5 | content-crop | Zuschneiden | StepContentCrop |
 | 6 | ocr | OCR | StepOcr |
 | 7 | structure | Strukturerkennung | StepStructure |
 | 8 | grid-build | Grid-Aufbau | StepGridBuild |
 | 9 | grid-review | Grid-Review | StepGridReview |
 | 10 | gutter-repair | Wortkorrektur | StepGutterRepair |
 | **11** | **box-review** | **Box-Review** | **StepBoxGridReview** |
 | **12** | **ansicht** | **Ansicht** | **StepAnsicht** |
 | 13 | ground-truth | Ground Truth | StepGroundTruth |
 Step definitions: `admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts`
 ---
 ## SmartSpellChecker
 **File:** `klausur-service/backend/smart_spell.py`
 **Tests:** `tests/test_smart_spell.py` (43 tests)
 **License:** pyspellchecker only (MIT); no LLM, no Hunspell
 ### Features
 | Feature | Method |
 |---------|---------|
 | Language detection | Dual-dictionary EN/DE heuristic |
 | a/I disambiguation | Bigram context (next-word lookup) |
 | Boundary repair | Frequency-based: `Pound sand`→`Pounds and` |
 | Context split | `anew`→`a new` (allow/deny list) |
 | Multi-digit | BFS: `sch00l`→`school` |
 | Cross-language guard | Do not miscorrect DE words in an EN column |
 | Umlaut correction | `Schuler`→`Schueler` |
 | IPA protection | Never change content inside [brackets] |
 | Slash→l | `p/`→`pl` (italic l recognized as /) |
 | Abbreviations | 120+ from `_KNOWN_ABBREVIATIONS` |
 ### Integration
 ```python
 # In cv_review.py (LLM Review Step):
 from smart_spell import SmartSpellChecker
 _smart = SmartSpellChecker()
 result = _smart.correct_text(text, lang="en")  # or "de" or "auto"
 # In grid_editor_api.py (grid build + box build):
 # Runs automatically after the grid build and the box-grid build
 ```
 ### Frequency Scoring
 Boundary repair compares word-frequency products:
 - `old_freq = word_freq(w1) * word_freq(w2)`
 - `new_freq = word_freq(repaired_w1) * word_freq(repaired_w2)`
 - Accepted when `new_freq > old_freq * 5`
 - Abbreviation bonus only when the original words are rare (freq < 1e-6)
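The acceptance rule above can be sketched in a few lines. This is a minimal illustration, not the code in `smart_spell.py`: the function name, the `word_freq` callable, and the toy frequency values are all invented for the example, and the abbreviation bonus is left out.

```python
def accept_boundary_repair(word_freq, original, repaired, factor=5.0):
    """Return True when the repaired word pair is sufficiently more frequent.

    `word_freq` is any callable mapping a word to a relative frequency
    (a hypothetical stand-in for the checker's dictionary lookup).
    """
    w1, w2 = original
    r1, r2 = repaired
    old_freq = word_freq(w1) * word_freq(w2)
    new_freq = word_freq(r1) * word_freq(r2)
    return new_freq > old_freq * factor


# Toy frequencies, invented purely for illustration.
TOY_FREQ = {"pound": 1e-5, "sand": 2e-5, "pounds": 3e-5, "and": 2e-2}


def toy_word_freq(word):
    return TOY_FREQ.get(word.lower(), 0.0)


repair_ok = accept_boundary_repair(toy_word_freq, ("Pound", "sand"), ("Pounds", "and"))
reverse_ok = accept_boundary_repair(toy_word_freq, ("Pounds", "and"), ("Pound", "sand"))
```

With these toy values the repair `Pound sand`→`Pounds and` is accepted, while the reverse direction is rejected. The factor of 5 keeps near-ties from flipping already plausible word pairs.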
 ---
 ## Box-Grid-Review (Step 11)
 **Frontend:** `admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx`
 **Backend:** `klausur-service/backend/cv_box_layout.py`, `grid_editor_api.py`
 **Tests:** `tests/test_box_layout.py` (13 Tests)
 ### Backend-Endpoints
 ```
 POST /api/v1/ocr-pipeline/sessions/{id}/build-box-grids
 ```
 Processes all detected boxes from `structure_result`:
 1. Filters out header/footer boxes (top/bottom 7% of the image height)
 2. Extracts the OCR words per box from `raw_paddle_words`
 3. Classifies the layout: `flowing` | `columnar` | `bullet_list` | `header_only`
 4. Builds the grid with layout-specific logic
 5. Applies SmartSpellChecker
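Step 1 of the list above could look roughly like the following sketch. The box shape (`y`, `h` in pixels) and the decision to drop only boxes that lie entirely inside the margin band are assumptions; the source does not say whether the real check uses the box center, its top edge, or the whole box.

```python
def filter_header_footer_boxes(boxes, image_height, margin_ratio=0.07):
    """Drop boxes that lie entirely in the top or bottom margin band.

    Each box is a dict with `y` (top edge) and `h` (height) in pixels;
    this shape is an assumption for the sketch.
    """
    top_limit = image_height * margin_ratio
    bottom_limit = image_height * (1.0 - margin_ratio)
    kept = []
    for box in boxes:
        top, bottom = box["y"], box["y"] + box["h"]
        in_header = bottom <= top_limit
        in_footer = top >= bottom_limit
        if not (in_header or in_footer):
            kept.append(box)
    return kept
```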
 ### Box Layout Classification (`cv_box_layout.py`)
 | Layout | Detection | Grid build |
 |--------|-----------|-------------|
 | `header_only` | ≤5 words or 1 line | 1 cell, everything together |
 | `flowing` | Uniform line width | 1 column, bullet grouping by indentation |
 | `bullet_list` | ≥40% of lines with a bullet marker | 1 column, bullet items |
 | `columnar` | Multiple X clusters | Standard column detection |
 ### Bullet Indentation
 Detected via left-edge analysis:
 - Minimum indentation = bullet level
 - Lines indented >15px further = continuation lines
 - Continuation lines are merged into the bullet cell with `\n`
 - Missing `•` markers are added automatically
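The left-edge rules above can be illustrated with a small grouping function. This is a sketch under stated assumptions: the input is taken to be a list of `(left_x, text)` tuples in reading order, only `•` counts as an existing marker, and the function name is hypothetical rather than taken from `cv_box_layout.py`.

```python
def group_bullet_lines(lines, indent_px=15, marker="•"):
    """Group OCR lines into bullet items using left-edge indentation.

    `lines` is a list of (left_x, text) tuples in reading order; this
    input shape is an assumption for the sketch.
    """
    if not lines:
        return []
    bullet_x = min(x for x, _ in lines)  # minimum indentation = bullet level
    items = []
    for x, text in lines:
        is_continuation = (x - bullet_x) > indent_px
        if is_continuation and items:
            items[-1] += "\n" + text  # merge continuation into the bullet cell
        else:
            if not text.startswith(marker):
                text = f"{marker} {text}"  # add a missing bullet marker
            items.append(text)
    return items
```

A line indented 30px past the bullet level is appended to the previous item with `\n`, while a line only 1-2px off is treated as a new bullet and receives a marker if it lacks one.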
 ### Colspan Detection (`grid_editor_helpers.py`)
 Generic function `_detect_colspan_cells()`:
 - Runs after `_build_cells()` for ALL zones
 - Uses the original word blocks (before `_split_cross_column_words`)
 - A word block that extends across a column boundary → `spanning_header` with `colspan=N`
 - Example: "In Britain you pay with pounds and pence." across 2 columns
 ### Column Detection in Boxes
 For small zones (≤60 words):
 - `gap_threshold = max(median_h * 1.0, 25)` instead of `3x median`
 - PaddleOCR returns multi-word blocks → all gaps are column gaps
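The threshold choice for small zones can be written out as follows. This is a sketch only: it assumes that `median_h` is the median word height in pixels, that the zone's word count equals the number of heights passed in, and that `3x median` refers to three times that same median height.

```python
from statistics import median


def column_gap_threshold(word_heights, small_zone_max_words=60):
    """Pick the horizontal gap (px) above which a gap counts as a column gap."""
    median_h = median(word_heights)
    if len(word_heights) <= small_zone_max_words:
        # Small zone: tighter threshold, but never below 25px.
        return max(median_h * 1.0, 25)
    # Larger zone: the default of three times the median (assumed meaning).
    return 3 * median_h
```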
 ---
 ## Ansicht / Spreadsheet (Step 12)
 **Frontend:** `admin-lehrer/components/ocr-kombi/StepAnsicht.tsx`, `SpreadsheetView.tsx`
 **Library:** `@fortune-sheet/react` (MIT, v1.0.4)
 ### Architecture
 Split view:
 - **Left:** original scan with OCR overlay (`/image/words-overlay`)
 - **Right:** Fortune Sheet spreadsheet with multi-sheet tabs
 ### Multi-Sheet Approach
 Each zone becomes its own sheet tab:
 - Sheet "Vokabeln": main grid with EN/DE columns
 - Sheet "Pounds and euros": Box 1 with its own 4 columns
 - Sheet "German leihen": Box 2 as flowing text
 Reason: column widths are optimized separately for each zone. Excel limitation: a column width applies to the entire column.
 ### Cell Formatting
 | Format | Source | Fortune Sheet property |
 |--------|--------|----------------------|
 | Bold | `is_header`, `is_bold`, larger font | `bl: 1` |
 | Font color | OCR word_boxes color | `fc: '#hex'` |
 | Background | Box bg_hex, header | `bg: '#hex08'` |
 | Text wrap | Multi-line cells (\n) | `tb: '2'` |
 | Vertical top | Multi-line cells | `vt: 0` |
 | Larger font | word_box height >1.3x median | `fs: 12` |
 ### Column Widths
 Auto-Fit: `max(laengster_text * 7.5 + 16, original_px * scaleFactor)`
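As a worked illustration of the auto-fit formula, the sketch below computes one column width in Python (the real logic lives in the TypeScript frontend and is not shown here). It assumes that `laengster_text` means the character count of the longest line in the column and that widths are in pixels; the function and parameter names are hypothetical.

```python
def auto_fit_width(cell_texts, original_px, scale_factor,
                   char_px=7.5, padding_px=16):
    """Column width in px: the larger of text-based and scan-based width."""
    # Longest single line across all cells; multi-line cells are split on "\n".
    longest = max(
        (len(line) for text in cell_texts for line in text.split("\n")),
        default=0,
    )
    text_width = longest * char_px + padding_px
    scan_width = original_px * scale_factor
    return max(text_width, scan_width)
```

For a column whose longest line has 18 characters, the text-based width is 18 * 7.5 + 16 = 151 px, which wins over a scaled scan width of 120 px but loses to one of 200 px.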
 ### Toolbar
 `undo, redo, font-bold, font-italic, font-strikethrough, font-color, background, font-size, horizontal-align, vertical-align, text-wrap, merge-cell, border`
 ---
 ## Unified Grid (Backend)
 **File:** `klausur-service/backend/unified_grid.py`
 **Tests:** `tests/test_unified_grid.py` (10 tests)
 Merges all zones into a single grid (for export/analysis):
 ```
 POST /api/v1/ocr-pipeline/sessions/{id}/build-unified-grid
 GET  /api/v1/ocr-pipeline/sessions/{id}/unified-grid
 ```
 - Dominant row height = median of the content-row spacings
 - Full-width boxes: rows integrated directly
 - Partial-width boxes: extra rows inserted when the box has more lines
 - Box cells carry `source_zone_type: "box"` and `box_region` metadata
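The dominant row height mentioned above can be sketched as a median over consecutive row spacings. The input shape (a list of row top edges in pixels) and the function name are assumptions for illustration; `unified_grid.py` may derive the spacings differently.

```python
from statistics import median


def dominant_row_height(row_tops):
    """Median spacing between consecutive content rows (top edges in px)."""
    ordered = sorted(row_tops)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return None  # fewer than two rows: no spacing to measure
    return median(gaps)
```

Using the median rather than the mean keeps a single large gap, such as the space around an embedded box, from distorting the row height.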
 ---
 ## File Structure
 ### Backend (klausur-service)
 | File | Lines | Description |
 |-------|--------|--------------|
 | `grid_build_core.py` | 1943 | `_build_grid_core()`: main grid build |
 | `grid_editor_api.py` | 474 | REST endpoints (build, save, get, gutter, box, unified) |
 | `grid_editor_helpers.py` | 1737 | Helpers: columns, rows, cells, colspan, header |
 | `smart_spell.py` | 587 | SmartSpellChecker |
 | `cv_box_layout.py` | 339 | Box layout classification + grid build |
 | `unified_grid.py` | 425 | Unified Grid builder |
 ### Frontend (admin-lehrer)
 | File | Lines | Description |
 |-------|--------|--------------|
 | `StepBoxGridReview.tsx` | 283 | Box review, Step 11 |
 | `StepAnsicht.tsx` | 112 | Ansicht, Step 12 (split view) |
 | `SpreadsheetView.tsx` | ~160 | Fortune Sheet integration |
 | `GridTable.tsx` | 652 | Grid editor table (Steps 9-11) |
 | `useGridEditor.ts` | 985 | Grid editor hook |
 ### Tests
 | File | Tests | Description |
 |-------|-------|--------------|
 | `test_smart_spell.py` | 43 | Language detection, boundary repair, IPA protection |
 | `test_box_layout.py` | 13 | Layout classification, bullet grouping |
 | `test_unified_grid.py` | 10 | Unified grid, box classification |
 | **Total** | **66** | |
 ---
 ## Change History
 | Date | Change |
 |-------|-----------|
 | 2026-04-15 | Fortune Sheet multi-sheet tabs, bullet points, auto-fit, refactoring |
 | 2026-04-14 | Unified Grid, Ansicht step, colspan detection |
 | 2026-04-13 | Box-Grid-Review step, columns in boxes, header/footer filter |
 | 2026-04-12 | SmartSpellChecker, frequency scoring, IPA protection, vocab-worksheet refactoring |
--- a/.claude/rules/vocab-worksheet.md
+++ b/.claude/rules/vocab-worksheet.md
@@ -188,11 +188,35 @@ ssh macmini "docker compose up -d klausur-service studio-v2"
 ---
 ## Frontend Refactoring (2026-04-12)
 `page.tsx` was split from 2337 lines into 14 files:
 ```
 studio-v2/app/vocab-worksheet/
 ├── page.tsx                 # 198 lines, orchestrator
 ├── types.ts                 # Interfaces, VocabWorksheetHook
 ├── constants.ts             # API base, formats, defaults
 ├── useVocabWorksheet.ts     # 843 lines, custom hook (all state + logic)
 └── components/
    ├── UploadScreen.tsx      # Session list + document selection
    ├── PageSelection.tsx     # PDF page selection
    ├── VocabularyTab.tsx     # Vocabulary table + IPA/syllables
    ├── WorksheetTab.tsx      # Format selection + configuration
    ├── ExportTab.tsx         # PDF download
    ├── OcrSettingsPanel.tsx   # OCR filter settings
    ├── FullscreenPreview.tsx  # Fullscreen preview modal
    ├── QRCodeModal.tsx        # QR upload modal
    └── OcrComparisonModal.tsx # OCR comparison modal
 ```
 ---
 ## Extension: Adding New Formats
 1. **Backend**: create a new generator in `klausur-service/backend/`
 2. **API**: add a new endpoint in `vocab_worksheet_api.py`
-3. **Frontend**: add the format to the `worksheetFormats` array in `page.tsx`
+3. **Frontend**: add the format to the `worksheetFormats` array in `constants.ts`
 4. **Docs**: update this file
 ---
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -0,0 +1,9 @@
 {
  "permissions": {
    "allow": [
      "Bash",
      "Write",
      "Read"
    ]
  }
 }
--- a/AGENTS.go.md
+++ b/AGENTS.go.md
@@ -0,0 +1,36 @@
 # AGENTS.go.md: Go/Gin Conventions
 ## Architecture
 - `handlers/`: HTTP transport only (decode, validate, call service, encode response)
 - `service/` or `usecase/`: business logic
 - `repo/`: storage/integration
 - `model/` or `domain/`: domain entities
 - `tests/`: prefer table-driven tests
 ## Rules
 1. Handlers ≤40 LOC: only decode → service → encode
 2. Do NOT hide business logic in handlers
 3. Split large handlers by resource/verb
 4. Keep request/response DTOs close to the transport layer
 5. Interfaces only at real boundaries (not everywhere just for mocks)
 6. No giant utility files
 7. Do not edit generated files by hand
 ## Split Triggers
 - Handler file exceeds 400-500 LOC
 - Unrelated endpoints grouped together
 - Encoding/decoding dominates the handler file
 - Service logic and transport logic are mixed
 ## Verification
 ```bash
 gofmt -l . | grep -q . && exit 1
 go vet ./...
 golangci-lint run --timeout=5m
 go test -race ./...
 go build ./...
 ```
--- a/AGENTS.python.md
+++ b/AGENTS.python.md
@@ -0,0 +1,36 @@
 # AGENTS.python.md: Python/FastAPI Conventions
 ## Architecture
 - `routes/` or `api/`: request/response only, no business logic
 - `services/`: business logic
 - `repositories/`: persistence/data access
 - `schemas/`: Pydantic models, split by domain
 - `tests/`: mirrors the production layout
 ## Rules
 1. Keep route files thin (≤300 LOC)
 2. When a route file reaches 300-400 LOC → split by resource/operation
 3. Split schema files by domain as they grow
 4. Avoid module-level singleton coupling (tests end up patching the wrong symbol)
 5. Always patch the symbol as imported by the module under test
 6. Prefer dependency injection over hidden imports
 7. Pydantic v2: do NOT use `from __future__ import annotations` (breaks Pydantic)
 8. Keep migrations separate from refactorings
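Rule 5 is easiest to see in a small experiment. The sketch below builds two hypothetical in-memory modules, `pricing` and `invoice` (neither exists in this repository), to show that `unittest.mock.patch` only takes effect when it targets the namespace where the name is looked up.

```python
import sys
import types
from unittest import mock

# Two hypothetical in-memory modules standing in for real files.
pricing = types.ModuleType("pricing")
invoice = types.ModuleType("invoice")
sys.modules["pricing"] = pricing
sys.modules["invoice"] = invoice


def get_rate():
    return 0.19


pricing.get_rate = get_rate
# Equivalent to `from pricing import get_rate` inside invoice.py:
# the name is now bound in the invoice namespace as well.
invoice.get_rate = pricing.get_rate


def total(net):
    # Resolves get_rate through the invoice namespace, as code in invoice.py would.
    return round(net * (1 + invoice.get_rate()), 2)


# Patching the defining module does not affect invoice's own binding.
with mock.patch("pricing.get_rate", return_value=0.0):
    total_wrong_target = total(100)

# Patching the symbol in the module under test takes effect.
with mock.patch("invoice.get_rate", return_value=0.0):
    total_right_target = total(100)
```

Here `total_wrong_target` still includes the 19% rate, while `total_right_target` reflects the patched value. This is the failure mode that rule 4 warns about: the test appears to patch something, yet the code under test never sees it.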
 ## Split Triggers
 - File approaches or exceeds 500 LOC
 - Circular imports appear
 - Tests need deep patching
 - API schemas mix different domains
 - A service file handles transport AND DB logic
 ## Verification
 ```bash
 ruff check .
 mypy . --ignore-missing-imports --no-error-summary
 pytest tests/ -x -q --no-header
 ```
--- a/AGENTS.typescript.md
+++ b/AGENTS.typescript.md
@@ -0,0 +1,55 @@
 # AGENTS.typescript.md: Next.js Conventions
 ## Architecture
 - `app/.../page.tsx`: minimal page composition (≤250 LOC)
 - `app/.../actions.ts`: Server Actions
 - `app/.../queries.ts`: data loading
 - `app/.../_components/`: view parts (colocation)
 - `app/.../_hooks/`: page-specific hooks (colocation)
 - `types/` or `types/*.ts`: domain-specific types
 - `schemas/`: Zod/validation schemas
 - `lib/`: shared utilities
 ## Rules
 1. Keep page.tsx thin (≤250 LOC)
 2. Split large pages into sections/components early
 3. NO single types.ts as a catch-all
 4. Avoid types.ts AND types/ shadowing (pick one!)
 5. Keep server/client module boundaries explicit
 6. Prefer pure helpers and narrow props
 7. Keep API client types separate from hand-written domain types
 ## Colocation Pattern (preferred)
 ```
 app/(admin)/ai/rag/
  page.tsx              ← thin, composition only
  _components/
    SearchPanel.tsx
    ResultsTable.tsx
    FilterBar.tsx
  _hooks/
    useRagSearch.ts
  actions.ts            ← Server Actions
  queries.ts            ← Data Fetching
 ```
 ## Split Triggers
 - page.tsx exceeds 250-350 LOC
 - types.ts exceeds 200-300 LOC
 - Form logic, Server Actions, and rendering in one file
 - Several independently testable sections exist
 - Imports become brittle
 ## Verification
 ```bash
 npx tsc --noEmit
 npm run lint
 npm run build
 ```
 > `npm run build` is MANDATORY; `tsc` alone is not enough.
--- a/admin-lehrer/app/(admin)/ai/gpu/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/gpu/page.tsx
@@ -1,395 +0,0 @@
 'use client'
 /**
 * GPU Infrastructure Admin Page
 *
 * vast.ai GPU Management for LLM Processing
 * Part of KI-Werkzeuge
 */
 import { useEffect, useState, useCallback } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { AIToolsSidebarResponsive } from '@/components/ai/AIToolsSidebar'
 interface VastStatus {
  instance_id: number | null
  status: string
  gpu_name: string | null
  dph_total: number | null
  endpoint_base_url: string | null
  last_activity: string | null
  auto_shutdown_in_minutes: number | null
  total_runtime_hours: number | null
  total_cost_usd: number | null
  account_credit: number | null
  account_total_spend: number | null
  session_runtime_minutes: number | null
  session_cost_usd: number | null
  message: string | null
  error?: string
 }
 export default function GPUInfrastructurePage() {
  const [status, setStatus] = useState<VastStatus | null>(null)
  const [loading, setLoading] = useState(true)
  const [actionLoading, setActionLoading] = useState<string | null>(null)
  const [error, setError] = useState<string | null>(null)
  const [message, setMessage] = useState<string | null>(null)
  const API_PROXY = '/api/admin/gpu'
  const fetchStatus = useCallback(async () => {
    setLoading(true)
    setError(null)
    try {
      const response = await fetch(API_PROXY)
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.error || `HTTP ${response.status}`)
      }
      setStatus(data)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Verbindungsfehler')
      setStatus({
        instance_id: null,
        status: 'error',
        gpu_name: null,
        dph_total: null,
        endpoint_base_url: null,
        last_activity: null,
        auto_shutdown_in_minutes: null,
        total_runtime_hours: null,
        total_cost_usd: null,
        account_credit: null,
        account_total_spend: null,
        session_runtime_minutes: null,
        session_cost_usd: null,
        message: 'Verbindung fehlgeschlagen'
      })
    } finally {
      setLoading(false)
    }
  }, [])
  useEffect(() => {
    fetchStatus()
  }, [fetchStatus])
  useEffect(() => {
    const interval = setInterval(fetchStatus, 30000)
    return () => clearInterval(interval)
  }, [fetchStatus])
  const powerOn = async () => {
    setActionLoading('on')
    setError(null)
    setMessage(null)
    try {
      const response = await fetch(API_PROXY, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ action: 'on' }),
      })
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
      }
      setMessage('Start angefordert')
      setTimeout(fetchStatus, 3000)
      setTimeout(fetchStatus, 10000)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Fehler beim Starten')
      fetchStatus()
    } finally {
      setActionLoading(null)
    }
  }
  const powerOff = async () => {
    setActionLoading('off')
    setError(null)
    setMessage(null)
    try {
      const response = await fetch(API_PROXY, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ action: 'off' }),
      })
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
      }
      setMessage('Stop angefordert')
      setTimeout(fetchStatus, 3000)
      setTimeout(fetchStatus, 10000)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Fehler beim Stoppen')
      fetchStatus()
    } finally {
      setActionLoading(null)
    }
  }
  const getStatusBadge = (s: string) => {
    const baseClasses = 'px-3 py-1 rounded-full text-sm font-semibold uppercase'
    switch (s) {
      case 'running':
        return `${baseClasses} bg-green-100 text-green-800`
      case 'stopped':
      case 'exited':
        return `${baseClasses} bg-red-100 text-red-800`
      case 'loading':
      case 'scheduling':
      case 'creating':
      case 'starting...':
      case 'stopping...':
        return `${baseClasses} bg-yellow-100 text-yellow-800`
      default:
        return `${baseClasses} bg-slate-100 text-slate-600`
    }
  }
  const getCreditColor = (credit: number | null) => {
    if (credit === null) return 'text-slate-500'
    if (credit < 5) return 'text-red-600'
    if (credit < 15) return 'text-yellow-600'
    return 'text-green-600'
  }
  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title="GPU Infrastruktur"
        purpose="Verwalten Sie die vast.ai GPU-Instanzen fuer LLM-Verarbeitung und OCR. Starten/Stoppen Sie GPUs bei Bedarf und ueberwachen Sie Kosten in Echtzeit."
        audience={['DevOps', 'Entwickler', 'System-Admins']}
        architecture={{
          services: ['vast.ai API', 'Ollama', 'VLLM'],
          databases: ['PostgreSQL (Logs)'],
        }}
        relatedPages={[
          { name: 'Test Quality (BQAS)', href: '/ai/test-quality', description: 'Golden Suite & Tests' },
          { name: 'Magic Help', href: '/ai/magic-help', description: 'TrOCR Testing' },
        ]}
        collapsible={true}
        defaultCollapsed={true}
      />
      {/* KI-Werkzeuge Sidebar */}
      <AIToolsSidebarResponsive currentTool="gpu" />
      {/* Status Cards */}
      <div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
        <div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-6 gap-6">
          <div>
            <div className="text-sm text-slate-500 mb-2">Status</div>
            {loading ? (
              <span className="px-3 py-1 rounded-full text-sm font-semibold bg-slate-100 text-slate-600">
                Laden...
              </span>
            ) : (
              <span className={getStatusBadge(
                actionLoading === 'on' ? 'starting...' :
                actionLoading === 'off' ? 'stopping...' :
                status?.status || 'unknown'
              )}>
                {actionLoading === 'on' ? 'starting...' :
                 actionLoading === 'off' ? 'stopping...' :
                 status?.status || 'unbekannt'}
              </span>
            )}
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">GPU</div>
            <div className="font-semibold text-slate-900">
              {status?.gpu_name || '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Kosten/h</div>
            <div className="font-semibold text-slate-900">
              {status?.dph_total ? `$${status.dph_total.toFixed(3)}` : '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Auto-Stop</div>
            <div className="font-semibold text-slate-900">
              {status && status.auto_shutdown_in_minutes !== null
                ? `${status.auto_shutdown_in_minutes} min`
                : '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Budget</div>
            <div className={`font-bold text-lg ${getCreditColor(status?.account_credit ?? null)}`}>
              {status && status.account_credit !== null
                ? `$${status.account_credit.toFixed(2)}`
                : '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Session</div>
            <div className="font-semibold text-slate-900">
              {status && status.session_runtime_minutes !== null && status.session_cost_usd !== null
                ? `${Math.round(status.session_runtime_minutes)} min / $${status.session_cost_usd.toFixed(3)}`
                : '-'}
            </div>
          </div>
        </div>
        {/* Buttons */}
        <div className="flex items-center gap-4 mt-6 pt-6 border-t border-slate-200">
          <button
            onClick={powerOn}
            disabled={actionLoading !== null || status?.status === 'running'}
            className="px-6 py-2 bg-orange-600 text-white rounded-lg font-medium hover:bg-orange-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
          >
            Starten
          </button>
          <button
            onClick={powerOff}
            disabled={actionLoading !== null || status?.status !== 'running'}
            className="px-6 py-2 bg-red-600 text-white rounded-lg font-medium hover:bg-red-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
          >
            Stoppen
          </button>
          <button
            onClick={fetchStatus}
            disabled={loading}
            className="px-4 py-2 border border-slate-300 text-slate-700 rounded-lg font-medium hover:bg-slate-50 disabled:opacity-50 transition-colors"
          >
            {loading ? 'Aktualisiere...' : 'Aktualisieren'}
          </button>
          {message && (
            <span className="ml-4 text-sm text-green-600 font-medium">{message}</span>
          )}
          {error && (
            <span className="ml-4 text-sm text-red-600 font-medium">{error}</span>
          )}
        </div>
      </div>
      {/* Extended Stats */}
      <div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Kosten-Uebersicht</h3>
          <div className="space-y-4">
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Session Laufzeit</span>
              <span className="font-semibold">
                {status && status.session_runtime_minutes !== null
                  ? `${Math.round(status.session_runtime_minutes)} Minuten`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Session Kosten</span>
              <span className="font-semibold">
                {status && status.session_cost_usd !== null
                  ? `$${status.session_cost_usd.toFixed(4)}`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center pt-4 border-t border-slate-100">
              <span className="text-slate-600">Gesamtlaufzeit</span>
              <span className="font-semibold">
                {status && status.total_runtime_hours !== null
                  ? `${status.total_runtime_hours.toFixed(1)} Stunden`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Gesamtkosten</span>
              <span className="font-semibold">
                {status && status.total_cost_usd !== null
                  ? `$${status.total_cost_usd.toFixed(2)}`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">vast.ai Ausgaben</span>
              <span className="font-semibold">
                {status && status.account_total_spend !== null
                  ? `$${status.account_total_spend.toFixed(2)}`
                  : '-'}
              </span>
            </div>
          </div>
        </div>
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Instanz-Details</h3>
          <div className="space-y-4">
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Instanz ID</span>
              <span className="font-mono text-sm">
                {status?.instance_id || '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">GPU</span>
              <span className="font-semibold">
                {status?.gpu_name || '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Stundensatz</span>
              <span className="font-semibold">
                {status?.dph_total ? `$${status.dph_total.toFixed(4)}/h` : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Letzte Aktivitaet</span>
              <span className="text-sm">
                {status?.last_activity
                  ? new Date(status.last_activity).toLocaleString('de-DE')
                  : '-'}
              </span>
            </div>
            {status?.endpoint_base_url && status.status === 'running' && (
              <div className="pt-4 border-t border-slate-100">
                <div className="text-slate-600 text-sm mb-1">Endpoint</div>
                <code className="text-xs bg-slate-100 px-2 py-1 rounded block overflow-x-auto">
                  {status.endpoint_base_url}
                </code>
              </div>
            )}
          </div>
        </div>
      </div>
      {/* Info */}
      <div className="bg-violet-50 border border-violet-200 rounded-xl p-4">
        <div className="flex gap-3">
          <svg className="w-5 h-5 text-violet-600 flex-shrink-0 mt-0.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          <div>
            <h4 className="font-semibold text-violet-900">Auto-Shutdown</h4>
            <p className="text-sm text-violet-800 mt-1">
              Die GPU-Instanz wird automatisch gestoppt, wenn sie laengere Zeit inaktiv ist.
              Der Status wird alle 30 Sekunden automatisch aktualisiert.
            </p>
          </div>
        </div>
      </div>
    </div>
  )
 }
--- a/admin-lehrer/app/(admin)/ai/model-management/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/model-management/page.tsx
@@ -1,549 +0,0 @@
 'use client'
 /**
 * Model Management Page
 *
 * Manage ML model backends (PyTorch vs ONNX), view status,
 * run benchmarks, and configure inference settings.
 */
 import { useState, useEffect, useCallback } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 const KLAUSUR_API = '/klausur-api'
 // ---------------------------------------------------------------------------
 // Types
 // ---------------------------------------------------------------------------
 type BackendMode = 'auto' | 'pytorch' | 'onnx'
 type ModelStatus = 'available' | 'not_found' | 'loading' | 'error'
 type Tab = 'overview' | 'benchmarks' | 'configuration'
 interface ModelInfo {
  name: string
  key: string
  pytorch: { status: ModelStatus; size_mb: number; ram_mb: number }
  onnx: { status: ModelStatus; size_mb: number; ram_mb: number; quantized: boolean }
 }
 interface BenchmarkRow {
  model: string
  backend: string
  quantization: string
  size_mb: number
  ram_mb: number
  inference_ms: number
  load_time_s: number
 }
 interface StatusInfo {
  active_backend: BackendMode
  loaded_models: string[]
  cache_hits: number
  cache_misses: number
  uptime_s: number
 }
 // ---------------------------------------------------------------------------
 // Mock data (used when backend is not available)
 // ---------------------------------------------------------------------------
 const MOCK_MODELS: ModelInfo[] = [
  {
    name: 'TrOCR Printed',
    key: 'trocr_printed',
    pytorch: { status: 'available', size_mb: 892, ram_mb: 1800 },
    onnx: { status: 'available', size_mb: 234, ram_mb: 620, quantized: true },
  },
  {
    name: 'TrOCR Handwritten',
    key: 'trocr_handwritten',
    pytorch: { status: 'available', size_mb: 892, ram_mb: 1800 },
    onnx: { status: 'not_found', size_mb: 0, ram_mb: 0, quantized: false },
  },
  {
    name: 'PP-DocLayout',
    key: 'pp_doclayout',
    pytorch: { status: 'not_found', size_mb: 0, ram_mb: 0 },
    onnx: { status: 'available', size_mb: 48, ram_mb: 180, quantized: false },
  },
 ]
 const MOCK_BENCHMARKS: BenchmarkRow[] = [
  { model: 'TrOCR Printed', backend: 'PyTorch', quantization: 'FP32', size_mb: 892, ram_mb: 1800, inference_ms: 142, load_time_s: 3.2 },
  { model: 'TrOCR Printed', backend: 'ONNX', quantization: 'INT8', size_mb: 234, ram_mb: 620, inference_ms: 38, load_time_s: 0.8 },
  { model: 'TrOCR Handwritten', backend: 'PyTorch', quantization: 'FP32', size_mb: 892, ram_mb: 1800, inference_ms: 156, load_time_s: 3.4 },
  { model: 'PP-DocLayout', backend: 'ONNX', quantization: 'FP32', size_mb: 48, ram_mb: 180, inference_ms: 22, load_time_s: 0.3 },
 ]
 const MOCK_STATUS: StatusInfo = {
  active_backend: 'auto',
  loaded_models: ['trocr_printed (ONNX)', 'pp_doclayout (ONNX)'],
  cache_hits: 1247,
  cache_misses: 83,
  uptime_s: 86400,
 }
 // ---------------------------------------------------------------------------
 // Helpers
 // ---------------------------------------------------------------------------
 function StatusBadge({ status }: { status: ModelStatus }) {
  const cls =
    status === 'available'
      ? 'bg-emerald-100 text-emerald-800 border-emerald-200'
      : status === 'loading'
        ? 'bg-blue-100 text-blue-800 border-blue-200'
        : status === 'not_found'
          ? 'bg-slate-100 text-slate-500 border-slate-200'
          : 'bg-red-100 text-red-800 border-red-200'
  const label =
    status === 'available' ? 'Verfuegbar'
      : status === 'loading' ? 'Laden...'
        : status === 'not_found' ? 'Nicht vorhanden'
          : 'Fehler'
  return (
    <span className={`inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium border ${cls}`}>
      {label}
    </span>
  )
 }
 function formatBytes(mb: number) {
  if (mb === 0) return '--'
  if (mb >= 1000) return `${(mb / 1000).toFixed(1)} GB`
  return `${mb} MB`
 }
 function formatUptime(seconds: number) {
  const h = Math.floor(seconds / 3600)
  const m = Math.floor((seconds % 3600) / 60)
  if (h > 0) return `${h}h ${m}m`
  return `${m}m`
 }
 // ---------------------------------------------------------------------------
 // Component
 // ---------------------------------------------------------------------------
 export default function ModelManagementPage() {
  const [tab, setTab] = useState<Tab>('overview')
  const [models, setModels] = useState<ModelInfo[]>(MOCK_MODELS)
  const [benchmarks, setBenchmarks] = useState<BenchmarkRow[]>(MOCK_BENCHMARKS)
  const [status, setStatus] = useState<StatusInfo>(MOCK_STATUS)
  const [backend, setBackend] = useState<BackendMode>('auto')
  const [saving, setSaving] = useState(false)
  const [benchmarkRunning, setBenchmarkRunning] = useState(false)
  const [usingMock, setUsingMock] = useState(false)
  // Load status
  const loadStatus = useCallback(async () => {
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/models/status`)
      if (res.ok) {
        const data = await res.json()
        setStatus(data)
        setBackend(data.active_backend || 'auto')
        setUsingMock(false)
      } else {
        setUsingMock(true)
      }
    } catch {
      setUsingMock(true)
    }
  }, [])
  // Load models
  const loadModels = useCallback(async () => {
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/models`)
      if (res.ok) {
        const data = await res.json()
        if (data.models?.length) setModels(data.models)
      }
    } catch {
      // Keep mock data
    }
  }, [])
  // Load benchmarks
  const loadBenchmarks = useCallback(async () => {
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/models/benchmarks`)
      if (res.ok) {
        const data = await res.json()
        if (data.benchmarks?.length) setBenchmarks(data.benchmarks)
      }
    } catch {
      // Keep mock data
    }
  }, [])
  useEffect(() => {
    loadStatus()
    loadModels()
    loadBenchmarks()
  }, [loadStatus, loadModels, loadBenchmarks])
  // Save backend preference
  const saveBackend = async (mode: BackendMode) => {
    setBackend(mode)
    setSaving(true)
    try {
      await fetch(`${KLAUSUR_API}/api/v1/models/backend`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ backend: mode }),
      })
      await loadStatus()
    } catch {
      // Silently handle — mock mode
    } finally {
      setSaving(false)
    }
  }
  // Run benchmark
  const runBenchmark = async () => {
    setBenchmarkRunning(true)
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/models/benchmark`, {
        method: 'POST',
      })
      if (res.ok) {
        const data = await res.json()
        if (data.benchmarks?.length) setBenchmarks(data.benchmarks)
      }
      await loadBenchmarks()
    } catch {
      // Keep existing data
    } finally {
      setBenchmarkRunning(false)
    }
  }
  const tabs: { key: Tab; label: string }[] = [
    { key: 'overview', label: 'Uebersicht' },
    { key: 'benchmarks', label: 'Benchmarks' },
    { key: 'configuration', label: 'Konfiguration' },
  ]
  return (
    <div className="space-y-6">
      <div className="max-w-7xl mx-auto p-6 space-y-6">
        <PagePurpose
          title="Model Management"
          purpose="Verwaltung der ML-Modelle fuer OCR und Layout-Erkennung. Vergleich von PyTorch- und ONNX-Backends, Benchmark-Tests und Backend-Konfiguration."
          audience={['Entwickler', 'DevOps']}
          defaultCollapsed
          architecture={{
            services: ['klausur-service (FastAPI, Port 8086)'],
            databases: ['Dateisystem (Modell-Dateien)'],
          }}
          relatedPages={[
            { name: 'OCR Pipeline', href: '/ai/ocr-pipeline', description: 'OCR-Pipeline ausfuehren' },
            { name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'OCR-Methoden vergleichen' },
            { name: 'GPU Infrastruktur', href: '/ai/gpu', description: 'GPU-Ressourcen verwalten' },
          ]}
        />
        {/* Header */}
        <div className="flex items-center justify-between">
          <div>
            <h1 className="text-2xl font-bold text-slate-900">Model Management</h1>
            <p className="text-sm text-slate-500 mt-1">
              {models.length} Modelle konfiguriert
              {usingMock && (
                <span className="ml-2 text-xs bg-amber-100 text-amber-700 px-1.5 py-0.5 rounded">
                  Mock-Daten (Backend nicht erreichbar)
                </span>
              )}
            </p>
          </div>
        </div>
        {/* Status Cards */}
        <div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-4">
          <div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
            <p className="text-xs text-slate-500 uppercase font-medium">Aktives Backend</p>
            <p className="text-lg font-semibold text-slate-900 mt-1">{status.active_backend.toUpperCase()}</p>
          </div>
          <div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
            <p className="text-xs text-slate-500 uppercase font-medium">Geladene Modelle</p>
            <p className="text-lg font-semibold text-slate-900 mt-1">{status.loaded_models.length}</p>
          </div>
          <div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
            <p className="text-xs text-slate-500 uppercase font-medium">Cache Hit-Rate</p>
            <p className="text-lg font-semibold text-slate-900 mt-1">
              {status.cache_hits + status.cache_misses > 0
                ? `${((status.cache_hits / (status.cache_hits + status.cache_misses)) * 100).toFixed(1)}%`
                : '--'}
            </p>
          </div>
          <div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
            <p className="text-xs text-slate-500 uppercase font-medium">Uptime</p>
            <p className="text-lg font-semibold text-slate-900 mt-1">{formatUptime(status.uptime_s)}</p>
          </div>
        </div>
        {/* Tabs */}
        <div className="border-b border-slate-200">
          <nav className="flex gap-4">
            {tabs.map(t => (
              <button
                key={t.key}
                onClick={() => setTab(t.key)}
                className={`pb-3 px-1 text-sm font-medium border-b-2 transition-colors ${
                  tab === t.key
                    ? 'border-teal-500 text-teal-600'
                    : 'border-transparent text-slate-500 hover:text-slate-700'
                }`}
              >
                {t.label}
              </button>
            ))}
          </nav>
        </div>
        {/* Overview Tab */}
        {tab === 'overview' && (
          <div className="space-y-4">
            <h3 className="text-sm font-medium text-slate-700">Verfuegbare Modelle</h3>
            <div className="grid gap-4 sm:grid-cols-2 lg:grid-cols-3">
              {models.map(m => (
                <div key={m.key} className="bg-white rounded-lg border border-slate-200 overflow-hidden">
                  <div className="px-4 py-3 border-b border-slate-100">
                    <h4 className="font-semibold text-slate-900">{m.name}</h4>
                    <p className="text-xs text-slate-400 mt-0.5 font-mono">{m.key}</p>
                  </div>
                  <div className="px-4 py-3 space-y-3">
                    {/* PyTorch */}
                    <div className="flex items-center justify-between">
                      <div className="flex items-center gap-2">
                        <span className="text-xs font-medium text-slate-600 w-16">PyTorch</span>
                        <StatusBadge status={m.pytorch.status} />
                      </div>
                      {m.pytorch.status === 'available' && (
                        <span className="text-xs text-slate-400">
                          {formatBytes(m.pytorch.size_mb)} / {formatBytes(m.pytorch.ram_mb)} RAM
                        </span>
                      )}
                    </div>
                    {/* ONNX */}
                    <div className="flex items-center justify-between">
                      <div className="flex items-center gap-2">
                        <span className="text-xs font-medium text-slate-600 w-16">ONNX</span>
                        <StatusBadge status={m.onnx.status} />
                      </div>
                      {m.onnx.status === 'available' && (
                        <span className="text-xs text-slate-400">
                          {formatBytes(m.onnx.size_mb)} / {formatBytes(m.onnx.ram_mb)} RAM
                          {m.onnx.quantized && (
                            <span className="ml-1 text-xs bg-violet-100 text-violet-700 px-1 rounded">INT8</span>
                          )}
                        </span>
                      )}
                    </div>
                  </div>
                </div>
              ))}
            </div>
            {/* Loaded Models List */}
            {status.loaded_models.length > 0 && (
              <div>
                <h3 className="text-sm font-medium text-slate-700 mb-2">Aktuell geladen</h3>
                <div className="flex flex-wrap gap-2">
                  {status.loaded_models.map((m, i) => (
                    <span key={i} className="inline-flex items-center px-3 py-1 rounded-full text-sm bg-teal-50 text-teal-700 border border-teal-200">
                      {m}
                    </span>
                  ))}
                </div>
              </div>
            )}
          </div>
        )}
        {/* Benchmarks Tab */}
        {tab === 'benchmarks' && (
          <div className="space-y-4">
            <div className="flex items-center justify-between">
              <h3 className="text-sm font-medium text-slate-700">PyTorch vs ONNX Vergleich</h3>
              <button
                onClick={runBenchmark}
                disabled={benchmarkRunning}
                className="inline-flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium transition-colors"
              >
                {benchmarkRunning ? (
                  <>
                    <svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
                      <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
                      <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
                    </svg>
                    Benchmark laeuft...
                  </>
                ) : (
                  'Benchmark starten'
                )}
              </button>
            </div>
            <div className="bg-white rounded-lg border border-slate-200 overflow-hidden">
              <div className="overflow-x-auto">
                <table className="w-full text-sm">
                  <thead>
                    <tr className="border-b border-slate-200 bg-slate-50 text-left text-slate-500">
                      <th className="px-4 py-3 font-medium">Modell</th>
                      <th className="px-4 py-3 font-medium">Backend</th>
                      <th className="px-4 py-3 font-medium">Quantisierung</th>
                      <th className="px-4 py-3 font-medium text-right">Groesse</th>
                      <th className="px-4 py-3 font-medium text-right">RAM</th>
                      <th className="px-4 py-3 font-medium text-right">Inferenz</th>
                      <th className="px-4 py-3 font-medium text-right">Ladezeit</th>
                    </tr>
                  </thead>
                  <tbody>
                    {benchmarks.map((b, i) => (
                      <tr key={i} className="border-b border-slate-100 hover:bg-slate-50">
                        <td className="px-4 py-3 font-medium text-slate-900">{b.model}</td>
                        <td className="px-4 py-3">
                          <span className={`inline-flex items-center px-2 py-0.5 rounded text-xs font-medium ${
                            b.backend === 'ONNX'
                              ? 'bg-violet-100 text-violet-700'
                              : 'bg-orange-100 text-orange-700'
                          }`}>
                            {b.backend}
                          </span>
                        </td>
                        <td className="px-4 py-3 text-slate-600">{b.quantization}</td>
                        <td className="px-4 py-3 text-right text-slate-600">{formatBytes(b.size_mb)}</td>
                        <td className="px-4 py-3 text-right text-slate-600">{formatBytes(b.ram_mb)}</td>
                        <td className="px-4 py-3 text-right">
                          <span className={`font-mono ${b.inference_ms < 50 ? 'text-emerald-600' : b.inference_ms < 100 ? 'text-amber-600' : 'text-red-600'}`}>
                            {b.inference_ms} ms
                          </span>
                        </td>
                        <td className="px-4 py-3 text-right text-slate-500">{b.load_time_s.toFixed(1)}s</td>
                      </tr>
                    ))}
                  </tbody>
                </table>
              </div>
            </div>
            {benchmarks.length === 0 && (
              <div className="text-center py-12 text-slate-400">
                <p className="text-lg">Keine Benchmark-Daten</p>
                <p className="text-sm mt-1">Klicken Sie &quot;Benchmark starten&quot; um einen Vergleich durchzufuehren.</p>
              </div>
            )}
          </div>
        )}
        {/* Configuration Tab */}
        {tab === 'configuration' && (
          <div className="space-y-6">
            {/* Backend Selector */}
            <div className="bg-white rounded-lg border border-slate-200 p-5">
              <h3 className="text-sm font-semibold text-slate-900 mb-1">Inference Backend</h3>
              <p className="text-sm text-slate-500 mb-4">
                Waehlen Sie welches Backend fuer die Modell-Inferenz verwendet werden soll.
              </p>
              <div className="space-y-3">
                {([
                  {
                    mode: 'auto' as const,
                    label: 'Auto',
                    desc: 'ONNX wenn verfuegbar, Fallback auf PyTorch.',
                  },
                  {
                    mode: 'pytorch' as const,
                    label: 'PyTorch',
                    desc: 'Immer PyTorch verwenden. Hoeherer RAM-Verbrauch, volle Flexibilitaet.',
                  },
                  {
                    mode: 'onnx' as const,
                    label: 'ONNX',
                    desc: 'Immer ONNX verwenden. Schneller und weniger RAM, Fehler wenn nicht vorhanden.',
                  },
                ] as const).map(opt => (
                  <label
                    key={opt.mode}
                    className={`flex items-start gap-3 p-3 rounded-lg border cursor-pointer transition-colors ${
                      backend === opt.mode
                        ? 'border-teal-300 bg-teal-50'
                        : 'border-slate-200 hover:bg-slate-50'
                    }`}
                  >
                    <input
                      type="radio"
                      name="backend"
                      value={opt.mode}
                      checked={backend === opt.mode}
                      onChange={() => saveBackend(opt.mode)}
                      disabled={saving}
                      className="mt-1 text-teal-600 focus:ring-teal-500"
                    />
                    <div>
                      <span className="font-medium text-slate-900">{opt.label}</span>
                      <p className="text-sm text-slate-500 mt-0.5">{opt.desc}</p>
                    </div>
                  </label>
                ))}
              </div>
              {saving && (
                <p className="text-xs text-teal-600 mt-3">Speichere...</p>
              )}
            </div>
            {/* Model Details Table */}
            <div className="bg-white rounded-lg border border-slate-200 p-5">
              <h3 className="text-sm font-semibold text-slate-900 mb-4">Modell-Details</h3>
              <div className="overflow-x-auto">
                <table className="w-full text-sm">
                  <thead>
                    <tr className="border-b border-slate-200 text-left text-slate-500">
                      <th className="pb-2 font-medium">Modell</th>
                      <th className="pb-2 font-medium">PyTorch</th>
                      <th className="pb-2 font-medium text-right">Groesse (PT)</th>
                      <th className="pb-2 font-medium">ONNX</th>
                      <th className="pb-2 font-medium text-right">Groesse (ONNX)</th>
                      <th className="pb-2 font-medium text-right">Einsparung</th>
                    </tr>
                  </thead>
                  <tbody>
                    {models.map(m => {
                      const ptAvail = m.pytorch.status === 'available'
                      const oxAvail = m.onnx.status === 'available'
                      const savings = ptAvail && oxAvail && m.pytorch.size_mb > 0
                        ? Math.round((1 - m.onnx.size_mb / m.pytorch.size_mb) * 100)
                        : null
                      return (
                        <tr key={m.key} className="border-b border-slate-100">
                          <td className="py-2.5 font-medium text-slate-900">{m.name}</td>
                          <td className="py-2.5"><StatusBadge status={m.pytorch.status} /></td>
                          <td className="py-2.5 text-right text-slate-500">{ptAvail ? formatBytes(m.pytorch.size_mb) : '--'}</td>
                          <td className="py-2.5"><StatusBadge status={m.onnx.status} /></td>
                          <td className="py-2.5 text-right text-slate-500">{oxAvail ? formatBytes(m.onnx.size_mb) : '--'}</td>
                          <td className="py-2.5 text-right">
                            {savings !== null ? (
                              <span className="text-emerald-600 font-medium">-{savings}%</span>
                            ) : (
                              <span className="text-slate-300">--</span>
                            )}
                          </td>
                        </tr>
                      )
                    })}
                  </tbody>
                </table>
              </div>
            </div>
          </div>
        )}
      </div>
    </div>
  )
 }
--- a/admin-lehrer/app/(admin)/ai/ocr-compare/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-compare/page.tsx
--- a/admin-lehrer/app/(admin)/ai/ocr-kombi/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-kombi/page.tsx
@@ -2,7 +2,6 @@
 import { Suspense } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { BoxSessionTabs } from '@/components/ocr-pipeline/BoxSessionTabs'
 import { KombiStepper } from '@/components/ocr-kombi/KombiStepper'
 import { SessionList } from '@/components/ocr-kombi/SessionList'
 import { SessionHeader } from '@/components/ocr-kombi/SessionHeader'
@@ -16,6 +15,9 @@ import { StepOcr } from '@/components/ocr-kombi/StepOcr'
 import { StepStructure } from '@/components/ocr-kombi/StepStructure'
 import { StepGridBuild } from '@/components/ocr-kombi/StepGridBuild'
 import { StepGridReview } from '@/components/ocr-kombi/StepGridReview'
 import { StepGutterRepair } from '@/components/ocr-kombi/StepGutterRepair'
 import { StepBoxGridReview } from '@/components/ocr-kombi/StepBoxGridReview'
 import { StepAnsicht } from '@/components/ocr-kombi/StepAnsicht'
 import { StepGroundTruth } from '@/components/ocr-kombi/StepGroundTruth'
 import { useKombiPipeline } from './useKombiPipeline'
@@ -27,8 +29,7 @@ function OcrKombiContent() {
    loadingSessions,
    activeCategory,
    isGroundTruth,
-    subSessions,
+    pageNumber,
    parentSessionId,
    steps,
    gridSaveRef,
    groupedSessions,
@@ -40,11 +41,8 @@ function OcrKombiContent() {
    deleteSession,
    renameSession,
    updateCategory,
    handleSessionChange,
    setSessionId,
    setSessionName,
    setSubSessions,
    setParentSessionId,
    setIsGroundTruth,
  } = useKombiPipeline()
@@ -75,17 +73,11 @@ function OcrKombiContent() {
          <StepPageSplit
            sessionId={sessionId}
            sessionName={sessionName}
-            onNext={() => {
-              // If sub-sessions were created, switch to the first one
-              if (subSessions.length > 0) {
-                setSessionId(subSessions[0].id)
-                setSessionName(subSessions[0].name)
-              }
-              handleNext()
-            }}
-            onSubSessionsCreated={(subs) => {
-              setSubSessions(subs)
-              if (sessionId) setParentSessionId(sessionId)
+            onNext={handleNext}
+            onSplitComplete={(childId, childName) => {
+              // Switch to the first child session and refresh the list
+              setSessionId(childId)
+              setSessionName(childName)
              loadSessions()
            }}
          />
@@ -105,6 +97,12 @@ function OcrKombiContent() {
      case 9:
        return <StepGridReview sessionId={sessionId} onNext={handleNext} saveRef={gridSaveRef} />
      case 10:
        return <StepGutterRepair sessionId={sessionId} onNext={handleNext} />
      case 11:
        return <StepBoxGridReview sessionId={sessionId} onNext={handleNext} />
      case 12:
        return <StepAnsicht sessionId={sessionId} onNext={handleNext} />
      case 13:
        return (
          <StepGroundTruth
            sessionId={sessionId}
@@ -129,7 +127,6 @@ function OcrKombiContent() {
          databases: ['PostgreSQL Sessions'],
        }}
        relatedPages={[
          { name: 'OCR Overlay (Legacy)', href: '/ai/ocr-overlay', description: 'Alter 3-Modi-Monolith' },
          { name: 'OCR Regression', href: '/ai/ocr-regression', description: 'Regressionstests' },
        ]}
        defaultCollapsed
@@ -151,6 +148,7 @@ function OcrKombiContent() {
          sessionName={sessionName}
          activeCategory={activeCategory}
          isGroundTruth={isGroundTruth}
          pageNumber={pageNumber}
          onUpdateCategory={(cat) => updateCategory(sessionId, cat)}
        />
      )}
@@ -161,15 +159,6 @@ function OcrKombiContent() {
        onStepClick={handleStepClick}
      />
      {subSessions.length > 0 && parentSessionId && sessionId && (
        <BoxSessionTabs
          parentSessionId={parentSessionId}
          subSessions={subSessions}
          activeSessionId={sessionId}
          onSessionChange={handleSessionChange}
        />
      )}
      <div className="min-h-[400px]">{renderStep()}</div>
    </div>
  )
--- a/admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts
@@ -1,34 +1,210 @@
-import type { PipelineStep, PipelineStepStatus, DocumentCategory } from '../ocr-pipeline/types'
-// Re-export shared types
-export type { PipelineStep, PipelineStepStatus, DocumentCategory }
-export { DOCUMENT_CATEGORIES } from '../ocr-pipeline/types'
-// Re-export grid/structure types used by later steps
-export type {
-  SessionListItem,
-  SessionInfo,
-  SubSession,
-  OrientationResult,
-  CropResult,
-  DeskewResult,
-  DewarpResult,
-  GridResult,
-  GridCell,
-  OcrWordBox,
-  WordBbox,
-  ColumnMeta,
-  StructureResult,
-  StructureBox,
-  StructureZone,
-  StructureGraphic,
-  ExcludeRegion,
-} from '../ocr-pipeline/types'
+// OCR Pipeline Types — migrated from deleted ocr-pipeline/types.ts
+export type PipelineStepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'
+export interface PipelineStep {
+  id: string
+  name: string
+  icon: string
+  status: PipelineStepStatus
+}
+
+export type DocumentCategory =
+  | 'vokabelseite' | 'woerterbuch' | 'buchseite' | 'arbeitsblatt' | 'klausurseite'
+  | 'mathearbeit' | 'statistik' | 'zeitung' | 'formular' | 'handschrift' | 'sonstiges'
+
+export const DOCUMENT_CATEGORIES: { value: DocumentCategory; label: string; icon: string }[] = [
+  { value: 'vokabelseite', label: 'Vokabelseite', icon: '📖' },
+  { value: 'woerterbuch', label: 'Woerterbuch', icon: '📕' },
+  { value: 'buchseite', label: 'Buchseite', icon: '📚' },
+  { value: 'arbeitsblatt', label: 'Arbeitsblatt', icon: '📝' },
+  { value: 'klausurseite', label: 'Klausurseite', icon: '📄' },
+  { value: 'mathearbeit', label: 'Mathearbeit', icon: '🔢' },
+  { value: 'statistik', label: 'Statistik', icon: '📊' },
+  { value: 'zeitung', label: 'Zeitung', icon: '📰' },
  { value: 'formular', label: 'Formular', icon: '📋' },
  { value: 'handschrift', label: 'Handschrift', icon: '✍️' },
  { value: 'sonstiges', label: 'Sonstiges', icon: '📎' },
 ]
 export interface SessionListItem {
  id: string
  name: string
  filename: string
  status: string
  current_step: number
  document_category?: DocumentCategory
  doc_type?: string
  parent_session_id?: string
  document_group_id?: string
  page_number?: number
  is_ground_truth?: boolean
  created_at: string
  updated_at?: string
 }
 export interface SubSession {
  id: string
  name: string
  box_index: number
  current_step?: number
  status?: string
 }
 export interface OrientationResult {
  orientation_degrees: number
  corrected: boolean
  duration_seconds: number
 }
 export interface CropResult {
  crop_applied: boolean
  crop_rect?: { x: number; y: number; width: number; height: number }
  crop_rect_pct?: { x: number; y: number; width: number; height: number }
  original_size: { width: number; height: number }
  cropped_size: { width: number; height: number }
  detected_format?: string
  format_confidence?: number
  aspect_ratio?: number
  border_fractions?: { top: number; bottom: number; left: number; right: number }
  skipped?: boolean
  duration_seconds?: number
 }
 export interface DeskewResult {
  session_id: string
  angle_hough: number
  angle_word_alignment: number
  angle_iterative?: number
  angle_residual?: number
  angle_textline?: number
  angle_applied: number
  method_used: 'hough' | 'word_alignment' | 'manual' | 'iterative' | 'two_pass' | 'three_pass' | 'manual_combined'
  confidence: number
  duration_seconds: number
  deskewed_image_url: string
  binarized_image_url: string
 }
 export interface DewarpDetection {
  method: string
  shear_degrees: number
  confidence: number
 }
 export interface DewarpResult {
  session_id: string
  method_used: string
  shear_degrees: number
  confidence: number
  duration_seconds: number
  dewarped_image_url: string
  detections?: DewarpDetection[]
 }
 export interface SessionInfo {
  session_id: string
  filename: string
  name?: string
  image_width: number
  image_height: number
  original_image_url: string
  current_step?: number
  document_category?: DocumentCategory
  doc_type?: string
  orientation_result?: OrientationResult
  crop_result?: CropResult
  deskew_result?: DeskewResult
  dewarp_result?: DewarpResult
  sub_sessions?: SubSession[]
  parent_session_id?: string
  box_index?: number
  document_group_id?: string
  page_number?: number
 }
 export interface StructureGraphic {
  x: number; y: number; w: number; h: number
  area: number; shape: string; color_name: string; color_hex: string; confidence: number
 }
 export interface ExcludeRegion {
  x: number; y: number; w: number; h: number; label?: string
 }
 export interface StructureBox {
  x: number; y: number; w: number; h: number
  confidence: number; border_thickness: number
  bg_color_name?: string; bg_color_hex?: string
 }
 export interface StructureZone {
  index: number; zone_type: 'content' | 'box'
  x: number; y: number; w: number; h: number
 }
 export interface DocLayoutRegion {
  x: number; y: number; w: number; h: number
  class_name: string; confidence: number
 }
 export interface StructureResult {
  image_width: number; image_height: number
  content_bounds: { x: number; y: number; w: number; h: number }
  boxes: StructureBox[]; zones: StructureZone[]
  graphics: StructureGraphic[]; exclude_regions?: ExcludeRegion[]
  color_pixel_counts: Record<string, number>
  has_words: boolean; word_count: number
  border_ghosts_removed?: number; duration_seconds: number
  layout_regions?: DocLayoutRegion[]
  detection_method?: 'opencv' | 'ppdoclayout'
 }
 export interface WordBbox { x: number; y: number; w: number; h: number }
 export interface OcrWordBox {
  text: string; left: number; top: number; width: number; height: number; conf: number
  color?: string; color_name?: string; recovered?: boolean
 }
 export interface ColumnMeta { index: number; type: string; x: number; width: number }
 export interface GridCell {
  cell_id: string; row_index: number; col_index: number; col_type: string
  text: string; confidence: number; bbox_px: WordBbox; bbox_pct: WordBbox
  ocr_engine?: string; is_bold?: boolean
  status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
  word_boxes?: OcrWordBox[]
 }
 export interface WordEntry {
  row_index: number; english: string; german: string; example: string
  source_page?: string; marker?: string; confidence: number
  bbox: WordBbox; bbox_en: WordBbox | null; bbox_de: WordBbox | null; bbox_ex: WordBbox | null
  bbox_ref?: WordBbox | null; bbox_marker?: WordBbox | null
  status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
 }
 export interface GridResult {
  cells: GridCell[]
  grid_shape: { rows: number; cols: number; total_cells: number }
  columns_used: ColumnMeta[]
  layout: 'vocab' | 'generic'
  image_width: number; image_height: number; duration_seconds: number
  ocr_engine?: string; vocab_entries?: WordEntry[]; entries?: WordEntry[]; entry_count?: number
  summary: {
    total_cells: number; non_empty_cells: number; low_confidence: number
    total_entries?: number; with_english?: number; with_german?: number
  }
  llm_review?: {
    changes: { row_index: number; field: string; old: string; new: string }[]
    model_used: string; duration_ms: number; entries_corrected: number
    applied_count?: number; applied_at?: string
  }
 }
 // --- Kombi V2 Pipeline ---
 /**
 * 11-step Kombi V2 pipeline.
 * Each step has its own component file in components/ocr-kombi/.
 */
 export const KOMBI_V2_STEPS: PipelineStep[] = [
  { id: 'upload',        name: 'Upload',             icon: '📤', status: 'pending' },
  { id: 'orientation',   name: 'Orientierung',       icon: '🔄', status: 'pending' },
@@ -40,67 +216,43 @@ export const KOMBI_V2_STEPS: PipelineStep[] = [
  { id: 'structure',     name: 'Strukturerkennung',  icon: '🔍', status: 'pending' },
  { id: 'grid-build',    name: 'Grid-Aufbau',        icon: '🧱', status: 'pending' },
  { id: 'grid-review',   name: 'Grid-Review',        icon: '📊', status: 'pending' },
  { id: 'gutter-repair', name: 'Wortkorrektur',      icon: '🩹', status: 'pending' },
  { id: 'box-review',    name: 'Box-Review',          icon: '📦', status: 'pending' },
  { id: 'ansicht',       name: 'Ansicht',             icon: '👁️', status: 'pending' },
  { id: 'ground-truth',  name: 'Ground Truth',       icon: '✅', status: 'pending' },
 ]
 /** Map from Kombi V2 UI step index to DB step number */
 export const KOMBI_V2_UI_TO_DB: Record<number, number> = {
-  0: 1,   // upload
+  0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 8, 7: 9, 8: 10, 9: 11, 10: 11, 11: 11, 12: 11, 13: 12,
-  1: 2,   // orientation
-  2: 2,   // page-split (same DB step as orientation)
-  3: 3,   // deskew
-  4: 4,   // dewarp
-  5: 5,   // content-crop
-  6: 8,   // ocr (word_result)
-  7: 9,   // structure
-  8: 10,  // grid-build
-  9: 11,  // grid-review
-  10: 12, // ground-truth
 }
 /** Map from DB step to Kombi V2 UI step index */
 export function dbStepToKombiV2Ui(dbStep: number): number {
-  if (dbStep <= 1) return 0   // upload
+  if (dbStep <= 1) return 0
-  if (dbStep === 2) return 1  // orientation
+  if (dbStep === 2) return 1
-  if (dbStep === 3) return 3  // deskew
+  if (dbStep === 3) return 3
-  if (dbStep === 4) return 4  // dewarp
+  if (dbStep === 4) return 4
-  if (dbStep === 5) return 5  // content-crop
+  if (dbStep === 5) return 5
-  if (dbStep <= 8) return 6   // ocr
+  if (dbStep <= 8) return 6
-  if (dbStep === 9) return 7  // structure
+  if (dbStep === 9) return 7
-  if (dbStep === 10) return 8 // grid-build
+  if (dbStep === 10) return 8
-  if (dbStep === 11) return 9 // grid-review
+  if (dbStep === 11) return 9
-  return 10                   // ground-truth
+  return 13
 }
 /** Document group: groups multiple sessions from a multi-page upload */
 export interface DocumentGroup {
-  group_id: string
+  group_id: string; title: string; page_count: number; sessions: DocumentGroupSession[]
-  title: string
-  page_count: number
-  sessions: DocumentGroupSession[]
 }
 export interface DocumentGroupSession {
-  id: string
+  id: string; name: string; page_number: number; current_step: number
-  name: string
+  status: string; document_category?: DocumentCategory; created_at: string
-  page_number: number
-  current_step: number
-  status: string
-  document_category?: DocumentCategory
-  created_at: string
 }
 /** Engine source for OCR transparency */
 export type OcrEngineSource = 'both' | 'paddle_only' | 'tesseract_only' | 'conflict_paddle' | 'conflict_tesseract'
 export interface OcrTransparentWord {
-  text: string
+  text: string; left: number; top: number; width: number; height: number
-  left: number
+  conf: number; engine_source: OcrEngineSource
-  top: number
-  width: number
-  height: number
-  conf: number
-  engine_source: OcrEngineSource
 }
 export interface OcrTransparentResult {
@@ -108,11 +260,7 @@ export interface OcrTransparentResult {
  raw_paddle: { words: OcrTransparentWord[] }
  merged: { words: OcrTransparentWord[] }
  stats: {
-    total_words: number
+    total_words: number; both_agree: number; paddle_only: number
-    both_agree: number
+    tesseract_only: number; conflict_paddle_wins: number; conflict_tesseract_wins: number
-    paddle_only: number
-    tesseract_only: number
-    conflict_paddle_wins: number
-    conflict_tesseract_wins: number
  }
 }
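Review note on the step mapping in this hunk: the sketch below copies the added (+) lines of `KOMBI_V2_UI_TO_DB` and `dbStepToKombiV2Ui` and checks that every DB step reachable from the UI map survives a DB → UI → DB round trip. The helper `roundTripMismatches` is hypothetical and not part of the change; it only illustrates one way the two mappings could be kept consistent.

```typescript
// Copied from the added (+) lines of the types.ts hunk.
const KOMBI_V2_UI_TO_DB: Record<number, number> = {
  0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 8, 7: 9, 8: 10, 9: 11, 10: 11, 11: 11, 12: 11, 13: 12,
}

function dbStepToKombiV2Ui(dbStep: number): number {
  if (dbStep <= 1) return 0
  if (dbStep === 2) return 1
  if (dbStep === 3) return 3
  if (dbStep === 4) return 4
  if (dbStep === 5) return 5
  if (dbStep <= 8) return 6
  if (dbStep === 9) return 7
  if (dbStep === 10) return 8
  if (dbStep === 11) return 9
  return 13
}

// Hypothetical helper (assumption, not in the diff): returns every DB step
// that appears as a value in the UI map but does not map back to itself
// after DB -> UI -> DB.
function roundTripMismatches(): number[] {
  const dbSteps = Object.keys(KOMBI_V2_UI_TO_DB)
    .map((k) => KOMBI_V2_UI_TO_DB[Number(k)])
    .filter((v, i, all) => all.indexOf(v) === i) // de-duplicate
  return dbSteps.filter((db) => KOMBI_V2_UI_TO_DB[dbStepToKombiV2Ui(db)] !== db)
}
```

DB steps 6 and 7 do not appear as values in the UI map; `dbStepToKombiV2Ui` collapses both onto UI step 6, so they are intentionally outside this check.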
--- a/admin-lehrer/app/(admin)/ai/ocr-kombi/useKombiPipeline.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-kombi/useKombiPipeline.ts
@@ -2,9 +2,8 @@
 import { useCallback, useEffect, useState, useRef } from 'react'
 import { useSearchParams } from 'next/navigation'
-import type { PipelineStep, DocumentCategory } from './types'
+import type { PipelineStep, DocumentCategory, SessionListItem } from './types'
 import { KOMBI_V2_STEPS, dbStepToKombiV2Ui } from './types'
-import type { SubSession, SessionListItem } from '../ocr-pipeline/types'
 export type { SessionListItem }
@@ -33,8 +32,7 @@ export function useKombiPipeline() {
  const [loadingSessions, setLoadingSessions] = useState(true)
  const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
  const [isGroundTruth, setIsGroundTruth] = useState(false)
-  const [subSessions, setSubSessions] = useState<SubSession[]>([])
+  const [pageNumber, setPageNumber] = useState<number | null>(null)
  const [parentSessionId, setParentSessionId] = useState<string | null>(null)
  const [steps, setSteps] = useState<PipelineStep[]>(initSteps())
  const searchParams = useSearchParams()
@@ -115,7 +113,7 @@ export function useKombiPipeline() {
  // ---- Open session ----
-  const openSession = useCallback(async (sid: string, keepSubSessions?: boolean) => {
+  const openSession = useCallback(async (sid: string) => {
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
      if (!res.ok) return
@@ -125,26 +123,19 @@ export function useKombiPipeline() {
      setSessionName(data.name || data.filename || '')
      setActiveCategory(data.document_category || undefined)
      setIsGroundTruth(!!data.ground_truth?.build_grid_reference)
-
+      setPageNumber(data.grid_editor_result?.page_number?.number ?? null)
      // Sub-session handling
      if (data.sub_sessions?.length > 0) {
        setSubSessions(data.sub_sessions)
        setParentSessionId(sid)
      } else if (data.parent_session_id) {
        setParentSessionId(data.parent_session_id)
      } else if (!keepSubSessions) {
        setSubSessions([])
        setParentSessionId(null)
      }
      // Determine UI step from DB state
      const dbStep = data.current_step || 1
      const hasGrid = !!data.grid_editor_result
      const hasStructure = !!data.structure_result
      const hasWords = !!data.word_result
      const hasGutterRepair = !!(data.ground_truth?.gutter_repair)
      let uiStep: number
-      if (hasGrid) {
+      if (hasGrid && hasGutterRepair) {
        uiStep = 10 // gutter-repair (already analysed)
      } else if (hasGrid) {
        uiStep = 9 // grid-review
      } else if (hasStructure) {
        uiStep = 8 // grid-build
@@ -159,22 +150,10 @@ export function useKombiPipeline() {
        uiStep = 1
      }
-      const skipIds: string[] = []
-      const isSubSession = !!data.parent_session_id
-      if (isSubSession && dbStep >= 5) {
-        skipIds.push('upload', 'orientation', 'page-split', 'deskew', 'dewarp', 'content-crop')
-        if (uiStep < 6) uiStep = 6
-      } else if (isSubSession && dbStep >= 2) {
-        skipIds.push('upload', 'orientation')
-        if (uiStep < 2) uiStep = 2
-      }
      setSteps(
        KOMBI_V2_STEPS.map((s, i) => ({
          ...s,
-          status: skipIds.includes(s.id)
+          status: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
-            ? 'skipped'
-            : i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
        })),
      )
      setCurrentStep(uiStep)
@@ -226,8 +205,6 @@ export function useKombiPipeline() {
      setSteps(initSteps())
      setCurrentStep(0)
      setSessionId(null)
-      setSubSessions([])
-      setParentSessionId(null)
      loadSessions()
      return
    }
@@ -249,8 +226,6 @@ export function useKombiPipeline() {
    setSessionId(null)
    setSessionName('')
    setCurrentStep(0)
-    setSubSessions([])
-    setParentSessionId(null)
    setSteps(initSteps())
  }, [])
@@ -292,40 +267,6 @@ export function useKombiPipeline() {
    }
  }, [sessionId])
-  // ---- Orientation completion (checks for page-split sub-sessions) ----
-  const handleOrientationComplete = useCallback(async (sid: string) => {
-    setSessionId(sid)
-    loadSessions()
-    try {
-      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
-      if (res.ok) {
-        const data = await res.json()
-        if (data.sub_sessions?.length > 0) {
-          const subs: SubSession[] = data.sub_sessions.map((s: SubSession) => ({
-            id: s.id,
-            name: s.name,
-            box_index: s.box_index,
-            current_step: s.current_step,
-          }))
-          setSubSessions(subs)
-          setParentSessionId(sid)
-          openSession(subs[0].id, true)
-          return
-        }
-      }
-    } catch (e) {
-      console.error('Failed to check for sub-sessions:', e)
-    }
-    handleNext()
-  }, [loadSessions, openSession, handleNext])
-  const handleSessionChange = useCallback((newSessionId: string) => {
-    openSession(newSessionId, true)
-  }, [openSession])
  return {
    // State
    currentStep,
@@ -335,8 +276,7 @@ export function useKombiPipeline() {
    loadingSessions,
    activeCategory,
    isGroundTruth,
-    subSessions,
+    pageNumber,
    parentSessionId,
    steps,
    gridSaveRef,
    // Computed
@@ -351,11 +291,7 @@ export function useKombiPipeline() {
    deleteSession,
    renameSession,
    updateCategory,
-    handleOrientationComplete,
-    handleSessionChange,
    setSessionId,
-    setSubSessions,
-    setParentSessionId,
    setSessionName,
    setIsGroundTruth,
  }
--- a/admin-lehrer/app/(admin)/ai/ocr-overlay/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-overlay/page.tsx
@@ -1,751 +0,0 @@
 'use client'
 import { useCallback, useEffect, useState, useRef } from 'react'
 import { useSearchParams } from 'next/navigation'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { PipelineStepper } from '@/components/ocr-pipeline/PipelineStepper'
 import { StepOrientation } from '@/components/ocr-pipeline/StepOrientation'
 import { StepDeskew } from '@/components/ocr-pipeline/StepDeskew'
 import { StepDewarp } from '@/components/ocr-pipeline/StepDewarp'
 import { StepCrop } from '@/components/ocr-pipeline/StepCrop'
 import { StepStructureDetection } from '@/components/ocr-pipeline/StepStructureDetection'
 import { StepRowDetection } from '@/components/ocr-pipeline/StepRowDetection'
 import { StepWordRecognition } from '@/components/ocr-pipeline/StepWordRecognition'
 import { OverlayReconstruction } from '@/components/ocr-overlay/OverlayReconstruction'
 import { PaddleDirectStep } from '@/components/ocr-overlay/PaddleDirectStep'
 import { GridEditor } from '@/components/grid-editor/GridEditor'
 import { StepGridReview } from '@/components/ocr-pipeline/StepGridReview'
 import { BoxSessionTabs } from '@/components/ocr-pipeline/BoxSessionTabs'
 import { OVERLAY_PIPELINE_STEPS, PADDLE_DIRECT_STEPS, KOMBI_STEPS, DOCUMENT_CATEGORIES, dbStepToOverlayUi, type PipelineStep, type SessionListItem, type DocumentCategory } from './types'
 import type { SubSession } from '../ocr-pipeline/types'
 const KLAUSUR_API = '/klausur-api'
 export default function OcrOverlayPage() {
  const [mode, setMode] = useState<'pipeline' | 'paddle-direct' | 'kombi'>('pipeline')
  const [currentStep, setCurrentStep] = useState(0)
  const [sessionId, setSessionId] = useState<string | null>(null)
  const [sessionName, setSessionName] = useState<string>('')
  const [sessions, setSessions] = useState<SessionListItem[]>([])
  const [loadingSessions, setLoadingSessions] = useState(true)
  const [editingName, setEditingName] = useState<string | null>(null)
  const [editNameValue, setEditNameValue] = useState('')
  const [editingCategory, setEditingCategory] = useState<string | null>(null)
  const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
  const [editingActiveCategory, setEditingActiveCategory] = useState(false)
  const [subSessions, setSubSessions] = useState<SubSession[]>([])
  const [parentSessionId, setParentSessionId] = useState<string | null>(null)
  const [isGroundTruth, setIsGroundTruth] = useState(false)
  const [gtSaving, setGtSaving] = useState(false)
  const [gtMessage, setGtMessage] = useState('')
  const [steps, setSteps] = useState<PipelineStep[]>(
    OVERLAY_PIPELINE_STEPS.map((s, i) => ({
      ...s,
      status: i === 0 ? 'active' : 'pending',
    })),
  )
  const searchParams = useSearchParams()
  const deepLinkHandled = useRef(false)
  const gridSaveRef = useRef<(() => Promise<void>) | null>(null)
  useEffect(() => {
    loadSessions()
  }, [])
  const loadSessions = async () => {
    setLoadingSessions(true)
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
      if (res.ok) {
        const data = await res.json()
        // Filter to only show top-level sessions (no sub-sessions)
        setSessions((data.sessions || []).filter((s: SessionListItem) => !s.parent_session_id))
      }
    } catch (e) {
      console.error('Failed to load sessions:', e)
    } finally {
      setLoadingSessions(false)
    }
  }
  const openSession = useCallback(async (sid: string, keepSubSessions?: boolean) => {
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
      if (!res.ok) return
      const data = await res.json()
      setSessionId(sid)
      setSessionName(data.name || data.filename || '')
      setActiveCategory(data.document_category || undefined)
      setIsGroundTruth(!!data.ground_truth?.build_grid_reference)
      setGtMessage('')
      // Sub-session handling
      if (data.sub_sessions && data.sub_sessions.length > 0) {
        setSubSessions(data.sub_sessions)
        setParentSessionId(sid)
      } else if (data.parent_session_id) {
        setParentSessionId(data.parent_session_id)
      } else if (!keepSubSessions) {
        setSubSessions([])
        setParentSessionId(null)
      }
      const isSubSession = !!data.parent_session_id
      // Mode detection for root sessions with word_result
      const ocrEngine = data.word_result?.ocr_engine
      const isPaddleDirect = ocrEngine === 'paddle_direct'
      const isKombi = ocrEngine === 'kombi' || ocrEngine === 'rapid_kombi'
      let activeMode = mode // keep current mode for sub-sessions
      if (!isSubSession && (isPaddleDirect || isKombi)) {
        activeMode = isKombi ? 'kombi' : 'paddle-direct'
        setMode(activeMode)
      } else if (!isSubSession && !ocrEngine) {
        // Unprocessed root session: keep the user's selected mode
        activeMode = mode
      }
      const baseSteps = activeMode === 'kombi' ? KOMBI_STEPS
        : activeMode === 'paddle-direct' ? PADDLE_DIRECT_STEPS
        : OVERLAY_PIPELINE_STEPS
      // Determine UI step
      let uiStep: number
      const skipIds: string[] = []
      if (!isSubSession && (isPaddleDirect || isKombi)) {
        const hasGrid = isKombi && data.grid_editor_result
        const hasStructure = isKombi && data.structure_result
        uiStep = hasGrid ? 6 : hasStructure ? 6 : data.word_result ? 5 : 4
        if (isPaddleDirect) uiStep = data.word_result ? 4 : 4
      } else {
        const dbStep = data.current_step || 1
        if (dbStep <= 2) uiStep = 0
        else if (dbStep === 3) uiStep = 1
        else if (dbStep === 4) uiStep = 2
        else if (dbStep === 5) uiStep = 3
        else uiStep = 4
        // Sub-session skip logic
        if (isSubSession) {
          if (dbStep >= 5) {
            skipIds.push('orientation', 'deskew', 'dewarp', 'crop')
            if (uiStep < 4) uiStep = 4
          } else if (dbStep >= 2) {
            skipIds.push('orientation')
            if (uiStep < 1) uiStep = 1 // advance past skipped orientation to deskew
          }
        }
      }
      setSteps(
        baseSteps.map((s, i) => ({
          ...s,
          status: skipIds.includes(s.id)
            ? 'skipped'
            : i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
        })),
      )
      setCurrentStep(uiStep)
    } catch (e) {
      console.error('Failed to open session:', e)
    }
  }, [mode])
  // Handle deep-link: ?session=xxx&mode=kombi (from GT Queue page)
  useEffect(() => {
    if (deepLinkHandled.current) return
    const urlSession = searchParams.get('session')
    const urlMode = searchParams.get('mode')
    if (urlSession) {
      deepLinkHandled.current = true
      if (urlMode === 'kombi' || urlMode === 'paddle-direct') {
        setMode(urlMode)
        const baseSteps = urlMode === 'kombi' ? KOMBI_STEPS : PADDLE_DIRECT_STEPS
        setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
      }
      openSession(urlSession)
    }
  }, [searchParams, openSession])
  const deleteSession = useCallback(async (sid: string) => {
    try {
      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
      setSessions((prev) => prev.filter((s) => s.id !== sid))
      if (sessionId === sid) {
        setSessionId(null)
        setCurrentStep(0)
        setSubSessions([])
        setParentSessionId(null)
        const baseSteps = mode === 'kombi' ? KOMBI_STEPS : mode === 'paddle-direct' ? PADDLE_DIRECT_STEPS : OVERLAY_PIPELINE_STEPS
        setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
      }
    } catch (e) {
      console.error('Failed to delete session:', e)
    }
  }, [sessionId, mode])
  const renameSession = useCallback(async (sid: string, newName: string) => {
    try {
      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ name: newName }),
      })
      setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, name: newName } : s)))
      if (sessionId === sid) setSessionName(newName)
    } catch (e) {
      console.error('Failed to rename session:', e)
    }
    setEditingName(null)
  }, [sessionId])
  const updateCategory = useCallback(async (sid: string, category: DocumentCategory) => {
    try {
      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ document_category: category }),
      })
      setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, document_category: category } : s)))
      if (sessionId === sid) setActiveCategory(category)
    } catch (e) {
      console.error('Failed to update category:', e)
    }
    setEditingCategory(null)
  }, [sessionId])
  const handleStepClick = (index: number) => {
    if (index <= currentStep || steps[index].status === 'completed') {
      setCurrentStep(index)
    }
  }
  const goToStep = (step: number) => {
    setCurrentStep(step)
    setSteps((prev) =>
      prev.map((s, i) => ({
        ...s,
        status: i < step ? 'completed' : i === step ? 'active' : 'pending',
      })),
    )
  }
  const handleNext = () => {
    if (currentStep >= steps.length - 1) {
      // Sub-session completed — switch back to parent
      if (parentSessionId && sessionId !== parentSessionId) {
        setSubSessions((prev) =>
          prev.map((s) => s.id === sessionId ? { ...s, status: 'completed', current_step: 10 } : s)
        )
        handleSessionChange(parentSessionId)
        return
      }
      // Last step completed — return to session list
      const baseSteps = mode === 'kombi' ? KOMBI_STEPS : mode === 'paddle-direct' ? PADDLE_DIRECT_STEPS : OVERLAY_PIPELINE_STEPS
      setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
      setCurrentStep(0)
      setSessionId(null)
      setSubSessions([])
      setParentSessionId(null)
      loadSessions()
      return
    }
    const nextStep = currentStep + 1
    setSteps((prev) =>
      prev.map((s, i) => {
        if (i === currentStep) return { ...s, status: 'completed' }
        if (i === nextStep) return { ...s, status: 'active' }
        return s
      }),
    )
    setCurrentStep(nextStep)
  }
  const handleOrientationComplete = async (sid: string) => {
    setSessionId(sid)
    loadSessions()
    // Check for page-split sub-sessions directly from API
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
      if (res.ok) {
        const data = await res.json()
        if (data.sub_sessions?.length > 0) {
          const subs: SubSession[] = data.sub_sessions.map((s: SubSession) => ({
            id: s.id,
            name: s.name,
            box_index: s.box_index,
            current_step: s.current_step,
          }))
          setSubSessions(subs)
          setParentSessionId(sid)
          openSession(subs[0].id, true)
          return
        }
      }
    } catch (e) {
      console.error('Failed to check for sub-sessions:', e)
    }
    handleNext()
  }
  const handleBoxSessionsCreated = useCallback((subs: SubSession[]) => {
    setSubSessions(subs)
    if (sessionId) setParentSessionId(sessionId)
  }, [sessionId])
  const handleSessionChange = useCallback((newSessionId: string) => {
    openSession(newSessionId, true)
  }, [openSession])
  const handleNewSession = () => {
    setSessionId(null)
    setSessionName('')
    setCurrentStep(0)
    setSubSessions([])
    setParentSessionId(null)
    const baseSteps = mode === 'kombi' ? KOMBI_STEPS : mode === 'paddle-direct' ? PADDLE_DIRECT_STEPS : OVERLAY_PIPELINE_STEPS
    setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
  }
  const stepNames: Record<number, string> = {
    1: 'Orientierung',
    2: 'Begradigung',
    3: 'Entzerrung',
    4: 'Zuschneiden',
    5: 'Zeilen',
    6: 'Woerter',
    7: 'Overlay',
  }
  const reprocessFromStep = useCallback(async (uiStep: number) => {
    if (!sessionId) return
    // Map overlay UI step to DB step
    const dbStepMap: Record<number, number> = { 0: 2, 1: 3, 2: 4, 3: 5, 4: 7, 5: 8, 6: 9 }
    const dbStep = dbStepMap[uiStep] || uiStep + 1
    if (!confirm(`Ab Schritt ${uiStep + 1} (${stepNames[uiStep + 1] || '?'}) neu verarbeiten?`)) return
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ from_step: dbStep }),
      })
      if (!res.ok) {
        const data = await res.json().catch(() => ({}))
        console.error('Reprocess failed:', data.detail || res.status)
        return
      }
      goToStep(uiStep)
    } catch (e) {
      console.error('Reprocess error:', e)
    }
  // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [sessionId, goToStep])
  const handleMarkGroundTruth = async () => {
    if (!sessionId) return
    setGtSaving(true)
    setGtMessage('')
    try {
      // Auto-save grid editor before marking GT (so DB has latest edits)
      if (gridSaveRef.current) {
        await gridSaveRef.current()
      }
      const resp = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/mark-ground-truth?pipeline=${mode}`,
        { method: 'POST' }
      )
      if (!resp.ok) {
        const body = await resp.text().catch(() => '')
        throw new Error(`Ground Truth fehlgeschlagen (${resp.status}): ${body}`)
      }
      const data = await resp.json()
      setIsGroundTruth(true)
      setGtMessage(`Ground Truth gespeichert (${data.cells_saved} Zellen)`)
      setTimeout(() => setGtMessage(''), 5000)
    } catch (e) {
      setGtMessage(e instanceof Error ? e.message : String(e))
    } finally {
      setGtSaving(false)
    }
  }
  const isLastStep = currentStep === steps.length - 1
  const showGtButton = isLastStep && sessionId != null
  const renderStep = () => {
    if (mode === 'paddle-direct' || mode === 'kombi') {
      switch (currentStep) {
        case 0:
          return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSessionList={() => { loadSessions(); setSessionId(null) }} />
        case 1:
          return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
        case 2:
          return <StepDewarp key={sessionId} sessionId={sessionId} onNext={handleNext} />
        case 3:
          return <StepCrop key={sessionId} sessionId={sessionId} onNext={handleNext} />
        case 4:
          if (mode === 'kombi') {
            return (
              <PaddleDirectStep
                sessionId={sessionId}
                onNext={handleNext}
                endpoint="paddle-kombi"
                title="Kombi-Modus"
                description="PP-OCRv5 und Tesseract laufen parallel. Koordinaten werden gewichtet gemittelt fuer optimale Positionierung."
                icon="🔀"
                buttonLabel="PP-OCRv5 + Tesseract starten"
                runningLabel="PP-OCRv5 + Tesseract laufen..."
                engineKey="kombi"
              />
            )
          }
          return <PaddleDirectStep sessionId={sessionId} onNext={handleNext} />
        case 5:
          return mode === 'kombi' ? (
            <StepStructureDetection sessionId={sessionId} onNext={handleNext} />
          ) : null
        case 6:
          return mode === 'kombi' ? (
            <StepGridReview sessionId={sessionId} onNext={handleNext} saveRef={gridSaveRef} />
          ) : null
        default:
          return null
      }
    }
    switch (currentStep) {
      case 0:
        return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSessionList={() => { loadSessions(); setSessionId(null) }} />
      case 1:
        return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
      case 2:
        return <StepDewarp key={sessionId} sessionId={sessionId} onNext={handleNext} />
      case 3:
        return <StepCrop key={sessionId} sessionId={sessionId} onNext={handleNext} />
      case 4:
        return <StepRowDetection sessionId={sessionId} onNext={handleNext} />
      case 5:
        return <StepWordRecognition sessionId={sessionId} onNext={handleNext} goToStep={goToStep} skipHealGaps />
      case 6:
        return <OverlayReconstruction sessionId={sessionId} onNext={handleNext} />
      default:
        return null
    }
  }
  return (
    <div className="space-y-6">
      <PagePurpose
        title="OCR Overlay"
        purpose="Ganzseitige Overlay-Rekonstruktion: Scan begradigen, Zeilen und Woerter erkennen, dann pixelgenau ueber das Bild legen. Ohne Spaltenerkennung — ideal fuer Arbeitsblaetter."
        audience={['Entwickler']}
        architecture={{
          services: ['klausur-service (FastAPI)', 'OpenCV', 'Tesseract'],
          databases: ['PostgreSQL Sessions'],
        }}
        relatedPages={[
          { name: 'OCR Pipeline', href: '/ai/ocr-pipeline', description: 'Volle Pipeline mit Spalten' },
          { name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'Methoden-Vergleich' },
        ]}
        defaultCollapsed
      />
      {/* Session List */}
      <div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4">
        <div className="flex items-center justify-between mb-3">
          <h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
            Sessions ({sessions.length})
          </h3>
          <button
            onClick={handleNewSession}
            className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
          >
            + Neue Session
          </button>
        </div>
        {loadingSessions ? (
          <div className="text-sm text-gray-400 py-2">Lade Sessions...</div>
        ) : sessions.length === 0 ? (
          <div className="text-sm text-gray-400 py-2">Noch keine Sessions vorhanden.</div>
        ) : (
          <div className="space-y-1.5 max-h-[320px] overflow-y-auto">
            {sessions.map((s) => {
              const catInfo = DOCUMENT_CATEGORIES.find(c => c.value === s.document_category)
              return (
                <div
                  key={s.id}
                  className={`relative flex items-start gap-3 px-3 py-2.5 rounded-lg text-sm transition-colors cursor-pointer ${
                    sessionId === s.id
                      ? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
                      : 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
                  }`}
                >
                  {/* Thumbnail */}
                  <div
                    className="flex-shrink-0 w-12 h-12 rounded-md overflow-hidden bg-gray-100 dark:bg-gray-700"
                    onClick={() => openSession(s.id)}
                  >
                    {/* eslint-disable-next-line @next/next/no-img-element */}
                    <img
                      src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${s.id}/thumbnail?size=96`}
                      alt=""
                      className="w-full h-full object-cover"
                      loading="lazy"
                      onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
                    />
                  </div>
                  {/* Info */}
                  <div className="flex-1 min-w-0" onClick={() => openSession(s.id)}>
                    {editingName === s.id ? (
                      <input
                        autoFocus
                        value={editNameValue}
                        onChange={(e) => setEditNameValue(e.target.value)}
                        onBlur={() => renameSession(s.id, editNameValue)}
                        onKeyDown={(e) => {
                          if (e.key === 'Enter') renameSession(s.id, editNameValue)
                          if (e.key === 'Escape') setEditingName(null)
                        }}
                        onClick={(e) => e.stopPropagation()}
                        className="w-full px-1 py-0.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600"
                      />
                    ) : (
                      <div className="truncate font-medium text-gray-700 dark:text-gray-300">
                        {s.name || s.filename}
                      </div>
                    )}
                    <button
                      onClick={(e) => {
                        e.stopPropagation()
                        navigator.clipboard.writeText(s.id)
                        const btn = e.currentTarget
                        btn.textContent = 'Kopiert!'
                        setTimeout(() => { btn.textContent = `ID: ${s.id.slice(0, 8)}` }, 1500)
                      }}
                      className="text-[10px] font-mono text-gray-400 hover:text-teal-500 transition-colors"
                      title={`Volle ID: ${s.id} — Klick zum Kopieren`}
                    >
                      ID: {s.id.slice(0, 8)}
                    </button>
                    <div className="text-xs text-gray-400 flex gap-2 mt-0.5">
                      <span>{new Date(s.created_at).toLocaleDateString('de-DE', { day: '2-digit', month: '2-digit', year: '2-digit', hour: '2-digit', minute: '2-digit' })}</span>
                    </div>
                  </div>
                  {/* Category Badge */}
                  <div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
                    <button
                      onClick={() => setEditingCategory(editingCategory === s.id ? null : s.id)}
                      className={`text-[10px] px-1.5 py-0.5 rounded-full border transition-colors ${
                        catInfo
                          ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
                          : 'bg-gray-50 dark:bg-gray-700 border-gray-200 dark:border-gray-600 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300'
                      }`}
                      title="Kategorie setzen"
                    >
                      {catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
                    </button>
                  </div>
                  {/* Actions */}
                  <div className="flex flex-col gap-0.5 flex-shrink-0">
                    <button
                      onClick={(e) => {
                        e.stopPropagation()
                        setEditNameValue(s.name || s.filename)
                        setEditingName(s.id)
                      }}
                      className="p-1 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300"
                      title="Umbenennen"
                    >
                      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
                        <path strokeLinecap="round" strokeLinejoin="round" d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
                      </svg>
                    </button>
                    <button
                      onClick={(e) => {
                        e.stopPropagation()
                        if (confirm('Session loeschen?')) deleteSession(s.id)
                      }}
                      className="p-1 text-gray-400 hover:text-red-500"
                      title="Loeschen"
                    >
                      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
                        <path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
                      </svg>
                    </button>
                  </div>
                  {/* Category dropdown */}
                  {editingCategory === s.id && (
                    <div
                      className="absolute right-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64"
                      onClick={(e) => e.stopPropagation()}
                    >
                      {DOCUMENT_CATEGORIES.map((cat) => (
                        <button
                          key={cat.value}
                          onClick={() => updateCategory(s.id, cat.value)}
                          className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
                            s.document_category === cat.value
                              ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
                              : 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
                          }`}
                        >
                          {cat.icon} {cat.label}
                        </button>
                      ))}
                    </div>
                  )}
                </div>
              )
            })}
          </div>
        )}
      </div>
      {/* Active session info + category picker */}
      {sessionId && sessionName && (
        <div className="relative flex items-center gap-3 text-sm text-gray-500 dark:text-gray-400">
          <span>Aktive Session: <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span></span>
          <button
            onClick={() => setEditingActiveCategory(!editingActiveCategory)}
            className={`text-xs px-2.5 py-1 rounded-full border transition-colors ${
              activeCategory
                ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300 hover:bg-teal-100 dark:hover:bg-teal-900/50'
                : 'bg-amber-50 dark:bg-amber-900/20 border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300 hover:bg-amber-100 dark:hover:bg-amber-900/40 animate-pulse'
            }`}
          >
            {activeCategory ? (() => {
              const cat = DOCUMENT_CATEGORIES.find(c => c.value === activeCategory)
              return cat ? `${cat.icon} ${cat.label}` : activeCategory
            })() : 'Kategorie setzen'}
          </button>
          {isGroundTruth && (
            <span className="text-xs px-2 py-0.5 rounded-full bg-amber-50 dark:bg-amber-900/20 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
              GT
            </span>
          )}
          {editingActiveCategory && (
            <div className="absolute left-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64">
              {DOCUMENT_CATEGORIES.map((cat) => (
                <button
                  key={cat.value}
                  onClick={() => {
                    updateCategory(sessionId, cat.value)
                    setEditingActiveCategory(false)
                  }}
                  className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
                    activeCategory === cat.value
                      ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
                      : 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
                  }`}
                >
                  {cat.icon} {cat.label}
                </button>
              ))}
            </div>
          )}
        </div>
      )}
      {/* Mode Toggle */}
      <div className="flex items-center gap-1 bg-gray-100 dark:bg-gray-800 rounded-lg p-1 w-fit">
        <button
          onClick={() => {
            if (mode === 'pipeline') return
            setMode('pipeline')
            setCurrentStep(0)
            setSessionId(null)
            setSteps(OVERLAY_PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
          }}
          className={`px-3 py-1.5 text-xs font-medium rounded-md transition-colors ${
            mode === 'pipeline'
              ? 'bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200 shadow-sm'
              : 'text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-300'
          }`}
        >
          Pipeline (7 Schritte)
        </button>
        <button
          onClick={() => {
            if (mode === 'paddle-direct') return
            setMode('paddle-direct')
            setCurrentStep(0)
            setSessionId(null)
            setSteps(PADDLE_DIRECT_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
          }}
          className={`px-3 py-1.5 text-xs font-medium rounded-md transition-colors ${
            mode === 'paddle-direct'
              ? 'bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200 shadow-sm'
              : 'text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-300'
          }`}
        >
          PP-OCRv5 Direct (5 Schritte)
        </button>
        <button
          onClick={() => {
            if (mode === 'kombi') return
            setMode('kombi')
            setCurrentStep(0)
            setSessionId(null)
            setSteps(KOMBI_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
          }}
          className={`px-3 py-1.5 text-xs font-medium rounded-md transition-colors ${
            mode === 'kombi'
              ? 'bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200 shadow-sm'
              : 'text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-300'
          }`}
        >
          Kombi (7 Schritte)
        </button>
      </div>
      <PipelineStepper
        steps={steps}
        currentStep={currentStep}
        onStepClick={handleStepClick}
        onReprocess={mode === 'pipeline' && sessionId != null ? reprocessFromStep : undefined}
      />
      {subSessions.length > 0 && parentSessionId && sessionId && (
        <BoxSessionTabs
          parentSessionId={parentSessionId}
          subSessions={subSessions}
          activeSessionId={sessionId}
          onSessionChange={handleSessionChange}
        />
      )}
      <div className="min-h-[400px]">{renderStep()}</div>
      {/* Ground Truth button bar — visible on last step */}
      {showGtButton && (
        <div className="sticky bottom-0 bg-white dark:bg-gray-900 border-t dark:border-gray-700 py-3 px-4 -mx-1 flex items-center justify-between rounded-b-xl">
          <div className="text-sm text-gray-500 dark:text-gray-400">
            {gtMessage && (
              <span className={gtMessage.includes('fehlgeschlagen') ? 'text-red-500' : 'text-amber-600 dark:text-amber-400'}>
                {gtMessage}
              </span>
            )}
          </div>
          <button
            onClick={handleMarkGroundTruth}
            disabled={gtSaving}
            className="px-4 py-2 text-sm bg-amber-600 text-white rounded hover:bg-amber-700 disabled:opacity-50"
          >
            {gtSaving ? 'Speichere...' : isGroundTruth ? 'Ground Truth aktualisieren' : 'Als Ground Truth markieren'}
          </button>
        </div>
      )}
    </div>
  )
 }
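Review note: the three mode-toggle handlers above repeat the same reset expression (`STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' }))`). A minimal sketch of a shared helper follows. It assumes a simplified `PipelineStep` shape, because the real type lives in `../ocr-pipeline/types` and is not shown in this diff. `initialSteps` is a hypothetical name, not an existing function in the codebase.

```typescript
// Assumed, simplified shapes; the real definitions are in ../ocr-pipeline/types.
type PipelineStepStatus = 'pending' | 'active' | 'completed'

interface PipelineStep {
  id: string
  name: string
  icon: string
  status: PipelineStepStatus
}

// Hypothetical helper: returns a fresh copy of a step template in which
// the first step is 'active' and every other step is 'pending'.
function initialSteps(template: PipelineStep[]): PipelineStep[] {
  return template.map((s, i): PipelineStep => ({
    ...s,
    status: i === 0 ? 'active' : 'pending',
  }))
}
```

With such a helper, each toggle handler could call `setSteps(initialSteps(KOMBI_STEPS))` (or the matching template) instead of repeating the inline `map`.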
--- a/admin-lehrer/app/(admin)/ai/ocr-overlay/types.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-overlay/types.ts
@@ -1,87 +0,0 @@
 import type { PipelineStep } from '../ocr-pipeline/types'
 // Re-export types used by overlay components
 export type {
  PipelineStep,
  PipelineStepStatus,
  SessionListItem,
  SessionInfo,
  DocumentCategory,
  DocumentTypeResult,
  OrientationResult,
  CropResult,
  DeskewResult,
  DewarpResult,
  RowResult,
  RowItem,
  GridResult,
  GridCell,
  OcrWordBox,
  WordBbox,
  ColumnMeta,
 } from '../ocr-pipeline/types'
 export { DOCUMENT_CATEGORIES } from '../ocr-pipeline/types'
 /**
 * 7-step pipeline for full-page overlay reconstruction.
 * Skips: Spalten (columns), LLM-Review (Korrektur), Ground-Truth (Validierung)
 */
 export const OVERLAY_PIPELINE_STEPS: PipelineStep[] = [
  { id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
  { id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
  { id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
  { id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
  { id: 'rows', name: 'Zeilen', icon: '📏', status: 'pending' },
  { id: 'words', name: 'Woerter', icon: '🔤', status: 'pending' },
  { id: 'reconstruction', name: 'Overlay', icon: '🏗️', status: 'pending' },
 ]
 /** Map from overlay UI step index to DB step number (1-indexed) */
 export const OVERLAY_UI_TO_DB: Record<number, number> = {
  0: 2,  // orientation
  1: 3,  // deskew
  2: 4,  // dewarp
  3: 5,  // crop
  4: 6,  // rows (columns step is skipped; dbStepToOverlayUi maps DB 6 and 7 both to rows)
  5: 7,  // words
  6: 9,  // reconstruction
 }
 /**
 * 5-step pipeline for Paddle Direct mode.
 * Same preprocessing (orient/deskew/dewarp/crop), then PaddleOCR replaces rows+words+overlay.
 */
 export const PADDLE_DIRECT_STEPS: PipelineStep[] = [
  { id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
  { id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
  { id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
  { id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
  { id: 'paddle-direct', name: 'PP-OCRv5 + Overlay', icon: '⚡', status: 'pending' },
 ]
 /**
 * 7-step pipeline for Kombi mode (PP-OCRv5 + Tesseract).
 * Same preprocessing, then both engines run and their results are merged,
 * followed by structure detection and the review / ground-truth grid editor.
 */
 export const KOMBI_STEPS: PipelineStep[] = [
  { id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
  { id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
  { id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
  { id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
  { id: 'kombi', name: 'PP-OCRv5 + Tesseract', icon: '🔀', status: 'pending' },
  { id: 'structure', name: 'Struktur', icon: '🔍', status: 'pending' },
  { id: 'grid-editor', name: 'Review & GT', icon: '📊', status: 'pending' },
 ]
 /** Map from DB step to overlay UI step index */
 export function dbStepToOverlayUi(dbStep: number): number {
  // DB: 1=start, 2=orient, 3=deskew, 4=dewarp, 5=crop, 6=columns, 7=rows, 8=words, 9=recon, 10=gt
  if (dbStep <= 2) return 0  // orientation
  if (dbStep === 3) return 1 // deskew
  if (dbStep === 4) return 2 // dewarp
  if (dbStep === 5) return 3 // crop
  if (dbStep <= 7) return 4  // rows (skip columns)
  if (dbStep === 8) return 5 // words
  return 6                   // reconstruction
 }
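Review note: `OVERLAY_UI_TO_DB` and `dbStepToOverlayUi` are meant to be inverse mappings, so a round-trip check is a cheap way to see whether they agree. The sketch below copies both mappings exactly as they appear in this file. `roundTripMismatches` is a hypothetical helper added only for illustration. As written, it reports UI index 5 (words): that step maps to DB step 7, which `dbStepToOverlayUi` maps back to UI index 4 (rows). Which side is correct cannot be decided from this diff alone.

```typescript
// Copies of the two mappings from types.ts, values unchanged.
const OVERLAY_UI_TO_DB: Record<number, number> = {
  0: 2, // orientation
  1: 3, // deskew
  2: 4, // dewarp
  3: 5, // crop
  4: 6, // rows
  5: 7, // words
  6: 9, // reconstruction
}

function dbStepToOverlayUi(dbStep: number): number {
  if (dbStep <= 2) return 0
  if (dbStep === 3) return 1
  if (dbStep === 4) return 2
  if (dbStep === 5) return 3
  if (dbStep <= 7) return 4
  if (dbStep === 8) return 5
  return 6
}

// Hypothetical helper: UI indices whose DB step does not map back to the same UI index.
function roundTripMismatches(): number[] {
  return Object.keys(OVERLAY_UI_TO_DB)
    .map(Number)
    .filter((ui) => dbStepToOverlayUi(OVERLAY_UI_TO_DB[ui]) !== ui)
}

console.log(roundTripMismatches())
```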
--- a/admin-lehrer/app/(admin)/ai/ocr-pipeline/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-pipeline/page.tsx
@@ -1,443 +0,0 @@
 'use client'
 import { Suspense, useCallback, useEffect, useState } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { PipelineStepper } from '@/components/ocr-pipeline/PipelineStepper'
 import { StepOrientation } from '@/components/ocr-pipeline/StepOrientation'
 import { StepCrop } from '@/components/ocr-pipeline/StepCrop'
 import { StepDeskew } from '@/components/ocr-pipeline/StepDeskew'
 import { StepDewarp } from '@/components/ocr-pipeline/StepDewarp'
 import { StepStructureDetection } from '@/components/ocr-pipeline/StepStructureDetection'
 import { StepColumnDetection } from '@/components/ocr-pipeline/StepColumnDetection'
 import { StepRowDetection } from '@/components/ocr-pipeline/StepRowDetection'
 import { StepWordRecognition } from '@/components/ocr-pipeline/StepWordRecognition'
 import { StepLlmReview } from '@/components/ocr-pipeline/StepLlmReview'
 import { StepReconstruction } from '@/components/ocr-pipeline/StepReconstruction'
 import { StepGroundTruth } from '@/components/ocr-pipeline/StepGroundTruth'
 import { DOCUMENT_CATEGORIES, type SessionListItem, type DocumentTypeResult, type DocumentCategory, type SubSession } from './types'
 import { usePipelineNavigation } from './usePipelineNavigation'
 const KLAUSUR_API = '/klausur-api'
 const STEP_NAMES: Record<number, string> = {
  1: 'Orientierung', 2: 'Begradigung', 3: 'Entzerrung', 4: 'Zuschneiden',
  5: 'Spalten', 6: 'Zeilen', 7: 'Woerter', 8: 'Struktur',
  9: 'Korrektur', 10: 'Rekonstruktion', 11: 'Validierung',
 }
 function OcrPipelineContent() {
  const nav = usePipelineNavigation()
  const [sessions, setSessions] = useState<SessionListItem[]>([])
  const [loadingSessions, setLoadingSessions] = useState(true)
  const [editingName, setEditingName] = useState<string | null>(null)
  const [editNameValue, setEditNameValue] = useState('')
  const [editingCategory, setEditingCategory] = useState<string | null>(null)
  const [sessionName, setSessionName] = useState('')
  const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
  const loadSessions = useCallback(async () => {
    setLoadingSessions(true)
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
      if (res.ok) {
        const data = await res.json()
        setSessions(data.sessions || [])
      }
    } catch (e) {
      console.error('Failed to load sessions:', e)
    } finally {
      setLoadingSessions(false)
    }
  }, [])
  useEffect(() => { loadSessions() }, [loadSessions])
  // Sync session name when nav.sessionId changes
  useEffect(() => {
    if (!nav.sessionId) {
      setSessionName('')
      setActiveCategory(undefined)
      return
    }
    const load = async () => {
      try {
        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${nav.sessionId}`)
        if (!res.ok) return
        const data = await res.json()
        setSessionName(data.name || data.filename || '')
        setActiveCategory(data.document_category || undefined)
      } catch { /* ignore */ }
    }
    load()
  }, [nav.sessionId])
  const openSession = useCallback((sid: string) => {
    nav.goToSession(sid)
  }, [nav])
  const deleteSession = useCallback(async (sid: string) => {
    try {
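      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
      // Only drop the session from local state when the server confirmed the delete
      if (!res.ok) throw new Error(`Delete failed with HTTP ${res.status}`)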
      setSessions(prev => prev.filter(s => s.id !== sid))
      if (nav.sessionId === sid) nav.goToSessionList()
    } catch (e) {
      console.error('Failed to delete session:', e)
    }
  }, [nav])
  const renameSession = useCallback(async (sid: string, newName: string) => {
    try {
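      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ name: newName }),
      })
      // Only update local state when the server accepted the rename
      if (!res.ok) throw new Error(`Rename failed with HTTP ${res.status}`)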
      setSessions(prev => prev.map(s => (s.id === sid ? { ...s, name: newName } : s)))
      if (nav.sessionId === sid) setSessionName(newName)
    } catch (e) {
      console.error('Failed to rename session:', e)
    }
    setEditingName(null)
  }, [nav.sessionId])
  const updateCategory = useCallback(async (sid: string, category: DocumentCategory) => {
    try {
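      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ document_category: category }),
      })
      // Only update local state when the server accepted the new category
      if (!res.ok) throw new Error(`Category update failed with HTTP ${res.status}`)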
      setSessions(prev => prev.map(s => (s.id === sid ? { ...s, document_category: category } : s)))
      if (nav.sessionId === sid) setActiveCategory(category)
    } catch (e) {
      console.error('Failed to update category:', e)
    }
    setEditingCategory(null)
  }, [nav.sessionId])
  const deleteAllSessions = useCallback(async () => {
    if (!confirm('Alle Sessions loeschen? Dies kann nicht rueckgaengig gemacht werden.')) return
    try {
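      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`, { method: 'DELETE' })
      // Only clear the local list when the server confirmed the bulk delete
      if (!res.ok) throw new Error(`Delete all failed with HTTP ${res.status}`)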
      setSessions([])
      nav.goToSessionList()
    } catch (e) {
      console.error('Failed to delete all sessions:', e)
    }
  }, [nav])
  const handleStepClick = (index: number) => {
    if (index <= nav.currentStepIndex || nav.steps[index].status === 'completed') {
      nav.goToStep(index)
    }
  }
  // Orientation: after upload, navigate to session at deskew step
  const handleOrientationComplete = useCallback(async (sid: string) => {
    loadSessions()
    // Navigate directly to deskew step (index 1) for this session
    nav.goToSession(sid)
  }, [nav, loadSessions])
  // Crop: detect doc type then advance
  const handleCropNext = useCallback(async () => {
    if (nav.sessionId) {
      try {
        const res = await fetch(
          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${nav.sessionId}/detect-type`,
          { method: 'POST' },
        )
        if (res.ok) {
          const data: DocumentTypeResult = await res.json()
          nav.setDocType(data)
        }
      } catch (e) {
        console.error('Doc type detection failed:', e)
      }
    }
    nav.goToNextStep()
  }, [nav])
  const handleDocTypeChange = (newDocType: DocumentTypeResult['doc_type']) => {
    if (!nav.docTypeResult) return
    let skipSteps: string[] = []
    if (newDocType === 'full_text') skipSteps = ['columns', 'rows']
    nav.setDocType({
      ...nav.docTypeResult,
      doc_type: newDocType,
      skip_steps: skipSteps,
      pipeline: newDocType === 'full_text' ? 'full_page' : 'cell_first',
    })
  }
  // Box sub-sessions (column detection) — still supported
  const handleBoxSessionsCreated = useCallback((_subs: SubSession[]) => {
    // Box sub-sessions are tracked by the backend; no client-side state needed anymore
  }, [])
  const renderStep = () => {
    const sid = nav.sessionId
    switch (nav.currentStepIndex) {
      case 0:
        return (
          <StepOrientation
            key={sid}
            sessionId={sid}
            onNext={handleOrientationComplete}
            onSessionList={() => { loadSessions(); nav.goToSessionList() }}
          />
        )
      case 1:
        return <StepDeskew key={sid} sessionId={sid} onNext={nav.goToNextStep} />
      case 2:
        return <StepDewarp key={sid} sessionId={sid} onNext={nav.goToNextStep} />
      case 3:
        return <StepCrop key={sid} sessionId={sid} onNext={handleCropNext} />
      case 4:
        return <StepColumnDetection sessionId={sid} onNext={nav.goToNextStep} onBoxSessionsCreated={handleBoxSessionsCreated} />
      case 5:
        return <StepRowDetection sessionId={sid} onNext={nav.goToNextStep} />
      case 6:
        return <StepWordRecognition sessionId={sid} onNext={nav.goToNextStep} goToStep={nav.goToStep} />
      case 7:
        return <StepStructureDetection sessionId={sid} onNext={nav.goToNextStep} />
      case 8:
        return <StepLlmReview sessionId={sid} onNext={nav.goToNextStep} />
      case 9:
        return <StepReconstruction sessionId={sid} onNext={nav.goToNextStep} />
      case 10:
        return <StepGroundTruth sessionId={sid} onNext={nav.goToNextStep} />
      default:
        return null
    }
  }
  return (
    <div className="space-y-6">
      <PagePurpose
        title="OCR Pipeline"
        purpose="Schrittweise Seitenrekonstruktion: Scan begradigen, Spalten erkennen, Woerter lokalisieren und die Seite Wort fuer Wort nachbauen. Ziel: 10 Vokabelseiten fehlerfrei rekonstruieren."
        audience={['Entwickler', 'Data Scientists']}
        architecture={{
          services: ['klausur-service (FastAPI)', 'OpenCV', 'Tesseract'],
          databases: ['PostgreSQL Sessions'],
        }}
        relatedPages={[
          { name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'Methoden-Vergleich' },
          { name: 'OCR-Labeling', href: '/ai/ocr-labeling', description: 'Trainingsdaten' },
        ]}
        defaultCollapsed
      />
      {/* Session List */}
      <div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4">
        <div className="flex items-center justify-between mb-3">
          <h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
            Sessions ({sessions.length})
          </h3>
          <div className="flex gap-2">
            {sessions.length > 0 && (
              <button
                onClick={deleteAllSessions}
                className="text-xs px-3 py-1.5 text-red-600 hover:bg-red-50 dark:hover:bg-red-900/20 rounded-lg transition-colors"
                title="Alle Sessions loeschen"
              >
                Alle loeschen
              </button>
            )}
            <button
              onClick={() => nav.goToSessionList()}
              className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
            >
              + Neue Session
            </button>
          </div>
        </div>
        {loadingSessions ? (
          <div className="text-sm text-gray-400 py-2">Lade Sessions...</div>
        ) : sessions.length === 0 ? (
          <div className="text-sm text-gray-400 py-2">Noch keine Sessions vorhanden.</div>
        ) : (
          <div className="space-y-1.5 max-h-[320px] overflow-y-auto">
            {sessions.map((s) => {
              const catInfo = DOCUMENT_CATEGORIES.find(c => c.value === s.document_category)
              return (
                <div
                  key={s.id}
                  className={`relative flex items-start gap-3 px-3 py-2.5 rounded-lg text-sm transition-colors cursor-pointer ${
                    nav.sessionId === s.id
                      ? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
                      : 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
                  }`}
                >
                  {/* Thumbnail */}
                  <div
                    className="flex-shrink-0 w-12 h-12 rounded-md overflow-hidden bg-gray-100 dark:bg-gray-700"
                    onClick={() => openSession(s.id)}
                  >
                    {/* eslint-disable-next-line @next/next/no-img-element */}
                    <img
                      src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${s.id}/thumbnail?size=96`}
                      alt=""
                      className="w-full h-full object-cover"
                      loading="lazy"
                      onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
                    />
                  </div>
                  {/* Info */}
                  <div className="flex-1 min-w-0" onClick={() => openSession(s.id)}>
                    {editingName === s.id ? (
                      <input
                        autoFocus
                        value={editNameValue}
                        onChange={(e) => setEditNameValue(e.target.value)}
                        onBlur={() => renameSession(s.id, editNameValue)}
                        onKeyDown={(e) => {
                          if (e.key === 'Enter') renameSession(s.id, editNameValue)
                          if (e.key === 'Escape') setEditingName(null)
                        }}
                        onClick={(e) => e.stopPropagation()}
                        className="w-full px-1 py-0.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600"
                      />
                    ) : (
                      <div className="truncate font-medium text-gray-700 dark:text-gray-300">
                        {s.name || s.filename}
                      </div>
                    )}
                    {/* ID row */}
                    <button
                      onClick={(e) => {
                        e.stopPropagation()
                        navigator.clipboard.writeText(s.id)
                        const btn = e.currentTarget
                        btn.textContent = 'Kopiert!'
                        setTimeout(() => { btn.textContent = `ID: ${s.id.slice(0, 8)}` }, 1500)
                      }}
                      className="text-[10px] font-mono text-gray-400 hover:text-teal-500 transition-colors"
                      title={`Volle ID: ${s.id} — Klick zum Kopieren`}
                    >
                      ID: {s.id.slice(0, 8)}
                    </button>
                    <div className="text-xs text-gray-400 flex gap-2 mt-0.5">
                      <span>{new Date(s.created_at).toLocaleDateString('de-DE', { day: '2-digit', month: '2-digit', year: '2-digit', hour: '2-digit', minute: '2-digit' })}</span>
                      <span>Schritt {s.current_step}: {STEP_NAMES[s.current_step] || '?'}</span>
                    </div>
                  </div>
                  {/* Badges */}
                  <div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
                    <button
                      onClick={() => setEditingCategory(editingCategory === s.id ? null : s.id)}
                      className={`text-[10px] px-1.5 py-0.5 rounded-full border transition-colors ${
                        catInfo
                          ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
                          : 'bg-gray-50 dark:bg-gray-700 border-gray-200 dark:border-gray-600 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300'
                      }`}
                      title="Kategorie setzen"
                    >
                      {catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
                    </button>
                    {s.doc_type && (
                      <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400 border border-gray-200 dark:border-gray-600">
                        {s.doc_type}
                      </span>
                    )}
                  </div>
                  {/* Action buttons */}
                  <div className="flex flex-col gap-0.5 flex-shrink-0">
                    <button
                      onClick={(e) => {
                        e.stopPropagation()
                        setEditNameValue(s.name || s.filename)
                        setEditingName(s.id)
                      }}
                      className="p-1 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300"
                      title="Umbenennen"
                    >
                      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
                        <path strokeLinecap="round" strokeLinejoin="round" d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
                      </svg>
                    </button>
                    <button
                      onClick={(e) => {
                        e.stopPropagation()
                        if (confirm('Session loeschen?')) deleteSession(s.id)
                      }}
                      className="p-1 text-gray-400 hover:text-red-500"
                      title="Loeschen"
                    >
                      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
                        <path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
                      </svg>
                    </button>
                  </div>
                  {/* Category dropdown */}
                  {editingCategory === s.id && (
                    <div
                      className="absolute right-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64"
                      onClick={(e) => e.stopPropagation()}
                    >
                      {DOCUMENT_CATEGORIES.map((cat) => (
                        <button
                          key={cat.value}
                          onClick={() => updateCategory(s.id, cat.value)}
                          className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
                            s.document_category === cat.value
                              ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
                              : 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
                          }`}
                        >
                          {cat.icon} {cat.label}
                        </button>
                      ))}
                    </div>
                  )}
                </div>
              )
            })}
          </div>
        )}
      </div>
      {/* Active session info */}
      {nav.sessionId && sessionName && (
        <div className="flex items-center gap-3 text-sm text-gray-500 dark:text-gray-400">
          <span>Aktive Session: <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span></span>
          {activeCategory && (() => {
            const cat = DOCUMENT_CATEGORIES.find(c => c.value === activeCategory)
            return cat ? <span className="text-xs px-2 py-0.5 rounded-full bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300">{cat.icon} {cat.label}</span> : null
          })()}
          {nav.docTypeResult && (
            <span className="text-xs px-2 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400 border border-gray-200 dark:border-gray-600">
              {nav.docTypeResult.doc_type}
            </span>
          )}
        </div>
      )}
      <PipelineStepper
        steps={nav.steps}
        currentStep={nav.currentStepIndex}
        onStepClick={handleStepClick}
        onReprocess={nav.sessionId ? nav.reprocessFromStep : undefined}
        docTypeResult={nav.docTypeResult}
        onDocTypeChange={handleDocTypeChange}
      />
      <div className="min-h-[400px]">{renderStep()}</div>
    </div>
  )
 }
 export default function OcrPipelinePage() {
  return (
    <Suspense fallback={<div className="p-8 text-gray-400">Lade Pipeline...</div>}>
      <OcrPipelineContent />
    </Suspense>
  )
 }
--- a/admin-lehrer/app/(admin)/ai/ocr-pipeline/types.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-pipeline/types.ts
@@ -1,429 +0,0 @@
 export type PipelineStepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'
 export interface PipelineStep {
  id: string
  name: string
  icon: string
  status: PipelineStepStatus
 }
 export type DocumentCategory =
  | 'vokabelseite' | 'woerterbuch' | 'buchseite' | 'arbeitsblatt' | 'klausurseite'
  | 'mathearbeit' | 'statistik' | 'zeitung' | 'formular' | 'handschrift' | 'sonstiges'
 export const DOCUMENT_CATEGORIES: { value: DocumentCategory; label: string; icon: string }[] = [
  { value: 'vokabelseite', label: 'Vokabelseite', icon: '📖' },
  { value: 'woerterbuch', label: 'Woerterbuch', icon: '📕' },
  { value: 'buchseite', label: 'Buchseite', icon: '📚' },
  { value: 'arbeitsblatt', label: 'Arbeitsblatt', icon: '📝' },
  { value: 'klausurseite', label: 'Klausurseite', icon: '📄' },
  { value: 'mathearbeit', label: 'Mathearbeit', icon: '🔢' },
  { value: 'statistik', label: 'Statistik', icon: '📊' },
  { value: 'zeitung', label: 'Zeitung', icon: '📰' },
  { value: 'formular', label: 'Formular', icon: '📋' },
  { value: 'handschrift', label: 'Handschrift', icon: '✍️' },
  { value: 'sonstiges', label: 'Sonstiges', icon: '📎' },
 ]
 export interface SessionListItem {
  id: string
  name: string
  filename: string
  status: string
  current_step: number
  document_category?: DocumentCategory
  doc_type?: string
  parent_session_id?: string
  document_group_id?: string
  page_number?: number
  created_at: string
  updated_at?: string
 }
 /** Box sub-session (from column detection zone_type='box') */
 export interface SubSession {
  id: string
  name: string
  box_index: number
  current_step?: number
  status?: string
 }
 export interface PipelineLogEntry {
  step: string
  completed_at: string
  success: boolean
  duration_ms?: number
  metrics: Record<string, unknown>
 }
 export interface PipelineLog {
  steps: PipelineLogEntry[]
 }
 export interface DocumentTypeResult {
  doc_type: 'vocab_table' | 'full_text' | 'generic_table'
  confidence: number
  pipeline: 'cell_first' | 'full_page'
  skip_steps: string[]
  features?: Record<string, unknown>
  duration_seconds?: number
 }
 export interface OrientationResult {
  orientation_degrees: number
  corrected: boolean
  duration_seconds: number
 }
 export interface CropResult {
  crop_applied: boolean
  crop_rect?: { x: number; y: number; width: number; height: number }
  crop_rect_pct?: { x: number; y: number; width: number; height: number }
  original_size: { width: number; height: number }
  cropped_size: { width: number; height: number }
  detected_format?: string
  format_confidence?: number
  aspect_ratio?: number
  border_fractions?: { top: number; bottom: number; left: number; right: number }
  skipped?: boolean
  duration_seconds?: number
 }
 export interface SessionInfo {
  session_id: string
  filename: string
  name?: string
  image_width: number
  image_height: number
  original_image_url: string
  current_step?: number
  document_category?: DocumentCategory
  doc_type?: string
  orientation_result?: OrientationResult
  crop_result?: CropResult
  deskew_result?: DeskewResult
  dewarp_result?: DewarpResult
  column_result?: ColumnResult
  row_result?: RowResult
  word_result?: GridResult
  doc_type_result?: DocumentTypeResult
  sub_sessions?: SubSession[]
  parent_session_id?: string
  box_index?: number
  document_group_id?: string
  page_number?: number
 }
 export interface DeskewResult {
  session_id: string
  angle_hough: number
  angle_word_alignment: number
  angle_iterative?: number
  angle_residual?: number
  angle_textline?: number
  angle_applied: number
  method_used: 'hough' | 'word_alignment' | 'manual' | 'iterative' | 'two_pass' | 'three_pass' | 'manual_combined'
  confidence: number
  duration_seconds: number
  deskewed_image_url: string
  binarized_image_url: string
 }
 export interface DeskewGroundTruth {
  is_correct: boolean
  corrected_angle?: number
  notes?: string
 }
 export interface DewarpDetection {
  method: string
  shear_degrees: number
  confidence: number
 }
 export interface DewarpResult {
  session_id: string
  method_used: string
  shear_degrees: number
  confidence: number
  duration_seconds: number
  dewarped_image_url: string
  detections?: DewarpDetection[]
 }
 export interface DewarpGroundTruth {
  is_correct: boolean
  corrected_shear?: number
  notes?: string
 }
 export interface PageRegion {
  type: 'column_en' | 'column_de' | 'column_example' | 'page_ref'
      | 'column_marker' | 'column_text' | 'column_ignore' | 'header' | 'footer'
  x: number
  y: number
  width: number
  height: number
  classification_confidence?: number
  classification_method?: string
 }
 export interface PageZone {
  zone_type: 'content' | 'box'
  y_start: number
  y_end: number
  box?: { x: number; y: number; width: number; height: number }
 }
 export interface ColumnResult {
  columns: PageRegion[]
  duration_seconds: number
  zones?: PageZone[]
 }
 export interface ColumnGroundTruth {
  is_correct: boolean
  corrected_columns?: PageRegion[]
  notes?: string
 }
 export interface ManualColumnDivider {
  xPercent: number  // Position in % of image width (0-100)
 }
 export type ColumnTypeKey = PageRegion['type']
 export interface RowResult {
  rows: RowItem[]
  summary: Record<string, number>
  total_rows: number
  duration_seconds: number
 }
 export interface RowItem {
  index: number
  x: number
  y: number
  width: number
  height: number
  word_count: number
  row_type: 'content' | 'header' | 'footer'
  gap_before: number
 }
 export interface RowGroundTruth {
  is_correct: boolean
  corrected_rows?: RowItem[]
  notes?: string
 }
 export interface StructureGraphic {
  x: number
  y: number
  w: number
  h: number
  area: number
  shape: string   // image, illustration
  color_name: string
  color_hex: string
  confidence: number
 }
 export interface ExcludeRegion {
  x: number
  y: number
  w: number
  h: number
  label?: string
 }
 export interface DocLayoutRegion {
  x: number
  y: number
  w: number
  h: number
  class_name: string
  confidence: number
 }
 export interface StructureResult {
  image_width: number
  image_height: number
  content_bounds: { x: number; y: number; w: number; h: number }
  boxes: StructureBox[]
  zones: StructureZone[]
  graphics: StructureGraphic[]
  exclude_regions?: ExcludeRegion[]
  color_pixel_counts: Record<string, number>
  has_words: boolean
  word_count: number
  border_ghosts_removed?: number
  duration_seconds: number
  /** PP-DocLayout regions (only present when method=ppdoclayout) */
  layout_regions?: DocLayoutRegion[]
  detection_method?: 'opencv' | 'ppdoclayout'
 }
 export interface StructureBox {
  x: number
  y: number
  w: number
  h: number
  confidence: number
  border_thickness: number
  bg_color_name?: string
  bg_color_hex?: string
 }
 export interface StructureZone {
  index: number
  zone_type: 'content' | 'box'
  x: number
  y: number
  w: number
  h: number
 }
 export interface WordBbox {
  x: number
  y: number
  w: number
  h: number
 }
 export interface OcrWordBox {
  text: string
  left: number    // absolute image x in px
  top: number     // absolute image y in px
  width: number   // px
  height: number  // px
  conf: number
  color?: string       // hex color of detected text, e.g. '#dc2626'
  color_name?: string  // 'black' | 'red' | 'blue' | 'green' | 'orange' | 'purple' | 'yellow'
  recovered?: boolean  // true if this word was recovered via color detection
 }
 export interface GridCell {
  cell_id: string          // "R03_C1"
  row_index: number
  col_index: number
  col_type: string
  text: string
  confidence: number
  bbox_px: WordBbox
  bbox_pct: WordBbox
  ocr_engine?: string
  is_bold?: boolean
  status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
  word_boxes?: OcrWordBox[]  // per-word bounding boxes from OCR engine
 }
 export interface ColumnMeta {
  index: number
  type: string
  x: number
  width: number
 }
 export interface GridResult {
  cells: GridCell[]
  grid_shape: { rows: number; cols: number; total_cells: number }
  columns_used: ColumnMeta[]
  layout: 'vocab' | 'generic'
  image_width: number
  image_height: number
  duration_seconds: number
  ocr_engine?: string
  vocab_entries?: WordEntry[]   // Only when layout='vocab'
  entries?: WordEntry[]         // Backwards compat alias for vocab_entries
  entry_count?: number
  summary: {
    total_cells: number
    non_empty_cells: number
    low_confidence: number
    // Only when layout='vocab':
    total_entries?: number
    with_english?: number
    with_german?: number
  }
  llm_review?: {
    changes: { row_index: number; field: string; old: string; new: string }[]
    model_used: string
    duration_ms: number
    entries_corrected: number
    applied_count?: number
    applied_at?: string
  }
 }
 export interface WordEntry {
  row_index: number
  english: string
  german: string
  example: string
  source_page?: string
  marker?: string
  confidence: number
  bbox: WordBbox
  bbox_en: WordBbox | null
  bbox_de: WordBbox | null
  bbox_ex: WordBbox | null
  bbox_ref?: WordBbox | null
  bbox_marker?: WordBbox | null
  status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
 }
 /** @deprecated Use GridResult instead */
 export interface WordResult {
  entries: WordEntry[]
  entry_count: number
  image_width: number
  image_height: number
  duration_seconds: number
  ocr_engine?: string
  summary: {
    total_entries: number
    with_english: number
    with_german: number
    low_confidence: number
  }
 }
 export interface WordGroundTruth {
  is_correct: boolean
  corrected_entries?: WordEntry[]
  notes?: string
 }
 export interface ImageRegion {
  bbox_pct: { x: number; y: number; w: number; h: number }
  prompt: string
  description: string
  image_b64: string | null
  style: 'educational' | 'cartoon' | 'sketch' | 'clipart' | 'realistic'
 }
 export type ImageStyle = ImageRegion['style']
 export const IMAGE_STYLES: { value: ImageStyle; label: string }[] = [
  { value: 'educational', label: 'Lehrbuch' },
  { value: 'cartoon', label: 'Cartoon' },
  { value: 'sketch', label: 'Skizze' },
  { value: 'clipart', label: 'Clipart' },
  { value: 'realistic', label: 'Realistisch' },
 ]
 export const PIPELINE_STEPS: PipelineStep[] = [
  { id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
  { id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
  { id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
  { id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
  { id: 'columns', name: 'Spalten', icon: '📊', status: 'pending' },
  { id: 'rows', name: 'Zeilen', icon: '📏', status: 'pending' },
  { id: 'words', name: 'Woerter', icon: '🔤', status: 'pending' },
  { id: 'structure', name: 'Struktur', icon: '🔍', status: 'pending' },
  { id: 'llm-review', name: 'Korrektur', icon: '✏️', status: 'pending' },
  { id: 'reconstruction', name: 'Rekonstruktion', icon: '🏗️', status: 'pending' },
  { id: 'ground-truth', name: 'Validierung', icon: '✅', status: 'pending' },
 ]
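The `GridResult` type above keeps `entries` as a backwards-compatible alias for `vocab_entries`, and marks both as present only when `layout` is `'vocab'`. A minimal sketch of how a consumer might read either field follows. `VocabEntryLike`, `GridResultLike` and `readVocabEntries` are hypothetical names introduced for illustration, and the types are trimmed to the fields needed here. Returning an empty list for non-vocab layouts is an assumption, not something the diff specifies.

```typescript
// Trimmed-down stand-ins for WordEntry and GridResult (illustration only).
interface VocabEntryLike {
  row_index: number
  english: string
  german: string
}

interface GridResultLike {
  layout: 'vocab' | 'generic'
  vocab_entries?: VocabEntryLike[]
  entries?: VocabEntryLike[] // older alias for vocab_entries
}

// Hypothetical helper: prefer vocab_entries, fall back to the alias,
// and return an empty list for non-vocab layouts (assumed behaviour).
function readVocabEntries(result: GridResultLike): VocabEntryLike[] {
  if (result.layout !== 'vocab') return []
  return result.vocab_entries ?? result.entries ?? []
}
```

Centralising the fallback in one helper would let callers stop checking both fields themselves.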
--- a/admin-lehrer/app/(admin)/ai/ocr-pipeline/usePipelineNavigation.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-pipeline/usePipelineNavigation.ts
@@ -1,225 +0,0 @@
 'use client'
 import { useCallback, useEffect, useState } from 'react'
 import { useRouter, useSearchParams } from 'next/navigation'
 import { PIPELINE_STEPS, type PipelineStep, type PipelineStepStatus, type DocumentTypeResult } from './types'
 const KLAUSUR_API = '/klausur-api'
 export interface PipelineNav {
  sessionId: string | null
  currentStepIndex: number
  currentStepId: string
  steps: PipelineStep[]
  docTypeResult: DocumentTypeResult | null
  goToNextStep: () => void
  goToStep: (index: number) => void
  goToSession: (sessionId: string) => void
  goToSessionList: () => void
  setDocType: (result: DocumentTypeResult) => void
  reprocessFromStep: (uiStep: number) => Promise<void>
 }
 const STEP_NAMES: Record<number, string> = {
  1: 'Orientierung', 2: 'Begradigung', 3: 'Entzerrung', 4: 'Zuschneiden',
  5: 'Spalten', 6: 'Zeilen', 7: 'Woerter', 8: 'Struktur',
  9: 'Korrektur', 10: 'Rekonstruktion', 11: 'Validierung',
 }
 function buildSteps(uiStep: number, skipSteps: string[]): PipelineStep[] {
  return PIPELINE_STEPS.map((s, i) => ({
    ...s,
    status: (
      skipSteps.includes(s.id) ? 'skipped'
        : i < uiStep ? 'completed'
          : i === uiStep ? 'active'
            : 'pending'
    ) as PipelineStepStatus,
  }))
 }
 export function usePipelineNavigation(): PipelineNav {
  const router = useRouter()
  const searchParams = useSearchParams()
  const paramSession = searchParams.get('session')
  const paramStep = searchParams.get('step')
  const [sessionId, setSessionId] = useState<string | null>(paramSession)
  const [currentStepIndex, setCurrentStepIndex] = useState(0)
  const [docTypeResult, setDocTypeResult] = useState<DocumentTypeResult | null>(null)
  const [steps, setSteps] = useState<PipelineStep[]>(buildSteps(0, []))
  const [loaded, setLoaded] = useState(false)
  // Load session info when session param changes
  useEffect(() => {
    if (!paramSession) {
      setSessionId(null)
      setCurrentStepIndex(0)
      setDocTypeResult(null)
      setSteps(buildSteps(0, []))
      setLoaded(true)
      return
    }
    const load = async () => {
      try {
        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${paramSession}`)
        if (!res.ok) return
        const data = await res.json()
        setSessionId(paramSession)
        const savedDocType: DocumentTypeResult | null = data.doc_type_result || null
        setDocTypeResult(savedDocType)
        const dbStep = data.current_step || 1
        let uiStep = Math.max(0, dbStep - 1)
        const skipSteps = [...(savedDocType?.skip_steps || [])]
        // Box sub-sessions (from column detection) skip pre-processing
        const isBoxSubSession = !!data.parent_session_id
        if (isBoxSubSession && dbStep >= 5) {
          const SUB_SESSION_SKIP = ['orientation', 'deskew', 'dewarp', 'crop']
          for (const s of SUB_SESSION_SKIP) {
            if (!skipSteps.includes(s)) skipSteps.push(s)
          }
          if (uiStep < 4) uiStep = 4
        }
        // If URL has a step param, use that instead
        if (paramStep) {
          const stepIdx = PIPELINE_STEPS.findIndex(s => s.id === paramStep)
          if (stepIdx >= 0) uiStep = stepIdx
        }
        setCurrentStepIndex(uiStep)
        setSteps(buildSteps(uiStep, skipSteps))
      } catch (e) {
        console.error('Failed to load session:', e)
      } finally {
        setLoaded(true)
      }
    }
    load()
  }, [paramSession, paramStep])
  const updateUrl = useCallback((sid: string | null, stepIdx?: number) => {
    if (!sid) {
      router.push('/ai/ocr-pipeline')
      return
    }
    const stepId = stepIdx !== undefined ? PIPELINE_STEPS[stepIdx]?.id : undefined
    const params = new URLSearchParams()
    params.set('session', sid)
    if (stepId) params.set('step', stepId)
    router.push(`/ai/ocr-pipeline?${params.toString()}`)
  }, [router])
  const goToNextStep = useCallback(() => {
    if (currentStepIndex >= steps.length - 1) {
      // Last step — return to session list
      setSessionId(null)
      setCurrentStepIndex(0)
      setDocTypeResult(null)
      setSteps(buildSteps(0, []))
      router.push('/ai/ocr-pipeline')
      return
    }
    const skipSteps = docTypeResult?.skip_steps || []
    let nextStep = currentStepIndex + 1
    while (nextStep < steps.length && skipSteps.includes(PIPELINE_STEPS[nextStep]?.id)) {
      nextStep++
    }
    if (nextStep >= steps.length) nextStep = steps.length - 1
    setSteps(prev =>
      prev.map((s, i) => {
        if (i === currentStepIndex) return { ...s, status: 'completed' as PipelineStepStatus }
        if (i === nextStep) return { ...s, status: 'active' as PipelineStepStatus }
        if (i > currentStepIndex && i < nextStep && skipSteps.includes(PIPELINE_STEPS[i]?.id)) {
          return { ...s, status: 'skipped' as PipelineStepStatus }
        }
        return s
      }),
    )
    setCurrentStepIndex(nextStep)
    if (sessionId) updateUrl(sessionId, nextStep)
  }, [currentStepIndex, steps.length, docTypeResult, sessionId, updateUrl, router])
  const goToStep = useCallback((index: number) => {
    setCurrentStepIndex(index)
    setSteps(prev =>
      prev.map((s, i) => ({
        ...s,
        status: s.status === 'skipped' ? 'skipped'
          : i < index ? 'completed'
            : i === index ? 'active'
              : 'pending' as PipelineStepStatus,
      })),
    )
    if (sessionId) updateUrl(sessionId, index)
  }, [sessionId, updateUrl])
  const goToSession = useCallback((sid: string) => {
    updateUrl(sid)
  }, [updateUrl])
  const goToSessionList = useCallback(() => {
    setSessionId(null)
    setCurrentStepIndex(0)
    setDocTypeResult(null)
    setSteps(buildSteps(0, []))
    router.push('/ai/ocr-pipeline')
  }, [router])
  const setDocType = useCallback((result: DocumentTypeResult) => {
    setDocTypeResult(result)
    const skipSteps = result.skip_steps || []
    if (skipSteps.length > 0) {
      setSteps(prev =>
        prev.map(s =>
          skipSteps.includes(s.id) ? { ...s, status: 'skipped' as PipelineStepStatus } : s,
        ),
      )
    }
  }, [])
  const reprocessFromStep = useCallback(async (uiStep: number) => {
    if (!sessionId) return
    const dbStep = uiStep + 1
    if (!confirm(`Ab Schritt ${dbStep} (${STEP_NAMES[dbStep] || '?'}) neu verarbeiten? Nachfolgende Daten werden geloescht.`)) return
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ from_step: dbStep }),
      })
      if (!res.ok) {
        const data = await res.json().catch(() => ({}))
        console.error('Reprocess failed:', data.detail || res.status)
        return
      }
      goToStep(uiStep)
    } catch (e) {
      console.error('Reprocess error:', e)
    }
  }, [sessionId, goToStep])
  return {
    sessionId,
    currentStepIndex,
    currentStepId: PIPELINE_STEPS[currentStepIndex]?.id || 'orientation',
    steps,
    docTypeResult,
    goToNextStep,
    goToStep,
    goToSession,
    goToSessionList,
    setDocType,
    reprocessFromStep,
  }
 }
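The skip handling in `goToNextStep` and the DB-to-UI step conversion in the loader can be isolated as pure functions, which makes them testable outside React. The sketch below mirrors the loop above under that assumption. `STEP_IDS`, `dbStepToUiIndex` and `nextUnskippedIndex` are hypothetical names and not part of the hook.

```typescript
// Step ids in pipeline order, copied from PIPELINE_STEPS.
const STEP_IDS: string[] = [
  'orientation', 'deskew', 'dewarp', 'crop', 'columns', 'rows',
  'words', 'structure', 'llm-review', 'reconstruction', 'ground-truth',
]

// The backend stores 1-based steps; the UI works with 0-based indices.
function dbStepToUiIndex(dbStep: number): number {
  return Math.max(0, dbStep - 1)
}

// Advance by one, move past any skipped steps, then clamp to the last step.
function nextUnskippedIndex(current: number, stepIds: string[], skipSteps: string[]): number {
  let next = current + 1
  while (next < stepIds.length && skipSteps.includes(stepIds[next] ?? '')) {
    next++
  }
  return Math.min(next, stepIds.length - 1)
}
```

As in the hook, the clamp means that a skipped final step can still be selected when every step after the current one is skipped.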
--- a/admin-lehrer/app/(admin)/ai/rag/tests/rag-documents.test.ts
+++ b/admin-lehrer/app/(admin)/ai/rag/tests/rag-documents.test.ts
@@ -0,0 +1,252 @@
 import { describe, it, expect } from 'vitest'
 import ragData from '../rag-documents.json'
 /**
 * Tests for rag-documents.json: industry-regulation matrix
 *
 * Validates the JSON structure, industry assignments, and data integrity
 * of the 320 documents for the RAG map.
 */
 const VALID_INDUSTRY_IDS = ragData.industries.map((i: any) => i.id)
 const VALID_DOC_TYPE_IDS = ragData.doc_types.map((dt: any) => dt.id)
 describe('rag-documents.json — Struktur', () => {
  it('sollte doc_types, industries und documents enthalten', () => {
    expect(ragData).toHaveProperty('doc_types')
    expect(ragData).toHaveProperty('industries')
    expect(ragData).toHaveProperty('documents')
    expect(Array.isArray(ragData.doc_types)).toBe(true)
    expect(Array.isArray(ragData.industries)).toBe(true)
    expect(Array.isArray(ragData.documents)).toBe(true)
  })
  it('sollte genau 10 Branchen haben (VDMA/VDA/BDI)', () => {
    expect(ragData.industries).toHaveLength(10)
    const ids = ragData.industries.map((i: any) => i.id)
    expect(ids).toContain('automotive')
    expect(ids).toContain('maschinenbau')
    expect(ids).toContain('elektrotechnik')
    expect(ids).toContain('chemie')
    expect(ids).toContain('metall')
    expect(ids).toContain('energie')
    expect(ids).toContain('transport')
    expect(ids).toContain('handel')
    expect(ids).toContain('konsumgueter')
    expect(ids).toContain('bau')
  })
  it('sollte keine Pseudo-Branchen enthalten (IoT, KI, HR, KRITIS, etc.)', () => {
    const ids = ragData.industries.map((i: any) => i.id)
    expect(ids).not.toContain('iot')
    expect(ids).not.toContain('ai')
    expect(ids).not.toContain('hr')
    expect(ids).not.toContain('kritis')
    expect(ids).not.toContain('ecommerce')
    expect(ids).not.toContain('tech')
    expect(ids).not.toContain('media')
    expect(ids).not.toContain('public')
  })
  it('sollte 17 Dokumenttypen haben', () => {
    expect(ragData.doc_types.length).toBe(17)
  })
  it('sollte mindestens 300 Dokumente haben', () => {
    expect(ragData.documents.length).toBeGreaterThanOrEqual(300)
  })
  it('sollte jede Branche name und icon haben', () => {
    ragData.industries.forEach((ind: any) => {
      expect(ind).toHaveProperty('id')
      expect(ind).toHaveProperty('name')
      expect(ind).toHaveProperty('icon')
      expect(ind.name.length).toBeGreaterThan(0)
    })
  })
  it('sollte jeden doc_type mit id, label, icon und sort haben', () => {
    ragData.doc_types.forEach((dt: any) => {
      expect(dt).toHaveProperty('id')
      expect(dt).toHaveProperty('label')
      expect(dt).toHaveProperty('icon')
      expect(dt).toHaveProperty('sort')
    })
  })
 })
 describe('rag-documents.json — Dokument-Validierung', () => {
  it('sollte keine doppelten Codes haben', () => {
    const codes = ragData.documents.map((d: any) => d.code)
    const unique = new Set(codes)
    expect(unique.size).toBe(codes.length)
  })
  it('sollte Pflichtfelder bei jedem Dokument haben', () => {
    ragData.documents.forEach((doc: any) => {
      expect(doc).toHaveProperty('code')
      expect(doc).toHaveProperty('name')
      expect(doc).toHaveProperty('doc_type')
      expect(doc).toHaveProperty('industries')
      expect(doc).toHaveProperty('in_rag')
      expect(doc).toHaveProperty('rag_collection')
      expect(doc.code.length).toBeGreaterThan(0)
      expect(doc.name.length).toBeGreaterThan(0)
      expect(Array.isArray(doc.industries)).toBe(true)
    })
  })
  it('sollte nur gueltige doc_type IDs verwenden', () => {
    ragData.documents.forEach((doc: any) => {
      expect(VALID_DOC_TYPE_IDS).toContain(doc.doc_type)
    })
  })
  it('sollte nur gueltige industry IDs verwenden (oder "all")', () => {
    ragData.documents.forEach((doc: any) => {
      doc.industries.forEach((ind: string) => {
        if (ind !== 'all') {
          expect(VALID_INDUSTRY_IDS).toContain(ind)
        }
      })
    })
  })
  it('sollte gueltige rag_collection Namen verwenden', () => {
    const validCollections = [
      'bp_compliance_ce',
      'bp_compliance_gesetze',
      'bp_compliance_datenschutz',
      'bp_dsfa_corpus',
      'bp_legal_templates',
      'bp_compliance_recht',
      'bp_nibis_eh',
    ]
    ragData.documents.forEach((doc: any) => {
      expect(validCollections).toContain(doc.rag_collection)
    })
  })
 })
 describe('rag-documents.json — Branchen-Zuordnungslogik', () => {
  const findDoc = (code: string) => ragData.documents.find((d: any) => d.code === code)
  describe('Horizontale Regulierungen (alle Branchen)', () => {
    const horizontalCodes = [
      'GDPR', 'BDSG_FULL', 'EPRIVACY', 'TDDDG', 'AIACT', 'CRA',
      'NIS2', 'GPSR', 'PLD', 'EUCSA', 'DATAACT',
    ]
    horizontalCodes.forEach((code) => {
      it(`${code} sollte fuer alle Branchen gelten`, () => {
        const doc = findDoc(code)
        if (doc) {
          expect(doc.industries).toContain('all')
        }
      })
    })
  })
  describe('Sektorspezifische Regulierungen', () => {
    it('Maschinenverordnung sollte Maschinenbau, Automotive, Elektrotechnik enthalten', () => {
      const doc = findDoc('MACHINERY_REG')
      if (doc) {
        expect(doc.industries).toContain('maschinenbau')
        expect(doc.industries).toContain('automotive')
        expect(doc.industries).toContain('elektrotechnik')
        expect(doc.industries).not.toContain('all')
      }
    })
    it('ElektroG sollte Elektrotechnik und Automotive enthalten', () => {
      const doc = findDoc('DE_ELEKTROG')
      if (doc) {
        expect(doc.industries).toContain('elektrotechnik')
        expect(doc.industries).toContain('automotive')
      }
    })
    it('BattDG sollte Automotive und Elektrotechnik enthalten', () => {
      const doc = findDoc('DE_BATTDG')
      if (doc) {
        expect(doc.industries).toContain('automotive')
        expect(doc.industries).toContain('elektrotechnik')
      }
    })
    it('ENISA ICS/SCADA sollte Energie, Maschinenbau, Chemie enthalten', () => {
      const doc = findDoc('ENISA_ICS_SCADA')
      if (doc) {
        expect(doc.industries).toContain('energie')
        expect(doc.industries).toContain('maschinenbau')
        expect(doc.industries).toContain('chemie')
      }
    })
  })
  describe('Nicht zutreffende Regulierungen (Finanz/Medizin/Plattformen)', () => {
    const emptyIndustryCodes = ['DORA', 'PSD2', 'MiCA', 'AMLR', 'EHDS', 'DSA', 'DMA', 'MDR']
    emptyIndustryCodes.forEach((code) => {
      it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
        const doc = findDoc(code)
        if (doc) {
          expect(doc.industries).toHaveLength(0)
        }
      })
    })
  })
  describe('BSI-TR-03161 (DiGA) sollte nicht zutreffend sein', () => {
    ['BSI-TR-03161-1', 'BSI-TR-03161-2', 'BSI-TR-03161-3'].forEach((code) => {
      it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
        const doc = findDoc(code)
        if (doc) {
          expect(doc.industries).toHaveLength(0)
        }
      })
    })
  })
 })
 describe('rag-documents.json — Applicability Notes', () => {
  it('sollte applicability_note bei Dokumenten mit description haben', () => {
    const withDescription = ragData.documents.filter((d: any) => d.description)
    const withNote = withDescription.filter((d: any) => d.applicability_note)
    // More than 90% of the documents with a description should have a note
    expect(withNote.length / withDescription.length).toBeGreaterThan(0.9)
  })
  it('horizontale Regulierungen sollten "alle Branchen" in der Note erwaehnen', () => {
    const gdpr = ragData.documents.find((d: any) => d.code === 'GDPR')
    if (gdpr?.applicability_note) {
      expect(gdpr.applicability_note.toLowerCase()).toContain('alle branchen')
    }
  })
  it('nicht zutreffende sollten "nicht zutreffend" in der Note erwaehnen', () => {
    const dora = ragData.documents.find((d: any) => d.code === 'DORA')
    if (dora?.applicability_note) {
      expect(dora.applicability_note.toLowerCase()).toContain('nicht zutreffend')
    }
  })
 })
 describe('rag-documents.json — Dokumenttyp-Verteilung', () => {
  it('sollte Dokumente in jedem doc_type haben', () => {
    ragData.doc_types.forEach((dt: any) => {
      const count = ragData.documents.filter((d: any) => d.doc_type === dt.id).length
      expect(count).toBeGreaterThan(0)
    })
  })
  it('sollte mindestens 15 EU-Verordnungen haben', () => {
    const euRegs = ragData.documents.filter((d: any) => d.doc_type === 'eu_regulation')
    expect(euRegs.length).toBeGreaterThanOrEqual(15)
  })
  it('sollte mindestens 40 EDPB-Leitlinien haben', () => {
    const edpb = ragData.documents.filter((d: any) => d.doc_type === 'edpb_guideline')
    expect(edpb.length).toBeGreaterThanOrEqual(40)
  })
 })
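Several tests above wrap their assertions in `if (doc) { ... }`, so they pass silently when a code is absent from the JSON. If a missing document should fail the test instead, a lookup that throws is one option. The sketch below assumes that behaviour is wanted. `RagDocLike`, `requireDoc` and `sampleDocs` are hypothetical names, and the sample data is invented for illustration.

```typescript
// Minimal stand-in for an entry of ragData.documents (illustration only).
interface RagDocLike {
  code: string
  industries: string[]
}

// Hypothetical helper: return the document or throw, so a missing code
// surfaces as a test failure instead of a skipped assertion.
function requireDoc(docs: RagDocLike[], code: string): RagDocLike {
  const doc = docs.find((d) => d.code === code)
  if (!doc) throw new Error(`Document not found: ${code}`)
  return doc
}

// Invented sample data, not taken from rag-documents.json.
const sampleDocs: RagDocLike[] = [{ code: 'GDPR', industries: ['all'] }]
```

Inside a test, `requireDoc(ragData.documents, 'GDPR')` would replace the `findDoc` call together with its `if (doc)` guard.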
--- a/admin-lehrer/app/(admin)/ai/rag/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/rag/page.tsx
--- a/admin-lehrer/app/(admin)/ai/rag/rag-documents.json
+++ b/admin-lehrer/app/(admin)/ai/rag/rag-documents.json
--- a/admin-lehrer/app/(admin)/communication/matrix/page.tsx
+++ b/admin-lehrer/app/(admin)/communication/matrix/page.tsx
@@ -1,593 +0,0 @@
 'use client'
 /**
 * Voice Service Admin Page (migrated from website/admin/voice)
 *
 * Displays:
 * - Voice-First Architecture Overview
 * - Developer Guide Content
 * - Live Voice Demo (embedded from studio-v2)
 * - Task State Machine Documentation
 * - DSGVO Compliance Information
 */
 import { useState } from 'react'
 import Link from 'next/link'
 import { PagePurpose } from '@/components/common/PagePurpose'
 type TabType = 'overview' | 'demo' | 'tasks' | 'intents' | 'dsgvo' | 'api'
 // Task State Machine data
 const TASK_STATES = [
  { state: 'DRAFT', description: 'Task erstellt, noch nicht verarbeitet', color: 'bg-gray-100 text-gray-800', next: ['QUEUED', 'PAUSED'] },
  { state: 'QUEUED', description: 'In Warteschlange fuer Verarbeitung', color: 'bg-blue-100 text-blue-800', next: ['RUNNING', 'PAUSED'] },
  { state: 'RUNNING', description: 'Wird aktuell verarbeitet', color: 'bg-yellow-100 text-yellow-800', next: ['READY', 'PAUSED'] },
  { state: 'READY', description: 'Fertig, wartet auf User-Bestaetigung', color: 'bg-green-100 text-green-800', next: ['APPROVED', 'REJECTED', 'PAUSED'] },
  { state: 'APPROVED', description: 'Vom User bestaetigt', color: 'bg-emerald-100 text-emerald-800', next: ['COMPLETED'] },
  { state: 'REJECTED', description: 'Vom User abgelehnt', color: 'bg-red-100 text-red-800', next: ['DRAFT'] },
  { state: 'COMPLETED', description: 'Erfolgreich abgeschlossen', color: 'bg-teal-100 text-teal-800', next: [] },
  { state: 'EXPIRED', description: 'TTL ueberschritten', color: 'bg-orange-100 text-orange-800', next: [] },
  { state: 'PAUSED', description: 'Vom User pausiert', color: 'bg-purple-100 text-purple-800', next: ['DRAFT', 'QUEUED', 'RUNNING', 'READY'] },
 ]
 // Intent types, organized by group (the UI advertises 22; 21 are listed below)
 const INTENT_GROUPS = [
  {
    group: 'Notizen',
    color: 'bg-blue-50 border-blue-200',
    intents: [
      { type: 'student_observation', example: 'Notiz zu Max: heute wiederholt gestoert', description: 'Schuelerbeobachtungen' },
      { type: 'reminder', example: 'Erinner mich morgen an Konferenz', description: 'Erinnerungen setzen' },
      { type: 'homework_check', example: '7b Mathe Hausaufgabe kontrollieren', description: 'Hausaufgaben pruefen' },
      { type: 'conference_topic', example: 'Thema Lehrerkonferenz: iPad-Regeln', description: 'Konferenzthemen' },
      { type: 'correction_thought', example: 'Aufgabe 3: haeufiger Fehler erklaeren', description: 'Korrekturgedanken' },
    ]
  },
  {
    group: 'Content-Generierung',
    color: 'bg-green-50 border-green-200',
    intents: [
      { type: 'worksheet_generate', example: 'Erstelle 3 Lueckentexte zu Vokabeln', description: 'Arbeitsblaetter erstellen' },
      { type: 'quiz_generate', example: '10-Minuten Vokabeltest mit Loesungen', description: 'Quiz/Tests erstellen' },
      { type: 'quick_activity', example: '10 Minuten Einstieg, 5 Aufgaben', description: 'Schnelle Aktivitaeten' },
      { type: 'differentiation', example: 'Zwei Schwierigkeitsstufen: Basis und Plus', description: 'Differenzierung' },
    ]
  },
  {
    group: 'Kommunikation',
    color: 'bg-yellow-50 border-yellow-200',
    intents: [
      { type: 'parent_letter', example: 'Neutraler Elternbrief wegen Stoerungen', description: 'Elternbriefe erstellen' },
      { type: 'class_message', example: 'Nachricht an 8a: Hausaufgaben bis Mittwoch', description: 'Klassennachrichten' },
    ]
  },
  {
    group: 'Canvas-Editor',
    color: 'bg-purple-50 border-purple-200',
    intents: [
      { type: 'canvas_edit', example: 'Ueberschriften groesser, Zeilenabstand kleiner', description: 'Formatierung aendern' },
      { type: 'canvas_layout', example: 'Alles auf eine Seite, Drucklayout A4', description: 'Layout anpassen' },
      { type: 'canvas_element', example: 'Kasten fuer Merke hinzufuegen', description: 'Elemente hinzufuegen' },
      { type: 'canvas_image', example: 'Bild 2 nach links, Pfeil auf Aufgabe 3', description: 'Bilder positionieren' },
    ]
  },
  {
    group: 'RAG & Korrektur',
    color: 'bg-pink-50 border-pink-200',
    intents: [
      { type: 'operator_checklist', example: 'Operatoren-Checkliste fuer diese Aufgabe', description: 'Operatoren abrufen' },
      { type: 'eh_passage', example: 'Erwartungshorizont-Passage zu diesem Thema', description: 'EH-Passagen suchen' },
      { type: 'feedback_suggestion', example: 'Kurze Feedbackformulierung vorschlagen', description: 'Feedback vorschlagen' },
    ]
  },
  {
    group: 'Follow-up (TaskOrchestrator)',
    color: 'bg-teal-50 border-teal-200',
    intents: [
      { type: 'task_summary', example: 'Fasse alle offenen Tasks zusammen', description: 'Task-Uebersicht' },
      { type: 'convert_note', example: 'Mach aus der Notiz von gestern einen Elternbrief', description: 'Notizen konvertieren' },
      { type: 'schedule_reminder', example: 'Erinner mich morgen an das Gespraech mit Max', description: 'Erinnerungen planen' },
    ]
  },
 ]
 // DSGVO Data Categories
 const DSGVO_CATEGORIES = [
  { category: 'Audio', processing: 'NUR transient im RAM, NIEMALS persistiert', storage: 'Keine', ttl: '-', icon: '🎤', risk: 'low' },
  { category: 'PII (Schuelernamen)', processing: 'NUR auf Lehrergeraet', storage: 'Client-side', ttl: '-', icon: '👤', risk: 'high' },
  { category: 'Pseudonyme', processing: 'Server erlaubt (student_ref, class_ref)', storage: 'Valkey Cache', ttl: '24h', icon: '🔢', risk: 'low' },
  { category: 'Transkripte', processing: 'NUR verschluesselt (AES-256-GCM)', storage: 'PostgreSQL', ttl: '7 Tage', icon: '📝', risk: 'medium' },
  { category: 'Task States', processing: 'TaskOrchestrator', storage: 'Valkey', ttl: '30 Tage', icon: '📋', risk: 'low' },
  { category: 'Audit Logs', processing: 'Nur truncated IDs, keine PII', storage: 'PostgreSQL', ttl: '90 Tage', icon: '📊', risk: 'low' },
 ]
 // API Endpoints
 const API_ENDPOINTS = [
  { method: 'POST', path: '/api/v1/sessions', description: 'Voice Session erstellen' },
  { method: 'GET', path: '/api/v1/sessions/{id}', description: 'Session Status abrufen' },
  { method: 'DELETE', path: '/api/v1/sessions/{id}', description: 'Session beenden' },
  { method: 'GET', path: '/api/v1/sessions/{id}/tasks', description: 'Pending Tasks abrufen' },
  { method: 'POST', path: '/api/v1/tasks', description: 'Task erstellen' },
  { method: 'GET', path: '/api/v1/tasks/{id}', description: 'Task Status abrufen' },
  { method: 'PUT', path: '/api/v1/tasks/{id}/transition', description: 'Task State aendern' },
  { method: 'DELETE', path: '/api/v1/tasks/{id}', description: 'Task loeschen' },
  { method: 'WS', path: '/ws/voice', description: 'Voice Streaming (WebSocket)' },
  { method: 'GET', path: '/health', description: 'Health Check' },
 ]
 export default function VoiceMatrixPage() {
  const [activeTab, setActiveTab] = useState<TabType>('overview')
  const [demoLoaded, setDemoLoaded] = useState(false)
  const tabs: { id: TabType; name: string; icon: string }[] = [
    { id: 'overview', name: 'Architektur', icon: '🏗️' },
    { id: 'demo', name: 'Live Demo', icon: '🎤' },
    { id: 'tasks', name: 'Task States', icon: '📋' },
    { id: 'intents', name: 'Intents (22)', icon: '🎯' },
    { id: 'dsgvo', name: 'DSGVO', icon: '🔒' },
    { id: 'api', name: 'API', icon: '🔌' },
  ]
  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title="Voice Service"
        purpose="Voice-First Interface mit PersonaPlex-7B & TaskOrchestrator. Konfigurieren und testen Sie den Voice-Service fuer Lehrer-Interaktionen per Sprache."
        audience={['Entwickler', 'Admins']}
        architecture={{
          services: ['voice-service (Python, Port 8091)', 'studio-v2 (Next.js)', 'valkey (Cache)'],
          databases: ['PostgreSQL', 'Valkey Cache'],
        }}
        relatedPages={[
          { name: 'Matrix & Jitsi', href: '/communication/video-chat', description: 'Kommunikation Monitoring' },
          { name: 'GPU Infrastruktur', href: '/infrastructure/gpu', description: 'GPU fuer Voice-Service' },
        ]}
        collapsible={true}
        defaultCollapsed={false}
      />
      {/* Quick Links */}
      <div className="mb-6 flex flex-wrap gap-3">
        <a
          href="https://macmini:3001/voice-test"
          target="_blank"
          rel="noopener noreferrer"
          className="flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 11a7 7 0 01-7 7m0 0a7 7 0 01-7-7m7 7v4m0 0H8m4 0h4m-4-8a3 3 0 01-3-3V5a3 3 0 116 0v6a3 3 0 01-3 3z" />
          </svg>
          Voice Test (Studio)
        </a>
        <a
          href="https://macmini:8091/health"
          target="_blank"
          rel="noopener noreferrer"
          className="flex items-center gap-2 px-4 py-2 bg-green-100 text-green-700 rounded-lg hover:bg-green-200 transition-colors"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          Health Check
        </a>
        <Link
          href="/development/docs"
          className="flex items-center gap-2 px-4 py-2 bg-slate-100 text-slate-700 rounded-lg hover:bg-slate-200 transition-colors"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
          </svg>
          Developer Docs
        </Link>
      </div>
      {/* Stats Overview */}
      <div className="grid grid-cols-2 md:grid-cols-6 gap-4 mb-6">
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-teal-600">8091</div>
          <div className="text-sm text-slate-500">Port</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-blue-600">22</div>
          <div className="text-sm text-slate-500">Task Types</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-purple-600">9</div>
          <div className="text-sm text-slate-500">Task States</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-green-600">24kHz</div>
          <div className="text-sm text-slate-500">Audio Rate</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-orange-600">80ms</div>
          <div className="text-sm text-slate-500">Frame Size</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-red-600">0</div>
          <div className="text-sm text-slate-500">Audio Persist</div>
        </div>
      </div>
      {/* Tabs */}
      <div className="bg-white rounded-lg shadow mb-6">
        <div className="border-b border-slate-200 px-4">
          <div className="flex gap-1 overflow-x-auto">
            {tabs.map((tab) => (
              <button
                key={tab.id}
                onClick={() => setActiveTab(tab.id as TabType)}
                className={`px-4 py-3 text-sm font-medium whitespace-nowrap transition-colors border-b-2 ${
                  activeTab === tab.id
                    ? 'border-teal-600 text-teal-600'
                    : 'border-transparent text-slate-500 hover:text-slate-700'
                }`}
              >
                <span className="mr-2">{tab.icon}</span>
                {tab.name}
              </button>
            ))}
          </div>
        </div>
        <div className="p-6">
          {/* Overview Tab */}
          {activeTab === 'overview' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Voice-First Architektur</h3>
              {/* Architecture Diagram */}
              <div className="bg-slate-50 rounded-lg p-6 font-mono text-sm overflow-x-auto">
                <pre className="text-slate-700">{`
 ┌──────────────────────────────────────────────────────────────────┐
 │                    LEHRERGERAET (PWA / App)                       │
 │  ┌────────────────────────────────────────────────────────────┐  │
 │  │ VoiceCapture.tsx │ voice-encryption.ts │ voice-api.ts      │  │
 │  │ Mikrofon         │ AES-256-GCM         │ WebSocket Client  │  │
 │  └────────────────────────────────────────────────────────────┘  │
 └───────────────────────────┬──────────────────────────────────────┘
                            │ WebSocket (wss://)
                            ▼
 ┌──────────────────────────────────────────────────────────────────┐
 │                    VOICE SERVICE (Port 8091)                      │
 │  ┌────────────────────────────────────────────────────────────┐  │
 │  │ main.py │ streaming.py │ sessions.py │ tasks.py            │  │
 │  └────────────────────────────────────────────────────────────┘  │
 │  ┌────────────────────────────────────────────────────────────┐  │
 │  │ task_orchestrator.py │ intent_router.py │ encryption        │  │
 │  └────────────────────────────────────────────────────────────┘  │
 └───────────────────────────┬──────────────────────────────────────┘
                            │
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
 ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
 │ PersonaPlex-7B  │ │ Ollama Fallback │ │ Valkey Cache    │
 │ (A100 GPU)      │ │ (Mac Mini)      │ │ (Sessions)      │
 └─────────────────┘ └─────────────────┘ └─────────────────┘
 `}</pre>
              </div>
              {/* Technology Stack */}
              <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
                <div className="bg-blue-50 border border-blue-200 rounded-lg p-4">
                  <h4 className="font-semibold text-blue-800 mb-2">Voice Model (Produktion)</h4>
                  <p className="text-sm text-blue-700">PersonaPlex-7B (NVIDIA)</p>
                  <p className="text-xs text-blue-600 mt-1">Full-Duplex Speech-to-Speech</p>
                  <p className="text-xs text-blue-500">Lizenz: MIT + NVIDIA Open Model</p>
                </div>
                <div className="bg-green-50 border border-green-200 rounded-lg p-4">
                  <h4 className="font-semibold text-green-800 mb-2">Agent Orchestration</h4>
                  <p className="text-sm text-green-700">TaskOrchestrator</p>
                  <p className="text-xs text-green-600 mt-1">Task State Machine</p>
                  <p className="text-xs text-green-500">Lizenz: Proprietary</p>
                </div>
                <div className="bg-purple-50 border border-purple-200 rounded-lg p-4">
                  <h4 className="font-semibold text-purple-800 mb-2">Audio Codec</h4>
                  <p className="text-sm text-purple-700">Mimi (24kHz, 80ms)</p>
                  <p className="text-xs text-purple-600 mt-1">Low-Latency Streaming</p>
                  <p className="text-xs text-purple-500">Lizenz: MIT</p>
                </div>
              </div>
              {/* Key Files */}
              <div>
                <h4 className="font-semibold text-slate-800 mb-3">Wichtige Dateien</h4>
                <div className="bg-white border border-slate-200 rounded-lg overflow-hidden">
                  <table className="min-w-full divide-y divide-slate-200">
                    <thead className="bg-slate-50">
                      <tr>
                        <th className="px-4 py-2 text-left text-xs font-medium text-slate-500 uppercase">Datei</th>
                        <th className="px-4 py-2 text-left text-xs font-medium text-slate-500 uppercase">Beschreibung</th>
                      </tr>
                    </thead>
                    <tbody className="divide-y divide-slate-200">
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/main.py</td><td className="px-4 py-2 text-sm text-slate-600">FastAPI Entry, WebSocket Handler</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/services/task_orchestrator.py</td><td className="px-4 py-2 text-sm text-slate-600">Task State Machine</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/services/intent_router.py</td><td className="px-4 py-2 text-sm text-slate-600">Intent Detection (22 Types)</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/services/encryption_service.py</td><td className="px-4 py-2 text-sm text-slate-600">Namespace Key Management</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">studio-v2/components/voice/VoiceCapture.tsx</td><td className="px-4 py-2 text-sm text-slate-600">Frontend Mikrofon + Crypto</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">studio-v2/lib/voice/voice-encryption.ts</td><td className="px-4 py-2 text-sm text-slate-600">AES-256-GCM Client-side</td></tr>
                    </tbody>
                  </table>
                </div>
              </div>
            </div>
          )}
          {/* Demo Tab */}
          {activeTab === 'demo' && (
            <div className="space-y-4">
              <div className="flex items-center justify-between">
                <h3 className="text-lg font-semibold text-slate-900">Live Voice Demo</h3>
                <a
                  href="https://macmini:3001/voice-test"
                  target="_blank"
                  rel="noopener noreferrer"
                  className="text-sm text-teal-600 hover:text-teal-700 flex items-center gap-1"
                >
                  In neuem Tab oeffnen
                  <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M10 6H6a2 2 0 00-2 2v10a2 2 0 002 2h10a2 2 0 002-2v-4M14 4h6m0 0v6m0-6L10 14" />
                  </svg>
                </a>
              </div>
              <div className="bg-slate-100 rounded-lg p-4 text-sm text-slate-600 mb-4">
                <p><strong>Hinweis:</strong> Die Demo erfordert, dass der Voice Service (Port 8091) und das Studio-v2 Frontend (Port 3001) laufen.</p>
                <code className="block mt-2 bg-slate-200 p-2 rounded">docker compose up -d voice-service && cd studio-v2 && npm run dev</code>
              </div>
              {/* Embedded Demo */}
              <div className="relative bg-slate-900 rounded-lg overflow-hidden" style={{ height: '600px' }}>
                {!demoLoaded && (
                  <div className="absolute inset-0 flex items-center justify-center">
                    <button
                      onClick={() => setDemoLoaded(true)}
                      className="px-6 py-3 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors flex items-center gap-2"
                    >
                      <svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M14.752 11.168l-3.197-2.132A1 1 0 0010 9.87v4.263a1 1 0 001.555.832l3.197-2.132a1 1 0 000-1.664z" />
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
                      </svg>
                      Voice Demo laden
                    </button>
                  </div>
                )}
                {demoLoaded && (
                  <iframe
                    src="https://macmini:3001/voice-test?embed=true"
                    className="w-full h-full border-0"
                    title="Voice Demo"
                    allow="microphone"
                  />
                )}
              </div>
            </div>
          )}
          {/* Task States Tab */}
          {activeTab === 'tasks' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Task State Machine (TaskOrchestrator)</h3>
              {/* State Diagram */}
              <div className="bg-slate-50 rounded-lg p-6 font-mono text-sm overflow-x-auto">
                <pre className="text-slate-700">{`
 DRAFT → QUEUED → RUNNING → READY
                              │
                  ┌───────────┴───────────┐
                  │                       │
              APPROVED                REJECTED
                  │                       │
              COMPLETED               DRAFT (revision)
 Any State → EXPIRED (TTL)
 Any State → PAUSED (User Interrupt)
 `}</pre>
              </div>
              {/* States Table */}
              <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
                {TASK_STATES.map((state) => (
                  <div key={state.state} className={`${state.color} rounded-lg p-4`}>
                    <div className="font-semibold text-lg">{state.state}</div>
                    <p className="text-sm mt-1">{state.description}</p>
                    {state.next.length > 0 && (
                      <div className="mt-2 text-xs">
                        <span className="opacity-75">Naechste:</span>{' '}
                        {state.next.join(', ')}
                      </div>
                    )}
                  </div>
                ))}
              </div>
            </div>
          )}
          {/* Intents Tab */}
          {activeTab === 'intents' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Intent Types (22 unterstuetzte Typen)</h3>
              {INTENT_GROUPS.map((group) => (
                <div key={group.group} className={`${group.color} border rounded-lg p-4`}>
                  <h4 className="font-semibold text-slate-800 mb-3">{group.group}</h4>
                  <div className="space-y-2">
                    {group.intents.map((intent) => (
                      <div key={intent.type} className="bg-white rounded-lg p-3 shadow-sm">
                        <div className="flex items-start justify-between">
                          <div>
                            <code className="text-sm font-mono text-teal-700 bg-teal-50 px-2 py-0.5 rounded">
                              {intent.type}
                            </code>
                            <p className="text-sm text-slate-600 mt-1">{intent.description}</p>
                          </div>
                        </div>
                        <div className="mt-2 text-xs text-slate-500 italic">
                          Beispiel: &quot;{intent.example}&quot;
                        </div>
                      </div>
                    ))}
                  </div>
                </div>
              ))}
            </div>
          )}
          {/* DSGVO Tab */}
          {activeTab === 'dsgvo' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">DSGVO-Compliance</h3>
              {/* Key Principles */}
              <div className="bg-green-50 border border-green-200 rounded-lg p-4">
                <h4 className="font-semibold text-green-800 mb-2">Kernprinzipien</h4>
                <ul className="list-disc list-inside text-sm text-green-700 space-y-1">
                  <li><strong>Audio NIEMALS persistiert</strong> - Nur transient im RAM</li>
                  <li><strong>Namespace-Verschluesselung</strong> - Key nur auf Lehrergeraet</li>
                  <li><strong>Keine Klartext-PII serverseitig</strong> - Nur verschluesselt oder pseudonymisiert</li>
                  <li><strong>TTL-basierte Auto-Loeschung</strong> - 7/30/90 Tage je nach Kategorie</li>
                </ul>
              </div>
              {/* Data Categories Table */}
              <div className="bg-white border border-slate-200 rounded-lg overflow-hidden">
                <table className="min-w-full divide-y divide-slate-200">
                  <thead className="bg-slate-50">
                    <tr>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Kategorie</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Verarbeitung</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Speicherort</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">TTL</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Risiko</th>
                    </tr>
                  </thead>
                  <tbody className="divide-y divide-slate-200">
                    {DSGVO_CATEGORIES.map((cat) => (
                      <tr key={cat.category}>
                        <td className="px-4 py-3">
                          <span className="mr-2">{cat.icon}</span>
                          <span className="font-medium">{cat.category}</span>
                        </td>
                        <td className="px-4 py-3 text-sm text-slate-600">{cat.processing}</td>
                        <td className="px-4 py-3 text-sm text-slate-600">{cat.storage}</td>
                        <td className="px-4 py-3 text-sm text-slate-600">{cat.ttl}</td>
                        <td className="px-4 py-3">
                          <span className={`px-2 py-1 rounded text-xs font-medium ${
                            cat.risk === 'low' ? 'bg-green-100 text-green-700' :
                            cat.risk === 'medium' ? 'bg-yellow-100 text-yellow-700' :
                            'bg-red-100 text-red-700'
                          }`}>
                            {cat.risk.toUpperCase()}
                          </span>
                        </td>
                      </tr>
                    ))}
                  </tbody>
                </table>
              </div>
              {/* Audit Log Info */}
              <div className="bg-slate-50 border border-slate-200 rounded-lg p-4">
                <h4 className="font-semibold text-slate-800 mb-2">Audit Logs (ohne PII)</h4>
                <div className="grid grid-cols-2 gap-4 text-sm">
                  <div>
                    <span className="text-green-600 font-medium">Erlaubt:</span>
                    <ul className="list-disc list-inside text-slate-600 mt-1">
                      <li>ref_id (truncated)</li>
                      <li>content_type</li>
                      <li>size_bytes</li>
                      <li>ttl_hours</li>
                    </ul>
                  </div>
                  <div>
                    <span className="text-red-600 font-medium">Verboten:</span>
                    <ul className="list-disc list-inside text-slate-600 mt-1">
                      <li>user_name</li>
                      <li>content / transcript</li>
                      <li>email</li>
                      <li>student_name</li>
                    </ul>
                  </div>
                </div>
              </div>
            </div>
          )}
          {/* API Tab */}
          {activeTab === 'api' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Voice Service API (Port 8091)</h3>
              {/* REST Endpoints */}
              <div className="bg-white border border-slate-200 rounded-lg overflow-hidden">
                <table className="min-w-full divide-y divide-slate-200">
                  <thead className="bg-slate-50">
                    <tr>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Methode</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Endpoint</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Beschreibung</th>
                    </tr>
                  </thead>
                  <tbody className="divide-y divide-slate-200">
                    {API_ENDPOINTS.map((ep) => (
                      <tr key={`${ep.method} ${ep.path}`}>
                        <td className="px-4 py-3">
                          <span className={`px-2 py-1 rounded text-xs font-medium ${
                            ep.method === 'GET' ? 'bg-green-100 text-green-700' :
                            ep.method === 'POST' ? 'bg-blue-100 text-blue-700' :
                            ep.method === 'PUT' ? 'bg-yellow-100 text-yellow-700' :
                            ep.method === 'DELETE' ? 'bg-red-100 text-red-700' :
                            'bg-purple-100 text-purple-700'
                          }`}>
                            {ep.method}
                          </span>
                        </td>
                        <td className="px-4 py-3 font-mono text-sm">{ep.path}</td>
                        <td className="px-4 py-3 text-sm text-slate-600">{ep.description}</td>
                      </tr>
                    ))}
                  </tbody>
                </table>
              </div>
              {/* WebSocket Protocol */}
              <div className="bg-slate-50 rounded-lg p-4">
                <h4 className="font-semibold text-slate-800 mb-3">WebSocket Protocol</h4>
                <div className="grid grid-cols-1 md:grid-cols-2 gap-4 text-sm">
                  <div className="bg-white rounded-lg p-3 border border-slate-200">
                    <div className="font-medium text-slate-700 mb-2">Client → Server</div>
                    <ul className="list-disc list-inside text-slate-600 space-y-1">
                      <li><code className="bg-slate-100 px-1 rounded">Binary</code>: Int16 PCM Audio (24kHz, 80ms)</li>
                      <li><code className="bg-slate-100 px-1 rounded">JSON</code>: {`{type: "config|end_turn|interrupt"}`}</li>
                    </ul>
                  </div>
                  <div className="bg-white rounded-lg p-3 border border-slate-200">
                    <div className="font-medium text-slate-700 mb-2">Server → Client</div>
                    <ul className="list-disc list-inside text-slate-600 space-y-1">
                      <li><code className="bg-slate-100 px-1 rounded">Binary</code>: Audio Response (base64)</li>
                      <li><code className="bg-slate-100 px-1 rounded">JSON</code>: {`{type: "transcript|intent|status|error"}`}</li>
                    </ul>
                  </div>
                </div>
              </div>
              {/* Example curl commands */}
              <div className="bg-slate-900 rounded-lg p-4 text-sm">
                <h4 className="font-semibold text-slate-300 mb-3">Beispiel: Session erstellen</h4>
                <pre className="text-green-400 overflow-x-auto">{`curl -X POST https://macmini:8091/api/v1/sessions \\
  -H "Content-Type: application/json" \\
  -d '{
    "namespace_id": "ns-12345678abcdef12345678abcdef12",
    "key_hash": "sha256:dGVzdGtleWhhc2h0ZXN0a2V5aGFzaHRlc3Q=",
    "device_type": "pwa"
  }'`}</pre>
              </div>
            </div>
          )}
        </div>
      </div>
    </div>
  )
 }
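Note on the `TASK_STATES` table in the page above: its `next` arrays describe a transition graph, but the page only renders them. A minimal sketch of how the same data could be used to validate a transition follows. `TaskState`, `NEXT_STATES`, and `canTransition` are hypothetical names, and the sketch mirrors only the `next` lists; the diagram's "Any State → EXPIRED (TTL)" rule is not encoded in those lists and is therefore not modelled here.

```typescript
// Hypothetical union of the nine states listed in TASK_STATES.
type TaskState =
  | 'DRAFT' | 'QUEUED' | 'RUNNING' | 'READY' | 'APPROVED'
  | 'REJECTED' | 'COMPLETED' | 'EXPIRED' | 'PAUSED'

// Allowed successors, copied from the `next` arrays of TASK_STATES.
const NEXT_STATES: Record<TaskState, TaskState[]> = {
  DRAFT: ['QUEUED', 'PAUSED'],
  QUEUED: ['RUNNING', 'PAUSED'],
  RUNNING: ['READY', 'PAUSED'],
  READY: ['APPROVED', 'REJECTED', 'PAUSED'],
  APPROVED: ['COMPLETED'],
  REJECTED: ['DRAFT'],
  COMPLETED: [],
  EXPIRED: [],
  PAUSED: ['DRAFT', 'QUEUED', 'RUNNING', 'READY'],
}

// True when `to` is listed as a direct successor of `from`.
function canTransition(from: TaskState, to: TaskState): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1
}
```

A client could call `canTransition(current, target)` before issuing `PUT /api/v1/tasks/{id}/transition`; whether the service enforces the same rules is not shown in this diff.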
--- a/admin-lehrer/app/(admin)/communication/video-chat/page.tsx
+++ b/admin-lehrer/app/(admin)/communication/video-chat/page.tsx
@@ -1,635 +0,0 @@
 'use client'
 /**
 * Video & Chat Admin Page
 *
 * Matrix & Jitsi Monitoring Dashboard
 * Provides system statistics, active calls, user metrics, and service health
 * Migrated from website/app/admin/communication
 */
 import { useEffect, useState, useCallback } from 'react'
 import Link from 'next/link'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { getModuleByHref } from '@/lib/navigation'
 interface MatrixStats {
  total_users: number
  active_users: number
  total_rooms: number
  active_rooms: number
  messages_today: number
  messages_this_week: number
  status: 'online' | 'offline' | 'degraded'
 }
 interface JitsiStats {
  active_meetings: number
  total_participants: number
  meetings_today: number
  average_duration_minutes: number
  peak_concurrent_users: number
  total_minutes_today: number
  status: 'online' | 'offline' | 'degraded'
 }
 interface TrafficStats {
  matrix: {
    bandwidth_in_mb: number
    bandwidth_out_mb: number
    messages_per_minute: number
    media_uploads_today: number
    media_size_mb: number
  }
  jitsi: {
    bandwidth_in_mb: number
    bandwidth_out_mb: number
    video_streams_active: number
    audio_streams_active: number
    estimated_hourly_gb: number
  }
  total: {
    bandwidth_in_mb: number
    bandwidth_out_mb: number
    estimated_monthly_gb: number
  }
 }
 interface CommunicationStats {
  matrix: MatrixStats
  jitsi: JitsiStats
  traffic?: TrafficStats
  last_updated: string
 }
 interface ActiveMeeting {
  room_name: string
  display_name: string
  participants: number
  started_at: string
  duration_minutes: number
 }
 interface RecentRoom {
  room_id: string
  name: string
  member_count: number
  last_activity: string
  room_type: 'class' | 'parent' | 'staff' | 'general'
 }
 export default function VideoChatPage() {
  const [stats, setStats] = useState<CommunicationStats | null>(null)
  const [activeMeetings, setActiveMeetings] = useState<ActiveMeeting[]>([])
  const [recentRooms, setRecentRooms] = useState<RecentRoom[]>([])
  const [loading, setLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const moduleInfo = getModuleByHref('/communication/video-chat')
  // Use local API proxy
  const fetchStats = useCallback(async () => {
    try {
      const response = await fetch('/api/admin/communication/stats')
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`)
      }
      const data = await response.json()
      setStats(data)
      setActiveMeetings(data.active_meetings || [])
      setRecentRooms(data.recent_rooms || [])
      setError(null)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Verbindungsfehler')
      // Fall back to zeroed "offline" stats so the dashboard still renders when the API is unavailable
      setStats({
        matrix: {
          total_users: 0,
          active_users: 0,
          total_rooms: 0,
          active_rooms: 0,
          messages_today: 0,
          messages_this_week: 0,
          status: 'offline'
        },
        jitsi: {
          active_meetings: 0,
          total_participants: 0,
          meetings_today: 0,
          average_duration_minutes: 0,
          peak_concurrent_users: 0,
          total_minutes_today: 0,
          status: 'offline'
        },
        last_updated: new Date().toISOString()
      })
    } finally {
      setLoading(false)
    }
  }, [])
  useEffect(() => {
    fetchStats()
  }, [fetchStats])
  // Auto-refresh every 15 seconds
  useEffect(() => {
    const interval = setInterval(fetchStats, 15000)
    return () => clearInterval(interval)
  }, [fetchStats])
  const getStatusBadge = (status: string) => {
    const baseClasses = 'px-3 py-1 rounded-full text-xs font-semibold uppercase'
    switch (status) {
      case 'online':
        return `${baseClasses} bg-green-100 text-green-800`
      case 'degraded':
        return `${baseClasses} bg-yellow-100 text-yellow-800`
      case 'offline':
        return `${baseClasses} bg-red-100 text-red-800`
      default:
        return `${baseClasses} bg-slate-100 text-slate-600`
    }
  }
  const getRoomTypeBadge = (type: string) => {
    const baseClasses = 'px-2 py-0.5 rounded text-xs font-medium'
    switch (type) {
      case 'class':
        return `${baseClasses} bg-blue-100 text-blue-700`
      case 'parent':
        return `${baseClasses} bg-purple-100 text-purple-700`
      case 'staff':
        return `${baseClasses} bg-orange-100 text-orange-700`
      default:
        return `${baseClasses} bg-slate-100 text-slate-600`
    }
  }
  const formatDuration = (minutes: number) => {
    // Round once up front so values like 119.7 render as "2h 0m", not "1h 60m"
    const total = Math.round(minutes)
    if (total < 60) return `${total} Min.`
    return `${Math.floor(total / 60)}h ${total % 60}m`
  }
  const formatTimeAgo = (dateStr: string) => {
    const date = new Date(dateStr)
    const now = new Date()
    const diffMs = now.getTime() - date.getTime()
    const diffMins = Math.floor(diffMs / 60000)
    if (diffMins < 1) return 'gerade eben'
    if (diffMins < 60) return `vor ${diffMins} Min.`
    if (diffMins < 1440) return `vor ${Math.floor(diffMins / 60)} Std.`
    return `vor ${Math.floor(diffMins / 1440)} Tagen`
  }
  // Traffic estimation helpers for SysEleven planning
  const calculateEstimatedTraffic = (direction: 'in' | 'out'): number => {
    const messages = stats?.matrix?.messages_today || 0
    const callMinutes = stats?.jitsi?.total_minutes_today || 0
    const participants = stats?.jitsi?.total_participants || 0
    // ~2 KB per message
    const messageTrafficMB = messages * 0.002
    // ~11 MB per participant-minute (~1.5 Mbps), matching calculateMonthlyEstimate
    const videoTrafficMB = callMinutes * participants * 11
    if (direction === 'in') {
      return messageTrafficMB * 0.3 + videoTrafficMB * 0.4
    }
    return messageTrafficMB * 0.7 + videoTrafficMB * 0.6
  }
  const calculateHourlyEstimate = (): number => {
    const activeParticipants = stats?.jitsi?.total_participants || 0
    return activeParticipants * 0.675
  }
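  // Illustrative sketch (an assumption, not part of the original file): the
  // 0.675 GB/h factor above is consistent with ~1.5 Mbps per participant, the
  // same bitrate this page uses for its "Bitrate geschaetzt" row. The helper
  // name `mbpsToGbPerHour` is hypothetical.
  const mbpsToGbPerHour = (mbps: number): number =>
    (mbps * 3600) / 8 / 1000 // Mbit/s -> Mbit/h -> MB/h -> GB/h (decimal units)
  // mbpsToGbPerHour(1.5) evaluates to 0.675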
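  const calculateMonthlyEstimate = (): number => {
    const dailyCallMinutes = stats?.jitsi?.total_minutes_today || 0
    // Uses the peak concurrency as a conservative stand-in for the average
    const avgParticipants = stats?.jitsi?.peak_concurrent_users || 1
    // 22 days per month (presumably working days)
    const monthlyMinutes = dailyCallMinutes * 22
    // 11 MB per participant-minute, converted from MB to GB
    return (monthlyMinutes * avgParticipants * 11) / 1024
  }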
  const getResourceRecommendation = (): string => {
    const peakUsers = stats?.jitsi?.peak_concurrent_users || 0
    const monthlyGB = calculateMonthlyEstimate()
    // Both traffic and concurrency must fit a tier; otherwise move up to the next one
    if (monthlyGB < 10 && peakUsers < 5) {
      return 'Starter (1 vCPU, 2GB RAM, 100GB Traffic)'
    } else if (monthlyGB < 50 && peakUsers < 20) {
      return 'Standard (2 vCPU, 4GB RAM, 500GB Traffic)'
    } else if (monthlyGB < 200 && peakUsers < 50) {
      return 'Professional (4 vCPU, 8GB RAM, 2TB Traffic)'
    } else {
      return 'Enterprise (8+ vCPU, 16GB+ RAM, Unlimited Traffic)'
    }
  }
  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title={moduleInfo?.module.name || 'Video & Chat'}
        purpose={moduleInfo?.module.purpose || 'Matrix & Jitsi Monitoring Dashboard'}
        audience={moduleInfo?.module.audience || ['Admins', 'DevOps']}
        architecture={{
          services: ['synapse (Matrix)', 'jitsi-meet', 'prosody', 'jvb'],
          databases: ['PostgreSQL', 'synapse-db'],
        }}
        collapsible={true}
        defaultCollapsed={true}
      />
      {/* Quick Actions */}
      <div className="flex gap-3 mb-6">
        <Link
          href="/communication/video-chat/wizard"
          className="px-4 py-2 bg-green-600 text-white rounded-lg hover:bg-green-700 transition-colors text-sm font-medium"
        >
          Test Wizard starten
        </Link>
        <button
          onClick={fetchStats}
          disabled={loading}
          className="px-4 py-2 border border-slate-300 rounded-lg hover:bg-slate-50 disabled:opacity-50 text-sm"
        >
          {loading ? 'Lade...' : 'Aktualisieren'}
        </button>
      </div>
      {/* Service Status Overview */}
      <div className="grid grid-cols-1 md:grid-cols-2 gap-6 mb-6">
        {/* Matrix Status Card */}
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <div className="flex items-center justify-between mb-4">
            <div className="flex items-center gap-3">
              <div className="w-10 h-10 bg-purple-100 rounded-lg flex items-center justify-center">
                <svg className="w-6 h-6 text-purple-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M8 12h.01M12 12h.01M16 12h.01M21 12c0 4.418-4.03 8-9 8a9.863 9.863 0 01-4.255-.949L3 20l1.395-3.72C3.512 15.042 3 13.574 3 12c0-4.418 4.03-8 9-8s9 3.582 9 8z" />
                </svg>
              </div>
              <div>
                <h3 className="font-semibold text-slate-900">Matrix (Synapse)</h3>
                <p className="text-sm text-slate-500">E2EE Messaging</p>
              </div>
            </div>
            <span className={getStatusBadge(stats?.matrix.status || 'offline')}>
              {stats?.matrix.status || 'offline'}
            </span>
          </div>
          <div className="grid grid-cols-3 gap-4">
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.matrix.total_users || 0}</div>
              <div className="text-xs text-slate-500">Benutzer</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.matrix.active_users || 0}</div>
              <div className="text-xs text-slate-500">Aktiv</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.matrix.total_rooms || 0}</div>
              <div className="text-xs text-slate-500">Raeume</div>
            </div>
          </div>
          <div className="mt-4 pt-4 border-t border-slate-100">
            <div className="flex justify-between text-sm">
              <span className="text-slate-500">Nachrichten heute</span>
              <span className="font-medium">{stats?.matrix.messages_today || 0}</span>
            </div>
            <div className="flex justify-between text-sm mt-1">
              <span className="text-slate-500">Diese Woche</span>
              <span className="font-medium">{stats?.matrix.messages_this_week || 0}</span>
            </div>
          </div>
        </div>
        {/* Jitsi Status Card */}
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <div className="flex items-center justify-between mb-4">
            <div className="flex items-center gap-3">
              <div className="w-10 h-10 bg-blue-100 rounded-lg flex items-center justify-center">
                <svg className="w-6 h-6 text-blue-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 10l4.553-2.276A1 1 0 0121 8.618v6.764a1 1 0 01-1.447.894L15 14M5 18h8a2 2 0 002-2V8a2 2 0 00-2-2H5a2 2 0 00-2 2v8a2 2 0 002 2z" />
                </svg>
              </div>
              <div>
                <h3 className="font-semibold text-slate-900">Jitsi Meet</h3>
                <p className="text-sm text-slate-500">Videokonferenzen</p>
              </div>
            </div>
            <span className={getStatusBadge(stats?.jitsi.status || 'offline')}>
              {stats?.jitsi.status || 'offline'}
            </span>
          </div>
          <div className="grid grid-cols-3 gap-4">
            <div>
              <div className="text-2xl font-bold text-green-600">{stats?.jitsi.active_meetings || 0}</div>
              <div className="text-xs text-slate-500">Live Calls</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.jitsi.total_participants || 0}</div>
              <div className="text-xs text-slate-500">Teilnehmer</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.jitsi.meetings_today || 0}</div>
              <div className="text-xs text-slate-500">Calls heute</div>
            </div>
          </div>
          <div className="mt-4 pt-4 border-t border-slate-100">
            <div className="flex justify-between text-sm">
              <span className="text-slate-500">Durchschnittliche Dauer</span>
              <span className="font-medium">{formatDuration(stats?.jitsi.average_duration_minutes || 0)}</span>
            </div>
            <div className="flex justify-between text-sm mt-1">
              <span className="text-slate-500">Peak gleichzeitig</span>
              <span className="font-medium">{stats?.jitsi.peak_concurrent_users || 0} Nutzer</span>
            </div>
          </div>
        </div>
      </div>
      {/* Traffic & Bandwidth Statistics */}
      <div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
        <div className="flex items-center justify-between mb-4">
          <div className="flex items-center gap-3">
            <div className="w-10 h-10 bg-emerald-100 rounded-lg flex items-center justify-center">
              <svg className="w-6 h-6 text-emerald-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 7h8m0 0v8m0-8l-8 8-4-4-6 6" />
              </svg>
            </div>
            <div>
              <h3 className="font-semibold text-slate-900">Traffic & Bandbreite</h3>
              <p className="text-sm text-slate-500">SysEleven Ressourcenplanung</p>
            </div>
          </div>
          <span className="px-3 py-1 rounded-full text-xs font-semibold uppercase bg-emerald-100 text-emerald-800">
            Live
          </span>
        </div>
        <div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-4">
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Eingehend (heute)</div>
            <div className="text-2xl font-bold text-slate-900">
              {stats?.traffic?.total?.bandwidth_in_mb?.toFixed(1) || calculateEstimatedTraffic('in').toFixed(1)} MB
            </div>
          </div>
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Ausgehend (heute)</div>
            <div className="text-2xl font-bold text-slate-900">
              {stats?.traffic?.total?.bandwidth_out_mb?.toFixed(1) || calculateEstimatedTraffic('out').toFixed(1)} MB
            </div>
          </div>
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Geschaetzt/Stunde</div>
            <div className="text-2xl font-bold text-blue-600">
              {stats?.traffic?.jitsi?.estimated_hourly_gb?.toFixed(2) || calculateHourlyEstimate().toFixed(2)} GB
            </div>
          </div>
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Geschaetzt/Monat</div>
            <div className="text-2xl font-bold text-emerald-600">
              {stats?.traffic?.total?.estimated_monthly_gb?.toFixed(1) || calculateMonthlyEstimate().toFixed(1)} GB
            </div>
          </div>
        </div>
        <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
          {/* Matrix Traffic */}
          <div className="border border-slate-200 rounded-lg p-4">
            <div className="flex items-center gap-2 mb-3">
              <div className="w-3 h-3 bg-purple-500 rounded-full"></div>
              <span className="text-sm font-medium text-slate-700">Matrix Messaging</span>
            </div>
            <div className="space-y-2 text-sm">
              <div className="flex justify-between">
                <span className="text-slate-500">Nachrichten/Min</span>
                <span className="font-medium">{stats?.traffic?.matrix?.messages_per_minute || Math.round((stats?.matrix?.messages_today || 0) / (new Date().getHours() || 1) / 60)}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Media Uploads heute</span>
                <span className="font-medium">{stats?.traffic?.matrix?.media_uploads_today || 0}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Media Groesse</span>
                <span className="font-medium">{stats?.traffic?.matrix?.media_size_mb?.toFixed(1) || '0.0'} MB</span>
              </div>
            </div>
          </div>
          {/* Jitsi Traffic */}
          <div className="border border-slate-200 rounded-lg p-4">
            <div className="flex items-center gap-2 mb-3">
              <div className="w-3 h-3 bg-blue-500 rounded-full"></div>
              <span className="text-sm font-medium text-slate-700">Jitsi Video</span>
            </div>
            <div className="space-y-2 text-sm">
              <div className="flex justify-between">
                <span className="text-slate-500">Video Streams aktiv</span>
                <span className="font-medium">{stats?.traffic?.jitsi?.video_streams_active || (stats?.jitsi?.total_participants || 0)}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Audio Streams aktiv</span>
                <span className="font-medium">{stats?.traffic?.jitsi?.audio_streams_active || (stats?.jitsi?.total_participants || 0)}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Bitrate geschaetzt</span>
                <span className="font-medium">{((stats?.jitsi?.total_participants || 0) * 1.5).toFixed(1)} Mbps</span>
              </div>
            </div>
          </div>
        </div>
        {/* SysEleven Recommendation */}
        <div className="mt-4 p-4 bg-emerald-50 border border-emerald-200 rounded-lg">
          <h4 className="text-sm font-semibold text-emerald-800 mb-2">SysEleven Empfehlung</h4>
          <div className="text-sm text-emerald-700">
            <p>Basierend auf aktuellem Traffic: <strong>{getResourceRecommendation()}</strong></p>
            <p className="mt-1 text-xs text-emerald-600">
              Peak Teilnehmer: {stats?.jitsi?.peak_concurrent_users || 0} |
              Durchschnittliche Call-Dauer: {stats?.jitsi?.average_duration_minutes?.toFixed(0) || 0} Min. |
              Calls heute: {stats?.jitsi?.meetings_today || 0}
            </p>
          </div>
        </div>
      </div>
      {/* Active Meetings */}
      <div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
        <div className="flex items-center justify-between mb-4">
          <h3 className="font-semibold text-slate-900">Aktive Meetings</h3>
        </div>
        {activeMeetings.length === 0 ? (
          <div className="text-center py-8 text-slate-500">
            <svg className="w-12 h-12 mx-auto mb-3 text-slate-300" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 10l4.553-2.276A1 1 0 0121 8.618v6.764a1 1 0 01-1.447.894L15 14M5 18h8a2 2 0 002-2V8a2 2 0 00-2-2H5a2 2 0 00-2 2v8a2 2 0 002 2z" />
            </svg>
            <p>Keine aktiven Meetings</p>
          </div>
        ) : (
          <div className="overflow-x-auto">
            <table className="w-full">
              <thead>
                <tr className="text-left text-xs text-slate-500 uppercase border-b border-slate-200">
                  <th className="pb-3 pr-4">Meeting</th>
                  <th className="pb-3 pr-4">Teilnehmer</th>
                  <th className="pb-3 pr-4">Gestartet</th>
                  <th className="pb-3">Dauer</th>
                </tr>
              </thead>
              <tbody className="divide-y divide-slate-100">
                {activeMeetings.map((meeting) => (
                  <tr key={meeting.room_name} className="text-sm">
                    <td className="py-3 pr-4">
                      <div className="font-medium text-slate-900">{meeting.display_name}</div>
                      <div className="text-xs text-slate-500">{meeting.room_name}</div>
                    </td>
                    <td className="py-3 pr-4">
                      <span className="inline-flex items-center gap-1">
                        <span className="w-2 h-2 bg-green-500 rounded-full animate-pulse" />
                        {meeting.participants}
                      </span>
                    </td>
                    <td className="py-3 pr-4 text-slate-500">{formatTimeAgo(meeting.started_at)}</td>
                    <td className="py-3 font-medium">{formatDuration(meeting.duration_minutes)}</td>
                  </tr>
                ))}
              </tbody>
            </table>
          </div>
        )}
      </div>
      {/* Recent Chat Rooms & Usage Stats */}
      <div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Aktive Chat-Raeume</h3>
          {recentRooms.length === 0 ? (
            <div className="text-center py-6 text-slate-500">
              <p>Keine aktiven Raeume</p>
            </div>
          ) : (
            <div className="space-y-3">
              {recentRooms.slice(0, 5).map((room) => (
                <div key={room.room_id} className="flex items-center justify-between p-3 bg-slate-50 rounded-lg">
                  <div className="flex items-center gap-3">
                    <div className="w-8 h-8 bg-slate-200 rounded-lg flex items-center justify-center">
                      <svg className="w-4 h-4 text-slate-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M17 20h5v-2a3 3 0 00-5.356-1.857M17 20H7m10 0v-2c0-.656-.126-1.283-.356-1.857M7 20H2v-2a3 3 0 015.356-1.857M7 20v-2c0-.656.126-1.283.356-1.857m0 0a5.002 5.002 0 019.288 0M15 7a3 3 0 11-6 0 3 3 0 016 0z" />
                      </svg>
                    </div>
                    <div>
                      <div className="font-medium text-slate-900 text-sm">{room.name}</div>
                      <div className="text-xs text-slate-500">{room.member_count} Mitglieder</div>
                    </div>
                  </div>
                  <div className="flex items-center gap-2">
                    <span className={getRoomTypeBadge(room.room_type)}>{room.room_type}</span>
                    <span className="text-xs text-slate-400">{formatTimeAgo(room.last_activity)}</span>
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
        {/* Usage Statistics */}
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Nutzungsstatistiken</h3>
          <div className="space-y-4">
            <div>
              <div className="flex justify-between text-sm mb-1">
                <span className="text-slate-600">Call-Minuten heute</span>
                <span className="font-semibold">{stats?.jitsi.total_minutes_today || 0} Min.</span>
              </div>
              <div className="w-full bg-slate-100 rounded-full h-2">
                <div
                  className="bg-blue-600 h-2 rounded-full transition-all"
                  style={{ width: `${Math.min((stats?.jitsi.total_minutes_today || 0) / 500 * 100, 100)}%` }}
                />
              </div>
            </div>
            <div>
              <div className="flex justify-between text-sm mb-1">
                <span className="text-slate-600">Aktive Chat-Raeume</span>
                <span className="font-semibold">{stats?.matrix.active_rooms || 0} / {stats?.matrix.total_rooms || 0}</span>
              </div>
              <div className="w-full bg-slate-100 rounded-full h-2">
                <div
                  className="bg-purple-600 h-2 rounded-full transition-all"
                  style={{ width: `${stats?.matrix.total_rooms ? ((stats.matrix.active_rooms / stats.matrix.total_rooms) * 100) : 0}%` }}
                />
              </div>
            </div>
            <div>
              <div className="flex justify-between text-sm mb-1">
                <span className="text-slate-600">Aktive Nutzer</span>
                <span className="font-semibold">{stats?.matrix.active_users || 0} / {stats?.matrix.total_users || 0}</span>
              </div>
              <div className="w-full bg-slate-100 rounded-full h-2">
                <div
                  className="bg-green-600 h-2 rounded-full transition-all"
                  style={{ width: `${stats?.matrix.total_users ? ((stats.matrix.active_users / stats.matrix.total_users) * 100) : 0}%` }}
                />
              </div>
            </div>
          </div>
          {/* Quick Actions */}
          <div className="mt-6 pt-4 border-t border-slate-100">
            <h4 className="text-sm font-medium text-slate-700 mb-3">Schnellaktionen</h4>
            <div className="flex flex-wrap gap-2">
              <a
                href="http://localhost:8448/_synapse/admin"
                target="_blank"
                rel="noopener noreferrer"
                className="px-3 py-1.5 text-sm bg-purple-100 text-purple-700 rounded-lg hover:bg-purple-200 transition-colors"
              >
                Synapse Admin
              </a>
              <a
                href="http://localhost:8443"
                target="_blank"
                rel="noopener noreferrer"
                className="px-3 py-1.5 text-sm bg-blue-100 text-blue-700 rounded-lg hover:bg-blue-200 transition-colors"
              >
                Jitsi Meet
              </a>
            </div>
          </div>
        </div>
      </div>
      {/* Connection Info */}
      <div className="bg-blue-50 border border-blue-200 rounded-xl p-4">
        <div className="flex gap-3">
          <svg className="w-5 h-5 text-blue-600 flex-shrink-0 mt-0.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          <div>
            <h4 className="font-semibold text-blue-900">Service Konfiguration</h4>
            <p className="text-sm text-blue-800 mt-1">
              <strong>Matrix Homeserver:</strong> http://localhost:8448 (Synapse)<br />
              <strong>Jitsi Meet:</strong> http://localhost:8443<br />
              <strong>Auto-Refresh:</strong> Alle 15 Sekunden
            </p>
            {error && (
              <p className="text-sm text-red-600 mt-2">
                <strong>Fehler:</strong> {error} - Backend nicht erreichbar
              </p>
            )}
            {stats?.last_updated && (
              <p className="text-xs text-blue-600 mt-2">
                Letzte Aktualisierung: {new Date(stats.last_updated).toLocaleString('de-DE')}
              </p>
            )}
          </div>
        </div>
      </div>
    </div>
  )
 }
--- a/admin-lehrer/app/(admin)/communication/video-chat/wizard/page.tsx
+++ b/admin-lehrer/app/(admin)/communication/video-chat/wizard/page.tsx
@@ -1,366 +0,0 @@
 'use client'
 /**
 * Video & Chat Wizard Page
 *
 * Interactive learning and testing wizard for Matrix & Jitsi integration
 * Migrated from website/app/admin/communication/wizard
 */
 import { useState } from 'react'
 import Link from 'next/link'
 import {
  WizardStepper,
  WizardNavigation,
  EducationCard,
  ArchitectureContext,
  TestRunner,
  TestSummary,
  type WizardStep,
  type TestCategoryResult,
  type FullTestResults,
  type EducationContent,
  type ArchitectureContextType,
 } from '@/components/wizard'
 // ==============================================
 // Constants
 // ==============================================
 const BACKEND_URL = process.env.NEXT_PUBLIC_BACKEND_URL || 'http://localhost:8000'
 const STEPS: WizardStep[] = [
  { id: 'welcome', name: 'Willkommen', icon: '👋', status: 'pending' },
  { id: 'api-health', name: 'API Status', icon: '💚', status: 'pending', category: 'api-health' },
  { id: 'matrix', name: 'Matrix', icon: '💬', status: 'pending', category: 'matrix' },
  { id: 'jitsi', name: 'Jitsi', icon: '📹', status: 'pending', category: 'jitsi' },
  { id: 'summary', name: 'Zusammenfassung', icon: '📊', status: 'pending' },
 ]
 const EDUCATION_CONTENT: Record<string, EducationContent> = {
  'welcome': {
    title: 'Willkommen zum Video & Chat Wizard',
    content: [
      'Sichere Kommunikation ist das Rueckgrat moderner Bildungsplattformen.',
      '',
      'BreakPilot nutzt zwei Open-Source Systeme:',
      '• Matrix Synapse: Dezentraler Messenger (Ende-zu-Ende verschluesselt)',
      '• Jitsi Meet: Video-Konferenzen (WebRTC-basiert)',
      '',
      'Beide Systeme sind DSGVO-konform und self-hosted.',
      '',
      'In diesem Wizard testen wir:',
      '• Matrix Homeserver und Federation',
      '• Jitsi Video-Konferenz Server',
      '• Integration mit der Schulverwaltung',
    ],
  },
  'api-health': {
    title: 'Communication API - Backend Integration',
    content: [
      'Die Communication API verbindet Matrix und Jitsi mit BreakPilot.',
      '',
      'Funktionen:',
      '• Automatische Raum-Erstellung fuer Klassen',
      '• Eltern-Lehrer DM-Raeume',
      '• Meeting-Planung mit Kalender-Integration',
      '• Benachrichtigungen bei neuen Nachrichten',
      '',
      'Endpunkte:',
      '• /api/v1/communication/admin/stats',
      '• /api/v1/communication/admin/matrix/users',
      '• /api/v1/communication/rooms',
    ],
  },
  'matrix': {
    title: 'Matrix Synapse - Dezentraler Messenger',
    content: [
      'Matrix ist ein offenes Protokoll fuer sichere Kommunikation.',
      '',
      'Vorteile gegenueber WhatsApp/Teams:',
      '• Ende-zu-Ende Verschluesselung (E2EE)',
      '• Dezentral: Kein Single Point of Failure',
      '• Federation: Kommunikation mit anderen Schulen',
      '• Self-Hosted: Volle Datenkontrolle',
      '',
      'Raum-Typen in BreakPilot:',
      '• Klassen-Info (Ankuendigungen)',
      '• Elternvertreter-Raum',
      '• Lehrer-Eltern DM',
      '• Fachgruppen',
    ],
  },
  'jitsi': {
    title: 'Jitsi Meet - Video-Konferenzen',
    content: [
      'Jitsi ist eine Open-Source Alternative zu Zoom/Teams.',
      '',
      'Features:',
      '• WebRTC: Keine Software-Installation noetig',
      '• Bildschirmfreigabe und Whiteboard',
      '• Breakout-Raeume fuer Gruppenarbeit',
      '• Aufzeichnung (optional, lokal)',
      '',
      'Anwendungsfaelle:',
      '• Elternsprechtage (online)',
      '• Fernunterricht bei Schulausfall',
      '• Lehrerkonferenzen',
      '• Foerdergespraeche',
    ],
  },
  'summary': {
    title: 'Test-Zusammenfassung',
    content: [
      'Hier sehen Sie eine Uebersicht aller durchgefuehrten Tests:',
      '• Matrix Homeserver Verfuegbarkeit',
      '• Jitsi Server Status',
      '• API-Integration',
    ],
  },
 }
 const ARCHITECTURE_CONTEXTS: Record<string, ArchitectureContextType> = {
  'api-health': {
    layer: 'api',
    services: ['backend', 'consent-service'],
    dependencies: ['PostgreSQL', 'Matrix Synapse', 'Jitsi'],
    dataFlow: ['Browser', 'FastAPI', 'Go Service', 'Matrix/Jitsi'],
  },
  'matrix': {
    layer: 'service',
    services: ['matrix'],
    dependencies: ['PostgreSQL', 'Federation', 'TURN Server'],
    dataFlow: ['Element Client', 'Matrix Synapse', 'Federation', 'PostgreSQL'],
  },
  'jitsi': {
    layer: 'service',
    services: ['jitsi'],
    dependencies: ['Prosody XMPP', 'JVB', 'TURN/STUN'],
    dataFlow: ['Browser', 'Nginx', 'Prosody', 'Jitsi Videobridge'],
  },
 }
 // ==============================================
 // Main Component
 // ==============================================
 export default function VideoChatWizardPage() {
  const [currentStep, setCurrentStep] = useState(0)
  const [steps, setSteps] = useState<WizardStep[]>(STEPS)
  const [categoryResults, setCategoryResults] = useState<Record<string, TestCategoryResult>>({})
  const [fullResults, setFullResults] = useState<FullTestResults | null>(null)
  const [isLoading, setIsLoading] = useState(false)
  const [error, setError] = useState<string | null>(null)
  const currentStepData = steps[currentStep]
  const isTestStep = currentStepData?.category !== undefined
  const isWelcome = currentStepData?.id === 'welcome'
  const isSummary = currentStepData?.id === 'summary'
  const runCategoryTest = async (category: string) => {
    setIsLoading(true)
    setError(null)
    try {
      const response = await fetch(`${BACKEND_URL}/api/admin/communication-tests/${category}`, {
        method: 'POST',
      })
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`)
      }
      const result: TestCategoryResult = await response.json()
      setCategoryResults((prev) => ({ ...prev, [category]: result }))
      setSteps((prev) =>
        prev.map((step) =>
          step.category === category
            ? { ...step, status: result.failed === 0 ? 'completed' : 'failed' }
            : step
        )
      )
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Unbekannter Fehler')
    } finally {
      setIsLoading(false)
    }
  }
  const runAllTests = async () => {
    setIsLoading(true)
    setError(null)
    try {
      const response = await fetch(`${BACKEND_URL}/api/admin/communication-tests/run-all`, {
        method: 'POST',
      })
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`)
      }
      const results: FullTestResults = await response.json()
      setFullResults(results)
      setSteps((prev) =>
        prev.map((step) => {
          if (step.category) {
            const catResult = results.categories.find((c) => c.category === step.category)
            if (catResult) {
              return { ...step, status: catResult.failed === 0 ? 'completed' : 'failed' }
            }
          }
          return step
        })
      )
      const newCategoryResults: Record<string, TestCategoryResult> = {}
      results.categories.forEach((cat) => {
        newCategoryResults[cat.category] = cat
      })
      setCategoryResults(newCategoryResults)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Unbekannter Fehler')
    } finally {
      setIsLoading(false)
    }
  }
  const goToNext = () => {
    if (currentStep < steps.length - 1) {
      setSteps((prev) =>
        prev.map((step, idx) =>
          idx === currentStep && step.status === 'pending'
            ? { ...step, status: 'completed' }
            : step
        )
      )
      setCurrentStep((prev) => prev + 1)
    }
  }
  const goToPrev = () => {
    if (currentStep > 0) {
      setCurrentStep((prev) => prev - 1)
    }
  }
  const handleStepClick = (index: number) => {
    if (index <= currentStep || steps[index - 1]?.status !== 'pending') {
      setCurrentStep(index)
    }
  }
  return (
    <div>
      {/* Header */}
      <div className="bg-white rounded-lg border border-slate-200 p-4 mb-6 flex items-center justify-between">
        <div className="flex items-center">
          <span className="text-3xl mr-3">💬</span>
          <div>
            <h2 className="text-lg font-bold text-gray-800">Video & Chat Test Wizard</h2>
            <p className="text-sm text-gray-600">Matrix Messenger & Jitsi Video</p>
          </div>
        </div>
        <Link href="/communication/video-chat" className="text-blue-600 hover:text-blue-800 text-sm">
          &larr; Zurueck zu Video & Chat
        </Link>
      </div>
      {/* Stepper */}
      <div className="bg-white rounded-lg border border-slate-200 p-6 mb-6">
        <WizardStepper steps={steps} currentStep={currentStep} onStepClick={handleStepClick} />
      </div>
      {/* Content */}
      <div className="bg-white rounded-lg border border-slate-200 p-6">
        <div className="flex items-center mb-6">
          <span className="text-3xl mr-3">{currentStepData?.icon}</span>
          <div>
            <h2 className="text-xl font-bold text-gray-800">
              Schritt {currentStep + 1}: {currentStepData?.name}
            </h2>
            <p className="text-gray-500 text-sm">
              {currentStep + 1} von {steps.length}
            </p>
          </div>
        </div>
        <EducationCard content={EDUCATION_CONTENT[currentStepData?.id || '']} />
        {isTestStep && currentStepData?.category && ARCHITECTURE_CONTEXTS[currentStepData.category] && (
          <ArchitectureContext
            context={ARCHITECTURE_CONTEXTS[currentStepData.category]}
            currentStep={currentStepData.name}
          />
        )}
        {error && (
          <div className="bg-red-50 border border-red-200 text-red-700 rounded-lg p-4 mb-6">
            <strong>Fehler:</strong> {error}
          </div>
        )}
        {isWelcome && (
          <div className="text-center py-8">
            <button
              onClick={goToNext}
              className="bg-blue-600 text-white px-8 py-3 rounded-lg font-medium hover:bg-blue-700 transition-colors"
            >
              Wizard starten
            </button>
          </div>
        )}
        {isTestStep && currentStepData?.category && (
          <TestRunner
            category={currentStepData.category}
            categoryResult={categoryResults[currentStepData.category]}
            isLoading={isLoading}
            onRunTests={() => runCategoryTest(currentStepData.category!)}
          />
        )}
        {isSummary && (
          <div>
            {!fullResults ? (
              <div className="text-center py-8">
                <p className="text-gray-600 mb-4">
                  Fuehren Sie alle Tests aus um eine Zusammenfassung zu sehen.
                </p>
                <button
                  onClick={runAllTests}
                  disabled={isLoading}
                  className={`px-6 py-3 rounded-lg font-medium transition-colors ${
                    isLoading
                      ? 'bg-gray-400 cursor-not-allowed'
                      : 'bg-blue-600 text-white hover:bg-blue-700'
                  }`}
                >
                  {isLoading ? 'Alle Tests laufen...' : 'Alle Tests ausfuehren'}
                </button>
              </div>
            ) : (
              <TestSummary results={fullResults} />
            )}
          </div>
        )}
        <WizardNavigation
          currentStep={currentStep}
          totalSteps={steps.length}
          onPrev={goToPrev}
          onNext={goToNext}
          showNext={!isSummary}
          isLoading={isLoading}
        />
      </div>
      <div className="text-center text-gray-500 text-sm mt-6">
        Diese Tests pruefen die Matrix- und Jitsi-Integration.
        Bei Fragen wenden Sie sich an das IT-Team.
      </div>
    </div>
  )
 }
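Review note: the status mapping that `runAllTests` performs inline can also be read as a pure function, which makes the rule "a category with zero failures is completed, otherwise failed" easy to check in isolation. The sketch below is only an illustration under assumptions: the `WizardStep` and `TestCategoryResult` shapes are simplified guesses (the real types are imported elsewhere and not shown in this diff), and `applyResults` is a hypothetical helper name, not something that exists in the codebase.

```typescript
// Assumed, simplified shapes; the real types live outside this diff.
type StepStatus = 'pending' | 'completed' | 'failed'

interface WizardStep {
  id: string
  category?: string
  status: StepStatus
}

interface TestCategoryResult {
  category: string
  failed: number
}

// Hypothetical helper: returns a new step list in which every step that has
// a matching category result is marked 'completed' (no failures) or 'failed'.
// Steps without a category, or without a result, are returned unchanged.
function applyResults(
  steps: WizardStep[],
  categories: TestCategoryResult[],
): WizardStep[] {
  const byCategory = new Map<string, TestCategoryResult>()
  for (const c of categories) byCategory.set(c.category, c)

  return steps.map((step): WizardStep => {
    const result = step.category ? byCategory.get(step.category) : undefined
    if (!result) return step
    return { ...step, status: result.failed === 0 ? 'completed' : 'failed' }
  })
}
```

Because the helper never mutates its input, it could be passed directly to a functional state update such as `setSteps((prev) => applyResults(prev, results.categories))`.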
--- a/admin-lehrer/app/(admin)/infrastructure/gpu/page.tsx
+++ b/admin-lehrer/app/(admin)/infrastructure/gpu/page.tsx
@@ -1,390 +0,0 @@
 'use client'
 /**
 * GPU Infrastructure Admin Page
 *
 * vast.ai GPU Management for LLM Processing
 */
 import { useEffect, useState, useCallback } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 interface VastStatus {
  instance_id: number | null
  status: string
  gpu_name: string | null
  dph_total: number | null
  endpoint_base_url: string | null
  last_activity: string | null
  auto_shutdown_in_minutes: number | null
  total_runtime_hours: number | null
  total_cost_usd: number | null
  account_credit: number | null
  account_total_spend: number | null
  session_runtime_minutes: number | null
  session_cost_usd: number | null
  message: string | null
  error?: string
 }
 export default function GPUInfrastructurePage() {
  const [status, setStatus] = useState<VastStatus | null>(null)
  const [loading, setLoading] = useState(true)
  const [actionLoading, setActionLoading] = useState<string | null>(null)
  const [error, setError] = useState<string | null>(null)
  const [message, setMessage] = useState<string | null>(null)
  const API_PROXY = '/api/admin/gpu'
  const fetchStatus = useCallback(async () => {
    setLoading(true)
    setError(null)
    try {
      const response = await fetch(API_PROXY)
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.error || `HTTP ${response.status}`)
      }
      setStatus(data)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Verbindungsfehler')
      setStatus({
        instance_id: null,
        status: 'error',
        gpu_name: null,
        dph_total: null,
        endpoint_base_url: null,
        last_activity: null,
        auto_shutdown_in_minutes: null,
        total_runtime_hours: null,
        total_cost_usd: null,
        account_credit: null,
        account_total_spend: null,
        session_runtime_minutes: null,
        session_cost_usd: null,
        message: 'Verbindung fehlgeschlagen'
      })
    } finally {
      setLoading(false)
    }
  }, [])
  useEffect(() => {
    fetchStatus()
  }, [fetchStatus])
  useEffect(() => {
    const interval = setInterval(fetchStatus, 30000)
    return () => clearInterval(interval)
  }, [fetchStatus])
  const powerOn = async () => {
    setActionLoading('on')
    setError(null)
    setMessage(null)
    try {
      const response = await fetch(API_PROXY, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ action: 'on' }),
      })
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
      }
      setMessage('Start angefordert')
      setTimeout(fetchStatus, 3000)
      setTimeout(fetchStatus, 10000)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Fehler beim Starten')
      fetchStatus()
    } finally {
      setActionLoading(null)
    }
  }
  const powerOff = async () => {
    setActionLoading('off')
    setError(null)
    setMessage(null)
    try {
      const response = await fetch(API_PROXY, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ action: 'off' }),
      })
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
      }
      setMessage('Stop angefordert')
      setTimeout(fetchStatus, 3000)
      setTimeout(fetchStatus, 10000)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Fehler beim Stoppen')
      fetchStatus()
    } finally {
      setActionLoading(null)
    }
  }
  const getStatusBadge = (s: string) => {
    const baseClasses = 'px-3 py-1 rounded-full text-sm font-semibold uppercase'
    switch (s) {
      case 'running':
        return `${baseClasses} bg-green-100 text-green-800`
      case 'stopped':
      case 'exited':
        return `${baseClasses} bg-red-100 text-red-800`
      case 'loading':
      case 'scheduling':
      case 'creating':
      case 'starting...':
      case 'stopping...':
        return `${baseClasses} bg-yellow-100 text-yellow-800`
      default:
        return `${baseClasses} bg-slate-100 text-slate-600`
    }
  }
  const getCreditColor = (credit: number | null) => {
    if (credit === null) return 'text-slate-500'
    if (credit < 5) return 'text-red-600'
    if (credit < 15) return 'text-yellow-600'
    return 'text-green-600'
  }
  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title="GPU Infrastruktur"
        purpose="Verwalten Sie die vast.ai GPU-Instanzen fuer LLM-Verarbeitung und OCR. Starten/Stoppen Sie GPUs bei Bedarf und ueberwachen Sie Kosten in Echtzeit."
        audience={['DevOps', 'Entwickler', 'System-Admins']}
        architecture={{
          services: ['vast.ai API', 'Ollama', 'VLLM'],
          databases: ['PostgreSQL (Logs)'],
        }}
        relatedPages={[
          { name: 'Security', href: '/infrastructure/security', description: 'DevSecOps Dashboard' },
          { name: 'Builds', href: '/infrastructure/builds', description: 'CI/CD Pipeline' },
        ]}
        collapsible={true}
        defaultCollapsed={true}
      />
      {/* Status Cards */}
      <div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
        <div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-6 gap-6">
          <div>
            <div className="text-sm text-slate-500 mb-2">Status</div>
            {loading ? (
              <span className="px-3 py-1 rounded-full text-sm font-semibold bg-slate-100 text-slate-600">
                Laden...
              </span>
            ) : (
              <span className={getStatusBadge(
                actionLoading === 'on' ? 'starting...' :
                actionLoading === 'off' ? 'stopping...' :
                status?.status || 'unknown'
              )}>
                {actionLoading === 'on' ? 'starting...' :
                 actionLoading === 'off' ? 'stopping...' :
                 status?.status || 'unbekannt'}
              </span>
            )}
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">GPU</div>
            <div className="font-semibold text-slate-900">
              {status?.gpu_name || '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Kosten/h</div>
            <div className="font-semibold text-slate-900">
              {status?.dph_total ? `$${status.dph_total.toFixed(3)}` : '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Auto-Stop</div>
            <div className="font-semibold text-slate-900">
              {status && status.auto_shutdown_in_minutes !== null
                ? `${status.auto_shutdown_in_minutes} min`
                : '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Budget</div>
            <div className={`font-bold text-lg ${getCreditColor(status?.account_credit ?? null)}`}>
              {status && status.account_credit !== null
                ? `$${status.account_credit.toFixed(2)}`
                : '-'}
            </div>
          </div>
          <div>
            <div className="text-sm text-slate-500 mb-2">Session</div>
            <div className="font-semibold text-slate-900">
              {status && status.session_runtime_minutes !== null && status.session_cost_usd !== null
                ? `${Math.round(status.session_runtime_minutes)} min / $${status.session_cost_usd.toFixed(3)}`
                : '-'}
            </div>
          </div>
        </div>
        {/* Buttons */}
        <div className="flex items-center gap-4 mt-6 pt-6 border-t border-slate-200">
          <button
            onClick={powerOn}
            disabled={actionLoading !== null || status?.status === 'running'}
            className="px-6 py-2 bg-orange-600 text-white rounded-lg font-medium hover:bg-orange-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
          >
            Starten
          </button>
          <button
            onClick={powerOff}
            disabled={actionLoading !== null || status?.status !== 'running'}
            className="px-6 py-2 bg-red-600 text-white rounded-lg font-medium hover:bg-red-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
          >
            Stoppen
          </button>
          <button
            onClick={fetchStatus}
            disabled={loading}
            className="px-4 py-2 border border-slate-300 text-slate-700 rounded-lg font-medium hover:bg-slate-50 disabled:opacity-50 transition-colors"
          >
            {loading ? 'Aktualisiere...' : 'Aktualisieren'}
          </button>
          {message && (
            <span className="ml-4 text-sm text-green-600 font-medium">{message}</span>
          )}
          {error && (
            <span className="ml-4 text-sm text-red-600 font-medium">{error}</span>
          )}
        </div>
      </div>
      {/* Extended Stats */}
      <div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Kosten-Uebersicht</h3>
          <div className="space-y-4">
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Session Laufzeit</span>
              <span className="font-semibold">
                {status && status.session_runtime_minutes !== null
                  ? `${Math.round(status.session_runtime_minutes)} Minuten`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Session Kosten</span>
              <span className="font-semibold">
                {status && status.session_cost_usd !== null
                  ? `$${status.session_cost_usd.toFixed(4)}`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center pt-4 border-t border-slate-100">
              <span className="text-slate-600">Gesamtlaufzeit</span>
              <span className="font-semibold">
                {status && status.total_runtime_hours !== null
                  ? `${status.total_runtime_hours.toFixed(1)} Stunden`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Gesamtkosten</span>
              <span className="font-semibold">
                {status && status.total_cost_usd !== null
                  ? `$${status.total_cost_usd.toFixed(2)}`
                  : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">vast.ai Ausgaben</span>
              <span className="font-semibold">
                {status && status.account_total_spend !== null
                  ? `$${status.account_total_spend.toFixed(2)}`
                  : '-'}
              </span>
            </div>
          </div>
        </div>
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Instanz-Details</h3>
          <div className="space-y-4">
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Instanz ID</span>
              <span className="font-mono text-sm">
                {status?.instance_id || '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">GPU</span>
              <span className="font-semibold">
                {status?.gpu_name || '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Stundensatz</span>
              <span className="font-semibold">
                {status?.dph_total ? `$${status.dph_total.toFixed(4)}/h` : '-'}
              </span>
            </div>
            <div className="flex justify-between items-center">
              <span className="text-slate-600">Letzte Aktivitaet</span>
              <span className="text-sm">
                {status?.last_activity
                  ? new Date(status.last_activity).toLocaleString('de-DE')
                  : '-'}
              </span>
            </div>
            {status?.endpoint_base_url && status.status === 'running' && (
              <div className="pt-4 border-t border-slate-100">
                <div className="text-slate-600 text-sm mb-1">Endpoint</div>
                <code className="text-xs bg-slate-100 px-2 py-1 rounded block overflow-x-auto">
                  {status.endpoint_base_url}
                </code>
              </div>
            )}
          </div>
        </div>
      </div>
      {/* Info */}
      <div className="bg-orange-50 border border-orange-200 rounded-xl p-4">
        <div className="flex gap-3">
          <svg className="w-5 h-5 text-orange-600 flex-shrink-0 mt-0.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          <div>
            <h4 className="font-semibold text-orange-900">Auto-Shutdown</h4>
            <p className="text-sm text-orange-800 mt-1">
              Die GPU-Instanz wird automatisch gestoppt, wenn sie laengere Zeit inaktiv ist.
              Der Status wird alle 30 Sekunden automatisch aktualisiert.
            </p>
          </div>
        </div>
      </div>
    </div>
  )
 }
--- a/admin-lehrer/components/grid-editor/GridTable.tsx
+++ b/admin-lehrer/components/grid-editor/GridTable.tsx
@@ -107,12 +107,18 @@ export function GridTable({
    const row = zone.rows.find((r) => r.index === rowIndex)
    if (!row) return Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
    // Multi-line cells (containing \n): expand height based on line count
    const rowCells = zone.cells.filter((c) => c.row_index === rowIndex)
    const maxLines = Math.max(1, ...rowCells.map((c) => (c.text ?? '').split('\n').length))
    if (maxLines > 1) {
      const lineH = Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
      return lineH * maxLines
    }
    if (isHeader) {
      // Headers keep their measured height
      const measuredH = row.y_max_px - row.y_min_px
      return Math.max(MIN_ROW_HEIGHT, measuredH * scale)
    }
    // Content rows use average for uniformity
    return Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
  }
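Review note: the row-height rule in this hunk can be summarised as "multi-line rows win over header rows, header rows win over the average". The standalone sketch below restates that ordering so it can be exercised without the component. It rests on assumptions: the `Cell` and `Row` interfaces are reduced to the fields visible in the hunk, `MIN_ROW_HEIGHT = 24` is a placeholder value (the real constant is defined outside this diff), and `rowHeight` is a hypothetical free function, not the component's actual helper.

```typescript
// Assumed minimal shapes; field names follow the diff, the rest is illustrative.
interface Cell {
  row_index: number
  text?: string | null
}

interface Row {
  index: number
  y_min_px: number
  y_max_px: number
}

// Placeholder value; the real constant is defined elsewhere in GridTable.tsx.
const MIN_ROW_HEIGHT = 24

function rowHeight(
  row: Row | undefined,
  cells: Cell[],
  avgRowHeightPx: number,
  scale: number,
  isHeader: boolean,
): number {
  // Height of a single text line, never below the minimum.
  const base = Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
  if (!row) return base

  // Multi-line cells (containing \n) expand the row by their line count.
  const lineCounts = cells
    .filter((c) => c.row_index === row.index)
    .map((c) => (c.text ?? '').split('\n').length)
  const maxLines = Math.max(1, ...lineCounts)
  if (maxLines > 1) return base * maxLines

  // Single-line headers keep their measured height.
  if (isHeader) {
    return Math.max(MIN_ROW_HEIGHT, (row.y_max_px - row.y_min_px) * scale)
  }

  // Single-line content rows use the average for uniformity.
  return base
}
```

One consequence of this ordering is that a header row containing a line break is sized from the average line height times the line count, not from its measured height.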
@@ -410,46 +416,43 @@ export function GridTable({
              {/* Cells — spanning header or normal columns */}
              {isSpanning ? (
-                <div
-                  className="border-b border-r border-gray-200 dark:border-gray-700 bg-blue-50/50 dark:bg-blue-900/10 flex items-center"
-                  style={{
-                    gridColumn: `2 / ${numCols + 2}`,
-                    height: `${rowH}px`,
-                  }}
-                >
-                  {(() => {
-                    const spanCell = zone.cells.find(
-                      (c) => c.row_index === row.index && c.col_type === 'spanning_header',
-                    )
-                    if (!spanCell) return null
-                    const cellId = spanCell.cell_id
-                    const isSelected = selectedCell === cellId
-                    const cellColor = getCellColor(spanCell)
-                    return (
-                      <div className="flex items-center w-full">
-                        {cellColor && (
-                          <span
-                            className="flex-shrink-0 w-1.5 self-stretch rounded-l-sm"
-                            style={{ backgroundColor: cellColor }}
-                          />
-                        )}
-                        <input
-                          id={`cell-${cellId}`}
-                          type="text"
-                          value={spanCell.text}
-                          onChange={(e) => onCellTextChange(cellId, e.target.value)}
-                          onFocus={() => onSelectCell(cellId)}
-                          onKeyDown={(e) => handleKeyDown(e, cellId)}
-                          className={`w-full px-3 py-1 bg-transparent border-0 outline-none text-center ${
-                            isSelected ? 'ring-2 ring-teal-500 ring-inset rounded' : ''
+                <>
+                  {zone.cells
+                    .filter((c) => c.row_index === row.index && c.col_type === 'spanning_header')
+                    .sort((a, b) => a.col_index - b.col_index)
+                    .map((spanCell) => {
+                      const colspan = spanCell.colspan || numCols
+                      const cellId = spanCell.cell_id
+                      const isSelected = selectedCell === cellId
+                      const cellColor = getCellColor(spanCell)
+                      const gridColStart = spanCell.col_index + 2
+                      const gridColEnd = gridColStart + colspan
+                      return (
+                        <div
+                          key={cellId}
+                          className={`border-b border-r border-gray-200 dark:border-gray-700 bg-blue-50/50 dark:bg-blue-900/10 flex items-center ${
+                            isSelected ? 'ring-2 ring-teal-500 ring-inset z-10' : ''
                           }`}
-                          style={{ color: cellColor || undefined }}
-                          spellCheck={false}
-                        />
-                      </div>
-                    )
-                  })()}
-                </div>
+                          style={{ gridColumn: `${gridColStart} / ${gridColEnd}`, height: `${rowH}px` }}
+                        >
+                          {cellColor && (
+                            <span className="flex-shrink-0 w-1.5 self-stretch rounded-l-sm" style={{ backgroundColor: cellColor }} />
+                          )}
+                          <input
+                            id={`cell-${cellId}`}
+                            type="text"
+                            value={spanCell.text}
+                            onChange={(e) => onCellTextChange(cellId, e.target.value)}
+                            onFocus={() => onSelectCell(cellId)}
+                            onKeyDown={(e) => handleKeyDown(e, cellId)}
+                            className="w-full px-3 py-1 bg-transparent border-0 outline-none text-center"
+                            style={{ color: cellColor || undefined }}
+                            spellCheck={false}
+                          />
+                        </div>
+                      )
+                    })}
+                </>
              ) : (
                zone.columns.map((col) => {
                  const cell = cellMap.get(`${row.index}_${col.index}`)
@@ -485,7 +488,13 @@ export function GridTable({
                      } ${isMultiSelected ? 'bg-teal-50/60 dark:bg-teal-900/20' : ''} ${
                        isLowConf && !isMultiSelected ? 'bg-amber-50/50 dark:bg-amber-900/10' : ''
                      } ${row.is_header && !isMultiSelected ? 'bg-blue-50/50 dark:bg-blue-900/10' : ''}`}
-                      style={{ height: `${rowH}px` }}
+                      style={{
+                        height: `${rowH}px`,
+                        ...(cell?.box_region?.bg_hex ? {
+                          backgroundColor: `${cell.box_region.bg_hex}12`,
+                          borderLeft: cell.box_region.border ? `3px solid ${cell.box_region.bg_hex}60` : undefined,
+                        } : {}),
+                      }}
                      onContextMenu={(e) => {
                        if (onSetCellColor) {
                          e.preventDefault()
@@ -501,53 +510,88 @@ export function GridTable({
                        />
                      )}
                      {/* Per-word colored display when not editing */}
-                      {hasColoredWords && !isSelected ? (
-                        <div
-                          className={`w-full px-2 cursor-text truncate ${isBold ? 'font-bold' : 'font-normal'}`}
-                          onClick={(e) => {
-                            if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
-                              onToggleCellSelection(cellId)
-                            } else {
-                              onSelectCell(cellId)
-                              setTimeout(() => document.getElementById(`cell-${cellId}`)?.focus(), 0)
-                            }
-                          }}
-                        >
-                          {cell!.word_boxes!.map((wb, i) => (
-                            <span
-                              key={i}
-                              style={
-                                wb.color_name && wb.color_name !== 'black'
-                                  ? { color: wb.color }
-                                  : undefined
-                              }
+                      {(() => {
+                        const cellText = cell?.text ?? ''
+                        const isMultiLine = cellText.includes('\n')
+                        if (hasColoredWords && !isSelected) {
+                          return (
+                            <div
+                              className={`w-full px-2 cursor-text truncate ${isBold ? 'font-bold' : 'font-normal'}`}
+                              onClick={(e) => {
+                                if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
+                                  onToggleCellSelection(cellId)
+                                } else {
+                                  onSelectCell(cellId)
+                                  setTimeout(() => document.getElementById(`cell-${cellId}`)?.focus(), 0)
+                                }
+                              }}
                             >
-                              {wb.text}
-                              {i < cell!.word_boxes!.length - 1 ? ' ' : ''}
-                            </span>
-                          ))}
-                        </div>
-                      ) : (
-                        <input
-                          id={`cell-${cellId}`}
-                          type="text"
-                          value={cell?.text ?? ''}
-                          onChange={(e) => onCellTextChange(cellId, e.target.value)}
-                          onFocus={() => onSelectCell(cellId)}
-                          onClick={(e) => {
-                            if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
-                              e.preventDefault()
-                              onToggleCellSelection(cellId)
-                            }
-                          }}
-                          onKeyDown={(e) => handleKeyDown(e, cellId)}
-                          className={`w-full px-2 bg-transparent border-0 outline-none ${
-                            isBold ? 'font-bold' : 'font-normal'
-                          }`}
-                          style={{ color: cellColor || undefined }}
-                          spellCheck={false}
-                        />
-                      )}
+                              {cell!.word_boxes!.map((wb, i) => (
+                                <span
+                                  key={i}
+                                  style={
+                                    wb.color_name && wb.color_name !== 'black'
+                                      ? { color: wb.color }
+                                      : undefined
+                                  }
+                                >
+                                  {wb.text}
+                                  {i < cell!.word_boxes!.length - 1 ? ' ' : ''}
+                                </span>
+                              ))}
+                            </div>
+                          )
+                        }
+                        if (isMultiLine) {
+                          return (
+                            <textarea
+                              id={`cell-${cellId}`}
+                              value={cellText}
+                              onChange={(e) => onCellTextChange(cellId, e.target.value)}
+                              onFocus={() => onSelectCell(cellId)}
+                              onClick={(e) => {
+                                if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
+                                  e.preventDefault()
+                                  onToggleCellSelection(cellId)
+                                }
+                              }}
+                              onKeyDown={(e) => {
+                                if (e.key === 'Tab') {
+                                  e.preventDefault()
+                                  onNavigate(cellId, e.shiftKey ? 'left' : 'right')
+                                }
+                              }}
+                              rows={cellText.split('\n').length}
+                              className={`w-full px-2 bg-transparent border-0 outline-none resize-none ${
+                                isBold ? 'font-bold' : 'font-normal'
+                              }`}
+                              style={{ color: cellColor || undefined }}
+                              spellCheck={false}
+                            />
+                          )
+                        }
+                        return (
+                          <input
+                            id={`cell-${cellId}`}
+                            type="text"
+                            value={cellText}
+                            onChange={(e) => onCellTextChange(cellId, e.target.value)}
+                            onFocus={() => onSelectCell(cellId)}
+                            onClick={(e) => {
+                              if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
+                                e.preventDefault()
+                                onToggleCellSelection(cellId)
+                              }
+                            }}
+                            onKeyDown={(e) => handleKeyDown(e, cellId)}
+                            className={`w-full px-2 bg-transparent border-0 outline-none ${
+                              isBold ? 'font-bold' : 'font-normal'
+                            }`}
                            style={{ color: cellColor || undefined }}
                            spellCheck={false}
                          />
                        )
                      })()}
                    </div>
                  )
                })
--- a/admin-lehrer/components/grid-editor/types.ts
+++ b/admin-lehrer/components/grid-editor/types.ts
@@ -1,4 +1,4 @@
-import type { OcrWordBox } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { OcrWordBox } from '@/app/(admin)/ai/ocr-kombi/types'
 // Re-export for convenience
 export type { OcrWordBox }
@@ -73,6 +73,10 @@ export interface GridZone {
  header_rows: number[]
  layout_hint?: 'left_of_vsplit' | 'right_of_vsplit' | 'middle_of_vsplit'
  vsplit_group?: number
  box_layout_type?: 'flowing' | 'columnar' | 'bullet_list' | 'header_only'
  box_grid_reviewed?: boolean
  box_bg_color?: string
  box_bg_hex?: string
 }
 export interface BBox {
@@ -122,6 +126,16 @@ export interface GridEditorCell {
  is_bold: boolean
  /** Manual color override: hex string or null to clear. */
  color_override?: string | null
  /** Number of columns this cell spans (merged cell). Default 1. */
  colspan?: number
  /** Source zone type when in unified grid. */
  source_zone_type?: 'content' | 'box'
  /** Box visual metadata for cells from box zones. */
  box_region?: {
    bg_hex?: string
    bg_color?: string
    border?: boolean
  }
 }
 /** Layout dividers for the visual column/margin editor on the original image. */
--- a/admin-lehrer/components/grid-editor/useGridEditor.ts
+++ b/admin-lehrer/components/grid-editor/useGridEditor.ts
@@ -28,6 +28,15 @@ export function useGridEditor(sessionId: string | null) {
  const [ipaMode, setIpaMode] = useState<IpaMode>('auto')
  const [syllableMode, setSyllableMode] = useState<SyllableMode>('auto')
  // OCR Quality Steps (A/B testing toggles — defaults off for now)
  const [ocrEnhance, setOcrEnhance] = useState(false)
  const [ocrMaxCols, setOcrMaxCols] = useState(0)
  const [ocrMinConf, setOcrMinConf] = useState(0)
  // Vision-LLM Fusion (Step 4)
  const [visionFusion, setVisionFusion] = useState(false)
  const [documentCategory, setDocumentCategory] = useState('vokabelseite')
  // Undo/redo stacks store serialized zone arrays
  const undoStack = useRef<string[]>([])
  const redoStack = useRef<string[]>([])
@@ -52,6 +61,9 @@ export function useGridEditor(sessionId: string | null) {
      const params = new URLSearchParams()
      params.set('ipa_mode', ipaMode)
      params.set('syllable_mode', syllableMode)
      params.set('enhance', String(ocrEnhance))
      if (ocrMaxCols > 0) params.set('max_cols', String(ocrMaxCols))
      if (ocrMinConf > 0) params.set('min_conf', String(ocrMinConf))
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
        { method: 'POST' },
@@ -70,7 +82,41 @@ export function useGridEditor(sessionId: string | null) {
    } finally {
      setLoading(false)
    }
-  }, [sessionId, ipaMode, syllableMode])
+  }, [sessionId, ipaMode, syllableMode, ocrEnhance, ocrMaxCols, ocrMinConf])
  /** Re-run OCR with current quality settings, then rebuild grid */
  const rerunOcr = useCallback(async () => {
    if (!sessionId) return
    setLoading(true)
    setError(null)
    try {
      const params = new URLSearchParams()
      params.set('ipa_mode', ipaMode)
      params.set('syllable_mode', syllableMode)
      params.set('enhance', String(ocrEnhance))
      if (ocrMaxCols > 0) params.set('max_cols', String(ocrMaxCols))
      if (ocrMinConf > 0) params.set('min_conf', String(ocrMinConf))
      params.set('vision_fusion', String(visionFusion))
      if (documentCategory) params.set('doc_category', documentCategory)
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/rerun-ocr-and-build-grid?${params}`,
        { method: 'POST' },
      )
      if (!res.ok) {
        const data = await res.json().catch(() => ({}))
        throw new Error(data.detail || `HTTP ${res.status}`)
      }
      const data: StructuredGrid = await res.json()
      setGrid(data)
      setDirty(false)
      undoStack.current = []
      redoStack.current = []
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e))
    } finally {
      setLoading(false)
    }
  }, [sessionId, ipaMode, syllableMode, ocrEnhance, ocrMaxCols, ocrMinConf, visionFusion, documentCategory])
  const loadGrid = useCallback(async () => {
    if (!sessionId) return
@@ -81,8 +127,22 @@ export function useGridEditor(sessionId: string | null) {
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`,
      )
      if (res.status === 404) {
-        // No grid yet — build it
+        // No grid yet — build it with current modes
-        await buildGrid()
+        const params = new URLSearchParams()
        params.set('ipa_mode', ipaMode)
        params.set('syllable_mode', syllableMode)
        params.set('enhance', String(ocrEnhance))
        if (ocrMaxCols > 0) params.set('max_cols', String(ocrMaxCols))
        if (ocrMinConf > 0) params.set('min_conf', String(ocrMinConf))
        const buildRes = await fetch(
          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
          { method: 'POST' },
        )
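        if (buildRes.ok) {
          const data: StructuredGrid = await buildRes.json()
          setGrid(data)
          setDirty(false)
        } else {
          // Surface build failures instead of silently leaving the grid empty
          const data = await buildRes.json().catch(() => ({}))
          setError(data.detail || `HTTP ${buildRes.status}`)
        }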
        return
      }
      if (!res.ok) {
@@ -99,18 +159,48 @@ export function useGridEditor(sessionId: string | null) {
    } finally {
      setLoading(false)
    }
-  }, [sessionId, buildGrid])
+    // Only depends on sessionId — mode changes are handled by the
    // separate useEffect below, not by re-triggering loadGrid.
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [sessionId])
-  // Auto-rebuild when IPA or syllable mode changes (skip initial mount)
+  // Auto-rebuild when IPA or syllable mode changes (skip initial mount).
-  const initialLoadDone = useRef(false)
+  // We call the API directly with the new values instead of going through
  // the buildGrid callback, which may still close over stale state due to
  // React's asynchronous state batching.
  const mountedRef = useRef(false)
  useEffect(() => {
-    if (!initialLoadDone.current) {
+    if (!mountedRef.current) {
-      // Mark as initialized once the first grid is loaded
+      // Skip the first trigger (component mount) — don't rebuild yet
-      if (grid) initialLoadDone.current = true
+      mountedRef.current = true
      return
    }
-    // Mode changed after initial load — rebuild
+    if (!sessionId) return
-    buildGrid()
+    const rebuild = async () => {
      setLoading(true)
      setError(null)
      try {
        const params = new URLSearchParams()
        params.set('ipa_mode', ipaMode)
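        params.set('syllable_mode', syllableMode)
        // Send the same OCR quality settings as buildGrid so a mode change
        // does not rebuild the grid with different parameters.
        params.set('enhance', String(ocrEnhance))
        if (ocrMaxCols > 0) params.set('max_cols', String(ocrMaxCols))
        if (ocrMinConf > 0) params.set('min_conf', String(ocrMinConf))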
        const res = await fetch(
          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
          { method: 'POST' },
        )
        if (!res.ok) {
          const data = await res.json().catch(() => ({}))
          throw new Error(data.detail || `HTTP ${res.status}`)
        }
        const data: StructuredGrid = await res.json()
        setGrid(data)
        setDirty(false)
      } catch (e) {
        setError(e instanceof Error ? e.message : String(e))
      } finally {
        setLoading(false)
      }
    }
    rebuild()
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [ipaMode, syllableMode])
@@ -940,5 +1030,16 @@ export function useGridEditor(sessionId: string | null) {
    setIpaMode,
    syllableMode,
    setSyllableMode,
    ocrEnhance,
    setOcrEnhance,
    ocrMaxCols,
    setOcrMaxCols,
    ocrMinConf,
    setOcrMinConf,
    visionFusion,
    setVisionFusion,
    documentCategory,
    setDocumentCategory,
    rerunOcr,
  }
 }
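Review note: the query-string construction is repeated in `buildGrid`, `rerunOcr`, the 404 branch of `loadGrid`, and the mode-change effect. A minimal sketch of a shared helper is shown below. `GridBuildOptions` and `buildGridParams` are hypothetical names that are not part of this change; the parameter names simply mirror the ones used in the diff above.

```typescript
// Hypothetical helper (not in this PR): builds the query string in one place
// so every call site sends the same parameters.
interface GridBuildOptions {
  ipaMode: string
  syllableMode: string
  ocrEnhance: boolean
  ocrMaxCols: number
  ocrMinConf: number
  // In the diff above only the rerun endpoint sends these two.
  visionFusion?: boolean
  documentCategory?: string
}

function buildGridParams(opts: GridBuildOptions): URLSearchParams {
  const params = new URLSearchParams()
  params.set('ipa_mode', opts.ipaMode)
  params.set('syllable_mode', opts.syllableMode)
  params.set('enhance', String(opts.ocrEnhance))
  // 0 means "not set", matching the checks in the hook.
  if (opts.ocrMaxCols > 0) params.set('max_cols', String(opts.ocrMaxCols))
  if (opts.ocrMinConf > 0) params.set('min_conf', String(opts.ocrMinConf))
  if (opts.visionFusion !== undefined) {
    params.set('vision_fusion', String(opts.visionFusion))
  }
  if (opts.documentCategory) params.set('doc_category', opts.documentCategory)
  return params
}
```

With a helper like this, each call site would reduce to one call to `buildGridParams` with the current state, which makes it harder for one call site to drift from the others.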
--- a/admin-lehrer/components/ocr-kombi/KombiStepper.tsx
+++ b/admin-lehrer/components/ocr-kombi/KombiStepper.tsx
@@ -1,6 +1,6 @@
 'use client'
-import type { PipelineStep } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { PipelineStep } from '@/app/(admin)/ai/ocr-kombi/types'
 interface KombiStepperProps {
  steps: PipelineStep[]
--- a/admin-lehrer/components/ocr-kombi/SessionHeader.tsx
+++ b/admin-lehrer/components/ocr-kombi/SessionHeader.tsx
@@ -1,12 +1,13 @@
 'use client'
 import { useState } from 'react'
-import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-pipeline/types'
+import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-kombi/types'
 interface SessionHeaderProps {
  sessionName: string
  activeCategory?: DocumentCategory
  isGroundTruth: boolean
  pageNumber?: number | null
  onUpdateCategory: (category: DocumentCategory) => void
 }
@@ -14,6 +15,7 @@ export function SessionHeader({
  sessionName,
  activeCategory,
  isGroundTruth,
  pageNumber,
  onUpdateCategory,
 }: SessionHeaderProps) {
  const [showCategoryPicker, setShowCategoryPicker] = useState(false)
@@ -36,6 +38,11 @@ export function SessionHeader({
      >
        {catInfo ? `${catInfo.icon} ${catInfo.label}` : 'Kategorie setzen'}
      </button>
      {pageNumber != null && (
        <span className="text-xs px-2 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 border border-gray-200 dark:border-gray-600 text-gray-600 dark:text-gray-300">
          S. {pageNumber}
        </span>
      )}
      {isGroundTruth && (
        <span className="text-xs px-2 py-0.5 rounded-full bg-amber-50 dark:bg-amber-900/20 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
          GT
--- a/admin-lehrer/components/ocr-kombi/SessionList.tsx
+++ b/admin-lehrer/components/ocr-kombi/SessionList.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useState } from 'react'
-import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-pipeline/types'
+import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-kombi/types'
 import type { SessionListItem, DocumentGroupView } from '@/app/(admin)/ai/ocr-kombi/useKombiPipeline'
 const KLAUSUR_API = '/klausur-api'
@@ -150,9 +150,16 @@ function GroupRow({
            {group.page_count} Seiten
          </div>
        </div>
-        <span className="text-xs px-2 py-0.5 rounded-full bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 text-blue-600 dark:text-blue-400">
+        <div className="flex items-center gap-1.5">
-          Dokument
+          {group.sessions.some(s => s.is_ground_truth) && (
-        </span>
+            <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
              GT {group.sessions.filter(s => s.is_ground_truth).length}/{group.sessions.length}
            </span>
          )}
          <span className="text-xs px-2 py-0.5 rounded-full bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 text-blue-600 dark:text-blue-400">
            Dokument
          </span>
        </div>
      </div>
      {expanded && (
@@ -179,6 +186,9 @@ function GroupRow({
                />
              </div>
              <span className="truncate flex-1">S. {s.page_number || '?'}</span>
              {s.is_ground_truth && (
                <span className="text-[9px] px-1 py-0.5 rounded bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">GT</span>
              )}
              <span className="text-[10px] text-gray-400">Step {s.current_step}</span>
              <button
                onClick={(e) => {
@@ -298,7 +308,7 @@ function SessionRow({
        </div>
      </div>
-      {/* Category badge */}
+      {/* Category + GT badge */}
      <div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
        <button
          onClick={onToggleCategory}
@@ -311,6 +321,11 @@ function SessionRow({
        >
          {catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
        </button>
        {session.is_ground_truth && (
          <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300" title="Ground Truth markiert">
            GT
          </span>
        )}
      </div>
      {/* Actions */}
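Review note: `GroupRow` walks `group.sessions` twice, once with `some` and once with `filter`, to render the GT badge. The sketch below does the count in one pass. `SessionLike` and `groundTruthBadge` are hypothetical names used only for illustration; the label format copies the one in the JSX above.

```typescript
// Hypothetical helper (not in this PR): returns the badge label, or null
// when no session in the group is marked as ground truth.
interface SessionLike {
  is_ground_truth?: boolean
}

function groundTruthBadge(sessions: SessionLike[]): string | null {
  const marked = sessions.filter((s) => s.is_ground_truth).length
  return marked > 0 ? `GT ${marked}/${sessions.length}` : null
}
```

The component could then render the badge only when the returned label is not null.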
--- a/admin-lehrer/components/ocr-kombi/SpreadsheetView.tsx
+++ b/admin-lehrer/components/ocr-kombi/SpreadsheetView.tsx
@@ -0,0 +1,241 @@
 'use client'
 /**
 * SpreadsheetView — Fortune Sheet with multi-sheet support.
 *
 * Each zone (content + boxes) becomes its own Excel sheet tab,
 * so each can have independent column widths optimized for its content.
 */
 import { useMemo } from 'react'
 import dynamic from 'next/dynamic'
 const Workbook = dynamic(
  () => import('@fortune-sheet/react').then((m) => m.Workbook),
  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
 )
 import '@fortune-sheet/react/dist/index.css'
 import type { GridZone } from '@/components/grid-editor/types'
 interface SpreadsheetViewProps {
  gridData: any
  height?: number
 }
 // No expansion: multi-line cells stay single cells with \n and text wrap.
 /** Convert a single zone to a Fortune Sheet sheet object. */
 function zoneToSheet(zone: GridZone, sheetIndex: number, isFirst: boolean): any {
  const isBox = zone.zone_type === 'box'
  const boxColor = zone.box_bg_hex || ''
  // Sheet name
  let name: string
  if (!isBox) {
    name = 'Vokabeln'
  } else {
    const firstText = zone.cells?.[0]?.text ?? `Box ${sheetIndex}`
    const cleaned = firstText.replace(/[^\w\s\u00C0-\u024F„"]/g, '').trim()
    name = cleaned.length > 25 ? cleaned.slice(0, 25) + '…' : cleaned || `Box ${sheetIndex}`
  }
  const numCols = zone.columns?.length || 1
  const numRows = zone.rows?.length || 0
  const expandedCells = zone.cells || []
  // Compute zone-wide median word height for font-size detection
  const allWordHeights = expandedCells
    .flatMap((c: any) => (c.word_boxes || []).map((wb: any) => wb.height || 0))
    .filter((h: number) => h > 0)
  const medianWordH = allWordHeights.length
    ? [...allWordHeights].sort((a, b) => a - b)[Math.floor(allWordHeights.length / 2)]
    : 0
  // Build celldata
  const celldata: any[] = []
  const merges: Record<string, any> = {}
  for (const cell of expandedCells) {
    const r = cell.row_index
    const c = cell.col_index
    // `let` because a bullet marker may be prepended below
    let text = cell.text ?? ''
    // Row metadata
    const row = zone.rows?.find((rr) => rr.index === r)
    const isHeader = row?.is_header ?? false
    // Font size detection from word_boxes
    const avgWbH = cell.word_boxes?.length
      ? cell.word_boxes.reduce((s: number, wb: any) => s + (wb.height || 0), 0) / cell.word_boxes.length
      : 0
    const isLargerFont = avgWbH > 0 && medianWordH > 0 && avgWbH > medianWordH * 1.3
    const v: any = { v: text, m: text }
    // Bold: headers, is_bold, larger font
    if (cell.is_bold || isHeader || isLargerFont) {
      v.bl = 1
    }
    // Larger font for box titles
    if (isLargerFont && isBox) {
      v.fs = 12
    }
    // Multi-line text (bullets with \n): enable text wrap + vertical top align
    // Add bullet marker (•) if multi-line and no bullet present
    if (text.includes('\n') && !isHeader) {
      if (!text.startsWith('•') && !text.startsWith('-') && !text.startsWith('–') && r > 0) {
        text = '• ' + text
        v.v = text
        v.m = text
      }
      v.tb = '2'  // text wrap
      v.vt = 0    // vertical align: top
    }
    // Header row background
    if (isHeader) {
      v.bg = isBox ? `${boxColor || '#2563eb'}18` : '#f0f4ff'
    }
    // Box cells: light tinted background
    if (isBox && !isHeader && boxColor) {
      v.bg = `${boxColor}08`
    }
    // Text color from OCR
    const color = cell.color_override
      ?? cell.word_boxes?.find((wb: any) => wb.color_name && wb.color_name !== 'black')?.color
    if (color) v.fc = color
    celldata.push({ r, c, v })
    // Colspan → merge
    const colspan = cell.colspan || 0
    if (colspan > 1 || cell.col_type === 'spanning_header') {
      const cs = colspan || numCols
      merges[`${r}_${c}`] = { r, c, rs: 1, cs }
    }
  }
  // Column widths — auto-fit based on longest text
  const columnlen: Record<string, number> = {}
  for (const col of (zone.columns || [])) {
    const colCells = expandedCells.filter(
      (c: any) => c.col_index === col.index && c.col_type !== 'spanning_header'
    )
    let maxTextLen = 0
    for (const c of colCells) {
      const len = (c.text ?? '').length
      if (len > maxTextLen) maxTextLen = len
    }
    const autoWidth = Math.max(60, maxTextLen * 7.5 + 16)
    const pxW = (col.x_max_px ?? 0) - (col.x_min_px ?? 0)
    const scaledPxW = Math.max(60, Math.round(pxW * (numCols <= 2 ? 0.6 : 0.4)))
    columnlen[String(col.index)] = Math.round(Math.max(autoWidth, scaledPxW))
  }
  // Row heights — taller for multi-line cells
  const rowlen: Record<string, number> = {}
  for (const row of (zone.rows || [])) {
    const rowCells = expandedCells.filter((c: any) => c.row_index === row.index)
    const maxLines = Math.max(1, ...rowCells.map((c: any) => (c.text ?? '').split('\n').length))
    const baseH = 24
    rowlen[String(row.index)] = Math.max(baseH, baseH * maxLines)
  }
  // Border info
  const borderInfo: any[] = []
  // Box: colored outside border
  if (isBox && boxColor && numRows > 0 && numCols > 0) {
    borderInfo.push({
      rangeType: 'range',
      borderType: 'border-outside',
      color: boxColor,
      style: 5,
      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
    })
    borderInfo.push({
      rangeType: 'range',
      borderType: 'border-inside',
      color: `${boxColor}40`,
      style: 1,
      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
    })
  }
  // Content zone: light grid lines
  if (!isBox && numRows > 0 && numCols > 0) {
    borderInfo.push({
      rangeType: 'range',
      borderType: 'border-all',
      color: '#e5e7eb',
      style: 1,
      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
    })
  }
  return {
    name,
    id: `zone_${zone.zone_index}`,
    celldata,
    row: numRows,
    column: Math.max(numCols, 1),
    status: isFirst ? 1 : 0,
    color: isBox ? boxColor : undefined,
    config: {
      merge: Object.keys(merges).length > 0 ? merges : undefined,
      columnlen,
      rowlen,
      borderInfo: borderInfo.length > 0 ? borderInfo : undefined,
    },
  }
 }
 export function SpreadsheetView({ gridData, height = 600 }: SpreadsheetViewProps) {
  const sheets = useMemo(() => {
    if (!gridData?.zones) return []
    const sorted = [...gridData.zones].sort((a: GridZone, b: GridZone) => {
      if (a.zone_type === 'content' && b.zone_type !== 'content') return -1
      if (a.zone_type !== 'content' && b.zone_type === 'content') return 1
      return (a.bbox_px?.y ?? 0) - (b.bbox_px?.y ?? 0)
    })
    return sorted
      .filter((z: GridZone) => z.cells && z.cells.length > 0)
      .map((z: GridZone, i: number) => zoneToSheet(z, i, i === 0))
  }, [gridData])
  const maxRows = Math.max(0, ...sheets.map((s: any) => s.row || 0))
  const estimatedHeight = Math.max(height, maxRows * 26 + 80)
  if (sheets.length === 0) {
    return <div className="p-4 text-center text-gray-400">Keine Daten für Spreadsheet.</div>
  }
  return (
    <div style={{ width: '100%', height: `${estimatedHeight}px` }}>
      <Workbook
        data={sheets}
        lang="en"
        showToolbar
        showFormulaBar={false}
        showSheetTabs
        toolbarItems={[
          'undo', 'redo', '|',
          'font-bold', 'font-italic', 'font-strikethrough', '|',
          'font-color', 'background', '|',
          'font-size', '|',
          'horizontal-align', 'vertical-align', '|',
          'text-wrap', 'merge-cell', '|',
          'border',
        ]}
      />
    </div>
  )
 }
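Review note: the column-width heuristic in `zoneToSheet` takes the larger of a text-based width and a scaled pixel width, with a floor of 60. The sketch below isolates that rule as a pure function so it can be unit-tested. `estimateColumnWidth` is a hypothetical name; the constants (7.5 px per character, 16 px padding, scale factors 0.6 and 0.4) are copied from the diff above.

```typescript
// Hypothetical standalone version of the width heuristic in zoneToSheet.
function estimateColumnWidth(
  texts: string[],
  pxWidth: number,
  numCols: number,
): number {
  // Text-based estimate: longest cell text, about 7.5 px per character plus padding.
  const maxTextLen = texts.reduce((max, t) => Math.max(max, t.length), 0)
  const autoWidth = Math.max(60, maxTextLen * 7.5 + 16)
  // Pixel-based estimate: scan width scaled down, less so for narrow layouts.
  const scale = numCols <= 2 ? 0.6 : 0.4
  const scaledPxW = Math.max(60, Math.round(pxWidth * scale))
  return Math.round(Math.max(autoWidth, scaledPxW))
}
```

Keeping the rule in a pure function would let the constants be tuned against sample pages without rendering the spreadsheet.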
--- a/admin-lehrer/components/ocr-kombi/StepAnsicht.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepAnsicht.tsx
@@ -0,0 +1,110 @@
 'use client'
 /**
 * StepAnsicht — Excel-like Spreadsheet View.
 *
 * Left:  Original scan with OCR word overlay
 * Right: Fortune Sheet spreadsheet with multi-sheet tabs per zone
 */
 import { useEffect, useRef, useState } from 'react'
 import dynamic from 'next/dynamic'
 const SpreadsheetView = dynamic(
  () => import('./SpreadsheetView').then((m) => m.SpreadsheetView),
  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
 )
 const KLAUSUR_API = '/klausur-api'
 interface StepAnsichtProps {
  sessionId: string | null
  onNext: () => void
 }
 export function StepAnsicht({ sessionId, onNext }: StepAnsichtProps) {
  const [gridData, setGridData] = useState<any>(null)
  const [loading, setLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const leftRef = useRef<HTMLDivElement>(null)
  const [leftHeight, setLeftHeight] = useState(600)
  // Load grid data on mount
  useEffect(() => {
    if (!sessionId) {
      // No session: stop the spinner so the fallback message renders
      setLoading(false)
      return
    }
    ;(async () => {
      try {
        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`)
        if (!res.ok) throw new Error(`HTTP ${res.status}`)
        setGridData(await res.json())
      } catch (e) {
        setError(e instanceof Error ? e.message : 'Fehler beim Laden')
      } finally {
        setLoading(false)
      }
    })()
  }, [sessionId])
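  // Track left panel height. The panel only mounts once loading has finished,
  // so re-run when loading/gridData change; with [] the ref is still null.
  useEffect(() => {
    if (!leftRef.current) return
    const ro = new ResizeObserver(([e]) => setLeftHeight(e.contentRect.height))
    ro.observe(leftRef.current)
    return () => ro.disconnect()
  }, [loading, gridData])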
  if (loading) {
    return (
      <div className="flex items-center justify-center py-16">
        <div className="w-8 h-8 border-4 border-teal-500 border-t-transparent rounded-full animate-spin" />
        <span className="ml-3 text-gray-500">Lade Spreadsheet...</span>
      </div>
    )
  }
  if (error || !gridData) {
    return (
      <div className="p-8 text-center">
        <p className="text-red-500 mb-4">{error || 'Keine Grid-Daten.'}</p>
        <button onClick={onNext} className="px-5 py-2 bg-teal-600 text-white rounded-lg">Weiter →</button>
      </div>
    )
  }
  return (
    <div className="space-y-3">
      {/* Header */}
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">Ansicht — Spreadsheet</h3>
          <p className="text-sm text-gray-500 dark:text-gray-400">
            Jede Zone als eigenes Sheet-Tab. Spaltenbreiten pro Sheet optimiert.
          </p>
        </div>
        <button onClick={onNext} className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 text-sm font-medium">
          Weiter →
        </button>
      </div>
      {/* Split view */}
      <div className="flex gap-2">
        {/* LEFT: Original + OCR overlay */}
        <div ref={leftRef} className="w-1/3 border border-gray-300 dark:border-gray-600 rounded-lg overflow-hidden bg-white dark:bg-gray-900 flex-shrink-0">
          <div className="px-2 py-1 bg-black/60 text-white text-[10px] font-medium">Original + OCR</div>
          {sessionId && (
            <img
              src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/words-overlay`}
              alt="Original + OCR"
              className="w-full h-auto"
            />
          )}
        </div>
        {/* RIGHT: Fortune Sheet — height adapts to content */}
        <div className="flex-1 border border-gray-300 dark:border-gray-600 rounded-lg overflow-hidden bg-white dark:bg-gray-900">
          <SpreadsheetView gridData={gridData} height={Math.max(700, leftHeight)} />
        </div>
      </div>
    </div>
  )
 }
--- a/admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx
@@ -0,0 +1,283 @@
 'use client'
 import { useCallback, useEffect, useRef, useState } from 'react'
 import { useGridEditor } from '@/components/grid-editor/useGridEditor'
 import type { GridZone } from '@/components/grid-editor/types'
 import { GridTable } from '@/components/grid-editor/GridTable'
 const KLAUSUR_API = '/klausur-api'
 type BoxLayoutType = 'flowing' | 'columnar' | 'bullet_list' | 'header_only'
 const LAYOUT_LABELS: Record<BoxLayoutType, string> = {
  flowing: 'Fließtext',
  columnar: 'Tabelle/Spalten',
  bullet_list: 'Aufzählung',
  header_only: 'Überschrift',
 }
 interface StepBoxGridReviewProps {
  sessionId: string | null
  onNext: () => void
 }
 export function StepBoxGridReview({ sessionId, onNext }: StepBoxGridReviewProps) {
  const {
    grid,
    loading,
    saving,
    error,
    dirty,
    selectedCell,
    setSelectedCell,
    loadGrid,
    saveGrid,
    updateCellText,
    toggleColumnBold,
    toggleRowHeader,
    undo,
    redo,
    canUndo,
    canRedo,
    getAdjacentCell,
    commitUndoPoint,
    selectedCells,
    toggleCellSelection,
    clearCellSelection,
    toggleSelectedBold,
    setCellColor,
    deleteColumn,
    addColumn,
    deleteRow,
    addRow,
  } = useGridEditor(sessionId)
  const [building, setBuilding] = useState(false)
  const [buildError, setBuildError] = useState<string | null>(null)
  // Load grid on mount
  useEffect(() => {
    if (sessionId) loadGrid()
  }, [sessionId]) // eslint-disable-line react-hooks/exhaustive-deps
  // Get box zones
  const boxZones: GridZone[] = (grid?.zones || []).filter(
    (z: GridZone) => z.zone_type === 'box'
  )
  // Build box grids via backend
  const buildBoxGrids = useCallback(async (overrides?: Record<string, string>) => {
    if (!sessionId) return
    setBuilding(true)
    setBuildError(null)
    try {
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-box-grids`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ overrides: overrides || {} }),
        },
      )
      if (!res.ok) {
        const data = await res.json().catch(() => ({}))
        throw new Error(data.detail || `HTTP ${res.status}`)
      }
      await loadGrid()
    } catch (e) {
      setBuildError(e instanceof Error ? e.message : String(e))
    } finally {
      setBuilding(false)
    }
  }, [sessionId, loadGrid])
  // Handle layout type change for a specific box zone
  const changeLayoutType = useCallback(async (boxIdx: number, layoutType: string) => {
    await buildBoxGrids({ [String(boxIdx)]: layoutType })
  }, [buildBoxGrids])
  // Auto-build once on first load if box zones have no cells
  const autoBuildDone = useRef(false)
  useEffect(() => {
    if (!grid || loading || building || autoBuildDone.current) return
    const needsBuild = boxZones.some(z => !z.cells || z.cells.length === 0)
    if (needsBuild && sessionId) {
      autoBuildDone.current = true
      buildBoxGrids()
    }
  }, [grid, loading]) // eslint-disable-line react-hooks/exhaustive-deps
  if (loading) {
    return (
      <div className="flex items-center justify-center py-16">
        <div className="w-8 h-8 border-4 border-teal-500 border-t-transparent rounded-full animate-spin" />
        <span className="ml-3 text-gray-500">Lade Grid...</span>
      </div>
    )
  }
  // No boxes after build attempt — skip step
  if (!building && boxZones.length === 0) {
    return (
      <div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-8 text-center">
        <div className="text-4xl mb-3">📦</div>
        <h3 className="text-lg font-semibold text-gray-900 dark:text-white mb-2">
          Keine Boxen erkannt
        </h3>
        <p className="text-gray-500 dark:text-gray-400 mb-6">
          Auf dieser Seite wurden keine eingebetteten Boxen (Grammatik-Tipps, Übungen etc.) erkannt.
        </p>
        <button
          onClick={onNext}
          className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium"
        >
          Weiter →
        </button>
      </div>
    )
  }
  return (
    <div className="space-y-4">
      {/* Header */}
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">
            Box-Review ({boxZones.length} {boxZones.length === 1 ? 'Box' : 'Boxen'})
          </h3>
          <p className="text-sm text-gray-500 dark:text-gray-400">
            Eingebettete Boxen prüfen und korrigieren. Layout-Typ kann pro Box angepasst werden.
          </p>
        </div>
        <div className="flex items-center gap-2">
          {dirty && (
            <button
              onClick={saveGrid}
              disabled={saving}
              className="px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition-colors text-sm font-medium disabled:opacity-50"
            >
              {saving ? 'Speichere...' : 'Speichern'}
            </button>
          )}
          <button
            onClick={() => buildBoxGrids()}
            disabled={building}
            className="px-4 py-2 bg-amber-600 text-white rounded-lg hover:bg-amber-700 transition-colors text-sm font-medium disabled:opacity-50"
          >
            {building ? 'Verarbeite...' : 'Alle Boxen neu aufbauen'}
          </button>
          <button
            onClick={async () => {
              if (dirty) await saveGrid()
              onNext()
            }}
            className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm font-medium"
          >
            Weiter →
          </button>
        </div>
      </div>
      {/* Errors */}
      {(error || buildError) && (
        <div className="p-3 bg-red-50 dark:bg-red-900/30 border border-red-200 dark:border-red-800 rounded-lg text-red-700 dark:text-red-300 text-sm">
          {error || buildError}
        </div>
      )}
      {building && (
        <div className="flex items-center gap-3 p-4 bg-amber-50 dark:bg-amber-900/20 border border-amber-200 dark:border-amber-800 rounded-lg">
          <div className="w-5 h-5 border-2 border-amber-500 border-t-transparent rounded-full animate-spin" />
          <span className="text-amber-700 dark:text-amber-300 text-sm">Box-Grids werden aufgebaut...</span>
        </div>
      )}
      {/* Box zones */}
      {boxZones.map((zone, boxIdx) => {
        const boxColor = zone.box_bg_hex || '#d97706' // amber fallback
        const boxColorName = zone.box_bg_color || 'box'
        return (
        <div
          key={zone.zone_index}
          className="bg-white dark:bg-gray-800 rounded-xl overflow-hidden"
          style={{ border: `3px solid ${boxColor}` }}
        >
          {/* Box header */}
          <div
            className="flex items-center justify-between px-4 py-3 border-b"
            style={{ backgroundColor: `${boxColor}15`, borderColor: `${boxColor}30` }}
          >
            <div className="flex items-center gap-3">
              <div
                className="w-8 h-8 rounded-lg flex items-center justify-center text-white text-sm font-bold"
                style={{ backgroundColor: boxColor }}
              >
                {boxIdx + 1}
              </div>
              <div>
                <span className="font-medium text-gray-900 dark:text-white">
                  Box {boxIdx + 1}
                </span>
                <span className="text-xs text-gray-500 dark:text-gray-400 ml-2">
                  {zone.bbox_px?.w}x{zone.bbox_px?.h}px
                  {zone.cells?.length ? ` | ${zone.cells.length} Zellen` : ''}
                  {zone.box_layout_type ? ` | ${LAYOUT_LABELS[zone.box_layout_type as BoxLayoutType] || zone.box_layout_type}` : ''}
                  {boxColorName !== 'box' ? ` | ${boxColorName}` : ''}
                </span>
              </div>
            </div>
            <div className="flex items-center gap-2">
              <label className="text-xs text-gray-500 dark:text-gray-400">Layout:</label>
              <select
                value={zone.box_layout_type || 'flowing'}
                onChange={(e) => changeLayoutType(boxIdx, e.target.value)}
                disabled={building}
                className="text-xs px-2 py-1 rounded border border-gray-300 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200"
              >
                {Object.entries(LAYOUT_LABELS).map(([key, label]) => (
                  <option key={key} value={key}>{label}</option>
                ))}
              </select>
            </div>
          </div>
          {/* Box grid table */}
          <div className="p-3">
            {zone.cells && zone.cells.length > 0 ? (
              <GridTable
                zone={zone}
                selectedCell={selectedCell}
                selectedCells={selectedCells}
                onSelectCell={setSelectedCell}
                onCellTextChange={updateCellText}
                onToggleColumnBold={toggleColumnBold}
                onToggleRowHeader={toggleRowHeader}
                onNavigate={(cellId, dir) => {
                  const next = getAdjacentCell(cellId, dir)
                  if (next) setSelectedCell(next)
                }}
                onDeleteColumn={deleteColumn}
                onAddColumn={addColumn}
                onDeleteRow={deleteRow}
                onAddRow={addRow}
                onToggleCellSelection={toggleCellSelection}
                onSetCellColor={setCellColor}
              />
            ) : (
              <div className="text-center py-8 text-gray-400">
                <p className="text-sm">Keine Zellen erkannt.</p>
                <button
                  onClick={() => buildBoxGrids({ [String(boxIdx)]: 'flowing' })}
                  className="mt-2 text-xs text-amber-600 hover:text-amber-700"
                >
                  Als Fließtext verarbeiten
                </button>
              </div>
            )}
          </div>
        </div>
        )
      })}
    </div>
  )
 }
--- a/admin-lehrer/components/ocr-kombi/StepGridBuild.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGridBuild.tsx
@@ -32,8 +32,10 @@ export function StepGridBuild({ sessionId, onNext }: StepGridBuildProps) {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`)
      if (res.ok) {
        const data = await res.json()
-        if (data.grid_shape) {
-          setResult({ rows: data.grid_shape.rows, cols: data.grid_shape.cols, cells: data.grid_shape.total_cells })
+        // Use grid-editor summary (accurate zone-based counts)
+        const summary = data.summary
+        if (summary) {
+          setResult({ rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 })
          return
        }
      }
@@ -57,8 +59,14 @@ export function StepGridBuild({ sessionId, onNext }: StepGridBuildProps) {
        throw new Error(data.detail || `Grid-Build fehlgeschlagen (${res.status})`)
      }
      const data = await res.json()
-      const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
-      setResult({ rows: shape.rows, cols: shape.cols, cells: shape.total_cells })
+      // Use grid-editor summary (zone-based, more accurate than word_result.grid_shape)
+      const summary = data.summary
+      if (summary) {
+        setResult({ rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 })
+      } else {
+        const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
+        setResult({ rows: shape.rows, cols: shape.cols, cells: shape.total_cells })
+      }
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e))
    } finally {
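The summary-first lookup with a `grid_shape` fallback shown in the second hunk above can be read as one small pure function. The following is a minimal sketch, not code from this commit: the helper name `resolveGridCounts` and the interface names are hypothetical, and only the field names (`summary.total_rows`, `summary.total_columns`, `summary.total_cells`, `grid_shape.rows`, `grid_shape.cols`, `grid_shape.total_cells`) are taken from the diff.

```typescript
// Hypothetical response types; only the field names come from the diff.
interface GridSummary {
  total_rows?: number
  total_columns?: number
  total_cells?: number
}

interface GridShape {
  rows: number
  cols: number
  total_cells: number
}

interface GridResponse {
  summary?: GridSummary | null
  grid_shape?: GridShape | null
}

interface GridCounts {
  rows: number
  cols: number
  cells: number
}

// Prefer the zone-based summary; fall back to grid_shape, then to zeros.
function resolveGridCounts(data: GridResponse): GridCounts {
  const summary = data.summary
  if (summary) {
    return {
      rows: summary.total_rows || 0,
      cols: summary.total_columns || 0,
      cells: summary.total_cells || 0,
    }
  }
  const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
  return { rows: shape.rows, cols: shape.cols, cells: shape.total_cells }
}

// Usage: both call sites in the component could pass the parsed JSON here.
const counts = resolveGridCounts({ grid_shape: { rows: 4, cols: 2, total_cells: 8 } })
console.log(counts)
```

Note that the first hunk only reads `summary` and has no `grid_shape` fallback, so sharing one helper would slightly change that call site's behaviour.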
--- a/admin-lehrer/components/ocr-kombi/StepGroundTruth.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGroundTruth.tsx
@@ -1,6 +1,10 @@
 'use client'
-import { useState } from 'react'
+import { useCallback, useEffect, useRef, useState } from 'react'
+import { useGridEditor } from '@/components/grid-editor/useGridEditor'
+import { GridTable } from '@/components/grid-editor/GridTable'
+import { ImageLayoutEditor } from '@/components/grid-editor/ImageLayoutEditor'
+import type { GridZone } from '@/components/grid-editor/types'
 const KLAUSUR_API = '/klausur-api'
@@ -12,22 +16,104 @@ interface StepGroundTruthProps {
 }
 /**
- * Step 11: Ground Truth marking.
- * Saves the current grid as reference data for regression tests.
+ * Step 12: Ground Truth marking.
+ *
+ * Shows the full Grid-Review view (original image + table) so the user
+ * can verify the final result before marking as Ground Truth reference.
 */
 export function StepGroundTruth({ sessionId, isGroundTruth, onMarked, gridSaveRef }: StepGroundTruthProps) {
-  const [saving, setSaving] = useState(false)
+  const {
    grid,
    loading,
    saving,
    error,
    dirty,
    selectedCell,
    selectedCells,
    setSelectedCell,
    loadGrid,
    saveGrid,
    updateCellText,
    toggleColumnBold,
    toggleRowHeader,
    undo,
    redo,
    canUndo,
    canRedo,
    getAdjacentCell,
    deleteColumn,
    addColumn,
    deleteRow,
    addRow,
    toggleCellSelection,
    clearCellSelection,
    toggleSelectedBold,
    setCellColor,
  } = useGridEditor(sessionId)
  const [showImage, setShowImage] = useState(true)
  const [zoom, setZoom] = useState(100)
  const [markSaving, setMarkSaving] = useState(false)
  const [message, setMessage] = useState('')
  // Expose save function via ref
  useEffect(() => {
    if (gridSaveRef) {
      gridSaveRef.current = async () => {
        if (dirty) await saveGrid()
      }
      return () => { gridSaveRef.current = null }
    }
  }, [gridSaveRef, dirty, saveGrid])
  // Load grid on mount
  useEffect(() => {
    if (sessionId) loadGrid()
  }, [sessionId, loadGrid])
  // Keyboard shortcuts
  useEffect(() => {
    const handler = (e: KeyboardEvent) => {
      if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === 'z' && !e.shiftKey) {
        e.preventDefault(); undo()
      } else if ((e.metaKey || e.ctrlKey) && ((e.key.toLowerCase() === 'z' && e.shiftKey) || e.key.toLowerCase() === 'y')) {
        e.preventDefault(); redo()
      } else if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === 's') {
        e.preventDefault(); saveGrid()
      } else if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === 'b') {
        e.preventDefault()
        if (selectedCells.size > 0) toggleSelectedBold()
      } else if (e.key === 'Escape') {
        clearCellSelection()
      }
    }
    window.addEventListener('keydown', handler)
    return () => window.removeEventListener('keydown', handler)
  }, [undo, redo, saveGrid, selectedCells, toggleSelectedBold, clearCellSelection])
  const handleNavigate = useCallback(
    (cellId: string, direction: 'up' | 'down' | 'left' | 'right') => {
      const target = getAdjacentCell(cellId, direction)
      if (target) {
        setSelectedCell(target)
        setTimeout(() => {
          const el = document.getElementById(`cell-${target}`)
          if (el) {
            el.focus()
            if (el instanceof HTMLInputElement) el.select()
          }
        }, 0)
      }
    },
    [getAdjacentCell, setSelectedCell],
  )
  const handleMark = async () => {
    if (!sessionId) return
-    setSaving(true)
+    setMarkSaving(true)
    setMessage('')
    try {
-      // Auto-save grid editor before marking
+      if (dirty) await saveGrid()
      if (gridSaveRef.current) {
        await gridSaveRef.current()
      }
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/mark-ground-truth?pipeline=kombi`,
        { method: 'POST' },
@@ -42,33 +128,168 @@ export function StepGroundTruth({ sessionId, isGroundTruth, onMarked, gridSaveRe
    } catch (e) {
      setMessage(e instanceof Error ? e.message : String(e))
    } finally {
-      setSaving(false)
+      setMarkSaving(false)
    }
  }
-  return (
-    <div className="space-y-4 p-6 bg-amber-50 dark:bg-amber-900/10 rounded-xl border border-amber-200 dark:border-amber-800">
-      <h3 className="text-sm font-medium text-amber-700 dark:text-amber-300">
-        Ground Truth
-      </h3>
-      <p className="text-sm text-amber-600 dark:text-amber-400">
-        Markiert die aktuelle Grid-Ausgabe als Referenz fuer Regressionstests.
-        {isGroundTruth && ' Diese Session ist bereits als Ground Truth markiert.'}
-      </p>
+  if (!sessionId) {
+    return <div className="text-center py-12 text-gray-400">Keine Session ausgewaehlt.</div>
+  }
-      <button
-        onClick={handleMark}
-        disabled={saving}
-        className="px-4 py-2 text-sm bg-amber-600 text-white rounded-lg hover:bg-amber-700 disabled:opacity-50"
-      >
-        {saving ? 'Speichere...' : isGroundTruth ? 'Ground Truth aktualisieren' : 'Als Ground Truth markieren'}
-      </button>
+  if (loading) {
+    return (
+      <div className="flex items-center justify-center py-16">
+        <div className="flex items-center gap-3 text-gray-500 dark:text-gray-400">
+          <svg className="w-5 h-5 animate-spin" fill="none" viewBox="0 0 24 24">
+            <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
+            <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
+          </svg>
+          Grid wird geladen...
+        </div>
+      </div>
+    )
+  }
  if (error) {
    return (
      <div className="bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg p-4">
        <p className="text-sm text-red-700 dark:text-red-300">Fehler: {error}</p>
      </div>
    )
  }
  if (!grid || !grid.zones.length) {
    return <div className="text-center py-12 text-gray-400">Kein Grid vorhanden.</div>
  }
  const imageUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/cropped`
  return (
    <div className="space-y-3">
      {/* GT Header Bar */}
      <div className="flex items-center justify-between p-3 bg-amber-50 dark:bg-amber-900/10 rounded-xl border border-amber-200 dark:border-amber-800">
        <div>
          <h3 className="text-sm font-medium text-amber-700 dark:text-amber-300">
            Ground Truth
            {isGroundTruth && <span className="ml-2 text-xs font-normal text-amber-500">(bereits markiert)</span>}
          </h3>
          <p className="text-xs text-amber-600 dark:text-amber-400 mt-0.5">
            Pruefen Sie das Ergebnis und markieren Sie es als Referenz fuer Regressionstests.
          </p>
        </div>
        <div className="flex items-center gap-2">
          {dirty && (
            <button
              onClick={saveGrid}
              disabled={saving}
              className="px-3 py-1.5 text-xs bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
            >
              {saving ? 'Speichere...' : 'Speichern'}
            </button>
          )}
          <button
            onClick={handleMark}
            disabled={markSaving}
            className="px-4 py-1.5 text-xs bg-amber-600 text-white rounded-lg hover:bg-amber-700 disabled:opacity-50"
          >
            {markSaving ? 'Speichere...' : isGroundTruth ? 'GT aktualisieren' : 'Als Ground Truth markieren'}
          </button>
        </div>
      </div>
      {message && (
-        <div className={`text-sm ${message.includes('fehlgeschlagen') ? 'text-red-500' : 'text-amber-600 dark:text-amber-400'}`}>
+        <div className={`text-sm p-2 rounded ${message.includes('fehlgeschlagen') ? 'text-red-500 bg-red-50 dark:bg-red-900/20' : 'text-amber-600 dark:text-amber-400 bg-amber-50 dark:bg-amber-900/10'}`}>
          {message}
        </div>
      )}
      {/* Stats */}
      <div className="flex items-center gap-4 text-xs flex-wrap">
        <span className="text-gray-500 dark:text-gray-400">
          {grid.summary.total_zones} Zone(n), {grid.summary.total_columns} Spalten,{' '}
          {grid.summary.total_rows} Zeilen, {grid.summary.total_cells} Zellen
        </span>
        <button
          onClick={() => setShowImage(!showImage)}
          className={`px-2.5 py-1 rounded text-xs border transition-colors ${
            showImage
              ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
              : 'bg-gray-50 dark:bg-gray-800 border-gray-200 dark:border-gray-700 text-gray-500 dark:text-gray-400'
          }`}
        >
          {showImage ? 'Bild ausblenden' : 'Bild einblenden'}
        </button>
      </div>
      {/* Split View: Image left + Grid right */}
      <div className={showImage ? 'grid grid-cols-2 gap-3' : ''} style={{ minHeight: '55vh' }}>
        {showImage && (
          <ImageLayoutEditor
            imageUrl={imageUrl}
            zones={grid.zones}
            imageWidth={grid.image_width}
            layoutDividers={grid.layout_dividers}
            zoom={zoom}
            onZoomChange={setZoom}
            onColumnDividerMove={() => {}}
            onHorizontalsChange={() => {}}
            onCommitUndo={() => {}}
            onSplitColumnAt={() => {}}
            onDeleteColumn={() => {}}
          />
        )}
        <div className="space-y-3">
          {(() => {
            const groups: GridZone[][] = []
            for (const zone of grid.zones) {
              const prev = groups[groups.length - 1]
              if (prev && zone.vsplit_group != null && prev[0].vsplit_group === zone.vsplit_group) {
                prev.push(zone)
              } else {
                groups.push([zone])
              }
            }
            return groups.map((group) => (
              <div key={group[0].vsplit_group ?? group[0].zone_index}>
                <div className={`${group.length > 1 ? 'flex gap-2' : ''}`}>
                  {group.map((zone) => (
                    <div
                      key={zone.zone_index}
                      className={`${group.length > 1 ? 'flex-1 min-w-0' : ''} bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700`}
                    >
                      <GridTable
                        zone={zone}
                        layoutMetrics={grid.layout_metrics}
                        selectedCell={selectedCell}
                        selectedCells={selectedCells}
                        onSelectCell={setSelectedCell}
                        onToggleCellSelection={toggleCellSelection}
                        onCellTextChange={updateCellText}
                        onToggleColumnBold={toggleColumnBold}
                        onToggleRowHeader={toggleRowHeader}
                        onNavigate={handleNavigate}
                        onDeleteColumn={deleteColumn}
                        onAddColumn={addColumn}
                        onDeleteRow={deleteRow}
                        onAddRow={addRow}
                        onSetCellColor={setCellColor}
                      />
                    </div>
                  ))}
                </div>
              </div>
            ))
          })()}
        </div>
      </div>
      {/* Keyboard tips */}
      <div className="text-[11px] text-gray-400 dark:text-gray-500 flex items-center gap-4">
        <span>Tab: naechste Zelle</span>
        <span>Ctrl+Z/Y: Undo/Redo</span>
        <span>Ctrl+S: Speichern</span>
      </div>
    </div>
  )
 }
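The inline IIFE in the split view above groups consecutive zones that share a `vsplit_group` so they render side by side. As a rough sketch of the same logic pulled out into a testable helper (the name `groupByVsplit` and the `ZoneLike` type are hypothetical; the real `GridZone` type lives in `@/components/grid-editor/types` and is not shown in this diff):

```typescript
// Minimal stand-in for GridZone; only the two fields the grouping reads.
interface ZoneLike {
  zone_index: number
  vsplit_group?: number | null
}

// Group *consecutive* zones with the same non-null vsplit_group.
// Zones without a group, or with a different group than the previous
// zone, start a new single-element group.
function groupByVsplit<Z extends ZoneLike>(zones: Z[]): Z[][] {
  const groups: Z[][] = []
  for (const zone of zones) {
    const prev = groups[groups.length - 1]
    if (prev && zone.vsplit_group != null && prev[0].vsplit_group === zone.vsplit_group) {
      prev.push(zone)
    } else {
      groups.push([zone])
    }
  }
  return groups
}

// Usage: zones 1 and 2 share group 7 and end up in the same row.
const sampleZones: ZoneLike[] = [
  { zone_index: 0 },
  { zone_index: 1, vsplit_group: 7 },
  { zone_index: 2, vsplit_group: 7 },
  { zone_index: 3, vsplit_group: null },
]
console.log(groupByVsplit(sampleZones).map((g) => g.map((z) => z.zone_index)))
```

Because only adjacent zones are merged, two zones with the same `vsplit_group` that are separated by another zone land in different groups. With the `key={group[0].vsplit_group ?? group[0].zone_index}` expression used in the component, those two groups would get the same React key, so this assumes the backend always emits split zones contiguously.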
--- a/admin-lehrer/components/ocr-kombi/StepGutterRepair.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGutterRepair.tsx
@@ -0,0 +1,422 @@
 'use client'
 import { useState, useEffect, useCallback } from 'react'
 const KLAUSUR_API = '/klausur-api'
 interface GutterSuggestion {
  id: string
  type: 'hyphen_join' | 'spell_fix'
  zone_index: number
  row_index: number
  col_index: number
  col_type: string
  cell_id: string
  original_text: string
  suggested_text: string
  next_row_index: number
  next_row_cell_id: string
  next_row_text: string
  missing_chars: string
  display_parts: string[]
  alternatives: string[]
  confidence: number
  reason: string
 }
 interface GutterRepairResult {
  suggestions: GutterSuggestion[]
  stats: {
    words_checked: number
    gutter_candidates: number
    suggestions_found: number
    error?: string
  }
  duration_seconds: number
 }
 interface StepGutterRepairProps {
  sessionId: string | null
  onNext: () => void
 }
 /**
 * Step 11: Gutter Repair (Wortkorrektur).
 * Detects words truncated at the book gutter and proposes corrections.
 * User can accept/reject each suggestion individually or in batch.
 */
 export function StepGutterRepair({ sessionId, onNext }: StepGutterRepairProps) {
  const [loading, setLoading] = useState(false)
  const [applying, setApplying] = useState(false)
  const [result, setResult] = useState<GutterRepairResult | null>(null)
  const [accepted, setAccepted] = useState<Set<string>>(new Set())
  const [rejected, setRejected] = useState<Set<string>>(new Set())
  const [selectedText, setSelectedText] = useState<Record<string, string>>({})
  const [applied, setApplied] = useState(false)
  const [error, setError] = useState('')
  const [applyMessage, setApplyMessage] = useState('')
  const analyse = useCallback(async () => {
    if (!sessionId) return
    setLoading(true)
    setError('')
    setApplied(false)
    setApplyMessage('')
    try {
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/gutter-repair`,
        { method: 'POST' },
      )
      if (!res.ok) {
        const body = await res.json().catch(() => ({}))
        throw new Error(body.detail || `Analyse fehlgeschlagen (${res.status})`)
      }
      const data: GutterRepairResult = await res.json()
      setResult(data)
      // Auto-accept all suggestions with high confidence
      const autoAccept = new Set<string>()
      for (const s of data.suggestions) {
        if (s.confidence >= 0.85) {
          autoAccept.add(s.id)
        }
      }
      setAccepted(autoAccept)
      setRejected(new Set())
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e))
    } finally {
      setLoading(false)
    }
  }, [sessionId])
  // Auto-trigger analysis on mount
  useEffect(() => {
    if (sessionId) analyse()
  }, [sessionId, analyse])
  const toggleSuggestion = (id: string) => {
    const wasAccepted = accepted.has(id) // read once so both updaters stay pure
    setAccepted(prev => {
      const next = new Set(prev)
      if (wasAccepted) next.delete(id); else next.add(id)
      return next
    })
    setRejected(prev => {
      const next = new Set(prev)
      if (wasAccepted) next.add(id); else next.delete(id)
      return next
    })
  }
  const acceptAll = () => {
    if (!result) return
    setAccepted(new Set(result.suggestions.map(s => s.id)))
    setRejected(new Set())
  }
  const rejectAll = () => {
    if (!result) return
    setRejected(new Set(result.suggestions.map(s => s.id)))
    setAccepted(new Set())
  }
  const applyAccepted = async () => {
    if (!sessionId || accepted.size === 0) return
    setApplying(true)
    setApplyMessage('')
    try {
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/gutter-repair/apply`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            accepted: Array.from(accepted),
            text_overrides: selectedText,
          }),
        },
      )
      if (!res.ok) {
        const body = await res.json().catch(() => ({}))
        throw new Error(body.detail || `Anwenden fehlgeschlagen (${res.status})`)
      }
      const data = await res.json()
      setApplied(true)
      setApplyMessage(`${data.applied_count} Korrektur(en) angewendet.`)
    } catch (e) {
      setApplyMessage(e instanceof Error ? e.message : String(e))
    } finally {
      setApplying(false)
    }
  }
  const suggestions = result?.suggestions || []
  const hasSuggestions = suggestions.length > 0
  return (
    <div className="space-y-4">
      {/* Header */}
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
            Wortkorrektur (Buchfalz)
          </h3>
          <p className="text-xs text-gray-500 dark:text-gray-400 mt-1">
            Erkennt abgeschnittene oder unscharfe Woerter am Buchfalz und Bindestrich-Trennungen ueber Zeilen hinweg.
          </p>
        </div>
        {result && !loading && (
          <button
            onClick={analyse}
            className="px-3 py-1.5 text-xs bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 rounded-lg hover:bg-gray-200 dark:hover:bg-gray-600"
          >
            Erneut analysieren
          </button>
        )}
      </div>
      {/* Loading */}
      {loading && (
        <div className="flex items-center gap-3 p-6 bg-blue-50 dark:bg-blue-900/20 rounded-xl border border-blue-200 dark:border-blue-800">
          <div className="animate-spin w-5 h-5 border-2 border-blue-400 border-t-transparent rounded-full" />
          <span className="text-sm text-blue-600 dark:text-blue-400">Analysiere Woerter am Buchfalz...</span>
        </div>
      )}
      {/* Error */}
      {error && (
        <div className="space-y-3">
          <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
            {error}
          </div>
          <button
            onClick={analyse}
            className="px-4 py-2 bg-orange-600 text-white text-sm rounded-lg hover:bg-orange-700"
          >
            Erneut versuchen
          </button>
        </div>
      )}
      {/* No suggestions */}
      {result && !hasSuggestions && !loading && (
        <div className="p-4 bg-green-50 dark:bg-green-900/20 rounded-xl border border-green-200 dark:border-green-800">
          <div className="text-sm font-medium text-green-700 dark:text-green-300">
            Keine Buchfalz-Fehler erkannt.
          </div>
          <div className="text-xs text-green-600 dark:text-green-400 mt-1">
            {result.stats.words_checked} Woerter geprueft, {result.stats.gutter_candidates} Kandidaten am Rand analysiert.
          </div>
        </div>
      )}
      {/* Suggestions list */}
      {hasSuggestions && !loading && (
        <>
          {/* Stats bar */}
          <div className="flex items-center justify-between p-3 bg-gray-50 dark:bg-gray-800 rounded-lg">
            <div className="text-xs text-gray-500 dark:text-gray-400">
              {suggestions.length} Vorschlag/Vorschlaege &middot;{' '}
              {result!.stats.words_checked} Woerter geprueft &middot;{' '}
              {result!.duration_seconds}s
            </div>
            <div className="flex gap-2">
              <button
                onClick={acceptAll}
                disabled={applied}
                className="px-2 py-1 text-xs bg-green-100 dark:bg-green-900/30 text-green-700 dark:text-green-300 rounded hover:bg-green-200 dark:hover:bg-green-900/50 disabled:opacity-50"
              >
                Alle akzeptieren
              </button>
              <button
                onClick={rejectAll}
                disabled={applied}
                className="px-2 py-1 text-xs bg-red-100 dark:bg-red-900/30 text-red-700 dark:text-red-300 rounded hover:bg-red-200 dark:hover:bg-red-900/50 disabled:opacity-50"
              >
                Alle ablehnen
              </button>
            </div>
          </div>
          {/* Suggestion cards */}
          <div className="space-y-2">
            {suggestions.map((s) => {
              const isAccepted = accepted.has(s.id)
              const isRejected = rejected.has(s.id)
              return (
                <div
                  key={s.id}
                  className={`p-3 rounded-lg border transition-colors ${
                    applied
                      ? isAccepted
                        ? 'bg-green-50 dark:bg-green-900/10 border-green-200 dark:border-green-800'
                        : 'bg-gray-50 dark:bg-gray-800/50 border-gray-200 dark:border-gray-700 opacity-60'
                      : isAccepted
                        ? 'bg-green-50 dark:bg-green-900/10 border-green-300 dark:border-green-700'
                        : isRejected
                          ? 'bg-red-50 dark:bg-red-900/10 border-red-200 dark:border-red-800 opacity-60'
                          : 'bg-white dark:bg-gray-800 border-gray-200 dark:border-gray-700'
                  }`}
                >
                  <div className="flex items-start justify-between gap-3">
                    {/* Left: suggestion details */}
                    <div className="flex-1 min-w-0">
                      {/* Type badge */}
                      <div className="flex items-center gap-2 mb-1.5">
                        <span className={`inline-flex px-1.5 py-0.5 text-[10px] font-medium rounded ${
                          s.type === 'hyphen_join'
                            ? 'bg-purple-100 dark:bg-purple-900/30 text-purple-700 dark:text-purple-300'
                            : 'bg-orange-100 dark:bg-orange-900/30 text-orange-700 dark:text-orange-300'
                        }`}>
                          {s.type === 'hyphen_join' ? 'Zeilenumbruch' : 'Buchfalz-Korrektur'}
                        </span>
                        <span className="text-[10px] text-gray-400">
                          Zeile {s.row_index + 1}, Spalte {s.col_index + 1}
                          {s.col_type && ` (${s.col_type.replace('column_', '')})`}
                        </span>
                        <span className={`text-[10px] ${
                          s.confidence >= 0.9 ? 'text-green-500' :
                          s.confidence >= 0.7 ? 'text-yellow-500' : 'text-red-500'
                        }`}>
                          {Math.round(s.confidence * 100)}%
                        </span>
                      </div>
                      {/* Correction display */}
                      {s.type === 'hyphen_join' ? (
                        <div className="space-y-1">
                          <div className="flex items-center gap-2 text-sm">
                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
                              {s.original_text}
                            </span>
                            <span className="text-gray-400 text-xs">Z.{s.row_index + 1}</span>
                            <span className="text-gray-300 dark:text-gray-600">+</span>
                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
                              {s.next_row_text.split(' ')[0]}
                            </span>
                            <span className="text-gray-400 text-xs">Z.{s.next_row_index + 1}</span>
                            <span className="text-gray-400">&rarr;</span>
                            <span className="font-mono text-green-600 dark:text-green-400 font-semibold">
                              {s.suggested_text}
                            </span>
                          </div>
                          {s.missing_chars && (
                            <div className="text-[10px] text-gray-400">
                              Fehlende Zeichen: <span className="font-mono font-semibold">{s.missing_chars}</span>
                              {' '}&middot; Darstellung: <span className="font-mono">{s.display_parts.join(' | ')}</span>
                            </div>
                          )}
                        </div>
                      ) : (
                        <div className="space-y-1">
                          <div className="flex items-center gap-2 text-sm">
                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
                              {s.original_text}
                            </span>
                            <span className="text-gray-400">&rarr;</span>
                            <span className="font-mono text-green-600 dark:text-green-400 font-semibold">
                              {selectedText[s.id] || s.suggested_text}
                            </span>
                          </div>
                          {/* Alternatives: show other candidates the user can pick */}
                          {s.alternatives && s.alternatives.length > 0 && !applied && (
                            <div className="flex items-center gap-1.5 flex-wrap">
                              <span className="text-[10px] text-gray-400">Alternativen:</span>
                              {[s.suggested_text, ...s.alternatives].map((alt) => {
                                const isSelected = (selectedText[s.id] || s.suggested_text) === alt
                                return (
                                  <button
                                    key={alt}
                                    onClick={() => setSelectedText(prev => ({ ...prev, [s.id]: alt }))}
                                    className={`px-1.5 py-0.5 text-[11px] font-mono rounded transition-colors ${
                                      isSelected
                                        ? 'bg-green-200 dark:bg-green-800 text-green-800 dark:text-green-200 font-semibold'
                                        : 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 hover:bg-gray-200 dark:hover:bg-gray-600'
                                    }`}
                                  >
                                    {alt}
                                  </button>
                                )
                              })}
                            </div>
                          )}
                        </div>
                      )}
                    </div>
                    {/* Right: accept/reject toggle */}
                    {!applied && (
                      <button
                        onClick={() => toggleSuggestion(s.id)}
                        className={`flex-shrink-0 w-8 h-8 rounded-full flex items-center justify-center text-sm transition-colors ${
                          isAccepted
                            ? 'bg-green-500 text-white hover:bg-green-600'
                            : isRejected
                              ? 'bg-red-400 text-white hover:bg-red-500'
                              : 'bg-gray-200 dark:bg-gray-600 text-gray-500 dark:text-gray-300 hover:bg-gray-300 dark:hover:bg-gray-500'
                        }`}
                        title={isAccepted ? 'Akzeptiert (klicken zum Ablehnen)' : isRejected ? 'Abgelehnt (klicken zum Akzeptieren)' : 'Klicken zum Akzeptieren'}
                      >
                        {isAccepted ? '\u2713' : isRejected ? '\u2717' : '?'}
                      </button>
                    )}
                  </div>
                </div>
              )
            })}
          </div>
          {/* Apply / Next buttons */}
          <div className="flex items-center gap-3 pt-2">
            {!applied ? (
              <button
                onClick={applyAccepted}
                disabled={applying || accepted.size === 0}
                className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700 disabled:opacity-50"
              >
                {applying ? 'Wird angewendet...' : `${accepted.size} Korrektur(en) anwenden`}
              </button>
            ) : (
              <button
                onClick={onNext}
                className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
              >
                Weiter zu Ground Truth
              </button>
            )}
            {!applied && (
              <button
                onClick={onNext}
                className="px-4 py-2 text-sm text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-200"
              >
                Ueberspringen
              </button>
            )}
          </div>
          {/* Apply result message */}
          {applyMessage && (
            <div className={`text-sm p-2 rounded ${
              applyMessage.includes('fehlgeschlagen')
                ? 'text-red-500 bg-red-50 dark:bg-red-900/20'
                : 'text-green-600 dark:text-green-400 bg-green-50 dark:bg-green-900/20'
            }`}>
              {applyMessage}
            </div>
          )}
        </>
      )}
      {/* Skip button when no suggestions */}
      {result && !hasSuggestions && !loading && (
        <button
          onClick={onNext}
          className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
        >
          Weiter zu Ground Truth
        </button>
      )}
    </div>
  )
 }
--- a/admin-lehrer/components/ocr-kombi/StepPageSplit.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepPageSplit.tsx
@@ -1,8 +1,6 @@
 'use client'
 import { useState, useEffect, useRef } from 'react'
 import type { SubSession } from '@/app/(admin)/ai/ocr-pipeline/types'
 const KLAUSUR_API = '/klausur-api'
 interface PageSplitResult {
@@ -18,10 +16,10 @@ interface StepPageSplitProps {
  sessionId: string | null
  sessionName: string
  onNext: () => void
-  onSubSessionsCreated: (subs: SubSession[]) => void
+  onSplitComplete: (firstChildId: string, firstChildName: string) => void
 }
-export function StepPageSplit({ sessionId, sessionName, onNext, onSubSessionsCreated }: StepPageSplitProps) {
+export function StepPageSplit({ sessionId, sessionName, onNext, onSplitComplete }: StepPageSplitProps) {
  const [detecting, setDetecting] = useState(false)
  const [splitResult, setSplitResult] = useState<PageSplitResult | null>(null)
  const [error, setError] = useState('')
@@ -40,30 +38,33 @@ export function StepPageSplit({ sessionId, sessionName, onNext, onSubSessionsCre
    setDetecting(true)
    setError('')
    try {
-      // First check if sub-sessions already exist
+      // First check if this session was already split (status='split')
      const sessionRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
      if (sessionRes.ok) {
        const sessionData = await sessionRes.json()
-        if (sessionData.sub_sessions?.length > 0) {
-          // Already split — show existing sub-sessions
-          const subs = sessionData.sub_sessions as { id: string; name: string; page_index?: number; box_index?: number; current_step?: number }[]
-          setSplitResult({
-            multi_page: true,
-            page_count: subs.length,
-            sub_sessions: subs.map((s: { id: string; name: string; page_index?: number; box_index?: number }) => ({
-              id: s.id,
-              name: s.name,
-              page_index: s.page_index ?? s.box_index ?? 0,
-            })),
-          })
-          onSubSessionsCreated(subs.map((s: { id: string; name: string; page_index?: number; box_index?: number; current_step?: number }) => ({
-            id: s.id,
-            name: s.name,
-            box_index: s.page_index ?? s.box_index ?? 0,
-            current_step: s.current_step ?? 2,
-          })))
-          setDetecting(false)
-          return
+        if (sessionData.status === 'split' && sessionData.crop_result?.multi_page) {
+          // Already split — find the child sessions in the session list
+          const listRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
+          if (listRes.ok) {
+            const listData = await listRes.json()
+            // Child sessions have names like "ParentName — Seite N"
+            const baseName = sessionName || sessionData.name || ''
+            const children = (listData.sessions || [])
+              .filter((s: { name?: string }) => s.name?.startsWith(baseName + ' — '))
+              .sort((a: { name: string }, b: { name: string }) => a.name.localeCompare(b.name))
+            if (children.length > 0) {
+              setSplitResult({
+                multi_page: true,
+                page_count: children.length,
+                sub_sessions: children.map((s: { id: string; name: string }, i: number) => ({
+                  id: s.id, name: s.name, page_index: i,
+                })),
+              })
+              onSplitComplete(children[0].id, children[0].name)
+              setDetecting(false)
+              return
+            }
+          }
        }
      }
@@ -92,12 +93,8 @@ export function StepPageSplit({ sessionId, sessionName, onNext, onSubSessionsCre
          sub.name = newName
        }
-        onSubSessionsCreated(data.sub_sessions.map(s => ({
-          id: s.id,
-          name: s.name,
-          box_index: s.page_index,
-          current_step: 2,
-        })))
+        // Signal parent to switch to the first child session
+        onSplitComplete(data.sub_sessions[0].id, data.sub_sessions[0].name)
      }
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e))
--- a/admin-lehrer/components/ocr-kombi/StepUpload.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepUpload.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useState, useCallback, useEffect } from 'react'
-import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-pipeline/types'
+import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/BoxSessionTabs.tsx
+++ b/admin-lehrer/components/ocr-pipeline/BoxSessionTabs.tsx
@@ -1,6 +1,6 @@
 'use client'
-import type { SubSession } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { SubSession } from '@/app/(admin)/ai/ocr-kombi/types'
 interface BoxSessionTabsProps {
  parentSessionId: string
--- a/admin-lehrer/components/ocr-pipeline/ColumnControls.tsx
+++ b/admin-lehrer/components/ocr-pipeline/ColumnControls.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useState, useMemo } from 'react'
-import type { ColumnResult, ColumnGroundTruth, PageRegion } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { ColumnResult, ColumnGroundTruth, PageRegion } from '@/app/(admin)/ai/ocr-kombi/types'
 interface ColumnControlsProps {
  columnResult: ColumnResult | null
--- a/admin-lehrer/components/ocr-pipeline/DeskewControls.tsx
+++ b/admin-lehrer/components/ocr-pipeline/DeskewControls.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useState } from 'react'
-import type { DeskewResult, DeskewGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { DeskewResult, DeskewGroundTruth } from '@/app/(admin)/ai/ocr-kombi/types'
 interface DeskewControlsProps {
  deskewResult: DeskewResult | null
--- a/admin-lehrer/components/ocr-pipeline/DewarpControls.tsx
+++ b/admin-lehrer/components/ocr-pipeline/DewarpControls.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useEffect, useState } from 'react'
-import type { DeskewResult, DewarpResult, DewarpDetection, DewarpGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { DeskewResult, DewarpResult, DewarpDetection, DewarpGroundTruth } from '@/app/(admin)/ai/ocr-kombi/types'
 interface DewarpControlsProps {
  dewarpResult: DewarpResult | null
--- a/admin-lehrer/components/ocr-pipeline/FabricReconstructionCanvas.tsx
+++ b/admin-lehrer/components/ocr-pipeline/FabricReconstructionCanvas.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useRef, useState } from 'react'
-import type { GridCell } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { GridCell } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/ManualColumnEditor.tsx
+++ b/admin-lehrer/components/ocr-pipeline/ManualColumnEditor.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useRef, useState } from 'react'
-import type { ColumnTypeKey, PageRegion } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { ColumnTypeKey, PageRegion } from '@/app/(admin)/ai/ocr-kombi/types'
 const COLUMN_TYPES: { value: ColumnTypeKey; label: string }[] = [
  { value: 'column_en', label: 'EN' },
--- a/admin-lehrer/components/ocr-pipeline/PipelineStepper.tsx
+++ b/admin-lehrer/components/ocr-pipeline/PipelineStepper.tsx
@@ -1,6 +1,6 @@
 'use client'
-import { PipelineStep, DocumentTypeResult } from '@/app/(admin)/ai/ocr-pipeline/types'
+import { PipelineStep, DocumentTypeResult } from '@/app/(admin)/ai/ocr-kombi/types'
 const DOC_TYPE_LABELS: Record<string, string> = {
  vocab_table: 'Vokabeltabelle',
--- a/admin-lehrer/components/ocr-pipeline/StepColumnDetection.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepColumnDetection.tsx
@@ -1,10 +1,10 @@
 'use client'
 import { useCallback, useEffect, useState } from 'react'
-import type { ColumnResult, ColumnGroundTruth, PageRegion, SubSession } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { ColumnResult, ColumnGroundTruth, PageRegion, SubSession } from '@/app/(admin)/ai/ocr-kombi/types'
 import { ColumnControls } from './ColumnControls'
 import { ManualColumnEditor } from './ManualColumnEditor'
-import type { ColumnTypeKey } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { ColumnTypeKey } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepCrop.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepCrop.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useEffect, useState } from 'react'
-import type { CropResult } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { CropResult } from '@/app/(admin)/ai/ocr-kombi/types'
 import { ImageCompareView } from './ImageCompareView'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepDeskew.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepDeskew.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useState } from 'react'
-import type { DeskewGroundTruth, DeskewResult, SessionInfo } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { DeskewGroundTruth, DeskewResult, SessionInfo } from '@/app/(admin)/ai/ocr-kombi/types'
 import { DeskewControls } from './DeskewControls'
 import { ImageCompareView } from './ImageCompareView'
--- a/admin-lehrer/components/ocr-pipeline/StepDewarp.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepDewarp.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useState } from 'react'
-import type { DeskewResult, DewarpResult, DewarpGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { DeskewResult, DewarpResult, DewarpGroundTruth } from '@/app/(admin)/ai/ocr-kombi/types'
 import { DewarpControls } from './DewarpControls'
 import { ImageCompareView } from './ImageCompareView'
--- a/admin-lehrer/components/ocr-pipeline/StepGridReview.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepGridReview.tsx
@@ -61,6 +61,17 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
    setIpaMode,
    syllableMode,
    setSyllableMode,
    ocrEnhance,
    setOcrEnhance,
    ocrMaxCols,
    setOcrMaxCols,
    ocrMinConf,
    setOcrMinConf,
    visionFusion,
    setVisionFusion,
    documentCategory,
    setDocumentCategory,
    rerunOcr,
  } = useGridEditor(sessionId)
  const [showImage, setShowImage] = useState(true)
@@ -256,6 +267,50 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
            Alle akzeptieren
          </button>
        )}
        {/* OCR Quality Steps (A/B Testing) */}
        <span className="text-gray-400 dark:text-gray-500">|</span>
        <label className="flex items-center gap-1 cursor-pointer" title="Step 3: CLAHE + Bilateral-Filter Enhancement">
          <input type="checkbox" checked={ocrEnhance} onChange={(e) => setOcrEnhance(e.target.checked)} className="rounded w-3 h-3" />
          <span className="text-gray-500 dark:text-gray-400">CLAHE</span>
        </label>
        <label className="flex items-center gap-1" title="Step 2: Max Spaltenanzahl (0=unbegrenzt)">
          <span className="text-gray-500 dark:text-gray-400">MaxCol:</span>
          <select value={ocrMaxCols} onChange={(e) => setOcrMaxCols(Number(e.target.value))} className="px-1 py-0.5 text-xs rounded border border-gray-200 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-300">
            <option value={0}>off</option>
            <option value={2}>2</option>
            <option value={3}>3</option>
            <option value={4}>4</option>
            <option value={5}>5</option>
          </select>
        </label>
        <label className="flex items-center gap-1" title="Step 1: Min OCR Confidence (0=auto)">
          <span className="text-gray-500 dark:text-gray-400">MinConf:</span>
          <select value={ocrMinConf} onChange={(e) => setOcrMinConf(Number(e.target.value))} className="px-1 py-0.5 text-xs rounded border border-gray-200 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-300">
            <option value={0}>auto</option>
            <option value={20}>20</option>
            <option value={30}>30</option>
            <option value={40}>40</option>
            <option value={50}>50</option>
            <option value={60}>60</option>
          </select>
        </label>
        <span className="text-gray-400 dark:text-gray-500">|</span>
        <label className="flex items-center gap-1 cursor-pointer" title="Step 4: Vision-LLM Fusion — Qwen2.5-VL korrigiert OCR anhand des Bildes">
          <input type="checkbox" checked={visionFusion} onChange={(e) => setVisionFusion(e.target.checked)} className="rounded w-3 h-3 accent-orange-500" />
          <span className={`${visionFusion ? 'text-orange-500 dark:text-orange-400 font-medium' : 'text-gray-500 dark:text-gray-400'}`}>Vision-LLM</span>
        </label>
        <label className="flex items-center gap-1" title="Dokumenttyp fuer Vision-LLM Prompt">
          <span className="text-gray-500 dark:text-gray-400">Typ:</span>
          <select value={documentCategory} onChange={(e) => setDocumentCategory(e.target.value)} className="px-1 py-0.5 text-xs rounded border border-gray-200 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-300">
            <option value="vokabelseite">Vokabelseite</option>
            <option value="woerterbuch">Woerterbuch</option>
            <option value="arbeitsblatt">Arbeitsblatt</option>
            <option value="buchseite">Buchseite</option>
            <option value="sonstiges">Sonstiges</option>
          </select>
        </label>
        <div className="ml-auto flex items-center gap-2">
          <button
            onClick={() => {
@@ -302,6 +357,14 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
          onIpaModeChange={setIpaMode}
          onSyllableModeChange={setSyllableMode}
        />
        <button
          onClick={rerunOcr}
          disabled={loading}
          className="ml-2 px-3 py-1.5 text-xs font-medium rounded border border-orange-300 dark:border-orange-700 bg-orange-50 dark:bg-orange-900/20 text-orange-700 dark:text-orange-300 hover:bg-orange-100 dark:hover:bg-orange-900/40 transition-colors disabled:opacity-50"
          title="OCR komplett neu ausfuehren mit aktuellen Quality-Step-Einstellungen (CLAHE, MinConf), dann Grid neu bauen"
        >
          {loading ? 'OCR laeuft...' : 'OCR neu + Grid'}
        </button>
      </div>
      {/* Split View: Image left + Grid right */}
--- a/admin-lehrer/components/ocr-pipeline/StepGroundTruth.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepGroundTruth.tsx
@@ -3,8 +3,8 @@
 import { useCallback, useEffect, useRef, useState } from 'react'
 import type {
  GridCell, ColumnMeta, ImageRegion, ImageStyle,
-} from '@/app/(admin)/ai/ocr-pipeline/types'
+} from '@/app/(admin)/ai/ocr-kombi/types'
-import { IMAGE_STYLES as STYLES } from '@/app/(admin)/ai/ocr-pipeline/types'
+import { IMAGE_STYLES as STYLES } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepLlmReview.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepLlmReview.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
-import type { GridCell, GridResult, WordEntry, ColumnMeta } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { GridCell, GridResult, WordEntry, ColumnMeta } from '@/app/(admin)/ai/ocr-kombi/types'
 import { usePixelWordPositions } from './usePixelWordPositions'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepOrientation.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepOrientation.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useState } from 'react'
-import type { OrientationResult, SessionInfo } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { OrientationResult, SessionInfo } from '@/app/(admin)/ai/ocr-kombi/types'
 import { ImageCompareView } from './ImageCompareView'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepReconstruction.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepReconstruction.tsx
@@ -2,7 +2,7 @@
 import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
 import dynamic from 'next/dynamic'
-import type { GridResult, GridCell, ColumnResult, RowResult, PageZone, PageRegion, RowItem, StructureResult, StructureBox, StructureGraphic } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { GridResult, GridCell, ColumnResult, RowResult, PageZone, PageRegion, RowItem, StructureResult, StructureBox, StructureGraphic } from '@/app/(admin)/ai/ocr-kombi/types'
 import { usePixelWordPositions } from './usePixelWordPositions'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepRowDetection.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepRowDetection.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useState } from 'react'
-import type { RowResult, RowGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { RowResult, RowGroundTruth } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepStructureDetection.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepStructureDetection.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useRef, useState } from 'react'
-import type { ExcludeRegion, StructureResult } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { ExcludeRegion, StructureResult } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/StepWordRecognition.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepWordRecognition.tsx
@@ -1,7 +1,7 @@
 'use client'
 import { useCallback, useEffect, useRef, useState } from 'react'
-import type { GridResult, GridCell, WordEntry, WordGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { GridResult, GridCell, WordEntry, WordGroundTruth } from '@/app/(admin)/ai/ocr-kombi/types'
 const KLAUSUR_API = '/klausur-api'
--- a/admin-lehrer/components/ocr-pipeline/tests/usePixelWordPositions.test.ts
+++ b/admin-lehrer/components/ocr-pipeline/tests/usePixelWordPositions.test.ts
@@ -0,0 +1,328 @@
 /**
 * Tests for usePixelWordPositions hook.
 *
 * The hook performs pixel-based word positioning using an offscreen canvas.
 * Since Canvas/getImageData is not available in jsdom, we test the pure
 * computation logic by extracting and testing the algorithms directly.
 */
 import { describe, it, expect } from 'vitest'
 // ---------------------------------------------------------------------------
 // Extract pure computation functions from the hook for testing
 // ---------------------------------------------------------------------------
 interface Cluster {
  start: number
  end: number
 }
 /**
 * Cluster detection: find runs of dark pixels above a threshold.
 * Replicates the cluster detection logic in usePixelWordPositions.
 */
 function findClusters(proj: number[], ch: number, cw: number): Cluster[] {
  const threshold = Math.max(1, ch * 0.03)
  const minGap = Math.max(5, Math.round(cw * 0.02))
  const clusters: Cluster[] = []
  let inCluster = false
  let clStart = 0
  let gap = 0
  for (let x = 0; x < cw; x++) {
    if (proj[x] >= threshold) {
      if (!inCluster) { clStart = x; inCluster = true }
      gap = 0
    } else if (inCluster) {
      gap++
      if (gap > minGap) {
        clusters.push({ start: clStart, end: x - gap })
        inCluster = false
        gap = 0
      }
    }
  }
  if (inCluster) clusters.push({ start: clStart, end: cw - 1 - gap })
  return clusters
 }
 /**
 * Mirror clusters for 180° rotation.
 * Replicates the rotation logic in usePixelWordPositions.
 */
 function mirrorClusters(clusters: Cluster[], cw: number): Cluster[] {
  return clusters.map(c => ({
    start: cw - 1 - c.end,
    end: cw - 1 - c.start,
  })).reverse()
 }
 /**
 * Compute fontRatio from cluster width, measured text width, and cell height.
 * Replicates the font ratio calculation.
 */
 function computeFontRatio(
  clusterW: number,
  measuredWidth: number,
  refFontSize: number,
  ch: number,
 ): number {
  const autoFontPx = refFontSize * (clusterW / measuredWidth)
  return Math.min(autoFontPx / ch, 1.0)
 }
 /**
 * Mode normalization: find the most common fontRatio (bucketed to 0.02).
 * Replicates the mode normalization in usePixelWordPositions.
 */
 function normalizeFontRatios(ratios: number[]): number {
  if (ratios.length === 0) return 0
  const buckets = new Map<number, number>()
  for (const r of ratios) {
    const key = Math.round(r * 50) / 50
    buckets.set(key, (buckets.get(key) || 0) + 1)
  }
  let modeRatio = ratios[0]
  let modeCount = 0
  for (const [ratio, count] of buckets) {
    if (count > modeCount) { modeRatio = ratio; modeCount = count }
  }
  return modeRatio
 }
 /**
 * Coordinate transform for 180° rotation.
 */
 function transformCellCoords180(
  x: number, y: number, w: number, h: number,
  imgW: number, imgH: number,
 ): { cx: number; cy: number } {
  return {
    cx: Math.round((100 - x - w) / 100 * imgW),
    cy: Math.round((100 - y - h) / 100 * imgH),
  }
 }
 // ---------------------------------------------------------------------------
 // Tests
 // ---------------------------------------------------------------------------
 describe('findClusters', () => {
  it('should find a single cluster', () => {
    // Simulate a projection with dark pixels from x=10 to x=50
    const proj = new Array(100).fill(0)
    for (let x = 10; x <= 50; x++) proj[x] = 10
    const clusters = findClusters(proj, 100, 100)
    expect(clusters.length).toBe(1)
    expect(clusters[0].start).toBe(10)
    expect(clusters[0].end).toBe(50)
  })
  it('should find multiple clusters separated by gaps', () => {
    const proj = new Array(200).fill(0)
    // Two word groups with a gap between
    for (let x = 10; x <= 40; x++) proj[x] = 10
    for (let x = 80; x <= 120; x++) proj[x] = 10
    const clusters = findClusters(proj, 100, 200)
    expect(clusters.length).toBe(2)
    expect(clusters[0].start).toBe(10)
    expect(clusters[1].start).toBe(80)
  })
  it('should merge clusters with small gaps', () => {
    // Gap smaller than minGap should not split clusters
    const proj = new Array(100).fill(0)
    for (let x = 10; x <= 30; x++) proj[x] = 10
    // Small gap (3px) — minGap = max(5, 100*0.02) = 5
    for (let x = 34; x <= 50; x++) proj[x] = 10
    const clusters = findClusters(proj, 100, 100)
    expect(clusters.length).toBe(1)  // merged into one cluster
  })
  it('should return empty for all-white projection', () => {
    const proj = new Array(100).fill(0)
    const clusters = findClusters(proj, 100, 100)
    expect(clusters.length).toBe(0)
  })
 })
 describe('mirrorClusters', () => {
  it('should mirror clusters for 180° rotation', () => {
    const clusters: Cluster[] = [
      { start: 10, end: 50 },
      { start: 80, end: 120 },
    ]
    const cw = 200
    const mirrored = mirrorClusters(clusters, cw)
    // Cluster at (10,50) → (cw-1-50, cw-1-10) = (149, 189)
    // Cluster at (80,120) → (cw-1-120, cw-1-80) = (79, 119)
    // After reverse: [(79,119), (149,189)]
    expect(mirrored.length).toBe(2)
    expect(mirrored[0]).toEqual({ start: 79, end: 119 })
    expect(mirrored[1]).toEqual({ start: 149, end: 189 })
  })
  it('should maintain left-to-right order after mirroring', () => {
    const clusters: Cluster[] = [
      { start: 5, end: 30 },
      { start: 50, end: 80 },
      { start: 100, end: 130 },
    ]
    const mirrored = mirrorClusters(clusters, 200)
    // After mirroring and reversing, order should be left-to-right
    for (let i = 1; i < mirrored.length; i++) {
      expect(mirrored[i].start).toBeGreaterThan(mirrored[i - 1].start)
    }
  })
  it('should handle single cluster', () => {
    const clusters: Cluster[] = [{ start: 20, end: 80 }]
    const mirrored = mirrorClusters(clusters, 200)
    expect(mirrored.length).toBe(1)
    expect(mirrored[0]).toEqual({ start: 119, end: 179 })
  })
 })
 describe('computeFontRatio', () => {
  it('should compute ratio based on cluster vs measured width', () => {
    // Cluster is 100px wide, measured text at 40px font is 200px → autoFont = 20px
    // Cell height = 30px → ratio = 20/30 = 0.667
    const ratio = computeFontRatio(100, 200, 40, 30)
    expect(ratio).toBeCloseTo(0.667, 2)
  })
  it('should cap ratio at 1.0', () => {
    // Very large cluster relative to measured text
    const ratio = computeFontRatio(400, 100, 40, 30)
    expect(ratio).toBe(1.0)
  })
  it('should handle small cluster width', () => {
    const ratio = computeFontRatio(10, 200, 40, 30)
    expect(ratio).toBeCloseTo(0.067, 2)
  })
 })
 describe('normalizeFontRatios', () => {
  it('should return the most common ratio', () => {
    const ratios = [0.5, 0.5, 0.5, 0.3, 0.3, 0.7]
    const mode = normalizeFontRatios(ratios)
    expect(mode).toBe(0.5)
  })
  it('should bucket ratios to nearest 0.02', () => {
    // 0.49 and 0.50 share the 0.50 bucket; 0.51 lands in 0.52, so the mode is still 0.50
    const ratios = [0.51, 0.49, 0.50, 0.30]
    const mode = normalizeFontRatios(ratios)
    expect(mode).toBe(0.50)
  })
  it('should handle empty array', () => {
    expect(normalizeFontRatios([])).toBe(0)
  })
  it('should handle single ratio', () => {
    expect(normalizeFontRatios([0.65])).toBe(0.66)  // rounded to nearest 0.02
  })
 })
 describe('transformCellCoords180', () => {
  it('should transform cell coordinates for 180° rotation', () => {
    // Cell at x=10%, y=20%, w=30%, h=5% on a 1000x2000 image
    const { cx, cy } = transformCellCoords180(10, 20, 30, 5, 1000, 2000)
    // Expected: cx = (100 - 10 - 30) / 100 * 1000 = 600
    //           cy = (100 - 20 - 5) / 100 * 2000 = 1500
    expect(cx).toBe(600)
    expect(cy).toBe(1500)
  })
  it('should handle cell at origin', () => {
    const { cx, cy } = transformCellCoords180(0, 0, 50, 50, 1000, 1000)
    // Expected: cx = (100 - 0 - 50) / 100 * 1000 = 500
    //           cy = (100 - 0 - 50) / 100 * 1000 = 500
    expect(cx).toBe(500)
    expect(cy).toBe(500)
  })
  it('should handle cell at bottom-right', () => {
    const { cx, cy } = transformCellCoords180(80, 90, 20, 10, 1000, 2000)
    // Expected: cx = (100 - 80 - 20) / 100 * 1000 = 0
    //           cy = (100 - 90 - 10) / 100 * 2000 = 0
    expect(cx).toBe(0)
    expect(cy).toBe(0)
  })
 })
 describe('sub-session coordinate conversion', () => {
  /**
   * Test the coordinate conversion from sub-session (box-relative)
   * to parent (page-absolute) coordinates.
   * Replicates the logic in StepReconstruction loadSessionData.
   */
  it('should convert sub-session cell coords to parent space', () => {
    const imgW = 1746
    const imgH = 2487
    // Box zone in pixels
    const box = { x: 50, y: 1145, width: 1100, height: 270 }
    // Box in percent
    const boxXPct = (box.x / imgW) * 100
    const boxYPct = (box.y / imgH) * 100
    const boxWPct = (box.width / imgW) * 100
    const boxHPct = (box.height / imgH) * 100
    // Sub-session cell at (10%, 20%, 80%, 15%) relative to box
    const subCell = { x: 10, y: 20, w: 80, h: 15 }
    const parentX = boxXPct + (subCell.x / 100) * boxWPct
    const parentY = boxYPct + (subCell.y / 100) * boxHPct
    const parentW = (subCell.w / 100) * boxWPct
    const parentH = (subCell.h / 100) * boxHPct
    // Box start in percent: x ≈ 2.86%, y ≈ 46.04%
    expect(parentX).toBeCloseTo(boxXPct + 0.1 * boxWPct, 2)
    expect(parentY).toBeCloseTo(boxYPct + 0.2 * boxHPct, 2)
    expect(parentW).toBeCloseTo(0.8 * boxWPct, 2)
    expect(parentH).toBeCloseTo(0.15 * boxHPct, 2)
    // All values should be within 0-100%
    expect(parentX).toBeGreaterThan(0)
    expect(parentY).toBeGreaterThan(0)
    expect(parentX + parentW).toBeLessThan(100)
    expect(parentY + parentH).toBeLessThan(100)
  })
  it('should place sub-cell at box origin when sub coords are 0,0', () => {
    const imgW = 1000
    const imgH = 2000
    const box = { x: 100, y: 500, width: 800, height: 200 }
    const boxXPct = (box.x / imgW) * 100  // 10%
    const boxYPct = (box.y / imgH) * 100  // 25%
    const parentX = boxXPct + (0 / 100) * ((box.width / imgW) * 100)
    const parentY = boxYPct + (0 / 100) * ((box.height / imgH) * 100)
    expect(parentX).toBeCloseTo(10, 1)
    expect(parentY).toBeCloseTo(25, 1)
  })
 })
--- a/admin-lehrer/components/ocr-pipeline/usePixelWordPositions.ts
+++ b/admin-lehrer/components/ocr-pipeline/usePixelWordPositions.ts
@@ -1,5 +1,5 @@
 import { useEffect, useState } from 'react'
-import type { GridCell } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { GridCell } from '@/app/(admin)/ai/ocr-kombi/types'
 export interface WordPosition {
  xPct: number
--- a/admin-lehrer/lib/navigation.ts
+++ b/admin-lehrer/lib/navigation.ts
@@ -49,22 +49,6 @@ export const navigation: NavCategory[] = [
        purpose: 'E-Mail-Konten verwalten und KI-Kategorisierung nutzen. IMAP/SMTP Konfiguration, Vorlagen und Audit-Log.',
        audience: ['Support', 'Admins'],
      },
      {
        id: 'video-chat',
        name: 'Video & Chat',
        href: '/communication/video-chat',
        description: 'Matrix & Jitsi Monitoring',
        purpose: 'Dashboard fuer Matrix Synapse und Jitsi Meet. Service-Status, aktive Meetings, Traffic-Analyse und Ressourcen-Empfehlungen.',
        audience: ['Admins', 'DevOps'],
      },
      {
        id: 'voice-service',
        name: 'Voice Service',
        href: '/communication/matrix',
        description: 'PersonaPlex-7B & TaskOrchestrator',
        purpose: 'Voice-First Interface Konfiguration und Architektur-Dokumentation. Live Demo, Task States, Intents und DSGVO-Informationen.',
        audience: ['Entwickler', 'Admins'],
      },
      {
        id: 'alerts',
        name: 'Alerts Monitoring',
@@ -132,24 +116,6 @@ export const navigation: NavCategory[] = [
      // -----------------------------------------------------------------------
      // KI-Werkzeuge: Standalone-Tools fuer Entwicklung & QA
      // -----------------------------------------------------------------------
      {
        id: 'ocr-compare',
        name: 'OCR Vergleich',
        href: '/ai/ocr-compare',
        description: 'OCR-Methoden & Vokabel-Extraktion',
        purpose: 'Vergleichen Sie verschiedene OCR-Methoden (lokales LLM, Vision LLM, PaddleOCR, Tesseract, Anthropic) fuer Vokabel-Extraktion. Grid-Overlay, Block-Review und LLM-Vergleich.',
        audience: ['Entwickler', 'Data Scientists', 'Lehrer'],
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'ocr-pipeline',
        name: 'OCR Pipeline',
        href: '/ai/ocr-pipeline',
        description: 'Schrittweise Seitenrekonstruktion',
        purpose: 'Schrittweise Seitenrekonstruktion: Scan begradigen, Spalten erkennen, Woerter lokalisieren und die Seite Wort fuer Wort nachbauen. 6-Schritt-Pipeline mit Ground Truth Validierung.',
        audience: ['Entwickler', 'Data Scientists'],
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'ocr-kombi',
        name: 'OCR Kombi',
@@ -159,15 +125,6 @@ export const navigation: NavCategory[] = [
        audience: ['Entwickler'],
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'ocr-overlay',
        name: 'OCR Overlay (Legacy)',
        href: '/ai/ocr-overlay',
        description: 'Ganzseitige Overlay-Rekonstruktion',
        purpose: 'Arbeitsblatt ohne Spaltenerkennung direkt als Overlay rekonstruieren. Vereinfachte 7-Schritt-Pipeline.',
        audience: ['Entwickler'],
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'test-quality',
        name: 'Test Quality (BQAS)',
@@ -178,16 +135,6 @@ export const navigation: NavCategory[] = [
        oldAdminPath: '/admin/quality',
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'gpu',
        name: 'GPU Infrastruktur',
        href: '/ai/gpu',
        description: 'vast.ai GPU Management',
        purpose: 'Verwalten Sie GPU-Instanzen auf vast.ai fuer ML-Training und Inferenz.',
        audience: ['DevOps', 'Entwickler'],
        oldAdminPath: '/admin/gpu',
        subgroup: 'KI-Werkzeuge',
      },
      // -----------------------------------------------------------------------
      // KI-Anwendungen: Endnutzer-orientierte KI-Module
      // -----------------------------------------------------------------------
@@ -209,15 +156,6 @@ export const navigation: NavCategory[] = [
        audience: ['Entwickler', 'QA'],
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'model-management',
        name: 'Model Management',
        href: '/ai/model-management',
        description: 'ONNX & PyTorch Modell-Verwaltung',
        purpose: 'Verfuegbare ML-Modelle verwalten (PyTorch vs ONNX), Backend umschalten, Benchmark-Vergleiche ausfuehren und RAM/Performance-Metriken einsehen.',
        audience: ['Entwickler', 'DevOps'],
        subgroup: 'KI-Werkzeuge',
      },
      {
        id: 'agents',
        name: 'Agent Management',
--- a/admin-lehrer/package-lock.json
+++ b/admin-lehrer/package-lock.json
--- a/admin-lehrer/package.json
+++ b/admin-lehrer/package.json
@@ -18,6 +18,8 @@
    "test:all": "vitest run && playwright test --project=chromium"
  },
  "dependencies": {
+    "@fortune-sheet/react": "^1.0.4",
+    "fabric": "^6.0.0",
    "jspdf": "^4.1.0",
    "jszip": "^3.10.1",
    "lucide-react": "^0.468.0",
@@ -26,7 +28,6 @@
    "react-dom": "^18.3.1",
    "reactflow": "^11.11.4",
    "recharts": "^2.15.0",
-    "fabric": "^6.0.0",
    "uuid": "^13.0.0"
  },
  "devDependencies": {
--- a/backend-lehrer/infra/init.py
+++ b/backend-lehrer/infra/init.py
@@ -1,10 +1 @@
-"""
+# Infrastructure module (vast.ai GPU management removed — see git history)
-Infrastructure management module.
-Provides control plane for external GPU resources (vast.ai).
-"""
-from .vast_client import VastAIClient
-from .vast_power import router as vast_router
-__all__ = ["VastAIClient", "vast_router"]
--- a/backend-lehrer/infra/vast_client.py
+++ b/backend-lehrer/infra/vast_client.py
@@ -1,419 +0,0 @@
 """
 Vast.ai REST API Client.
 Uses the official vast.ai API instead of the CLI for better stability.
 API documentation: https://docs.vast.ai/api
 """
 import asyncio
 import logging
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from enum import Enum
 from typing import Optional, Dict, Any, List
 import httpx
 logger = logging.getLogger(__name__)
 class InstanceStatus(Enum):
    """Vast.ai Instance Status."""
    RUNNING = "running"
    STOPPED = "stopped"
    EXITED = "exited"
    LOADING = "loading"
    SCHEDULING = "scheduling"
    CREATING = "creating"
    UNKNOWN = "unknown"
@dataclass
 class AccountInfo:
    """Information about the vast.ai account."""
    credit: float  # Current credit in USD
    balance: float  # Balance (usually 0)
    total_spend: float  # Total spend
    username: str
    email: str
    has_billing: bool
    @classmethod
    def from_api_response(cls, data: Dict[str, Any]) -> "AccountInfo":
        """Creates AccountInfo from an API response."""
        return cls(
            credit=data.get("credit", 0.0),
            balance=data.get("balance", 0.0),
            total_spend=abs(data.get("total_spend", 0.0)),  # API returns a negative value
            username=data.get("username", ""),
            email=data.get("email", ""),
            has_billing=data.get("has_billing", False),
        )
    def to_dict(self) -> Dict[str, Any]:
        """Serializes to a dictionary."""
        return {
            "credit": self.credit,
            "balance": self.balance,
            "total_spend": self.total_spend,
            "username": self.username,
            "email": self.email,
            "has_billing": self.has_billing,
        }
@dataclass
 class InstanceInfo:
    """Information about a vast.ai instance."""
    id: int
    status: InstanceStatus
    machine_id: Optional[int] = None
    gpu_name: Optional[str] = None
    num_gpus: int = 1
    gpu_ram: Optional[float] = None  # GB
    cpu_ram: Optional[float] = None  # GB
    disk_space: Optional[float] = None  # GB
    dph_total: Optional[float] = None  # $/hour
    public_ipaddr: Optional[str] = None
    ports: Dict[str, Any] = field(default_factory=dict)
    label: Optional[str] = None
    image_uuid: Optional[str] = None
    started_at: Optional[datetime] = None
    @classmethod
    def from_api_response(cls, data: Dict[str, Any]) -> "InstanceInfo":
        """Creates InstanceInfo from an API response."""
        status_map = {
            "running": InstanceStatus.RUNNING,
            "exited": InstanceStatus.EXITED,
            "loading": InstanceStatus.LOADING,
            "scheduling": InstanceStatus.SCHEDULING,
            "creating": InstanceStatus.CREATING,
        }
        actual_status = data.get("actual_status", "unknown")
        status = status_map.get(actual_status, InstanceStatus.UNKNOWN)
        # Parse ports mapping
        ports = {}
        if "ports" in data and data["ports"]:
            ports = data["ports"]
        # Parse started_at
        started_at = None
        if "start_date" in data and data["start_date"]:
            try:
                started_at = datetime.fromtimestamp(data["start_date"], tz=timezone.utc)
            except (ValueError, TypeError):
                pass
        return cls(
            id=data.get("id", 0),
            status=status,
            machine_id=data.get("machine_id"),
            gpu_name=data.get("gpu_name"),
            num_gpus=data.get("num_gpus", 1),
            gpu_ram=data.get("gpu_ram"),
            cpu_ram=data.get("cpu_ram"),
            disk_space=data.get("disk_space"),
            dph_total=data.get("dph_total"),
            public_ipaddr=data.get("public_ipaddr"),
            ports=ports,
            label=data.get("label"),
            image_uuid=data.get("image_uuid"),
            started_at=started_at,
        )
    def get_endpoint_url(self, internal_port: int = 8001) -> Optional[str]:
        """Computes the external URL for an internal port."""
        if not self.public_ipaddr:
            return None
        # vast.ai maps internal ports to external ports
        # Format: {"8001/tcp": [{"HostIp": "0.0.0.0", "HostPort": "12345"}]}
        port_key = f"{internal_port}/tcp"
        if port_key in self.ports:
            port_info = self.ports[port_key]
            if isinstance(port_info, list) and port_info:
                host_port = port_info[0].get("HostPort")
                if host_port:
                    return f"http://{self.public_ipaddr}:{host_port}"
        # Fallback: direct port
        return f"http://{self.public_ipaddr}:{internal_port}"
    def to_dict(self) -> Dict[str, Any]:
        """Serializes to a dictionary."""
        return {
            "id": self.id,
            "status": self.status.value,
            "machine_id": self.machine_id,
            "gpu_name": self.gpu_name,
            "num_gpus": self.num_gpus,
            "gpu_ram": self.gpu_ram,
            "cpu_ram": self.cpu_ram,
            "disk_space": self.disk_space,
            "dph_total": self.dph_total,
            "public_ipaddr": self.public_ipaddr,
            "ports": self.ports,
            "label": self.label,
            "started_at": self.started_at.isoformat() if self.started_at else None,
        }
 class VastAIClient:
    """
    Async client for the vast.ai REST API.
    Uses the official API at https://console.vast.ai/api/v0/
    """
    BASE_URL = "https://console.vast.ai/api/v0"
    def __init__(self, api_key: str, timeout: float = 30.0):
        self.api_key = api_key
        self.timeout = timeout
        self._client: Optional[httpx.AsyncClient] = None
    async def _get_client(self) -> httpx.AsyncClient:
        """Lazy client creation."""
        if self._client is None or self._client.is_closed:
            self._client = httpx.AsyncClient(
                timeout=self.timeout,
                headers={
                    "Accept": "application/json",
                },
            )
        return self._client
    async def close(self) -> None:
        """Closes the HTTP client."""
        if self._client and not self._client.is_closed:
            await self._client.aclose()
            self._client = None
    def _build_url(self, endpoint: str) -> str:
        """Builds the full URL including the API key."""
        sep = "&" if "?" in endpoint else "?"
        return f"{self.BASE_URL}{endpoint}{sep}api_key={self.api_key}"
    async def list_instances(self) -> List[InstanceInfo]:
        """Lists all instances."""
        client = await self._get_client()
        url = self._build_url("/instances/")
        try:
            response = await client.get(url)
            response.raise_for_status()
            data = response.json()
            instances = []
            if "instances" in data:
                for inst_data in data["instances"]:
                    instances.append(InstanceInfo.from_api_response(inst_data))
            return instances
        except httpx.HTTPStatusError as e:
            logger.error(f"vast.ai API error listing instances: {e}")
            raise
    async def get_instance(self, instance_id: int) -> Optional[InstanceInfo]:
        """Fetches details of a specific instance."""
        client = await self._get_client()
        url = self._build_url(f"/instances/{instance_id}/")
        try:
            response = await client.get(url)
            response.raise_for_status()
            data = response.json()
            if "instances" in data:
                instances = data["instances"]
                # The API returns a dict for a single instance and a list for multiple instances
                if isinstance(instances, list) and instances:
                    return InstanceInfo.from_api_response(instances[0])
                elif isinstance(instances, dict):
                    # Add the ID if it is missing
                    if "id" not in instances:
                        instances["id"] = instance_id
                    return InstanceInfo.from_api_response(instances)
            elif isinstance(data, dict) and "id" in data:
                return InstanceInfo.from_api_response(data)
            return None
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 404:
                return None
            logger.error(f"vast.ai API error getting instance {instance_id}: {e}")
            raise
    async def start_instance(self, instance_id: int) -> bool:
        """Starts a stopped instance."""
        client = await self._get_client()
        url = self._build_url(f"/instances/{instance_id}/")
        try:
            response = await client.put(
                url,
                json={"state": "running"},
            )
            response.raise_for_status()
            logger.info(f"vast.ai instance {instance_id} start requested")
            return True
        except httpx.HTTPStatusError as e:
            logger.error(f"vast.ai API error starting instance {instance_id}: {e}")
            return False
    async def stop_instance(self, instance_id: int) -> bool:
        """Stops a running instance (keeps the disk)."""
        client = await self._get_client()
        url = self._build_url(f"/instances/{instance_id}/")
        try:
            response = await client.put(
                url,
                json={"state": "stopped"},
            )
            response.raise_for_status()
            logger.info(f"vast.ai instance {instance_id} stop requested")
            return True
        except httpx.HTTPStatusError as e:
            logger.error(f"vast.ai API error stopping instance {instance_id}: {e}")
            return False
    async def destroy_instance(self, instance_id: int) -> bool:
        """Deletes an instance completely (the disk is lost!)."""
        client = await self._get_client()
        url = self._build_url(f"/instances/{instance_id}/")
        try:
            response = await client.delete(url)
            response.raise_for_status()
            logger.info(f"vast.ai instance {instance_id} destroyed")
            return True
        except httpx.HTTPStatusError as e:
            logger.error(f"vast.ai API error destroying instance {instance_id}: {e}")
            return False
    async def set_label(self, instance_id: int, label: str) -> bool:
        """Sets a label for an instance."""
        client = await self._get_client()
        url = self._build_url(f"/instances/{instance_id}/")
        try:
            response = await client.put(
                url,
                json={"label": label},
            )
            response.raise_for_status()
            return True
        except httpx.HTTPStatusError as e:
            logger.error(f"vast.ai API error setting label on instance {instance_id}: {e}")
            return False
    async def wait_for_status(
        self,
        instance_id: int,
        target_status: InstanceStatus,
        timeout_seconds: int = 300,
        poll_interval: float = 5.0,
    ) -> Optional[InstanceInfo]:
        """
        Waits until an instance reaches a given status.
        Returns:
            InstanceInfo if the status was reached, None on timeout.
        """
        deadline = asyncio.get_event_loop().time() + timeout_seconds
        while asyncio.get_event_loop().time() < deadline:
            instance = await self.get_instance(instance_id)
            if instance and instance.status == target_status:
                return instance
            if instance:
                logger.debug(
                    f"vast.ai instance {instance_id} status: {instance.status.value}, "
                    f"waiting for {target_status.value}"
                )
            await asyncio.sleep(poll_interval)
        logger.warning(
            f"Timeout waiting for instance {instance_id} to reach {target_status.value}"
        )
        return None
    async def wait_for_health(
        self,
        instance: InstanceInfo,
        health_path: str = "/health",
        internal_port: int = 8001,
        timeout_seconds: int = 600,
        poll_interval: float = 5.0,
    ) -> bool:
        """
        Waits until the health endpoint is reachable.
        Returns:
            True if the health check passed, False on timeout.
        """
        endpoint = instance.get_endpoint_url(internal_port)
        if not endpoint:
            logger.error("No endpoint URL available for health check")
            return False
        health_url = f"{endpoint.rstrip('/')}{health_path}"
        logger.info(f"Waiting for health at {health_url}")
        deadline = asyncio.get_event_loop().time() + timeout_seconds
        health_client = httpx.AsyncClient(timeout=5.0)
        try:
            while asyncio.get_event_loop().time() < deadline:
                try:
                    response = await health_client.get(health_url)
                    if 200 <= response.status_code < 300:
                        logger.info(f"Health check passed: {health_url}")
                        return True
                except Exception as e:
                    logger.debug(f"Health check failed: {e}")
                await asyncio.sleep(poll_interval)
            logger.warning(f"Health check timeout: {health_url}")
            return False
        finally:
            await health_client.aclose()
    async def get_account_info(self) -> Optional[AccountInfo]:
        """
        Fetches account information including credit/budget.
        Returns:
            AccountInfo, or None on error.
        """
        client = await self._get_client()
        url = self._build_url("/users/current/")
        try:
            response = await client.get(url)
            response.raise_for_status()
            data = response.json()
            return AccountInfo.from_api_response(data)
        except httpx.HTTPStatusError as e:
            logger.error(f"vast.ai API error getting account info: {e}")
            return None
        except Exception as e:
            logger.error(f"Error getting vast.ai account info: {e}")
            return None
--- a/backend-lehrer/infra/vast_power.py
+++ b/backend-lehrer/infra/vast_power.py
@@ -1,618 +0,0 @@
 """
 Vast.ai Power Control API.
 Provides endpoints for:
 - Starting/stopping vast.ai instances
 - Status queries
 - Auto-shutdown on inactivity
 - Cost tracking
 Security: all endpoints require CONTROL_API_KEY.
 """
 import asyncio
 import json
 import logging
 import os
 import time
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Optional, Dict, Any, List
 from fastapi import APIRouter, Depends, HTTPException, Header, BackgroundTasks
 from pydantic import BaseModel, Field
 from .vast_client import VastAIClient, InstanceInfo, InstanceStatus, AccountInfo
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/infra/vast", tags=["Infrastructure"])
 # -------------------------
 # Configuration (ENV)
 # -------------------------
 VAST_API_KEY = os.getenv("VAST_API_KEY")
 VAST_INSTANCE_ID = os.getenv("VAST_INSTANCE_ID")  # Numeric instance ID
 CONTROL_API_KEY = os.getenv("CONTROL_API_KEY")  # Admin key for these endpoints
 # Health check configuration
 VAST_HEALTH_PORT = int(os.getenv("VAST_HEALTH_PORT", "8001"))
 VAST_HEALTH_PATH = os.getenv("VAST_HEALTH_PATH", "/health")
 VAST_WAIT_TIMEOUT_S = int(os.getenv("VAST_WAIT_TIMEOUT_S", "600"))  # 10 min
 # Auto-shutdown configuration
 AUTO_SHUTDOWN_ENABLED = os.getenv("VAST_AUTO_SHUTDOWN", "true").lower() == "true"
 AUTO_SHUTDOWN_MINUTES = int(os.getenv("VAST_AUTO_SHUTDOWN_MINUTES", "30"))
 # State persistence (in /tmp for container compatibility)
 STATE_PATH = Path(os.getenv("VAST_STATE_PATH", "/tmp/vast_state.json"))
 AUDIT_PATH = Path(os.getenv("VAST_AUDIT_PATH", "/tmp/vast_audit.log"))
 # -------------------------
 # State Management
 # -------------------------
 class VastState:
    """
    Persistent state for vast.ai control.
    Stores:
    - Current endpoint (because the IP can change)
    - Last activity (for auto-shutdown)
    - Cost tracking
    """
    def __init__(self, path: Path = STATE_PATH):
        self.path = path
        self._state: Dict[str, Any] = self._load()
    def _load(self) -> Dict[str, Any]:
        """Loads state from disk."""
        if not self.path.exists():
            return {
                "desired_state": None,
                "endpoint_base_url": None,
                "last_activity": None,
                "last_start": None,
                "last_stop": None,
                "total_runtime_seconds": 0,
                "total_cost_usd": 0.0,
            }
        try:
            return json.loads(self.path.read_text(encoding="utf-8"))
        except Exception:
            return {}
    def _save(self) -> None:
        """Saves state to disk."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(
            json.dumps(self._state, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
    def get(self, key: str, default: Any = None) -> Any:
        return self._state.get(key, default)
    def set(self, key: str, value: Any) -> None:
        self._state[key] = value
        self._save()
    def update(self, data: Dict[str, Any]) -> None:
        self._state.update(data)
        self._save()
    def record_activity(self) -> None:
        """Records the last activity (for auto-shutdown)."""
        self._state["last_activity"] = datetime.now(timezone.utc).isoformat()
        self._save()
    def get_last_activity(self) -> Optional[datetime]:
        """Returns the last activity as a datetime."""
        ts = self._state.get("last_activity")
        if ts:
            return datetime.fromisoformat(ts)
        return None
    def record_start(self) -> None:
        """Records the start time."""
        self._state["last_start"] = datetime.now(timezone.utc).isoformat()
        self._state["desired_state"] = "RUNNING"
        self._save()
    def record_stop(self, dph_total: Optional[float] = None) -> None:
        """Records the stop time and computes the cost."""
        now = datetime.now(timezone.utc)
        self._state["last_stop"] = now.isoformat()
        self._state["desired_state"] = "STOPPED"
        # Compute runtime and cost
        last_start = self._state.get("last_start")
        if last_start:
            start_dt = datetime.fromisoformat(last_start)
            runtime_seconds = (now - start_dt).total_seconds()
            self._state["total_runtime_seconds"] = (
                self._state.get("total_runtime_seconds", 0) + runtime_seconds
            )
            if dph_total:
                hours = runtime_seconds / 3600
                cost = hours * dph_total
                self._state["total_cost_usd"] = (
                    self._state.get("total_cost_usd", 0.0) + cost
                )
                logger.info(
                    f"Session cost: ${cost:.3f} ({runtime_seconds/60:.1f} min @ ${dph_total}/h)"
                )
        self._save()
 # Global state instance
 _state = VastState()
 # -------------------------
 # Audit Logging
 # -------------------------
 def audit_log(event: str, actor: str = "system", meta: Optional[Dict[str, Any]] = None) -> None:
    """Writes an audit log entry."""
    meta = meta or {}
    line = json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "actor": actor,
            "meta": meta,
        },
        ensure_ascii=False,
    )
    AUDIT_PATH.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_PATH.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    logger.info(f"AUDIT: {event} by {actor}")
 # -------------------------
 # Request/Response Models
 # -------------------------
 class PowerOnRequest(BaseModel):
    wait_for_health: bool = Field(default=True, description="Warten bis LLM bereit")
    health_path: str = Field(default=VAST_HEALTH_PATH)
    health_port: int = Field(default=VAST_HEALTH_PORT)
 class PowerOnResponse(BaseModel):
    status: str
    instance_id: Optional[int] = None
    endpoint_base_url: Optional[str] = None
    health_url: Optional[str] = None
    message: Optional[str] = None
 class PowerOffRequest(BaseModel):
    pass  # No parameters needed
 class PowerOffResponse(BaseModel):
    status: str
    session_runtime_minutes: Optional[float] = None
    session_cost_usd: Optional[float] = None
    message: Optional[str] = None
 class VastStatusResponse(BaseModel):
    instance_id: Optional[int] = None
    status: str
    gpu_name: Optional[str] = None
    dph_total: Optional[float] = None
    endpoint_base_url: Optional[str] = None
    last_activity: Optional[str] = None
    auto_shutdown_in_minutes: Optional[int] = None
    total_runtime_hours: Optional[float] = None
    total_cost_usd: Optional[float] = None
    # Budget / credit information
    account_credit: Optional[float] = None  # Remaining credit in USD
    account_total_spend: Optional[float] = None  # Total spend on vast.ai
    # Session cost (since last start)
    session_runtime_minutes: Optional[float] = None
    session_cost_usd: Optional[float] = None
    message: Optional[str] = None
 class CostStatsResponse(BaseModel):
    total_runtime_hours: float
    total_cost_usd: float
    sessions_count: int
    avg_session_minutes: float
 # -------------------------
 # Security Dependency
 # -------------------------
 def require_control_key(x_api_key: Optional[str] = Header(default=None)) -> None:
    """
    Admin protection for control endpoints.
    Header: X-API-Key: <CONTROL_API_KEY>
    """
    if not CONTROL_API_KEY:
        raise HTTPException(
            status_code=500,
            detail="CONTROL_API_KEY not configured on server",
        )
    if x_api_key != CONTROL_API_KEY:
        raise HTTPException(status_code=401, detail="Unauthorized")
 # -------------------------
 # Auto-Shutdown Background Task
 # -------------------------
 _shutdown_task: Optional[asyncio.Task] = None
 async def auto_shutdown_monitor() -> None:
    """
    Background task that stops the instance on inactivity.
    Runs continuously while the instance is on and checks every 60 s whether
    any activity occurred. Stops the instance if there has been no activity
    for AUTO_SHUTDOWN_MINUTES.
    """
    if not VAST_API_KEY or not VAST_INSTANCE_ID:
        return
    client = VastAIClient(VAST_API_KEY)
    try:
        while True:
            await asyncio.sleep(60)  # Check every minute
            if not AUTO_SHUTDOWN_ENABLED:
                continue
            last_activity = _state.get_last_activity()
            if not last_activity:
                continue
            # Compute inactivity
            now = datetime.now(timezone.utc)
            inactive_minutes = (now - last_activity).total_seconds() / 60
            if inactive_minutes >= AUTO_SHUTDOWN_MINUTES:
                logger.info(
                    f"Auto-shutdown triggered: {inactive_minutes:.1f} min inactive"
                )
                audit_log(
                    "auto_shutdown",
                    actor="system",
                    meta={"inactive_minutes": inactive_minutes},
                )
                # Fetch current instance info for cost calculation
                instance = await client.get_instance(int(VAST_INSTANCE_ID))
                dph = instance.dph_total if instance else None
                # Stop
                await client.stop_instance(int(VAST_INSTANCE_ID))
                _state.record_stop(dph_total=dph)
                audit_log("auto_shutdown_complete", actor="system")
    except asyncio.CancelledError:
        pass
    except Exception as e:
        logger.error(f"Auto-shutdown monitor error: {e}")
    finally:
        await client.close()
 def start_auto_shutdown_monitor() -> None:
    """Starts the auto-shutdown monitor."""
    global _shutdown_task
    if _shutdown_task is None or _shutdown_task.done():
        _shutdown_task = asyncio.create_task(auto_shutdown_monitor())
        logger.info("Auto-shutdown monitor started")
 def stop_auto_shutdown_monitor() -> None:
    """Stops the auto-shutdown monitor."""
    global _shutdown_task
    if _shutdown_task and not _shutdown_task.done():
        _shutdown_task.cancel()
        logger.info("Auto-shutdown monitor stopped")
 # -------------------------
 # API Endpoints
 # -------------------------
@router.get("/status", response_model=VastStatusResponse, dependencies=[Depends(require_control_key)])
 async def get_status() -> VastStatusResponse:
    """
    Returns the status of the vast.ai instance.
    Includes:
    - Current status (running/stopped/etc.)
    - GPU info and cost per hour
    - Endpoint URL
    - Auto-shutdown timer
    - Total cost
    - Account credit (remaining budget)
    - Session cost (since last start)
    """
    if not VAST_API_KEY or not VAST_INSTANCE_ID:
        return VastStatusResponse(
            status="unconfigured",
            message="VAST_API_KEY or VAST_INSTANCE_ID not set",
        )
    client = VastAIClient(VAST_API_KEY)
    try:
        instance = await client.get_instance(int(VAST_INSTANCE_ID))
        if not instance:
            return VastStatusResponse(
                instance_id=int(VAST_INSTANCE_ID),
                status="not_found",
                message=f"Instance {VAST_INSTANCE_ID} not found",
            )
        # Fetch account info for budget/credit
        account_info = await client.get_account_info()
        account_credit = account_info.credit if account_info else None
        account_total_spend = account_info.total_spend if account_info else None
        # Update endpoint if running
        endpoint = None
        if instance.status == InstanceStatus.RUNNING:
            endpoint = instance.get_endpoint_url(VAST_HEALTH_PORT)
            if endpoint:
                _state.set("endpoint_base_url", endpoint)
        # Calculate auto-shutdown timer
        auto_shutdown_minutes = None
        if AUTO_SHUTDOWN_ENABLED and instance.status == InstanceStatus.RUNNING:
            last_activity = _state.get_last_activity()
            if last_activity:
                inactive = (datetime.now(timezone.utc) - last_activity).total_seconds() / 60
                auto_shutdown_minutes = max(0, int(AUTO_SHUTDOWN_MINUTES - inactive))
        # Compute current session cost (if the instance is running)
        session_runtime_minutes = None
        session_cost_usd = None
        last_start = _state.get("last_start")
        # If the instance is running but last_start is unset (e.g. after a container restart),
        # use the start date from the vast.ai API if available, otherwise the current time
        if instance.status == InstanceStatus.RUNNING and not last_start:
            if instance.started_at:
                _state.set("last_start", instance.started_at.isoformat())
                last_start = instance.started_at.isoformat()
            else:
                _state.record_start()
                last_start = _state.get("last_start")
        if last_start and instance.status == InstanceStatus.RUNNING:
            start_dt = datetime.fromisoformat(last_start)
            session_runtime_minutes = (datetime.now(timezone.utc) - start_dt).total_seconds() / 60
            if instance.dph_total:
                session_cost_usd = (session_runtime_minutes / 60) * instance.dph_total
        return VastStatusResponse(
            instance_id=instance.id,
            status=instance.status.value,
            gpu_name=instance.gpu_name,
            dph_total=instance.dph_total,
            endpoint_base_url=endpoint or _state.get("endpoint_base_url"),
            last_activity=_state.get("last_activity"),
            auto_shutdown_in_minutes=auto_shutdown_minutes,
            total_runtime_hours=_state.get("total_runtime_seconds", 0) / 3600,
            total_cost_usd=_state.get("total_cost_usd", 0.0),
            account_credit=account_credit,
            account_total_spend=account_total_spend,
            session_runtime_minutes=session_runtime_minutes,
            session_cost_usd=session_cost_usd,
        )
    finally:
        await client.close()
@router.post("/power/on", response_model=PowerOnResponse, dependencies=[Depends(require_control_key)])
 async def power_on(
    payload: PowerOnRequest,
    background_tasks: BackgroundTasks,
 ) -> PowerOnResponse:
    """
    Starts the vast.ai instance.
    1. Starts the instance via the API
    2. Waits for status RUNNING
    3. Optional: waits for the health endpoint
    4. Starts the auto-shutdown monitor
    """
    if not VAST_API_KEY or not VAST_INSTANCE_ID:
        raise HTTPException(
            status_code=500,
            detail="VAST_API_KEY or VAST_INSTANCE_ID not configured",
        )
    instance_id = int(VAST_INSTANCE_ID)
    audit_log("power_on_requested", meta={"instance_id": instance_id})
    client = VastAIClient(VAST_API_KEY)
    try:
        # Start instance
        success = await client.start_instance(instance_id)
        if not success:
            raise HTTPException(status_code=502, detail="Failed to start instance")
        _state.record_start()
        _state.record_activity()
        # Wait for running status
        instance = await client.wait_for_status(
            instance_id,
            InstanceStatus.RUNNING,
            timeout_seconds=300,
        )
        if not instance:
            return PowerOnResponse(
                status="starting",
                instance_id=instance_id,
                message="Instance start requested but not yet running. Check status.",
            )
        # Get endpoint
        endpoint = instance.get_endpoint_url(payload.health_port)
        if endpoint:
            _state.set("endpoint_base_url", endpoint)
        # Wait for health if requested
        if payload.wait_for_health:
            health_ok = await client.wait_for_health(
                instance,
                health_path=payload.health_path,
                internal_port=payload.health_port,
                timeout_seconds=VAST_WAIT_TIMEOUT_S,
            )
            if not health_ok:
                audit_log("power_on_health_timeout", meta={"instance_id": instance_id})
                return PowerOnResponse(
                    status="running_unhealthy",
                    instance_id=instance_id,
                    endpoint_base_url=endpoint,
                    message=f"Instance running but health check failed at {endpoint}{payload.health_path}",
                )
        # Start auto-shutdown monitor
        start_auto_shutdown_monitor()
        audit_log("power_on_complete", meta={
            "instance_id": instance_id,
            "endpoint": endpoint,
        })
        return PowerOnResponse(
            status="running",
            instance_id=instance_id,
            endpoint_base_url=endpoint,
            health_url=f"{endpoint}{payload.health_path}" if endpoint else None,
            message="Instance running and healthy",
        )
    finally:
        await client.close()
@router.post("/power/off", response_model=PowerOffResponse, dependencies=[Depends(require_control_key)])
 async def power_off(payload: PowerOffRequest) -> PowerOffResponse:
    """
    Stops the vast.ai instance (keeps the disk).
    Computes session cost and runtime.
    """
    if not VAST_API_KEY or not VAST_INSTANCE_ID:
        raise HTTPException(
            status_code=500,
            detail="VAST_API_KEY or VAST_INSTANCE_ID not configured",
        )
    instance_id = int(VAST_INSTANCE_ID)
    audit_log("power_off_requested", meta={"instance_id": instance_id})
    # Stop auto-shutdown monitor
    stop_auto_shutdown_monitor()
    client = VastAIClient(VAST_API_KEY)
    try:
        # Get current info for cost calculation
        instance = await client.get_instance(instance_id)
        dph = instance.dph_total if instance else None
        # Calculate session stats before updating state
        session_runtime = 0.0
        session_cost = 0.0
        last_start = _state.get("last_start")
        if last_start:
            start_dt = datetime.fromisoformat(last_start)
            session_runtime = (datetime.now(timezone.utc) - start_dt).total_seconds() / 60
            if dph:
                session_cost = (session_runtime / 60) * dph
        # Stop instance
        success = await client.stop_instance(instance_id)
        if not success:
            raise HTTPException(status_code=502, detail="Failed to stop instance")
        _state.record_stop(dph_total=dph)
        audit_log("power_off_complete", meta={
            "instance_id": instance_id,
            "session_minutes": session_runtime,
            "session_cost": session_cost,
        })
        return PowerOffResponse(
            status="stopped",
            session_runtime_minutes=session_runtime,
            session_cost_usd=session_cost,
            message=f"Instance stopped. Session: {session_runtime:.1f} min, ${session_cost:.3f}",
        )
    finally:
        await client.close()
@router.post("/activity", dependencies=[Depends(require_control_key)])
 async def record_activity() -> Dict[str, str]:
    """
    Records activity (postpones auto-shutdown).
    Should be called by the LLM Gateway on every request.
    """
    _state.record_activity()
    return {"status": "recorded", "last_activity": _state.get("last_activity")}
@router.get("/costs", response_model=CostStatsResponse, dependencies=[Depends(require_control_key)])
 async def get_costs() -> CostStatsResponse:
    """
    Returns cost statistics.
    """
    total_seconds = _state.get("total_runtime_seconds", 0)
    total_cost = _state.get("total_cost_usd", 0.0)
    # TODO: Sessions count from audit log
    sessions = 1 if total_seconds > 0 else 0
    avg_minutes = (total_seconds / 60 / sessions) if sessions > 0 else 0
    return CostStatsResponse(
        total_runtime_hours=total_seconds / 3600,
        total_cost_usd=total_cost,
        sessions_count=sessions,
        avg_session_minutes=avg_minutes,
    )
@router.get("/audit", dependencies=[Depends(require_control_key)])
 async def get_audit_log(limit: int = 50) -> List[Dict[str, Any]]:
    """
    Returns the most recent audit log entries.
    """
    if not AUDIT_PATH.exists():
        return []
    lines = AUDIT_PATH.read_text(encoding="utf-8").strip().split("\n")
    entries = []
    for line in lines[-limit:]:
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return list(reversed(entries))  # Newest first
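The status and power-off handlers above derive two values from stored timestamps: the minutes left before auto-shutdown and the cost of the current session. A minimal standalone sketch of that arithmetic follows. The helper names `minutes_until_shutdown` and `session_cost_usd` are hypothetical and not part of the module, and the example assumes timezone-aware UTC timestamps throughout.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def minutes_until_shutdown(last_activity: datetime, limit_minutes: int, now: datetime) -> int:
    """Whole minutes left before an idle instance is shut down (never negative)."""
    inactive = (now - last_activity).total_seconds() / 60
    return max(0, int(limit_minutes - inactive))


def session_cost_usd(started_at: datetime, dph_total: Optional[float], now: datetime) -> Optional[float]:
    """Cost of the running session: runtime in hours times dollars per hour."""
    if not dph_total:
        # Mirrors the handlers: no hourly price means no cost estimate.
        return None
    runtime_minutes = (now - started_at).total_seconds() / 60
    return (runtime_minutes / 60) * dph_total


# Fixed "now" so the example is deterministic (an assumption for illustration).
now = datetime(2026, 4, 24, 12, 0, tzinfo=timezone.utc)
remaining = minutes_until_shutdown(now - timedelta(minutes=12), limit_minutes=30, now=now)
cost = session_cost_usd(now - timedelta(minutes=90), dph_total=0.40, now=now)
```

Passing `now` in explicitly keeps both helpers pure, which makes the shutdown and cost logic testable without patching the clock.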
--- a/backend-lehrer/jitsi_api.py
+++ b/backend-lehrer/jitsi_api.py
@@ -1,199 +0,0 @@
 """
 BreakPilot Jitsi API
 Enables sending Jitsi meeting invitations by email.
 """
 import os
 import uuid
 from datetime import datetime
 from typing import Optional, List
 from pydantic import BaseModel, Field
 from fastapi import APIRouter, HTTPException
 router = APIRouter(prefix="/api/jitsi", tags=["Jitsi"])
 # Default Jitsi server (configurable)
 JITSI_SERVER = os.getenv("JITSI_SERVER", "https://meet.jit.si")
 # ==========================================
 # PYDANTIC MODELS
 # ==========================================
 class JitsiInvitation(BaseModel):
    """Model for a Jitsi meeting invitation."""
    to_email: str = Field(..., description="Email address of the participant")
    to_name: str = Field(..., description="Name of the participant")
    organizer_name: str = Field(default="BreakPilot Lehrer", description="Name of the organizer")
    meeting_title: str = Field(..., description="Title of the meeting")
    meeting_date: str = Field(..., description="Date, e.g. '20. Dezember 2024'")
    meeting_time: str = Field(..., description="Time, e.g. '14:00 Uhr'")
    room_name: Optional[str] = Field(None, description="Room name (generated if empty)")
    additional_info: Optional[str] = Field(None, description="Additional information")
 class JitsiInvitationResponse(BaseModel):
    """Response to a Jitsi invitation."""
    success: bool
    jitsi_url: str
    room_name: str
    email_sent: bool
    email_error: Optional[str] = None
 class JitsiBulkInvitation(BaseModel):
    """Model for multiple Jitsi invitations."""
    recipients: List[dict] = Field(..., description="List of {email, name} objects")
    organizer_name: str = Field(default="BreakPilot Lehrer")
    meeting_title: str
    meeting_date: str
    meeting_time: str
    room_name: Optional[str] = None
    additional_info: Optional[str] = None
 class JitsiBulkResponse(BaseModel):
    """Response to bulk invitations."""
    jitsi_url: str
    room_name: str
    sent: int
    failed: int
    errors: List[str]
 # ==========================================
 # HELPER FUNCTIONS
 # ==========================================
 def generate_room_name() -> str:
    """Generates a secure room name."""
    # UUID-based for security
    unique_id = uuid.uuid4().hex[:12]
    return f"BreakPilot-{unique_id}"
 def build_jitsi_url(room_name: str) -> str:
    """Builds the full Jitsi URL."""
    return f"{JITSI_SERVER}/{room_name}"
 # ==========================================
 # API ENDPOINTS
 # ==========================================
@router.post("/invite", response_model=JitsiInvitationResponse)
 async def send_jitsi_invitation(invitation: JitsiInvitation):
    """
    Sends a Jitsi meeting invitation by email.
    The recipient can join the meeting from the browser
    without having to install Matrix or other software.
    """
    # Generate a room name or use the one provided
    room_name = invitation.room_name or generate_room_name()
    jitsi_url = build_jitsi_url(room_name)
    email_sent = False
    email_error = None
    try:
        from email_service import email_service
        result = email_service.send_jitsi_invitation(
            to_email=invitation.to_email,
            to_name=invitation.to_name,
            organizer_name=invitation.organizer_name,
            meeting_title=invitation.meeting_title,
            meeting_date=invitation.meeting_date,
            meeting_time=invitation.meeting_time,
            jitsi_url=jitsi_url,
            additional_info=invitation.additional_info
        )
        email_sent = result.success
        if not result.success:
            email_error = result.error
    except Exception as e:
        email_error = str(e)
    return JitsiInvitationResponse(
        success=email_sent,
        jitsi_url=jitsi_url,
        room_name=room_name,
        email_sent=email_sent,
        email_error=email_error
    )
@router.post("/invite/bulk", response_model=JitsiBulkResponse)
 async def send_bulk_jitsi_invitations(bulk: JitsiBulkInvitation):
    """
    Sends Jitsi invitations to multiple recipients.
    All recipients receive an invitation to the same meeting.
    """
    # Shared room name for all recipients
    room_name = bulk.room_name or generate_room_name()
    jitsi_url = build_jitsi_url(room_name)
    sent = 0
    failed = 0
    errors = []
    try:
        from email_service import email_service
        for recipient in bulk.recipients:
            if not recipient.get("email"):
                errors.append(f"Fehlende Email fuer {recipient.get('name', 'Unbekannt')}")
                failed += 1
                continue
            result = email_service.send_jitsi_invitation(
                to_email=recipient["email"],
                to_name=recipient.get("name", ""),
                organizer_name=bulk.organizer_name,
                meeting_title=bulk.meeting_title,
                meeting_date=bulk.meeting_date,
                meeting_time=bulk.meeting_time,
                jitsi_url=jitsi_url,
                additional_info=bulk.additional_info
            )
            if result.success:
                sent += 1
            else:
                failed += 1
                errors.append(f"{recipient.get('email')}: {result.error}")
    except Exception as e:
        errors.append(f"Allgemeiner Fehler: {str(e)}")
    return JitsiBulkResponse(
        jitsi_url=jitsi_url,
        room_name=room_name,
        sent=sent,
        failed=failed,
        errors=errors[:20]  # Return at most 20 errors
    )
@router.get("/room")
 async def generate_meeting_room():
    """
    Generates a new meeting room.
    Returns the URL without sending invitations.
    """
    room_name = generate_room_name()
    jitsi_url = build_jitsi_url(room_name)
    return {
        "room_name": room_name,
        "jitsi_url": jitsi_url,
        "server": JITSI_SERVER,
        "created_at": datetime.utcnow().isoformat()
    }
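The bulk endpoint in the removed `jitsi_api.py` treats each recipient independently: a missing address or a failed send is counted and reported, and the loop continues. The sketch below isolates that tallying pattern with the standard library only. `tally_bulk_send` and `fake_send` are hypothetical names for illustration, and `fake_send` merely stands in for the real email service, whose interface is not shown here.

```python
from typing import Callable, Dict, List, Tuple


def tally_bulk_send(
    recipients: List[Dict[str, str]],
    send: Callable[[str, str], Tuple[bool, str]],
    max_errors: int = 20,
) -> Dict[str, object]:
    """Send to each recipient, counting successes and failures independently."""
    sent, failed, errors = 0, 0, []
    for recipient in recipients:
        email = recipient.get("email")
        if not email:
            # A recipient without an address is a failure, not a crash.
            failed += 1
            errors.append(f"Missing email for {recipient.get('name', 'unknown')}")
            continue
        ok, error = send(email, recipient.get("name", ""))
        if ok:
            sent += 1
        else:
            failed += 1
            errors.append(f"{email}: {error}")
    # Cap the error list so a large batch cannot bloat the response.
    return {"sent": sent, "failed": failed, "errors": errors[:max_errors]}


def fake_send(email: str, name: str) -> Tuple[bool, str]:
    # Hypothetical stand-in for the email service: rejects one test domain.
    if email.endswith("@invalid.test"):
        return (False, "mailbox unavailable")
    return (True, "")


result = tally_bulk_send(
    [{"email": "a@example.org", "name": "A"}, {"name": "B"}, {"email": "c@invalid.test"}],
    fake_send,
)
```

Injecting the sender as a callable is a design choice of this sketch; it lets the counting logic be exercised without an SMTP connection.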
--- a/backend-lehrer/learning_units_api.py
+++ b/backend-lehrer/learning_units_api.py
@@ -1,5 +1,9 @@
 from typing import List, Dict, Any, Optional
 from datetime import datetime
 from pathlib import Path
 import json
 import os
 import logging
 from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel
@@ -15,6 +19,8 @@ from learning_units import (
    delete_learning_unit,
 )
 logger = logging.getLogger(__name__)
 router = APIRouter(
    prefix="/learning-units",
@@ -49,6 +55,11 @@ class RemoveWorksheetPayload(BaseModel):
    worksheet_file: str
 class GenerateFromAnalysisPayload(BaseModel):
    analysis_data: Dict[str, Any]
    num_questions: int = 8
 # ---------- Helper: backend model -> frontend object ----------
@@ -195,3 +206,171 @@ def api_delete_learning_unit(unit_id: str):
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    return {"status": "deleted", "id": unit_id}
 # ---------- Generator endpoints ----------
 LERNEINHEITEN_DIR = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten")
 def _save_analysis_and_get_path(unit_id: str, analysis_data: Dict[str, Any]) -> Path:
    """Save analysis_data to disk and return the path."""
    os.makedirs(LERNEINHEITEN_DIR, exist_ok=True)
    path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_analyse.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(analysis_data, f, ensure_ascii=False, indent=2)
    return path
@router.post("/{unit_id}/generate-qa")
 def api_generate_qa(unit_id: str, payload: GenerateFromAnalysisPayload):
    """Generate Q&A items with Leitner fields from analysis data."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)
    try:
        from ai_processing.qa_generator import generate_qa_from_analysis
        qa_path = generate_qa_from_analysis(analysis_path, num_questions=payload.num_questions)
        with open(qa_path, "r", encoding="utf-8") as f:
            qa_data = json.load(f)
        # Update unit status
        update_learning_unit(unit_id, LearningUnitUpdate(status="qa_generated"))
        logger.info(f"Generated QA for unit {unit_id}: {len(qa_data.get('qa_items', []))} items")
        return qa_data
    except Exception as e:
        logger.error(f"QA generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"QA-Generierung fehlgeschlagen: {e}")
@router.post("/{unit_id}/generate-mc")
 def api_generate_mc(unit_id: str, payload: GenerateFromAnalysisPayload):
    """Generate multiple choice questions from analysis data."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)
    try:
        from ai_processing.mc_generator import generate_mc_from_analysis
        mc_path = generate_mc_from_analysis(analysis_path, num_questions=payload.num_questions)
        with open(mc_path, "r", encoding="utf-8") as f:
            mc_data = json.load(f)
        update_learning_unit(unit_id, LearningUnitUpdate(status="mc_generated"))
        logger.info(f"Generated MC for unit {unit_id}: {len(mc_data.get('questions', []))} questions")
        return mc_data
    except Exception as e:
        logger.error(f"MC generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"MC-Generierung fehlgeschlagen: {e}")
@router.post("/{unit_id}/generate-cloze")
 def api_generate_cloze(unit_id: str, payload: GenerateFromAnalysisPayload):
    """Generate cloze (fill-in-the-blank) items from analysis data."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)
    try:
        from ai_processing.cloze_generator import generate_cloze_from_analysis
        cloze_path = generate_cloze_from_analysis(analysis_path)
        with open(cloze_path, "r", encoding="utf-8") as f:
            cloze_data = json.load(f)
        update_learning_unit(unit_id, LearningUnitUpdate(status="cloze_generated"))
        logger.info(f"Generated Cloze for unit {unit_id}: {len(cloze_data.get('cloze_items', []))} items")
        return cloze_data
    except Exception as e:
        logger.error(f"Cloze generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"Cloze-Generierung fehlgeschlagen: {e}")
@router.get("/{unit_id}/qa")
 def api_get_qa(unit_id: str):
    """Get generated QA items for a unit."""
    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
    if not qa_path.exists():
        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
    with open(qa_path, "r", encoding="utf-8") as f:
        return json.load(f)
@router.get("/{unit_id}/mc")
 def api_get_mc(unit_id: str):
    """Get generated MC questions for a unit."""
    mc_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_mc.json"
    if not mc_path.exists():
        raise HTTPException(status_code=404, detail="Keine MC-Daten gefunden.")
    with open(mc_path, "r", encoding="utf-8") as f:
        return json.load(f)
@router.get("/{unit_id}/cloze")
 def api_get_cloze(unit_id: str):
    """Get generated cloze items for a unit."""
    cloze_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_cloze.json"
    if not cloze_path.exists():
        raise HTTPException(status_code=404, detail="Keine Cloze-Daten gefunden.")
    with open(cloze_path, "r", encoding="utf-8") as f:
        return json.load(f)
@router.post("/{unit_id}/leitner/update")
 def api_update_leitner(unit_id: str, item_id: str, correct: bool):
    """Update Leitner progress for a QA item."""
    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
    if not qa_path.exists():
        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
    try:
        from ai_processing.qa_generator import update_leitner_progress
        result = update_leitner_progress(qa_path, item_id, correct)
        return result
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
@router.get("/{unit_id}/leitner/next")
 def api_get_next_review(unit_id: str, limit: int = 5):
    """Get next Leitner review items."""
    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
    if not qa_path.exists():
        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
    try:
        from ai_processing.qa_generator import get_next_review_items
        items = get_next_review_items(qa_path, limit=limit)
        return {"items": items, "count": len(items)}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
 class StoryGeneratePayload(BaseModel):
    vocabulary: List[Dict[str, Any]]
    language: str = "en"
    grade_level: str = "5-8"
@router.post("/{unit_id}/generate-story")
 def api_generate_story(unit_id: str, payload: StoryGeneratePayload):
    """Generate a short story using vocabulary words."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    try:
        from story_generator import generate_story
        result = generate_story(
            vocabulary=payload.vocabulary,
            language=payload.language,
            grade_level=payload.grade_level,
        )
        return result
    except Exception as e:
        logger.error(f"Story generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"Story-Generierung fehlgeschlagen: {e}")
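The Leitner endpoints above delegate to `update_leitner_progress` and `get_next_review_items`, whose implementations are not part of this diff. As a rough illustration of what a Leitner scheme usually does, here is a minimal sketch of the classic rule: a correct answer moves an item up one box, a wrong answer sends it back to box 1, and higher boxes are reviewed less often. The five boxes, the interval table, and all names below are assumptions for illustration and are not taken from the repository.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days per box; not taken from the repository.
BOX_INTERVALS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}


@dataclass
class LeitnerItem:
    item_id: str
    box: int = 1
    next_review: date = date.min  # date.min means "due immediately"


def update_item(item: LeitnerItem, correct: bool, today: date) -> LeitnerItem:
    """Promote on a correct answer (capped at the top box), reset to box 1 otherwise."""
    item.box = min(item.box + 1, max(BOX_INTERVALS)) if correct else 1
    item.next_review = today + timedelta(days=BOX_INTERVALS[item.box])
    return item


def next_review_items(items, today: date, limit: int = 5):
    """Items that are due, lowest box first, truncated to `limit`."""
    due = [i for i in items if i.next_review <= today]
    return sorted(due, key=lambda i: (i.box, i.next_review))[:limit]


today = date(2026, 4, 24)
card = update_item(LeitnerItem("q1", box=2), correct=True, today=today)
missed = update_item(LeitnerItem("q2", box=4), correct=False, today=today)
```

Sorting due items by box puts the least-known material first, which is one common choice; the real generator module may order or schedule items differently.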
--- a/backend-lehrer/main.py
+++ b/backend-lehrer/main.py
@@ -40,7 +40,6 @@ os.environ["DATABASE_URL"] = DATABASE_URL
 # ---------------------------------------------------------------------------
 LLM_GATEWAY_ENABLED = os.getenv("LLM_GATEWAY_ENABLED", "false").lower() == "true"
 ALERTS_AGENT_ENABLED = os.getenv("ALERTS_AGENT_ENABLED", "false").lower() == "true"
 VAST_API_KEY = os.getenv("VAST_API_KEY")
 # ---------------------------------------------------------------------------
@@ -106,21 +105,20 @@ app.include_router(correction_router, prefix="/api")
 from learning_units_api import router as learning_units_router
 app.include_router(learning_units_router, prefix="/api")
 # --- 4b. Learning Progress ---
 from progress_api import router as progress_router
 app.include_router(progress_router, prefix="/api")
 from unit_api import router as unit_router
 app.include_router(unit_router)  # Already has /api/units prefix
 from unit_analytics_api import router as unit_analytics_router
 app.include_router(unit_analytics_router)  # Already has /api/analytics prefix
 # --- 5. Meetings / Jitsi ---
 from meetings_api import router as meetings_api_router
 app.include_router(meetings_api_router)  # Already has /api/meetings prefix
 from recording_api import router as recording_api_router
 app.include_router(recording_api_router)  # Already has /api/recordings prefix
 from jitsi_api import router as jitsi_router
 app.include_router(jitsi_router)  # Already has /api/jitsi prefix
 # --- 6. Messenger ---
 from messenger_api import router as messenger_router
@@ -180,11 +178,6 @@ if ALERTS_AGENT_ENABLED:
    from alerts_agent.api import router as alerts_router
    app.include_router(alerts_router, prefix="/api", tags=["Alerts Agent"])
 # --- 14. vast.ai GPU Infrastructure (optional) ---
 if VAST_API_KEY:
    from infra.vast_power import router as vast_router
    app.include_router(vast_router, tags=["GPU Infrastructure"])
 # ---------------------------------------------------------------------------
 # Middleware (from shared middleware/ package)
--- a/backend-lehrer/meetings_api.py
+++ b/backend-lehrer/meetings_api.py
@@ -1,443 +0,0 @@
 """
 Meetings API Module
 Backend API endpoints for Jitsi Meet integration
 """
 import os
 import uuid
 import httpx
 from datetime import datetime, timedelta
 from typing import Optional, List
 from fastapi import APIRouter, HTTPException, Depends
 from pydantic import BaseModel, EmailStr
 router = APIRouter(prefix="/api/meetings", tags=["meetings"])
 # ============================================
 # Configuration
 # ============================================
 JITSI_BASE_URL = os.getenv("JITSI_PUBLIC_URL", "http://localhost:8443")
 CONSENT_SERVICE_URL = os.getenv("CONSENT_SERVICE_URL", "http://localhost:8081")
 # ============================================
 # Models
 # ============================================
 class MeetingConfig(BaseModel):
    enable_lobby: bool = True
    enable_recording: bool = False
    start_with_audio_muted: bool = True
    start_with_video_muted: bool = False
    require_display_name: bool = True
    enable_breakout: bool = False
 class CreateMeetingRequest(BaseModel):
    type: str = "quick"  # quick, scheduled, training, parent, class
    title: str = "Neues Meeting"
    duration: int = 60
    scheduled_at: Optional[str] = None
    config: Optional[MeetingConfig] = None
    description: Optional[str] = None
    invites: Optional[List[str]] = None
 class ScheduleMeetingRequest(BaseModel):
    title: str
    scheduled_at: str
    duration: int = 60
    description: Optional[str] = None
    invites: Optional[List[str]] = None
 class TrainingRequest(BaseModel):
    title: str
    description: Optional[str] = None
    scheduled_at: str
    duration: int = 120
    max_participants: int = 20
    trainer: str
    config: Optional[MeetingConfig] = None
 class ParentTeacherRequest(BaseModel):
    student_name: str
    parent_name: str
    parent_email: Optional[str] = None
    scheduled_at: str
    reason: Optional[str] = None
    send_invite: bool = True
    duration: int = 30
 class MeetingResponse(BaseModel):
    room_name: str
    join_url: str
    moderator_url: Optional[str] = None
    password: Optional[str] = None
    expires_at: Optional[str] = None
 class MeetingStats(BaseModel):
    active: int = 0
    scheduled: int = 0
    recordings: int = 0
    participants: int = 0
 class ActiveMeeting(BaseModel):
    room_name: str
    title: str
    participants: int
    started_at: str
 # ============================================
 # In-Memory Storage (for demo purposes)
 # In production, use database
 # ============================================
 scheduled_meetings = []
 active_meetings = []
 trainings = []
 recordings = []
 # ============================================
 # Helper Functions
 # ============================================
 def generate_room_name(prefix: str = "meeting") -> str:
    """Generate a unique room name"""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"
 def generate_password() -> str:
    """Generate a simple password"""
    return uuid.uuid4().hex[:8]
 def build_jitsi_url(room_name: str, config: Optional[MeetingConfig] = None) -> str:
    """Build Jitsi meeting URL with config parameters"""
    params = []
    if config:
        if config.start_with_audio_muted:
            params.append("config.startWithAudioMuted=true")
        if config.start_with_video_muted:
            params.append("config.startWithVideoMuted=true")
        if config.require_display_name:
            params.append("config.requireDisplayName=true")
    # Common config
    params.extend([
        "config.prejoinPageEnabled=false",
        "config.disableDeepLinking=true",
        "config.defaultLanguage=de",
        "interfaceConfig.SHOW_JITSI_WATERMARK=false",
        "interfaceConfig.SHOW_BRAND_WATERMARK=false"
    ])
    url = f"{JITSI_BASE_URL}/{room_name}"
    if params:
        url += "#" + "&".join(params)
    return url
 async def call_consent_service(endpoint: str, method: str = "GET", data: Optional[dict] = None) -> Optional[dict]:
    """Call the consent service API"""
    async with httpx.AsyncClient() as client:
        url = f"{CONSENT_SERVICE_URL}{endpoint}"
        if method == "GET":
            response = await client.get(url)
        elif method == "POST":
            response = await client.post(url, json=data)
        else:
            raise ValueError(f"Unsupported method: {method}")
        if response.status_code >= 400:
            return None
        return response.json()
 # ============================================
 # API Endpoints
 # ============================================
@router.get("/stats", response_model=MeetingStats)
 async def get_meeting_stats():
    """Get meeting statistics"""
    return MeetingStats(
        active=len(active_meetings),
        scheduled=len(scheduled_meetings),
        recordings=len(recordings),
        participants=sum(m.get("participants", 0) for m in active_meetings)
    )
@router.get("/active", response_model=List[ActiveMeeting])
 async def get_active_meetings():
    """Get list of active meetings"""
    return [
        ActiveMeeting(
            room_name=m["room_name"],
            title=m["title"],
            participants=m.get("participants", 0),
            started_at=m.get("started_at", datetime.now().isoformat())
        )
        for m in active_meetings
    ]
@router.post("/create", response_model=MeetingResponse)
 async def create_meeting(request: CreateMeetingRequest):
    """Create a new meeting"""
    config = request.config or MeetingConfig()
    # Generate room name based on type
    if request.type == "quick":
        room_name = generate_room_name("quick")
    elif request.type == "training":
        room_name = generate_room_name("schulung")
    elif request.type == "parent":
        room_name = generate_room_name("elterngespraech")
    elif request.type == "class":
        room_name = generate_room_name("klasse")
    else:
        room_name = generate_room_name("meeting")
    join_url = build_jitsi_url(room_name, config)
    # Store meeting if scheduled
    if request.scheduled_at:
        scheduled_meetings.append({
            "room_name": room_name,
            "title": request.title,
            "scheduled_at": request.scheduled_at,
            "duration": request.duration,
            "config": config.model_dump() if config else None
        })
    return MeetingResponse(
        room_name=room_name,
        join_url=join_url
    )
@router.post("/schedule", response_model=MeetingResponse)
 async def schedule_meeting(request: ScheduleMeetingRequest):
    """Schedule a new meeting"""
    room_name = generate_room_name("meeting")
    meeting = {
        "room_name": room_name,
        "title": request.title,
        "scheduled_at": request.scheduled_at,
        "duration": request.duration,
        "description": request.description,
        "invites": request.invites or []
    }
    scheduled_meetings.append(meeting)
    join_url = build_jitsi_url(room_name)
    # TODO: Send email invites if configured
    return MeetingResponse(
        room_name=room_name,
        join_url=join_url
    )
@router.post("/training", response_model=MeetingResponse)
 async def create_training(request: TrainingRequest):
    """Create a training session"""
    # Generate room name from title
    title_slug = request.title.lower().replace(" ", "-")[:20]
    room_name = f"schulung-{title_slug}-{uuid.uuid4().hex[:4]}"
    config = request.config or MeetingConfig(
        enable_lobby=True,
        enable_recording=True,
        start_with_audio_muted=True
    )
    training = {
        "room_name": room_name,
        "title": request.title,
        "description": request.description,
        "scheduled_at": request.scheduled_at,
        "duration": request.duration,
        "max_participants": request.max_participants,
        "trainer": request.trainer,
        "config": config.model_dump()
    }
    trainings.append(training)
    scheduled_meetings.append(training)
    join_url = build_jitsi_url(room_name, config)
    return MeetingResponse(
        room_name=room_name,
        join_url=join_url
    )
@router.post("/parent-teacher", response_model=MeetingResponse)
 async def create_parent_teacher_meeting(request: ParentTeacherRequest):
    """Create a parent-teacher meeting"""
    # Generate room name with student name and date
    student_slug = request.student_name.lower().replace(" ", "-")[:15]
    date_str = datetime.fromisoformat(request.scheduled_at).strftime("%Y%m%d-%H%M")
    room_name = f"elterngespraech-{student_slug}-{date_str}"
    # Generate password for security
    password = generate_password()
    config = MeetingConfig(
        enable_lobby=True,
        enable_recording=False,
        start_with_audio_muted=False
    )
    meeting = {
        "room_name": room_name,
        "title": f"Elterngespräch - {request.student_name}",
        "student_name": request.student_name,
        "parent_name": request.parent_name,
        "parent_email": request.parent_email,
        "scheduled_at": request.scheduled_at,
        "duration": request.duration,
        "reason": request.reason,
        "password": password,
        "config": config.model_dump()
    }
    scheduled_meetings.append(meeting)
    join_url = build_jitsi_url(room_name, config)
    # TODO: Send email invite to parents if configured
    return MeetingResponse(
        room_name=room_name,
        join_url=join_url,
        password=password
    )
@router.get("/scheduled")
 async def get_scheduled_meetings():
    """Get all scheduled meetings"""
    return scheduled_meetings
@router.get("/trainings")
 async def get_trainings():
    """Get all training sessions"""
    return trainings
@router.delete("/{room_name}")
 async def delete_meeting(room_name: str):
    """Delete a scheduled meeting"""
    # Find and remove the meeting (in-place modification)
    for i, m in enumerate(scheduled_meetings):
        if m["room_name"] == room_name:
            scheduled_meetings.pop(i)
            break
    return {"status": "deleted"}
 # ============================================
 # Recording Endpoints
 # ============================================
@router.get("/recordings")
 async def get_recordings():
    """Get list of recordings"""
    # Demo data
    return [
        {
            "id": "docker-basics",
            "title": "Docker Grundlagen Schulung",
            "date": "2025-12-10T10:00:00",
            "duration": "1:30:00",
            "size_mb": 156,
            "participants": 15
        },
        {
            "id": "team-kw49",
            "title": "Team-Meeting KW 49",
            "date": "2025-12-06T14:00:00",
            "duration": "1:00:00",
            "size_mb": 98,
            "participants": 8
        },
        {
            "id": "parent-mueller",
            "title": "Elterngespräch - Max Müller",
            "date": "2025-12-02T16:00:00",
            "duration": "0:28:00",
            "size_mb": 42,
            "participants": 2
        }
    ]
@router.get("/recordings/{recording_id}")
 async def get_recording(recording_id: str):
    """Get recording details"""
    return {
        "id": recording_id,
        "title": "Recording " + recording_id,
        "date": "2025-12-10T10:00:00",
        "duration": "1:30:00",
        "size_mb": 156,
        "download_url": f"/api/recordings/{recording_id}/download"
    }
@router.get("/recordings/{recording_id}/download")
 async def download_recording(recording_id: str):
    """Download a recording"""
    # In production, this would stream the actual file
    raise HTTPException(status_code=404, detail="Recording file not found (demo mode)")
@router.delete("/recordings/{recording_id}")
 async def delete_recording(recording_id: str):
    """Delete a recording"""
    return {"status": "deleted", "id": recording_id}
 # ============================================
 # Health Check
 # ============================================
@router.get("/health")
 async def health_check():
    """Check meetings service health"""
    # Check Jitsi availability
    jitsi_healthy = False
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(JITSI_BASE_URL)
            jitsi_healthy = response.status_code == 200
    except Exception:
        pass
    return {
        "status": "healthy" if jitsi_healthy else "degraded",
        "jitsi_url": JITSI_BASE_URL,
        "jitsi_available": jitsi_healthy,
        "scheduled_meetings": len(scheduled_meetings),
        "active_meetings": len(active_meetings)
    }
--- a/backend-lehrer/progress_api.py
+++ b/backend-lehrer/progress_api.py
@@ -0,0 +1,131 @@
 """
 Progress API — Tracks student learning progress per unit.
 Stores coins, crowns, streak data, and exercise completion stats.
 Uses JSON file storage (same pattern as learning_units.py).
 """
 import os
 import json
 import logging
 from datetime import datetime, date
 from typing import Dict, Any, Optional, List
 from pathlib import Path
 from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel
 logger = logging.getLogger(__name__)
 router = APIRouter(
    prefix="/progress",
    tags=["progress"],
 )
 PROGRESS_DIR = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten/progress")
 def _ensure_dir():
    os.makedirs(PROGRESS_DIR, exist_ok=True)
 def _progress_path(unit_id: str) -> Path:
    return Path(PROGRESS_DIR) / f"{unit_id}.json"
 def _load_progress(unit_id: str) -> Dict[str, Any]:
    path = _progress_path(unit_id)
    if path.exists():
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {
        "unit_id": unit_id,
        "coins": 0,
        "crowns": 0,
        "streak_days": 0,
        "last_activity": None,
        "exercises": {
            "flashcards": {"completed": 0, "correct": 0, "incorrect": 0},
            "quiz": {"completed": 0, "correct": 0, "incorrect": 0},
            "type": {"completed": 0, "correct": 0, "incorrect": 0},
            "story": {"generated": 0},
        },
        "created_at": datetime.now().isoformat(),
    }
 def _save_progress(unit_id: str, data: Dict[str, Any]):
    _ensure_dir()
    path = _progress_path(unit_id)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
 class RewardPayload(BaseModel):
    exercise_type: str  # flashcards, quiz, type, story
    correct: bool = True
    first_try: bool = True
@router.get("/{unit_id}")
 def get_progress(unit_id: str):
    """Get learning progress for a unit."""
    return _load_progress(unit_id)
@router.post("/{unit_id}/reward")
 def add_reward(unit_id: str, payload: RewardPayload):
    """Record an exercise result and award coins."""
    progress = _load_progress(unit_id)
    # Update exercise stats
    ex = progress["exercises"].get(payload.exercise_type, {"completed": 0, "correct": 0, "incorrect": 0})
    ex["completed"] = ex.get("completed", 0) + 1
    if payload.correct:
        ex["correct"] = ex.get("correct", 0) + 1
    else:
        ex["incorrect"] = ex.get("incorrect", 0) + 1
    progress["exercises"][payload.exercise_type] = ex
    # Award coins
    if payload.correct:
        coins = 3 if payload.first_try else 1
    else:
        coins = 0
    progress["coins"] = progress.get("coins", 0) + coins
    # Update streak: extend it if the last activity was yesterday, otherwise restart at 1
    today_date = date.today()
    today = today_date.isoformat()
    # fromordinal handles month and year boundaries (e.g. the 1st of a month)
    yesterday = date.fromordinal(today_date.toordinal() - 1).isoformat()
    last = progress.get("last_activity")
    if last != today:
        if last == yesterday:
            progress["streak_days"] = progress.get("streak_days", 0) + 1
        else:
            progress["streak_days"] = 1
        progress["last_activity"] = today
    # Award crowns for milestones
    total_correct = sum(
        e.get("correct", 0) for e in progress["exercises"].values() if isinstance(e, dict)
    )
    progress["crowns"] = total_correct // 20  # 1 crown per 20 correct answers
    _save_progress(unit_id, progress)
    return {
        "coins_awarded": coins,
        "total_coins": progress["coins"],
        "crowns": progress["crowns"],
        "streak_days": progress["streak_days"],
    }
@router.get("/")
 def list_all_progress():
    """List progress for all units."""
    _ensure_dir()
    results = []
    for f in Path(PROGRESS_DIR).glob("*.json"):
        with open(f, "r", encoding="utf-8") as fh:
            results.append(json.load(fh))
    return results
--- a/backend-lehrer/story_generator.py
+++ b/backend-lehrer/story_generator.py
@@ -0,0 +1,108 @@
 """
 Story Generator — Creates short stories using vocabulary words.
 Generates age-appropriate mini-stories (3-5 sentences) that incorporate
 the given vocabulary words, marked with <mark> tags for highlighting.
 Uses Ollama (local LLM) for generation.
 """
 import os
 import json
 import logging
 import requests
 from typing import List, Dict, Any, Optional
 logger = logging.getLogger(__name__)
 OLLAMA_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
 STORY_MODEL = os.getenv("STORY_MODEL", "llama3.1:8b")
 def generate_story(
    vocabulary: List[Dict[str, str]],
    language: str = "en",
    grade_level: str = "5-8",
    max_words: int = 5,
 ) -> Dict[str, Any]:
    """
    Generate a short story incorporating vocabulary words.
    Args:
        vocabulary: List of dicts with 'english' and 'german' keys
        language: 'en' for English story, 'de' for German story
        grade_level: Target grade level
        max_words: Maximum vocab words to include (to keep story short)
    Returns:
        Dict with 'story_html', 'story_text', 'vocab_used', 'language'
    """
    # Select subset of vocabulary
    words = vocabulary[:max_words]
    word_list = [w.get("english", "") if language == "en" else w.get("german", "") for w in words]
    word_list = [w for w in word_list if w.strip()]
    if not word_list:
        return {"story_html": "", "story_text": "", "vocab_used": [], "language": language}
    lang_name = "English" if language == "en" else "German"
    words_str = ", ".join(word_list)
    prompt = f"""Write a short story (3-5 sentences) in {lang_name} for a grade {grade_level} student.
 The story MUST use these vocabulary words: {words_str}
 Rules:
 1. The story should be fun and age-appropriate
 2. Each vocabulary word must appear at least once
 3. Keep sentences simple and clear
 4. The story should make sense and be engaging
 Write ONLY the story, nothing else. No title, no introduction."""
    try:
        resp = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={
                "model": STORY_MODEL,
                "prompt": prompt,
                "stream": False,
                "options": {"temperature": 0.8, "num_predict": 300},
            },
            timeout=30,
        )
        resp.raise_for_status()
        story_text = resp.json().get("response", "").strip()
    except Exception as e:
        logger.error(f"Story generation failed: {e}")
        # Fallback: simple template story
        story_text = _fallback_story(word_list, language)
    # Mark vocabulary words in the story (first occurrence of each word only)
    import re  # local import, hoisted out of the loop
    story_html = story_text
    vocab_found = []
    for word in word_list:
        # Case-insensitive match that keeps the casing used in the story
        pattern = re.compile(re.escape(word), re.IGNORECASE)
        story_html, replaced = pattern.subn(
            lambda m: f'<mark class="vocab-highlight">{m.group()}</mark>',
            story_html,
            count=1,
        )
        if replaced:
            vocab_found.append(word)
    return {
        "story_html": story_html,
        "story_text": story_text,
        "vocab_used": vocab_found,
        "vocab_total": len(word_list),
        "language": language,
    }
 def _fallback_story(words: List[str], language: str) -> str:
    """Simple fallback when LLM is unavailable."""
    if language == "de":
        return f"Heute habe ich neue Wörter gelernt: {', '.join(words)}. Es war ein guter Tag zum Lernen."
    return f"Today I learned new words: {', '.join(words)}. It was a great day for learning."
--- a/docs-src/services/klausur-service/RAG-Landkarte.md
+++ b/docs-src/services/klausur-service/RAG-Landkarte.md
@@ -0,0 +1,204 @@
 # RAG Landkarte: Industry-Regulation Matrix
 ## Overview
 The RAG Landkarte ("RAG map") shows an interactive matrix of all 320 compliance documents in the RAG system, grouped by document type and mapped to 10 industry sectors.
 **URL**: `https://macmini:3002/ai/rag` → "Landkarte" tab
 **Last updated**: 2026-04-15
 ## Architecture
 ```
 rag-documents.json          ← Central data file (320 documents)
    ├── doc_types[]          ← 17 document types (EU regulation, German law, etc.)
    ├── industries[]         ← 10 industries (VDMA/VDA/BDI)
    └── documents[]          ← All documents with industry mapping
         ├── code            ← Unique identifier
         ├── name            ← Display name
         ├── doc_type        ← Reference to doc_types.id
         ├── industries[]    ← ["all"] or ["automotive", "chemie", ...]
         ├── in_rag          ← true (all are in the RAG)
         ├── rag_collection  ← Qdrant collection name
         ├── description?    ← Description (for ~100 main regulations)
         ├── applicability_note?  ← Rationale for the industry mapping
         └── effective_date? ← Effective date
 rag-constants.ts            ← RAG metadata (chunks, Qdrant IDs)
 page.tsx                    ← Frontend (imports from the JSON)
 ```
 ## Files
 | Path | Description |
 |------|-------------|
 | `admin-lehrer/app/(admin)/ai/rag/rag-documents.json` | All 320 documents with industry mapping |
 | `admin-lehrer/app/(admin)/ai/rag/rag-constants.ts` | REGULATIONS_IN_RAG (chunk counts, Qdrant IDs) |
 | `admin-lehrer/app/(admin)/ai/rag/page.tsx` | Frontend rendering |
 | `admin-lehrer/app/(admin)/ai/rag/__tests__/rag-documents.test.ts` | 44 tests for JSON validation |
 ## Industries (10 industry sectors)
 The industries follow the member associations of VDMA, VDA and BDI:
 | ID | Industry | Icon | Typical customers |
 |----|----------|------|-------------------|
 | `automotive` | Automotive industry | 🚗 | OEMs, Tier-1/2 suppliers |
 | `maschinenbau` | Mechanical & plant engineering | ⚙️ | Machine tools, automation |
 | `elektrotechnik` | Electrical & digital industry | ⚡ | Embedded systems, control technology |
 | `chemie` | Chemical & process industry | 🧪 | Basic chemicals, specialty chemicals |
 | `metall` | Metal industry | 🔩 | Steel, aluminium, metal processing |
 | `energie` | Energy & utilities | 🔋 | Power generation, grid operators |
 | `transport` | Transport & logistics | 🚚 | Freight, rail, aviation |
 | `handel` | Retail & wholesale trade | 🏪 | Retail/wholesale, e-commerce |
 | `konsumgueter` | Consumer goods & food | 📦 | FMCG, food, packaging |
 | `bau` | Construction | 🏗️ | Building/civil engineering, building automation |
 !!! warning "No pseudo-industries"
    Cross-cutting topics such as IoT, AI, HR, KRITIS or e-commerce are deliberately **not** listed as "industries". They are technologies, departments or classifications, not economic sectors.
 ## Mapping logic
 ### Three levels
 | Level | `industries` value | Count | Examples |
 |-------|--------------------|-------|----------|
 | **Horizontal** | `["all"]` | 264 | DSGVO, AI Act, CRA, NIS2, BetrVG |
 | **Sector-specific** | `["automotive", "chemie", ...]` | 42 | Maschinenverordnung, ElektroG, BattDG |
 | **Not applicable** | `[]` | 14 | DORA, MiCA, EHDS, DSA |
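The three levels can be derived from the `industries` field alone. The following is a minimal Python sketch of that rule, not the project's implementation: the function name `classify_level`, the level labels and the sample entries (including their `code` values) are assumptions made for illustration.

```python
from collections import Counter


def classify_level(doc: dict) -> str:
    """Return 'horizontal', 'sector' or 'not_applicable' for one document."""
    industries = doc.get("industries", [])
    if "all" in industries:
        return "horizontal"
    # A non-empty list without "all" names specific sectors; an empty list names none.
    return "sector" if industries else "not_applicable"


# Hypothetical sample entries; the real rag-documents.json holds 320 documents.
sample_documents = [
    {"code": "DSGVO", "industries": ["all"]},
    {"code": "ElektroG", "industries": ["elektrotechnik", "automotive", "konsumgueter"]},
    {"code": "DORA", "industries": []},
]

level_counts = Counter(classify_level(d) for d in sample_documents)
print(dict(level_counts))
```

Run against the full JSON file, the same counter would be expected to reproduce the counts in the table, provided the file matches this page.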
 ### Horizontal (all industries)
 Regulations that apply **across industries**:
 - **Data protection**: DSGVO, BDSG, ePrivacy, TDDDG, SCC, DPF
 - **AI**: AI Act (every company that uses AI)
 - **Cybersecurity**: CRA (every product with digital elements), NIS2, EUCSA
 - **Product safety**: GPSR, Produkthaftungs-RL
 - **Labour law**: BetrVG, AGG, KSchG, ArbSchG, LkSG
 - **Commercial/tax law**: HGB, AO, UStG
 - **Software security**: OWASP Top 10, NIST SSDF, CISA Secure by Design
 - **Supply chain**: CycloneDX, SPDX, SLSA (the CRA requires an SBOM)
 - **All guidelines**: EDPB, DSK, DSFA lists, court decisions
 ### Sector-specific
 | Regulation | Industries | Rationale |
 |------------|------------|-----------|
 | Maschinenverordnung | Mechanical engineering, Automotive, Electrical, Metal, Construction | Manufacturers of machinery and related products |
 | ElektroG | Electrical, Automotive, Consumer goods | Electrical and electronic equipment |
 | BattDG/BattVO | Automotive, Electrical, Energy | Batteries and accumulators |
 | VerpackG | Consumer goods, Retail, Chemicals | Products subject to packaging obligations |
 | PAngV, UWG, VSBG | Retail, Consumer goods | Consumer protection in sales |
 | BSI-KritisV, KRITIS-Dachgesetz | Energy, Transport, Chemicals | KRITIS sectors |
 | ENISA ICS/SCADA | Mechanical engineering, Electrical, Automotive, Chemicals, Energy, Transport | Industrial control systems |
 | NIST SP 800-82 (OT) | Mechanical engineering, Automotive, Electrical, Chemicals, Energy, Metal | Operational Technology |
 ### Not applicable
 Documents that **remain in the RAG** but are not relevant to any of the 10 target industries:
 | Code | Name | Reason |
 |------|------|--------|
 | DORA | Digital Operational Resilience Act | Financial sector |
 | PSD2 | Payment Services Directive | Payment service providers |
 | MiCA | Markets in Crypto-Assets | Crypto markets |
 | AMLR | AML Regulation | Anti-money laundering |
 | EHDS | European Health Data Space | Healthcare |
 | DSA | Digital Services Act | Online platforms |
 | DMA | Digital Markets Act | Gatekeeper platforms |
 | MDR | Medical Device Regulation | Medical technology |
 | BSI-TR-03161 | DiGA security (3 parts) | Digital health applications |
 ## Document types (17)
 | doc_type | Label | Count | Examples |
 |----------|-------|-------|----------|
 | `eu_regulation` | EU regulations | 22 | DSGVO, AI Act, CRA, DORA |
 | `eu_directive` | EU directives | 14 | ePrivacy, NIS2, PSD2 |
 | `eu_guidance` | EU guidance | 9 | Blue Guide, GPAI CoP |
 | `de_law` | German laws | 41 | BDSG, BGB, HGB, BetrVG |
 | `at_law` | Austrian laws | 11 | DSG AT, ECG, KSchG |
 | `ch_law` | Swiss laws | 8 | revDSG, DSV, OR |
 | `national_law` | National data protection laws | 17 | UK DPA, LOPDGDD, UAVG |
 | `bsi_standard` | BSI standards & TR | 4 | BSI 200-4, BSI-TR-03161 |
 | `edpb_guideline` | EDPB/WP29 guidelines | 50 | Consent, Controller/Processor |
 | `dsk_guidance` | DSK guidance papers | 57 | Kurzpapiere, OH Telemedien |
 | `court_decision` | Court decisions | 20 | BAG M365, BGH Planet49 |
 | `dsfa_list` | DSFA mandatory lists | 20 | One per federal state + DSK |
 | `nist_standard` | NIST standards | 11 | CSF 2.0, SSDF, AI RMF |
 | `owasp_standard` | OWASP standards | 6 | Top 10, ASVS, API Security |
 | `enisa_guidance` | ENISA guidance | 6 | Supply Chain, ICS/SCADA |
 | `international` | International standards | 7 | CVSS, CycloneDX, SPDX |
 | `legal_template` | Templates & samples | 17 | GitHub Policies, VVT template |
 ## Integration into other projects
 ### Importing the JSON
 ```typescript
 import ragData from './rag-documents.json'
 const documents = ragData.documents    // 320 documents
 const docTypes = ragData.doc_types     // 17 categories
 const industries = ragData.industries  // 10 industries
 ```
 ### Matrix logic
 ```typescript
 // Check whether a document applies to an industry
 const applies = (doc, industryId) =>
  doc.industries.includes(industryId) || doc.industries.includes('all')
 // Group documents by type
 const grouped = Object.groupBy(documents, d => d.doc_type)
 // Only sector-specific documents for one industry
 const forAutomotive = documents.filter(d =>
  d.industries.includes('automotive') && !d.industries.includes('all')
 )
 ```
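For consumers outside the Next.js frontend (for example a Python backend service), the same matrix logic might look like the sketch below. It mirrors the three TypeScript helpers above; the function names, the sample entries and their `code` values are assumptions for illustration, not part of the repository.

```python
from collections import defaultdict


def applies(doc: dict, industry_id: str) -> bool:
    """True if the document applies to the industry, directly or via 'all'."""
    industries = doc.get("industries", [])
    return industry_id in industries or "all" in industries


def group_by_doc_type(documents: list) -> dict:
    """Group documents by their doc_type field."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["doc_type"]].append(doc)
    return dict(grouped)


def sector_specific_for(documents: list, industry_id: str) -> list:
    """Documents that name the industry explicitly rather than via 'all'."""
    return [
        d for d in documents
        if industry_id in d.get("industries", []) and "all" not in d.get("industries", [])
    ]


# Hypothetical sample entries; in practice these come from rag-documents.json.
sample = [
    {"code": "DSGVO", "doc_type": "eu_regulation", "industries": ["all"]},
    {"code": "ElektroG", "doc_type": "de_law",
     "industries": ["elektrotechnik", "automotive", "konsumgueter"]},
    {"code": "DORA", "doc_type": "eu_regulation", "industries": []},
]

# One row per document, one boolean cell per industry.
matrix = {
    d["code"]: {ind: applies(d, ind) for ind in ("automotive", "bau")}
    for d in sample
}
```

Keeping `applies` as the single source of truth for a cell means the horizontal `"all"` rule lives in one place.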
 ### Checking RAG status
 ```typescript
 import { REGULATIONS_IN_RAG } from './rag-constants'
 const isInRag = (code: string) => code in REGULATIONS_IN_RAG
 const chunks = REGULATIONS_IN_RAG['GDPR']?.chunks  // 423
 ```
 ## Data sources
 | Source | Path | Description |
 |--------|------|-------------|
 | RAG inventory | `~/Desktop/RAG-Dokumenten-Inventar.md` | 386 source files |
 | rag-documents.json | `admin-lehrer/.../rag/rag-documents.json` | 320 consolidated documents |
 | rag-constants.ts | `admin-lehrer/.../rag/rag-constants.ts` | Qdrant metadata |
 ## Tests
 ```bash
 cd admin-lehrer
 npx vitest run app/\(admin\)/ai/rag/__tests__/rag-documents.test.ts
 ```
 44 tests validate:
 - JSON structure (doc_types, industries, documents)
 - 10 real industries (no pseudo-industries)
 - Required fields and valid references
 - Horizontal regulations (DSGVO, AI Act, CRA → "all")
 - Sector-specific mappings (Maschinenverordnung, ElektroG)
 - Non-applicable regulations (DORA, MiCA → empty)
 - Applicability notes present and correct
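The reference checks in that list can be illustrated with a short Python sketch. This is not the vitest suite itself: it assumes that entries in `doc_types` and `industries` carry an `id` field, and the function name `validate_rag_data` and the sample data are hypothetical.

```python
def validate_rag_data(data: dict) -> list:
    """Return human-readable problems; an empty list means the data is consistent."""
    errors = []
    doc_type_ids = {t["id"] for t in data.get("doc_types", [])}
    industry_ids = {i["id"] for i in data.get("industries", [])}
    seen_codes = set()
    for doc in data.get("documents", []):
        code = doc.get("code", "<missing code>")
        if code in seen_codes:
            errors.append(f"{code}: duplicate code")
        seen_codes.add(code)
        if doc.get("doc_type") not in doc_type_ids:
            errors.append(f"{code}: unknown doc_type {doc.get('doc_type')!r}")
        for industry in doc.get("industries", []):
            # "all" is the horizontal marker, not an entry in industries[].
            if industry != "all" and industry not in industry_ids:
                errors.append(f"{code}: unknown industry {industry!r}")
    return errors


# Hypothetical data with one deliberate error (a pseudo-industry).
sample_data = {
    "doc_types": [{"id": "eu_regulation"}, {"id": "enisa_guidance"}],
    "industries": [{"id": "automotive"}, {"id": "energie"}],
    "documents": [
        {"code": "DSGVO", "doc_type": "eu_regulation", "industries": ["all"]},
        {"code": "IoT-Guide", "doc_type": "enisa_guidance", "industries": ["iot"]},
    ],
}

problems = validate_rag_data(sample_data)
```

Returning a list of messages instead of raising on the first problem lets one run report every broken reference at once.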
 ## Change history
 | Date | Change |
 |------|--------|
 | 2026-04-15 | Initial implementation: 320 documents, 10 industries, 17 types |
 | 2026-04-15 | Industry review: OWASP/SBOM → all, BSI-TR-03161 → empty |
 | 2026-04-15 | Applicability notes UI: expandable explanations per document |
--- a/edu-search-service/scripts/vast_ai_extractor.py
+++ b/edu-search-service/scripts/vast_ai_extractor.py
@@ -1,320 +0,0 @@
 #!/usr/bin/env python3
 """
 vast.ai Profile Extractor Script
 This script runs on vast.ai and extracts profile data from university web pages.
 Usage on vast.ai:
 1. Upload this script to your vast.ai instance
 2. Install dependencies: pip install requests beautifulsoup4 openai
 3. Set environment variables:
   - BREAKPILOT_API_URL=http://deine-ip:8086
   - BREAKPILOT_API_KEY=dev-key
   - OPENAI_API_KEY=sk-...
 4. Run: python vast_ai_extractor.py
 """
 import os
 import sys
 import json
 import time
 import logging
 import requests
 from bs4 import BeautifulSoup
 from typing import Optional, Dict, Any, List
 # Logging Setup
 logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
 )
 logger = logging.getLogger(__name__)
 # Configuration
 API_URL = os.environ.get('BREAKPILOT_API_URL', 'http://localhost:8086')
 API_KEY = os.environ.get('BREAKPILOT_API_KEY', 'dev-key')
 OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY', '')
 BATCH_SIZE = 10
 SLEEP_BETWEEN_REQUESTS = 1  # seconds between requests (respect rate limits)
 def fetch_pending_profiles(limit: int = 50) -> List[Dict]:
    """Fetch profiles that still need to be extracted."""
    try:
        response = requests.get(
            f"{API_URL}/api/v1/ai/extraction/pending",
            params={"limit": limit},
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30
        )
        response.raise_for_status()
        data = response.json()
        return data.get("tasks", [])
    except Exception as e:
        logger.error(f"Fehler beim Abrufen der Profile: {e}")
        return []
 def fetch_profile_page(url: str) -> Optional[str]:
    """Load the HTML content of a profile page."""
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (compatible; BreakPilot-Crawler/1.0; +https://breakpilot.de)',
            'Accept': 'text/html,application/xhtml+xml',
            'Accept-Language': 'de-DE,de;q=0.9,en;q=0.8',
        }
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        return response.text
    except Exception as e:
        logger.error(f"Fehler beim Laden von {url}: {e}")
        return None
 def extract_with_beautifulsoup(html: str, url: str) -> Dict[str, Any]:
    """Extract basic information with BeautifulSoup (without AI)."""
    soup = BeautifulSoup(html, 'html.parser')
    data = {}
    # Look for an email address
    email_links = soup.find_all('a', href=lambda x: x and x.startswith('mailto:'))
    if email_links:
        email = email_links[0]['href'].replace('mailto:', '').split('?')[0]
        data['email'] = email
    # Look for a phone number
    phone_links = soup.find_all('a', href=lambda x: x and x.startswith('tel:'))
    if phone_links:
        data['phone'] = phone_links[0]['href'].replace('tel:', '')
    # Look for an ORCID link
    orcid_links = soup.find_all('a', href=lambda x: x and 'orcid.org' in x)
    if orcid_links:
        orcid = orcid_links[0]['href']
        # Extract the ORCID ID
        if '/' in orcid:
            data['orcid'] = orcid.split('/')[-1]
    # Look for a Google Scholar link
    scholar_links = soup.find_all('a', href=lambda x: x and 'scholar.google' in x)
    if scholar_links:
        href = scholar_links[0]['href']
        if 'user=' in href:
            data['google_scholar_id'] = href.split('user=')[1].split('&')[0]
    # Look for a ResearchGate link
    rg_links = soup.find_all('a', href=lambda x: x and 'researchgate.net' in x)
    if rg_links:
        data['researchgate_url'] = rg_links[0]['href']
    # Look for a LinkedIn link
    linkedin_links = soup.find_all('a', href=lambda x: x and 'linkedin.com' in x)
    if linkedin_links:
        data['linkedin_url'] = linkedin_links[0]['href']
    # Collect institute/department links (for hierarchy detection)
    base_domain = '/'.join(url.split('/')[:3])
    department_links = []
    for link in soup.find_all('a', href=True):
        href = link['href']
        text = link.get_text(strip=True)
        # Look for links whose text points to an institute or faculty
        if any(kw in text.lower() for kw in ['institut', 'fakultät', 'fachbereich', 'abteilung', 'lehrstuhl']):
            if href.startswith('/'):
                href = base_domain + href
            if href.startswith('http'):
                department_links.append({'url': href, 'name': text})
    if department_links:
        # Take the first department link found
        data['department_url'] = department_links[0]['url']
        data['department_name'] = department_links[0]['name']
    return data
 def extract_with_ai(html: str, url: str, full_name: str) -> Dict[str, Any]:
    """Extract structured data with OpenAI GPT."""
    if not OPENAI_API_KEY:
        logger.warning("Kein OPENAI_API_KEY gesetzt - nutze nur BeautifulSoup")
        return extract_with_beautifulsoup(html, url)
    try:
        import openai
        client = openai.OpenAI(api_key=OPENAI_API_KEY)
        # Reduce the HTML to the relevant text
        soup = BeautifulSoup(html, 'html.parser')
        # Remove scripts, styles, etc.
        for tag in soup(['script', 'style', 'nav', 'footer', 'header']):
            tag.decompose()
        # Extract the text
        text = soup.get_text(separator='\n', strip=True)
        # Limit to 8000 characters for the API
        text = text[:8000]
        prompt = f"""Analysiere diese Universitäts-Profilseite für {full_name} und extrahiere folgende Informationen im JSON-Format:
 {{
  "email": "email@uni.de oder null",
  "phone": "Telefonnummer oder null",
  "office": "Raum/Büro oder null",
  "position": "Position/Titel (z.B. Wissenschaftlicher Mitarbeiter, Professorin) oder null",
  "department_name": "Name des Instituts/der Abteilung oder null",
  "research_interests": ["Liste", "der", "Forschungsthemen"] oder [],
  "teaching_topics": ["Liste", "der", "Lehrveranstaltungen/Fächer"] oder [],
  "supervisor_name": "Name des Vorgesetzten/Lehrstuhlinhabers falls erkennbar oder null"
 }}
 Profilseite von {url}:
 {text}
 Antworte NUR mit dem JSON-Objekt, keine Erklärungen."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # inexpensive and fast
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,
            max_tokens=500
        )
        result_text = response.choices[0].message.content.strip()
        # Parse JSON (strip any Markdown code fences)
        if result_text.startswith('```'):
            result_text = result_text.split('```')[1]
            if result_text.startswith('json'):
                result_text = result_text[4:]
        ai_data = json.loads(result_text)
        # Combine with the BeautifulSoup results (for links such as ORCID)
        bs_data = extract_with_beautifulsoup(html, url)
        # AI data takes priority, but BS data wins for specific links
        for key in ['orcid', 'google_scholar_id', 'researchgate_url', 'linkedin_url']:
            if key in bs_data and bs_data[key]:
                ai_data[key] = bs_data[key]
        return ai_data
    except Exception as e:
        logger.error(f"AI-Extraktion fehlgeschlagen: {e}")
        return extract_with_beautifulsoup(html, url)
 def submit_extracted_data(staff_id: str, data: Dict[str, Any]) -> bool:
    """Send extracted data back to BreakPilot."""
    try:
        payload = {"staff_id": staff_id, **data}
        # Remove None values
        payload = {k: v for k, v in payload.items() if v is not None}
        response = requests.post(
            f"{API_URL}/api/v1/ai/extraction/submit",
            json=payload,
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json"
            },
            timeout=30
        )
        response.raise_for_status()
        return True
    except Exception as e:
        logger.error(f"Fehler beim Senden der Daten für {staff_id}: {e}")
        return False
 def process_profiles():
    """Main loop: fetch profiles, extract data, send it back."""
    logger.info(f"Starte Extraktion - API: {API_URL}")
    processed = 0
    errors = 0
    while True:
        # Fetch new profiles
        profiles = fetch_pending_profiles(limit=BATCH_SIZE)
        if not profiles:
            logger.info("Keine weiteren Profile zum Verarbeiten. Warte 60 Sekunden...")
            time.sleep(60)
            continue
        logger.info(f"Verarbeite {len(profiles)} Profile...")
        for profile in profiles:
            staff_id = profile['staff_id']
            url = profile['profile_url']
            full_name = profile.get('full_name', 'Unbekannt')
            logger.info(f"Verarbeite: {full_name} - {url}")
            # Load the profile page
            html = fetch_profile_page(url)
            if not html:
                errors += 1
                continue
            # Extract data
            extracted = extract_with_ai(html, url, full_name)
            if extracted:
                # Send it back
                if submit_extracted_data(staff_id, extracted):
                    processed += 1
                    logger.info(f"Erfolgreich: {full_name} - Email: {extracted.get('email', 'N/A')}")
                else:
                    errors += 1
            else:
                errors += 1
            # Rate limiting
            time.sleep(SLEEP_BETWEEN_REQUESTS)
        logger.info(f"Batch abgeschlossen. Gesamt: {processed} erfolgreich, {errors} Fehler")
 def main():
    """Entry point."""
    logger.info("=" * 60)
    logger.info("BreakPilot vast.ai Profile Extractor")
    logger.info("=" * 60)
    # Check configuration
    if not API_KEY:
        logger.error("BREAKPILOT_API_KEY nicht gesetzt!")
        sys.exit(1)
    if not OPENAI_API_KEY:
        logger.warning("OPENAI_API_KEY nicht gesetzt - nutze nur BeautifulSoup-Extraktion")
    # Test the connection
    try:
        response = requests.get(
            f"{API_URL}/v1/health",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=10
        )
        logger.info(f"API-Verbindung OK: {response.status_code}")
    except Exception as e:
        logger.error(f"Kann API nicht erreichen: {e}")
        logger.error(f"Stelle sicher dass {API_URL} erreichbar ist!")
        sys.exit(1)
    # Start processing
    try:
        process_profiles()
    except KeyboardInterrupt:
        logger.info("Beendet durch Benutzer")
    except Exception as e:
        logger.error(f"Unerwarteter Fehler: {e}")
        sys.exit(1)
 if __name__ == "__main__":
    main()
--- a/klausur-service/backend/cv_box_layout.py
+++ b/klausur-service/backend/cv_box_layout.py
@@ -0,0 +1,339 @@
 """
 Box layout classifier — detects internal layout type of embedded boxes.
 Classifies each box as: flowing | columnar | bullet_list | header_only
 and provides layout-appropriate grid building.
 Used by the Box-Grid-Review step to rebuild box zones with correct structure.
 """
 import logging
 import re
 import statistics
 from typing import Any, Dict, List, Optional, Tuple
 logger = logging.getLogger(__name__)
 # Bullet / list-item patterns at the start of a line
 _BULLET_RE = re.compile(
    r'^[\-\u2022\u2013\u2014\u25CF\u25CB\u25AA\u25A0•·]\s'  # dash, bullet chars
    r'|^\d{1,2}[.)]\s'     # numbered: "1) " or "1. "
    r'|^[a-z][.)]\s'       # lettered: "a) " or "a. "
 )
 def classify_box_layout(
    words: List[Dict],
    box_w: int,
    box_h: int,
 ) -> str:
    """Classify the internal layout of a detected box.
    Args:
        words: OCR word dicts within the box (with top, left, width, height, text)
        box_w: Box width in pixels
        box_h: Box height in pixels
    Returns:
        'header_only' | 'bullet_list' | 'columnar' | 'flowing'
    """
    if not words:
        return "header_only"
    # Group words into lines by y-proximity
    lines = _group_into_lines(words)
    # Header only: very few words or single line
    total_words = sum(len(line) for line in lines)
    if total_words <= 5 or len(lines) <= 1:
        return "header_only"
    # Bullet list: check if majority of lines start with bullet patterns
    bullet_count = 0
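    for line in lines:
        first_text = line[0].get("text", "") if line else ""
        # Match against the whole line text: _BULLET_RE requires whitespace
        # after the marker, which a single OCR word token usually lacks.
        line_text = " ".join(w.get("text", "") for w in line)
        if _BULLET_RE.match(line_text):
            bullet_count += 1
        # Also check if first word IS a bullet char
        elif first_text.strip() in ("-", "–", "—", "•", "·", "▪", "▸"):
            bullet_count += 1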
    if bullet_count >= len(lines) * 0.4 and bullet_count >= 2:
        return "bullet_list"
    # Columnar: check for multiple distinct x-clusters
    if len(lines) >= 3 and _has_column_structure(words, box_w):
        return "columnar"
    # Default: flowing text
    return "flowing"
 def _group_into_lines(words: List[Dict]) -> List[List[Dict]]:
    """Group words into lines by y-proximity."""
    if not words:
        return []
    sorted_words = sorted(words, key=lambda w: (w["top"], w["left"]))
    heights = [w["height"] for w in sorted_words if w.get("height", 0) > 0]
    median_h = statistics.median(heights) if heights else 20
    y_tolerance = max(median_h * 0.5, 5)
    lines: List[List[Dict]] = []
    current_line: List[Dict] = [sorted_words[0]]
    current_y = sorted_words[0]["top"]
    for w in sorted_words[1:]:
        if abs(w["top"] - current_y) <= y_tolerance:
            current_line.append(w)
        else:
            lines.append(sorted(current_line, key=lambda ww: ww["left"]))
            current_line = [w]
            current_y = w["top"]
    if current_line:
        lines.append(sorted(current_line, key=lambda ww: ww["left"]))
    return lines
 def _has_column_structure(words: List[Dict], box_w: int) -> bool:
    """Check if words have multiple distinct left-edge clusters (columns)."""
    if box_w <= 0:
        return False
    lines = _group_into_lines(words)
    if len(lines) < 3:
        return False
    # Collect left-edges of non-first words in each line
    # (first word of each line often aligns regardless of columns)
    left_edges = []
    for line in lines:
        for w in line[1:]:  # skip first word
            left_edges.append(w["left"])
    if len(left_edges) < 4:
        return False
    # Check if left edges cluster into 2+ distinct groups
    left_edges.sort()
    gaps = [left_edges[i + 1] - left_edges[i] for i in range(len(left_edges) - 1)]
    if not gaps:
        return False
    # A column gap is typically > 15% of box width
    column_gap_threshold = box_w * 0.15
    large_gaps = [g for g in gaps if g > column_gap_threshold]
    return len(large_gaps) >= 1
 def build_box_zone_grid(
    zone_words: List[Dict],
    box_x: int,
    box_y: int,
    box_w: int,
    box_h: int,
    zone_index: int,
    img_w: int,
    img_h: int,
    layout_type: Optional[str] = None,
 ) -> Dict[str, Any]:
    """Build a grid for a box zone with layout-aware processing.
    If layout_type is None, auto-detects it.
    For 'flowing' and 'bullet_list', forces single-column layout.
    For 'columnar', uses the standard multi-column detection.
    For 'header_only', creates a single cell.
    Returns the same format as _build_zone_grid (columns, rows, cells, header_rows).
    """
    from grid_editor_helpers import _build_zone_grid
    if not zone_words:
        return {
            "columns": [],
            "rows": [],
            "cells": [],
            "header_rows": [],
            "box_layout_type": layout_type or "header_only",
            "box_grid_reviewed": False,
        }
    # Auto-detect layout if not specified
    if not layout_type:
        layout_type = classify_box_layout(zone_words, box_w, box_h)
    logger.info(
        "Box zone %d: layout_type=%s, %d words, %dx%d",
        zone_index, layout_type, len(zone_words), box_w, box_h,
    )
    if layout_type == "header_only":
        # Single cell with all text concatenated
        all_text = " ".join(
            w.get("text", "") for w in sorted(zone_words, key=lambda ww: (ww["top"], ww["left"]))
        ).strip()
        return {
            "columns": [{"col_index": 0, "index": 0, "label": "column_text", "col_type": "column_1",
                         "x_min_px": box_x, "x_max_px": box_x + box_w,
                         "x_min_pct": round(box_x / img_w * 100, 2) if img_w else 0,
                         "x_max_pct": round((box_x + box_w) / img_w * 100, 2) if img_w else 0,
                         "bold": False}],
            "rows": [{"index": 0, "row_index": 0,
                       "y_min": box_y, "y_max": box_y + box_h, "y_center": box_y + box_h / 2,
                       "y_min_px": box_y, "y_max_px": box_y + box_h,
                       "y_min_pct": round(box_y / img_h * 100, 2) if img_h else 0,
                       "y_max_pct": round((box_y + box_h) / img_h * 100, 2) if img_h else 0,
                       "is_header": True}],
            "cells": [{
                "cell_id": f"Z{zone_index}_R0C0",
                "row_index": 0,
                "col_index": 0,
                "col_type": "column_1",
                "text": all_text,
                "word_boxes": zone_words,
            }],
            "header_rows": [0],
            "box_layout_type": layout_type,
            "box_grid_reviewed": False,
        }
    if layout_type in ("flowing", "bullet_list"):
        # Force single column: each logical item (a line plus any indented
        # continuation lines that follow it) becomes one row with one cell.
        # Bullet structure is inferred from indentation.
        lines = _group_into_lines(zone_words)
        column = {
            "col_index": 0, "index": 0, "label": "column_text", "col_type": "column_1",
            "x_min_px": box_x, "x_max_px": box_x + box_w,
            "x_min_pct": round(box_x / img_w * 100, 2) if img_w else 0,
            "x_max_pct": round((box_x + box_w) / img_w * 100, 2) if img_w else 0,
            "bold": False,
        }
        # --- Detect indentation levels ---
        line_indents = []
        for line_words in lines:
            if not line_words:
                line_indents.append(0)
                continue
            min_left = min(w["left"] for w in line_words)
            line_indents.append(min_left - box_x)
        # Find the minimum indent (= bullet/main level)
        valid_indents = [ind for ind in line_indents if ind >= 0]
        min_indent = min(valid_indents) if valid_indents else 0
        # Indentation threshold: lines indented > 15px more than minimum
        # are continuation lines belonging to the previous bullet
        INDENT_THRESHOLD = 15
        # --- Group lines into logical items (bullet + continuations) ---
        # Each item is a list of line indices
        items: List[List[int]] = []
        for li, indent in enumerate(line_indents):
            is_continuation = (indent > min_indent + INDENT_THRESHOLD) and len(items) > 0
            if is_continuation:
                items[-1].append(li)
            else:
                items.append([li])
        logger.info(
            "Box zone %d flowing: %d lines → %d items (indents=%s, min=%d, threshold=%d)",
            zone_index, len(lines), len(items),
            [int(i) for i in line_indents], int(min_indent), INDENT_THRESHOLD,
        )
        # --- Build rows and cells from grouped items ---
        rows = []
        cells = []
        header_rows = []
        for row_idx, item_line_indices in enumerate(items):
            # Collect all words from all lines in this item
            item_words = []
            item_texts = []
            for li in item_line_indices:
                if li < len(lines):
                    item_words.extend(lines[li])
                    line_text = " ".join(w.get("text", "") for w in lines[li]).strip()
                    if line_text:
                        item_texts.append(line_text)
            if not item_words:
                continue
            y_min = min(w["top"] for w in item_words)
            y_max = max(w["top"] + w["height"] for w in item_words)
            y_center = (y_min + y_max) / 2
            row = {
                "index": row_idx,
                "row_index": row_idx,
                "y_min": y_min,
                "y_max": y_max,
                "y_center": y_center,
                "y_min_px": y_min,
                "y_max_px": y_max,
                "y_min_pct": round(y_min / img_h * 100, 2) if img_h else 0,
                "y_max_pct": round(y_max / img_h * 100, 2) if img_h else 0,
                "is_header": False,
            }
            rows.append(row)
            # Join multi-line text with newline for display
            merged_text = "\n".join(item_texts)
            # Add bullet marker if this is a bullet item without one
            first_text = item_texts[0] if item_texts else ""
            is_bullet = len(item_line_indices) > 1 or _BULLET_RE.match(first_text)
            if is_bullet and not _BULLET_RE.match(first_text) and row_idx > 0:
                # Continuation item without bullet — add one
                merged_text = "• " + merged_text
            cell = {
                "cell_id": f"Z{zone_index}_R{row_idx}C0",
                "row_index": row_idx,
                "col_index": 0,
                "col_type": "column_1",
                "text": merged_text,
                "word_boxes": item_words,
            }
            cells.append(cell)
        # Detect header: the first item counts as a header if it is short
        # (< 40 chars), all uppercase, or ends with a colon
        if len(items) >= 2:
            first_item_texts = []
            for li in items[0]:
                if li < len(lines):
                    first_item_texts.append(" ".join(w.get("text", "") for w in lines[li]).strip())
            first_text = " ".join(first_item_texts)
            if (len(first_text) < 40
                    or first_text.isupper()
                    or first_text.rstrip().endswith(':')):
                header_rows = [0]
        return {
            "columns": [column],
            "rows": rows,
            "cells": cells,
            "header_rows": header_rows,
            "box_layout_type": layout_type,
            "box_grid_reviewed": False,
        }
    # Columnar: use standard grid builder with independent column detection
    result = _build_zone_grid(
        zone_words, box_x, box_y, box_w, box_h,
        zone_index, img_w, img_h,
        global_columns=None,  # detect columns independently
    )
    # Colspan detection is now handled generically by _detect_colspan_cells
    # in grid_editor_helpers.py (called inside _build_zone_grid).
    result["box_layout_type"] = layout_type
    result["box_grid_reviewed"] = False
    return result
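
The y-proximity grouping that `_group_into_lines` relies on can be tried in isolation. The following is a minimal sketch, not the module's code: `group_words_into_lines` and the `sample_words` data are hypothetical stand-ins, and the word dicts are assumed to carry the same `top`, `left` and `height` keys as in the file above.

```python
import statistics
from typing import Dict, List


def group_words_into_lines(words: List[Dict]) -> List[List[Dict]]:
    """Group OCR words into text lines by comparing their top coordinates."""
    if not words:
        return []
    ordered = sorted(words, key=lambda w: (w["top"], w["left"]))
    heights = [w["height"] for w in ordered if w.get("height", 0) > 0]
    median_h = statistics.median(heights) if heights else 20
    # Words whose top edge lies within half a median word height of the
    # line's first word are treated as part of the same line.
    tolerance = max(median_h * 0.5, 5)

    lines: List[List[Dict]] = [[ordered[0]]]
    anchor_y = ordered[0]["top"]
    for word in ordered[1:]:
        if abs(word["top"] - anchor_y) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
            anchor_y = word["top"]
    # Restore reading order inside each line.
    return [sorted(line, key=lambda w: w["left"]) for line in lines]


# Hypothetical OCR output for a small box with a heading and two bullets.
sample_words = [
    {"text": "Tipps:", "left": 10, "top": 100, "width": 60, "height": 20},
    {"text": "•", "left": 10, "top": 130, "width": 8, "height": 20},
    {"text": "lesen", "left": 30, "top": 132, "width": 50, "height": 20},
    {"text": "schreiben", "left": 30, "top": 158, "width": 90, "height": 20},
    {"text": "•", "left": 10, "top": 160, "width": 8, "height": 20},
]

line_texts = [
    " ".join(w["text"] for w in line)
    for line in group_words_into_lines(sample_words)
]
print(line_texts)
```

Note that the tolerance is measured against the first word of each line rather than a running average, so a line that drifts steadily downwards (for example on a skewed scan) may be split earlier than expected.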
--- a/klausur-service/backend/cv_cell_grid.py
+++ b/klausur-service/backend/cv_cell_grid.py
@@ -1447,6 +1447,90 @@ def _merge_phonetic_continuation_rows(
    return merged
 def _merge_wrapped_rows(
    entries: List[Dict[str, Any]],
 ) -> List[Dict[str, Any]]:
    """Merge rows where the primary column (EN) is empty — cell wrap continuation.
    In textbook vocabulary tables, columns are often narrow, so the author
    wraps text within a cell. OCR treats each physical line as a separate row.
    The key indicator: if the EN column is empty but DE/example have text,
    this row is a continuation of the previous row's cells.
    Example (original textbook has ONE row):
      Row 2: EN="take part (in)"  DE="teilnehmen (an), mitmachen"  EX="More than 200 singers took"
      Row 3: EN=""                DE="(bei)"                        EX="part in the concert."
      → Merged: EN="take part (in)" DE="teilnehmen (an), mitmachen (bei)" EX="More than 200 singers took part in the concert."
    Also handles the reverse case: DE empty but EN has text (wrap in EN column).
    """
    if len(entries) < 2:
        return entries
    merged: List[Dict[str, Any]] = []
    for entry in entries:
        en = (entry.get('english') or '').strip()
        de = (entry.get('german') or '').strip()
        ex = (entry.get('example') or '').strip()
        if not merged:
            merged.append(entry)
            continue
        prev = merged[-1]
        prev_en = (prev.get('english') or '').strip()
        prev_de = (prev.get('german') or '').strip()
        prev_ex = (prev.get('example') or '').strip()
        # Case 1: EN is empty → continuation of previous row
        # (DE or EX have text that should be appended to previous row)
        if not en and (de or ex) and prev_en:
            if de:
                if prev_de.endswith(','):
                    sep = ' '  # "Wort," + " " + "Ausdruck"
                elif prev_de.endswith(('-', '(')):
                    sep = ''   # "teil-" + "nehmen" or "(" + "bei)"
                else:
                    sep = ' '
                prev['german'] = (prev_de + sep + de).strip()
            if ex:
                sep = ' ' if prev_ex else ''
                prev['example'] = (prev_ex + sep + ex).strip()
            logger.debug(
                f"Merged wrapped row {entry.get('row_index')} into previous "
                f"(empty EN): DE={prev['german']!r}, EX={prev.get('example', '')!r}"
            )
            continue
        # Case 2: DE is empty, EN has text that looks like continuation
        # (starts with lowercase or is a parenthetical like "(bei)")
        if en and not de and prev_de:
            is_paren = en.startswith('(')
            first_alpha = next((c for c in en if c.isalpha()), '')
            starts_lower = first_alpha and first_alpha.islower()
            if (is_paren or starts_lower) and len(en.split()) < 5:
                sep = ' ' if prev_en and not prev_en.endswith((',', '-', '(')) else ''
                prev['english'] = (prev_en + sep + en).strip()
                if ex:
                    sep2 = ' ' if prev_ex else ''
                    prev['example'] = (prev_ex + sep2 + ex).strip()
                logger.debug(
                    f"Merged wrapped row {entry.get('row_index')} into previous "
                    f"(empty DE): EN={prev['english']!r}"
                )
                continue
        merged.append(entry)
    if len(merged) < len(entries):
        logger.info(
            f"_merge_wrapped_rows: merged {len(entries) - len(merged)} "
            f"continuation rows ({len(entries)} → {len(merged)})"
        )
    return merged
 def _merge_continuation_rows(
    entries: List[Dict[str, Any]],
 ) -> List[Dict[str, Any]]:
@@ -1561,6 +1645,9 @@ def build_word_grid(
    # --- Post-processing pipeline (deterministic, no LLM) ---
    n_raw = len(entries)
    # 0. Merge cell-wrap continuation rows (empty primary column = text wrap)
    entries = _merge_wrapped_rows(entries)
    # 0a. Merge phonetic-only continuation rows into previous entry
    entries = _merge_phonetic_continuation_rows(entries)
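
To illustrate the cell-wrap merge, here is a reduced sketch that covers only the first case (empty EN column) using the example from the docstring. `merge_wrapped_rows_sketch` is a hypothetical name; unlike `_merge_wrapped_rows` it copies entries instead of mutating them and leaves out the reverse case and logging.

```python
from typing import Dict, List


def merge_wrapped_rows_sketch(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Append rows with an empty 'english' field to the previous row."""
    merged: List[Dict[str, str]] = []
    for entry in entries:
        en = (entry.get("english") or "").strip()
        de = (entry.get("german") or "").strip()
        ex = (entry.get("example") or "").strip()
        prev = merged[-1] if merged else None

        # Empty EN plus text in DE or EX marks a wrapped continuation row.
        if prev is not None and not en and (de or ex) and prev["english"]:
            if de:
                # No separator after a hyphen or an opening parenthesis.
                sep = "" if prev["german"].endswith(("-", "(")) else " "
                prev["german"] = (prev["german"] + sep + de).strip()
            if ex:
                prev["example"] = (prev["example"] + " " + ex).strip()
            continue

        merged.append({"english": en, "german": de, "example": ex})
    return merged


rows = [
    {
        "english": "take part (in)",
        "german": "teilnehmen (an), mitmachen",
        "example": "More than 200 singers took",
    },
    {"english": "", "german": "(bei)", "example": "part in the concert."},
]

result = merge_wrapped_rows_sketch(rows)
print(result)
```

Running the merge before the other post-processing steps, as the pipeline above does, means the later steps see one entry per logical vocabulary row.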
--- a/klausur-service/backend/cv_gutter_repair.py
+++ b/klausur-service/backend/cv_gutter_repair.py
@@ -0,0 +1,610 @@
 """
 Gutter Repair — detects and fixes words truncated or blurred at the book gutter.
 When scanning double-page spreads, the binding area (gutter) causes:
  1. Blurry/garbled trailing characters  ("stammeli" → "stammeln")
  2. Words split across lines with a hyphen lost in the gutter
     ("ve" + "künden" → "verkünden")
 This module analyses grid cells, identifies gutter-edge candidates, and
 proposes corrections using pyspellchecker (DE + EN).
 License: Apache 2.0 (commercial use permitted)
 PRIVACY: All processing happens locally.
 """
 import itertools
 import logging
 import re
 import time
 import uuid
 from dataclasses import dataclass, field, asdict
 from typing import Any, Dict, List, Optional, Tuple
 logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
 # Spellchecker setup (lazy, cached)
 # ---------------------------------------------------------------------------
 _spell_de = None
 _spell_en = None
 _SPELL_AVAILABLE = False
 def _init_spellcheckers():
    """Lazy-load DE + EN spellcheckers (cached across calls)."""
    global _spell_de, _spell_en, _SPELL_AVAILABLE
    if _spell_de is not None:
        return
    try:
        from spellchecker import SpellChecker
        _spell_de = SpellChecker(language='de', distance=1)
        _spell_en = SpellChecker(language='en', distance=1)
        _SPELL_AVAILABLE = True
        logger.info("Gutter repair: spellcheckers loaded (DE + EN)")
    except ImportError:
        logger.warning("pyspellchecker not installed — gutter repair unavailable")
 def _is_known(word: str) -> bool:
    """Check if a word is known in DE or EN dictionary."""
    _init_spellcheckers()
    if not _SPELL_AVAILABLE:
        return False
    w = word.lower()
    return bool(_spell_de.known([w])) or bool(_spell_en.known([w]))
 def _spell_candidates(word: str, lang: str = "both") -> List[str]:
    """Get all plausible spellchecker candidates for a word (deduplicated)."""
    _init_spellcheckers()
    if not _SPELL_AVAILABLE:
        return []
    w = word.lower()
    seen: set = set()
    results: List[str] = []
    for checker in ([_spell_de, _spell_en] if lang == "both"
                    else [_spell_de] if lang == "de"
                    else [_spell_en]):
        if checker is None:
            continue
        cands = checker.candidates(w)
        if cands:
            for c in cands:
                if c and c != w and c not in seen:
                    seen.add(c)
                    results.append(c)
    return results
 # ---------------------------------------------------------------------------
 # Gutter position detection
 # ---------------------------------------------------------------------------
 # Minimum word length for spell-fix (very short words are often legitimate)
 _MIN_WORD_LEN_SPELL = 3
 # Minimum word length for hyphen-join candidates (fragments at the gutter
 # can be as short as 1-2 chars, e.g. "ve" from "ver-künden")
 _MIN_WORD_LEN_HYPHEN = 2
 # How close to the right column edge a word must be to count as "gutter-adjacent".
 # Expressed as fraction of column width (e.g. 0.70 = rightmost 30%).
 _GUTTER_EDGE_THRESHOLD = 0.70
 # Small common words / abbreviations that should NOT be repaired
 _STOPWORDS = frozenset([
    # German
    "ab", "an", "am", "da", "er", "es", "im", "in", "ja", "ob", "so", "um",
    "zu", "wo", "du", "eh", "ei", "je", "na", "nu", "oh",
    # English
    "a", "am", "an", "as", "at", "be", "by", "do", "go", "he", "if", "in",
    "is", "it", "me", "my", "no", "of", "on", "or", "so", "to", "up", "us",
    "we",
 ])
 # IPA / phonetic patterns — skip these cells
 _IPA_RE = re.compile(r'[\[\]/ˈˌːʃʒθðŋɑɒæɔəɛɪʊʌ]')
 def _is_ipa_text(text: str) -> bool:
    """True if text looks like IPA transcription."""
    return bool(_IPA_RE.search(text))
 def _word_is_at_gutter_edge(word_bbox: Dict, col_x: float, col_width: float) -> bool:
    """Check if a word's right edge is near the right boundary of its column."""
    if col_width <= 0:
        return False
    word_right = word_bbox.get("left", 0) + word_bbox.get("width", 0)
    col_right = col_x + col_width
    # Word's right edge within the rightmost portion of the column
    relative_pos = (word_right - col_x) / col_width
    return relative_pos >= _GUTTER_EDGE_THRESHOLD
 # ---------------------------------------------------------------------------
 # Suggestion types
 # ---------------------------------------------------------------------------
 @dataclass
 class GutterSuggestion:
    """A single correction suggestion."""
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    type: str = ""             # "hyphen_join" | "spell_fix"
    zone_index: int = 0
    row_index: int = 0
    col_index: int = 0
    col_type: str = ""
    cell_id: str = ""
    original_text: str = ""
    suggested_text: str = ""
    # For hyphen_join:
    next_row_index: int = -1
    next_row_cell_id: str = ""
    next_row_text: str = ""
    missing_chars: str = ""
    display_parts: List[str] = field(default_factory=list)
    # Alternatives (other plausible corrections the user can pick from)
    alternatives: List[str] = field(default_factory=list)
    # Meta:
    confidence: float = 0.0
    reason: str = ""           # "gutter_truncation" | "gutter_blur" | "hyphen_continuation"
    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)
 # ---------------------------------------------------------------------------
 # Core repair logic
 # ---------------------------------------------------------------------------
 _TRAILING_PUNCT_RE = re.compile(r'[.,;:!?\)\]]+$')
 def _try_hyphen_join(
    word_text: str,
    next_word_text: str,
    max_missing: int = 3,
 ) -> Optional[Tuple[str, str, float]]:
    """Try joining two fragments with 0..max_missing interpolated chars.
    Strips trailing punctuation from the continuation word before testing
    (e.g. "künden," → "künden") so dictionary lookup succeeds.
    Returns (joined_word, missing_chars, confidence) or None.
    """
    base = word_text.rstrip("-").rstrip()
    # Strip trailing punctuation from continuation (commas, periods, etc.)
    raw_continuation = next_word_text.lstrip()
    continuation = _TRAILING_PUNCT_RE.sub('', raw_continuation)
    if not base or not continuation:
        return None
    # 1. Direct join (no missing chars)
    direct = base + continuation
    if _is_known(direct):
        return (direct, "", 0.95)
    # 2. Try with 1..max_missing missing characters
    # Use common letters, weighted by frequency in German/English
    _COMMON_CHARS = "enristaldhgcmobwfkzpvjyxqu"
    for n_missing in range(1, max_missing + 1):
        for chars in itertools.product(_COMMON_CHARS[:15], repeat=n_missing):
            candidate = base + "".join(chars) + continuation
            if _is_known(candidate):
                missing = "".join(chars)
                # Confidence decreases with more missing chars
                conf = 0.90 - (n_missing - 1) * 0.10
                return (candidate, missing, conf)
    return None
 def _try_spell_fix(
    word_text: str, col_type: str = "",
 ) -> Optional[Tuple[str, float, List[str]]]:
    """Try to fix a single garbled gutter word via spellchecker.
    Returns (best_correction, confidence, alternatives_list) or None.
    The alternatives list contains other plausible corrections the user
    can choose from (e.g. "stammelt" vs "stammeln").
    """
    if len(word_text) < _MIN_WORD_LEN_SPELL:
        return None
    # Strip trailing/leading parentheses and check if the bare word is valid.
    # Words like "probieren)" or "(Englisch" are valid words with punctuation,
    # not OCR errors. Don't suggest corrections for them.
    stripped = word_text.strip("()")
    if stripped and _is_known(stripped):
        return None
    # Determine language priority from column type
    if "en" in col_type:
        lang = "en"
    elif "de" in col_type:
        lang = "de"
    else:
        lang = "both"
    candidates = _spell_candidates(word_text, lang=lang)
    if not candidates and lang != "both":
        candidates = _spell_candidates(word_text, lang="both")
    if not candidates:
        return None
    # Preserve original casing
    is_upper = word_text[0].isupper()
    def _preserve_case(w: str) -> str:
        if is_upper and w:
            return w[0].upper() + w[1:]
        return w
    # Sort candidates by edit distance (closest first)
    scored = []
    for c in candidates:
        dist = _edit_distance(word_text.lower(), c.lower())
        scored.append((dist, c))
    scored.sort(key=lambda x: x[0])
    best_dist, best = scored[0]
    best = _preserve_case(best)
    conf = max(0.5, 1.0 - best_dist * 0.15)
    # Build alternatives (all other candidates, also case-preserved)
    alts = [_preserve_case(c) for _, c in scored[1:] if c.lower() != best.lower()]
    # Limit to top 5 alternatives
    alts = alts[:5]
    return (best, conf, alts)
 def _edit_distance(a: str, b: str) -> int:
    """Simple Levenshtein distance."""
    if len(a) < len(b):
        return _edit_distance(b, a)
    if len(b) == 0:
        return len(a)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a):
        curr = [i + 1]
        for j, cb in enumerate(b):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[j] + 1, prev[j + 1] + 1, prev[j] + cost))
        prev = curr
    return prev[len(b)]
 # ---------------------------------------------------------------------------
 # Grid analysis
 # ---------------------------------------------------------------------------
 def analyse_grid_for_gutter_repair(
    grid_data: Dict[str, Any],
    image_width: int = 0,
 ) -> Dict[str, Any]:
    """Analyse a structured grid and return gutter repair suggestions.
    Args:
        grid_data: The grid_editor_result from the session (zones→cells structure).
        image_width: Image width in pixels (for determining gutter side).
    Returns:
        Dict with "suggestions" list and "stats".
    """
    t0 = time.time()
    _init_spellcheckers()
    if not _SPELL_AVAILABLE:
        return {
            "suggestions": [],
            "stats": {"error": "pyspellchecker not installed"},
            "duration_seconds": 0,
        }
    zones = grid_data.get("zones", [])
    suggestions: List[GutterSuggestion] = []
    words_checked = 0
    gutter_candidates = 0
    for zi, zone in enumerate(zones):
        columns = zone.get("columns", [])
        cells = zone.get("cells", [])
        if not columns or not cells:
            continue
        # Build column lookup: col_index → {x, width, type}
        col_info: Dict[int, Dict] = {}
        for col in columns:
            ci = col.get("index", col.get("col_index", -1))
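            x_min = col.get("x_min_px", col.get("x", 0))
            # Only subtract x_min when an absolute right edge is present;
            # a "width" fallback is already a width, not a coordinate.
            if "x_max_px" in col:
                col_width = col["x_max_px"] - x_min
            else:
                col_width = col.get("width", 0)
            col_info[ci] = {
                "x": x_min,
                "width": col_width,
                "type": col.get("type", col.get("col_type", "")),
            }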
        # Build row→col→cell lookup
        cell_map: Dict[Tuple[int, int], Dict] = {}
        max_row = 0
        for cell in cells:
            ri = cell.get("row_index", 0)
            ci = cell.get("col_index", 0)
            cell_map[(ri, ci)] = cell
            if ri > max_row:
                max_row = ri
        # Determine which columns are at the gutter edge.
        # For a left page: rightmost content columns.
        # For now, check ALL columns — a word is a candidate if it's at the
        # right edge of its column AND not a known word.
        for (ri, ci), cell in cell_map.items():
            text = (cell.get("text") or "").strip()
            if not text:
                continue
            if _is_ipa_text(text):
                continue
            words_checked += 1
            col = col_info.get(ci, {})
            col_type = col.get("type", "")
            # Get word boxes to check position
            word_boxes = cell.get("word_boxes", [])
            # Check the LAST word in the cell (rightmost, closest to gutter)
            cell_words = text.split()
            if not cell_words:
                continue
            last_word = cell_words[-1]
            # Skip stopwords
            if last_word.lower().rstrip(".,;:!?-") in _STOPWORDS:
                continue
            last_word_clean = last_word.rstrip(".,;:!?)(")
            if len(last_word_clean) < _MIN_WORD_LEN_HYPHEN:
                continue
            # Check if the last word is at the gutter edge
            is_at_edge = False
            if word_boxes:
                last_wb = word_boxes[-1]
                is_at_edge = _word_is_at_gutter_edge(
                    last_wb, col.get("x", 0), col.get("width", 1)
                )
            else:
                # No word boxes — use cell bbox
                bbox = cell.get("bbox_px", {})
                is_at_edge = _word_is_at_gutter_edge(
                    {"left": bbox.get("x", 0), "width": bbox.get("w", 0)},
                    col.get("x", 0), col.get("width", 1)
                )
            if not is_at_edge:
                continue
            # Word is at gutter edge — check if it's a known word
            if _is_known(last_word_clean):
                continue
            # Check if the word ends with "-" (explicit hyphen break)
            ends_with_hyphen = last_word.endswith("-")
            # If the word already ends with "-" and the stem (without
            # the hyphen) is a known word, this is a VALID line-break
            # hyphenation — not a gutter error.  Gutter problems cause
            # the hyphen to be LOST ("ve" instead of "ver-"), so a
            # visible hyphen + known stem = intentional word-wrap.
            # Example: "wunder-" → "wunder" is known → skip.
            if ends_with_hyphen:
                stem = last_word_clean.rstrip("-")
                if stem and _is_known(stem):
                    continue
            gutter_candidates += 1
            # --- Strategy 1: Hyphen join with next row ---
            next_cell = cell_map.get((ri + 1, ci))
            if next_cell:
                next_text = (next_cell.get("text") or "").strip()
                next_words = next_text.split()
                if next_words:
                    first_next = next_words[0]
                    first_next_clean = _TRAILING_PUNCT_RE.sub('', first_next)
                    first_alpha = next((c for c in first_next if c.isalpha()), "")
                    # Also skip if the joined word is known (covers compound
                    # words where the stem alone might not be in the dictionary)
                    if ends_with_hyphen and first_next_clean:
                        direct = last_word_clean.rstrip("-") + first_next_clean
                        if _is_known(direct):
                            continue
                    # Continuation likely if:
                    # - explicit hyphen, OR
                    # - next row starts lowercase (= not a new entry)
                    if ends_with_hyphen or (first_alpha and first_alpha.islower()):
                        result = _try_hyphen_join(last_word_clean, first_next)
                        if result:
                            joined, missing, conf = result
                            # Build display parts: show hyphenation for original layout
                            # A visible hyphen is stripped first so restored
                            # characters land before the re-appended hyphen.
                            if ends_with_hyphen:
                                display_stem = last_word_clean.rstrip("-")
                            else:
                                display_stem = last_word_clean
                            display_p1 = display_stem + (missing or "") + "-"
                            suggestion = GutterSuggestion(
                                type="hyphen_join",
                                zone_index=zi,
                                row_index=ri,
                                col_index=ci,
                                col_type=col_type,
                                cell_id=cell.get("cell_id", f"R{ri:02d}_C{ci}"),
                                original_text=last_word,
                                suggested_text=joined,
                                next_row_index=ri + 1,
                                next_row_cell_id=next_cell.get("cell_id", f"R{ri+1:02d}_C{ci}"),
                                next_row_text=next_text,
                                missing_chars=missing,
                                display_parts=[display_p1, first_next],
                                confidence=conf,
                                reason="gutter_truncation" if missing else "hyphen_continuation",
                            )
                            suggestions.append(suggestion)
                            continue  # skip spell_fix if hyphen_join found
            # --- Strategy 2: Single-word spell fix (only for longer words) ---
            fix_result = _try_spell_fix(last_word_clean, col_type)
            if fix_result:
                corrected, conf, alts = fix_result
                suggestion = GutterSuggestion(
                    type="spell_fix",
                    zone_index=zi,
                    row_index=ri,
                    col_index=ci,
                    col_type=col_type,
                    cell_id=cell.get("cell_id", f"R{ri:02d}_C{ci}"),
                    original_text=last_word,
                    suggested_text=corrected,
                    alternatives=alts,
                    confidence=conf,
                    reason="gutter_blur",
                )
                suggestions.append(suggestion)
    duration = round(time.time() - t0, 3)
    logger.info(
        "Gutter repair: checked %d words, %d gutter candidates, %d suggestions (%.2fs)",
        words_checked, gutter_candidates, len(suggestions), duration,
    )
    return {
        "suggestions": [s.to_dict() for s in suggestions],
        "stats": {
            "words_checked": words_checked,
            "gutter_candidates": gutter_candidates,
            "suggestions_found": len(suggestions),
        },
        "duration_seconds": duration,
    }
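
Review note: the skip rule for valid line-break hyphenation can be looked at in isolation. This is a minimal sketch, assuming a toy word set `KNOWN` in place of the module's `_is_known()` lookup; the helper name `is_valid_linebreak_hyphen` is hypothetical and does not exist in the codebase.

```python
# Toy dictionary standing in for the module's _is_known() lookup (assumption).
KNOWN = {"wunder"}


def is_valid_linebreak_hyphen(last_word: str) -> bool:
    """Return True when a visible trailing hyphen follows a known stem.

    Hypothetical helper: a visible hyphen plus a known stem is treated as
    intentional word-wrap, so the word is not counted as a gutter candidate.
    """
    if not last_word.endswith("-"):
        return False
    stem = last_word.rstrip("-")
    return bool(stem) and stem.lower() in KNOWN
```

A truncated word such as "ve" has lost its hyphen, so it fails the first check and stays a gutter candidate.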
 def apply_gutter_suggestions(
    grid_data: Dict[str, Any],
    accepted_ids: List[str],
    suggestions: List[Dict[str, Any]],
 ) -> Dict[str, Any]:
    """Apply accepted gutter repair suggestions to the grid data.
    Modifies cells in-place and returns summary of changes.
    Args:
        grid_data: The grid_editor_result (zones→cells).
        accepted_ids: List of suggestion IDs the user accepted.
        suggestions: The full suggestions list (from analyse_grid_for_gutter_repair).
    Returns:
        Dict with "applied_count" and "changes" list.
    """
    accepted_set = set(accepted_ids)
    accepted_suggestions = [s for s in suggestions if s.get("id") in accepted_set]
    zones = grid_data.get("zones", [])
    changes: List[Dict[str, Any]] = []
    for s in accepted_suggestions:
        zi = s.get("zone_index", 0)
        ri = s.get("row_index", 0)
        ci = s.get("col_index", 0)
        stype = s.get("type", "")
        if zi >= len(zones):
            continue
        zone_cells = zones[zi].get("cells", [])
        # Find the target cell
        target_cell = None
        for cell in zone_cells:
            if cell.get("row_index") == ri and cell.get("col_index") == ci:
                target_cell = cell
                break
        if not target_cell:
            continue
        old_text = target_cell.get("text", "")
        if stype == "spell_fix":
            # Replace the last word in the cell text
            original_word = s.get("original_text", "")
            corrected = s.get("suggested_text", "")
            if original_word and corrected:
                # Replace from the right (last occurrence)
                idx = old_text.rfind(original_word)
                if idx >= 0:
                    new_text = old_text[:idx] + corrected + old_text[idx + len(original_word):]
                    target_cell["text"] = new_text
                    changes.append({
                        "type": "spell_fix",
                        "zone_index": zi,
                        "row_index": ri,
                        "col_index": ci,
                        "cell_id": target_cell.get("cell_id", ""),
                        "old_text": old_text,
                        "new_text": new_text,
                    })
        elif stype == "hyphen_join":
            # Current cell: replace last word with the hyphenated first part
            original_word = s.get("original_text", "")
            joined = s.get("suggested_text", "")
            display_parts = s.get("display_parts", [])
            next_ri = s.get("next_row_index", -1)
            if not original_word or not joined or not display_parts:
                continue
            # The first display part is what goes in the current row
            first_part = display_parts[0] if display_parts else ""
            # Replace the last word in current cell with the restored form.
            # The next row is NOT modified — "künden" stays in its row
            # because the original book layout has it there. We only fix
            # the truncated word in the current row (e.g. "ve" → "ver-").
            idx = old_text.rfind(original_word)
            if idx >= 0:
                new_text = old_text[:idx] + first_part + old_text[idx + len(original_word):]
                target_cell["text"] = new_text
                changes.append({
                    "type": "hyphen_join",
                    "zone_index": zi,
                    "row_index": ri,
                    "col_index": ci,
                    "cell_id": target_cell.get("cell_id", ""),
                    "old_text": old_text,
                    "new_text": new_text,
                    "joined_word": joined,
                })
    logger.info("Gutter repair applied: %d/%d suggestions", len(changes), len(accepted_suggestions))
    return {
        "applied_count": len(changes),
        "changes": changes,
    }
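
Review note: both branches above replace only the last occurrence of the original word via `rfind`. A minimal sketch of that step as a standalone function; `replace_last` is a hypothetical name used for illustration only.

```python
def replace_last(text: str, old: str, new: str) -> str:
    """Replace only the right-most occurrence of old in text."""
    idx = text.rfind(old)
    if not old or idx < 0:
        return text  # nothing to replace
    return text[:idx] + new + text[idx + len(old):]
```

Replacing from the right matters because the gutter-damaged word is the last word of the cell, and an earlier identical token must stay untouched.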
--- a/klausur-service/backend/cv_ocr_engines.py
+++ b/klausur-service/backend/cv_ocr_engines.py
@@ -1182,6 +1182,10 @@ def _insert_missing_ipa(text: str, pronunciation: str = 'british') -> str:
                if wj in ('–', '—', '-', '/', '|', ',', ';'):
                    kept.extend(words[j:])
                    break
                # Pure digits or numbering (e.g. "1", "2.", "3)") — keep
                if re.match(r'^[\d.)\-]+$', wj):
                    kept.extend(words[j:])
                    break
                # Starts with uppercase — likely German or proper noun
                clean_j = re.sub(r'[^a-zA-Z]', '', wj)
                if clean_j and clean_j[0].isupper():
@@ -1243,6 +1247,9 @@ def _has_non_dict_trailing(text: str, pronunciation: str = 'british') -> bool:
        wj = words[j]
        if wj in ('–', '—', '-', '/', '|', ',', ';'):
            return False
        # Pure digits or numbering (e.g. "1", "2.", "3)") — not garbled IPA
        if re.match(r'^[\d.)\-]+$', wj):
            return False
        clean_j = re.sub(r'[^a-zA-Z]', '', wj)
        if clean_j and clean_j[0].isupper():
            return False
@@ -1874,6 +1881,11 @@ def _is_noise_tail_token(token: str) -> bool:
    if t.endswith(']'):
        return False
    # Keep meaningful punctuation tokens used in textbooks
    # = (definition marker), (= (definition opener), ; (separator)
    if t in ('=', '(=', '=)', ';', ':', '-', '–', '—', '/', '+', '&'):
        return False
    # Pure non-alpha → noise ("3", ")", "|")
    alpha_chars = _RE_ALPHA.findall(t)
    if not alpha_chars:
--- a/klausur-service/backend/cv_review.py
+++ b/klausur-service/backend/cv_review.py
@@ -720,6 +720,62 @@ def _spell_dict_knows(word: str) -> bool:
    return bool(_en_spell.known([w])) or bool(_de_spell.known([w]))
 def _try_split_merged_word(token: str) -> Optional[str]:
    """Try to split a merged word like 'atmyschool' into 'at my school'.
    Uses dynamic programming to find the shortest sequence of dictionary
    words that covers the entire token.  Only returns a result when the
    split produces at least 2 words and ALL parts are known dictionary words.
    Preserves original capitalisation by mapping back to the input string.
    """
    if not _SPELL_AVAILABLE or len(token) < 4:
        return None
    lower = token.lower()
    n = len(lower)
    # dp[i] = (word_lengths_list, score) for best split of lower[:i], or None
    # Score: (-word_count, sum_of_squared_lengths) — fewer words first,
    # then prefer longer words (e.g. "come on" over "com eon")
    dp: list = [None] * (n + 1)
    dp[0] = ([], 0)
    for i in range(1, n + 1):
        for j in range(max(0, i - 20), i):
            if dp[j] is None:
                continue
            candidate = lower[j:i]
            word_len = i - j
            if word_len == 1 and candidate not in ('a', 'i'):
                continue
            if _spell_dict_knows(candidate):
                prev_words, prev_sq = dp[j]
                new_words = prev_words + [word_len]
                new_sq = prev_sq + word_len * word_len
                new_key = (-len(new_words), new_sq)
                if dp[i] is None:
                    dp[i] = (new_words, new_sq)
                else:
                    old_key = (-len(dp[i][0]), dp[i][1])
                    if new_key >= old_key:
                        # >= so that later splits (longer first word) win ties
                        dp[i] = (new_words, new_sq)
    if dp[n] is None or len(dp[n][0]) < 2:
        return None
    # Reconstruct with original casing
    result = []
    pos = 0
    for wlen in dp[n][0]:
        result.append(token[pos:pos + wlen])
        pos += wlen
    logger.debug("Split merged word: %r → %r", token, " ".join(result))
    return " ".join(result)
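
Review note: the dynamic-programming split can be tried without the spellchecker dependency. This is a simplified sketch, assuming a toy word set `WORDS` instead of `_spell_dict_knows()`; it leaves out the 20-character window and the single-letter filter, and `split_merged` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple

# Toy dictionary (assumption); the module consults the spellchecker instead.
WORDS: Set[str] = {"at", "my", "school", "a", "i", "come", "on", "com", "eon"}


def split_merged(token: str, words: Set[str] = WORDS) -> Optional[str]:
    """Split token into the fewest dictionary words, or return None."""
    lower = token.lower()
    n = len(lower)
    # best[i] = (word_lengths, sum_of_squared_lengths) for lower[:i]
    best: List[Optional[Tuple[List[int], int]]] = [None] * (n + 1)
    best[0] = ([], 0)
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is None or lower[j:i] not in words:
                continue
            lengths = best[j][0] + [i - j]
            squares = best[j][1] + (i - j) ** 2
            # Fewer words first, then larger squared lengths (longer words)
            key = (-len(lengths), squares)
            if best[i] is None or key >= (-len(best[i][0]), best[i][1]):
                best[i] = (lengths, squares)
    if best[n] is None or len(best[n][0]) < 2:
        return None
    parts, pos = [], 0
    for length in best[n][0]:
        parts.append(token[pos:pos + length])  # keep original casing
        pos += length
    return " ".join(parts)
```

The squared-length tiebreak is what makes "come on" win over "com eon" when both splits use two words.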
 def _spell_fix_token(token: str, field: str = "") -> Optional[str]:
    """Return corrected form of token, or None if no fix needed/possible.
@@ -777,6 +833,14 @@ def _spell_fix_token(token: str, field: str = "") -> Optional[str]:
                    correction = correction[0].upper() + correction[1:]
                if _spell_dict_knows(correction):
                    return correction
    # 5. Merged-word split: OCR often merges adjacent words when spacing
    #    is too tight, e.g. "atmyschool" → "at my school"
    if len(token) >= 4 and token.isalpha():
        split = _try_split_merged_word(token)
        if split:
            return split
    return None
@@ -817,10 +881,25 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
    """Rule-based OCR correction: spell-checker + structural heuristics.
    Deterministic — never translates, never touches IPA, never hallucinates.
    Uses SmartSpellChecker for language-aware corrections with context-based
    disambiguation (a/I), multi-digit substitution, and cross-language guard.
    """
    t0 = time.time()
    changes: List[Dict] = []
    all_corrected: List[Dict] = []
    # Use SmartSpellChecker if available, fall back to legacy _spell_fix_field
    _smart = None
    try:
        from smart_spell import SmartSpellChecker
        _smart = SmartSpellChecker()
        logger.debug("spell_review: using SmartSpellChecker")
    except Exception:
        logger.debug("spell_review: SmartSpellChecker not available, using legacy")
    # Map field names → language codes for SmartSpellChecker
    _LANG_MAP = {"english": "en", "german": "de", "example": "auto"}
    for i, entry in enumerate(entries):
        e = dict(entry)
        # Page-ref normalization (always, regardless of review status)
@@ -843,9 +922,18 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
            old_val = (e.get(field_name) or "").strip()
            if not old_val:
                continue
-            # example field is mixed-language — try German first (for umlauts)
+
-            lang = "german" if field_name in ("german", "example") else "english"
+            if _smart:
-            new_val, was_changed = _spell_fix_field(old_val, field=lang)
+                # SmartSpellChecker path — language-aware, context-based
                lang_code = _LANG_MAP.get(field_name, "en")
                result = _smart.correct_text(old_val, lang=lang_code)
                new_val = result.corrected
                was_changed = result.changed
            else:
                # Legacy path
                lang = "german" if field_name in ("german", "example") else "english"
                new_val, was_changed = _spell_fix_field(old_val, field=lang)
            if was_changed and new_val != old_val:
                changes.append({
                    "row_index": e.get("row_index", i),
@@ -857,12 +945,13 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
                e["llm_corrected"] = True
        all_corrected.append(e)
    duration_ms = int((time.time() - t0) * 1000)
    model_name = "smart-spell-checker" if _smart else "spell-checker"
    return {
        "entries_original": entries,
        "entries_corrected": all_corrected,
        "changes": changes,
        "skipped_count": 0,
-        "model_used": "spell-checker",
+        "model_used": model_name,
        "duration_ms": duration_ms,
    }
--- a/klausur-service/backend/cv_syllable_detect.py
+++ b/klausur-service/backend/cv_syllable_detect.py
@@ -55,6 +55,9 @@ _STOP_WORDS = frozenset([
 _hyph_de = None
 _hyph_en = None
 # Cached spellchecker (for autocorrect_pipe_artifacts)
 _spell_de = None
 def _get_hyphenators():
    """Lazy-load pyphen hyphenators (cached across calls)."""
@@ -70,6 +73,35 @@ def _get_hyphenators():
    return _hyph_de, _hyph_en
 def _get_spellchecker():
    """Lazy-load German spellchecker (cached across calls)."""
    global _spell_de
    if _spell_de is not None:
        return _spell_de
    try:
        from spellchecker import SpellChecker
    except ImportError:
        return None
    _spell_de = SpellChecker(language='de')
    return _spell_de
 def _is_known_word(word: str, hyph_de, hyph_en) -> bool:
    """Check whether pyphen recognises a word (DE or EN)."""
    if len(word) < 2:
        return False
    return ('|' in hyph_de.inserted(word, hyphen='|')
            or '|' in hyph_en.inserted(word, hyphen='|'))
 def _is_real_word(word: str) -> bool:
    """Check whether spellchecker knows this word (case-insensitive)."""
    spell = _get_spellchecker()
    if spell is None:
        return False
    return word.lower() in spell
 def _hyphenate_word(word: str, hyph_de, hyph_en) -> Optional[str]:
    """Try to hyphenate a word using DE then EN dictionary.
@@ -84,6 +116,139 @@ def _hyphenate_word(word: str, hyph_de, hyph_en) -> Optional[str]:
    return None
 def _autocorrect_piped_word(word_with_pipes: str) -> Optional[str]:
    """Try to correct a word that has OCR pipe artifacts.
    Printed syllable divider lines on dictionary pages confuse OCR:
    the vertical stroke is often read as an extra character (commonly
    ``l``, ``I``, ``1``, ``i``) adjacent to where the pipe appears.
    Sometimes OCR reads one divider as ``|`` and another as a letter,
    so the garbled character may be far from any detected pipe.
    Uses ``spellchecker`` (frequency-based word list) for validation —
    unlike pyphen which is a pattern-based hyphenator and accepts
    nonsense strings like "Zeplpelin".
    Strategy:
        1. Strip ``|`` — if spellchecker knows the result, done.
        2. Try deleting each pipe-like character (l, I, 1, i, t).
           OCR inserts extra chars that resemble vertical strokes.
        3. Fall back to spellchecker's own ``correction()`` method.
        4. Preserve the original casing of the first letter.
    """
    stripped = word_with_pipes.replace('|', '')
    if not stripped or len(stripped) < 3:
        return stripped  # too short to validate
    # Step 1: if the stripped word is already a real word, done
    if _is_real_word(stripped):
        return stripped
    # Step 2: try deleting pipe-like characters (most likely artifacts)
    _PIPE_LIKE = frozenset('lI1it')
    for idx in range(len(stripped)):
        if stripped[idx] not in _PIPE_LIKE:
            continue
        candidate = stripped[:idx] + stripped[idx + 1:]
        if len(candidate) >= 3 and _is_real_word(candidate):
            return candidate
    # Step 3: use spellchecker's built-in correction
    spell = _get_spellchecker()
    if spell is not None:
        suggestion = spell.correction(stripped.lower())
        if suggestion and suggestion != stripped.lower():
            # Preserve original first-letter case
            if stripped[0].isupper():
                suggestion = suggestion[0].upper() + suggestion[1:]
            return suggestion
    return None  # could not fix
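
Review note: steps 1 and 2 of the strategy (strip pipes, then try deleting one pipe-like character) can be sketched as follows. The word set `REAL_WORDS` is a toy stand-in for the spellchecker lookup and `fix_piped_word` is a hypothetical name; step 3 (the spellchecker's own correction) is left out.

```python
from typing import Optional, Set

# Toy word list (assumption); the module validates with the spellchecker.
REAL_WORDS: Set[str] = {"zeppelin", "zelle"}
PIPE_LIKE = frozenset("lI1it")


def fix_piped_word(word: str, real_words: Set[str] = REAL_WORDS) -> Optional[str]:
    """Strip '|' and, if needed, drop one pipe-like character."""
    stripped = word.replace("|", "")
    if stripped.lower() in real_words:
        return stripped
    for idx, ch in enumerate(stripped):
        if ch not in PIPE_LIKE:
            continue
        candidate = stripped[:idx] + stripped[idx + 1:]
        if len(candidate) >= 3 and candidate.lower() in real_words:
            return candidate
    return None  # could not fix with these two steps
```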
 def autocorrect_pipe_artifacts(
    zones_data: List[Dict], session_id: str,
 ) -> int:
    """Strip OCR pipe artifacts and correct garbled words in-place.
    Printed syllable divider lines on dictionary scans are read by OCR
    as ``|`` characters embedded in words (e.g. ``Zel|le``, ``Ze|plpe|lin``).
    This function:
    1. Strips ``|`` from every word in content cells.
    2. Validates with spellchecker (real dictionary lookup).
    3. If not recognised, tries deleting pipe-like characters or uses
       spellchecker's correction (e.g. ``Zeplpelin`` → ``Zeppelin``).
    4. Updates both word-box texts and cell text.
    Returns the number of cells modified.
    """
    spell = _get_spellchecker()
    if spell is None:
        # Pipes are still stripped below; only dictionary validation is lost.
        logger.warning("spellchecker not available, pipe autocorrect limited")
    modified = 0
    for z in zones_data:
        for cell in z.get("cells", []):
            ct = cell.get("col_type", "")
            if not ct.startswith("column_"):
                continue
            cell_changed = False
            # --- Fix word boxes ---
            for wb in cell.get("word_boxes", []):
                wb_text = wb.get("text", "")
                if "|" not in wb_text:
                    continue
                # Separate trailing punctuation
                m = re.match(
                    r'^([^a-zA-ZäöüÄÖÜßẞ]*)'
                    r'(.*?)'
                    r'([^a-zA-ZäöüÄÖÜßẞ]*)$',
                    wb_text,
                )
                if not m:
                    continue
                lead, core, trail = m.group(1), m.group(2), m.group(3)
                if "|" not in core:
                    continue
                corrected = _autocorrect_piped_word(core)
                if corrected is not None and corrected != core:
                    wb["text"] = lead + corrected + trail
                    cell_changed = True
            # --- Rebuild cell text from word boxes ---
            if cell_changed:
                wbs = cell.get("word_boxes", [])
                if wbs:
                    cell["text"] = " ".join(
                        (wb.get("text") or "") for wb in wbs
                    )
                modified += 1
            # --- Fallback: strip residual | from cell text ---
            # (covers cases where word_boxes don't exist or weren't fixed)
            text = cell.get("text", "")
            if "|" in text:
                clean = text.replace("|", "")
                if clean != text:
                    cell["text"] = clean
                    if not cell_changed:
                        modified += 1
    if modified:
        logger.info(
            "build-grid session %s: autocorrected pipe artifacts in %d cells",
            session_id, modified,
        )
    return modified
 def _try_merge_pipe_gaps(text: str, hyph_de) -> str:
    """Merge fragments separated by single spaces where OCR split at a pipe.
@@ -185,7 +350,7 @@ def merge_word_gaps_in_zones(zones_data: List[Dict], session_id: str) -> int:
 def _try_merge_word_gaps(text: str, hyph_de) -> str:
-    """Merge OCR word fragments with relaxed threshold (max_short=6).
+    """Merge OCR word fragments with relaxed threshold (max_short=5).
    Similar to ``_try_merge_pipe_gaps`` but allows slightly longer fragments
    (max_short=5 instead of 3).  Still requires pyphen to recognize the
--- a/klausur-service/backend/cv_words_first.py
+++ b/klausur-service/backend/cv_words_first.py
@@ -35,9 +35,15 @@ def _cluster_columns(
    words: List[Dict],
    img_w: int,
    min_gap_pct: float = 3.0,
    max_columns: Optional[int] = None,
 ) -> List[Dict[str, Any]]:
    """Cluster words into columns by finding large horizontal gaps.
    Args:
        max_columns: If set, caps the number of columns by keeping only
            the (max_columns - 1) widest gaps as boundaries, which merges
            the columns separated by the narrowest gaps.
            Prevents phantom columns from degraded OCR.
    Returns a list of column dicts:
        [{'index': 0, 'type': 'column_1', 'x_min': ..., 'x_max': ...}, ...]
    sorted left-to-right.
@@ -57,17 +63,28 @@ def _cluster_columns(
    # Find X-gap boundaries between consecutive words (sorted by X-center)
    # For each word, compute right edge; for next word, compute left edge
-    boundaries: List[float] = []  # X positions where columns split
+    # Collect gaps with their sizes for max_columns enforcement
    gaps: List[Tuple[float, float]] = []  # (gap_size, split_x)
    for i in range(len(sorted_w) - 1):
        right_edge = sorted_w[i]['left'] + sorted_w[i]['width']
        left_edge = sorted_w[i + 1]['left']
        gap = left_edge - right_edge
        if gap > min_gap_px:
-            # Split point is midway through the gap
-            boundaries.append((right_edge + left_edge) / 2)
+            split_x = (right_edge + left_edge) / 2
+            gaps.append((gap, split_x))
    # If max_columns is set, keep only the (max_columns - 1) largest gaps
    if max_columns and len(gaps) >= max_columns:
        # Count the dropped gaps before truncating the list
        removed = len(gaps) - (max_columns - 1)
        gaps.sort(key=lambda g: g[0], reverse=True)
        gaps = gaps[:max_columns - 1]
        logger.info(
            "_cluster_columns: limited to %d columns (removed %d smallest gaps)",
            max_columns, removed,
        )
    boundaries = sorted(g[1] for g in gaps)
    # Build column ranges from boundaries
    # Column ranges: (-inf, boundary[0]), (boundary[0], boundary[1]), ..., (boundary[-1], +inf)
    col_edges = [0.0] + boundaries + [float(img_w)]
    columns = []
    for ci in range(len(col_edges) - 1):
@@ -302,6 +319,7 @@ def build_grid_from_words(
    img_h: int,
    min_confidence: int = 30,
    box_rects: Optional[List[Dict]] = None,
    max_columns: Optional[int] = None,
 ) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Build a cell grid bottom-up from Tesseract word boxes.
@@ -359,8 +377,9 @@ def build_grid_from_words(
            return [], []
    # Step 1: cluster columns
-    columns = _cluster_columns(words, img_w)
+    columns = _cluster_columns(words, img_w, max_columns=max_columns)
-    logger.info("build_grid_from_words: %d column(s) detected", len(columns))
+    logger.info("build_grid_from_words: %d column(s) detected%s",
                len(columns), f" (max={max_columns})" if max_columns else "")
    # Step 2: cluster rows
    rows = _cluster_rows(words)
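
Review note: the `max_columns` enforcement in `_cluster_columns` comes down to keeping the widest gaps. A minimal sketch of just that selection, assuming the gaps are already collected as `(gap_size, split_x)` pairs; `select_boundaries` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def select_boundaries(
    gaps: List[Tuple[float, float]], max_columns: Optional[int] = None
) -> List[float]:
    """gaps holds (gap_size, split_x) pairs; returns sorted split positions."""
    if max_columns and len(gaps) >= max_columns:
        # n columns need n - 1 boundaries: keep the widest gaps only
        gaps = sorted(gaps, key=lambda g: g[0], reverse=True)[:max_columns - 1]
    return sorted(split_x for _, split_x in gaps)
```

Dropping the narrowest gap is equivalent to merging the two columns it separated, which is how phantom columns from degraded OCR disappear.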
--- a/klausur-service/backend/grid_build_core.py
+++ b/klausur-service/backend/grid_build_core.py
--- a/klausur-service/backend/grid_editor_api.py
+++ b/klausur-service/backend/grid_editor_api.py
--- a/klausur-service/backend/grid_editor_helpers.py
+++ b/klausur-service/backend/grid_editor_helpers.py
@@ -22,6 +22,148 @@ from cv_ocr_engines import _text_has_garbled_ipa
 logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
 # Cross-column word splitting
 # ---------------------------------------------------------------------------
 _spell_cache: Optional[Any] = None
 _spell_loaded = False
 def _is_recognized_word(text: str) -> bool:
    """Check if *text* is a recognized German word.
    Uses the spellchecker library (same as cv_syllable_detect.py) with the
    German dictionary only, so English-only words are reported as unknown.
    Returns True for real words like "oder", "Kabel", "Zeitung".
    Returns False for OCR merge artifacts like "sichzie", "dasZimmer".
    """
    global _spell_cache, _spell_loaded
    if not text or len(text) < 2:
        return False
    if not _spell_loaded:
        _spell_loaded = True
        try:
            from spellchecker import SpellChecker
            _spell_cache = SpellChecker(language="de")
        except Exception:
            pass
    if _spell_cache is None:
        return False
    return text.lower() in _spell_cache
 def _split_cross_column_words(
    words: List[Dict],
    columns: List[Dict],
 ) -> List[Dict]:
    """Split word boxes that span across column boundaries.
    When OCR merges adjacent words from different columns (e.g. "sichzie"
    spanning Col 1 and Col 2, or "dasZimmer" crossing the boundary),
    split the word box at the column boundary so each piece is assigned
    to the correct column.
    Only splits when:
    - The word has significant overlap (>15% of its width) on both sides
    - AND the word is not a recognized real word (OCR merge artifact), OR
      the word contains a case transition (lowercase→uppercase) near the
      boundary indicating two merged words like "dasZimmer".
    """
    if len(columns) < 2:
        return words
    # Column boundaries = midpoints between adjacent column edges
    boundaries = []
    for i in range(len(columns) - 1):
        boundary = (columns[i]["x_max"] + columns[i + 1]["x_min"]) / 2
        boundaries.append(boundary)
    new_words: List[Dict] = []
    split_count = 0
    for w in words:
        w_left = w["left"]
        w_width = w["width"]
        w_right = w_left + w_width
        text = (w.get("text") or "").strip()
        if not text or len(text) < 4 or w_width < 10:
            new_words.append(w)
            continue
        # Find the first boundary this word straddles significantly
        split_boundary = None
        for b in boundaries:
            if w_left < b < w_right:
                left_part = b - w_left
                right_part = w_right - b
                # Both sides must have at least 15% of the word width
                if left_part > w_width * 0.15 and right_part > w_width * 0.15:
                    split_boundary = b
                    break
        if split_boundary is None:
            new_words.append(w)
            continue
        # Compute approximate split position in the text.
        left_width = split_boundary - w_left
        split_ratio = left_width / w_width
        approx_pos = len(text) * split_ratio
        # Strategy 1: look for a case transition (lowercase→uppercase) near
        # the approximate split point — e.g. "dasZimmer" splits at 'Z'.
        split_char = None
        search_lo = max(1, int(approx_pos) - 3)
        search_hi = min(len(text), int(approx_pos) + 2)
        for i in range(search_lo, search_hi):
            if text[i - 1].islower() and text[i].isupper():
                split_char = i
                break
        # Strategy 2: if no case transition, only split if the whole word
        # is NOT a real word (i.e. it's an OCR merge artifact like "sichzie").
        # Real words like "oder", "Kabel", "Zeitung" must not be split.
        if split_char is None:
            clean = re.sub(r"[,;:.!?]+$", "", text)  # strip trailing punct
            if _is_recognized_word(clean):
                new_words.append(w)
                continue
            # Not a real word — use floor of proportional position
            split_char = max(1, min(len(text) - 1, int(approx_pos)))
        left_text = text[:split_char].rstrip()
        right_text = text[split_char:].lstrip()
        if len(left_text) < 2 or len(right_text) < 2:
            new_words.append(w)
            continue
        right_width = w_width - round(left_width)
        new_words.append({
            **w,
            "text": left_text,
            "width": round(left_width),
        })
        new_words.append({
            **w,
            "text": right_text,
            "left": round(split_boundary),
            "width": right_width,
        })
        split_count += 1
        logger.info(
            "split cross-column word %r → %r + %r at boundary %.0f",
            text, left_text, right_text, split_boundary,
        )
    if split_count:
        logger.info("split %d cross-column word(s)", split_count)
    return new_words
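
Review note: the choice of split position (case transition near the proportional position, else the proportional position itself) can be sketched on its own. `find_split_index` is a hypothetical name, and the sketch leaves out the recognized-word guard that prevents real words from being split.

```python
def find_split_index(text: str, w_left: float, w_width: float, boundary: float) -> int:
    """Pick the character index at which to split text at a column boundary."""
    # Map the boundary's pixel position onto a character position
    approx_pos = len(text) * (boundary - w_left) / w_width
    lo = max(1, int(approx_pos) - 3)
    hi = min(len(text), int(approx_pos) + 2)
    for i in range(lo, hi):
        if text[i - 1].islower() and text[i].isupper():
            return i  # case transition, e.g. "dasZimmer" splits before 'Z'
    # No case transition: fall back to the proportional position
    return max(1, min(len(text) - 1, int(approx_pos)))
```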
 def _filter_border_strip_words(words: List[Dict]) -> Tuple[List[Dict], int]:
    """Remove page-border decoration strip words BEFORE column detection.
@@ -138,8 +280,27 @@ def _cluster_columns_by_alignment(
        median_gap = sorted_gaps[len(sorted_gaps) // 2]
        heights = [w["height"] for w in words if w.get("height", 0) > 0]
        median_h = sorted(heights)[len(heights) // 2] if heights else 25
-        # Column boundary: gap > 3× median gap or > 1.5× median word height
-        gap_threshold = max(median_gap * 3, median_h * 1.5, 30)
+
+        # For small word counts (boxes, sub-zones): PaddleOCR returns
        # multi-word blocks, so ALL inter-word gaps are potential column
        # boundaries.  Use a low threshold based on word height — any gap
        # wider than ~1x median word height is a column separator.
        if len(words) <= 60:
            gap_threshold = max(median_h * 1.0, 25)
            logger.info(
                "alignment columns (small zone): gap_threshold=%.0f "
                "(median_h=%.0f, %d words, %d gaps: %s)",
                gap_threshold, median_h, len(words), len(sorted_gaps),
                [int(g) for g in sorted_gaps[:10]],
            )
        else:
            # Standard approach for large zones (full pages)
            gap_threshold = max(median_gap * 3, median_h * 1.5, 30)
            # Cap at 25% of zone width
            max_gap = zone_w * 0.25
            if gap_threshold > max_gap > 30:
                logger.info("alignment columns: capping gap_threshold %.0f → %.0f (25%% of zone_w=%d)", gap_threshold, max_gap, zone_w)
                gap_threshold = max_gap
    else:
        gap_threshold = 50
@@ -233,13 +394,17 @@ def _cluster_columns_by_alignment(
    used_ids = {id(c) for c in primary} | {id(c) for c in secondary}
    sig_xs = [c["mean_x"] for c in primary + secondary]
-    MIN_DISTINCT_ROWS_TERTIARY = max(MIN_DISTINCT_ROWS + 1, 4)
-    MIN_COVERAGE_TERTIARY = 0.05  # at least 5% of rows
+    # Tertiary: clusters that are clearly to the LEFT of the first
+    # significant column (or RIGHT of the last).  If words consistently
    # start at a position left of the established first column boundary,
    # they MUST be a separate column — regardless of how few rows they
    # cover.  The only requirement is a clear spatial gap.
    MIN_COVERAGE_TERTIARY = 0.02  # at least 1 row effectively
    tertiary = []
    for c in clusters:
        if id(c) in used_ids:
            continue
-        if c["distinct_rows"] < MIN_DISTINCT_ROWS_TERTIARY:
+        if c["distinct_rows"] < 1:
            continue
        if c["row_coverage"] < MIN_COVERAGE_TERTIARY:
            continue
@@ -907,6 +1072,16 @@ def _detect_heading_rows_by_single_cell(
            text = (cell.get("text") or "").strip()
            if not text or text.startswith("["):
                continue
            # Continuation lines start with "(" — e.g. "(usw.)", "(TV-Serie)"
            if text.startswith("("):
                continue
            # Single cell NOT in the first content column is likely a
            # continuation/overflow line, not a heading.  Real headings
            # ("Theme 1", "Unit 3: ...") appear in the first or second
            # content column.
            first_content_col = col_indices[0] if col_indices else 0
            if cell.get("col_index", 0) > first_content_col + 1:
                continue
            # Skip garbled IPA without brackets (e.g. "ska:f – ska:vz")
            # but NOT text with real IPA symbols (e.g. "Theme [θˈiːm]")
            _REAL_IPA_CHARS = set("ˈˌəɪɛɒʊʌæɑɔʃʒθðŋ")
@@ -1043,6 +1218,130 @@ def _detect_header_rows(
    return headers
 def _detect_colspan_cells(
    zone_words: List[Dict],
    columns: List[Dict],
    rows: List[Dict],
    cells: List[Dict],
    img_w: int,
    img_h: int,
 ) -> List[Dict]:
    """Detect and merge cells that span multiple columns (colspan).
    A word-block (PaddleOCR phrase) that extends significantly past a column
    boundary into the next column indicates a merged cell.  This replaces
    the incorrectly split cells with a single cell spanning multiple columns.
    Works for both full-page scans and box zones.
    """
    if len(columns) < 2 or not zone_words or not rows:
        return cells
    from cv_words_first import _assign_word_to_row
    # Column boundaries (midpoints between adjacent columns)
    col_boundaries = []
    for ci in range(len(columns) - 1):
        col_boundaries.append((columns[ci]["x_max"] + columns[ci + 1]["x_min"]) / 2)
    def _cols_covered(w_left: float, w_right: float) -> List[int]:
        """Return list of column indices that a word-block covers."""
        covered = []
        for col in columns:
            col_mid = (col["x_min"] + col["x_max"]) / 2
            # Word covers a column if it extends past the column's midpoint
            if w_left < col_mid < w_right:
                covered.append(col["index"])
            # Also include column if word starts within it
            elif col["x_min"] <= w_left < col["x_max"]:
                covered.append(col["index"])
        return sorted(set(covered))
    # Group original word-blocks by row
    row_word_blocks: Dict[int, List[Dict]] = {}
    for w in zone_words:
        ri = _assign_word_to_row(w, rows)
        row_word_blocks.setdefault(ri, []).append(w)
    # For each row, check if any word-block spans multiple columns
    rows_to_merge: Dict[int, List[Dict]] = {}  # row_index → list of spanning word-blocks
    for ri, wblocks in row_word_blocks.items():
        spanning = []
        for w in wblocks:
            w_left = w["left"]
            w_right = w_left + w["width"]
            covered = _cols_covered(w_left, w_right)
            if len(covered) >= 2:
                spanning.append({"word": w, "cols": covered})
        if spanning:
            rows_to_merge[ri] = spanning
    if not rows_to_merge:
        return cells
    # Merge cells for spanning rows
    new_cells = []
    for cell in cells:
        ri = cell.get("row_index", -1)
        if ri not in rows_to_merge:
            new_cells.append(cell)
            continue
        # Check if this cell's column is part of a spanning block
        ci = cell.get("col_index", -1)
        is_part_of_span = False
        for span in rows_to_merge[ri]:
            if ci in span["cols"]:
                is_part_of_span = True
                # Only emit the merged cell for the FIRST column in the span
                if ci == span["cols"][0]:
                    # Use the ORIGINAL word-block text (not the split cell texts
                    # which may have broken words like "euros a" + "nd cents")
                    orig_word = span["word"]
                    merged_text = orig_word.get("text", "").strip()
                    all_wb = [orig_word]
                    # Merged bbox is the bbox of the original word-block
                    x_min = orig_word["left"]
                    y_min = orig_word["top"]
                    x_max = orig_word["left"] + orig_word["width"]
                    y_max = orig_word["top"] + orig_word["height"]
                    new_cells.append({
                        "cell_id": cell["cell_id"],
                        "row_index": ri,
                        "col_index": span["cols"][0],
                        "col_type": "spanning_header",
                        "colspan": len(span["cols"]),
                        "text": merged_text,
                        "confidence": cell.get("confidence", 0),
                        "bbox_px": {"x": x_min, "y": y_min,
                                    "w": x_max - x_min, "h": y_max - y_min},
                        "bbox_pct": {
                            "x": round(x_min / img_w * 100, 2) if img_w else 0,
                            "y": round(y_min / img_h * 100, 2) if img_h else 0,
                            "w": round((x_max - x_min) / img_w * 100, 2) if img_w else 0,
                            "h": round((y_max - y_min) / img_h * 100, 2) if img_h else 0,
                        },
                        "word_boxes": all_wb,
                        "ocr_engine": cell.get("ocr_engine", ""),
                        "is_bold": cell.get("is_bold", False),
                    })
                    logger.info(
                        "colspan detected: row %d, cols %s → merged %d cells (%r)",
                        ri, span["cols"], len(span["cols"]), merged_text[:50],
                    )
                break
        if not is_part_of_span:
            new_cells.append(cell)
    return new_cells
 def _build_zone_grid(
    zone_words: List[Dict],
    zone_x: int,
@@ -1111,9 +1410,24 @@ def _build_zone_grid(
            "header_rows": [],
        }
    # Split word boxes that straddle column boundaries (e.g. "sichzie"
    # spanning Col 1 + Col 2).  Must happen after column detection and
    # before cell assignment.
    # Keep original words for colspan detection (split destroys span info).
    original_zone_words = zone_words
    if len(columns) >= 2:
        zone_words = _split_cross_column_words(zone_words, columns)
    # Build cells
    cells = _build_cells(zone_words, columns, rows, img_w, img_h)
    # --- Detect colspan (merged cells spanning multiple columns) ---
    # Uses the ORIGINAL (pre-split) words to detect word-blocks that span
    # multiple columns.  _split_cross_column_words would have destroyed
    # this information by cutting words at column boundaries.
    if len(columns) >= 2:
        cells = _detect_colspan_cells(original_zone_words, columns, rows, cells, img_w, img_h)
    # Prefix cell IDs with zone index
    for cell in cells:
        cell["cell_id"] = f"Z{zone_index}_{cell['cell_id']}"
--- a/klausur-service/backend/ocr_image_enhance.py
+++ b/klausur-service/backend/ocr_image_enhance.py
@@ -0,0 +1,92 @@
 """
 OCR Image Enhancement — Improve scan quality before OCR.
 Applies CLAHE contrast enhancement, bilateral filter denoising and an
 optional unsharp mask to degraded scans (is_degraded=True). Good scans
 only receive a light CLAHE pass.
 Pattern adapted from handwriting_htr_api.py (lines 50-68) and
 cv_layout.py (lines 229-241).
 All operations use OpenCV (Apache-2.0).
 """
 import logging
 import cv2
 import numpy as np
 logger = logging.getLogger(__name__)
 def enhance_for_ocr(
    img_bgr: np.ndarray,
    is_degraded: bool = False,
    clip_limit: float = 3.0,
    tile_size: int = 8,
    denoise_d: int = 9,
    denoise_sigma_color: float = 75,
    denoise_sigma_space: float = 75,
    sharpen: bool = True,
 ) -> np.ndarray:
    """
    Enhance image quality for OCR processing.
    Only applies aggressive enhancement when is_degraded is True.
    For good scans, applies minimal enhancement (light CLAHE only).
    Args:
        img_bgr: Input BGR image
        is_degraded: Whether the scan is degraded (from ScanQualityReport)
        clip_limit: CLAHE clip limit (higher = more contrast)
        tile_size: CLAHE tile grid size
        denoise_d: Bilateral filter diameter
        denoise_sigma_color: Bilateral filter sigma for color
        denoise_sigma_space: Bilateral filter sigma for space
        sharpen: Apply unsharp mask for blurry scans
    Returns:
        Enhanced BGR image
    """
    if not is_degraded:
        # For good scans: light CLAHE only (preserves quality)
        lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
        l_channel, a_channel, b_channel = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        l_enhanced = clahe.apply(l_channel)
        lab_enhanced = cv2.merge([l_enhanced, a_channel, b_channel])
        result = cv2.cvtColor(lab_enhanced, cv2.COLOR_LAB2BGR)
        logger.info("enhance_for_ocr: light CLAHE applied (good scan)")
        return result
    # Degraded scan: full enhancement pipeline
    logger.info(
        f"enhance_for_ocr: full enhancement "
        f"(CLAHE clip={clip_limit}, denoise d={denoise_d}, sharpen={sharpen})"
    )
    # 1. CLAHE on L-channel of LAB colorspace (preserves color for RapidOCR)
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l_channel, a_channel, b_channel = cv2.split(lab)
    clahe = cv2.createCLAHE(
        clipLimit=clip_limit,
        tileGridSize=(tile_size, tile_size),
    )
    l_enhanced = clahe.apply(l_channel)
    lab_enhanced = cv2.merge([l_enhanced, a_channel, b_channel])
    enhanced = cv2.cvtColor(lab_enhanced, cv2.COLOR_LAB2BGR)
    # 2. Bilateral filter: denoises while preserving edges
    enhanced = cv2.bilateralFilter(
        enhanced,
        d=denoise_d,
        sigmaColor=denoise_sigma_color,
        sigmaSpace=denoise_sigma_space,
    )
    # 3. Unsharp mask for sharpening blurry text
    if sharpen:
        gaussian = cv2.GaussianBlur(enhanced, (0, 0), 3)
        enhanced = cv2.addWeighted(enhanced, 1.5, gaussian, -0.5, 0)
    logger.info("enhance_for_ocr: full enhancement pipeline complete")
    return enhanced
--- a/klausur-service/backend/ocr_pipeline_session_store.py
+++ b/klausur-service/backend/ocr_pipeline_session_store.py
@@ -262,14 +262,22 @@ async def list_sessions_db(
                   document_category, doc_type,
                   parent_session_id, box_index,
                   document_group_id, page_number,
-                   created_at, updated_at
+                   created_at, updated_at,
                   ground_truth
            FROM ocr_pipeline_sessions
            {where}
            ORDER BY created_at DESC
            LIMIT $1
        """, limit)
-        return [_row_to_dict(row) for row in rows]
+        results = []
        for row in rows:
            d = _row_to_dict(row)
            # Derive is_ground_truth flag from JSONB, then drop the heavy field
            gt = d.pop("ground_truth", None) or {}
            d["is_ground_truth"] = bool(gt.get("build_grid_reference"))
            results.append(d)
        return results
 async def get_sub_sessions(parent_session_id: str) -> List[Dict[str, Any]]:
--- a/klausur-service/backend/ocr_pipeline_sessions.py
+++ b/klausur-service/backend/ocr_pipeline_sessions.py
@@ -71,13 +71,36 @@ async def create_session(
    file: UploadFile = File(...),
    name: Optional[str] = Form(None),
 ):
-    """Upload a PDF or image file and create a pipeline session."""
+    """Upload a PDF or image file and create a pipeline session.
    For multi-page PDFs (> 1 page), each page becomes its own session
    grouped under a ``document_group_id``.  The response includes a
    ``pages`` array with one entry per page/session.
    """
    file_data = await file.read()
    filename = file.filename or "upload"
    content_type = file.content_type or ""
    session_id = str(uuid.uuid4())
    is_pdf = content_type == "application/pdf" or filename.lower().endswith(".pdf")
    session_name = name or filename
    # --- Multi-page PDF handling ---
    if is_pdf:
        try:
            import fitz  # PyMuPDF
            pdf_doc = fitz.open(stream=file_data, filetype="pdf")
            page_count = pdf_doc.page_count
            pdf_doc.close()
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Could not read PDF: {e}")
        if page_count > 1:
            return await _create_multi_page_sessions(
                file_data, filename, session_name, page_count,
            )
    # --- Single page (image or 1-page PDF) ---
    session_id = str(uuid.uuid4())
    try:
        if is_pdf:
@@ -93,7 +116,6 @@ async def create_session(
        raise HTTPException(status_code=500, detail="Failed to encode image")
    original_png = png_buf.tobytes()
    session_name = name or filename
    # Persist to DB
    await create_session_db(
@@ -134,6 +156,86 @@ async def create_session(
    }
 async def _create_multi_page_sessions(
    pdf_data: bytes,
    filename: str,
    base_name: str,
    page_count: int,
 ) -> dict:
    """Create one session per PDF page, grouped by document_group_id."""
    document_group_id = str(uuid.uuid4())
    pages = []
    for page_idx in range(page_count):
        session_id = str(uuid.uuid4())
        page_name = f"{base_name} — Seite {page_idx + 1}"
        try:
            img_bgr = render_pdf_high_res(pdf_data, page_number=page_idx, zoom=3.0)
        except Exception as e:
            logger.warning(f"Failed to render PDF page {page_idx + 1}: {e}")
            continue
        ok, png_buf = cv2.imencode(".png", img_bgr)
        if not ok:
            continue
        page_png = png_buf.tobytes()
        await create_session_db(
            session_id=session_id,
            name=page_name,
            filename=filename,
            original_png=page_png,
            document_group_id=document_group_id,
            page_number=page_idx + 1,
        )
        _cache[session_id] = {
            "id": session_id,
            "filename": filename,
            "name": page_name,
            "original_bgr": img_bgr,
            "oriented_bgr": None,
            "cropped_bgr": None,
            "deskewed_bgr": None,
            "dewarped_bgr": None,
            "orientation_result": None,
            "crop_result": None,
            "deskew_result": None,
            "dewarp_result": None,
            "ground_truth": {},
            "current_step": 1,
        }
        h, w = img_bgr.shape[:2]
        pages.append({
            "session_id": session_id,
            "name": page_name,
            "page_number": page_idx + 1,
            "image_width": w,
            "image_height": h,
            "original_image_url": f"/api/v1/ocr-pipeline/sessions/{session_id}/image/original",
        })
        logger.info(
            f"OCR Pipeline: created page session {session_id} "
            f"(page {page_idx + 1}/{page_count}) from {filename} ({w}x{h})"
        )
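    if not pages:
        # Every page failed to render or encode; there is no session to return
        raise HTTPException(status_code=500, detail="Failed to render any PDF page")
    # Include session_id pointing to first page for backwards compatibility
    # (frontends that expect a single session_id will navigate to page 1)
    first_session_id = pages[0]["session_id"]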
    return {
        "session_id": first_session_id,
        "document_group_id": document_group_id,
        "filename": filename,
        "name": base_name,
        "page_count": page_count,
        "pages": pages,
    }
@router.get("/sessions/{session_id}")
 async def get_session_info(session_id: str):
    """Get session info including deskew/dewarp/column results for step navigation."""
--- a/klausur-service/backend/page_crop.py
+++ b/klausur-service/backend/page_crop.py
@@ -457,6 +457,164 @@ def _detect_spine_shadow(
    return spine_x
 def _detect_gutter_continuity(
    gray: np.ndarray,
    search_region: np.ndarray,
    offset_x: int,
    w: int,
    side: str,
 ) -> Optional[int]:
    """Detect gutter shadow via vertical continuity analysis.
    Camera book scans produce a subtle brightness gradient at the gutter
    that is too faint for scanner-shadow detection (range < 40).  However,
    the gutter shadow has a unique property: it runs **continuously from
    top to bottom** without interruption.  Text and images always have
    vertical gaps between lines, paragraphs, or sections.
    Algorithm:
    1. Divide image into N horizontal strips (~60px each)
    2. For each column, compute what fraction of strips are darker than
       the page median (from the center 50% of the full image)
    3. Smooth the per-column dark fraction; a gutter requires a peak of
       ≥ 70% of strips darker than page_median − δ (δ = 5 gray levels)
    4. From that peak, scan toward the page center for the transition
       point where the smoothed fraction drops below 0.50
    5. Validate: gutter band must be 0.5%-10% of image width and at
       least 3 gray levels darker than the page median
    Args:
        gray: Full grayscale image.
        search_region: Edge slice of the grayscale image.
        offset_x: X offset of search_region relative to full image.
        w: Full image width.
        side: 'left' or 'right'.
    Returns:
        X coordinate (in full image) of the gutter inner edge, or None.
    """
    region_h, region_w = search_region.shape[:2]
    if region_w < 20 or region_h < 100:
        return None
    # --- 1. Divide into horizontal strips ---
    strip_target_h = 60  # ~60px per strip
    n_strips = max(10, region_h // strip_target_h)
    strip_h = region_h // n_strips
    strip_means = np.zeros((n_strips, region_w), dtype=np.float64)
    for s in range(n_strips):
        y0 = s * strip_h
        y1 = min((s + 1) * strip_h, region_h)
        strip_means[s] = np.mean(search_region[y0:y1, :], axis=0)
    # --- 2. Page median from center 50% of full image ---
    center_lo = w // 4
    center_hi = 3 * w // 4
    page_median = float(np.median(gray[:, center_lo:center_hi]))
    # Camera shadows are subtle — threshold just 5 levels below page median
    dark_thresh = page_median - 5.0
    # If page is very dark overall (e.g. photo, not a book page), bail out
    if page_median < 180:
        return None
    # --- 3. Per-column dark fraction ---
    dark_count = np.sum(strip_means < dark_thresh, axis=0).astype(np.float64)
    dark_frac = dark_count / n_strips  # shape: (region_w,)
    # --- 4. Smooth and find transition ---
    # Rolling mean (window = 1% of image width, min 5)
    smooth_w = max(5, w // 100)
    if smooth_w % 2 == 0:
        smooth_w += 1
    kernel = np.ones(smooth_w) / smooth_w
    frac_smooth = np.convolve(dark_frac, kernel, mode="same")
    # Trim convolution edges
    margin = smooth_w // 2
    if region_w <= 2 * margin + 10:
        return None
    # Find the peak of dark fraction (gutter center).
    # For right gutters the peak is near the edge; for left gutters
    # (V-shaped spine shadow) the peak may be well inside the region.
    transition_thresh = 0.50
    peak_frac = float(np.max(frac_smooth[margin:region_w - margin]))
    if peak_frac < 0.70:
        logger.debug(
            "%s gutter: peak dark fraction %.2f < 0.70", side.capitalize(), peak_frac,
        )
        return None
    peak_x = int(np.argmax(frac_smooth[margin:region_w - margin])) + margin
    gutter_inner = None  # local x in search_region
    if side == "right":
        # Scan from peak toward the page center (leftward)
        for x in range(peak_x, margin, -1):
            if frac_smooth[x] < transition_thresh:
                gutter_inner = x + 1
                break
    else:
        # Scan from peak toward the page center (rightward)
        for x in range(peak_x, region_w - margin):
            if frac_smooth[x] < transition_thresh:
                gutter_inner = x - 1
                break
    if gutter_inner is None:
        return None
    # --- 5. Validate gutter width ---
    if side == "right":
        gutter_width = region_w - gutter_inner
    else:
        gutter_width = gutter_inner
    min_gutter = max(3, int(w * 0.005))   # at least 0.5% of image
    max_gutter = int(w * 0.10)            # at most 10% of image
    if gutter_width < min_gutter:
        logger.debug(
            "%s gutter: too narrow (%dpx < %dpx)", side.capitalize(),
            gutter_width, min_gutter,
        )
        return None
    if gutter_width > max_gutter:
        logger.debug(
            "%s gutter: too wide (%dpx > %dpx)", side.capitalize(),
            gutter_width, max_gutter,
        )
        return None
    # Check that the gutter band is meaningfully darker than the page
    if side == "right":
        gutter_brightness = float(np.mean(strip_means[:, gutter_inner:]))
    else:
        gutter_brightness = float(np.mean(strip_means[:, :gutter_inner]))
    brightness_drop = page_median - gutter_brightness
    if brightness_drop < 3:
        logger.debug(
            "%s gutter: insufficient brightness drop (%.1f levels)",
            side.capitalize(), brightness_drop,
        )
        return None
    gutter_x = offset_x + gutter_inner
    logger.info(
        "%s gutter (continuity): x=%d, width=%dpx (%.1f%%), "
        "brightness=%.0f vs page=%.0f (drop=%.0f), frac@edge=%.2f",
        side.capitalize(), gutter_x, gutter_width,
        100.0 * gutter_width / w, gutter_brightness, page_median,
        brightness_drop, float(frac_smooth[gutter_inner]),
    )
    return gutter_x
 def _detect_left_edge_shadow(
    gray: np.ndarray,
    binary: np.ndarray,
@@ -465,15 +623,22 @@ def _detect_left_edge_shadow(
 ) -> int:
    """Detect left content edge, accounting for book-spine shadow.
-    Looks at the left 25% for a scanner gray strip.  Cuts at the
+    Tries three methods in order:
-    darkest column (= spine center).  Fallback: binary projection.
+    1. Scanner spine-shadow (dark gradient, range > 40)
    2. Camera gutter continuity (subtle shadow running top-to-bottom)
    3. Binary projection fallback (first ink column)
    """
    search_w = max(1, w // 4)
    spine_x = _detect_spine_shadow(gray, gray[:, :search_w], 0, w, "left")
    if spine_x is not None:
        return spine_x
-    # Fallback: binary vertical projection
+    # Fallback 1: vertical continuity (camera gutter shadow)
    gutter_x = _detect_gutter_continuity(gray, gray[:, :search_w], 0, w, "left")
    if gutter_x is not None:
        return gutter_x
    # Fallback 2: binary vertical projection
    return _detect_edge_projection(binary, axis=0, from_start=True, dim=w)
@@ -485,8 +650,10 @@ def _detect_right_edge_shadow(
 ) -> int:
    """Detect right content edge, accounting for book-spine shadow.
-    Looks at the right 25% for a scanner gray strip.  Cuts at the
+    Tries three methods in order:
-    darkest column (= spine center).  Fallback: binary projection.
+    1. Scanner spine-shadow (dark gradient, range > 40)
    2. Camera gutter continuity (subtle shadow running top-to-bottom)
    3. Binary projection fallback (last ink column)
    """
    search_w = max(1, w // 4)
    right_start = w - search_w
@@ -494,7 +661,12 @@ def _detect_right_edge_shadow(
    if spine_x is not None:
        return spine_x
-    # Fallback: binary vertical projection
+    # Fallback 1: vertical continuity (camera gutter shadow)
    gutter_x = _detect_gutter_continuity(gray, gray[:, right_start:], right_start, w, "right")
    if gutter_x is not None:
        return gutter_x
    # Fallback 2: binary vertical projection
    return _detect_edge_projection(binary, axis=0, from_start=False, dim=w)
--- a/klausur-service/backend/scan_quality.py
+++ b/klausur-service/backend/scan_quality.py
@@ -0,0 +1,102 @@
 """
 Scan Quality Assessment — Measures image quality before OCR.
 Computes blur score, contrast score, and an overall quality rating.
 Used to gate enhancement steps and warn users about degraded scans.
 All operations use OpenCV (Apache-2.0), no additional dependencies.
 """
 import logging
 from dataclasses import dataclass, asdict
 from typing import Dict, Any
 import cv2
 import numpy as np
 logger = logging.getLogger(__name__)
 # Thresholds (empirically tuned on textbook scans)
 BLUR_THRESHOLD = 100.0       # Laplacian variance below this = blurry
 CONTRAST_THRESHOLD = 40.0    # Grayscale stddev below this = low contrast
 CONFIDENCE_GOOD = 40         # OCR min confidence for good scans
 CONFIDENCE_DEGRADED = 30     # OCR min confidence for degraded scans
@dataclass
 class ScanQualityReport:
    """Result of scan quality assessment."""
    blur_score: float         # Laplacian variance (higher = sharper)
    contrast_score: float     # Grayscale std deviation (higher = more contrast)
    brightness: float         # Mean grayscale value (0-255)
    is_blurry: bool
    is_low_contrast: bool
    is_degraded: bool         # True if any quality issue detected
    quality_pct: int          # 0-100 overall quality estimate
    recommended_min_conf: int # Recommended OCR confidence threshold
    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)
 def score_scan_quality(img_bgr: np.ndarray) -> ScanQualityReport:
    """
    Assess the quality of a scanned image.
    Uses:
    - Laplacian variance for blur detection
    - Grayscale standard deviation for contrast
    - Mean brightness for exposure assessment
    Args:
        img_bgr: BGR image (numpy array from OpenCV)
    Returns:
        ScanQualityReport with scores and recommendations
    """
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Blur detection: Laplacian variance
    # Higher = sharper edges = better quality
    laplacian = cv2.Laplacian(gray, cv2.CV_64F)
    blur_score = float(laplacian.var())
    # Contrast: standard deviation of grayscale
    contrast_score = float(np.std(gray))
    # Brightness: mean grayscale
    brightness = float(np.mean(gray))
    # Quality flags
    is_blurry = blur_score < BLUR_THRESHOLD
    is_low_contrast = contrast_score < CONTRAST_THRESHOLD
    is_degraded = is_blurry or is_low_contrast
    # Overall quality percentage: each metric scores 50 points at its
    # threshold; the per-metric scores and their sum are capped at 100
    blur_pct = min(100, blur_score / BLUR_THRESHOLD * 50)
    contrast_pct = min(100, contrast_score / CONTRAST_THRESHOLD * 50)
    quality_pct = int(min(100, blur_pct + contrast_pct))
    # Recommended confidence threshold
    recommended_min_conf = CONFIDENCE_DEGRADED if is_degraded else CONFIDENCE_GOOD
    report = ScanQualityReport(
        blur_score=round(blur_score, 1),
        contrast_score=round(contrast_score, 1),
        brightness=round(brightness, 1),
        is_blurry=is_blurry,
        is_low_contrast=is_low_contrast,
        is_degraded=is_degraded,
        quality_pct=quality_pct,
        recommended_min_conf=recommended_min_conf,
    )
    logger.info(
        f"Scan quality: blur={report.blur_score} "
        f"contrast={report.contrast_score} "
        f"quality={report.quality_pct}% "
        f"degraded={report.is_degraded} "
        f"min_conf={report.recommended_min_conf}"
    )
    return report
--- a/klausur-service/backend/services/lighton_ocr_service.py
+++ b/klausur-service/backend/services/lighton_ocr_service.py
@@ -0,0 +1,119 @@
 """
 LightOnOCR-2-1B Service
 End-to-end VLM OCR for printed and mixed text.
 1B parameters, runs on Apple MPS (M-series).
 Model:   lightonai/LightOnOCR-2-1B
 License: Apache 2.0
 Source:  https://huggingface.co/lightonai/LightOnOCR-2-1B
 Supported document types:
 - Book pages, vocabulary pages
 - Worksheets, exams
 - Mixed printed/handwritten
 PRIVACY: All processing happens locally.
 """
 import io
 import logging
 import os
 from typing import Optional, Tuple
 logger = logging.getLogger(__name__)
 LIGHTON_MODEL_ID = os.getenv("LIGHTON_OCR_MODEL", "lightonai/LightOnOCR-2-1B")
 _lighton_model = None
 _lighton_processor = None
 _lighton_available: Optional[bool] = None
 def _check_lighton_available() -> bool:
    """Check if LightOnOCR dependencies (transformers, torch) are available."""
    global _lighton_available
    if _lighton_available is not None:
        return _lighton_available
    try:
        from transformers import AutoModelForImageTextToText, AutoProcessor  # noqa: F401
        import torch  # noqa: F401
        _lighton_available = True
    except ImportError as e:
        logger.warning(f"LightOnOCR deps not available: {e}")
        _lighton_available = False
    return _lighton_available
 def get_lighton_model() -> Tuple:
    """
    Lazy-load LightOnOCR-2-1B processor and model.
    Returns (processor, model) or (None, None) on failure.
    Device priority: MPS (Apple Silicon) > CUDA > CPU.
    """
    global _lighton_model, _lighton_processor
    if _lighton_model is not None:
        return _lighton_processor, _lighton_model
    if not _check_lighton_available():
        return None, None
    try:
        import torch
        from transformers import AutoModelForImageTextToText, AutoProcessor
        if torch.backends.mps.is_available():
            device = "mps"
        elif torch.cuda.is_available():
            device = "cuda"
        else:
            device = "cpu"
        dtype = torch.bfloat16
        logger.info(f"Loading LightOnOCR-2-1B on {device} ({dtype}) from {LIGHTON_MODEL_ID} ...")
        _lighton_processor = AutoProcessor.from_pretrained(LIGHTON_MODEL_ID)
        _lighton_model = AutoModelForImageTextToText.from_pretrained(
            LIGHTON_MODEL_ID, torch_dtype=dtype
        ).to(device)
        _lighton_model.eval()
        logger.info("LightOnOCR-2-1B loaded successfully")
    except Exception as e:
        logger.error(f"Failed to load LightOnOCR-2-1B: {e}")
        _lighton_model = None
        _lighton_processor = None
    return _lighton_processor, _lighton_model
 def run_lighton_ocr_sync(image_bytes: bytes) -> Optional[str]:
    """
    Run LightOnOCR on image bytes (synchronous).
    Returns extracted text or None on error.
    Generic — works for any document/page region.
    """
    processor, model = get_lighton_model()
    if processor is None or model is None:
        return None
    try:
        import torch
        from PIL import Image as _PILImage
        pil_img = _PILImage.open(io.BytesIO(image_bytes)).convert("RGB")
        conversation = [{"role": "user", "content": [{"type": "image"}]}]
        inputs = processor.apply_chat_template(
            conversation, images=[pil_img],
            add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=1024)
        text = processor.decode(output_ids[0], skip_special_tokens=True)
        return text.strip() if text else None
    except Exception as e:
        logger.error(f"LightOnOCR inference failed: {e}")
        return None
--- a/klausur-service/backend/smart_spell.py
+++ b/klausur-service/backend/smart_spell.py
@@ -0,0 +1,594 @@
 """
 SmartSpellChecker — Language-aware OCR post-correction without LLMs.
 Uses pyspellchecker (MIT) with dual EN+DE dictionaries for:
 - Automatic language detection per word (dual-dictionary heuristic)
 - OCR error correction (digit↔letter, umlauts, transpositions)
 - Context-based disambiguation (a/I, l/I) via bigram lookup
 - Mixed-language support for example sentences
 License: Apache 2.0 (commercial use permitted)
 """
 import logging
 import re
 from dataclasses import dataclass, field
 from typing import Dict, List, Literal, Optional, Set, Tuple
 logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
 # Init
 # ---------------------------------------------------------------------------
 try:
    from spellchecker import SpellChecker as _SpellChecker
    _en_spell = _SpellChecker(language='en', distance=1)
    _de_spell = _SpellChecker(language='de', distance=1)
    _AVAILABLE = True
 except ImportError:
    _AVAILABLE = False
    logger.warning("pyspellchecker not installed — SmartSpellChecker disabled")
 Lang = Literal["en", "de", "both", "unknown"]
 # ---------------------------------------------------------------------------
 # Bigram context for a/I disambiguation
 # ---------------------------------------------------------------------------
 # Words that commonly follow "I" (subject pronoun → verb/modal)
 _I_FOLLOWERS: frozenset = frozenset({
    "am", "was", "have", "had", "do", "did", "will", "would", "can",
    "could", "should", "shall", "may", "might", "must",
    "think", "know", "see", "want", "need", "like", "love", "hate",
    "go", "went", "come", "came", "say", "said", "get", "got",
    "make", "made", "take", "took", "give", "gave", "tell", "told",
    "feel", "felt", "find", "found", "believe", "hope", "wish",
    "remember", "forget", "understand", "mean", "meant",
    "don't", "didn't", "can't", "won't", "couldn't", "wouldn't",
    "shouldn't", "haven't", "hadn't", "isn't", "wasn't",
    "really", "just", "also", "always", "never", "often", "sometimes",
 })
 # Words that commonly follow "a" (article → noun/adjective)
 _A_FOLLOWERS: frozenset = frozenset({
    "lot", "few", "little", "bit", "good", "bad", "great", "new", "old",
    "long", "short", "big", "small", "large", "huge", "tiny",
    "nice", "beautiful", "wonderful", "terrible", "horrible",
    "man", "woman", "boy", "girl", "child", "dog", "cat", "bird",
    "book", "car", "house", "room", "school", "teacher", "student",
    "day", "week", "month", "year", "time", "place", "way",
    "friend", "family", "person", "problem", "question", "story",
    "very", "really", "quite", "rather", "pretty", "single",
 })
 # Digit→letter substitutions (OCR confusion)
 _DIGIT_SUBS: Dict[str, List[str]] = {
    '0': ['o', 'O'],
    '1': ['l', 'I'],
    '5': ['s', 'S'],
    '6': ['g', 'G'],
    '8': ['b', 'B'],
    '|': ['I', 'l'],
    '/': ['l'],  # italic 'l' misread as slash (e.g. "p/" → "pl")
 }
 _SUSPICIOUS_CHARS = frozenset(_DIGIT_SUBS.keys())
 # Umlaut confusion: OCR drops dots (ü→u, ä→a, ö→o)
 _UMLAUT_MAP = {
    'a': 'ä', 'o': 'ö', 'u': 'ü', 'i': 'ü',
    'A': 'Ä', 'O': 'Ö', 'U': 'Ü', 'I': 'Ü',
 }
 # Tokenizer — includes | and / so OCR artifacts like "p/" are treated as words
 _TOKEN_RE = re.compile(r"([A-Za-zÄÖÜäöüß'|/]+)([^A-Za-zÄÖÜäöüß'|/]*)")
 # ---------------------------------------------------------------------------
 # Data types
 # ---------------------------------------------------------------------------
@dataclass
 class CorrectionResult:
    original: str
    corrected: str
    lang_detected: Lang
    changed: bool
    changes: List[str] = field(default_factory=list)
 # ---------------------------------------------------------------------------
 # Core class
 # ---------------------------------------------------------------------------
 class SmartSpellChecker:
    """Language-aware OCR spell checker using pyspellchecker (no LLM)."""
    def __init__(self):
        if not _AVAILABLE:
            raise RuntimeError("pyspellchecker not installed")
        self.en = _en_spell
        self.de = _de_spell
    # --- Language detection ---
    def detect_word_lang(self, word: str) -> Lang:
        """Detect language of a single word using dual-dict heuristic."""
        w = word.lower().strip(".,;:!?\"'()")
        if not w:
            return "unknown"
        in_en = bool(self.en.known([w]))
        in_de = bool(self.de.known([w]))
        if in_en and in_de:
            return "both"
        if in_en:
            return "en"
        if in_de:
            return "de"
        return "unknown"
    def detect_text_lang(self, text: str) -> Lang:
        """Detect dominant language of a text string (sentence/phrase)."""
        words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
        if not words:
            return "unknown"
        en_count = 0
        de_count = 0
        for w in words:
            lang = self.detect_word_lang(w)
            if lang == "en":
                en_count += 1
            elif lang == "de":
                de_count += 1
            # "both" doesn't count for either
        if en_count > de_count:
            return "en"
        if de_count > en_count:
            return "de"
        if en_count == de_count and en_count > 0:
            return "both"
        return "unknown"
    # --- Single-word correction ---
    def _known(self, word: str) -> bool:
        """True if word is known in EN or DE dictionary, or is a known abbreviation."""
        w = word.lower()
        if bool(self.en.known([w])) or bool(self.de.known([w])):
            return True
        # Also accept known abbreviations (sth, sb, adj, etc.)
        try:
            from cv_ocr_engines import _KNOWN_ABBREVIATIONS
            if w in _KNOWN_ABBREVIATIONS:
                return True
        except ImportError:
            pass
        return False
    def _word_freq(self, word: str) -> float:
        """Get word frequency (max of EN and DE)."""
        w = word.lower()
        return max(self.en.word_usage_frequency(w), self.de.word_usage_frequency(w))
    def _known_in(self, word: str, lang: str) -> bool:
        """True if word is known in a specific language dictionary."""
        w = word.lower()
        spell = self.en if lang == "en" else self.de
        return bool(spell.known([w]))
    def correct_word(self, word: str, lang: str = "en",
                     prev_word: str = "", next_word: str = "") -> Optional[str]:
        """Correct a single word for the given language.
        Returns None if no correction needed, or the corrected string.
        Args:
            word: The word to check/correct
            lang: Expected language ("en" or "de")
            prev_word: Previous word (for context)
            next_word: Next word (for context)
        """
        if not word or not word.strip():
            return None
        # Skip numbers and abbreviations with dots
        if word.isdigit() or '.' in word:
            return None
        # Skip IPA/phonetic content in brackets
        if '[' in word or ']' in word:
            return None
        has_suspicious = any(ch in _SUSPICIOUS_CHARS for ch in word)
        # 1. Already known → no fix
        if self._known(word):
            # But check a/I disambiguation for single-char words
            if word.lower() in ('l', '|') and next_word:
                return self._disambiguate_a_I(word, next_word)
            return None
        # 2. Digit/pipe substitution
        if has_suspicious:
            if word == '|':
                return 'I'
            # Try single-char substitutions
            for i, ch in enumerate(word):
                if ch not in _DIGIT_SUBS:
                    continue
                for replacement in _DIGIT_SUBS[ch]:
                    candidate = word[:i] + replacement + word[i + 1:]
                    if self._known(candidate):
                        return candidate
            # Try multi-char substitution (e.g., "sch00l" → "school")
            multi = self._try_multi_digit_sub(word)
            if multi:
                return multi
        # 3. Umlaut correction (German)
        if lang == "de" and len(word) >= 3 and word.isalpha():
            umlaut_fix = self._try_umlaut_fix(word)
            if umlaut_fix:
                return umlaut_fix
        # 4. General spell correction
        if not has_suspicious and len(word) >= 3 and word.isalpha():
            # Safety: don't correct if the word is valid in the OTHER language
            # (either directly or via umlaut fix)
            other_lang = "de" if lang == "en" else "en"
            if self._known_in(word, other_lang):
                return None
            if other_lang == "de" and self._try_umlaut_fix(word):
                return None  # has a valid DE umlaut variant → don't touch
            spell = self.en if lang == "en" else self.de
            correction = spell.correction(word.lower())
            if correction and correction != word.lower():
                if word[0].isupper():
                    correction = correction[0].upper() + correction[1:]
                if self._known(correction):
                    return correction
        return None
    # --- Multi-digit substitution ---
    def _try_multi_digit_sub(self, word: str) -> Optional[str]:
        """Try replacing multiple digits simultaneously using BFS."""
        positions = [(i, ch) for i, ch in enumerate(word) if ch in _DIGIT_SUBS]
        if not positions or len(positions) > 4:
            return None
        # BFS over substitution combinations
        queue = [list(word)]
        for pos, ch in positions:
            next_queue = []
            for current in queue:
                # Keep original
                next_queue.append(current[:])
                # Try each substitution
                for repl in _DIGIT_SUBS[ch]:
                    variant = current[:]
                    variant[pos] = repl
                    next_queue.append(variant)
            queue = next_queue
        # Check which combinations produce known words
        for combo in queue:
            candidate = "".join(combo)
            if candidate != word and self._known(candidate):
                return candidate
        return None
    # --- Umlaut fix ---
    def _try_umlaut_fix(self, word: str) -> Optional[str]:
        """Try single-char umlaut substitutions for German words."""
        for i, ch in enumerate(word):
            if ch in _UMLAUT_MAP:
                candidate = word[:i] + _UMLAUT_MAP[ch] + word[i + 1:]
                if self._known(candidate):
                    return candidate
        return None
    # --- Boundary repair (shifted word boundaries) ---
    def _try_boundary_repair(self, word1: str, word2: str) -> Optional[Tuple[str, str]]:
        """Fix shifted word boundaries between adjacent tokens.
        OCR sometimes shifts the boundary: "at sth." → "ats th."
        Try moving 1-2 chars from end of word1 to start of word2 and vice versa.
        Returns (fixed_word1, fixed_word2) or None.
        """
        # Import known abbreviations for vocabulary context
        try:
            from cv_ocr_engines import _KNOWN_ABBREVIATIONS
        except ImportError:
            _KNOWN_ABBREVIATIONS = set()
        # Strip trailing punctuation for checking, preserve for result
        w2_stripped = word2.rstrip(".,;:!?")
        w2_punct = word2[len(w2_stripped):]
        # Try shifting 1-2 chars from word1 → word2
        for shift in (1, 2):
            if len(word1) <= shift:
                continue
            new_w1 = word1[:-shift]
            new_w2_base = word1[-shift:] + w2_stripped
            w1_ok = self._known(new_w1) or new_w1.lower() in _KNOWN_ABBREVIATIONS
            w2_ok = self._known(new_w2_base) or new_w2_base.lower() in _KNOWN_ABBREVIATIONS
            if w1_ok and w2_ok:
                return (new_w1, new_w2_base + w2_punct)
        # Try shifting 1-2 chars from word2 → word1
        for shift in (1, 2):
            if len(w2_stripped) <= shift:
                continue
            new_w1 = word1 + w2_stripped[:shift]
            new_w2_base = w2_stripped[shift:]
            w1_ok = self._known(new_w1) or new_w1.lower() in _KNOWN_ABBREVIATIONS
            w2_ok = self._known(new_w2_base) or new_w2_base.lower() in _KNOWN_ABBREVIATIONS
            if w1_ok and w2_ok:
                return (new_w1, new_w2_base + w2_punct)
        return None
    # --- Context-based word split for ambiguous merges ---
    # Patterns where a valid word is actually "a" + adjective/noun
    _ARTICLE_SPLIT_CANDIDATES = {
        # word → (article, remainder) — only when followed by a compatible word
        "anew": ("a", "new"),
        "areal": ("a", "real"),
        "alive": None,    # genuinely one word, never split
        "alone": None,
        "aware": None,
        "alike": None,
        "apart": None,
        "aside": None,
        "above": None,
        "about": None,
        "among": None,
        "along": None,
    }
    def _try_context_split(self, word: str, next_word: str,
                           prev_word: str) -> Optional[str]:
        """Split words like 'anew' → 'a new' when context indicates a merge.
        Only splits when:
        - The word is in the split candidates list
        - The following word makes sense as a noun (for "a + adj + noun" pattern)
        - OR the word is unknown and can be split into article + known word
        """
        w_lower = word.lower()
        # Check explicit candidates
        if w_lower in self._ARTICLE_SPLIT_CANDIDATES:
            split = self._ARTICLE_SPLIT_CANDIDATES[w_lower]
            if split is None:
                return None  # explicitly marked as "don't split"
            article, remainder = split
            # Only split if followed by a word (noun pattern)
            if next_word and next_word[0].islower():
                return f"{article} {remainder}"
            # Also split if remainder + next_word makes a common phrase
            if next_word and self._known(next_word):
                return f"{article} {remainder}"
        # Generic: if word starts with 'a' and rest is a known adjective/word
        if (len(word) >= 4 and word[0].lower() == 'a'
                and not self._known(word)  # only for UNKNOWN words
                and self._known(word[1:])):
            return f"a {word[1:]}"
        return None
    # --- a/I disambiguation ---
    def _disambiguate_a_I(self, token: str, next_word: str) -> Optional[str]:
        """Disambiguate 'a' vs 'I' (and OCR variants like 'l', '|')."""
        nw = next_word.lower().strip(".,;:!?")
        if nw in _I_FOLLOWERS:
            return "I"
        if nw in _A_FOLLOWERS:
            return "a"
        # No further fallback: when the next word is in neither follower list,
        # the token is left unchanged rather than guessed.
        return None  # uncertain, don't change
    # --- Full text correction ---
    def correct_text(self, text: str, lang: str = "en") -> CorrectionResult:
        """Correct a full text string (field value).
        Three passes:
        1. Boundary repair — fix shifted word boundaries between adjacent tokens
        2. Context split — split ambiguous merges (anew → a new)
        3. Per-word correction — spell check individual words
        Args:
            text: The text to correct
            lang: Expected language ("en" or "de"), or "auto" to detect it from the text
        """
        if not text or not text.strip():
            return CorrectionResult(text, text, "unknown", False)
        detected = self.detect_text_lang(text) if lang == "auto" else lang
        effective_lang = detected if detected in ("en", "de") else "en"
        changes: List[str] = []
        tokens = list(_TOKEN_RE.finditer(text))
        # Extract token list: [(word, separator), ...]
        token_list: List[List[str]] = []  # [[word, sep], ...]
        for m in tokens:
            token_list.append([m.group(1), m.group(2)])
        # --- Pass 1: Boundary repair between adjacent words ---
        # Import abbreviations for the heuristic below
        try:
            from cv_ocr_engines import _KNOWN_ABBREVIATIONS as _ABBREVS
        except ImportError:
            _ABBREVS = set()
        for i in range(len(token_list) - 1):
            w1 = token_list[i][0]
            w2_raw = token_list[i + 1][0]
            # Skip boundary repair for IPA/bracket content
            # Brackets may be in the token OR in the adjacent separators
            sep_before_w1 = token_list[i - 1][1] if i > 0 else ""
            sep_after_w1 = token_list[i][1]
            sep_after_w2 = token_list[i + 1][1]
            has_bracket = (
                '[' in w1 or ']' in w1 or '[' in w2_raw or ']' in w2_raw
                or ']' in sep_after_w1  # w1 text was inside [brackets]
                or '[' in sep_after_w1  # w2 starts a bracket
                or ']' in sep_after_w2  # w2 text was inside [brackets]
                or '[' in sep_before_w1  # w1 starts a bracket
            )
            if has_bracket:
                continue
            # Include trailing punct from separator in w2 for abbreviation matching
            w2_with_punct = w2_raw + token_list[i + 1][1].rstrip(" ")
            # Try boundary repair — always, even if both words are valid.
            # Use word-frequency scoring to decide if repair is better.
            repair = self._try_boundary_repair(w1, w2_with_punct)
            if not repair and w2_with_punct != w2_raw:
                repair = self._try_boundary_repair(w1, w2_raw)
            if repair:
                new_w1, new_w2_full = repair
                new_w2_base = new_w2_full.rstrip(".,;:!?")
                # Frequency-based scoring: product of word frequencies
                # Higher product = more common word pair = better
                old_freq = self._word_freq(w1) * self._word_freq(w2_raw)
                new_freq = self._word_freq(new_w1) * self._word_freq(new_w2_base)
                # Abbreviation bonus: if repair produces a known abbreviation
                has_abbrev = new_w1.lower() in _ABBREVS or new_w2_base.lower() in _ABBREVS
                if has_abbrev:
                    # Accept abbreviation repair ONLY if at least one of the
                    # original words is rare/unknown (prevents "Can I" → "Ca nI"
                    # where both original words are common and correct).
                    # "Rare" = frequency < 1e-6 (covers "ats", "th" but not "Can", "I")
                    RARE_THRESHOLD = 1e-6
                    orig_both_common = (
                        self._word_freq(w1) > RARE_THRESHOLD
                        and self._word_freq(w2_raw) > RARE_THRESHOLD
                    )
                    if not orig_both_common:
                        new_freq = max(new_freq, old_freq * 10)
                    else:
                        has_abbrev = False  # both originals common → don't trust
                # Accept if repair produces a more frequent word pair
                # (threshold: at least 5x more frequent to avoid false positives)
                if new_freq > old_freq * 5:
                    new_w2_punct = new_w2_full[len(new_w2_base):]
                    changes.append(f"{w1} {w2_raw}→{new_w1} {new_w2_base}")
                    token_list[i][0] = new_w1
                    token_list[i + 1][0] = new_w2_base
                    if new_w2_punct:
                        token_list[i + 1][1] = new_w2_punct + token_list[i + 1][1].lstrip(".,;:!?")
        # --- Pass 2: Context split (anew → a new) ---
        expanded: List[List[str]] = []
        for i, (word, sep) in enumerate(token_list):
            next_word = token_list[i + 1][0] if i + 1 < len(token_list) else ""
            prev_word = token_list[i - 1][0] if i > 0 else ""
            split = self._try_context_split(word, next_word, prev_word)
            if split and split != word:
                changes.append(f"{word}→{split}")
                expanded.append([split, sep])
            else:
                expanded.append([word, sep])
        token_list = expanded
        # --- Pass 3: Per-word correction ---
        parts: List[str] = []
        # Preserve any leading text before the first token match
        # (e.g., "(= " before "I won and he lost.")
        first_start = tokens[0].start() if tokens else 0
        if first_start > 0:
            parts.append(text[:first_start])
        for i, (word, sep) in enumerate(token_list):
            # Skip words inside IPA brackets (brackets land in separators)
            prev_sep = token_list[i - 1][1] if i > 0 else ""
            if '[' in prev_sep or ']' in sep:
                parts.append(word)
                parts.append(sep)
                continue
            next_word = token_list[i + 1][0] if i + 1 < len(token_list) else ""
            prev_word = token_list[i - 1][0] if i > 0 else ""
            correction = self.correct_word(
                word, lang=effective_lang,
                prev_word=prev_word, next_word=next_word,
            )
            if correction and correction != word:
                changes.append(f"{word}→{correction}")
                parts.append(correction)
            else:
                parts.append(word)
            parts.append(sep)
        # Append any trailing text
        last_end = tokens[-1].end() if tokens else 0
        if last_end < len(text):
            parts.append(text[last_end:])
        corrected = "".join(parts)
        return CorrectionResult(
            original=text,
            corrected=corrected,
            lang_detected=detected,
            changed=corrected != text,
            changes=changes,
        )
    # --- Vocabulary entry correction ---
    def correct_vocab_entry(self, english: str, german: str,
                            example: str = "") -> Dict[str, CorrectionResult]:
        """Correct a full vocabulary entry (EN + DE + example).
        Uses column position to determine language — the most reliable signal.
        """
        results = {}
        results["english"] = self.correct_text(english, lang="en")
        results["german"] = self.correct_text(german, lang="de")
        if example:
            # For examples, auto-detect language
            results["example"] = self.correct_text(example, lang="auto")
        return results
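The multi-character substitution in `_try_multi_digit_sub` enumerates every keep-or-replace combination at each suspicious position and returns the first candidate found in a dictionary. A minimal standalone sketch of that idea follows. It assumes a hypothetical toy word set `KNOWN` in place of the pyspellchecker dictionaries, a trimmed copy of the substitution table, and a hypothetical function name `multi_digit_sub`; it is an illustration, not the code from this commit.

```python
from itertools import product
from typing import Dict, List, Optional, Set

# Trimmed copy of the OCR confusion table (assumption: subset for illustration).
DIGIT_SUBS: Dict[str, List[str]] = {"0": ["o", "O"], "1": ["l", "I"], "5": ["s", "S"]}

# Hypothetical stand-in for the EN/DE dictionaries.
KNOWN: Set[str] = {"school", "hello", "is"}


def multi_digit_sub(word: str, max_positions: int = 4) -> Optional[str]:
    """Return the first known word reachable by replacing suspicious characters."""
    positions = [i for i, ch in enumerate(word) if ch in DIGIT_SUBS]
    if not positions or len(positions) > max_positions:
        return None
    # For each position: keep the original character or try each replacement.
    options = [[word[i]] + DIGIT_SUBS[word[i]] for i in positions]
    for combo in product(*options):
        chars = list(word)
        for pos, repl in zip(positions, combo):
            chars[pos] = repl
        candidate = "".join(chars)
        if candidate != word and candidate.lower() in KNOWN:
            return candidate
    return None
```

Each position has at most three options here (keep, lowercase replacement, uppercase replacement), so four positions yield at most 81 candidates, which is why capping the number of positions keeps the search cheap.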
--- a/klausur-service/backend/tests/debug_shear.py
+++ b/klausur-service/backend/tests/debug_shear.py
@@ -0,0 +1,100 @@
 #!/usr/bin/env python3
 """Debug script: analyze text line slopes on deskewed image to determine true residual shear."""
 import sys, math, asyncio
 sys.path.insert(0, "/app/backend")
 import cv2
 import numpy as np
 import pytesseract
 from ocr_pipeline_session_store import get_session_db
 SESSION_ID = "3dcb1897-09a6-4b80-91b5-7e4207980bf3"
 async def main():
    s = await get_session_db(SESSION_ID)
    if not s:
        print("Session not found")
        return
    deskewed_png = s.get("deskewed_png")
    if not deskewed_png:
        print("No deskewed_png stored")
        return
    arr = np.frombuffer(deskewed_png, dtype=np.uint8)
    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
    h, w = img.shape[:2]
    print(f"Deskewed image: {w}x{h}")
    # Detect text line slopes using Tesseract word positions
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    data = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT, config="--psm 6")
    lines = {}
    for i in range(len(data["text"])):
        txt = (data["text"][i] or "").strip()
        if len(txt) < 2 or data["conf"][i] < 30:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        cx = data["left"][i] + data["width"][i] / 2
        cy = data["top"][i] + data["height"][i] / 2
        if key not in lines:
            lines[key] = []
        lines[key].append((cx, cy))
    slopes = []
    for key, pts in lines.items():
        if len(pts) < 3:
            continue
        pts.sort(key=lambda p: p[0])
        xs = np.array([p[0] for p in pts])
        ys = np.array([p[1] for p in pts])
        if xs[-1] - xs[0] < w * 0.2:
            continue
        A = np.vstack([xs, np.ones(len(xs))]).T
        result = np.linalg.lstsq(A, ys, rcond=None)
        slope = result[0][0]
        angle_deg = math.degrees(math.atan(slope))
        slopes.append(angle_deg)
    if not slopes:
        print("No text lines detected")
        return
    median_slope = sorted(slopes)[len(slopes) // 2]
    mean_slope = sum(slopes) / len(slopes)
    print(f"Text lines found: {len(slopes)}")
    print(f"Median slope: {median_slope:.4f} deg")
    print(f"Mean slope:   {mean_slope:.4f} deg")
    print(f"Range: [{min(slopes):.4f}, {max(slopes):.4f}]")
    print()
    print("Individual line slopes:")
    for slope_deg in sorted(slopes):
        print(f"  {slope_deg:+.4f}")
    # Also test the 4 dewarp methods directly
    print("\n--- Dewarp method results on deskewed image ---")
    from cv_vocab_pipeline import (
        _detect_shear_angle, _detect_shear_by_projection,
        _detect_shear_by_hough, _detect_shear_by_text_lines,
    )
    for name, fn in [
        ("vertical_edge", _detect_shear_angle),
        ("projection", _detect_shear_by_projection),
        ("hough_lines", _detect_shear_by_hough),
        ("text_lines", _detect_shear_by_text_lines),
    ]:
        r = fn(img)
        print(f"  {name}: shear={r['shear_degrees']:.4f} conf={r['confidence']:.3f}")
    # The user says "right side needs to come down 3mm"
    # For a ~85mm wide image (1002px at ~300DPI), 3mm ~ 35px
    # shear angle = atan(35 / 1556) ~ 1.29 degrees
    # Let's check: what does the image look like if we apply 0.5, 1.0, 1.5 deg shear?
    print("\n--- Pixel shift at right edge for various shear angles ---")
    for deg in [0.5, 0.8, 1.0, 1.3, 1.5, 2.0]:
        shift_px = h * math.tan(math.radians(deg))
        shift_mm = shift_px / (w / 85.0)  # approximate mm
        print(f"  {deg:.1f} deg -> {shift_px:.0f}px shift -> ~{shift_mm:.1f}mm")
 asyncio.run(main())
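The arithmetic in the closing comments of the debug script (3 mm at the right edge, about 35 px, about 1.3 degrees) can be reproduced with a small helper. This is a sketch under the assumptions stated in those comments: an 85 mm wide page scanned at 1002 px width and 1556 px height. The function names `shear_angle_deg` and `shift_px_for_angle` are hypothetical and not part of the pipeline.

```python
import math


def shear_angle_deg(shift_mm: float, image_width_px: int, image_height_px: int,
                    page_width_mm: float = 85.0) -> float:
    """Convert an edge offset in millimetres into a shear angle in degrees."""
    px_per_mm = image_width_px / page_width_mm   # scan resolution in px/mm
    shift_px = shift_mm * px_per_mm              # offset expressed in pixels
    return math.degrees(math.atan(shift_px / image_height_px))


def shift_px_for_angle(angle_deg: float, image_height_px: int) -> float:
    """Inverse direction: pixel shift produced by a given shear angle."""
    return image_height_px * math.tan(math.radians(angle_deg))


if __name__ == "__main__":
    angle = shear_angle_deg(3.0, image_width_px=1002, image_height_px=1556)
    print(f"3.0 mm -> {angle:.2f} deg")
    print(f"{angle:.2f} deg -> {shift_px_for_angle(angle, 1556):.1f} px")
```

The second function mirrors the `h * math.tan(math.radians(deg))` expression used in the script's final loop, so the two directions round-trip.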
--- a/klausur-service/backend/tests/test_box_boundary_rows.py
+++ b/klausur-service/backend/tests/test_box_boundary_rows.py
@@ -0,0 +1,256 @@
 """
 Tests for box boundary row filtering logic (box_ranges_inner).
 Verifies that rows at the border of box zones are NOT excluded during
 row detection and word filtering. This prevents the last row above a
 box from being clipped by the box's border pixels.
 Related fix in ocr_pipeline_api.py: detect_rows() and detect_words()
 use box_ranges_inner (shrunk by border_thickness, min 5px) instead of
 full box_ranges for row exclusion.
 """
 import pytest
 import numpy as np
 from dataclasses import dataclass
 # ---------------------------------------------------------------------------
 # Simulate the box_ranges_inner calculation from ocr_pipeline_api.py
 # ---------------------------------------------------------------------------
 def compute_box_ranges(zones: list[dict]) -> tuple[list, list]:
    """
    Replicates the box_ranges / box_ranges_inner calculation
    from detect_rows() in ocr_pipeline_api.py.
    """
    box_ranges = []
    box_ranges_inner = []
    for zone in zones:
        if zone.get("zone_type") == "box" and zone.get("box"):
            box = zone["box"]
            bt = max(box.get("border_thickness", 0), 5)  # minimum 5px margin
            box_ranges.append((box["y"], box["y"] + box["height"]))
            box_ranges_inner.append((box["y"] + bt, box["y"] + box["height"] - bt))
    return box_ranges, box_ranges_inner
 def build_content_strips(box_ranges_inner: list, top_y: int, bottom_y: int) -> list:
    """
    Replicates the content_strips calculation from detect_rows() in ocr_pipeline_api.py.
    """
    sorted_boxes = sorted(box_ranges_inner, key=lambda r: r[0])
    content_strips = []
    strip_start = top_y
    for by_start, by_end in sorted_boxes:
        if by_start > strip_start:
            content_strips.append((strip_start, by_start))
        strip_start = max(strip_start, by_end)
    if strip_start < bottom_y:
        content_strips.append((strip_start, bottom_y))
    return [(ys, ye) for ys, ye in content_strips if ye - ys >= 20]
 def row_in_box(row_y: int, row_height: int, box_ranges_inner: list) -> bool:
    """
    Replicates the _row_in_box filter from detect_words() in ocr_pipeline_api.py.
    """
    center_y = row_y + row_height / 2
    return any(by_s <= center_y < by_e for by_s, by_e in box_ranges_inner)
 # ---------------------------------------------------------------------------
 # Tests
 # ---------------------------------------------------------------------------
 class TestBoxRangesInner:
    """Tests for box_ranges_inner calculation."""
    def test_border_thickness_shrinks_inner_range(self):
        """Inner range should be shrunk by border_thickness."""
        zones = [{
            "zone_type": "box",
            "box": {"x": 50, "y": 500, "width": 1100, "height": 200, "border_thickness": 10},
        }]
        box_ranges, inner = compute_box_ranges(zones)
        assert box_ranges == [(500, 700)]
        assert inner == [(510, 690)]  # shrunk by 10px on each side
    def test_minimum_5px_margin(self):
        """Even with border_thickness=0, minimum 5px margin should apply."""
        zones = [{
            "zone_type": "box",
            "box": {"x": 50, "y": 500, "width": 1100, "height": 200, "border_thickness": 0},
        }]
        _, inner = compute_box_ranges(zones)
        assert inner == [(505, 695)]  # minimum 5px applied
    def test_no_box_zones_returns_empty(self):
        """Without box zones, both ranges should be empty."""
        zones = [
            {"zone_type": "content", "y": 0, "height": 500},
        ]
        box_ranges, inner = compute_box_ranges(zones)
        assert box_ranges == []
        assert inner == []
    def test_multiple_boxes(self):
        """Multiple boxes should each get their own inner range."""
        zones = [
            {"zone_type": "box", "box": {"x": 50, "y": 300, "width": 1100, "height": 150, "border_thickness": 8}},
            {"zone_type": "box", "box": {"x": 50, "y": 700, "width": 1100, "height": 150, "border_thickness": 3}},
        ]
        box_ranges, inner = compute_box_ranges(zones)
        assert len(box_ranges) == 2
        assert len(inner) == 2
        assert inner[0] == (308, 442)  # 300+8 to 450-8
        assert inner[1] == (705, 845)  # 700+5(min) to 850-5(min)
 class TestContentStrips:
    """Tests for content strip building with box_ranges_inner."""
    def test_single_box_creates_two_strips(self):
        """A single box in the middle should create two content strips."""
        inner = [(505, 695)]  # box inner at y=505..695
        strips = build_content_strips(inner, top_y=100, bottom_y=1700)
        assert len(strips) == 2
        assert strips[0] == (100, 505)   # above box
        assert strips[1] == (695, 1700)  # below box
    def test_content_strip_includes_box_border_area(self):
        """Content strips should INCLUDE the box border area (not just stop at box outer edge)."""
        # Box at y=500, height=200, border=10 → inner=(510, 690)
        inner = [(510, 690)]
        strips = build_content_strips(inner, top_y=100, bottom_y=1700)
        # Strip above extends to 510 (not 500), including border area
        assert strips[0] == (100, 510)
        # Strip below starts at 690 (not 700), including border area
        assert strips[1] == (690, 1700)
    def test_row_at_box_border_is_in_content_strip(self):
        """A row at y=495 (just above box at y=500) should be in the content strip."""
        # Box at y=500, height=200, border=10 → inner=(510, 690)
        inner = [(510, 690)]
        strips = build_content_strips(inner, top_y=100, bottom_y=1700)
        # Row at y=495, height=30 → center at y=510 → just at the edge
        row_center = 495 + 15  # = 510
        # This row center is at the boundary — it should be in the first strip
        in_first_strip = strips[0][0] <= row_center <= strips[0][1]
        assert in_first_strip
    def test_no_boxes_single_strip(self):
        """Without boxes, a single strip covering the full content should be returned."""
        strips = build_content_strips([], top_y=100, bottom_y=1700)
        assert len(strips) == 1
        assert strips[0] == (100, 1700)
 class TestRowInBoxFilter:
    """Tests for the _row_in_box filter using box_ranges_inner."""
    def test_row_inside_box_is_excluded(self):
        """A row clearly inside the box inner range should be excluded."""
        inner = [(510, 690)]
        # Row at y=550, height=30 → center at 565
        assert row_in_box(550, 30, inner) is True
    def test_row_above_box_not_excluded(self):
        """A row above the box (at the border area) should NOT be excluded."""
        inner = [(510, 690)]
        # Row at y=490, height=30 → center at 505 → less than inner start (510)
        assert row_in_box(490, 30, inner) is False
    def test_row_below_box_not_excluded(self):
        """A row below the box (at the border area) should NOT be excluded."""
        inner = [(510, 690)]
        # Row at y=695, height=30 → center at 710 → greater than inner end (690)
        assert row_in_box(695, 30, inner) is False
    def test_row_at_box_border_not_excluded(self):
        """A row overlapping with the box border should NOT be excluded.
        This is the key fix: previously, box_ranges (not inner) was used,
        which would exclude this row because its center (505) falls within
        the full box range (500-700).
        """
        # Full box range: (500, 700), inner: (510, 690)
        inner = [(510, 690)]
        # Row at y=490, height=30 → center at 505
        # With box_ranges (500, 700): 500 <= 505 < 700 → excluded (BUG!)
        # With box_ranges_inner (510, 690): 510 <= 505 → False → not excluded (FIXED!)
        assert row_in_box(490, 30, inner) is False
    def test_row_at_bottom_border_not_excluded(self):
        """A row overlapping with the bottom box border should NOT be excluded."""
        inner = [(510, 690)]
        # Row at y=685, height=30 → center at 700
        # With box_ranges (500, 700): 500 <= 700 < 700 → not excluded (edge)
        # With box_ranges_inner (510, 690): 510 <= 700 is True, but 700 < 690 is False → not excluded
        assert row_in_box(685, 30, inner) is False
    def test_no_boxes_nothing_excluded(self):
        """Without box zones, no rows should be excluded."""
        assert row_in_box(500, 30, []) is False
 class TestBoxBoundaryIntegration:
    """Integration test: simulate the full row → content strip → filter pipeline."""
    def test_boundary_row_preserved_with_inner_ranges(self):
        """
        End-to-end: A row at the box boundary is preserved in content strips
        and not filtered out by _row_in_box.
        Simulates the real scenario: page with a box at y=500..700,
        border_thickness=10. Row at y=488..518 (center=503) sits just
        above the box border.
        """
        zones = [{
            "zone_type": "box",
            "box": {"x": 50, "y": 500, "width": 1100, "height": 200, "border_thickness": 10},
        }]
        # Step 1: Compute inner ranges
        box_ranges, inner = compute_box_ranges(zones)
        assert inner == [(510, 690)]
        # Step 2: Build content strips
        strips = build_content_strips(inner, top_y=20, bottom_y=2400)
        assert len(strips) == 2
        # First strip extends to 510 (includes the border area 500-510)
        assert strips[0] == (20, 510)
        # Step 3: Check that the boundary row is NOT in box
        row_y, row_h = 488, 30  # center = 503
        assert row_in_box(row_y, row_h, inner) is False
        # Step 4: Verify the row's center falls within a content strip
        row_center = row_y + row_h / 2  # 503
        in_any_strip = any(ys <= row_center < ye for ys, ye in strips)
        assert in_any_strip, f"Row center {row_center} should be in content strips {strips}"
    def test_boundary_row_would_be_lost_with_full_ranges(self):
        """
        Demonstrates the bug: using full box_ranges (not inner) WOULD
        exclude the boundary row.
        """
        zones = [{
            "zone_type": "box",
            "box": {"x": 50, "y": 500, "width": 1100, "height": 200, "border_thickness": 10},
        }]
        box_ranges, _ = compute_box_ranges(zones)
        # The full range is (500, 700)
        row_center = 488 + 30 / 2  # 503
        # With full range: 500 <= 503 < 700 → would be excluded!
        in_box_full = any(by_s <= row_center < by_e for by_s, by_e in box_ranges)
        assert in_box_full is True, "Full range SHOULD incorrectly exclude this row"
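The helpers exercised by the tests above are defined outside this excerpt. As a rough sketch only, assuming the behaviour the assertions imply (inner range = outer range inset by the border thickness, row membership decided by the row's vertical center on a half-open interval), they might look like this. The `sketch_` names and the `min_inset` default of 5 (inferred from the `5(min)` comment earlier in the file) are assumptions for illustration, not the project's code.

```python
def sketch_compute_box_ranges(zones, min_inset=5):
    """Return (outer, inner) y-ranges for every box zone (hypothetical helper)."""
    outer, inner = [], []
    for zone in zones:
        if zone.get("zone_type") != "box":
            continue
        box = zone["box"]
        y0, y1 = box["y"], box["y"] + box["height"]
        # Assumption: the inset is the border thickness, but never below min_inset.
        inset = max(box.get("border_thickness", 0), min_inset)
        outer.append((y0, y1))
        inner.append((y0 + inset, y1 - inset))
    return outer, inner


def sketch_build_content_strips(inner_ranges, top_y, bottom_y):
    """Return the vertical gaps between inner box ranges as (start, end) strips."""
    strips = []
    cursor = top_y
    for start, end in sorted(inner_ranges):
        if start > cursor:
            strips.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < bottom_y:
        strips.append((cursor, bottom_y))
    return strips


def sketch_row_in_box(row_y, row_h, inner_ranges):
    """True if the row's vertical center lies in any inner range (half-open)."""
    center = row_y + row_h / 2
    return any(start <= center < end for start, end in inner_ranges)
```

Using the inner range in both the strip builder and the filter is what keeps a row with center 503 in the content: it lies in the strip `(20, 510)` and outside `(510, 690)`.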
--- a/klausur-service/backend/tests/test_box_layout.py
+++ b/klausur-service/backend/tests/test_box_layout.py
@@ -0,0 +1,124 @@
 """Tests for cv_box_layout.py — box layout classification and grid building."""
 import pytest
 import sys, os
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
 from cv_box_layout import classify_box_layout, build_box_zone_grid, _group_into_lines
 def _make_words(lines_data):
    """Create word dicts from [(y, x, text), ...] tuples."""
    words = []
    for y, x, text in lines_data:
        words.append({"top": y, "left": x, "width": len(text) * 10, "height": 25, "text": text})
    return words
 class TestClassifyBoxLayout:
    def test_header_only(self):
        words = _make_words([(100, 50, "Unit 3")])
        assert classify_box_layout(words, 500, 50) == "header_only"
    def test_empty(self):
        assert classify_box_layout([], 500, 200) == "header_only"
    def test_flowing(self):
        """Multiple lines without bullet patterns → flowing."""
        words = _make_words([
            (100, 50, "German leihen title"),
            (130, 50, "etwas ausleihen von jm"),
            (160, 70, "borrow sth from sb"),
            (190, 70, "Can I borrow your CD"),
            (220, 50, "etwas verleihen an jn"),
            (250, 70, "OK I can lend you my"),
        ])
        assert classify_box_layout(words, 500, 200) == "flowing"
    def test_bullet_list(self):
        """Lines starting with bullet markers → bullet_list."""
        words = _make_words([
            (100, 50, "Title of the box"),
            (130, 50, "• First item text here"),
            (160, 50, "• Second item text here"),
            (190, 50, "• Third item text here"),
            (220, 50, "• Fourth item text here"),
            (250, 50, "• Fifth item text here"),
        ])
        assert classify_box_layout(words, 500, 150) == "bullet_list"
 class TestGroupIntoLines:
    def test_single_line(self):
        words = _make_words([(100, 50, "hello"), (100, 120, "world")])
        lines = _group_into_lines(words)
        assert len(lines) == 1
        assert len(lines[0]) == 2
    def test_two_lines(self):
        words = _make_words([(100, 50, "line1"), (150, 50, "line2")])
        lines = _group_into_lines(words)
        assert len(lines) == 2
    def test_y_proximity(self):
        """Words within y-tolerance are on same line."""
        words = _make_words([(100, 50, "a"), (103, 120, "b")])  # 3px apart
        lines = _group_into_lines(words)
        assert len(lines) == 1
 class TestBuildBoxZoneGrid:
    def test_flowing_groups_by_indent(self):
        """Flowing layout groups continuation lines by indentation."""
        words = _make_words([
            (100, 50, "Header Title"),
            (130, 50, "Bullet start text"),
            (160, 80, "continuation line 1"),
            (190, 80, "continuation line 2"),
        ])
        result = build_box_zone_grid(words, 40, 90, 500, 120, 0, 1600, 2200, layout_type="flowing")
        # Header + 1 grouped bullet = 2 rows
        assert len(result["rows"]) == 2
        assert len(result["cells"]) == 2
        # Second cell should have \n (multi-line)
        bullet_cell = result["cells"][1]
        assert "\n" in bullet_cell["text"]
    def test_header_only_single_cell(self):
        words = _make_words([(100, 50, "Just a title")])
        result = build_box_zone_grid(words, 40, 90, 500, 50, 0, 1600, 2200, layout_type="header_only")
        assert len(result["cells"]) == 1
        assert result["box_layout_type"] == "header_only"
    def test_columnar_delegates_to_zone_grid(self):
        """Columnar layout uses standard grid builder."""
        words = _make_words([
            (100, 50, "Col A header"),
            (100, 300, "Col B header"),
            (130, 50, "A data"),
            (130, 300, "B data"),
        ])
        result = build_box_zone_grid(words, 40, 90, 500, 80, 0, 1600, 2200, layout_type="columnar")
        assert result["box_layout_type"] == "columnar"
        # Should have detected columns
        assert len(result.get("columns", [])) >= 1
    def test_row_fields_for_gridtable(self):
        """Rows must have y_min_px, y_max_px, is_header for GridTable."""
        words = _make_words([(100, 50, "Title"), (130, 50, "Body")])
        result = build_box_zone_grid(words, 40, 90, 500, 80, 0, 1600, 2200, layout_type="flowing")
        for row in result["rows"]:
            assert "y_min_px" in row
            assert "y_max_px" in row
            assert "is_header" in row
    def test_column_fields_for_gridtable(self):
        """Columns must have x_min_px, x_max_px for GridTable width calculation."""
        words = _make_words([(100, 50, "Text")])
        result = build_box_zone_grid(words, 40, 90, 500, 50, 0, 1600, 2200, layout_type="flowing")
        for col in result["columns"]:
            assert "x_min_px" in col
            assert "x_max_px" in col
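The `TestGroupIntoLines` cases above rely on words with nearly equal `top` values landing on the same line. The real `_group_into_lines` is not shown in this diff; the following is only a minimal sketch of that idea, assuming a fixed pixel tolerance. The function name and the `y_tolerance` default are hypothetical.

```python
def sketch_group_into_lines(words, y_tolerance=10):
    """Group word dicts into lines by vertical proximity of their 'top' value."""
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["left"])):
        # Compare against the first word of the current line (the line anchor).
        if lines and abs(word["top"] - lines[-1][0]["top"]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order words left-to-right within each line.
    return [sorted(line, key=lambda w: w["left"]) for line in lines]
```

A production version would more likely derive the tolerance from the word heights than use a constant.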
--- a/klausur-service/backend/tests/test_dictionary_detection.py
+++ b/klausur-service/backend/tests/test_dictionary_detection.py
@@ -0,0 +1,285 @@
 """Tests for dictionary/Wörterbuch page detection.
 Tests the _score_dictionary_signals() function and _classify_dictionary_columns()
 from cv_layout.py.
 """
 import sys
 import os
 # Add backend to path for imports
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
 from cv_vocab_types import ColumnGeometry
 from cv_layout import _score_dictionary_signals, _classify_dictionary_columns, _score_language
 def _make_words(texts, start_y=0, y_step=30, x=100, conf=80):
    """Create a list of word dicts from text strings."""
    return [
        {
            "text": t,
            "conf": conf,
            "top": start_y + i * y_step,
            "left": x,
            "height": 20,
            "width": len(t) * 10,
        }
        for i, t in enumerate(texts)
    ]
 def _make_geom(index, words, x=0, width=200, width_ratio=0.15):
    """Create a ColumnGeometry with given words."""
    return ColumnGeometry(
        index=index,
        x=x,
        y=0,
        width=width,
        height=1000,
        word_count=len(words),
        words=words,
        width_ratio=width_ratio,
    )
 class TestDictionarySignals:
    """Test _score_dictionary_signals with synthetic data."""
    def test_alphabetical_column_detected(self):
        """A column with alphabetically ordered words should score high."""
        # Simulate a dictionary headword column: Z words
        headwords = _make_words([
            "Zahl", "Zahn", "zart", "Zauber", "Zaun",
            "Zeichen", "zeigen", "Zeit", "Zelt", "Zentrum",
            "zerbrechen", "Zeug", "Ziel", "Zimmer", "Zitrone",
            "Zoll", "Zone", "Zoo", "Zucker", "Zug",
        ])
        # Article column
        articles = _make_words(
            ["die", "der", "das", "der", "der",
             "das", "die", "die", "das", "das",
             "der", "das", "das", "das", "die",
             "der", "die", "der", "der", "der"],
            x=0,
        )
        # Translation column
        translations = _make_words(
            ["number", "tooth", "tender", "magic", "fence",
             "sign", "to show", "time", "tent", "centre",
             "to break", "stuff", "goal", "room", "lemon",
             "customs", "zone", "zoo", "sugar", "train"],
            x=400,
        )
        geoms = [
            _make_geom(0, articles, x=0, width=60, width_ratio=0.05),
            _make_geom(1, headwords, x=80, width=200, width_ratio=0.15),
            _make_geom(2, translations, x=400, width=200, width_ratio=0.15),
        ]
        result = _score_dictionary_signals(geoms)
        assert result["signals"]["alphabetical_score"] >= 0.80, (
            f"Expected alphabetical_score >= 0.80, got {result['signals']['alphabetical_score']}"
        )
        assert result["signals"]["article_density"] >= 0.80, (
            f"Expected article_density >= 0.80, got {result['signals']['article_density']}"
        )
        assert result["signals"]["first_letter_uniformity"] >= 0.60, (
            f"Expected first_letter_uniformity >= 0.60, got {result['signals']['first_letter_uniformity']}"
        )
        assert result["is_dictionary"] is True
        assert result["confidence"] >= 0.40
    def test_non_dictionary_vocab_table(self):
        """A normal vocab table (topic-grouped, no alphabetical order) should NOT be detected."""
        en_words = _make_words([
            "school", "teacher", "homework", "pencil", "break",
            "lunch", "friend", "computer", "book", "bag",
        ])
        de_words = _make_words([
            "Schule", "Lehrer", "Hausaufgaben", "Bleistift", "Pause",
            "Mittagessen", "Freund", "Computer", "Buch", "Tasche",
        ], x=300)
        geoms = [
            _make_geom(0, en_words, x=0, width=200, width_ratio=0.20),
            _make_geom(1, de_words, x=300, width=200, width_ratio=0.20),
        ]
        result = _score_dictionary_signals(geoms)
        # Alphabetical score should be moderate at best (random order)
        assert result["is_dictionary"] is False, (
            f"Normal vocab table should NOT be detected as dictionary, "
            f"confidence={result['confidence']}"
        )
    def test_article_column_detection(self):
        """A narrow column with mostly articles should be identified."""
        articles = _make_words(
            ["der", "die", "das", "der", "die", "das", "der", "die", "das", "der"],
            x=0,
        )
        headwords = _make_words(
            ["Apfel", "Birne", "Dose", "Eis", "Fisch",
             "Gabel", "Haus", "Igel", "Jacke", "Kuchen"],
        )
        translations = _make_words(
            ["apple", "pear", "can", "ice", "fish",
             "fork", "house", "hedgehog", "jacket", "cake"],
            x=400,
        )
        geoms = [
            _make_geom(0, articles, x=0, width=50, width_ratio=0.04),
            _make_geom(1, headwords, x=80, width=200, width_ratio=0.15),
            _make_geom(2, translations, x=400, width=200, width_ratio=0.15),
        ]
        result = _score_dictionary_signals(geoms)
        assert result["signals"]["article_density"] >= 0.80
        assert result["signals"]["article_col"] == 0
    def test_first_letter_uniformity(self):
        """Words all starting with same letter should have high uniformity."""
        z_words = _make_words([
            "Zahl", "Zahn", "zart", "Zauber", "Zaun",
            "Zeichen", "zeigen", "Zeit", "Zelt", "Zentrum",
        ])
        other = _make_words(
            ["number", "tooth", "tender", "magic", "fence",
             "sign", "to show", "time", "tent", "centre"],
            x=300,
        )
        geoms = [
            _make_geom(0, z_words, x=0, width=200, width_ratio=0.15),
            _make_geom(1, other, x=300, width=200, width_ratio=0.15),
        ]
        result = _score_dictionary_signals(geoms)
        assert result["signals"]["first_letter_uniformity"] >= 0.80
    def test_letter_transition_detected(self):
        """Words transitioning from one letter to next (A→B) should be detected."""
        words = _make_words([
            "Apfel", "Arm", "Auto", "Auge", "Abend",
            "Ball", "Baum", "Berg", "Blume", "Boot",
        ])
        other = _make_words(
            ["apple", "arm", "car", "eye", "evening",
             "ball", "tree", "mountain", "flower", "boat"],
            x=300,
        )
        geoms = [
            _make_geom(0, words, x=0, width=200, width_ratio=0.15),
            _make_geom(1, other, x=300, width=200, width_ratio=0.15),
        ]
        result = _score_dictionary_signals(geoms)
        assert result["signals"]["has_letter_transition"] is True
    def test_category_boost(self):
        """document_category='woerterbuch' should boost confidence."""
        # Weak signals that normally wouldn't trigger dictionary detection
        words_a = _make_words(["cat", "dog", "fish", "hat", "map"], x=0)
        words_b = _make_words(["Katze", "Hund", "Fisch", "Hut", "Karte"], x=300)
        geoms = [
            _make_geom(0, words_a, x=0, width=200, width_ratio=0.15),
            _make_geom(1, words_b, x=300, width=200, width_ratio=0.15),
        ]
        without_boost = _score_dictionary_signals(geoms)
        with_boost = _score_dictionary_signals(geoms, document_category="woerterbuch")
        assert with_boost["confidence"] > without_boost["confidence"]
        assert with_boost["confidence"] - without_boost["confidence"] >= 0.19  # ~0.20 boost
    def test_margin_strip_signal(self):
        """margin_strip_detected=True should contribute to confidence."""
        words_a = _make_words(["Apfel", "Arm", "Auto", "Auge", "Abend"], x=0)
        words_b = _make_words(["apple", "arm", "car", "eye", "evening"], x=300)
        geoms = [
            _make_geom(0, words_a, x=0, width=200, width_ratio=0.15),
            _make_geom(1, words_b, x=300, width=200, width_ratio=0.15),
        ]
        without = _score_dictionary_signals(geoms, margin_strip_detected=False)
        with_strip = _score_dictionary_signals(geoms, margin_strip_detected=True)
        assert with_strip["confidence"] > without["confidence"]
        assert with_strip["signals"]["margin_strip_detected"] is True
    def test_too_few_columns(self):
        """Single column should return is_dictionary=False."""
        words = _make_words(["Zahl", "Zahn", "zart", "Zauber", "Zaun"])
        geoms = [_make_geom(0, words)]
        result = _score_dictionary_signals(geoms)
        assert result["is_dictionary"] is False
    def test_empty_words(self):
        """Columns with no words should return is_dictionary=False."""
        geoms = [
            _make_geom(0, [], x=0),
            _make_geom(1, [], x=300),
        ]
        result = _score_dictionary_signals(geoms)
        assert result["is_dictionary"] is False
 class TestClassifyDictionaryColumns:
    """Test _classify_dictionary_columns with dictionary-detected data."""
    def test_assigns_article_and_headword(self):
        """When dictionary detected, assigns column_article and column_headword."""
        articles = _make_words(
            ["der", "die", "das", "der", "die", "das", "der", "die", "das", "der"],
            x=0,
        )
        headwords = _make_words([
            "Zahl", "Zahn", "zart", "Zauber", "Zaun",
            "Zeichen", "zeigen", "Zeit", "Zelt", "Zentrum",
        ])
        translations = _make_words(
            ["number", "tooth", "tender", "magic", "fence",
             "sign", "to show", "time", "tent", "centre"],
            x=400,
        )
        geoms = [
            _make_geom(0, articles, x=0, width=50, width_ratio=0.04),
            _make_geom(1, headwords, x=80, width=200, width_ratio=0.15),
            _make_geom(2, translations, x=400, width=200, width_ratio=0.15),
        ]
        dict_signals = _score_dictionary_signals(geoms)
        assert dict_signals["is_dictionary"] is True
        lang_scores = [_score_language(g.words) for g in geoms]
        regions = _classify_dictionary_columns(geoms, dict_signals, lang_scores, 1000)
        assert regions is not None
        types = [r.type for r in regions]
        assert "column_article" in types, f"Expected column_article in {types}"
        assert "column_headword" in types, f"Expected column_headword in {types}"
        # All regions should have classification_method='dictionary'
        for r in regions:
            assert r.classification_method == "dictionary"
    def test_returns_none_when_not_dictionary(self):
        """Should return None when dict_signals says not a dictionary."""
        geoms = [
            _make_geom(0, _make_words(["cat", "dog"]), x=0),
            _make_geom(1, _make_words(["Katze", "Hund"]), x=300),
        ]
        dict_signals = {"is_dictionary": False, "confidence": 0.1}
        lang_scores = [_score_language(g.words) for g in geoms]
        result = _classify_dictionary_columns(geoms, dict_signals, lang_scores, 1000)
        assert result is None
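The tests above assert on an `alphabetical_score` signal without showing how it is computed. One plausible definition, given here purely as an assumption and not as the logic in `cv_layout.py`, is the fraction of adjacent headword pairs that are in case-insensitive ascending order. The function name is hypothetical.

```python
def sketch_alphabetical_score(texts):
    """Fraction of adjacent pairs in non-decreasing, case-insensitive order."""
    # Ignore empty strings and tokens that do not start with a letter.
    keys = [t.casefold() for t in texts if t and t[0].isalpha()]
    if len(keys) < 2:
        return 0.0
    ordered = sum(1 for a, b in zip(keys, keys[1:]) if a <= b)
    return ordered / (len(keys) - 1)
```

Under this definition a sorted headword column scores 1.0, while a topic-grouped vocabulary list scores noticeably lower.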
--- a/klausur-service/backend/tests/test_gutter_repair.py
+++ b/klausur-service/backend/tests/test_gutter_repair.py
@@ -0,0 +1,339 @@
 """Tests for cv_gutter_repair: gutter-edge word detection and repair."""
 import pytest
 import sys
 import os
 # Add parent directory to path so we can import the module
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
 from cv_gutter_repair import (
    _is_known,
    _try_hyphen_join,
    _try_spell_fix,
    _edit_distance,
    _word_is_at_gutter_edge,
    _MIN_WORD_LEN_SPELL,
    _MIN_WORD_LEN_HYPHEN,
    analyse_grid_for_gutter_repair,
    apply_gutter_suggestions,
 )
 # ---------------------------------------------------------------------------
 # Helper function tests
 # ---------------------------------------------------------------------------
 class TestEditDistance:
    def test_identical(self):
        assert _edit_distance("hello", "hello") == 0
    def test_one_substitution(self):
        assert _edit_distance("stammeli", "stammeln") == 1
    def test_one_deletion(self):
        assert _edit_distance("cat", "ca") == 1
    def test_one_insertion(self):
        assert _edit_distance("ca", "cat") == 1
    def test_empty(self):
        assert _edit_distance("", "abc") == 3
        assert _edit_distance("abc", "") == 3
    def test_both_empty(self):
        assert _edit_distance("", "") == 0
 class TestWordIsAtGutterEdge:
    def test_word_at_right_edge(self):
        # Word right edge at 95% of column = within gutter zone
        word_bbox = {"left": 80, "width": 15}  # right edge = 95
        assert _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)
    def test_word_in_middle(self):
        # Word right edge at 50% of column = NOT at gutter
        word_bbox = {"left": 30, "width": 20}  # right edge = 50
        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)
    def test_word_at_left(self):
        word_bbox = {"left": 5, "width": 20}  # right edge = 25
        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)
    def test_zero_width_column(self):
        word_bbox = {"left": 0, "width": 10}
        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=0)
 # ---------------------------------------------------------------------------
 # Spellchecker-dependent tests (skip if not installed)
 # ---------------------------------------------------------------------------
 try:
    from spellchecker import SpellChecker
    _HAS_SPELLCHECKER = True
 except ImportError:
    _HAS_SPELLCHECKER = False
 needs_spellchecker = pytest.mark.skipif(
    not _HAS_SPELLCHECKER, reason="pyspellchecker not installed"
 )
@needs_spellchecker
 class TestIsKnown:
    def test_known_english(self):
        assert _is_known("hello") is True
        assert _is_known("world") is True
    def test_known_german(self):
        assert _is_known("verkünden") is True
        assert _is_known("stammeln") is True
    def test_unknown_garbled(self):
        assert _is_known("stammeli") is False
        assert _is_known("xyzqwp") is False
    def test_short_word(self):
        # Words < 3 chars are not checked
        assert _is_known("a") is False
@needs_spellchecker
 class TestTryHyphenJoin:
    def test_direct_join(self):
        # "ver-" + "künden" = "verkünden" (trailing hyphen is dropped on join)
        result = _try_hyphen_join("ver-", "künden")
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
        assert missing == ""
        assert conf >= 0.9
    def test_join_with_missing_chars(self):
        # "ve" + "künden" → needs "r" in between → "verkünden"
        result = _try_hyphen_join("ve", "künden", max_missing=2)
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
        assert "r" in missing
    def test_no_valid_join(self):
        result = _try_hyphen_join("xyz", "qwpgh")
        assert result is None
    def test_empty_inputs(self):
        assert _try_hyphen_join("", "word") is None
        assert _try_hyphen_join("word", "") is None
    def test_join_strips_trailing_punctuation(self):
        # "ver-" + "künden," → should still find "verkünden" despite comma
        result = _try_hyphen_join("ver-", "künden,")
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
    def test_join_with_missing_chars_and_punctuation(self):
        # "ve" + "künden," → needs "r" in between, comma must be stripped
        result = _try_hyphen_join("ve", "künden,", max_missing=2)
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
        assert "r" in missing
@needs_spellchecker
 class TestTrySpellFix:
    def test_fix_garbled_ending_returns_alternatives(self):
        # "stammeli" should return a correction with alternatives
        result = _try_spell_fix("stammeli", col_type="column_de")
        assert result is not None
        corrected, conf, alts = result
        # The best correction is one of the valid forms
        all_options = [corrected] + alts
        all_lower = [w.lower() for w in all_options]
        # "stammeln" must be among the candidates
        assert "stammeln" in all_lower, f"Expected 'stammeln' in {all_options}"
    def test_known_word_not_fixed(self):
        # "Haus" is correct — no fix needed
        result = _try_spell_fix("Haus", col_type="column_de")
        # Ideally None since the word is correct; a non-None result must leave it unchanged
        if result is not None:
            corrected, _, _ = result
            assert corrected.lower() == "haus"
    def test_short_word_skipped(self):
        result = _try_spell_fix("ab")
        assert result is None
    def test_min_word_len_thresholds(self):
        assert _MIN_WORD_LEN_HYPHEN == 2
        assert _MIN_WORD_LEN_SPELL == 3
 # ---------------------------------------------------------------------------
 # Grid analysis tests
 # ---------------------------------------------------------------------------
 def _make_grid(cells, columns=None):
    """Helper to create a minimal grid_data structure."""
    if columns is None:
        columns = [
            {"index": 0, "type": "column_en", "x_min_px": 0, "x_max_px": 200},
            {"index": 1, "type": "column_de", "x_min_px": 200, "x_max_px": 400},
            {"index": 2, "type": "column_text", "x_min_px": 400, "x_max_px": 600},
        ]
    return {
        "image_width": 600,
        "image_height": 800,
        "zones": [{
            "columns": columns,
            "cells": cells,
        }],
    }
 def _make_cell(row, col, text, left=0, width=50, col_width=200, col_x=0):
    """Helper to create a cell dict with word_boxes at a specific position."""
    return {
        "cell_id": f"R{row:02d}_C{col}",
        "row_index": row,
        "col_index": col,
        "col_type": "column_text",
        "text": text,
        "confidence": 90.0,
        "bbox_px": {"x": left, "y": row * 25, "w": width, "h": 20},
        "word_boxes": [
            {"text": text, "left": left, "top": row * 25, "width": width, "height": 20, "conf": 90},
        ],
    }
@needs_spellchecker
 class TestAnalyseGrid:
    def test_empty_grid(self):
        result = analyse_grid_for_gutter_repair({"zones": []})
        assert result["suggestions"] == []
        assert result["stats"]["words_checked"] == 0
    def test_detects_spell_fix_at_edge(self):
        # "stammeli" spans x=540..595 in column 2 (400-600): right edge at 97.5% of the column = at gutter
        cells = [
            _make_cell(29, 2, "stammeli", left=540, width=55, col_width=200, col_x=400),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        suggestions = result["suggestions"]
        assert len(suggestions) >= 1
        assert suggestions[0]["type"] == "spell_fix"
        assert suggestions[0]["suggested_text"] == "stammeln"
    def test_detects_hyphen_join(self):
        # Row 30: "ve" at gutter edge, Row 31: "künden"
        cells = [
            _make_cell(30, 2, "ve", left=570, width=25, col_width=200, col_x=400),
            _make_cell(31, 2, "künden", left=410, width=80, col_width=200, col_x=400),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        suggestions = result["suggestions"]
        # Should find hyphen_join or spell_fix
        assert len(suggestions) >= 1
    def test_ignores_known_words(self):
        # "hello" is a known word — should not be suggested
        cells = [
            _make_cell(0, 0, "hello", left=160, width=35),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        # Should not suggest anything for known words
        spell_fixes = [s for s in result["suggestions"] if s["original_text"] == "hello"]
        assert len(spell_fixes) == 0
    def test_ignores_words_not_at_edge(self):
        # "stammeli" at position 10 = NOT at gutter edge
        cells = [
            _make_cell(0, 0, "stammeli", left=10, width=50),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        assert len(result["suggestions"]) == 0
 # ---------------------------------------------------------------------------
 # Apply suggestions tests
 # ---------------------------------------------------------------------------
 class TestApplySuggestions:
    def test_apply_spell_fix(self):
        cells = [
            {"cell_id": "R29_C2", "row_index": 29, "col_index": 2,
             "text": "er stammeli", "word_boxes": []},
        ]
        grid = _make_grid(cells)
        suggestions = [{
            "id": "abc",
            "type": "spell_fix",
            "zone_index": 0,
            "row_index": 29,
            "col_index": 2,
            "original_text": "stammeli",
            "suggested_text": "stammeln",
        }]
        result = apply_gutter_suggestions(grid, ["abc"], suggestions)
        assert result["applied_count"] == 1
        assert grid["zones"][0]["cells"][0]["text"] == "er stammeln"
    def test_apply_hyphen_join(self):
        cells = [
            {"cell_id": "R30_C2", "row_index": 30, "col_index": 2,
             "text": "ve", "word_boxes": []},
            {"cell_id": "R31_C2", "row_index": 31, "col_index": 2,
             "text": "künden und", "word_boxes": []},
        ]
        grid = _make_grid(cells)
        suggestions = [{
            "id": "def",
            "type": "hyphen_join",
            "zone_index": 0,
            "row_index": 30,
            "col_index": 2,
            "original_text": "ve",
            "suggested_text": "verkünden",
            "next_row_index": 31,
            "display_parts": ["ver-", "künden"],
            "missing_chars": "r",
        }]
        result = apply_gutter_suggestions(grid, ["def"], suggestions)
        assert result["applied_count"] == 1
        # Current row: "ve" replaced with "ver-"
        assert grid["zones"][0]["cells"][0]["text"] == "ver-"
        # Next row: UNCHANGED — "künden" stays in its original row
        assert grid["zones"][0]["cells"][1]["text"] == "künden und"
    def test_apply_nothing_when_no_accepted(self):
        grid = _make_grid([])
        result = apply_gutter_suggestions(grid, [], [])
        assert result["applied_count"] == 0
    def test_skip_unknown_suggestion_id(self):
        cells = [
            {"cell_id": "R0_C0", "row_index": 0, "col_index": 0,
             "text": "test", "word_boxes": []},
        ]
        grid = _make_grid(cells)
        suggestions = [{
            "id": "abc",
            "type": "spell_fix",
            "zone_index": 0,
            "row_index": 0,
            "col_index": 0,
            "original_text": "test",
            "suggested_text": "test2",
        }]
        # Accept a non-existent ID
        result = apply_gutter_suggestions(grid, ["nonexistent"], suggestions)
        assert result["applied_count"] == 0
        assert grid["zones"][0]["cells"][0]["text"] == "test"
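The behaviour these tests pin down can be illustrated with a minimal sketch. It is not the real `apply_gutter_suggestions`; the function name `apply_suggestions_sketch` is hypothetical, and the grid layout (`zones` → `cells` with `row_index`/`col_index`/`text`) is assumed from the assertions above. Only accepted IDs are applied, a `spell_fix` swaps the misspelled word inside the cell text, and a `hyphen_join` rewrites only the current row using the first entry of `display_parts`.

```python
from typing import Any, Dict, List


def apply_suggestions_sketch(
    grid: Dict[str, Any],
    accepted_ids: List[str],
    suggestions: List[Dict[str, Any]],
) -> Dict[str, int]:
    """Hypothetical stand-in: apply accepted suggestions to the grid in place."""
    accepted = set(accepted_ids)
    applied = 0
    for s in suggestions:
        if s["id"] not in accepted:
            continue  # user did not accept this suggestion
        zone = grid["zones"][s["zone_index"]]
        cell = next(
            (c for c in zone["cells"]
             if c["row_index"] == s["row_index"]
             and c["col_index"] == s["col_index"]),
            None,
        )
        if cell is None or s["original_text"] not in cell["text"]:
            continue  # nothing to patch
        if s["type"] == "hyphen_join":
            # Only the current row changes; the next row keeps its text.
            new_text = s["display_parts"][0]
        else:
            new_text = s["suggested_text"]
        cell["text"] = cell["text"].replace(s["original_text"], new_text, 1)
        applied += 1
    return {"applied_count": applied}
```

Unknown IDs are skipped silently, which matches the expectation in `test_skip_unknown_suggestion_id` that the grid stays untouched.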
--- a/klausur-service/backend/tests/test_merge_wrapped_rows.py
+++ b/klausur-service/backend/tests/test_merge_wrapped_rows.py
@@ -0,0 +1,135 @@
 """Tests for _merge_wrapped_rows — cell-wrap continuation row merging."""
 import pytest
 import sys
 import os
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
 from cv_cell_grid import _merge_wrapped_rows
 def _entry(row_index, english='', german='', example=''):
    return {
        'row_index': row_index,
        'english': english,
        'german': german,
        'example': example,
    }
 class TestMergeWrappedRows:
    """Test cell-wrap continuation row merging."""
    def test_basic_en_empty_merge(self):
        """EN empty, DE has text → merge DE into previous row."""
        entries = [
            _entry(0, english='take part (in)', german='teilnehmen (an), mitmachen', example='More than 200 singers took'),
            _entry(1, english='', german='(bei)', example='part in the concert.'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'teilnehmen (an), mitmachen (bei)'
        assert result[0]['example'] == 'More than 200 singers took part in the concert.'
    def test_en_empty_de_only(self):
        """EN empty, only DE continuation (no example)."""
        entries = [
            _entry(0, english='competition', german='der Wettbewerb,'),
            _entry(1, english='', german='das Turnier'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'der Wettbewerb, das Turnier'
    def test_en_empty_example_only(self):
        """EN empty, only example continuation."""
        entries = [
            _entry(0, english='to arrive', german='ankommen', example='We arrived at the'),
            _entry(1, english='', german='', example='hotel at midnight.'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['example'] == 'We arrived at the hotel at midnight.'
    def test_de_empty_paren_continuation(self):
        """DE empty, EN starts with parenthetical → merge into previous EN."""
        entries = [
            _entry(0, english='to take part', german='teilnehmen'),
            _entry(1, english='(in)', german=''),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['english'] == 'to take part (in)'
    def test_de_empty_lowercase_continuation(self):
        """DE empty, EN starts lowercase → merge into previous EN."""
        entries = [
            _entry(0, english='to put up', german='aufstellen'),
            _entry(1, english='with sth.', german=''),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['english'] == 'to put up with sth.'
    def test_no_merge_both_have_content(self):
        """Both EN and DE have text → normal row, don't merge."""
        entries = [
            _entry(0, english='house', german='Haus'),
            _entry(1, english='garden', german='Garten'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 2
    def test_no_merge_new_word_uppercase(self):
        """EN has uppercase text, DE is empty → could be a new word, not merged."""
        entries = [
            _entry(0, english='house', german='Haus'),
            _entry(1, english='Garden', german=''),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 2
    def test_triple_wrap(self):
        """Three consecutive wrapped rows → all merge into first."""
        entries = [
            _entry(0, english='competition', german='der Wettbewerb,'),
            _entry(1, english='', german='das Turnier,'),
            _entry(2, english='', german='der Wettkampf'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'der Wettbewerb, das Turnier, der Wettkampf'
    def test_empty_entries(self):
        """Empty list."""
        assert _merge_wrapped_rows([]) == []
    def test_single_entry(self):
        """Single entry unchanged."""
        entries = [_entry(0, english='house', german='Haus')]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
    def test_mixed_normal_and_wrapped(self):
        """Mix of normal rows and wrapped rows."""
        entries = [
            _entry(0, english='house', german='Haus'),
            _entry(1, english='take part (in)', german='teilnehmen (an),'),
            _entry(2, english='', german='mitmachen (bei)'),
            _entry(3, english='garden', german='Garten'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 3
        assert result[0]['english'] == 'house'
        assert result[1]['german'] == 'teilnehmen (an), mitmachen (bei)'
        assert result[2]['english'] == 'garden'
    def test_comma_separator_handling(self):
        """Previous DE ends with comma → joined with a single space, no extra separator."""
        entries = [
            _entry(0, english='word', german='Wort,'),
            _entry(1, english='', german='Ausdruck'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'Wort, Ausdruck'
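The merge rules exercised by this test file can be summarised in a short sketch. This is an assumption reconstructed from the test cases only, not the actual `_merge_wrapped_rows` in `cv_cell_grid`; the name `merge_wrapped_rows_sketch` is hypothetical. A row with empty EN continues the previous row's DE/example text, and a row with empty DE continues the previous EN only if it starts with `(` or a lowercase letter.

```python
from typing import Dict, List


def merge_wrapped_rows_sketch(entries: List[Dict]) -> List[Dict]:
    """Hypothetical stand-in: fold continuation rows into the preceding row."""
    merged: List[Dict] = []
    for entry in entries:
        en = entry.get('english', '').strip()
        de = entry.get('german', '').strip()
        ex = entry.get('example', '').strip()
        prev = merged[-1] if merged else None

        # Case 1: EN empty, DE and/or example present: wrapped cell text.
        if prev is not None and not en and (de or ex):
            for key, text in (('german', de), ('example', ex)):
                if text:
                    prev[key] = f"{prev.get(key, '')} {text}".strip()
            continue

        # Case 2: DE empty, EN looks like a continuation, not a new word.
        if prev is not None and en and not de and (
            en[0] == '(' or en[0].islower()
        ):
            prev['english'] = f"{prev.get('english', '')} {en}".strip()
            if ex:
                prev['example'] = f"{prev.get('example', '')} {ex}".strip()
            continue

        # Otherwise: a normal row. Copy it so the input list is not mutated.
        merged.append(dict(entry))
    return merged
```

An uppercase EN entry with empty DE is deliberately kept as its own row, since it may be a new vocabulary word whose translation was simply not recognised.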
--- a/klausur-service/backend/tests/test_page_crop.py
+++ b/klausur-service/backend/tests/test_page_crop.py
@@ -18,6 +18,7 @@ from page_crop import (
    detect_page_splits,
    _detect_format,
    _detect_edge_projection,
    _detect_gutter_continuity,
    _detect_left_edge_shadow,
    _detect_right_edge_shadow,
    _detect_spine_shadow,
@@ -564,3 +565,110 @@ class TestDetectPageSplits:
            assert pages[0]["x"] == 0
            total_w = sum(p["width"] for p in pages)
            assert total_w == w, f"Total page width {total_w} != image width {w}"
 # ---------------------------------------------------------------------------
 # Tests: _detect_gutter_continuity (camera book scans)
 # ---------------------------------------------------------------------------
 def _make_camera_book_scan(h: int = 2400, w: int = 1700, gutter_side: str = "right") -> np.ndarray:
    """Create a synthetic camera book scan with a subtle gutter shadow.
    Camera gutter shadows are much subtler than scanner shadows:
    - Page brightness ~250 (well-lit)
    - Gutter brightness ~210-230 (slight shadow)
    - Shadow runs continuously from top to bottom
    - Gradient spans ~7% of the page width (~119px at the default w=1700)
    """
    img = np.full((h, w, 3), 250, dtype=np.uint8)
    # Seeded RNG so the scattered text pixels below are reproducible
    rng = np.random.RandomState(99)
    # Subtle gutter gradient at the specified side
    gutter_w = int(w * 0.04)  # ~4% of width
    gradient_w = int(w * 0.03)  # transition zone
    if gutter_side == "right":
        gutter_start = w - gutter_w - gradient_w
        for x in range(gutter_start, w):
            dist_from_start = x - gutter_start
            # Linear gradient from 250 down to 210
            brightness = int(250 - 40 * min(dist_from_start / (gutter_w + gradient_w), 1.0))
            img[:, x] = brightness
    else:
        gutter_end = gutter_w + gradient_w
        for x in range(gutter_end):
            dist_from_edge = gutter_end - x
            brightness = int(250 - 40 * min(dist_from_edge / (gutter_w + gradient_w), 1.0))
            img[:, x] = brightness
    # Scatter some text (dark pixels) in the content area
    content_left = gutter_end + 20 if gutter_side == "left" else 50
    content_right = gutter_start - 20 if gutter_side == "right" else w - 50
    for _ in range(800):
        y = rng.randint(h // 10, h - h // 10)
        x = rng.randint(content_left, content_right)
        y2 = min(y + 3, h)
        x2 = min(x + 15, w)
        img[y:y2, x:x2] = 20
    return img
 class TestDetectGutterContinuity:
    """Tests for camera gutter shadow detection via vertical continuity."""
    def test_detects_right_gutter(self):
        """Should detect a subtle gutter shadow on the right side."""
        img = _make_camera_book_scan(gutter_side="right")
        h, w = img.shape[:2]
        gray = np.mean(img, axis=2).astype(np.uint8)
        search_w = w // 4
        right_start = w - search_w
        result = _detect_gutter_continuity(
            gray, gray[:, right_start:], right_start, w, "right",
        )
        assert result is not None
        # Gutter starts roughly at 93% of width (w - 4% - 3%)
        assert result > w * 0.85, f"Gutter x={result} too far left"
        assert result < w * 0.98, f"Gutter x={result} too close to edge"
    def test_detects_left_gutter(self):
        """Should detect a subtle gutter shadow on the left side."""
        img = _make_camera_book_scan(gutter_side="left")
        h, w = img.shape[:2]
        gray = np.mean(img, axis=2).astype(np.uint8)
        search_w = w // 4
        result = _detect_gutter_continuity(
            gray, gray[:, :search_w], 0, w, "left",
        )
        assert result is not None
        assert result > w * 0.02, f"Gutter x={result} too close to edge"
        assert result < w * 0.15, f"Gutter x={result} too far right"
    def test_no_gutter_on_clean_page(self):
        """Should NOT detect a gutter on a uniformly bright page."""
        img = np.full((2000, 1600, 3), 250, dtype=np.uint8)
        # Add some text but no gutter
        rng = np.random.RandomState(42)
        for _ in range(500):
            y = rng.randint(100, 1900)
            x = rng.randint(100, 1500)
            img[y:min(y+3, 2000), x:min(x+15, 1600)] = 20
        gray = np.mean(img, axis=2).astype(np.uint8)
        w = 1600
        search_w = w // 4
        right_start = w - search_w
        result_r = _detect_gutter_continuity(gray, gray[:, right_start:], right_start, w, "right")
        result_l = _detect_gutter_continuity(gray, gray[:, :search_w], 0, w, "left")
        assert result_r is None, f"False positive on right: x={result_r}"
        assert result_l is None, f"False positive on left: x={result_l}"
    def test_integrated_with_crop(self):
        """End-to-end: detect_and_crop_page should crop at the gutter."""
        img = _make_camera_book_scan(gutter_side="right")
        cropped, result = detect_and_crop_page(img)
        # The right border fraction should exceed 1% (gutter cropped)
        right_border = result["border_fractions"]["right"]
        assert right_border > 0.01, f"Right border {right_border} — gutter not cropped"
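The idea of "vertical continuity" behind these tests can be sketched as follows. This is not the real `_detect_gutter_continuity` from `page_crop`; the function name, the `drop` and `min_rows` parameters, and the percentile-based page level are all assumptions made for illustration. A column counts as gutter shadow only if it is darker than the paper in most rows, which is what separates a continuous shadow from scattered text. Plain lists are used instead of NumPy so the sketch runs without dependencies.

```python
from typing import List, Optional, Sequence


def find_gutter_column_sketch(
    strip: Sequence[Sequence[int]],
    offset: int,
    side: str,
    drop: int = 15,
    min_rows: float = 0.8,
) -> Optional[int]:
    """Hypothetical stand-in: return the gutter x-position in full-image
    coordinates, or None if no vertically continuous shadow exists.

    strip  -- grayscale rows of the search region (one list per row)
    offset -- x-position of the strip's first column in the full image
    side   -- "left" or "right": which page edge the strip covers
    """
    height = len(strip)
    if height == 0 or not strip[0]:
        return None
    width = len(strip[0])

    # Estimate paper brightness as the 90th percentile of all pixels.
    values = sorted(v for row in strip for v in row)
    page_level = values[int(0.9 * (len(values) - 1))]

    # A shadow column is darker than the paper in at least min_rows of rows.
    shadow_cols: List[int] = []
    for x in range(width):
        dark_rows = sum(1 for row in strip if row[x] < page_level - drop)
        if dark_rows / height >= min_rows:
            shadow_cols.append(x)
    if not shadow_cols:
        return None

    # Report the shadow column nearest the page content.
    x = min(shadow_cols) if side == "right" else max(shadow_cols)
    return offset + x
```

Text pixels are dark but occupy only a few rows per column, so they stay below `min_rows` and do not trigger a detection on a clean page.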