Fix: reprocess button works after session resume + apply merge logic

Two bugs fixed:

1. reprocessPages() failed silently after session resume because
   successfulPages was empty. Now derives pages from vocabulary source_page
   or selectedPages as fallback.
2. process-single-page endpoint built vocabulary entries WITHOUT applying
   merge logic (_merge_wrapped_rows, _merge_continuation_rows). Now applies
   full merge pipeline after vocabulary extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
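For illustration, the page-derivation fallback from the first fix could look roughly like the sketch below. The hook code itself is not part of this excerpt, so the names `VocabEntry` and `derivePagesToReprocess`, and the exact shape of `source_page`, are assumptions made for this example only.

```typescript
// Hypothetical entry shape; the real type lives in the vocab-worksheet code.
interface VocabEntry {
  english: string
  german: string
  source_page?: number | null
}

// Decide which pages to reprocess:
// 1. use successfulPages when it is populated (fresh session),
// 2. otherwise collect the distinct source_page values from the vocabulary,
// 3. otherwise fall back to the pages the user selected.
function derivePagesToReprocess(
  successfulPages: number[],
  vocabulary: VocabEntry[],
  selectedPages: number[],
): number[] {
  if (successfulPages.length > 0) return successfulPages.slice()

  const fromVocab = new Set<number>()
  for (const entry of vocabulary) {
    if (typeof entry.source_page === 'number') fromVocab.add(entry.source_page)
  }
  if (fromVocab.size > 0) {
    return Array.from(fromVocab).sort((a, b) => a - b)
  }

  return selectedPages.slice()
}
```

After a session resume `successfulPages` is empty, so the second or third branch decides which pages are sent for reprocessing instead of the call doing nothing.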
Add "Neu verarbeiten" button to VocabularyTab
2026-04-17 00:46:15 +02:00 · 2026-04-16 08:37:13 +02:00 · 2026-04-16 08:32:45 +02:00 · 2026-04-16 07:26:44 +02:00 · 2026-04-16 07:22:52 +02:00 · 2026-04-16 07:13:23 +02:00
114 changed files with 661412 additions and 5600 deletions
--- a/.claude/rules/ocr-pipeline-extensions.md
+++ b/.claude/rules/ocr-pipeline-extensions.md
@@ -0,0 +1,237 @@
+# OCR Pipeline Extensions - Developer Documentation
+
+**Status:** In production
+**Last updated:** 2026-04-15
+**URL:** https://macmini:3002/ai/ocr-kombi
+
+---
+
+## Overview
+
+Extensions to the OCR Kombi pipeline (14 steps, 0-13):
+- **SmartSpellChecker**: LLM-free OCR correction with language detection
+- **Box-Grid-Review** (Step 11): processes embedded boxes
+- **Ansicht/Spreadsheet** (Step 12): Fortune Sheet Excel-style editor
+
+---
+
+## Pipeline Steps
+
+| Step | ID | Name | Component |
+|------|----|------|------------|
+| 0 | upload | Upload | StepUpload |
+| 1 | orientation | Orientierung | StepOrientation |
+| 2 | page-split | Seitentrennung | StepPageSplit |
+| 3 | deskew | Begradigung | StepDeskew |
+| 4 | dewarp | Entzerrung | StepDewarp |
+| 5 | content-crop | Zuschneiden | StepContentCrop |
+| 6 | ocr | OCR | StepOcr |
+| 7 | structure | Strukturerkennung | StepStructure |
+| 8 | grid-build | Grid-Aufbau | StepGridBuild |
+| 9 | grid-review | Grid-Review | StepGridReview |
+| 10 | gutter-repair | Wortkorrektur | StepGutterRepair |
+| **11** | **box-review** | **Box-Review** | **StepBoxGridReview** |
+| **12** | **ansicht** | **Ansicht** | **StepAnsicht** |
+| 13 | ground-truth | Ground Truth | StepGroundTruth |
+
+Step definitions: `admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts`
+
+---
+
+## SmartSpellChecker
+
+**File:** `klausur-service/backend/smart_spell.py`
+**Tests:** `tests/test_smart_spell.py` (43 tests)
+**License:** pyspellchecker only (MIT); no LLM, no Hunspell
+
+### Features
+
+| Feature | Method |
+|---------|---------|
+| Language detection | Dual-dictionary EN/DE heuristic |
+| a/I disambiguation | Bigram context (next-word lookup) |
+| Boundary repair | Frequency-based: `Pound sand`→`Pounds and` |
+| Context split | `anew`→`a new` (allow/deny list) |
+| Multi-digit | BFS: `sch00l`→`school` |
+| Cross-language guard | Do not miscorrect German words in the EN column |
+| Umlaut correction | `Schuler`→`Schueler` |
+| IPA protection | Never modify content inside [brackets] |
+| Slash→l | `p/`→`pl` (italic l recognized as /) |
+| Abbreviations | 120+ from `_KNOWN_ABBREVIATIONS` |
+
+### Integration
+
+```python
+# In cv_review.py (LLM Review Step):
+from smart_spell import SmartSpellChecker
+_smart = SmartSpellChecker()
+result = _smart.correct_text(text, lang="en")  # or "de" or "auto"
+
+# In grid_editor_api.py (Grid Build + Box Build):
+# Applied automatically after grid build and box grid build
+```
+
+### Frequency Scoring
+
+Boundary repair compares products of word frequencies:
+- `old_freq = word_freq(w1) * word_freq(w2)`
+- `new_freq = word_freq(repaired_w1) * word_freq(repaired_w2)`
+- Accepted if `new_freq > old_freq * 5`
+- Abbreviation bonus only if the original words are rare (freq < 1e-6)
+
+---
+
+## Box-Grid-Review (Step 11)
+
+**Frontend:** `admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx`
+**Backend:** `klausur-service/backend/cv_box_layout.py`, `grid_editor_api.py`
+**Tests:** `tests/test_box_layout.py` (13 Tests)
+
+### Backend-Endpoints
+
+```
+POST /api/v1/ocr-pipeline/sessions/{id}/build-box-grids
+```
+
+Processes all detected boxes from `structure_result`:
+1. Filters out header/footer boxes (top/bottom 7% of the image height)
+2. Extracts the OCR words per box from `raw_paddle_words`
+3. Classifies the layout: `flowing` | `columnar` | `bullet_list` | `header_only`
+4. Builds the grid with layout-specific logic
+5. Applies SmartSpellChecker
+
+### Box Layout Classification (`cv_box_layout.py`)
+
+| Layout | Detection | Grid build |
+|--------|-----------|-------------|
+| `header_only` | ≤5 words or 1 line | 1 cell, everything together |
+| `flowing` | Uniform line width | 1 column, bullets grouped by indentation |
+| `bullet_list` | ≥40% of lines with a bullet marker | 1 column, bullet items |
+| `columnar` | Multiple X clusters | Standard column detection |
+
+### Bullet Indentation
+
+Detection via left-edge analysis:
+- Minimum indentation = bullet level
+- Lines indented more than 15px beyond that = continuation lines
+- Continuation lines are merged into the bullet cell with `\n`
+- Missing `•` markers are added automatically
+
+### Colspan Detection (`grid_editor_helpers.py`)
+
+Generic function `_detect_colspan_cells()`:
+- Runs after `_build_cells()` for ALL zones
+- Uses the original word blocks (before `_split_cross_column_words`)
+- A word block that extends across a column boundary → `spanning_header` with `colspan=N`
+- Example: "In Britain you pay with pounds and pence." across 2 columns
+
+### Column Detection in Boxes
+
+For small zones (≤60 words):
+- `gap_threshold = max(median_h * 1.0, 25)` instead of `3x median`
+- PaddleOCR returns multi-word blocks → all gaps are column gaps
+
+---
+
+## Ansicht / Spreadsheet (Step 12)
+
+**Frontend:** `admin-lehrer/components/ocr-kombi/StepAnsicht.tsx`, `SpreadsheetView.tsx`
+**Library:** `@fortune-sheet/react` (MIT, v1.0.4)
+
+### Architecture
+
+Split view:
+- **Left:** original scan with OCR overlay (`/image/words-overlay`)
+- **Right:** Fortune Sheet spreadsheet with multi-sheet tabs
+
+### Multi-Sheet Approach
+
+Each zone becomes its own sheet tab:
+- Sheet "Vokabeln": main grid with EN/DE columns
+- Sheet "Pounds and euros": Box 1 with its own 4 columns
+- Sheet "German leihen": Box 2 as flowing text
+
+Reason: column widths are optimized per zone. Excel limitation: a column width applies to the entire column.
+
+### Cell Formatting
+
+| Format | Source | Fortune Sheet property |
+|--------|--------|----------------------|
+| Bold | `is_header`, `is_bold`, larger font | `bl: 1` |
+| Font color | OCR word_boxes color | `fc: '#hex'` |
+| Background | Box bg_hex, header | `bg: '#hex08'` |
+| Text wrap | Multi-line cells (\n) | `tb: '2'` |
+| Vertical top | Multi-line cells | `vt: 0` |
+| Larger font | word_box height >1.3x median | `fs: 12` |
+
+### Column Widths
+
+Auto-fit: `max(laengster_text * 7.5 + 16, original_px * scaleFactor)`
+
+### Toolbar
+
+`undo, redo, font-bold, font-italic, font-strikethrough, font-color, background, font-size, horizontal-align, vertical-align, text-wrap, merge-cell, border`
+
+---
+
+## Unified Grid (Backend)
+
+**File:** `klausur-service/backend/unified_grid.py`
+**Tests:** `tests/test_unified_grid.py` (10 tests)
+
+Merges all zones into a single grid (for export/analysis):
+
+```
+POST /api/v1/ocr-pipeline/sessions/{id}/build-unified-grid
+GET  /api/v1/ocr-pipeline/sessions/{id}/unified-grid
+```
+
+- Dominant row height = median of the content-row spacings
+- Full-width boxes: rows are integrated directly
+- Partial-width boxes: extra rows are inserted when the box has more rows
+- Box cells carry `source_zone_type: "box"` and `box_region` metadata
+
+---
+
+## File Structure
+
+### Backend (klausur-service)
+
+| File | Lines | Description |
+|-------|--------|--------------|
+| `grid_build_core.py` | 1943 | `_build_grid_core()`: main grid build |
+| `grid_editor_api.py` | 474 | REST endpoints (build, save, get, gutter, box, unified) |
+| `grid_editor_helpers.py` | 1737 | Helpers: columns, rows, cells, colspan, header |
+| `smart_spell.py` | 587 | SmartSpellChecker |
+| `cv_box_layout.py` | 339 | Box layout classification + grid build |
+| `unified_grid.py` | 425 | Unified grid builder |
+
+### Frontend (admin-lehrer)
+
+| File | Lines | Description |
+|-------|--------|--------------|
+| `StepBoxGridReview.tsx` | 283 | Box-Review step 11 |
+| `StepAnsicht.tsx` | 112 | Ansicht step 12 (split view) |
+| `SpreadsheetView.tsx` | ~160 | Fortune Sheet integration |
+| `GridTable.tsx` | 652 | Grid editor table (steps 9-11) |
+| `useGridEditor.ts` | 985 | Grid editor hook |
+
+### Tests
+
+| File | Tests | Description |
+|-------|-------|--------------|
+| `test_smart_spell.py` | 43 | Language detection, boundary repair, IPA protection |
+| `test_box_layout.py` | 13 | Layout classification, bullet grouping |
+| `test_unified_grid.py` | 10 | Unified grid, box classification |
+| **Total** | **66** | |
+
+---
+
+## Changelog
+
+| Date | Change |
+|-------|-----------|
+| 2026-04-15 | Fortune Sheet multi-sheet tabs, bullet points, auto-fit, refactoring |
+| 2026-04-14 | Unified grid, Ansicht step, colspan detection |
+| 2026-04-13 | Box-Grid-Review step, columns in boxes, header/footer filter |
+| 2026-04-12 | SmartSpellChecker, frequency scoring, IPA protection, vocab worksheet refactoring |
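Review note on the frequency-scoring rule documented in `ocr-pipeline-extensions.md` above: the acceptance test can be sketched as follows. The real implementation is Python in `smart_spell.py` and is not part of this excerpt, so this TypeScript sketch only illustrates the arithmetic. `wordFreq`, the toy frequency table, and `acceptBoundaryRepair` are hypothetical names, and the abbreviation bonus is left out because the documentation does not say how it is computed.

```typescript
// Toy frequency table for illustration only; the documented checker takes
// its word frequencies from pyspellchecker.
const TOY_FREQ = new Map<string, number>([
  ['pound', 2e-5],
  ['sand', 1e-5],
  ['pounds', 3e-5],
  ['and', 2e-2],
])

// Unknown words get frequency 0 in this sketch.
function wordFreq(word: string): number {
  const f = TOY_FREQ.get(word.toLowerCase())
  return f === undefined ? 0 : f
}

// Accept a boundary repair only if the repaired pair is more than `factor`
// times as likely as the original pair (factor 5 per the documentation).
function acceptBoundaryRepair(
  w1: string,
  w2: string,
  repaired1: string,
  repaired2: string,
  factor: number = 5,
): boolean {
  const oldFreq = wordFreq(w1) * wordFreq(w2)
  const newFreq = wordFreq(repaired1) * wordFreq(repaired2)
  return newFreq > oldFreq * factor
}
```

With the toy numbers, `Pound sand` → `Pounds and` is accepted because the very frequent `and` dominates the product, while the reverse direction is rejected. Requiring a factor instead of a plain `>` keeps the checker from flipping between two splits that are about equally likely.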
--- a/.claude/rules/vocab-worksheet.md
+++ b/.claude/rules/vocab-worksheet.md
@@ -188,11 +188,35 @@ ssh macmini "docker compose up -d klausur-service studio-v2"

 ---

+## Frontend Refactoring (2026-04-12)
+
+`page.tsx` was split from 2337 lines into 14 files:
+
+```
+studio-v2/app/vocab-worksheet/
+├── page.tsx                 # 198 lines, orchestrator
+├── types.ts                 # Interfaces, VocabWorksheetHook
+├── constants.ts             # API base, formats, defaults
+├── useVocabWorksheet.ts     # 843 lines, custom hook (all state + logic)
+└── components/
+    ├── UploadScreen.tsx      # Session list + document selection
+    ├── PageSelection.tsx     # PDF page selection
+    ├── VocabularyTab.tsx     # Vocabulary table + IPA/syllables
+    ├── WorksheetTab.tsx      # Format selection + configuration
+    ├── ExportTab.tsx         # PDF download
+    ├── OcrSettingsPanel.tsx   # OCR filter settings
+    ├── FullscreenPreview.tsx  # Fullscreen preview modal
+    ├── QRCodeModal.tsx        # QR upload modal
+    └── OcrComparisonModal.tsx # OCR comparison modal
+```
+
+---
+
 ## Extension: Adding New Formats

 1. **Backend**: Create a new generator in `klausur-service/backend/`
 2. **API**: Add a new endpoint in `vocab_worksheet_api.py`
-3. **Frontend**: Add the format to the `worksheetFormats` array in `page.tsx`
+3. **Frontend**: Add the format to the `worksheetFormats` array in `constants.ts`
 4. **Docs**: Update this file

 ---
--- a/admin-lehrer/app/(admin)/ai/ocr-kombi/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-kombi/page.tsx
@@ -0,0 +1,174 @@
+'use client'
+
+import { Suspense } from 'react'
+import { PagePurpose } from '@/components/common/PagePurpose'
+import { KombiStepper } from '@/components/ocr-kombi/KombiStepper'
+import { SessionList } from '@/components/ocr-kombi/SessionList'
+import { SessionHeader } from '@/components/ocr-kombi/SessionHeader'
+import { StepUpload } from '@/components/ocr-kombi/StepUpload'
+import { StepOrientation } from '@/components/ocr-kombi/StepOrientation'
+import { StepPageSplit } from '@/components/ocr-kombi/StepPageSplit'
+import { StepDeskew } from '@/components/ocr-kombi/StepDeskew'
+import { StepDewarp } from '@/components/ocr-kombi/StepDewarp'
+import { StepContentCrop } from '@/components/ocr-kombi/StepContentCrop'
+import { StepOcr } from '@/components/ocr-kombi/StepOcr'
+import { StepStructure } from '@/components/ocr-kombi/StepStructure'
+import { StepGridBuild } from '@/components/ocr-kombi/StepGridBuild'
+import { StepGridReview } from '@/components/ocr-kombi/StepGridReview'
+import { StepGutterRepair } from '@/components/ocr-kombi/StepGutterRepair'
+import { StepBoxGridReview } from '@/components/ocr-kombi/StepBoxGridReview'
+import { StepAnsicht } from '@/components/ocr-kombi/StepAnsicht'
+import { StepGroundTruth } from '@/components/ocr-kombi/StepGroundTruth'
+import { useKombiPipeline } from './useKombiPipeline'
+
+function OcrKombiContent() {
+  const {
+    currentStep,
+    sessionId,
+    sessionName,
+    loadingSessions,
+    activeCategory,
+    isGroundTruth,
+    pageNumber,
+    steps,
+    gridSaveRef,
+    groupedSessions,
+    loadSessions,
+    openSession,
+    handleStepClick,
+    handleNext,
+    handleNewSession,
+    deleteSession,
+    renameSession,
+    updateCategory,
+    setSessionId,
+    setSessionName,
+    setIsGroundTruth,
+  } = useKombiPipeline()
+
+  const renderStep = () => {
+    switch (currentStep) {
+      case 0:
+        return (
+          <StepUpload
+            sessionId={sessionId}
+            onUploaded={(sid, name) => {
+              setSessionId(sid)
+              setSessionName(name)
+              loadSessions()
+            }}
+            onNext={handleNext}
+          />
+        )
+      case 1:
+        return (
+          <StepOrientation
+            sessionId={sessionId}
+            onNext={() => handleNext()}
+            onSessionList={() => { loadSessions(); handleNewSession() }}
+          />
+        )
+      case 2:
+        return (
+          <StepPageSplit
+            sessionId={sessionId}
+            sessionName={sessionName}
+            onNext={handleNext}
+            onSplitComplete={(childId, childName) => {
+              // Switch to the first child session and refresh the list
+              setSessionId(childId)
+              setSessionName(childName)
+              loadSessions()
+            }}
+          />
+        )
+      case 3:
+        return <StepDeskew sessionId={sessionId} onNext={handleNext} />
+      case 4:
+        return <StepDewarp sessionId={sessionId} onNext={handleNext} />
+      case 5:
+        return <StepContentCrop sessionId={sessionId} onNext={handleNext} />
+      case 6:
+        return <StepOcr sessionId={sessionId} onNext={handleNext} />
+      case 7:
+        return <StepStructure sessionId={sessionId} onNext={handleNext} />
+      case 8:
+        return <StepGridBuild sessionId={sessionId} onNext={handleNext} />
+      case 9:
+        return <StepGridReview sessionId={sessionId} onNext={handleNext} saveRef={gridSaveRef} />
+      case 10:
+        return <StepGutterRepair sessionId={sessionId} onNext={handleNext} />
+      case 11:
+        return <StepBoxGridReview sessionId={sessionId} onNext={handleNext} />
+      case 12:
+        return <StepAnsicht sessionId={sessionId} onNext={handleNext} />
+      case 13:
+        return (
+          <StepGroundTruth
+            sessionId={sessionId}
+            isGroundTruth={isGroundTruth}
+            onMarked={() => setIsGroundTruth(true)}
+            gridSaveRef={gridSaveRef}
+          />
+        )
+      default:
+        return null
+    }
+  }
+
+  return (
+    <div className="space-y-6">
+      <PagePurpose
+        title="OCR Kombi Pipeline"
+        purpose="Modulare 14-Schritt-Pipeline: Upload, Vorverarbeitung, Dual-Engine-OCR (PP-OCRv5 + Tesseract), Strukturerkennung, Grid-Aufbau und Review. Multi-Page-Dokument-Unterstuetzung."
+        audience={['Entwickler']}
+        architecture={{
+          services: ['klausur-service (FastAPI)', 'OpenCV', 'Tesseract', 'PaddleOCR'],
+          databases: ['PostgreSQL Sessions'],
+        }}
+        relatedPages={[
+          { name: 'OCR Overlay (Legacy)', href: '/ai/ocr-overlay', description: 'Alter 3-Modi-Monolith' },
+          { name: 'OCR Regression', href: '/ai/ocr-regression', description: 'Regressionstests' },
+        ]}
+        defaultCollapsed
+      />
+
+      <SessionList
+        items={groupedSessions()}
+        loading={loadingSessions}
+        activeSessionId={sessionId}
+        onOpenSession={(sid) => openSession(sid)}
+        onNewSession={handleNewSession}
+        onDeleteSession={deleteSession}
+        onRenameSession={renameSession}
+        onUpdateCategory={updateCategory}
+      />
+
+      {sessionId && sessionName && (
+        <SessionHeader
+          sessionName={sessionName}
+          activeCategory={activeCategory}
+          isGroundTruth={isGroundTruth}
+          pageNumber={pageNumber}
+          onUpdateCategory={(cat) => updateCategory(sessionId, cat)}
+        />
+      )}
+
+      <KombiStepper
+        steps={steps}
+        currentStep={currentStep}
+        onStepClick={handleStepClick}
+      />
+
+      <div className="min-h-[400px]">{renderStep()}</div>
+    </div>
+  )
+}
+
+export default function OcrKombiPage() {
+  return (
+    <Suspense fallback={<div className="p-4 text-sm text-gray-400">Lade...</div>}>
+      <OcrKombiContent />
+    </Suspense>
+  )
+}
--- a/admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts
@@ -0,0 +1,123 @@
+import type { PipelineStep, PipelineStepStatus, DocumentCategory } from '../ocr-pipeline/types'
+
+// Re-export shared types
+export type { PipelineStep, PipelineStepStatus, DocumentCategory }
+export { DOCUMENT_CATEGORIES } from '../ocr-pipeline/types'
+
+// Re-export grid/structure types used by later steps
+export type {
+  SessionListItem,
+  SessionInfo,
+  OrientationResult,
+  CropResult,
+  DeskewResult,
+  DewarpResult,
+  GridResult,
+  GridCell,
+  OcrWordBox,
+  WordBbox,
+  ColumnMeta,
+  StructureResult,
+  StructureBox,
+  StructureZone,
+  StructureGraphic,
+  ExcludeRegion,
+} from '../ocr-pipeline/types'
+
+/**
+ * 14-step Kombi V2 pipeline (UI steps 0-13).
+ * Each step has its own component file in components/ocr-kombi/.
+ */
+export const KOMBI_V2_STEPS: PipelineStep[] = [
+  { id: 'upload',        name: 'Upload',             icon: '📤', status: 'pending' },
+  { id: 'orientation',   name: 'Orientierung',       icon: '🔄', status: 'pending' },
+  { id: 'page-split',    name: 'Seitentrennung',     icon: '📖', status: 'pending' },
+  { id: 'deskew',        name: 'Begradigung',        icon: '📐', status: 'pending' },
+  { id: 'dewarp',        name: 'Entzerrung',         icon: '🔧', status: 'pending' },
+  { id: 'content-crop',  name: 'Zuschneiden',        icon: '✂️', status: 'pending' },
+  { id: 'ocr',           name: 'OCR',                icon: '🔀', status: 'pending' },
+  { id: 'structure',     name: 'Strukturerkennung',  icon: '🔍', status: 'pending' },
+  { id: 'grid-build',    name: 'Grid-Aufbau',        icon: '🧱', status: 'pending' },
+  { id: 'grid-review',   name: 'Grid-Review',        icon: '📊', status: 'pending' },
+  { id: 'gutter-repair', name: 'Wortkorrektur',      icon: '🩹', status: 'pending' },
+  { id: 'box-review',    name: 'Box-Review',          icon: '📦', status: 'pending' },
+  { id: 'ansicht',       name: 'Ansicht',             icon: '👁️', status: 'pending' },
+  { id: 'ground-truth',  name: 'Ground Truth',       icon: '✅', status: 'pending' },
+]
+
+/** Map from Kombi V2 UI step index to DB step number */
+export const KOMBI_V2_UI_TO_DB: Record<number, number> = {
+  0: 1,   // upload
+  1: 2,   // orientation
+  2: 2,   // page-split (same DB step as orientation)
+  3: 3,   // deskew
+  4: 4,   // dewarp
+  5: 5,   // content-crop
+  6: 8,   // ocr (word_result)
+  7: 9,   // structure
+  8: 10,  // grid-build
+  9: 11,  // grid-review
+  10: 11, // gutter-repair (shares DB step with grid-review)
+  11: 11, // box-review (shares DB step with grid-review)
+  12: 11, // ansicht (shares DB step with grid-review)
+  13: 12, // ground-truth
+}
+
+/** Map from DB step to Kombi V2 UI step index */
+export function dbStepToKombiV2Ui(dbStep: number): number {
+  if (dbStep <= 1) return 0   // upload
+  if (dbStep === 2) return 1  // orientation
+  if (dbStep === 3) return 3  // deskew
+  if (dbStep === 4) return 4  // dewarp
+  if (dbStep === 5) return 5  // content-crop
+  if (dbStep <= 8) return 6   // ocr
+  if (dbStep === 9) return 7  // structure
+  if (dbStep === 10) return 8 // grid-build
+  if (dbStep === 11) return 9 // grid-review
+  return 13                   // ground-truth
+}
+
+/** Document group: groups multiple sessions from a multi-page upload */
+export interface DocumentGroup {
+  group_id: string
+  title: string
+  page_count: number
+  sessions: DocumentGroupSession[]
+}
+
+export interface DocumentGroupSession {
+  id: string
+  name: string
+  page_number: number
+  current_step: number
+  status: string
+  document_category?: DocumentCategory
+  created_at: string
+}
+
+/** Engine source for OCR transparency */
+export type OcrEngineSource = 'both' | 'paddle_only' | 'tesseract_only' | 'conflict_paddle' | 'conflict_tesseract'
+
+export interface OcrTransparentWord {
+  text: string
+  left: number
+  top: number
+  width: number
+  height: number
+  conf: number
+  engine_source: OcrEngineSource
+}
+
+export interface OcrTransparentResult {
+  raw_tesseract: { words: OcrTransparentWord[] }
+  raw_paddle: { words: OcrTransparentWord[] }
+  merged: { words: OcrTransparentWord[] }
+  stats: {
+    total_words: number
+    both_agree: number
+    paddle_only: number
+    tesseract_only: number
+    conflict_paddle_wins: number
+    conflict_tesseract_wins: number
+  }
+}
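Review note on `types.ts`: several UI steps share one DB step, so `KOMBI_V2_UI_TO_DB` and `dbStepToKombiV2Ui` are not inverses of each other. The sketch below restates both mappings from the diff above in self-contained form and adds two helpers, `uiStepsForDbStep` and `lossyUiSteps`, which are hypothetical and exist only to make the many-to-one relationship visible.

```typescript
// UI step index -> DB step number, restated from types.ts in this diff.
const UI_TO_DB: Record<number, number> = {
  0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 8,
  7: 9, 8: 10, 9: 11, 10: 11, 11: 11, 12: 11, 13: 12,
}

// DB step -> UI step index, same branches as dbStepToKombiV2Ui.
function dbToUi(dbStep: number): number {
  if (dbStep <= 1) return 0
  if (dbStep === 2) return 1
  if (dbStep === 3) return 3
  if (dbStep === 4) return 4
  if (dbStep === 5) return 5
  if (dbStep <= 8) return 6
  if (dbStep === 9) return 7
  if (dbStep === 10) return 8
  if (dbStep === 11) return 9
  return 13
}

// Hypothetical helper: all UI steps that persist to the given DB step.
function uiStepsForDbStep(dbStep: number): number[] {
  return Object.keys(UI_TO_DB)
    .map(Number)
    .filter(ui => UI_TO_DB[ui] === dbStep)
    .sort((a, b) => a - b)
}

// Hypothetical helper: UI steps that do not survive a UI -> DB -> UI round trip.
function lossyUiSteps(): number[] {
  return Object.keys(UI_TO_DB)
    .map(Number)
    .filter(ui => dbToUi(UI_TO_DB[ui]) !== ui)
    .sort((a, b) => a - b)
}
```

Page-split, gutter-repair, box-review and ansicht cannot be recovered from `current_step` alone, which is consistent with `openSession` in `useKombiPipeline.ts` inspecting result fields such as `grid_editor_result` and `ground_truth.gutter_repair` before falling back to the DB step.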
--- a/admin-lehrer/app/(admin)/ai/ocr-kombi/useKombiPipeline.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-kombi/useKombiPipeline.ts
@@ -0,0 +1,299 @@
+'use client'
+
+import { useCallback, useEffect, useState, useRef } from 'react'
+import { useSearchParams } from 'next/navigation'
+import type { PipelineStep, DocumentCategory } from './types'
+import { KOMBI_V2_STEPS, dbStepToKombiV2Ui } from './types'
+import type { SessionListItem } from '../ocr-pipeline/types'
+
+export type { SessionListItem }
+
+const KLAUSUR_API = '/klausur-api'
+
+/** Groups sessions by document_group_id for the session list */
+export interface DocumentGroupView {
+  group_id: string
+  title: string
+  sessions: SessionListItem[]
+  page_count: number
+}
+
+function initSteps(): PipelineStep[] {
+  return KOMBI_V2_STEPS.map((s, i) => ({
+    ...s,
+    status: i === 0 ? 'active' : 'pending',
+  }))
+}
+
+export function useKombiPipeline() {
+  const [currentStep, setCurrentStep] = useState(0)
+  const [sessionId, setSessionId] = useState<string | null>(null)
+  const [sessionName, setSessionName] = useState('')
+  const [sessions, setSessions] = useState<SessionListItem[]>([])
+  const [loadingSessions, setLoadingSessions] = useState(true)
+  const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
+  const [isGroundTruth, setIsGroundTruth] = useState(false)
+  const [pageNumber, setPageNumber] = useState<number | null>(null)
+  const [steps, setSteps] = useState<PipelineStep[]>(initSteps())
+
+  const searchParams = useSearchParams()
+  const deepLinkHandled = useRef(false)
+  const gridSaveRef = useRef<(() => Promise<void>) | null>(null)
+
+  // ---- Session loading ----
+
+  const loadSessions = useCallback(async () => {
+    setLoadingSessions(true)
+    try {
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
+      if (res.ok) {
+        const data = await res.json()
+        setSessions((data.sessions || []).filter((s: SessionListItem) => !s.parent_session_id))
+      }
+    } catch (e) {
+      console.error('Failed to load sessions:', e)
+    } finally {
+      setLoadingSessions(false)
+    }
+  }, [])
+
+  useEffect(() => { loadSessions() }, [loadSessions])
+
+  // ---- Group sessions by document_group_id ----
+
+  const groupedSessions = useCallback((): (SessionListItem | DocumentGroupView)[] => {
+    const groups = new Map<string, SessionListItem[]>()
+    const ungrouped: SessionListItem[] = []
+
+    for (const s of sessions) {
+      if (s.document_group_id) {
+        const existing = groups.get(s.document_group_id) || []
+        existing.push(s)
+        groups.set(s.document_group_id, existing)
+      } else {
+        ungrouped.push(s)
+      }
+    }
+
+    const result: (SessionListItem | DocumentGroupView)[] = []
+
+    // Order groups by their earliest created_at, newest group first
+    const sortedGroups = Array.from(groups.entries()).sort((a, b) => {
+      const aTime = Math.min(...a[1].map(s => new Date(s.created_at).getTime()))
+      const bTime = Math.min(...b[1].map(s => new Date(s.created_at).getTime()))
+      return bTime - aTime
+    })
+
+    for (const [groupId, groupSessions] of sortedGroups) {
+      groupSessions.sort((a, b) => (a.page_number || 0) - (b.page_number || 0))
+      // Extract base title (remove " — S. X" suffix)
+      const baseName = groupSessions[0]?.name?.replace(/ — S\. \d+$/, '') || 'Dokument'
+      result.push({
+        group_id: groupId,
+        title: baseName,
+        sessions: groupSessions,
+        page_count: groupSessions.length,
+      })
+    }
+
+    for (const s of ungrouped) {
+      result.push(s)
+    }
+
+    // Sort by creation time (most recent first)
+    const getTime = (item: SessionListItem | DocumentGroupView): number => {
+      if ('group_id' in item) {
+        return Math.min(...item.sessions.map((s: SessionListItem) => new Date(s.created_at).getTime()))
+      }
+      return new Date(item.created_at).getTime()
+    }
+    result.sort((a, b) => getTime(b) - getTime(a))
+
+    return result
+  }, [sessions])
+
+  // ---- Open session ----
+
+  const openSession = useCallback(async (sid: string) => {
+    try {
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
+      if (!res.ok) return
+      const data = await res.json()
+
+      setSessionId(sid)
+      setSessionName(data.name || data.filename || '')
+      setActiveCategory(data.document_category || undefined)
+      setIsGroundTruth(!!data.ground_truth?.build_grid_reference)
+      setPageNumber(data.grid_editor_result?.page_number?.number ?? null)
+
+      // Determine UI step from DB state
+      const dbStep = data.current_step || 1
+      const hasGrid = !!data.grid_editor_result
+      const hasStructure = !!data.structure_result
+      const hasWords = !!data.word_result
+      const hasGutterRepair = !!(data.ground_truth?.gutter_repair)
+
+      let uiStep: number
+      if (hasGrid && hasGutterRepair) {
+        uiStep = 10 // gutter-repair (already analysed)
+      } else if (hasGrid) {
+        uiStep = 9 // grid-review
+      } else if (hasStructure) {
+        uiStep = 8 // grid-build
+      } else if (hasWords) {
+        uiStep = 7 // structure
+      } else {
+        uiStep = dbStepToKombiV2Ui(dbStep)
+      }
+
+      // Sessions only exist after upload, so always skip the upload step
+      if (uiStep === 0) {
+        uiStep = 1
+      }
+
+      setSteps(
+        KOMBI_V2_STEPS.map((s, i) => ({
+          ...s,
+          status: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
+        })),
+      )
+      setCurrentStep(uiStep)
+    } catch (e) {
+      console.error('Failed to open session:', e)
+    }
+  }, [])
+
+  // ---- Deep link handling ----
+
+  useEffect(() => {
+    if (deepLinkHandled.current) return
+    const urlSession = searchParams.get('session')
+    const urlStep = searchParams.get('step')
+    if (urlSession) {
+      deepLinkHandled.current = true
+      openSession(urlSession).then(() => {
+        if (urlStep) {
+          const stepIdx = parseInt(urlStep, 10)
+          if (!isNaN(stepIdx) && stepIdx >= 0 && stepIdx < KOMBI_V2_STEPS.length) {
+            setCurrentStep(stepIdx)
+          }
+        }
+      })
+    }
+  }, [searchParams, openSession])
+
+  // ---- Step navigation ----
+
+  const goToStep = useCallback((step: number) => {
+    setCurrentStep(step)
+    setSteps(prev =>
+      prev.map((s, i) => ({
+        ...s,
+        status: i < step ? 'completed' : i === step ? 'active' : 'pending',
+      })),
+    )
+  }, [])
+
+  const handleStepClick = useCallback((index: number) => {
+    if (index <= currentStep || steps[index].status === 'completed') {
+      setCurrentStep(index)
+    }
+  }, [currentStep, steps])
+
+  const handleNext = useCallback(() => {
+    if (currentStep >= steps.length - 1) {
+      // Last step → return to session list
+      setSteps(initSteps())
+      setCurrentStep(0)
+      setSessionId(null)
+      loadSessions()
+      return
+    }
+
+    const nextStep = currentStep + 1
+    setSteps(prev =>
+      prev.map((s, i) => {
+        if (i === currentStep) return { ...s, status: 'completed' }
+        if (i === nextStep) return { ...s, status: 'active' }
+        return s
+      }),
+    )
+    setCurrentStep(nextStep)
+  }, [currentStep, steps, loadSessions])
+
+  // ---- Session CRUD ----
+
+  const handleNewSession = useCallback(() => {
+    setSessionId(null)
+    setSessionName('')
+    setCurrentStep(0)
+    setSteps(initSteps())
+  }, [])
+
+  const deleteSession = useCallback(async (sid: string) => {
+    try {
+      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
+      setSessions(prev => prev.filter(s => s.id !== sid))
+      if (sessionId === sid) handleNewSession()
+    } catch (e) {
+      console.error('Failed to delete session:', e)
+    }
+  }, [sessionId, handleNewSession])
+
+  const renameSession = useCallback(async (sid: string, newName: string) => {
+    try {
+      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
+        method: 'PUT',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ name: newName }),
+      })
+      setSessions(prev => prev.map(s => s.id === sid ? { ...s, name: newName } : s))
+      if (sessionId === sid) setSessionName(newName)
+    } catch (e) {
+      console.error('Failed to rename session:', e)
+    }
+  }, [sessionId])
+
+  const updateCategory = useCallback(async (sid: string, category: DocumentCategory) => {
+    try {
+      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
+        method: 'PUT',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ document_category: category }),
+      })
+      setSessions(prev => prev.map(s => s.id === sid ? { ...s, document_category: category } : s))
+      if (sessionId === sid) setActiveCategory(category)
+    } catch (e) {
+      console.error('Failed to update category:', e)
+    }
+  }, [sessionId])
+
+  return {
+    // State
+    currentStep,
+    sessionId,
+    sessionName,
+    sessions,
+    loadingSessions,
+    activeCategory,
+    isGroundTruth,
+    pageNumber,
+    steps,
+    gridSaveRef,
+    // Computed
+    groupedSessions,
+    // Actions
+    loadSessions,
+    openSession,
+    goToStep,
+    handleStepClick,
+    handleNext,
+    handleNewSession,
+    deleteSession,
+    renameSession,
+    updateCategory,
+    setSessionId,
+    setSessionName,
+    setIsGroundTruth,
+  }
+}
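Review note on `groupedSessions`: the grouping logic is easier to test as a pure function outside React. A minimal sketch follows, assuming a reduced `SessionLite` type (hypothetical; the real `SessionListItem` has more fields) and a hypothetical function name `groupSessions`. It follows the same rules as the hook: group by `document_group_id`, order pages within a group, strip the page suffix from the title, and sort everything newest first.

```typescript
// Reduced, hypothetical session shape for this sketch.
interface SessionLite {
  id: string
  name: string
  created_at: string
  page_number?: number
  document_group_id?: string
}

interface GroupView {
  group_id: string
  title: string
  sessions: SessionLite[]
  page_count: number
}

type ListItem = SessionLite | GroupView

function groupSessions(sessions: SessionLite[]): ListItem[] {
  const groups = new Map<string, SessionLite[]>()
  const result: ListItem[] = []

  // Ungrouped sessions go straight into the result; grouped ones are bucketed.
  for (const s of sessions) {
    if (!s.document_group_id) {
      result.push(s)
      continue
    }
    const bucket = groups.get(s.document_group_id) || []
    bucket.push(s)
    groups.set(s.document_group_id, bucket)
  }

  groups.forEach((members, groupId) => {
    members.sort((a, b) => (a.page_number || 0) - (b.page_number || 0))
    result.push({
      group_id: groupId,
      // Remove the " \u2014 S. X" page suffix to get the document title.
      title: members[0].name.replace(/ \u2014 S\. \d+$/, '') || 'Dokument',
      sessions: members,
      page_count: members.length,
    })
  })

  // A group is dated by its earliest session; newest items come first.
  const time = (item: ListItem): number =>
    'group_id' in item
      ? Math.min(...item.sessions.map(s => Date.parse(s.created_at)))
      : Date.parse(item.created_at)

  return result.sort((a, b) => time(b) - time(a))
}
```

Because the final sort already orders groups and single sessions together, a separate pre-sort of the groups is not needed in this sketch.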
--- a/admin-lehrer/app/(admin)/ai/ocr-overlay/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-overlay/page.tsx
@@ -383,7 +383,7 @@ export default function OcrOverlayPage() {
    if (mode === 'paddle-direct' || mode === 'kombi') {
      switch (currentStep) {
        case 0:
-          return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSubSessionsCreated={handleBoxSessionsCreated} />
+          return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSessionList={() => { loadSessions(); setSessionId(null) }} />
        case 1:
          return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
        case 2:
@@ -421,7 +421,7 @@ export default function OcrOverlayPage() {
    }
    switch (currentStep) {
      case 0:
-        return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSubSessionsCreated={handleBoxSessionsCreated} />
+        return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSessionList={() => { loadSessions(); setSessionId(null) }} />
      case 1:
        return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
      case 2:
--- a/admin-lehrer/app/(admin)/ai/ocr-pipeline/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/ocr-pipeline/page.tsx
@@ -1,6 +1,6 @@
 'use client'

-import { useCallback, useEffect, useState } from 'react'
+import { Suspense, useCallback, useEffect, useState } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { PipelineStepper } from '@/components/ocr-pipeline/PipelineStepper'
 import { StepOrientation } from '@/components/ocr-pipeline/StepOrientation'
@@ -14,37 +14,28 @@ import { StepWordRecognition } from '@/components/ocr-pipeline/StepWordRecogniti
 import { StepLlmReview } from '@/components/ocr-pipeline/StepLlmReview'
 import { StepReconstruction } from '@/components/ocr-pipeline/StepReconstruction'
 import { StepGroundTruth } from '@/components/ocr-pipeline/StepGroundTruth'
-import { BoxSessionTabs } from '@/components/ocr-pipeline/BoxSessionTabs'
-import { PIPELINE_STEPS, DOCUMENT_CATEGORIES, type PipelineStep, type SessionListItem, type DocumentTypeResult, type DocumentCategory, type SubSession } from './types'
+import { DOCUMENT_CATEGORIES, type SessionListItem, type DocumentTypeResult, type DocumentCategory, type SubSession } from './types'
+import { usePipelineNavigation } from './usePipelineNavigation'

 const KLAUSUR_API = '/klausur-api'

-export default function OcrPipelinePage() {
-  const [currentStep, setCurrentStep] = useState(0)
-  const [sessionId, setSessionId] = useState<string | null>(null)
-  const [sessionName, setSessionName] = useState<string>('')
+const STEP_NAMES: Record<number, string> = {
+  1: 'Orientierung', 2: 'Begradigung', 3: 'Entzerrung', 4: 'Zuschneiden',
+  5: 'Spalten', 6: 'Zeilen', 7: 'Woerter', 8: 'Struktur',
+  9: 'Korrektur', 10: 'Rekonstruktion', 11: 'Validierung',
+}
+
+function OcrPipelineContent() {
+  const nav = usePipelineNavigation()
  const [sessions, setSessions] = useState<SessionListItem[]>([])
  const [loadingSessions, setLoadingSessions] = useState(true)
  const [editingName, setEditingName] = useState<string | null>(null)
  const [editNameValue, setEditNameValue] = useState('')
  const [editingCategory, setEditingCategory] = useState<string | null>(null)
-  const [docTypeResult, setDocTypeResult] = useState<DocumentTypeResult | null>(null)
+  const [sessionName, setSessionName] = useState('')
  const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
-  const [subSessions, setSubSessions] = useState<SubSession[]>([])
-  const [parentSessionId, setParentSessionId] = useState<string | null>(null)
-  const [steps, setSteps] = useState<PipelineStep[]>(
-    PIPELINE_STEPS.map((s, i) => ({
-      ...s,
-      status: i === 0 ? 'active' : 'pending',
-    })),
-  )

-  // Load session list on mount
-  useEffect(() => {
-    loadSessions()
-  }, [])
-
-  const loadSessions = async () => {
+  const loadSessions = useCallback(async () => {
    setLoadingSessions(true)
    try {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
@@ -57,103 +48,42 @@ export default function OcrPipelinePage() {
    } finally {
      setLoadingSessions(false)
    }
-  }
-
-  const openSession = useCallback(async (sid: string, keepSubSessions?: boolean) => {
-    try {
-      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
-      if (!res.ok) return
-      const data = await res.json()
-
-      setSessionId(sid)
-      setSessionName(data.name || data.filename || '')
-      setActiveCategory(data.document_category || undefined)
-
-      // Sub-session handling
-      if (data.sub_sessions && data.sub_sessions.length > 0) {
-        setSubSessions(data.sub_sessions)
-        setParentSessionId(sid)
-        // Parent has sub-sessions — open the first incomplete one (or most advanced if all done)
-        const incomplete = data.sub_sessions.find(
-          (s: SubSession) => !s.current_step || s.current_step < 10,
-        )
-        const target = incomplete || [...data.sub_sessions].sort(
-          (a: SubSession, b: SubSession) => (b.current_step || 0) - (a.current_step || 0),
-        )[0]
-        if (target) {
-          openSession(target.id, true)
-          return
-        }
-      } else if (data.parent_session_id) {
-        // This is a sub-session — keep parent info but don't reset sub-session list
-        setParentSessionId(data.parent_session_id)
-      } else if (!keepSubSessions) {
-        setSubSessions([])
-        setParentSessionId(null)
-      }
-
-      // Restore doc type result if available
-      const savedDocType: DocumentTypeResult | null = data.doc_type_result || null
-      setDocTypeResult(savedDocType)
-
-      // Determine which step to jump to based on current_step
-      const dbStep = data.current_step || 1
-      // DB steps: 1=start, 2=orientation, 3=deskew, 4=dewarp, 5=crop, 6=columns, ...
-      // UI steps are 0-indexed: 0=orientation, 1=deskew, 2=dewarp, 3=crop, 4=columns, ...
-      let uiStep = Math.max(0, dbStep - 1)
-      const skipSteps = [...(savedDocType?.skip_steps || [])]
-
-      // Sub-session handling depends on how they were created:
-      // - Crop-based (current_step >= 5): image already cropped, skip all pre-processing
-      // - Page-split (current_step 2): orientation done on parent, skip only orientation
-      // - Page-split from original (current_step 1): needs full pipeline
-      const isSubSession = !!data.parent_session_id
-      if (isSubSession) {
-        if (dbStep >= 5) {
-          // Crop-based sub-sessions: image already cropped
-          const SUB_SESSION_SKIP = ['orientation', 'deskew', 'dewarp', 'crop']
-          for (const s of SUB_SESSION_SKIP) {
-            if (!skipSteps.includes(s)) skipSteps.push(s)
-          }
-          if (uiStep < 4) uiStep = 4 // columns step (index 4)
-        } else if (dbStep >= 2) {
-          // Page-split sub-session: parent orientation applied, skip only orientation
-          if (!skipSteps.includes('orientation')) skipSteps.push('orientation')
-          if (uiStep < 1) uiStep = 1 // advance past skipped orientation to deskew
-        }
-        // dbStep === 1: page-split from original image, needs full pipeline
-      }
-
-      setSteps(
-        PIPELINE_STEPS.map((s, i) => ({
-          ...s,
-          status: skipSteps.includes(s.id)
-            ? 'skipped'
-            : i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
-        })),
-      )
-      setCurrentStep(uiStep)
-    } catch (e) {
-      console.error('Failed to open session:', e)
-    }
  }, [])

+  useEffect(() => { loadSessions() }, [loadSessions])
+
+  // Sync session name when nav.sessionId changes
+  useEffect(() => {
+    if (!nav.sessionId) {
+      setSessionName('')
+      setActiveCategory(undefined)
+      return
+    }
+    const load = async () => {
+      try {
+        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${nav.sessionId}`)
+        if (!res.ok) return
+        const data = await res.json()
+        setSessionName(data.name || data.filename || '')
+        setActiveCategory(data.document_category || undefined)
+      } catch { /* ignore */ }
+    }
+    load()
+  }, [nav.sessionId])
+
+  const openSession = useCallback((sid: string) => {
+    nav.goToSession(sid)
+  }, [nav])
+
  const deleteSession = useCallback(async (sid: string) => {
    try {
      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
-      setSessions((prev) => prev.filter((s) => s.id !== sid))
-      if (sessionId === sid) {
-        setSessionId(null)
-        setCurrentStep(0)
-        setDocTypeResult(null)
-        setSubSessions([])
-        setParentSessionId(null)
-        setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
-      }
+      setSessions(prev => prev.filter(s => s.id !== sid))
+      if (nav.sessionId === sid) nav.goToSessionList()
    } catch (e) {
      console.error('Failed to delete session:', e)
    }
-  }, [sessionId])
+  }, [nav])

  const renameSession = useCallback(async (sid: string, newName: string) => {
    try {
@@ -162,13 +92,13 @@ export default function OcrPipelinePage() {
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ name: newName }),
      })
-      setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, name: newName } : s)))
-      if (sessionId === sid) setSessionName(newName)
+      setSessions(prev => prev.map(s => (s.id === sid ? { ...s, name: newName } : s)))
+      if (nav.sessionId === sid) setSessionName(newName)
    } catch (e) {
      console.error('Failed to rename session:', e)
    }
    setEditingName(null)
-  }, [sessionId])
+  }, [nav.sessionId])

  const updateCategory = useCallback(async (sid: string, category: DocumentCategory) => {
    try {
@@ -177,275 +107,107 @@ export default function OcrPipelinePage() {
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ document_category: category }),
      })
-      setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, document_category: category } : s)))
-      if (sessionId === sid) setActiveCategory(category)
+      setSessions(prev => prev.map(s => (s.id === sid ? { ...s, document_category: category } : s)))
+      if (nav.sessionId === sid) setActiveCategory(category)
    } catch (e) {
      console.error('Failed to update category:', e)
    }
    setEditingCategory(null)
-  }, [sessionId])
+  }, [nav.sessionId])

  const deleteAllSessions = useCallback(async () => {
    if (!confirm('Alle Sessions loeschen? Dies kann nicht rueckgaengig gemacht werden.')) return
    try {
      await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`, { method: 'DELETE' })
      setSessions([])
-      setSessionId(null)
-      setCurrentStep(0)
-      setDocTypeResult(null)
-      setActiveCategory(undefined)
-      setSubSessions([])
-      setParentSessionId(null)
-      setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
+      nav.goToSessionList()
    } catch (e) {
      console.error('Failed to delete all sessions:', e)
    }
-  }, [])
+  }, [nav])

  const handleStepClick = (index: number) => {
-    if (index <= currentStep || steps[index].status === 'completed') {
-      setCurrentStep(index)
+    if (index <= nav.currentStepIndex || nav.steps[index].status === 'completed') {
+      nav.goToStep(index)
    }
  }

-  const goToStep = (step: number) => {
-    setCurrentStep(step)
-    setSteps((prev) =>
-      prev.map((s, i) => ({
-        ...s,
-        status: i < step ? 'completed' : i === step ? 'active' : 'pending',
-      })),
-    )
-  }
-
-  const handleNext = () => {
-    if (currentStep >= steps.length - 1) {
-      // Last step completed
-      if (parentSessionId && sessionId !== parentSessionId) {
-        // Sub-session completed — mark it and find next incomplete one
-        const updatedSubs = subSessions.map((s) =>
-          s.id === sessionId ? { ...s, status: 'completed' as const, current_step: 10 } : s,
-        )
-        setSubSessions(updatedSubs)
-
-        // Find next incomplete sub-session
-        const nextIncomplete = updatedSubs.find(
-          (s) => s.id !== sessionId && (!s.current_step || s.current_step < 10),
-        )
-        if (nextIncomplete) {
-          // Open next incomplete sub-session
-          openSession(nextIncomplete.id, true)
-        } else {
-          // All sub-sessions done — return to session list
-          setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
-          setCurrentStep(0)
-          setSessionId(null)
-          setSubSessions([])
-          setParentSessionId(null)
-          loadSessions()
-        }
-        return
-      }
-      // Main session: return to session list
-      setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
-      setCurrentStep(0)
-      setSessionId(null)
-      setSubSessions([])
-      setParentSessionId(null)
-      loadSessions()
-      return
-    }
-
-    // Find the next non-skipped step
-    const skipSteps = docTypeResult?.skip_steps || []
-    let nextStep = currentStep + 1
-    while (nextStep < steps.length && skipSteps.includes(PIPELINE_STEPS[nextStep]?.id)) {
-      nextStep++
-    }
-    if (nextStep >= steps.length) nextStep = steps.length - 1
-
-    setSteps((prev) =>
-      prev.map((s, i) => {
-        if (i === currentStep) return { ...s, status: 'completed' }
-        if (i === nextStep) return { ...s, status: 'active' }
-        // Mark skipped steps between current and next
-        if (i > currentStep && i < nextStep && skipSteps.includes(PIPELINE_STEPS[i]?.id)) {
-          return { ...s, status: 'skipped' }
-        }
-        return s
-      }),
-    )
-    setCurrentStep(nextStep)
-  }
-
-  const handleOrientationComplete = async (sid: string) => {
-    setSessionId(sid)
+  // Orientation: after upload, open the session; the hook derives the step from the stored current_step
+  const handleOrientationComplete = useCallback(async (sid: string) => {
+    loadSessions()
+    // goToSession(sid) sets only ?session=...; no step is passed, so the backend's current_step decides (expected: deskew, index 1)
+    nav.goToSession(sid)
+  }, [nav, loadSessions])

-    // Check for page-split sub-sessions directly from API
-    // (React state may not be committed yet due to batching)
-    try {
-      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
-      if (res.ok) {
-        const data = await res.json()
-        if (data.sub_sessions?.length > 0) {
-          const subs: SubSession[] = data.sub_sessions.map((s: SubSession) => ({
-            id: s.id,
-            name: s.name,
-            box_index: s.box_index,
-            current_step: s.current_step,
-          }))
-          setSubSessions(subs)
-          setParentSessionId(sid)
-          openSession(subs[0].id, true)
-          return
-        }
-      }
-    } catch (e) {
-      console.error('Failed to check for sub-sessions:', e)
-    }
-
-    handleNext()
-  }
-
-  const handleCropNext = async () => {
-    // Auto-detect document type after crop (last image-processing step), then advance
-    if (sessionId) {
+  // Crop: detect doc type then advance
+  const handleCropNext = useCallback(async () => {
+    if (nav.sessionId) {
      try {
        const res = await fetch(
-          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/detect-type`,
+          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${nav.sessionId}/detect-type`,
          { method: 'POST' },
        )
        if (res.ok) {
          const data: DocumentTypeResult = await res.json()
-          setDocTypeResult(data)
-
-          // Mark skipped steps immediately
-          const skipSteps = data.skip_steps || []
-          if (skipSteps.length > 0) {
-            setSteps((prev) =>
-              prev.map((s) =>
-                skipSteps.includes(s.id) ? { ...s, status: 'skipped' } : s,
-              ),
-            )
-          }
+          nav.setDocType(data)
        }
      } catch (e) {
        console.error('Doc type detection failed:', e)
-        // Not critical — continue without it
      }
    }
-    handleNext()
-  }
+    nav.goToNextStep()
+  }, [nav])

  const handleDocTypeChange = (newDocType: DocumentTypeResult['doc_type']) => {
-    if (!docTypeResult) return
-
-    // Build new skip_steps based on doc type
+    if (!nav.docTypeResult) return
    let skipSteps: string[] = []
-    if (newDocType === 'full_text') {
-      skipSteps = ['columns', 'rows']
-    }
-    // vocab_table and generic_table: no skips
+    if (newDocType === 'full_text') skipSteps = ['columns', 'rows']

-    const updated: DocumentTypeResult = {
-      ...docTypeResult,
+    nav.setDocType({
+      ...nav.docTypeResult,
      doc_type: newDocType,
      skip_steps: skipSteps,
      pipeline: newDocType === 'full_text' ? 'full_page' : 'cell_first',
-    }
-    setDocTypeResult(updated)
-
-    // Update step statuses
-    setSteps((prev) =>
-      prev.map((s) => {
-        if (skipSteps.includes(s.id)) return { ...s, status: 'skipped' as const }
-        if (s.status === 'skipped') return { ...s, status: 'pending' as const }
-        return s
-      }),
-    )
+    })
  }

-  const handleNewSession = () => {
-    setSessionId(null)
-    setSessionName('')
-    setCurrentStep(0)
-    setDocTypeResult(null)
-    setSubSessions([])
-    setParentSessionId(null)
-    setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
-  }
-
-  const handleSessionChange = useCallback((newSessionId: string) => {
-    openSession(newSessionId, true)
-  }, [openSession])
-
-  const handleBoxSessionsCreated = useCallback((subs: SubSession[]) => {
-    setSubSessions(subs)
-    if (sessionId) setParentSessionId(sessionId)
-  }, [sessionId])
-
-  const stepNames: Record<number, string> = {
-    1: 'Orientierung',
-    2: 'Begradigung',
-    3: 'Entzerrung',
-    4: 'Zuschneiden',
-    5: 'Spalten',
-    6: 'Zeilen',
-    7: 'Woerter',
-    8: 'Struktur',
-    9: 'Korrektur',
-    10: 'Rekonstruktion',
-    11: 'Validierung',
-  }
-
-  const reprocessFromStep = useCallback(async (uiStep: number) => {
-    if (!sessionId) return
-    const dbStep = uiStep + 1 // UI is 0-indexed, DB is 1-indexed
-    if (!confirm(`Ab Schritt ${dbStep} (${stepNames[dbStep] || '?'}) neu verarbeiten? Nachfolgende Daten werden geloescht.`)) return
-    try {
-      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
-        method: 'POST',
-        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({ from_step: dbStep }),
-      })
-      if (!res.ok) {
-        const data = await res.json().catch(() => ({}))
-        console.error('Reprocess failed:', data.detail || res.status)
-        return
-      }
-      // Reset UI steps
-      goToStep(uiStep)
-    } catch (e) {
-      console.error('Reprocess error:', e)
-    }
-  // eslint-disable-next-line react-hooks/exhaustive-deps
-  }, [sessionId, goToStep])
+  // Box sub-sessions (column detection) — still supported
+  const handleBoxSessionsCreated = useCallback((_subs: SubSession[]) => {
+    // Box sub-sessions are tracked by the backend; no client-side state needed anymore
+  }, [])

  const renderStep = () => {
-    switch (currentStep) {
+    const sid = nav.sessionId
+    switch (nav.currentStepIndex) {
      case 0:
-        return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSubSessionsCreated={handleBoxSessionsCreated} />
+        return (
+          <StepOrientation
+            key={sid}
+            sessionId={sid}
+            onNext={handleOrientationComplete}
+            onSessionList={() => { loadSessions(); nav.goToSessionList() }}
+          />
+        )
      case 1:
-        return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
+        return <StepDeskew key={sid} sessionId={sid} onNext={nav.goToNextStep} />
      case 2:
-        return <StepDewarp key={sessionId} sessionId={sessionId} onNext={handleNext} />
+        return <StepDewarp key={sid} sessionId={sid} onNext={nav.goToNextStep} />
      case 3:
-        return <StepCrop key={sessionId} sessionId={sessionId} onNext={handleCropNext} />
+        return <StepCrop key={sid} sessionId={sid} onNext={handleCropNext} />
      case 4:
-        return <StepColumnDetection sessionId={sessionId} onNext={handleNext} onBoxSessionsCreated={handleBoxSessionsCreated} />
+        return <StepColumnDetection sessionId={sid} onNext={nav.goToNextStep} onBoxSessionsCreated={handleBoxSessionsCreated} />
      case 5:
-        return <StepRowDetection sessionId={sessionId} onNext={handleNext} />
+        return <StepRowDetection sessionId={sid} onNext={nav.goToNextStep} />
      case 6:
-        return <StepWordRecognition sessionId={sessionId} onNext={handleNext} goToStep={goToStep} />
+        return <StepWordRecognition sessionId={sid} onNext={nav.goToNextStep} goToStep={nav.goToStep} />
      case 7:
-        return <StepStructureDetection sessionId={sessionId} onNext={handleNext} />
+        return <StepStructureDetection sessionId={sid} onNext={nav.goToNextStep} />
      case 8:
-        return <StepLlmReview sessionId={sessionId} onNext={handleNext} />
+        return <StepLlmReview sessionId={sid} onNext={nav.goToNextStep} />
      case 9:
-        return <StepReconstruction sessionId={sessionId} onNext={handleNext} />
+        return <StepReconstruction sessionId={sid} onNext={nav.goToNextStep} />
      case 10:
-        return <StepGroundTruth sessionId={sessionId} onNext={handleNext} />
+        return <StepGroundTruth sessionId={sid} onNext={nav.goToNextStep} />
      default:
        return null
    }
@@ -485,7 +247,7 @@ export default function OcrPipelinePage() {
              </button>
            )}
            <button
-              onClick={handleNewSession}
+              onClick={() => nav.goToSessionList()}
              className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
            >
              + Neue Session
@@ -505,7 +267,7 @@ export default function OcrPipelinePage() {
                <div
                  key={s.id}
                  className={`relative flex items-start gap-3 px-3 py-2.5 rounded-lg text-sm transition-colors cursor-pointer ${
-                    sessionId === s.id
+                    nav.sessionId === s.id
                      ? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
                      : 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
                  }`}
@@ -561,13 +323,12 @@ export default function OcrPipelinePage() {
                    </button>
                    <div className="text-xs text-gray-400 flex gap-2 mt-0.5">
                      <span>{new Date(s.created_at).toLocaleDateString('de-DE', { day: '2-digit', month: '2-digit', year: '2-digit', hour: '2-digit', minute: '2-digit' })}</span>
-                      <span>Schritt {s.current_step}: {stepNames[s.current_step] || '?'}</span>
+                      <span>Schritt {s.current_step}: {STEP_NAMES[s.current_step] || '?'}</span>
                    </div>
                  </div>

                  {/* Badges */}
                  <div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
-                    {/* Category Badge */}
                    <button
                      onClick={() => setEditingCategory(editingCategory === s.id ? null : s.id)}
                      className={`text-[10px] px-1.5 py-0.5 rounded-full border transition-colors ${
@@ -579,7 +340,6 @@ export default function OcrPipelinePage() {
                    >
                      {catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
                    </button>
-                    {/* Doc Type Badge (read-only) */}
                    {s.doc_type && (
                      <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400 border border-gray-200 dark:border-gray-600">
                        {s.doc_type}
@@ -616,7 +376,7 @@ export default function OcrPipelinePage() {
                    </button>
                  </div>

-                  {/* Category dropdown (inline) */}
+                  {/* Category dropdown */}
                  {editingCategory === s.id && (
                    <div
                      className="absolute right-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64"
@@ -645,40 +405,39 @@ export default function OcrPipelinePage() {
      </div>

      {/* Active session info */}
-      {sessionId && sessionName && (
+      {nav.sessionId && sessionName && (
        <div className="flex items-center gap-3 text-sm text-gray-500 dark:text-gray-400">
          <span>Aktive Session: <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span></span>
          {activeCategory && (() => {
            const cat = DOCUMENT_CATEGORIES.find(c => c.value === activeCategory)
            return cat ? <span className="text-xs px-2 py-0.5 rounded-full bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300">{cat.icon} {cat.label}</span> : null
          })()}
-          {docTypeResult && (
+          {nav.docTypeResult && (
            <span className="text-xs px-2 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400 border border-gray-200 dark:border-gray-600">
-              {docTypeResult.doc_type}
+              {nav.docTypeResult.doc_type}
            </span>
          )}
        </div>
      )}

      <PipelineStepper
-        steps={steps}
-        currentStep={currentStep}
+        steps={nav.steps}
+        currentStep={nav.currentStepIndex}
        onStepClick={handleStepClick}
-        onReprocess={sessionId ? reprocessFromStep : undefined}
-        docTypeResult={docTypeResult}
+        onReprocess={nav.sessionId ? nav.reprocessFromStep : undefined}
+        docTypeResult={nav.docTypeResult}
        onDocTypeChange={handleDocTypeChange}
      />

-      {subSessions.length > 0 && parentSessionId && sessionId && (
-        <BoxSessionTabs
-          parentSessionId={parentSessionId}
-          subSessions={subSessions}
-          activeSessionId={sessionId}
-          onSessionChange={handleSessionChange}
-        />
-      )}
-
      <div className="min-h-[400px]">{renderStep()}</div>
    </div>
  )
 }
+
+export default function OcrPipelinePage() {
+  return (
+    <Suspense fallback={<div className="p-8 text-gray-400">Lade Pipeline...</div>}>
+      <OcrPipelineContent />
+    </Suspense>
+  )
+}
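The skip-aware advance that this page used to implement inline in `handleNext`, and now delegates to `nav.goToNextStep`, can be sketched as a pure function. This is a minimal illustration, not the commit's code: `nextActiveStep` is a hypothetical name, and the step id `'words'` is an assumed sample value (the other ids appear in the diff).

```typescript
// Sketch only: `nextActiveStep` and the sample step list are illustrative, not part of the commit.
function nextActiveStep(current: number, stepIds: string[], skip: string[]): number {
  let next = current + 1
  // Walk past every step whose id is listed as skipped.
  while (next < stepIds.length && skip.includes(stepIds[next])) {
    next++
  }
  // If every remaining step is skipped, clamp to the last step.
  return Math.min(next, stepIds.length - 1)
}

const stepIds = ['orientation', 'deskew', 'dewarp', 'crop', 'columns', 'rows', 'words']
console.log(nextActiveStep(3, stepIds, ['columns', 'rows'])) // 6
console.log(nextActiveStep(3, stepIds, [])) // 4
```

Keeping this logic free of React state makes the `full_text` case (which skips `columns` and `rows`) easy to check in isolation.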
--- a/admin-lehrer/app/(admin)/ai/ocr-pipeline/types.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-pipeline/types.ts
@@ -33,12 +33,15 @@ export interface SessionListItem {
  current_step: number
  document_category?: DocumentCategory
  doc_type?: string
+  parent_session_id?: string
+  document_group_id?: string
+  page_number?: number
+  is_ground_truth?: boolean
  created_at: string
  updated_at?: string
-  parent_session_id?: string | null
-  box_index?: number | null
 }

+/** Box sub-session (from column detection zone_type='box') */
 export interface SubSession {
  id: string
  name: string
@@ -109,6 +112,8 @@ export interface SessionInfo {
  sub_sessions?: SubSession[]
  parent_session_id?: string
  box_index?: number
+  document_group_id?: string
+  page_number?: number
 }

 export interface DeskewResult {
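Sessions store a 1-indexed `current_step`, while the stepper UI is 0-indexed. The following sketch shows how the session-loading logic in this commit turns a stored step plus a skip list into per-step statuses. The helper name `statusesForDbStep` and the local `StepStatus` type are assumptions made for illustration; the real code uses `PIPELINE_STEPS` and `PipelineStepStatus` from `types.ts`.

```typescript
type StepStatus = 'pending' | 'active' | 'completed' | 'skipped'

// Sketch only: `statusesForDbStep` is a hypothetical helper, not part of the commit.
function statusesForDbStep(dbStep: number, stepIds: string[], skip: string[]): StepStatus[] {
  // DB steps are 1-indexed, UI steps are 0-indexed.
  const uiStep = Math.max(0, dbStep - 1)
  return stepIds.map((id, i): StepStatus =>
    skip.includes(id) ? 'skipped'
      : i < uiStep ? 'completed'
        : i === uiStep ? 'active'
          : 'pending',
  )
}

// A session stored at DB step 3 opens on the third UI step (index 2).
console.log(statusesForDbStep(3, ['orientation', 'deskew', 'dewarp', 'crop'], []))
```

A skipped step keeps the `'skipped'` status regardless of its position relative to the active step, which matches the precedence used in `buildSteps`.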
--- a/admin-lehrer/app/(admin)/ai/ocr-pipeline/usePipelineNavigation.ts
+++ b/admin-lehrer/app/(admin)/ai/ocr-pipeline/usePipelineNavigation.ts
@@ -0,0 +1,225 @@
+'use client'
+
+import { useCallback, useEffect, useState } from 'react'
+import { useRouter, useSearchParams } from 'next/navigation'
+import { PIPELINE_STEPS, type PipelineStep, type PipelineStepStatus, type DocumentTypeResult } from './types'
+
+const KLAUSUR_API = '/klausur-api'
+
+export interface PipelineNav {
+  sessionId: string | null
+  currentStepIndex: number
+  currentStepId: string
+  steps: PipelineStep[]
+  docTypeResult: DocumentTypeResult | null
+
+  goToNextStep: () => void
+  goToStep: (index: number) => void
+  goToSession: (sessionId: string) => void
+  goToSessionList: () => void
+  setDocType: (result: DocumentTypeResult) => void
+  reprocessFromStep: (uiStep: number) => Promise<void>
+}
+
+const STEP_NAMES: Record<number, string> = {
+  1: 'Orientierung', 2: 'Begradigung', 3: 'Entzerrung', 4: 'Zuschneiden',
+  5: 'Spalten', 6: 'Zeilen', 7: 'Woerter', 8: 'Struktur',
+  9: 'Korrektur', 10: 'Rekonstruktion', 11: 'Validierung',
+}
+
+function buildSteps(uiStep: number, skipSteps: string[]): PipelineStep[] {
+  return PIPELINE_STEPS.map((s, i) => ({
+    ...s,
+    status: (
+      skipSteps.includes(s.id) ? 'skipped'
+        : i < uiStep ? 'completed'
+          : i === uiStep ? 'active'
+            : 'pending'
+    ) as PipelineStepStatus,
+  }))
+}
+
+export function usePipelineNavigation(): PipelineNav {
+  const router = useRouter()
+  const searchParams = useSearchParams()
+
+  const paramSession = searchParams.get('session')
+  const paramStep = searchParams.get('step')
+
+  const [sessionId, setSessionId] = useState<string | null>(paramSession)
+  const [currentStepIndex, setCurrentStepIndex] = useState(0)
+  const [docTypeResult, setDocTypeResult] = useState<DocumentTypeResult | null>(null)
+  const [steps, setSteps] = useState<PipelineStep[]>(buildSteps(0, []))
+  const [loaded, setLoaded] = useState(false)
+
+  // Load session info when session param changes
+  useEffect(() => {
+    if (!paramSession) {
+      setSessionId(null)
+      setCurrentStepIndex(0)
+      setDocTypeResult(null)
+      setSteps(buildSteps(0, []))
+      setLoaded(true)
+      return
+    }
+
+    const load = async () => {
+      try {
+        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${paramSession}`)
+        if (!res.ok) return
+        const data = await res.json()
+
+        setSessionId(paramSession)
+
+        const savedDocType: DocumentTypeResult | null = data.doc_type_result || null
+        setDocTypeResult(savedDocType)
+
+        const dbStep = data.current_step || 1
+        let uiStep = Math.max(0, dbStep - 1)
+        const skipSteps = [...(savedDocType?.skip_steps || [])]
+
+        // Box sub-sessions (from column detection) skip pre-processing
+        const isBoxSubSession = !!data.parent_session_id
+        if (isBoxSubSession && dbStep >= 5) {
+          const SUB_SESSION_SKIP = ['orientation', 'deskew', 'dewarp', 'crop']
+          for (const s of SUB_SESSION_SKIP) {
+            if (!skipSteps.includes(s)) skipSteps.push(s)
+          }
+          if (uiStep < 4) uiStep = 4
+        }
+
+        // If URL has a step param, use that instead
+        if (paramStep) {
+          const stepIdx = PIPELINE_STEPS.findIndex(s => s.id === paramStep)
+          if (stepIdx >= 0) uiStep = stepIdx
+        }
+
+        setCurrentStepIndex(uiStep)
+        setSteps(buildSteps(uiStep, skipSteps))
+      } catch (e) {
+        console.error('Failed to load session:', e)
+      } finally {
+        setLoaded(true)
+      }
+    }
+
+    load()
+  }, [paramSession, paramStep])
+
+  const updateUrl = useCallback((sid: string | null, stepIdx?: number) => {
+    if (!sid) {
+      router.push('/ai/ocr-pipeline')
+      return
+    }
+    const stepId = stepIdx !== undefined ? PIPELINE_STEPS[stepIdx]?.id : undefined
+    const params = new URLSearchParams()
+    params.set('session', sid)
+    if (stepId) params.set('step', stepId)
+    router.push(`/ai/ocr-pipeline?${params.toString()}`)
+  }, [router])
+
+  const goToNextStep = useCallback(() => {
+    if (currentStepIndex >= steps.length - 1) {
+      // Last step — return to session list
+      setSessionId(null)
+      setCurrentStepIndex(0)
+      setDocTypeResult(null)
+      setSteps(buildSteps(0, []))
+      router.push('/ai/ocr-pipeline')
+      return
+    }
+
+    const skipSteps = docTypeResult?.skip_steps || []
+    let nextStep = currentStepIndex + 1
+    while (nextStep < steps.length && skipSteps.includes(PIPELINE_STEPS[nextStep]?.id)) {
+      nextStep++
+    }
+    if (nextStep >= steps.length) nextStep = steps.length - 1
+
+    setSteps(prev =>
+      prev.map((s, i) => {
+        if (i === currentStepIndex) return { ...s, status: 'completed' as PipelineStepStatus }
+        if (i === nextStep) return { ...s, status: 'active' as PipelineStepStatus }
+        if (i > currentStepIndex && i < nextStep && skipSteps.includes(PIPELINE_STEPS[i]?.id)) {
+          return { ...s, status: 'skipped' as PipelineStepStatus }
+        }
+        return s
+      }),
+    )
+    setCurrentStepIndex(nextStep)
+    if (sessionId) updateUrl(sessionId, nextStep)
+  }, [currentStepIndex, steps.length, docTypeResult, sessionId, updateUrl, router])
+
+  const goToStep = useCallback((index: number) => {
+    setCurrentStepIndex(index)
+    setSteps(prev =>
+      prev.map((s, i) => ({
+        ...s,
+        status: s.status === 'skipped' ? 'skipped'
+          : i < index ? 'completed'
+            : i === index ? 'active'
+              : 'pending' as PipelineStepStatus,
+      })),
+    )
+    if (sessionId) updateUrl(sessionId, index)
+  }, [sessionId, updateUrl])
+
+  const goToSession = useCallback((sid: string) => {
+    updateUrl(sid)
+  }, [updateUrl])
+
+  const goToSessionList = useCallback(() => {
+    setSessionId(null)
+    setCurrentStepIndex(0)
+    setDocTypeResult(null)
+    setSteps(buildSteps(0, []))
+    router.push('/ai/ocr-pipeline')
+  }, [router])
+
+  const setDocType = useCallback((result: DocumentTypeResult) => {
+    setDocTypeResult(result)
+    const skipSteps = result.skip_steps || []
+    if (skipSteps.length > 0) {
+      setSteps(prev =>
+        prev.map(s =>
+          skipSteps.includes(s.id) ? { ...s, status: 'skipped' as PipelineStepStatus } : s,
+        ),
+      )
+    }
+  }, [])
+
+  const reprocessFromStep = useCallback(async (uiStep: number) => {
+    if (!sessionId) return
+    const dbStep = uiStep + 1
+    if (!confirm(`Ab Schritt ${dbStep} (${STEP_NAMES[dbStep] || '?'}) neu verarbeiten? Nachfolgende Daten werden geloescht.`)) return
+    try {
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ from_step: dbStep }),
+      })
+      if (!res.ok) {
+        const data = await res.json().catch(() => ({}))
+        console.error('Reprocess failed:', data.detail || res.status)
+        return
+      }
+      goToStep(uiStep)
+    } catch (e) {
+      console.error('Reprocess error:', e)
+    }
+  }, [sessionId, goToStep])
+
+  return {
+    sessionId,
+    currentStepIndex,
+    currentStepId: PIPELINE_STEPS[currentStepIndex]?.id || 'orientation',
+    steps,
+    docTypeResult,
+    goToNextStep,
+    goToStep,
+    goToSession,
+    goToSessionList,
+    setDocType,
+    reprocessFromStep,
+  }
+}
--- a/admin-lehrer/app/(admin)/ai/rag/tests/rag-documents.test.ts
+++ b/admin-lehrer/app/(admin)/ai/rag/tests/rag-documents.test.ts
@@ -0,0 +1,252 @@
+import { describe, it, expect } from 'vitest'
+import ragData from '../rag-documents.json'
+
+/**
+ * Tests for rag-documents.json: industry-regulation matrix
+ *
+ * Validates the JSON structure, industry assignment, and data integrity
+ * of the 320 documents for the RAG Landkarte (RAG map).
+ */
+
+const VALID_INDUSTRY_IDS = ragData.industries.map((i: any) => i.id)
+const VALID_DOC_TYPE_IDS = ragData.doc_types.map((dt: any) => dt.id)
+
+describe('rag-documents.json — Struktur', () => {
+  it('sollte doc_types, industries und documents enthalten', () => {
+    expect(ragData).toHaveProperty('doc_types')
+    expect(ragData).toHaveProperty('industries')
+    expect(ragData).toHaveProperty('documents')
+    expect(Array.isArray(ragData.doc_types)).toBe(true)
+    expect(Array.isArray(ragData.industries)).toBe(true)
+    expect(Array.isArray(ragData.documents)).toBe(true)
+  })
+
+  it('sollte genau 10 Branchen haben (VDMA/VDA/BDI)', () => {
+    expect(ragData.industries).toHaveLength(10)
+    const ids = ragData.industries.map((i: any) => i.id)
+    expect(ids).toContain('automotive')
+    expect(ids).toContain('maschinenbau')
+    expect(ids).toContain('elektrotechnik')
+    expect(ids).toContain('chemie')
+    expect(ids).toContain('metall')
+    expect(ids).toContain('energie')
+    expect(ids).toContain('transport')
+    expect(ids).toContain('handel')
+    expect(ids).toContain('konsumgueter')
+    expect(ids).toContain('bau')
+  })
+
+  it('sollte keine Pseudo-Branchen enthalten (IoT, KI, HR, KRITIS, etc.)', () => {
+    const ids = ragData.industries.map((i: any) => i.id)
+    expect(ids).not.toContain('iot')
+    expect(ids).not.toContain('ai')
+    expect(ids).not.toContain('hr')
+    expect(ids).not.toContain('kritis')
+    expect(ids).not.toContain('ecommerce')
+    expect(ids).not.toContain('tech')
+    expect(ids).not.toContain('media')
+    expect(ids).not.toContain('public')
+  })
+
+  it('sollte 17 Dokumenttypen haben', () => {
+    expect(ragData.doc_types.length).toBe(17)
+  })
+
+  it('sollte mindestens 300 Dokumente haben', () => {
+    expect(ragData.documents.length).toBeGreaterThanOrEqual(300)
+  })
+
+  it('sollte jede Branche name und icon haben', () => {
+    ragData.industries.forEach((ind: any) => {
+      expect(ind).toHaveProperty('id')
+      expect(ind).toHaveProperty('name')
+      expect(ind).toHaveProperty('icon')
+      expect(ind.name.length).toBeGreaterThan(0)
+    })
+  })
+
+  it('sollte jeden doc_type mit id, label, icon und sort haben', () => {
+    ragData.doc_types.forEach((dt: any) => {
+      expect(dt).toHaveProperty('id')
+      expect(dt).toHaveProperty('label')
+      expect(dt).toHaveProperty('icon')
+      expect(dt).toHaveProperty('sort')
+    })
+  })
+})
+
+describe('rag-documents.json — Dokument-Validierung', () => {
+  it('sollte keine doppelten Codes haben', () => {
+    const codes = ragData.documents.map((d: any) => d.code)
+    const unique = new Set(codes)
+    expect(unique.size).toBe(codes.length)
+  })
+
+  it('sollte Pflichtfelder bei jedem Dokument haben', () => {
+    ragData.documents.forEach((doc: any) => {
+      expect(doc).toHaveProperty('code')
+      expect(doc).toHaveProperty('name')
+      expect(doc).toHaveProperty('doc_type')
+      expect(doc).toHaveProperty('industries')
+      expect(doc).toHaveProperty('in_rag')
+      expect(doc).toHaveProperty('rag_collection')
+      expect(doc.code.length).toBeGreaterThan(0)
+      expect(doc.name.length).toBeGreaterThan(0)
+      expect(Array.isArray(doc.industries)).toBe(true)
+    })
+  })
+
+  it('sollte nur gueltige doc_type IDs verwenden', () => {
+    ragData.documents.forEach((doc: any) => {
+      expect(VALID_DOC_TYPE_IDS).toContain(doc.doc_type)
+    })
+  })
+
+  it('sollte nur gueltige industry IDs verwenden (oder "all")', () => {
+    ragData.documents.forEach((doc: any) => {
+      doc.industries.forEach((ind: string) => {
+        if (ind !== 'all') {
+          expect(VALID_INDUSTRY_IDS).toContain(ind)
+        }
+      })
+    })
+  })
+
+  it('sollte gueltige rag_collection Namen verwenden', () => {
+    const validCollections = [
+      'bp_compliance_ce',
+      'bp_compliance_gesetze',
+      'bp_compliance_datenschutz',
+      'bp_dsfa_corpus',
+      'bp_legal_templates',
+      'bp_compliance_recht',
+      'bp_nibis_eh',
+    ]
+    ragData.documents.forEach((doc: any) => {
+      expect(validCollections).toContain(doc.rag_collection)
+    })
+  })
+})
+
+describe('rag-documents.json — Branchen-Zuordnungslogik', () => {
+  const findDoc = (code: string) => ragData.documents.find((d: any) => d.code === code)
+
+  describe('Horizontale Regulierungen (alle Branchen)', () => {
+    const horizontalCodes = [
+      'GDPR', 'BDSG_FULL', 'EPRIVACY', 'TDDDG', 'AIACT', 'CRA',
+      'NIS2', 'GPSR', 'PLD', 'EUCSA', 'DATAACT',
+    ]
+
+    horizontalCodes.forEach((code) => {
+      it(`${code} sollte fuer alle Branchen gelten`, () => {
+        const doc = findDoc(code)
+        if (doc) {
+          expect(doc.industries).toContain('all')
+        }
+      })
+    })
+  })
+
+  describe('Sektorspezifische Regulierungen', () => {
+    it('Maschinenverordnung sollte Maschinenbau, Automotive, Elektrotechnik enthalten', () => {
+      const doc = findDoc('MACHINERY_REG')
+      if (doc) {
+        expect(doc.industries).toContain('maschinenbau')
+        expect(doc.industries).toContain('automotive')
+        expect(doc.industries).toContain('elektrotechnik')
+        expect(doc.industries).not.toContain('all')
+      }
+    })
+
+    it('ElektroG sollte Elektrotechnik und Automotive enthalten', () => {
+      const doc = findDoc('DE_ELEKTROG')
+      if (doc) {
+        expect(doc.industries).toContain('elektrotechnik')
+        expect(doc.industries).toContain('automotive')
+      }
+    })
+
+    it('BattDG sollte Automotive und Elektrotechnik enthalten', () => {
+      const doc = findDoc('DE_BATTDG')
+      if (doc) {
+        expect(doc.industries).toContain('automotive')
+        expect(doc.industries).toContain('elektrotechnik')
+      }
+    })
+
+    it('ENISA ICS/SCADA sollte Energie, Maschinenbau, Chemie enthalten', () => {
+      const doc = findDoc('ENISA_ICS_SCADA')
+      if (doc) {
+        expect(doc.industries).toContain('energie')
+        expect(doc.industries).toContain('maschinenbau')
+        expect(doc.industries).toContain('chemie')
+      }
+    })
+  })
+
+  describe('Nicht zutreffende Regulierungen (Finanz/Medizin/Plattformen)', () => {
+    const emptyIndustryCodes = ['DORA', 'PSD2', 'MiCA', 'AMLR', 'EHDS', 'DSA', 'DMA', 'MDR']
+
+    emptyIndustryCodes.forEach((code) => {
+      it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
+        const doc = findDoc(code)
+        if (doc) {
+          expect(doc.industries).toHaveLength(0)
+        }
+      })
+    })
+  })
+
+  describe('BSI-TR-03161 (DiGA) sollte nicht zutreffend sein', () => {
+    ['BSI-TR-03161-1', 'BSI-TR-03161-2', 'BSI-TR-03161-3'].forEach((code) => {
+      it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
+        const doc = findDoc(code)
+        if (doc) {
+          expect(doc.industries).toHaveLength(0)
+        }
+      })
+    })
+  })
+})
+
+describe('rag-documents.json — Applicability Notes', () => {
+  it('sollte applicability_note bei Dokumenten mit description haben', () => {
+    const withDescription = ragData.documents.filter((d: any) => d.description)
+    const withNote = withDescription.filter((d: any) => d.applicability_note)
+    // More than 90% of the documents with a description should have a note
+    expect(withNote.length / withDescription.length).toBeGreaterThan(0.9)
+  })
+
+  it('horizontale Regulierungen sollten "alle Branchen" in der Note erwaehnen', () => {
+    const gdpr = ragData.documents.find((d: any) => d.code === 'GDPR')
+    if (gdpr?.applicability_note) {
+      expect(gdpr.applicability_note.toLowerCase()).toContain('alle branchen')
+    }
+  })
+
+  it('nicht zutreffende sollten "nicht zutreffend" in der Note erwaehnen', () => {
+    const dora = ragData.documents.find((d: any) => d.code === 'DORA')
+    if (dora?.applicability_note) {
+      expect(dora.applicability_note.toLowerCase()).toContain('nicht zutreffend')
+    }
+  })
+})
+
+describe('rag-documents.json — Dokumenttyp-Verteilung', () => {
+  it('sollte Dokumente in jedem doc_type haben', () => {
+    ragData.doc_types.forEach((dt: any) => {
+      const count = ragData.documents.filter((d: any) => d.doc_type === dt.id).length
+      expect(count).toBeGreaterThan(0)
+    })
+  })
+
+  it('sollte mindestens 15 EU-Verordnungen enthalten', () => {
+    const euRegs = ragData.documents.filter((d: any) => d.doc_type === 'eu_regulation')
+    expect(euRegs.length).toBeGreaterThanOrEqual(15)
+  })
+
+  it('sollte mindestens 40 EDPB Leitlinien enthalten', () => {
+    const edpb = ragData.documents.filter((d: any) => d.doc_type === 'edpb_guideline')
+    expect(edpb.length).toBeGreaterThanOrEqual(40)
+  })
+})
--- a/admin-lehrer/app/(admin)/ai/rag/page.tsx
+++ b/admin-lehrer/app/(admin)/ai/rag/page.tsx
--- a/admin-lehrer/app/(admin)/ai/rag/rag-documents.json
+++ b/admin-lehrer/app/(admin)/ai/rag/rag-documents.json
--- a/admin-lehrer/components/grid-editor/GridEditor.tsx
+++ b/admin-lehrer/components/grid-editor/GridEditor.tsx
@@ -36,6 +36,10 @@ export function GridEditor({ sessionId, onNext }: GridEditorProps) {
    addColumn,
    deleteRow,
    addRow,
+    ipaMode,
+    setIpaMode,
+    syllableMode,
+    setSyllableMode,
  } = useGridEditor(sessionId)

  const [showOverlay, setShowOverlay] = useState(false)
@@ -170,6 +174,11 @@ export function GridEditor({ sessionId, onNext }: GridEditorProps) {
            Woerterbuch ({Math.round(grid.dictionary_detection.confidence * 100)}%)
          </span>
        )}
+        {grid.page_number?.text && (
+          <span className="px-1.5 py-0.5 rounded bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 border border-gray-200 dark:border-gray-600">
+            S. {grid.page_number.text}
+          </span>
+        )}
        <span className="text-gray-400">
          {grid.duration_seconds.toFixed(1)}s
        </span>
@@ -183,11 +192,15 @@ export function GridEditor({ sessionId, onNext }: GridEditorProps) {
          canUndo={canUndo}
          canRedo={canRedo}
          showOverlay={showOverlay}
+          ipaMode={ipaMode}
+          syllableMode={syllableMode}
          onSave={saveGrid}
          onUndo={undo}
          onRedo={redo}
          onRebuild={buildGrid}
          onToggleOverlay={() => setShowOverlay(!showOverlay)}
+          onIpaModeChange={setIpaMode}
+          onSyllableModeChange={setSyllableMode}
        />
      </div>

--- a/admin-lehrer/components/grid-editor/GridTable.tsx
+++ b/admin-lehrer/components/grid-editor/GridTable.tsx
@@ -107,12 +107,18 @@ export function GridTable({
    const row = zone.rows.find((r) => r.index === rowIndex)
    if (!row) return Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)

+    // Multi-line cells (containing \n): expand height based on line count
+    const rowCells = zone.cells.filter((c) => c.row_index === rowIndex)
+    const maxLines = Math.max(1, ...rowCells.map((c) => (c.text ?? '').split('\n').length))
+    if (maxLines > 1) {
+      const lineH = Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
+      return lineH * maxLines
+    }
+
    if (isHeader) {
-      // Headers keep their measured height
      const measuredH = row.y_max_px - row.y_min_px
      return Math.max(MIN_ROW_HEIGHT, measuredH * scale)
    }
-    // Content rows use average for uniformity
    return Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
  }

@@ -410,46 +416,43 @@ export function GridTable({

              {/* Cells — spanning header or normal columns */}
              {isSpanning ? (
-                <div
-                  className="border-b border-r border-gray-200 dark:border-gray-700 bg-blue-50/50 dark:bg-blue-900/10 flex items-center"
-                  style={{
-                    gridColumn: `2 / ${numCols + 2}`,
-                    height: `${rowH}px`,
-                  }}
-                >
-                  {(() => {
-                    const spanCell = zone.cells.find(
-                      (c) => c.row_index === row.index && c.col_type === 'spanning_header',
-                    )
-                    if (!spanCell) return null
-                    const cellId = spanCell.cell_id
-                    const isSelected = selectedCell === cellId
-                    const cellColor = getCellColor(spanCell)
-                    return (
-                      <div className="flex items-center w-full">
-                        {cellColor && (
-                          <span
-                            className="flex-shrink-0 w-1.5 self-stretch rounded-l-sm"
-                            style={{ backgroundColor: cellColor }}
-                          />
-                        )}
-                        <input
-                          id={`cell-${cellId}`}
-                          type="text"
-                          value={spanCell.text}
-                          onChange={(e) => onCellTextChange(cellId, e.target.value)}
-                          onFocus={() => onSelectCell(cellId)}
-                          onKeyDown={(e) => handleKeyDown(e, cellId)}
-                          className={`w-full px-3 py-1 bg-transparent border-0 outline-none text-center ${
-                            isSelected ? 'ring-2 ring-teal-500 ring-inset rounded' : ''
+                <>
+                  {zone.cells
+                    .filter((c) => c.row_index === row.index && c.col_type === 'spanning_header')
+                    .sort((a, b) => a.col_index - b.col_index)
+                    .map((spanCell) => {
+                      const colspan = spanCell.colspan || numCols
+                      const cellId = spanCell.cell_id
+                      const isSelected = selectedCell === cellId
+                      const cellColor = getCellColor(spanCell)
+                      const gridColStart = spanCell.col_index + 2
+                      const gridColEnd = Math.min(gridColStart + colspan, numCols + 2)
+                      return (
+                        <div
+                          key={cellId}
+                          className={`border-b border-r border-gray-200 dark:border-gray-700 bg-blue-50/50 dark:bg-blue-900/10 flex items-center ${
+                            isSelected ? 'ring-2 ring-teal-500 ring-inset z-10' : ''
                          }`}
-                          style={{ color: cellColor || undefined }}
-                          spellCheck={false}
-                        />
-                      </div>
-                    )
-                  })()}
-                </div>
+                          style={{ gridColumn: `${gridColStart} / ${gridColEnd}`, height: `${rowH}px` }}
+                        >
+                          {cellColor && (
+                            <span className="flex-shrink-0 w-1.5 self-stretch rounded-l-sm" style={{ backgroundColor: cellColor }} />
+                          )}
+                          <input
+                            id={`cell-${cellId}`}
+                            type="text"
+                            value={spanCell.text}
+                            onChange={(e) => onCellTextChange(cellId, e.target.value)}
+                            onFocus={() => onSelectCell(cellId)}
+                            onKeyDown={(e) => handleKeyDown(e, cellId)}
+                            className="w-full px-3 py-1 bg-transparent border-0 outline-none text-center"
+                            style={{ color: cellColor || undefined }}
+                            spellCheck={false}
+                          />
+                        </div>
+                      )
+                    })}
+                </>
              ) : (
                zone.columns.map((col) => {
                  const cell = cellMap.get(`${row.index}_${col.index}`)
@@ -485,7 +488,13 @@ export function GridTable({
                      } ${isMultiSelected ? 'bg-teal-50/60 dark:bg-teal-900/20' : ''} ${
                        isLowConf && !isMultiSelected ? 'bg-amber-50/50 dark:bg-amber-900/10' : ''
                      } ${row.is_header && !isMultiSelected ? 'bg-blue-50/50 dark:bg-blue-900/10' : ''}`}
-                      style={{ height: `${rowH}px` }}
+                      style={{
+                        height: `${rowH}px`,
+                        ...(cell?.box_region?.bg_hex ? {
+                          backgroundColor: `${cell.box_region.bg_hex}12`,
+                          borderLeft: cell.box_region.border ? `3px solid ${cell.box_region.bg_hex}60` : undefined,
+                        } : {}),
+                      }}
                      onContextMenu={(e) => {
                        if (onSetCellColor) {
                          e.preventDefault()
@@ -501,53 +510,88 @@ export function GridTable({
                        />
                      )}
                      {/* Per-word colored display when not editing */}
-                      {hasColoredWords && !isSelected ? (
-                        <div
-                          className={`w-full px-2 cursor-text truncate ${isBold ? 'font-bold' : 'font-normal'}`}
-                          onClick={(e) => {
-                            if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
-                              onToggleCellSelection(cellId)
-                            } else {
-                              onSelectCell(cellId)
-                              setTimeout(() => document.getElementById(`cell-${cellId}`)?.focus(), 0)
-                            }
-                          }}
-                        >
-                          {cell!.word_boxes!.map((wb, i) => (
-                            <span
-                              key={i}
-                              style={
-                                wb.color_name && wb.color_name !== 'black'
-                                  ? { color: wb.color }
-                                  : undefined
-                              }
+                      {(() => {
+                        const cellText = cell?.text ?? ''
+                        const isMultiLine = cellText.includes('\n')
+                        if (hasColoredWords && !isSelected) {
+                          return (
+                            <div
+                              className={`w-full px-2 cursor-text truncate ${isBold ? 'font-bold' : 'font-normal'}`}
+                              onClick={(e) => {
+                                if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
+                                  onToggleCellSelection(cellId)
+                                } else {
+                                  onSelectCell(cellId)
+                                  setTimeout(() => document.getElementById(`cell-${cellId}`)?.focus(), 0)
+                                }
+                              }}
                            >
-                              {wb.text}
-                              {i < cell!.word_boxes!.length - 1 ? ' ' : ''}
-                            </span>
-                          ))}
-                        </div>
-                      ) : (
-                        <input
-                          id={`cell-${cellId}`}
-                          type="text"
-                          value={cell?.text ?? ''}
-                          onChange={(e) => onCellTextChange(cellId, e.target.value)}
-                          onFocus={() => onSelectCell(cellId)}
-                          onClick={(e) => {
-                            if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
-                              e.preventDefault()
-                              onToggleCellSelection(cellId)
-                            }
-                          }}
-                          onKeyDown={(e) => handleKeyDown(e, cellId)}
-                          className={`w-full px-2 bg-transparent border-0 outline-none ${
-                            isBold ? 'font-bold' : 'font-normal'
-                          }`}
-                          style={{ color: cellColor || undefined }}
-                          spellCheck={false}
-                        />
-                      )}
+                              {cell!.word_boxes!.map((wb, i) => (
+                                <span
+                                  key={i}
+                                  style={
+                                    wb.color_name && wb.color_name !== 'black'
+                                      ? { color: wb.color }
+                                      : undefined
+                                  }
+                                >
+                                  {wb.text}
+                                  {i < cell!.word_boxes!.length - 1 ? ' ' : ''}
+                                </span>
+                              ))}
+                            </div>
+                          )
+                        }
+                        if (isMultiLine) {
+                          return (
+                            <textarea
+                              id={`cell-${cellId}`}
+                              value={cellText}
+                              onChange={(e) => onCellTextChange(cellId, e.target.value)}
+                              onFocus={() => onSelectCell(cellId)}
+                              onClick={(e) => {
+                                if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
+                                  e.preventDefault()
+                                  onToggleCellSelection(cellId)
+                                }
+                              }}
+                              onKeyDown={(e) => {
+                                if (e.key === 'Tab') {
+                                  e.preventDefault()
+                                  onNavigate(cellId, e.shiftKey ? 'left' : 'right')
+                                }
+                              }}
+                              rows={cellText.split('\n').length}
+                              className={`w-full px-2 bg-transparent border-0 outline-none resize-none ${
+                                isBold ? 'font-bold' : 'font-normal'
+                              }`}
+                              style={{ color: cellColor || undefined }}
+                              spellCheck={false}
+                            />
+                          )
+                        }
+                        return (
+                          <input
+                            id={`cell-${cellId}`}
+                            type="text"
+                            value={cellText}
+                            onChange={(e) => onCellTextChange(cellId, e.target.value)}
+                            onFocus={() => onSelectCell(cellId)}
+                            onClick={(e) => {
+                              if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
+                                e.preventDefault()
+                                onToggleCellSelection(cellId)
+                              }
+                            }}
+                            onKeyDown={(e) => handleKeyDown(e, cellId)}
+                            className={`w-full px-2 bg-transparent border-0 outline-none ${
+                              isBold ? 'font-bold' : 'font-normal'
+                            }`}
+                            style={{ color: cellColor || undefined }}
+                            spellCheck={false}
+                          />
+                        )
+                      })()}
                    </div>
                  )
                })
--- a/admin-lehrer/components/grid-editor/GridToolbar.tsx
+++ b/admin-lehrer/components/grid-editor/GridToolbar.tsx
@@ -1,16 +1,38 @@
 'use client'

+import type { IpaMode, SyllableMode } from './useGridEditor'
+
 interface GridToolbarProps {
  dirty: boolean
  saving: boolean
  canUndo: boolean
  canRedo: boolean
  showOverlay: boolean
+  ipaMode: IpaMode
+  syllableMode: SyllableMode
  onSave: () => void
  onUndo: () => void
  onRedo: () => void
  onRebuild: () => void
  onToggleOverlay: () => void
+  onIpaModeChange: (mode: IpaMode) => void
+  onSyllableModeChange: (mode: SyllableMode) => void
+}
+
+const IPA_LABELS: Record<IpaMode, string> = {
+  auto: 'IPA: Auto',
+  en: 'IPA: nur EN',
+  de: 'IPA: nur DE',
+  all: 'IPA: Alle',
+  none: 'IPA: Aus',
+}
+
+const SYLLABLE_LABELS: Record<SyllableMode, string> = {
+  auto: 'Silben: Original',
+  en: 'Silben: nur EN',
+  de: 'Silben: nur DE',
+  all: 'Silben: Alle',
+  none: 'Silben: Aus',
 }

 export function GridToolbar({
@@ -19,11 +41,15 @@ export function GridToolbar({
  canUndo,
  canRedo,
  showOverlay,
+  ipaMode,
+  syllableMode,
  onSave,
  onUndo,
  onRedo,
  onRebuild,
  onToggleOverlay,
+  onIpaModeChange,
+  onSyllableModeChange,
 }: GridToolbarProps) {
  return (
    <div className="flex items-center gap-2 flex-wrap">
@@ -67,6 +93,40 @@ export function GridToolbar({
        Bild-Overlay
      </button>

+      {/* IPA mode */}
+      <div className="flex items-center gap-1">
+        <select
+          value={ipaMode}
+          onChange={(e) => onIpaModeChange(e.target.value as IpaMode)}
+          className="px-2 py-1.5 text-xs rounded-md border border-gray-200 dark:border-gray-700 bg-white dark:bg-gray-800 text-gray-600 dark:text-gray-400"
+          title="Lautschrift (IPA): Auto = nur erkannte EN-Woerter, DE = deutsches IPA (Wiktionary), Alle = EN + DE, Aus = keine"
+        >
+          {(Object.keys(IPA_LABELS) as IpaMode[]).map((m) => (
+            <option key={m} value={m}>{IPA_LABELS[m]}</option>
+          ))}
+        </select>
+        {(ipaMode === 'de' || ipaMode === 'all') && (
+          <span
+            className="text-[9px] text-gray-400 dark:text-gray-500 cursor-help"
+            title="DE-Lautschrift: Wiktionary (CC-BY-SA 4.0) + epitran (MIT). EN-Lautschrift: Britfone (MIT) + eng_to_ipa (MIT)."
+          >
+            CC-BY-SA
+          </span>
+        )}
+      </div>
+
+      {/* Syllable mode */}
+      <select
+        value={syllableMode}
+        onChange={(e) => onSyllableModeChange(e.target.value as SyllableMode)}
+        className="px-2 py-1.5 text-xs rounded-md border border-gray-200 dark:border-gray-700 bg-white dark:bg-gray-800 text-gray-600 dark:text-gray-400"
+        title="Silbentrennung: Original = nur wo im Scan vorhanden, Alle = fuer alle Woerter, Aus = keine"
+      >
+        {(Object.keys(SYLLABLE_LABELS) as SyllableMode[]).map((m) => (
+          <option key={m} value={m}>{SYLLABLE_LABELS[m]}</option>
+        ))}
+      </select>
+
      {/* Rebuild */}
      <button
        onClick={onRebuild}
--- a/admin-lehrer/components/grid-editor/types.ts
+++ b/admin-lehrer/components/grid-editor/types.ts
@@ -20,6 +20,13 @@ export interface DictionaryDetection {
  headword_col_index: number | null
 }

+/** Page number extracted from footer region of the scan. */
+export interface PageNumber {
+  text: string
+  y_pct: number
+  number?: number
+}
+
 /** A complete structured grid with zones, ready for the Excel-like editor. */
 export interface StructuredGrid {
  session_id: string
@@ -31,6 +38,7 @@ export interface StructuredGrid {
  formatting: GridFormatting
  layout_metrics?: LayoutMetrics
  dictionary_detection?: DictionaryDetection
+  page_number?: PageNumber | null
  duration_seconds: number
  edited?: boolean
  layout_dividers?: LayoutDividers
@@ -65,6 +73,10 @@ export interface GridZone {
  header_rows: number[]
  layout_hint?: 'left_of_vsplit' | 'right_of_vsplit' | 'middle_of_vsplit'
  vsplit_group?: number
+  box_layout_type?: 'flowing' | 'columnar' | 'bullet_list' | 'header_only'
+  box_grid_reviewed?: boolean
+  box_bg_color?: string
+  box_bg_hex?: string
 }

 export interface BBox {
@@ -114,6 +126,16 @@ export interface GridEditorCell {
  is_bold: boolean
  /** Manual color override: hex string or null to clear. */
  color_override?: string | null
+  /** Number of columns this cell spans (merged cell). Default 1. */
+  colspan?: number
+  /** Source zone type when in unified grid. */
+  source_zone_type?: 'content' | 'box'
+  /** Box visual metadata for cells from box zones. */
+  box_region?: {
+    bg_hex?: string
+    bg_color?: string
+    border?: boolean
+  }
 }

 /** Layout dividers for the visual column/margin editor on the original image. */
--- a/admin-lehrer/components/grid-editor/useGridEditor.ts
+++ b/admin-lehrer/components/grid-editor/useGridEditor.ts
@@ -1,4 +1,4 @@
-import { useCallback, useRef, useState } from 'react'
+import { useCallback, useEffect, useRef, useState } from 'react'
 import type { StructuredGrid, GridZone, LayoutDividers } from './types'

 const KLAUSUR_API = '/klausur-api'
@@ -14,6 +14,9 @@ export interface GridEditorState {
  selectedZone: number | null
 }

+export type IpaMode = 'auto' | 'all' | 'de' | 'en' | 'none'
+export type SyllableMode = 'auto' | 'all' | 'de' | 'en' | 'none'
+
 export function useGridEditor(sessionId: string | null) {
  const [grid, setGrid] = useState<StructuredGrid | null>(null)
  const [loading, setLoading] = useState(false)
@@ -22,6 +25,8 @@ export function useGridEditor(sessionId: string | null) {
  const [dirty, setDirty] = useState(false)
  const [selectedCell, setSelectedCell] = useState<string | null>(null)
  const [selectedZone, setSelectedZone] = useState<number | null>(null)
+  const [ipaMode, setIpaMode] = useState<IpaMode>('auto')
+  const [syllableMode, setSyllableMode] = useState<SyllableMode>('auto')

  // Undo/redo stacks store serialized zone arrays
  const undoStack = useRef<string[]>([])
@@ -44,8 +49,11 @@ export function useGridEditor(sessionId: string | null) {
    setLoading(true)
    setError(null)
    try {
+      const params = new URLSearchParams()
+      params.set('ipa_mode', ipaMode)
+      params.set('syllable_mode', syllableMode)
      const res = await fetch(
-        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid`,
+        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
        { method: 'POST' },
      )
      if (!res.ok) {
@@ -62,7 +70,7 @@ export function useGridEditor(sessionId: string | null) {
    } finally {
      setLoading(false)
    }
-  }, [sessionId])
+  }, [sessionId, ipaMode, syllableMode])

  const loadGrid = useCallback(async () => {
    if (!sessionId) return
@@ -73,8 +81,19 @@ export function useGridEditor(sessionId: string | null) {
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`,
      )
      if (res.status === 404) {
-        // No grid yet — build it
-        await buildGrid()
+        // No grid yet — build it with current modes
+        const params = new URLSearchParams()
+        params.set('ipa_mode', ipaMode)
+        params.set('syllable_mode', syllableMode)
+        const buildRes = await fetch(
+          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
+          { method: 'POST' },
+        )
+        if (buildRes.ok) {
+          const data: StructuredGrid = await buildRes.json()
+          setGrid(data)
+          setDirty(false)
+        }
        return
      }
      if (!res.ok) {
@@ -91,7 +110,50 @@ export function useGridEditor(sessionId: string | null) {
    } finally {
      setLoading(false)
    }
-  }, [sessionId, buildGrid])
+    // Only depends on sessionId — mode changes are handled by the
+    // separate useEffect below, not by re-triggering loadGrid.
+    // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [sessionId])
+
+  // Auto-rebuild when IPA or syllable mode changes (skip initial mount).
+  // We call the API directly with the new values instead of going through
+  // the buildGrid callback, which may still close over stale state due to
+  // React's asynchronous state batching.
+  const mountedRef = useRef(false)
+  useEffect(() => {
+    if (!mountedRef.current) {
+      // Skip the first trigger (component mount) — don't rebuild yet
+      mountedRef.current = true
+      return
+    }
+    if (!sessionId) return
+    const rebuild = async () => {
+      setLoading(true)
+      setError(null)
+      try {
+        const params = new URLSearchParams()
+        params.set('ipa_mode', ipaMode)
+        params.set('syllable_mode', syllableMode)
+        const res = await fetch(
+          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
+          { method: 'POST' },
+        )
+        if (!res.ok) {
+          const data = await res.json().catch(() => ({}))
+          throw new Error(data.detail || `HTTP ${res.status}`)
+        }
+        const data: StructuredGrid = await res.json()
+        setGrid(data)
+        setDirty(false)
+      } catch (e) {
+        setError(e instanceof Error ? e.message : String(e))
+      } finally {
+        setLoading(false)
+      }
+    }
+    rebuild()
+    // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [ipaMode, syllableMode])

  // ------------------------------------------------------------------
  // Save
@@ -915,5 +977,9 @@ export function useGridEditor(sessionId: string | null) {
    toggleSelectedBold,
    autoCorrectColumnPatterns,
    setCellColor,
+    ipaMode,
+    setIpaMode,
+    syllableMode,
+    setSyllableMode,
  }
 }
--- a/admin-lehrer/components/ocr-kombi/KombiStepper.tsx
+++ b/admin-lehrer/components/ocr-kombi/KombiStepper.tsx
@@ -0,0 +1,59 @@
+'use client'
+
+import type { PipelineStep } from '@/app/(admin)/ai/ocr-pipeline/types'
+
+interface KombiStepperProps {
+  steps: PipelineStep[]
+  currentStep: number
+  onStepClick: (index: number) => void
+}
+
+export function KombiStepper({ steps, currentStep, onStepClick }: KombiStepperProps) {
+  return (
+    <div className="flex items-center gap-0.5 px-3 py-2.5 bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 overflow-x-auto">
+      {steps.map((step, index) => {
+        const isActive = index === currentStep
+        const isCompleted = step.status === 'completed'
+        const isFailed = step.status === 'failed'
+        const isSkipped = step.status === 'skipped'
+        const isClickable = (index <= currentStep || isCompleted) && !isSkipped
+
+        return (
+          <div key={step.id} className="flex items-center flex-shrink-0">
+            {index > 0 && (
+              <div
+                className={`h-0.5 w-4 mx-0.5 ${
+                  isSkipped
+                    ? 'bg-gray-200 dark:bg-gray-700 border-t border-dashed border-gray-400'
+                    : index <= currentStep ? 'bg-teal-400' : 'bg-gray-300 dark:bg-gray-600'
+                }`}
+              />
+            )}
+            <button
+              onClick={() => isClickable && onStepClick(index)}
+              disabled={!isClickable}
+              className={`flex items-center gap-1 px-2 py-1 rounded-full text-xs font-medium transition-all whitespace-nowrap ${
+                isSkipped
+                  ? 'bg-gray-100 text-gray-400 dark:bg-gray-800 dark:text-gray-600 line-through'
+                  : isActive
+                    ? 'bg-teal-100 text-teal-700 dark:bg-teal-900/40 dark:text-teal-300 ring-2 ring-teal-400'
+                    : isCompleted
+                      ? 'bg-green-100 text-green-700 dark:bg-green-900/40 dark:text-green-300'
+                      : isFailed
+                        ? 'bg-red-100 text-red-700 dark:bg-red-900/40 dark:text-red-300'
+                        : 'text-gray-400 dark:text-gray-500'
+              } ${isClickable ? 'cursor-pointer hover:opacity-80' : 'cursor-default'}`}
+              title={step.name}
+            >
+              <span className="text-sm">
+                {isSkipped ? '-' : isCompleted ? '\u2713' : isFailed ? '\u2717' : step.icon}
+              </span>
+              <span className="hidden lg:inline">{step.name}</span>
+              <span className="lg:hidden">{index + 1}</span>
+            </button>
+          </div>
+        )
+      })}
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/SessionHeader.tsx
+++ b/admin-lehrer/components/ocr-kombi/SessionHeader.tsx
@@ -0,0 +1,73 @@
+'use client'
+
+import { useState } from 'react'
+import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-pipeline/types'
+
+interface SessionHeaderProps {
+  sessionName: string
+  activeCategory?: DocumentCategory
+  isGroundTruth: boolean
+  pageNumber?: number | null
+  onUpdateCategory: (category: DocumentCategory) => void
+}
+
+export function SessionHeader({
+  sessionName,
+  activeCategory,
+  isGroundTruth,
+  pageNumber,
+  onUpdateCategory,
+}: SessionHeaderProps) {
+  const [showCategoryPicker, setShowCategoryPicker] = useState(false)
+
+  const catInfo = DOCUMENT_CATEGORIES.find(c => c.value === activeCategory)
+
+  return (
+    <div className="relative flex items-center gap-3 text-sm text-gray-500 dark:text-gray-400">
+      <span>
+        Aktive Session:{' '}
+        <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span>
+      </span>
+      <button
+        onClick={() => setShowCategoryPicker(!showCategoryPicker)}
+        className={`text-xs px-2.5 py-1 rounded-full border transition-colors ${
+          activeCategory
+            ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300 hover:bg-teal-100'
+            : 'bg-amber-50 dark:bg-amber-900/20 border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300 hover:bg-amber-100 animate-pulse'
+        }`}
+      >
+        {catInfo ? `${catInfo.icon} ${catInfo.label}` : 'Kategorie setzen'}
+      </button>
+      {pageNumber != null && (
+        <span className="text-xs px-2 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 border border-gray-200 dark:border-gray-600 text-gray-600 dark:text-gray-300">
+          S. {pageNumber}
+        </span>
+      )}
+      {isGroundTruth && (
+        <span className="text-xs px-2 py-0.5 rounded-full bg-amber-50 dark:bg-amber-900/20 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
+          GT
+        </span>
+      )}
+      {showCategoryPicker && (
+        <div className="absolute left-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64">
+          {DOCUMENT_CATEGORIES.map(cat => (
+            <button
+              key={cat.value}
+              onClick={() => {
+                onUpdateCategory(cat.value)
+                setShowCategoryPicker(false)
+              }}
+              className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
+                activeCategory === cat.value
+                  ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
+                  : 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
+              }`}
+            >
+              {cat.icon} {cat.label}
+            </button>
+          ))}
+        </div>
+      )}
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/SessionList.tsx
+++ b/admin-lehrer/components/ocr-kombi/SessionList.tsx
@@ -0,0 +1,376 @@
+'use client'
+
+import { useState } from 'react'
+import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { SessionListItem, DocumentGroupView } from '@/app/(admin)/ai/ocr-kombi/useKombiPipeline'
+
+const KLAUSUR_API = '/klausur-api'
+
+interface SessionListProps {
+  items: (SessionListItem | DocumentGroupView)[]
+  loading: boolean
+  activeSessionId: string | null
+  onOpenSession: (sid: string) => void
+  onNewSession: () => void
+  onDeleteSession: (sid: string) => void
+  onRenameSession: (sid: string, newName: string) => void
+  onUpdateCategory: (sid: string, category: DocumentCategory) => void
+}
+
+function isGroup(item: SessionListItem | DocumentGroupView): item is DocumentGroupView {
+  return 'group_id' in item
+}
+
+export function SessionList({
+  items,
+  loading,
+  activeSessionId,
+  onOpenSession,
+  onNewSession,
+  onDeleteSession,
+  onRenameSession,
+  onUpdateCategory,
+}: SessionListProps) {
+  const [editingName, setEditingName] = useState<string | null>(null)
+  const [editNameValue, setEditNameValue] = useState('')
+  const [editingCategory, setEditingCategory] = useState<string | null>(null)
+  const [expandedGroups, setExpandedGroups] = useState<Set<string>>(new Set())
+
+  const toggleGroup = (groupId: string) => {
+    setExpandedGroups(prev => {
+      const next = new Set(prev)
+      if (next.has(groupId)) next.delete(groupId)
+      else next.add(groupId)
+      return next
+    })
+  }
+
+  return (
+    <div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4">
+      <div className="flex items-center justify-between mb-3">
+        <h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
+          Sessions ({items.length})
+        </h3>
+        <button
+          onClick={onNewSession}
+          className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
+        >
+          + Neue Session
+        </button>
+      </div>
+
+      {loading ? (
+        <div className="text-sm text-gray-400 py-2">Lade Sessions...</div>
+      ) : items.length === 0 ? (
+        <div className="text-sm text-gray-400 py-2">Noch keine Sessions vorhanden.</div>
+      ) : (
+        <div className="space-y-1.5 max-h-[320px] overflow-y-auto">
+          {items.map(item =>
+            isGroup(item) ? (
+              <GroupRow
+                key={item.group_id}
+                group={item}
+                expanded={expandedGroups.has(item.group_id)}
+                activeSessionId={activeSessionId}
+                onToggle={() => toggleGroup(item.group_id)}
+                onOpenSession={onOpenSession}
+                onDeleteSession={onDeleteSession}
+              />
+            ) : (
+              <SessionRow
+                key={item.id}
+                session={item}
+                isActive={activeSessionId === item.id}
+                editingName={editingName}
+                editNameValue={editNameValue}
+                editingCategory={editingCategory}
+                onOpenSession={() => onOpenSession(item.id)}
+                onStartRename={() => {
+                  setEditNameValue(item.name || item.filename)
+                  setEditingName(item.id)
+                }}
+                onFinishRename={(newName) => {
+                  onRenameSession(item.id, newName)
+                  setEditingName(null)
+                }}
+                onCancelRename={() => setEditingName(null)}
+                onEditNameChange={setEditNameValue}
+                onToggleCategory={() => setEditingCategory(editingCategory === item.id ? null : item.id)}
+                onUpdateCategory={(cat) => {
+                  onUpdateCategory(item.id, cat)
+                  setEditingCategory(null)
+                }}
+                onDelete={() => {
+                  if (confirm('Session loeschen?')) onDeleteSession(item.id)
+                }}
+              />
+            )
+          )}
+        </div>
+      )}
+    </div>
+  )
+}
+
+// ---- Group row (multi-page document) ----
+
+function GroupRow({
+  group,
+  expanded,
+  activeSessionId,
+  onToggle,
+  onOpenSession,
+  onDeleteSession,
+}: {
+  group: DocumentGroupView
+  expanded: boolean
+  activeSessionId: string | null
+  onToggle: () => void
+  onOpenSession: (sid: string) => void
+  onDeleteSession: (sid: string) => void
+}) {
+  const isActive = group.sessions.some(s => s.id === activeSessionId)
+
+  return (
+    <div>
+      <div
+        onClick={onToggle}
+        className={`flex items-center gap-3 px-3 py-2 rounded-lg text-sm cursor-pointer transition-colors ${
+          isActive
+            ? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
+            : 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
+        }`}
+      >
+        <span className="text-base">{expanded ? '\u25BC' : '\u25B6'}</span>
+        <div className="flex-1 min-w-0">
+          <div className="truncate font-medium text-gray-700 dark:text-gray-300">
+            {group.title}
+          </div>
+          <div className="text-xs text-gray-400">
+            {group.page_count} Seiten
+          </div>
+        </div>
+        <div className="flex items-center gap-1.5">
+          {group.sessions.some(s => s.is_ground_truth) && (
+            <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
+              GT {group.sessions.filter(s => s.is_ground_truth).length}/{group.sessions.length}
+            </span>
+          )}
+          <span className="text-xs px-2 py-0.5 rounded-full bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 text-blue-600 dark:text-blue-400">
+            Dokument
+          </span>
+        </div>
+      </div>
+
+      {expanded && (
+        <div className="ml-6 mt-1 space-y-1 border-l-2 border-gray-200 dark:border-gray-700 pl-3">
+          {group.sessions.map(s => (
+            <div
+              key={s.id}
+              className={`flex items-center gap-2 px-2 py-1.5 rounded text-xs cursor-pointer transition-colors ${
+                activeSessionId === s.id
+                  ? 'bg-teal-50 dark:bg-teal-900/30 text-teal-700 dark:text-teal-300'
+                  : 'hover:bg-gray-50 dark:hover:bg-gray-700/50 text-gray-600 dark:text-gray-400'
+              }`}
+              onClick={() => onOpenSession(s.id)}
+            >
+              {/* Thumbnail */}
+              <div className="flex-shrink-0 w-8 h-8 rounded overflow-hidden bg-gray-100 dark:bg-gray-700">
+                {/* eslint-disable-next-line @next/next/no-img-element */}
+                <img
+                  src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${s.id}/thumbnail?size=64`}
+                  alt=""
+                  className="w-full h-full object-cover"
+                  loading="lazy"
+                  onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
+                />
+              </div>
+              <span className="truncate flex-1">S. {s.page_number || '?'}</span>
+              {s.is_ground_truth && (
+                <span className="text-[9px] px-1 py-0.5 rounded bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">GT</span>
+              )}
+              <span className="text-[10px] text-gray-400">Step {s.current_step}</span>
+              <button
+                onClick={(e) => {
+                  e.stopPropagation()
+                  if (confirm('Seite loeschen?')) onDeleteSession(s.id)
+                }}
+                className="p-0.5 text-gray-400 hover:text-red-500"
+                title="Loeschen"
+              >
+                <svg className="w-3 h-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+                  <path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
+                </svg>
+              </button>
+            </div>
+          ))}
+        </div>
+      )}
+    </div>
+  )
+}
+
+// ---- Single session row ----
+
+function SessionRow({
+  session,
+  isActive,
+  editingName,
+  editNameValue,
+  editingCategory,
+  onOpenSession,
+  onStartRename,
+  onFinishRename,
+  onCancelRename,
+  onEditNameChange,
+  onToggleCategory,
+  onUpdateCategory,
+  onDelete,
+}: {
+  session: SessionListItem
+  isActive: boolean
+  editingName: string | null
+  editNameValue: string
+  editingCategory: string | null
+  onOpenSession: () => void
+  onStartRename: () => void
+  onFinishRename: (name: string) => void
+  onCancelRename: () => void
+  onEditNameChange: (val: string) => void
+  onToggleCategory: () => void
+  onUpdateCategory: (cat: DocumentCategory) => void
+  onDelete: () => void
+}) {
+  const catInfo = DOCUMENT_CATEGORIES.find(c => c.value === session.document_category)
+  const isEditing = editingName === session.id
+
+  return (
+    <div
+      className={`relative flex items-start gap-3 px-3 py-2.5 rounded-lg text-sm transition-colors cursor-pointer ${
+        isActive
+          ? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
+          : 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
+      }`}
+    >
+      {/* Thumbnail */}
+      <div
+        className="flex-shrink-0 w-12 h-12 rounded-md overflow-hidden bg-gray-100 dark:bg-gray-700"
+        onClick={onOpenSession}
+      >
+        {/* eslint-disable-next-line @next/next/no-img-element */}
+        <img
+          src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${session.id}/thumbnail?size=96`}
+          alt=""
+          className="w-full h-full object-cover"
+          loading="lazy"
+          onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
+        />
+      </div>
+
+      {/* Info */}
+      <div className="flex-1 min-w-0" onClick={onOpenSession}>
+        {isEditing ? (
+          <input
+            autoFocus
+            value={editNameValue}
+            onChange={(e) => onEditNameChange(e.target.value)}
+            onBlur={() => onFinishRename(editNameValue)}
+            onKeyDown={(e) => {
+              if (e.key === 'Enter') onFinishRename(editNameValue)
+              if (e.key === 'Escape') onCancelRename()
+            }}
+            onClick={(e) => e.stopPropagation()}
+            className="w-full px-1 py-0.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600"
+          />
+        ) : (
+          <div className="truncate font-medium text-gray-700 dark:text-gray-300">
+            {session.name || session.filename}
+          </div>
+        )}
+        <button
+          onClick={(e) => {
+            e.stopPropagation()
+            navigator.clipboard.writeText(session.id)
+            const btn = e.currentTarget
+            btn.textContent = 'Kopiert!'
+            setTimeout(() => { btn.textContent = `ID: ${session.id.slice(0, 8)}` }, 1500)
+          }}
+          className="text-[10px] font-mono text-gray-400 hover:text-teal-500 transition-colors"
+          title={`Volle ID: ${session.id} — Klick zum Kopieren`}
+        >
+          ID: {session.id.slice(0, 8)}
+        </button>
+        <div className="text-xs text-gray-400 mt-0.5">
+          {new Date(session.created_at).toLocaleDateString('de-DE', {
+            day: '2-digit', month: '2-digit', year: '2-digit',
+            hour: '2-digit', minute: '2-digit',
+          })}
+        </div>
+      </div>
+
+      {/* Category + GT badge */}
+      <div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
+        <button
+          onClick={onToggleCategory}
+          className={`text-[10px] px-1.5 py-0.5 rounded-full border transition-colors ${
+            catInfo
+              ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
+              : 'bg-gray-50 dark:bg-gray-700 border-gray-200 dark:border-gray-600 text-gray-400 hover:text-gray-600'
+          }`}
+          title="Kategorie setzen"
+        >
+          {catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
+        </button>
+        {session.is_ground_truth && (
+          <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300" title="Ground Truth markiert">
+            GT
+          </span>
+        )}
+      </div>
+
+      {/* Actions */}
+      <div className="flex flex-col gap-0.5 flex-shrink-0">
+        <button
+          onClick={(e) => { e.stopPropagation(); onStartRename() }}
+          className="p-1 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300"
+          title="Umbenennen"
+        >
+          <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+            <path strokeLinecap="round" strokeLinejoin="round" d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
+          </svg>
+        </button>
+        <button
+          onClick={(e) => { e.stopPropagation(); onDelete() }}
+          className="p-1 text-gray-400 hover:text-red-500"
+          title="Loeschen"
+        >
+          <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
+            <path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
+          </svg>
+        </button>
+      </div>
+
+      {/* Category dropdown */}
+      {editingCategory === session.id && (
+        <div
+          className="absolute right-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64"
+          onClick={(e) => e.stopPropagation()}
+        >
+          {DOCUMENT_CATEGORIES.map(cat => (
+            <button
+              key={cat.value}
+              onClick={() => onUpdateCategory(cat.value)}
+              className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
+                session.document_category === cat.value
+                  ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
+                  : 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
+              }`}
+            >
+              {cat.icon} {cat.label}
+            </button>
+          ))}
+        </div>
+      )}
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/SpreadsheetView.tsx
+++ b/admin-lehrer/components/ocr-kombi/SpreadsheetView.tsx
@@ -0,0 +1,241 @@
+'use client'
+
+/**
+ * SpreadsheetView — Fortune Sheet with multi-sheet support.
+ *
+ * Each zone (content + boxes) becomes its own Excel sheet tab,
+ * so each can have independent column widths optimized for its content.
+ */
+
+import { useMemo } from 'react'
+import dynamic from 'next/dynamic'
+
+const Workbook = dynamic(
+  () => import('@fortune-sheet/react').then((m) => m.Workbook),
+  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
+)
+
+import '@fortune-sheet/react/dist/index.css'
+
+import type { GridZone } from '@/components/grid-editor/types'
+
+interface SpreadsheetViewProps {
+  gridData: any
+  height?: number
+}
+
+/** No expansion — keep multi-line cells as single cells with \n and text-wrap. */
+
+/** Convert a single zone to a Fortune Sheet sheet object. */
+function zoneToSheet(zone: GridZone, sheetIndex: number, isFirst: boolean): any {
+  const isBox = zone.zone_type === 'box'
+  const boxColor = zone.box_bg_hex || ''
+
+  // Sheet name
+  let name: string
+  if (!isBox) {
+    name = 'Vokabeln'
+  } else {
+    const firstText = zone.cells?.[0]?.text ?? `Box ${sheetIndex}`
+    const cleaned = firstText.replace(/[^\w\s\u00C0-\u024F„"]/g, '').trim()
+    name = cleaned.length > 25 ? cleaned.slice(0, 25) + '…' : cleaned || `Box ${sheetIndex}`
+  }
+
+  const numCols = zone.columns?.length || 1
+  const numRows = zone.rows?.length || 0
+  const expandedCells = zone.cells || []
+
+  // Compute zone-wide median word height for font-size detection
+  const allWordHeights = expandedCells
+    .flatMap((c: any) => (c.word_boxes || []).map((wb: any) => wb.height || 0))
+    .filter((h: number) => h > 0)
+  const medianWordH = allWordHeights.length
+    ? [...allWordHeights].sort((a, b) => a - b)[Math.floor(allWordHeights.length / 2)]
+    : 0
+
+  // Build celldata
+  const celldata: any[] = []
+  const merges: Record<string, any> = {}
+
+  for (const cell of expandedCells) {
+    const r = cell.row_index
+    const c = cell.col_index
+    let text = cell.text ?? ''
+
+    // Row metadata
+    const row = zone.rows?.find((rr) => rr.index === r)
+    const isHeader = row?.is_header ?? false
+
+    // Font size detection from word_boxes
+    const avgWbH = cell.word_boxes?.length
+      ? cell.word_boxes.reduce((s: number, wb: any) => s + (wb.height || 0), 0) / cell.word_boxes.length
+      : 0
+    const isLargerFont = avgWbH > 0 && medianWordH > 0 && avgWbH > medianWordH * 1.3
+
+    const v: any = { v: text, m: text }
+
+    // Bold: headers, is_bold, larger font
+    if (cell.is_bold || isHeader || isLargerFont) {
+      v.bl = 1
+    }
+
+    // Larger font for box titles
+    if (isLargerFont && isBox) {
+      v.fs = 12
+    }
+
+    // Multi-line text (bullets with \n): enable text wrap + vertical top align
+    // Add bullet marker (•) if multi-line and no bullet present
+    if (text.includes('\n') && !isHeader) {
+      if (!text.startsWith('•') && !text.startsWith('-') && !text.startsWith('–') && r > 0) {
+        text = '• ' + text
+        v.v = text
+        v.m = text
+      }
+      v.tb = '2'  // text wrap
+      v.vt = 1    // vertical align: top (0 = middle, 1 = top, 2 = bottom)
+    }
+
+    // Header row background
+    if (isHeader) {
+      v.bg = isBox ? `${boxColor || '#2563eb'}18` : '#f0f4ff'
+    }
+
+    // Box cells: light tinted background
+    if (isBox && !isHeader && boxColor) {
+      v.bg = `${boxColor}08`
+    }
+
+    // Text color from OCR
+    const color = cell.color_override
+      ?? cell.word_boxes?.find((wb: any) => wb.color_name && wb.color_name !== 'black')?.color
+    if (color) v.fc = color
+
+    celldata.push({ r, c, v })
+
+    // Colspan → merge
+    const colspan = cell.colspan || 0
+    if (colspan > 1 || cell.col_type === 'spanning_header') {
+      const cs = colspan > 1 ? colspan : numCols
+      merges[`${r}_${c}`] = { r, c, rs: 1, cs }
+    }
+  }
+
+  // Column widths — auto-fit based on longest text
+  const columnlen: Record<string, number> = {}
+  for (const col of (zone.columns || [])) {
+    const colCells = expandedCells.filter(
+      (c: any) => c.col_index === col.index && c.col_type !== 'spanning_header'
+    )
+    let maxTextLen = 0
+    for (const c of colCells) {
+      const len = (c.text ?? '').length
+      if (len > maxTextLen) maxTextLen = len
+    }
+    const autoWidth = Math.max(60, maxTextLen * 7.5 + 16)
+    const pxW = (col.x_max_px ?? 0) - (col.x_min_px ?? 0)
+    const scaledPxW = Math.max(60, Math.round(pxW * (numCols <= 2 ? 0.6 : 0.4)))
+    columnlen[String(col.index)] = Math.round(Math.max(autoWidth, scaledPxW))
+  }
+
+  // Row heights — taller for multi-line cells
+  const rowlen: Record<string, number> = {}
+  for (const row of (zone.rows || [])) {
+    const rowCells = expandedCells.filter((c: any) => c.row_index === row.index)
+    const maxLines = Math.max(1, ...rowCells.map((c: any) => (c.text ?? '').split('\n').length))
+    const baseH = 24
+    rowlen[String(row.index)] = Math.max(baseH, baseH * maxLines)
+  }
+
+  // Border info
+  const borderInfo: any[] = []
+
+  // Box: colored outside border
+  if (isBox && boxColor && numRows > 0 && numCols > 0) {
+    borderInfo.push({
+      rangeType: 'range',
+      borderType: 'border-outside',
+      color: boxColor,
+      style: 5,
+      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
+    })
+    borderInfo.push({
+      rangeType: 'range',
+      borderType: 'border-inside',
+      color: `${boxColor}40`,
+      style: 1,
+      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
+    })
+  }
+
+  // Content zone: light grid lines
+  if (!isBox && numRows > 0 && numCols > 0) {
+    borderInfo.push({
+      rangeType: 'range',
+      borderType: 'border-all',
+      color: '#e5e7eb',
+      style: 1,
+      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
+    })
+  }
+
+  return {
+    name,
+    id: `zone_${zone.zone_index}`,
+    celldata,
+    row: numRows,
+    column: Math.max(numCols, 1),
+    status: isFirst ? 1 : 0,
+    color: isBox ? boxColor : undefined,
+    config: {
+      merge: Object.keys(merges).length > 0 ? merges : undefined,
+      columnlen,
+      rowlen,
+      borderInfo: borderInfo.length > 0 ? borderInfo : undefined,
+    },
+  }
+}
+
+export function SpreadsheetView({ gridData, height = 600 }: SpreadsheetViewProps) {
+  const sheets = useMemo(() => {
+    if (!gridData?.zones) return []
+
+    const sorted = [...gridData.zones].sort((a: GridZone, b: GridZone) => {
+      if (a.zone_type === 'content' && b.zone_type !== 'content') return -1
+      if (a.zone_type !== 'content' && b.zone_type === 'content') return 1
+      return (a.bbox_px?.y ?? 0) - (b.bbox_px?.y ?? 0)
+    })
+
+    return sorted
+      .filter((z: GridZone) => z.cells && z.cells.length > 0)
+      .map((z: GridZone, i: number) => zoneToSheet(z, i, i === 0))
+  }, [gridData])
+
+  const maxRows = Math.max(0, ...sheets.map((s: any) => s.row || 0))
+  const estimatedHeight = Math.max(height, maxRows * 26 + 80)
+
+  if (sheets.length === 0) {
+    return <div className="p-4 text-center text-gray-400">Keine Daten für Spreadsheet.</div>
+  }
+
+  return (
+    <div style={{ width: '100%', height: `${estimatedHeight}px` }}>
+      <Workbook
+        data={sheets}
+        lang="en"
+        showToolbar
+        showFormulaBar={false}
+        showSheetTabs
+        toolbarItems={[
+          'undo', 'redo', '|',
+          'font-bold', 'font-italic', 'font-strikethrough', '|',
+          'font-color', 'background', '|',
+          'font-size', '|',
+          'horizontal-align', 'vertical-align', '|',
+          'text-wrap', 'merge-cell', '|',
+          'border',
+        ]}
+      />
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/StepAnsicht.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepAnsicht.tsx
@@ -0,0 +1,110 @@
+'use client'
+
+/**
+ * StepAnsicht — Excel-like Spreadsheet View.
+ *
+ * Left:  Original scan with OCR word overlay
+ * Right: Fortune Sheet spreadsheet with multi-sheet tabs per zone
+ */
+
+import { useEffect, useRef, useState } from 'react'
+import dynamic from 'next/dynamic'
+
+const SpreadsheetView = dynamic(
+  () => import('./SpreadsheetView').then((m) => m.SpreadsheetView),
+  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
+)
+
+const KLAUSUR_API = '/klausur-api'
+
+interface StepAnsichtProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+export function StepAnsicht({ sessionId, onNext }: StepAnsichtProps) {
+  const [gridData, setGridData] = useState<any>(null)
+  const [loading, setLoading] = useState(true)
+  const [error, setError] = useState<string | null>(null)
+  const leftRef = useRef<HTMLDivElement>(null)
+  const [leftHeight, setLeftHeight] = useState(600)
+
+  // Load grid data on mount
+  useEffect(() => {
+    if (!sessionId) { setLoading(false); return } // no session: clear the spinner so the fallback below renders
+    ;(async () => {
+      try {
+        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`)
+        if (!res.ok) throw new Error(`HTTP ${res.status}`)
+        setGridData(await res.json())
+      } catch (e) {
+        setError(e instanceof Error ? e.message : 'Fehler beim Laden')
+      } finally {
+        setLoading(false)
+      }
+    })()
+  }, [sessionId])
+
+  // Track left panel height (re-run once the panel mounts after loading; on first mount the ref is still null)
+  useEffect(() => {
+    if (!leftRef.current) return
+    const ro = new ResizeObserver(([e]) => setLeftHeight(e.contentRect.height))
+    ro.observe(leftRef.current)
+    return () => ro.disconnect()
+  }, [loading, gridData])
+
+  if (loading) {
+    return (
+      <div className="flex items-center justify-center py-16">
+        <div className="w-8 h-8 border-4 border-teal-500 border-t-transparent rounded-full animate-spin" />
+        <span className="ml-3 text-gray-500">Lade Spreadsheet...</span>
+      </div>
+    )
+  }
+
+  if (error || !gridData) {
+    return (
+      <div className="p-8 text-center">
+        <p className="text-red-500 mb-4">{error || 'Keine Grid-Daten.'}</p>
+        <button onClick={onNext} className="px-5 py-2 bg-teal-600 text-white rounded-lg">Weiter →</button>
+      </div>
+    )
+  }
+
+  return (
+    <div className="space-y-3">
+      {/* Header */}
+      <div className="flex items-center justify-between">
+        <div>
+          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">Ansicht — Spreadsheet</h3>
+          <p className="text-sm text-gray-500 dark:text-gray-400">
+            Jede Zone als eigenes Sheet-Tab. Spaltenbreiten pro Sheet optimiert.
+          </p>
+        </div>
+        <button onClick={onNext} className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 text-sm font-medium">
+          Weiter →
+        </button>
+      </div>
+
+      {/* Split view */}
+      <div className="flex gap-2">
+        {/* LEFT: Original + OCR overlay */}
+        <div ref={leftRef} className="w-1/3 border border-gray-300 dark:border-gray-600 rounded-lg overflow-hidden bg-white dark:bg-gray-900 flex-shrink-0">
+          <div className="px-2 py-1 bg-black/60 text-white text-[10px] font-medium">Original + OCR</div>
+          {sessionId && (
+            <img
+              src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/words-overlay`}
+              alt="Original + OCR"
+              className="w-full h-auto"
+            />
+          )}
+        </div>
+
+        {/* RIGHT: Fortune Sheet — height adapts to content */}
+        <div className="flex-1 border border-gray-300 dark:border-gray-600 rounded-lg overflow-hidden bg-white dark:bg-gray-900">
+          <SpreadsheetView gridData={gridData} height={Math.max(700, leftHeight)} />
+        </div>
+      </div>
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx
@@ -0,0 +1,283 @@
+'use client'
+
+import { useCallback, useEffect, useRef, useState } from 'react'
+import { useGridEditor } from '@/components/grid-editor/useGridEditor'
+import type { GridZone } from '@/components/grid-editor/types'
+import { GridTable } from '@/components/grid-editor/GridTable'
+
+const KLAUSUR_API = '/klausur-api'
+
+type BoxLayoutType = 'flowing' | 'columnar' | 'bullet_list' | 'header_only'
+
+const LAYOUT_LABELS: Record<BoxLayoutType, string> = {
+  flowing: 'Fließtext',
+  columnar: 'Tabelle/Spalten',
+  bullet_list: 'Aufzählung',
+  header_only: 'Überschrift',
+}
+
+interface StepBoxGridReviewProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+export function StepBoxGridReview({ sessionId, onNext }: StepBoxGridReviewProps) {
+  const {
+    grid,
+    loading,
+    saving,
+    error,
+    dirty,
+    selectedCell,
+    setSelectedCell,
+    loadGrid,
+    saveGrid,
+    updateCellText,
+    toggleColumnBold,
+    toggleRowHeader,
+    undo,
+    redo,
+    canUndo,
+    canRedo,
+    getAdjacentCell,
+    commitUndoPoint,
+    selectedCells,
+    toggleCellSelection,
+    clearCellSelection,
+    toggleSelectedBold,
+    setCellColor,
+    deleteColumn,
+    addColumn,
+    deleteRow,
+    addRow,
+  } = useGridEditor(sessionId)
+
+  const [building, setBuilding] = useState(false)
+  const [buildError, setBuildError] = useState<string | null>(null)
+
+  // Load grid on mount
+  useEffect(() => {
+    if (sessionId) loadGrid()
+  }, [sessionId]) // eslint-disable-line react-hooks/exhaustive-deps
+
+  // Get box zones
+  const boxZones: GridZone[] = (grid?.zones || []).filter(
+    (z: GridZone) => z.zone_type === 'box'
+  )
+
+  // Build box grids via backend
+  const buildBoxGrids = useCallback(async (overrides?: Record<string, string>) => {
+    if (!sessionId) return
+    setBuilding(true)
+    setBuildError(null)
+    try {
+      const res = await fetch(
+        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-box-grids`,
+        {
+          method: 'POST',
+          headers: { 'Content-Type': 'application/json' },
+          body: JSON.stringify({ overrides: overrides || {} }),
+        },
+      )
+      if (!res.ok) {
+        const data = await res.json().catch(() => ({}))
+        throw new Error(data.detail || `HTTP ${res.status}`)
+      }
+      await loadGrid()
+    } catch (e) {
+      setBuildError(e instanceof Error ? e.message : String(e))
+    } finally {
+      setBuilding(false)
+    }
+  }, [sessionId, loadGrid])
+
+  // Handle layout type change for a specific box zone
+  const changeLayoutType = useCallback(async (boxIdx: number, layoutType: string) => {
+    await buildBoxGrids({ [String(boxIdx)]: layoutType })
+  }, [buildBoxGrids])
+
+  // Auto-build once on first load if box zones have no cells
+  const autoBuildDone = useRef(false)
+  useEffect(() => {
+    if (!grid || loading || building || autoBuildDone.current) return
+    const needsBuild = boxZones.some(z => !z.cells || z.cells.length === 0)
+    if (needsBuild && sessionId) {
+      autoBuildDone.current = true
+      buildBoxGrids()
+    }
+  }, [grid, loading]) // eslint-disable-line react-hooks/exhaustive-deps
+
+  if (loading) {
+    return (
+      <div className="flex items-center justify-center py-16">
+        <div className="w-8 h-8 border-4 border-teal-500 border-t-transparent rounded-full animate-spin" />
+        <span className="ml-3 text-gray-500">Lade Grid...</span>
+      </div>
+    )
+  }
+
+  // No boxes after build attempt — skip step
+  if (!building && boxZones.length === 0) {
+    return (
+      <div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-8 text-center">
+        <div className="text-4xl mb-3">📦</div>
+        <h3 className="text-lg font-semibold text-gray-900 dark:text-white mb-2">
+          Keine Boxen erkannt
+        </h3>
+        <p className="text-gray-500 dark:text-gray-400 mb-6">
+          Auf dieser Seite wurden keine eingebetteten Boxen (Grammatik-Tipps, Übungen etc.) erkannt.
+        </p>
+        <button
+          onClick={onNext}
+          className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium"
+        >
+          Weiter →
+        </button>
+      </div>
+    )
+  }
+
+  return (
+    <div className="space-y-4">
+      {/* Header */}
+      <div className="flex items-center justify-between">
+        <div>
+          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">
+            Box-Review ({boxZones.length} {boxZones.length === 1 ? 'Box' : 'Boxen'})
+          </h3>
+          <p className="text-sm text-gray-500 dark:text-gray-400">
+            Eingebettete Boxen prüfen und korrigieren. Layout-Typ kann pro Box angepasst werden.
+          </p>
+        </div>
+        <div className="flex items-center gap-2">
+          {dirty && (
+            <button
+              onClick={saveGrid}
+              disabled={saving}
+              className="px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition-colors text-sm font-medium disabled:opacity-50"
+            >
+              {saving ? 'Speichere...' : 'Speichern'}
+            </button>
+          )}
+          <button
+            onClick={() => buildBoxGrids()}
+            disabled={building}
+            className="px-4 py-2 bg-amber-600 text-white rounded-lg hover:bg-amber-700 transition-colors text-sm font-medium disabled:opacity-50"
+          >
+            {building ? 'Verarbeite...' : 'Alle Boxen neu aufbauen'}
+          </button>
+          <button
+            onClick={async () => {
+              if (dirty) await saveGrid()
+              onNext()
+            }}
+            className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm font-medium"
+          >
+            Weiter →
+          </button>
+        </div>
+      </div>
+
+      {/* Errors */}
+      {(error || buildError) && (
+        <div className="p-3 bg-red-50 dark:bg-red-900/30 border border-red-200 dark:border-red-800 rounded-lg text-red-700 dark:text-red-300 text-sm">
+          {error || buildError}
+        </div>
+      )}
+
+      {building && (
+        <div className="flex items-center gap-3 p-4 bg-amber-50 dark:bg-amber-900/20 border border-amber-200 dark:border-amber-800 rounded-lg">
+          <div className="w-5 h-5 border-2 border-amber-500 border-t-transparent rounded-full animate-spin" />
+          <span className="text-amber-700 dark:text-amber-300 text-sm">Box-Grids werden aufgebaut...</span>
+        </div>
+      )}
+
+      {/* Box zones */}
+      {boxZones.map((zone, boxIdx) => {
+        const boxColor = zone.box_bg_hex || '#d97706' // amber fallback
+        const boxColorName = zone.box_bg_color || 'box'
+        return (
+        <div
+          key={zone.zone_index}
+          className="bg-white dark:bg-gray-800 rounded-xl overflow-hidden"
+          style={{ border: `3px solid ${boxColor}` }}
+        >
+          {/* Box header */}
+          <div
+            className="flex items-center justify-between px-4 py-3 border-b"
+            style={{ backgroundColor: `${boxColor}15`, borderColor: `${boxColor}30` }}
+          >
+            <div className="flex items-center gap-3">
+              <div
+                className="w-8 h-8 rounded-lg flex items-center justify-center text-white text-sm font-bold"
+                style={{ backgroundColor: boxColor }}
+              >
+                {boxIdx + 1}
+              </div>
+              <div>
+                <span className="font-medium text-gray-900 dark:text-white">
+                  Box {boxIdx + 1}
+                </span>
+                <span className="text-xs text-gray-500 dark:text-gray-400 ml-2">
+                  {zone.bbox_px?.w}x{zone.bbox_px?.h}px
+                  {zone.cells?.length ? ` | ${zone.cells.length} Zellen` : ''}
+                  {zone.box_layout_type ? ` | ${LAYOUT_LABELS[zone.box_layout_type as BoxLayoutType] || zone.box_layout_type}` : ''}
+                  {boxColorName !== 'box' ? ` | ${boxColorName}` : ''}
+                </span>
+              </div>
+            </div>
+            <div className="flex items-center gap-2">
+              <label className="text-xs text-gray-500 dark:text-gray-400">Layout:</label>
+              <select
+                value={zone.box_layout_type || 'flowing'}
+                onChange={(e) => changeLayoutType(boxIdx, e.target.value)}
+                disabled={building}
+                className="text-xs px-2 py-1 rounded border border-gray-300 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200"
+              >
+                {Object.entries(LAYOUT_LABELS).map(([key, label]) => (
+                  <option key={key} value={key}>{label}</option>
+                ))}
+              </select>
+            </div>
+          </div>
+
+          {/* Box grid table */}
+          <div className="p-3">
+            {zone.cells && zone.cells.length > 0 ? (
+              <GridTable
+                zone={zone}
+                selectedCell={selectedCell}
+                selectedCells={selectedCells}
+                onSelectCell={setSelectedCell}
+                onCellTextChange={updateCellText}
+                onToggleColumnBold={toggleColumnBold}
+                onToggleRowHeader={toggleRowHeader}
+                onNavigate={(cellId, dir) => {
+                  const next = getAdjacentCell(cellId, dir)
+                  if (next) setSelectedCell(next)
+                }}
+                onDeleteColumn={deleteColumn}
+                onAddColumn={addColumn}
+                onDeleteRow={deleteRow}
+                onAddRow={addRow}
+                onToggleCellSelection={toggleCellSelection}
+                onSetCellColor={setCellColor}
+              />
+            ) : (
+              <div className="text-center py-8 text-gray-400">
+                <p className="text-sm">Keine Zellen erkannt.</p>
+                <button
+                  onClick={() => buildBoxGrids({ [String(boxIdx)]: 'flowing' })}
+                  className="mt-2 text-xs text-amber-600 hover:text-amber-700"
+                >
+                  Als Fließtext verarbeiten
+                </button>
+              </div>
+            )}
+          </div>
+        </div>
+        )
+      })}
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/StepContentCrop.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepContentCrop.tsx
@@ -0,0 +1,13 @@
+'use client'
+
+import { StepCrop as BaseStepCrop } from '@/components/ocr-pipeline/StepCrop'
+
+interface StepContentCropProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/** Thin wrapper around the shared StepCrop component */
+export function StepContentCrop({ sessionId, onNext }: StepContentCropProps) {
+  return <BaseStepCrop key={sessionId} sessionId={sessionId} onNext={onNext} />
+}
--- a/admin-lehrer/components/ocr-kombi/StepDeskew.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepDeskew.tsx
@@ -0,0 +1,13 @@
+'use client'
+
+import { StepDeskew as BaseStepDeskew } from '@/components/ocr-pipeline/StepDeskew'
+
+interface StepDeskewProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/** Thin wrapper around the shared StepDeskew component */
+export function StepDeskew({ sessionId, onNext }: StepDeskewProps) {
+  return <BaseStepDeskew key={sessionId} sessionId={sessionId} onNext={onNext} />
+}
--- a/admin-lehrer/components/ocr-kombi/StepDewarp.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepDewarp.tsx
@@ -0,0 +1,13 @@
+'use client'
+
+import { StepDewarp as BaseStepDewarp } from '@/components/ocr-pipeline/StepDewarp'
+
+interface StepDewarpProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/** Thin wrapper around the shared StepDewarp component */
+export function StepDewarp({ sessionId, onNext }: StepDewarpProps) {
+  return <BaseStepDewarp key={sessionId} sessionId={sessionId} onNext={onNext} />
+}
--- a/admin-lehrer/components/ocr-kombi/StepGridBuild.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGridBuild.tsx
@@ -0,0 +1,117 @@
+'use client'
+
+import { useState, useEffect } from 'react'
+
+const KLAUSUR_API = '/klausur-api'
+
+interface StepGridBuildProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/**
+ * Step 8: Grid Build.
+ * Triggers the build-grid endpoint and shows progress.
+ */
+export function StepGridBuild({ sessionId, onNext }: StepGridBuildProps) {
+  const [building, setBuilding] = useState(false)
+  const [result, setResult] = useState<{ rows: number; cols: number; cells: number } | null>(null)
+  const [error, setError] = useState('')
+  const [autoTriggered, setAutoTriggered] = useState(false)
+
+  useEffect(() => {
+    if (!sessionId || autoTriggered) return
+    // Check if grid already exists
+    checkExistingGrid()
+  // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [sessionId])
+
+  const checkExistingGrid = async () => {
+    if (!sessionId) return
+    try {
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`)
+      if (res.ok) {
+        const data = await res.json()
+        // Use grid-editor summary (accurate zone-based counts)
+        const summary = data.summary
+        if (summary) {
+          setResult({ rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 })
+          return
+        }
+      }
+    } catch { /* no existing grid */ }
+
+    // Auto-trigger build
+    setAutoTriggered(true)
+    buildGrid()
+  }
+
+  const buildGrid = async () => {
+    if (!sessionId) return
+    setBuilding(true)
+    setError('')
+    try {
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid`, {
+        method: 'POST',
+      })
+      if (!res.ok) {
+        const data = await res.json().catch(() => ({}))
+        throw new Error(data.detail || `Grid-Build fehlgeschlagen (${res.status})`)
+      }
+      const data = await res.json()
+      // Use grid-editor summary (zone-based, more accurate than word_result.grid_shape)
+      const summary = data.summary
+      if (summary) {
+        setResult({ rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 })
+      } else {
+        const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
+        setResult({ rows: shape.rows, cols: shape.cols, cells: shape.total_cells })
+      }
+    } catch (e) {
+      setError(e instanceof Error ? e.message : String(e))
+    } finally {
+      setBuilding(false)
+    }
+  }
+
+  return (
+    <div className="space-y-4">
+      {building && (
+        <div className="flex items-center gap-3 p-6 bg-blue-50 dark:bg-blue-900/20 rounded-xl border border-blue-200 dark:border-blue-800">
+          <div className="animate-spin w-5 h-5 border-2 border-blue-400 border-t-transparent rounded-full" />
+          <span className="text-sm text-blue-600 dark:text-blue-400">Grid wird aufgebaut...</span>
+        </div>
+      )}
+
+      {result && (
+        <div className="space-y-3">
+          <div className="p-4 bg-green-50 dark:bg-green-900/20 rounded-xl border border-green-200 dark:border-green-800">
+            <div className="text-sm font-medium text-green-700 dark:text-green-300">
+              Grid erstellt: {result.rows} Zeilen, {result.cols} Spalten, {result.cells} Zellen
+            </div>
+          </div>
+          <button
+            onClick={onNext}
+            className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
+          >
+            Weiter zum Review
+          </button>
+        </div>
+      )}
+
+      {error && (
+        <div className="space-y-3">
+          <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
+            {error}
+          </div>
+          <button
+            onClick={buildGrid}
+            className="px-4 py-2 bg-orange-600 text-white text-sm rounded-lg hover:bg-orange-700"
+          >
+            Erneut versuchen
+          </button>
+        </div>
+      )}
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/StepGridReview.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGridReview.tsx
@@ -0,0 +1,15 @@
+'use client'
+
+import { StepGridReview as BaseStepGridReview } from '@/components/ocr-pipeline/StepGridReview'
+import type { MutableRefObject } from 'react'
+
+interface StepGridReviewProps {
+  sessionId: string | null
+  onNext: () => void
+  saveRef: MutableRefObject<(() => Promise<void>) | null>
+}
+
+/** Thin wrapper around the shared StepGridReview component */
+export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewProps) {
+  return <BaseStepGridReview sessionId={sessionId} onNext={onNext} saveRef={saveRef} />
+}
--- a/admin-lehrer/components/ocr-kombi/StepGroundTruth.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGroundTruth.tsx
@@ -0,0 +1,295 @@
+'use client'
+
+import { useCallback, useEffect, useRef, useState } from 'react'
+import { useGridEditor } from '@/components/grid-editor/useGridEditor'
+import { GridTable } from '@/components/grid-editor/GridTable'
+import { ImageLayoutEditor } from '@/components/grid-editor/ImageLayoutEditor'
+import type { GridZone } from '@/components/grid-editor/types'
+
+const KLAUSUR_API = '/klausur-api'
+
+interface StepGroundTruthProps {
+  sessionId: string | null
+  isGroundTruth: boolean
+  onMarked: () => void
+  gridSaveRef: React.MutableRefObject<(() => Promise<void>) | null>
+}
+
+/**
+ * Step 13: Ground Truth marking.
+ *
+ * Shows the full Grid-Review view (original image + table) so the user
+ * can verify the final result before marking as Ground Truth reference.
+ */
+export function StepGroundTruth({ sessionId, isGroundTruth, onMarked, gridSaveRef }: StepGroundTruthProps) {
+  const {
+    grid,
+    loading,
+    saving,
+    error,
+    dirty,
+    selectedCell,
+    selectedCells,
+    setSelectedCell,
+    loadGrid,
+    saveGrid,
+    updateCellText,
+    toggleColumnBold,
+    toggleRowHeader,
+    undo,
+    redo,
+    canUndo,
+    canRedo,
+    getAdjacentCell,
+    deleteColumn,
+    addColumn,
+    deleteRow,
+    addRow,
+    toggleCellSelection,
+    clearCellSelection,
+    toggleSelectedBold,
+    setCellColor,
+  } = useGridEditor(sessionId)
+
+  const [showImage, setShowImage] = useState(true)
+  const [zoom, setZoom] = useState(100)
+  const [markSaving, setMarkSaving] = useState(false)
+  const [message, setMessage] = useState('')
+
+  // Expose save function via ref
+  useEffect(() => {
+    if (gridSaveRef) {
+      gridSaveRef.current = async () => {
+        if (dirty) await saveGrid()
+      }
+      return () => { gridSaveRef.current = null }
+    }
+  }, [gridSaveRef, dirty, saveGrid])
+
+  // Load grid on mount
+  useEffect(() => {
+    if (sessionId) loadGrid()
+  }, [sessionId, loadGrid])
+
+  // Keyboard shortcuts
+  useEffect(() => {
+    const handler = (e: KeyboardEvent) => {
+      if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === 'z' && !e.shiftKey) {
+        e.preventDefault(); undo()
+      } else if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === 'z' && e.shiftKey) { // with Shift held, e.key is usually 'Z'
+        e.preventDefault(); redo()
+      } else if ((e.metaKey || e.ctrlKey) && e.key === 's') {
+        e.preventDefault(); saveGrid()
+      } else if ((e.metaKey || e.ctrlKey) && e.key === 'b') {
+        e.preventDefault()
+        if (selectedCells.size > 0) toggleSelectedBold()
+      } else if (e.key === 'Escape') {
+        clearCellSelection()
+      }
+    }
+    window.addEventListener('keydown', handler)
+    return () => window.removeEventListener('keydown', handler)
+  }, [undo, redo, saveGrid, selectedCells, toggleSelectedBold, clearCellSelection])
+
+  const handleNavigate = useCallback(
+    (cellId: string, direction: 'up' | 'down' | 'left' | 'right') => {
+      const target = getAdjacentCell(cellId, direction)
+      if (target) {
+        setSelectedCell(target)
+        setTimeout(() => {
+          const el = document.getElementById(`cell-${target}`)
+          if (el) {
+            el.focus()
+            if (el instanceof HTMLInputElement) el.select()
+          }
+        }, 0)
+      }
+    },
+    [getAdjacentCell, setSelectedCell],
+  )
+
+  const handleMark = async () => {
+    if (!sessionId) return
+    setMarkSaving(true)
+    setMessage('')
+    try {
+      if (dirty) await saveGrid()
+      const res = await fetch(
+        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/mark-ground-truth?pipeline=kombi`,
+        { method: 'POST' },
+      )
+      if (!res.ok) {
+        const body = await res.text().catch(() => '')
+        throw new Error(`Ground Truth fehlgeschlagen (${res.status}): ${body}`)
+      }
+      const data = await res.json()
+      setMessage(`Ground Truth gespeichert (${data.cells_saved} Zellen)`)
+      onMarked()
+    } catch (e) {
+      setMessage(e instanceof Error ? e.message : String(e))
+    } finally {
+      setMarkSaving(false)
+    }
+  }
+
+  if (!sessionId) {
+    return <div className="text-center py-12 text-gray-400">Keine Session ausgewählt.</div>
+  }
+
+  if (loading) {
+    return (
+      <div className="flex items-center justify-center py-16">
+        <div className="flex items-center gap-3 text-gray-500 dark:text-gray-400">
+          <svg className="w-5 h-5 animate-spin" fill="none" viewBox="0 0 24 24">
+            <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
+            <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
+          </svg>
+          Grid wird geladen...
+        </div>
+      </div>
+    )
+  }
+
+  if (error) {
+    return (
+      <div className="bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg p-4">
+        <p className="text-sm text-red-700 dark:text-red-300">Fehler: {error}</p>
+      </div>
+    )
+  }
+
+  if (!grid || !grid.zones.length) {
+    return <div className="text-center py-12 text-gray-400">Kein Grid vorhanden.</div>
+  }
+
+  const imageUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/cropped`
+
+  return (
+    <div className="space-y-3">
+      {/* GT Header Bar */}
+      <div className="flex items-center justify-between p-3 bg-amber-50 dark:bg-amber-900/10 rounded-xl border border-amber-200 dark:border-amber-800">
+        <div>
+          <h3 className="text-sm font-medium text-amber-700 dark:text-amber-300">
+            Ground Truth
+            {isGroundTruth && <span className="ml-2 text-xs font-normal text-amber-500">(bereits markiert)</span>}
+          </h3>
+          <p className="text-xs text-amber-600 dark:text-amber-400 mt-0.5">
+            Prüfen Sie das Ergebnis und markieren Sie es als Referenz für Regressionstests.
+          </p>
+        </div>
+        <div className="flex items-center gap-2">
+          {dirty && (
+            <button
+              onClick={saveGrid}
+              disabled={saving}
+              className="px-3 py-1.5 text-xs bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
+            >
+              {saving ? 'Speichere...' : 'Speichern'}
+            </button>
+          )}
+          <button
+            onClick={handleMark}
+            disabled={markSaving}
+            className="px-4 py-1.5 text-xs bg-amber-600 text-white rounded-lg hover:bg-amber-700 disabled:opacity-50"
+          >
+            {markSaving ? 'Speichere...' : isGroundTruth ? 'GT aktualisieren' : 'Als Ground Truth markieren'}
+          </button>
+        </div>
+      </div>
+
+      {message && (
+        <div className={`text-sm p-2 rounded ${message.includes('fehlgeschlagen') ? 'text-red-500 bg-red-50 dark:bg-red-900/20' : 'text-amber-600 dark:text-amber-400 bg-amber-50 dark:bg-amber-900/10'}`}>
+          {message}
+        </div>
+      )}
+
+      {/* Stats */}
+      <div className="flex items-center gap-4 text-xs flex-wrap">
+        <span className="text-gray-500 dark:text-gray-400">
+          {grid.summary.total_zones} Zone(n), {grid.summary.total_columns} Spalten,{' '}
+          {grid.summary.total_rows} Zeilen, {grid.summary.total_cells} Zellen
+        </span>
+        <button
+          onClick={() => setShowImage(!showImage)}
+          className={`px-2.5 py-1 rounded text-xs border transition-colors ${
+            showImage
+              ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
+              : 'bg-gray-50 dark:bg-gray-800 border-gray-200 dark:border-gray-700 text-gray-500 dark:text-gray-400'
+          }`}
+        >
+          {showImage ? 'Bild ausblenden' : 'Bild einblenden'}
+        </button>
+      </div>
+
+      {/* Split View: Image left + Grid right */}
+      <div className={showImage ? 'grid grid-cols-2 gap-3' : ''} style={{ minHeight: '55vh' }}>
+        {showImage && (
+          <ImageLayoutEditor
+            imageUrl={imageUrl}
+            zones={grid.zones}
+            imageWidth={grid.image_width}
+            layoutDividers={grid.layout_dividers}
+            zoom={zoom}
+            onZoomChange={setZoom}
+            onColumnDividerMove={() => {}}
+            onHorizontalsChange={() => {}}
+            onCommitUndo={() => {}}
+            onSplitColumnAt={() => {}}
+            onDeleteColumn={() => {}}
+          />
+        )}
+
+        <div className="space-y-3">
+          {(() => {
+            const groups: GridZone[][] = []
+            for (const zone of grid.zones) {
+              const prev = groups[groups.length - 1]
+              if (prev && zone.vsplit_group != null && prev[0].vsplit_group === zone.vsplit_group) {
+                prev.push(zone)
+              } else {
+                groups.push([zone])
+              }
+            }
+            return groups.map((group) => (
+              <div key={group[0].vsplit_group ?? group[0].zone_index}>
+                <div className={`${group.length > 1 ? 'flex gap-2' : ''}`}>
+                  {group.map((zone) => (
+                    <div
+                      key={zone.zone_index}
+                      className={`${group.length > 1 ? 'flex-1 min-w-0' : ''} bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700`}
+                    >
+                      <GridTable
+                        zone={zone}
+                        layoutMetrics={grid.layout_metrics}
+                        selectedCell={selectedCell}
+                        selectedCells={selectedCells}
+                        onSelectCell={setSelectedCell}
+                        onToggleCellSelection={toggleCellSelection}
+                        onCellTextChange={updateCellText}
+                        onToggleColumnBold={toggleColumnBold}
+                        onToggleRowHeader={toggleRowHeader}
+                        onNavigate={handleNavigate}
+                        onDeleteColumn={deleteColumn}
+                        onAddColumn={addColumn}
+                        onDeleteRow={deleteRow}
+                        onAddRow={addRow}
+                        onSetCellColor={setCellColor}
+                      />
+                    </div>
+                  ))}
+                </div>
+              </div>
+            ))
+          })()}
+        </div>
+      </div>
+
+      {/* Keyboard tips */}
+      <div className="text-[11px] text-gray-400 dark:text-gray-500 flex items-center gap-4">
+        <span>Tab: naechste Zelle</span>
+        <span>Ctrl+Z/Y: Undo/Redo</span>
+        <span>Ctrl+S: Speichern</span>
+      </div>
+    </div>
+  )
+}
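Review note: the inline IIFE above groups consecutive zones that share a `vsplit_group` so they render side by side. A minimal sketch of the same grouping as a standalone, testable function follows. `ZoneLike` and `groupZonesByVsplit` are hypothetical names introduced for illustration; the real `GridZone` type carries more fields than shown here.

```typescript
// Hypothetical minimal shape; the real GridZone type has more fields.
interface ZoneLike {
  zone_index: number
  vsplit_group?: number | null
}

// Groups consecutive zones that share a non-null vsplit_group.
// Every zone without a vsplit_group forms its own single-element group.
function groupZonesByVsplit<Z extends ZoneLike>(zones: Z[]): Z[][] {
  const groups: Z[][] = []
  for (const zone of zones) {
    const prev = groups[groups.length - 1]
    const head = prev?.[0]
    if (prev && head && zone.vsplit_group != null && head.vsplit_group === zone.vsplit_group) {
      prev.push(zone)
    } else {
      groups.push([zone])
    }
  }
  return groups
}
```

Pulling the loop out of the JSX would also let the component memoize the result, though whether that is worth it depends on how often `grid.zones` changes.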
--- a/admin-lehrer/components/ocr-kombi/StepGutterRepair.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepGutterRepair.tsx
@@ -0,0 +1,422 @@
+'use client'
+
+import { useState, useEffect, useCallback } from 'react'
+
+const KLAUSUR_API = '/klausur-api'
+
+interface GutterSuggestion {
+  id: string
+  type: 'hyphen_join' | 'spell_fix'
+  zone_index: number
+  row_index: number
+  col_index: number
+  col_type: string
+  cell_id: string
+  original_text: string
+  suggested_text: string
+  next_row_index: number
+  next_row_cell_id: string
+  next_row_text: string
+  missing_chars: string
+  display_parts: string[]
+  alternatives: string[]
+  confidence: number
+  reason: string
+}
+
+interface GutterRepairResult {
+  suggestions: GutterSuggestion[]
+  stats: {
+    words_checked: number
+    gutter_candidates: number
+    suggestions_found: number
+    error?: string
+  }
+  duration_seconds: number
+}
+
+interface StepGutterRepairProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/**
+ * Step 10 (zero-based, as in the pipeline step table): Gutter Repair (Wortkorrektur).
+ * Detects words truncated at the book gutter and proposes corrections.
+ * User can accept/reject each suggestion individually or in batch.
+ */
+export function StepGutterRepair({ sessionId, onNext }: StepGutterRepairProps) {
+  const [loading, setLoading] = useState(false)
+  const [applying, setApplying] = useState(false)
+  const [result, setResult] = useState<GutterRepairResult | null>(null)
+  const [accepted, setAccepted] = useState<Set<string>>(new Set())
+  const [rejected, setRejected] = useState<Set<string>>(new Set())
+  const [selectedText, setSelectedText] = useState<Record<string, string>>({})
+  const [applied, setApplied] = useState(false)
+  const [error, setError] = useState('')
+  const [applyMessage, setApplyMessage] = useState('')
+
+  const analyse = useCallback(async () => {
+    if (!sessionId) return
+    setLoading(true)
+    setError('')
+    setApplied(false)
+    setApplyMessage('')
+    try {
+      const res = await fetch(
+        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/gutter-repair`,
+        { method: 'POST' },
+      )
+      if (!res.ok) {
+        const body = await res.json().catch(() => ({}))
+        throw new Error(body.detail || `Analyse fehlgeschlagen (${res.status})`)
+      }
+      const data: GutterRepairResult = await res.json()
+      setResult(data)
+      // Auto-accept all suggestions with confidence >= 0.85
+      const autoAccept = new Set<string>()
+      for (const s of data.suggestions) {
+        if (s.confidence >= 0.85) {
+          autoAccept.add(s.id)
+        }
+      }
+      setAccepted(autoAccept)
+      setRejected(new Set())
+    } catch (e) {
+      setError(e instanceof Error ? e.message : String(e))
+    } finally {
+      setLoading(false)
+    }
+  }, [sessionId])
+
+  // Auto-trigger analysis on mount and whenever sessionId changes
+  useEffect(() => {
+    if (sessionId) analyse()
+  }, [sessionId, analyse])
+
+  const toggleSuggestion = (id: string) => {
+    setAccepted(prev => {
+      const next = new Set(prev)
+      if (next.has(id)) {
+        next.delete(id)
+        setRejected(r => new Set(r).add(id))
+      } else {
+        next.add(id)
+        setRejected(r => { const n = new Set(r); n.delete(id); return n })
+      }
+      return next
+    })
+  }
+
+  const acceptAll = () => {
+    if (!result) return
+    setAccepted(new Set(result.suggestions.map(s => s.id)))
+    setRejected(new Set())
+  }
+
+  const rejectAll = () => {
+    if (!result) return
+    setRejected(new Set(result.suggestions.map(s => s.id)))
+    setAccepted(new Set())
+  }
+
+  const applyAccepted = async () => {
+    if (!sessionId || accepted.size === 0) return
+    setApplying(true)
+    setApplyMessage('')
+    try {
+      const res = await fetch(
+        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/gutter-repair/apply`,
+        {
+          method: 'POST',
+          headers: { 'Content-Type': 'application/json' },
+          body: JSON.stringify({
+            accepted: Array.from(accepted),
+            text_overrides: selectedText,
+          }),
+        },
+      )
+      if (!res.ok) {
+        const body = await res.json().catch(() => ({}))
+        throw new Error(body.detail || `Anwenden fehlgeschlagen (${res.status})`)
+      }
+      const data = await res.json()
+      setApplied(true)
+      setApplyMessage(`${data.applied_count} Korrektur(en) angewendet.`)
+    } catch (e) {
+      setApplyMessage(e instanceof Error ? e.message : String(e))
+    } finally {
+      setApplying(false)
+    }
+  }
+
+  const suggestions = result?.suggestions || []
+  const hasSuggestions = suggestions.length > 0
+
+  return (
+    <div className="space-y-4">
+      {/* Header */}
+      <div className="flex items-center justify-between">
+        <div>
+          <h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
+            Wortkorrektur (Buchfalz)
+          </h3>
+          <p className="text-xs text-gray-500 dark:text-gray-400 mt-1">
+            Erkennt abgeschnittene oder unscharfe Woerter am Buchfalz und Bindestrich-Trennungen ueber Zeilen hinweg.
+          </p>
+        </div>
+        {result && !loading && (
+          <button
+            onClick={analyse}
+            className="px-3 py-1.5 text-xs bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 rounded-lg hover:bg-gray-200 dark:hover:bg-gray-600"
+          >
+            Erneut analysieren
+          </button>
+        )}
+      </div>
+
+      {/* Loading */}
+      {loading && (
+        <div className="flex items-center gap-3 p-6 bg-blue-50 dark:bg-blue-900/20 rounded-xl border border-blue-200 dark:border-blue-800">
+          <div className="animate-spin w-5 h-5 border-2 border-blue-400 border-t-transparent rounded-full" />
+          <span className="text-sm text-blue-600 dark:text-blue-400">Analysiere Woerter am Buchfalz...</span>
+        </div>
+      )}
+
+      {/* Error */}
+      {error && (
+        <div className="space-y-3">
+          <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
+            {error}
+          </div>
+          <button
+            onClick={analyse}
+            className="px-4 py-2 bg-orange-600 text-white text-sm rounded-lg hover:bg-orange-700"
+          >
+            Erneut versuchen
+          </button>
+        </div>
+      )}
+
+      {/* No suggestions */}
+      {result && !hasSuggestions && !loading && (
+        <div className="p-4 bg-green-50 dark:bg-green-900/20 rounded-xl border border-green-200 dark:border-green-800">
+          <div className="text-sm font-medium text-green-700 dark:text-green-300">
+            Keine Buchfalz-Fehler erkannt.
+          </div>
+          <div className="text-xs text-green-600 dark:text-green-400 mt-1">
+            {result.stats.words_checked} Woerter geprueft, {result.stats.gutter_candidates} Kandidaten am Rand analysiert.
+          </div>
+        </div>
+      )}
+
+      {/* Suggestions list */}
+      {hasSuggestions && !loading && (
+        <>
+          {/* Stats bar */}
+          <div className="flex items-center justify-between p-3 bg-gray-50 dark:bg-gray-800 rounded-lg">
+            <div className="text-xs text-gray-500 dark:text-gray-400">
+              {suggestions.length} Vorschlag/Vorschlaege &middot;{' '}
+              {result!.stats.words_checked} Woerter geprueft &middot;{' '}
+              {result!.duration_seconds}s
+            </div>
+            <div className="flex gap-2">
+              <button
+                onClick={acceptAll}
+                disabled={applied}
+                className="px-2 py-1 text-xs bg-green-100 dark:bg-green-900/30 text-green-700 dark:text-green-300 rounded hover:bg-green-200 dark:hover:bg-green-900/50 disabled:opacity-50"
+              >
+                Alle akzeptieren
+              </button>
+              <button
+                onClick={rejectAll}
+                disabled={applied}
+                className="px-2 py-1 text-xs bg-red-100 dark:bg-red-900/30 text-red-700 dark:text-red-300 rounded hover:bg-red-200 dark:hover:bg-red-900/50 disabled:opacity-50"
+              >
+                Alle ablehnen
+              </button>
+            </div>
+          </div>
+
+          {/* Suggestion cards */}
+          <div className="space-y-2">
+            {suggestions.map((s) => {
+              const isAccepted = accepted.has(s.id)
+              const isRejected = rejected.has(s.id)
+
+              return (
+                <div
+                  key={s.id}
+                  className={`p-3 rounded-lg border transition-colors ${
+                    applied
+                      ? isAccepted
+                        ? 'bg-green-50 dark:bg-green-900/10 border-green-200 dark:border-green-800'
+                        : 'bg-gray-50 dark:bg-gray-800/50 border-gray-200 dark:border-gray-700 opacity-60'
+                      : isAccepted
+                        ? 'bg-green-50 dark:bg-green-900/10 border-green-300 dark:border-green-700'
+                        : isRejected
+                          ? 'bg-red-50 dark:bg-red-900/10 border-red-200 dark:border-red-800 opacity-60'
+                          : 'bg-white dark:bg-gray-800 border-gray-200 dark:border-gray-700'
+                  }`}
+                >
+                  <div className="flex items-start justify-between gap-3">
+                    {/* Left: suggestion details */}
+                    <div className="flex-1 min-w-0">
+                      {/* Type badge */}
+                      <div className="flex items-center gap-2 mb-1.5">
+                        <span className={`inline-flex px-1.5 py-0.5 text-[10px] font-medium rounded ${
+                          s.type === 'hyphen_join'
+                            ? 'bg-purple-100 dark:bg-purple-900/30 text-purple-700 dark:text-purple-300'
+                            : 'bg-orange-100 dark:bg-orange-900/30 text-orange-700 dark:text-orange-300'
+                        }`}>
+                          {s.type === 'hyphen_join' ? 'Zeilenumbruch' : 'Buchfalz-Korrektur'}
+                        </span>
+                        <span className="text-[10px] text-gray-400">
+                          Zeile {s.row_index + 1}, Spalte {s.col_index + 1}
+                          {s.col_type && ` (${s.col_type.replace('column_', '')})`}
+                        </span>
+                        <span className={`text-[10px] ${
+                          s.confidence >= 0.9 ? 'text-green-500' :
+                          s.confidence >= 0.7 ? 'text-yellow-500' : 'text-red-500'
+                        }`}>
+                          {Math.round(s.confidence * 100)}%
+                        </span>
+                      </div>
+
+                      {/* Correction display */}
+                      {s.type === 'hyphen_join' ? (
+                        <div className="space-y-1">
+                          <div className="flex items-center gap-2 text-sm">
+                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
+                              {s.original_text}
+                            </span>
+                            <span className="text-gray-400 text-xs">Z.{s.row_index + 1}</span>
+                            <span className="text-gray-300 dark:text-gray-600">+</span>
+                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
+                              {s.next_row_text.split(' ')[0]}
+                            </span>
+                            <span className="text-gray-400 text-xs">Z.{s.next_row_index + 1}</span>
+                            <span className="text-gray-400">&rarr;</span>
+                            <span className="font-mono text-green-600 dark:text-green-400 font-semibold">
+                              {s.suggested_text}
+                            </span>
+                          </div>
+                          {s.missing_chars && (
+                            <div className="text-[10px] text-gray-400">
+                              Fehlende Zeichen: <span className="font-mono font-semibold">{s.missing_chars}</span>
+                              {' '}&middot; Darstellung: <span className="font-mono">{s.display_parts.join(' | ')}</span>
+                            </div>
+                          )}
+                        </div>
+                      ) : (
+                        <div className="space-y-1">
+                          <div className="flex items-center gap-2 text-sm">
+                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
+                              {s.original_text}
+                            </span>
+                            <span className="text-gray-400">&rarr;</span>
+                            <span className="font-mono text-green-600 dark:text-green-400 font-semibold">
+                              {selectedText[s.id] || s.suggested_text}
+                            </span>
+                          </div>
+                          {/* Alternatives: show other candidates the user can pick */}
+                          {s.alternatives && s.alternatives.length > 0 && !applied && (
+                            <div className="flex items-center gap-1.5 flex-wrap">
+                              <span className="text-[10px] text-gray-400">Alternativen:</span>
+                              {[s.suggested_text, ...s.alternatives].map((alt) => {
+                                const isSelected = (selectedText[s.id] || s.suggested_text) === alt
+                                return (
+                                  <button
+                                    key={alt}
+                                    onClick={() => setSelectedText(prev => ({ ...prev, [s.id]: alt }))}
+                                    className={`px-1.5 py-0.5 text-[11px] font-mono rounded transition-colors ${
+                                      isSelected
+                                        ? 'bg-green-200 dark:bg-green-800 text-green-800 dark:text-green-200 font-semibold'
+                                        : 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 hover:bg-gray-200 dark:hover:bg-gray-600'
+                                    }`}
+                                  >
+                                    {alt}
+                                  </button>
+                                )
+                              })}
+                            </div>
+                          )}
+                        </div>
+                      )}
+                    </div>
+
+                    {/* Right: accept/reject toggle */}
+                    {!applied && (
+                      <button
+                        onClick={() => toggleSuggestion(s.id)}
+                        className={`flex-shrink-0 w-8 h-8 rounded-full flex items-center justify-center text-sm transition-colors ${
+                          isAccepted
+                            ? 'bg-green-500 text-white hover:bg-green-600'
+                            : isRejected
+                              ? 'bg-red-400 text-white hover:bg-red-500'
+                              : 'bg-gray-200 dark:bg-gray-600 text-gray-500 dark:text-gray-300 hover:bg-gray-300 dark:hover:bg-gray-500'
+                        }`}
+                        title={isAccepted ? 'Akzeptiert (klicken zum Ablehnen)' : isRejected ? 'Abgelehnt (klicken zum Akzeptieren)' : 'Klicken zum Akzeptieren'}
+                      >
+                        {isAccepted ? '\u2713' : isRejected ? '\u2717' : '?'}
+                      </button>
+                    )}
+                  </div>
+                </div>
+              )
+            })}
+          </div>
+
+          {/* Apply / Next buttons */}
+          <div className="flex items-center gap-3 pt-2">
+            {!applied ? (
+              <button
+                onClick={applyAccepted}
+                disabled={applying || accepted.size === 0}
+                className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700 disabled:opacity-50"
+              >
+                {applying ? 'Wird angewendet...' : `${accepted.size} Korrektur(en) anwenden`}
+              </button>
+            ) : (
+              <button
+                onClick={onNext}
+                className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
+              >
+                Weiter zu Ground Truth
+              </button>
+            )}
+            {!applied && (
+              <button
+                onClick={onNext}
+                className="px-4 py-2 text-sm text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-200"
+              >
+                Ueberspringen
+              </button>
+            )}
+          </div>
+
+          {/* Apply result message */}
+          {applyMessage && (
+            <div className={`text-sm p-2 rounded ${
+              applyMessage.includes('fehlgeschlagen')
+                ? 'text-red-500 bg-red-50 dark:bg-red-900/20'
+                : 'text-green-600 dark:text-green-400 bg-green-50 dark:bg-green-900/20'
+            }`}>
+              {applyMessage}
+            </div>
+          )}
+        </>
+      )}
+
+      {/* Skip button when no suggestions */}
+      {result && !hasSuggestions && !loading && (
+        <button
+          onClick={onNext}
+          className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
+        >
+          Weiter zu Ground Truth
+        </button>
+      )}
+    </div>
+  )
+}
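Review note: the accept/reject bookkeeping in `StepGutterRepair` can be expressed as two pure functions, which keeps the state updaters free of nested `setState` calls. The sketch below assumes nothing beyond what the component shows: the 0.85 threshold comes from `analyse()`, while `SuggestionLike`, `Decisions`, `autoAcceptIds`, and `toggleDecision` are hypothetical names used only for illustration.

```typescript
// Hypothetical minimal shapes; GutterSuggestion has many more fields.
interface SuggestionLike {
  id: string
  confidence: number
}

interface Decisions {
  accepted: Set<string>
  rejected: Set<string>
}

// Same threshold the component uses when pre-selecting suggestions.
const AUTO_ACCEPT_THRESHOLD = 0.85

// Returns the ids of all suggestions at or above the threshold.
function autoAcceptIds(
  suggestions: SuggestionLike[],
  threshold: number = AUTO_ACCEPT_THRESHOLD,
): Set<string> {
  const ids = new Set<string>()
  for (const s of suggestions) {
    if (s.confidence >= threshold) ids.add(s.id)
  }
  return ids
}

// Accepted -> rejected; rejected or undecided -> accepted.
// Returns new sets and leaves the inputs untouched.
function toggleDecision(current: Decisions, id: string): Decisions {
  const accepted = new Set(current.accepted)
  const rejected = new Set(current.rejected)
  if (accepted.has(id)) {
    accepted.delete(id)
    rejected.add(id)
  } else {
    accepted.add(id)
    rejected.delete(id)
  }
  return { accepted, rejected }
}
```

Holding both sets in one state object would let a single `setState(prev => toggleDecision(prev, id))` call replace the nested update in `toggleSuggestion`; that is a design suggestion, not something this diff does.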
--- a/admin-lehrer/components/ocr-kombi/StepOcr.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepOcr.tsx
@@ -0,0 +1,30 @@
+'use client'
+
+import { PaddleDirectStep } from '@/components/ocr-overlay/PaddleDirectStep'
+
+interface StepOcrProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/**
+ * Step 6 (zero-based, as in the pipeline step table): OCR (Kombi mode = PaddleOCR + Tesseract).
+ *
+ * Phase 1: Uses the existing PaddleDirectStep with kombi endpoint.
+ * Phase 3 (later) will add transparent 3-phase progress + engine comparison.
+ */
+export function StepOcr({ sessionId, onNext }: StepOcrProps) {
+  return (
+    <PaddleDirectStep
+      sessionId={sessionId}
+      onNext={onNext}
+      endpoint="paddle-kombi"
+      title="Kombi-Modus"
+      description="PP-OCRv5 und Tesseract laufen parallel. Koordinaten werden gewichtet gemittelt fuer optimale Positionierung."
+      icon="🔀"
+      buttonLabel="PP-OCRv5 + Tesseract starten"
+      runningLabel="PP-OCRv5 + Tesseract laufen..."
+      engineKey="kombi"
+    />
+  )
+}
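Review note: the `description` prop says the coordinates from both engines are averaged with weights. The backend code for that is not part of this diff, so the following is only a sketch of what a weighted average of two bounding boxes could look like. The `Box` shape, the function name `weightedMergeBox`, and the weights are all assumptions, not the pipeline's actual implementation.

```typescript
// Hypothetical box shape; the real OCR word boxes may use other field names.
interface Box {
  x: number
  y: number
  width: number
  height: number
}

// Blends two boxes field by field: (a * weightA + b * weightB) / (weightA + weightB).
// The weights are illustrative; how the backend derives them is not shown in the diff.
function weightedMergeBox(a: Box, b: Box, weightA: number, weightB: number): Box {
  const total = weightA + weightB
  if (total <= 0) {
    throw new Error('weights must sum to a positive value')
  }
  const mix = (p: number, q: number): number => (p * weightA + q * weightB) / total
  return {
    x: mix(a.x, b.x),
    y: mix(a.y, b.y),
    width: mix(a.width, b.width),
    height: mix(a.height, b.height),
  }
}
```

With equal weights this reduces to the plain midpoint of the two boxes; a larger weight pulls the result toward the engine that is trusted more for positioning.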
--- a/admin-lehrer/components/ocr-kombi/StepOrientation.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepOrientation.tsx
@@ -0,0 +1,21 @@
+'use client'
+
+import { StepOrientation as BaseStepOrientation } from '@/components/ocr-pipeline/StepOrientation'
+
+interface StepOrientationProps {
+  sessionId: string | null
+  onNext: () => void
+  onSessionList: () => void
+}
+
+/** Thin wrapper — adapts the shared StepOrientation to the Kombi pipeline's simpler onNext() */
+export function StepOrientation({ sessionId, onNext, onSessionList }: StepOrientationProps) {
+  return (
+    <BaseStepOrientation
+      key={sessionId}
+      sessionId={sessionId}
+      onNext={() => onNext()}
+      onSessionList={onSessionList}
+    />
+  )
+}
--- a/admin-lehrer/components/ocr-kombi/StepPageSplit.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepPageSplit.tsx
@@ -0,0 +1,198 @@
+'use client'
+
+import { useState, useEffect, useRef } from 'react'
+const KLAUSUR_API = '/klausur-api'
+
+interface PageSplitResult {
+  multi_page: boolean
+  page_count?: number
+  page_splits?: { x: number; y: number; width: number; height: number; page_index: number }[]
+  sub_sessions?: { id: string; name: string; page_index: number }[]
+  used_original?: boolean
+  duration_seconds?: number
+}
+
+interface StepPageSplitProps {
+  sessionId: string | null
+  sessionName: string
+  onNext: () => void
+  onSplitComplete: (firstChildId: string, firstChildName: string) => void
+}
+
+export function StepPageSplit({ sessionId, sessionName, onNext, onSplitComplete }: StepPageSplitProps) {
+  const [detecting, setDetecting] = useState(false)
+  const [splitResult, setSplitResult] = useState<PageSplitResult | null>(null)
+  const [error, setError] = useState('')
+  const didDetect = useRef(false)
+
+  // Auto-detect page split when step opens
+  useEffect(() => {
+    if (!sessionId || didDetect.current) return
+    didDetect.current = true
+    detectPageSplit()
+  // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [sessionId])
+
+  const detectPageSplit = async () => {
+    if (!sessionId) return
+    setDetecting(true)
+    setError('')
+    try {
+      // First check if this session was already split (status='split')
+      const sessionRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
+      if (sessionRes.ok) {
+        const sessionData = await sessionRes.json()
+        if (sessionData.status === 'split' && sessionData.crop_result?.multi_page) {
+          // Already split — find the child sessions in the session list
+          const listRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
+          if (listRes.ok) {
+            const listData = await listRes.json()
+            // Child sessions are renamed below to the parent name plus an "S. N" page suffix, so match on that prefix
+            const baseName = sessionName || sessionData.name || ''
+            const children = (listData.sessions || [])
+              .filter((s: { name?: string }) => s.name?.startsWith(baseName + ' — '))
+              .sort((a: { name: string }, b: { name: string }) => a.name.localeCompare(b.name, undefined, { numeric: true }))
+            if (children.length > 0) {
+              setSplitResult({
+                multi_page: true,
+                page_count: children.length,
+                sub_sessions: children.map((s: { id: string; name: string }, i: number) => ({
+                  id: s.id, name: s.name, page_index: i,
+                })),
+              })
+              onSplitComplete(children[0].id, children[0].name)
+              setDetecting(false)
+              return
+            }
+          }
+        }
+      }
+
+      // Run page-split detection
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/page-split`, {
+        method: 'POST',
+      })
+      if (!res.ok) {
+        const data = await res.json().catch(() => ({}))
+        throw new Error(data.detail || 'Seitentrennung fehlgeschlagen')
+      }
+      const data: PageSplitResult = await res.json()
+      setSplitResult(data)
+
+      if (data.multi_page && data.sub_sessions?.length) {
+        // Rename sub-sessions to "Title — S. 1", "Title — S. 2"
+        const baseName = sessionName || 'Dokument'
+        for (let i = 0; i < data.sub_sessions.length; i++) {
+          const sub = data.sub_sessions[i]
+          const newName = `${baseName} — S. ${i + 1}`
+          await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sub.id}`, {
+            method: 'PUT',
+            headers: { 'Content-Type': 'application/json' },
+            body: JSON.stringify({ name: newName }),
+          }).catch(() => {})
+          sub.name = newName
+        }
+
+        // Signal parent to switch to the first child session
+        onSplitComplete(data.sub_sessions[0].id, data.sub_sessions[0].name)
+      }
+    } catch (e) {
+      setError(e instanceof Error ? e.message : String(e))
+    } finally {
+      setDetecting(false)
+    }
+  }
+
+  if (!sessionId) return null
+
+  const imageUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/oriented`
+
+  return (
+    <div className="space-y-4">
+      {/* Image */}
+      <div className="relative rounded-lg overflow-hidden bg-gray-100 dark:bg-gray-700">
+        {/* eslint-disable-next-line @next/next/no-img-element */}
+        <img
+          src={imageUrl}
+          alt="Orientiertes Bild"
+          className="w-full object-contain max-h-[500px]"
+          onError={(e) => {
+            // Fallback to non-oriented image
+            (e.target as HTMLImageElement).src =
+              `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image`
+          }}
+        />
+      </div>
+
+      {/* Detection status */}
+      {detecting && (
+        <div className="flex items-center gap-2 text-teal-600 dark:text-teal-400 text-sm">
+          <div className="animate-spin w-4 h-4 border-2 border-teal-500 border-t-transparent rounded-full" />
+          Doppelseiten-Erkennung laeuft...
+        </div>
+      )}
+
+      {/* Detection result */}
+      {splitResult && !detecting && (
+        splitResult.multi_page ? (
+          <div className="bg-blue-50 dark:bg-blue-900/20 rounded-lg border border-blue-200 dark:border-blue-700 p-4 space-y-2">
+            <div className="text-sm font-medium text-blue-700 dark:text-blue-300">
+              Doppelseite erkannt — {splitResult.page_count} Seiten getrennt
+            </div>
+            <p className="text-xs text-blue-600 dark:text-blue-400">
+              Jede Seite wird als eigene Session weiterverarbeitet (eigene Begradigung, Entzerrung, etc.).
+              {splitResult.used_original && ' Trennung auf Originalbild, da Orientierung die Doppelseite gedreht hat.'}
+            </p>
+            <div className="flex gap-2 mt-2">
+              {splitResult.sub_sessions?.map(s => (
+                <span
+                  key={s.id}
+                  className="text-xs px-2.5 py-1 rounded-md bg-blue-100 dark:bg-blue-800/40 text-blue-700 dark:text-blue-300 font-medium"
+                >
+                  {s.name}
+                </span>
+              ))}
+            </div>
+            {splitResult.duration_seconds != null && (
+              <div className="text-xs text-gray-400">{splitResult.duration_seconds.toFixed(1)}s</div>
+            )}
+          </div>
+        ) : (
+          <div className="bg-green-50 dark:bg-green-900/20 rounded-lg border border-green-200 dark:border-green-800 p-4">
+            <div className="flex items-center gap-2 text-sm font-medium text-green-700 dark:text-green-300">
+              <span>&#10003;</span> Einzelseite — keine Trennung noetig
+            </div>
+            {splitResult.duration_seconds != null && (
+              <div className="text-xs text-gray-400 mt-1">{splitResult.duration_seconds.toFixed(1)}s</div>
+            )}
+          </div>
+        )
+      )}
+
+      {/* Error */}
+      {error && (
+        <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
+          {error}
+          <button
+            onClick={() => { didDetect.current = false; detectPageSplit() }}
+            className="ml-2 text-teal-600 hover:underline"
+          >
+            Erneut versuchen
+          </button>
+        </div>
+      )}
+
+      {/* Next button — only show when detection is done */}
+      {(splitResult || error) && !detecting && (
+        <div className="flex justify-end">
+          <button
+            onClick={onNext}
+            className="px-6 py-2.5 bg-teal-600 text-white text-sm font-medium rounded-lg hover:bg-teal-700 transition-colors"
+          >
+            Weiter &rarr;
+          </button>
+        </div>
+      )}
+    </div>
+  )
+}
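Review note: on resume, `StepPageSplit` recovers child sessions by name prefix and sorts them by name. A plain `localeCompare` places "S. 10" before "S. 2", so a numeric collation is safer once a split yields ten or more pages. The sketch below shows the lookup as a pure function. `SessionLike`, `CHILD_SEPARATOR`, and `findChildSessions` are hypothetical names; the separator mirrors the one used in the rename loop.

```typescript
// Hypothetical minimal session shape; the API returns more fields.
interface SessionLike {
  id: string
  name?: string
}

// Separator between parent name and page suffix (space, U+2014, space),
// matching the names written by the rename loop in detectPageSplit().
const CHILD_SEPARATOR = ' \u2014 '

// Returns the sessions whose name starts with "<parentName><separator>",
// ordered so that numeric page suffixes sort as numbers (2 before 10).
function findChildSessions(sessions: SessionLike[], parentName: string): SessionLike[] {
  const prefix = parentName + CHILD_SEPARATOR
  return sessions
    .filter((s) => (s.name ?? '').startsWith(prefix))
    .sort((a, b) => (a.name ?? '').localeCompare(b.name ?? '', undefined, { numeric: true }))
}
```

Matching on names remains fragile if a user renames a session; a parent id on the child record would be more robust, but whether the API exposes one is not visible in this diff.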
--- a/admin-lehrer/components/ocr-kombi/StepStructure.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepStructure.tsx
@@ -0,0 +1,13 @@
+'use client'
+
+import { StepStructureDetection } from '@/components/ocr-pipeline/StepStructureDetection'
+
+interface StepStructureProps {
+  sessionId: string | null
+  onNext: () => void
+}
+
+/** Thin wrapper around the shared StepStructureDetection component */
+export function StepStructure({ sessionId, onNext }: StepStructureProps) {
+  return <StepStructureDetection sessionId={sessionId} onNext={onNext} />
+}
--- a/admin-lehrer/components/ocr-kombi/StepUpload.tsx
+++ b/admin-lehrer/components/ocr-kombi/StepUpload.tsx
@@ -0,0 +1,303 @@
+'use client'
+
+import { useState, useCallback, useEffect } from 'react'
+import { DOCUMENT_CATEGORIES, type DocumentCategory } from '@/app/(admin)/ai/ocr-pipeline/types'
+
+const KLAUSUR_API = '/klausur-api'
+
+interface StepUploadProps {
+  sessionId: string | null
+  onUploaded: (sessionId: string, name: string) => void
+  onNext: () => void
+}
+
+export function StepUpload({ sessionId, onUploaded, onNext }: StepUploadProps) {
+  const [dragging, setDragging] = useState(false)
+  const [uploading, setUploading] = useState(false)
+  const [selectedFile, setSelectedFile] = useState<File | null>(null)
+  const [preview, setPreview] = useState<string | null>(null)
+  const [title, setTitle] = useState('')
+  const [category, setCategory] = useState<DocumentCategory>('vokabelseite')
+  const [error, setError] = useState('')
+
+  // Clean up preview URL on unmount
+  useEffect(() => {
+    return () => { if (preview) URL.revokeObjectURL(preview) }
+  }, [preview])
+
+  const handleFileSelect = useCallback((file: File) => {
+    setSelectedFile(file)
+    setError('')
+    if (file.type.startsWith('image/')) {
+      setPreview(URL.createObjectURL(file))
+    } else {
+      setPreview(null)
+    }
+    // Auto-fill title from filename if empty
+    if (!title.trim()) {
+      setTitle(file.name.replace(/\.[^.]+$/, ''))
+    }
+  }, [title])
+
+  const handleUpload = useCallback(async () => {
+    if (!selectedFile) return
+    setUploading(true)
+    setError('')
+
+    try {
+      const formData = new FormData()
+      formData.append('file', selectedFile)
+      if (title.trim()) formData.append('name', title.trim())
+
+      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`, {
+        method: 'POST',
+        body: formData,
+      })
+
+      if (!res.ok) {
+        const data = await res.json().catch(() => ({}))
+        throw new Error(data.detail || `Upload fehlgeschlagen (${res.status})`)
+      }
+
+      const data = await res.json()
+      const sid = data.session_id || data.id
+
+      // Set category
+      if (category) {
+        await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
+          method: 'PUT',
+          headers: { 'Content-Type': 'application/json' },
+          body: JSON.stringify({ document_category: category }),
+        })
+      }
+
+      onUploaded(sid, title.trim() || selectedFile.name)
+    } catch (e) {
+      setError(e instanceof Error ? e.message : String(e))
+    } finally {
+      setUploading(false)
+    }
+  }, [selectedFile, title, category, onUploaded])
+
+  const handleDrop = useCallback((e: React.DragEvent) => {
+    e.preventDefault()
+    setDragging(false)
+    const file = e.dataTransfer.files[0]
+    if (file) handleFileSelect(file)
+  }, [handleFileSelect])
+
+  const handleInputChange = useCallback((e: React.ChangeEvent<HTMLInputElement>) => {
+    const file = e.target.files?.[0]
+    if (file) handleFileSelect(file)
+  }, [handleFileSelect])
+
+  const clearFile = useCallback(() => {
+    setSelectedFile(null)
+    if (preview) URL.revokeObjectURL(preview)
+    setPreview(null)
+  }, [preview])
+
+  // ---- Phase 2: Uploaded → show result + "Weiter" ----
+  if (sessionId) {
+    return (
+      <div className="space-y-4">
+        <div className="bg-green-50 dark:bg-green-900/20 border border-green-200 dark:border-green-800 rounded-lg p-4">
+          <div className="flex items-center gap-2 text-green-700 dark:text-green-300 text-sm font-medium mb-3">
+            <span>&#10003;</span> Dokument hochgeladen
+          </div>
+          <div className="flex gap-4">
+            <div className="w-48 h-64 rounded-lg overflow-hidden bg-gray-100 dark:bg-gray-700 flex-shrink-0 border border-gray-200 dark:border-gray-600">
+              {/* eslint-disable-next-line @next/next/no-img-element */}
+              <img
+                src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image`}
+                alt="Hochgeladenes Dokument"
+                className="w-full h-full object-contain"
+                onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
+              />
+            </div>
+            <div className="text-sm text-gray-600 dark:text-gray-400">
+              <div className="font-medium text-gray-700 dark:text-gray-300 mb-1">
+                {title || 'Dokument'}
+              </div>
+              <div className="text-xs text-gray-400 mt-1">
+                Kategorie: {DOCUMENT_CATEGORIES.find(c => c.value === category)?.label || category}
+              </div>
+              <div className="text-xs font-mono text-gray-400 mt-1">
+                Session: {sessionId.slice(0, 8)}...
+              </div>
+            </div>
+          </div>
+        </div>
+
+        <div className="flex justify-end">
+          <button
+            onClick={onNext}
+            className="px-6 py-2.5 bg-teal-600 text-white text-sm font-medium rounded-lg hover:bg-teal-700 transition-colors"
+          >
+            Weiter &rarr;
+          </button>
+        </div>
+      </div>
+    )
+  }
+
+  // ---- Phase 1b: File selected → preview + "Hochladen" ----
+  if (selectedFile) {
+    return (
+      <div className="space-y-4">
+        {/* Title input */}
+        <div>
+          <label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
+            Titel
+          </label>
+          <input
+            type="text"
+            value={title}
+            onChange={(e) => setTitle(e.target.value)}
+            placeholder="z.B. Vokabeln Unit 3"
+            className="w-full px-3 py-2 border border-gray-300 dark:border-gray-600 rounded-lg bg-white dark:bg-gray-800 text-sm"
+          />
+        </div>
+
+        {/* Category selector */}
+        <div>
+          <label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
+            Kategorie
+          </label>
+          <div className="grid grid-cols-4 gap-1.5">
+            {DOCUMENT_CATEGORIES.map(cat => (
+              <button
+                key={cat.value}
+                onClick={() => setCategory(cat.value)}
+                className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
+                  category === cat.value
+                    ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300 ring-1 ring-teal-400'
+                    : 'bg-gray-50 dark:bg-gray-700 text-gray-600 dark:text-gray-400 hover:bg-gray-100'
+                }`}
+              >
+                {cat.icon} {cat.label}
+              </button>
+            ))}
+          </div>
+        </div>
+
+        {/* File preview */}
+        <div className="border border-gray-200 dark:border-gray-700 rounded-xl p-4">
+          <div className="flex items-start gap-4">
+            {preview ? (
+              <div className="w-36 h-48 rounded-lg overflow-hidden bg-gray-100 dark:bg-gray-700 flex-shrink-0 border border-gray-200 dark:border-gray-600">
+                {/* eslint-disable-next-line @next/next/no-img-element */}
+                <img src={preview} alt="Vorschau" className="w-full h-full object-contain" />
+              </div>
+            ) : (
+              <div className="w-36 h-48 rounded-lg bg-gray-100 dark:bg-gray-700 flex-shrink-0 flex items-center justify-center border border-gray-200 dark:border-gray-600">
+                <span className="text-3xl">&#128196;</span>
+              </div>
+            )}
+            <div className="flex-1 min-w-0">
+              <div className="font-medium text-sm text-gray-700 dark:text-gray-300 truncate">
+                {selectedFile.name}
+              </div>
+              <div className="text-xs text-gray-400 mt-1">
+                {(selectedFile.size / 1024 / 1024).toFixed(1)} MB
+              </div>
+              <button
+                onClick={clearFile}
+                className="text-xs text-red-500 hover:text-red-700 mt-2"
+              >
+                Andere Datei waehlen
+              </button>
+            </div>
+          </div>
+
+          <button
+            onClick={handleUpload}
+            disabled={uploading}
+            className="mt-4 w-full px-4 py-2.5 bg-teal-600 text-white text-sm font-medium rounded-lg hover:bg-teal-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
+          >
+            {uploading ? 'Wird hochgeladen...' : 'Hochladen'}
+          </button>
+        </div>
+
+        {error && (
+          <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
+            {error}
+          </div>
+        )}
+      </div>
+    )
+  }
+
+  // ---- Phase 1a: No file → drop zone ----
+  return (
+    <div className="space-y-4">
+      {/* Title input */}
+      <div>
+        <label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
+          Titel (optional)
+        </label>
+        <input
+          type="text"
+          value={title}
+          onChange={(e) => setTitle(e.target.value)}
+          placeholder="z.B. Vokabeln Unit 3"
+          className="w-full px-3 py-2 border border-gray-300 dark:border-gray-600 rounded-lg bg-white dark:bg-gray-800 text-sm"
+        />
+      </div>
+
+      {/* Category selector */}
+      <div>
+        <label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
+          Kategorie
+        </label>
+        <div className="grid grid-cols-4 gap-1.5">
+          {DOCUMENT_CATEGORIES.map(cat => (
+            <button
+              key={cat.value}
+              onClick={() => setCategory(cat.value)}
+              className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
+                category === cat.value
+                  ? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300 ring-1 ring-teal-400'
+                  : 'bg-gray-50 dark:bg-gray-700 text-gray-600 dark:text-gray-400 hover:bg-gray-100'
+              }`}
+            >
+              {cat.icon} {cat.label}
+            </button>
+          ))}
+        </div>
+      </div>
+
+      {/* Drop zone */}
+      <div
+        onDragOver={(e) => { e.preventDefault(); setDragging(true) }}
+        onDragLeave={() => setDragging(false)}
+        onDrop={handleDrop}
+        className={`border-2 border-dashed rounded-xl p-12 text-center transition-colors ${
+          dragging
+            ? 'border-teal-400 bg-teal-50 dark:bg-teal-900/20'
+            : 'border-gray-300 dark:border-gray-600 hover:border-gray-400'
+        }`}
+      >
+        <div className="text-4xl mb-3">&#128228;</div>
+        <div className="text-sm text-gray-600 dark:text-gray-400 mb-2">
+          Bild oder PDF hierher ziehen
+        </div>
+        <label className="inline-block px-4 py-2 bg-teal-600 text-white text-sm rounded-lg cursor-pointer hover:bg-teal-700">
+          Datei auswaehlen
+          <input
+            type="file"
+            accept="image/*,.pdf"
+            onChange={handleInputChange}
+            className="hidden"
+          />
+        </label>
+      </div>
+
+      {error && (
+        <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
+          {error}
+        </div>
+      )}
+    </div>
+  )
+}
--- a/admin-lehrer/components/ocr-pipeline/BoxSessionTabs.tsx
+++ b/admin-lehrer/components/ocr-pipeline/BoxSessionTabs.tsx
@@ -21,6 +21,7 @@ function getStatusIcon(sub: SubSession): string {
  return STATUS_ICONS.pending
 }

+/** Tabs for box sub-sessions (from column detection zone_type='box'). */
 export function BoxSessionTabs({ parentSessionId, subSessions, activeSessionId, onSessionChange }: BoxSessionTabsProps) {
  if (subSessions.length === 0) return null

@@ -28,7 +29,6 @@ export function BoxSessionTabs({ parentSessionId, subSessions, activeSessionId,

  return (
    <div className="flex items-center gap-1.5 px-1 py-1.5 bg-gray-50 dark:bg-gray-800/50 rounded-xl border border-gray-200 dark:border-gray-700">
-      {/* Main session tab */}
      <button
        onClick={() => onSessionChange(parentSessionId)}
        className={`px-3 py-1.5 rounded-lg text-xs font-medium transition-colors ${
@@ -42,7 +42,6 @@ export function BoxSessionTabs({ parentSessionId, subSessions, activeSessionId,

      <div className="w-px h-5 bg-gray-200 dark:bg-gray-700" />

-      {/* Sub-session tabs */}
      {subSessions.map((sub) => {
        const isActive = activeSessionId === sub.id
        const icon = getStatusIcon(sub)
@@ -59,7 +58,7 @@ export function BoxSessionTabs({ parentSessionId, subSessions, activeSessionId,
            title={sub.name}
          >
            <span className="mr-1">{icon}</span>
-            Seite {sub.box_index + 1}
+            Box {sub.box_index + 1}
          </button>
        )
      })}
--- a/admin-lehrer/components/ocr-pipeline/StepGridReview.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepGridReview.tsx
@@ -57,6 +57,10 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
    toggleSelectedBold,
    autoCorrectColumnPatterns,
    setCellColor,
+    ipaMode,
+    setIpaMode,
+    syllableMode,
+    setSyllableMode,
  } = useGridEditor(sessionId)

  const [showImage, setShowImage] = useState(true)
@@ -231,6 +235,11 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
            Woerterbuch ({Math.round(grid.dictionary_detection.confidence * 100)}%)
          </span>
        )}
+        {grid.page_number?.text && (
+          <span className="px-1.5 py-0.5 rounded bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 border border-gray-200 dark:border-gray-600">
+            S. {grid.page_number.number ?? grid.page_number.text}
+          </span>
+        )}
        {lowConfCells.length > 0 && (
          <span className="px-2 py-0.5 rounded-full bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 border border-red-200 dark:border-red-800">
            {lowConfCells.length} niedrige Konfidenz
@@ -283,11 +292,15 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
          canUndo={canUndo}
          canRedo={canRedo}
          showOverlay={false}
+          ipaMode={ipaMode}
+          syllableMode={syllableMode}
          onSave={saveGrid}
          onUndo={undo}
          onRedo={redo}
          onRebuild={buildGrid}
          onToggleOverlay={() => setShowImage(!showImage)}
+          onIpaModeChange={setIpaMode}
+          onSyllableModeChange={setSyllableMode}
        />
      </div>

--- a/admin-lehrer/components/ocr-pipeline/StepOrientation.tsx
+++ b/admin-lehrer/components/ocr-pipeline/StepOrientation.tsx
@@ -1,7 +1,7 @@
 'use client'

 import { useCallback, useEffect, useState } from 'react'
-import type { OrientationResult, SessionInfo, SubSession } from '@/app/(admin)/ai/ocr-pipeline/types'
+import type { OrientationResult, SessionInfo } from '@/app/(admin)/ai/ocr-pipeline/types'
 import { ImageCompareView } from './ImageCompareView'

 const KLAUSUR_API = '/klausur-api'
@@ -17,10 +17,10 @@ interface PageSplitResult {
 interface StepOrientationProps {
  sessionId?: string | null
  onNext: (sessionId: string) => void
-  onSubSessionsCreated?: (subs: SubSession[]) => void
+  onSessionList?: () => void
 }

-export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSessionsCreated }: StepOrientationProps) {
+export function StepOrientation({ sessionId: existingSessionId, onNext, onSessionList }: StepOrientationProps) {
  const [session, setSession] = useState<SessionInfo | null>(null)
  const [orientationResult, setOrientationResult] = useState<OrientationResult | null>(null)
  const [pageSplitResult, setPageSplitResult] = useState<PageSplitResult | null>(null)
@@ -30,7 +30,7 @@ export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSes
  const [dragOver, setDragOver] = useState(false)
  const [sessionName, setSessionName] = useState('')

-  // Reload session data when navigating back
+  // Reload session data when navigating back — auto-trigger orientation if missing
  useEffect(() => {
    if (!existingSessionId || session) return

@@ -51,6 +51,28 @@ export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSes

        if (data.orientation_result) {
          setOrientationResult(data.orientation_result)
+        } else {
+          // Session exists but orientation not yet run (e.g. page-split session)
+          // Auto-trigger orientation detection
+          setDetecting(true)
+          try {
+            const orientRes = await fetch(
+              `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${existingSessionId}/orientation`,
+              { method: 'POST' },
+            )
+            if (orientRes.ok) {
+              const orientData = await orientRes.json()
+              setOrientationResult({
+                orientation_degrees: orientData.orientation_degrees,
+                corrected: orientData.corrected,
+                duration_seconds: orientData.duration_seconds,
+              })
+            }
+          } catch (e) {
+            console.error('Auto-orientation failed:', e)
+          } finally {
+            setDetecting(false)
+          }
        }
      } catch (e) {
        console.error('Failed to reload session:', e)
@@ -112,16 +134,6 @@ export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSes
        if (splitRes.ok) {
          const splitData: PageSplitResult = await splitRes.json()
          setPageSplitResult(splitData)
-          if (splitData.multi_page && splitData.sub_sessions && onSubSessionsCreated) {
-            onSubSessionsCreated(
-              splitData.sub_sessions.map((s) => ({
-                id: s.id,
-                name: s.name,
-                box_index: s.page_index,
-                current_step: splitData.used_original ? 1 : 2,
-              }))
-            )
-          }
        }
      } catch (e) {
        console.error('Page-split detection failed:', e)
@@ -133,7 +145,7 @@ export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSes
      setUploading(false)
      setDetecting(false)
    }
-  }, [sessionName, onSubSessionsCreated])
+  }, [sessionName])

  const handleDrop = useCallback((e: React.DragEvent) => {
    e.preventDefault()
@@ -264,10 +276,10 @@ export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSes
      {pageSplitResult?.multi_page && (
        <div className="bg-blue-50 dark:bg-blue-900/20 rounded-lg border border-blue-200 dark:border-blue-700 p-4">
          <div className="text-sm font-medium text-blue-700 dark:text-blue-300">
-            Doppelseite erkannt — {pageSplitResult.page_count} Seiten
+            Doppelseite erkannt — {pageSplitResult.page_count} unabhaengige Sessions erstellt
          </div>
          <p className="text-xs text-blue-600 dark:text-blue-400 mt-1">
-            Jede Seite wird einzeln durch die Pipeline (Begradigung, Entzerrung, Zuschnitt, ...) verarbeitet.
+            Jede Seite wird als eigene Session durch die Pipeline verarbeitet.
            {pageSplitResult.used_original && ' (Seitentrennung auf dem Originalbild, da die Orientierung die Doppelseite gedreht hat.)'}
          </p>
          <div className="flex gap-2 mt-2">
@@ -286,12 +298,21 @@ export function StepOrientation({ sessionId: existingSessionId, onNext, onSubSes
      {/* Next button */}
      {orientationResult && (
        <div className="flex justify-end">
-          <button
-            onClick={() => onNext(session.session_id)}
-            className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium transition-colors"
-          >
-            {pageSplitResult?.multi_page ? 'Seiten verarbeiten' : 'Weiter'} &rarr;
-          </button>
+          {pageSplitResult?.multi_page ? (
+            <button
+              onClick={() => onSessionList?.()}
+              className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium transition-colors"
+            >
+              Zur Session-Liste &rarr;
+            </button>
+          ) : (
+            <button
+              onClick={() => onNext(session.session_id)}
+              className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium transition-colors"
+            >
+              Weiter &rarr;
+            </button>
+          )}
        </div>
      )}

--- a/admin-lehrer/lib/navigation.ts
+++ b/admin-lehrer/lib/navigation.ts
@@ -150,9 +150,18 @@ export const navigation: NavCategory[] = [
        audience: ['Entwickler', 'Data Scientists'],
        subgroup: 'KI-Werkzeuge',
      },
+      {
+        id: 'ocr-kombi',
+        name: 'OCR Kombi',
+        href: '/ai/ocr-kombi',
+        description: 'Modulare 11-Schritt-Pipeline',
+        purpose: 'Modulare OCR-Pipeline mit Dual-Engine (PP-OCRv5 + Tesseract), Strukturerkennung, Grid-Aufbau und Review. Multi-Page-Dokument-Unterstuetzung.',
+        audience: ['Entwickler'],
+        subgroup: 'KI-Werkzeuge',
+      },
      {
        id: 'ocr-overlay',
-        name: 'OCR Overlay',
+        name: 'OCR Overlay (Legacy)',
        href: '/ai/ocr-overlay',
        description: 'Ganzseitige Overlay-Rekonstruktion',
        purpose: 'Arbeitsblatt ohne Spaltenerkennung direkt als Overlay rekonstruieren. Vereinfachte 7-Schritt-Pipeline.',
--- a/admin-lehrer/package-lock.json
+++ b/admin-lehrer/package-lock.json
--- a/admin-lehrer/package.json
+++ b/admin-lehrer/package.json
@@ -18,6 +18,8 @@
    "test:all": "vitest run && playwright test --project=chromium"
  },
  "dependencies": {
+    "@fortune-sheet/react": "^1.0.4",
+    "fabric": "^6.0.0",
    "jspdf": "^4.1.0",
    "jszip": "^3.10.1",
    "lucide-react": "^0.468.0",
@@ -26,7 +28,6 @@
    "react-dom": "^18.3.1",
    "reactflow": "^11.11.4",
    "recharts": "^2.15.0",
-    "fabric": "^6.0.0",
    "uuid": "^13.0.0"
  },
  "devDependencies": {
--- a/backend-lehrer/learning_units_api.py
+++ b/backend-lehrer/learning_units_api.py
@@ -1,5 +1,9 @@
 from typing import List, Dict, Any, Optional
 from datetime import datetime
+from pathlib import Path
+import json
+import os
+import logging

 from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel
@@ -15,6 +19,8 @@ from learning_units import (
    delete_learning_unit,
 )

+logger = logging.getLogger(__name__)
+

 router = APIRouter(
    prefix="/learning-units",
@@ -49,6 +55,11 @@ class RemoveWorksheetPayload(BaseModel):
    worksheet_file: str


+class GenerateFromAnalysisPayload(BaseModel):
+    analysis_data: Dict[str, Any]
+    num_questions: int = 8
+
+
 # ---------- Hilfsfunktion: Backend-Modell -> Frontend-Objekt ----------


@@ -195,3 +206,171 @@ def api_delete_learning_unit(unit_id: str):
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    return {"status": "deleted", "id": unit_id}

+
+# ---------- Generator endpoints ----------
+
+LERNEINHEITEN_DIR = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten")
+
+
+def _save_analysis_and_get_path(unit_id: str, analysis_data: Dict[str, Any]) -> Path:
+    """Save analysis_data to disk and return the path."""
+    os.makedirs(LERNEINHEITEN_DIR, exist_ok=True)
+    path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_analyse.json"
+    with open(path, "w", encoding="utf-8") as f:
+        json.dump(analysis_data, f, ensure_ascii=False, indent=2)
+    return path
+
+
+@router.post("/{unit_id}/generate-qa")
+def api_generate_qa(unit_id: str, payload: GenerateFromAnalysisPayload):
+    """Generate Q&A items with Leitner fields from analysis data."""
+    lu = get_learning_unit(unit_id)
+    if not lu:
+        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
+
+    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)
+
+    try:
+        from ai_processing.qa_generator import generate_qa_from_analysis
+        qa_path = generate_qa_from_analysis(analysis_path, num_questions=payload.num_questions)
+        with open(qa_path, "r", encoding="utf-8") as f:
+            qa_data = json.load(f)
+
+        # Update unit status
+        update_learning_unit(unit_id, LearningUnitUpdate(status="qa_generated"))
+        logger.info(f"Generated QA for unit {unit_id}: {len(qa_data.get('qa_items', []))} items")
+        return qa_data
+    except Exception as e:
+        logger.error(f"QA generation failed for {unit_id}: {e}")
+        raise HTTPException(status_code=500, detail=f"QA-Generierung fehlgeschlagen: {e}")
+
+
+@router.post("/{unit_id}/generate-mc")
+def api_generate_mc(unit_id: str, payload: GenerateFromAnalysisPayload):
+    """Generate multiple choice questions from analysis data."""
+    lu = get_learning_unit(unit_id)
+    if not lu:
+        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
+
+    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)
+
+    try:
+        from ai_processing.mc_generator import generate_mc_from_analysis
+        mc_path = generate_mc_from_analysis(analysis_path, num_questions=payload.num_questions)
+        with open(mc_path, "r", encoding="utf-8") as f:
+            mc_data = json.load(f)
+
+        update_learning_unit(unit_id, LearningUnitUpdate(status="mc_generated"))
+        logger.info(f"Generated MC for unit {unit_id}: {len(mc_data.get('questions', []))} questions")
+        return mc_data
+    except Exception as e:
+        logger.error(f"MC generation failed for {unit_id}: {e}")
+        raise HTTPException(status_code=500, detail=f"MC-Generierung fehlgeschlagen: {e}")
+
+
+@router.post("/{unit_id}/generate-cloze")
+def api_generate_cloze(unit_id: str, payload: GenerateFromAnalysisPayload):
+    """Generate cloze (fill-in-the-blank) items from analysis data."""
+    lu = get_learning_unit(unit_id)
+    if not lu:
+        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
+
+    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)
+
+    try:
+        from ai_processing.cloze_generator import generate_cloze_from_analysis
+        cloze_path = generate_cloze_from_analysis(analysis_path)
+        with open(cloze_path, "r", encoding="utf-8") as f:
+            cloze_data = json.load(f)
+
+        update_learning_unit(unit_id, LearningUnitUpdate(status="cloze_generated"))
+        logger.info(f"Generated Cloze for unit {unit_id}: {len(cloze_data.get('cloze_items', []))} items")
+        return cloze_data
+    except Exception as e:
+        logger.error(f"Cloze generation failed for {unit_id}: {e}")
+        raise HTTPException(status_code=500, detail=f"Cloze-Generierung fehlgeschlagen: {e}")
+
+
+@router.get("/{unit_id}/qa")
+def api_get_qa(unit_id: str):
+    """Get generated QA items for a unit."""
+    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
+    if not qa_path.exists():
+        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
+    with open(qa_path, "r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+@router.get("/{unit_id}/mc")
+def api_get_mc(unit_id: str):
+    """Get generated MC questions for a unit."""
+    mc_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_mc.json"
+    if not mc_path.exists():
+        raise HTTPException(status_code=404, detail="Keine MC-Daten gefunden.")
+    with open(mc_path, "r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+@router.get("/{unit_id}/cloze")
+def api_get_cloze(unit_id: str):
+    """Get generated cloze items for a unit."""
+    cloze_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_cloze.json"
+    if not cloze_path.exists():
+        raise HTTPException(status_code=404, detail="Keine Cloze-Daten gefunden.")
+    with open(cloze_path, "r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+@router.post("/{unit_id}/leitner/update")
+def api_update_leitner(unit_id: str, item_id: str, correct: bool):
+    """Update Leitner progress for a QA item."""
+    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
+    if not qa_path.exists():
+        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
+    try:
+        from ai_processing.qa_generator import update_leitner_progress
+        result = update_leitner_progress(qa_path, item_id, correct)
+        return result
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+
+
+@router.get("/{unit_id}/leitner/next")
+def api_get_next_review(unit_id: str, limit: int = 5):
+    """Get next Leitner review items."""
+    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
+    if not qa_path.exists():
+        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
+    try:
+        from ai_processing.qa_generator import get_next_review_items
+        items = get_next_review_items(qa_path, limit=limit)
+        return {"items": items, "count": len(items)}
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+
+
+class StoryGeneratePayload(BaseModel):
+    vocabulary: List[Dict[str, Any]]
+    language: str = "en"
+    grade_level: str = "5-8"
+
+
+@router.post("/{unit_id}/generate-story")
+def api_generate_story(unit_id: str, payload: StoryGeneratePayload):
+    """Generate a short story using vocabulary words."""
+    lu = get_learning_unit(unit_id)
+    if not lu:
+        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
+
+    try:
+        from story_generator import generate_story
+        result = generate_story(
+            vocabulary=payload.vocabulary,
+            language=payload.language,
+            grade_level=payload.grade_level,
+        )
+        return result
+    except Exception as e:
+        logger.error(f"Story generation failed for {unit_id}: {e}")
+        raise HTTPException(status_code=500, detail=f"Story-Generierung fehlgeschlagen: {e}")
+
--- a/backend-lehrer/main.py
+++ b/backend-lehrer/main.py
@@ -106,6 +106,10 @@ app.include_router(correction_router, prefix="/api")
 from learning_units_api import router as learning_units_router
 app.include_router(learning_units_router, prefix="/api")

+# --- 4b. Learning Progress ---
+from progress_api import router as progress_router
+app.include_router(progress_router, prefix="/api")
+
 from unit_api import router as unit_router
 app.include_router(unit_router)  # Already has /api/units prefix

--- a/backend-lehrer/progress_api.py
+++ b/backend-lehrer/progress_api.py
@@ -0,0 +1,131 @@
+"""
+Progress API — Tracks student learning progress per unit.
+
+Stores coins, crowns, streak data, and exercise completion stats.
+Uses JSON file storage (same pattern as learning_units.py).
+"""
+
+import os
+import json
+import logging
+from datetime import datetime, date
+from typing import Dict, Any
+from pathlib import Path
+
+from fastapi import APIRouter
+from pydantic import BaseModel
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(
+    prefix="/progress",
+    tags=["progress"],
+)
+
+PROGRESS_DIR = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten/progress")
+
+
+def _ensure_dir():
+    os.makedirs(PROGRESS_DIR, exist_ok=True)
+
+
+def _progress_path(unit_id: str) -> Path:
+    return Path(PROGRESS_DIR) / f"{unit_id}.json"
+
+
+def _load_progress(unit_id: str) -> Dict[str, Any]:
+    path = _progress_path(unit_id)
+    if path.exists():
+        with open(path, "r", encoding="utf-8") as f:
+            return json.load(f)
+    return {
+        "unit_id": unit_id,
+        "coins": 0,
+        "crowns": 0,
+        "streak_days": 0,
+        "last_activity": None,
+        "exercises": {
+            "flashcards": {"completed": 0, "correct": 0, "incorrect": 0},
+            "quiz": {"completed": 0, "correct": 0, "incorrect": 0},
+            "type": {"completed": 0, "correct": 0, "incorrect": 0},
+            "story": {"generated": 0},
+        },
+        "created_at": datetime.now().isoformat(),
+    }
+
+
+def _save_progress(unit_id: str, data: Dict[str, Any]):
+    _ensure_dir()
+    path = _progress_path(unit_id)
+    with open(path, "w", encoding="utf-8") as f:
+        json.dump(data, f, ensure_ascii=False, indent=2)
+
+
+class RewardPayload(BaseModel):
+    exercise_type: str  # flashcards, quiz, type, story
+    correct: bool = True
+    first_try: bool = True
+
+
+@router.get("/{unit_id}")
+def get_progress(unit_id: str):
+    """Get learning progress for a unit."""
+    return _load_progress(unit_id)
+
+
+@router.post("/{unit_id}/reward")
+def add_reward(unit_id: str, payload: RewardPayload):
+    """Record an exercise result and award coins."""
+    progress = _load_progress(unit_id)
+
+    # Update exercise stats
+    ex = progress["exercises"].get(payload.exercise_type, {"completed": 0, "correct": 0, "incorrect": 0})
+    ex["completed"] = ex.get("completed", 0) + 1
+    if payload.correct:
+        ex["correct"] = ex.get("correct", 0) + 1
+    else:
+        ex["incorrect"] = ex.get("incorrect", 0) + 1
+    progress["exercises"][payload.exercise_type] = ex
+
+    # Award coins
+    if payload.correct:
+        coins = 3 if payload.first_try else 1
+    else:
+        coins = 0
+    progress["coins"] = progress.get("coins", 0) + coins
+
+    # Update streak
+    today_date = date.today()
+    today = today_date.isoformat()
+    last = progress.get("last_activity")
+    if last != today:
+        # Extend the streak only if the last activity was yesterday; works across month and year boundaries.
+        yesterday = date.fromordinal(today_date.toordinal() - 1).isoformat()
+        progress["streak_days"] = progress.get("streak_days", 0) + 1 if last == yesterday else 1
+        progress["last_activity"] = today
+
+    # Award crowns for milestones
+    total_correct = sum(
+        e.get("correct", 0) for e in progress["exercises"].values() if isinstance(e, dict)
+    )
+    progress["crowns"] = total_correct // 20  # 1 crown per 20 correct answers
+
+    _save_progress(unit_id, progress)
+
+    return {
+        "coins_awarded": coins,
+        "total_coins": progress["coins"],
+        "crowns": progress["crowns"],
+        "streak_days": progress["streak_days"],
+    }
+
+
+@router.get("/")
+def list_all_progress():
+    """List progress for all units."""
+    _ensure_dir()
+    results = []
+    for f in Path(PROGRESS_DIR).glob("*.json"):
+        with open(f, "r", encoding="utf-8") as fh:
+            results.append(json.load(fh))
+    return results
--- a/backend-lehrer/story_generator.py
+++ b/backend-lehrer/story_generator.py
@@ -0,0 +1,108 @@
+"""
+Story Generator — Creates short stories using vocabulary words.
+
+Generates age-appropriate mini-stories (3-5 sentences) that incorporate
+the given vocabulary words, marked with <mark> tags for highlighting.
+
+Uses Ollama (local LLM) for generation.
+"""
+
+import os
+import json
+import logging
+import requests
+from typing import List, Dict, Any, Optional
+
+logger = logging.getLogger(__name__)
+
+OLLAMA_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
+STORY_MODEL = os.getenv("STORY_MODEL", "llama3.1:8b")
+
+
+def generate_story(
+    vocabulary: List[Dict[str, str]],
+    language: str = "en",
+    grade_level: str = "5-8",
+    max_words: int = 5,
+) -> Dict[str, Any]:
+    """
+    Generate a short story incorporating vocabulary words.
+
+    Args:
+        vocabulary: List of dicts with 'english' and 'german' keys
+        language: 'en' for English story, 'de' for German story
+        grade_level: Target grade level
+        max_words: Maximum vocab words to include (to keep story short)
+
+    Returns:
+        Dict with 'story_html', 'story_text', 'vocab_used', 'vocab_total', 'language'
+    """
+    # Select subset of vocabulary
+    words = vocabulary[:max_words]
+    word_list = [w.get("english", "") if language == "en" else w.get("german", "") for w in words]
+    word_list = [w for w in word_list if w.strip()]
+
+    if not word_list:
+        return {"story_html": "", "story_text": "", "vocab_used": [], "vocab_total": 0, "language": language}
+
+    lang_name = "English" if language == "en" else "German"
+    words_str = ", ".join(word_list)
+
+    prompt = f"""Write a short story (3-5 sentences) in {lang_name} for a grade {grade_level} student.
+The story MUST use these vocabulary words: {words_str}
+
+Rules:
+1. The story should be fun and age-appropriate
+2. Each vocabulary word must appear at least once
+3. Keep sentences simple and clear
+4. The story should make sense and be engaging
+
+Write ONLY the story, nothing else. No title, no introduction."""
+
+    try:
+        resp = requests.post(
+            f"{OLLAMA_URL}/api/generate",
+            json={
+                "model": STORY_MODEL,
+                "prompt": prompt,
+                "stream": False,
+                "options": {"temperature": 0.8, "num_predict": 300},
+            },
+            timeout=30,
+        )
+        resp.raise_for_status()
+        story_text = resp.json().get("response", "").strip()
+    except Exception as e:
+        logger.error(f"Story generation failed: {e}")
+        # Fallback: simple template story
+        story_text = _fallback_story(word_list, language)
+
+    # Mark vocabulary words in the story
+    import re  # local import, hoisted out of the loop
+
+    story_html = story_text
+    vocab_found = []
+    for word in word_list:
+        # Case-insensitive replacement of the first occurrence, preserving original case
+        pattern = re.compile(re.escape(word), re.IGNORECASE)
+        story_html, replaced = pattern.subn(
+            lambda m: f'<mark class="vocab-highlight">{m.group()}</mark>',
+            story_html,
+            count=1,
+        )
+        if replaced:
+            vocab_found.append(word)
+
+    return {
+        "story_html": story_html,
+        "story_text": story_text,
+        "vocab_used": vocab_found,
+        "vocab_total": len(word_list),
+        "language": language,
+    }
+
+
+def _fallback_story(words: List[str], language: str) -> str:
+    """Simple fallback when LLM is unavailable."""
+    if language == "de":
+        return f"Heute habe ich neue Wörter gelernt: {', '.join(words)}. Es war ein guter Tag zum Lernen."
+    return f"Today I learned new words: {', '.join(words)}. It was a great day for learning."
--- a/docs-src/services/klausur-service/OCR-Kombi-Pipeline.md
+++ b/docs-src/services/klausur-service/OCR-Kombi-Pipeline.md
@@ -0,0 +1,253 @@
+# OCR Kombi Pipeline - Modular 11-Step Architecture
+
+**Version:** 1.0.0
+**Status:** Phase 1 implemented (scaffolding + DB)
+**URL:** https://macmini:3002/ai/ocr-kombi
+
+## Overview
+
+The OCR Kombi Pipeline is the successor to the OCR Overlay monolith (`/ai/ocr-overlay`).
+It breaks the OCR process into **11 modular steps**, each with its own component
+in a separate frontend and backend file. The goals are fast debugging, clear
+responsibilities, and multi-page document support.
+
+**Primary mode:** Kombi (PaddleOCR + Tesseract), the only mode the user actually works with.
+
+### Why a refactor?
+
+| Problem (old) | Solution (new) |
+|----------------|---------------|
+| `page.tsx` = 751-line monolith with 3 modes | `page.tsx` = ~140-line orchestrator, one file per step |
+| Upload, orientation, and page split in a single step | 3 separate steps, each with its own logic |
+| No multi-page document support | `document_group_id` + `page_number` at the DB level |
+| OCR is opaque (a black box) | 3-phase progress + per-word engine attribution (planned) |
+| `grid_editor_api.py` = 1801 lines | 4 modules + orchestrator (planned) |
+
+---
+
+## Pipeline Steps
+
+| # | Step | Frontend | Backend | Description |
+|---|------|----------|---------|--------------|
+| 1 | Upload | `StepUpload.tsx` | `step_upload.py` | Upload image/PDF, title, category. Multi-page PDF → N sessions |
+| 2 | Orientation | `StepOrientation.tsx` | (shared) | Detect and correct 90/180/270 rotation |
+| 3 | Page split | `StepPageSplit.tsx` | (shared) | Detect and split double pages |
+| 4 | Deskew | `StepDeskew.tsx` | (shared) | Hough lines + word alignment |
+| 5 | Dewarp | `StepDewarp.tsx` | (shared) | Shear correction (vertical-edge drift) |
+| 6 | Content crop | `StepContentCrop.tsx` | (shared) | Remove scanner borders (after deskew!) |
+| 7 | OCR | `StepOcr.tsx` | (shared) | Tesseract + PaddleOCR + merge |
+| 8 | Structure detection | `StepStructure.tsx` | (shared) | Boxes, zones, colors, graphics |
+| 9 | Grid build | `StepGridBuild.tsx` | (shared) | Structured grid from OCR + structure |
+| 10 | Grid review | `StepGridReview.tsx` | (shared) | Excel-style editor, IPA, syllables, corrections |
+| 11 | Ground Truth | `StepGroundTruth.tsx` | (shared) | Validation + GT marking |
+
+!!! note "Crop after dewarp"
+    The page split (step 3) happens **before** deskewing, which is correct because each half
+    is deskewed independently. The content crop (step 6) stays **after** dewarp
+    because content-based cropping works better on a straightened image.
+
+---
+
+## Multi-Page Document Grouping
+
+### Problem
+
+A teacher scans 10 vocabulary pages into a single PDF file. In the end-user frontend this
+should appear as one coherent document. It must later be possible to process all pages
+into shared learning units.
+
+### Solution: `document_group_id` + `page_number`
+
+Two new columns on `ocr_pipeline_sessions` (migration `009_add_document_group.sql`):
+
+```sql
+ALTER TABLE ocr_pipeline_sessions
+    ADD COLUMN IF NOT EXISTS document_group_id UUID,
+    ADD COLUMN IF NOT EXISTS page_number INT;
+```
+
+| Upload type | document_group_id | page_number |
+|------------|-------------------|-------------|
+| Single image | new UUID | 1 |
+| Multi-page PDF (10 pages) | same UUID for all 10 | 1..10 |
+| Double-page split of p. 3 | same UUID | new p. 3 + p. 4, remaining pages renumbered |
+
+### Naming
+
+The upload title "Vokabeln Unit 3" produces:
+
+- "Vokabeln Unit 3 — S. 1"
+- "Vokabeln Unit 3 — S. 2"
+- ...
+- "Vokabeln Unit 3 — S. 10"
+
+### Session list in the admin UI
+
+Grouped display: one document header ("Vokabeln Unit 3, 10 pages") with collapsible
+individual sessions below it. Each session has its own pipeline status.
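The grouping and naming scheme described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PlannedSession` and `plan_sessions` are hypothetical names that do not come from `step_upload.py`, and applying the page suffix to single-image uploads as well is a guess.

```python
import uuid
from dataclasses import dataclass
from typing import List

# Separator used in the session titles listed above (an em dash with spaces).
SEPARATOR = " \u2014 "


@dataclass
class PlannedSession:
    """Hypothetical record for one page of an uploaded document."""
    document_group_id: str
    page_number: int
    name: str


def plan_sessions(title: str, page_count: int) -> List[PlannedSession]:
    """Plan one session per page; all pages share one document_group_id."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    group_id = str(uuid.uuid4())  # a single UUID for the whole upload
    return [
        PlannedSession(group_id, page, f"{title}{SEPARATOR}S. {page}")
        for page in range(1, page_count + 1)
    ]
```

Calling `plan_sessions("Vokabeln Unit 3", 10)` yields ten records with page numbers 1 to 10 and one shared group ID, which is the shape the grouped session list needs.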
+
+---
+
+## API Endpoints
+
+### New endpoints (OCR Kombi)
+
+| Method | Path | Description |
+|---------|------|--------------|
+| POST | `/api/v1/ocr-kombi/upload` | Upload: single image or multi-page PDF |
+| GET | `/api/v1/ocr-kombi/documents/{group_id}` | All sessions of a document group |
+
+### Existing endpoints (reused)
+
+The Kombi pipeline uses all existing endpoints under `/api/v1/ocr-pipeline/`:
+
+| Method | Path | Step |
+|---------|------|------|
+| POST | `/sessions` | Upload (legacy, single image) |
+| POST | `/sessions/{id}/orientation` | Orientation |
+| POST | `/sessions/{id}/page-split` | Page split |
+| POST | `/sessions/{id}/deskew` | Deskew |
+| POST | `/sessions/{id}/dewarp` | Dewarp |
+| POST | `/sessions/{id}/crop` | Content crop |
+| POST | `/sessions/{id}/paddle-kombi` | OCR (Kombi) |
+| POST | `/sessions/{id}/detect-structure` | Structure detection |
+| POST | `/sessions/{id}/build-grid` | Grid build |
+| POST | `/sessions/{id}/save-grid` | Save grid |
+| GET | `/sessions/{id}/grid-editor` | Load grid |
+| POST | `/sessions/{id}/mark-ground-truth` | Mark as GT |
+
+---
+
+## File Structure
+
+### Frontend
+
+```
+admin-lehrer/app/(admin)/ai/ocr-kombi/
+├── page.tsx                    # ~140 lines, orchestrator with Suspense boundary
+├── types.ts                    # KOMBI_V2_STEPS (11 steps), DocumentGroup types, OCR transparency types
+└── useKombiPipeline.ts         # Hook: session state, step navigation, document grouping
+
+admin-lehrer/components/ocr-kombi/
+├── KombiStepper.tsx            # 11-step indicator (compact, scrollable)
+├── SessionList.tsx             # Grouped session list (collapsible document groups)
+├── SessionHeader.tsx           # Active session: name + category + GT badge
+├── StepUpload.tsx              # Drag-and-drop + title + category selection
+├── StepOrientation.tsx         # Wrapper → shared StepOrientation
+├── StepPageSplit.tsx           # Double-page detection + auto-advance
+├── StepDeskew.tsx              # Wrapper → shared StepDeskew
+├── StepDewarp.tsx              # Wrapper → shared StepDewarp
+├── StepContentCrop.tsx         # Wrapper → shared StepCrop
+├── StepOcr.tsx                 # Wrapper → PaddleDirectStep (kombi endpoint)
+├── StepStructure.tsx           # Wrapper → shared StepStructureDetection
+├── StepGridBuild.tsx           # Auto-trigger build-grid + result display
+├── StepGridReview.tsx          # Wrapper → shared StepGridReview (with saveRef)
+└── StepGroundTruth.tsx         # GT marking with auto-save
+```
+
+### Backend
+
+```
+klausur-service/backend/ocr_kombi/
+├── __init__.py
+├── router.py                   # Composite Router (/api/v1/ocr-kombi)
+└── step_upload.py              # Multi-Page-PDF → N Sessions + document_group_id
+```
+
+### Shared (reused)
+
+The Kombi pipeline uses all existing backend modules:
+
+- `orientation_crop_api.py`: orientation, page split, crop
+- `ocr_pipeline_api.py`: deskew, dewarp
+- `ocr_pipeline_ocr_merge.py`: PaddleOCR + Tesseract merge
+- `grid_editor_api.py`: grid build + editor
+- `ocr_pipeline_session_store.py`: DB layer (extended with document_group_id)
+- All `cv_*.py`: CV algorithms
+
+### Migration
+
+```
+migrations/009_add_document_group.sql  # document_group_id UUID + page_number INT + Index
+```
+
+---
+
+## Implementation Phases
+
+| Phase | Status | Description |
+|-------|--------|--------------|
+| **1: Scaffolding + DB** | Implemented | DB migration, types, hook, stepper, SessionList, page.tsx, navigation, backend router |
+| **2: Preprocessing steps** | Planned | Multi-page PDF upload, orientation without upload, page split with document_group_id |
+| **3: OCR transparency** | Planned | 3-phase progress, per-word engine attribution, color coding |
+| **4: Split the grid pipeline** | Planned | grid_editor_api.py → 4 modules + orchestrator |
+| **5: Remaining steps** | Planned | Structure, GridBuild, GridReview, GroundTruth fully integrated |
+| **6: Migrate features** | Later | LLM review streaming, labeling mode, image generation |
+| **7: Cleanup** | Later | Delete /ai/ocr-overlay and /ai/ocr-pipeline |
+
+---
+
+## OCR Transparency (Phase 3, planned)
+
+### 3-phase progress in step 7
+
+1. "Tesseract running..." (progress bar)
+2. "PaddleOCR running..." (progress bar)
+3. "Merge running..." (progress bar)
+
+### Per-word engine attribution
+
+Comparison view with color coding:
+
+| Color | Meaning |
+|-------|-----------|
+| Green | Both engines agree |
+| Blue | PaddleOCR only |
+| Orange | Tesseract only |
+| Yellow | Conflict, PaddleOCR chosen |
+| Red | Conflict, Tesseract chosen |
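Since this feature is only planned, the following is a speculative sketch of how the color table could be applied to one word position. The function name `attribution_color` and its inputs (the raw text from each engine plus the merged result) are assumptions for illustration and not part of any existing module.

```python
from typing import Optional


def attribution_color(tesseract: Optional[str], paddle: Optional[str], merged: str) -> str:
    """Map one word position to a color name, following the table above."""
    if tesseract and paddle:
        if tesseract == paddle:
            return "green"   # both engines agree
        if merged == paddle:
            return "yellow"  # conflict, PaddleOCR chosen
        if merged == tesseract:
            return "red"     # conflict, Tesseract chosen
        raise ValueError("merged text matches neither engine")
    if paddle:
        return "blue"        # only PaddleOCR produced the word
    if tesseract:
        return "orange"      # only Tesseract produced the word
    raise ValueError("at least one engine must provide a word")
```

Raising on a merged text that matches neither engine is a deliberate choice in this sketch: the table defines no color for that case, so it is surfaced instead of guessed.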
+
+### Planned endpoint
+
+```
+POST /sessions/{id}/ocr-kombi-transparent
+→ { raw_tesseract, raw_paddle, merged, engine_source_per_word }
+```
+
+---
+
+## Grid Pipeline Split (Phase 4, planned)
+
+`grid_editor_api.py` (1801 lines) will be split into:
+
+| Module | Contents | ~Lines |
+|-------|--------|---------|
+| `grid_build_filters.py` | Margin, footer, header, exclude, graphic filter | ~200 |
+| `grid_build_zones.py` | Box detection, page zones, vertical dividers | ~250 |
+| `grid_build_columns.py` | Column clustering + union merge + zone grids | ~300 |
+| `grid_build_postprocess.py` | Row/cell postprocessing, IPA, colors, dictionary | ~500 |
+
+`grid_editor_api.py` becomes a lean orchestrator.
+
+---
+
+## Relationship to Existing Pipelines
+
+| Pipeline | Route | Status | Description |
+|----------|-------|--------|--------------|
+| **OCR Kombi** | `/ai/ocr-kombi` | Active (new) | Modular 11-step pipeline |
+| OCR Overlay | `/ai/ocr-overlay` | Legacy | 751-line monolith, 3 modes |
+| OCR Pipeline | `/ai/ocr-pipeline` | Legacy | Full pipeline with columns |
+| OCR Compare | `/ai/ocr-compare` | Standalone | Method comparison |
+
+The old OCR Overlay (`/ai/ocr-overlay`) remains usable in parallel throughout the migration
+for A/B tests. Once the Kombi pipeline is feature-complete, the old pipelines will be
+removed in Phase 7.
+
+---
+
+## Changelog
+
+| Date | Version | Change |
+|-------|---------|-----------|
+| 2026-03-26 | 1.0.0 | **Phase 1:** Scaffolding with 11-step architecture, DB migration (document_group_id, page_number), backend router with multi-page upload, frontend with SessionList (grouped), KombiStepper, 13 step components, useKombiPipeline hook, navigation |
--- a/docs-src/services/klausur-service/OCR-Pipeline.md
+++ b/docs-src/services/klausur-service/OCR-Pipeline.md
@@ -1,6 +1,6 @@
 # OCR Pipeline - Schrittweise Seitenrekonstruktion

-**Version:** 5.0.0
+**Version:** 5.1.0
 **Status:** Produktiv (Schritte 1–10 + Grid Editor + Regression Framework)
 **URL:** https://macmini:3002/ai/ocr-pipeline

@@ -149,7 +149,9 @@ klausur-service/backend/
 ├── ocr_pipeline_api.py                 # FastAPI Router (Schritte 2-10)
 ├── orientation_crop_api.py             # FastAPI Router (Schritte 1 + 4)
 ├── grid_editor_api.py                  # Grid Editor: build-grid, save-grid, grid-editor
+├── grid_editor_helpers.py              # Footer filtering, page number extraction
 ├── cv_ocr_engines.py                   # OCR-Engines, IPA-Korrektur, Britfone-Woerterbuch
+├── cv_syllable_detect.py               # German syllabification (Silben:DE mode)
 ├── cv_box_detect.py                    # Box-Erkennung + Zonen-Aufteilung
 ├── cv_graphic_detect.py                # Grafik-/Bilderkennung (Region-basiert)
 ├── cv_color_detect.py                  # Farbtext-Erkennung (HSV-Analyse)
@@ -1081,6 +1083,8 @@ Rekonstruktion fuer Vokabelseiten mit komplexen Layouts (Bilder, Ueberschriften,
 | Datei | Beschreibung |
 |-------|--------------|
 | `grid_editor_api.py` | `_build_grid_core()` Pipeline, alle Steps |
+| `grid_editor_helpers.py` | `_filter_footer_words()` → page number extraction, footer filtering |
+| `cv_syllable_detect.py` | German syllabification with IPA compatibility |
 | `cv_ocr_engines.py` | IPA-Korrektur, Britfone-Woerterbuch, Garbled-IPA-Erkennung |
 | `cv_vocab_types.py` | `PageZone` (mit `image_overlays`), `ColumnGeometry` |
 | `tests/test_grid_editor_api.py` | 27 Tests |
@@ -1106,9 +1110,15 @@ Kombi-Wortdaten
  ├─ Step 4: Farb-Annotation
  │   → detect_word_colors(): HSV-Farbanalyse aller word_boxes
  │
+  ├─ Step 4b2: Per-Cell Artifact Filter
+  │   → Remove single-word cells with ≤2 characters and conf < 65
+  │
  ├─ Step 4c: Oversized Word Box Removal
  │   → word_boxes > 3x Median entfernen (Grafik-Artefakte)
  │
+  ├─ Step 4d2: Connector Column Normalization
+  │   → Normalize dominant short words in narrow columns
+  │
  ├─ Step 5: Overlay-Wort-Filter
  │   → Woerter innerhalb image_overlays entfernen
  │
@@ -1197,6 +1207,38 @@ des Headwords der vorherigen Zeile). Diese werden von PaddleOCR als garbled Text
 4. Schlaegt IPA im Britfone-Woerterbuch nach
 5. Beruecksichtigt alle Wortteile (z.B. "close sth. down" → `[klˈəʊz dˈaʊn]`)

+### Per-Cell Artifact Filter (Step 4b2)
+
+Removes OCR noise at the cell level: cells with exactly one `word_box`, at most 2 characters,
+and a confidence below 65 are classified as artifacts and removed.
+
+**Constants:**
+
+| Parameter | Value | Description |
+|-----------|------|--------------|
+| `_ARTIFACT_MAX_LEN` | 2 | Maximum text length for a suspected artifact |
+| `_ARTIFACT_CONF_THRESHOLD` | 65 | Confidence threshold (below it = artifact) |
+
+**Safety:** Single characters with high confidence (e.g. red `!` markers with conf=98)
+are **not** removed because their confidence is above the threshold.
+
+**Typical artifacts:** `(as)` conf=55, `u)` conf=44, i.e. OCR noise from page margins
+or shadows.
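The rule above can be sketched as a small predicate. The two constants come from the table; the cell layout (a `word_boxes` list whose entries carry `text` and `conf`) and the function names are assumptions made for this example and may differ from the real grid data model.

```python
from typing import Any, Dict, List

_ARTIFACT_MAX_LEN = 2          # from the constants table above
_ARTIFACT_CONF_THRESHOLD = 65  # from the constants table above


def is_artifact_cell(cell: Dict[str, Any]) -> bool:
    """True for cells with exactly one short, low-confidence word box."""
    boxes = cell.get("word_boxes", [])  # assumed key name
    if len(boxes) != 1:
        return False
    text = str(boxes[0].get("text", "")).strip()
    conf = boxes[0].get("conf", 100)  # a missing confidence is treated as trustworthy
    return len(text) <= _ARTIFACT_MAX_LEN and conf < _ARTIFACT_CONF_THRESHOLD


def filter_artifact_cells(cells: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the cells that are not classified as artifacts."""
    return [cell for cell in cells if not is_artifact_cell(cell)]
```

Requiring both conditions at once is what protects the high-confidence `!` markers: they are short, but their confidence keeps them out of the artifact class.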
+
+### Connector Column Normalization (Step 4d2)
+
+Detects narrow columns with one dominant short word (e.g. "oder", "and", "bzw.")
+and normalizes OCR errors in which noise was attached to the dominant word.
+
+**Algorithm:**
+
+1. Per column: count the text occurrences across all cells
+2. Check whether a dominant word exists (≥ 60% of the cells, max 10 characters)
+3. For cells that **start** with the dominant word and are at most 2 characters longer:
+   normalize to the dominant word
+
+**Example:** Column with "oder" in 80% of the cells → `"oderb"` is normalized to `"oder"`.
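A minimal sketch of these three steps, operating on the plain cell texts of one column. The function name, the parameter names, and the choice to measure the 60% share against non-empty cells are assumptions; the narrow-column check is left out because its criterion is not stated here.

```python
from collections import Counter
from typing import List


def normalize_connector_column(
    texts: List[str],
    min_share: float = 0.6,   # dominant word must cover at least 60% of the cells
    max_word_len: int = 10,   # dominant word may have at most 10 characters
    max_extra: int = 2,       # noisy variants may be at most 2 characters longer
) -> List[str]:
    """Replace noisy variants of a dominant short word with the word itself."""
    non_empty = [t.strip() for t in texts if t.strip()]
    if not non_empty:
        return list(texts)
    word, count = Counter(non_empty).most_common(1)[0]
    if count / len(non_empty) < min_share or len(word) > max_word_len:
        return list(texts)  # no dominant connector word in this column
    result = []
    for original in texts:
        stripped = original.strip()
        is_noisy_variant = (
            stripped != word
            and stripped.startswith(word)
            and len(stripped) - len(word) <= max_extra
        )
        result.append(word if is_noisy_variant else original)
    return result
```

With four cells reading `"oder"` and one reading `"oderb"`, the share is 80% and the fifth cell is normalized, matching the example given above.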
+
 ### Compound Word IPA Decomposition (Step 5e)

 Zusammengesetzte Woerter wie "schoolbag" oder "blackbird" haben oft keinen eigenen
@@ -1253,6 +1295,69 @@ Admin-UI fuer effiziente Massenpruefung von Sessions:

 Admin-UI: [/ai/ocr-ground-truth](https://macmini:3002/ai/ocr-ground-truth)

+### Page Number Extraction
+
+The footer filter (`_filter_footer_words` in `grid_editor_helpers.py`) detects
+page numbers in the bottom 5% of the image and returns them as metadata
+instead of silently discarding them.
+
+**Algorithm:**
+
+1. Identify words in the bottom 5% of the image
+2. If there are ≤ 3 words with a total length of ≤ 10 characters: extract them as the page number
+3. Return a `PageNumber` object: `{text, y_pct, number?}`
+4. Digits are extracted separately as `number` (integer)
+
+**Data type:**
+
+```typescript
+interface PageNumber {
+  text: string     // Raw OCR text (e.g. "u)233")
+  y_pct: number    // Vertical position in percent
+  number?: number  // Extracted number (e.g. 233)
+}
+```
+
+**Frontend display:**
+
+Shown as a badge in the summary bar (GridEditor + StepGridReview): `S. 233`.
+Prefers `page_number.number` (clean number) and falls back to `page_number.text`.
+
+**Purpose:** Merging consecutive pages later in the customer frontend.
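The extraction steps can be illustrated as follows. This is a sketch under assumptions rather than the code of `_filter_footer_words`: the function name is made up, words are assumed to be dicts with `text`, `top`, and `left`, the position is measured at the top edge of each word, and the first run of digits is taken as the number.

```python
import re
from typing import Any, Dict, List, Optional


def extract_page_number(
    words: List[Dict[str, Any]],
    img_h: int,
    footer_fraction: float = 0.05,  # bottom 5% of the image
    max_words: int = 3,
    max_chars: int = 10,
) -> Optional[Dict[str, Any]]:
    """Return {text, y_pct, number?} for a short footer, else None."""
    footer_top = img_h * (1.0 - footer_fraction)
    footer = [w for w in words if w["top"] >= footer_top]
    if not footer or len(footer) > max_words:
        return None
    if sum(len(w["text"]) for w in footer) > max_chars:
        return None  # too long to be a page number
    text = " ".join(w["text"] for w in sorted(footer, key=lambda w: w["left"]))
    result: Dict[str, Any] = {
        "text": text,
        "y_pct": round(100.0 * min(w["top"] for w in footer) / img_h, 1),
    }
    digits = re.search(r"\d+", text)
    if digits:
        result["number"] = int(digits.group())  # clean integer for the badge
    return result
```

Keeping the raw `text` next to the parsed `number` mirrors the `PageNumber` type: the frontend can show the clean number and still fall back to the OCR text when no digits were found.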
+
+### Footer Row Detection (Improvement)
+
+Footer detection was extended with two checks to prevent content rows from being
+falsely marked as footers:
+
+| Check | Condition | Reason |
+|----------|-----------|-------|
+| Comma check | `',' in text` → not a footer | Content sentences contain commas, page numbers do not |
+| Length check | `len(text) > 20` → not a footer | Page numbers are short, content rows are long |
+
+**Before:** `"Uhrzeit, Vergangenheit, Zukunft"` was marked as a footer.
+**After:** Only actual page numbers (short, no commas) are detected as footers.
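The two checks amount to a simple guard that runs before a row may be treated as a footer. The helper name `may_be_footer_row` is hypothetical, and the sketch covers only these two conditions, not the positional footer logic around them.

```python
def may_be_footer_row(text: str, max_len: int = 20) -> bool:
    """Return False when the row text rules out a footer (page number)."""
    text = text.strip()
    if "," in text:
        return False  # comma check: content sentences contain commas
    if len(text) > max_len:
        return False  # length check: page numbers are short
    return True
```

Applied to the example above, `"Uhrzeit, Vergangenheit, Zukunft"` fails the comma check and stays a content row, while a bare `"233"` passes both checks.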
+
+### Syllables + IPA Combination (Fix)
+
+**File:** `cv_syllable_detect.py`
+
+When both modes (Silben:DE and IPA) are enabled, the `_IPA_RE` guard blocked
+syllabification because programmatically inserted IPA brackets (e.g. `[bɪltʃøn]`)
+contain IPA characters.
+
+**Fix:** Bracket content is removed before the IPA check:
+
+```python
+# Strip bracket content because it was inserted programmatically
+text_no_brackets = re.sub(r'\[[^\]]*\]', '', text)
+if _IPA_RE.search(text_no_brackets):
+    return text  # Real IPA in running text → no syllabification
+```
+
+As a result, `"Bild·chen [bɪltʃøn]"` is syllabified correctly: the syllable dots are kept,
+and the bracketed IPA is no longer treated as a blocker.
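To see the effect of the guard in isolation, here is a self-contained sketch. The real `_IPA_RE` pattern is not shown in this document, so `_IPA_RE_ASSUMED` below is a stand-in containing only a handful of IPA characters, and `blocks_syllabification` is a hypothetical helper name.

```python
import re

# Assumed stand-in for the real _IPA_RE: a few characters that occur only in IPA.
_IPA_RE_ASSUMED = re.compile(r"[ɪʃøəʊæɔɛŋθðˈ]")


def blocks_syllabification(text: str) -> bool:
    """True if IPA characters appear outside of square brackets."""
    # Bracket content was inserted programmatically, so it is ignored here.
    text_no_brackets = re.sub(r"\[[^\]]*\]", "", text)
    return bool(_IPA_RE_ASSUMED.search(text_no_brackets))
```

With this guard, `"Bildchen [bɪltʃøn]"` no longer blocks syllabification, whereas the same IPA written without brackets still does.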
+
 ### `en_col_type` Erkennung

 Die Erkennung der Englisch-Headword-Spalte nutzt **Bracket-IPA-Pattern-Count**
@@ -1620,6 +1725,8 @@ Die Ergebnisse fliessen in Schritt 5 (Spaltenerkennung) und den Grid Editor ein.

 | Datum | Version | Aenderung |
 |-------|---------|----------|
+| 2026-03-26 | 5.2.0 | **OCR Kombi Pipeline:** New modular successor with an 11-step architecture under `/ai/ocr-kombi`. Separate documentation: [OCR Kombi Pipeline](OCR-Kombi-Pipeline.md). Phase 1 (scaffolding + DB) implemented: DB migration (`document_group_id`, `page_number`), frontend orchestrator, 13 step components, backend router with multi-page upload. |
+| 2026-03-26 | 5.1.0 | **Grid Quality & Metadata:** Per-Cell Artifact Filter (Step 4b2: ≤2 characters + conf < 65), Connector Column Normalization (Step 4d2: dominant short words), improved footer detection (comma/length check), page number extraction as metadata (`page_number` field in the grid result), frontend display in the summary bar. Fixed the syllables+IPA combination (strip bracket content before the IPA guard). |
 | 2026-03-23 | 5.0.0 | **Phase 1 Sprint 1:** Compound-IPA-Zerlegung (`_decompose_compound`), Trailing-Garbled-Fragment-Entfernung (Multi-Wort-Headwords), Regression Framework mit DB-Persistenz + History + Shell-Script, Ground-Truth Review Workflow UI, Page-Crop Determinismus verifiziert. Admin-Seiten: `/ai/ocr-regression`, `/ai/ocr-ground-truth`. |
 | 2026-03-20 | 4.7.0 | Grid Editor: Zone Merging ueber Bilder (`image_overlays`), Heading Detection (Farbe + Hoehe), Ghost-Filter (borderless-aware), Oversized Word Box Removal, IPA Phonetic Correction (Britfone), IPA Continuation Detection, `en_col_type` via Bracket-Count. 27 Tests. |
 | 2026-03-16 | 4.6.0 | Strukturerkennung (Schritt 8): Region-basierte Grafikerkennung (`cv_graphic_detect.py`) mit Zwei-Pass-Verfahren (Farbregionen + schwarze Illustrationen), Wort-Ueberlappungs-Filter, Box/Zonen/Farb-Analyse. Schritt laeuft nach Worterkennung. |
--- a/docs-src/services/klausur-service/RAG-Landkarte.md
+++ b/docs-src/services/klausur-service/RAG-Landkarte.md
@@ -0,0 +1,204 @@
+# RAG Landkarte: Industry-Regulation Matrix
+
+## Overview
+
+The RAG Landkarte shows an interactive matrix of all 320 compliance documents in the RAG system, grouped by document type and mapped to 10 industry sectors.
+
+**URL**: `https://macmini:3002/ai/rag` → tab "Landkarte"
+
+**Last updated**: 2026-04-15
+
+## Architecture
+
+```
+rag-documents.json          ← Central data file (320 documents)
+    ├── doc_types[]          ← 17 document types (EU regulation, DE law, etc.)
+    ├── industries[]         ← 10 industries (VDMA/VDA/BDI)
+    └── documents[]          ← All documents with industry mapping
+         ├── code            ← Unique identifier
+         ├── name            ← Display name
+         ├── doc_type        ← Reference to doc_types.id
+         ├── industries[]    ← ["all"] or ["automotive", "chemie", ...]
+         ├── in_rag          ← true (all are in the RAG)
+         ├── rag_collection  ← Qdrant collection name
+         ├── description?    ← Description (for ~100 main regulations)
+         ├── applicability_note?  ← Rationale for the industry mapping
+         └── effective_date? ← Effective date
+
+rag-constants.ts            ← RAG metadata (chunks, Qdrant IDs)
+page.tsx                    ← Frontend (imports from the JSON)
+```
+
+## Files
+
+| Path | Description |
+|------|--------------|
+| `admin-lehrer/app/(admin)/ai/rag/rag-documents.json` | All 320 documents with industry mapping |
+| `admin-lehrer/app/(admin)/ai/rag/rag-constants.ts` | REGULATIONS_IN_RAG (chunk counts, Qdrant IDs) |
+| `admin-lehrer/app/(admin)/ai/rag/page.tsx` | Frontend rendering |
+| `admin-lehrer/app/(admin)/ai/rag/__tests__/rag-documents.test.ts` | 44 tests for JSON validation |
+
+## Industries (10 sectors)
+
+The industries follow the member associations of VDMA, VDA, and BDI:
+
+| ID | Industry | Icon | Typical customers |
+|----|---------|------|-----------------|
+| `automotive` | Automotive industry | 🚗 | OEMs, tier 1/2 suppliers |
+| `maschinenbau` | Mechanical & plant engineering | ⚙️ | Machine tools, automation |
+| `elektrotechnik` | Electrical & digital industry | ⚡ | Embedded systems, control technology |
+| `chemie` | Chemical & process industry | 🧪 | Basic chemicals, specialty chemicals |
+| `metall` | Metal industry | 🔩 | Steel, aluminum, metal processing |
+| `energie` | Energy & utilities | 🔋 | Power generation, grid operators |
+| `transport` | Transport & logistics | 🚚 | Freight, rail, aviation |
+| `handel` | Retail & trade | 🏪 | Retail/wholesale, e-commerce |
+| `konsumgueter` | Consumer goods & food | 📦 | FMCG, food, packaging |
+| `bau` | Construction | 🏗️ | Building/civil engineering, building automation |
+
+!!! warning "No pseudo-industries"
+    Cross-cutting topics such as IoT, AI, HR, KRITIS, or e-commerce are deliberately **not** listed as "industries". They are technologies, departments, or classifications, not economic sectors.
+
+## Mapping Logic
+
+### Three levels
+
+| Level | `industries` value | Count | Examples |
+|-------|-------------------|--------|-----------|
+| **Horizontal** | `["all"]` | 264 | DSGVO, AI Act, CRA, NIS2, BetrVG |
+| **Sector-specific** | `["automotive", "chemie", ...]` | 42 | Maschinenverordnung, ElektroG, BattDG |
+| **Not applicable** | `[]` | 14 | DORA, MiCA, EHDS, DSA |
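The three levels follow directly from the `industries` array, which a short Python sketch can make explicit. The function names and the level labels are illustrative assumptions; only the `industries` field and the `"all"` marker come from the data model described here.

```python
from collections import Counter
from typing import Any, Dict, List


def applicability_level(industries: List[str]) -> str:
    """Classify a document by the contents of its industries list."""
    if "all" in industries:
        return "horizontal"
    if industries:
        return "sector_specific"
    return "not_applicable"


def count_levels(documents: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count documents per level, e.g. to cross-check the table above."""
    return dict(Counter(applicability_level(d["industries"]) for d in documents))
```

Run against the parsed `documents` array from `rag-documents.json`, `count_levels` would be expected to reproduce the 264 / 42 / 14 split from the table, assuming the file matches this description.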
+
+### Horizontal (all industries)
+
+Regulations that apply **across industries**:
+
+- **Data protection**: DSGVO, BDSG, ePrivacy, TDDDG, SCC, DPF
+- **AI**: AI Act (any company that uses AI)
+- **Cybersecurity**: CRA (any product with digital elements), NIS2, EUCSA
+- **Product safety**: GPSR, Produkthaftungs-RL
+- **Labor law**: BetrVG, AGG, KSchG, ArbSchG, LkSG
+- **Commercial/tax law**: HGB, AO, UStG
+- **Software security**: OWASP Top 10, NIST SSDF, CISA Secure by Design
+- **Supply chain**: CycloneDX, SPDX, SLSA (CRA requires an SBOM)
+- **All guidelines**: EDPB, DSK, DSFA lists, court decisions
+
+### Sector-specific
+
+| Regulation | Industries | Rationale |
+|-------------|----------|-------------|
+| Maschinenverordnung | Mechanical engineering, automotive, electrical, metal, construction | Manufacturers of machinery and related products |
+| ElektroG | Electrical, automotive, consumer goods | Electrical/electronic equipment |
+| BattDG/BattVO | Automotive, electrical, energy | Batteries and accumulators |
+| VerpackG | Consumer goods, retail, chemical | Products subject to packaging obligations |
+| PAngV, UWG, VSBG | Retail, consumer goods | Consumer protection in sales |
+| BSI-KritisV, KRITIS-Dachgesetz | Energy, transport, chemical | KRITIS sectors |
+| ENISA ICS/SCADA | Mechanical engineering, electrical, automotive, chemical, energy, transport | Industrial control systems |
+| NIST SP 800-82 (OT) | Mechanical engineering, automotive, electrical, chemical, energy, metal | Operational technology |
+
+### Not applicable
+
+Documents that **stay in the RAG** but are not relevant to any of the 10 target industries:
+
+| Code | Name | Reason |
+|------|------|-------|
+| DORA | Digital Operational Resilience Act | Financial sector |
+| PSD2 | Payment Services Directive | Payment service providers |
+| MiCA | Markets in Crypto-Assets | Crypto markets |
+| AMLR | AML Regulation | Anti-money laundering |
+| EHDS | European Health Data Space | Healthcare |
+| DSA | Digital Services Act | Online platforms |
+| DMA | Digital Markets Act | Gatekeeper platforms |
+| MDR | Medical Device Regulation | Medical technology |
+| BSI-TR-03161 | DiGA security (3 parts) | Digital health applications |
+
+## Document Types (17)
+
+| doc_type | Label | Count | Examples |
+|----------|-------|--------|-----------|
+| `eu_regulation` | EU regulations | 22 | DSGVO, AI Act, CRA, DORA |
+| `eu_directive` | EU directives | 14 | ePrivacy, NIS2, PSD2 |
+| `eu_guidance` | EU guidance | 9 | Blue Guide, GPAI CoP |
+| `de_law` | German laws | 41 | BDSG, BGB, HGB, BetrVG |
+| `at_law` | Austrian laws | 11 | DSG AT, ECG, KSchG |
+| `ch_law` | Swiss laws | 8 | revDSG, DSV, OR |
+| `national_law` | National data protection laws | 17 | UK DPA, LOPDGDD, UAVG |
+| `bsi_standard` | BSI standards & TR | 4 | BSI 200-4, BSI-TR-03161 |
+| `edpb_guideline` | EDPB/WP29 guidelines | 50 | Consent, Controller/Processor |
+| `dsk_guidance` | DSK guidance papers | 57 | Kurzpapiere, OH Telemedien |
+| `court_decision` | Court decisions | 20 | BAG M365, BGH Planet49 |
+| `dsfa_list` | DSFA mandatory lists | 20 | One per federal state + DSK |
+| `nist_standard` | NIST standards | 11 | CSF 2.0, SSDF, AI RMF |
+| `owasp_standard` | OWASP standards | 6 | Top 10, ASVS, API Security |
+| `enisa_guidance` | ENISA guidance | 6 | Supply Chain, ICS/SCADA |
+| `international` | International standards | 7 | CVSS, CycloneDX, SPDX |
+| `legal_template` | Templates & samples | 17 | GitHub Policies, VVT-Muster |
+
+## Integration into Other Projects
+
+### Importing the JSON
+
+```typescript
+import ragData from './rag-documents.json'
+
+const documents = ragData.documents    // 320 documents
+const docTypes = ragData.doc_types     // 17 categories
+const industries = ragData.industries  // 10 industries
+```
+
+### Matrix logic
+
+```typescript
+// Check whether a document applies to an industry
+const applies = (doc, industryId) =>
+  doc.industries.includes(industryId) || doc.industries.includes('all')
+
+// Group documents by type
+const grouped = Object.groupBy(documents, d => d.doc_type)
+
+// Only sector-specific documents for one industry
+const forAutomotive = documents.filter(d =>
+  d.industries.includes('automotive') && !d.industries.includes('all')
+)
+```
+
+### Checking RAG status
+
+```typescript
+import { REGULATIONS_IN_RAG } from './rag-constants'
+
+const isInRag = (code: string) => code in REGULATIONS_IN_RAG
+const chunks = REGULATIONS_IN_RAG['GDPR']?.chunks  // 423
+```
+
+## Data Sources
+
+| Source | Path | Description |
+|--------|------|--------------|
+| RAG inventory | `~/Desktop/RAG-Dokumenten-Inventar.md` | 386 source files |
+| rag-documents.json | `admin-lehrer/.../rag/rag-documents.json` | 320 consolidated documents |
+| rag-constants.ts | `admin-lehrer/.../rag/rag-constants.ts` | Qdrant metadata |
+
+## Tests
+
+```bash
+cd admin-lehrer
+npx vitest run app/\(admin\)/ai/rag/__tests__/rag-documents.test.ts
+```
+
+44 tests validate:
+
+- JSON structure (doc_types, industries, documents)
+- 10 real industries (no pseudo-industries)
+- Required fields and valid references
+- Horizontal regulations (DSGVO, AI Act, CRA → "all")
+- Sector-specific mappings (Maschinenverordnung, ElektroG)
+- Not-applicable regulations (DORA, MiCA → empty)
+- Applicability notes present and correct
+
+## Changelog
+
+| Date | Change |
+|-------|-----------|
+| 2026-04-15 | Initial implementation: 320 documents, 10 industries, 17 types |
+| 2026-04-15 | Industry review: OWASP/SBOM → all, BSI-TR-03161 → empty |
+| 2026-04-15 | Applicability notes UI: collapsible explanations per document |
--- a/klausur-service/backend/cv_box_layout.py
+++ b/klausur-service/backend/cv_box_layout.py
@@ -0,0 +1,339 @@
+"""
+Box layout classifier — detects internal layout type of embedded boxes.
+
+Classifies each box as: flowing | columnar | bullet_list | header_only
+and provides layout-appropriate grid building.
+
+Used by the Box-Grid-Review step to rebuild box zones with correct structure.
+"""
+
+import logging
+import re
+import statistics
+from typing import Any, Dict, List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+# Bullet / list-item patterns at the start of a line
+_BULLET_RE = re.compile(
+    r'^[\-\u2022\u2013\u2014\u25CF\u25CB\u25AA\u25A0•·]\s'  # dash, bullet chars
+    r'|^\d{1,2}[.)]\s'     # numbered: "1) " or "1. "
+    r'|^[a-z][.)]\s'       # lettered: "a) " or "a. "
+)
+
+
+def classify_box_layout(
+    words: List[Dict],
+    box_w: int,
+    box_h: int,
+) -> str:
+    """Classify the internal layout of a detected box.
+
+    Args:
+        words: OCR word dicts within the box (with top, left, width, height, text)
+        box_w: Box width in pixels
+        box_h: Box height in pixels
+
+    Returns:
+        'header_only' | 'bullet_list' | 'columnar' | 'flowing'
+    """
+    if not words:
+        return "header_only"
+
+    # Group words into lines by y-proximity
+    lines = _group_into_lines(words)
+
+    # Header only: very few words or single line
+    total_words = sum(len(line) for line in lines)
+    if total_words <= 5 or len(lines) <= 1:
+        return "header_only"
+
+    # Bullet list: at least 40% of lines (and at least two) start with a bullet pattern
+    bullet_count = 0
+    for line in lines:
+        first_text = line[0].get("text", "") if line else ""
+        if _BULLET_RE.match(first_text):
+            bullet_count += 1
+        # Also check if first word IS a bullet char
+        elif first_text.strip() in ("-", "–", "—", "•", "·", "▪", "▸"):
+            bullet_count += 1
+    if bullet_count >= len(lines) * 0.4 and bullet_count >= 2:
+        return "bullet_list"
+
+    # Columnar: check for multiple distinct x-clusters
+    if len(lines) >= 3 and _has_column_structure(words, box_w):
+        return "columnar"
+
+    # Default: flowing text
+    return "flowing"
+
+
+def _group_into_lines(words: List[Dict]) -> List[List[Dict]]:
+    """Group words into lines by y-proximity."""
+    if not words:
+        return []
+
+    sorted_words = sorted(words, key=lambda w: (w["top"], w["left"]))
+    heights = [w["height"] for w in sorted_words if w.get("height", 0) > 0]
+    median_h = statistics.median(heights) if heights else 20
+    y_tolerance = max(median_h * 0.5, 5)
+
+    lines: List[List[Dict]] = []
+    current_line: List[Dict] = [sorted_words[0]]
+    current_y = sorted_words[0]["top"]
+
+    for w in sorted_words[1:]:
+        if abs(w["top"] - current_y) <= y_tolerance:
+            current_line.append(w)
+        else:
+            lines.append(sorted(current_line, key=lambda ww: ww["left"]))
+            current_line = [w]
+            current_y = w["top"]
+
+    if current_line:
+        lines.append(sorted(current_line, key=lambda ww: ww["left"]))
+
+    return lines
+
+
+def _has_column_structure(words: List[Dict], box_w: int) -> bool:
+    """Check if words have multiple distinct left-edge clusters (columns)."""
+    if box_w <= 0:
+        return False
+
+    lines = _group_into_lines(words)
+    if len(lines) < 3:
+        return False
+
+    # Collect left-edges of non-first words in each line
+    # (first word of each line often aligns regardless of columns)
+    left_edges = []
+    for line in lines:
+        for w in line[1:]:  # skip first word
+            left_edges.append(w["left"])
+
+    if len(left_edges) < 4:
+        return False
+
+    # Check if left edges cluster into 2+ distinct groups
+    left_edges.sort()
+    gaps = [left_edges[i + 1] - left_edges[i] for i in range(len(left_edges) - 1)]
+    if not gaps:
+        return False
+
+    median_gap = statistics.median(gaps)
+    # A column gap is typically > 15% of box width
+    column_gap_threshold = box_w * 0.15
+    large_gaps = [g for g in gaps if g > column_gap_threshold]
+
+    return len(large_gaps) >= 1
+
+
+def build_box_zone_grid(
+    zone_words: List[Dict],
+    box_x: int,
+    box_y: int,
+    box_w: int,
+    box_h: int,
+    zone_index: int,
+    img_w: int,
+    img_h: int,
+    layout_type: Optional[str] = None,
+) -> Dict[str, Any]:
+    """Build a grid for a box zone with layout-aware processing.
+
+    If layout_type is None, auto-detects it.
+    For 'flowing' and 'bullet_list', forces single-column layout.
+    For 'columnar', uses the standard multi-column detection.
+    For 'header_only', creates a single cell.
+
+    Returns the same format as _build_zone_grid (columns, rows, cells, header_rows).
+    """
+    from grid_editor_helpers import _build_zone_grid, _cluster_rows
+
+    if not zone_words:
+        return {
+            "columns": [],
+            "rows": [],
+            "cells": [],
+            "header_rows": [],
+            "box_layout_type": layout_type or "header_only",
+            "box_grid_reviewed": False,
+        }
+
+    # Auto-detect layout if not specified
+    if not layout_type:
+        layout_type = classify_box_layout(zone_words, box_w, box_h)
+
+    logger.info(
+        "Box zone %d: layout_type=%s, %d words, %dx%d",
+        zone_index, layout_type, len(zone_words), box_w, box_h,
+    )
+
+    if layout_type == "header_only":
+        # Single cell with all text concatenated
+        all_text = " ".join(
+            w.get("text", "") for w in sorted(zone_words, key=lambda ww: (ww["top"], ww["left"]))
+        ).strip()
+        return {
+            "columns": [{"col_index": 0, "index": 0, "label": "column_text", "col_type": "column_1",
+                         "x_min_px": box_x, "x_max_px": box_x + box_w,
+                         "x_min_pct": round(box_x / img_w * 100, 2) if img_w else 0,
+                         "x_max_pct": round((box_x + box_w) / img_w * 100, 2) if img_w else 0,
+                         "bold": False}],
+            "rows": [{"index": 0, "row_index": 0,
+                       "y_min": box_y, "y_max": box_y + box_h, "y_center": box_y + box_h / 2,
+                       "y_min_px": box_y, "y_max_px": box_y + box_h,
+                       "y_min_pct": round(box_y / img_h * 100, 2) if img_h else 0,
+                       "y_max_pct": round((box_y + box_h) / img_h * 100, 2) if img_h else 0,
+                       "is_header": True}],
+            "cells": [{
+                "cell_id": f"Z{zone_index}_R0C0",
+                "row_index": 0,
+                "col_index": 0,
+                "col_type": "column_1",
+                "text": all_text,
+                "word_boxes": zone_words,
+            }],
+            "header_rows": [0],
+            "box_layout_type": layout_type,
+            "box_grid_reviewed": False,
+        }
+
+    if layout_type in ("flowing", "bullet_list"):
+        # Force single column — each line becomes one row with one cell.
+        # Detect bullet structure from indentation and merge continuation
+        # lines into the bullet they belong to.
+        lines = _group_into_lines(zone_words)
+        column = {
+            "col_index": 0, "index": 0, "label": "column_text", "col_type": "column_1",
+            "x_min_px": box_x, "x_max_px": box_x + box_w,
+            "x_min_pct": round(box_x / img_w * 100, 2) if img_w else 0,
+            "x_max_pct": round((box_x + box_w) / img_w * 100, 2) if img_w else 0,
+            "bold": False,
+        }
+
+        # --- Detect indentation levels ---
+        line_indents = []
+        for line_words in lines:
+            if not line_words:
+                line_indents.append(0)
+                continue
+            min_left = min(w["left"] for w in line_words)
+            line_indents.append(min_left - box_x)
+
+        # Find the minimum indent (= bullet/main level)
+        valid_indents = [ind for ind in line_indents if ind >= 0]
+        min_indent = min(valid_indents) if valid_indents else 0
+
+        # Indentation threshold: lines indented > 15px more than minimum
+        # are continuation lines belonging to the previous bullet
+        INDENT_THRESHOLD = 15
+
+        # --- Group lines into logical items (bullet + continuations) ---
+        # Each item is a list of line indices
+        items: List[List[int]] = []
+        for li, indent in enumerate(line_indents):
+            is_continuation = (indent > min_indent + INDENT_THRESHOLD) and len(items) > 0
+            if is_continuation:
+                items[-1].append(li)
+            else:
+                items.append([li])
+
+        logger.info(
+            "Box zone %d flowing: %d lines → %d items (indents=%s, min=%d, threshold=%d)",
+            zone_index, len(lines), len(items),
+            [int(i) for i in line_indents], int(min_indent), INDENT_THRESHOLD,
+        )
+
+        # --- Build rows and cells from grouped items ---
+        rows = []
+        cells = []
+        header_rows = []
+
+        for row_idx, item_line_indices in enumerate(items):
+            # Collect all words from all lines in this item
+            item_words = []
+            item_texts = []
+            for li in item_line_indices:
+                if li < len(lines):
+                    item_words.extend(lines[li])
+                    line_text = " ".join(w.get("text", "") for w in lines[li]).strip()
+                    if line_text:
+                        item_texts.append(line_text)
+
+            if not item_words:
+                continue
+
+            y_min = min(w["top"] for w in item_words)
+            y_max = max(w["top"] + w["height"] for w in item_words)
+            y_center = (y_min + y_max) / 2
+
+            row = {
+                "index": row_idx,
+                "row_index": row_idx,
+                "y_min": y_min,
+                "y_max": y_max,
+                "y_center": y_center,
+                "y_min_px": y_min,
+                "y_max_px": y_max,
+                "y_min_pct": round(y_min / img_h * 100, 2) if img_h else 0,
+                "y_max_pct": round(y_max / img_h * 100, 2) if img_h else 0,
+                "is_header": False,
+            }
+            rows.append(row)
+
+            # Join multi-line text with newline for display
+            merged_text = "\n".join(item_texts)
+
+            # Add bullet marker if this is a bullet item without one
+            first_text = item_texts[0] if item_texts else ""
+            is_bullet = len(item_line_indices) > 1 or _BULLET_RE.match(first_text)
+            if is_bullet and not _BULLET_RE.match(first_text) and row_idx > 0:
+                # Continuation item without bullet — add one
+                merged_text = "• " + merged_text
+
+            cell = {
+                "cell_id": f"Z{zone_index}_R{row_idx}C0",
+                "row_index": row_idx,
+                "col_index": 0,
+                "col_type": "column_1",
+                "text": merged_text,
+                "word_boxes": item_words,
+            }
+            cells.append(cell)
+
+        # Detect header: first item if it is short, all-caps, or ends with a colon
+        if len(items) >= 2:
+            first_item_texts = []
+            for li in items[0]:
+                if li < len(lines):
+                    first_item_texts.append(" ".join(w.get("text", "") for w in lines[li]).strip())
+            first_text = " ".join(first_item_texts)
+            if (len(first_text) < 40
+                    or first_text.isupper()
+                    or first_text.rstrip().endswith(':')):
+                header_rows = [0]
+
+        return {
+            "columns": [column],
+            "rows": rows,
+            "cells": cells,
+            "header_rows": header_rows,
+            "box_layout_type": layout_type,
+            "box_grid_reviewed": False,
+        }
+
+    # Columnar: use standard grid builder with independent column detection
+    result = _build_zone_grid(
+        zone_words, box_x, box_y, box_w, box_h,
+        zone_index, img_w, img_h,
+        global_columns=None,  # detect columns independently
+    )
+
+    # Colspan detection is now handled generically by _detect_colspan_cells
+    # in grid_editor_helpers.py (called inside _build_zone_grid).
+
+    result["box_layout_type"] = layout_type
+    result["box_grid_reviewed"] = False
+    return result
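Review note: the indentation grouping in the `flowing` / `bullet_list` branch of `build_box_zone_grid` can be tried in isolation. The following is a minimal sketch, not the module's code: `group_lines_by_indent` is a hypothetical helper name, and the indent values are made-up pixel offsets from the box's left edge. It only assumes the rule visible in the diff, namely that a line indented more than 15 px beyond the smallest indent is attached to the previous item.

```python
from typing import List

INDENT_THRESHOLD = 15  # px, same value as INDENT_THRESHOLD in the diff


def group_lines_by_indent(line_indents: List[float]) -> List[List[int]]:
    """Group line indices into logical items (hypothetical helper).

    A line whose indent exceeds the smallest indent by more than
    INDENT_THRESHOLD joins the previous item; any other line starts
    a new item.
    """
    valid = [ind for ind in line_indents if ind >= 0]
    min_indent = min(valid) if valid else 0

    items: List[List[int]] = []
    for idx, indent in enumerate(line_indents):
        if items and indent > min_indent + INDENT_THRESHOLD:
            items[-1].append(idx)   # continuation of the previous item
        else:
            items.append([idx])     # new bullet / main-level line
    return items


# Made-up indents for five OCR lines: lines 1 and 4 are indented continuations
print(group_lines_by_indent([10, 40, 12, 11, 45]))  # [[0, 1], [2], [3, 4]]
```

Because the threshold is a fixed pixel count, the result depends on scan resolution; scaling it by the median word height would be one possible alternative, though the diff does not do this.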
--- a/klausur-service/backend/cv_cell_grid.py
+++ b/klausur-service/backend/cv_cell_grid.py
@@ -1447,6 +1447,90 @@ def _merge_phonetic_continuation_rows(
    return merged


+def _merge_wrapped_rows(
+    entries: List[Dict[str, Any]],
+) -> List[Dict[str, Any]]:
+    """Merge rows where the primary column (EN) is empty — cell wrap continuation.
+
+    In textbook vocabulary tables, columns are often narrow, so the author
+    wraps text within a cell. OCR treats each physical line as a separate row.
+    The key indicator: if the EN column is empty but DE/example have text,
+    this row is a continuation of the previous row's cells.
+
+    Example (original textbook has ONE row):
+      Row 2: EN="take part (in)"  DE="teilnehmen (an), mitmachen"  EX="More than 200 singers took"
+      Row 3: EN=""                DE="(bei)"                        EX="part in the concert."
+      → Merged: EN="take part (in)" DE="teilnehmen (an), mitmachen (bei)" EX="More than 200 singers took part in the concert."
+
+    Also handles the reverse case: DE empty but EN has text (wrap in EN column).
+    """
+    if len(entries) < 2:
+        return entries
+
+    merged: List[Dict[str, Any]] = []
+    for entry in entries:
+        en = (entry.get('english') or '').strip()
+        de = (entry.get('german') or '').strip()
+        ex = (entry.get('example') or '').strip()
+
+        if not merged:
+            merged.append(entry)
+            continue
+
+        prev = merged[-1]
+        prev_en = (prev.get('english') or '').strip()
+        prev_de = (prev.get('german') or '').strip()
+        prev_ex = (prev.get('example') or '').strip()
+
+        # Case 1: EN is empty → continuation of previous row
+        # (DE or EX have text that should be appended to previous row)
+        if not en and (de or ex) and prev_en:
+            if de:
+                if prev_de.endswith(','):
+                    sep = ' '  # "Wort," + " " + "Ausdruck"
+                elif prev_de.endswith(('-', '(')):
+                    sep = ''   # "teil-" + "nehmen" or "(" + "bei)"
+                else:
+                    sep = ' '
+                prev['german'] = (prev_de + sep + de).strip()
+            if ex:
+                sep = ' ' if prev_ex else ''
+                prev['example'] = (prev_ex + sep + ex).strip()
+            logger.debug(
+                f"Merged wrapped row {entry.get('row_index')} into previous "
+                f"(empty EN): DE={prev.get('german', '')!r}, EX={prev.get('example', '')!r}"
+            )
+            continue
+
+        # Case 2: DE is empty, EN has text that looks like continuation
+        # (starts with lowercase or is a parenthetical like "(bei)")
+        if en and not de and prev_de:
+            is_paren = en.startswith('(')
+            first_alpha = next((c for c in en if c.isalpha()), '')
+            starts_lower = first_alpha and first_alpha.islower()
+
+            if (is_paren or starts_lower) and len(en.split()) < 5:
+                sep = ' ' if prev_en and not prev_en.endswith((',', '-', '(')) else ''
+                prev['english'] = (prev_en + sep + en).strip()
+                if ex:
+                    sep2 = ' ' if prev_ex else ''
+                    prev['example'] = (prev_ex + sep2 + ex).strip()
+                logger.debug(
+                    f"Merged wrapped row {entry.get('row_index')} into previous "
+                    f"(empty DE): EN={prev['english']!r}"
+                )
+                continue
+
+        merged.append(entry)
+
+    if len(merged) < len(entries):
+        logger.info(
+            f"_merge_wrapped_rows: merged {len(entries) - len(merged)} "
+            f"continuation rows ({len(entries)} → {len(merged)})"
+        )
+    return merged
+
+
 def _merge_continuation_rows(
    entries: List[Dict[str, Any]],
 ) -> List[Dict[str, Any]]:
@@ -1561,6 +1645,9 @@ def build_word_grid(
    # --- Post-processing pipeline (deterministic, no LLM) ---
    n_raw = len(entries)

+    # 0. Merge cell-wrap continuation rows (empty primary column = text wrap)
+    entries = _merge_wrapped_rows(entries)
+
    # 0a. Merge phonetic-only continuation rows into previous entry
    entries = _merge_phonetic_continuation_rows(entries)

--- a/klausur-service/backend/cv_gutter_repair.py
+++ b/klausur-service/backend/cv_gutter_repair.py
@@ -0,0 +1,610 @@
+"""
+Gutter Repair — detects and fixes words truncated or blurred at the book gutter.
+
+When scanning double-page spreads, the binding area (gutter) causes:
+  1. Blurry/garbled trailing characters  ("stammeli" → "stammeln")
+  2. Words split across lines with a hyphen lost in the gutter
+     ("ve" + "künden" → "verkünden")
+
+This module analyses grid cells, identifies gutter-edge candidates, and
+proposes corrections using pyspellchecker (DE + EN).
+
+License: Apache 2.0 (commercial use permitted)
+PRIVACY: All processing happens locally.
+"""
+
+import itertools
+import logging
+import re
+import time
+import uuid
+from dataclasses import dataclass, field, asdict
+from typing import Any, Dict, List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Spellchecker setup (lazy, cached)
+# ---------------------------------------------------------------------------
+
+_spell_de = None
+_spell_en = None
+_SPELL_AVAILABLE = False
+
+def _init_spellcheckers():
+    """Lazy-load DE + EN spellcheckers (cached across calls)."""
+    global _spell_de, _spell_en, _SPELL_AVAILABLE
+    if _spell_de is not None:
+        return
+    try:
+        from spellchecker import SpellChecker
+        _spell_de = SpellChecker(language='de', distance=1)
+        _spell_en = SpellChecker(language='en', distance=1)
+        _SPELL_AVAILABLE = True
+        logger.info("Gutter repair: spellcheckers loaded (DE + EN)")
+    except ImportError:
+        logger.warning("pyspellchecker not installed — gutter repair unavailable")
+
+
+def _is_known(word: str) -> bool:
+    """Check if a word is known in DE or EN dictionary."""
+    _init_spellcheckers()
+    if not _SPELL_AVAILABLE:
+        return False
+    w = word.lower()
+    return bool(_spell_de.known([w])) or bool(_spell_en.known([w]))
+
+
+def _spell_candidates(word: str, lang: str = "both") -> List[str]:
+    """Get all plausible spellchecker candidates for a word (deduplicated)."""
+    _init_spellcheckers()
+    if not _SPELL_AVAILABLE:
+        return []
+    w = word.lower()
+    seen: set = set()
+    results: List[str] = []
+
+    for checker in ([_spell_de, _spell_en] if lang == "both"
+                    else [_spell_de] if lang == "de"
+                    else [_spell_en]):
+        if checker is None:
+            continue
+        cands = checker.candidates(w)
+        if cands:
+            for c in cands:
+                if c and c != w and c not in seen:
+                    seen.add(c)
+                    results.append(c)
+
+    return results
+
+
+# ---------------------------------------------------------------------------
+# Gutter position detection
+# ---------------------------------------------------------------------------
+
+# Minimum word length for spell-fix (very short words are often legitimate)
+_MIN_WORD_LEN_SPELL = 3
+
+# Minimum word length for hyphen-join candidates (fragments at the gutter
+# can be as short as 1-2 chars, e.g. "ve" from "ver-künden")
+_MIN_WORD_LEN_HYPHEN = 2
+
+# How close to the right column edge a word must be to count as "gutter-adjacent".
+# Expressed as fraction of column width (e.g. 0.70 = rightmost 30%).
+_GUTTER_EDGE_THRESHOLD = 0.70
+
+# Small common words / abbreviations that should NOT be repaired
+_STOPWORDS = frozenset([
+    # German
+    "ab", "an", "am", "da", "er", "es", "im", "in", "ja", "ob", "so", "um",
+    "zu", "wo", "du", "eh", "ei", "je", "na", "nu", "oh",
+    # English
+    "a", "am", "an", "as", "at", "be", "by", "do", "go", "he", "if", "in",
+    "is", "it", "me", "my", "no", "of", "on", "or", "so", "to", "up", "us",
+    "we",
+])
+
+# IPA / phonetic patterns — skip these cells
+_IPA_RE = re.compile(r'[\[\]/ˈˌːʃʒθðŋɑɒæɔəɛɪʊʌ]')
+
+
+def _is_ipa_text(text: str) -> bool:
+    """True if text looks like IPA transcription."""
+    return bool(_IPA_RE.search(text))
+
+
+def _word_is_at_gutter_edge(word_bbox: Dict, col_x: float, col_width: float) -> bool:
+    """Check if a word's right edge is near the right boundary of its column."""
+    if col_width <= 0:
+        return False
+    word_right = word_bbox.get("left", 0) + word_bbox.get("width", 0)
+    col_right = col_x + col_width
+    # Word's right edge within the rightmost portion of the column
+    relative_pos = (word_right - col_x) / col_width
+    return relative_pos >= _GUTTER_EDGE_THRESHOLD
+
+
+# ---------------------------------------------------------------------------
+# Suggestion types
+# ---------------------------------------------------------------------------
+
+@dataclass
+class GutterSuggestion:
+    """A single correction suggestion."""
+    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
+    type: str = ""             # "hyphen_join" | "spell_fix"
+    zone_index: int = 0
+    row_index: int = 0
+    col_index: int = 0
+    col_type: str = ""
+    cell_id: str = ""
+    original_text: str = ""
+    suggested_text: str = ""
+    # For hyphen_join:
+    next_row_index: int = -1
+    next_row_cell_id: str = ""
+    next_row_text: str = ""
+    missing_chars: str = ""
+    display_parts: List[str] = field(default_factory=list)
+    # Alternatives (other plausible corrections the user can pick from)
+    alternatives: List[str] = field(default_factory=list)
+    # Meta:
+    confidence: float = 0.0
+    reason: str = ""           # "gutter_truncation" | "gutter_blur" | "hyphen_continuation"
+
+    def to_dict(self) -> Dict[str, Any]:
+        return asdict(self)
+
+
+# ---------------------------------------------------------------------------
+# Core repair logic
+# ---------------------------------------------------------------------------
+
+_TRAILING_PUNCT_RE = re.compile(r'[.,;:!?\)\]]+$')
+
+
+def _try_hyphen_join(
+    word_text: str,
+    next_word_text: str,
+    max_missing: int = 3,
+) -> Optional[Tuple[str, str, float]]:
+    """Try joining two fragments with 0..max_missing interpolated chars.
+
+    Strips trailing punctuation from the continuation word before testing
+    (e.g. "künden," → "künden") so dictionary lookup succeeds.
+
+    Returns (joined_word, missing_chars, confidence) or None.
+    """
+    base = word_text.rstrip("-").rstrip()
+    # Strip trailing punctuation from continuation (commas, periods, etc.)
+    raw_continuation = next_word_text.lstrip()
+    continuation = _TRAILING_PUNCT_RE.sub('', raw_continuation)
+
+    if not base or not continuation:
+        return None
+
+    # 1. Direct join (no missing chars)
+    direct = base + continuation
+    if _is_known(direct):
+        return (direct, "", 0.95)
+
+    # 2. Try with 1..max_missing missing characters
+    # Use common letters, weighted by frequency in German/English
+    _COMMON_CHARS = "enristaldhgcmobwfkzpvjyxqu"
+
+    for n_missing in range(1, max_missing + 1):
+        for chars in itertools.product(_COMMON_CHARS[:15], repeat=n_missing):
+            candidate = base + "".join(chars) + continuation
+            if _is_known(candidate):
+                missing = "".join(chars)
+                # Confidence decreases with more missing chars
+                conf = 0.90 - (n_missing - 1) * 0.10
+                return (candidate, missing, conf)
+
+    return None
+
+
+def _try_spell_fix(
+    word_text: str, col_type: str = "",
+) -> Optional[Tuple[str, float, List[str]]]:
+    """Try to fix a single garbled gutter word via spellchecker.
+
+    Returns (best_correction, confidence, alternatives_list) or None.
+    The alternatives list contains other plausible corrections the user
+    can choose from (e.g. "stammelt" vs "stammeln").
+    """
+    if len(word_text) < _MIN_WORD_LEN_SPELL:
+        return None
+
+    # Strip trailing/leading parentheses and check if the bare word is valid.
+    # Words like "probieren)" or "(Englisch" are valid words with punctuation,
+    # not OCR errors. Don't suggest corrections for them.
+    stripped = word_text.strip("()")
+    if stripped and _is_known(stripped):
+        return None
+
+    # Determine language priority from column type
+    if "en" in col_type:
+        lang = "en"
+    elif "de" in col_type:
+        lang = "de"
+    else:
+        lang = "both"
+
+    candidates = _spell_candidates(word_text, lang=lang)
+    if not candidates and lang != "both":
+        candidates = _spell_candidates(word_text, lang="both")
+
+    if not candidates:
+        return None
+
+    # Preserve original casing
+    is_upper = word_text[0].isupper()
+
+    def _preserve_case(w: str) -> str:
+        if is_upper and w:
+            return w[0].upper() + w[1:]
+        return w
+
+    # Sort candidates by edit distance (closest first)
+    scored = []
+    for c in candidates:
+        dist = _edit_distance(word_text.lower(), c.lower())
+        scored.append((dist, c))
+    scored.sort(key=lambda x: x[0])
+
+    best_dist, best = scored[0]
+    best = _preserve_case(best)
+    conf = max(0.5, 1.0 - best_dist * 0.15)
+
+    # Build alternatives (all other candidates, also case-preserved)
+    alts = [_preserve_case(c) for _, c in scored[1:] if c.lower() != best.lower()]
+    # Limit to top 5 alternatives
+    alts = alts[:5]
+
+    return (best, conf, alts)
+
+
+def _edit_distance(a: str, b: str) -> int:
+    """Simple Levenshtein distance."""
+    if len(a) < len(b):
+        return _edit_distance(b, a)
+    if len(b) == 0:
+        return len(a)
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a):
+        curr = [i + 1]
+        for j, cb in enumerate(b):
+            cost = 0 if ca == cb else 1
+            curr.append(min(curr[j] + 1, prev[j + 1] + 1, prev[j] + cost))
+        prev = curr
+    return prev[len(b)]
+
+
+# ---------------------------------------------------------------------------
+# Grid analysis
+# ---------------------------------------------------------------------------
+
+def analyse_grid_for_gutter_repair(
+    grid_data: Dict[str, Any],
+    image_width: int = 0,
+) -> Dict[str, Any]:
+    """Analyse a structured grid and return gutter repair suggestions.
+
+    Args:
+        grid_data: The grid_editor_result from the session (zones→cells structure).
+        image_width: Image width in pixels (for determining gutter side).
+
+    Returns:
+        Dict with "suggestions" list and "stats".
+    """
+    t0 = time.time()
+    _init_spellcheckers()
+
+    if not _SPELL_AVAILABLE:
+        return {
+            "suggestions": [],
+            "stats": {"error": "pyspellchecker not installed"},
+            "duration_seconds": 0,
+        }
+
+    zones = grid_data.get("zones", [])
+    suggestions: List[GutterSuggestion] = []
+    words_checked = 0
+    gutter_candidates = 0
+
+    for zi, zone in enumerate(zones):
+        columns = zone.get("columns", [])
+        cells = zone.get("cells", [])
+        if not columns or not cells:
+            continue
+
+        # Build column lookup: col_index → {x, width, type}
+        col_info: Dict[int, Dict] = {}
+        for col in columns:
+            ci = col.get("index", col.get("col_index", -1))
+            col_info[ci] = {
+                "x": col.get("x_min_px", col.get("x", 0)),
+                "width": (col["x_max_px"] - col.get("x_min_px", col.get("x", 0))) if "x_max_px" in col else col.get("width", 0),
+                "type": col.get("type", col.get("col_type", "")),
+            }
+
+        # Build row→col→cell lookup
+        cell_map: Dict[Tuple[int, int], Dict] = {}
+        max_row = 0
+        for cell in cells:
+            ri = cell.get("row_index", 0)
+            ci = cell.get("col_index", 0)
+            cell_map[(ri, ci)] = cell
+            if ri > max_row:
+                max_row = ri
+
+        # Determine which columns are at the gutter edge.
+        # For a left page: rightmost content columns.
+        # For now, check ALL columns — a word is a candidate if it's at the
+        # right edge of its column AND not a known word.
+        for (ri, ci), cell in cell_map.items():
+            text = (cell.get("text") or "").strip()
+            if not text:
+                continue
+            if _is_ipa_text(text):
+                continue
+
+            words_checked += 1
+            col = col_info.get(ci, {})
+            col_type = col.get("type", "")
+
+            # Get word boxes to check position
+            word_boxes = cell.get("word_boxes", [])
+
+            # Check the LAST word in the cell (rightmost, closest to gutter)
+            cell_words = text.split()
+            if not cell_words:
+                continue
+
+            last_word = cell_words[-1]
+
+            # Skip stopwords
+            if last_word.lower().rstrip(".,;:!?-") in _STOPWORDS:
+                continue
+
+            last_word_clean = last_word.rstrip(".,;:!?)(")
+            if len(last_word_clean) < _MIN_WORD_LEN_HYPHEN:
+                continue
+
+            # Check if the last word is at the gutter edge
+            is_at_edge = False
+            if word_boxes:
+                last_wb = word_boxes[-1]
+                is_at_edge = _word_is_at_gutter_edge(
+                    last_wb, col.get("x", 0), col.get("width", 1)
+                )
+            else:
+                # No word boxes — use cell bbox
+                bbox = cell.get("bbox_px", {})
+                is_at_edge = _word_is_at_gutter_edge(
+                    {"left": bbox.get("x", 0), "width": bbox.get("w", 0)},
+                    col.get("x", 0), col.get("width", 1)
+                )
+
+            if not is_at_edge:
+                continue
+
+            # Word is at gutter edge — check if it's a known word
+            if _is_known(last_word_clean):
+                continue
+
+            # Check if the word ends with "-" (explicit hyphen break)
+            ends_with_hyphen = last_word.endswith("-")
+
+            # If the word already ends with "-" and the stem (without
+            # the hyphen) is a known word, this is a VALID line-break
+            # hyphenation — not a gutter error.  Gutter problems cause
+            # the hyphen to be LOST ("ve" instead of "ver-"), so a
+            # visible hyphen + known stem = intentional word-wrap.
+            # Example: "wunder-" → "wunder" is known → skip.
+            if ends_with_hyphen:
+                stem = last_word_clean.rstrip("-")
+                if stem and _is_known(stem):
+                    continue
+
+            gutter_candidates += 1
+
+            # --- Strategy 1: Hyphen join with next row ---
+            next_cell = cell_map.get((ri + 1, ci))
+            if next_cell:
+                next_text = (next_cell.get("text") or "").strip()
+                next_words = next_text.split()
+                if next_words:
+                    first_next = next_words[0]
+                    first_next_clean = _TRAILING_PUNCT_RE.sub('', first_next)
+                    first_alpha = next((c for c in first_next if c.isalpha()), "")
+
+                    # Also skip if the joined word is known (covers compound
+                    # words where the stem alone might not be in the dictionary)
+                    if ends_with_hyphen and first_next_clean:
+                        direct = last_word_clean.rstrip("-") + first_next_clean
+                        if _is_known(direct):
+                            continue
+
+                    # Continuation likely if:
+                    # - explicit hyphen, OR
+                    # - next row starts lowercase (= not a new entry)
+                    if ends_with_hyphen or (first_alpha and first_alpha.islower()):
+                        result = _try_hyphen_join(last_word_clean, first_next)
+                        if result:
+                            joined, missing, conf = result
+                            # Build display parts: show hyphenation for original layout
+                            if ends_with_hyphen:
+                                display_p1 = last_word_clean.rstrip("-")
+                                if missing:
+                                    display_p1 += missing
+                                display_p1 += "-"
+                            else:
+                                display_p1 = last_word_clean
+                                if missing:
+                                    display_p1 += missing + "-"
+                                else:
+                                    display_p1 += "-"
+
+                            suggestion = GutterSuggestion(
+                                type="hyphen_join",
+                                zone_index=zi,
+                                row_index=ri,
+                                col_index=ci,
+                                col_type=col_type,
+                                cell_id=cell.get("cell_id", f"R{ri:02d}_C{ci}"),
+                                original_text=last_word,
+                                suggested_text=joined,
+                                next_row_index=ri + 1,
+                                next_row_cell_id=next_cell.get("cell_id", f"R{ri+1:02d}_C{ci}"),
+                                next_row_text=next_text,
+                                missing_chars=missing,
+                                display_parts=[display_p1, first_next],
+                                confidence=conf,
+                                reason="gutter_truncation" if missing else "hyphen_continuation",
+                            )
+                            suggestions.append(suggestion)
+                            continue  # skip spell_fix if hyphen_join found
+
+            # --- Strategy 2: Single-word spell fix (only for longer words) ---
+            fix_result = _try_spell_fix(last_word_clean, col_type)
+            if fix_result:
+                corrected, conf, alts = fix_result
+                suggestion = GutterSuggestion(
+                    type="spell_fix",
+                    zone_index=zi,
+                    row_index=ri,
+                    col_index=ci,
+                    col_type=col_type,
+                    cell_id=cell.get("cell_id", f"R{ri:02d}_C{ci}"),
+                    original_text=last_word,
+                    suggested_text=corrected,
+                    alternatives=alts,
+                    confidence=conf,
+                    reason="gutter_blur",
+                )
+                suggestions.append(suggestion)
+
+    duration = round(time.time() - t0, 3)
+
+    logger.info(
+        "Gutter repair: checked %d words, %d gutter candidates, %d suggestions (%.2fs)",
+        words_checked, gutter_candidates, len(suggestions), duration,
+    )
+
+    return {
+        "suggestions": [s.to_dict() for s in suggestions],
+        "stats": {
+            "words_checked": words_checked,
+            "gutter_candidates": gutter_candidates,
+            "suggestions_found": len(suggestions),
+        },
+        "duration_seconds": duration,
+    }
+
+
+def apply_gutter_suggestions(
+    grid_data: Dict[str, Any],
+    accepted_ids: List[str],
+    suggestions: List[Dict[str, Any]],
+) -> Dict[str, Any]:
+    """Apply accepted gutter repair suggestions to the grid data.
+
+    Modifies cells in-place and returns a summary of the changes.
+
+    Args:
+        grid_data: The grid_editor_result (zones→cells).
+        accepted_ids: List of suggestion IDs the user accepted.
+        suggestions: The full suggestions list (from analyse_grid_for_gutter_repair).
+
+    Returns:
+        Dict with "applied_count" and "changes" list.
+    """
+    accepted_set = set(accepted_ids)
+    accepted_suggestions = [s for s in suggestions if s.get("id") in accepted_set]
+
+    zones = grid_data.get("zones", [])
+    changes: List[Dict[str, Any]] = []
+
+    for s in accepted_suggestions:
+        zi = s.get("zone_index", 0)
+        ri = s.get("row_index", 0)
+        ci = s.get("col_index", 0)
+        stype = s.get("type", "")
+
+        if zi >= len(zones):
+            continue
+        zone_cells = zones[zi].get("cells", [])
+
+        # Find the target cell
+        target_cell = None
+        for cell in zone_cells:
+            if cell.get("row_index") == ri and cell.get("col_index") == ci:
+                target_cell = cell
+                break
+
+        if not target_cell:
+            continue
+
+        old_text = target_cell.get("text", "")
+
+        if stype == "spell_fix":
+            # Replace the last word in the cell text
+            original_word = s.get("original_text", "")
+            corrected = s.get("suggested_text", "")
+            if original_word and corrected:
+                # Replace from the right (last occurrence)
+                idx = old_text.rfind(original_word)
+                if idx >= 0:
+                    new_text = old_text[:idx] + corrected + old_text[idx + len(original_word):]
+                    target_cell["text"] = new_text
+                    changes.append({
+                        "type": "spell_fix",
+                        "zone_index": zi,
+                        "row_index": ri,
+                        "col_index": ci,
+                        "cell_id": target_cell.get("cell_id", ""),
+                        "old_text": old_text,
+                        "new_text": new_text,
+                    })
+
+        elif stype == "hyphen_join":
+            # Current cell: replace last word with the hyphenated first part
+            original_word = s.get("original_text", "")
+            joined = s.get("suggested_text", "")
+            display_parts = s.get("display_parts", [])
+            next_ri = s.get("next_row_index", -1)
+
+            if not original_word or not joined or not display_parts:
+                continue
+
+            # The first display part is what goes in the current row
+            first_part = display_parts[0] if display_parts else ""
+
+            # Replace the last word in current cell with the restored form.
+            # The next row is NOT modified — "künden" stays in its row
+            # because the original book layout has it there. We only fix
+            # the truncated word in the current row (e.g. "ve" → "ver-").
+            idx = old_text.rfind(original_word)
+            if idx >= 0:
+                new_text = old_text[:idx] + first_part + old_text[idx + len(original_word):]
+                target_cell["text"] = new_text
+                changes.append({
+                    "type": "hyphen_join",
+                    "zone_index": zi,
+                    "row_index": ri,
+                    "col_index": ci,
+                    "cell_id": target_cell.get("cell_id", ""),
+                    "old_text": old_text,
+                    "new_text": new_text,
+                    "joined_word": joined,
+                })
+
+    logger.info("Gutter repair applied: %d/%d suggestions", len(changes), len(accepted_suggestions))
+
+    return {
+        "applied_count": len(changes),
+        "changes": changes,
+    }
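
Note on the replacement step in apply_gutter_suggestions: both branches rewrite only the right-most occurrence of the original word, located with rfind. A minimal standalone sketch of that idea follows; the helper name replace_last is hypothetical and not part of the patch.

```python
def replace_last(text: str, old: str, new: str) -> str:
    """Replace only the right-most occurrence of `old` in `text`.

    Hypothetical helper mirroring the rfind-based slicing used above.
    """
    if not old:
        return text  # rfind("") would match at the end; treat as no-op
    idx = text.rfind(old)
    if idx < 0:
        return text  # word not present: leave the cell text untouched
    return text[:idx] + new + text[idx + len(old):]


if __name__ == "__main__":
    # Truncated word at the gutter edge gets its restored, hyphenated form.
    print(replace_last("sich ve", "ve", "ver-"))
```

Only the last occurrence is touched because the gutter candidate is always the last word of the cell; an earlier identical token in the same cell stays as it is.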
--- a/klausur-service/backend/cv_ipa_german.py
+++ b/klausur-service/backend/cv_ipa_german.py
@@ -0,0 +1,135 @@
+"""German IPA insertion for grid editor cells.
+
+Hybrid approach:
+  1. Primary lookup: wiki-pronunciation-dict (636k entries, CC-BY-SA)
+  2. Fallback: epitran rule-based G2P (MIT license)
+
+German IPA data sourced from Wiktionary contributors (CC-BY-SA 4.0).
+Attribution required — see grid editor UI.
+
+License: code Apache-2.0, IPA data CC-BY-SA 4.0 (Wiktionary)
+PRIVACY: All processing happens locally.
+"""
+
+import logging
+import re
+from typing import Dict, List, Optional, Set
+
+logger = logging.getLogger(__name__)
+
+# IPA/phonetic characters — skip cells that already contain IPA
+_IPA_RE = re.compile(r'[\[\]ˈˌːʃʒθðŋɑɒæɔəɛɜɪʊʌ]')
+
+
+def _lookup_ipa_de(word: str) -> Optional[str]:
+    """Look up German IPA for a single word.
+
+    Returns IPA string or None if not found.
+    """
+    from cv_vocab_types import _de_ipa_dict, _epitran_de, DE_IPA_AVAILABLE
+
+    if not DE_IPA_AVAILABLE and _epitran_de is None:
+        return None
+
+    lower = word.lower().strip()
+    if not lower:
+        return None
+
+    # 1. Dictionary lookup (636k entries)
+    ipa = _de_ipa_dict.get(lower)
+    if ipa:
+        return ipa
+
+    # 2. epitran fallback (rule-based)
+    if _epitran_de is not None:
+        try:
+            result = _epitran_de.transliterate(word)
+            if result and result != word.lower():
+                return result
+        except Exception:
+            logger.debug("epitran transliteration failed for %r", word, exc_info=True)
+
+    return None
+
+
+def _insert_ipa_for_text(text: str) -> str:
+    """Insert German IPA after each recognized word in a text string.
+
+    Handles comma-separated lists:
+      "bildschön, blendend" → "bildschön [bɪltʃøn], blendend [blɛndənt]"
+
+    Skips cells already containing IPA brackets.
+    """
+    if not text or _IPA_RE.search(text):
+        return text
+
+    # Split on comma/semicolon/colon sequences, keeping separators
+    tokens = re.split(r'([,;:]+\s*)', text)
+    result = []
+    changed = False
+
+    for tok in tokens:
+        # Keep separators as-is
+        if not tok or re.match(r'^[,;:\s]+$', tok):
+            result.append(tok)
+            continue
+
+        # Process words within this token
+        words = tok.split()
+        new_words = []
+        for w in words:
+            # Strip punctuation for lookup
+            clean = re.sub(r'[^a-zA-ZäöüÄÖÜß]', '', w)
+            if len(clean) < 3:
+                new_words.append(w)
+                continue
+
+            ipa = _lookup_ipa_de(clean)
+            if ipa:
+                new_words.append(f"{w} [{ipa}]")
+                changed = True
+            else:
+                new_words.append(w)
+
+        result.append(' '.join(new_words))
+
+    return ''.join(result) if changed else text
+
+
+def insert_german_ipa(
+    cells: List[Dict],
+    target_cols: Set[str],
+) -> int:
+    """Insert German IPA transcriptions into cells of target columns.
+
+    Args:
+        cells: Flat list of all cells (modified in-place).
+        target_cols: Set of col_type values to process.
+
+    Returns:
+        Number of cells modified.
+    """
+    from cv_vocab_types import DE_IPA_AVAILABLE, _epitran_de
+
+    if not DE_IPA_AVAILABLE and _epitran_de is None:
+        logger.warning("German IPA not available — skipping")
+        return 0
+
+    count = 0
+    for cell in cells:
+        ct = cell.get("col_type", "")
+        if ct not in target_cols:
+            continue
+        text = cell.get("text", "")
+        if not text.strip():
+            continue
+
+        new_text = _insert_ipa_for_text(text)
+        if new_text != text:
+            cell["text"] = new_text
+            cell["_ipa_corrected"] = True
+            count += 1
+
+    if count:
+        logger.info("German IPA inserted in %d cells", count)
+    return count
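
Note on _insert_ipa_for_text: the tokenizer keeps separators by using a capturing group in re.split, then annotates each word of three or more letters. The sketch below reproduces that flow under stated assumptions: TOY_IPA and lookup are hypothetical stand-ins for the real dictionary and _lookup_ipa_de, and the IPA strings are the illustrative values from the docstring.

```python
import re
from typing import Dict, Optional

# Hypothetical two-entry stand-in for the real pronunciation dictionary.
TOY_IPA: Dict[str, str] = {"bildschön": "bɪltʃøn", "blendend": "blɛndənt"}


def lookup(word: str) -> Optional[str]:
    """Toy replacement for the dictionary/epitran lookup."""
    return TOY_IPA.get(word.lower())


def insert_ipa(text: str) -> str:
    """Append [ipa] after every word the toy dictionary knows."""
    # The capturing group makes re.split return the separators as tokens.
    tokens = re.split(r'([,;:]+\s*)', text)
    out = []
    for tok in tokens:
        if not tok or re.match(r'^[,;:\s]+$', tok):
            out.append(tok)  # separator: keep verbatim
            continue
        words = []
        for w in tok.split():
            clean = re.sub(r'[^a-zA-ZäöüÄÖÜß]', '', w)
            ipa = lookup(clean) if len(clean) >= 3 else None
            words.append(f"{w} [{ipa}]" if ipa else w)
        out.append(" ".join(words))
    return "".join(out)


if __name__ == "__main__":
    print(insert_ipa("bildschön, blendend"))
```

Unknown or very short words pass through unchanged, so a cell without any dictionary hit comes back identical to its input.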
--- a/klausur-service/backend/cv_ocr_engines.py
+++ b/klausur-service/backend/cv_ocr_engines.py
@@ -1030,6 +1030,15 @@ def _text_has_garbled_ipa(text: str) -> bool:
        # Contains IPA special characters
        if any(c in w for c in 'əɪɛɒʊʌæɑɔʃʒθðŋ'):
            return True
+        # Embedded apostrophe suggesting merged garbled IPA with stress mark.
+        # E.g. "Scotland'skotland" — OCR reads ˈ as '.
+        # Guard: ≥3 chars before the apostrophe and ≥3 after it, the first
+        # of them lowercase; this excludes contractions (don't, won't, o'clock).
+        if "'" in w and not w.startswith("'"):
+            apos_idx = w.index("'")
+            after = w[apos_idx + 1:]
+            if apos_idx >= 3 and len(after) >= 3 and after[0].islower():
+                return True
    return False


@@ -1173,6 +1182,10 @@ def _insert_missing_ipa(text: str, pronunciation: str = 'british') -> str:
                if wj in ('–', '—', '-', '/', '|', ',', ';'):
                    kept.extend(words[j:])
                    break
+                # Pure digits or numbering (e.g. "1", "2.", "3)") — keep
+                if re.match(r'^[\d.)\-]+$', wj):
+                    kept.extend(words[j:])
+                    break
                # Starts with uppercase — likely German or proper noun
                clean_j = re.sub(r'[^a-zA-Z]', '', wj)
                if clean_j and clean_j[0].isupper():
@@ -1183,6 +1196,19 @@ def _insert_missing_ipa(text: str, pronunciation: str = 'british') -> str:
                    if _lookup_ipa(clean_j, pronunciation):
                        kept.extend(words[j:])
                        break
+                # Merged token: dictionary word + garbled IPA stuck together.
+                # E.g. "fictionsalans'fIkfn" starts with "fiction".
+                # Extract the dictionary prefix (≥4 chars) and add it with
+                # IPA, but only if enough chars remain after the prefix (≥3)
+                # to look like garbled IPA, not just a plural 's'.
+                if clean_j and len(clean_j) >= 7:
+                    for pend in range(min(len(clean_j) - 3, 15), 3, -1):
+                        prefix_j = clean_j[:pend]
+                        prefix_ipa = _lookup_ipa(prefix_j, pronunciation)
+                        if prefix_ipa:
+                            kept.append(f"{prefix_j} [{prefix_ipa}]")
+                            break
+                    break  # rest of this token is garbled
                # Otherwise — likely garbled phonetics, skip
            words = kept
            break
@@ -1221,6 +1247,9 @@ def _has_non_dict_trailing(text: str, pronunciation: str = 'british') -> bool:
        wj = words[j]
        if wj in ('–', '—', '-', '/', '|', ',', ';'):
            return False
+        # Pure digits or numbering (e.g. "1", "2.", "3)") — not garbled IPA
+        if re.match(r'^[\d.)\-]+$', wj):
+            return False
        clean_j = re.sub(r'[^a-zA-Z]', '', wj)
        if clean_j and clean_j[0].isupper():
            return False
@@ -1852,6 +1881,11 @@ def _is_noise_tail_token(token: str) -> bool:
    if t.endswith(']'):
        return False

+    # Keep meaningful punctuation tokens used in textbooks
+    # = (definition marker), (= (definition opener), ; (separator)
+    if t in ('=', '(=', '=)', ';', ':', '-', '–', '—', '/', '+', '&'):
+        return False
+
    # Pure non-alpha → noise ("3", ")", "|")
    alpha_chars = _RE_ALPHA.findall(t)
    if not alpha_chars:
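
Note on the apostrophe guard added to _text_has_garbled_ipa: the check can be read in isolation as a small predicate. The sketch below is a standalone restatement for illustration; the function name looks_like_merged_ipa is hypothetical and does not exist in the patch.

```python
def looks_like_merged_ipa(word: str) -> bool:
    """True if `word` looks like 'headword' + OCR-garbled stress mark + IPA.

    Mirrors the guard in the hunk above: at least 3 chars before the
    apostrophe, at least 3 after it, and the first of those lowercase.
    """
    if "'" not in word or word.startswith("'"):
        return False
    idx = word.index("'")  # position of the first apostrophe
    after = word[idx + 1:]
    return idx >= 3 and len(after) >= 3 and after[0].islower()


if __name__ == "__main__":
    for w in ("Scotland'skotland", "don't", "won't", "o'clock"):
        print(w, looks_like_merged_ipa(w))
```

Contractions fail the guard for different reasons: "don't" and "won't" have only one character after the apostrophe, and "o'clock" has only one character before it.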
--- a/klausur-service/backend/cv_review.py
+++ b/klausur-service/backend/cv_review.py
@@ -720,6 +720,62 @@ def _spell_dict_knows(word: str) -> bool:
    return bool(_en_spell.known([w])) or bool(_de_spell.known([w]))


+def _try_split_merged_word(token: str) -> Optional[str]:
+    """Try to split a merged word like 'atmyschool' into 'at my school'.
+
+    Uses dynamic programming to find the shortest sequence of dictionary
+    words that covers the entire token.  Only returns a result when the
+    split produces at least 2 words and ALL parts are known dictionary words.
+
+    Preserves original capitalisation by mapping back to the input string.
+    """
+    if not _SPELL_AVAILABLE or len(token) < 4:
+        return None
+
+    lower = token.lower()
+    n = len(lower)
+
+    # dp[i] = (word_lengths, sum_of_squared_lengths) for the best split of
+    # lower[:i], or None. Splits are ranked by (-word_count, sum_of_squares):
+    # fewer words first, then longer words (e.g. "come on" over "com eon").
+    dp: list = [None] * (n + 1)
+    dp[0] = ([], 0)
+
+    for i in range(1, n + 1):
+        for j in range(max(0, i - 20), i):
+            if dp[j] is None:
+                continue
+            candidate = lower[j:i]
+            word_len = i - j
+            if word_len == 1 and candidate not in ('a', 'i'):
+                continue
+            if _spell_dict_knows(candidate):
+                prev_words, prev_sq = dp[j]
+                new_words = prev_words + [word_len]
+                new_sq = prev_sq + word_len * word_len
+                new_key = (-len(new_words), new_sq)
+                if dp[i] is None:
+                    dp[i] = (new_words, new_sq)
+                else:
+                    old_key = (-len(dp[i][0]), dp[i][1])
+                    if new_key >= old_key:
+                        # >= so that later splits (longer first word) win ties
+                        dp[i] = (new_words, new_sq)
+
+    if dp[n] is None or len(dp[n][0]) < 2:
+        return None
+
+    # Reconstruct with original casing
+    result = []
+    pos = 0
+    for wlen in dp[n][0]:
+        result.append(token[pos:pos + wlen])
+        pos += wlen
+
+    logger.debug("Split merged word: %r → %r", token, " ".join(result))
+    return " ".join(result)
+
+
 def _spell_fix_token(token: str, field: str = "") -> Optional[str]:
    """Return corrected form of token, or None if no fix needed/possible.

@@ -777,6 +833,14 @@ def _spell_fix_token(token: str, field: str = "") -> Optional[str]:
                    correction = correction[0].upper() + correction[1:]
                if _spell_dict_knows(correction):
                    return correction
+
+    # 5. Merged-word split: OCR often merges adjacent words when spacing
+    #    is too tight, e.g. "atmyschool" → "at my school"
+    if len(token) >= 4 and token.isalpha():
+        split = _try_split_merged_word(token)
+        if split:
+            return split
+
    return None


@@ -817,10 +881,25 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
    """Rule-based OCR correction: spell-checker + structural heuristics.

    Deterministic — never translates, never touches IPA, never hallucinates.
+    Uses SmartSpellChecker for language-aware corrections with context-based
+    disambiguation (a/I), multi-digit substitution, and cross-language guard.
    """
    t0 = time.time()
    changes: List[Dict] = []
    all_corrected: List[Dict] = []
+
+    # Use SmartSpellChecker if available, fall back to legacy _spell_fix_field
+    _smart = None
+    try:
+        from smart_spell import SmartSpellChecker
+        _smart = SmartSpellChecker()
+        logger.debug("spell_review: using SmartSpellChecker")
+    except Exception:
+        logger.debug("spell_review: SmartSpellChecker not available, using legacy")
+
+    # Map field names → language codes for SmartSpellChecker
+    _LANG_MAP = {"english": "en", "german": "de", "example": "auto"}
+
    for i, entry in enumerate(entries):
        e = dict(entry)
        # Page-ref normalization (always, regardless of review status)
@@ -843,9 +922,18 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
            old_val = (e.get(field_name) or "").strip()
            if not old_val:
                continue
-            # example field is mixed-language — try German first (for umlauts)
-            lang = "german" if field_name in ("german", "example") else "english"
-            new_val, was_changed = _spell_fix_field(old_val, field=lang)
+
+            if _smart:
+                # SmartSpellChecker path — language-aware, context-based
+                lang_code = _LANG_MAP.get(field_name, "en")
+                result = _smart.correct_text(old_val, lang=lang_code)
+                new_val = result.corrected
+                was_changed = result.changed
+            else:
+                # Legacy path
+                lang = "german" if field_name in ("german", "example") else "english"
+                new_val, was_changed = _spell_fix_field(old_val, field=lang)
+
            if was_changed and new_val != old_val:
                changes.append({
                    "row_index": e.get("row_index", i),
@@ -857,12 +945,13 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
                e["llm_corrected"] = True
        all_corrected.append(e)
    duration_ms = int((time.time() - t0) * 1000)
+    model_name = "smart-spell-checker" if _smart else "spell-checker"
    return {
        "entries_original": entries,
        "entries_corrected": all_corrected,
        "changes": changes,
        "skipped_count": 0,
-        "model_used": "spell-checker",
+        "model_used": model_name,
        "duration_ms": duration_ms,
    }

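
Note on _try_split_merged_word: the dynamic programme can be tried outside the service with a toy word set in place of the spell-checker dictionaries. This is a sketch under that assumption; TOY_WORDS and split_merged are hypothetical names, and the real function consults _spell_dict_knows and additionally restricts single-letter words to "a" and "i".

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy dictionary; the patch uses the EN/DE spell checkers.
TOY_WORDS: Set[str] = {"at", "my", "school", "a", "come", "on", "com", "eon"}


def split_merged(token: str, known: Set[str] = TOY_WORDS) -> Optional[str]:
    """Split `token` into the fewest known words, preferring longer ones."""
    lower = token.lower()
    n = len(lower)
    # best[i] = (word_lengths, sum_of_squared_lengths) for lower[:i]
    best: List[Optional[Tuple[List[int], int]]] = [None] * (n + 1)
    best[0] = ([], 0)
    for i in range(1, n + 1):
        for j in range(max(0, i - 20), i):
            if best[j] is None or lower[j:i] not in known:
                continue
            lengths = best[j][0] + [i - j]
            squares = best[j][1] + (i - j) ** 2
            key = (-len(lengths), squares)
            cur = best[i]
            if cur is None or key >= (-len(cur[0]), cur[1]):
                best[i] = (lengths, squares)
    if best[n] is None or len(best[n][0]) < 2:
        return None  # no full cover, or the token is already one word
    parts, pos = [], 0
    for length in best[n][0]:
        parts.append(token[pos:pos + length])  # slice original for casing
        pos += length
    return " ".join(parts)


if __name__ == "__main__":
    print(split_merged("atmyschool"))
    print(split_merged("Comeon"))
```

The sum of squared lengths only breaks ties between splits with the same word count, which is what makes "come on" beat "com eon" in the toy dictionary.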
--- a/klausur-service/backend/cv_syllable_detect.py
+++ b/klausur-service/backend/cv_syllable_detect.py
@@ -1,11 +1,15 @@
 """
-CV-based syllable divider detection and insertion for dictionary pages.
+Syllable divider insertion for dictionary pages.

-Two-step approach:
-  1. CV: morphological vertical line detection checks if a word_box image
-     contains thin, isolated pipe-like vertical lines (syllable dividers).
-  2. pyphen: inserts syllable breaks at linguistically correct positions
-     for words where CV confirmed the presence of dividers.
+For confirmed dictionary pages (is_dictionary=True), processes all content
+column cells:
+  1. Strips existing | dividers for clean normalization
+  2. Merges pipe-gap spaces (where OCR split a word at a divider position)
+  3. Applies pyphen syllabification to each word >= 3 alpha chars (DE then EN)
+  4. Only modifies words that pyphen recognizes — garbled OCR stays as-is
+
+No CV gate needed — the dictionary detection confidence is sufficient.
+pyphen uses Hunspell/TeX hyphenation dictionaries and is very reliable.

 Lizenz: Apache 2.0 (kommerziell nutzbar)
 DATENSCHUTZ: Alle Verarbeitung erfolgt lokal.
@@ -13,94 +17,488 @@ DATENSCHUTZ: Alle Verarbeitung erfolgt lokal.

 import logging
 import re
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional, Tuple

-import cv2
 import numpy as np

 logger = logging.getLogger(__name__)

-
-def _word_has_pipe_lines(img_gray: np.ndarray, wb: Dict) -> bool:
-    """CV check: does this word_box image show thin vertical pipe dividers?
-
-    Uses morphological opening with a tall thin kernel to isolate vertical
-    structures, then filters for thin (≤4px), isolated contours that are
-    NOT at the word edges (those would be l, I, 1 etc.).
-    """
-    x = wb.get("left", 0)
-    y = wb.get("top", 0)
-    w = wb.get("width", 0)
-    h = wb.get("height", 0)
-    if w < 30 or h < 12:
-        return False
-    ih, iw = img_gray.shape[:2]
-    y1, y2 = max(0, y), min(ih, y + h)
-    x1, x2 = max(0, x), min(iw, x + w)
-    roi = img_gray[y1:y2, x1:x2]
-    if roi.size == 0:
-        return False
-    rh, rw = roi.shape
-
-    # Binarize (ink = white on black background)
-    _, binary = cv2.threshold(
-        roi, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
-    )
-
-    # Morphological opening: keep only tall vertical structures (≥55% height)
-    kern_h = max(int(rh * 0.55), 8)
-    kernel = np.ones((kern_h, 1), np.uint8)
-    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
-
-    # Find surviving contours
-    contours, _ = cv2.findContours(
-        vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
-    )
-
-    margin = max(int(rw * 0.08), 3)
-    for cnt in contours:
-        cx, cy, cw, ch = cv2.boundingRect(cnt)
-        if cw > 4:
-            continue  # too wide for a pipe
-        if cx < margin or cx + cw > rw - margin:
-            continue  # at word edge — likely l, I, 1
-        # Check isolation: adjacent columns should be mostly empty (ink-free)
-        left_zone = binary[cy:cy + ch, max(0, cx - 3):cx]
-        right_zone = binary[cy:cy + ch, cx + cw:min(rw, cx + cw + 3)]
-        left_ink = np.mean(left_zone) if left_zone.size else 255
-        right_ink = np.mean(right_zone) if right_zone.size else 255
-        if left_ink < 80 and right_ink < 80:
-            return True  # isolated thin vertical line = pipe divider
-    return False
-
-
-# IPA/phonetic bracket pattern — don't hyphenate transcriptions
+# IPA/phonetic characters — skip cells containing these
 _IPA_RE = re.compile(r'[\[\]ˈˌːʃʒθðŋɑɒæɔəɛɜɪʊʌ]')

+# Common German words that should NOT be merged with adjacent tokens.
+# These are function words that appear as standalone words between
+# headwords/definitions on dictionary pages.
+_STOP_WORDS = frozenset([
+    # Articles
+    'der', 'die', 'das', 'dem', 'den', 'des',
+    'ein', 'eine', 'einem', 'einen', 'einer',
+    # Pronouns
+    'du', 'er', 'es', 'sie', 'wir', 'ihr', 'ich', 'man', 'sich',
+    'dich', 'dir', 'mich', 'mir', 'uns', 'euch', 'ihm', 'ihn',
+    # Prepositions
+    'mit', 'von', 'zu', 'für', 'auf', 'in', 'an', 'um', 'am', 'im',
+    'aus', 'bei', 'nach', 'vor', 'bis', 'durch', 'über', 'unter',
+    'zwischen', 'ohne', 'gegen',
+    # Conjunctions
+    'und', 'oder', 'als', 'wie', 'wenn', 'dass', 'weil', 'aber',
+    # Adverbs
+    'auch', 'noch', 'nur', 'schon', 'sehr', 'nicht',
+    # Verbs
+    'ist', 'hat', 'wird', 'kann', 'soll', 'muss', 'darf',
+    'sein', 'haben',
+    # Other
+    'kein', 'keine', 'keinem', 'keinen', 'keiner',
+])
+
+# Cached hyphenators
+_hyph_de = None
+_hyph_en = None
+
+# Cached spellchecker (for autocorrect_pipe_artifacts)
+_spell_de = None
+
+
+def _get_hyphenators():
+    """Lazy-load pyphen hyphenators (cached across calls)."""
+    global _hyph_de, _hyph_en
+    if _hyph_de is not None:
+        return _hyph_de, _hyph_en
+    try:
+        import pyphen
+    except ImportError:
+        return None, None
+    _hyph_de = pyphen.Pyphen(lang='de_DE')
+    _hyph_en = pyphen.Pyphen(lang='en_US')
+    return _hyph_de, _hyph_en
+
+
+def _get_spellchecker():
+    """Lazy-load German spellchecker (cached across calls)."""
+    global _spell_de
+    if _spell_de is not None:
+        return _spell_de
+    try:
+        from spellchecker import SpellChecker
+    except ImportError:
+        return None
+    _spell_de = SpellChecker(language='de')
+    return _spell_de
+
+
+def _is_known_word(word: str, hyph_de, hyph_en) -> bool:
+    """Check whether pyphen recognises a word (DE or EN)."""
+    if len(word) < 2:
+        return False
+    return ('|' in hyph_de.inserted(word, hyphen='|')
+            or '|' in hyph_en.inserted(word, hyphen='|'))
+
+
+def _is_real_word(word: str) -> bool:
+    """Check whether spellchecker knows this word (case-insensitive)."""
+    spell = _get_spellchecker()
+    if spell is None:
+        return False
+    return word.lower() in spell
+
+
+def _hyphenate_word(word: str, hyph_de, hyph_en) -> Optional[str]:
+    """Try to hyphenate a word using DE then EN dictionary.
+
+    Returns word with | separators, or None if not recognized.
+    """
+    hyph = hyph_de.inserted(word, hyphen='|')
+    if '|' in hyph:
+        return hyph
+    hyph = hyph_en.inserted(word, hyphen='|')
+    if '|' in hyph:
+        return hyph
+    return None
+
+
+def _autocorrect_piped_word(word_with_pipes: str) -> Optional[str]:
+    """Try to correct a word that has OCR pipe artifacts.
+
+    Printed syllable divider lines on dictionary pages confuse OCR:
+    the vertical stroke is often read as an extra character (commonly
+    ``l``, ``I``, ``1``, ``i``) adjacent to where the pipe appears.
+    Sometimes OCR reads one divider as ``|`` and another as a letter,
+    so the garbled character may be far from any detected pipe.
+
+    Uses ``spellchecker`` (frequency-based word list) for validation —
+    unlike pyphen which is a pattern-based hyphenator and accepts
+    nonsense strings like "Zeplpelin".
+
+    Strategy:
+        1. Strip ``|`` — if spellchecker knows the result, done.
+        2. Try deleting each pipe-like character (l, I, 1, i, t).
+           OCR inserts extra chars that resemble vertical strokes.
+        3. Fall back to spellchecker's own ``correction()`` method.
+        4. Preserve the original casing of the first letter.
+    """
+    stripped = word_with_pipes.replace('|', '')
+    if not stripped or len(stripped) < 3:
+        return stripped  # too short to validate
+
+    # Step 1: if the stripped word is already a real word, done
+    if _is_real_word(stripped):
+        return stripped
+
+    # Step 2: try deleting pipe-like characters (most likely artifacts)
+    _PIPE_LIKE = frozenset('lI1it')
+    for idx in range(len(stripped)):
+        if stripped[idx] not in _PIPE_LIKE:
+            continue
+        candidate = stripped[:idx] + stripped[idx + 1:]
+        if len(candidate) >= 3 and _is_real_word(candidate):
+            return candidate
+
+    # Step 3: use spellchecker's built-in correction
+    spell = _get_spellchecker()
+    if spell is not None:
+        suggestion = spell.correction(stripped.lower())
+        if suggestion and suggestion != stripped.lower():
+            # Preserve original first-letter case
+            if stripped[0].isupper():
+                suggestion = suggestion[0].upper() + suggestion[1:]
+            return suggestion
+
+    return None  # could not fix
+
+
+def autocorrect_pipe_artifacts(
+    zones_data: List[Dict], session_id: str,
+) -> int:
+    """Strip OCR pipe artifacts and correct garbled words in-place.
+
+    Printed syllable divider lines on dictionary scans are read by OCR
+    as ``|`` characters embedded in words (e.g. ``Zel|le``, ``Ze|plpe|lin``).
+    This function:
+
+    1. Strips ``|`` from every word in content cells.
+    2. Validates with spellchecker (real dictionary lookup).
+    3. If not recognised, tries deleting pipe-like characters or uses
+       spellchecker's correction (e.g. ``Zeplpelin`` → ``Zeppelin``).
+    4. Updates both word-box texts and cell text.
+
+    Returns the number of cells modified.
+    """
+    spell = _get_spellchecker()
+    if spell is None:
+        logger.warning("spellchecker not available — pipe autocorrect limited")
+        # Pipes are still stripped from cell text by the fallback below;
+        # word-box texts just cannot be validated or corrected.
+
+    modified = 0
+    for z in zones_data:
+        for cell in z.get("cells", []):
+            ct = cell.get("col_type", "")
+            if not ct.startswith("column_"):
+                continue
+
+            cell_changed = False
+
+            # --- Fix word boxes ---
+            for wb in cell.get("word_boxes", []):
+                wb_text = wb.get("text", "")
+                if "|" not in wb_text:
+                    continue
+
+                # Separate leading and trailing non-letter characters
+                m = re.match(
+                    r'^([^a-zA-ZäöüÄÖÜßẞ]*)'
+                    r'(.*?)'
+                    r'([^a-zA-ZäöüÄÖÜßẞ]*)$',
+                    wb_text,
+                )
+                if not m:
+                    continue
+                lead, core, trail = m.group(1), m.group(2), m.group(3)
+                if "|" not in core:
+                    continue
+
+                corrected = _autocorrect_piped_word(core)
+                if corrected is not None and corrected != core:
+                    wb["text"] = lead + corrected + trail
+                    cell_changed = True
+
+            # --- Rebuild cell text from word boxes ---
+            if cell_changed:
+                wbs = cell.get("word_boxes", [])
+                if wbs:
+                    cell["text"] = " ".join(
+                        (wb.get("text") or "") for wb in wbs
+                    )
+                modified += 1
+
+            # --- Fallback: strip residual | from cell text ---
+            # (covers cases where word_boxes don't exist or weren't fixed)
+            text = cell.get("text", "")
+            if "|" in text:
+                clean = text.replace("|", "")
+                if clean != text:
+                    cell["text"] = clean
+                    if not cell_changed:
+                        modified += 1
+
+    if modified:
+        logger.info(
+            "build-grid session %s: autocorrected pipe artifacts in %d cells",
+            session_id, modified,
+        )
+    return modified
+
+
+def _try_merge_pipe_gaps(text: str, hyph_de) -> str:
+    """Merge fragments separated by single spaces where OCR split at a pipe.
+
+    Example: "Kaf fee" -> "Kaffee" (pyphen recognizes the merged word).
+    Multi-step: "Ka bel jau" -> "Kabel jau" -> "Kabeljau".
+
+    Guards against false merges:
+    - The FIRST token must be pure alpha (word start — no attached punctuation)
+    - The second token may have trailing punctuation (comma, period) which
+      stays attached to the merged word: "Kä" + "fer," -> "Käfer,"
+    - Common German function words (der, die, das, ...) are never merged
+    - At least one fragment must be very short (<=3 alpha chars)
+    - The merged word must have at least 4 alpha chars in total
+    """
+    parts = text.split(' ')
+    if len(parts) < 2:
+        return text
+
+    result = [parts[0]]
+    i = 1
+    while i < len(parts):
+        prev = result[-1]
+        curr = parts[i]
+
+        # Extract alpha-only core for lookup
+        prev_alpha = re.sub(r'[^a-zA-ZäöüÄÖÜßẞ]', '', prev)
+        curr_alpha = re.sub(r'[^a-zA-ZäöüÄÖÜßẞ]', '', curr)
+
+        # Guard 1: first token must be pure alpha (word-start fragment)
+        #          second token may have trailing punctuation
+        # Guard 2: neither alpha core can be a common German function word
+        # Guard 3: the shorter fragment must be <= 3 chars (pipe-gap signal)
+        # Guard 4: combined length must be >= 4
+        should_try = (
+            prev == prev_alpha  # first token: pure alpha (word start)
+            and prev_alpha and curr_alpha
+            and prev_alpha.lower() not in _STOP_WORDS
+            and curr_alpha.lower() not in _STOP_WORDS
+            and min(len(prev_alpha), len(curr_alpha)) <= 3
+            and len(prev_alpha) + len(curr_alpha) >= 4
+        )
+
+        if should_try:
+            merged_alpha = prev_alpha + curr_alpha
+            hyph = hyph_de.inserted(merged_alpha, hyphen='-')
+            if '-' in hyph:
+                # pyphen recognizes merged word — collapse the space
+                result[-1] = prev + curr
+                i += 1
+                continue
+
+        result.append(curr)
+        i += 1
+
+    return ' '.join(result)
+
+
+def merge_word_gaps_in_zones(zones_data: List[Dict], session_id: str) -> int:
+    """Merge OCR word-gap fragments in cell texts using pyphen validation.
+
+    OCR often splits words at syllable boundaries into separate word_boxes,
+    producing text like "zerknit tert" instead of "zerknittert".  This
+    function tries to merge adjacent fragments in every content cell.
+
+    More permissive than ``_try_merge_pipe_gaps`` (threshold 5 instead of 3)
+    but still guarded by pyphen dictionary lookup and stop-word exclusion.
+
+    Returns the number of cells modified.
+    """
+    hyph_de, _ = _get_hyphenators()
+    if hyph_de is None:
+        return 0
+
+    modified = 0
+    for z in zones_data:
+        for cell in z.get("cells", []):
+            ct = cell.get("col_type", "")
+            if not ct.startswith("column_"):
+                continue
+            text = cell.get("text", "")
+            if not text or " " not in text:
+                continue
+
+            # Skip IPA cells
+            text_no_brackets = re.sub(r'\[[^\]]*\]', '', text)
+            if _IPA_RE.search(text_no_brackets):
+                continue
+
+            new_text = _try_merge_word_gaps(text, hyph_de)
+            if new_text != text:
+                cell["text"] = new_text
+                modified += 1
+
+    if modified:
+        logger.info(
+            "build-grid session %s: merged word gaps in %d cells",
+            session_id, modified,
+        )
+    return modified
+
+
+def _try_merge_word_gaps(text: str, hyph_de) -> str:
+    """Merge OCR word fragments with relaxed threshold (max_short=5).
+
+    Similar to ``_try_merge_pipe_gaps`` but allows slightly longer fragments
+    (max_short=5 instead of 3).  Still requires pyphen to recognize the
+    merged word.
+    """
+    parts = text.split(' ')
+    if len(parts) < 2:
+        return text
+
+    result = [parts[0]]
+    i = 1
+    while i < len(parts):
+        prev = result[-1]
+        curr = parts[i]
+
+        prev_alpha = re.sub(r'[^a-zA-ZäöüÄÖÜßẞ]', '', prev)
+        curr_alpha = re.sub(r'[^a-zA-ZäöüÄÖÜßẞ]', '', curr)
+
+        should_try = (
+            prev == prev_alpha
+            and prev_alpha and curr_alpha
+            and prev_alpha.lower() not in _STOP_WORDS
+            and curr_alpha.lower() not in _STOP_WORDS
+            and min(len(prev_alpha), len(curr_alpha)) <= 5
+            and len(prev_alpha) + len(curr_alpha) >= 4
+        )
+
+        if should_try:
+            merged_alpha = prev_alpha + curr_alpha
+            hyph = hyph_de.inserted(merged_alpha, hyphen='-')
+            if '-' in hyph:
+                result[-1] = prev + curr
+                i += 1
+                continue
+
+        result.append(curr)
+        i += 1
+
+    return ' '.join(result)
+
+
+def _syllabify_text(text: str, hyph_de, hyph_en) -> str:
+    """Syllabify all significant words in a text string.
+
+    1. Strip existing | dividers
+    2. Merge pipe-gap spaces where possible
+    3. Apply pyphen to each word >= 3 alphabetic chars
+    4. Words pyphen doesn't recognize stay as-is (no bad guesses)
+    """
+    if not text:
+        return text
+
+    # Skip cells that contain IPA transcription characters outside brackets.
+    # Bracket content like [bɪltʃøn] is programmatically inserted and should
+    # not block syllabification of the surrounding text.
+    text_no_brackets = re.sub(r'\[[^\]]*\]', '', text)
+    if _IPA_RE.search(text_no_brackets):
+        return text
+
+    # Phase 1: strip existing pipe dividers for clean normalization
+    clean = text.replace('|', '')
+
+    # Phase 2: merge pipe-gap spaces (OCR fragments from pipe splitting)
+    clean = _try_merge_pipe_gaps(clean, hyph_de)
+
+    # Phase 3: tokenize and syllabify each word
+    # Split on whitespace and comma/semicolon/colon sequences, keeping separators
+    tokens = re.split(r'(\s+|[,;:]+\s*)', clean)
+
+    result = []
+    for tok in tokens:
+        if not tok or re.match(r'^[\s,;:]+$', tok):
+            result.append(tok)
+            continue
+
+        # Strip trailing/leading punctuation for pyphen lookup
+        m = re.match(r'^([^a-zA-ZäöüÄÖÜßẞ]*)(.*?)([^a-zA-ZäöüÄÖÜßẞ]*)$', tok)
+        if not m:
+            result.append(tok)
+            continue
+        lead, word, trail = m.group(1), m.group(2), m.group(3)
+
+        if len(word) < 3 or not re.search(r'[a-zA-ZäöüÄÖÜß]', word):
+            result.append(tok)
+            continue
+
+        hyph = _hyphenate_word(word, hyph_de, hyph_en)
+        if hyph:
+            result.append(lead + hyph + trail)
+        else:
+            result.append(tok)
+
+    return ''.join(result)
+

 def insert_syllable_dividers(
    zones_data: List[Dict],
    img_bgr: np.ndarray,
    session_id: str,
+    *,
+    force: bool = False,
+    col_filter: Optional[set] = None,
 ) -> int:
-    """Insert pipe syllable dividers into dictionary cells where CV confirms them.
+    """Insert pipe syllable dividers into dictionary cells.

-    For each cell on a dictionary page:
-      1. Check if ANY word_box has CV-detected pipe lines
-      2. If yes, apply pyphen to EACH word (≥4 chars) in the cell
-      3. Try DE hyphenation first, then EN
+    For dictionary pages: process all content column cells, strip existing
+    pipes, merge pipe-gap spaces, and re-syllabify using pyphen.
+
+    Pre-check: at least 1% of content cells must already contain ``|`` from
+    OCR.  This guards against pages with zero pipe characters (the primary
+    guard — article_col_index — is checked at the call site).
+
+    Args:
+        force: If True, skip the pipe-ratio pre-check and syllabify all
+            content words regardless of whether the original has pipe dividers.
+        col_filter: If set, only process cells whose col_type is in this set.
+            None means process all content columns.

    Returns the number of cells modified.
    """
-    try:
-        import pyphen
-    except ImportError:
+    hyph_de, hyph_en = _get_hyphenators()
+    if hyph_de is None:
        logger.warning("pyphen not installed — skipping syllable insertion")
        return 0

-    _hyph_de = pyphen.Pyphen(lang='de_DE')
-    _hyph_en = pyphen.Pyphen(lang='en_US')
-    img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
+    # Pre-check: count cells that already have | from OCR.
+    # Real dictionary pages with printed syllable dividers will have OCR-
+    # detected pipes in many cells.  Pages without syllable dividers will
+    # have zero — skip those to avoid false syllabification.
+    if not force:
+        total_col_cells = 0
+        cells_with_pipes = 0
+        for z in zones_data:
+            for cell in z.get("cells", []):
+                if cell.get("col_type", "").startswith("column_"):
+                    total_col_cells += 1
+                    if "|" in cell.get("text", ""):
+                        cells_with_pipes += 1
+
+        if total_col_cells > 0:
+            pipe_ratio = cells_with_pipes / total_col_cells
+            if pipe_ratio < 0.01:
+                logger.info(
+                    "build-grid session %s: skipping syllable insertion — "
+                    "only %.1f%% of cells have existing pipes (need >=1%%)",
+                    session_id, pipe_ratio * 100,
+                )
+                return 0

    insertions = 0
    for z in zones_data:
@@ -108,48 +506,27 @@ def insert_syllable_dividers(
            ct = cell.get("col_type", "")
            if not ct.startswith("column_"):
                continue
+            if col_filter is not None and ct not in col_filter:
+                continue
            text = cell.get("text", "")
-            if not text or "|" in text:
-                continue
-            if _IPA_RE.search(text):
+            if not text:
                continue

-            # CV gate: check if ANY word_box in this cell has pipe lines
-            wbs = cell.get("word_boxes") or []
-            if not any(_word_has_pipe_lines(img_gray, wb) for wb in wbs):
+            # In auto mode (force=False), only normalize cells that already
+            # have | from OCR (i.e. printed syllable dividers on the original
+            # scan).  Don't add new syllable marks to other words.
+            if not force and "|" not in text:
                continue

-            # Apply pyphen to each significant word in the cell
-            tokens = re.split(r'(\s+|[,;]+\s*)', text)
-            new_tokens = []
-            changed = False
-            for tok in tokens:
-                # Skip whitespace/punctuation separators
-                if re.match(r'^[\s,;]+$', tok):
-                    new_tokens.append(tok)
-                    continue
-                # Only hyphenate words ≥ 4 alpha chars
-                clean = re.sub(r'[().\-]', '', tok)
-                if len(clean) < 4 or not re.search(r'[a-zA-ZäöüÄÖÜß]', clean):
-                    new_tokens.append(tok)
-                    continue
-                # Try DE first, then EN
-                hyph = _hyph_de.inserted(tok, hyphen='|')
-                if '|' not in hyph:
-                    hyph = _hyph_en.inserted(tok, hyphen='|')
-                if '|' in hyph and hyph != tok:
-                    new_tokens.append(hyph)
-                    changed = True
-                else:
-                    new_tokens.append(tok)
-            if changed:
-                cell["text"] = ''.join(new_tokens)
+            new_text = _syllabify_text(text, hyph_de, hyph_en)
+            if new_text != text:
+                cell["text"] = new_text
                insertions += 1

    if insertions:
        logger.info(
-            "build-grid session %s: inserted syllable dividers in %d cells "
-            "(CV-validated)",
+            "build-grid session %s: syllable dividers inserted/normalized "
+            "in %d cells (pyphen)",
            session_id, insertions,
        )
    return insertions
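Reviewer note: the guard logic shared by `_try_merge_pipe_gaps` (short-fragment limit 3) and `_try_merge_word_gaps` (limit 5) can be tried in isolation with the sketch below. It is a simplified illustration, not the code from this commit: `FakeHyphenator` is a hypothetical stand-in for `pyphen.Pyphen` that only knows a few words, and `STOP_WORDS` is an assumed subset of the real `_STOP_WORDS`.

```python
import re

_NON_ALPHA = re.compile(r'[^a-zA-ZäöüÄÖÜßẞ]')
STOP_WORDS = {"der", "die", "das", "und", "ein"}  # assumed subset, for illustration


class FakeHyphenator:
    """Hypothetical stand-in for pyphen.Pyphen with a tiny word table."""

    def __init__(self, table):
        self._table = table

    def inserted(self, word, hyphen='-'):
        # Unknown words come back unchanged, i.e. without any hyphen.
        return self._table.get(word, word).replace('-', hyphen)


def merge_gaps(text, hyph, max_short=3):
    """Collapse a space when the joined fragments form a known word."""
    parts = text.split(' ')
    result = [parts[0]]
    for curr in parts[1:]:
        prev = result[-1]
        prev_alpha = _NON_ALPHA.sub('', prev)
        curr_alpha = _NON_ALPHA.sub('', curr)
        ok = bool(
            prev == prev_alpha                      # first token: pure alpha
            and prev_alpha and curr_alpha
            and prev_alpha.lower() not in STOP_WORDS
            and curr_alpha.lower() not in STOP_WORDS
            and min(len(prev_alpha), len(curr_alpha)) <= max_short
            and len(prev_alpha) + len(curr_alpha) >= 4
        )
        if ok and '-' in hyph.inserted(prev_alpha + curr_alpha, hyphen='-'):
            result[-1] = prev + curr                # keeps trailing punctuation
        else:
            result.append(curr)
    return ' '.join(result)


hyph = FakeHyphenator({
    "Kaffee": "Kaf-fee", "Käfer": "Kä-fer", "Kabel": "Ka-bel",
    "Kabeljau": "Ka-bel-jau", "zerknittert": "zer-knit-tert",
})
print(merge_gaps("Ka bel jau", hyph))
print(merge_gaps("zerknit tert", hyph, max_short=5))
```

With `max_short=3` the fragments "zerknit" and "tert" are left apart because the shorter one has four letters; this is the case the relaxed word-gap variant is meant to cover.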
--- a/klausur-service/backend/cv_vocab_types.py
+++ b/klausur-service/backend/cv_vocab_types.py
@@ -65,6 +65,38 @@ if os.path.exists(_britfone_path):
 else:
    logger.info("Britfone not found — British IPA disabled")

+# --- German IPA Dictionary (CC-BY-SA, Wiktionary) ---
+
+DE_IPA_AVAILABLE = False
+_de_ipa_dict: Dict[str, str] = {}
+
+_de_ipa_path = os.path.join(os.path.dirname(__file__), 'data', 'de_ipa.tsv')
+if os.path.exists(_de_ipa_path):
+    try:
+        with open(_de_ipa_path, 'r', encoding='utf-8') as f:
+            for line in f:
+                parts = line.rstrip('\n').split('\t', 1)
+                if len(parts) == 2:
+                    _de_ipa_dict[parts[0]] = parts[1]
+        DE_IPA_AVAILABLE = True
+        logger.info(f"German IPA loaded — {len(_de_ipa_dict)} entries (CC-BY-SA, Wiktionary)")
+    except Exception as e:
+        logger.warning(f"Failed to load German IPA: {e}")
+else:
+    logger.info("German IPA not found — German IPA disabled")
+
+# --- epitran German fallback (MIT license) ---
+
+_epitran_de = None
+try:
+    import epitran as _epitran_module
+    _epitran_de = _epitran_module.Epitran('deu-Latn')
+    logger.info("epitran loaded — German rule-based IPA fallback enabled")
+except ImportError:
+    logger.info("epitran not installed — German IPA fallback disabled")
+except Exception as e:
+    logger.warning(f"Failed to init epitran: {e}")
+
 # --- Language Detection Constants ---

 GERMAN_FUNCTION_WORDS = {'der', 'die', 'das', 'und', 'ist', 'ein', 'eine', 'nicht',
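Reviewer note: the TSV loader above keeps only lines with a word and a tab-separated IPA value, and splits on the first tab only. A minimal sketch of that parsing rule follows; `load_ipa_tsv` is a hypothetical helper name and the sample rows are made up for illustration, not taken from `de_ipa.tsv`.

```python
import io


def load_ipa_tsv(lines):
    """Parse 'word<TAB>ipa' lines into a dict; lines without a tab are skipped."""
    table = {}
    for line in lines:
        parts = line.rstrip('\n').split('\t', 1)  # split on the first tab only
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


# Illustrative sample data (second line is malformed on purpose).
sample = io.StringIO("Haus\thaʊ̯s\nkaputt\nBaum\tbaʊ̯m\textra\n")
ipa = load_ipa_tsv(sample)
print(sorted(ipa))
```

Because of `split('\t', 1)`, any further tabs stay inside the IPA value instead of causing the line to be dropped.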
--- a/klausur-service/backend/data/de_ipa.tsv
+++ b/klausur-service/backend/data/de_ipa.tsv
--- a/klausur-service/backend/grid_build_core.py
+++ b/klausur-service/backend/grid_build_core.py
--- a/klausur-service/backend/grid_editor_api.py
+++ b/klausur-service/backend/grid_editor_api.py
--- a/klausur-service/backend/grid_editor_helpers.py
+++ b/klausur-service/backend/grid_editor_helpers.py
@@ -15,12 +15,155 @@ from typing import Any, Dict, List, Optional, Tuple
 import cv2
 import numpy as np

+from cv_vocab_types import PageZone
 from cv_words_first import _cluster_rows, _build_cells
 from cv_ocr_engines import _text_has_garbled_ipa

 logger = logging.getLogger(__name__)


+# ---------------------------------------------------------------------------
+# Cross-column word splitting
+# ---------------------------------------------------------------------------
+
+_spell_cache: Optional[Any] = None
+_spell_loaded = False
+
+
+def _is_recognized_word(text: str) -> bool:
+    """Check if *text* is a recognized German word.
+
+    Uses the spellchecker library (same as cv_syllable_detect.py); only the
+    German dictionary is loaded here.
+    Returns True for real words like "oder", "Kabel", "Zeitung".
+    Returns False for OCR merge artifacts like "sichzie", "dasZimmer", and
+    for any input when the spellchecker cannot be loaded.
+    """
+    global _spell_cache, _spell_loaded
+    if not text or len(text) < 2:
+        return False
+
+    if not _spell_loaded:
+        _spell_loaded = True
+        try:
+            from spellchecker import SpellChecker
+            _spell_cache = SpellChecker(language="de")
+        except Exception:
+            pass
+
+    if _spell_cache is None:
+        return False
+
+    return text.lower() in _spell_cache
+
+
+def _split_cross_column_words(
+    words: List[Dict],
+    columns: List[Dict],
+) -> List[Dict]:
+    """Split word boxes that span across column boundaries.
+
+    When OCR merges adjacent words from different columns (e.g. "sichzie"
+    spanning Col 1 and Col 2, or "dasZimmer" crossing the boundary),
+    split the word box at the column boundary so each piece is assigned
+    to the correct column.
+
+    Only splits when:
+    - The word has significant overlap (>15% of its width) on both sides
+    - AND the word is not a recognized real word (OCR merge artifact), OR
+      the word contains a case transition (lowercase→uppercase) near the
+      boundary indicating two merged words like "dasZimmer".
+    """
+    if len(columns) < 2:
+        return words
+
+    # Column boundaries = midpoints between adjacent column edges
+    boundaries = []
+    for i in range(len(columns) - 1):
+        boundary = (columns[i]["x_max"] + columns[i + 1]["x_min"]) / 2
+        boundaries.append(boundary)
+
+    new_words: List[Dict] = []
+    split_count = 0
+    for w in words:
+        w_left = w["left"]
+        w_width = w["width"]
+        w_right = w_left + w_width
+        text = (w.get("text") or "").strip()
+
+        if not text or len(text) < 4 or w_width < 10:
+            new_words.append(w)
+            continue
+
+        # Find the first boundary this word straddles significantly
+        split_boundary = None
+        for b in boundaries:
+            if w_left < b < w_right:
+                left_part = b - w_left
+                right_part = w_right - b
+                # Both sides must have at least 15% of the word width
+                if left_part > w_width * 0.15 and right_part > w_width * 0.15:
+                    split_boundary = b
+                    break
+
+        if split_boundary is None:
+            new_words.append(w)
+            continue
+
+        # Compute approximate split position in the text.
+        left_width = split_boundary - w_left
+        split_ratio = left_width / w_width
+        approx_pos = len(text) * split_ratio
+
+        # Strategy 1: look for a case transition (lowercase→uppercase) near
+        # the approximate split point — e.g. "dasZimmer" splits at 'Z'.
+        split_char = None
+        search_lo = max(1, int(approx_pos) - 3)
+        search_hi = min(len(text), int(approx_pos) + 2)
+        for i in range(search_lo, search_hi):
+            if text[i - 1].islower() and text[i].isupper():
+                split_char = i
+                break
+
+        # Strategy 2: if no case transition, only split if the whole word
+        # is NOT a real word (i.e. it's an OCR merge artifact like "sichzie").
+        # Real words like "oder", "Kabel", "Zeitung" must not be split.
+        if split_char is None:
+            clean = re.sub(r"[,;:.!?]+$", "", text)  # strip trailing punct
+            if _is_recognized_word(clean):
+                new_words.append(w)
+                continue
+            # Not a real word — use floor of proportional position
+            split_char = max(1, min(len(text) - 1, int(approx_pos)))
+
+        left_text = text[:split_char].rstrip()
+        right_text = text[split_char:].lstrip()
+
+        if len(left_text) < 2 or len(right_text) < 2:
+            new_words.append(w)
+            continue
+
+        right_width = w_width - round(left_width)
+        new_words.append({
+            **w,
+            "text": left_text,
+            "width": round(left_width),
+        })
+        new_words.append({
+            **w,
+            "text": right_text,
+            "left": round(split_boundary),
+            "width": right_width,
+        })
+        split_count += 1
+        logger.info(
+            "split cross-column word %r → %r + %r at boundary %.0f",
+            text, left_text, right_text, split_boundary,
+        )
+
+    if split_count:
+        logger.info("split %d cross-column word(s)", split_count)
+    return new_words
+
+
 def _filter_border_strip_words(words: List[Dict]) -> Tuple[List[Dict], int]:
    """Remove page-border decoration strip words BEFORE column detection.

@@ -137,8 +280,27 @@ def _cluster_columns_by_alignment(
        median_gap = sorted_gaps[len(sorted_gaps) // 2]
        heights = [w["height"] for w in words if w.get("height", 0) > 0]
        median_h = sorted(heights)[len(heights) // 2] if heights else 25
-        # Column boundary: gap > 3× median gap or > 1.5× median word height
-        gap_threshold = max(median_gap * 3, median_h * 1.5, 30)
+
+        # For small word counts (boxes, sub-zones): PaddleOCR returns
+        # multi-word blocks, so ALL inter-word gaps are potential column
+        # boundaries.  Use a low threshold based on word height — any gap
+        # wider than ~1x median word height is a column separator.
+        if len(words) <= 60:
+            gap_threshold = max(median_h * 1.0, 25)
+            logger.info(
+                "alignment columns (small zone): gap_threshold=%.0f "
+                "(median_h=%.0f, %d words, %d gaps: %s)",
+                gap_threshold, median_h, len(words), len(sorted_gaps),
+                [int(g) for g in sorted_gaps[:10]],
+            )
+        else:
+            # Standard approach for large zones (full pages)
+            gap_threshold = max(median_gap * 3, median_h * 1.5, 30)
+            # Cap at 25% of zone width
+            max_gap = zone_w * 0.25
+            if gap_threshold > max_gap > 30:
+                logger.info("alignment columns: capping gap_threshold %.0f → %.0f (25%% of zone_w=%d)", gap_threshold, max_gap, zone_w)
+                gap_threshold = max_gap
    else:
        gap_threshold = 50
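Reviewer note: the threshold selection in this hunk can be read as a small pure function. The sketch below restates it under that assumption; `gap_threshold_for` is a hypothetical name, and the logging calls are left out.

```python
def gap_threshold_for(word_count, median_gap, median_h, zone_w):
    """Return the column-gap threshold in pixels for a zone."""
    if word_count <= 60:
        # Small zone (box, sub-zone): any gap wider than about one
        # median word height counts as a column separator.
        return max(median_h * 1.0, 25)
    # Large zone (full page): gap-based threshold, capped at 25% of zone width.
    threshold = max(median_gap * 3, median_h * 1.5, 30)
    max_gap = zone_w * 0.25
    if threshold > max_gap > 30:
        threshold = max_gap
    return threshold


print(gap_threshold_for(40, 12, 30, 800))
print(gap_threshold_for(500, 200, 30, 1000))
```

Note that the cap only applies when 25% of the zone width is itself above 30 px, so very narrow large zones keep the uncapped value.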

@@ -232,13 +394,17 @@ def _cluster_columns_by_alignment(
    used_ids = {id(c) for c in primary} | {id(c) for c in secondary}
    sig_xs = [c["mean_x"] for c in primary + secondary]

-    MIN_DISTINCT_ROWS_TERTIARY = max(MIN_DISTINCT_ROWS + 1, 4)
-    MIN_COVERAGE_TERTIARY = 0.05  # at least 5% of rows
+    # Tertiary: clusters that are clearly to the LEFT of the first
+    # significant column (or RIGHT of the last).  If words consistently
+    # start at a position left of the established first column boundary,
+    # they MUST be a separate column — regardless of how few rows they
+    # cover.  The only requirement is a clear spatial gap.
+    MIN_COVERAGE_TERTIARY = 0.02  # at least 2% of rows, i.e. effectively a single row
    tertiary = []
    for c in clusters:
        if id(c) in used_ids:
            continue
-        if c["distinct_rows"] < MIN_DISTINCT_ROWS_TERTIARY:
+        if c["distinct_rows"] < 1:
            continue
        if c["row_coverage"] < MIN_COVERAGE_TERTIARY:
            continue
@@ -906,13 +1072,42 @@ def _detect_heading_rows_by_single_cell(
            text = (cell.get("text") or "").strip()
            if not text or text.startswith("["):
                continue
+            # Continuation lines start with "(" — e.g. "(usw.)", "(TV-Serie)"
+            if text.startswith("("):
+                continue
+            # Single cell NOT in the first content column is likely a
+            # continuation/overflow line, not a heading.  Real headings
+            # ("Theme 1", "Unit 3: ...") appear in the first or second
+            # content column.
+            first_content_col = col_indices[0] if col_indices else 0
+            if cell.get("col_index", 0) > first_content_col + 1:
+                continue
            # Skip garbled IPA without brackets (e.g. "ska:f – ska:vz")
            # but NOT text with real IPA symbols (e.g. "Theme [θˈiːm]")
            _REAL_IPA_CHARS = set("ˈˌəɪɛɒʊʌæɑɔʃʒθðŋ")
            if _text_has_garbled_ipa(text) and not any(c in _REAL_IPA_CHARS for c in text):
                continue
+            # Guard: dictionary section headings are short (1-4 alpha chars
+            # like "A", "Ab", "Zi", "Sch").  Longer text that starts
+            # lowercase is a regular vocabulary word (e.g. "zentral") that
+            # happens to appear alone in its row.
+            alpha_only = re.sub(r'[^a-zA-ZäöüÄÖÜßẞ]', '', text)
+            if len(alpha_only) > 4 and text[0].islower():
+                continue
            heading_row_indices.append(ri)

+        # Guard: if >25% of eligible rows would become headings, the
+        # heuristic is misfiring (e.g. sparse single-column layout where
+        # most rows naturally have only 1 content cell).
+        eligible_rows = len(non_header_rows) - 2  # minus first/last excluded
+        if eligible_rows > 0 and len(heading_row_indices) > eligible_rows * 0.25:
+            logger.debug(
+                "Skipping single-cell heading detection for zone %s: "
+                "%d/%d rows would be headings (>25%%)",
+                z.get("zone_index"), len(heading_row_indices), eligible_rows,
+            )
+            continue
+
        for hri in heading_row_indices:
            header_cells = [c for c in cells if c.get("row_index") == hri]
            if not header_cells:
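Reviewer note: the text-based guards added to the single-cell heading detection, plus the 25% sanity check, can be sketched as two small helpers. Both function names are hypothetical, and the column-index and garbled-IPA guards from the hunk are deliberately omitted here.

```python
import re


def looks_like_heading(text):
    """Text-only guards: brackets, parentheses, long lowercase words."""
    text = (text or "").strip()
    if not text or text.startswith("[") or text.startswith("("):
        return False
    alpha_only = re.sub(r'[^a-zA-ZäöüÄÖÜßẞ]', '', text)
    # Longer text starting lowercase is a vocabulary word, not a heading.
    if len(alpha_only) > 4 and text[0].islower():
        return False
    return True


def accept_headings(candidate_rows, eligible_rows):
    """Drop all candidates if more than 25% of eligible rows would be headings."""
    if eligible_rows > 0 and len(candidate_rows) > eligible_rows * 0.25:
        return []
    return candidate_rows


print([t for t in ["Sch", "zentral", "(usw.)", "Theme 1"] if looks_like_heading(t)])
print(accept_headings([1, 2, 3, 4, 5, 6], 20))
```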
@@ -1023,6 +1218,130 @@ def _detect_header_rows(
    return headers


+def _detect_colspan_cells(
+    zone_words: List[Dict],
+    columns: List[Dict],
+    rows: List[Dict],
+    cells: List[Dict],
+    img_w: int,
+    img_h: int,
+) -> List[Dict]:
+    """Detect and merge cells that span multiple columns (colspan).
+
+    A word-block (PaddleOCR phrase) that extends significantly past a column
+    boundary into the next column indicates a merged cell.  This replaces
+    the incorrectly split cells with a single cell spanning multiple columns.
+
+    Works for both full-page scans and box zones.
+    """
+    if len(columns) < 2 or not zone_words or not rows:
+        return cells
+
+    from cv_words_first import _assign_word_to_row
+
+    # Column boundaries (midpoints between adjacent columns)
+    col_boundaries = []
+    for ci in range(len(columns) - 1):
+        col_boundaries.append((columns[ci]["x_max"] + columns[ci + 1]["x_min"]) / 2)
+
+    def _cols_covered(w_left: float, w_right: float) -> List[int]:
+        """Return list of column indices that a word-block covers."""
+        covered = []
+        for col in columns:
+            col_mid = (col["x_min"] + col["x_max"]) / 2
+            # Word covers a column if it extends past the column's midpoint
+            if w_left < col_mid < w_right:
+                covered.append(col["index"])
+            # Also include column if word starts within it
+            elif col["x_min"] <= w_left < col["x_max"]:
+                covered.append(col["index"])
+        return sorted(set(covered))
+
+    # Group original word-blocks by row
+    row_word_blocks: Dict[int, List[Dict]] = {}
+    for w in zone_words:
+        ri = _assign_word_to_row(w, rows)
+        row_word_blocks.setdefault(ri, []).append(w)
+
+    # For each row, check if any word-block spans multiple columns
+    rows_to_merge: Dict[int, List[Dict]] = {}  # row_index → list of spanning word-blocks
+
+    for ri, wblocks in row_word_blocks.items():
+        spanning = []
+        for w in wblocks:
+            w_left = w["left"]
+            w_right = w_left + w["width"]
+            covered = _cols_covered(w_left, w_right)
+            if len(covered) >= 2:
+                spanning.append({"word": w, "cols": covered})
+        if spanning:
+            rows_to_merge[ri] = spanning
+
+    if not rows_to_merge:
+        return cells
+
+    # Merge cells for spanning rows
+    new_cells = []
+    for cell in cells:
+        ri = cell.get("row_index", -1)
+        if ri not in rows_to_merge:
+            new_cells.append(cell)
+            continue
+
+        # Check if this cell's column is part of a spanning block
+        ci = cell.get("col_index", -1)
+        is_part_of_span = False
+        for span in rows_to_merge[ri]:
+            if ci in span["cols"]:
+                is_part_of_span = True
+                # Only emit the merged cell for the FIRST column in the span
+                if ci == span["cols"][0]:
+                    # Use the ORIGINAL word-block text (not the split cell texts
+                    # which may have broken words like "euros a" + "nd cents")
+                    orig_word = span["word"]
+                    merged_text = orig_word.get("text", "").strip()
+                    all_wb = [orig_word]
+
+                    # Compute merged bbox
+                    if all_wb:
+                        x_min = min(wb["left"] for wb in all_wb)
+                        y_min = min(wb["top"] for wb in all_wb)
+                        x_max = max(wb["left"] + wb["width"] for wb in all_wb)
+                        y_max = max(wb["top"] + wb["height"] for wb in all_wb)
+                    else:
+                        x_min = y_min = x_max = y_max = 0
+
+                    new_cells.append({
+                        "cell_id": cell["cell_id"],
+                        "row_index": ri,
+                        "col_index": span["cols"][0],
+                        "col_type": "spanning_header",
+                        "colspan": len(span["cols"]),
+                        "text": merged_text,
+                        "confidence": cell.get("confidence", 0),
+                        "bbox_px": {"x": x_min, "y": y_min,
+                                    "w": x_max - x_min, "h": y_max - y_min},
+                        "bbox_pct": {
+                            "x": round(x_min / img_w * 100, 2) if img_w else 0,
+                            "y": round(y_min / img_h * 100, 2) if img_h else 0,
+                            "w": round((x_max - x_min) / img_w * 100, 2) if img_w else 0,
+                            "h": round((y_max - y_min) / img_h * 100, 2) if img_h else 0,
+                        },
+                        "word_boxes": all_wb,
+                        "ocr_engine": cell.get("ocr_engine", ""),
+                        "is_bold": cell.get("is_bold", False),
+                    })
+                    logger.info(
+                        "colspan detected: row %d, cols %s → merged %d cells (%r)",
+                        ri, span["cols"], len(span["cols"]), merged_text[:50],
+                    )
+                break
+        if not is_part_of_span:
+            new_cells.append(cell)
+
+    return new_cells
+
+
 def _build_zone_grid(
    zone_words: List[Dict],
    zone_x: int,
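Reviewer note: the coverage rule inside `_detect_colspan_cells` decides which columns a word block spans. The sketch below lifts that inner helper into a standalone function for experimentation; the column geometry is invented for the example.

```python
def cols_covered(columns, w_left, w_right):
    """Return the sorted column indices a word block covers."""
    covered = []
    for col in columns:
        col_mid = (col["x_min"] + col["x_max"]) / 2
        if w_left < col_mid < w_right:
            # Block extends past this column's midpoint.
            covered.append(col["index"])
        elif col["x_min"] <= w_left < col["x_max"]:
            # Block starts inside this column.
            covered.append(col["index"])
    return sorted(set(covered))


# Hypothetical three-column layout (pixel coordinates).
columns = [
    {"index": 0, "x_min": 0, "x_max": 100},
    {"index": 1, "x_min": 120, "x_max": 220},
    {"index": 2, "x_min": 240, "x_max": 340},
]
print(cols_covered(columns, 10, 200))
print(cols_covered(columns, 60, 160))
```

A block that only pokes into the next column without reaching its midpoint (second call) still counts as a single-column block, so no colspan is created for it.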
@@ -1091,9 +1410,24 @@ def _build_zone_grid(
            "header_rows": [],
        }

+    # Split word boxes that straddle column boundaries (e.g. "sichzie"
+    # spanning Col 1 + Col 2).  Must happen after column detection and
+    # before cell assignment.
+    # Keep original words for colspan detection (split destroys span info).
+    original_zone_words = zone_words
+    if len(columns) >= 2:
+        zone_words = _split_cross_column_words(zone_words, columns)
+
    # Build cells
    cells = _build_cells(zone_words, columns, rows, img_w, img_h)

+    # --- Detect colspan (merged cells spanning multiple columns) ---
+    # Uses the ORIGINAL (pre-split) words to detect word-blocks that span
+    # multiple columns.  _split_cross_column_words would have destroyed
+    # this information by cutting words at column boundaries.
+    if len(columns) >= 2:
+        cells = _detect_colspan_cells(original_zone_words, columns, rows, cells, img_w, img_h)
+
    # Prefix cell IDs with zone index
    for cell in cells:
        cell["cell_id"] = f"Z{zone_index}_{cell['cell_id']}"
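Reviewer note: `_split_cross_column_words` first checks that a word box straddles a column boundary with at least 15% of its width on each side, then picks a character position to split at. The sketch below isolates those two decisions; `straddles` and `find_split_index` are hypothetical names, and the dictionary check that protects real words is intentionally left out.

```python
def straddles(w_left, w_width, boundary, min_share=0.15):
    """True if the boundary cuts the box with enough width on both sides."""
    w_right = w_left + w_width
    if not (w_left < boundary < w_right):
        return False
    return (boundary - w_left) > w_width * min_share and \
           (w_right - boundary) > w_width * min_share


def find_split_index(text, split_ratio):
    """Prefer a lowercase-to-uppercase transition near the proportional position."""
    approx_pos = len(text) * split_ratio
    lo = max(1, int(approx_pos) - 3)
    hi = min(len(text), int(approx_pos) + 2)
    for i in range(lo, hi):
        if text[i - 1].islower() and text[i].isupper():
            return i
    # Fallback: floor of the proportional position, clamped inside the text.
    return max(1, min(len(text) - 1, int(approx_pos)))


for word, ratio in [("dasZimmer", 0.4), ("sichzie", 0.6)]:
    i = find_split_index(word, ratio)
    print(word[:i], word[i:])
```

In the commit itself the fallback branch only runs when the whole word is not recognized by the spellchecker, which is why `_build_zone_grid` keeps the original words for the later colspan detection.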
@@ -1288,29 +1622,42 @@ def _filter_footer_words(
    img_h: int,
    log: Any,
    session_id: str,
-) -> None:
+) -> Optional[Dict]:
    """Remove isolated words in the bottom 5% of the page (page numbers).

-    Modifies *words* in place.
+    Modifies *words* in place and returns a page_number metadata dict
+    if a page number was extracted, or None.
    """
    if not words or img_h <= 0:
-        return
+        return None
    footer_y = img_h * 0.95
    footer_words = [
        w for w in words
        if w["top"] + w.get("height", 0) / 2 > footer_y
    ]
    if not footer_words:
-        return
+        return None
    # Only remove if footer has very few words (≤ 3) with short text
    total_text = "".join((w.get("text") or "").strip() for w in footer_words)
    if len(footer_words) <= 3 and len(total_text) <= 10:
+        # Extract page number metadata before removing
+        page_number_info = {
+            "text": total_text.strip(),
+            "y_pct": round(footer_words[0]["top"] / img_h * 100, 1),
+        }
+        # Try to parse as integer
+        digits = "".join(c for c in total_text if c.isdigit())
+        if digits:
+            page_number_info["number"] = int(digits)
+
        footer_set = set(id(w) for w in footer_words)
        words[:] = [w for w in words if id(w) not in footer_set]
        log.info(
-            "build-grid session %s: removed %d footer words ('%s')",
-            session_id, len(footer_words), total_text,
+            "build-grid session %s: extracted page number '%s' and removed %d footer words",
+            session_id, total_text, len(footer_words),
        )
+        return page_number_info
+    return None


 def _filter_header_junk(
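Reviewer note: the new return value of `_filter_footer_words` can be exercised in isolation. The sketch below is a simplified standalone copy of the logic in this hunk (logging and `session_id` dropped). The name `extract_page_number` and the sample words are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Optional


def extract_page_number(words: List[dict], img_h: int) -> Optional[Dict]:
    """Hypothetical standalone variant of the footer check (no logging)."""
    if not words or img_h <= 0:
        return None
    footer_y = img_h * 0.95
    # A word counts as footer if its vertical center lies in the bottom 5%
    footer_words = [
        w for w in words
        if w["top"] + w.get("height", 0) / 2 > footer_y
    ]
    if not footer_words:
        return None
    total_text = "".join((w.get("text") or "").strip() for w in footer_words)
    # Same guard as in the diff: at most 3 words and at most 10 characters
    if len(footer_words) > 3 or len(total_text) > 10:
        return None
    info: Dict = {
        "text": total_text,
        "y_pct": round(footer_words[0]["top"] / img_h * 100, 1),
    }
    digits = "".join(c for c in total_text if c.isdigit())
    if digits:
        info["number"] = int(digits)
    # Remove the footer words from the caller's list in place
    footer_ids = {id(w) for w in footer_words}
    words[:] = [w for w in words if id(w) not in footer_ids]
    return info


page_words = [
    {"text": "apple", "top": 100, "height": 20},
    {"text": "42", "top": 970, "height": 20},
]
result = extract_page_number(page_words, img_h=1000)
```

Note that the digits of all footer words are concatenated before parsing, so a footer such as "12 13" would yield 1213. Callers may want to treat `number` as a hint rather than a guaranteed page index.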
--- a/klausur-service/backend/main.py
+++ b/klausur-service/backend/main.py
@@ -46,6 +46,7 @@ from ocr_pipeline_api import router as ocr_pipeline_router, _cache as ocr_pipeli
 from grid_editor_api import router as grid_editor_router
 from orientation_crop_api import router as orientation_crop_router, set_cache_ref as set_orientation_crop_cache
 from ocr_pipeline_session_store import init_ocr_pipeline_tables
+from ocr_kombi.router import router as ocr_kombi_router
 try:
    from handwriting_htr_api import router as htr_router
 except ImportError:
@@ -186,6 +187,7 @@ if htr_router:
    app.include_router(htr_router)            # Handwriting HTR (Klausur)
 if dsfa_rag_router:
    app.include_router(dsfa_rag_router)   # DSFA RAG Corpus Search
+app.include_router(ocr_kombi_router)      # OCR Kombi Pipeline (modular)


 # =============================================
--- a/klausur-service/backend/migrations/009_add_document_group.sql
+++ b/klausur-service/backend/migrations/009_add_document_group.sql
@@ -0,0 +1,12 @@
+-- Migration: Add document_group_id and page_number for multi-page document grouping.
+-- A document_group_id groups multiple sessions that belong to the same scanned document.
+-- page_number is the 1-based page index within the group.
+
+ALTER TABLE ocr_pipeline_sessions
+    ADD COLUMN IF NOT EXISTS document_group_id UUID,
+    ADD COLUMN IF NOT EXISTS page_number INT;
+
+-- Index for efficient group lookups
+CREATE INDEX IF NOT EXISTS idx_ocr_sessions_document_group
+    ON ocr_pipeline_sessions (document_group_id)
+    WHERE document_group_id IS NOT NULL;
--- a/klausur-service/backend/ocr_kombi/__init__.py
+++ b/klausur-service/backend/ocr_kombi/__init__.py
@@ -0,0 +1 @@
+"""OCR Kombi Pipeline - modular step-based OCR processing."""
--- a/klausur-service/backend/ocr_kombi/router.py
+++ b/klausur-service/backend/ocr_kombi/router.py
@@ -0,0 +1,19 @@
+"""
+Composite router for the OCR Kombi pipeline.
+
+Aggregates step-specific sub-routers into one router for main.py to include.
+"""
+
+from fastapi import APIRouter
+
+from .step_upload import router as upload_router
+
+router = APIRouter(prefix="/api/v1/ocr-kombi", tags=["ocr-kombi"])
+
+# Include step-specific routes
+router.include_router(upload_router)
+
+# Future steps will be added here:
+# from .step_orientation import router as orientation_router
+# router.include_router(orientation_router)
+# ...
--- a/klausur-service/backend/ocr_kombi/step_upload.py
+++ b/klausur-service/backend/ocr_kombi/step_upload.py
@@ -0,0 +1,132 @@
+"""
+Step 1: Upload — handles single images and multi-page PDFs.
+
+Multi-page PDFs are split into individual PNG pages, each getting its own
+session linked by a shared document_group_id.
+"""
+
+import io
+import uuid
+import logging
+import time
+from typing import Optional
+
+from fastapi import APIRouter, UploadFile, File, Form, HTTPException
+
+from ocr_pipeline_session_store import create_session_db, get_document_group_sessions
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter()
+
+
+def _pdf_to_pngs(pdf_bytes: bytes) -> list[bytes]:
+    """Convert a PDF to a list of PNG byte buffers (one per page)."""
+    try:
+        import fitz  # PyMuPDF
+    except ImportError:
+        raise HTTPException(
+            status_code=500,
+            detail="PDF-Verarbeitung nicht verfuegbar (PyMuPDF fehlt)"
+        )
+
+    pages: list[bytes] = []
+    # Render at 300 DPI for OCR quality (PDF user space is 72 DPI)
+    mat = fitz.Matrix(300 / 72, 300 / 72)
+    # Context manager closes the document even if rendering raises
+    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
+        for page in doc:
+            pix = page.get_pixmap(matrix=mat)
+            pages.append(pix.tobytes("png"))
+    return pages
+
+
+@router.post("/upload")
+async def upload_document(
+    file: UploadFile = File(...),
+    name: Optional[str] = Form(None),
+    document_category: Optional[str] = Form(None),
+):
+    """Upload a single image or multi-page PDF.
+
+    Single image: Creates 1 session with document_group_id + page_number=1.
+    Multi-page PDF: Creates N sessions with shared document_group_id,
+    page_number 1..N, and titles "Title — S. X".
+    """
+    t0 = time.time()
+    file_bytes = await file.read()
+    filename = file.filename or "upload"
+    base_title = name or filename.rsplit(".", 1)[0]
+
+    is_pdf = (
+        filename.lower().endswith(".pdf")
+        or file.content_type == "application/pdf"
+        or file_bytes[:4] == b"%PDF"
+    )
+
+    group_id = str(uuid.uuid4())
+    created_sessions = []
+
+    if is_pdf:
+        pages = _pdf_to_pngs(file_bytes)
+        if not pages:
+            raise HTTPException(status_code=400, detail="PDF enthaelt keine Seiten")
+
+        for i, png_bytes in enumerate(pages, start=1):
+            session_id = str(uuid.uuid4())
+            page_title = f"{base_title} — S. {i}" if len(pages) > 1 else base_title
+            session = await create_session_db(
+                session_id=session_id,
+                name=page_title,
+                filename=filename,
+                original_png=png_bytes,
+                document_group_id=group_id,
+                page_number=i,
+            )
+            created_sessions.append({
+                "session_id": session["id"],
+                "name": session["name"],
+                "page_number": i,
+            })
+    else:
+        # Single image
+        session_id = str(uuid.uuid4())
+        session = await create_session_db(
+            session_id=session_id,
+            name=base_title,
+            filename=filename,
+            original_png=file_bytes,
+            document_group_id=group_id,
+            page_number=1,
+        )
+        created_sessions.append({
+            "session_id": session["id"],
+            "name": session["name"],
+            "page_number": 1,
+        })
+
+    duration = round(time.time() - t0, 2)
+    logger.info(
+        "Upload complete: %d page(s), group=%s, %.2fs",
+        len(created_sessions), group_id, duration,
+    )
+
+    return {
+        "document_group_id": group_id,
+        "page_count": len(created_sessions),
+        "sessions": created_sessions,
+        "duration_seconds": duration,
+    }
+
+
+@router.get("/documents/{group_id}")
+async def get_document_group(group_id: str):
+    """Get all sessions in a document group, sorted by page_number."""
+    sessions = await get_document_group_sessions(group_id)
+    if not sessions:
+        raise HTTPException(status_code=404, detail="Dokumentgruppe nicht gefunden")
+    return {
+        "document_group_id": group_id,
+        "page_count": len(sessions),
+        "sessions": sessions,
+    }
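Reviewer note: the PDF detection and the per-page naming in `upload_document` are easy to check without a running server. The following is a minimal sketch that mirrors both rules as pure functions. The helper names `looks_like_pdf` and `page_titles` are hypothetical and do not exist in the diff.

```python
from typing import List, Optional


def looks_like_pdf(filename: str, content_type: Optional[str], data: bytes) -> bool:
    """True if any of the three signals used by the endpoint says 'PDF'."""
    return (
        filename.lower().endswith(".pdf")
        or content_type == "application/pdf"
        or data[:4] == b"%PDF"
    )


def page_titles(base_title: str, page_count: int) -> List[str]:
    """One title per page; single-page documents keep the bare title."""
    if page_count <= 1:
        return [base_title]
    # \u2014 is the separator character used by the endpoint's title format
    return [f"{base_title} \u2014 S. {i}" for i in range(1, page_count + 1)]


titles = page_titles("Unit 3", 2)
detected = looks_like_pdf("scan.bin", "application/octet-stream", b"%PDF-1.7 ...")
```

The magic-byte check is what catches PDFs uploaded with a generic content type and a misleading extension, as in the `scan.bin` case above.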
--- a/klausur-service/backend/ocr_pipeline_session_store.py
+++ b/klausur-service/backend/ocr_pipeline_session_store.py
@@ -76,7 +76,16 @@ async def init_ocr_pipeline_tables():
            ADD COLUMN IF NOT EXISTS parent_session_id UUID REFERENCES ocr_pipeline_sessions(id) ON DELETE CASCADE,
            ADD COLUMN IF NOT EXISTS box_index INT,
            ADD COLUMN IF NOT EXISTS grid_editor_result JSONB,
-            ADD COLUMN IF NOT EXISTS structure_result JSONB
+            ADD COLUMN IF NOT EXISTS structure_result JSONB,
+            ADD COLUMN IF NOT EXISTS document_group_id UUID,
+            ADD COLUMN IF NOT EXISTS page_number INT
+        """)
+
+        # Index for document group lookups
+        await conn.execute("""
+            CREATE INDEX IF NOT EXISTS idx_ocr_sessions_document_group
+            ON ocr_pipeline_sessions (document_group_id)
+            WHERE document_group_id IS NOT NULL
        """)


@@ -91,21 +100,26 @@ async def create_session_db(
    original_png: bytes,
    parent_session_id: Optional[str] = None,
    box_index: Optional[int] = None,
+    document_group_id: Optional[str] = None,
+    page_number: Optional[int] = None,
 ) -> Dict[str, Any]:
    """Create a new OCR pipeline session.

    Args:
        parent_session_id: If set, this is a sub-session for a box region.
        box_index: 0-based index of the box this sub-session represents.
+        document_group_id: Groups multi-page uploads into one document.
+        page_number: 1-based page index within the document group.
    """
    pool = await get_pool()
    parent_uuid = uuid.UUID(parent_session_id) if parent_session_id else None
+    group_uuid = uuid.UUID(document_group_id) if document_group_id else None
    async with pool.acquire() as conn:
        row = await conn.fetchrow("""
            INSERT INTO ocr_pipeline_sessions (
                id, name, filename, original_png, status, current_step,
-                parent_session_id, box_index
-            ) VALUES ($1, $2, $3, $4, 'active', 1, $5, $6)
+                parent_session_id, box_index, document_group_id, page_number
+            ) VALUES ($1, $2, $3, $4, 'active', 1, $5, $6, $7, $8)
            RETURNING id, name, filename, status, current_step,
                      orientation_result, crop_result,
                      deskew_result, dewarp_result, column_result, row_result,
@@ -114,9 +128,10 @@ async def create_session_db(
                      document_category, pipeline_log,
                      grid_editor_result, structure_result,
                      parent_session_id, box_index,
+                      document_group_id, page_number,
                      created_at, updated_at
        """, uuid.UUID(session_id), name, filename, original_png,
-            parent_uuid, box_index)
+            parent_uuid, box_index, group_uuid, page_number)

        return _row_to_dict(row)

@@ -134,6 +149,7 @@ async def get_session_db(session_id: str) -> Optional[Dict[str, Any]]:
                   document_category, pipeline_log,
                   grid_editor_result, structure_result,
                   parent_session_id, box_index,
+                   document_group_id, page_number,
                   created_at, updated_at
            FROM ocr_pipeline_sessions WHERE id = $1
        """, uuid.UUID(session_id))
@@ -186,6 +202,7 @@ async def update_session_db(session_id: str, **kwargs) -> Optional[Dict[str, Any
        'document_category', 'pipeline_log',
        'grid_editor_result', 'structure_result',
        'parent_session_id', 'box_index',
+        'document_group_id', 'page_number',
    }

    jsonb_fields = {'orientation_result', 'crop_result', 'deskew_result', 'dewarp_result', 'column_result', 'row_result', 'word_result', 'ground_truth', 'handwriting_removal_meta', 'doc_type_result', 'pipeline_log', 'grid_editor_result', 'structure_result'}
@@ -217,8 +234,9 @@ async def update_session_db(session_id: str, **kwargs) -> Optional[Dict[str, Any
                      word_result, ground_truth, auto_shear_degrees,
                      doc_type, doc_type_result,
                      document_category, pipeline_log,
-                      grid_editor_result,
+                      grid_editor_result, structure_result,
                      parent_session_id, box_index,
+                      document_group_id, page_number,
                      created_at, updated_at
        """, *values)

@@ -238,19 +256,28 @@ async def list_sessions_db(
    """
    pool = await get_pool()
    async with pool.acquire() as conn:
-        where = "" if include_sub_sessions else "WHERE parent_session_id IS NULL"
+        where = "" if include_sub_sessions else "WHERE parent_session_id IS NULL AND (status IS NULL OR status != 'split')"
        rows = await conn.fetch(f"""
            SELECT id, name, filename, status, current_step,
                   document_category, doc_type,
                   parent_session_id, box_index,
-                   created_at, updated_at
+                   document_group_id, page_number,
+                   created_at, updated_at,
+                   ground_truth
            FROM ocr_pipeline_sessions
            {where}
            ORDER BY created_at DESC
            LIMIT $1
        """, limit)

-        return [_row_to_dict(row) for row in rows]
+        results = []
+        for row in rows:
+            d = _row_to_dict(row)
+            # Derive is_ground_truth flag from JSONB, then drop the heavy field
+            gt = d.pop("ground_truth", None) or {}
+            d["is_ground_truth"] = bool(gt.get("build_grid_reference"))
+            results.append(d)
+        return results


 async def get_sub_sessions(parent_session_id: str) -> List[Dict[str, Any]]:
@@ -261,6 +288,7 @@ async def get_sub_sessions(parent_session_id: str) -> List[Dict[str, Any]]:
            SELECT id, name, filename, status, current_step,
                   document_category, doc_type,
                   parent_session_id, box_index,
+                   document_group_id, page_number,
                   created_at, updated_at
            FROM ocr_pipeline_sessions
            WHERE parent_session_id = $1
@@ -270,6 +298,24 @@ async def get_sub_sessions(parent_session_id: str) -> List[Dict[str, Any]]:
        return [_row_to_dict(row) for row in rows]


+async def get_document_group_sessions(document_group_id: str) -> List[Dict[str, Any]]:
+    """Get all sessions in a document group, ordered by page_number."""
+    pool = await get_pool()
+    async with pool.acquire() as conn:
+        rows = await conn.fetch("""
+            SELECT id, name, filename, status, current_step,
+                   document_category, doc_type,
+                   parent_session_id, box_index,
+                   document_group_id, page_number,
+                   created_at, updated_at
+            FROM ocr_pipeline_sessions
+            WHERE document_group_id = $1
+            ORDER BY page_number ASC
+        """, uuid.UUID(document_group_id))
+
+        return [_row_to_dict(row) for row in rows]
+
+
 async def list_ground_truth_sessions_db() -> List[Dict[str, Any]]:
    """List sessions that have a build_grid_reference in ground_truth."""
    pool = await get_pool()
@@ -324,7 +370,7 @@ def _row_to_dict(row: asyncpg.Record) -> Dict[str, Any]:
    result = dict(row)

    # UUID → string
-    for key in ['id', 'session_id', 'parent_session_id']:
+    for key in ['id', 'session_id', 'parent_session_id', 'document_group_id']:
        if key in result and result[key] is not None:
            result[key] = str(result[key])
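Reviewer note: `list_sessions_db` now selects `ground_truth` only to derive a boolean and then drops the heavy JSONB payload. A minimal sketch of that post-processing step, assuming the rows are already plain dicts with parsed JSON (the function name `slim_session_rows` is hypothetical):

```python
from typing import Any, Dict, List


def slim_session_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replace the ground_truth payload with an is_ground_truth flag."""
    results = []
    for row in rows:
        d = dict(row)  # copy so the input rows stay untouched
        gt = d.pop("ground_truth", None) or {}
        d["is_ground_truth"] = bool(gt.get("build_grid_reference"))
        results.append(d)
    return results


rows = [
    {"id": "a", "ground_truth": {"build_grid_reference": {"cells": []}}},
    {"id": "b", "ground_truth": {"build_grid_reference": {"cells": [1]}}},
    {"id": "c", "ground_truth": None},
]
slim = slim_session_rows(rows)
```

Because the flag uses `bool(...)`, an empty reference object counts as "no ground truth", but the first row above still counts because its dict is non-empty.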

--- a/klausur-service/backend/ocr_pipeline_sessions.py
+++ b/klausur-service/backend/ocr_pipeline_sessions.py
@@ -71,13 +71,36 @@ async def create_session(
    file: UploadFile = File(...),
    name: Optional[str] = Form(None),
 ):
-    """Upload a PDF or image file and create a pipeline session."""
+    """Upload a PDF or image file and create a pipeline session.
+
+    For multi-page PDFs (> 1 page), each page becomes its own session
+    grouped under a ``document_group_id``.  The response includes a
+    ``pages`` array with one entry per page/session.
+    """
    file_data = await file.read()
    filename = file.filename or "upload"
    content_type = file.content_type or ""

-    session_id = str(uuid.uuid4())
    is_pdf = content_type == "application/pdf" or filename.lower().endswith(".pdf")
+    session_name = name or filename
+
+    # --- Multi-page PDF handling ---
+    if is_pdf:
+        try:
+            import fitz  # PyMuPDF
+            pdf_doc = fitz.open(stream=file_data, filetype="pdf")
+            page_count = pdf_doc.page_count
+            pdf_doc.close()
+        except Exception as e:
+            raise HTTPException(status_code=400, detail=f"Could not read PDF: {e}")
+
+        if page_count > 1:
+            return await _create_multi_page_sessions(
+                file_data, filename, session_name, page_count,
+            )
+
+    # --- Single page (image or 1-page PDF) ---
+    session_id = str(uuid.uuid4())

    try:
        if is_pdf:
@@ -93,7 +116,6 @@ async def create_session(
        raise HTTPException(status_code=500, detail="Failed to encode image")

    original_png = png_buf.tobytes()
-    session_name = name or filename

    # Persist to DB
    await create_session_db(
@@ -134,6 +156,86 @@ async def create_session(
    }


+async def _create_multi_page_sessions(
+    pdf_data: bytes,
+    filename: str,
+    base_name: str,
+    page_count: int,
+) -> dict:
+    """Create one session per PDF page, grouped by document_group_id."""
+    document_group_id = str(uuid.uuid4())
+    pages = []
+
+    for page_idx in range(page_count):
+        session_id = str(uuid.uuid4())
+        page_name = f"{base_name} — Seite {page_idx + 1}"
+
+        try:
+            img_bgr = render_pdf_high_res(pdf_data, page_number=page_idx, zoom=3.0)
+        except Exception as e:
+            logger.warning("Failed to render PDF page %d: %s", page_idx + 1, e)
+            continue
+
+        ok, png_buf = cv2.imencode(".png", img_bgr)
+        if not ok:
+            continue
+        page_png = png_buf.tobytes()
+
+        await create_session_db(
+            session_id=session_id,
+            name=page_name,
+            filename=filename,
+            original_png=page_png,
+            document_group_id=document_group_id,
+            page_number=page_idx + 1,
+        )
+
+        _cache[session_id] = {
+            "id": session_id,
+            "filename": filename,
+            "name": page_name,
+            "original_bgr": img_bgr,
+            "oriented_bgr": None,
+            "cropped_bgr": None,
+            "deskewed_bgr": None,
+            "dewarped_bgr": None,
+            "orientation_result": None,
+            "crop_result": None,
+            "deskew_result": None,
+            "dewarp_result": None,
+            "ground_truth": {},
+            "current_step": 1,
+        }
+
+        h, w = img_bgr.shape[:2]
+        pages.append({
+            "session_id": session_id,
+            "name": page_name,
+            "page_number": page_idx + 1,
+            "image_width": w,
+            "image_height": h,
+            "original_image_url": f"/api/v1/ocr-pipeline/sessions/{session_id}/image/original",
+        })
+
+        logger.info(
+            "OCR Pipeline: created page session %s (page %d/%d) from %s (%dx%d)",
+            session_id, page_idx + 1, page_count, filename, w, h,
+        )
+
+    # Include session_id pointing to first page for backwards compatibility
+    # (frontends that expect a single session_id will navigate to page 1)
+    first_session_id = pages[0]["session_id"] if pages else None
+
+    return {
+        "session_id": first_session_id,
+        "document_group_id": document_group_id,
+        "filename": filename,
+        "name": base_name,
+        "page_count": len(pages),  # pages that failed to render are skipped above
+        "pages": pages,
+    }
+
+
 @router.get("/sessions/{session_id}")
 async def get_session_info(session_id: str):
    """Get session info including deskew/dewarp/column results for step navigation."""
@@ -191,12 +293,12 @@ async def get_session_info(session_id: str):
    if session.get("ground_truth"):
        result["ground_truth"] = session["ground_truth"]

-    # Sub-session info
+    # Box sub-session info (zone_type='box' from column detection — NOT page-split)
    if session.get("parent_session_id"):
        result["parent_session_id"] = session["parent_session_id"]
        result["box_index"] = session.get("box_index")
    else:
-        # Check for sub-sessions
+        # Check for box sub-sessions (column detection creates these)
        subs = await get_sub_sessions(session_id)
        if subs:
            result["sub_sessions"] = [
--- a/klausur-service/backend/orientation_crop_api.py
+++ b/klausur-service/backend/orientation_crop_api.py
@@ -238,8 +238,8 @@ async def detect_page_split(session_id: str):
        "duration_seconds": round(duration, 2),
    }

-    # Mark parent session as split (store info in crop_result for backward compat)
-    await update_session_db(session_id, crop_result=split_info)
+    # Mark parent session as split and hidden from session list
+    await update_session_db(session_id, crop_result=split_info, status='split')
    cached["crop_result"] = split_info

    await _append_pipeline_log(session_id, "page_split", {
@@ -346,6 +346,7 @@ async def auto_crop(session_id: str):
            cropped_png=png_buf.tobytes() if ok else b"",
            crop_result=crop_info,
            current_step=5,
+            status='split',
        )

        logger.info(
@@ -461,8 +462,6 @@ async def _create_page_sub_sessions(
            name=sub_name,
            filename=parent_filename,
            original_png=page_png,
-            parent_session_id=parent_session_id,
-            box_index=pi,
        )

        # Pre-populate: set cropped = original (already cropped)
@@ -540,8 +539,6 @@ async def _create_page_sub_sessions_full(
            name=sub_name,
            filename=parent_filename,
            original_png=page_png,
-            parent_session_id=parent_session_id,
-            box_index=pi,
        )

        # start_step=2 → ready for deskew (orientation already done on spread)
@@ -553,7 +550,6 @@ async def _create_page_sub_sessions_full(
            "id": sub_id,
            "filename": parent_filename,
            "name": sub_name,
-            "parent_session_id": parent_session_id,
            "original_bgr": page_bgr,
            "oriented_bgr": None,
            "cropped_bgr": None,
--- a/klausur-service/backend/page_crop.py
+++ b/klausur-service/backend/page_crop.py
@@ -457,6 +457,164 @@ def _detect_spine_shadow(
    return spine_x


+def _detect_gutter_continuity(
+    gray: np.ndarray,
+    search_region: np.ndarray,
+    offset_x: int,
+    w: int,
+    side: str,
+) -> Optional[int]:
+    """Detect gutter shadow via vertical continuity analysis.
+
+    Camera book scans produce a subtle brightness gradient at the gutter
+    that is too faint for scanner-shadow detection (range < 40).  However,
+    the gutter shadow has a unique property: it runs **continuously from
+    top to bottom** without interruption.  Text and images always have
+    vertical gaps between lines, paragraphs, or sections.
+
+    Algorithm:
+    1. Divide image into N horizontal strips (~60px each)
+    2. For each column, compute what fraction of strips are darker than
+       page_median − 5 (median of the center 50% of the full image)
+    3. Smooth the profile; bail out unless its peak reaches 0.70
+    4. From the peak, scan toward the page center to the first column
+       whose smoothed fraction drops below 0.50 (gutter inner edge)
+    5. Validate: band is 0.5%-10% of image width and ≥ 3 levels darker
+
+    Args:
+        gray: Full grayscale image.
+        search_region: Edge slice of the grayscale image.
+        offset_x: X offset of search_region relative to full image.
+        w: Full image width.
+        side: 'left' or 'right'.
+
+    Returns:
+        X coordinate (in full image) of the gutter inner edge, or None.
+    """
+    region_h, region_w = search_region.shape[:2]
+    if region_w < 20 or region_h < 100:
+        return None
+
+    # --- 1. Divide into horizontal strips ---
+    strip_target_h = 60  # ~60px per strip
+    n_strips = max(10, region_h // strip_target_h)
+    strip_h = region_h // n_strips
+
+    strip_means = np.zeros((n_strips, region_w), dtype=np.float64)
+    for s in range(n_strips):
+        y0 = s * strip_h
+        y1 = min((s + 1) * strip_h, region_h)
+        strip_means[s] = np.mean(search_region[y0:y1, :], axis=0)
+
+    # --- 2. Page median from center 50% of full image ---
+    center_lo = w // 4
+    center_hi = 3 * w // 4
+    page_median = float(np.median(gray[:, center_lo:center_hi]))
+
+    # Camera shadows are subtle — threshold just 5 levels below page median
+    dark_thresh = page_median - 5.0
+
+    # If page is very dark overall (e.g. photo, not a book page), bail out
+    if page_median < 180:
+        return None
+
+    # --- 3. Per-column dark fraction ---
+    dark_count = np.sum(strip_means < dark_thresh, axis=0).astype(np.float64)
+    dark_frac = dark_count / n_strips  # shape: (region_w,)
+
+    # --- 4. Smooth and find transition ---
+    # Rolling mean (window = 1% of image width, min 5)
+    smooth_w = max(5, w // 100)
+    if smooth_w % 2 == 0:
+        smooth_w += 1
+    kernel = np.ones(smooth_w) / smooth_w
+    frac_smooth = np.convolve(dark_frac, kernel, mode="same")
+
+    # Trim convolution edges
+    margin = smooth_w // 2
+    if region_w <= 2 * margin + 10:
+        return None
+
+    # Find the peak of dark fraction (gutter center).
+    # For right gutters the peak is near the edge; for left gutters
+    # (V-shaped spine shadow) the peak may be well inside the region.
+    transition_thresh = 0.50
+    peak_frac = float(np.max(frac_smooth[margin:region_w - margin]))
+
+    if peak_frac < 0.70:
+        logger.debug(
+            "%s gutter: peak dark fraction %.2f < 0.70", side.capitalize(), peak_frac,
+        )
+        return None
+
+    peak_x = int(np.argmax(frac_smooth[margin:region_w - margin])) + margin
+    gutter_inner = None  # local x in search_region
+
+    if side == "right":
+        # Scan from peak toward the page center (leftward)
+        for x in range(peak_x, margin, -1):
+            if frac_smooth[x] < transition_thresh:
+                gutter_inner = x + 1
+                break
+    else:
+        # Scan from peak toward the page center (rightward)
+        for x in range(peak_x, region_w - margin):
+            if frac_smooth[x] < transition_thresh:
+                gutter_inner = x - 1
+                break
+
+    if gutter_inner is None:
+        return None
+
+    # --- 5. Validate gutter width ---
+    if side == "right":
+        gutter_width = region_w - gutter_inner
+    else:
+        gutter_width = gutter_inner
+
+    min_gutter = max(3, int(w * 0.005))   # at least 0.5% of image
+    max_gutter = int(w * 0.10)            # at most 10% of image
+
+    if gutter_width < min_gutter:
+        logger.debug(
+            "%s gutter: too narrow (%dpx < %dpx)", side.capitalize(),
+            gutter_width, min_gutter,
+        )
+        return None
+
+    if gutter_width > max_gutter:
+        logger.debug(
+            "%s gutter: too wide (%dpx > %dpx)", side.capitalize(),
+            gutter_width, max_gutter,
+        )
+        return None
+
+    # Check that the gutter band is meaningfully darker than the page
+    if side == "right":
+        gutter_brightness = float(np.mean(strip_means[:, gutter_inner:]))
+    else:
+        gutter_brightness = float(np.mean(strip_means[:, :gutter_inner]))
+
+    brightness_drop = page_median - gutter_brightness
+    if brightness_drop < 3:
+        logger.debug(
+            "%s gutter: insufficient brightness drop (%.1f levels)",
+            side.capitalize(), brightness_drop,
+        )
+        return None
+
+    gutter_x = offset_x + gutter_inner
+
+    logger.info(
+        "%s gutter (continuity): x=%d, width=%dpx (%.1f%%), "
+        "brightness=%.0f vs page=%.0f (drop=%.0f), frac@edge=%.2f",
+        side.capitalize(), gutter_x, gutter_width,
+        100.0 * gutter_width / w, gutter_brightness, page_median,
+        brightness_drop, float(frac_smooth[gutter_inner]),
+    )
+    return gutter_x
+
+
 def _detect_left_edge_shadow(
    gray: np.ndarray,
    binary: np.ndarray,
@@ -465,15 +623,22 @@ def _detect_left_edge_shadow(
 ) -> int:
    """Detect left content edge, accounting for book-spine shadow.

-    Looks at the left 25% for a scanner gray strip.  Cuts at the
-    darkest column (= spine center).  Fallback: binary projection.
+    Tries three methods in order:
+    1. Scanner spine-shadow (dark gradient, range > 40)
+    2. Camera gutter continuity (subtle shadow running top-to-bottom)
+    3. Binary projection fallback (first ink column)
    """
    search_w = max(1, w // 4)
    spine_x = _detect_spine_shadow(gray, gray[:, :search_w], 0, w, "left")
    if spine_x is not None:
        return spine_x

-    # Fallback: binary vertical projection
+    # Fallback 1: vertical continuity (camera gutter shadow)
+    gutter_x = _detect_gutter_continuity(gray, gray[:, :search_w], 0, w, "left")
+    if gutter_x is not None:
+        return gutter_x
+
+    # Fallback 2: binary vertical projection
    return _detect_edge_projection(binary, axis=0, from_start=True, dim=w)


@@ -485,8 +650,10 @@ def _detect_right_edge_shadow(
 ) -> int:
    """Detect right content edge, accounting for book-spine shadow.

-    Looks at the right 25% for a scanner gray strip.  Cuts at the
-    darkest column (= spine center).  Fallback: binary projection.
+    Tries three methods in order:
+    1. Scanner spine-shadow (dark gradient, range > 40)
+    2. Camera gutter continuity (subtle shadow running top-to-bottom)
+    3. Binary projection fallback (last ink column)
    """
    search_w = max(1, w // 4)
    right_start = w - search_w
@@ -494,7 +661,12 @@ def _detect_right_edge_shadow(
    if spine_x is not None:
        return spine_x

-    # Fallback: binary vertical projection
+    # Fallback 1: vertical continuity (camera gutter shadow)
+    gutter_x = _detect_gutter_continuity(gray, gray[:, right_start:], right_start, w, "right")
+    if gutter_x is not None:
+        return gutter_x
+
+    # Fallback 2: binary vertical projection
    return _detect_edge_projection(binary, axis=0, from_start=False, dim=w)
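Reviewer note: the core idea of `_detect_gutter_continuity` is the per-column "dark fraction" across horizontal strips. The sketch below is a deliberately simplified illustration on a synthetic image. It omits the smoothing, the peak and transition search, and the width and brightness validation from the diff, and the function name `dark_fraction_profile` is hypothetical.

```python
import numpy as np


def dark_fraction_profile(region: np.ndarray, page_median: float,
                          delta: float = 5.0, strip_h: int = 60) -> np.ndarray:
    """Per-column fraction of horizontal strips darker than page_median - delta."""
    h = region.shape[0]
    n_strips = max(10, h // strip_h)
    step = h // n_strips
    # Mean brightness of every column within each strip: shape (n_strips, width)
    strip_means = np.stack([
        region[s * step:(s + 1) * step, :].mean(axis=0)
        for s in range(n_strips)
    ])
    return (strip_means < page_median - delta).mean(axis=0)


# Synthetic 600x200 page: bright paper, a faint shadow that runs through the
# 12 right-most columns from top to bottom, and two separate text blocks.
page = np.full((600, 200), 230.0)
page[:, 188:] = 215.0            # continuous gutter shadow
page[120:240, 40:150] = 60.0     # text block covering strips 2 and 3
page[360:420, 40:150] = 60.0     # text block covering strip 6

profile = dark_fraction_profile(page, page_median=float(np.median(page)))
first_gutter_col = int(np.argmax(profile >= 0.7))
```

In this synthetic case the text columns are dark in only 3 of 10 strips, while the shadow columns are dark in all of them, which is the property the detector relies on. Densely set text with line gaps smaller than the strip height would not separate this cleanly, which is presumably why the diff adds the width and brightness checks.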


--- a/klausur-service/backend/requirements.txt
+++ b/klausur-service/backend/requirements.txt
@@ -35,6 +35,9 @@ onnxruntime
 # IPA pronunciation dictionary lookup (MIT license, bundled CMU dict ~134k words)
 eng-to-ipa

+# German IPA rule-based fallback for OOV words (MIT license)
+epitran
+
 # Spell-checker for rule-based OCR correction (MIT license)
 pyspellchecker>=0.8.1

--- a/klausur-service/backend/smart_spell.py
+++ b/klausur-service/backend/smart_spell.py
@@ -0,0 +1,594 @@
+"""
+SmartSpellChecker — Language-aware OCR post-correction without LLMs.
+
+Uses pyspellchecker (MIT) with dual EN+DE dictionaries for:
+- Automatic language detection per word (dual-dictionary heuristic)
+- OCR error correction (digit↔letter, umlauts, transpositions)
+- Context-based disambiguation (a/I, l/I) via bigram lookup
+- Mixed-language support for example sentences
+
+License: Apache 2.0 (commercial use permitted)
+"""
+
+import logging
+import re
+from dataclasses import dataclass, field
+from typing import Dict, List, Literal, Optional, Set, Tuple
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Init
+# ---------------------------------------------------------------------------
+
+try:
+    from spellchecker import SpellChecker as _SpellChecker
+    _en_spell = _SpellChecker(language='en', distance=1)
+    _de_spell = _SpellChecker(language='de', distance=1)
+    _AVAILABLE = True
+except ImportError:
+    _AVAILABLE = False
+    logger.warning("pyspellchecker not installed — SmartSpellChecker disabled")
+
+Lang = Literal["en", "de", "both", "unknown"]
+
+# ---------------------------------------------------------------------------
+# Bigram context for a/I disambiguation
+# ---------------------------------------------------------------------------
+
+# Words that commonly follow "I" (subject pronoun → verb/modal)
+_I_FOLLOWERS: frozenset = frozenset({
+    "am", "was", "have", "had", "do", "did", "will", "would", "can",
+    "could", "should", "shall", "may", "might", "must",
+    "think", "know", "see", "want", "need", "like", "love", "hate",
+    "go", "went", "come", "came", "say", "said", "get", "got",
+    "make", "made", "take", "took", "give", "gave", "tell", "told",
+    "feel", "felt", "find", "found", "believe", "hope", "wish",
+    "remember", "forget", "understand", "mean", "meant",
+    "don't", "didn't", "can't", "won't", "couldn't", "wouldn't",
+    "shouldn't", "haven't", "hadn't", "isn't", "wasn't",
+    "really", "just", "also", "always", "never", "often", "sometimes",
+})
+
+# Words that commonly follow "a" (article → noun/adjective)
+_A_FOLLOWERS: frozenset = frozenset({
+    "lot", "few", "little", "bit", "good", "bad", "great", "new", "old",
+    "long", "short", "big", "small", "large", "huge", "tiny",
+    "nice", "beautiful", "wonderful", "terrible", "horrible",
+    "man", "woman", "boy", "girl", "child", "dog", "cat", "bird",
+    "book", "car", "house", "room", "school", "teacher", "student",
+    "day", "week", "month", "year", "time", "place", "way",
+    "friend", "family", "person", "problem", "question", "story",
+    "very", "really", "quite", "rather", "pretty", "single",
+})
+
+# Digit→letter substitutions (OCR confusion)
+_DIGIT_SUBS: Dict[str, List[str]] = {
+    '0': ['o', 'O'],
+    '1': ['l', 'I'],
+    '5': ['s', 'S'],
+    '6': ['g', 'G'],
+    '8': ['b', 'B'],
+    '|': ['I', 'l'],
+    '/': ['l'],  # italic 'l' misread as slash (e.g. "p/" → "pl")
+}
+_SUSPICIOUS_CHARS = frozenset(_DIGIT_SUBS.keys())
+
+# Umlaut confusion: OCR drops dots (ü→u, ä→a, ö→o); the map also tries i→ü
+_UMLAUT_MAP = {
+    'a': 'ä', 'o': 'ö', 'u': 'ü', 'i': 'ü',
+    'A': 'Ä', 'O': 'Ö', 'U': 'Ü', 'I': 'Ü',
+}
+
+# Tokenizer — includes | and / so OCR artifacts like "p/" are treated as words
+_TOKEN_RE = re.compile(r"([A-Za-zÄÖÜäöüß'|/]+)([^A-Za-zÄÖÜäöüß'|/]*)")
+
+
+# ---------------------------------------------------------------------------
+# Data types
+# ---------------------------------------------------------------------------
+
+@dataclass
+class CorrectionResult:
+    original: str
+    corrected: str
+    lang_detected: Lang
+    changed: bool
+    changes: List[str] = field(default_factory=list)
+
+
+# ---------------------------------------------------------------------------
+# Core class
+# ---------------------------------------------------------------------------
+
+class SmartSpellChecker:
+    """Language-aware OCR spell checker using pyspellchecker (no LLM)."""
+
+    def __init__(self):
+        if not _AVAILABLE:
+            raise RuntimeError("pyspellchecker not installed")
+        self.en = _en_spell
+        self.de = _de_spell
+
+    # --- Language detection ---
+
+    def detect_word_lang(self, word: str) -> Lang:
+        """Detect language of a single word using dual-dict heuristic."""
+        w = word.lower().strip(".,;:!?\"'()")
+        if not w:
+            return "unknown"
+        in_en = bool(self.en.known([w]))
+        in_de = bool(self.de.known([w]))
+        if in_en and in_de:
+            return "both"
+        if in_en:
+            return "en"
+        if in_de:
+            return "de"
+        return "unknown"
+
+    def detect_text_lang(self, text: str) -> Lang:
+        """Detect dominant language of a text string (sentence/phrase)."""
+        words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
+        if not words:
+            return "unknown"
+
+        en_count = 0
+        de_count = 0
+        for w in words:
+            lang = self.detect_word_lang(w)
+            if lang == "en":
+                en_count += 1
+            elif lang == "de":
+                de_count += 1
+            # "both" doesn't count for either
+
+        if en_count > de_count:
+            return "en"
+        if de_count > en_count:
+            return "de"
+        if en_count == de_count and en_count > 0:
+            return "both"
+        return "unknown"
+
+    # --- Single-word correction ---
+
+    def _known(self, word: str) -> bool:
+        """True if word is known in EN or DE dictionary, or is a known abbreviation."""
+        w = word.lower()
+        if bool(self.en.known([w])) or bool(self.de.known([w])):
+            return True
+        # Also accept known abbreviations (sth, sb, adj, etc.)
+        try:
+            from cv_ocr_engines import _KNOWN_ABBREVIATIONS
+            if w in _KNOWN_ABBREVIATIONS:
+                return True
+        except ImportError:
+            pass
+        return False
+
+    def _word_freq(self, word: str) -> float:
+        """Get word frequency (max of EN and DE)."""
+        w = word.lower()
+        return max(self.en.word_usage_frequency(w), self.de.word_usage_frequency(w))
+
+    def _known_in(self, word: str, lang: str) -> bool:
+        """True if word is known in a specific language dictionary."""
+        w = word.lower()
+        spell = self.en if lang == "en" else self.de
+        return bool(spell.known([w]))
+
+    def correct_word(self, word: str, lang: str = "en",
+                     prev_word: str = "", next_word: str = "") -> Optional[str]:
+        """Correct a single word for the given language.
+
+        Returns None if no correction needed, or the corrected string.
+
+        Args:
+            word: The word to check/correct
+            lang: Expected language ("en" or "de")
+            prev_word: Previous word (for context)
+            next_word: Next word (for context)
+        """
+        if not word or not word.strip():
+            return None
+
+        # Skip numbers and abbreviations with dots
+        if word.isdigit() or '.' in word:
+            return None
+
+        # Skip IPA/phonetic content in brackets
+        if '[' in word or ']' in word:
+            return None
+
+        has_suspicious = any(ch in _SUSPICIOUS_CHARS for ch in word)
+
+        # 1. Already known → no fix
+        if self._known(word):
+            # But check a/I disambiguation for single-char words
+            if word.lower() in ('l', '|') and next_word:
+                return self._disambiguate_a_I(word, next_word)
+            return None
+
+        # 2. Digit/pipe substitution
+        if has_suspicious:
+            if word == '|':
+                return 'I'
+            # Try single-char substitutions
+            for i, ch in enumerate(word):
+                if ch not in _DIGIT_SUBS:
+                    continue
+                for replacement in _DIGIT_SUBS[ch]:
+                    candidate = word[:i] + replacement + word[i + 1:]
+                    if self._known(candidate):
+                        return candidate
+            # Try multi-char substitution (e.g., "sch00l" → "school")
+            multi = self._try_multi_digit_sub(word)
+            if multi:
+                return multi
+
+        # 3. Umlaut correction (German)
+        if lang == "de" and len(word) >= 3 and word.isalpha():
+            umlaut_fix = self._try_umlaut_fix(word)
+            if umlaut_fix:
+                return umlaut_fix
+
+        # 4. General spell correction
+        if not has_suspicious and len(word) >= 3 and word.isalpha():
+            # Safety: don't correct if the word is valid in the OTHER language
+            # (either directly or via umlaut fix)
+            other_lang = "de" if lang == "en" else "en"
+            if self._known_in(word, other_lang):
+                return None
+            if other_lang == "de" and self._try_umlaut_fix(word):
+                return None  # has a valid DE umlaut variant → don't touch
+
+            spell = self.en if lang == "en" else self.de
+            correction = spell.correction(word.lower())
+            if correction and correction != word.lower():
+                if word[0].isupper():
+                    correction = correction[0].upper() + correction[1:]
+                if self._known(correction):
+                    return correction
+
+        return None
+
+    # --- Multi-digit substitution ---
+
+    def _try_multi_digit_sub(self, word: str) -> Optional[str]:
+        """Try replacing several suspicious characters at once.
+
+        correct_word tries single-character substitutions first; this
+        method covers tokens such as "sch00l" where more than one character
+        was misread. Each suspicious position either keeps its original
+        character or takes one of its _DIGIT_SUBS replacements.
+
+        Words with more than 4 suspicious positions are skipped. With at
+        most 2 replacements per position, that bounds the search at
+        3^4 = 81 candidates.
+
+        Returns the first candidate that is a known word, or None.
+        """
+        positions = [(i, ch) for i, ch in enumerate(word) if ch in _DIGIT_SUBS]
+        if not positions or len(positions) > 4:
+            return None
+
+        # Expand breadth-first, one suspicious position at a time; the queue
+        # holds every variant for the positions handled so far.
+        queue = [list(word)]
+        for pos, ch in positions:
+            next_queue = []
+            for current in queue:
+                # Keep the original character at this position
+                next_queue.append(current[:])
+                # Try each substitution
+                for repl in _DIGIT_SUBS[ch]:
+                    variant = current[:]
+                    variant[pos] = repl
+                    next_queue.append(variant)
+            queue = next_queue
+
+        # Check the combinations in queue order and return the first one
+        # that differs from the input and is a known word.
+        for combo in queue:
+            candidate = "".join(combo)
+            if candidate != word and self._known(candidate):
+                return candidate
+
+        return None
+
+    # --- Umlaut fix ---
+
+    def _try_umlaut_fix(self, word: str) -> Optional[str]:
+        """Try single-char umlaut substitutions for German words."""
+        for i, ch in enumerate(word):
+            if ch in _UMLAUT_MAP:
+                candidate = word[:i] + _UMLAUT_MAP[ch] + word[i + 1:]
+                if self._known(candidate):
+                    return candidate
+        return None
+
+    # --- Boundary repair (shifted word boundaries) ---
+
+    def _try_boundary_repair(self, word1: str, word2: str) -> Optional[Tuple[str, str]]:
+        """Fix shifted word boundaries between adjacent tokens.
+
+        OCR sometimes shifts the boundary: "at sth." → "ats th."
+        Try moving 1-2 chars from end of word1 to start of word2 and vice versa.
+        Returns (fixed_word1, fixed_word2) or None.
+        """
+        # Import known abbreviations for vocabulary context
+        try:
+            from cv_ocr_engines import _KNOWN_ABBREVIATIONS
+        except ImportError:
+            _KNOWN_ABBREVIATIONS = set()
+
+        # Strip trailing punctuation for checking, preserve for result
+        w2_stripped = word2.rstrip(".,;:!?")
+        w2_punct = word2[len(w2_stripped):]
+
+        # Try shifting 1-2 chars from word1 → word2
+        for shift in (1, 2):
+            if len(word1) <= shift:
+                continue
+            new_w1 = word1[:-shift]
+            new_w2_base = word1[-shift:] + w2_stripped
+
+            w1_ok = self._known(new_w1) or new_w1.lower() in _KNOWN_ABBREVIATIONS
+            w2_ok = self._known(new_w2_base) or new_w2_base.lower() in _KNOWN_ABBREVIATIONS
+
+            if w1_ok and w2_ok:
+                return (new_w1, new_w2_base + w2_punct)
+
+        # Try shifting 1-2 chars from word2 → word1
+        for shift in (1, 2):
+            if len(w2_stripped) <= shift:
+                continue
+            new_w1 = word1 + w2_stripped[:shift]
+            new_w2_base = w2_stripped[shift:]
+
+            w1_ok = self._known(new_w1) or new_w1.lower() in _KNOWN_ABBREVIATIONS
+            w2_ok = self._known(new_w2_base) or new_w2_base.lower() in _KNOWN_ABBREVIATIONS
+
+            if w1_ok and w2_ok:
+                return (new_w1, new_w2_base + w2_punct)
+
+        return None
+
+    # --- Context-based word split for ambiguous merges ---
+
+    # Patterns where a valid word is actually "a" + adjective/noun
+    _ARTICLE_SPLIT_CANDIDATES = {
+        # word → (article, remainder) — only when followed by a compatible word
+        "anew": ("a", "new"),
+        "areal": ("a", "real"),
+        "alive": None,    # genuinely one word, never split
+        "alone": None,
+        "aware": None,
+        "alike": None,
+        "apart": None,
+        "aside": None,
+        "above": None,
+        "about": None,
+        "among": None,
+        "along": None,
+    }
+
+    def _try_context_split(self, word: str, next_word: str,
+                           prev_word: str) -> Optional[str]:
+        """Split words like 'anew' → 'a new' when context indicates a merge.
+
+        Only splits when:
+        - The word is in the split candidates list
+        - The following word makes sense as a noun (for "a + adj + noun" pattern)
+        - OR the word is unknown and can be split into article + known word
+        """
+        w_lower = word.lower()
+
+        # Check explicit candidates
+        if w_lower in self._ARTICLE_SPLIT_CANDIDATES:
+            split = self._ARTICLE_SPLIT_CANDIDATES[w_lower]
+            if split is None:
+                return None  # explicitly marked as "don't split"
+            article, remainder = split
+            # Only split if followed by a word (noun pattern)
+            if next_word and next_word[0].islower():
+                return f"{article} {remainder}"
+            # Also split if remainder + next_word makes a common phrase
+            if next_word and self._known(next_word):
+                return f"{article} {remainder}"
+
+        # Generic: if word starts with 'a' and rest is a known adjective/word
+        if (len(word) >= 4 and word[0].lower() == 'a'
+                and not self._known(word)  # only for UNKNOWN words
+                and self._known(word[1:])):
+            return f"a {word[1:]}"
+
+        return None
+
+    # --- a/I disambiguation ---
+
+    def _disambiguate_a_I(self, token: str, next_word: str) -> Optional[str]:
+        """Disambiguate 'a' vs 'I' (and OCR variants like 'l', '|')."""
+        nw = next_word.lower().strip(".,;:!?")
+        if nw in _I_FOLLOWERS:
+            return "I"
+        if nw in _A_FOLLOWERS:
+            return "a"
+        # No further heuristic is applied. Capitalization of the next word is
+        # not a reliable signal: an uppercase word after "I" is unusual in
+        # English, although it could be a German noun in mixed-language text.
+        # When the next word is in neither follower set, leave the token as is.
+        return None  # uncertain, don't change
+
+    # --- Full text correction ---
+
+    def correct_text(self, text: str, lang: str = "en") -> CorrectionResult:
+        """Correct a full text string (field value).
+
+        Three passes:
+        1. Boundary repair — fix shifted word boundaries between adjacent tokens
+        2. Context split — split ambiguous merges (anew → a new)
+        3. Per-word correction — spell check individual words
+
+        Args:
+            text: The text to correct
+            lang: Expected language ("en" or "de"), or "auto" to detect it
+        """
+        if not text or not text.strip():
+            return CorrectionResult(text, text, "unknown", False)
+
+        detected = self.detect_text_lang(text) if lang == "auto" else lang
+        effective_lang = detected if detected in ("en", "de") else "en"
+
+        changes: List[str] = []
+        tokens = list(_TOKEN_RE.finditer(text))
+
+        # Extract token list: [(word, separator), ...]
+        token_list: List[List[str]] = []  # [[word, sep], ...]
+        for m in tokens:
+            token_list.append([m.group(1), m.group(2)])
+
+        # --- Pass 1: Boundary repair between adjacent tokens ---
+        # Import abbreviations for the heuristic below
+        try:
+            from cv_ocr_engines import _KNOWN_ABBREVIATIONS as _ABBREVS
+        except ImportError:
+            _ABBREVS = set()
+
+        for i in range(len(token_list) - 1):
+            w1 = token_list[i][0]
+            w2_raw = token_list[i + 1][0]
+
+            # Skip boundary repair for IPA/bracket content
+            # Brackets may be in the token OR in the adjacent separators
+            sep_before_w1 = token_list[i - 1][1] if i > 0 else ""
+            sep_after_w1 = token_list[i][1]
+            sep_after_w2 = token_list[i + 1][1]
+            has_bracket = (
+                '[' in w1 or ']' in w1 or '[' in w2_raw or ']' in w2_raw
+                or ']' in sep_after_w1  # w1 text was inside [brackets]
+                or '[' in sep_after_w1  # w2 starts a bracket
+                or ']' in sep_after_w2  # w2 text was inside [brackets]
+                or '[' in sep_before_w1  # w1 starts a bracket
+            )
+            if has_bracket:
+                continue
+
+            # Include trailing punct from separator in w2 for abbreviation matching
+            w2_with_punct = w2_raw + token_list[i + 1][1].rstrip(" ")
+
+            # Try boundary repair — always, even if both words are valid.
+            # Use word-frequency scoring to decide if repair is better.
+            repair = self._try_boundary_repair(w1, w2_with_punct)
+            if not repair and w2_with_punct != w2_raw:
+                repair = self._try_boundary_repair(w1, w2_raw)
+            if repair:
+                new_w1, new_w2_full = repair
+                new_w2_base = new_w2_full.rstrip(".,;:!?")
+
+                # Frequency-based scoring: product of word frequencies
+                # Higher product = more common word pair = better
+                old_freq = self._word_freq(w1) * self._word_freq(w2_raw)
+                new_freq = self._word_freq(new_w1) * self._word_freq(new_w2_base)
+
+                # Abbreviation bonus: if repair produces a known abbreviation
+                has_abbrev = new_w1.lower() in _ABBREVS or new_w2_base.lower() in _ABBREVS
+                if has_abbrev:
+                    # Accept abbreviation repair ONLY if at least one of the
+                    # original words is rare/unknown (prevents "Can I" → "Ca nI"
+                    # where both original words are common and correct).
+                    # "Rare" = frequency < 1e-6 (covers "ats", "th" but not "Can", "I")
+                    RARE_THRESHOLD = 1e-6
+                    orig_both_common = (
+                        self._word_freq(w1) > RARE_THRESHOLD
+                        and self._word_freq(w2_raw) > RARE_THRESHOLD
+                    )
+                    if not orig_both_common:
+                        new_freq = max(new_freq, old_freq * 10)
+                    else:
+                        has_abbrev = False  # both originals common → don't trust
+
+                # Accept if repair produces a more frequent word pair
+                # (threshold: at least 5x more frequent to avoid false positives)
+                if new_freq > old_freq * 5:
+                    new_w2_punct = new_w2_full[len(new_w2_base):]
+                    changes.append(f"{w1} {w2_raw}→{new_w1} {new_w2_base}")
+                    token_list[i][0] = new_w1
+                    token_list[i + 1][0] = new_w2_base
+                    if new_w2_punct:
+                        token_list[i + 1][1] = new_w2_punct + token_list[i + 1][1].lstrip(".,;:!?")
+
+        # --- Pass 2: Context split (anew → a new) ---
+        expanded: List[List[str]] = []
+        for i, (word, sep) in enumerate(token_list):
+            next_word = token_list[i + 1][0] if i + 1 < len(token_list) else ""
+            prev_word = token_list[i - 1][0] if i > 0 else ""
+            split = self._try_context_split(word, next_word, prev_word)
+            if split and split != word:
+                changes.append(f"{word}→{split}")
+                expanded.append([split, sep])
+            else:
+                expanded.append([word, sep])
+        token_list = expanded
+
+        # --- Pass 3: Per-word correction ---
+        parts: List[str] = []
+
+        # Preserve any leading text before the first token match
+        # (e.g., "(= " before "I won and he lost.")
+        first_start = tokens[0].start() if tokens else 0
+        if first_start > 0:
+            parts.append(text[:first_start])
+
+        for i, (word, sep) in enumerate(token_list):
+            # Skip words inside IPA brackets (brackets land in separators)
+            prev_sep = token_list[i - 1][1] if i > 0 else ""
+            if '[' in prev_sep or ']' in sep:
+                parts.append(word)
+                parts.append(sep)
+                continue
+
+            next_word = token_list[i + 1][0] if i + 1 < len(token_list) else ""
+            prev_word = token_list[i - 1][0] if i > 0 else ""
+
+            correction = self.correct_word(
+                word, lang=effective_lang,
+                prev_word=prev_word, next_word=next_word,
+            )
+            if correction and correction != word:
+                changes.append(f"{word}→{correction}")
+                parts.append(correction)
+            else:
+                parts.append(word)
+            parts.append(sep)
+
+        # Append any trailing text
+        last_end = tokens[-1].end() if tokens else 0
+        if last_end < len(text):
+            parts.append(text[last_end:])
+
+        corrected = "".join(parts)
+        return CorrectionResult(
+            original=text,
+            corrected=corrected,
+            lang_detected=detected,
+            changed=corrected != text,
+            changes=changes,
+        )
+
+    # --- Vocabulary entry correction ---
+
+    def correct_vocab_entry(self, english: str, german: str,
+                            example: str = "") -> Dict[str, CorrectionResult]:
+        """Correct a full vocabulary entry (EN + DE + example).
+
+        Uses column position to determine language — the most reliable signal.
+        """
+        results = {}
+        results["english"] = self.correct_text(english, lang="en")
+        results["german"] = self.correct_text(german, lang="de")
+        if example:
+            # For examples, auto-detect language
+            results["example"] = self.correct_text(example, lang="auto")
+        return results
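Note on `_try_multi_digit_sub` in smart_spell.py above: the position-by-position expansion is a Cartesian product over the suspicious positions, so it can also be written with `itertools.product`. The following is a minimal standalone sketch, not the code in this commit. `KNOWN_WORDS` is a hypothetical stand-in for the EN+DE dictionary lookup done by `self._known`, and `try_multi_sub` is an illustrative name.

```python
from itertools import product
from typing import Dict, List, Optional, Set

# Substitution table copied from smart_spell.py
DIGIT_SUBS: Dict[str, List[str]] = {
    '0': ['o', 'O'], '1': ['l', 'I'], '5': ['s', 'S'],
    '6': ['g', 'G'], '8': ['b', 'B'], '|': ['I', 'l'], '/': ['l'],
}

# Hypothetical stand-in for the dictionary lookup (self._known)
KNOWN_WORDS: Set[str] = {"school", "glass", "big"}


def try_multi_sub(word: str, max_positions: int = 4) -> Optional[str]:
    """Return the first known variant of `word`, or None."""
    positions = [i for i, ch in enumerate(word) if ch in DIGIT_SUBS]
    if not positions or len(positions) > max_positions:
        return None
    # Each suspicious position keeps its character or takes a replacement
    options = [[word[i]] + DIGIT_SUBS[word[i]] for i in positions]
    for combo in product(*options):
        chars = list(word)
        for i, repl in zip(positions, combo):
            chars[i] = repl
        candidate = "".join(chars)
        # Case-insensitive lookup, mirroring the lowercase check in _known
        if candidate != word and candidate.lower() in KNOWN_WORDS:
            return candidate
    return None


print(try_multi_sub("sch00l"))
print(try_multi_sub("6la55"))
print(try_multi_sub("hello"))
```

Because the original character is listed first at every position, this should visit candidates in the same order as the queue-based version, so the first match is the same.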
--- a/klausur-service/backend/tests/test_box_layout.py
+++ b/klausur-service/backend/tests/test_box_layout.py
@@ -0,0 +1,124 @@
+"""Tests for cv_box_layout.py — box layout classification and grid building."""
+
+import pytest
+import sys, os
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+from cv_box_layout import classify_box_layout, build_box_zone_grid, _group_into_lines
+
+
+def _make_words(lines_data):
+    """Create word dicts from [(y, x, text), ...] tuples."""
+    words = []
+    for y, x, text in lines_data:
+        words.append({"top": y, "left": x, "width": len(text) * 10, "height": 25, "text": text})
+    return words
+
+
+class TestClassifyBoxLayout:
+
+    def test_header_only(self):
+        words = _make_words([(100, 50, "Unit 3")])
+        assert classify_box_layout(words, 500, 50) == "header_only"
+
+    def test_empty(self):
+        assert classify_box_layout([], 500, 200) == "header_only"
+
+    def test_flowing(self):
+        """Multiple lines without bullet patterns → flowing."""
+        words = _make_words([
+            (100, 50, "German leihen title"),
+            (130, 50, "etwas ausleihen von jm"),
+            (160, 70, "borrow sth from sb"),
+            (190, 70, "Can I borrow your CD"),
+            (220, 50, "etwas verleihen an jn"),
+            (250, 70, "OK I can lend you my"),
+        ])
+        assert classify_box_layout(words, 500, 200) == "flowing"
+
+    def test_bullet_list(self):
+        """Lines starting with bullet markers → bullet_list."""
+        words = _make_words([
+            (100, 50, "Title of the box"),
+            (130, 50, "• First item text here"),
+            (160, 50, "• Second item text here"),
+            (190, 50, "• Third item text here"),
+            (220, 50, "• Fourth item text here"),
+            (250, 50, "• Fifth item text here"),
+        ])
+        assert classify_box_layout(words, 500, 150) == "bullet_list"
+
+
+class TestGroupIntoLines:
+
+    def test_single_line(self):
+        words = _make_words([(100, 50, "hello"), (100, 120, "world")])
+        lines = _group_into_lines(words)
+        assert len(lines) == 1
+        assert len(lines[0]) == 2
+
+    def test_two_lines(self):
+        words = _make_words([(100, 50, "line1"), (150, 50, "line2")])
+        lines = _group_into_lines(words)
+        assert len(lines) == 2
+
+    def test_y_proximity(self):
+        """Words within y-tolerance are on same line."""
+        words = _make_words([(100, 50, "a"), (103, 120, "b")])  # 3px apart
+        lines = _group_into_lines(words)
+        assert len(lines) == 1
+
+
+class TestBuildBoxZoneGrid:
+
+    def test_flowing_groups_by_indent(self):
+        """Flowing layout groups continuation lines by indentation."""
+        words = _make_words([
+            (100, 50, "Header Title"),
+            (130, 50, "Bullet start text"),
+            (160, 80, "continuation line 1"),
+            (190, 80, "continuation line 2"),
+        ])
+        result = build_box_zone_grid(words, 40, 90, 500, 120, 0, 1600, 2200, layout_type="flowing")
+        # Header + 1 grouped bullet = 2 rows
+        assert len(result["rows"]) == 2
+        assert len(result["cells"]) == 2
+        # Second cell should have \n (multi-line)
+        bullet_cell = result["cells"][1]
+        assert "\n" in bullet_cell["text"]
+
+    def test_header_only_single_cell(self):
+        words = _make_words([(100, 50, "Just a title")])
+        result = build_box_zone_grid(words, 40, 90, 500, 50, 0, 1600, 2200, layout_type="header_only")
+        assert len(result["cells"]) == 1
+        assert result["box_layout_type"] == "header_only"
+
+    def test_columnar_delegates_to_zone_grid(self):
+        """Columnar layout uses standard grid builder."""
+        words = _make_words([
+            (100, 50, "Col A header"),
+            (100, 300, "Col B header"),
+            (130, 50, "A data"),
+            (130, 300, "B data"),
+        ])
+        result = build_box_zone_grid(words, 40, 90, 500, 80, 0, 1600, 2200, layout_type="columnar")
+        assert result["box_layout_type"] == "columnar"
+        # Should have detected columns
+        assert len(result.get("columns", [])) >= 1
+
+    def test_row_fields_for_gridtable(self):
+        """Rows must have y_min_px, y_max_px, is_header for GridTable."""
+        words = _make_words([(100, 50, "Title"), (130, 50, "Body")])
+        result = build_box_zone_grid(words, 40, 90, 500, 80, 0, 1600, 2200, layout_type="flowing")
+        for row in result["rows"]:
+            assert "y_min_px" in row
+            assert "y_max_px" in row
+            assert "is_header" in row
+
+    def test_column_fields_for_gridtable(self):
+        """Columns must have x_min_px, x_max_px for GridTable width calculation."""
+        words = _make_words([(100, 50, "Text")])
+        result = build_box_zone_grid(words, 40, 90, 500, 50, 0, 1600, 2200, layout_type="flowing")
+        for col in result["columns"]:
+            assert "x_min_px" in col
+            assert "x_max_px" in col
--- a/klausur-service/backend/tests/test_gutter_repair.py
+++ b/klausur-service/backend/tests/test_gutter_repair.py
@@ -0,0 +1,339 @@
+"""Tests for cv_gutter_repair: gutter-edge word detection and repair."""
+
+import pytest
+import sys
+import os
+
+# Add parent directory to path so we can import the module
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
+
+from cv_gutter_repair import (
+    _is_known,
+    _try_hyphen_join,
+    _try_spell_fix,
+    _edit_distance,
+    _word_is_at_gutter_edge,
+    _MIN_WORD_LEN_SPELL,
+    _MIN_WORD_LEN_HYPHEN,
+    analyse_grid_for_gutter_repair,
+    apply_gutter_suggestions,
+)
+
+
+# ---------------------------------------------------------------------------
+# Helper function tests
+# ---------------------------------------------------------------------------
+
+class TestEditDistance:
+    def test_identical(self):
+        assert _edit_distance("hello", "hello") == 0
+
+    def test_one_substitution(self):
+        assert _edit_distance("stammeli", "stammeln") == 1
+
+    def test_one_deletion(self):
+        assert _edit_distance("cat", "ca") == 1
+
+    def test_one_insertion(self):
+        assert _edit_distance("ca", "cat") == 1
+
+    def test_empty(self):
+        assert _edit_distance("", "abc") == 3
+        assert _edit_distance("abc", "") == 3
+
+    def test_both_empty(self):
+        assert _edit_distance("", "") == 0
+
+
+class TestWordIsAtGutterEdge:
+    def test_word_at_right_edge(self):
+        # Word right edge at 95% of column = within gutter zone
+        word_bbox = {"left": 80, "width": 15}  # right edge = 95
+        assert _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)
+
+    def test_word_in_middle(self):
+        # Word right edge at 50% of column = NOT at gutter
+        word_bbox = {"left": 30, "width": 20}  # right edge = 50
+        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)
+
+    def test_word_at_left(self):
+        word_bbox = {"left": 5, "width": 20}  # right edge = 25
+        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)
+
+    def test_zero_width_column(self):
+        word_bbox = {"left": 0, "width": 10}
+        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=0)
+
+
+# ---------------------------------------------------------------------------
+# Spellchecker-dependent tests (skip if not installed)
+# ---------------------------------------------------------------------------
+
+try:
+    from spellchecker import SpellChecker
+    _HAS_SPELLCHECKER = True
+except ImportError:
+    _HAS_SPELLCHECKER = False
+
+needs_spellchecker = pytest.mark.skipif(
+    not _HAS_SPELLCHECKER, reason="pyspellchecker not installed"
+)
+
+
+@needs_spellchecker
+class TestIsKnown:
+    def test_known_english(self):
+        assert _is_known("hello") is True
+        assert _is_known("world") is True
+
+    def test_known_german(self):
+        assert _is_known("verkünden") is True
+        assert _is_known("stammeln") is True
+
+    def test_unknown_garbled(self):
+        assert _is_known("stammeli") is False
+        assert _is_known("xyzqwp") is False
+
+    def test_short_word(self):
+        # Words < 3 chars are not checked
+        assert _is_known("a") is False
+
+
+@needs_spellchecker
+class TestTryHyphenJoin:
+    def test_direct_join(self):
+        # "ver-" + "künden" = "verkünden" (trailing hyphen is dropped)
+        result = _try_hyphen_join("ver-", "künden")
+        assert result is not None
+        joined, missing, conf = result
+        assert joined == "verkünden"
+        assert missing == ""
+        assert conf >= 0.9
+
+    def test_join_with_missing_chars(self):
+        # "ve" + "künden" → needs "r" in between → "verkünden"
+        result = _try_hyphen_join("ve", "künden", max_missing=2)
+        assert result is not None
+        joined, missing, conf = result
+        assert joined == "verkünden"
+        assert "r" in missing
+
+    def test_no_valid_join(self):
+        result = _try_hyphen_join("xyz", "qwpgh")
+        assert result is None
+
+    def test_empty_inputs(self):
+        assert _try_hyphen_join("", "word") is None
+        assert _try_hyphen_join("word", "") is None
+
+    def test_join_strips_trailing_punctuation(self):
+        # "ver-" + "künden," → should still find "verkünden" despite comma
+        result = _try_hyphen_join("ver-", "künden,")
+        assert result is not None
+        joined, missing, conf = result
+        assert joined == "verkünden"
+
+    def test_join_with_missing_chars_and_punctuation(self):
+        # "ve" + "künden," → needs "r" in between, comma must be stripped
+        result = _try_hyphen_join("ve", "künden,", max_missing=2)
+        assert result is not None
+        joined, missing, conf = result
+        assert joined == "verkünden"
+        assert "r" in missing
+
+
+@needs_spellchecker
+class TestTrySpellFix:
+    def test_fix_garbled_ending_returns_alternatives(self):
+        # "stammeli" should return a correction with alternatives
+        result = _try_spell_fix("stammeli", col_type="column_de")
+        assert result is not None
+        corrected, conf, alts = result
+        # The best correction is one of the valid forms
+        all_options = [corrected] + alts
+        all_lower = [w.lower() for w in all_options]
+        # "stammeln" must be among the candidates
+        assert "stammeln" in all_lower, f"Expected 'stammeln' in {all_options}"
+
+    def test_known_word_not_fixed(self):
+        # "Haus" is correct — no fix needed
+        result = _try_spell_fix("Haus", col_type="column_de")
+        # Expect None for a correct word; if a result comes back anyway, it must leave the word unchanged
+        if result is not None:
+            corrected, _, _ = result
+            assert corrected.lower() == "haus"
+
+    def test_short_word_skipped(self):
+        result = _try_spell_fix("ab")
+        assert result is None
+
+    def test_min_word_len_thresholds(self):
+        assert _MIN_WORD_LEN_HYPHEN == 2
+        assert _MIN_WORD_LEN_SPELL == 3
+
+
+# ---------------------------------------------------------------------------
+# Grid analysis tests
+# ---------------------------------------------------------------------------
+
+def _make_grid(cells, columns=None):
+    """Helper to create a minimal grid_data structure."""
+    if columns is None:
+        columns = [
+            {"index": 0, "type": "column_en", "x_min_px": 0, "x_max_px": 200},
+            {"index": 1, "type": "column_de", "x_min_px": 200, "x_max_px": 400},
+            {"index": 2, "type": "column_text", "x_min_px": 400, "x_max_px": 600},
+        ]
+    return {
+        "image_width": 600,
+        "image_height": 800,
+        "zones": [{
+            "columns": columns,
+            "cells": cells,
+        }],
+    }
+
+
+def _make_cell(row, col, text, left=0, width=50, col_width=200, col_x=0):
+    """Helper to create a cell dict with word_boxes at a specific position."""
+    return {
+        "cell_id": f"R{row:02d}_C{col}",
+        "row_index": row,
+        "col_index": col,
+        "col_type": "column_text",
+        "text": text,
+        "confidence": 90.0,
+        "bbox_px": {"x": left, "y": row * 25, "w": width, "h": 20},
+        "word_boxes": [
+            {"text": text, "left": left, "top": row * 25, "width": width, "height": 20, "conf": 90},
+        ],
+    }
+
+
+@needs_spellchecker
+class TestAnalyseGrid:
+    def test_empty_grid(self):
+        result = analyse_grid_for_gutter_repair({"zones": []})
+        assert result["suggestions"] == []
+        assert result["stats"]["words_checked"] == 0
+
+    def test_detects_spell_fix_at_edge(self):
+        # "stammeli" spans x=540..595 in column 2 (x=400..600), i.e. near the column's right edge = at gutter
+        cells = [
+            _make_cell(29, 2, "stammeli", left=540, width=55, col_width=200, col_x=400),
+        ]
+        grid = _make_grid(cells)
+        result = analyse_grid_for_gutter_repair(grid)
+        suggestions = result["suggestions"]
+        assert len(suggestions) >= 1
+        assert suggestions[0]["type"] == "spell_fix"
+        assert suggestions[0]["suggested_text"] == "stammeln"
+
+    def test_detects_hyphen_join(self):
+        # Row 30: "ve" at gutter edge, Row 31: "künden"
+        cells = [
+            _make_cell(30, 2, "ve", left=570, width=25, col_width=200, col_x=400),
+            _make_cell(31, 2, "künden", left=410, width=80, col_width=200, col_x=400),
+        ]
+        grid = _make_grid(cells)
+        result = analyse_grid_for_gutter_repair(grid)
+        suggestions = result["suggestions"]
+        # Should find hyphen_join or spell_fix
+        assert len(suggestions) >= 1
+
+    def test_ignores_known_words(self):
+        # "hello" is a known word — should not be suggested
+        cells = [
+            _make_cell(0, 0, "hello", left=160, width=35),
+        ]
+        grid = _make_grid(cells)
+        result = analyse_grid_for_gutter_repair(grid)
+        # Should not suggest anything for known words
+        spell_fixes = [s for s in result["suggestions"] if s["original_text"] == "hello"]
+        assert len(spell_fixes) == 0
+
+    def test_ignores_words_not_at_edge(self):
+        # "stammeli" at position 10 = NOT at gutter edge
+        cells = [
+            _make_cell(0, 0, "stammeli", left=10, width=50),
+        ]
+        grid = _make_grid(cells)
+        result = analyse_grid_for_gutter_repair(grid)
+        assert len(result["suggestions"]) == 0
+
+
+# ---------------------------------------------------------------------------
+# Apply suggestions tests
+# ---------------------------------------------------------------------------
+
+class TestApplySuggestions:
+    def test_apply_spell_fix(self):
+        cells = [
+            {"cell_id": "R29_C2", "row_index": 29, "col_index": 2,
+             "text": "er stammeli", "word_boxes": []},
+        ]
+        grid = _make_grid(cells)
+        suggestions = [{
+            "id": "abc",
+            "type": "spell_fix",
+            "zone_index": 0,
+            "row_index": 29,
+            "col_index": 2,
+            "original_text": "stammeli",
+            "suggested_text": "stammeln",
+        }]
+        result = apply_gutter_suggestions(grid, ["abc"], suggestions)
+        assert result["applied_count"] == 1
+        assert grid["zones"][0]["cells"][0]["text"] == "er stammeln"
+
+    def test_apply_hyphen_join(self):
+        cells = [
+            {"cell_id": "R30_C2", "row_index": 30, "col_index": 2,
+             "text": "ve", "word_boxes": []},
+            {"cell_id": "R31_C2", "row_index": 31, "col_index": 2,
+             "text": "künden und", "word_boxes": []},
+        ]
+        grid = _make_grid(cells)
+        suggestions = [{
+            "id": "def",
+            "type": "hyphen_join",
+            "zone_index": 0,
+            "row_index": 30,
+            "col_index": 2,
+            "original_text": "ve",
+            "suggested_text": "verkünden",
+            "next_row_index": 31,
+            "display_parts": ["ver-", "künden"],
+            "missing_chars": "r",
+        }]
+        result = apply_gutter_suggestions(grid, ["def"], suggestions)
+        assert result["applied_count"] == 1
+        # Current row: "ve" replaced with "ver-"
+        assert grid["zones"][0]["cells"][0]["text"] == "ver-"
+        # Next row: UNCHANGED — "künden" stays in its original row
+        assert grid["zones"][0]["cells"][1]["text"] == "künden und"
+
+    def test_apply_nothing_when_no_accepted(self):
+        grid = _make_grid([])
+        result = apply_gutter_suggestions(grid, [], [])
+        assert result["applied_count"] == 0
+
+    def test_skip_unknown_suggestion_id(self):
+        cells = [
+            {"cell_id": "R0_C0", "row_index": 0, "col_index": 0,
+             "text": "test", "word_boxes": []},
+        ]
+        grid = _make_grid(cells)
+        suggestions = [{
+            "id": "abc",
+            "type": "spell_fix",
+            "zone_index": 0,
+            "row_index": 0,
+            "col_index": 0,
+            "original_text": "test",
+            "suggested_text": "test2",
+        }]
+        # Accept a non-existent ID
+        result = apply_gutter_suggestions(grid, ["nonexistent"], suggestions)
+        assert result["applied_count"] == 0
+        assert grid["zones"][0]["cells"][0]["text"] == "test"
--- a/klausur-service/backend/tests/test_merge_wrapped_rows.py
+++ b/klausur-service/backend/tests/test_merge_wrapped_rows.py
@@ -0,0 +1,135 @@
+"""Tests for _merge_wrapped_rows — cell-wrap continuation row merging."""
+
+import pytest
+import sys
+import os
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
+from cv_cell_grid import _merge_wrapped_rows
+
+
+def _entry(row_index, english='', german='', example=''):
+    return {
+        'row_index': row_index,
+        'english': english,
+        'german': german,
+        'example': example,
+    }
+
+
+class TestMergeWrappedRows:
+    """Test cell-wrap continuation row merging."""
+
+    def test_basic_en_empty_merge(self):
+        """EN empty, DE has text → merge DE into previous row."""
+        entries = [
+            _entry(0, english='take part (in)', german='teilnehmen (an), mitmachen', example='More than 200 singers took'),
+            _entry(1, english='', german='(bei)', example='part in the concert.'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['german'] == 'teilnehmen (an), mitmachen (bei)'
+        assert result[0]['example'] == 'More than 200 singers took part in the concert.'
+
+    def test_en_empty_de_only(self):
+        """EN empty, only DE continuation (no example)."""
+        entries = [
+            _entry(0, english='competition', german='der Wettbewerb,'),
+            _entry(1, english='', german='das Turnier'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['german'] == 'der Wettbewerb, das Turnier'
+
+    def test_en_empty_example_only(self):
+        """EN empty, only example continuation."""
+        entries = [
+            _entry(0, english='to arrive', german='ankommen', example='We arrived at the'),
+            _entry(1, english='', german='', example='hotel at midnight.'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['example'] == 'We arrived at the hotel at midnight.'
+
+    def test_de_empty_paren_continuation(self):
+        """DE empty, EN starts with parenthetical → merge into previous EN."""
+        entries = [
+            _entry(0, english='to take part', german='teilnehmen'),
+            _entry(1, english='(in)', german=''),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['english'] == 'to take part (in)'
+
+    def test_de_empty_lowercase_continuation(self):
+        """DE empty, EN starts lowercase → merge into previous EN."""
+        entries = [
+            _entry(0, english='to put up', german='aufstellen'),
+            _entry(1, english='with sth.', german=''),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['english'] == 'to put up with sth.'
+
+    def test_no_merge_both_have_content(self):
+        """Both EN and DE have text → normal row, don't merge."""
+        entries = [
+            _entry(0, english='house', german='Haus'),
+            _entry(1, english='garden', german='Garten'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 2
+
+    def test_no_merge_new_word_uppercase(self):
+        """EN has uppercase text, DE is empty → could be a new word, not merged."""
+        entries = [
+            _entry(0, english='house', german='Haus'),
+            _entry(1, english='Garden', german=''),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 2
+
+    def test_triple_wrap(self):
+        """Three consecutive wrapped rows → all merge into first."""
+        entries = [
+            _entry(0, english='competition', german='der Wettbewerb,'),
+            _entry(1, english='', german='das Turnier,'),
+            _entry(2, english='', german='der Wettkampf'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['german'] == 'der Wettbewerb, das Turnier, der Wettkampf'
+
+    def test_empty_entries(self):
+        """Empty list."""
+        assert _merge_wrapped_rows([]) == []
+
+    def test_single_entry(self):
+        """Single entry unchanged."""
+        entries = [_entry(0, english='house', german='Haus')]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+
+    def test_mixed_normal_and_wrapped(self):
+        """Mix of normal rows and wrapped rows."""
+        entries = [
+            _entry(0, english='house', german='Haus'),
+            _entry(1, english='take part (in)', german='teilnehmen (an),'),
+            _entry(2, english='', german='mitmachen (bei)'),
+            _entry(3, english='garden', german='Garten'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 3
+        assert result[0]['english'] == 'house'
+        assert result[1]['german'] == 'teilnehmen (an), mitmachen (bei)'
+        assert result[2]['english'] == 'garden'
+
+    def test_comma_separator_handling(self):
+        """Previous DE ends with comma → joined with a single space, no extra separator."""
+        entries = [
+            _entry(0, english='word', german='Wort,'),
+            _entry(1, english='', german='Ausdruck'),
+        ]
+        result = _merge_wrapped_rows(entries)
+        assert len(result) == 1
+        assert result[0]['german'] == 'Wort, Ausdruck'
--- a/klausur-service/backend/tests/test_page_crop.py
+++ b/klausur-service/backend/tests/test_page_crop.py
@@ -18,6 +18,7 @@ from page_crop import (
    detect_page_splits,
    _detect_format,
    _detect_edge_projection,
+    _detect_gutter_continuity,
    _detect_left_edge_shadow,
    _detect_right_edge_shadow,
    _detect_spine_shadow,
@@ -564,3 +565,110 @@ class TestDetectPageSplits:
            assert pages[0]["x"] == 0
            total_w = sum(p["width"] for p in pages)
            assert total_w == w, f"Total page width {total_w} != image width {w}"
+
+
+# ---------------------------------------------------------------------------
+# Tests: _detect_gutter_continuity (camera book scans)
+# ---------------------------------------------------------------------------
+
+def _make_camera_book_scan(h: int = 2400, w: int = 1700, gutter_side: str = "right") -> np.ndarray:
+    """Create a synthetic camera book scan with a subtle gutter shadow.
+
+    Camera gutter shadows are much subtler than scanner shadows:
+    - Page brightness ~250 (well-lit)
+    - Gutter brightness ~210-230 (slight shadow)
+    - Shadow runs continuously from top to bottom
+    - Gradient spans ~7% of the image width (~120px at the default w=1700)
+    """
+    img = np.full((h, w, 3), 250, dtype=np.uint8)
+
+    # Seeded RNG so the text scattered below is reproducible
+    rng = np.random.RandomState(99)
+
+    # Subtle gutter gradient at the specified side
+    gutter_w = int(w * 0.04)  # ~4% of width
+    gradient_w = int(w * 0.03)  # transition zone
+
+    if gutter_side == "right":
+        gutter_start = w - gutter_w - gradient_w
+        for x in range(gutter_start, w):
+            dist_from_start = x - gutter_start
+            # Linear gradient from 250 down to 210
+            brightness = int(250 - 40 * min(dist_from_start / (gutter_w + gradient_w), 1.0))
+            img[:, x] = brightness
+    else:
+        gutter_end = gutter_w + gradient_w
+        for x in range(gutter_end):
+            dist_from_edge = gutter_end - x
+            brightness = int(250 - 40 * min(dist_from_edge / (gutter_w + gradient_w), 1.0))
+            img[:, x] = brightness
+
+    # Scatter some text (dark pixels) in the content area
+    content_left = gutter_end + 20 if gutter_side == "left" else 50
+    content_right = gutter_start - 20 if gutter_side == "right" else w - 50
+    for _ in range(800):
+        y = rng.randint(h // 10, h - h // 10)
+        x = rng.randint(content_left, content_right)
+        y2 = min(y + 3, h)
+        x2 = min(x + 15, w)
+        img[y:y2, x:x2] = 20
+
+    return img
+
+
+class TestDetectGutterContinuity:
+    """Tests for camera gutter shadow detection via vertical continuity."""
+
+    def test_detects_right_gutter(self):
+        """Should detect a subtle gutter shadow on the right side."""
+        img = _make_camera_book_scan(gutter_side="right")
+        h, w = img.shape[:2]
+        gray = np.mean(img, axis=2).astype(np.uint8)
+        search_w = w // 4
+        right_start = w - search_w
+        result = _detect_gutter_continuity(
+            gray, gray[:, right_start:], right_start, w, "right",
+        )
+        assert result is not None
+        # Gutter starts roughly at 93% of width (w - 4% - 3%)
+        assert result > w * 0.85, f"Gutter x={result} too far left"
+        assert result < w * 0.98, f"Gutter x={result} too close to edge"
+
+    def test_detects_left_gutter(self):
+        """Should detect a subtle gutter shadow on the left side."""
+        img = _make_camera_book_scan(gutter_side="left")
+        h, w = img.shape[:2]
+        gray = np.mean(img, axis=2).astype(np.uint8)
+        search_w = w // 4
+        result = _detect_gutter_continuity(
+            gray, gray[:, :search_w], 0, w, "left",
+        )
+        assert result is not None
+        assert result > w * 0.02, f"Gutter x={result} too close to edge"
+        assert result < w * 0.15, f"Gutter x={result} too far right"
+
+    def test_no_gutter_on_clean_page(self):
+        """Should NOT detect a gutter on a uniformly bright page."""
+        img = np.full((2000, 1600, 3), 250, dtype=np.uint8)
+        # Add some text but no gutter
+        rng = np.random.RandomState(42)
+        for _ in range(500):
+            y = rng.randint(100, 1900)
+            x = rng.randint(100, 1500)
+            img[y:min(y+3, 2000), x:min(x+15, 1600)] = 20
+        gray = np.mean(img, axis=2).astype(np.uint8)
+        w = 1600
+        search_w = w // 4
+        right_start = w - search_w
+        result_r = _detect_gutter_continuity(gray, gray[:, right_start:], right_start, w, "right")
+        result_l = _detect_gutter_continuity(gray, gray[:, :search_w], 0, w, "left")
+        assert result_r is None, f"False positive on right: x={result_r}"
+        assert result_l is None, f"False positive on left: x={result_l}"
+
+    def test_integrated_with_crop(self):
+        """End-to-end: detect_and_crop_page should crop at the gutter."""
+        img = _make_camera_book_scan(gutter_side="right")
+        cropped, result = detect_and_crop_page(img)
+        # The right border fraction should exceed 1% (gutter cropped)
+        right_border = result["border_fractions"]["right"]
+        assert right_border > 0.01, f"Right border {right_border} — gutter not cropped"
--- a/klausur-service/backend/tests/test_smart_spell.py
+++ b/klausur-service/backend/tests/test_smart_spell.py
@@ -0,0 +1,286 @@
+"""Tests for SmartSpellChecker — language-aware OCR post-correction."""
+
+import pytest
+import sys, os
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+from smart_spell import SmartSpellChecker, CorrectionResult
+
+
+@pytest.fixture
+def sc():
+    return SmartSpellChecker()
+
+
+# ─── Language Detection ──────────────────────────────────────────────────────
+
+
+class TestLanguageDetection:
+
+    def test_clear_english_words(self, sc):
+        for word in ("school", "beautiful", "homework", "yesterday", "because"):
+            assert sc.detect_word_lang(word) in ("en", "both"), f"{word} should be EN"
+
+    def test_clear_german_words(self, sc):
+        for word in ("Schule", "Hausaufgaben", "Freundschaft", "Straße", "Entschuldigung"):
+            assert sc.detect_word_lang(word) in ("de", "both"), f"{word} should be DE"
+
+    def test_ambiguous_words(self, sc):
+        """Words that exist in both languages."""
+        for word in ("Hand", "Finger", "Arm", "Name", "Ball"):
+            assert sc.detect_word_lang(word) == "both", f"{word} should be 'both'"
+
+    def test_unknown_words(self, sc):
+        assert sc.detect_word_lang("xyzqwk") == "unknown"
+        assert sc.detect_word_lang("") == "unknown"
+
+    def test_english_sentence(self, sc):
+        assert sc.detect_text_lang("I go to school every day") == "en"
+
+    def test_german_sentence(self, sc):
+        assert sc.detect_text_lang("Ich gehe jeden Tag zur Schule") == "de"
+
+    def test_mixed_sentence(self, sc):
+        # Dominant language should win
+        lang = sc.detect_text_lang("I like to play Fußball with my Freunde")
+        assert lang in ("en", "both")
+
+
+# ─── Single Word Correction ────────────────────────────────────────────────
+
+
+class TestSingleWordCorrection:
+
+    def test_known_word_not_changed(self, sc):
+        assert sc.correct_word("school", "en") is None
+        assert sc.correct_word("Freund", "de") is None
+
+    def test_digit_letter_single(self, sc):
+        assert sc.correct_word("g0od", "en") == "good"
+        assert sc.correct_word("he1lo", "en") == "hello"
+
+    def test_digit_letter_multi(self, sc):
+        """Multiple digit substitutions (e.g., sch00l)."""
+        result = sc.correct_word("sch00l", "en")
+        assert result == "school", f"Expected 'school', got '{result}'"
+
+    def test_pipe_to_I(self, sc):
+        assert sc.correct_word("|", "en") == "I"
+
+    def test_umlaut_schuler(self, sc):
+        assert sc.correct_word("Schuler", "de") == "Schüler"
+
+    def test_umlaut_uber(self, sc):
+        assert sc.correct_word("uber", "de") == "über"
+
+    def test_umlaut_bucher(self, sc):
+        assert sc.correct_word("Bucher", "de") == "Bücher"
+
+    def test_umlaut_turkei(self, sc):
+        assert sc.correct_word("Turkei", "de") == "Türkei"
+
+    def test_missing_char(self, sc):
+        assert sc.correct_word("beautful", "en") == "beautiful"
+
+    def test_transposition(self, sc):
+        assert sc.correct_word("teh", "en") == "the"
+
+    def test_swap(self, sc):
+        assert sc.correct_word("freind", "en") == "friend"
+
+    def test_no_false_correction_cross_lang(self, sc):
+        """Don't correct a word that's valid in the other language.
+
+        'Schuler' in the EN column should NOT be corrected to 'Schuyler'
+        because 'Schüler' is valid German — it's likely a German word
+        that ended up in the wrong column (or is a surname).
+        """
+        # Schuler is valid DE (after umlaut fix → Schüler), so
+        # in the EN column it should be left alone
+        result = sc.correct_word("Schuler", "en")
+        # Should either be None (no change) or not "Schuyler"
+        assert result != "Schuyler", "Should not false-correct German word in EN column"
+
+
+# ─── a/I Disambiguation ──────────────────────────────────────────────────────
+
+
+class TestAIDisambiguation:
+
+    def test_I_before_verb(self, sc):
+        assert sc._disambiguate_a_I("l", "am") == "I"
+        assert sc._disambiguate_a_I("l", "was") == "I"
+        assert sc._disambiguate_a_I("l", "think") == "I"
+        assert sc._disambiguate_a_I("l", "have") == "I"
+        assert sc._disambiguate_a_I("l", "don't") == "I"
+
+    def test_a_before_noun_adj(self, sc):
+        assert sc._disambiguate_a_I("a", "book") == "a"
+        assert sc._disambiguate_a_I("a", "cat") == "a"
+        assert sc._disambiguate_a_I("a", "big") == "a"
+        assert sc._disambiguate_a_I("a", "lot") == "a"
+
+    def test_uncertain_returns_none(self, sc):
+        """When context is ambiguous, return None (don't change)."""
+        assert sc._disambiguate_a_I("l", "xyzqwk") is None
+
+
+# ─── Full Text Correction ───────────────────────────────────────────────────
+
+
+class TestFullTextCorrection:
+
+    def test_english_sentence(self, sc):
+        result = sc.correct_text("teh cat is beautful", "en")
+        assert result.changed
+        assert "the" in result.corrected
+        assert "beautiful" in result.corrected
+
+    def test_german_sentence_no_change(self, sc):
+        result = sc.correct_text("Ich gehe zur Schule", "de")
+        assert not result.changed
+
+    def test_german_umlaut_fix(self, sc):
+        result = sc.correct_text("Der Schuler liest Bucher", "de")
+        assert "Schüler" in result.corrected
+        assert "Bücher" in result.corrected
+
+    def test_preserves_punctuation(self, sc):
+        result = sc.correct_text("teh cat, beautful!", "en")
+        assert "," in result.corrected
+        assert "!" in result.corrected
+
+    def test_empty_text(self, sc):
+        result = sc.correct_text("", "en")
+        assert not result.changed
+        assert result.corrected == ""
+
+
+# ─── Boundary Repair ───────────────────────────────────────────────────────
+
+
+class TestBoundaryRepair:
+
+    def test_ats_th_to_at_sth(self, sc):
+        """'ats th.' → 'at sth.' — shifted boundary with abbreviation."""
+        result = sc.correct_text("be good ats th.", "en")
+        assert "at sth." in result.corrected, f"Expected 'at sth.' in '{result.corrected}'"
+
+    def test_no_repair_common_pair(self, sc):
+        """Don't repair if both words form a common pair."""
+        result = sc.correct_text("at the", "en")
+        assert result.corrected == "at the"
+        assert not result.changed
+
+    def test_boundary_shift_right(self, sc):
+        """Shift chars from word1 to word2."""
+        repair = sc._try_boundary_repair("ats", "th")
+        assert repair == ("at", "sth"), f"Got {repair}"
+
+    def test_boundary_shift_with_punct(self, sc):
+        """Preserve punctuation during boundary repair."""
+        repair = sc._try_boundary_repair("ats", "th.")
+        assert repair is not None
+        assert repair[0] == "at"
+        assert repair[1] == "sth."
+
+    def test_pound_sand_to_pounds_and(self, sc):
+        """'Pound sand' → 'Pounds and' — both valid but repair is much more frequent."""
+        result = sc.correct_text("Pound sand euros", "en")
+        assert "Pounds and" in result.corrected, f"Expected 'Pounds and' in '{result.corrected}'"
+
+    def test_wit_hit_to_with_it(self, sc):
+        """'wit hit' → 'with it' — frequency-based repair."""
+        result = sc.correct_text("be careful wit hit", "en")
+        assert "with it" in result.corrected, f"Expected 'with it' in '{result.corrected}'"
+
+    def test_done_euro_to_one_euro(self, sc):
+        """'done euro' → 'one euro' in context."""
+        result = sc.correct_text("done euro", "en")
+        assert "one euro" in result.corrected, f"Expected 'one euro' in '{result.corrected}'"
+
+
+# ─── Context Split ──────────────────────────────────────────────────────────
+
+
+class TestContextSplit:
+
+    def test_anew_to_a_new(self, sc):
+        """'anew' → 'a new' when followed by a noun."""
+        result = sc.correct_text("anew book", "en")
+        assert result.corrected == "a new book", f"Got '{result.corrected}'"
+
+    def test_anew_standalone_no_split(self, sc):
+        """'anew' at end of phrase might genuinely be 'anew'."""
+        # "start anew" — no next word to indicate split
+        # This is ambiguous, so we accept either behavior
+        pass
+
+    def test_alive_not_split(self, sc):
+        """'alive' should never be split to 'a live'."""
+        result = sc.correct_text("alive and well", "en")
+        assert "alive" in result.corrected
+
+    def test_alone_not_split(self, sc):
+        """'alone' should never be split."""
+        result = sc.correct_text("alone in the dark", "en")
+        assert "alone" in result.corrected
+
+    def test_about_not_split(self, sc):
+        """'about' should never be split to 'a bout'."""
+        result = sc.correct_text("about time", "en")
+        assert "about" in result.corrected
+
+
+# ─── Vocab Entry Correction ─────────────────────────────────────────────────
+
+
+class TestVocabEntryCorrection:
+
+    def test_basic_entry(self, sc):
+        results = sc.correct_vocab_entry(
+            english="beautful",
+            german="schön",
+        )
+        assert results["english"].corrected == "beautiful"
+        assert results["german"].changed is False
+
+    def test_umlaut_in_german(self, sc):
+        results = sc.correct_vocab_entry(
+            english="school",
+            german="Schuler",
+        )
+        assert results["english"].changed is False
+        assert results["german"].corrected == "Schüler"
+
+    def test_example_auto_detect(self, sc):
+        results = sc.correct_vocab_entry(
+            english="friend",
+            german="Freund",
+            example="My best freind lives in Berlin",
+        )
+        assert "friend" in results["example"].corrected
+
+
+# ─── Speed ─────────────────────────────────────────────────────────────────
+
+
+class TestSpeed:
+
+    def test_100_corrections_under_500ms(self, sc):
+        """100 word corrections should complete in under 500ms."""
+        import time
+        words = [
+            ("beautful", "en"), ("teh", "en"), ("freind", "en"),
+            ("homwork", "en"), ("yesturday", "en"),
+            ("Schuler", "de"), ("Bucher", "de"), ("Turkei", "de"),
+            ("uber", "de"), ("Ubung", "de"),
+        ] * 10
+
+        t0 = time.perf_counter()
+        for word, lang in words:
+            sc.correct_word(word, lang)
+        dt = time.perf_counter() - t0
+
+        print(f"\n  100 corrections in {dt*1000:.0f}ms")
+        assert dt < 0.5, f"Too slow: {dt*1000:.0f}ms"
--- a/klausur-service/backend/tests/test_spell_benchmark.py
+++ b/klausur-service/backend/tests/test_spell_benchmark.py
@@ -0,0 +1,494 @@
+"""
+Benchmark: Spell-checking & language detection approaches for OCR post-correction.
+
+Tests pyspellchecker (already used), symspellpy (candidate), and a
+dual-dictionary language-detection heuristic on real vocabulary OCR data.
+
+Run:  pytest tests/test_spell_benchmark.py -v -s
+"""
+
+import time
+import pytest
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _load_pyspellchecker():
+    from spellchecker import SpellChecker
+    en = SpellChecker(language='en', distance=1)
+    de = SpellChecker(language='de', distance=1)
+    return en, de
+
+
+def _load_symspellpy():
+    """Load symspellpy with English frequency dict (bundled)."""
+    from symspellpy import SymSpell, Verbosity
+    sym = SymSpell(max_dictionary_edit_distance=2)
+    # Use bundled English frequency dict
+    import pkg_resources
+    dict_path = pkg_resources.resource_filename("symspellpy", "frequency_dictionary_en_82_765.txt")
+    sym.load_dictionary(dict_path, term_index=0, count_index=1)
+    return sym, Verbosity
+
+
+# ---------------------------------------------------------------------------
+# Test data: (ocr_output, expected_correction, language, category)
+# ---------------------------------------------------------------------------
+
+OCR_TEST_CASES = [
+    # --- Single-char ambiguity ---
+    ("l am a student", "I am a student", "en", "a_vs_I"),
+    ("a book", "a book", "en", "a_vs_I"),  # should NOT change
+    ("I like cats", "I like cats", "en", "a_vs_I"),  # should NOT change
+    ("lt is raining", "It is raining", "en", "a_vs_I"),  # l→I at start
+
+    # --- Digit-letter confusion ---
+    ("g0od", "good", "en", "digit_letter"),
+    ("sch00l", "school", "en", "digit_letter"),
+    ("he1lo", "hello", "en", "digit_letter"),
+    ("Sch0n", "Schon", "de", "digit_letter"),  # German
+
+    # --- Umlaut drops ---
+    ("schon", "schön", "de", "umlaut"),  # context: "schon" is also valid DE!
+    ("Schuler", "Schüler", "de", "umlaut"),
+    ("uber", "über", "de", "umlaut"),
+    ("Bucher", "Bücher", "de", "umlaut"),
+    ("Turkei", "Türkei", "de", "umlaut"),
+
+    # --- Common OCR errors ---
+    ("beautful", "beautiful", "en", "missing_char"),
+    ("teh", "the", "en", "transposition"),
+    ("becasue", "because", "en", "transposition"),
+    ("freind", "friend", "en", "swap"),
+    ("Freund", "Freund", "de", "correct"),  # already correct
+
+    # --- Merged words ---
+    ("atmyschool", "at my school", "en", "merged"),
+    ("goodidea", "good idea", "en", "merged"),
+
+    # --- Example sentences (one per language) ---
+    ("I go to teh school", "I go to the school", "en", "sentence"),
+    ("Ich gehe zur Schule", "Ich gehe zur Schule", "de", "sentence_correct"),
+]
+
+# Language detection test: (word, expected_language)
+LANG_DETECT_CASES = [
+    # Clear English
+    ("school", "en"),
+    ("beautiful", "en"),
+    ("homework", "en"),
+    ("yesterday", "en"),
+    ("children", "en"),
+    ("because", "en"),
+    ("environment", "en"),
+    ("although", "en"),
+
+    # Clear German
+    ("Schule", "de"),
+    ("Hausaufgaben", "de"),
+    ("Freundschaft", "de"),
+    ("Umwelt", "de"),
+    ("Kindergarten", "de"),  # also used in English!
+    ("Bücher", "de"),
+    ("Straße", "de"),
+    ("Entschuldigung", "de"),
+
+    # Ambiguous (exist in both)
+    ("Hand", "both"),
+    ("Finger", "both"),
+    ("Arm", "both"),
+    ("Name", "both"),
+    ("Ball", "both"),
+
+    # Short/tricky
+    ("a", "en"),
+    ("I", "en"),
+    ("in", "both"),
+    ("an", "both"),
+    ("the", "en"),
+    ("die", "de"),
+    ("der", "de"),
+    ("to", "en"),
+    ("zu", "de"),
+]
+
+
+# ===========================================================================
+# Tests
+# ===========================================================================
+
+
+class TestPyspellchecker:
+    """Test pyspellchecker capabilities for OCR correction."""
+
+    @pytest.fixture(autouse=True)
+    def setup(self):
+        self.en, self.de = _load_pyspellchecker()
+
+    def test_known_words(self):
+        """Verify basic dictionary lookup."""
+        assert self.en.known(["school"])
+        assert self.en.known(["beautiful"])
+        assert self.de.known(["schule"])  # lowercase
+        assert self.de.known(["freund"])
+        # Not known
+        assert not self.en.known(["xyzqwk"])
+        assert not self.de.known(["xyzqwk"])
+
+    def test_correction_quality(self):
+        """Test correction suggestions for OCR errors."""
+        results = []
+        for ocr, expected, lang, category in OCR_TEST_CASES:
+            if category in ("sentence", "sentence_correct", "merged", "a_vs_I"):
+                continue  # skip multi-word cases
+
+            spell = self.en if lang == "en" else self.de
+            words = ocr.split()
+            corrected = []
+            for w in words:
+                if spell.known([w.lower()]):
+                    corrected.append(w)
+                else:
+                    fix = spell.correction(w.lower())
+                    if fix and fix != w.lower():
+                        # Preserve case
+                        if w[0].isupper():
+                            fix = fix[0].upper() + fix[1:]
+                        corrected.append(fix)
+                    else:
+                        corrected.append(w)
+            result = " ".join(corrected)
+            ok = result == expected
+            results.append((ocr, expected, result, ok, category))
+            if not ok:
+                print(f"  MISS: '{ocr}' → '{result}' (expected '{expected}') [{category}]")
+            else:
+                print(f"  OK:   '{ocr}' → '{result}' [{category}]")
+
+        correct = sum(1 for *_, ok, _ in results if ok)
+        total = len(results)
+        print(f"\npyspellchecker: {correct}/{total} correct ({100*correct/total:.0f}%)")
+
+    def test_language_detection_heuristic(self):
+        """Test dual-dictionary language detection."""
+        results = []
+        for word, expected_lang in LANG_DETECT_CASES:
+            w = word.lower()
+            in_en = bool(self.en.known([w]))
+            in_de = bool(self.de.known([w]))
+
+            if in_en and in_de:
+                detected = "both"
+            elif in_en:
+                detected = "en"
+            elif in_de:
+                detected = "de"
+            else:
+                detected = "unknown"
+
+            ok = detected == expected_lang
+            results.append((word, expected_lang, detected, ok))
+            if not ok:
+                print(f"  MISS: '{word}' → {detected} (expected {expected_lang})")
+            else:
+                print(f"  OK:   '{word}' → {detected}")
+
+        correct = sum(1 for *_, ok in results if ok)
+        total = len(results)
+        print(f"\nLang detection heuristic: {correct}/{total} correct ({100*correct/total:.0f}%)")
+
+    def test_umlaut_awareness(self):
+        """Test if pyspellchecker suggests umlaut corrections."""
+        # "Schuler" should suggest "Schüler"
+        candidates = self.de.candidates("schuler")
+        print(f"  'schuler' candidates: {candidates}")
+        # "uber" should suggest "über"
+        candidates_uber = self.de.candidates("uber")
+        print(f"  'uber' candidates: {candidates_uber}")
+        # "Turkei" should suggest "Türkei"
+        candidates_turkei = self.de.candidates("turkei")
+        print(f"  'turkei' candidates: {candidates_turkei}")
+
+    def test_speed_100_words(self):
+        """Measure correction speed for 100 words."""
+        words_en = ["beautful", "teh", "becasue", "freind", "shcool",
+                     "homwork", "yesturday", "chilren", "becuse", "enviroment"] * 10
+        t0 = time.perf_counter()
+        for w in words_en:
+            self.en.correction(w)
+        dt = time.perf_counter() - t0
+        print(f"\n  pyspellchecker: 100 EN corrections in {dt*1000:.0f}ms")
+
+        words_de = ["schuler", "bucher", "turkei", "strasze", "entschuldigung",
+                     "kindergaten", "freumd", "hauaufgaben", "umwlt", "ubung"] * 10
+        t0 = time.perf_counter()
+        for w in words_de:
+            self.de.correction(w)
+        dt = time.perf_counter() - t0
+        print(f"  pyspellchecker: 100 DE corrections in {dt*1000:.0f}ms")
+
+
+class TestSymspellpy:
+    """Test symspellpy as a faster alternative."""
+
+    @pytest.fixture(autouse=True)
+    def setup(self):
+        try:
+            self.sym, self.Verbosity = _load_symspellpy()
+            self.available = True
+        except (ImportError, FileNotFoundError) as e:
+            self.available = False
+            pytest.skip(f"symspellpy or its bundled dictionary unavailable: {e}")
+
+    def test_correction_quality(self):
+        """Test symspellpy corrections (EN only — no DE dict bundled)."""
+        en_cases = [(o, e, c) for o, e, lang, c in OCR_TEST_CASES
+                    if lang == "en" and c not in ("sentence", "sentence_correct", "merged", "a_vs_I")]
+
+        results = []
+        for ocr, expected, category in en_cases:
+            suggestions = self.sym.lookup(ocr.lower(), self.Verbosity.CLOSEST, max_edit_distance=2)
+            if suggestions:
+                fix = suggestions[0].term
+                if ocr[0].isupper():
+                    fix = fix[0].upper() + fix[1:]
+                result = fix
+            else:
+                result = ocr
+
+            ok = result == expected
+            results.append((ocr, expected, result, ok, category))
+            status = "OK" if ok else "MISS"
+            print(f"  {status}: '{ocr}' → '{result}' (expected '{expected}') [{category}]")
+
+        correct = sum(1 for *_, ok, _ in results if ok)
+        total = len(results)
+        print(f"\nsymspellpy EN: {correct}/{total} correct ({100*correct/total:.0f}%)")
+
+    def test_speed_100_words(self):
+        """Measure symspellpy correction speed for 100 words."""
+        words = ["beautful", "teh", "becasue", "freind", "shcool",
+                 "homwork", "yesturday", "chilren", "becuse", "enviroment"] * 10
+        t0 = time.perf_counter()
+        for w in words:
+            self.sym.lookup(w, self.Verbosity.CLOSEST, max_edit_distance=2)
+        dt = time.perf_counter() - t0
+        print(f"\n  symspellpy: 100 EN corrections in {dt*1000:.0f}ms")
+
+    def test_compound_segmentation(self):
+        """Test symspellpy's word segmentation for merged words."""
+        cases = [
+            ("atmyschool", "at my school"),
+            ("goodidea", "good idea"),
+            ("makeadecision", "make a decision"),
+        ]
+        for merged, expected in cases:
+            result = self.sym.word_segmentation(merged)
+            ok = result.corrected_string == expected
+            status = "OK" if ok else "MISS"
+            print(f"  {status}: '{merged}' → '{result.corrected_string}' (expected '{expected}')")
+
+
+class TestContextDisambiguation:
+    """Test context-based disambiguation for a/I and similar cases."""
+
+    @pytest.fixture(autouse=True)
+    def setup(self):
+        self.en, self.de = _load_pyspellchecker()
+
+    def test_bigram_context(self):
+        """Use simple bigram heuristic for a/I disambiguation.
+
+        Approach: check if 'a <next_word>' or 'I <next_word>' is more
+        common by checking if <next_word> is a noun (follows 'a') or
+        verb (follows 'I').
+        """
+        # Common words that follow "I" (verbs)
+        i_followers = {"am", "was", "have", "had", "do", "did", "will",
+                       "would", "can", "could", "should", "shall", "may",
+                       "might", "think", "know", "see", "want", "need",
+                       "like", "love", "hate", "go", "went", "come",
+                       "came", "say", "said", "get", "got", "make", "made",
+                       "take", "took", "give", "gave", "tell", "told",
+                       "feel", "felt", "find", "found", "believe", "hope",
+                       "remember", "forget", "understand", "mean", "meant",
+                       "don't", "didn't", "can't", "won't", "couldn't",
+                       "shouldn't", "wouldn't", "haven't", "hadn't"}
+
+        # Common words that follow "a" (nouns/adjectives)
+        a_followers = {"lot", "few", "little", "bit", "good", "bad",
+                       "big", "small", "great", "new", "old", "long",
+                       "short", "man", "woman", "boy", "girl", "dog",
+                       "cat", "book", "car", "house", "day", "year",
+                       "nice", "beautiful", "large", "huge", "tiny"}
+
+        def disambiguate_a_I(token: str, next_word: str) -> str:
+            """Given an ambiguous 'a' or 'I' (or 'l'), pick the right one."""
+            nw = next_word.lower()
+            if nw in i_followers:
+                return "I"
+            if nw in a_followers:
+                return "a"
+            # No fallback yet: a part-of-speech rule (verb → "I", noun/adjective → "a")
+            # is not implemented, so an unlisted follower leaves the token unchanged.
+            return token  # no change if uncertain
+
+        cases = [
+            ("l", "am", "I"),
+            ("l", "was", "I"),
+            ("l", "think", "I"),
+            ("a", "book", "a"),
+            ("a", "cat", "a"),
+            ("a", "lot", "a"),
+            ("l", "big", "a"),  # "a big ..."
+            ("a", "have", "I"),  # "I have ..."
+        ]
+
+        results = []
+        for token, next_word, expected in cases:
+            result = disambiguate_a_I(token, next_word)
+            ok = result == expected
+            results.append((token, next_word, expected, result, ok))
+            status = "OK" if ok else "MISS"
+            print(f"  {status}: '{token} {next_word}...' → '{result}' (expected '{expected}')")
+
+        correct = sum(1 for *_, ok in results if ok)
+        total = len(results)
+        print(f"\na/I disambiguation: {correct}/{total} correct ({100*correct/total:.0f}%)")
+
+
+class TestLangDetectLibrary:
+    """Test the langid library (imported as `langid`) if it is installed."""
+
+    def test_py3langid(self):
+        try:
+            import langid
+        except ImportError:
+            pytest.skip("langid not installed")
+
+        sentences = [
+            ("I go to school every day", "en"),
+            ("Ich gehe jeden Tag zur Schule", "de"),
+            ("The weather is nice today", "en"),
+            ("Das Wetter ist heute schön", "de"),
+            ("She likes to play football", "en"),
+            ("Er spielt gerne Fußball", "de"),
+        ]
+
+        results = []
+        for text, expected in sentences:
+            lang, confidence = langid.classify(text)
+            ok = lang == expected
+            results.append(ok)
+            status = "OK" if ok else "MISS"
+            print(f"  {status}: '{text[:40]}...' → {lang} ({confidence:.2f}) (expected {expected})")
+
+        correct = sum(results)
+        print(f"\nlangid sentence detection: {correct}/{len(results)} correct")
+
+    def test_langid_single_words(self):
+        """langid on single words — expected to be unreliable."""
+        try:
+            import langid
+        except ImportError:
+            pytest.skip("langid not installed")
+
+        words = [("school", "en"), ("Schule", "de"), ("book", "en"),
+                 ("Buch", "de"), ("car", "en"), ("Auto", "de"),
+                 ("a", "en"), ("I", "en"), ("der", "de"), ("the", "en")]
+
+        results = []
+        for word, expected in words:
+            lang, conf = langid.classify(word)
+            ok = lang == expected
+            results.append(ok)
+            status = "OK" if ok else "MISS"
+            print(f"  {status}: '{word}' → {lang} ({conf:.2f}) (expected {expected})")
+
+        correct = sum(results)
+        print(f"\nlangid single-word: {correct}/{len(results)} correct")
+
+
+class TestIntegratedApproach:
+    """Test the combined approach: dict-heuristic for lang + spell correction."""
+
+    @pytest.fixture(autouse=True)
+    def setup(self):
+        self.en, self.de = _load_pyspellchecker()
+
+    def detect_language(self, word: str) -> str:
+        """Dual-dict heuristic language detection."""
+        w = word.lower()
+        # Skip very short words — too ambiguous
+        if len(w) <= 2:
+            return "ambiguous"
+        in_en = bool(self.en.known([w]))
+        in_de = bool(self.de.known([w]))
+        if in_en and in_de:
+            return "both"
+        if in_en:
+            return "en"
+        if in_de:
+            return "de"
+        return "unknown"
+
+    def correct_word(self, word: str, expected_lang: str) -> str:
+        """Correct a single word given the expected language."""
+        w_lower = word.lower()
+        spell = self.en if expected_lang == "en" else self.de
+
+        # Already known
+        if spell.known([w_lower]):
+            return word
+
+        # Also check the other language — might be fine
+        other = self.de if expected_lang == "en" else self.en
+        if other.known([w_lower]):
+            return word  # valid in the other language
+
+        # Try correction
+        fix = spell.correction(w_lower)
+        if fix and fix != w_lower:
+            if word[0].isupper():
+                fix = fix[0].upper() + fix[1:]
+            return fix
+
+        return word
+
+    def test_full_pipeline(self):
+        """Test: correct each column with the dictionary of its known language."""
+        vocab_entries = [
+            # (english_col, german_col, expected_en, expected_de)
+            ("beautful", "schön", "beautiful", "schön"),
+            ("school", "Schule", "school", "Schule"),
+            ("teh cat", "die Katze", "the cat", "die Katze"),
+            ("freind", "Freund", "friend", "Freund"),
+            ("homwork", "Hausaufgaben", "homework", "Hausaufgaben"),
+            ("Schuler", "Schuler", "Schuler", "Schüler"),  # DE umlaut: Schüler
+        ]
+
+        en_correct = 0
+        de_correct = 0
+        total = len(vocab_entries)
+
+        for en_ocr, de_ocr, exp_en, exp_de in vocab_entries:
+            # Correct each word in the column
+            en_words = en_ocr.split()
+            de_words = de_ocr.split()
+            en_fixed = " ".join(self.correct_word(w, "en") for w in en_words)
+            de_fixed = " ".join(self.correct_word(w, "de") for w in de_words)
+
+            en_ok = en_fixed == exp_en
+            de_ok = de_fixed == exp_de
+            en_correct += en_ok
+            de_correct += de_ok
+
+            en_status = "OK" if en_ok else "MISS"
+            de_status = "OK" if de_ok else "MISS"
+            print(f"  EN {en_status}: '{en_ocr}' → '{en_fixed}' (expected '{exp_en}')")
+            print(f"  DE {de_status}: '{de_ocr}' → '{de_fixed}' (expected '{exp_de}')")
+
+        print(f"\nEN corrections: {en_correct}/{total} correct")
+        print(f"DE corrections: {de_correct}/{total} correct")
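Reviewer note on the `umlaut` cases in `OCR_TEST_CASES`: `test_umlaut_awareness` only prints what the generic edit-distance candidates look like. A minimal sketch of a more targeted check follows, assuming one simply tries every a/o/u → ä/ö/ü substitution against a word list. This is not part of the commit. `UMLAUT_MAP`, `umlaut_variants`, `restore_umlauts` and the `known` set are hypothetical names, and a real version would presumably query the German pyspellchecker dictionary instead of a plain set.

```python
from itertools import product
from typing import List, Set

# Hypothetical mapping: ASCII vowels that OCR may have produced from an umlaut.
UMLAUT_MAP = {"a": "ä", "o": "ö", "u": "ü"}


def umlaut_variants(word: str) -> List[str]:
    """All lowercase spellings of `word` with any subset of a/o/u umlauted."""
    options = [
        (ch, UMLAUT_MAP[ch]) if ch in UMLAUT_MAP else (ch,)
        for ch in word.lower()
    ]
    return ["".join(combo) for combo in product(*options)]


def restore_umlauts(word: str, known: Set[str]) -> str:
    """Return the umlauted form of `word` if exactly one variant is known."""
    lower = word.lower()
    if lower in known:
        # Already a valid word (e.g. "schon"); context would be needed to change it.
        return word
    hits = [v for v in umlaut_variants(word) if v in known]
    if len(hits) != 1:
        return word  # no match or ambiguous: leave untouched
    fix = hits[0]
    # Preserve a leading capital, as the tests above do.
    return fix[0].upper() + fix[1:] if word[0].isupper() else fix
```

With a toy set such as `{"schüler", "über", "schon", "schön"}`, `restore_umlauts("Schuler", known)` yields `"Schüler"`, while `"schon"` is left alone because it is itself a valid word, which matches the caveat noted in the test data. The variant count grows exponentially with the number of a/o/u letters, so this only suits single words.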
--- a/klausur-service/backend/tests/test_unified_grid.py
+++ b/klausur-service/backend/tests/test_unified_grid.py
@@ -0,0 +1,141 @@
+"""Tests for unified_grid.py — merging multi-zone grids into single zone."""
+
+import pytest
+import sys, os
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+from unified_grid import (
+    _compute_dominant_row_height,
+    _classify_boxes,
+    build_unified_grid,
+)
+
+
+def _make_content_zone(rows_y, num_cols=4, bbox_w=1400):
+    """Helper: create a content zone with rows at given y positions."""
+    rows = [{"index": i, "y_min_px": y, "y_max_px": y + 30, "y_min": y, "y_max": y + 30,
+             "is_header": False} for i, y in enumerate(rows_y)]
+    cols = [{"index": i, "x_min_px": i * (bbox_w // num_cols), "x_max_px": (i + 1) * (bbox_w // num_cols)}
+            for i in range(num_cols)]
+    cells = [{"row_index": r["index"], "col_index": c["index"], "col_type": f"column_{c['index']+1}",
+              "text": f"R{r['index']}C{c['index']}", "cell_id": f"R{r['index']}C{c['index']}",
+              "word_boxes": [], "confidence": 90, "is_bold": False, "ocr_engine": "test",
+              "bbox_px": {"x": 0, "y": 0, "w": 100, "h": 30}, "bbox_pct": {"x": 0, "y": 0, "w": 10, "h": 2}}
+             for r in rows for c in cols]
+    return {
+        "zone_index": 1, "zone_type": "content",
+        "bbox_px": {"x": 50, "y": rows_y[0] - 10, "w": bbox_w, "h": rows_y[-1] - rows_y[0] + 50},
+        "bbox_pct": {"x": 3, "y": 10, "w": 85, "h": 80},
+        "columns": cols, "rows": rows, "cells": cells,
+        "header_rows": [], "border": None, "word_count": len(cells),
+    }
+
+
+def _make_box_zone(zone_index, bbox, cells_data, bg_hex="#2563eb", layout_type="flowing"):
+    """Helper: create a box zone."""
+    rows = [{"index": i, "y_min_px": bbox["y"] + i * 30, "y_max_px": bbox["y"] + (i + 1) * 30,
+             "is_header": i == 0} for i in range(len(cells_data))]
+    cols = [{"index": 0, "x_min_px": bbox["x"], "x_max_px": bbox["x"] + bbox["w"]}]
+    cells = [{"row_index": i, "col_index": 0, "col_type": "column_1",
+              "text": text, "cell_id": f"Z{zone_index}_R{i}C0",
+              "word_boxes": [], "confidence": 90, "is_bold": False, "ocr_engine": "test",
+              "bbox_px": {"x": bbox["x"], "y": bbox["y"] + i * 30, "w": bbox["w"], "h": 30},
+              "bbox_pct": {"x": 50, "y": 50, "w": 30, "h": 10}}
+             for i, text in enumerate(cells_data)]
+    return {
+        "zone_index": zone_index, "zone_type": "box",
+        "bbox_px": bbox, "bbox_pct": {"x": 50, "y": 50, "w": 30, "h": 10},
+        "columns": cols, "rows": rows, "cells": cells,
+        "header_rows": [0], "border": None, "word_count": len(cells),
+        "box_bg_hex": bg_hex, "box_bg_color": "blue", "box_layout_type": layout_type,
+    }
+
+
+class TestDominantRowHeight:
+
+    def test_regular_spacing(self):
+        """Rows with uniform spacing → median = that spacing."""
+        zone = _make_content_zone([100, 147, 194, 241, 288])
+        h = _compute_dominant_row_height(zone)
+        assert h == 47
+
+    def test_filters_large_gaps(self):
+        """Large gaps (box interruptions) are filtered out."""
+        zone = _make_content_zone([100, 147, 194, 600, 647, 694])
+        # spacings: 47, 47, 406(!), 47, 47 → filter >100 → median of [47,47,47,47] = 47
+        h = _compute_dominant_row_height(zone)
+        assert h == 47
+
+    def test_single_row(self):
+        """Single row → default 47."""
+        zone = _make_content_zone([100])
+        h = _compute_dominant_row_height(zone)
+        assert h == 47.0
+
+
+class TestClassifyBoxes:
+
+    def test_full_width(self):
+        """Box at least 85% as wide as the content → full_width."""
+        boxes = [_make_box_zone(2, {"x": 50, "y": 500, "w": 1300, "h": 200}, ["Header", "Text"])]
+        result = _classify_boxes(boxes, content_width=1400)
+        assert result[0]["classification"] == "full_width"
+
+    def test_partial_width_right(self):
+        """Narrow box on right side → partial_width, side=right."""
+        boxes = [_make_box_zone(2, {"x": 800, "y": 500, "w": 500, "h": 200}, ["Header", "Text"])]
+        result = _classify_boxes(boxes, content_width=1400)
+        assert result[0]["classification"] == "partial_width"
+        assert result[0]["side"] == "right"
+
+    def test_partial_width_left(self):
+        """Narrow box on left side → partial_width, side=left."""
+        boxes = [_make_box_zone(2, {"x": 50, "y": 500, "w": 500, "h": 200}, ["Header", "Text"])]
+        result = _classify_boxes(boxes, content_width=1400)
+        assert result[0]["classification"] == "partial_width"
+        assert result[0]["side"] == "left"
+
+    def test_text_line_count(self):
+        """Total text lines counted including \\n."""
+        boxes = [_make_box_zone(2, {"x": 50, "y": 500, "w": 500, "h": 200},
+                                ["Header", "Line1\nLine2\nLine3"])]
+        result = _classify_boxes(boxes, content_width=1400)
+        assert result[0]["total_lines"] == 4  # "Header" (1) + "Line1\nLine2\nLine3" (3)
+
+
+class TestBuildUnifiedGrid:
+
+    def test_content_only(self):
+        """Content zone without boxes → single unified zone."""
+        content = _make_content_zone([100, 147, 194, 241])
+        result = build_unified_grid([content], 1600, 2200, {})
+        assert result["is_unified"] is True
+        assert len(result["zones"]) == 1
+        assert result["zones"][0]["zone_type"] == "unified"
+        assert result["summary"]["total_rows"] == 4
+
+    def test_full_width_box_integration(self):
+        """Full-width box rows are integrated into unified grid."""
+        content = _make_content_zone([100, 147, 194, 600, 647])
+        box = _make_box_zone(2, {"x": 50, "y": 300, "w": 1300, "h": 200},
+                             ["Box Header", "Box Row 1", "Box Row 2"])
+        result = build_unified_grid([content, box], 1600, 2200, {})
+        assert result["is_unified"] is True
+        total_rows = result["summary"]["total_rows"]
+        # 5 content rows + 3 box rows = 8
+        assert total_rows == 8
+
+    def test_box_cells_tagged(self):
+        """Box-origin cells have source_zone_type and box_region."""
+        content = _make_content_zone([100, 147, 600, 647])
+        box = _make_box_zone(2, {"x": 50, "y": 300, "w": 1300, "h": 200}, ["Box Text"])
+        result = build_unified_grid([content, box], 1600, 2200, {})
+        box_cells = [c for c in result["zones"][0]["cells"] if c.get("source_zone_type") == "box"]
+        assert len(box_cells) > 0
+        assert box_cells[0]["box_region"]["bg_hex"] == "#2563eb"
+
+    def test_no_content_zone(self):
+        """No content zone → returns zones as-is."""
+        box = _make_box_zone(2, {"x": 50, "y": 300, "w": 500, "h": 200}, ["Text"])
+        result = build_unified_grid([box], 1600, 2200, {})
+        assert "zones" in result
--- a/klausur-service/backend/tests/test_word_split.py
+++ b/klausur-service/backend/tests/test_word_split.py
@@ -0,0 +1,104 @@
+"""Tests for merged-word splitting in cv_review.py.
+
+The OCR sometimes merges adjacent words when character spacing is tight,
+e.g. "atmyschool" → "at my school".  The _try_split_merged_word() function
+uses dynamic programming + dictionary lookup to find valid splits.
+"""
+
+import pytest
+
+from cv_review import _try_split_merged_word, _spell_dict_knows, _SPELL_AVAILABLE
+
+pytestmark = pytest.mark.skipif(
+    not _SPELL_AVAILABLE,
+    reason="pyspellchecker not installed",
+)
+
+
+class TestTrySplitMergedWord:
+    """Tests for _try_split_merged_word()."""
+
+    # --- Should split ---
+
+    def test_atmyschool(self):
+        result = _try_split_merged_word("atmyschool")
+        assert result is not None
+        words = result.lower().split()
+        assert "at" in words
+        assert "my" in words
+        assert "school" in words
+
+    def test_goodidea(self):
+        result = _try_split_merged_word("goodidea")
+        assert result is not None
+        assert "good" in result.lower()
+        assert "idea" in result.lower()
+
+    def test_comeon(self):
+        result = _try_split_merged_word("Comeon")
+        assert result is not None
+        assert result.startswith("Come")  # preserves casing
+        assert "on" in result.lower().split()
+
+    def test_youknowthe(self):
+        result = _try_split_merged_word("youknowthe")
+        assert result is not None
+        words = result.lower().split()
+        assert "you" in words
+        assert "know" in words
+        assert "the" in words
+
+    # --- Should NOT split ---
+
+    def test_known_word_unchanged(self):
+        """A known dictionary word should not be split."""
+        assert _try_split_merged_word("school") is None
+        assert _try_split_merged_word("beautiful") is None
+        assert _try_split_merged_word("together") is None
+
+    def test_anew(self):
+        """'anew' is ambiguous: keeping it or splitting it are both acceptable."""
+        result = _try_split_merged_word("anew")
+        # If "anew" is a known dictionary word, the function returns None.
+        # Otherwise the only acceptable split is "a new".
+        # The dictionary decides which of the two outcomes occurs.
+        assert result is None or result.lower().split() == ["a", "new"]
+
+    def test_imadea(self):
+        result = _try_split_merged_word("Imadea")
+        assert result is not None
+        assert "made" in result.lower() or "I" in result
+
+    def test_makeadecision(self):
+        result = _try_split_merged_word("makeadecision")
+        assert result is not None
+        assert "make" in result.lower()
+        assert "decision" in result.lower()
+
+    def test_short_word(self):
+        """Words < 4 chars should not be attempted."""
+        assert _try_split_merged_word("the") is None
+        assert _try_split_merged_word("at") is None
+
+    def test_nonsense(self):
+        """Random letter sequences should not produce a split."""
+        result = _try_split_merged_word("xyzqwk")
+        assert result is None
+
+    # --- Casing preservation ---
+
+    def test_preserves_capitalization(self):
+        result = _try_split_merged_word("Goodidea")
+        assert result is not None
+        assert result.startswith("Good")
+
+    # --- Edge cases ---
+
+    def test_empty_string(self):
+        assert _try_split_merged_word("") is None
+
+    def test_none_safe(self):
+        """Non-alpha input should be handled gracefully."""
+        # _try_split_merged_word is only called for .isalpha() tokens,
+        # but test robustness anyway
+        assert _try_split_merged_word("123") is None
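Reviewer note: the module docstring says `_try_split_merged_word()` combines dynamic programming with dictionary lookup, but `cv_review.py` itself is not shown in this part of the diff. The following is a minimal sketch of that general technique, not the actual implementation. `TOY_VOCAB`, `split_merged` and the `min_len` parameter are hypothetical, and the real function looks words up in pyspellchecker rather than a hard-coded set.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy vocabulary standing in for the spell-check dictionary.
TOY_VOCAB: Set[str] = {
    "a", "at", "my", "school", "good", "idea", "make", "decision", "the",
}


def split_merged(
    token: str, vocab: Set[str] = TOY_VOCAB, min_len: int = 4
) -> Optional[str]:
    """Split `token` into known words, or return None if no split exists."""
    lower = token.lower()
    # Too short, non-alphabetic, or already a known word: nothing to split.
    if len(lower) < min_len or not lower.isalpha() or lower in vocab:
        return None

    n = len(lower)
    # best[i] holds the (start, end) cuts covering lower[:i] with the fewest
    # words, or None if that prefix cannot be covered by known words.
    best: List[Optional[List[Tuple[int, int]]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is None or lower[start:end] not in vocab:
                continue
            candidate = prefix + [(start, end)]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate

    cuts = best[n]
    if not cuts or len(cuts) < 2:
        return None
    # Slice the original token so the input casing is preserved.
    return " ".join(token[s:e] for s, e in cuts)
```

Preferring the split with the fewest words is one plausible tie-breaker. It keeps `"makeadecision"` as `make a decision` rather than a chain of very short fragments, but the scoring used by the real function may differ.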
--- a/klausur-service/backend/unified_grid.py
+++ b/klausur-service/backend/unified_grid.py
@@ -0,0 +1,425 @@
+"""
+Unified Grid Builder — merges multi-zone grid into a single Excel-like grid.
+
+Takes content zone + box zones and produces one unified zone where:
+- All content rows use the dominant row height
+- Full-width boxes are integrated directly (box rows replace standard rows)
+- Partial-width boxes: extra rows inserted if box has more lines than standard
+- Box-origin cells carry metadata (bg_color, border) for visual distinction
+
+The result is a single-zone StructuredGrid that can be:
+- Rendered in an Excel-like editor
+- Exported to Excel/CSV
+- Edited with unified row/column numbering
+"""
+
+import logging
+import math
+import statistics
+from typing import Any, Dict, List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+
+def _compute_dominant_row_height(content_zone: Dict) -> float:
+    """Upper median of content row-to-row spacings, excluding box-gap jumps."""
+    rows = content_zone.get("rows", [])
+    if len(rows) < 2:
+        return 47.0
+
+    spacings = []
+    for i in range(len(rows) - 1):
+        y1 = rows[i].get("y_min_px", rows[i].get("y_min", 0))
+        y2 = rows[i + 1].get("y_min_px", rows[i + 1].get("y_min", 0))
+        d = y2 - y1
+        if 0 < d < 100:  # exclude box-gap jumps
+            spacings.append(d)
+
+    if not spacings:
+        return 47.0
+    spacings.sort()
+    return spacings[len(spacings) // 2]
+
+
+def _classify_boxes(
+    box_zones: List[Dict],
+    content_width: float,
+) -> List[Dict]:
+    """Classify each box as full_width or partial_width."""
+    result = []
+    for bz in box_zones:
+        bb = bz.get("bbox_px", {})
+        bw = bb.get("w", 0)
+        bx = bb.get("x", 0)
+
+        if bw >= content_width * 0.85:
+            classification = "full_width"
+            side = "center"
+        else:
+            classification = "partial_width"
+            # Determine the side; uses content_width / 2 as center, ignoring the content zone's x offset
+            page_center = content_width / 2
+            box_center = bx + bw / 2
+            side = "right" if box_center > page_center else "left"
+
+        # Count total text lines in box (including \n within cells)
+        total_lines = sum(
+            (c.get("text", "").count("\n") + 1)
+            for c in bz.get("cells", [])
+        )
+
+        result.append({
+            "zone": bz,
+            "classification": classification,
+            "side": side,
+            "y_start": bb.get("y", 0),
+            "y_end": bb.get("y", 0) + bb.get("h", 0),
+            "total_lines": total_lines,
+            "bg_hex": bz.get("box_bg_hex", ""),
+            "bg_color": bz.get("box_bg_color", ""),
+        })
+    return result
+
+
+def build_unified_grid(
+    zones: List[Dict],
+    image_width: int,
+    image_height: int,
+    layout_metrics: Dict,
+) -> Dict[str, Any]:
+    """Build a single-zone unified grid from multi-zone grid data.
+
+    Returns a StructuredGrid with one zone containing all rows and cells.
+    """
+    content_zone = None
+    box_zones = []
+    for z in zones:
+        if z.get("zone_type") == "content":
+            content_zone = z
+        elif z.get("zone_type") == "box":
+            box_zones.append(z)
+
+    if not content_zone:
+        logger.warning("build_unified_grid: no content zone found")
+        return {"zones": zones}  # fallback: return as-is
+
+    box_zones.sort(key=lambda b: b.get("bbox_px", {}).get("y", 0))
+
+    dominant_h = _compute_dominant_row_height(content_zone)
+    content_bbox = content_zone.get("bbox_px", {})
+    content_width = content_bbox.get("w", image_width)
+    content_x = content_bbox.get("x", 0)
+    content_cols = content_zone.get("columns", [])
+    num_cols = len(content_cols)
+
+    box_infos = _classify_boxes(box_zones, content_width)
+
+    logger.info(
+        "build_unified_grid: dominant_h=%.1f, %d content rows, %d boxes (%s)",
+        dominant_h, len(content_zone.get("rows", [])), len(box_infos),
+        [b["classification"] for b in box_infos],
+    )
+
+    # --- Build unified row list + cell list ---
+    unified_rows: List[Dict] = []
+    unified_cells: List[Dict] = []
+    unified_row_idx = 0
+
+    # Content rows and cells indexed by row_index
+    content_rows = content_zone.get("rows", [])
+    content_cells = content_zone.get("cells", [])
+    content_cells_by_row: Dict[int, List[Dict]] = {}
+    for c in content_cells:
+        content_cells_by_row.setdefault(c.get("row_index", -1), []).append(c)
+
+    # Track which content rows we've processed
+    content_row_ptr = 0
+
+    for bi, box_info in enumerate(box_infos):
+        bz = box_info["zone"]
+        by_start = box_info["y_start"]
+        by_end = box_info["y_end"]
+
+        # --- Add content rows ABOVE this box ---
+        while content_row_ptr < len(content_rows):
+            cr = content_rows[content_row_ptr]
+            cry = cr.get("y_min_px", cr.get("y_min", 0))
+            if cry >= by_start:
+                break
+            # Add this content row
+            _add_content_row(
+                unified_rows, unified_cells, unified_row_idx,
+                cr, content_cells_by_row, dominant_h, image_height,
+            )
+            unified_row_idx += 1
+            content_row_ptr += 1
+
+        # --- Add box rows ---
+        if box_info["classification"] == "full_width":
+            # Full-width box: integrate box rows directly
+            _add_full_width_box(
+                unified_rows, unified_cells, unified_row_idx,
+                bz, box_info, dominant_h, num_cols, image_height,
+            )
+            unified_row_idx += len(bz.get("rows", []))
+            # Skip content rows that overlap with this box
+            while content_row_ptr < len(content_rows):
+                cr = content_rows[content_row_ptr]
+                cry = cr.get("y_min_px", cr.get("y_min", 0))
+                if cry > by_end:
+                    break
+                content_row_ptr += 1
+
+        else:
+            # Partial-width box: merge with adjacent content rows
+            unified_row_idx = _add_partial_width_box(
+                unified_rows, unified_cells, unified_row_idx,
+                bz, box_info, content_rows, content_cells_by_row,
+                content_row_ptr, dominant_h, num_cols, image_height,
+                content_x, content_width,
+            )
+            # Advance content pointer past box region
+            while content_row_ptr < len(content_rows):
+                cr = content_rows[content_row_ptr]
+                cry = cr.get("y_min_px", cr.get("y_min", 0))
+                if cry > by_end:
+                    break
+                content_row_ptr += 1
+
+    # --- Add remaining content rows BELOW all boxes ---
+    while content_row_ptr < len(content_rows):
+        cr = content_rows[content_row_ptr]
+        _add_content_row(
+            unified_rows, unified_cells, unified_row_idx,
+            cr, content_cells_by_row, dominant_h, image_height,
+        )
+        unified_row_idx += 1
+        content_row_ptr += 1
+
+    # --- Build unified zone ---
+    unified_zone = {
+        "zone_index": 0,
+        "zone_type": "unified",
+        "bbox_px": content_bbox,
+        "bbox_pct": content_zone.get("bbox_pct", {}),
+        "border": None,
+        "word_count": sum(len(c.get("word_boxes", [])) for c in unified_cells),
+        "columns": content_cols,
+        "rows": unified_rows,
+        "cells": unified_cells,
+        "header_rows": [],
+    }
+
+    logger.info(
+        "build_unified_grid: %d unified rows, %d cells (from %d content + %d box zones)",
+        len(unified_rows), len(unified_cells),
+        len(content_rows), len(box_zones),
+    )
+
+    return {
+        "zones": [unified_zone],
+        "image_width": image_width,
+        "image_height": image_height,
+        "layout_metrics": layout_metrics,
+        "summary": {
+            "total_zones": 1,
+            "total_columns": num_cols,
+            "total_rows": len(unified_rows),
+            "total_cells": len(unified_cells),
+        },
+        "is_unified": True,
+        "dominant_row_h": dominant_h,
+    }
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _make_row(idx: int, y: float, h: float, img_h: int, is_header: bool = False) -> Dict:
+    return {
+        "index": idx,
+        "row_index": idx,
+        "y_min_px": round(y),
+        "y_max_px": round(y + h),
+        "y_min_pct": round(y / img_h * 100, 2) if img_h else 0,
+        "y_max_pct": round((y + h) / img_h * 100, 2) if img_h else 0,
+        "is_header": is_header,
+    }
+
+
+def _remap_cell(cell: Dict, new_row: int, new_col: int = None,
+                source_type: str = "content", box_region: Dict = None) -> Dict:
+    """Create a new cell dict with remapped indices."""
+    c = dict(cell)
+    c["row_index"] = new_row
+    if new_col is not None:
+        c["col_index"] = new_col
+    c["cell_id"] = f"U_R{new_row:02d}_C{c.get('col_index', 0)}"
+    c["source_zone_type"] = source_type
+    if box_region:
+        c["box_region"] = box_region
+    return c
+
+
+def _add_content_row(
+    unified_rows, unified_cells, row_idx,
+    content_row, cells_by_row, dominant_h, img_h,
+):
+    """Add a single content row to the unified grid."""
+    y = content_row.get("y_min_px", content_row.get("y_min", 0))
+    is_hdr = content_row.get("is_header", False)
+    unified_rows.append(_make_row(row_idx, y, dominant_h, img_h, is_hdr))
+
+    for cell in cells_by_row.get(content_row.get("index", -1), []):
+        unified_cells.append(_remap_cell(cell, row_idx, source_type="content"))
+
+
+def _add_full_width_box(
+    unified_rows, unified_cells, start_row_idx,
+    box_zone, box_info, dominant_h, num_cols, img_h,
+):
+    """Add a full-width box's rows to the unified grid."""
+    box_rows = box_zone.get("rows", [])
+    box_cells = box_zone.get("cells", [])
+    box_region = {"bg_hex": box_info["bg_hex"], "bg_color": box_info["bg_color"], "border": True}
+
+    # Distribute box height evenly among its rows
+    box_h = box_info["y_end"] - box_info["y_start"]
+    row_h = box_h / len(box_rows) if box_rows else dominant_h
+
+    for i, br in enumerate(box_rows):
+        y = box_info["y_start"] + i * row_h
+        new_idx = start_row_idx + i
+        is_hdr = br.get("is_header", False)
+        unified_rows.append(_make_row(new_idx, y, row_h, img_h, is_hdr))
+
+        for cell in box_cells:
+            if cell.get("row_index") == br.get("index", i):
+                unified_cells.append(
+                    _remap_cell(cell, new_idx, source_type="box", box_region=box_region)
+                )
+
+
+def _add_partial_width_box(
+    unified_rows, unified_cells, start_row_idx,
+    box_zone, box_info, content_rows, content_cells_by_row,
+    content_row_ptr, dominant_h, num_cols, img_h,
+    content_x, content_width,
+) -> int:
+    """Add a partial-width box merged with content rows.
+
+    Returns the next unified_row_idx after processing.
+    """
+    by_start = box_info["y_start"]
+    by_end = box_info["y_end"]
+    box_h = by_end - by_start
+    box_region = {"bg_hex": box_info["bg_hex"], "bg_color": box_info["bg_color"], "border": True}
+
+    # Content rows in the box's Y range
+    overlap_content_rows = []
+    ptr = content_row_ptr
+    while ptr < len(content_rows):
+        cr = content_rows[ptr]
+        cry = cr.get("y_min_px", cr.get("y_min", 0))
+        if cry > by_end:
+            break
+        if cry >= by_start:
+            overlap_content_rows.append(cr)
+        ptr += 1
+
+    # How many standard rows fit in the box height
+    standard_rows = max(1, math.floor(box_h / dominant_h))
+    # How many text lines the box actually has
+    box_text_lines = box_info["total_lines"]
+    # Extra rows needed
+    extra_rows = max(0, box_text_lines - standard_rows)
+    total_rows_for_region = standard_rows + extra_rows
+
+    logger.info(
+        "partial box: standard=%d, box_lines=%d, extra=%d, content_overlap=%d",
+        standard_rows, box_text_lines, extra_rows, len(overlap_content_rows),
+    )
+
+    # Determine which columns the box occupies
+    box_bb = box_zone.get("bbox_px", {})
+    box_x = box_bb.get("x", 0)
+    box_w = box_bb.get("w", 0)
+
+    # Map box to content columns: find which content columns overlap
+    box_col_start = 0
+    box_col_end = num_cols
+    content_cols_list = []
+    for z_col_idx in range(num_cols):
+        # Find the column definition by checking all column entries
+        # Simple heuristic: if box starts past halfway, it's the right columns
+        pass
+
+    # Simpler approach: box on right side → last N columns
+    # box on left side → first N columns
+    if box_info["side"] == "right":
+        # Box starts at x=box_x. Find first content column that overlaps
+        box_col_start = num_cols  # default: beyond all columns
+        for z in (box_zone.get("columns") or [{"index": 0}]):
+            pass
+        # Use content column positions to determine overlap
+        content_cols_data = [
+            {"idx": c.get("index", i), "x_min": c.get("x_min_px", 0), "x_max": c.get("x_max_px", 0)}
+            for i, c in enumerate(content_rows[0:0] or [])  # placeholder
+        ]
+        # Simple: split columns at midpoint
+        box_col_start = num_cols // 2  # right half
+        box_col_end = num_cols
+    else:
+        box_col_start = 0
+        box_col_end = num_cols // 2
+
+    # Build rows for this region
+    box_cells = box_zone.get("cells", [])
+    box_rows = box_zone.get("rows", [])
+    row_idx = start_row_idx
+
+    # Expand box cell texts with \n into individual lines for row mapping
+    box_lines: List[Tuple[str, Dict]] = []  # (text_line, parent_cell)
+    for bc in sorted(box_cells, key=lambda c: c.get("row_index", 0)):
+        text = bc.get("text", "")
+        for line in text.split("\n"):
+            box_lines.append((line.strip(), bc))
+
+    for i in range(total_rows_for_region):
+        y = by_start + i * dominant_h
+        unified_rows.append(_make_row(row_idx, y, dominant_h, img_h))
+
+        # Content cells for this row (from overlapping content rows)
+        if i < len(overlap_content_rows):
+            cr = overlap_content_rows[i]
+            for cell in content_cells_by_row.get(cr.get("index", -1), []):
+                # Only include cells from columns NOT covered by the box
+                ci = cell.get("col_index", 0)
+                if ci < box_col_start or ci >= box_col_end:
+                    unified_cells.append(_remap_cell(cell, row_idx, source_type="content"))
+
+        # Box cell for this row
+        if i < len(box_lines):
+            line_text, parent_cell = box_lines[i]
+            box_cell = {
+                "cell_id": f"U_R{row_idx:02d}_C{box_col_start}",
+                "row_index": row_idx,
+                "col_index": box_col_start,
+                "col_type": "spanning_header" if (box_col_end - box_col_start) > 1 else parent_cell.get("col_type", "column_1"),
+                "colspan": box_col_end - box_col_start,
+                "text": line_text,
+                "confidence": parent_cell.get("confidence", 0),
+                "bbox_px": parent_cell.get("bbox_px", {}),
+                "bbox_pct": parent_cell.get("bbox_pct", {}),
+                "word_boxes": [],
+                "ocr_engine": parent_cell.get("ocr_engine", ""),
+                "is_bold": parent_cell.get("is_bold", False),
+                "source_zone_type": "box",
+                "box_region": box_region,
+            }
+            unified_cells.append(box_cell)
+
+        row_idx += 1
+
+    return row_idx
--- a/klausur-service/backend/vocab_learn_bridge.py
+++ b/klausur-service/backend/vocab_learn_bridge.py
@@ -0,0 +1,196 @@
+"""
+Vocab Learn Bridge — Converts vocabulary session data into Learning Units.
+
+Bridges klausur-service (vocab extraction) with backend-lehrer (learning units + generators).
+Creates a Learning Unit in backend-lehrer, then triggers MC/Cloze/QA generation.
+
+PRIVACY (Datenschutz): All communication stays within the Docker network (breakpilot-network).
+"""
+
+import os
+import json
+import logging
+import httpx
+from typing import List, Dict, Any, Optional
+
+logger = logging.getLogger(__name__)
+
+BACKEND_LEHRER_URL = os.getenv("BACKEND_LEHRER_URL", "http://backend-lehrer:8001")
+
+
+def vocab_to_analysis_data(session_name: str, vocabulary: List[Dict[str, Any]]) -> Dict[str, Any]:
+    """
+    Convert vocabulary entries from a vocab session into the analysis_data format
+    expected by backend-lehrer generators (MC, Cloze, QA).
+
+    The generators consume:
+    - title: Display name
+    - subject: Subject area
+    - grade_level: Target grade
+    - canonical_text: Full text representation
+    - printed_blocks: Individual text blocks
+    - vocabulary: Original vocab data (for vocab-specific modules)
+    """
+    canonical_lines = []
+    printed_blocks = []
+
+    for v in vocabulary:
+        en = v.get("english", "").strip()
+        de = v.get("german", "").strip()
+        example = v.get("example_sentence", "").strip()
+
+        if not en and not de:
+            continue
+
+        line = f"{en} = {de}"
+        if example:
+            line += f" ({example})"
+        canonical_lines.append(line)
+
+        block_text = f"{en} — {de}"
+        if example:
+            block_text += f" | {example}"
+        printed_blocks.append({"text": block_text})
+
+    return {
+        "title": session_name,
+        "subject": "English Vocabulary",
+        "grade_level": "5-8",
+        "canonical_text": "\n".join(canonical_lines),
+        "printed_blocks": printed_blocks,
+        "vocabulary": vocabulary,
+    }
+
+
+async def create_learning_unit(
+    session_name: str,
+    vocabulary: List[Dict[str, Any]],
+    grade: Optional[str] = None,
+) -> Dict[str, Any]:
+    """
+    Create a Learning Unit in backend-lehrer from vocabulary data.
+
+    Steps:
+    1. Create unit via POST /api/learning-units/
+    2. Save the analysis data as a JSON file for the generators
+
+    Returns dict with unit_id, unit, analysis_path, vocabulary_count, status.
+    """
+    if not vocabulary:
+        raise ValueError("No vocabulary entries provided")
+
+    analysis_data = vocab_to_analysis_data(session_name, vocabulary)
+
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        # 1. Create Learning Unit
+        create_payload = {
+            "title": session_name,
+            "subject": "Englisch",
+            "grade": grade or "5-8",
+        }
+
+        try:
+            resp = await client.post(
+                f"{BACKEND_LEHRER_URL}/api/learning-units/",
+                json=create_payload,
+            )
+            resp.raise_for_status()
+            unit = resp.json()
+        except httpx.HTTPError as e:
+            logger.error(f"Failed to create learning unit: {e}")
+            raise RuntimeError(f"Backend-Lehrer nicht erreichbar: {e}") from e
+
+        unit_id = unit.get("id")
+        if not unit_id:
+            raise RuntimeError("Learning Unit created but no ID returned")
+
+        logger.info(f"Created learning unit {unit_id} with {len(vocabulary)} vocabulary entries")
+
+        # 2. Save analysis_data as JSON file for generators
+        analysis_dir = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten")
+        os.makedirs(analysis_dir, exist_ok=True)
+        analysis_path = os.path.join(analysis_dir, f"{unit_id}_analyse.json")
+
+        with open(analysis_path, "w", encoding="utf-8") as f:
+            json.dump(analysis_data, f, ensure_ascii=False, indent=2)
+
+        logger.info(f"Saved analysis data to {analysis_path}")
+
+        return {
+            "unit_id": unit_id,
+            "unit": unit,
+            "analysis_path": analysis_path,
+            "vocabulary_count": len(vocabulary),
+            "status": "created",
+        }
+
+
+async def generate_learning_modules(
+    unit_id: str,
+    analysis_path: str,
+) -> Dict[str, Any]:
+    """
+    Trigger MC, Cloze, and QA generation from analysis data.
+
+    Calls the backend-lehrer API once per generator. A failure in one
+    generator is recorded in its result entry and does not abort the others.
+
+    Returns dict with generation results.
+    """
+    results = {
+        "unit_id": unit_id,
+        "mc": {"status": "pending"},
+        "cloze": {"status": "pending"},
+        "qa": {"status": "pending"},
+    }
+
+    # Load analysis data
+    with open(analysis_path, "r", encoding="utf-8") as f:
+        analysis_data = json.load(f)
+
+    # Try to generate via backend-lehrer API
+    async with httpx.AsyncClient(timeout=120.0) as client:
+        # Generate QA (includes Leitner fields)
+        try:
+            resp = await client.post(
+                f"{BACKEND_LEHRER_URL}/api/learning-units/{unit_id}/generate-qa",
+                json={"analysis_data": analysis_data, "num_questions": min(len(analysis_data.get("vocabulary", [])), 20)},
+            )
+            if resp.status_code == 200:
+                results["qa"] = {"status": "generated", "data": resp.json()}
+            else:
+                logger.warning(f"QA generation returned {resp.status_code}")
+                results["qa"] = {"status": "skipped", "reason": f"HTTP {resp.status_code}"}
+        except Exception as e:
+            logger.warning(f"QA generation failed: {e}")
+            results["qa"] = {"status": "error", "reason": str(e)}
+
+        # Generate MC
+        try:
+            resp = await client.post(
+                f"{BACKEND_LEHRER_URL}/api/learning-units/{unit_id}/generate-mc",
+                json={"analysis_data": analysis_data, "num_questions": min(len(analysis_data.get("vocabulary", [])), 10)},
+            )
+            if resp.status_code == 200:
+                results["mc"] = {"status": "generated", "data": resp.json()}
+            else:
+                results["mc"] = {"status": "skipped", "reason": f"HTTP {resp.status_code}"}
+        except Exception as e:
+            logger.warning(f"MC generation failed: {e}")
+            results["mc"] = {"status": "error", "reason": str(e)}
+
+        # Generate Cloze
+        try:
+            resp = await client.post(
+                f"{BACKEND_LEHRER_URL}/api/learning-units/{unit_id}/generate-cloze",
+                json={"analysis_data": analysis_data},
+            )
+            if resp.status_code == 200:
+                results["cloze"] = {"status": "generated", "data": resp.json()}
+            else:
+                results["cloze"] = {"status": "skipped", "reason": f"HTTP {resp.status_code}"}
+        except Exception as e:
+            logger.warning(f"Cloze generation failed: {e}")
+            results["cloze"] = {"status": "error", "reason": str(e)}
+
+    return results
--- a/klausur-service/backend/vocab_worksheet_api.py
+++ b/klausur-service/backend/vocab_worksheet_api.py
@@ -77,6 +77,11 @@ try:
        render_pdf_high_res,
        PageRegion, RowGeometry,
    )
+    from cv_cell_grid import (
+        _merge_wrapped_rows,
+        _merge_phonetic_continuation_rows,
+        _merge_continuation_rows,
+    )
    from ocr_pipeline_session_store import (
        create_session_db as create_pipeline_session_db,
        update_session_db as update_pipeline_session_db,
@@ -1283,12 +1288,18 @@ async def get_pdf_page_image(session_id: str, page_number: int, zoom: float = Qu
 async def process_single_page(
    session_id: str,
    page_number: int,
+    ipa_mode: str = Query("none", pattern="^(auto|all|de|en|none)$"),
+    syllable_mode: str = Query("none", pattern="^(auto|all|de|en|none)$"),
 ):
    """
-    Process a SINGLE page of an uploaded PDF using the OCR pipeline.
+    Process a SINGLE page of an uploaded PDF using the Kombi OCR pipeline.

-    Uses the multi-step CV pipeline (deskew → dewarp → columns → rows → words)
-    instead of LLM vision for much better extraction quality.
+    Uses the full Kombi pipeline (orientation → deskew → dewarp → crop →
+    dual-engine OCR → grid-build with autocorrect/merge) for best quality.
+
+    Query params:
+        ipa_mode: "none" (default), "auto", "all", "en", "de"
+        syllable_mode: "none" (default), "auto", "all", "en", "de"

    The frontend should call this sequentially for each page.
    Returns the vocabulary for just this one page.
@@ -1296,7 +1307,10 @@ async def process_single_page(
    logger.info(f"Processing SINGLE page {page_number + 1} for session {session_id}")

    if session_id not in _sessions:
-        raise HTTPException(status_code=404, detail="Session not found")
+        raise HTTPException(
+            status_code=404,
+            detail="Session nicht im Speicher. Bitte erstellen Sie eine neue Session und laden Sie das PDF erneut hoch.",
+        )

    session = _sessions[session_id]
    pdf_data = session.get("pdf_data")
@@ -1316,6 +1330,7 @@ async def process_single_page(
            img_bgr = render_pdf_high_res(pdf_data, page_number, zoom=3.0)
            page_vocabulary, rotation_deg = await _run_ocr_pipeline_for_page(
                img_bgr, page_number, session_id,
+                ipa_mode=ipa_mode, syllable_mode=syllable_mode,
            )
        except Exception as e:
            logger.error(f"OCR pipeline failed for page {page_number + 1}: {e}", exc_info=True)
@@ -1384,28 +1399,33 @@ async def _run_ocr_pipeline_for_page(
    img_bgr: np.ndarray,
    page_number: int,
    vocab_session_id: str,
+    *,
+    ipa_mode: str = "none",
+    syllable_mode: str = "none",
 ) -> tuple:
-    """Run the full OCR pipeline on a single page image and return vocab entries.
+    """Run the full Kombi OCR pipeline on a single page and return vocab entries.

-    Uses the same pipeline as the admin OCR pipeline (ocr_pipeline_api.py).
+    Uses the same pipeline as the admin OCR Kombi pipeline:
+    orientation → deskew → dewarp → crop → dual-engine OCR → grid-build
+    (with pipe-autocorrect, word-gap merge, dictionary detection, etc.)

    Args:
-        img_bgr: BGR numpy array (from render_pdf_high_res, same as admin pipeline).
+        img_bgr: BGR numpy array.
        page_number: 0-indexed page number.
        vocab_session_id: Vocab session ID for logging.
+        ipa_mode: "none" (default for worksheets), "auto", "all", "en", "de".
+        syllable_mode: "none" (default for worksheets), "auto", "all", "en", "de".

-    Steps: deskew → dewarp → columns → rows → words → (LLM review)
    Returns (entries, rotation_deg) where entries is a list of dicts and
    rotation_deg is the orientation correction applied (0, 90, 180, 270).
    """
    import time as _time

    t_total = _time.time()
-
    img_h, img_w = img_bgr.shape[:2]
-    logger.info(f"OCR Pipeline page {page_number + 1}: image {img_w}x{img_h}")
+    logger.info(f"Kombi Pipeline page {page_number + 1}: image {img_w}x{img_h}")

-    # 1b. Orientation detection (fix upside-down scans)
+    # 1. Orientation detection (fix upside-down scans)
    t0 = _time.time()
    img_bgr, rotation = detect_and_fix_orientation(img_bgr)
    if rotation:
@@ -1414,7 +1434,7 @@ async def _run_ocr_pipeline_for_page(
    else:
        logger.info(f"  orientation: OK ({_time.time() - t0:.1f}s)")

-    # 2. Create pipeline session in DB (for debugging in admin UI)
+    # 2. Create pipeline session in DB (visible in admin Kombi UI)
    pipeline_session_id = str(uuid.uuid4())
    try:
        _, png_buf = cv2.imencode(".png", img_bgr)
@@ -1428,155 +1448,299 @@ async def _run_ocr_pipeline_for_page(
    except Exception as e:
        logger.warning(f"Could not create pipeline session in DB: {e}")

-    # 3. Three-pass deskew: iterative + word-alignment + text-line regression
+    # 3. Three-pass deskew
    t0 = _time.time()
    deskewed_bgr, angle_applied, deskew_debug = deskew_two_pass(img_bgr.copy())
-    angle_pass1 = deskew_debug.get("pass1_angle", 0.0)
-    angle_pass2 = deskew_debug.get("pass2_angle", 0.0)
-    angle_pass3 = deskew_debug.get("pass3_angle", 0.0)
-
-    logger.info(f"  deskew: p1={angle_pass1:.2f} p2={angle_pass2:.2f} "
-                f"p3={angle_pass3:.2f} total={angle_applied:.2f} "
-                f"({_time.time() - t0:.1f}s)")
+    logger.info(f"  deskew: angle={angle_applied:.2f} ({_time.time() - t0:.1f}s)")

    # 4. Dewarp
    t0 = _time.time()
    dewarped_bgr, dewarp_info = dewarp_image(deskewed_bgr)
    logger.info(f"  dewarp: shear={dewarp_info['shear_degrees']:.3f} ({_time.time() - t0:.1f}s)")

-    # 5. Column detection
+    # 5. Content crop (removes scanner borders, gutter shadows)
    t0 = _time.time()
-    ocr_img = create_ocr_image(dewarped_bgr)
-    h, w = ocr_img.shape[:2]
-
-    geo_result = detect_column_geometry(ocr_img, dewarped_bgr)
-    if geo_result is None:
-        layout_img = create_layout_image(dewarped_bgr)
-        regions = analyze_layout(layout_img, ocr_img)
-        word_dicts = None
-        inv = None
-        content_bounds = None
-    else:
-        geometries, left_x, right_x, top_y, bottom_y, word_dicts, inv = geo_result
-        content_w = right_x - left_x
-        header_y, footer_y = _detect_header_footer_gaps(inv, w, h) if inv is not None else (None, None)
-        geometries = _detect_sub_columns(geometries, content_w, left_x=left_x,
-                                          top_y=top_y, header_y=header_y, footer_y=footer_y)
-        geometries = _split_broad_columns(geometries, content_w, left_x=left_x)
-        geometries = expand_narrow_columns(geometries, content_w, left_x, word_dicts)
-        content_h = bottom_y - top_y
-        regions = positional_column_regions(geometries, content_w, content_h, left_x)
-        content_bounds = (left_x, right_x, top_y, bottom_y)
-
-    logger.info(f"  columns: {len(regions)} detected ({_time.time() - t0:.1f}s)")
-
-    # 6. Row detection
-    t0 = _time.time()
-    if word_dicts is None or inv is None or content_bounds is None:
-        # Re-run geometry detection to get intermediates
-        geo_result2 = detect_column_geometry(ocr_img, dewarped_bgr)
-        if geo_result2 is None:
-            raise ValueError("Column geometry detection failed — cannot detect rows")
-        _, left_x, right_x, top_y, bottom_y, word_dicts, inv = geo_result2
-        content_bounds = (left_x, right_x, top_y, bottom_y)
-
-    left_x, right_x, top_y, bottom_y = content_bounds
-    rows = detect_row_geometry(inv, word_dicts, left_x, right_x, top_y, bottom_y)
-    logger.info(f"  rows: {len(rows)} detected ({_time.time() - t0:.1f}s)")
-
-    # 7. Word recognition (cell-first OCR v2)
-    t0 = _time.time()
-    col_regions = regions  # already PageRegion objects
-
-    # Populate row.words for word_count filtering
-    for row in rows:
-        row_y_rel = row.y - top_y
-        row_bottom_rel = row_y_rel + row.height
-        row.words = [
-            wd for wd in word_dicts
-            if row_y_rel <= wd['top'] + wd['height'] / 2 < row_bottom_rel
-        ]
-        row.word_count = len(row.words)
-
-    cells, columns_meta = build_cell_grid_v2(
-        ocr_img, col_regions, rows, img_w, img_h,
-        ocr_engine="auto", img_bgr=dewarped_bgr,
-    )
-
-    col_types = {c['type'] for c in columns_meta}
-    is_vocab = bool(col_types & {'column_en', 'column_de'})
-    logger.info(f"  words: {len(cells)} cells, vocab={is_vocab} ({_time.time() - t0:.1f}s)")
-
-    if not is_vocab:
-        logger.warning(f"  Page {page_number + 1}: layout is not vocab table "
-                       f"(types: {col_types}), returning empty")
-        return [], rotation
-
-    # 8. Map cells → vocab entries
-    entries = _cells_to_vocab_entries(cells, columns_meta)
-    entries = _fix_phonetic_brackets(entries, pronunciation="british")
-
-    # 9. Optional LLM review
    try:
-        review_result = await llm_review_entries(entries)
-        if review_result and review_result.get("changes"):
-            # Apply corrections
-            changes_map = {}
-            for ch in review_result["changes"]:
-                idx = ch.get("index")
-                if idx is not None:
-                    changes_map[idx] = ch
-            for idx, ch in changes_map.items():
-                if 0 <= idx < len(entries):
-                    for field in ("english", "german", "example"):
-                        if ch.get(field) and ch[field] != entries[idx].get(field):
-                            entries[idx][field] = ch[field]
-            logger.info(f"  llm review: {len(review_result['changes'])} corrections applied")
+        from page_crop import detect_and_crop_page
+        cropped_bgr, crop_result = detect_and_crop_page(dewarped_bgr)
+        if crop_result.get("crop_applied"):
+            dewarped_bgr = cropped_bgr
+            logger.info(f"  crop: applied ({_time.time() - t0:.1f}s)")
+        else:
+            logger.info(f"  crop: skipped ({_time.time() - t0:.1f}s)")
    except Exception as e:
-        logger.warning(f"  llm review skipped: {e}")
+        logger.warning(f"  crop: failed ({e}), continuing with uncropped image")

-    # 10. Map to frontend format
-    page_vocabulary = []
-    for entry in entries:
-        if not entry.get("english") and not entry.get("german"):
-            continue  # skip empty rows
-        page_vocabulary.append({
-            "id": str(uuid.uuid4()),
-            "english": entry.get("english", ""),
-            "german": entry.get("german", ""),
-            "example_sentence": entry.get("example", ""),
-            "source_page": page_number + 1,
+    # 6. Dual-engine OCR (RapidOCR + Tesseract → merge)
+    t0 = _time.time()
+    img_h, img_w = dewarped_bgr.shape[:2]
+
+    # RapidOCR (local ONNX)
+    try:
+        from cv_ocr_engines import ocr_region_rapid
+        from cv_vocab_types import PageRegion
+        full_region = PageRegion(type="full_page", x=0, y=0, width=img_w, height=img_h)
+        rapid_words = ocr_region_rapid(dewarped_bgr, full_region) or []
+    except Exception as e:
+        logger.warning(f"  RapidOCR failed: {e}")
+        rapid_words = []
+
+    # Tesseract
+    from PIL import Image
+    import pytesseract
+    pil_img = Image.fromarray(cv2.cvtColor(dewarped_bgr, cv2.COLOR_BGR2RGB))
+    data = pytesseract.image_to_data(
+        pil_img, lang="eng+deu", config="--psm 6 --oem 3",
+        output_type=pytesseract.Output.DICT,
+    )
+    tess_words = []
+    for i in range(len(data["text"])):
+        text = str(data["text"][i]).strip()
+        conf_raw = str(data["conf"][i])
+        conf = int(conf_raw) if conf_raw.lstrip("-").isdigit() else -1
+        if not text or conf < 20:
+            continue
+        tess_words.append({
+            "text": text,
+            "left": data["left"][i], "top": data["top"][i],
+            "width": data["width"][i], "height": data["height"][i],
+            "conf": conf,
        })

-    # 11. Update pipeline session in DB (for admin debugging)
-    try:
-        success_dsk, dsk_buf = cv2.imencode(".png", deskewed_bgr)
-        deskewed_png = dsk_buf.tobytes() if success_dsk else None
-        success_dwp, dwp_buf = cv2.imencode(".png", dewarped_bgr)
-        dewarped_png = dwp_buf.tobytes() if success_dwp else None
+    # Merge dual-engine results
+    from ocr_pipeline_ocr_merge import _split_paddle_multi_words, _merge_paddle_tesseract, _deduplicate_words
+    from cv_words_first import build_grid_from_words

+    rapid_split = _split_paddle_multi_words(rapid_words) if rapid_words else []
+    if rapid_split or tess_words:
+        merged_words = _merge_paddle_tesseract(rapid_split, tess_words)
+        merged_words = _deduplicate_words(merged_words)
+    else:
+        merged_words = tess_words  # both engines returned no words (empty list)
+
+    # Build initial grid from merged words
+    cells, columns_meta = build_grid_from_words(merged_words, img_w, img_h)
+    for cell in cells:
+        cell["ocr_engine"] = "rapid_kombi"
+
+    n_rows = len(set(c["row_index"] for c in cells)) if cells else 0
+    n_cols = len(columns_meta)
+    logger.info(f"  ocr: rapid={len(rapid_words)}, tess={len(tess_words)}, "
+                f"merged={len(merged_words)}, cells={len(cells)} ({_time.time() - t0:.1f}s)")
+
+    # 7. Build word_result for the pipeline session (needed by _build_grid_core)
+    word_result = {
+        "cells": cells,
+        "grid_shape": {"rows": n_rows, "cols": n_cols, "total_cells": len(cells)},
+        "columns_used": columns_meta,
+        "layout": "vocab" if {c.get("type") for c in columns_meta} & {"column_en", "column_de"} else "generic",
+        "image_width": img_w,
+        "image_height": img_h,
+        "duration_seconds": 0,
+        "ocr_engine": "rapid_kombi",
+        "raw_tesseract_words": tess_words,
+        "summary": {
+            "total_cells": len(cells),
+            "non_empty_cells": sum(1 for c in cells if c.get("text")),
+        },
+    }
+
+    # Save images + word_result to pipeline session for admin visibility
+    try:
+        _, dsk_buf = cv2.imencode(".png", deskewed_bgr)
+        _, dwp_buf = cv2.imencode(".png", dewarped_bgr)
        await update_pipeline_session_db(
            pipeline_session_id,
-            deskewed_png=deskewed_png,
-            dewarped_png=dewarped_png,
+            deskewed_png=dsk_buf.tobytes(),
+            dewarped_png=dwp_buf.tobytes(),
+            cropped_png=dwp_buf.tobytes(),  # same dewarped image; reuse the buffer encoded above
+            word_result=word_result,
            deskew_result={"angle_applied": round(angle_applied, 3)},
            dewarp_result={"shear_degrees": dewarp_info.get("shear_degrees", 0)},
-            column_result={"columns": [{"type": r.type, "x": r.x, "y": r.y,
-                                         "width": r.width, "height": r.height}
-                                        for r in col_regions]},
-            row_result={"total_rows": len(rows)},
-            word_result={
-                "entry_count": len(page_vocabulary),
-                "layout": "vocab",
-                "vocab_entries": entries,
-            },
-            current_step=6,
+            current_step=8,
        )
    except Exception as e:
        logger.warning(f"Could not update pipeline session: {e}")

+    # 8. Run full grid-build (with pipe-autocorrect, word-gap merge, etc.)
+    t0 = _time.time()
+    try:
+        from grid_editor_api import _build_grid_core
+        session_data = {
+            "word_result": word_result,
+        }
+        grid_result = await _build_grid_core(
+            pipeline_session_id, session_data,
+            ipa_mode=ipa_mode, syllable_mode=syllable_mode,
+        )
+        logger.info(f"  grid-build: {grid_result.get('summary', {}).get('total_cells', 0)} cells "
+                    f"({_time.time() - t0:.1f}s)")
+
+        # Save grid result to pipeline session
+        try:
+            await update_pipeline_session_db(
+                pipeline_session_id,
+                grid_editor_result=grid_result,
+                current_step=11,
+            )
+        except Exception as e:
+            logger.warning(f"  could not save grid result to pipeline session: {e}")
+
+    except Exception as e:
+        logger.warning(f"  grid-build failed: {e}, falling back to basic grid")
+        grid_result = None
+
+    # 9. Extract vocab entries
+    # Prefer grid-build result (better column detection, more cells) over
+    # the initial build_grid_from_words() which often under-clusters.
+    page_vocabulary = []
+    extraction_source = "none"
+
+    # A) Try grid-build zones first (best quality: 4-column detection, autocorrect)
+    if grid_result and grid_result.get("zones"):
+        for zone in grid_result["zones"]:
+            zone_cols = zone.get("columns", [])
+            zone_cells = zone.get("cells", [])
+            if not zone_cols or not zone_cells:
+                continue
+
+            # Sort columns by x position to determine roles
+            sorted_cols = sorted(zone_cols, key=lambda c: c.get("x_min_px", 0))
+            col_idx_to_pos = {}
+            for pos, col in enumerate(sorted_cols):
+                ci = col.get("col_index", col.get("index", -1))
+                col_idx_to_pos[ci] = pos
+
+            # Skip zones with only 1 column (likely headers/boxes)
+            if len(sorted_cols) < 2:
+                continue
+
+            # Group cells by row
+            rows_map: dict = {}
+            for cell in zone_cells:
+                ri = cell.get("row_index", 0)
+                if ri not in rows_map:
+                    rows_map[ri] = {}
+                ci = cell.get("col_index", 0)
+                rows_map[ri][ci] = (cell.get("text") or "").strip()
+
+            n_cols = len(sorted_cols)
+            for ri in sorted(rows_map.keys()):
+                row = rows_map[ri]
+                # Collect texts in column-position order
+                texts = []
+                for col in sorted_cols:
+                    ci = col.get("col_index", col.get("index", -1))
+                    texts.append(row.get(ci, ""))
+
+                if not any(texts):
+                    continue
+
+                # Map by position, skipping narrow first column (page refs/markers)
+                # Heuristic: if first column is very narrow (<15% of zone width),
+                # it's likely a marker/ref column — skip it for vocab
+                first_col_width = sorted_cols[0].get("x_max_px", 0) - sorted_cols[0].get("x_min_px", 0)
+                zone_width = max(1, (sorted_cols[-1].get("x_max_px", 0) - sorted_cols[0].get("x_min_px", 0)))
+                skip_first = first_col_width / zone_width < 0.15 and n_cols >= 3
+
+                data_texts = texts[1:] if skip_first else texts
+
+                entry = {
+                    "id": str(uuid.uuid4()),
+                    "english": data_texts[0] if len(data_texts) > 0 else "",
+                    "german": data_texts[1] if len(data_texts) > 1 else "",
+                    "example_sentence": " ".join(t for t in data_texts[2:] if t) if len(data_texts) > 2 else "",
+                    "source_page": page_number + 1,
+                }
+                if entry["english"] or entry["german"]:
+                    page_vocabulary.append(entry)
+
+        if page_vocabulary:
+            extraction_source = f"grid-zones ({len(grid_result['zones'])} zones)"
+
+    # B) Fallback: original cells with column classification
+    if not page_vocabulary:
+        col_types = {c.get("type") for c in columns_meta}
+        is_vocab = bool(col_types & {"column_en", "column_de"})
+
+        if is_vocab:
+            entries = _cells_to_vocab_entries(cells, columns_meta)
+            entries = _fix_phonetic_brackets(entries, pronunciation="british")
+            for entry in entries:
+                if not entry.get("english") and not entry.get("german"):
+                    continue
+                page_vocabulary.append({
+                    "id": str(uuid.uuid4()),
+                    "english": entry.get("english", ""),
+                    "german": entry.get("german", ""),
+                    "example_sentence": entry.get("example", ""),
+                    "source_page": page_number + 1,
+                })
+            extraction_source = f"classified ({len(columns_meta)} cols)"
+        else:
+            # Last resort: all cells by position
+            rows_map2: dict = {}
+            for cell in cells:
+                ri = cell.get("row_index", 0)
+                if ri not in rows_map2:
+                    rows_map2[ri] = {}
+                ci = cell.get("col_index", 0)
+                rows_map2[ri][ci] = (cell.get("text") or "").strip()
+            all_ci = sorted({ci for r in rows_map2.values() for ci in r.keys()})
+            for ri in sorted(rows_map2.keys()):
+                row = rows_map2[ri]
+                texts = [row.get(ci, "") for ci in all_ci]
+                if not any(texts):
+                    continue
+                page_vocabulary.append({
+                    "id": str(uuid.uuid4()),
+                    "english": texts[0] if len(texts) > 0 else "",
+                    "german": texts[1] if len(texts) > 1 else "",
+                    "example_sentence": " ".join(t for t in texts[2:] if t) if len(texts) > 2 else "",
+                    "source_page": page_number + 1,
+                })
+            extraction_source = f"generic ({len(all_ci)} cols)"
+
+    # --- Post-processing: merge cell-wrap continuation rows ---
+    if len(page_vocabulary) >= 2:
+        try:
+            # Convert to internal format (example_sentence → example)
+            internal = []
+            for v in page_vocabulary:
+                internal.append({
+                    'row_index': len(internal),
+                    'english': v.get('english', ''),
+                    'german': v.get('german', ''),
+                    'example': v.get('example_sentence', ''),
+                })
+
+            n_before = len(internal)
+            internal = _merge_wrapped_rows(internal)
+            internal = _merge_phonetic_continuation_rows(internal)
+            internal = _merge_continuation_rows(internal)
+
+            if len(internal) < n_before:
+                # Rebuild page_vocabulary from merged entries
+                merged_vocab = []
+                for entry in internal:
+                    if not entry.get('english') and not entry.get('german'):
+                        continue
+                    merged_vocab.append({
+                        'id': str(uuid.uuid4()),
+                        'english': entry.get('english', ''),
+                        'german': entry.get('german', ''),
+                        'example_sentence': entry.get('example', ''),
+                        'source_page': page_number + 1,
+                    })
+                logger.info(f"  row merging: {n_before} → {len(merged_vocab)} entries")
+                page_vocabulary = merged_vocab
+        except Exception as e:
+            logger.warning(f"  row merging failed (non-critical): {e}")
+
+    logger.info(f"  vocab extraction: {len(page_vocabulary)} entries via {extraction_source}")
+
    total_duration = _time.time() - t_total
-    logger.info(f"OCR Pipeline page {page_number + 1}: "
+    logger.info(f"Kombi Pipeline page {page_number + 1}: "
                f"{len(page_vocabulary)} vocab entries in {total_duration:.1f}s")

    return page_vocabulary, rotation
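
Review note on step 9A above: the narrow-first-column heuristic can be exercised on its own. The following is a minimal standalone sketch, assuming columns are dicts carrying `x_min_px` / `x_max_px` and are already sorted left to right. The helper names `should_skip_first_column` and `row_to_entry` are hypothetical and do not exist in the codebase; the 15% threshold and the three-column minimum are taken from the hunk.

```python
from typing import Dict, List


def should_skip_first_column(sorted_cols: List[Dict[str, float]],
                             threshold: float = 0.15) -> bool:
    """Return True when the leftmost column looks like a marker/page-ref column.

    Mirrors the hunk: only zones with at least 3 columns qualify, and the
    first column must be narrower than `threshold` of the zone width.
    """
    if len(sorted_cols) < 3:
        return False
    first_w = sorted_cols[0].get("x_max_px", 0) - sorted_cols[0].get("x_min_px", 0)
    # Guard against a zero-width zone, as the original does with max(1, ...).
    zone_w = max(1, sorted_cols[-1].get("x_max_px", 0) - sorted_cols[0].get("x_min_px", 0))
    return first_w / zone_w < threshold


def row_to_entry(texts: List[str], skip_first: bool) -> Dict[str, str]:
    """Map one row's cell texts (in column order) to a vocab entry by position."""
    data = texts[1:] if skip_first else texts
    return {
        "english": data[0] if len(data) > 0 else "",
        "german": data[1] if len(data) > 1 else "",
        # Empty cells are dropped so the example has no stray spaces.
        "example_sentence": " ".join(t for t in data[2:] if t),
    }
```

Since the result depends only on column geometry, it could be computed once per zone rather than once per row, as the hunk currently does.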
@@ -2554,3 +2718,66 @@ async def load_ground_truth(session_id: str, page_number: int):
        gt_data = json.load(f)

    return {"success": True, "entries": gt_data.get("entries", []), "source": "disk"}
+
+
+# ─── Learning Module Generation ─────────────────────────────────────────────
+
+
+class GenerateLearningUnitRequest(BaseModel):
+    grade: Optional[str] = None
+    generate_modules: bool = True
+
+
+@router.post("/sessions/{session_id}/generate-learning-unit")
+async def generate_learning_unit_endpoint(session_id: str, request: Optional[GenerateLearningUnitRequest] = None):
+    """
+    Create a Learning Unit from the vocabulary in this session.
+
+    1. Takes vocabulary from the session
+    2. Creates a Learning Unit in backend-lehrer
+    3. Optionally triggers MC/Cloze/QA generation
+
+    Returns the created unit info and generation status.
+    """
+    if request is None:
+        request = GenerateLearningUnitRequest()
+
+    if session_id not in _sessions:
+        raise HTTPException(status_code=404, detail="Session not found")
+
+    session = _sessions[session_id]
+    vocabulary = session.get("vocabulary", [])
+
+    if not vocabulary:
+        raise HTTPException(status_code=400, detail="No vocabulary in this session")
+
+    try:
+        from vocab_learn_bridge import create_learning_unit, generate_learning_modules
+
+        # Step 1: Create Learning Unit
+        result = await create_learning_unit(
+            session_name=session["name"],
+            vocabulary=vocabulary,
+            grade=request.grade,
+        )
+
+        # Step 2: Generate modules if requested
+        if request.generate_modules:
+            try:
+                gen_result = await generate_learning_modules(
+                    unit_id=result["unit_id"],
+                    analysis_path=result["analysis_path"],
+                )
+                result["generation"] = gen_result
+            except Exception as e:
+                logger.warning(f"Module generation failed (unit created): {e}")
+                result["generation"] = {"status": "error", "reason": str(e)}
+
+        return result
+
+    except ImportError:
+        raise HTTPException(status_code=501, detail="vocab_learn_bridge module not available")
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+    except RuntimeError as e:
+        raise HTTPException(status_code=502, detail=str(e))
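
Review note on the handler above: the `except` chain at the end of `generate_learning_unit_endpoint` amounts to a mapping from exception type to HTTP status. The sketch below restates that mapping as a plain function so it can be checked without a running server. `status_for_bridge_error` is a hypothetical helper, and the 500 fallback is an assumption about how an unhandled exception would surface, not something the diff states.

```python
def status_for_bridge_error(exc: Exception) -> int:
    """Return the HTTP status the endpoint would use for a bridge failure."""
    if isinstance(exc, ImportError):
        return 501  # vocab_learn_bridge module not available
    if isinstance(exc, ValueError):
        return 400  # invalid input reported by the bridge
    if isinstance(exc, RuntimeError):
        return 502  # upstream (backend-lehrer) call failed
    return 500      # assumption: anything else is an unhandled server error
```

Note that subclasses follow their parents here, so a `ModuleNotFoundError` maps to 501 and a `NotImplementedError` maps to 502.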
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -70,12 +70,14 @@ nav:
      - BYOEH Developer Guide: services/klausur-service/BYOEH-Developer-Guide.md
      - NiBiS Pipeline: services/klausur-service/NiBiS-Ingestion-Pipeline.md
      - OCR Pipeline: services/klausur-service/OCR-Pipeline.md
+      - OCR Kombi Pipeline: services/klausur-service/OCR-Kombi-Pipeline.md
      - TrOCR ONNX: services/klausur-service/TrOCR-ONNX.md
      - OCR Labeling: services/klausur-service/OCR-Labeling-Spec.md
      - OCR Vergleich: services/klausur-service/OCR-Compare.md
      - RAG Admin: services/klausur-service/RAG-Admin-Spec.md
      - Worksheet Editor: services/klausur-service/Worksheet-Editor-Architecture.md
      - Chunk-Browser: services/klausur-service/Chunk-Browser.md
+      - RAG Landkarte: services/klausur-service/RAG-Landkarte.md
    - Voice-Service:
      - Uebersicht: services/voice-service/index.md
    - Agent-Core:
--- a/studio-v2/app/learn/[unitId]/flashcards/page.tsx
+++ b/studio-v2/app/learn/[unitId]/flashcards/page.tsx
@@ -0,0 +1,189 @@
+'use client'
+
+import React, { useState, useEffect, useCallback } from 'react'
+import { useParams, useRouter } from 'next/navigation'
+import { useTheme } from '@/lib/ThemeContext'
+import { FlashCard } from '@/components/learn/FlashCard'
+import { AudioButton } from '@/components/learn/AudioButton'
+
+interface QAItem {
+  id: string
+  question: string
+  answer: string
+  leitner_box: number
+  correct_count: number
+  incorrect_count: number
+}
+
+function getBackendUrl() {
+  if (typeof window === 'undefined') return 'http://localhost:8001'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8001'
+  return `${protocol}//${hostname}:8001`
+}
+
+export default function FlashcardsPage() {
+  const { unitId } = useParams<{ unitId: string }>()
+  const router = useRouter()
+  const { isDark } = useTheme()
+
+  const [items, setItems] = useState<QAItem[]>([])
+  const [currentIndex, setCurrentIndex] = useState(0)
+  const [isLoading, setIsLoading] = useState(true)
+  const [error, setError] = useState<string | null>(null)
+  const [stats, setStats] = useState({ correct: 0, incorrect: 0 })
+  const [isComplete, setIsComplete] = useState(false)
+
+  const glassCard = isDark
+    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
+    : 'bg-white/80 backdrop-blur-xl border border-black/5'
+
+  useEffect(() => {
+    loadQA()
+  }, [unitId])
+
+  const loadQA = async () => {
+    setIsLoading(true)
+    try {
+      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/qa`)
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
+      const data = await resp.json()
+      setItems(data.qa_items || [])
+    } catch (err: any) {
+      setError(err.message)
+    } finally {
+      setIsLoading(false)
+    }
+  }
+
+  const handleAnswer = useCallback(async (correct: boolean) => {
+    const item = items[currentIndex]
+    if (!item) return
+
+    // Update Leitner progress
+    try {
+      await fetch(
+        `${getBackendUrl()}/api/learning-units/${unitId}/leitner/update?item_id=${encodeURIComponent(item.id)}&correct=${correct}`,
+        { method: 'POST' }
+      )
+    } catch (err) {
+      console.error('Leitner update failed:', err)
+    }
+
+    setStats((prev) => ({
+      correct: prev.correct + (correct ? 1 : 0),
+      incorrect: prev.incorrect + (correct ? 0 : 1),
+    }))
+
+    if (currentIndex + 1 >= items.length) {
+      setIsComplete(true)
+    } else {
+      setCurrentIndex((i) => i + 1)
+    }
+  }, [items, currentIndex, unitId])
+
+  if (isLoading) {
+    return (
+      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+        <div className={`w-8 h-8 border-4 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
+      </div>
+    )
+  }
+
+  if (error) {
+    return (
+      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+        <div className={`${glassCard} rounded-2xl p-8 text-center max-w-md`}>
+          <p className={isDark ? 'text-red-300' : 'text-red-600'}>Fehler: {error}</p>
+          <button onClick={() => router.push('/learn')} className="mt-4 px-4 py-2 rounded-xl bg-blue-500 text-white text-sm">
+            Zurueck
+          </button>
+        </div>
+      </div>
+    )
+  }
+
+  return (
+    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+      {/* Header */}
+      <div className={`${glassCard} border-0 border-b`}>
+        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
+          <button
+            onClick={() => router.push('/learn')}
+            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
+          >
+            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
+            </svg>
+            Zurueck
+          </button>
+          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            Karteikarten
+          </h1>
+          <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
+            {items.length} Karten
+          </span>
+        </div>
+      </div>
+
+      {/* Content */}
+      <div className="flex-1 flex items-center justify-center px-6 py-8">
+        {isComplete ? (
+          <div className={`${glassCard} rounded-3xl p-10 text-center max-w-md w-full`}>
+            <div className="text-5xl mb-4">
+              {stats.correct > stats.incorrect ? '🎉' : '💪'}
+            </div>
+            <h2 className={`text-2xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+              Geschafft!
+            </h2>
+            <div className={`flex justify-center gap-8 mb-6 ${isDark ? 'text-white/80' : 'text-slate-700'}`}>
+              <div>
+                <span className="text-3xl font-bold text-green-500">{stats.correct}</span>
+                <p className="text-sm mt-1">Richtig</p>
+              </div>
+              <div>
+                <span className="text-3xl font-bold text-red-500">{stats.incorrect}</span>
+                <p className="text-sm mt-1">Falsch</p>
+              </div>
+            </div>
+            <div className="flex gap-3">
+              <button
+                onClick={() => { setCurrentIndex(0); setStats({ correct: 0, incorrect: 0 }); setIsComplete(false); loadQA() }}
+                className="flex-1 py-3 rounded-xl bg-gradient-to-r from-blue-500 to-cyan-500 text-white font-medium"
+              >
+                Nochmal
+              </button>
+              <button
+                onClick={() => router.push('/learn')}
+                className={`flex-1 py-3 rounded-xl border font-medium ${isDark ? 'border-white/20 text-white/80' : 'border-slate-300 text-slate-700'}`}
+              >
+                Zurueck
+              </button>
+            </div>
+          </div>
+        ) : items.length > 0 ? (
+          <div className="w-full max-w-lg">
+            <FlashCard
+              front={items[currentIndex].question}
+              back={items[currentIndex].answer}
+              cardNumber={currentIndex + 1}
+              totalCards={items.length}
+              leitnerBox={items[currentIndex].leitner_box}
+              onCorrect={() => handleAnswer(true)}
+              onIncorrect={() => handleAnswer(false)}
+              isDark={isDark}
+            />
+            {/* Audio Button */}
+            <div className="flex justify-center mt-4">
+              <AudioButton text={items[currentIndex].question} lang="en" isDark={isDark} />
+            </div>
+          </div>
+        ) : (
+          <div className={`${glassCard} rounded-2xl p-8 text-center`}>
+            <p className={isDark ? 'text-white/60' : 'text-slate-500'}>Keine Karteikarten verfuegbar.</p>
+          </div>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/learn/[unitId]/quiz/page.tsx
+++ b/studio-v2/app/learn/[unitId]/quiz/page.tsx
@@ -0,0 +1,160 @@
+'use client'
+
+import React, { useState, useEffect, useCallback } from 'react'
+import { useParams, useRouter } from 'next/navigation'
+import { useTheme } from '@/lib/ThemeContext'
+import { QuizQuestion } from '@/components/learn/QuizQuestion'
+
+interface MCQuestion {
+  id: string
+  question: string
+  options: { id: string; text: string }[]
+  correct_answer: string
+  explanation?: string
+}
+
+function getBackendUrl() {
+  if (typeof window === 'undefined') return 'http://localhost:8001'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8001'
+  return `${protocol}//${hostname}:8001`
+}
+
+export default function QuizPage() {
+  const { unitId } = useParams<{ unitId: string }>()
+  const router = useRouter()
+  const { isDark } = useTheme()
+
+  const [questions, setQuestions] = useState<MCQuestion[]>([])
+  const [currentIndex, setCurrentIndex] = useState(0)
+  const [isLoading, setIsLoading] = useState(true)
+  const [error, setError] = useState<string | null>(null)
+  const [stats, setStats] = useState({ correct: 0, incorrect: 0 })
+  const [isComplete, setIsComplete] = useState(false)
+
+  const glassCard = isDark
+    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
+    : 'bg-white/80 backdrop-blur-xl border border-black/5'
+
+  useEffect(() => {
+    loadMC()
+  }, [unitId])
+
+  const loadMC = async () => {
+    setIsLoading(true)
+    try {
+      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/mc`)
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
+      const data = await resp.json()
+      setQuestions(data.questions || [])
+    } catch (err: any) {
+      setError(err.message)
+    } finally {
+      setIsLoading(false)
+    }
+  }
+
+  const handleAnswer = useCallback((correct: boolean) => {
+    setStats((prev) => ({
+      correct: prev.correct + (correct ? 1 : 0),
+      incorrect: prev.incorrect + (correct ? 0 : 1),
+    }))
+
+    if (currentIndex + 1 >= questions.length) {
+      setIsComplete(true)
+    } else {
+      setCurrentIndex((i) => i + 1)
+    }
+  }, [currentIndex, questions.length])
+
+  if (isLoading) {
+    return (
+      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+        <div className={`w-8 h-8 border-4 ${isDark ? 'border-purple-400' : 'border-purple-600'} border-t-transparent rounded-full animate-spin`} />
+      </div>
+    )
+  }
+
+  return (
+    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+      {/* Header */}
+      <div className={`${glassCard} border-0 border-b`}>
+        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
+          <button
+            onClick={() => router.push('/learn')}
+            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
+          >
+            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
+            </svg>
+            Zurueck
+          </button>
+          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            Quiz
+          </h1>
+          <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
+            {questions.length} Fragen
+          </span>
+        </div>
+      </div>
+
+      {/* Content */}
+      <div className="flex-1 flex items-center justify-center px-6 py-8">
+        {error ? (
+          <div className={`${glassCard} rounded-2xl p-8 text-center max-w-md`}>
+            <p className={isDark ? 'text-red-300' : 'text-red-600'}>{error}</p>
+            <button onClick={() => router.push('/learn')} className="mt-4 px-4 py-2 rounded-xl bg-purple-500 text-white text-sm">
+              Zurueck
+            </button>
+          </div>
+        ) : isComplete ? (
+          <div className={`${glassCard} rounded-3xl p-10 text-center max-w-md w-full`}>
+            <div className="text-5xl mb-4">
+              {stats.correct === questions.length ? '🏆' : stats.correct > stats.incorrect ? '🎉' : '💪'}
+            </div>
+            <h2 className={`text-2xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+              {stats.correct === questions.length ? 'Perfekt!' : 'Geschafft!'}
+            </h2>
+            <p className={`text-lg mb-4 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
+              {stats.correct} von {questions.length} richtig
+              ({Math.round((stats.correct / questions.length) * 100)}%)
+            </p>
+            <div className="w-full h-3 rounded-full bg-white/10 overflow-hidden mb-6">
+              <div
+                className="h-full rounded-full bg-gradient-to-r from-purple-500 to-pink-500"
+                style={{ width: `${(stats.correct / questions.length) * 100}%` }}
+              />
+            </div>
+            <div className="flex gap-3">
+              <button
+                onClick={() => { setCurrentIndex(0); setStats({ correct: 0, incorrect: 0 }); setIsComplete(false); loadMC() }}
+                className="flex-1 py-3 rounded-xl bg-gradient-to-r from-purple-500 to-pink-500 text-white font-medium"
+              >
+                Nochmal
+              </button>
+              <button
+                onClick={() => router.push('/learn')}
+                className={`flex-1 py-3 rounded-xl border font-medium ${isDark ? 'border-white/20 text-white/80' : 'border-slate-300 text-slate-700'}`}
+              >
+                Zurueck
+              </button>
+            </div>
+          </div>
+        ) : questions[currentIndex] ? (
+          <QuizQuestion
+            question={questions[currentIndex].question}
+            options={questions[currentIndex].options}
+            correctAnswer={questions[currentIndex].correct_answer}
+            explanation={questions[currentIndex].explanation}
+            questionNumber={currentIndex + 1}
+            totalQuestions={questions.length}
+            onAnswer={handleAnswer}
+            isDark={isDark}
+          />
+        ) : (
+          <p className={isDark ? 'text-white/60' : 'text-slate-500'}>Keine Quiz-Fragen verfuegbar.</p>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/learn/[unitId]/story/page.tsx
+++ b/studio-v2/app/learn/[unitId]/story/page.tsx
@@ -0,0 +1,184 @@
+'use client'
+
+import React, { useState, useEffect } from 'react'
+import { useParams, useRouter } from 'next/navigation'
+import { useTheme } from '@/lib/ThemeContext'
+import { AudioButton } from '@/components/learn/AudioButton'
+
+function getBackendUrl() {
+  if (typeof window === 'undefined') return 'http://localhost:8001'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8001'
+  return `${protocol}//${hostname}:8001`
+}
+
+function getKlausurApiUrl() {
+  if (typeof window === 'undefined') return 'http://localhost:8086'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8086'
+  return `${protocol}//${hostname}/klausur-api`
+}
+
+export default function StoryPage() {
+  const { unitId } = useParams<{ unitId: string }>()
+  const router = useRouter()
+  const { isDark } = useTheme()
+
+  const [story, setStory] = useState<{ story_html: string; story_text: string; vocab_used: string[]; language: string } | null>(null)
+  const [isLoading, setIsLoading] = useState(false)
+  const [error, setError] = useState<string | null>(null)
+  const [language, setLanguage] = useState<'en' | 'de'>('en')
+
+  const glassCard = isDark
+    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
+    : 'bg-white/80 backdrop-blur-xl border border-black/5'
+
+  const generateStory = async () => {
+    setIsLoading(true)
+    setError(null)
+
+    try {
+      // First get the QA data to extract vocabulary
+      const qaResp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/qa`)
+      let vocabulary: { english: string; german: string }[] = []
+
+      if (qaResp.ok) {
+        const qaData = await qaResp.json()
+        // Convert QA items to vocabulary format
+        vocabulary = (qaData.qa_items || []).map((item: any) => ({
+          english: item.question,
+          german: item.answer,
+        }))
+      }
+
+      if (vocabulary.length === 0) {
+        setError('Keine Vokabeln gefunden.')
+        setIsLoading(false)
+        return
+      }
+
+      // Generate story
+      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/generate-story`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ vocabulary, language, grade_level: '5-8' }),
+      })
+
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
+      const data = await resp.json()
+      setStory(data)
+    } catch (err: any) {
+      setError(err.message)
+    } finally {
+      setIsLoading(false)
+    }
+  }
+
+  return (
+    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-amber-50 to-orange-100'}`}>
+      {/* Header */}
+      <div className={`${glassCard} border-0 border-b`}>
+        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
+          <button
+            onClick={() => router.push('/learn')}
+            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
+          >
+            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
+            </svg>
+            Zurueck
+          </button>
+          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            Minigeschichte
+          </h1>
+          <button
+            onClick={() => setLanguage((l) => l === 'en' ? 'de' : 'en')}
+            className={`text-xs px-3 py-1.5 rounded-lg ${isDark ? 'bg-white/10 text-white/70' : 'bg-slate-100 text-slate-600'}`}
+          >
+            {language === 'en' ? 'Englisch' : 'Deutsch'}
+          </button>
+        </div>
+      </div>
+
+      {/* Content */}
+      <div className="flex-1 flex items-center justify-center px-6 py-8">
+        <div className="w-full max-w-lg space-y-6">
+          {story ? (
+            <>
+              {/* Story Card */}
+              <div className={`${glassCard} rounded-3xl p-8`}>
+                <div className="flex items-center justify-between mb-4">
+                  <span className={`text-xs font-medium uppercase ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
+                    {story.language === 'en' ? 'English Story' : 'Deutsche Geschichte'}
+                  </span>
+                  <AudioButton text={story.story_text} lang={story.language as 'en' | 'de'} isDark={isDark} size="md" />
+                </div>
+                <div
+                  className={`text-lg leading-relaxed ${isDark ? 'text-white/90' : 'text-slate-800'}`}
+                  dangerouslySetInnerHTML={{ __html: story.story_html }}
+                />
+                <style>{`
+                  .vocab-highlight {
+                    background: ${isDark ? 'rgba(96, 165, 250, 0.3)' : 'rgba(59, 130, 246, 0.15)'};
+                    color: ${isDark ? '#93c5fd' : '#1d4ed8'};
+                    padding: 1px 4px;
+                    border-radius: 4px;
+                    font-weight: 600;
+                  }
+                `}</style>
+              </div>
+
+              {/* Vocab used */}
+              <div className={`text-center text-sm ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
+                Vokabeln verwendet: {story.vocab_used.length}{story.vocab_used.length > 0 ? ` (${story.vocab_used.join(', ')})` : ''}
+              </div>
+
+              {/* New Story Button */}
+              <button
+                onClick={generateStory}
+                disabled={isLoading}
+                className="w-full py-3 rounded-xl bg-gradient-to-r from-amber-500 to-orange-500 text-white font-medium hover:shadow-lg transition-all disabled:opacity-50 disabled:cursor-not-allowed"
+              >
+                Neue Geschichte generieren
+              </button>
+            </>
+          ) : (
+            <div className={`${glassCard} rounded-3xl p-10 text-center`}>
+              <div className="text-5xl mb-4">📖</div>
+              <h2 className={`text-xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                Minigeschichte
+              </h2>
+              <p className={`text-sm mb-6 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                Die KI schreibt eine kurze Geschichte mit deinen Vokabeln.
+                Die Vokabelwoerter werden farbig hervorgehoben.
+              </p>
+
+              {error && (
+                <p className={`text-sm mb-4 ${isDark ? 'text-red-300' : 'text-red-600'}`}>{error}</p>
+              )}
+
+              <button
+                onClick={generateStory}
+                disabled={isLoading}
+                className={`w-full py-4 rounded-xl font-medium transition-all ${
+                  isLoading
+                    ? (isDark ? 'bg-white/5 text-white/30' : 'bg-slate-100 text-slate-400')
+                    : 'bg-gradient-to-r from-amber-500 to-orange-500 text-white hover:shadow-lg hover:shadow-orange-500/25'
+                }`}
+              >
+                {isLoading ? (
+                  <span className="flex items-center justify-center gap-3">
+                    <span className="w-5 h-5 border-2 border-white border-t-transparent rounded-full animate-spin" />
+                    Geschichte wird geschrieben...
+                  </span>
+                ) : (
+                  'Geschichte generieren'
+                )}
+              </button>
+            </div>
+          )}
+        </div>
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/learn/[unitId]/type/page.tsx
+++ b/studio-v2/app/learn/[unitId]/type/page.tsx
@@ -0,0 +1,194 @@
+'use client'
+
+import React, { useState, useEffect, useCallback } from 'react'
+import { useParams, useRouter } from 'next/navigation'
+import { useTheme } from '@/lib/ThemeContext'
+import { TypeInput } from '@/components/learn/TypeInput'
+import { AudioButton } from '@/components/learn/AudioButton'
+
+interface QAItem {
+  id: string
+  question: string
+  answer: string
+  leitner_box: number
+}
+
+function getBackendUrl() {
+  if (typeof window === 'undefined') return 'http://localhost:8001'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8001'
+  return `${protocol}//${hostname}:8001`
+}
+
+export default function TypePage() {
+  const { unitId } = useParams<{ unitId: string }>()
+  const router = useRouter()
+  const { isDark } = useTheme()
+
+  const [items, setItems] = useState<QAItem[]>([])
+  const [currentIndex, setCurrentIndex] = useState(0)
+  const [isLoading, setIsLoading] = useState(true)
+  const [error, setError] = useState<string | null>(null)
+  const [stats, setStats] = useState({ correct: 0, incorrect: 0 })
+  const [isComplete, setIsComplete] = useState(false)
+  const [direction, setDirection] = useState<'en_to_de' | 'de_to_en'>('en_to_de')
+
+  const glassCard = isDark
+    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
+    : 'bg-white/80 backdrop-blur-xl border border-black/5'
+
+  useEffect(() => {
+    loadQA()
+  }, [unitId])
+
+  const loadQA = async () => {
+    setIsLoading(true)
+    setError(null) // clear a stale error before (re)loading
+    try {
+      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/qa`)
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
+      setItems((await resp.json()).qa_items || [])
+    } catch (err: unknown) {
+      setError(err instanceof Error ? err.message : String(err))
+    } finally {
+      setIsLoading(false)
+    }
+  }
+
+  const handleResult = useCallback(async (correct: boolean) => {
+    const item = items[currentIndex]
+    if (!item) return
+
+    try {
+      await fetch(
+        `${getBackendUrl()}/api/learning-units/${unitId}/leitner/update?item_id=${encodeURIComponent(item.id)}&correct=${correct}`,
+        { method: 'POST' }
+      )
+    } catch (err) {
+      console.error('Leitner update failed:', err)
+    }
+
+    setStats((prev) => ({
+      correct: prev.correct + (correct ? 1 : 0),
+      incorrect: prev.incorrect + (correct ? 0 : 1),
+    }))
+
+    if (currentIndex + 1 >= items.length) {
+      setIsComplete(true)
+    } else {
+      setCurrentIndex((i) => i + 1)
+    }
+  }, [items, currentIndex, unitId])
+
+  const currentItem = items[currentIndex]
+  const prompt = currentItem
+    ? (direction === 'en_to_de' ? currentItem.question : currentItem.answer)
+    : ''
+  const answer = currentItem
+    ? (direction === 'en_to_de' ? currentItem.answer : currentItem.question)
+    : ''
+
+  if (isLoading) {
+    return (
+      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+        <div className={`w-8 h-8 border-4 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
+      </div>
+    )
+  }
+
+  return (
+    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
+      {/* Header */}
+      <div className={`${glassCard} border-0 border-b`}>
+        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
+          <button
+            onClick={() => router.push('/learn')}
+            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
+          >
+            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
+            </svg>
+            Zurueck
+          </button>
+          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            Eintippen
+          </h1>
+          {/* Direction toggle */}
+          <button
+            onClick={() => setDirection((d) => d === 'en_to_de' ? 'de_to_en' : 'en_to_de')}
+            className={`text-xs px-3 py-1.5 rounded-lg ${isDark ? 'bg-white/10 text-white/70' : 'bg-slate-100 text-slate-600'}`}
+          >
+            {direction === 'en_to_de' ? 'EN → DE' : 'DE → EN'}
+          </button>
+        </div>
+      </div>
+
+      {/* Progress */}
+      <div className="w-full h-1 bg-white/10">
+        <div
+          className="h-full bg-gradient-to-r from-blue-500 to-cyan-500 transition-all"
+          style={{ width: `${((isComplete ? items.length : currentIndex) / Math.max(items.length, 1)) * 100}%` }}
+        />
+      </div>
+
+      {/* Content */}
+      <div className="flex-1 flex items-center justify-center px-6 py-8">
+        {error ? (
+          <div className={`${glassCard} rounded-2xl p-8 text-center max-w-md`}>
+            <p className={isDark ? 'text-red-300' : 'text-red-600'}>{error}</p>
+          </div>
+        ) : isComplete ? (
+          <div className={`${glassCard} rounded-3xl p-10 text-center max-w-md w-full`}>
+            <div className="text-5xl mb-4">
+              {stats.correct > stats.incorrect ? '🎉' : '💪'}
+            </div>
+            <h2 className={`text-2xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+              Geschafft!
+            </h2>
+            <div className={`flex justify-center gap-8 mb-6 ${isDark ? 'text-white/80' : 'text-slate-700'}`}>
+              <div>
+                <span className="text-3xl font-bold text-green-500">{stats.correct}</span>
+                <p className="text-sm mt-1">Richtig</p>
+              </div>
+              <div>
+                <span className="text-3xl font-bold text-red-500">{stats.incorrect}</span>
+                <p className="text-sm mt-1">Falsch</p>
+              </div>
+            </div>
+            <div className="flex gap-3">
+              <button
+                onClick={() => { setCurrentIndex(0); setStats({ correct: 0, incorrect: 0 }); setIsComplete(false); loadQA() }}
+                className="flex-1 py-3 rounded-xl bg-gradient-to-r from-blue-500 to-cyan-500 text-white font-medium"
+              >
+                Nochmal
+              </button>
+              <button
+                onClick={() => router.push('/learn')}
+                className={`flex-1 py-3 rounded-xl border font-medium ${isDark ? 'border-white/20 text-white/80' : 'border-slate-300 text-slate-700'}`}
+              >
+                Zurueck
+              </button>
+            </div>
+          </div>
+        ) : currentItem ? (
+          <div className="w-full max-w-lg space-y-4">
+            <div className="flex justify-center">
+              <AudioButton text={prompt} lang={direction === 'en_to_de' ? 'en' : 'de'} isDark={isDark} />
+            </div>
+            <TypeInput
+              prompt={prompt}
+              answer={answer}
+              onResult={handleResult}
+              isDark={isDark}
+            />
+            <p className={`text-center text-sm ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
+              {currentIndex + 1} / {items.length}
+            </p>
+          </div>
+        ) : (
+          <p className={isDark ? 'text-white/60' : 'text-slate-500'}>Keine Vokabeln verfuegbar.</p>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/learn/page.tsx
+++ b/studio-v2/app/learn/page.tsx
@@ -0,0 +1,164 @@
+'use client'
+
+import React, { useState, useEffect } from 'react'
+import { useTheme } from '@/lib/ThemeContext'
+import { Sidebar } from '@/components/Sidebar'
+import { UnitCard } from '@/components/learn/UnitCard'
+
+interface LearningUnit {
+  id: string
+  label: string
+  meta: string
+  title: string
+  topic: string | null
+  grade_level: string | null
+  status: string
+  vocabulary_count?: number
+  created_at: string
+}
+
+function getBackendUrl() {
+  if (typeof window === 'undefined') return 'http://localhost:8001'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8001'
+  return `${protocol}//${hostname}:8001`
+}
+
+export default function LearnPage() {
+  const { isDark } = useTheme()
+  const [units, setUnits] = useState<LearningUnit[]>([])
+  const [isLoading, setIsLoading] = useState(true)
+  const [error, setError] = useState<string | null>(null)
+
+  const glassCard = isDark
+    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
+    : 'bg-white/80 backdrop-blur-xl border border-black/5'
+
+  useEffect(() => {
+    loadUnits()
+  }, [])
+
+  const loadUnits = async () => {
+    setIsLoading(true)
+    setError(null) // reset so a successful retry hides the error card
+    try {
+      const resp = await fetch(`${getBackendUrl()}/api/learning-units/`)
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
+      setUnits(await resp.json())
+    } catch (err: unknown) {
+      setError(err instanceof Error ? err.message : String(err))
+    } finally {
+      setIsLoading(false)
+    }
+  }
+
+  const handleDelete = async (unitId: string) => {
+    try {
+      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}`, { method: 'DELETE' })
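+      // Treat non-2xx responses as failures so they are logged in the catch below
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
+      setUnits((prev) => prev.filter((u) => u.id !== unitId))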
+    } catch (err) {
+      console.error('Delete failed:', err)
+    }
+  }
+
+  return (
+    <div className={`min-h-screen flex relative overflow-hidden ${
+      isDark
+        ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800'
+        : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'
+    }`}>
+      {/* Background Blobs */}
+      <div className="absolute inset-0 overflow-hidden pointer-events-none">
+        <div className={`absolute -top-40 -right-40 w-80 h-80 rounded-full mix-blend-multiply filter blur-3xl animate-pulse ${
+          isDark ? 'bg-blue-500 opacity-50' : 'bg-blue-300 opacity-30'
+        }`} />
+        <div className={`absolute -bottom-40 -left-40 w-80 h-80 rounded-full mix-blend-multiply filter blur-3xl animate-pulse ${
+          isDark ? 'bg-cyan-500 opacity-50' : 'bg-cyan-300 opacity-30'
+        }`} style={{ animationDelay: '2s' }} />
+      </div>
+
+      {/* Sidebar */}
+      <div className="relative z-10 p-4">
+        <Sidebar />
+      </div>
+
+      {/* Main Content */}
+      <div className="flex-1 flex flex-col relative z-10 overflow-y-auto">
+        {/* Header */}
+        <div className={`${glassCard} border-0 border-b`}>
+          <div className="max-w-5xl mx-auto px-6 py-4">
+            <div className="flex items-center gap-4">
+              <div className={`w-12 h-12 rounded-xl flex items-center justify-center ${
+                isDark ? 'bg-blue-500/30' : 'bg-blue-200'
+              }`}>
+                <svg className={`w-6 h-6 ${isDark ? 'text-blue-300' : 'text-blue-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253" />
+                </svg>
+              </div>
+              <div>
+                <h1 className={`text-xl font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                  Meine Lernmodule
+                </h1>
+                <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                  Karteikarten, Quiz und Lueckentexte aus deinen Vokabeln
+                </p>
+              </div>
+            </div>
+          </div>
+        </div>
+
+        {/* Content */}
+        <div className="max-w-5xl mx-auto w-full px-6 py-6">
+          {isLoading && (
+            <div className="flex items-center justify-center py-20">
+              <div className={`w-8 h-8 border-4 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
+            </div>
+          )}
+
+          {error && (
+            <div className={`${glassCard} rounded-2xl p-6 text-center`}>
+              <p className={`${isDark ? 'text-red-300' : 'text-red-600'}`}>Fehler: {error}</p>
+              <button onClick={loadUnits} className="mt-3 px-4 py-2 rounded-xl bg-blue-500 text-white text-sm">
+                Erneut versuchen
+              </button>
+            </div>
+          )}
+
+          {!isLoading && !error && units.length === 0 && (
+            <div className={`${glassCard} rounded-2xl p-12 text-center`}>
+              <div className={`w-16 h-16 mx-auto mb-4 rounded-2xl flex items-center justify-center ${
+                isDark ? 'bg-blue-500/20' : 'bg-blue-100'
+              }`}>
+                <svg className={`w-8 h-8 ${isDark ? 'text-blue-300' : 'text-blue-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253" />
+                </svg>
+              </div>
+              <h3 className={`text-lg font-semibold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                Noch keine Lernmodule
+              </h3>
+              <p className={`text-sm mb-4 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                Scanne eine Schulbuchseite im Vokabel-Arbeitsblatt Generator und klicke &quot;Lernmodule generieren&quot;.
+              </p>
+              <a
+                href="/vocab-worksheet"
+                className="inline-block px-6 py-3 rounded-xl bg-gradient-to-r from-blue-500 to-cyan-500 text-white font-medium hover:shadow-lg transition-all"
+              >
+                Zum Vokabel-Scanner
+              </a>
+            </div>
+          )}
+
+          {!isLoading && units.length > 0 && (
+            <div className="grid gap-4">
+              {units.map((unit) => (
+                <UnitCard key={unit.id} unit={unit} isDark={isDark} glassCard={glassCard} onDelete={handleDelete} />
+              ))}
+            </div>
+          )}
+        </div>
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/ExportTab.tsx
+++ b/studio-v2/app/vocab-worksheet/components/ExportTab.tsx
@@ -0,0 +1,153 @@
+'use client'
+
+import React, { useState } from 'react'
+import { useRouter } from 'next/navigation'
+import type { VocabWorksheetHook } from '../types'
+import { getApiBase } from '../constants'
+
+export function ExportTab({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard } = h
+  const router = useRouter()
+
+  const [isGeneratingLearning, setIsGeneratingLearning] = useState(false)
+  const [learningUnitId, setLearningUnitId] = useState<string | null>(null)
+  const [learningError, setLearningError] = useState<string | null>(null)
+
+  const handleGenerateLearningUnit = async () => {
+    if (!h.session) return
+    setIsGeneratingLearning(true)
+    setLearningError(null)
+
+    try {
+      const apiBase = getApiBase()
+      const resp = await fetch(`${apiBase}/api/v1/vocab/sessions/${h.session.id}/generate-learning-unit`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ generate_modules: true }),
+      })
+
+      if (!resp.ok) {
+        const err = await resp.json().catch(() => ({}))
+        throw new Error(err.detail || `HTTP ${resp.status}`)
+      }
+
+      const result = await resp.json()
+      setLearningUnitId(result.unit_id)
+    } catch (err: unknown) {
+      setLearningError(err instanceof Error && err.message ? err.message : 'Fehler bei der Generierung')
+    } finally {
+      setIsGeneratingLearning(false)
+    }
+  }
+
+  return (
+    <div className="space-y-6">
+      {/* PDF Download Section */}
+      <div className={`${glassCard} rounded-2xl p-6`}>
+        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>PDF herunterladen</h2>
+
+        {h.worksheetId ? (
+          <div className="space-y-4">
+            <div className={`p-4 rounded-xl ${isDark ? 'bg-green-500/20 border border-green-500/30' : 'bg-green-100 border border-green-200'}`}>
+              <div className="flex items-center gap-3">
+                <svg className="w-6 h-6 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                </svg>
+                <span className={`font-medium ${isDark ? 'text-green-200' : 'text-green-700'}`}>Arbeitsblatt erfolgreich generiert!</span>
+              </div>
+            </div>
+
+            <div className="grid grid-cols-2 gap-4">
+              <button onClick={() => h.downloadPDF('worksheet')} className={`${glassCard} p-6 rounded-xl text-left transition-all hover:shadow-lg ${isDark ? 'hover:border-purple-400/50' : 'hover:border-purple-500'}`}>
+                <div className={`w-12 h-12 mb-3 rounded-xl flex items-center justify-center ${isDark ? 'bg-purple-500/30' : 'bg-purple-100'}`}>
+                  <svg className={`w-6 h-6 ${isDark ? 'text-purple-300' : 'text-purple-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M12 10v6m0 0l-3-3m3 3l3-3m2 8H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
+                  </svg>
+                </div>
+                <h3 className={`font-semibold mb-1 ${isDark ? 'text-white' : 'text-slate-900'}`}>Arbeitsblatt</h3>
+                <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>PDF zum Ausdrucken</p>
+              </button>
+
+              {h.includeSolutions && (
+                <button onClick={() => h.downloadPDF('solution')} className={`${glassCard} p-6 rounded-xl text-left transition-all hover:shadow-lg ${isDark ? 'hover:border-green-400/50' : 'hover:border-green-500'}`}>
+                  <div className={`w-12 h-12 mb-3 rounded-xl flex items-center justify-center ${isDark ? 'bg-green-500/30' : 'bg-green-100'}`}>
+                    <svg className={`w-6 h-6 ${isDark ? 'text-green-300' : 'text-green-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
+                    </svg>
+                  </div>
+                  <h3 className={`font-semibold mb-1 ${isDark ? 'text-white' : 'text-slate-900'}`}>Loesungsblatt</h3>
+                  <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>PDF mit Loesungen</p>
+                </button>
+              )}
+            </div>
+          </div>
+        ) : (
+          <p className={`text-center py-8 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Noch kein Arbeitsblatt generiert.</p>
+        )}
+      </div>
+
+      {/* Learning Module Generation Section */}
+      <div className={`${glassCard} rounded-2xl p-6`}>
+        <h2 className={`text-lg font-semibold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>Interaktive Lernmodule</h2>
+        <p className={`text-sm mb-4 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+          Aus den Vokabeln automatisch Karteikarten, Quiz und Lueckentexte erstellen.
+        </p>
+
+        {learningError && (
+          <div className={`p-3 rounded-xl mb-4 ${isDark ? 'bg-red-500/20 border border-red-500/30' : 'bg-red-100 border border-red-200'}`}>
+            <p className={`text-sm ${isDark ? 'text-red-200' : 'text-red-700'}`}>{learningError}</p>
+          </div>
+        )}
+
+        {learningUnitId ? (
+          <div className="space-y-4">
+            <div className={`p-4 rounded-xl ${isDark ? 'bg-blue-500/20 border border-blue-500/30' : 'bg-blue-100 border border-blue-200'}`}>
+              <div className="flex items-center gap-3">
+                <svg className="w-6 h-6 text-blue-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                </svg>
+                <span className={`font-medium ${isDark ? 'text-blue-200' : 'text-blue-700'}`}>Lernmodule wurden generiert!</span>
+              </div>
+            </div>
+
+            <button
+              onClick={() => router.push(`/learn/${learningUnitId}`)}
+              className="w-full py-3 rounded-xl font-medium bg-gradient-to-r from-blue-500 to-cyan-500 text-white hover:shadow-lg transition-all"
+            >
+              Lernmodule oeffnen
+            </button>
+          </div>
+        ) : (
+          <button
+            onClick={handleGenerateLearningUnit}
+            disabled={isGeneratingLearning || h.vocabulary.length === 0}
+            className={`w-full py-4 rounded-xl font-medium transition-all ${
+              isGeneratingLearning || h.vocabulary.length === 0
+                ? (isDark ? 'bg-white/5 text-white/30 cursor-not-allowed' : 'bg-slate-100 text-slate-400 cursor-not-allowed')
+                : 'bg-gradient-to-r from-blue-500 to-cyan-500 text-white hover:shadow-lg hover:shadow-blue-500/25'
+            }`}
+          >
+            {isGeneratingLearning ? (
+              <span className="flex items-center justify-center gap-3">
+                <span className="w-5 h-5 border-2 border-white border-t-transparent rounded-full animate-spin" />
+                Lernmodule werden generiert...
+              </span>
+            ) : (
+              <span className="flex items-center justify-center gap-2">
+                <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z" />
+                </svg>
+                Lernmodule generieren ({h.vocabulary.length} Vokabeln)
+              </span>
+            )}
+          </button>
+        )}
+      </div>
+
+      {/* Reset Button */}
+      <button onClick={h.resetSession} className={`w-full py-3 rounded-xl border font-medium transition-colors ${isDark ? 'border-white/20 text-white/80 hover:bg-white/10' : 'border-slate-300 text-slate-700 hover:bg-slate-50'}`}>
+        Neues Arbeitsblatt erstellen
+      </button>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/FullscreenPreview.tsx
+++ b/studio-v2/app/vocab-worksheet/components/FullscreenPreview.tsx
@@ -0,0 +1,39 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook } from '../types'
+
+export function FullscreenPreview({ h }: { h: VocabWorksheetHook }) {
+  return (
+    <div className="fixed inset-0 z-50 bg-black/80 backdrop-blur-sm flex items-center justify-center" onClick={() => h.setShowFullPreview(false)}>
+      <button
+        onClick={() => h.setShowFullPreview(false)}
+        className="absolute top-4 right-4 p-2 rounded-full bg-white/10 hover:bg-white/20 text-white z-10 transition-colors"
+      >
+        <svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
+        </svg>
+      </button>
+      <div className="max-w-[95vw] max-h-[95vh] overflow-auto" onClick={(e) => e.stopPropagation()}>
+        {h.directFile?.type.startsWith('image/') && h.directFilePreview && (
+          <img src={h.directFilePreview} alt="Original" className="max-w-none" />
+        )}
+        {h.directFile?.type === 'application/pdf' && h.directFilePreview && (
+          <iframe src={h.directFilePreview} className="border-0 rounded-xl bg-white" style={{ width: '90vw', height: '90vh' }} />
+        )}
+        {h.selectedMobileFile && !h.directFile && (
+          h.selectedMobileFile.type.startsWith('image/')
+            ? <img src={h.selectedMobileFile.dataUrl} alt="Original" className="max-w-none" />
+            : <iframe src={h.selectedMobileFile.dataUrl} className="border-0 rounded-xl bg-white" style={{ width: '90vw', height: '90vh' }} />
+        )}
+        {h.selectedDocumentId && !h.directFile && !h.selectedMobileFile && (() => {
+          const doc = h.storedDocuments.find(d => d.id === h.selectedDocumentId)
+          if (!doc?.url) return null
+          return doc.type.startsWith('image/')
+            ? <img src={doc.url} alt="Original" className="max-w-none" />
+            : <iframe src={doc.url} className="border-0 rounded-xl bg-white" style={{ width: '90vw', height: '90vh' }} />
+        })()}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/OcrComparisonModal.tsx
+++ b/studio-v2/app/vocab-worksheet/components/OcrComparisonModal.tsx
@@ -0,0 +1,135 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook } from '../types'
+
+export function OcrComparisonModal({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard } = h
+
+  return (
+    <div className="fixed inset-0 z-50 flex items-center justify-center p-4 bg-black/50 backdrop-blur-sm">
+      <div className={`relative w-full max-w-6xl max-h-[90vh] overflow-auto rounded-3xl ${glassCard} p-6`}>
+        {/* Header */}
+        <div className="flex items-center justify-between mb-6">
+          <div>
+            <h2 className={`text-xl font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+              OCR-Methoden Vergleich
+            </h2>
+            <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+              Seite {h.ocrComparePageIndex !== null ? h.ocrComparePageIndex + 1 : '-'}
+            </p>
+          </div>
+          <button
+            onClick={() => h.setShowOcrComparison(false)}
+            className={`p-2 rounded-xl ${isDark ? 'hover:bg-white/10 text-white' : 'hover:bg-black/5 text-slate-500'}`}
+          >
+            <svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
+            </svg>
+          </button>
+        </div>
+
+        {/* Loading State */}
+        {h.isComparingOcr && (
+          <div className="flex flex-col items-center justify-center py-12">
+            <div className="w-12 h-12 border-4 border-purple-500 border-t-transparent rounded-full animate-spin mb-4" />
+            <p className={isDark ? 'text-white/60' : 'text-slate-500'}>
+              Vergleiche OCR-Methoden... (kann 1-2 Minuten dauern)
+            </p>
+          </div>
+        )}
+
+        {/* Error State */}
+        {h.ocrCompareError && (
+          <div className={`p-4 rounded-xl ${isDark ? 'bg-red-500/20 text-red-300' : 'bg-red-100 text-red-700'}`}>
+            Fehler: {h.ocrCompareError}
+          </div>
+        )}
+
+        {/* Results */}
+        {h.ocrCompareResult && !h.isComparingOcr && (
+          <div className="space-y-6">
+            {/* Method Results Grid */}
+            <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
+              {Object.entries(h.ocrCompareResult.methods || {}).map(([key, method]: [string, any]) => (
+                <div
+                  key={key}
+                  className={`p-4 rounded-2xl ${
+                    h.ocrCompareResult.recommendation?.best_method === key
+                      ? (isDark ? 'bg-green-500/20 border border-green-500/50' : 'bg-green-100 border border-green-300')
+                      : (isDark ? 'bg-white/5 border border-white/10' : 'bg-white/50 border border-black/10')
+                  }`}
+                >
+                  <div className="flex items-center justify-between mb-3">
+                    <h3 className={`font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                      {method.name}
+                    </h3>
+                    {h.ocrCompareResult.recommendation?.best_method === key && (
+                      <span className="px-2 py-1 text-xs font-medium bg-green-500 text-white rounded-full">
+                        Beste
+                      </span>
+                    )}
+                  </div>
+
+                  {method.success ? (
+                    <>
+                      <div className={`text-sm mb-2 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                        <span className="font-medium">{method.vocabulary_count}</span> Vokabeln in <span className="font-medium">{method.duration_seconds}s</span>
+                      </div>
+
+                      {method.vocabulary && method.vocabulary.length > 0 && (
+                        <div className={`max-h-48 overflow-y-auto rounded-xl p-2 ${isDark ? 'bg-black/20' : 'bg-white/50'}`}>
+                          {method.vocabulary.slice(0, 10).map((v: any, idx: number) => (
+                            <div key={idx} className={`text-sm py-1 border-b last:border-0 ${isDark ? 'border-white/10 text-white/80' : 'border-black/5 text-slate-700'}`}>
+                              <span className="font-medium">{v.english}</span> = {v.german}
+                            </div>
+                          ))}
+                          {method.vocabulary.length > 10 && (
+                            <div className={`text-xs mt-2 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
+                              + {method.vocabulary.length - 10} weitere...
+                            </div>
+                          )}
+                        </div>
+                      )}
+                    </>
+                  ) : (
+                    <div className={`text-sm ${isDark ? 'text-red-300' : 'text-red-600'}`}>
+                      {method.error || 'Fehler'}
+                    </div>
+                  )}
+                </div>
+              ))}
+            </div>
+
+            {/* Comparison Summary */}
+            {h.ocrCompareResult.comparison && (
+              <div className={`p-4 rounded-2xl ${isDark ? 'bg-blue-500/20 border border-blue-500/30' : 'bg-blue-100 border border-blue-200'}`}>
+                <h3 className={`font-semibold mb-3 ${isDark ? 'text-blue-300' : 'text-blue-900'}`}>
+                  Uebereinstimmung
+                </h3>
+                <div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
+                  <div>
+                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Von allen erkannt:</span>
+                    <span className="ml-2 font-bold">{h.ocrCompareResult.comparison.found_by_all_methods?.length || 0}</span>
+                  </div>
+                  <div>
+                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Nur teilweise:</span>
+                    <span className="ml-2 font-bold">{h.ocrCompareResult.comparison.found_by_some_methods?.length || 0}</span>
+                  </div>
+                  <div>
+                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Gesamt einzigartig:</span>
+                    <span className="ml-2 font-bold">{h.ocrCompareResult.comparison.total_unique_vocabulary || 0}</span>
+                  </div>
+                  <div>
+                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Uebereinstimmung:</span>
+                    <span className="ml-2 font-bold">{Math.round((h.ocrCompareResult.comparison.agreement_rate || 0) * 100)}%</span>
+                  </div>
+                </div>
+              </div>
+            )}
+          </div>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/OcrSettingsPanel.tsx
+++ b/studio-v2/app/vocab-worksheet/components/OcrSettingsPanel.tsx
@@ -0,0 +1,125 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook } from '../types'
+import { defaultOcrPrompts } from '../constants'
+
+export function OcrSettingsPanel({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard, glassInput } = h
+
+  return (
+    <div className={`${glassCard} rounded-2xl p-6 mb-6`}>
+      <div className="flex items-center justify-between mb-4">
+        <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          OCR-Filter Einstellungen
+        </h2>
+        <button
+          onClick={() => h.setShowSettings(false)}
+          className={`p-1 rounded-lg ${isDark ? 'hover:bg-white/10 text-white/60' : 'hover:bg-black/5 text-slate-500'}`}
+        >
+          <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
+          </svg>
+        </button>
+      </div>
+
+      <div className={`p-4 rounded-xl mb-4 ${isDark ? 'bg-blue-500/20 text-blue-200' : 'bg-blue-100 text-blue-800'}`}>
+        <p className="text-sm">
+          Diese Einstellungen helfen, unerwuenschte Elemente wie Seitenzahlen, Kapitelnamen oder Kopfzeilen aus dem OCR-Ergebnis zu filtern.
+        </p>
+      </div>
+
+      <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
+        {/* Checkboxes */}
+        <div className="space-y-3">
+          <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            <input
+              type="checkbox"
+              checked={h.ocrPrompts.filterHeaders}
+              onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, filterHeaders: e.target.checked })}
+              className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500"
+            />
+            <span>Kopfzeilen filtern (z.B. Kapitelnamen)</span>
+          </label>
+
+          <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            <input
+              type="checkbox"
+              checked={h.ocrPrompts.filterFooters}
+              onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, filterFooters: e.target.checked })}
+              className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500"
+            />
+            <span>Fusszeilen filtern</span>
+          </label>
+
+          <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            <input
+              type="checkbox"
+              checked={h.ocrPrompts.filterPageNumbers}
+              onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, filterPageNumbers: e.target.checked })}
+              className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500"
+            />
+            <span>Seitenzahlen filtern (auch ausgeschrieben: &quot;zweihundertzwoelf&quot;)</span>
+          </label>
+        </div>
+
+        {/* Patterns */}
+        <div className="space-y-4">
+          <div>
+            <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
+              Kopfzeilen-Muster (kommagetrennt)
+            </label>
+            <input
+              type="text"
+              value={h.ocrPrompts.headerPatterns.join(', ')}
+              onChange={(e) => h.saveOcrPrompts({
+                ...h.ocrPrompts,
+                headerPatterns: e.target.value.split(',').map(s => s.trim()).filter(Boolean)
+              })}
+              placeholder="Unit, Chapter, Lesson..."
+              className={`w-full px-4 py-2 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}
+            />
+          </div>
+
+          <div>
+            <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
+              Fusszeilen-Muster (kommagetrennt)
+            </label>
+            <input
+              type="text"
+              value={h.ocrPrompts.footerPatterns.join(', ')}
+              onChange={(e) => h.saveOcrPrompts({
+                ...h.ocrPrompts,
+                footerPatterns: e.target.value.split(',').map(s => s.trim()).filter(Boolean)
+              })}
+              placeholder="zweihundert, Page, Seite..."
+              className={`w-full px-4 py-2 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}
+            />
+          </div>
+        </div>
+      </div>
+
+      <div className="mt-4">
+        <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
+          Zusaetzlicher Filter-Prompt (optional)
+        </label>
+        <textarea
+          value={h.ocrPrompts.customFilter}
+          onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, customFilter: e.target.value })}
+          placeholder="z.B.: Ignoriere alle Zeilen, die nur Zahlen oder Buchstaben enthalten..."
+          rows={2}
+          className={`w-full px-4 py-2 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500 resize-none`}
+        />
+      </div>
+
+      <div className="mt-4 flex justify-end">
+        <button
+          onClick={() => h.saveOcrPrompts(defaultOcrPrompts)}
+          className={`px-4 py-2 rounded-xl text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-700'}`}
+        >
+          Auf Standard zuruecksetzen
+        </button>
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/PageSelection.tsx
+++ b/studio-v2/app/vocab-worksheet/components/PageSelection.tsx
@@ -0,0 +1,108 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook } from '../types'
+
+export function PageSelection({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard } = h
+
+  return (
+    <div className={`${glassCard} rounded-2xl p-6`}>
+      <div className="flex items-center justify-between mb-4">
+        <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          PDF-Seiten auswaehlen ({h.selectedPages.length} von {h.pdfPageCount - h.excludedPages.length} ausgewaehlt)
+        </h2>
+        <div className="flex gap-2">
+          {h.excludedPages.length > 0 && (
+            <button onClick={h.restoreExcludedPages} className={`px-3 py-1 rounded-lg text-sm ${isDark ? 'bg-orange-500/20 text-orange-300 hover:bg-orange-500/30' : 'bg-orange-100 text-orange-700 hover:bg-orange-200'}`}>
+              {h.excludedPages.length} ausgeblendet - wiederherstellen
+            </button>
+          )}
+          <button onClick={h.selectAllPages} className={`px-3 py-1 rounded-lg text-sm transition-colors ${isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-900'}`}>
+            Alle
+          </button>
+          <button onClick={h.selectNoPages} className={`px-3 py-1 rounded-lg text-sm transition-colors ${isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-900'}`}>
+            Keine
+          </button>
+        </div>
+      </div>
+
+      <p className={`text-sm mb-4 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+        Klicken Sie auf eine Seite, um sie auszuwaehlen. Klicken Sie auf das X, um leere Seiten auszublenden.
+      </p>
+
+      {h.isLoadingThumbnails ? (
+        <div className="flex items-center justify-center py-12">
+          <div className="w-8 h-8 border-4 border-purple-500 border-t-transparent rounded-full animate-spin" />
+          <span className={`ml-3 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Lade Seitenvorschau...</span>
+        </div>
+      ) : (
+        <div className="grid grid-cols-2 sm:grid-cols-3 md:grid-cols-4 lg:grid-cols-6 gap-4 mb-6">
+          {h.pagesThumbnails.map((thumb, idx) => {
+            if (h.excludedPages.includes(idx)) return null
+            return (
+              <div key={idx} className="relative group">
+                {/* Exclude/Delete Button */}
+                <button
+                  onClick={(e) => h.excludePage(idx, e)}
+                  className="absolute top-1 left-1 z-10 p-1 rounded-full opacity-0 group-hover:opacity-100 transition-opacity bg-red-500/80 hover:bg-red-600 text-white"
+                  title="Seite ausblenden"
+                >
+                  <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
+                  </svg>
+                </button>
+
+                {/* OCR Compare Button */}
+                <button
+                  onClick={(e) => { e.stopPropagation(); h.runOcrComparison(idx); }}
+                  className="absolute top-1 right-1 z-10 p-1 rounded-full opacity-0 group-hover:opacity-100 transition-opacity bg-blue-500/80 hover:bg-blue-600 text-white"
+                  title="OCR-Methoden vergleichen"
+                >
+                  <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z" />
+                  </svg>
+                </button>
+
+                <button
+                  onClick={() => h.togglePageSelection(idx)}
+                  className={`relative rounded-xl overflow-hidden border-2 transition-all w-full ${
+                    h.selectedPages.includes(idx)
+                      ? 'border-purple-500 ring-2 ring-purple-500/50'
+                      : (isDark ? 'border-white/20 hover:border-white/40' : 'border-slate-200 hover:border-slate-300')
+                  }`}
+                >
+                  <img src={thumb} alt={`Seite ${idx + 1}`} className="w-full h-auto" />
+                  <div className={`absolute bottom-0 left-0 right-0 py-1 text-center text-xs font-medium ${
+                    h.selectedPages.includes(idx)
+                      ? 'bg-purple-500 text-white'
+                      : (isDark ? 'bg-black/60 text-white/80' : 'bg-white/90 text-slate-700')
+                  }`}>
+                    Seite {idx + 1}
+                  </div>
+                  {h.selectedPages.includes(idx) && (
+                    <div className="absolute top-2 right-2 w-6 h-6 bg-purple-500 rounded-full flex items-center justify-center">
+                      <svg className="w-4 h-4 text-white" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                      </svg>
+                    </div>
+                  )}
+                </button>
+              </div>
+            )
+          })}
+        </div>
+      )}
+
+      <div className="flex justify-center">
+        <button
+          onClick={h.processSelectedPages}
+          disabled={h.selectedPages.length === 0 || h.isExtracting}
+          className="px-8 py-4 bg-gradient-to-r from-purple-500 to-pink-500 text-white rounded-2xl font-semibold disabled:opacity-50 hover:shadow-xl hover:shadow-purple-500/30 transition-all transform hover:scale-105"
+        >
+          {h.isExtracting ? 'Extrahiere Vokabeln...' : `${h.selectedPages.length} ${h.selectedPages.length === 1 ? 'Seite' : 'Seiten'} verarbeiten`}
+        </button>
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/QRCodeModal.tsx
+++ b/studio-v2/app/vocab-worksheet/components/QRCodeModal.tsx
@@ -0,0 +1,31 @@
+'use client'
+
+import React from 'react'
+import { QRCodeUpload } from '@/components/QRCodeUpload'
+import type { VocabWorksheetHook } from '../types'
+
+export function QRCodeModal({ h }: { h: VocabWorksheetHook }) {
+  const { isDark } = h
+
+  return (
+    <div className="fixed inset-0 z-50 flex items-center justify-center p-4">
+      <div className="absolute inset-0 bg-black/50 backdrop-blur-sm" onClick={() => h.setShowQRModal(false)} />
+      <div className={`relative w-full max-w-md rounded-3xl ${
+        isDark ? 'bg-slate-900' : 'bg-white'
+      }`}>
+        <QRCodeUpload
+          sessionId={h.uploadSessionId}
+          onClose={() => h.setShowQRModal(false)}
+          onFilesChanged={(files) => {
+            h.setMobileUploadedFiles(files)
+            if (files.length > 0) {
+              h.setSelectedMobileFile(files[files.length - 1])
+              h.setDirectFile(null)
+              h.setSelectedDocumentId(null)
+            }
+          }}
+        />
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/SpreadsheetTab.tsx
+++ b/studio-v2/app/vocab-worksheet/components/SpreadsheetTab.tsx
@@ -0,0 +1,157 @@
+'use client'
+
+/**
+ * SpreadsheetTab — Fortune Sheet editor for vocabulary data.
+ *
+ * Converts VocabularyEntry[] into a Fortune Sheet workbook with an Excel-like UI.
+ * Note: no change handler is wired up here, so cell edits are not propagated back before h.saveVocabulary() runs.
+ */
+
+import React, { useMemo, useCallback } from 'react'
+import dynamic from 'next/dynamic'
+import type { VocabWorksheetHook } from '../types'
+
+import '@fortune-sheet/react/dist/index.css'
+
+const Workbook = dynamic(
+  () => import('@fortune-sheet/react').then((m) => m.Workbook),
+  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
+)
+
+/** Convert VocabularyEntry[] to Fortune Sheet sheet data */
+function vocabToSheet(vocabulary: VocabWorksheetHook['vocabulary']) {
+  const headers = ['Englisch', 'Deutsch', 'Beispielsatz', 'Wortart', 'Seite']
+  const numCols = headers.length
+  const numRows = vocabulary.length + 1 // +1 for header
+
+  const celldata: any[] = []
+
+  // Header row
+  headers.forEach((label, c) => {
+    celldata.push({
+      r: 0,
+      c,
+      v: { v: label, m: label, bl: 1, bg: '#f0f4ff', fc: '#1e293b' },
+    })
+  })
+
+  // Data rows
+  vocabulary.forEach((entry, idx) => {
+    const r = idx + 1
+    celldata.push({ r, c: 0, v: { v: entry.english, m: entry.english } })
+    celldata.push({ r, c: 1, v: { v: entry.german, m: entry.german } })
+    celldata.push({ r, c: 2, v: { v: entry.example_sentence || '', m: entry.example_sentence || '' } })
+    celldata.push({ r, c: 3, v: { v: entry.word_type || '', m: entry.word_type || '' } })
+    celldata.push({ r, c: 4, v: { v: entry.source_page != null ? String(entry.source_page) : '', m: entry.source_page != null ? String(entry.source_page) : '' } })
+  })
+
+  // Column widths
+  const columnlen: Record<string, number> = {
+    '0': 180, // Englisch
+    '1': 180, // Deutsch
+    '2': 280, // Beispielsatz
+    '3': 100, // Wortart
+    '4': 60,  // Seite
+  }
+
+  // Row heights
+  const rowlen: Record<string, number> = {}
+  rowlen['0'] = 28 // header
+
+  // Borders: light grid
+  const borderInfo = numRows > 0 && numCols > 0 ? [{
+    rangeType: 'range',
+    borderType: 'border-all',
+    color: '#e5e7eb',
+    style: 1,
+    range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
+  }] : []
+
+  return {
+    name: 'Vokabeln',
+    id: 'vocab_sheet',
+    celldata,
+    row: numRows,
+    column: numCols,
+    status: 1,
+    config: {
+      columnlen,
+      rowlen,
+      borderInfo,
+    },
+  }
+}
+
+export function SpreadsheetTab({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard, vocabulary } = h
+
+  const sheets = useMemo(() => {
+    if (!vocabulary || vocabulary.length === 0) return []
+    return [vocabToSheet(vocabulary)]
+  }, [vocabulary])
+
+  const estimatedHeight = Math.max(500, (vocabulary.length + 2) * 26 + 80)
+
+  const handleSaveFromSheet = useCallback(async () => {
+    await h.saveVocabulary()
+  }, [h])
+
+  if (vocabulary.length === 0) {
+    return (
+      <div className={`${glassCard} rounded-2xl p-6`}>
+        <p className={`text-center py-12 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+          Keine Vokabeln vorhanden. Bitte zuerst Seiten verarbeiten.
+        </p>
+      </div>
+    )
+  }
+
+  return (
+    <div className={`${glassCard} rounded-2xl p-4`}>
+      <div className="flex items-center justify-between mb-4">
+        <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          Spreadsheet-Editor
+        </h2>
+        <div className="flex items-center gap-3">
+          <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
+            {vocabulary.length} Vokabeln
+          </span>
+          <button
+            onClick={handleSaveFromSheet}
+            className="px-4 py-2 rounded-xl text-sm font-medium bg-gradient-to-r from-purple-500 to-pink-500 text-white hover:shadow-lg transition-all"
+          >
+            Speichern
+          </button>
+        </div>
+      </div>
+
+      <div
+        className="rounded-xl overflow-hidden border"
+        style={{
+          borderColor: isDark ? 'rgba(255,255,255,0.1)' : 'rgba(0,0,0,0.1)',
+        }}
+      >
+        {sheets.length > 0 && (
+          <div style={{ width: '100%', height: `${estimatedHeight}px` }}>
+            <Workbook
+              data={sheets}
+              lang="en"
+              showToolbar
+              showFormulaBar={false}
+              showSheetTabs={false}
+              toolbarItems={[
+                'undo', 'redo', '|',
+                'font-bold', 'font-italic', 'font-strikethrough', '|',
+                'font-color', 'background', '|',
+                'font-size', '|',
+                'horizontal-align', 'vertical-align', '|',
+                'text-wrap', '|',
+                'border',
+              ]}
+            />
+          </div>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/UploadScreen.tsx
+++ b/studio-v2/app/vocab-worksheet/components/UploadScreen.tsx
@@ -0,0 +1,315 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook } from '../types'
+import { formatFileSize } from '../constants'
+
+export function UploadScreen({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard, glassInput } = h
+
+  return (
+    <div className="space-y-6">
+      {/* Existing Sessions */}
+      {h.existingSessions.length > 0 && (
+        <div className={`${glassCard} rounded-2xl p-6`}>
+          <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            Vorhandene Sessions fortsetzen
+          </h2>
+          {h.isLoadingSessions ? (
+            <div className="flex items-center gap-3 py-4">
+              <div className="w-5 h-5 border-2 border-purple-500 border-t-transparent rounded-full animate-spin" />
+              <span className={isDark ? 'text-white/60' : 'text-slate-500'}>Lade Sessions...</span>
+            </div>
+          ) : (
+            <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
+              {h.existingSessions.map((s) => (
+                <div
+                  key={s.id}
+                  className={`${glassCard} p-4 rounded-xl text-left transition-all hover:shadow-lg relative group cursor-pointer ${
+                    isDark ? 'hover:border-purple-400/50' : 'hover:border-purple-400'
+                  }`}
+                  onClick={() => h.resumeSession(s)}
+                >
+                  {/* Delete Button */}
+                  <button
+                    onClick={(e) => h.deleteSession(s.id, e)}
+                    className={`absolute top-2 right-2 p-1.5 rounded-lg opacity-0 group-hover:opacity-100 transition-opacity ${
+                      isDark ? 'hover:bg-red-500/20 text-red-400' : 'hover:bg-red-100 text-red-500'
+                    }`}
+                    title="Session loeschen"
+                  >
+                    <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
+                    </svg>
+                  </button>
+
+                  <div className="flex items-start gap-3">
+                    <div className={`w-10 h-10 rounded-lg flex items-center justify-center flex-shrink-0 ${
+                      s.status === 'extracted' || s.status === 'completed'
+                        ? (isDark ? 'bg-green-500/30' : 'bg-green-100')
+                        : (isDark ? 'bg-white/10' : 'bg-slate-100')
+                    }`}>
+                      {s.status === 'extracted' || s.status === 'completed' ? (
+                        <svg className="w-5 h-5 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                        </svg>
+                      ) : (
+                        <svg className={`w-5 h-5 ${isDark ? 'text-white/40' : 'text-slate-400'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 6v6m0 0v6m0-6h6m-6 0H6" />
+                        </svg>
+                      )}
+                    </div>
+                    <div className="flex-1 min-w-0">
+                      <h3 className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{s.name}</h3>
+                      <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                        {s.vocabulary_count} Vokabeln
+                        {s.status === 'pending' && ' • Nicht gestartet'}
+                        {s.status === 'extracted' && ' • Bereit'}
+                        {s.status === 'completed' && ' • Abgeschlossen'}
+                      </p>
+                      {s.created_at && (
+                        <p className={`text-xs mt-1 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
+                          {new Date(s.created_at).toLocaleDateString('de-DE', {
+                            day: '2-digit',
+                            month: '2-digit',
+                            year: 'numeric',
+                            hour: '2-digit',
+                            minute: '2-digit'
+                          })}
+                        </p>
+                      )}
+                    </div>
+                    <svg className={`w-5 h-5 flex-shrink-0 ${isDark ? 'text-white/30' : 'text-slate-300'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
+                    </svg>
+                  </div>
+                </div>
+              ))}
+            </div>
+          )}
+        </div>
+      )}
+
+      {/* Explanation */}
+      <div className={`${glassCard} rounded-2xl p-6 ${isDark ? 'bg-gradient-to-br from-purple-500/20 to-pink-500/20' : 'bg-gradient-to-br from-purple-100/50 to-pink-100/50'}`}>
+        <h2 className={`text-lg font-semibold mb-3 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          {h.existingSessions.length > 0 ? 'Oder neue Session starten:' : 'So funktioniert es:'}
+        </h2>
+        <ol className={`space-y-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
+          {['Dokument (Bild oder PDF) auswaehlen', 'Vorschau pruefen und Session benennen', 'Bei PDFs: Seiten auswaehlen, die verarbeitet werden sollen', 'KI extrahiert Vokabeln — pruefen, korrigieren, Arbeitsblatt-Typ waehlen', 'PDF herunterladen und ausdrucken'].map((text, i) => (
+            <li key={i} className="flex items-start gap-2">
+              <span className={`w-6 h-6 rounded-full flex items-center justify-center text-xs font-bold flex-shrink-0 ${isDark ? 'bg-purple-500/30 text-purple-300' : 'bg-purple-200 text-purple-700'}`}>{i + 1}</span>
+              <span>{text}</span>
+            </li>
+          ))}
+        </ol>
+      </div>
+
+      {/* Step 1: Document Selection */}
+      <div className={`${glassCard} rounded-2xl p-6`}>
+        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          1. Dokument auswaehlen
+        </h2>
+
+        <input ref={h.directFileInputRef} type="file" accept="image/png,image/jpeg,image/jpg,application/pdf" onChange={h.handleDirectFileSelect} className="hidden" />
+
+        <div className="grid grid-cols-2 gap-3 mb-4">
+          {/* File Upload Button */}
+          <button
+            onClick={() => h.directFileInputRef.current?.click()}
+            className={`p-4 rounded-xl border-2 border-dashed transition-all ${
+              h.directFile
+                ? (isDark ? 'border-green-400/50 bg-green-500/20' : 'border-green-500 bg-green-50')
+                : (isDark ? 'border-white/20 hover:border-purple-400/50' : 'border-slate-300 hover:border-purple-500')
+            }`}
+          >
+            {h.directFile ? (
+              <div className="flex items-center gap-3">
+                <span className="text-2xl">{h.directFile.type === 'application/pdf' ? '📄' : '🖼️'}</span>
+                <div className="text-left flex-1 min-w-0">
+                  <p className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{h.directFile.name}</p>
+                  <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{formatFileSize(h.directFile.size)}</p>
+                </div>
+                <svg className="w-5 h-5 text-green-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                </svg>
+              </div>
+            ) : (
+              <div className={`text-center ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                <span className="text-2xl block mb-1">📁</span>
+                <span className="text-sm">Datei auswaehlen</span>
+              </div>
+            )}
+          </button>
+
+          {/* QR Code Upload Button */}
+          <button
+            onClick={() => h.setShowQRModal(true)}
+            className={`p-4 rounded-xl border-2 border-dashed transition-all ${
+              h.selectedMobileFile
+                ? (isDark ? 'border-green-400/50 bg-green-500/20' : 'border-green-500 bg-green-50')
+                : (isDark ? 'border-white/20 hover:border-purple-400/50' : 'border-slate-300 hover:border-purple-500')
+            }`}
+          >
+            {h.selectedMobileFile ? (
+              <div className="flex items-center gap-3">
+                <span className="text-2xl">{h.selectedMobileFile.type.startsWith('image/') ? '🖼️' : '📄'}</span>
+                <div className="text-left flex-1 min-w-0">
+                  <p className={`font-medium truncate text-sm ${isDark ? 'text-white' : 'text-slate-900'}`}>{h.selectedMobileFile.name}</p>
+                  <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>vom Handy</p>
+                </div>
+                <svg className="w-5 h-5 text-green-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                </svg>
+              </div>
+            ) : (
+              <div className={`text-center ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
+                <span className="text-2xl block mb-1">📱</span>
+                <span className="text-sm">Mit Handy scannen</span>
+              </div>
+            )}
+          </button>
+        </div>
+
+        {/* Mobile Uploaded Files */}
+        {h.mobileUploadedFiles.length > 0 && !h.directFile && (
+          <>
+            <div className={`text-center text-sm mb-3 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>— Vom Handy hochgeladen —</div>
+            <div className="space-y-2 max-h-32 overflow-y-auto mb-4">
+              {h.mobileUploadedFiles.map((file) => (
+                <button
+                  key={file.id}
+                  onClick={() => { h.setSelectedMobileFile(file); h.setDirectFile(null); h.setSelectedDocumentId(null); h.setError(null) }}
+                  className={`w-full flex items-center gap-3 p-3 rounded-xl text-left transition-all ${
+                    h.selectedMobileFile?.id === file.id
+                      ? (isDark ? 'bg-green-500/30 border-2 border-green-400/50' : 'bg-green-100 border-2 border-green-500')
+                      : (isDark ? 'bg-white/5 border-2 border-transparent hover:border-white/20' : 'bg-slate-50 border-2 border-transparent hover:border-slate-200')
+                  }`}
+                >
+                  <span className="text-xl">{file.type.startsWith('image/') ? '🖼️' : '📄'}</span>
+                  <div className="flex-1 min-w-0">
+                    <p className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{file.name}</p>
+                    <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{formatFileSize(file.size)}</p>
+                  </div>
+                  {h.selectedMobileFile?.id === file.id && (
+                    <svg className="w-5 h-5 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                    </svg>
+                  )}
+                </button>
+              ))}
+            </div>
+          </>
+        )}
+
+        {/* Stored Documents */}
+        {h.storedDocuments.length > 0 && !h.directFile && !h.selectedMobileFile && (
+          <>
+            <div className={`text-center text-sm mb-3 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>— oder aus Ihren Dokumenten —</div>
+            <div className="space-y-2 max-h-32 overflow-y-auto">
+              {h.storedDocuments.map((doc) => (
+                <button
+                  key={doc.id}
+                  onClick={() => { h.setSelectedDocumentId(doc.id); h.setDirectFile(null); h.setSelectedMobileFile(null); h.setError(null) }}
+                  className={`w-full flex items-center gap-3 p-3 rounded-xl text-left transition-all ${
+                    h.selectedDocumentId === doc.id
+                      ? (isDark ? 'bg-purple-500/30 border-2 border-purple-400/50' : 'bg-purple-100 border-2 border-purple-500')
+                      : (isDark ? 'bg-white/5 border-2 border-transparent hover:border-white/20' : 'bg-slate-50 border-2 border-transparent hover:border-slate-200')
+                  }`}
+                >
+                  <span className="text-xl">{doc.type === 'application/pdf' ? '📄' : '🖼️'}</span>
+                  <div className="flex-1 min-w-0">
+                    <p className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{doc.name}</p>
+                    <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{formatFileSize(doc.size)}</p>
+                  </div>
+                  {h.selectedDocumentId === doc.id && (
+                    <svg className="w-5 h-5 text-purple-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                    </svg>
+                  )}
+                </button>
+              ))}
+            </div>
+          </>
+        )}
+      </div>
+
+      {/* Step 2: Preview + Session Name */}
+      {(h.directFile || h.selectedMobileFile || h.selectedDocumentId) && (
+        <div className="grid grid-cols-1 lg:grid-cols-5 gap-6">
+          {/* Document Preview */}
+          <div className={`${glassCard} rounded-2xl p-6 lg:col-span-3`}>
+            <div className="flex items-center justify-between mb-4">
+              <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                Vorschau
+              </h2>
+              <button
+                onClick={() => h.setShowFullPreview(true)}
+                className={`px-3 py-1.5 rounded-lg text-sm font-medium transition-all flex items-center gap-2 ${
+                  isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-700'
+                }`}
+              >
+                <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 21l-6-6m2-5a7 7 0 11-14 0 7 7 0 0114 0zM10 7v3m0 0v3m0-3h3m-3 0H7" />
+                </svg>
+                Originalgroesse
+              </button>
+            </div>
+            <div className={`max-h-[60vh] overflow-auto rounded-xl border ${isDark ? 'border-white/10' : 'border-black/10'}`}>
+              {h.directFile?.type.startsWith('image/') && h.directFilePreview && (
+                <img src={h.directFilePreview} alt="Vorschau" className="w-full h-auto" />
+              )}
+              {h.directFile?.type === 'application/pdf' && h.directFilePreview && (
+                <iframe src={h.directFilePreview} title="Vorschau" className="w-full border-0 rounded-xl" style={{ height: '60vh' }} />
+              )}
+              {h.selectedMobileFile && !h.directFile && (
+                h.selectedMobileFile.type.startsWith('image/')
+                  ? <img src={h.selectedMobileFile.dataUrl} alt="Vorschau" className="w-full h-auto" />
+                  : <iframe src={h.selectedMobileFile.dataUrl} title="Vorschau" className="w-full border-0 rounded-xl" style={{ height: '60vh' }} />
+              )}
+              {h.selectedDocumentId && !h.directFile && !h.selectedMobileFile && (() => {
+                const doc = h.storedDocuments.find(d => d.id === h.selectedDocumentId)
+                if (!doc?.url) return <p className={`p-8 text-center ${isDark ? 'text-white/40' : 'text-slate-400'}`}>Keine Vorschau verfuegbar</p>
+                return doc.type.startsWith('image/')
+                  ? <img src={doc.url} alt="Vorschau" className="w-full h-auto" />
+                  : <iframe src={doc.url} title="Vorschau" className="w-full border-0 rounded-xl" style={{ height: '60vh' }} />
+              })()}
+            </div>
+          </div>
+
+          {/* Session Name + Start */}
+          <div className={`${glassCard} rounded-2xl p-6 lg:col-span-2 flex flex-col`}>
+            <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+              2. Session benennen
+            </h2>
+            <input
+              type="text"
+              value={h.sessionName}
+              onChange={(e) => { h.setSessionName(e.target.value); h.setError(null) }}
+              placeholder="z.B. Englisch Klasse 7 - Unit 3"
+              className={`w-full px-4 py-3 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500 mb-4`}
+              autoFocus
+            />
+            <p className={`text-sm mb-6 ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
+              Benennen Sie die Session z.B. nach dem Schulbuch-Kapitel, damit Sie sie spaeter wiederfinden.
+            </p>
+            <div className="flex-1" />
+            <button
+              onClick={() => {
+                if (!h.sessionName.trim()) {
+                  h.setError('Bitte geben Sie einen Session-Namen ein (z.B. "Englisch Klasse 7 - Unit 3")')
+                  return
+                }
+                h.startSession()
+              }}
+              disabled={h.isCreatingSession || !h.sessionName.trim()}
+              className="w-full px-6 py-4 bg-gradient-to-r from-purple-500 to-pink-500 text-white rounded-2xl font-semibold text-lg disabled:opacity-50 hover:shadow-xl hover:shadow-purple-500/30 transition-all transform hover:scale-105"
+            >
+              {h.isCreatingSession ? 'Verarbeite...' : 'Weiter →'}
+            </button>
+          </div>
+        </div>
+      )}
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/VocabularyTab.tsx
+++ b/studio-v2/app/vocab-worksheet/components/VocabularyTab.tsx
@@ -0,0 +1,312 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook, IpaMode, SyllableMode } from '../types'
+import { getApiBase } from '../constants'
+
+export function VocabularyTab({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard, glassInput } = h
+  const extras = h.getAllExtraColumns()
+  const baseCols = 3 + extras.length
+  const gridCols = `14px 32px 36px repeat(${baseCols}, 1fr) 32px`
+
+  return (
+    <div className="flex flex-col lg:flex-row gap-4" style={{ height: 'calc(100vh - 240px)', minHeight: '500px' }}>
+      {/* Left: Original pages */}
+      <div className={`${glassCard} rounded-2xl p-4 lg:w-1/3 flex flex-col overflow-hidden`}>
+        <h2 className={`text-sm font-semibold mb-3 flex-shrink-0 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
+          Original ({(h.selectedPages.length > 0 ? h.selectedPages : [...new Set(h.vocabulary.map(v => (v.source_page || 1) - 1))]).length} Seiten)
+        </h2>
+        <div className="flex-1 overflow-y-auto space-y-3">
+          {(() => {
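+            const processedPageIndices = h.selectedPages.length > 0
+              ? h.selectedPages // page indices are used 0-based below (label shows idx + 1)
+              : [...new Set(h.vocabulary.map(v => (v.source_page || 1) - 1))].sort((a, b) => a - b) // fallback when no pages are selected: derive 0-based indices from the 1-based source_page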
+
+            const apiBase = getApiBase()
+            const pagesToShow = processedPageIndices
+              .filter(idx => idx >= 0)
+              .map(idx => ({
+                idx,
+                src: h.session ? `${apiBase}/api/v1/vocab/sessions/${h.session.id}/pdf-page-image/${idx}` : null,
+              }))
+              .filter(t => t.src !== null) as { idx: number; src: string }[]
+
+            if (pagesToShow.length > 0) {
+              return pagesToShow.map(({ idx, src }) => (
+                <div key={idx} className={`relative rounded-xl overflow-hidden border ${isDark ? 'border-white/10' : 'border-black/10'}`}>
+                  <div className={`absolute top-2 left-2 px-2 py-0.5 rounded-lg text-xs font-medium z-10 ${isDark ? 'bg-black/60 text-white' : 'bg-white/90 text-slate-700'}`}>
+                    S. {idx + 1}
+                  </div>
+                  <img src={src} alt={`Seite ${idx + 1}`} className="w-full h-auto" />
+                </div>
+              ))
+            }
+            if (h.uploadedImage) {
+              return (
+                <div className={`relative rounded-xl overflow-hidden border ${isDark ? 'border-white/10' : 'border-black/10'}`}>
+                  <img src={h.uploadedImage} alt="Arbeitsblatt" className="w-full h-auto" />
+                </div>
+              )
+            }
+            return (
+              <div className={`flex-1 flex items-center justify-center py-12 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
+                <div className="text-center">
+                  <svg className="w-12 h-12 mx-auto mb-2 opacity-50" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M4 16l4.586-4.586a2 2 0 012.828 0L16 16m-2-2l1.586-1.586a2 2 0 012.828 0L20 14m-6-6h.01M6 20h12a2 2 0 002-2V6a2 2 0 00-2-2H6a2 2 0 00-2 2v12a2 2 0 002 2z" />
+                  </svg>
+                  <p className="text-xs">Kein Bild verfuegbar</p>
+                </div>
+              </div>
+            )
+          })()}
+        </div>
+      </div>
+
+      {/* Right: Vocabulary table */}
+      <div className={`${glassCard} rounded-2xl p-4 lg:w-2/3 flex flex-col overflow-hidden`}>
+        <div className="flex items-center justify-between mb-3 flex-shrink-0">
+          <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
+            Vokabeln ({h.vocabulary.length})
+          </h2>
+          <div className="flex items-center gap-2">
+            {/* IPA mode */}
+            <select
+              value={h.ipaMode}
+              onChange={(e) => {
+                const newIpa = e.target.value as IpaMode
+                h.setIpaMode(newIpa)
+                h.reprocessPages(newIpa, h.syllableMode)
+              }}
+              className={`px-2 py-1.5 text-xs rounded-md border ${isDark ? 'border-white/20 bg-white/10 text-white' : 'border-gray-200 bg-white text-gray-600'}`}
+              title="Lautschrift (IPA)"
+            >
+              <option value="none">IPA: Aus</option>
+              <option value="auto">IPA: Auto</option>
+              <option value="en">IPA: nur EN</option>
+              <option value="de">IPA: nur DE</option>
+              <option value="all">IPA: Alle</option>
+            </select>
+            {/* Syllable mode */}
+            <select
+              value={h.syllableMode}
+              onChange={(e) => {
+                const newSyl = e.target.value as SyllableMode
+                h.setSyllableMode(newSyl)
+                h.reprocessPages(h.ipaMode, newSyl)
+              }}
+              className={`px-2 py-1.5 text-xs rounded-md border ${isDark ? 'border-white/20 bg-white/10 text-white' : 'border-gray-200 bg-white text-gray-600'}`}
+              title="Silbentrennung"
+            >
+              <option value="none">Silben: Aus</option>
+              <option value="auto">Silben: Original</option>
+              <option value="en">Silben: nur EN</option>
+              <option value="de">Silben: nur DE</option>
+              <option value="all">Silben: Alle</option>
+            </select>
+            <button
+              onClick={() => h.reprocessPages(h.ipaMode, h.syllableMode)}
+              className={`px-3 py-2 rounded-xl text-sm font-medium transition-colors ${isDark ? 'bg-orange-500/20 hover:bg-orange-500/30 text-orange-200 border border-orange-500/30' : 'bg-orange-50 hover:bg-orange-100 text-orange-700 border border-orange-200'}`}
+              title="Seiten erneut verarbeiten (OCR + Zeilenmerge)"
+            >
+              Neu verarbeiten
+            </button>
+            <button onClick={h.saveVocabulary} className={`px-4 py-2 rounded-xl text-sm font-medium transition-colors ${isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-900'}`}>
+              Speichern
+            </button>
+            <button onClick={() => h.setActiveTab('worksheet')} className="px-4 py-2 rounded-xl text-sm font-medium bg-gradient-to-r from-purple-500 to-pink-500 text-white hover:shadow-lg transition-all">
+              Weiter →
+            </button>
+          </div>
+        </div>
+
+        {/* Error messages for failed pages */}
+        {h.processingErrors.length > 0 && (
+          <div className={`rounded-xl p-3 mb-3 flex-shrink-0 ${isDark ? 'bg-orange-500/20 text-orange-200 border border-orange-500/30' : 'bg-orange-100 text-orange-700 border border-orange-200'}`}>
+            <div className="font-medium mb-1 text-sm">Einige Seiten konnten nicht verarbeitet werden:</div>
+            <ul className="text-xs space-y-0.5">
+              {h.processingErrors.map((err, idx) => (
+                <li key={idx}>• {err}</li>
+              ))}
+            </ul>
+          </div>
+        )}
+
+        {/* Processing Progress */}
+        {h.currentlyProcessingPage && (
+          <div className={`rounded-xl p-3 mb-3 flex-shrink-0 ${isDark ? 'bg-purple-500/20 border border-purple-500/30' : 'bg-purple-100 border border-purple-200'}`}>
+            <div className="flex items-center gap-3">
+              <div className={`w-4 h-4 border-2 ${isDark ? 'border-purple-300' : 'border-purple-600'} border-t-transparent rounded-full animate-spin`} />
+              <div>
+                <div className={`text-sm font-medium ${isDark ? 'text-purple-200' : 'text-purple-700'}`}>Verarbeite Seite {h.currentlyProcessingPage}...</div>
+                <div className={`text-xs ${isDark ? 'text-purple-300/70' : 'text-purple-600'}`}>
+                  {h.successfulPages.length > 0 && `${h.successfulPages.length} Seite(n) fertig • `}
+                  {h.vocabulary.length} Vokabeln bisher
+                </div>
+              </div>
+            </div>
+          </div>
+        )}
+
+        {/* Success info */}
+        {!h.currentlyProcessingPage && h.successfulPages.length > 0 && h.failedPages.length === 0 && (
+          <div className={`rounded-xl p-2 mb-3 text-xs flex-shrink-0 ${isDark ? 'bg-green-500/20 text-green-200 border border-green-500/30' : 'bg-green-100 text-green-700 border border-green-200'}`}>
+            Alle {h.successfulPages.length} Seite(n) erfolgreich verarbeitet - {h.vocabulary.length} Vokabeln insgesamt
+          </div>
+        )}
+
+        {/* Partial success info */}
+        {!h.currentlyProcessingPage && h.successfulPages.length > 0 && h.failedPages.length > 0 && (
+          <div className={`rounded-xl p-2 mb-3 text-xs flex-shrink-0 ${isDark ? 'bg-yellow-500/20 text-yellow-200 border border-yellow-500/30' : 'bg-yellow-100 text-yellow-700 border border-yellow-200'}`}>
+            {h.successfulPages.length} Seite(n) erfolgreich, {h.failedPages.length} fehlgeschlagen - {h.vocabulary.length} Vokabeln extrahiert
+          </div>
+        )}
+
+        {h.vocabulary.length === 0 ? (
+          <p className={`text-center py-8 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Keine Vokabeln gefunden.</p>
+        ) : (
+          <div className="flex flex-col flex-1 overflow-hidden">
+            {/* Fixed Header */}
+            <div className={`flex-shrink-0 grid gap-1 px-2 py-2 text-sm font-medium border-b items-center ${isDark ? 'border-white/10 text-white/60' : 'border-black/10 text-slate-500'}`} style={{ gridTemplateColumns: gridCols }}>
+              <div>{/* insert-triangle spacer */}</div>
+              <div className="flex items-center justify-center">
+                <input
+                  type="checkbox"
+                  checked={h.vocabulary.length > 0 && h.vocabulary.every(v => v.selected)}
+                  onChange={h.toggleAllSelection}
+                  className="w-4 h-4 rounded border-gray-300 text-purple-600 focus:ring-purple-500 cursor-pointer"
+                  title="Alle auswaehlen"
+                />
+              </div>
+              <div>S.</div>
+              <div>Englisch</div>
+              <div>Deutsch</div>
+              <div>Beispiel</div>
+              {extras.map(col => (
+                <div key={col.key} className="flex items-center gap-1 group">
+                  <span className="truncate">{col.label}</span>
+                  <button
+                    onClick={() => {
+                      const page = Object.entries(h.pageExtraColumns).find(([, cols]) => cols.some(c => c.key === col.key))
+                      if (page) h.removeExtraColumn(Number(page[0]), col.key)
+                    }}
+                    className={`opacity-0 group-hover:opacity-100 transition-opacity ${isDark ? 'text-red-400 hover:text-red-300' : 'text-red-500 hover:text-red-600'}`}
+                    title="Spalte entfernen"
+                  >
+                    <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" /></svg>
+                  </button>
+                </div>
+              ))}
+              <div className="flex items-center justify-center">
+                <button
+                  onClick={() => h.addExtraColumn(0)}
+                  className={`p-0.5 rounded transition-colors ${isDark ? 'hover:bg-white/10 text-white/40 hover:text-white/70' : 'hover:bg-slate-200 text-slate-400 hover:text-slate-600'}`}
+                  title="Spalte hinzufuegen"
+                >
+                  <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 4v16m8-8H4" /></svg>
+                </button>
+              </div>
+            </div>
+
+            {/* Scrollable Content */}
+            <div className="flex-1 overflow-y-auto">
+              {h.vocabulary.map((entry, index) => (
+                <React.Fragment key={entry.id}>
+                  <div className={`grid gap-1 px-2 py-1 items-center ${isDark ? 'hover:bg-white/5' : 'hover:bg-black/5'}`} style={{ gridTemplateColumns: gridCols }}>
+                    <button
+                      onClick={() => h.addVocabularyEntry(index)}
+                      className={`w-3.5 h-3.5 flex items-center justify-center opacity-0 hover:opacity-100 transition-opacity ${isDark ? 'text-purple-400' : 'text-purple-500'}`}
+                      title="Zeile einfuegen"
+                    >
+                      <svg className="w-2.5 h-2.5" viewBox="0 0 10 10" fill="currentColor"><polygon points="0,0 10,5 0,10" /></svg>
+                    </button>
+                    <div className="flex items-center justify-center">
+                      <input
+                        type="checkbox"
+                        checked={entry.selected || false}
+                        onChange={() => h.toggleVocabularySelection(entry.id)}
+                        className="w-4 h-4 rounded border-gray-300 text-purple-600 focus:ring-purple-500 cursor-pointer"
+                      />
+                    </div>
+                    <div className={`flex items-center justify-center text-xs font-medium rounded ${isDark ? 'bg-white/10 text-white/60' : 'bg-black/10 text-slate-600'}`}>
+                      {entry.source_page || '-'}
+                    </div>
+                    <input
+                      type="text"
+                      value={entry.english}
+                      onChange={(e) => h.updateVocabularyEntry(entry.id, 'english', e.target.value)}
+                      className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
+                    />
+                    <input
+                      type="text"
+                      value={entry.german}
+                      onChange={(e) => h.updateVocabularyEntry(entry.id, 'german', e.target.value)}
+                      className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
+                    />
+                    <input
+                      type="text"
+                      value={entry.example_sentence || ''}
+                      onChange={(e) => h.updateVocabularyEntry(entry.id, 'example_sentence', e.target.value)}
+                      placeholder="Beispiel"
+                      className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
+                    />
+                    {extras.map(col => (
+                      <input
+                        key={col.key}
+                        type="text"
+                        value={entry.extras?.[col.key] || ''}
+                        onChange={(e) => h.updateVocabularyEntry(entry.id, col.key, e.target.value)}
+                        placeholder={col.label}
+                        className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
+                      />
+                    ))}
+                    <button onClick={() => h.deleteVocabularyEntry(entry.id)} title="Zeile entfernen" className={`p-1 rounded-lg ${isDark ? 'hover:bg-red-500/20 text-red-400' : 'hover:bg-red-100 text-red-500'}`}>
+                      <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
+                      </svg>
+                    </button>
+                  </div>
+                </React.Fragment>
+              ))}
+              {/* Final insert triangle */}
+              <div className="px-2 py-1">
+                <button
+                  onClick={() => h.addVocabularyEntry()}
+                  className={`w-3.5 h-3.5 flex items-center justify-center opacity-30 hover:opacity-100 transition-opacity ${isDark ? 'text-purple-400' : 'text-purple-500'}`}
+                  title="Zeile am Ende einfuegen"
+                >
+                  <svg className="w-2.5 h-2.5" viewBox="0 0 10 10" fill="currentColor"><polygon points="0,0 10,5 0,10" /></svg>
+                </button>
+              </div>
+            </div>
+
+            {/* Footer */}
+            <div className={`flex-shrink-0 pt-2 border-t flex items-center justify-between text-xs ${isDark ? 'border-white/10 text-white/50' : 'border-black/10 text-slate-400'}`}>
+              <span>
+                {h.vocabulary.length} Vokabeln
+                {h.vocabulary.filter(v => v.selected).length > 0 && ` (${h.vocabulary.filter(v => v.selected).length} ausgewaehlt)`}
+                {(() => {
+                  const pages = [...new Set(h.vocabulary.map(v => v.source_page).filter(Boolean))].sort((a, b) => (a || 0) - (b || 0))
+                  return pages.length > 1 ? ` • Seiten: ${pages.join(', ')}` : ''
+                })()}
+              </span>
+              <button
+                onClick={() => h.addVocabularyEntry()}
+                className={`px-3 py-1 rounded-lg text-xs flex items-center gap-1 transition-colors ${
+                  isDark
+                    ? 'bg-white/10 hover:bg-white/20 text-white/70'
+                    : 'bg-slate-100 hover:bg-slate-200 text-slate-600'
+                }`}
+              >
+                <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 4v16m8-8H4" />
+                </svg>
+                Zeile
+              </button>
+            </div>
+          </div>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/components/WorksheetTab.tsx
+++ b/studio-v2/app/vocab-worksheet/components/WorksheetTab.tsx
@@ -0,0 +1,155 @@
+'use client'
+
+import React from 'react'
+import type { VocabWorksheetHook } from '../types'
+import { worksheetFormats, worksheetTypes } from '../constants'
+
+export function WorksheetTab({ h }: { h: VocabWorksheetHook }) {
+  const { isDark, glassCard, glassInput } = h
+
+  return (
+    <div className={`${glassCard} rounded-2xl p-6`}>
+      {/* Step 1: Format Selection */}
+      <div className="mb-8">
+        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          1. Vorlage waehlen
+        </h2>
+        <div className="grid grid-cols-2 gap-4">
+          {worksheetFormats.map((format) => (
+            <button
+              key={format.id}
+              onClick={() => h.setSelectedFormat(format.id)}
+              className={`p-5 rounded-xl border text-left transition-all ${
+                h.selectedFormat === format.id
+                  ? (isDark ? 'border-purple-400/50 bg-purple-500/20 ring-2 ring-purple-500/50' : 'border-purple-500 bg-purple-50 ring-2 ring-purple-500/30')
+                  : (isDark ? 'border-white/20 hover:border-white/40' : 'border-slate-200 hover:border-slate-300')
+              }`}
+            >
+              <div className="flex items-start gap-3">
+                <div className={`w-10 h-10 rounded-lg flex items-center justify-center shrink-0 ${
+                  h.selectedFormat === format.id
+                    ? (isDark ? 'bg-purple-500/30' : 'bg-purple-200')
+                    : (isDark ? 'bg-white/10' : 'bg-slate-100')
+                }`}>
+                  {format.id === 'standard' ? (
+                    <svg className={`w-5 h-5 ${h.selectedFormat === format.id ? 'text-purple-400' : (isDark ? 'text-white/60' : 'text-slate-500')}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
+                    </svg>
+                  ) : (
+                    <svg className={`w-5 h-5 ${h.selectedFormat === format.id ? 'text-purple-400' : (isDark ? 'text-white/60' : 'text-slate-500')}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M4 5a1 1 0 011-1h14a1 1 0 011 1v2a1 1 0 01-1 1H5a1 1 0 01-1-1V5zM4 13a1 1 0 011-1h6a1 1 0 011 1v6a1 1 0 01-1 1H5a1 1 0 01-1-1v-6zM16 13a1 1 0 011-1h2a1 1 0 011 1v6a1 1 0 01-1 1h-2a1 1 0 01-1-1v-6z" />
+                    </svg>
+                  )}
+                </div>
+                <div className="flex-1">
+                  <div className="flex items-center justify-between">
+                    <span className={`font-medium ${isDark ? 'text-white' : 'text-slate-900'}`}>{format.label}</span>
+                    {h.selectedFormat === format.id && (
+                      <svg className="w-5 h-5 text-purple-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
+                      </svg>
+                    )}
+                  </div>
+                  <p className={`text-sm mt-1 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{format.description}</p>
+                </div>
+              </div>
+            </button>
+          ))}
+        </div>
+      </div>
+
+      {/* Step 2: Configuration */}
+      <div className="mb-6">
+        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
+          2. Arbeitsblatt konfigurieren
+        </h2>
+
+        {/* Title */}
+        <div className="mb-6">
+          <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Titel</label>
+          <input
+            type="text"
+            value={h.worksheetTitle}
+            onChange={(e) => h.setWorksheetTitle(e.target.value)}
+            placeholder="z.B. Vokabeln Unit 3"
+            className={`w-full px-4 py-3 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}
+          />
+        </div>
+
+        {/* Standard format options */}
+        {h.selectedFormat === 'standard' && (
+          <>
+            <div className="mb-6">
+              <label className={`block text-sm font-medium mb-3 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Arbeitsblatt-Typen</label>
+              <div className="grid grid-cols-2 gap-3">
+                {worksheetTypes.map((type) => (
+                  <button
+                    key={type.id}
+                    onClick={() => h.toggleWorksheetType(type.id)}
+                    className={`p-4 rounded-xl border text-left transition-all ${
+                      h.selectedTypes.includes(type.id)
+                        ? (isDark ? 'border-purple-400/50 bg-purple-500/20' : 'border-purple-500 bg-purple-50')
+                        : (isDark ? 'border-white/20 hover:border-white/40' : 'border-slate-200 hover:border-slate-300')
+                    }`}
+                  >
+                    <div className="flex items-center justify-between">
+                      <span className={`font-medium ${isDark ? 'text-white' : 'text-slate-900'}`}>{type.label}</span>
+                      {h.selectedTypes.includes(type.id) && <svg className="w-5 h-5 text-purple-500" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" /></svg>}
+                    </div>
+                    <p className={`text-sm mt-1 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{type.description}</p>
+                  </button>
+                ))}
+              </div>
+            </div>
+
+            <div className="grid grid-cols-2 gap-6 mb-6">
+              <div>
+                <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Zeilenhoehe</label>
+                <select value={h.lineHeight} onChange={(e) => h.setLineHeight(e.target.value)} className={`w-full px-4 py-3 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}>
+                  <option value="normal">Normal</option>
+                  <option value="large">Gross</option>
+                  <option value="extra-large">Extra gross</option>
+                </select>
+              </div>
+              <div className="flex items-center">
+                <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                  <input type="checkbox" checked={h.includeSolutions} onChange={(e) => h.setIncludeSolutions(e.target.checked)} className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500" />
+                  <span>Loesungsblatt erstellen</span>
+                </label>
+              </div>
+            </div>
+          </>
+        )}
+
+        {/* NRU format options */}
+        {h.selectedFormat === 'nru' && (
+          <div className="space-y-4">
+            <div className={`p-4 rounded-xl ${isDark ? 'bg-indigo-500/20 border border-indigo-500/30' : 'bg-indigo-50 border border-indigo-200'}`}>
+              <h4 className={`font-medium mb-2 ${isDark ? 'text-indigo-200' : 'text-indigo-700'}`}>NRU-Format Uebersicht:</h4>
+              <ul className={`text-sm space-y-1 ${isDark ? 'text-indigo-200/80' : 'text-indigo-600'}`}>
+                <li>• <strong>Vokabeln:</strong> 3-Spalten-Tabelle (Englisch | Deutsch leer | Korrektur leer)</li>
+                <li>• <strong>Lernsaetze:</strong> Deutscher Satz + 2 leere Zeilen fuer englische Uebersetzung</li>
+                <li>• Pro gescannter Seite werden 2 Arbeitsblatt-Seiten erzeugt</li>
+              </ul>
+            </div>
+
+            <div className="flex items-center">
+              <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
+                <input type="checkbox" checked={h.includeSolutions} onChange={(e) => h.setIncludeSolutions(e.target.checked)} className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500" />
+                <span>Loesungsblatt erstellen (mit deutschen Uebersetzungen)</span>
+              </label>
+            </div>
+          </div>
+        )}
+      </div>
+
+      <button
+        onClick={h.generateWorksheet}
+        disabled={(h.selectedFormat === 'standard' && h.selectedTypes.length === 0) || h.isGenerating}
+        className="w-full py-4 bg-gradient-to-r from-purple-500 to-pink-500 text-white rounded-xl font-semibold disabled:opacity-50 hover:shadow-xl hover:shadow-purple-500/30 transition-all"
+      >
+        {h.isGenerating ? 'Generiere PDF...' : `${h.selectedFormat === 'nru' ? 'NRU-Arbeitsblatt' : 'Arbeitsblatt'} generieren`}
+      </button>
+    </div>
+  )
+}
--- a/studio-v2/app/vocab-worksheet/constants.ts
+++ b/studio-v2/app/vocab-worksheet/constants.ts
@@ -0,0 +1,56 @@
+import type { OcrPrompts, WorksheetFormat, WorksheetType } from './types'
+
+// API base URL, derived dynamically from the browser host
+// Uses the /klausur-api/ proxy to avoid certificate problems
+export const getApiBase = () => {
+  if (typeof window === 'undefined') return 'http://localhost:8086'
+  const { hostname, protocol } = window.location
+  if (hostname === 'localhost') return 'http://localhost:8086'
+  return `${protocol}//${hostname}/klausur-api`
+}
+
+// LocalStorage Keys
+export const DOCUMENTS_KEY = 'bp_documents'
+export const OCR_PROMPTS_KEY = 'bp_ocr_prompts'
+export const SESSION_ID_KEY = 'bp_upload_session'
+
+// Worksheet format templates
+export const worksheetFormats: { id: WorksheetFormat; label: string; description: string; icon: string }[] = [
+  {
+    id: 'standard',
+    label: 'Standard-Format',
+    description: 'Klassisches Arbeitsblatt mit waehlbarer Uebersetzungsrichtung',
+    icon: 'document'
+  },
+  {
+    id: 'nru',
+    label: 'NRU-Vorlage',
+    description: '3-Spalten-Tabelle (EN|DE|Korrektur) + Lernsaetze mit Uebersetzungszeilen',
+    icon: 'template'
+  },
+]
+
+// Default OCR filtering prompts
+export const defaultOcrPrompts: OcrPrompts = {
+  filterHeaders: true,
+  filterFooters: true,
+  filterPageNumbers: true,
+  customFilter: '',
+  headerPatterns: ['Unit', 'Chapter', 'Lesson', 'Kapitel', 'Lektion'],
+  footerPatterns: ['zweihundert', 'dreihundert', 'vierhundert', 'Page', 'Seite']
+}
+
+export const worksheetTypes: { id: WorksheetType; label: string; description: string }[] = [
+  { id: 'en_to_de', label: 'Englisch → Deutsch', description: 'Englische Woerter uebersetzen' },
+  { id: 'de_to_en', label: 'Deutsch → Englisch', description: 'Deutsche Woerter uebersetzen' },
+  { id: 'copy', label: 'Abschreibuebung', description: 'Woerter mehrfach schreiben' },
+  { id: 'gap_fill', label: 'Lueckensaetze', description: 'Saetze mit Luecken ausfuellen' },
+]
+
+export const formatFileSize = (bytes: number): string => {
+  if (bytes === 0) return '0 B'
+  const k = 1024
+  const sizes = ['B', 'KB', 'MB', 'GB']
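+  const i = Math.min(Math.max(Math.floor(Math.log(bytes) / Math.log(k)), 0), sizes.length - 1) // clamp so sub-byte and >= 1 TB inputs still map to a defined unit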
+  return parseFloat((bytes / Math.pow(k, i)).toFixed(1)) + ' ' + sizes[i]
+}
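Review note on `getApiBase` in `constants.ts`: the host-dependent branching can be checked outside a browser if the location is passed in as a parameter. The following is only a sketch under that assumption; `resolveApiBase`, `LocationLike`, and `LOCAL_API` are hypothetical names that do not appear in the diff, and the branching is copied from `getApiBase` as shown above.

```typescript
// Hypothetical helper (not part of the diff): same branching as getApiBase,
// but the location is injected so the logic can run without a `window` object.
type LocationLike = { hostname: string; protocol: string }

const LOCAL_API = 'http://localhost:8086'

function resolveApiBase(loc?: LocationLike): string {
  // No location available (e.g. server-side rendering): use the local backend.
  if (!loc) return LOCAL_API
  // Local development talks to the backend port directly.
  if (loc.hostname === 'localhost') return LOCAL_API
  // Any other host goes through the same-origin /klausur-api proxy.
  // `protocol` already carries the trailing colon, as window.location.protocol does.
  return `${loc.protocol}//${loc.hostname}/klausur-api`
}
```

Because the function reads `hostname` rather than `host`, any port in the page URL is dropped: a page served from `https://macmini:3002` would resolve to `https://macmini/klausur-api`, so the proxy has to be reachable on the default port for that protocol.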
--- a/studio-v2/app/vocab-worksheet/page.tsx
+++ b/studio-v2/app/vocab-worksheet/page.tsx
				`@@ -0,0 +1 @@`
				`"""OCR Kombi Pipeline - modular step-based OCR processing."""`