feat(ocr-pipeline): add Step 5 word recognition (grid from columns × rows)
Backend: build_word_grid() intersects column regions with content rows,
OCRs each cell with language-specific Tesseract, and returns vocabulary
entries with percent-based bounding boxes. New endpoints: POST /words,
GET /image/words-overlay, ground-truth save/retrieve for words.

Frontend: StepWordRecognition with overview + step-through labeling modes,
goToStep callback for row correction feedback loop.

MkDocs: OCR Pipeline documentation added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
373
docs-src/services/klausur-service/OCR-Pipeline.md
Normal file
@@ -0,0 +1,373 @@
# OCR Pipeline - Step-by-Step Page Reconstruction

**Version:** 1.0.0
**Status:** In development
**URL:** https://macmini:3002/ai/ocr-pipeline

## Overview

The OCR Pipeline breaks the OCR process into **8 individual steps** to reconstruct scanned vocabulary pages word by word. Each step can be reviewed, corrected, and annotated with ground-truth data on its own.

**Goal:** Reconstruct 10 vocabulary pages without errors.

### Pipeline Steps

| Step | Name | Description | Status |
|------|------|-------------|--------|
| 1 | Deskew | Straighten the scan (Hough lines + word alignment) | Implemented |
| 2 | Dewarp | Correct book curvature (vertical-edge analysis) | Implemented |
| 3 | Column detection | Find invisible columns (projection profiles) | Implemented |
| 4 | Row detection | Horizontal rows + header/footer classification | Implemented |
| 5 | Word recognition | Grid from columns x rows, OCR per cell | Implemented |
| 6 | Coordinate assignment | Exact positions within cells | Planned |
| 7 | Page reconstruction | Rebuild the page from coordinates | Planned |
| 8 | Ground truth validation | Overall check of all steps | Planned |

---

## Architecture

```
Admin-Lehrer (Next.js)              klausur-service (FastAPI :8086)
┌──────────────────────┐            ┌──────────────────────────────┐
│ /ai/ocr-pipeline     │            │ /api/v1/ocr-pipeline/        │
│                      │    REST    │                              │
│ PipelineStepper      │◄──────────►│ Sessions CRUD                │
│ StepDeskew           │            │ Image Serving                │
│ StepDewarp           │            │ Deskew/Dewarp/Columns/Rows   │
│ StepColumnDetection  │            │ Word Recognition             │
│ StepRowDetection     │            │ Ground Truth                 │
│ StepWordRecognition  │            │ Overlay Images               │
└──────────────────────┘            └──────────────────────────────┘
                                                   │
                                                   ▼
                                    ┌──────────────────────────────┐
                                    │ PostgreSQL                   │
                                    │ ocr_pipeline_sessions        │
                                    │ (Images + JSONB)             │
                                    └──────────────────────────────┘
```

### File Structure

```
klausur-service/backend/
├── ocr_pipeline_api.py                  # FastAPI router (all endpoints)
├── ocr_pipeline_session_store.py        # PostgreSQL persistence
├── cv_vocab_pipeline.py                 # Computer vision algorithms
└── migrations/
    ├── 002_ocr_pipeline_sessions.sql    # Base schema
    ├── 003_add_row_result.sql           # row_result column
    └── 004_add_word_result.sql          # word_result column

admin-lehrer/
├── app/(admin)/ai/ocr-pipeline/
│   ├── page.tsx                         # Main page with session management
│   └── types.ts                         # TypeScript interfaces
└── components/ocr-pipeline/
    ├── PipelineStepper.tsx              # Progress stepper
    ├── StepDeskew.tsx                   # Step 1
    ├── StepDewarp.tsx                   # Step 2
    ├── StepColumnDetection.tsx          # Step 3
    ├── StepRowDetection.tsx             # Step 4
    ├── StepWordRecognition.tsx          # Step 5
    ├── StepCoordinates.tsx              # Step 6 (placeholder)
    ├── StepReconstruction.tsx           # Step 7 (placeholder)
    └── StepGroundTruth.tsx              # Step 8 (placeholder)
```

---

## API Reference

All endpoints live under `/api/v1/ocr-pipeline/`.

### Sessions

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/sessions` | Create a new session (upload image) |
| `GET` | `/sessions` | List all sessions |
| `GET` | `/sessions/{id}` | Session info with all step results |
| `PUT` | `/sessions/{id}` | Rename session |
| `DELETE` | `/sessions/{id}` | Delete session |

### Images

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/sessions/{id}/image/original` | Original image |
| `GET` | `/sessions/{id}/image/deskewed` | Deskewed image |
| `GET` | `/sessions/{id}/image/dewarped` | Dewarped image |
| `GET` | `/sessions/{id}/image/binarized` | Binarized image |
| `GET` | `/sessions/{id}/image/columns-overlay` | Column overlay |
| `GET` | `/sessions/{id}/image/rows-overlay` | Row overlay |
| `GET` | `/sessions/{id}/image/words-overlay` | Word grid overlay |

### Step 1: Deskew

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/sessions/{id}/deskew` | Automatic deskew |
| `POST` | `/sessions/{id}/deskew/manual` | Manual angle correction |
| `POST` | `/sessions/{id}/ground-truth/deskew` | Save ground truth |

### Step 2: Dewarp

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/sessions/{id}/dewarp` | Automatic dewarp |
| `POST` | `/sessions/{id}/dewarp/manual` | Manual shear angle |
| `POST` | `/sessions/{id}/ground-truth/dewarp` | Save ground truth |

### Step 3: Columns

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/sessions/{id}/columns` | Automatic column detection |
| `POST` | `/sessions/{id}/columns/manual` | Manual column definition |
| `POST` | `/sessions/{id}/ground-truth/columns` | Save ground truth |

### Step 4: Rows

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/sessions/{id}/rows` | Automatic row detection |
| `POST` | `/sessions/{id}/rows/manual` | Manual row definition |
| `POST` | `/sessions/{id}/ground-truth/rows` | Save ground truth |
| `GET` | `/sessions/{id}/ground-truth/rows` | Retrieve ground truth |

### Step 5: Word Recognition

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/sessions/{id}/words` | Build the word grid from columns x rows |
| `POST` | `/sessions/{id}/ground-truth/words` | Save ground truth |
| `GET` | `/sessions/{id}/ground-truth/words` | Retrieve ground truth |

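As a rough illustration of how the Step 5 paths combine with the common prefix, the sketch below builds (but does not send) the corresponding requests using only the Python standard library. The host `localhost` is an assumption, the port is taken from the architecture diagram, and the helper names `build_words_request` and `words_ground_truth_url` are hypothetical. Whether `POST /words` expects a request body is not stated here, so none is attached.

```python
from urllib.request import Request

# Assumption: klausur-service is reachable on localhost; the port (8086)
# comes from the architecture diagram, the prefix from the API reference.
BASE_URL = "http://localhost:8086/api/v1/ocr-pipeline"


def build_words_request(session_id: str, base_url: str = BASE_URL) -> Request:
    """Build, without sending, the POST request that triggers Step 5."""
    url = f"{base_url}/sessions/{session_id}/words"
    return Request(url, method="POST")


def words_ground_truth_url(session_id: str, base_url: str = BASE_URL) -> str:
    """URL shared by saving (POST) and retrieving (GET) word ground truth."""
    return f"{base_url}/sessions/{session_id}/ground-truth/words"
```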

---

## Step 5: Word Recognition (Details)

### Algorithm: `build_word_grid()`

Step 5 uses the results of Step 3 (columns) and Step 4 (rows) to build a grid and reads each cell via OCR.

```
Columns (Step 3):     column_en  | column_de   | column_example
                      ───────────┼─────────────┼────────────────
Rows (Step 4):  R0  │ hello      │ hallo       │ Hello, World!
                R1  │ world      │ Welt        │ The whole world
                R2  │ book       │ Buch        │ Read a book
                      ───────────┼─────────────┼────────────────
```

**Procedure:**

1. **Filtering**: Keep only `content` rows (no header/footer) and the relevant columns (`column_en`, `column_de`, `column_example`)
2. **Cell construction**: Compute one `PageRegion` per content row and relevant column
3. **OCR**: Call `ocr_region()` with PSM 7 (single line) for each cell
4. **Language**: `eng` for the EN column, `deu` for the DE column, `eng+deu` for examples
5. **Grouping**: Merge the cells into vocabulary entries

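The filtering, cell construction, and language selection steps can be sketched as follows. This is a minimal illustration, not the code in `cv_vocab_pipeline.py`: the `Region` dataclass is a hypothetical stand-in for `PageRegion`, `build_cells` is an invented name, and counting `row_index` over content rows only is an assumption. The OCR call itself is left out.

```python
from dataclasses import dataclass

# Tesseract language per column type, as listed in the procedure above.
LANG_BY_COLUMN = {
    "column_en": "eng",
    "column_de": "deu",
    "column_example": "eng+deu",
}


@dataclass
class Region:
    """Hypothetical stand-in for PageRegion: a typed box in pixels."""
    type: str
    x: int
    y: int
    width: int
    height: int


def build_cells(columns: list[Region], rows: list[Region]) -> list[dict]:
    """Intersect content rows with relevant columns, one cell per pair."""
    relevant = [c for c in columns if c.type in LANG_BY_COLUMN]
    content_rows = [r for r in rows if r.type == "content"]
    cells = []
    for row_index, row in enumerate(content_rows):
        for col in relevant:
            cells.append({
                "row_index": row_index,
                "column": col.type,
                "lang": LANG_BY_COLUMN[col.type],
                "psm": 7,  # single text line per cell
                # Horizontal extent from the column, vertical from the row.
                "region": Region(col.type, col.x, row.y, col.width, row.height),
            })
    return cells
```

Each resulting cell carries everything an OCR call would need (region, language, page segmentation mode), so the grouping step can later collect the cells that share a `row_index` into one vocabulary entry.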
### Response Format

```json
{
  "entries": [
    {
      "row_index": 0,
      "english": "hello",
      "german": "hallo",
      "example": "Hello, how are you?",
      "confidence": 85.3,
      "bbox": {"x": 5.2, "y": 12.1, "w": 90.0, "h": 2.8},
      "bbox_en": {"x": 5.2, "y": 12.1, "w": 30.0, "h": 2.8},
      "bbox_de": {"x": 35.5, "y": 12.1, "w": 25.0, "h": 2.8},
      "bbox_ex": {"x": 61.0, "y": 12.1, "w": 34.2, "h": 2.8}
    }
  ],
  "entry_count": 25,
  "image_width": 2480,
  "image_height": 3508,
  "duration_seconds": 3.2,
  "summary": {
    "total_entries": 25,
    "with_english": 24,
    "with_german": 22,
    "low_confidence": 3
  }
}
```

!!! info "Bounding boxes in percent"
    All `bbox` values are percentages (0-100) relative to the image size.
    This makes rendering in the frontend independent of the image resolution.

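To make the percent convention concrete, the following sketch converts a percent-based box back to pixel coordinates. The function name `bbox_to_pixels` is hypothetical, and rounding to the nearest pixel is an assumption; the field names (`x`, `y`, `w`, `h`) and the image dimensions come from the response format.

```python
def bbox_to_pixels(bbox: dict, image_width: int, image_height: int) -> dict:
    """Convert a percent-based bbox (0-100) to integer pixel coordinates."""
    return {
        # x and w scale with the image width, y and h with the height.
        "x": round(bbox["x"] / 100 * image_width),
        "y": round(bbox["y"] / 100 * image_height),
        "w": round(bbox["w"] / 100 * image_width),
        "h": round(bbox["h"] / 100 * image_height),
    }


bbox_en = {"x": 5.2, "y": 12.1, "w": 30.0, "h": 2.8}
pixels = bbox_to_pixels(bbox_en, image_width=2480, image_height=3508)
```

For the sample `bbox_en` on a 2480 x 3508 image this yields `x=129`, `y=424`, `w=744`, `h=98`. The reverse direction (pixels to percent) divides by the same dimensions and multiplies by 100.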
### Frontend: StepWordRecognition

The component offers two modes:

**Overview mode:**

- Two images side by side: grid overlay vs. clean image
- Table of all recognized entries with confidence values
- Clicking an entry switches to labeling mode

**Labeling mode (step-through):**

- Left (2/3): image with the active entry highlighted (yellow frame)
- Right (1/3): cell crops + editable fields (English, Deutsch, Example)
- Keyboard shortcuts:
    - `Enter` = confirm and continue
    - `Ctrl+Arrow Down` = skip
    - `Ctrl+Arrow Up` = back

**Feedback loop:**

- "Zeilen korrigieren" ("Correct rows") jumps back to Step 4
- After the rows have been corrected, Step 5 can be run again

---

## Database Schema

```sql
CREATE TABLE ocr_pipeline_sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255),
    filename VARCHAR(255),
    status VARCHAR(50) DEFAULT 'active',
    current_step INT DEFAULT 1,

    -- Images (BYTEA)
    original_png BYTEA,
    deskewed_png BYTEA,
    binarized_png BYTEA,
    dewarped_png BYTEA,

    -- Step results (JSONB)
    deskew_result JSONB,
    dewarp_result JSONB,
    column_result JSONB,
    row_result JSONB,
    word_result JSONB,

    -- Ground truth + meta
    ground_truth JSONB,
    auto_shear_degrees REAL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
```

### Migrations

| File | Description |
|------|-------------|
| `002_ocr_pipeline_sessions.sql` | Base schema (Steps 1-3) |
| `003_add_row_result.sql` | `row_result JSONB` for Step 4 |
| `004_add_word_result.sql` | `word_result JSONB` for Step 5 |

---

## TypeScript Interfaces

The most important types in `types.ts`:

```typescript
interface WordEntry {
  row_index: number
  english: string
  german: string
  example: string
  confidence: number
  bbox: WordBbox            // Entire row
  bbox_en: WordBbox | null  // EN cell
  bbox_de: WordBbox | null  // DE cell
  bbox_ex: WordBbox | null  // Example cell
  status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
}

interface WordResult {
  entries: WordEntry[]
  entry_count: number
  image_width: number
  image_height: number
  duration_seconds: number
  summary: {
    total_entries: number
    with_english: number
    with_german: number
    low_confidence: number
  }
}
```

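The `summary` block of `WordResult` could plausibly be derived from the entries as in the Python sketch below (Python because the backend is a FastAPI service). The function name `summarize` is hypothetical, and the low-confidence threshold of 50 is purely an assumption, since the actual cutoff is not documented here. Entries are treated as plain dicts with the `WordEntry` fields.

```python
# Assumption: the real low-confidence cutoff is not documented; 50 is a guess.
LOW_CONFIDENCE_THRESHOLD = 50.0


def summarize(entries: list[dict],
              threshold: float = LOW_CONFIDENCE_THRESHOLD) -> dict:
    """Count entries with non-empty text and entries below the threshold."""
    return {
        "total_entries": len(entries),
        "with_english": sum(1 for e in entries if e["english"].strip()),
        "with_german": sum(1 for e in entries if e["german"].strip()),
        "low_confidence": sum(1 for e in entries if e["confidence"] < threshold),
    }
```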

---

## Ground Truth System

Ground-truth feedback can be attached to every step:

```json
{
  "is_correct": false,
  "corrected_entries": [...],
  "notes": "Row 5 recognized incorrectly",
  "saved_at": "2026-02-28T10:30:00"
}
```

Ground-truth data is stored in the `ground_truth` JSONB column, grouped by step:

```json
{
  "deskew": { "is_correct": true, ... },
  "dewarp": { "is_correct": true, ... },
  "columns": { "is_correct": false, ... },
  "rows": { "is_correct": true, ... },
  "words": { "is_correct": false, ... }
}
```

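The per-step grouping can be sketched as a small pure function that replaces the feedback for one step and leaves the others untouched. This is an illustration under assumptions, not the logic in `ocr_pipeline_session_store.py`: the name `save_step_ground_truth`, the `now` parameter, and rejecting unknown step keys are all hypothetical. The step keys and field names are the ones shown in the JSON above.

```python
from datetime import datetime

# Step keys as they appear in the grouped ground_truth example.
VALID_STEPS = ("deskew", "dewarp", "columns", "rows", "words")


def save_step_ground_truth(ground_truth, step, is_correct,
                           corrected_entries=None, notes="", now=None):
    """Return a copy of ground_truth with the feedback for one step replaced."""
    if step not in VALID_STEPS:
        raise ValueError(f"unknown step: {step}")
    feedback = {
        "is_correct": is_correct,
        "notes": notes,
        # ISO timestamp without microseconds, e.g. 2026-02-28T10:30:00
        "saved_at": (now or datetime.now()).isoformat(timespec="seconds"),
    }
    if corrected_entries is not None:
        feedback["corrected_entries"] = corrected_entries
    updated = dict(ground_truth or {})  # shallow copy, input stays unchanged
    updated[step] = feedback
    return updated
```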

---

## Deployment

```bash
# 1. Git push
git push origin main && git push gitea main

# 2. Mac Mini pull + build
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && git pull --no-rebase origin main"

# klausur-service (backend)
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && \
  /usr/local/bin/docker compose build --no-cache klausur-service && \
  /usr/local/bin/docker compose up -d klausur-service"

# admin-lehrer (frontend)
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && \
  /usr/local/bin/docker compose build --no-cache admin-lehrer && \
  /usr/local/bin/docker compose up -d admin-lehrer"

# 3. Run migration
ssh macmini "/usr/local/bin/docker exec bp-lehrer-klausur-service \
  python -c \"import asyncio; from ocr_pipeline_session_store import *; asyncio.run(init_ocr_pipeline_tables())\""

# 4. Test at:
# https://macmini:3002/ai/ocr-pipeline
```


---

## Changelog

| Date | Version | Change |
|------|---------|--------|
| 2026-02-28 | 1.0.0 | Step 5 (word recognition) implemented |
| 2026-02-22 | 0.4.0 | Step 4 (row detection) implemented |
| 2026-02-20 | 0.3.0 | Step 3 (column detection) with type classification |
| 2026-02-15 | 0.2.0 | Step 2 (dewarp) |
| 2026-02-12 | 0.1.0 | Step 1 (deskew) + session management |