docs+test: add Kombi-Modus tests (19 passing) and MkDocs documentation
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 35s
CI / test-go-edu-search (push) Successful in 32s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 24s
- test_paddle_kombi.py: 6 IoU tests, 10 merge tests, 2 bullet-point tests
- OCR-Pipeline.md: new "OCR Overlay" section with Paddle Direct/Kombi docs,
  merge algorithm flowchart, file structure update, changelog v4.5.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -162,6 +162,12 @@ admin-lehrer/
├── app/(admin)/ai/ocr-pipeline/
│   ├── page.tsx                   # Main page with session management
│   └── types.ts                   # TypeScript interfaces
├── app/(admin)/ai/ocr-overlay/
│   ├── page.tsx                   # OCR Overlay: 3 modes (Pipeline/Paddle/Kombi)
│   └── types.ts                   # OVERLAY_/PADDLE_DIRECT_/KOMBI_STEPS
├── components/ocr-overlay/
│   ├── PaddleDirectStep.tsx       # Reusable for Paddle Direct + Kombi
│   └── OverlayReconstruction.tsx  # Overlay display on image background
└── components/ocr-pipeline/
    ├── PipelineStepper.tsx        # Progress stepper
    ├── StepOrientation.tsx        # Step 1: Orientation
@@ -1081,10 +1087,139 @@ ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/brea

---

## OCR Overlay: Alternative Pipelines

**URL:** https://macmini:3002/ai/ocr-overlay

Alongside the full 10-step pipeline, the **OCR Overlay** page offers
simplified paths for quick results. All three modes share the same
preprocessing steps (Orient → Deskew → Dewarp → Crop).

### Mode Overview

| Mode | Steps | Engine | Endpoint | Description |
|------|-------|--------|----------|-------------|
| **Pipeline** | 7 | Tesseract | `/words` (SSE) | Full pipeline: lines + words + overlay |
| **Paddle Direct** | 5 | PaddleOCR | `/paddle-direct` | PaddleOCR replaces lines + words + overlay |
| **Kombi** | 5 | PaddleOCR + Tesseract | `/paddle-kombi` | Both engines; coordinates of matched words are averaged |

### Flowchart

```
┌──────────────────────────────────────────────────────────────┐
│  SHARED PREPROCESSING (all 3 modes)                          │
│                                                              │
│  Step 1: Orientation                                         │
│  Step 2: Deskew                                              │
│  Step 3: Dewarp                                              │
│  Step 4: Crop                                                │
└──────────────────────────────┬───────────────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
    PIPELINE             PADDLE DIRECT           KOMBI MODE
    (7 steps)              (5 steps)              (5 steps)
        │                      │                      │
      Line                 PaddleOCR              PaddleOCR
    detection             word_boxes             + Tesseract
        │                      │                 in parallel
      Word                build_grid_                 │
    detection            from_words()          _merge_paddle_
        │                      │                 tesseract()
     Overlay                Overlay                   │
        │                      │                 build_grid_
        ▼                      ▼                from_words()
     Result                 Result                    │
                                                   Overlay
                                                      ▼
                                                   Result
```

### Paddle Direct

PaddleOCR runs on the preprocessed image and detects words directly.

**Endpoint:** `POST /api/v1/ocr-pipeline/sessions/{id}/paddle-direct`

**Flow:**

1. Load the cropped/dewarped image (priority: cropped > dewarped > original)
2. Call `ocr_region_paddle(img_bgr, region=None)`
3. Call `build_grid_from_words(word_dicts, img_w, img_h)` to build the grid
4. Tag cells with `ocr_engine="paddle_direct"`
5. Save to the DB (`current_step=8`)

**Frontend:** `PaddleDirectStep.tsx`, a reusable component with configurable props.
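
The image priority in step 1 (cropped > dewarped > original) could be sketched as follows. This is a minimal illustration, not the service code: the helper `pick_source_image` and the session keys `cropped`, `dewarped`, and `original` are hypothetical, since the real session schema is not shown in this document.

```python
from typing import Any, Mapping, Tuple

# Hypothetical key names, ordered from most to least processed.
IMAGE_PRIORITY = ("cropped", "dewarped", "original")


def pick_source_image(session: Mapping[str, Any]) -> Tuple[str, Any]:
    """Return (stage, image) for the most processed image in the session."""
    for stage in IMAGE_PRIORITY:
        image = session.get(stage)
        if image is not None:
            return stage, image
    raise ValueError("session contains no usable image")
```

With this ordering, a session that was dewarped but never cropped falls back to the dewarped image, and only an untouched session uses the original.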

### Kombi Mode (PaddleOCR + Tesseract)

!!! info "Motivation"
    PaddleOCR delivers good text recognition but sometimes positions words incorrectly
    (e.g. `!Betonung` as a single word, bullet points not detected). Tesseract recognizes
    special characters better and provides finer word-level boxes. Kombi mode
    uses both engines and averages the coordinates.

**Endpoint:** `POST /api/v1/ocr-pipeline/sessions/{id}/paddle-kombi`

**Flow:**

1. Load the cropped/dewarped image
2. Call both engines **in parallel**:
    - `ocr_region_paddle(img_bgr, region=None)` → `paddle_words`
    - `pytesseract.image_to_data(pil_img, lang='eng+deu', config='--psm 6 --oem 3')` → `tess_words`
3. **Merge:** `_merge_paddle_tesseract(paddle_words, tess_words)`
4. Call `build_grid_from_words(merged_words, img_w, img_h)` to build the grid
5. Tag cells with `ocr_engine="kombi"`
6. Save to the DB
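
Step 2 could be sketched roughly as below. How the service actually runs the two engines in parallel is not shown in this document; a thread pool is one possible approach. The helpers `tess_data_to_words` and `run_engines_in_parallel` are hypothetical names, and the sketch assumes `image_to_data` is called with `output_type=pytesseract.Output.DICT`, which returns parallel lists under keys such as `text`, `conf`, `left`, `top`, `width`, and `height`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Word = Dict[str, Any]


def tess_data_to_words(data: Dict[str, list]) -> List[Word]:
    """Convert a pytesseract Output.DICT result into a list of word dicts."""
    words: List[Word] = []
    for i, raw_text in enumerate(data["text"]):
        text = str(raw_text).strip()
        conf = float(data["conf"][i])
        # Tesseract reports conf -1 for layout rows (page, block, line);
        # skip those and any empty text entries.
        if not text or conf < 0:
            continue
        words.append({
            "text": text,
            "left": int(data["left"][i]),
            "top": int(data["top"][i]),
            "width": int(data["width"][i]),
            "height": int(data["height"][i]),
            "conf": conf,
        })
    return words


def run_engines_in_parallel(
    run_paddle: Callable[[], List[Word]],
    run_tesseract: Callable[[], List[Word]],
) -> Tuple[List[Word], List[Word]]:
    """Run both OCR callables concurrently and return (paddle_words, tess_words)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        paddle_future = pool.submit(run_paddle)
        tess_future = pool.submit(run_tesseract)
        return paddle_future.result(), tess_future.result()
```

In a real handler the two callables would wrap `ocr_region_paddle(...)` and `pytesseract.image_to_data(...)`; they are passed in here so the sketch runs without either engine installed.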

#### Merge Algorithm

```mermaid
flowchart TD
    A[Paddle word] --> B{Tesseract match<br/>IoU > 0.3?}
    B -->|Yes| C[Average coordinates<br/>weighted by confidence]
    B -->|No| D[Keep Paddle word]
    E[Unmatched<br/>Tesseract words] --> F{Confidence >= 40?}
    F -->|Yes| G[Add<br/>bullet points, symbols]
    F -->|No| H[Discard]
```
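
The matching decisions in the flowchart could be sketched as below. The bodies of `_box_iou()` and `_merge_paddle_tesseract()` are not shown in this document, so this is an illustration under assumptions: words are dicts with `left`, `top`, `width`, `height`, and `conf`, matching is greedy (each Paddle word takes its best unused Tesseract box), and the names `box_iou` and `match_words` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Word = Dict[str, Any]  # assumed keys: text, left, top, width, height, conf


def box_iou(a: Word, b: Word) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a["left"] + a["width"], b["left"] + b["width"]) - max(a["left"], b["left"])
    inter_h = min(a["top"] + a["height"], b["top"] + b["height"]) - max(a["top"], b["top"])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # disjoint, edge-touching, or zero-area boxes
    inter = inter_w * inter_h
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union


def match_words(
    paddle_words: List[Word],
    tess_words: List[Word],
    iou_threshold: float = 0.3,
    min_tess_conf: float = 40.0,
) -> Tuple[List[Tuple[Word, Optional[Word]]], List[Word]]:
    """Return (pairs, extras): each Paddle word with its match or None,
    plus the unmatched Tesseract words that pass the confidence filter."""
    pairs: List[Tuple[Word, Optional[Word]]] = []
    used = set()
    for pw in paddle_words:
        best_idx, best_iou = None, iou_threshold
        for i, tw in enumerate(tess_words):
            if i in used:
                continue
            iou = box_iou(pw, tw)
            if iou > best_iou:  # strictly above the threshold
                best_idx, best_iou = i, iou
        if best_idx is not None:
            used.add(best_idx)
        pairs.append((pw, tess_words[best_idx] if best_idx is not None else None))
    extras = [
        tw for i, tw in enumerate(tess_words)
        if i not in used and tw["conf"] >= min_tess_conf
    ]
    return pairs, extras
```

One consequence of the 0.3 threshold: a small box fully contained in a box four times its area has an IoU of only 0.25, so the two words would not be matched.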

**Coordinate averaging:**

```
merged_left = (paddle_left × paddle_conf + tess_left × tess_conf) / (paddle_conf + tess_conf)
```

The same principle applies to `top`, `width`, and `height`. The text always comes from PaddleOCR (better text recognition).
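
Applied to one matched pair, the formula could look like the sketch below. Several details are assumptions rather than documented behavior: both confidences are taken to be on the same scale (e.g. 0 to 100), results are rounded to whole pixels, a pair with zero total confidence falls back to a plain mean, and the merged word keeps all other Paddle fields. The helper name `merge_pair` is hypothetical.

```python
from typing import Any, Dict

Word = Dict[str, Any]  # assumed keys: text, left, top, width, height, conf


def merge_pair(paddle: Word, tess: Word) -> Word:
    """Confidence-weighted average of two boxes; text stays from PaddleOCR."""
    p_conf = float(paddle["conf"])
    t_conf = float(tess["conf"])
    total = p_conf + t_conf
    if total <= 0:
        # Assumed fallback: with no confidence on either side, weight equally.
        p_conf = t_conf = 1.0
        total = 2.0
    merged = dict(paddle)  # keeps Paddle's text (and its other fields)
    for key in ("left", "top", "width", "height"):
        merged[key] = round((paddle[key] * p_conf + tess[key] * t_conf) / total)
    return merged
```

For example, with `paddle_left = 100` at confidence 80 and `tess_left = 110` at confidence 20, the merged value is (100 × 80 + 110 × 20) / 100 = 102, so the result stays close to the more confident engine.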

#### Files

| File | Change |
|------|--------|
| `ocr_pipeline_api.py` | `_box_iou()`, `_merge_paddle_tesseract()`, `/paddle-kombi` endpoint |
| `admin-lehrer/.../ocr-overlay/types.ts` | `KOMBI_STEPS` constant |
| `admin-lehrer/.../PaddleDirectStep.tsx` | Reusable with `endpoint`/`engineKey` props |
| `admin-lehrer/.../ocr-overlay/page.tsx` | Three-way toggle: Pipeline / Paddle Direct / Kombi |

#### Tests

```bash
cd klausur-service/backend && pytest tests/test_paddle_kombi.py -v
```

| Test class | Tests | Description |
|------------|-------|-------------|
| `TestBoxIoU` | 6 | IoU computation: identical, no overlap, partial, contained, edge, zero area |
| `TestMergePaddleTesseract` | 10 | Merge: match averaging, no match, low-confidence drop, empty, IoU threshold, text preference, zero confidence |
| `TestMergePaddleTesseractBulletPoints` | 2 | Bullet points and special characters from Tesseract |

---

## Changelog

| Date | Version | Change |
|------|---------|--------|
| 2026-03-12 | 4.5.0 | Kombi mode (PaddleOCR + Tesseract): both engines run in parallel; coordinates are matched by IoU and averaged with confidence weighting. Unmatched Tesseract words (bullets, symbols) are added. Three-way toggle in OCR Overlay. |
| 2026-03-12 | 4.4.0 | PaddleOCR remote engine (`engine=paddle`): PP-OCRv5 Latin on Hetzner x86_64. New microservice (`paddleocr-service/`), HTTP client (`paddleocr_remote.py`), frontend dropdown option. Uses the words_first grid method. |
| 2026-03-12 | 4.3.0 | Words-first grid builder (`cv_words_first.py`): a bottom-up algorithm clusters Tesseract word_boxes directly into columns/rows/cells. New `grid_method` parameter on the `/words` endpoint. Frontend toggle in StepWordRecognition. |
| 2026-03-10 | 4.2.0 | Reconstruction: overlay mode with pixel word positioning, 180° rotation, sub-session merging, usePixelWordPositions hook, box-boundary protection (box_ranges_inner) |