Commit Graph

6 Commits

Author SHA1 Message Date
Benjamin Admin
587b066a40 feat(ocr-pipeline): ground-truth comparison tool for column detection
Side-by-side view: auto result (read-only) vs. GT editor in which the teacher
draws the correct columns. Diff table shows Auto vs. GT with IoU matching.
GT data persisted per session for algorithm tuning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:48:37 +01:00
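The IoU matching mentioned in this commit pairs each auto-detected column with its best ground-truth counterpart. The commit does not show the implementation, so the following is only a minimal sketch. It assumes that columns are horizontal pixel spans `(x_start, x_end)`, uses a greedy highest-IoU-first assignment, and sets a hypothetical `min_iou` threshold of 0.5. `interval_iou` and `match_columns` are illustrative names, not the pipeline's API.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # assumed representation: (x_start, x_end) in pixels


def interval_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two horizontal column spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_columns(
    auto: List[Span], gt: List[Span], min_iou: float = 0.5  # hypothetical threshold
) -> List[Tuple[Optional[int], Optional[int], float]]:
    """Greedily pair auto and GT columns by descending IoU.

    Returns diff-table rows (auto_idx, gt_idx, iou). Unmatched columns
    appear with None on the other side and an IoU of 0.0.
    """
    candidates = sorted(
        ((interval_iou(a, g), i, j) for i, a in enumerate(auto) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_auto, used_gt = set(), set()
    rows: List[Tuple[Optional[int], Optional[int], float]] = []
    for iou, i, j in candidates:
        if iou < min_iou:
            break  # candidates are sorted, so every remaining pair is worse
        if i in used_auto or j in used_gt:
            continue  # each column may be matched at most once
        used_auto.add(i)
        used_gt.add(j)
        rows.append((i, j, iou))
    rows += [(i, None, 0.0) for i in range(len(auto)) if i not in used_auto]
    rows += [(None, j, 0.0) for j in range(len(gt)) if j not in used_gt]
    return rows
```

Greedy matching keeps the diff table easy to read. An optimal assignment, such as the Hungarian algorithm, would only matter when several columns overlap heavily.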
Benjamin Admin
bb879a03a8 feat(ocr-pipeline): add column_ignore type for margins/empty areas
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:51:56 +01:00
Benjamin Admin
7a3570fe46 feat(ocr-pipeline): manual column editor for Step 3
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:27:54 +01:00
Benjamin Admin
1393a994f9 Flexible content-based column detection (2 phases)
Replaces hard-coded position rules with a two-stage system:
Phase A detects the column geometry (clustering), Phase B classifies
column types by content (language/role) with a 3-level fallback chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:33:35 +01:00
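Phase B's 3-level fallback chain can be pictured as an ordered list of classifiers, where each one either returns a column type or defers to the next. The commit does not say what the three levels are. The stages below are therefore hypothetical: a role heuristic, a crude German/English language heuristic, and a positional last resort based on the 5-column layout named in cf27a95308. The type labels, thresholds, and function names are also illustrative assumptions.

```python
import re
from typing import Callable, List, Optional, Sequence

# Hypothetical stage signature: (words in column, column index, column count) -> type or None
Stage = Callable[[List[str], int, int], Optional[str]]

# Crude German cue (assumption): umlauts/sharp s, or a common article/conjunction.
GERMAN_HINTS = re.compile(r"[äöüß]|^(der|die|das|und|ein|eine)$", re.IGNORECASE)
# Positional order assumed from the 5-column vocabulary layout.
POSITIONAL_5 = ["page_ref", "column_en", "column_de", "markers", "examples"]


def by_role(words: List[str], idx: int, n: int) -> Optional[str]:
    """Level 1: role cue. A column made up mostly of numbers is a page reference."""
    if words and sum(w.isdigit() for w in words) / len(words) > 0.8:
        return "page_ref"
    return None


def by_language(words: List[str], idx: int, n: int) -> Optional[str]:
    """Level 2: language cue. Defers when the evidence is ambiguous."""
    if not words:
        return None
    ratio = sum(bool(GERMAN_HINTS.search(w)) for w in words) / len(words)
    if ratio > 0.3:
        return "column_de"
    if ratio < 0.05:
        return "column_en"
    return None


def by_position(words: List[str], idx: int, n: int) -> Optional[str]:
    """Level 3: last resort based on the column's position."""
    return POSITIONAL_5[idx] if n == len(POSITIONAL_5) else "column_text"


def classify_column(words: List[str], idx: int, n: int,
                    stages: Sequence[Stage] = (by_role, by_language, by_position)) -> str:
    """Return the first type any stage commits to, in fallback order."""
    for stage in stages:
        label = stage(words, idx, n)
        if label is not None:
            return label
    return "column_unknown"
```

Separating geometry (Phase A) from typing (Phase B) means a shifted or missing column changes only the positional fallback. It does not affect the content-based stages.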
Benjamin Admin
cf27a95308 feat(ocr-pipeline): word-based 5-column detection for vocabulary pages
Replace projection-profile layout analysis with Tesseract word bounding
box clustering to detect 5-column vocabulary layouts (page_ref, EN, DE,
markers, examples). Falls back to projection profiles when fewer than 3 clusters are found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:08:14 +01:00
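The word-based detection with a projection-profile fallback can be sketched as follows. This is a simplified illustration rather than the commit's actual code. It assumes that word boxes arrive as `(left, top, width, height)` tuples, which is the geometry Tesseract reports in its TSV/`image_to_data` output. It also assumes that the page is available as a binarized 0/1 grid for the fallback. Word x-extents closer than a hypothetical `gap_px` are merged into one column. The thresholds and function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), as in Tesseract word data
Span = Tuple[int, int]           # (x_start, x_end_exclusive)


def cluster_word_columns(boxes: Sequence[Box], gap_px: int = 20, min_words: int = 3) -> List[Span]:
    """Merge word x-extents separated by less than gap_px into column spans."""
    intervals = sorted((left, left + width) for left, _, width, _ in boxes)
    clusters: List[List[int]] = []  # each entry: [start, end, word_count]
    for start, end in intervals:
        if clusters and start - clusters[-1][1] < gap_px:
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2] += 1
        else:
            clusters.append([start, end, 1])
    # Drop clusters with too few words (assumed to be noise such as stray marks).
    return [(s, e) for s, e, count in clusters if count >= min_words]


def projection_columns(binary: Sequence[Sequence[int]], min_gap: int = 10) -> List[Span]:
    """Fallback: split the page at runs of >= min_gap ink-free pixel columns."""
    width = len(binary[0]) if binary else 0
    ink = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Span] = []
    start, blank = None, 0
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                spans.append((start, x - blank + 1))  # end just after the last ink column
                start, blank = None, 0
    if start is not None:
        spans.append((start, width - blank))
    return spans


def detect_columns(boxes: Sequence[Box], binary: Sequence[Sequence[int]],
                   min_clusters: int = 3) -> List[Span]:
    """Prefer word clustering; use projection profiles when it finds < min_clusters columns."""
    columns = cluster_word_columns(boxes)
    return columns if len(columns) >= min_clusters else projection_columns(binary)
```

Because neighbouring words on a line sit closer together than `gap_px`, multi-word entries merge into a single span. The gutters between vocabulary columns are wider and therefore split the spans.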
Benjamin Admin
aa06ae0f61 feat: Persistent sessions (PostgreSQL) + column detection (Step 3)
Sessions are now stored in PostgreSQL instead of in memory.
New session list with name, date, and step. Sessions survive
browser refreshes and container restarts. Step 3 uses analyze_layout()
for automatic column detection with a colored overlay.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 22:16:37 +01:00
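Storing sessions in a database is what lets them survive a browser refresh or a container restart. The commit does not include the real table layout. The following is a minimal sketch of a session table that holds the name, creation date, current step, and detected columns. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, so it runs without a server. The table name, column names, and helper functions are hypothetical. PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` upsert, although a driver such as psycopg uses `%s` placeholders instead of `?`.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the pipeline's actual PostgreSQL tables are not shown in the commit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ocr_sessions (
    id           TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    created_at   TEXT NOT NULL,
    current_step INTEGER NOT NULL DEFAULT 1,
    columns_json TEXT
)
"""


def save_session(conn, session_id, name, step, columns=None):
    """Insert a session, or update its name/step/columns if it already exists."""
    conn.execute(
        "INSERT INTO ocr_sessions (id, name, created_at, current_step, columns_json) "
        "VALUES (?, ?, ?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
        "current_step = excluded.current_step, columns_json = excluded.columns_json",
        (session_id, name, datetime.now(timezone.utc).isoformat(), step,
         json.dumps(columns) if columns is not None else None),
    )
    conn.commit()


def list_sessions(conn):
    """Rows for the session list: id, name, creation date, current step (newest first)."""
    return conn.execute(
        "SELECT id, name, created_at, current_step FROM ocr_sessions ORDER BY created_at DESC"
    ).fetchall()


def load_columns(conn, session_id):
    """Return the stored column spans for a session, or None if there are none."""
    row = conn.execute(
        "SELECT columns_json FROM ocr_sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row and row[0] is not None else None
```

On conflict, the upsert leaves `created_at` untouched, so the date shown in the session list stays the creation date even after later steps update the row.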