Phase 1 of the clean architecture refactor: Replaces the 751-line ocr-overlay
monolith with a modular pipeline. Each step gets its own component file.
Frontend: /ai/ocr-kombi route with 11 steps (Upload, Orientation, PageSplit,
Deskew, Dewarp, ContentCrop, OCR, Structure, GridBuild, GridReview, GroundTruth).
Session list supports document grouping for multi-page uploads.
Backend: New ocr_kombi/ module with multi-page PDF upload (splits PDF into N
sessions with shared document_group_id). DB migration adds document_group_id
and page_number columns.
Old /ai/ocr-overlay remains fully functional for A/B testing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: new grid_editor_api.py with build-grid endpoint that detects
bordered boxes, splits page into zones, clusters columns/rows per zone
from Kombi word positions. New DB column grid_editor_result JSONB.
Frontend: GridEditor component with editable HTML tables per zone,
column bold toggle, header row toggle, undo/redo, keyboard navigation
(Tab/Enter/Arrow), image overlay verification, and save/load.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sessions werden jetzt in PostgreSQL gespeichert statt in-memory.
Neue Session-Liste mit Name, Datum, Schritt. Sessions ueberleben
Browser-Refresh und Container-Neustart. Step 3 nutzt analyze_layout()
fuer automatische Spaltenerkennung mit farbigem Overlay.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Neue Route /ai/ocr-pipeline mit schrittweiser Begradigung (Deskew),
Raster-Overlay und Ground Truth. Schritte 2-6 als Platzhalter.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>