Commit Graph

75 Commits

Author SHA1 Message Date
Benjamin Admin
a6069631cc feat: PaddleOCR Remote-Engine (PP-OCRv5 Latin auf Hetzner x86_64)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m7s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 21s
PaddleOCR als neue engine=paddle Option in der OCR-Pipeline.
Microservice auf Hetzner (paddleocr-service/), async HTTP-Client
(paddleocr_remote.py), Frontend-Dropdown, automatisch words_first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:31:22 +01:00
Benjamin Admin
ced5bb3dd3 feat: Words-First Grid Builder (bottom-up alternative zu cell_grid_v2)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 54s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 23s
CI / test-nodejs-website (push) Successful in 32s
Neuer Algorithmus in cv_words_first.py: Clustert Tesseract word_boxes
direkt zu Spalten (X-Gap) und Zeilen (Y-Proximity), baut Zellen an
Schnittpunkten. Kein Spalten-/Zeilenerkennung noetig.

- cv_words_first.py: _cluster_columns, _cluster_rows, _build_cells, build_grid_from_words
- ocr_pipeline_api.py: grid_method Parameter (v2|words_first) im /words Endpoint
- StepWordRecognition.tsx: Dropdown Toggle fuer Grid-Methode
- OCR-Pipeline.md: Doku v4.3.0 mit Words-First Algorithmus
- 15 Unit-Tests fuer cv_words_first

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 06:46:05 +01:00
Benjamin Admin
2f51ac617f feat: IPA-Lautschrift in Cell-Texte einfuegen (fuer Overlay-Modus)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m5s
CI / test-python-agent-core (push) Successful in 23s
CI / test-nodejs-website (push) Successful in 22s
fix_cell_phonetics() ersetzt fehlerhafte IPA-Klammern UND fuegt fehlende
Lautschrift fuer englische Woerter ein (z.B. badge, film, challenge, profit).
Wird auf alle Zellen mit col_type column_en/column_text angewandt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 15:47:26 +01:00
Benjamin Admin
41e47baf13 fix: skip_heal_gaps Parameter an Stream-Generator durchreichen
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m6s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 28s
NameError behoben: skip_heal_gaps war nicht im Scope der
_word_batch_stream_generator Funktion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 09:11:16 +01:00
Benjamin Admin
8a60f4bf30 fix: Overlay-Zellen ohne _heal_row_gaps positionieren (skip_heal_gaps)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 21s
_heal_row_gaps verschiebt Zell-Positionen nach Entfernung von Artefakt-Zeilen,
was im Overlay zu sichtbarem Versatz fuehrt (z.B. 23px bei "badge").
Neuer skip_heal_gaps Parameter in build_cell_grid_v2 und words-Endpoint
behaelt die exakten Zeilen-Positionen bei.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 08:59:50 +01:00
Benjamin Admin
e3ee1de790 Revert "fix: Zeilen-Regularisierung im Overlay ueberspringen (generisch fuer gemischte Inhalte)"
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m2s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 24s
This reverts commit b91f799ccf.
2026-03-11 08:44:07 +01:00
Benjamin Admin
b91f799ccf fix: Zeilen-Regularisierung im Overlay ueberspringen (generisch fuer gemischte Inhalte)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 49s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 26s
Seiten mit Info-Boxen (andere Zeilenhoehe) fuehren dazu, dass _regularize_row_grid
die Zeilenpositionen verzerrt. Neuer skip_regularize Parameter nutzt stattdessen
die gap-basierten Zeilen, die der tatsaechlichen Seitengeometrie folgen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 08:29:06 +01:00
Benjamin Admin
e2ad93fd57 fix: Word-Erkennung ohne Spalten ermoeglichen (Full-Page Pseudo-Column)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m14s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 22s
Wenn column_result fehlt (z.B. OCR Overlay Pipeline), wird automatisch
eine einzelne ganzseitige Pseudo-Spalte erzeugt statt einen Fehler zu werfen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 00:16:31 +01:00
Benjamin Admin
618c82ef42 fix: Zeilen an Box-Grenze nicht mehr abschneiden (border_thickness Margin)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 25s
- detect_rows: Content-Strips nutzen jetzt box_ranges_inner (geschrumpft
  um border_thickness, min 5px) statt der vollen Box-Range
- detect_words: _row_in_box Filter nutzt ebenfalls inner Range
- Dadurch wird die letzte Zeile oberhalb einer Box nicht mehr
  faelschlicherweise der Box zugeordnet und ausgeschlossen

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 17:44:02 +01:00
Benjamin Admin
6bb023bdc1 fix: vocab_entries fuer column_text Sub-Sessions generieren
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m8s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 23s
_cells_to_vocab_entries wurde nur bei is_vocab (column_en/column_de)
aufgerufen. Fuer Sub-Sessions mit column_text wurden keine Eintraege
erzeugt, daher blieb die Korrektur-Tabelle leer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:28:27 +01:00
Benjamin Admin
3a791179af debug: Logging fuer Sub-Session Woertererkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
Zeigt low-confidence Woerter (conf<30) und Zellinhalte pro Zeile,
um fehlende Euro/Pfund-Betraege zu diagnostizieren.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:31:34 +01:00
Benjamin Admin
f65bd11919 fix: Sub-Session Zeilenerkennung nutzt Word-Grouping statt Gap-Detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 23s
Gap-basierte Erkennung findet bei kleinen Box-Bildern zu wenige Gaps
und mergt Zeilen (7 raw gaps -> 4 validated -> nur 3 rows statt 6).
Sub-Sessions nutzen jetzt direkt _build_rows_from_word_grouping(),
das Woerter nach Y-Position clustert — robuster fuer komplexe Box-Layouts.

Zusaetzlich: alle zones=None Crashes gefixt (replace_all .get("zones") or []).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:05:24 +01:00
Benjamin Admin
785b4d7655 fix: zones=None crash bei Sub-Session Zeilenerkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 20s
column_result.get("zones", []) gibt None zurueck wenn der Key mit
Wert None existiert. Geaendert zu .get("zones") or [].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 08:50:58 +01:00
Benjamin Admin
2716495250 fix: Sub-Session Zeilenerkennung — Tesseract+inv im Spalten-Schritt cachen
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m9s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 20s
Bisher wurden _word_dicts, _inv und _content_bounds fuer Sub-Sessions
nicht gecacht, sodass detect_rows auf detect_column_geometry() zurueckfiel.
Das konnte bei kleinen Box-Bildern mit <5 Woertern fehlschlagen.

Jetzt laeuft Tesseract + Binarisierung direkt im Pseudo-Spalten-Block,
und die Intermediates werden gecacht. Zusaetzlich ausfuehrliche Kommentare
zur Zeilenerkennung (detect_row_geometry, _regularize_row_grid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 08:43:26 +01:00
Benjamin Admin
23b7840ea7 feat: Full-Row OCR mit Spacing fuer Box-Sub-Sessions
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m16s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 22s
Sub-Sessions ueberspringen Spaltenerkennung und nutzen stattdessen eine
Pseudo-Spalte ueber die volle Breite. Text wird mit proportionalem
Spacing aus Wort-Positionen rekonstruiert, um raeumliches Layout zu erhalten.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 08:28:29 +01:00
Benjamin Admin
34adb437d0 fix: Bild-Endpoints fallen auf original zurueck fuer Sub-Sessions
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m3s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 20s
Alle Bild-Endpoints (cropped, columns-overlay, rows-overlay,
words-overlay) suchten nur nach cropped/dewarped. Sub-Sessions haben
nur ein original-Bild. Neue Hilfsfunktion _get_base_image_png() mit
Fallback-Kette: cropped > dewarped > original.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:30:38 +01:00
Benjamin Admin
ceaef9c6a6 fix: Sub-Sessions original_bgr als cropped_bgr promoten
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m22s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 18s
Spalten-/Zeilen-/Woerter-Erkennung suchen nach cropped_bgr oder
dewarped_bgr. Bei Sub-Sessions existiert nur original_bgr (der
Box-Ausschnitt). Jetzt wird original_bgr automatisch als cropped_bgr
gesetzt, sowohl im Cache-Aufbau als auch bei der Erstellung.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:57:39 +01:00
Benjamin Admin
256efef3ea feat: Box-Zonen durch gesamte Pipeline + Sub-Sessions fuer Box-Inhalt
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
- Rote semi-transparente Box-Markierung in allen Overlays (Spalten, Zeilen, Woerter)
- Zeilenerkennung: Combined-Image-Ansatz schliesst Box-Bereiche aus
- Woerter-Erkennung: Zeilen innerhalb von Box-Zonen werden gefiltert
- Sub-Sessions: parent_session_id/box_index in DB-Schema
- POST /sessions/{id}/create-box-sessions erstellt Sub-Sessions aus Box-Regionen
- Session-Info zeigt Sub-Sessions bzw. Parent-Verknuepfung
- Sessions-Liste blendet Sub-Sessions per Default aus
- Rekonstruktion: Fabric-JSON merged Sub-Session-Zellen an Box-Positionen
- Save-Reconstruction routet box{N}_* Updates an Sub-Sessions
- GET /sessions/{id}/vocab-entries/merged fuer zusammengefuehrte Eintraege

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:24:34 +01:00
Benjamin Admin
7005b18561 feat: generische Box-Erkennung fuer zonenbasierte Spaltenerkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 19s
- Neue Datei cv_box_detect.py: 2-Stufen-Algorithmus (Linien + Farbe)
- DetectedBox/PageZone Dataclasses in cv_vocab_types.py
- detect_column_geometry_zoned() in cv_layout.py
- API-Endpoints erweitert: zones/boxes_detected im column_result
- Overlay-Funktionen zeichnen Box-Grenzen als gestrichelte Rechtecke
- Fix: numpy array or-Verknuepfung an 7 Stellen in ocr_pipeline_api.py
- 12 Unit-Tests fuer Box-Erkennung und Zone-Splitting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:06:23 +01:00
Benjamin Admin
e60254bc75 fix: alle Post-Crop-Schritte nutzen cropped statt dewarped Bild
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 24s
Spalten-, Zeilen-, Woerter-Overlay und alle nachfolgenden Steps
(LLM-Review, Rekonstruktion) lesen jetzt image/cropped mit Fallback
auf image/dewarped. Tests fuer page_crop.py hinzugefuegt (25 Tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 09:10:10 +01:00
Benjamin Admin
156a818246 refactor: Crop nach Deskew/Dewarp verschieben + content-basierter Buchscan-Crop
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m56s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 17s
Pipeline-Reihenfolge neu: Orientierung → Begradigung → Entzerrung → Zuschneiden → Spalten...
Crop arbeitet jetzt auf dem bereits geraden Bild, was bessere Ergebnisse liefert.

page_crop.py komplett ersetzt: Adaptive Threshold + 4-Kanten-Erkennung
(Buchruecken-Schatten links, Ink-Projektion fuer alle Raender) statt
Otsu + groesste Kontur.

Backend: Step-Nummern, Input-Bilder, Reprocess-Kaskade angepasst.
Frontend: PIPELINE_STEPS umgeordnet, Switch-Cases, Vorher-Bilder aktualisiert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:52:11 +01:00
Benjamin Admin
eb45bb4879 fix: numpy array or-Verknuepfung in Crop/Deskew + ImageCompareView Labels
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 37s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m17s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 24s
- orientation_crop_api.py: `array or array` durch `is not None` ersetzt
  (ValueError bei numpy Arrays)
- ocr_pipeline_api.py: gleicher Fix fuer Deskew-Fallback-Kette
- ImageCompareView.tsx: Fallback-Text nutzt rightLabel statt "Begradigung"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:02:44 +01:00
Benjamin Admin
2763631711 feat: Orientierung + Zuschneiden als Schritte 1-2 in OCR-Pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 18s
Zwei neue Wizard-Schritte vor Begradigung:
- Step 1: Orientierungserkennung (0/90/180/270° via Tesseract OSD)
- Step 2: Seitenrand-Erkennung und Zuschnitt (Scannerraender entfernen)

Backend:
- orientation_crop_api.py: POST /orientation, POST /crop, POST /crop/skip
- page_crop.py: detect_and_crop_page() mit Format-Erkennung (A4/A5/Letter)
- Session-Store: orientation_result, crop_result Felder
- Pipeline nutzt zugeschnittenes Bild fuer Deskew/Dewarp

Frontend:
- StepOrientation.tsx: Upload + Auto-Orientierung + Vorher/Nachher
- StepCrop.tsx: Auto-Crop + Format-Badge + Ueberspringen-Option
- Pipeline-Stepper: 10 Schritte (war 8)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:55:23 +01:00
Benjamin Admin
931ab92c92 feat: Orientierungserkennung in OCR-Pipeline-Deskew integrieren
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 21s
detect_and_fix_orientation() wird jetzt vor dem Deskew-Schritt in der
OCR-Pipeline ausgefuehrt, sodass 90/180/270°-gedrehte Scans automatisch
korrigiert werden. Frontend zeigt Orientierungskorrektur als Info-Banner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 22:31:36 +01:00
Benjamin Admin
2ad391e4e4 feat: Feinabstimmung mit 7 Schiebereglern fuer Deskew/Dewarp
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
Neues aufklappbares Panel unter Entzerrung mit individuellen Reglern:
- 3 Rotations-Regler (P1 Iterative, P2 Word-Alignment, P3 Textline)
- 4 Scherungs-Regler (A-D Methoden) mit Radio-Auswahl
- Kombinierte Vorschau und Ground-Truth-Speicherung
- Backend: POST /sessions/{id}/adjust-combined Endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:22:33 +01:00
Benjamin Admin
d39d249daa feat: add pass 3 text-line regression to deskew pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m53s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
After iterative projection (pass 1) and word-alignment (pass 2), a third
pass uses Tesseract word positions + linear regression per text line to
measure and correct residual rotation. This catches cases where passes 1-2
leave significant slope (e.g. 1.7° residual on heavily skewed scans).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 17:53:11 +01:00
Benjamin Admin
538d5c732e feat: two-pass deskew with wider angle range and residual correction
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m52s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
- Increase iterative deskew coarse_range from ±2° to ±5° to handle
  heavily skewed scans
- New deskew_two_pass(): runs iterative projection first, then
  word-alignment on the corrected image to detect/fix residual skew
  (applied when residual ≥ 0.3°)
- OCR pipeline API auto_deskew now uses deskew_two_pass by default
- Vocab worksheet _run_ocr_pipeline_for_page uses deskew_two_pass
- Deskew result now includes angle_residual and two_pass_debug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 17:34:57 +01:00
Benjamin Admin
af1b12c97d feat: iterative projection-profile deskew (2-phase variance optimization)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m53s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 17s
Adds deskew_image_iterative() as 3rd deskew method that directly optimizes
for projection-profile sharpness instead of proxy signals (Hough/word alignment).
Coarse sweep on horizontal profile, fine sweep on vertical profile.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 13:46:44 +01:00
Benjamin Admin
1cc69d6b5e feat: OCR pipeline step 8 — validation view with image detection & generation
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 19s
Replaces the stub StepGroundTruth with a full side-by-side Original vs
Reconstruction view. Adds VLM-based image region detection (qwen2.5vl),
mflux image generation proxy, sync scroll/zoom, manual region drawing,
and score/notes persistence.

New backend endpoints: detect-images, generate-image, validate, get validation.
New standalone mflux-service (scripts/mflux-service.py) for Metal GPU generation.
Dockerfile.base: adds fonts-liberation (Apache-2.0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:40:37 +01:00
Benjamin Admin
293e7914d8 feat: improved OCR pipeline session manager with categories, thumbnails, pipeline logging
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m48s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 20s
- Add document_category (10 types) and pipeline_log JSONB columns
- Session list: thumbnails, copyable IDs, category/doc_type badges
- Inline category dropdown, bulk delete, pipeline step logging
- New endpoints: thumbnail, delete-all, pipeline-log, categories
- Cleared all 22 old test sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 09:44:38 +01:00
Benjamin Admin
a58dfca1d8 fix: move char-confusion fix to correction step, add spell + page-ref corrections
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m55s
CI / test-python-agent-core (push) Successful in 30s
CI / test-nodejs-website (push) Successful in 20s
CI / nodejs-lint (push) Failing after 10m5s
- Remove _fix_character_confusion() from words endpoint (now only in Phase 0)
- Extend spell checker to find real OCR errors via spell.correction()
- Add field-aware dictionary selection (EN/DE) for spell corrections
- Add _normalize_page_ref() for page_ref column (p-60 → p.60)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 00:26:13 +01:00
Benjamin Admin
34c649c8be fix: send SSE keepalive events every 5s during batch OCR
Batch OCR takes 30-60s with 3x upscaling. Without keepalive events,
proxy servers (Nginx) drop the SSE connection after their read timeout.
Now sends keepalive events every 5s to prevent timeout, with elapsed
time for debugging. Also checks for client disconnect between keepalives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 22:21:14 +01:00
Benjamin Admin
dd16c88007 fix: retry words request on 400/404 + add backend diagnostic logging
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m55s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 18s
Frontend: retry /words POST once after 2s delay if it gets 400/404,
which happens when navigating via wizard after container restart
(session cache not yet warm).

Backend: log when session needs DB reload and when dewarped_bgr is missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:15:54 +01:00
Benjamin Admin
68d230c297 fix: use batch-then-stream SSE for cell-first OCR
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m49s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 17s
The old per-cell streaming timed out because sequential cell OCR was
too slow to send the first event before proxy timeout. Now uses
build_cell_grid_v2 (parallel ThreadPoolExecutor) via run_in_executor,
then streams all cells at once after batch completes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 14:51:55 +01:00
Benjamin Admin
29c74a9962 feat: cell-first OCR + document type detection + dynamic pipeline steps
Cell-First OCR (v2): Each cell is cropped and OCR'd in isolation,
eliminating neighbour bleeding (e.g. "to", "ps" in marker columns).
Uses ThreadPoolExecutor for parallel Tesseract calls.

Document type detection: Classifies pages as vocab_table, full_text,
or generic_table using projection profiles (<2s, no OCR needed).
Frontend dynamically skips columns/rows steps for full-text pages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:52:38 +01:00
Benjamin Admin
9dd77ab54a fix: move column expansion AFTER sub-column split
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 18s
The narrow column expansion was running inside detect_column_geometry()
on the 4 main columns, but the narrowest columns (marker ~14px,
page_ref ~93px) are created AFTERWARDS by _detect_sub_columns().

Extracted expand_narrow_columns() as standalone function and call it
after sub-column splitting in the columns API endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 10:07:40 +01:00
Benjamin Admin
ab3ecc7c08 feat: OCR pipeline v2.1 – narrow column OCR, dewarp automation, Fabric.js editor
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 15s
Proposal B: Adaptive padding, crop upscaling, PSM selection, row-strip re-OCR
for narrow columns (<15% width) – expected accuracy boost 60-70% → 85-90%.

Proposal A: New text-line straightness detector (Method D), quality gate
(rejects counterproductive corrections), 2-pass projection refinement,
higher confidence thresholds – expected manual dewarp reduction to <10%.

Proposal C: Fabric.js canvas editor with drag/drop, inline editing, undo/redo,
opacity slider, zoom, PDF/DOCX export endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 22:44:14 +01:00
Benjamin Admin
7eb03ca8d1 fix(ocr-pipeline): IndentationError in auto-mode deskew block
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m49s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 17s
The try/except block for the deskew step had 4 extra spaces of
indentation from a previous edit. Python rejected the file with
IndentationError at startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 13:21:49 +01:00
Benjamin Admin
50e1c964ee feat(klausur-service): OCR-Pipeline Optimierungen (Improvements 2-4)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m46s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 16s
## Improvement 2: VLM-basierter Dewarp
- Neuer Query-Parameter `method` für POST /sessions/{id}/dewarp
  Optionen: ensemble (default) | vlm | cv
- `_detect_shear_with_vlm()`: fragt qwen2.5vl:32b per Ollama nach
  dem Scherwinkel — gibt Zahlenwert + Konfidenz zurück
- `os`, `Query` zu ocr_pipeline_api.py Imports hinzugefügt
- `_apply_shear` aus cv_vocab_pipeline importiert

## Improvement 4: 3-Methoden Ensemble-Dewarp
- `_detect_shear_by_projection()`: Varianz-Sweep ±3° / 0.25°-Schritte
  auf horizontalen Text-Zeilen-Projektionen (~30ms)
- `_detect_shear_by_hough()`: Gewichteter Median über HoughLinesP
  auf Tabellen-Linien, Vorzeichen-Inversion (~20ms)
- `_ensemble_shear()`: Kombiniert alle 3 Methoden (conf >= 0.3),
  Ausreißer-Filter bei >1° Abweichung, Bonus bei Agreement <0.5°
- `dewarp_image()` nutzt jetzt alle 3 Methoden parallel,
  `use_ensemble: bool = True` für Rückwärtskompatibilität
- auto_dewarp Response enthält jetzt `detections`-Array

## Improvement 3: Vollautomatik-Endpoint
- POST /sessions/{id}/run-auto mit RunAutoRequest:
  from_step (1-6), ocr_engine, pronunciation,
  skip_llm_review, dewarp_method
- SSE-Streaming für alle 5+1 Schritte (deskew→dewarp→columns→rows→words→llm-review)
- Jeder Schritt: start / done / skipped / error Events
- Abschluss-Event: {steps_run, steps_skipped}
- LLM-Review-Fehler sind nicht-fatal (Pipeline läuft weiter)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 13:13:20 +01:00
Benjamin Admin
2e0f8632f8 feat(klausur): Handschrift entfernen + Klausur-HTR implementiert
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m49s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 15s
Feature 1: Handschrift entfernen via OCR-Pipeline Session
- services/handwriting_detection.py: _detect_pencil() + target_ink Parameter
  ("all" | "colored" | "pencil") für gezielte Tinten-Erkennung
- ocr_pipeline_session_store.py: clean_png + handwriting_removal_meta Spalten
  (idempotentes ALTER TABLE in init_ocr_pipeline_tables)
- ocr_pipeline_api.py: POST /sessions/{id}/remove-handwriting Endpoint
  + "clean" zu valid_types für Image-Serving hinzugefügt

Feature 2: Klausur-HTR (Hochwertige Handschriftenerkennung)
- handwriting_htr_api.py: Neuer Router /api/v1/htr/recognize + /recognize-session
  Primary: qwen2.5vl:32b via Ollama, Fallback: trocr-large-handwritten
- services/trocr_service.py: size Parameter (base | large) für get_trocr_model()
  + run_trocr_ocr() - unterstützt jetzt trocr-large-handwritten
- main.py: HTR Router registriert

Config:
- docker-compose.yml: OLLAMA_HTR_MODEL, HTR_FALLBACK_MODEL
- .env.example: HTR Env-Vars dokumentiert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 12:04:26 +01:00
Benjamin Admin
606bef0591 fix(ocr-pipeline): overlap-based word assignment and empty row filtering
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 1m14s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m55s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 17s
1. Word-to-column assignment now uses overlap-based matching instead of
   center-point matching. This fixes narrow page_ref columns losing
   their last digit (e.g. "p.59" → "p.5") when the digit's center
   falls slightly past the midpoint boundary into the next column.

2. Post-OCR empty row filter: rows where ALL cells have empty text
   are removed after OCR. This catches inter-row gaps that had stray
   Tesseract artifacts giving word_count > 0 but no actual content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 11:00:29 +01:00
Benjamin Admin
d6a8c1d821 fix(streaming): include page_ref columns in SSE metadata
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m52s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
The streaming word endpoint excluded page_ref from _skip_types,
causing sub-column splits to be lost in the meta event and final
grid_shape. Aligned _skip_types with build_cell_grid_streaming().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 07:48:07 +01:00
Benjamin Admin
6527beae03 fix(sub-columns): exclude header/footer words from alignment clustering
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 24s
CI / test-python-klausur (push) Failing after 1m51s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 18s
Header/footer words (page numbers, chapter titles) could pollute the
left-edge alignment bins and trigger false sub-column splits. Now
_detect_header_footer_gaps() runs early and its boundaries are passed
to _detect_sub_columns() to filter those words from clustering and
the split threshold check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 07:33:54 +01:00
Benjamin Admin
3904ddb493 fix(sub-columns): convert relative word positions to absolute coords for split
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m51s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 17s
Word 'left' values in ColumnGeometry.words are relative to the content
ROI (left_x), but geo.x is in absolute image coordinates. The split
position was computed from relative word positions and then compared
against absolute geo.x, resulting in negative widths and no splits on
real data. Pass left_x through to _detect_sub_columns to bridge the
two coordinate systems.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 19:16:13 +01:00
Benjamin Admin
1a246eb059 feat(ocr-pipeline): generic sub-column detection via left-edge clustering
Detects hidden sub-columns (e.g. page references like "p.59") within
already-recognized columns by clustering word left-edge positions and
splitting when a clear minority cluster exists. The sub-column is then
classified as page_ref and mapped to VocabRow.source_page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 18:18:02 +01:00
Benjamin Admin
c8981423d4 feat(ocr-pipeline): distinguish header/footer vs margin_top/margin_bottom
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
Check for actual ink content in detected top/bottom regions:
- 'header'/'footer' when text is present (e.g. title, page number)
- 'margin_top'/'margin_bottom' when the region is empty page margin

Also update all skip-type sets and color maps for the new types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 16:55:41 +01:00
Benjamin Admin
f615c5f66d feat(ocr-pipeline): generic header/footer detection via projection gap analysis
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m48s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 16s
Replace the trivial top_y/bottom_y threshold check with horizontal
projection gap analysis that finds large whitespace gaps separating
header/footer content from the main body. This correctly detects
headers (e.g. "VOCABULARY" banners) and footers (page numbers) even
when _find_content_bounds includes them in the content area.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 16:13:48 +01:00
Benjamin Admin
a052f73de3 fix(ocr-pipeline): pass left_x/right_x to classify_column_types in API path
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m45s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 18s
The ocr_pipeline_api.py code path called classify_column_types without
left_x/right_x, so margin regions were never created. Also add logging
to _build_margin_regions for debugging.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 15:42:39 +01:00
Benjamin Admin
e718353d9f feat(ocr-pipeline): 6 systematic improvements for robustness, performance & UX
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 37s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 21s
1. Unit tests: 76 new parametrized tests for noise filter, phonetic detection,
   cell text cleaning, and row merging (116 total, all green)
2. Continuation-row merge: detect multi-line vocab entries where text wraps
   (lowercase EN + empty DE) and merge into previous entry
3. Empty DE fallback: secondary PSM=7 OCR pass for cells missed by PSM=6
4. Batch-OCR: collect empty cells per column, run single Tesseract call on
   column strip instead of per-cell (~66% fewer calls for 3+ empty cells)
5. StepReconstruction UI: font scaling via naturalHeight, empty EN/DE field
   highlighting, undo/redo (Ctrl+Z), per-cell reset button
6. Session reprocess: POST /sessions/{id}/reprocess endpoint to re-run from
   any step, with reprocess button on completed pipeline steps

Also fixes pre-existing dewarp_image tuple unpacking bug in run_cv_pipeline
and updates dewarp tests to match current (image, info) return signature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 14:46:38 +01:00
Benjamin Admin
dbf0db0c13 feat(ocr-pipeline): improve LLM review UI + add reconstruction step
StepLlmReview: Show full vocab table with image overlay, row-level
status tracking (pending/active/reviewed/corrected/skipped), and
auto-scroll during SSE streaming. Load previous results on mount.

StepReconstruction: New step 7 with editable text fields at original
bbox positions over dewarped image. Zoom controls, tab navigation,
color-coded columns, save to backend.

Backend: Add POST /sessions/{id}/reconstruction endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 12:19:21 +01:00