Commit Graph

464 Commits

Author SHA1 Message Date
Benjamin Admin
b91f799ccf fix: Zeilen-Regularisierung im Overlay ueberspringen (generisch fuer gemischte Inhalte)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 49s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 26s
Seiten mit Info-Boxen (andere Zeilenhoehe) fuehren dazu, dass _regularize_row_grid
die Zeilenpositionen verzerrt. Neuer skip_regularize Parameter nutzt stattdessen
die gap-basierten Zeilen, die der tatsaechlichen Seitengeometrie folgen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 08:29:06 +01:00
Benjamin Admin
2df2a01a8b feat: Echtes Overlay — Text direkt ueber dem Originalbild
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m11s
CI / test-python-agent-core (push) Successful in 25s
CI / test-nodejs-website (push) Successful in 26s
Statt Side-by-Side wird der erkannte Text jetzt direkt ueber das
Originalbild gelegt. Textfarbe (rot/blau/schwarz) und Deckkraft
per Slider einstellbar fuer einfache visuelle Fehlersuche.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 00:25:11 +01:00
Benjamin Admin
e2ad93fd57 fix: Word-Erkennung ohne Spalten ermoeglichen (Full-Page Pseudo-Column)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m14s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 22s
Wenn column_result fehlt (z.B. OCR Overlay Pipeline), wird automatisch
eine einzelne ganzseitige Pseudo-Spalte erzeugt statt einen Fehler zu werfen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 00:16:31 +01:00
Benjamin Admin
2cbdfc56f3 feat: OCR Overlay — ganzseitige Rekonstruktion ohne Spaltenerkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 33s
CI / test-python-klausur (push) Failing after 2m6s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 28s
Neue Route /ai/ocr-overlay mit vereinfachter 7-Schritt-Pipeline
(Orientierung, Begradigung, Entzerrung, Zuschnitt, Zeilen, Woerter, Overlay).
Nutzt bestehende Step-Komponenten, ueberspringt Spalten/LLM-Review/Ground-Truth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 00:08:05 +01:00
Benjamin Admin
840918df2a fix: Originalbild im Overlay nicht extra drehen (Orientierung bereits im Cropped-Bild)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 33s
CI / test-python-klausur (push) Failing after 2m15s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 22s
Das cropped image ist bereits orientierungskorrigiert. Die zusaetzliche
180°-Rotation ueber imageRotation drehte das Bild falsch herum.
imageRotation wird weiter fuer Pixel-Matching genutzt, aber nicht mehr
fuer die Bildanzeige.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 23:25:20 +01:00
Benjamin Admin
eb3fc05cdc fix: Box-Zone Clamping nach Box-Mitte statt Cell-Center entscheiden
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 34s
CI / test-python-klausur (push) Failing after 2m8s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 21s
Euro/Badge-Zeilen hatten ihren Center innerhalb der Box-Zone, weshalb
das Clamping nicht griff. Jetzt wird anhand der Box-Mitte entschieden
ob eine Zelle nach oben (clamp height) oder unten (push y) gehoert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 23:10:51 +01:00
Benjamin Admin
9dbb5fa708 fix: useMemo vor Early Returns verschieben (Rules of Hooks)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m10s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 25s
boxZonesPct useMemo war nach bedingten Returns platziert, was gegen
Reacts Rules of Hooks verstoesst und einen Client-Side Crash ausloest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 22:57:25 +01:00
Benjamin Admin
f468c30112 fix: Zellen an Box-Zone clampen im Overlay-Modus (keine Ueberlappung)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m15s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 23s
Zellen oberhalb der Box werden in der Hoehe begrenzt, Zellen unterhalb
werden nach unten verschoben. Sub-Session-Zellen bleiben unveraendert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 22:52:08 +01:00
Benjamin Admin
618c82ef42 fix: Zeilen an Box-Grenze nicht mehr abschneiden (border_thickness Margin)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 25s
- detect_rows: Content-Strips nutzen jetzt box_ranges_inner (geschrumpft
  um border_thickness, min 5px) statt der vollen Box-Range
- detect_words: _row_in_box Filter nutzt ebenfalls inner Range
- Dadurch wird die letzte Zeile oberhalb einer Box nicht mehr
  faelschlicherweise der Box zugeordnet und ausgeschlossen

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 17:44:02 +01:00
Benjamin Admin
080fcb5e3c feat: 180°-Rotation fuer Pixel-Matching im Overlay-Modus
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 35s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m15s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 23s
- usePixelWordPositions: neuer rotation-Parameter (0 | 180)
- Bei 180°: Bild auf Canvas rotiert, Zell-Koordinaten transformiert,
  Cluster-Positionen zurueck-gespiegelt
- StepReconstruction: 180°-Toggle-Button in Overlay-Toolbar
- Default 180° bei Parent-Sessions mit Boxen
- Linkes Originalbild wird ebenfalls CSS-rotiert

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 17:19:14 +01:00
Benjamin Admin
bcd97e7d78 feat: Overlay-Modus fuer ganzseitige Tabellenrekonstruktion mit Pixel-Positionierung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m9s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 24s
- usePixelWordPositions Hook extrahiert (shared zwischen StepLlmReview und StepReconstruction)
- StepReconstruction: neuer Overlay-Modus mit 50/50 Layout (Original + Rekonstruktion)
- Sub-Session-Zellen werden in Parent-Koordinaten konvertiert und zusammengefuehrt
- Spalten-/Zeilenlinien und Box-Zone-Markierung aus column_result/row_result
- Schriftgroesse-Slider und Bold-Toggle fuer Overlay
- StepLlmReview: ~140 Zeilen Pixel-Analyse durch Hook ersetzt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:18:47 +01:00
Benjamin Admin
7f8615b8c1 fix: Schriftgroesse auf haeufigsten Wert (Mode) normalisieren
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m26s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 23s
Alle Wortgruppen bekommen die gleiche fontRatio (gerundet auf 0.02),
basierend auf der haeufigsten berechneten Groesse. Ueberschriften
und Fliesstext haben damit einheitliche Schriftgroesse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 14:28:23 +01:00
Benjamin Admin
2055597ba4 fix: Pixel-Overlay fuer alle Zellen + Auto-Schriftgroesse + kein contentEditable
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 27s
- Auch Single-Group-Zellen (z.B. Ueberschriften) per Pixel positionieren
- Auto font-size per canvas measureText (Text fuellt Cluster-Breite aus)
- contentEditable entfernt (pointer-events-none), Tabelle zum Editieren
- overflow:visible statt hidden verhindert Klick-Shift-Bug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 13:25:16 +01:00
Benjamin Admin
ad28f9420a feat: Pixel-basierte Wortpositionierung im Overlay
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m6s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 20s
Analysiert Schwarzpixel-Verteilung auf dem Originalbild per Canvas.
Findet Wort-Cluster pro Zeile und positioniert erkannte Textgruppen
an den exakten Pixel-Positionen. Monospace-Font zurueck auf Sans-Serif.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 12:36:57 +01:00
Benjamin Admin
6314e60464 fix: Monospace-Schrift im Overlay fuer korrekte Leerzeichen-Ausrichtung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m7s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 22s
column_text Zellen enthalten proportionale Leerzeichen zur Ausrichtung.
Mit Monospace-Font stehen Waehrungswerte korrekt untereinander.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 11:50:53 +01:00
Benjamin Admin
d530738b12 fix: useMemo vor early returns verschieben (React Hooks Regel)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 21s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 11:35:59 +01:00
Benjamin Admin
ca7d44e543 fix: Overlay spaltenweise Ausrichtung per Median-Snap
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 33s
CI / test-python-klausur (push) Failing after 2m7s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 22s
Alle Zellen einer Spalte bekommen die gleiche x-Position (Median)
damit Werte vertikal korrekt untereinander stehen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 11:20:06 +01:00
Benjamin Admin
e44e319ccf feat: Text-Overlay Rekonstruktion in StepLlmReview
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 33s
CI / test-python-klausur (push) Failing after 2m13s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 24s
Neuer Overlay-Modus zeigt OCR-Text per bbox_pct ueber weissem
Hintergrund neben dem Originalbild. Steuerelemente fuer Schriftgroesse,
Einrueckung und Bold. Inline-Editing per contentEditable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 11:07:11 +01:00
Benjamin Admin
6bb023bdc1 fix: vocab_entries fuer column_text Sub-Sessions generieren
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m8s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 23s
_cells_to_vocab_entries wurde nur bei is_vocab (column_en/column_de)
aufgerufen. Fuer Sub-Sessions mit column_text wurden keine Eintraege
erzeugt, daher blieb die Korrektur-Tabelle leer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:28:27 +01:00
Benjamin Admin
13553fc5e6 fix: column_text Typ fuer Sub-Sessions in Korrektur-Tabelle
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m9s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 20s
_cells_to_vocab_entries kannte column_text nicht, daher wurden
keine Eintraege erzeugt. Jetzt mappt column_text -> 'text' Feld.

Frontend: column_text in FIELD_LABELS/COL_TYPE_TO_FIELD/COL_TYPE_COLOR.
Label: "Tabelle" statt "Vokabeltabelle" fuer Sub-Sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:48:40 +01:00
Benjamin Admin
964c916a81 fix: _clean_cell_text entfernt Waehrungssymbole am Zeilenende
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 24s
_is_noise_tail_token() stuft rein nicht-alphabetische Tokens wie
€0.50, £1, €2.50 als OCR-Noise ein und entfernt sie. Zusaetzlich
zerstoert ' '.join(tokens) das proportionale Spacing.

Fuer Single-Column Sub-Sessions wird _clean_cell_text uebersprungen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:41:25 +01:00
Benjamin Admin
13510b62cc debug: Log-Level auf INFO fuer Sub-Session Zellinhalte
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m3s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:33:56 +01:00
Benjamin Admin
3a791179af debug: Logging fuer Sub-Session Woertererkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
Zeigt low-confidence Woerter (conf<30) und Zellinhalte pro Zeile,
um fehlende Euro/Pfund-Betraege zu diagnostizieren.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:31:34 +01:00
Benjamin Admin
f65bd11919 fix: Sub-Session Zeilenerkennung nutzt Word-Grouping statt Gap-Detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 23s
Gap-basierte Erkennung findet bei kleinen Box-Bildern zu wenige Gaps
und mergt Zeilen (7 raw gaps -> 4 validated -> nur 3 rows statt 6).
Sub-Sessions nutzen jetzt direkt _build_rows_from_word_grouping(),
das Woerter nach Y-Position clustert — robuster fuer komplexe Box-Layouts.

Zusaetzlich: alle zones=None Crashes gefixt (replace_all .get("zones") or []).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 09:05:24 +01:00
Benjamin Admin
785b4d7655 fix: zones=None crash bei Sub-Session Zeilenerkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 20s
column_result.get("zones", []) gibt None zurueck wenn der Key mit
Wert None existiert. Geaendert zu .get("zones") or [].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 08:50:58 +01:00
Benjamin Admin
2716495250 fix: Sub-Session Zeilenerkennung — Tesseract+inv im Spalten-Schritt cachen
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m9s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 20s
Bisher wurden _word_dicts, _inv und _content_bounds fuer Sub-Sessions
nicht gecacht, sodass detect_rows auf detect_column_geometry() zurueckfiel.
Das konnte bei kleinen Box-Bildern mit <5 Woertern fehlschlagen.

Jetzt laeuft Tesseract + Binarisierung direkt im Pseudo-Spalten-Block,
und die Intermediates werden gecacht. Zusaetzlich ausfuehrliche Kommentare
zur Zeilenerkennung (detect_row_geometry, _regularize_row_grid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 08:43:26 +01:00
Benjamin Admin
23b7840ea7 feat: Full-Row OCR mit Spacing fuer Box-Sub-Sessions
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m16s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 22s
Sub-Sessions ueberspringen Spaltenerkennung und nutzen stattdessen eine
Pseudo-Spalte ueber die volle Breite. Text wird mit proportionalem
Spacing aus Wort-Positionen rekonstruiert, um raeumliches Layout zu erhalten.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 08:28:29 +01:00
Benjamin Admin
34adb437d0 fix: Bild-Endpoints fallen auf original zurueck fuer Sub-Sessions
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m3s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 20s
Alle Bild-Endpoints (cropped, columns-overlay, rows-overlay,
words-overlay) suchten nur nach cropped/dewarped. Sub-Sessions haben
nur ein original-Bild. Neue Hilfsfunktion _get_base_image_png() mit
Fallback-Kette: cropped > dewarped > original.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:30:38 +01:00
Benjamin Admin
ceaef9c6a6 fix: Sub-Sessions original_bgr als cropped_bgr promoten
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m22s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 18s
Spalten-/Zeilen-/Woerter-Erkennung suchen nach cropped_bgr oder
dewarped_bgr. Bei Sub-Sessions existiert nur original_bgr (der
Box-Ausschnitt). Jetzt wird original_bgr automatisch als cropped_bgr
gesetzt, sowohl im Cache-Aufbau als auch bei der Erstellung.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:57:39 +01:00
Benjamin Admin
9047339f0d fix: Sub-Sessions starten direkt bei Spalten, ueberspringe Vorverarbeitung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m13s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 21s
Box-Sub-Sessions haben bereits ein zugeschnittenes Bild. Orientierung,
Begradigung, Entzerrung und Crop werden uebersprungen (skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:51:16 +01:00
Benjamin Admin
2592ef233b feat: Frontend Sub-Sessions (Boxen) in OCR-Pipeline UI
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 18s
- BoxSessionTabs: Tab-Leiste zum Wechsel zwischen Haupt- und Box-Sessions
- StepColumnDetection: Box-Info + "Box-Sessions erstellen" Button
- page.tsx: Session-Wechsel, Sub-Session-State, auto-return nach Abschluss
- types.ts: SubSession, PageZone, erweiterte SessionInfo/ColumnResult

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:33:59 +01:00
Benjamin Admin
256efef3ea feat: Box-Zonen durch gesamte Pipeline + Sub-Sessions fuer Box-Inhalt
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
- Rote semi-transparente Box-Markierung in allen Overlays (Spalten, Zeilen, Woerter)
- Zeilenerkennung: Combined-Image-Ansatz schliesst Box-Bereiche aus
- Woerter-Erkennung: Zeilen innerhalb von Box-Zonen werden gefiltert
- Sub-Sessions: parent_session_id/box_index in DB-Schema
- POST /sessions/{id}/create-box-sessions erstellt Sub-Sessions aus Box-Regionen
- Session-Info zeigt Sub-Sessions bzw. Parent-Verknuepfung
- Sessions-Liste blendet Sub-Sessions per Default aus
- Rekonstruktion: Fabric-JSON merged Sub-Session-Zellen an Box-Positionen
- Save-Reconstruction routet box{N}_* Updates an Sub-Sessions
- GET /sessions/{id}/vocab-entries/merged fuer zusammengefuehrte Eintraege

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:24:34 +01:00
Benjamin Admin
4610137ecc fix: Box-Bereiche aus Bild entfernen statt pro Zone separat Spalten erkennen
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
Content-Streifen oberhalb/unterhalb von Boxen werden zu einem Bild zusammengefügt,
Spaltenerkennung läuft einmal auf dem kombinierten Bild. Entfernt Step 5c
(suspicion-based gap alignment), da der neue Ansatz das Problem an der Wurzel löst.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:03:05 +01:00
Benjamin Admin
fb46450802 fix: Alignment-Validierung nur fuer verdaechtige Gaps (>2x Median-Breite)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 20s
Vorher wurden alle internen Gaps geprueft, was echte Spaltentrennungen
(EN→DE) faelschlicherweise entfernte. Jetzt werden nur Gaps geprueft,
die eine unverhaeltnismaessig breite rechte Spalte erzeugen wuerden
(>2x Median-Spaltenbreite). Schwelle auf 15% gesenkt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 16:27:14 +01:00
Benjamin Admin
11126c4436 fix: UnboundLocalError edge_tolerance in Step 5c
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 1m58s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
Variable wurde vor ihrer Definition in Step 7 referenziert.
Eigene margin_thresh Variable fuer Step 5c eingefuehrt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 16:18:47 +01:00
Benjamin Admin
7a0ded7562 fix: Left-Edge-Alignment-Validierung fuer Spalten-Gaps
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m7s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 19s
Interiore Gaps werden jetzt geprueft: rechts des Gaps muessen
mindestens 25% der Woerter eine gemeinsame linke Kante teilen.
Verhindert falsche Spaltentrennungen innerhalb breiter Spalten
(z.B. Example-Spalte mit kurzen und langen Eintraegen).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 16:11:58 +01:00
Benjamin Admin
04be24a89e fix: fehlende Imports RAPIDOCR_AVAILABLE und _RE_ALPHA in cv_cell_grid.py
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m55s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 20s
Weitere NameError-Probleme vom Modul-Refactoring: beide Symbole
werden in cv_cell_grid.py benutzt, sind aber in cv_ocr_engines.py
definiert und waren nicht importiert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:59:24 +01:00
Benjamin Admin
cf9dde9876 fix: _group_words_into_lines nach cv_ocr_engines.py verschieben
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 21s
Funktion war nur in cv_review.py definiert, wurde aber auch in
cv_ocr_engines.py und cv_layout.py benutzt — NameError zur Laufzeit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:24:56 +01:00
Benjamin Admin
60c4138660 fix: _MIN_WORD_CONF als Modul-Konstante statt lokale Variable
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 20s
NameError in build_cell_grid_v2 weil _MIN_WORD_CONF nur in
_ocr_cell_crop und build_cell_grid lokal definiert war.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:12:02 +01:00
Benjamin Admin
7005b18561 feat: generische Box-Erkennung fuer zonenbasierte Spaltenerkennung
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 19s
- Neue Datei cv_box_detect.py: 2-Stufen-Algorithmus (Linien + Farbe)
- DetectedBox/PageZone Dataclasses in cv_vocab_types.py
- detect_column_geometry_zoned() in cv_layout.py
- API-Endpoints erweitert: zones/boxes_detected im column_result
- Overlay-Funktionen zeichnen Box-Grenzen als gestrichelte Rechtecke
- Fix: numpy array or-Verknuepfung an 7 Stellen in ocr_pipeline_api.py
- 12 Unit-Tests fuer Box-Erkennung und Zone-Splitting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:06:23 +01:00
Benjamin Admin
e60254bc75 fix: alle Post-Crop-Schritte nutzen cropped statt dewarped Bild
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 24s
Spalten-, Zeilen-, Woerter-Overlay und alle nachfolgenden Steps
(LLM-Review, Rekonstruktion) lesen jetzt image/cropped mit Fallback
auf image/dewarped. Tests fuer page_crop.py hinzugefuegt (25 Tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 09:10:10 +01:00
Benjamin Admin
156a818246 refactor: Crop nach Deskew/Dewarp verschieben + content-basierter Buchscan-Crop
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m56s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 17s
Pipeline-Reihenfolge neu: Orientierung → Begradigung → Entzerrung → Zuschneiden → Spalten...
Crop arbeitet jetzt auf dem bereits geraden Bild, was bessere Ergebnisse liefert.

page_crop.py komplett ersetzt: Adaptive Threshold + 4-Kanten-Erkennung
(Buchruecken-Schatten links, Ink-Projektion fuer alle Raender) statt
Otsu + groesste Kontur.

Backend: Step-Nummern, Input-Bilder, Reprocess-Kaskade angepasst.
Frontend: PIPELINE_STEPS umgeordnet, Switch-Cases, Vorher-Bilder aktualisiert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:52:11 +01:00
Benjamin Admin
eb45bb4879 fix: numpy array or-Verknuepfung in Crop/Deskew + ImageCompareView Labels
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 37s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m17s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 24s
- orientation_crop_api.py: `array or array` durch `is not None` ersetzt
  (ValueError bei numpy Arrays)
- ocr_pipeline_api.py: gleicher Fix fuer Deskew-Fallback-Kette
- ImageCompareView.tsx: Fallback-Text nutzt rightLabel statt "Begradigung"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:02:44 +01:00
Benjamin Admin
2763631711 feat: Orientierung + Zuschneiden als Schritte 1-2 in OCR-Pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 18s
Zwei neue Wizard-Schritte vor Begradigung:
- Step 1: Orientierungserkennung (0/90/180/270° via Tesseract OSD)
- Step 2: Seitenrand-Erkennung und Zuschnitt (Scannerraender entfernen)

Backend:
- orientation_crop_api.py: POST /orientation, POST /crop, POST /crop/skip
- page_crop.py: detect_and_crop_page() mit Format-Erkennung (A4/A5/Letter)
- Session-Store: orientation_result, crop_result Felder
- Pipeline nutzt zugeschnittenes Bild fuer Deskew/Dewarp

Frontend:
- StepOrientation.tsx: Upload + Auto-Orientierung + Vorher/Nachher
- StepCrop.tsx: Auto-Crop + Format-Badge + Ueberspringen-Option
- Pipeline-Stepper: 10 Schritte (war 8)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:55:23 +01:00
Benjamin Admin
9a5a35bff1 refactor: cv_vocab_pipeline.py in 6 Module aufteilen (8163 → 6 + Fassade)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
Monolithische 8163-Zeilen-Datei aufgeteilt in fokussierte Module:
- cv_vocab_types.py (156 Z.): Dataklassen, Konstanten, IPA, Feature-Flags
- cv_preprocessing.py (1166 Z.): Bild-I/O, Orientierung, Deskew, Dewarp
- cv_layout.py (3036 Z.): Dokumenttyp, Spalten, Zeilen, Klassifikation
- cv_ocr_engines.py (1282 Z.): OCR-Engines, Vocab-Postprocessing, Text-Cleaning
- cv_cell_grid.py (1510 Z.): Cell-Grid v2+Legacy, Vocab-Konvertierung
- cv_review.py (1184 Z.): LLM/Spell Review, Pipeline-Orchestrierung

cv_vocab_pipeline.py ist jetzt eine Re-Export-Fassade (35 Z.) —
alle bestehenden Imports bleiben unveraendert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:46:47 +01:00
Benjamin Admin
931ab92c92 feat: Orientierungserkennung in OCR-Pipeline-Deskew integrieren
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 21s
detect_and_fix_orientation() wird jetzt vor dem Deskew-Schritt in der
OCR-Pipeline ausgefuehrt, sodass 90/180/270°-gedrehte Scans automatisch
korrigiert werden. Frontend zeigt Orientierungskorrektur als Info-Banner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 22:31:36 +01:00
Benjamin Admin
853638b03c Revert "fix: _split_broad_columns nur bei maximal 1 breiter Spalte ausfuehren"
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 15s
This reverts commit d98359fceb.
2026-03-07 22:55:24 +01:00
Benjamin Admin
d98359fceb fix: _split_broad_columns nur bei maximal 1 breiter Spalte ausfuehren
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m26s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 18s
Wenn bereits 2+ breite Content-Spalten existieren, ist das Layout
wahrscheinlich korrekt in EN/DE getrennt. Split wird nur ausgefuehrt
wenn eine einzelne breite Spalte EN+DE kombiniert enthaelt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:51:14 +01:00
Benjamin Admin
e1ae5d5fa9 fix: Edge-Gaps in _split_broad_columns ignorieren + return-Tuple bei leerem Ergebnis
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 16s
Gaps die den Spaltenrand beruehren (Margins) werden jetzt ausgeschlossen,
nur interne Gaps werden als Split-Kandidaten betrachtet. Behebt das
Problem dass trailing whitespace faelschlich als groesster Gap gewaehlt
wurde. Early-return in _run_ocr_pipeline_for_page gibt jetzt korrekt
([], rotation) statt [] zurueck.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 22:16:29 +01:00
Benjamin Admin
4e8ea77140 fix: leere Spalten als strukturell behandeln + 2-Spalten-Layout korrekt labeln
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
Spalten mit <=2 Woertern und <15% Breite werden jetzt als column_marker
statt als content-Spalte klassifiziert. Bei 2 breiten Content-Spalten
wird die rechte als column_example statt column_de gelabelt, da die
linke Spalte EN+DE kombiniert enthaelt.
OSD-Zoom von 1.0 auf 2.0 erhoeht fuer zuverlaessigere Orientierungserkennung.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 19:35:21 +01:00