breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	846292f632	fix: rewrite Kombi merge with row-based sequence alignment. Replaces position-based word matching with row-based sequence alignment to fix doubled words and cross-line averaging in Kombi-Modus. New algorithm: (1) group words into rows by Y-position clustering; (2) match rows between engines by vertical center proximity; (3) within each row, walk both sequences left to right, deduplicating; (4) keep unmatched rows as-is. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 08:45:03 +01:00
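Step 3 of the algorithm in 846292f632 (the left-to-right walk inside one matched row) could look roughly like the sketch below. The dict keys (`text`, `left`, `width`, `conf`), the function names, and the rule that two words count as duplicates when their boxes overlap horizontally are all assumptions made for illustration; the commit message does not state the actual duplicate test.

```python
def x_overlap(a, b):
    """True if the two word boxes overlap horizontally (assumed duplicate test)."""
    right = min(a["left"] + a["width"], b["left"] + b["width"])
    left = max(a["left"], b["left"])
    return right - left > 0


def merge_row(row_a, row_b):
    """Walk two rows left to right, keeping one copy of each duplicated word."""
    a = sorted(row_a, key=lambda w: w["left"])
    b = sorted(row_b, key=lambda w: w["left"])
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        wa, wb = a[i], b[j]
        if x_overlap(wa, wb):
            # Same word seen by both engines: keep the higher-confidence copy.
            merged.append(wa if wa["conf"] >= wb["conf"] else wb)
            i += 1
            j += 1
        elif wa["left"] < wb["left"]:
            merged.append(wa)
            i += 1
        else:
            merged.append(wb)
            j += 1
    # Whatever remains in either row has no counterpart and is kept as-is.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


paddle_row = [
    {"text": "apple", "left": 20, "width": 50, "conf": 90},
    {"text": "pie", "left": 80, "width": 30, "conf": 60},
]
tesseract_row = [
    {"text": "•", "left": 0, "width": 10, "conf": 70},
    {"text": "apple", "left": 22, "width": 50, "conf": 85},
    {"text": "pie", "left": 82, "width": 30, "conf": 75},
]
merged_row = merge_row(paddle_row, tesseract_row)
```

In this toy row the bullet found by only one engine survives, and each doubled word appears once.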
Benjamin Admin	4280298e02	fix: add _deduplicate_words safety net to Kombi merge. Even after multi-criteria matching, near-duplicate words can slip through (same text, centers within 30px horizontal / 15px vertical). The new _deduplicate_words() removes these, keeping the higher-confidence copy. Regression test with real session data (row 2 with 145 near-dupes) confirms no duplicates remain after merge + deduplication. Tests: 37 → 45 (added TestDeduplicateWords, TestMergeRealWorldRegression). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 08:27:45 +01:00
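The rule described in 4280298e02 (same text, centers within 30px horizontally and 15px vertically, keep the higher-confidence copy) can be sketched as follows. This is not the repository's `_deduplicate_words()`; the function name, the dict keys, and the inclusive comparison are assumptions.

```python
def deduplicate_words(words, max_dx=30.0, max_dy=15.0):
    """Drop near-duplicates: same text and centers within max_dx / max_dy pixels."""
    kept = []
    # Visit high-confidence words first so the better copy is the one that survives.
    for w in sorted(words, key=lambda w: w["conf"], reverse=True):
        cx = w["left"] + w["width"] / 2
        cy = w["top"] + w["height"] / 2
        is_duplicate = any(
            k["text"] == w["text"]
            and abs(k["left"] + k["width"] / 2 - cx) <= max_dx
            and abs(k["top"] + k["height"] / 2 - cy) <= max_dy
            for k in kept
        )
        if not is_duplicate:
            kept.append(w)
    return kept


words = [
    {"text": "badge", "left": 100, "top": 50, "width": 60, "height": 20, "conf": 80},
    {"text": "badge", "left": 110, "top": 55, "width": 60, "height": 20, "conf": 95},
    {"text": "film", "left": 300, "top": 50, "width": 40, "height": 20, "conf": 70},
]
unique_words = deduplicate_words(words)
```

The result is ordered by confidence, so a caller that needs reading order would have to re-sort by position.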
Benjamin Admin	4f2fb0e94c	fix: Kombi-Modus merge now deduplicates same words from both engines. The merge algorithm now uses 3 criteria instead of just IoU > 0.3: (1) IoU > 0.15 (relaxed threshold); (2) center proximity < word height AND same row; (3) text similarity > 0.7 AND same row. This prevents doubled overlapping words when both PaddleOCR and Tesseract find the same word at similar positions. Unique words from either engine (e.g. bullets from Tesseract) are still added. Tests expanded: 19 → 37 (added _box_center_dist, _text_similarity, _words_match tests + deduplication regression test). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 08:11:31 +01:00
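A minimal sketch of the three criteria from 4f2fb0e94c. The thresholds come from the commit message. The helper names, the dict keys, the use of `difflib.SequenceMatcher` for text similarity, and the definition of "same row" (vertical centers closer than half the smaller word height) are assumptions, since the message does not define them.

```python
import math
from difflib import SequenceMatcher


def box_iou(a, b):
    """Intersection over union of two axis-aligned word boxes."""
    x1 = max(a["left"], b["left"])
    y1 = max(a["top"], b["top"])
    x2 = min(a["left"] + a["width"], b["left"] + b["width"])
    y2 = min(a["top"] + a["height"], b["top"] + b["height"])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def center(w):
    return (w["left"] + w["width"] / 2, w["top"] + w["height"] / 2)


def words_match(a, b):
    """True if any of the three criteria says a and b are the same word."""
    if box_iou(a, b) > 0.15:
        return True
    (ax, ay), (bx, by) = center(a), center(b)
    min_height = min(a["height"], b["height"])
    same_row = abs(ay - by) < 0.5 * min_height  # assumed definition of "same row"
    if not same_row:
        return False
    if math.hypot(ax - bx, ay - by) < min_height:
        return True
    similarity = SequenceMatcher(None, a["text"].lower(), b["text"].lower()).ratio()
    return similarity > 0.7


paddle = {"text": "Haus", "left": 100, "top": 50, "width": 50, "height": 20}
tess_near = {"text": "Haus", "left": 105, "top": 50, "width": 50, "height": 20}
tess_other = {"text": "Baum", "left": 400, "top": 50, "width": 50, "height": 20}
tess_below = {"text": "Haus", "left": 100, "top": 200, "width": 50, "height": 20}
```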
Benjamin Admin	e9ccd1e35c	feat: add Kombi-Modus (PaddleOCR + Tesseract) for OCR Overlay. Runs both OCR engines on the preprocessed image and merges results: word boxes matched by IoU, coordinates averaged by confidence weight. Unmatched Tesseract words (bullets, symbols) are added for better coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 20:05:50 +01:00
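"Coordinates averaged by confidence weight" in e9ccd1e35c presumably means something like the sketch below, where each engine's box contributes in proportion to its confidence. The function name, the dict keys, and taking the text from the more confident engine are assumptions.

```python
def merge_boxes(a, b):
    """Average two matched word boxes, weighting each by its OCR confidence."""
    wa, wb = a["conf"], b["conf"]
    if wa + wb == 0:
        wa = wb = 1  # fall back to a plain average when both confidences are zero
    total = wa + wb
    merged = {
        key: (a[key] * wa + b[key] * wb) / total
        for key in ("left", "top", "width", "height")
    }
    # Assumption: text and confidence are taken from the more confident engine.
    best = a if a["conf"] >= b["conf"] else b
    merged["text"] = best["text"]
    merged["conf"] = best["conf"]
    return merged


paddle_word = {"text": "badge", "left": 100, "top": 50, "width": 60, "height": 20, "conf": 90}
tesseract_word = {"text": "badqe", "left": 110, "top": 54, "width": 60, "height": 20, "conf": 30}
merged_word = merge_boxes(paddle_word, tesseract_word)
```

With weights 90 and 30 the merged box sits three quarters of the way toward the PaddleOCR position.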
Benjamin Admin	8349c28f54	fix: paddle_direct reuses build_grid_from_words for correct overlay. Replaces custom _paddle_words_to_grid_cells with the proven build_grid_from_words from cv_words_first.py, the same function the regular pipeline uses with PaddleOCR. Handles phrase splitting, column clustering, and produces cells with word_boxes that the slide/cluster positioning hooks expect. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 17:19:52 +01:00
Benjamin Admin	71a1b5f058	fix: paddle_direct groups words per row (matching _build_cells format). One cell per row with all words as word_boxes instead of one cell per word. Gives OverlayReconstruction a row-spanning bbox_pct for correct font sizing and per-word positions for slide/cluster placement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 17:10:10 +01:00
Benjamin Admin	c743a38eaf	fix: Paddle Direct keeps preprocessing (orient/deskew/dewarp/crop). Uses the cropped/dewarped image instead of the original so the overlay shows the correctly oriented page. 5 steps instead of 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 16:56:18 +01:00
Benjamin Admin	90c1efd9b0	feat: Paddle Direct, 1-click OCR without deskew/dewarp/crop. New 2-step mode (Upload → PaddleOCR+Overlay) alongside the existing 7-step pipeline. Backend endpoint runs PaddleOCR on the original image and clusters words into rows/cells directly. Frontend adds a mode toggle and PaddleDirectStep component. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 16:41:55 +01:00
Benjamin Admin	a6069631cc	feat: PaddleOCR remote engine (PP-OCRv5 Latin on Hetzner x86_64). Adds PaddleOCR as a new engine=paddle option in the OCR pipeline. Microservice on Hetzner (paddleocr-service/), async HTTP client (paddleocr_remote.py), frontend dropdown, automatically uses words_first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 09:31:22 +01:00
Benjamin Admin	ced5bb3dd3	feat: Words-First Grid Builder (bottom-up alternative to cell_grid_v2). New algorithm in cv_words_first.py: clusters Tesseract word_boxes directly into columns (X-gap) and rows (Y-proximity) and builds cells at their intersections. No separate column/row detection needed. Changes: cv_words_first.py: _cluster_columns, _cluster_rows, _build_cells, build_grid_from_words; ocr_pipeline_api.py: grid_method parameter (v2|words_first) in the /words endpoint; StepWordRecognition.tsx: dropdown toggle for the grid method; OCR-Pipeline.md: docs v4.3.0 with the Words-First algorithm; 15 unit tests for cv_words_first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 06:46:05 +01:00
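The bottom-up idea in ced5bb3dd3 can be illustrated with a small one-dimensional gap clustering. This is a simplified sketch, not the code in cv_words_first.py: it clusters only the left and top edges of each word, and the function names, dict keys, and gap thresholds are assumptions.

```python
def cluster_1d(values, max_gap):
    """Split sorted values into clusters wherever the gap exceeds max_gap."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def cluster_index(clusters, value):
    """Index of the cluster whose value range contains value."""
    return next(i for i, c in enumerate(clusters) if c[0] <= value <= c[-1])


def build_grid(words, x_gap=40, y_gap=12):
    """Assign each word to a (row, column) cell and join the texts per cell."""
    columns = cluster_1d([w["left"] for w in words], x_gap)
    rows = cluster_1d([w["top"] for w in words], y_gap)
    cells = {}
    for w in sorted(words, key=lambda w: w["left"]):
        key = (cluster_index(rows, w["top"]), cluster_index(columns, w["left"]))
        cells.setdefault(key, []).append(w["text"])
    return {key: " ".join(texts) for key, texts in cells.items()}


words = [
    {"text": "dog", "left": 100, "top": 50},
    {"text": "Hund", "left": 400, "top": 52},
    {"text": "to", "left": 100, "top": 90},
    {"text": "go", "left": 125, "top": 91},
    {"text": "gehen", "left": 402, "top": 90},
]
grid = build_grid(words)
```

A gap measured between one word's right edge and the next word's left edge would be closer to what "X-gap" suggests; left edges are used here only to keep the example short.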
Benjamin Admin	2f51ac617f	feat: insert IPA phonetic transcription into cell texts (for overlay mode). fix_cell_phonetics() replaces faulty IPA brackets AND inserts missing phonetic transcription for English words (e.g. badge, film, challenge, profit). It is applied to all cells with col_type column_en/column_text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 15:47:26 +01:00
Benjamin Admin	41e47baf13	fix: pass skip_heal_gaps parameter through to the stream generator. Fixes a NameError: skip_heal_gaps was not in scope in the _word_batch_stream_generator function. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 09:11:16 +01:00
Benjamin Admin	8a60f4bf30	fix: position overlay cells without _heal_row_gaps (skip_heal_gaps). _heal_row_gaps shifts cell positions after artifact rows are removed, which causes a visible offset in the overlay (e.g. 23px for "badge"). The new skip_heal_gaps parameter in build_cell_grid_v2 and the words endpoint keeps the exact row positions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 08:59:50 +01:00
Benjamin Admin	e3ee1de790	Revert "fix: skip row regularization in the overlay (generic for mixed content)". This reverts commit `b91f799ccf`.	2026-03-11 08:44:07 +01:00
Benjamin Admin	b91f799ccf	fix: skip row regularization in the overlay (generic for mixed content). On pages with info boxes (which have a different row height), _regularize_row_grid distorts the row positions. The new skip_regularize parameter uses the gap-based rows instead, which follow the actual page geometry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 08:29:06 +01:00
Benjamin Admin	e2ad93fd57	fix: allow word recognition without columns (full-page pseudo-column). If column_result is missing (e.g. in the OCR Overlay pipeline), a single full-page pseudo-column is created automatically instead of raising an error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 00:16:31 +01:00
Benjamin Admin	618c82ef42	fix: no longer cut off rows at box boundaries (border_thickness margin). detect_rows: content strips now use box_ranges_inner (shrunk by border_thickness, min 5px) instead of the full box range; detect_words: the _row_in_box filter also uses the inner range. As a result, the last row above a box is no longer wrongly assigned to the box and excluded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 17:44:02 +01:00
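The inner-range idea from 618c82ef42 (shrink each box range by border_thickness, but by at least 5px) can be sketched as below. The function names, the center-based containment test, and the fallback for boxes too small to shrink are assumptions; only the margin rule comes from the commit message.

```python
def inner_range(y_start, y_end, border_thickness, min_margin=5):
    """Shrink a vertical box range by the border thickness, at least min_margin px."""
    margin = max(border_thickness, min_margin)
    inner_start, inner_end = y_start + margin, y_end - margin
    if inner_start >= inner_end:
        # Assumption: a box too small to shrink keeps its full range.
        return (y_start, y_end)
    return (inner_start, inner_end)


def row_in_box(row_top, row_bottom, box_range):
    """Treat a row as inside the box when its vertical center falls in the range."""
    center = (row_top + row_bottom) / 2
    return box_range[0] <= center <= box_range[1]


full_box = (200, 400)
inner_box = inner_range(200, 400, border_thickness=3)

# A text row that ends right at the top border of the box.
last_row_above = (180, 220)
```

Here the row's center lies exactly on the outer border, so it counts as inside the full range but outside the shrunken one.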
Benjamin Admin	6bb023bdc1	fix: generate vocab_entries for column_text sub-sessions. _cells_to_vocab_entries was only called for is_vocab (column_en/column_de). No entries were created for sub-sessions with column_text, so the correction table stayed empty. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 10:28:27 +01:00
Benjamin Admin	3a791179af	debug: logging for sub-session word recognition. Shows low-confidence words (conf<30) and cell contents per row to diagnose missing euro/pound amounts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 09:31:34 +01:00
Benjamin Admin	f65bd11919	fix: sub-session row detection uses word grouping instead of gap detection. On small box images, gap-based detection finds too few gaps and merges rows (7 raw gaps -> 4 validated -> only 3 rows instead of 6). Sub-sessions now call _build_rows_from_word_grouping() directly, which clusters words by Y-position and is more robust for complex box layouts. Additionally: fixed all zones=None crashes (replace_all .get("zones") or []). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 09:05:24 +01:00
Benjamin Admin	785b4d7655	fix: zones=None crash in sub-session row detection. column_result.get("zones", []) returns None when the key exists with the value None. Changed to .get("zones") or []. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 08:50:58 +01:00
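The pitfall fixed in 785b4d7655 is standard `dict.get` behavior and is easy to reproduce. The dictionary below is a hypothetical stand-in for a stored column_result.

```python
# Hypothetical stand-in: the "zones" key is present, but its value is None.
column_result = {"zones": None}

# The default is only used when the key is missing, so this is still None.
zones_with_default = column_result.get("zones", [])

# `or []` also covers the "key present, value None" case.
zones_safe = column_result.get("zones") or []

# Iterating zones_safe is fine; iterating zones_with_default would raise TypeError.
zone_count = sum(1 for _ in zones_safe)
```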
Benjamin Admin	2716495250	fix: sub-session row detection, cache Tesseract+inv in the column step. Previously _word_dicts, _inv and _content_bounds were not cached for sub-sessions, so detect_rows fell back to detect_column_geometry(). That could fail on small box images with <5 words. Tesseract + binarization now run directly in the pseudo-column block, and the intermediates are cached. Also adds detailed comments on row detection (detect_row_geometry, _regularize_row_grid). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 08:43:26 +01:00
Benjamin Admin	23b7840ea7	feat: full-row OCR with spacing for box sub-sessions. Sub-sessions skip column detection and use a full-width pseudo-column instead. Text is reconstructed with proportional spacing from word positions to preserve the spatial layout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 08:28:29 +01:00
Benjamin Admin	34adb437d0	fix: image endpoints fall back to original for sub-sessions. All image endpoints (cropped, columns-overlay, rows-overlay, words-overlay) only looked for cropped/dewarped. Sub-sessions only have an original image. New helper _get_base_image_png() with fallback chain: cropped > dewarped > original. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 23:30:38 +01:00
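The fallback chain from 34adb437d0 (cropped > dewarped > original) might be expressed like this. `pick_base_image` and the session dict layout are hypothetical; the signature of the real `_get_base_image_png()` is not shown in the log. The explicit `is not None` test avoids relying on the truthiness of the image object, which is the issue the numpy `or` fixes elsewhere in this history address.

```python
def pick_base_image(session):
    """Return (kind, image) for the first available image in priority order."""
    for kind in ("cropped", "dewarped", "original"):
        image = session.get(kind)
        if image is not None:  # explicit check instead of truthiness
            return kind, image
    raise LookupError("session has no image")


# Hypothetical sessions: a full pipeline session and a box sub-session.
full_session = {"original": b"orig-png", "dewarped": b"dewarped-png", "cropped": b"cropped-png"}
sub_session = {"original": b"box-png", "dewarped": None, "cropped": None}

full_choice = pick_base_image(full_session)
sub_choice = pick_base_image(sub_session)
```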
Benjamin Admin	ceaef9c6a6	fix: promote sub-session original_bgr to cropped_bgr. Column, row, and word detection look for cropped_bgr or dewarped_bgr. Sub-sessions only have original_bgr (the box crop). original_bgr is now automatically set as cropped_bgr, both when the cache is built and when the sub-session is created. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 22:57:39 +01:00
Benjamin Admin	256efef3ea	feat: box zones through the entire pipeline + sub-sessions for box content. Red semi-transparent box marking in all overlays (columns, rows, words); row detection: combined-image approach excludes box regions; word detection: rows inside box zones are filtered out; sub-sessions: parent_session_id/box_index in the DB schema; POST /sessions/{id}/create-box-sessions creates sub-sessions from box regions; session info shows sub-sessions or the parent link; session list hides sub-sessions by default; reconstruction: Fabric JSON merges sub-session cells at box positions; Save-Reconstruction routes box{N}_* updates to sub-sessions; GET /sessions/{id}/vocab-entries/merged for merged entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 18:24:34 +01:00
Benjamin Admin	7005b18561	feat: generic box detection for zone-based column detection. New file cv_box_detect.py: 2-stage algorithm (lines + color); DetectedBox/PageZone dataclasses in cv_vocab_types.py; detect_column_geometry_zoned() in cv_layout.py; API endpoints extended: zones/boxes_detected in column_result; overlay functions draw box boundaries as dashed rectangles; fix: numpy array `or` chaining in 7 places in ocr_pipeline_api.py; 12 unit tests for box detection and zone splitting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 15:06:23 +01:00
Benjamin Admin	e60254bc75	fix: all post-crop steps use the cropped instead of the dewarped image. Column, row, and word overlays and all subsequent steps (LLM review, reconstruction) now read image/cropped with a fallback to image/dewarped. Added tests for page_crop.py (25 tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 09:10:10 +01:00
Benjamin Admin	156a818246	refactor: move crop after deskew/dewarp + content-based book-scan crop. New pipeline order: orientation → deskew → dewarp → crop → columns... Crop now works on the already straightened image, which gives better results. page_crop.py completely replaced: adaptive threshold + 4-edge detection (book-spine shadow on the left, ink projection for all margins) instead of Otsu + largest contour. Backend: step numbers, input images, and reprocess cascade adjusted. Frontend: PIPELINE_STEPS reordered, switch cases and "before" images updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 08:52:11 +01:00
Benjamin Admin	eb45bb4879	fix: numpy array `or` chaining in crop/deskew + ImageCompareView labels. orientation_crop_api.py: replaced `array or array` with `is not None` (ValueError with numpy arrays); ocr_pipeline_api.py: same fix for the deskew fallback chain; ImageCompareView.tsx: fallback text uses rightLabel instead of "Begradigung". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 08:02:44 +01:00
Benjamin Admin	2763631711	feat: orientation + crop as steps 1-2 in the OCR pipeline. Two new wizard steps before deskew: Step 1: orientation detection (0/90/180/270° via Tesseract OSD); Step 2: page-margin detection and crop (removes scanner borders). Backend: orientation_crop_api.py: POST /orientation, POST /crop, POST /crop/skip; page_crop.py: detect_and_crop_page() with format detection (A4/A5/Letter); session store: orientation_result, crop_result fields; the pipeline uses the cropped image for deskew/dewarp. Frontend: StepOrientation.tsx: upload + auto-orientation + before/after; StepCrop.tsx: auto-crop + format badge + skip option; pipeline stepper: 10 steps (was 8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 23:55:23 +01:00
Benjamin Admin	931ab92c92	feat: integrate orientation detection into OCR pipeline deskew. detect_and_fix_orientation() now runs before the deskew step in the OCR pipeline, so scans rotated by 90/180/270° are corrected automatically. The frontend shows the orientation correction as an info banner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 22:31:36 +01:00
Benjamin Admin	2ad391e4e4	feat: fine-tuning with 7 sliders for deskew/dewarp. New collapsible panel below dewarp with individual sliders: 3 rotation sliders (P1 Iterative, P2 Word-Alignment, P3 Textline); 4 shear sliders (methods A-D) with radio selection; combined preview and ground-truth saving; backend: POST /sessions/{id}/adjust-combined endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 18:22:33 +01:00
Benjamin Admin	d39d249daa	feat: add pass 3 text-line regression to deskew pipeline. After iterative projection (pass 1) and word-alignment (pass 2), a third pass uses Tesseract word positions + linear regression per text line to measure and correct residual rotation. This catches cases where passes 1-2 leave significant slope (e.g. 1.7° residual on heavily skewed scans). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:53:11 +01:00
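The measurement half of pass 3 in d39d249daa (linear regression per text line) can be sketched in pure Python as below. Each text line is assumed to be given as a list of word-center `(x, y)` points, and combining the per-line slopes with a median is an assumption; the commit message does not say how lines are aggregated.

```python
import math
from statistics import median


def line_slope(points):
    """Least-squares slope of y over x for one text line's word centers."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all words share one x position; slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def residual_angle_deg(lines):
    """Median per-line slope, converted to a rotation angle in degrees."""
    slopes = [line_slope(line) for line in lines if len(line) >= 2]
    slopes = [s for s in slopes if s is not None]
    if not slopes:
        return 0.0
    return math.degrees(math.atan(median(slopes)))


# Two synthetic text lines tilted by 1.7 degrees.
tilt = math.tan(math.radians(1.7))
lines = [
    [(x, 50 + x * tilt) for x in (0, 100, 200, 300)],
    [(x, 90 + x * tilt) for x in (20, 150, 280)],
]
angle = residual_angle_deg(lines)
```

Rotating the image back by the measured angle would then be the correction step, which is left out here.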
Benjamin Admin	538d5c732e	feat: two-pass deskew with wider angle range and residual correction. Increase iterative deskew coarse_range from ±2° to ±5° to handle heavily skewed scans; new deskew_two_pass(): runs iterative projection first, then word-alignment on the corrected image to detect/fix residual skew (applied when residual ≥ 0.3°); OCR pipeline API auto_deskew now uses deskew_two_pass by default; vocab worksheet _run_ocr_pipeline_for_page uses deskew_two_pass; deskew result now includes angle_residual and two_pass_debug. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:34:57 +01:00
Benjamin Admin	af1b12c97d	feat: iterative projection-profile deskew (2-phase variance optimization) Adds deskew_image_iterative() as 3rd deskew method that directly optimizes for projection-profile sharpness instead of proxy signals (Hough/word alignment). Coarse sweep on horizontal profile, fine sweep on vertical profile. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 13:46:44 +01:00
Benjamin Admin	1cc69d6b5e	feat: OCR pipeline step 8: validation view with image detection & generation Replaces the stub StepGroundTruth with a full side-by-side Original vs Reconstruction view. Adds VLM-based image region detection (qwen2.5vl), mflux image generation proxy, sync scroll/zoom, manual region drawing, and score/notes persistence. New backend endpoints: detect-images, generate-image, validate, get validation. New standalone mflux-service (scripts/mflux-service.py) for Metal GPU generation. Dockerfile.base: adds fonts-liberation (Apache-2.0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 10:40:37 +01:00
Benjamin Admin	293e7914d8	feat: improved OCR pipeline session manager with categories, thumbnails, pipeline logging - Add document_category (10 types) and pipeline_log JSONB columns - Session list: thumbnails, copyable IDs, category/doc_type badges - Inline category dropdown, bulk delete, pipeline step logging - New endpoints: thumbnail, delete-all, pipeline-log, categories - Cleared all 22 old test sessions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 09:44:38 +01:00
Benjamin Admin	a58dfca1d8	fix: move char-confusion fix to correction step, add spell + page-ref corrections - Remove _fix_character_confusion() from words endpoint (now only in Phase 0) - Extend spell checker to find real OCR errors via spell.correction() - Add field-aware dictionary selection (EN/DE) for spell corrections - Add _normalize_page_ref() for page_ref column (p-60 → p.60) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 00:26:13 +01:00
Benjamin Admin	34c649c8be	fix: send SSE keepalive events every 5s during batch OCR Batch OCR takes 30-60s with 3x upscaling. Without keepalive events, proxy servers (Nginx) drop the SSE connection after their read timeout. Now sends keepalive events every 5s to prevent timeout, with elapsed time for debugging. Also checks for client disconnect between keepalives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 22:21:14 +01:00
Benjamin Admin	dd16c88007	fix: retry words request on 400/404 + add backend diagnostic logging Frontend: retry /words POST once after 2s delay if it gets 400/404, which happens when navigating via wizard after container restart (session cache not yet warm). Backend: log when session needs DB reload and when dewarped_bgr is missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:15:54 +01:00
Benjamin Admin	68d230c297	fix: use batch-then-stream SSE for cell-first OCR The old per-cell streaming timed out because sequential cell OCR was too slow to send the first event before proxy timeout. Now uses build_cell_grid_v2 (parallel ThreadPoolExecutor) via run_in_executor, then streams all cells at once after batch completes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 14:51:55 +01:00
Benjamin Admin	29c74a9962	feat: cell-first OCR + document type detection + dynamic pipeline steps Cell-First OCR (v2): Each cell is cropped and OCR'd in isolation, eliminating neighbour bleeding (e.g. "to", "ps" in marker columns). Uses ThreadPoolExecutor for parallel Tesseract calls. Document type detection: Classifies pages as vocab_table, full_text, or generic_table using projection profiles (<2s, no OCR needed). Frontend dynamically skips columns/rows steps for full-text pages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:52:38 +01:00
Benjamin Admin	9dd77ab54a	fix: move column expansion AFTER sub-column split The narrow column expansion was running inside detect_column_geometry() on the 4 main columns, but the narrowest columns (marker ~14px, page_ref ~93px) are created AFTERWARDS by _detect_sub_columns(). Extracted expand_narrow_columns() as standalone function and call it after sub-column splitting in the columns API endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 10:07:40 +01:00
Benjamin Admin	ab3ecc7c08	feat: OCR pipeline v2.1 – narrow column OCR, dewarp automation, Fabric.js editor Proposal B: Adaptive padding, crop upscaling, PSM selection, row-strip re-OCR for narrow columns (<15% width) – expected accuracy boost 60-70% → 85-90%. Proposal A: New text-line straightness detector (Method D), quality gate (rejects counterproductive corrections), 2-pass projection refinement, higher confidence thresholds – expected manual dewarp reduction to <10%. Proposal C: Fabric.js canvas editor with drag/drop, inline editing, undo/redo, opacity slider, zoom, PDF/DOCX export endpoints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 22:44:14 +01:00
Benjamin Admin	7eb03ca8d1	fix(ocr-pipeline): IndentationError in auto-mode deskew block The try/except block for the deskew step had 4 extra spaces of indentation from a previous edit. Python rejected the file with IndentationError at startup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 13:21:49 +01:00
Benjamin Admin	50e1c964ee	feat(klausur-service): OCR pipeline optimizations (Improvements 2-4) ## Improvement 2: VLM-based dewarp - New query parameter `method` for POST /sessions/{id}/dewarp; options: ensemble (default) \| vlm \| cv - `_detect_shear_with_vlm()`: asks qwen2.5vl:32b via Ollama for the shear angle and returns a numeric value + confidence - Added `os`, `Query` to the ocr_pipeline_api.py imports - Imported `_apply_shear` from cv_vocab_pipeline ## Improvement 4: 3-method ensemble dewarp - `_detect_shear_by_projection()`: variance sweep over ±3° in 0.25° steps on horizontal text-line projections (~30ms) - `_detect_shear_by_hough()`: weighted median over HoughLinesP on table lines, sign inversion (~20ms) - `_ensemble_shear()`: combines all 3 methods (conf >= 0.3), outlier filter at >1° deviation, bonus for agreement <0.5° - `dewarp_image()` now uses all 3 methods in parallel, `use_ensemble: bool = True` for backward compatibility - auto_dewarp response now contains a `detections` array ## Improvement 3: fully automatic endpoint - POST /sessions/{id}/run-auto with RunAutoRequest: from_step (1-6), ocr_engine, pronunciation, skip_llm_review, dewarp_method - SSE streaming for all 5+1 steps (deskew→dewarp→columns→rows→words→llm-review) - Each step: start / done / skipped / error events - Final event: {steps_run, steps_skipped} - LLM review errors are non-fatal (pipeline continues) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 13:13:20 +01:00
Benjamin Admin	2e0f8632f8	feat(klausur): implement handwriting removal + Klausur-HTR Feature 1: handwriting removal via OCR pipeline session - services/handwriting_detection.py: _detect_pencil() + target_ink parameter ("all" \| "colored" \| "pencil") for targeted ink detection - ocr_pipeline_session_store.py: clean_png + handwriting_removal_meta columns (idempotent ALTER TABLE in init_ocr_pipeline_tables) - ocr_pipeline_api.py: POST /sessions/{id}/remove-handwriting endpoint + "clean" added to valid_types for image serving Feature 2: Klausur-HTR (high-quality handwriting recognition) - handwriting_htr_api.py: new router /api/v1/htr/recognize + /recognize-session; primary: qwen2.5vl:32b via Ollama, fallback: trocr-large-handwritten - services/trocr_service.py: size parameter (base \| large) for get_trocr_model() + run_trocr_ocr() - now supports trocr-large-handwritten - main.py: HTR router registered Config: - docker-compose.yml: OLLAMA_HTR_MODEL, HTR_FALLBACK_MODEL - .env.example: HTR env vars documented Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 12:04:26 +01:00
Benjamin Admin	606bef0591	fix(ocr-pipeline): overlap-based word assignment and empty row filtering 1. Word-to-column assignment now uses overlap-based matching instead of center-point matching. This fixes narrow page_ref columns losing their last digit (e.g. "p.59" → "p.5") when the digit's center falls slightly past the midpoint boundary into the next column. 2. Post-OCR empty row filter: rows where ALL cells have empty text are removed after OCR. This catches inter-row gaps that had stray Tesseract artifacts giving word_count > 0 but no actual content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 11:00:29 +01:00
Benjamin Admin	d6a8c1d821	fix(streaming): include page_ref columns in SSE metadata The streaming word endpoint excluded page_ref from _skip_types, causing sub-column splits to be lost in the meta event and final grid_shape. Aligned _skip_types with build_cell_grid_streaming(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 07:48:07 +01:00

83 Commits
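
Commit 34c649c8be describes sending SSE keepalive events every 5s while a slow batch OCR job runs, so that a proxy read timeout does not drop the connection. The pattern can be sketched roughly as below. This is a minimal illustration, not the repository's code: `sse_event`, `stream_with_keepalive`, the `result`/`keepalive` event names, and the `is_disconnected` callback are all hypothetical names chosen for the example.

```python
import asyncio
import json
import time
from typing import AsyncIterator, Awaitable, Callable, Optional


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame (hypothetical helper)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_with_keepalive(
    job: Awaitable[dict],
    interval: float = 5.0,
    is_disconnected: Optional[Callable[[], Awaitable[bool]]] = None,
) -> AsyncIterator[str]:
    """Yield keepalive frames every `interval` seconds until `job` finishes.

    `is_disconnected` is an assumed optional async callback that reports
    whether the client has gone away; it is checked between keepalives.
    """
    task = asyncio.ensure_future(job)
    started = time.monotonic()
    try:
        while True:
            # Wait for the job, but wake up after `interval` at the latest.
            done, _ = await asyncio.wait({task}, timeout=interval)
            if done:
                yield sse_event("result", task.result())
                return
            if is_disconnected is not None and await is_disconnected():
                return  # client left; the finally block cancels the job
            elapsed = round(time.monotonic() - started, 1)
            yield sse_event("keepalive", {"elapsed": elapsed})
    finally:
        if not task.done():
            task.cancel()
```

In a web framework the generator would typically be wrapped in a streaming response with media type `text/event-stream`; the elapsed time in each keepalive frame mirrors the debugging aid mentioned in the commit message.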