breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	c8027eb7f9	Fix: preserve = ; : - and other meaningful symbols in word_boxes. Rule (a2) in Step 5i removed word_boxes with no letters/digits as "graphic OCR artifacts". This incorrectly removed = signs used as definition markers in textbooks ("film = 1. Film; 2. filmen"). Added exception list _KEEP_SYMBOLS for meaningful punctuation: = (= =) ; : - – — / + • · ( ) & * → ← ↔. Root cause: PaddleOCR returns "film = 1. Film; 2. filmen" as one block, which gets split into word_boxes ["film", "=", "1.", ...]. The "=" word_box had no alphanumeric chars and was removed as an artifact. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:18:35 +02:00
Benjamin Admin	ba0f659d1e	Preserve = and (= tokens in grid build and cell text cleanup. = signs are used as definition markers in textbooks ("film = 1. Film"). They were incorrectly removed by two filters: (1) grid_build_core.py Step 5j-pre: _PURE_JUNK_RE matched "=" as artifact noise; it now exempts =, (=, ;, :, - and similar meaningful punctuation tokens. (2) cv_ocr_engines.py _is_noise_tail_token: the "pure non-alpha" check removed trailing = tokens; it now exempts meaningful punctuation. Fixes: "film = 1. Film; 2. filmen" losing the = sign, and "(= I won and he lost.)" losing the (=. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:04:27 +02:00
Benjamin Admin	0599c72cc1	Fix IPA continuation: don't replace normal text with IPA. Text like "Betonung auf der 1. Silbe: profit ['profit]" was incorrectly detected as garbled IPA and replaced with a generated IPA transcription of the previous row's example sentence. Added guard: if the cell text contains >=3 recognizable words (3+ letter alpha tokens), it's normal text, not garbled IPA. Garbled IPA is typically short and has no real dictionary words. Fixes: Row 13 C3 showing IPA instead of pronunciation hint text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 22:28:58 +02:00
Benjamin Admin	17f0fdb2ed	Refactor: extract _build_grid_core into grid_build_core.py + clean StepAnsicht. grid_editor_api.py (2411 → 474 lines): extracted _build_grid_core() (1892 lines) into grid_build_core.py; the API file now only contains endpoints (build, save, get, gutter, box, unified). StepAnsicht.tsx (212 → 112 lines): removed useGridEditor imports (not needed for read-only spreadsheet); removed unified grid fetch/build (not used with multi-sheet approach); removed Spreadsheet/Grid toggle (only spreadsheet mode now); simple flow: fetch grid-editor data → pass to SpreadsheetView. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 08:54:55 +02:00
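
The symbol-preservation fix described in c8027eb7f9 and ba0f659d1e can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names `KEEP_SYMBOLS`, `is_graphic_artifact`, and `filter_word_boxes` are hypothetical, the symbol set is copied from the commit message, and the word_box shape (a dict with a `"text"` key) is an assumption.

```python
# Hypothetical exception list; symbols taken from the commit message of c8027eb7f9.
# The real _KEEP_SYMBOLS in the repository may differ.
KEEP_SYMBOLS = {
    "=", "(=", "=)", ";", ":", "-", "–", "—", "/", "+",
    "•", "·", "(", ")", "&", "*", "→", "←", "↔",
}


def is_graphic_artifact(text: str) -> bool:
    """Return True if a token should be dropped as OCR noise.

    A token counts as noise when it has no letters or digits,
    unless it is one of the meaningful symbols listed above.
    """
    token = text.strip()
    if not token:
        return True
    if token in KEEP_SYMBOLS:
        return False
    return not any(ch.isalnum() for ch in token)


def filter_word_boxes(word_boxes):
    """Keep word boxes whose text is not classified as an artifact.

    Assumes each word box is a dict with a "text" key (illustrative only).
    """
    return [wb for wb in word_boxes if not is_graphic_artifact(wb["text"])]


if __name__ == "__main__":
    boxes = [{"text": t} for t in ["film", "=", "1.", "Film;", "2.", "filmen", "~~", "|"]]
    print([wb["text"] for wb in filter_word_boxes(boxes)])
```

Checking the exception set before the alphanumeric test means a lone "=" survives, while tokens such as "~~" or "|" are still discarded.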

4 Commits
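
The guard from 0599c72cc1 (treat a cell as normal text when it contains at least three recognizable words) might look something like the sketch below. The function name `looks_like_normal_text` and the regular expression are assumptions made for illustration. In particular, "recognizable word" is approximated here as a run of three or more basic Latin or German letters, so that IPA symbols such as ɒ or ɪ break a run. The actual check in the repository may be defined differently.

```python
import re

# Assumption: a "recognizable word" is a run of 3+ basic Latin/German letters.
# IPA-specific characters are deliberately excluded so they interrupt a run.
_WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]{3,}")


def looks_like_normal_text(cell_text: str, min_words: int = 3) -> bool:
    """Return True if the cell has enough ordinary words to be normal text."""
    return len(_WORD_RE.findall(cell_text)) >= min_words


if __name__ == "__main__":
    # Pronunciation hint from the commit message: should not be replaced with IPA.
    print(looks_like_normal_text("Betonung auf der 1. Silbe: profit ['profit]"))
    # Short IPA-like fragment: too few ordinary words, so still an IPA candidate.
    print(looks_like_normal_text("[ˈprɒfɪt]"))
```

A cell for which this returns True would skip the IPA-continuation replacement, matching the behaviour the commit describes for Row 13 C3.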