Benjamin Admin
cd13eca290
fix: IPA-Einfuegung fuer column_text mit word_boxes Synchronisation
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 32s
CI / test-python-klausur (push) Failing after 2m9s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 20s
Fuer column_text werden fehlende IPA-Lautschriften (challenge, profit,
film, badge) wieder eingefuegt, aber gleichzeitig eine synthetische
word_box erzeugt, damit die 1:1 Token-zu-Box Zuordnung im Overlay
erhalten bleibt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 23:15:26 +01:00
Benjamin Admin
aa7db43f02
fix: column_text nur garbled IPA ersetzen, keine Einfuegung/Entfernung
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m8s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 21s
Fuer column_text (Full-Page Overlay mit gemischtem EN+DE Text):
- Kein IPA einfuegen (wuerde Token-Count aendern, Overlay-Positionen brechen)
- Keine orphan brackets entfernen (sind oft deutsche Bedeutungen wie (probieren))
- Nur garbled IPA ersetzen (z.B. [teıst] -> [tˈeɪst])
column_en behaelt volle Verarbeitung (replace + strip + insert).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 23:05:37 +01:00
Benjamin Admin
4afd5bd8e8
fix: Klammerwörter wie (probieren), (Profit) nicht mehr als garbled IPA entfernen
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 23s
CI / test-nodejs-website (push) Successful in 27s
_strip_orphan_bracket entfernte deutsche Bedeutungsangaben in Klammern,
weil sie weder als Grammar-Partikel noch als IPA erkannt wurden.
Fix: Klammerinhalte mit echten Wörtern (>=4 Buchstaben) werden behalten.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 22:47:01 +01:00
Benjamin Admin
2f51ac617f
feat: IPA-Lautschrift in Cell-Texte einfuegen (fuer Overlay-Modus)
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m5s
CI / test-python-agent-core (push) Successful in 23s
CI / test-nodejs-website (push) Successful in 22s
fix_cell_phonetics() ersetzt fehlerhafte IPA-Klammern UND fuegt fehlende
Lautschrift fuer englische Woerter ein (z.B. badge, film, challenge, profit).
Wird auf alle Zellen mit col_type column_en/column_text angewandt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 15:47:26 +01:00
Benjamin Admin
23b7840ea7
feat: Full-Row OCR mit Spacing fuer Box-Sub-Sessions
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m16s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 22s
Sub-Sessions ueberspringen Spaltenerkennung und nutzen stattdessen eine
Pseudo-Spalte ueber die volle Breite. Text wird mit proportionalem
Spacing aus Wort-Positionen rekonstruiert, um raeumliches Layout zu erhalten.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-10 08:28:29 +01:00
Benjamin Admin
cf9dde9876
fix: _group_words_into_lines nach cv_ocr_engines.py verschieben
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 21s
Funktion war nur in cv_review.py definiert, wurde aber auch in
cv_ocr_engines.py und cv_layout.py benutzt — NameError zur Laufzeit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-09 15:24:56 +01:00
Benjamin Admin
9a5a35bff1
refactor: cv_vocab_pipeline.py in 6 Module aufteilen (8163 → 6 + Fassade)
...
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
Monolithische 8163-Zeilen-Datei aufgeteilt in fokussierte Module:
- cv_vocab_types.py (156 Z.): Dataklassen, Konstanten, IPA, Feature-Flags
- cv_preprocessing.py (1166 Z.): Bild-I/O, Orientierung, Deskew, Dewarp
- cv_layout.py (3036 Z.): Dokumenttyp, Spalten, Zeilen, Klassifikation
- cv_ocr_engines.py (1282 Z.): OCR-Engines, Vocab-Postprocessing, Text-Cleaning
- cv_cell_grid.py (1510 Z.): Cell-Grid v2+Legacy, Vocab-Konvertierung
- cv_review.py (1184 Z.): LLM/Spell Review, Pipeline-Orchestrierung
cv_vocab_pipeline.py ist jetzt eine Re-Export-Fassade (35 Z.) —
alle bestehenden Imports bleiben unveraendert.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-08 23:46:47 +01:00