Add 4 post-processing steps after OCR (no LLM needed): 1. Character confusion fix: I/1/l/| correction using cross-language context (if DE has "Ich", EN "1" → "I") 2. IPA dictionary replacement: detect [phonetics] brackets, look up correct IPA from eng_to_ipa (MIT, 134k words) — replaces OCR'd phonetic symbols with dictionary-correct transcription 3. Comma-split: "break, broke, broken" / "brechen, brach, gebrochen" → 3 individual entries when part counts match 4. Example sentence attachment: rows with EN but no DE translation get attached as examples to the preceding vocab entry All fixes are deterministic and generic — no hardcoded word lists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
120 KiB
120 KiB