e3f939a6282f5dd5e53cd66ffbd02dd295e83793
Three non-generic solutions replaced with universal heuristics:

1. Cell-OCR fallback: instead of restricting to column_en/column_de, now checks pixel density (>2% dark pixels) for ANY column type. Truly empty cells are skipped without running Tesseract.
2. Example-sentence detection: instead of checking for example-column text (worksheet-specific), now uses sentence heuristics (>=4 words or ends with sentence punctuation). Short EN text without DE is kept as a vocab entry (OCR may have missed the translation).
3. Comma-split: re-enabled with singular/plural detection. Pairs like "mouse, mice" / "Maus, Mäuse" are kept together. Verb forms like "break, broke, broken" are still split into individual entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
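The three heuristics above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the function names, the grayscale cutoff of 128, and the input shape (a 2D list of 0-255 grayscale values) are all assumptions. In particular, the commit does not say how singular/plural pairs are detected, so the sketch substitutes a crude stand-in rule (exactly two parts sharing a first letter).

```python
# Hypothetical sketch of the three heuristics; names and thresholds other
# than the 2% density and the 4-word limit are assumptions, not from the commit.

DARK_THRESHOLD = 128   # assumed: grayscale values below this count as "dark"
MIN_DARK_RATIO = 0.02  # from the commit message: more than 2% dark pixels
SENTENCE_END = (".", "!", "?")


def cell_has_ink(pixels, dark_threshold=DARK_THRESHOLD, min_ratio=MIN_DARK_RATIO):
    """Return True if a cell is dense enough to be worth sending to OCR.

    `pixels` is a 2D list of grayscale values (0 = black, 255 = white).
    """
    total = sum(len(row) for row in pixels)
    if total == 0:
        return False
    dark = sum(1 for row in pixels for value in row if value < dark_threshold)
    return dark / total > min_ratio


def looks_like_sentence(text):
    """Return True for text with >= 4 words or trailing sentence punctuation."""
    stripped = text.strip()
    if not stripped:
        return False
    return len(stripped.split()) >= 4 or stripped.endswith(SENTENCE_END)


def split_comma_entry(text):
    """Split a comma-separated entry, keeping likely singular/plural pairs.

    Stand-in rule (an assumption): exactly two parts that share a first
    letter are treated as a pair and kept together; anything else is split.
    """
    parts = [part.strip() for part in text.split(",") if part.strip()]
    if len(parts) == 2 and parts[0][0].lower() == parts[1][0].lower():
        return [", ".join(parts)]
    return parts
```

With this stand-in rule, split_comma_entry("mouse, mice") keeps the pair as one entry, while split_comma_entry("break, broke, broken") yields three separate entries. A two-part entry of unrelated words that happen to share a first letter would be kept together incorrectly, so a real implementation would need a stronger plural check.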