breakpilot-lehrer/klausur-service
Benjamin Admin 010616be5a fix(ocr-pipeline): generic example attachment + cell padding
1. Semantic example matching: instead of attaching example sentences
   to the immediately preceding entry, find the vocab entry whose
   English word(s) appear in the example. "a broken arm" → matches
   "broken" via word overlap, not "egg/Ei". Uses stem matching for
   word form variants (break/broken share stem "bro").

2. Cell padding: add 8px padding to each cell region so words at
   column/row edges don't get clipped by OCR (fixes "er wollte"
   missing at cell boundaries).

3. Treat very short DE text (≤2 chars) as OCR noise, not real
   translation — prevents false positives in example detection.
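
A minimal sketch of how items 1 and 3 could fit together, not the code from this commit. All names (`match_example`, `has_real_translation`, `STEM_LEN`) are hypothetical, and the crude stem (the first three characters of a word) is an assumption chosen for illustration; the actual stemming rule in the pipeline may differ.

```python
import re
from typing import List, Optional

STEM_LEN = 3       # assumption: a crude stem is the first 3 characters
MIN_DE_CHARS = 3   # DE text with <= 2 chars is treated as OCR noise


def words(text: str) -> List[str]:
    """Lowercase alphabetic tokens of a string."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def stem(word: str) -> str:
    """Crude stem: the first STEM_LEN characters of the word."""
    return word[:STEM_LEN]


def has_real_translation(de_text: str) -> bool:
    """False for empty or very short DE text, which is likely OCR noise."""
    return len(de_text.strip()) >= MIN_DE_CHARS


def match_example(example: str, vocab: List[str]) -> Optional[int]:
    """Return the index of the vocab entry that best matches the example.

    An exact word overlap scores 2, a shared stem scores 1.
    Returns None when no entry overlaps at all.
    """
    example_words = words(example)
    example_stems = {stem(w) for w in example_words if len(w) >= STEM_LEN}
    best_index, best_score = None, 0
    for index, entry in enumerate(vocab):
        score = 0
        for w in words(entry):
            if w in example_words:
                score += 2
            elif len(w) >= STEM_LEN and stem(w) in example_stems:
                score += 1
        if score > best_score:
            best_index, best_score = index, score
    return best_index


vocab = ["egg", "broken", "house"]
print(match_example("a broken arm", vocab))      # index of "broken"
print(match_example("he broke his arm", vocab))  # matched via shared stem
```

With this scoring, an example attaches to the entry it shares words with, wherever that entry sits in the table. A caller would fall back to the preceding entry (or leave the example unattached) when the function returns `None`; that fallback is also an assumption.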

All fixes are generic and deterministic.
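
The cell padding from item 2 could be sketched as follows. The `Box` type and `pad_box` function are hypothetical names, and clamping to the image bounds is an assumption that the commit message does not mention.

```python
from typing import NamedTuple

CELL_PADDING_PX = 8  # padding value stated in the commit message


class Box(NamedTuple):
    """Pixel rectangle of a table cell (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def pad_box(box: Box, image_width: int, image_height: int,
            padding: int = CELL_PADDING_PX) -> Box:
    """Grow the cell region on every side, clamped to the image bounds."""
    return Box(
        max(0, box.left - padding),
        max(0, box.top - padding),
        min(image_width, box.right + padding),
        min(image_height, box.bottom + padding),
    )


# A cell in the middle of a 1000x800 page grows by 8px on each side.
print(pad_box(Box(100, 50, 200, 80), 1000, 800))
```

The padded region is what would be cropped and passed to OCR, so words that touch a column or row edge stay fully inside the crop.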

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:24:28 +01:00