6bca3370e0deabe916f1fca2a191850d0d07b215
Three bugs in the post-processing pipeline were overwriting correct streaming results with wrong ones: 1. _split_comma_entries was splitting "Maus, Mäuse" into two separate entries. Disabled — word forms belong together. 2. _attach_example_sentences treated "Ei" (2 chars) as OCR noise due to `len(de) > 2` threshold. Lowered to `len(de) > 1`. 3. _attach_example_sentences wrongly classified rows with EN text but no DE (like "stand ...") as example sentences, merging them into the previous entry. Now only treats rows as examples if they also have no text in the example column. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Description
No description provided
Languages
TypeScript
60.2%
Python
32.9%
Go
5.5%
C#
0.8%
CSS
0.2%
Other
0.3%