Three bugs in the post-processing pipeline were overwriting correct
streaming results with wrong ones:
1. _split_comma_entries was splitting "Maus, Mäuse" into two separate
entries. Disabled, since word forms belong together.
2. _attach_example_sentences treated "Ei" (2 chars) as OCR noise due
to the `len(de) > 2` threshold. Lowered to `len(de) > 1`.
3. _attach_example_sentences wrongly classified rows with EN text but
no DE (like "stand ...") as example sentences, merging them into
the previous entry. Now only treats rows as examples if they also
have no text in the example column.
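
A minimal sketch of the checks behind fixes 2 and 3, for illustration
only. The helper names `is_real_word` and `is_example_row` and the
`de`/`en`/`example` string arguments are assumptions, not the actual
code inside _attach_example_sentences:

```python
def is_real_word(de: str) -> bool:
    # Hypothetical helper: with the lowered threshold, two-character
    # words such as "Ei" are no longer discarded as OCR noise.
    return len(de) > 1


def is_example_row(de: str, en: str, example: str) -> bool:
    # Hypothetical helper: a row counts as an example sentence only if
    # it has EN text, no DE text, and nothing in the example column.
    return bool(en) and not de and not example


print(is_real_word("Ei"))                                  # True
print(is_example_row("", "stand ...", "some example"))     # False
```

Under these assumptions, a row like "stand ..." that carries its own
example text stays a separate entry instead of being merged into the
previous one.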

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>