Add _clean_cell_text() with three sub-filters to remove OCR noise:
- _is_garbage_text(): vowel/consonant ratio check for phantom row garbage
- _is_noise_tail_token(): dictionary-based trailing noise detection
- _RE_REAL_WORD check for cells with no real words (just fragments)
Handles balanced parentheses "(auf)" and trailing hyphens "under-"
as legitimate tokens while stripping noise like "Es)", "3", "ee", "B".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>