Benjamin Admin
c3a924a620
fix(ocr-pipeline): merge phonetic-only rows and fix bracket noise filter
Two fixes:
1. Tokens ending with ] (e.g. "serva]") were stripped by the noise
filter because ] was not in the allowed punctuation list.
2. Rows containing only phonetic transcription (e.g. ['mani serva])
are now merged into the previous vocab entry instead of creating
a separate (invalid) entry. This prevents the LLM from trying
to "correct" phonetic fragments.
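
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the names ALLOWED_PUNCT, is_noise, filter_noise, PHONETIC_ONLY and merge_phonetic_rows are hypothetical, and it assumes rows are plain strings and that a "phonetic-only" row is one consisting solely of a bracketed transcription.

```python
import re

# Hypothetical allow-list; "[" and "]" are included so that phonetic
# brackets such as the one in "serva]" no longer mark a token as noise.
ALLOWED_PUNCT = set(".,;:!?'\"()-[]")


def is_noise(token: str) -> bool:
    """Return True if the token has no letters/digits or contains
    a character that is neither alphanumeric nor allowed punctuation."""
    if not any(ch.isalnum() for ch in token):
        return True
    return any(not (ch.isalnum() or ch in ALLOWED_PUNCT) for ch in token)


def filter_noise(tokens):
    """Keep only the tokens that are not classified as noise."""
    return [t for t in tokens if not is_noise(t)]


# Assumed definition: a row that is nothing but one bracketed transcription.
PHONETIC_ONLY = re.compile(r"^\s*\[[^\[\]]*\]\s*$")


def merge_phonetic_rows(rows):
    """Append phonetic-only rows to the previous entry instead of
    emitting them as separate entries."""
    merged = []
    for row in rows:
        text = row.strip()
        if merged and PHONETIC_ONLY.match(text):
            # Attach the transcription to the preceding vocab entry.
            merged[-1] = f"{merged[-1]} {text}"
        else:
            merged.append(text)
    return merged
```

For example, under these assumptions filter_noise(["serva]", "~~"]) keeps "serva]" and drops "~~", and merge_phonetic_rows(["manservant", "['mani serva]"]) yields a single entry with the transcription attached.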
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 14:14:20 +01:00