Lower word-split threshold from 7 to 4 chars

Short merged words like "anew" (a new), "Imadea" (I made a), "makeadecision" (make a decision) were missed because the split threshold was too high. The splitter now processes tokens >= 4 chars. English single-letter words (a, I) are already handled by the DP algorithm, which allows them as valid split points.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
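
For illustration, the length gate and a DP-style split as described in the message might look roughly like the sketch below. This is not the code behind `_try_split_merged_word`, which is not shown in this commit. The names `VOCAB`, `MIN_SPLIT_LEN`, `sketch_split_merged_word` and `split_merged_tokens` are hypothetical, the tiny vocabulary is a stand-in for a real word list, and preferring the split with the fewest pieces is an assumption.

```python
from typing import Optional

# Hypothetical stand-in vocabulary; a real word list would be far larger.
# Single-letter words "a" and "i" are included so they can act as split points.
VOCAB = {"a", "i", "new", "made", "make", "decision"}

# Minimum token length considered for splitting (the value set in this commit).
MIN_SPLIT_LEN = 4


def sketch_split_merged_word(word: str, vocab=VOCAB) -> Optional[str]:
    """Split `word` into the fewest vocabulary words, or return None."""
    lower = word.lower()
    if lower in vocab:
        return None  # already a known word, nothing to split
    n = len(lower)
    # best[i] = (piece count, start index of last piece) for the prefix lower[:i]
    best = [None] * (n + 1)
    best[0] = (0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None or lower[start:end] not in vocab:
                continue
            candidate = (best[start][0] + 1, start)
            if best[end] is None or candidate[0] < best[end][0]:
                best[end] = candidate
    if best[n] is None:
        return None  # no full segmentation into known words
    # Walk back through the table, slicing the original word to keep its casing.
    pieces = []
    end = n
    while end > 0:
        start = best[end][1]
        pieces.append(word[start:end])
        end = start
    return " ".join(reversed(pieces))


def split_merged_tokens(text: str) -> str:
    """Apply the length gate from the diff to each whitespace-separated token."""
    parts = []
    for token in text.split():
        clean = token.rstrip(".,!?;:'\")")
        suffix = token[len(clean):]
        split = None
        if len(clean) >= MIN_SPLIT_LEN and clean.isalpha():
            split = sketch_split_merged_word(clean)
        parts.append(split + suffix if split else token)
    return " ".join(parts)
```

With this stand-in vocabulary, `split_merged_tokens("Imadea decision.")` yields `"I made a decision."`. Under the old `> 7` gate, the 6-character token `Imadea` would never have reached the splitter.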
2026-04-12 08:59:02 +02:00
parent 656cadbb1e
commit 7ffa4c90f9
3 changed files with 24 additions and 5 deletions
--- a/klausur-service/backend/grid_editor_api.py
+++ b/klausur-service/backend/grid_editor_api.py
@@ -1751,10 +1751,10 @@ async def _build_grid_core(
                    parts = []
                    changed = False
                    for token in text.split():
-                        # Only try splitting pure-alpha tokens > 7 chars
+                        # Try splitting pure-alpha tokens >= 4 chars
                        clean = token.rstrip(".,!?;:'\")")
                        suffix = token[len(clean):]
-                        if len(clean) > 7 and clean.isalpha():
+                        if len(clean) >= 4 and clean.isalpha():
                            split = _try_split_merged_word(clean)
                            if split:
                                parts.append(split + suffix)