Integrate SmartSpellChecker into build-grid finalization
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m45s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 40s

SmartSpellChecker now runs during grid build (not just LLM review),
so corrections are visible immediately in the grid editor.
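
As a rough illustration of what running the checker at build time does to cell data, the sketch below applies a stand-in checker to a list of cell dicts. `StubChecker`, `Correction`, and `apply_spell_fixes` are hypothetical names introduced here. Only the `correct_text(text, lang=...)` call and the `changed` / `corrected` result fields mirror what the diff uses; the real SmartSpellChecker may behave differently.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    """Hypothetical result type with the two fields the diff reads."""
    corrected: str
    changed: bool


class StubChecker:
    """Hypothetical stand-in for SmartSpellChecker; fixes one known OCR slip."""

    def correct_text(self, text: str, lang: str = "auto") -> Correction:
        fixed = text.replace("tbe", "the")
        return Correction(corrected=fixed, changed=fixed != text)


def apply_spell_fixes(cells, checker, lang="auto"):
    """Correct cell texts in place and return how many cells changed."""
    fixed_count = 0
    for cell in cells:
        text = cell.get("text", "")
        # Skip empty cells and anything that is not a content column.
        if not text.strip() or not cell.get("col_type", "").startswith("column_"):
            continue
        result = checker.correct_text(text, lang=lang)
        if result.changed:
            cell["text"] = result.corrected
            fixed_count += 1
    return fixed_count
```

Because the cell dicts are mutated before the grid is returned, the editor sees the corrected text without waiting for a later review step.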

Language detection per column:
- EN column detected via IPA signals (existing logic)
- In vocab tables (3+ columns with a detected EN column), all other
  columns are assumed German
- Every other layout, including single/two-column ones, uses auto-detection
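
The per-column rule can be sketched as a small pure function. `pick_language` is a hypothetical helper, not part of the commit; its parameters reuse the names `en_col_type` and `total_cols` from the diff, and the thresholds follow the branch conditions shown there.

```python
from typing import Optional


def pick_language(col_type: str, en_col_type: Optional[str], total_cols: int) -> str:
    """Return the language hint for one cell: "en", "de", or "auto"."""
    # Vocab table: three or more columns and an English column was detected.
    if total_cols >= 3 and en_col_type:
        return "en" if col_type == en_col_type else "de"
    # Any other layout (one or two columns, or no English column found).
    return "auto"
```

For example, `pick_language("column_2", "column_2", 3)` yields `"en"`, while the same call with `total_cols=2` yields `"auto"`.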

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-12 14:54:01 +02:00
parent 909d0729f6
commit f6372b8c69

@@ -1835,6 +1835,49 @@ async def _build_grid_core(
if fixed != text:
cell["text"] = fixed
    # --- SmartSpellChecker: language-aware OCR correction on all cells ---
    try:
        from smart_spell import SmartSpellChecker
        _ssc = SmartSpellChecker()
        spell_fix_count = 0
        # Determine language per column:
        #   en_col_type was already detected (column with IPA = English).
        #   All other content columns are assumed German for vocab tables.
        #   For single/two-column layouts, use auto-detection.
        for z in zones_data:
            zone_cols = z.get("columns", [])
            for cell in z.get("cells", []):
                text = cell.get("text", "")
                if not text or not text.strip():
                    continue
                ct = cell.get("col_type", "")
                if not ct.startswith("column_"):
                    continue
                # Determine language for this cell
                if total_cols >= 3 and en_col_type:
                    lang = "en" if ct == en_col_type else "de"
                elif total_cols <= 2:
                    lang = "auto"  # auto-detect for non-vocab layouts
                else:
                    lang = "auto"
                result = _ssc.correct_text(text, lang=lang)
                if result.changed:
                    cell["text"] = result.corrected
                    spell_fix_count += 1
        if spell_fix_count:
            logger.info(
                "build-grid session %s: SmartSpellChecker fixed %d cells",
                session_id, spell_fix_count,
            )
    except ImportError:
        logger.debug("SmartSpellChecker not available in build-grid")
    except Exception as e:
        logger.warning("SmartSpellChecker error in build-grid: %s", e)
    # --- Debug: log cell counts per column before empty-column removal ---
    for z in zones_data:
        if z.get("zone_type") == "content":