Integrate SmartSpellChecker into build-grid finalization
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m45s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 40s
SmartSpellChecker now runs during grid build (not just LLM review), so
corrections are visible immediately in the grid editor.

Language detection per column:
- EN column detected via IPA signals (existing logic)
- All other columns assumed German for vocab tables
- Auto-detection for single/two-column layouts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
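The per-column language rules listed in the commit message can be sketched as a small standalone function. This is only an illustration: `pick_language` is a hypothetical helper name that does not appear in the commit, and the arguments mirror the `ct`, `total_cols`, and `en_col_type` variables used in the diff.

```python
from typing import Optional


def pick_language(col_type: str, total_cols: int, en_col_type: Optional[str]) -> str:
    """Return "en", "de", or "auto" for one cell (hypothetical helper)."""
    # Vocab-table layout: three or more columns and a detected English column.
    if total_cols >= 3 and en_col_type:
        return "en" if col_type == en_col_type else "de"
    # One- or two-column layouts, or no English column found:
    # leave the decision to the checker's auto-detection.
    return "auto"
```

For example, with three columns and `column_2` detected as English, `pick_language("column_2", 3, "column_2")` selects English, and any other content column in that table is treated as German.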
@@ -1835,6 +1835,49 @@ async def _build_grid_core(
             if fixed != text:
                 cell["text"] = fixed

+    # --- SmartSpellChecker: language-aware OCR correction on all cells ---
+    try:
+        from smart_spell import SmartSpellChecker
+        _ssc = SmartSpellChecker()
+        spell_fix_count = 0
+
+        # Determine language per column:
+        # en_col_type was already detected (column with IPA = English).
+        # All other content columns are assumed German for vocab tables.
+        # For single/two-column layouts, use auto-detection.
+        for z in zones_data:
+            zone_cols = z.get("columns", [])
+            for cell in z.get("cells", []):
+                text = cell.get("text", "")
+                if not text or not text.strip():
+                    continue
+                ct = cell.get("col_type", "")
+                if not ct.startswith("column_"):
+                    continue
+
+                # Determine language for this cell
+                if total_cols >= 3 and en_col_type:
+                    lang = "en" if ct == en_col_type else "de"
+                elif total_cols <= 2:
+                    lang = "auto"  # auto-detect for non-vocab layouts
+                else:
+                    lang = "auto"
+
+                result = _ssc.correct_text(text, lang=lang)
+                if result.changed:
+                    cell["text"] = result.corrected
+                    spell_fix_count += 1
+
+        if spell_fix_count:
+            logger.info(
+                "build-grid session %s: SmartSpellChecker fixed %d cells",
+                session_id, spell_fix_count,
+            )
+    except ImportError:
+        logger.debug("SmartSpellChecker not available in build-grid")
+    except Exception as e:
+        logger.warning("SmartSpellChecker error in build-grid: %s", e)
+
     # --- Debug: log cell counts per column before empty-column removal ---
     for z in zones_data:
         if z.get("zone_type") == "content":
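The correction loop in this hunk relies only on `correct_text(text, lang=...)` returning an object with `changed` and `corrected` attributes. A minimal sketch of that contract, runnable without the real `smart_spell` module, might look like the following. `StubSpellChecker`, `CorrectionResult`, and `correct_cells` are hypothetical names introduced for illustration, and the hard-coded fixes stand in for whatever the real checker does.

```python
from dataclasses import dataclass


@dataclass
class CorrectionResult:
    corrected: str
    changed: bool


class StubSpellChecker:
    """Hypothetical stand-in for SmartSpellChecker with a few hard-coded fixes."""

    _FIXES = {"teh": "the", "Hau5": "Haus"}

    def correct_text(self, text: str, lang: str = "auto") -> CorrectionResult:
        # Replace known OCR slips word by word; lang is accepted but unused here.
        corrected = " ".join(self._FIXES.get(w, w) for w in text.split(" "))
        return CorrectionResult(corrected=corrected, changed=corrected != text)


def correct_cells(zones, checker, lang="auto"):
    """Correct content cells in place and return how many cells changed."""
    fixed = 0
    for zone in zones:
        for cell in zone.get("cells", []):
            text = cell.get("text", "")
            # Skip empty cells and anything that is not a content column.
            if not text.strip() or not cell.get("col_type", "").startswith("column_"):
                continue
            result = checker.correct_text(text, lang=lang)
            if result.changed:
                cell["text"] = result.corrected
                fixed += 1
    return fixed


zones = [{"cells": [
    {"col_type": "column_1", "text": "teh house"},
    {"col_type": "column_2", "text": "das Hau5"},
    {"col_type": "header", "text": "teh"},
    {"col_type": "column_1", "text": "  "},
]}]
fixed_count = correct_cells(zones, StubSpellChecker())
```

As in the diff, cells outside `column_*` types and whitespace-only cells are left untouched, and the counter reflects cells changed rather than words changed.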