feat: apply IPA phonetic correction in build-grid combo mode

fix_cell_phonetics was called only in the OCR pipeline endpoints
(/words, /cells), not in combo mode (build-grid / ocr-overlay).
Garbled IPA such as [teist] is now corrected to [teɪst] using the
IPA dictionary, the same way as in the pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-17 12:53:58 +01:00
parent b0e1fbc8d6
commit ab30e8b17a

@@ -21,6 +21,7 @@ from fastapi import APIRouter, HTTPException, Request
 from cv_box_detect import detect_boxes, split_page_into_zones
 from cv_color_detect import detect_word_colors, recover_colored_text
+from cv_ocr_engines import fix_cell_phonetics
 from cv_words_first import _cluster_rows, _build_cells
 from ocr_pipeline_session_store import (
     get_session_db,
@@ -970,6 +971,11 @@ async def build_grid(session_id: str):
                 if ")" in text and "(" not in text:
                     cell["text"] = "(" + text
+    # 5c. IPA phonetic correction — replace garbled OCR phonetics with
+    # correct IPA from the dictionary (same as in the OCR pipeline).
+    all_cells = [cell for z in zones_data for cell in z.get("cells", [])]
+    fix_cell_phonetics(all_cells, pronunciation="british")
     duration = time.time() - t0
     # 6. Build result
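
The new step discards the return value of fix_cell_phonetics, which suggests the function edits the cell dicts in place. Because the flattened all_cells list holds references to the same dicts as zones_data, such an edit would carry through to the grid result. Below is a minimal sketch of that pattern. The names fix_cell_phonetics_sketch, IPA_DICT, and the regex are hypothetical stand-ins, not the project's implementation, and the pronunciation parameter is left out.

```python
import re

# Hypothetical stand-in for the project's IPA dictionary (assumption).
IPA_DICT = {"taste": "teɪst", "water": "ˈwɔːtə"}

# Matches "<word> [<phonetics>]", e.g. "taste [teist]".
_PHONETIC_RE = re.compile(r"\b([A-Za-z]+)\s*\[([^\]]*)\]")


def fix_cell_phonetics_sketch(cells):
    """Replace bracketed phonetics in place when the headword is known.

    Illustrative only: the real fix_cell_phonetics is not shown in the diff.
    """
    def _replace(match):
        word = match.group(1)
        ipa = IPA_DICT.get(word.lower())
        if ipa is None:
            return match.group(0)  # unknown word: keep the OCR text as is
        return f"{word} [{ipa}]"

    for cell in cells:
        text = cell.get("text")
        if isinstance(text, str):
            cell["text"] = _PHONETIC_RE.sub(_replace, text)


zones_data = [
    {"cells": [{"text": "taste [teist]"}, {"text": "Geschmack"}]},
    {"cells": [{"text": "water [woter]"}]},
    {},  # a zone without a "cells" key contributes nothing
]

# Same flattening as in the diff: the list shares the dict objects
# with zones_data, so in-place edits are visible through both.
all_cells = [cell for z in zones_data for cell in z.get("cells", [])]
fix_cell_phonetics_sketch(all_cells)

print(zones_data[0]["cells"][0]["text"])
```

If the real function returned new cell objects instead of mutating the existing ones, the call in the diff would have no effect on zones_data, so in-place mutation is the assumption this sketch rests on.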