Only apply IPA correction on vocabulary tables (≥3 columns)

Single-column German text pages were getting IPA inserted for words that happen to exist in the English dictionary ("die" → [dˈaɪ], "Das" → [dɑs]). Now IPA correction only runs when the grid has ≥3 columns, which is the minimum for a vocabulary table layout (English | article | German).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 11:50:03 +01:00
parent b98ea33a3a
commit 821e5481c2
1 changed file with 28 additions and 27 deletions
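
The column gate described in the commit message can be sketched in isolation. This is a minimal illustration, not the code from build_grid itself: the helper names count_columns and should_apply_ipa, the constant MIN_VOCAB_COLUMNS, and the sample pages are assumptions made for the example. Only the zones_data shape (a list of zones, each with a "columns" list) and the summing expression are taken from the diff.

```python
from typing import Any, Dict, List

# Threshold from the commit message: EN | article | DE.
# The constant name is hypothetical; the diff hard-codes 3.
MIN_VOCAB_COLUMNS = 3


def count_columns(zones_data: List[Dict[str, Any]]) -> int:
    """Sum the column counts of all zones, using the same expression as the diff."""
    return sum(len(z.get("columns", [])) for z in zones_data)


def should_apply_ipa(zones_data: List[Dict[str, Any]],
                     min_cols: int = MIN_VOCAB_COLUMNS) -> bool:
    """Return True when the grid is wide enough to be treated as a vocab table."""
    return count_columns(zones_data) >= min_cols


# Sample pages (made up for illustration; column dicts are left empty
# because only their count matters for the gate).
single_column_page = [{"columns": [{}], "cells": [{"text": "die"}]}]
vocab_table = [{"columns": [{}, {}, {}], "cells": []}]
two_narrow_zones = [{"columns": [{}, {}]}, {"columns": [{}, {}]}]

print(should_apply_ipa(single_column_page))
print(should_apply_ipa(vocab_table))
print(should_apply_ipa(two_narrow_zones))
```

Because the count is summed over all zones, a page made of several narrow zones (the third sample) also reaches the threshold. Whether that is intended is not stated in the commit message.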
@@ -1165,10 +1165,11 @@ async def build_grid(session_id: str):

    # 5c. IPA phonetic correction — replace garbled OCR phonetics with
    # correct IPA from the dictionary (same as in the OCR pipeline).
-    # The grid uses generic col_types (column_1, column_2, ...) but
-    # fix_cell_phonetics expects column_en / column_text.  Identify
-    # the English headword column (longest average text) and mark it.
+    # Only applies to vocabulary tables (≥3 columns: EN | article | DE).
+    # Single/two-column layouts are continuous text, not vocab tables.
    all_cells = [cell for z in zones_data for cell in z.get("cells", [])]
+    total_cols = sum(len(z.get("columns", [])) for z in zones_data)
+    if total_cols >= 3:
        # Find which col_type has the longest average text → English headwords
        col_avg_len: Dict[str, List[int]] = {}
        for cell in all_cells: