fix(ocr-pipeline): preserve sub-column data in vocab table display

Three fixes for sub-columns disappearing from the vocab table when
streaming ends:

1. Backend: add column_marker mapping in _cells_to_vocab_entries()
   so marker text is included in vocab entries (not silently dropped)

2. Frontend types: add source_page and bbox_ref to WordEntry interface

3. Frontend table: show page_ref column (Seite) in vocab table when
   entries have source_page data, instead of only EN/DE/Example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-03 08:06:15 +01:00
parent 0d72f2c836
commit dea3349b23
3 changed files with 54 additions and 37 deletions


@@ -4215,12 +4215,14 @@ def _cells_to_vocab_entries(
         'column_de': 'german',
         'column_example': 'example',
         'page_ref': 'source_page',
+        'column_marker': 'marker',
     }
     bbox_key_map = {
         'column_en': 'bbox_en',
         'column_de': 'bbox_de',
         'column_example': 'bbox_ex',
         'page_ref': 'bbox_ref',
+        'column_marker': 'bbox_marker',
     }
     # Group cells by row_index
@@ -4238,12 +4240,14 @@ def _cells_to_vocab_entries(
         'german': '',
         'example': '',
         'source_page': '',
+        'marker': '',
         'confidence': 0.0,
         'bbox': None,
         'bbox_en': None,
         'bbox_de': None,
         'bbox_ex': None,
         'bbox_ref': None,
+        'bbox_marker': None,
         'ocr_engine': row_cells[0].get('ocr_engine', '') if row_cells else '',
     }
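
To illustrate why the missing map entries mattered, here is a minimal sketch of how two key maps like the ones above could be applied to OCR cells. It is not the actual body of _cells_to_vocab_entries(). The cell shape (row_index, col_type, text, bbox), the 'column_en': 'english' entry (the hunk starts below that line) and both function names are assumptions made for illustration.

```python
from collections import defaultdict

# Maps as shown in the diff; 'column_en': 'english' is assumed because
# the hunk begins after that line.
KEY_MAP = {
    'column_en': 'english',
    'column_de': 'german',
    'column_example': 'example',
    'page_ref': 'source_page',
    'column_marker': 'marker',
}
BBOX_KEY_MAP = {
    'column_en': 'bbox_en',
    'column_de': 'bbox_de',
    'column_example': 'bbox_ex',
    'page_ref': 'bbox_ref',
    'column_marker': 'bbox_marker',
}


def cells_to_entries_sketch(cells):
    """Group hypothetical cell dicts by row and map them onto entry fields."""
    rows = defaultdict(list)
    for cell in cells:
        rows[cell['row_index']].append(cell)

    entries = []
    for row_index in sorted(rows):
        # Start every entry with empty text fields and no bounding boxes.
        entry = {field: '' for field in KEY_MAP.values()}
        entry.update({bbox_field: None for bbox_field in BBOX_KEY_MAP.values()})
        for cell in rows[row_index]:
            col_type = cell.get('col_type')
            field = KEY_MAP.get(col_type)
            if field is None:
                # A column type without a map entry is skipped here, which is
                # presumably how marker text got lost before this change.
                continue
            entry[field] = cell.get('text', '')
            entry[BBOX_KEY_MAP[col_type]] = cell.get('bbox')
        entries.append(entry)
    return entries


def has_page_refs(entries):
    """True if any entry carries source_page data (the Seite column case)."""
    return any(entry.get('source_page') for entry in entries)
```

With this shape, a cell typed column_marker ends up in entry['marker'] and entry['bbox_marker'], and a check like has_page_refs() mirrors the frontend decision to show the Seite column only when at least one entry has source_page data.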