feat: OCR umlaut confusion correction + bold detection via stroke-width
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 18s
- Add umlaut confusion rules (i→ü, a→ä, o→ö, u→ü) to _spell_fix_token for German text; fixes "iberqueren" → "überqueren" etc.
- Add _detect_bold() using OpenCV stroke-width analysis on cell crops
- Integrate bold detection in both narrow (cell-crop) and broad (word-lookup) paths
- Add is_bold field to GridCell TypeScript interface
- Render bold text in StepGroundTruth reconstruction view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
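
The umlaut confusion rules listed above could be applied roughly as in the following sketch. It is not the actual body of _spell_fix_token: the helper names `umlaut_candidates` and `fix_umlaut_confusion`, the `vocabulary` argument, and the "accept only a single unambiguous match" policy are assumptions made for illustration.

```python
# Hypothetical sketch: the names and acceptance policy are illustrative, not from the commit.
# Confusion pairs taken from the commit message: OCR reads the umlaut as the plain letter.
UMLAUT_CONFUSIONS = {"i": "ü", "a": "ä", "o": "ö", "u": "ü"}


def umlaut_candidates(token):
    """Yield variants of token with exactly one confusable letter replaced by an umlaut."""
    for idx, ch in enumerate(token):
        repl = UMLAUT_CONFUSIONS.get(ch.lower())
        if repl is None:
            continue
        if ch.isupper():
            repl = repl.upper()  # keep capitalisation, e.g. "I" -> "Ü"
        yield token[:idx] + repl + token[idx + 1:]


def fix_umlaut_confusion(token, vocabulary):
    """Return the corrected token, or the token unchanged if no safe fix exists.

    vocabulary is assumed to be a set of lowercase known German words.
    """
    if token.lower() in vocabulary:
        return token  # already a known word, do not touch it
    matches = {c for c in umlaut_candidates(token) if c.lower() in vocabulary}
    if len(matches) == 1:
        return matches.pop()  # exactly one candidate is a known word
    return token  # no match or ambiguous: leave the OCR output as-is
```

With a vocabulary containing "überqueren", the token "iberqueren" from the commit message is corrected, while tokens that are already valid words are left alone.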
@@ -389,6 +389,7 @@ export function StepGroundTruth({ sessionId, onNext }: StepGroundTruthProps) {
 height: `${cell.bbox_pct.h}%`,
 color: '#1a1a1a',
 fontSize: `${fontSize}px`,
+fontWeight: cell.is_bold ? 'bold' : 'normal',
 fontFamily: "'Liberation Sans', 'DejaVu Sans', Arial, sans-serif",
 display: 'flex',
 alignItems: 'center',
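
The is_bold flag rendered in this hunk comes from stroke-width analysis on the backend. The commit does this with OpenCV in _detect_bold(); the sketch below swaps in a dependency-free run-length approximation to show the idea only. The function names, the binarized-rows input format, and the 0.15 threshold are all assumptions, not values from the commit.

```python
# Hypothetical sketch: a pure-Python stand-in for stroke-width analysis.
# The real _detect_bold() uses OpenCV; names and threshold here are illustrative.
from statistics import median


def median_stroke_width(binary_rows):
    """Median length of horizontal ink runs in a binarized crop (1 = ink, 0 = background)."""
    runs = []
    for row in binary_rows:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                runs.append(run)  # an ink run just ended
                run = 0
        if run:
            runs.append(run)  # run that reaches the right edge
    return median(runs) if runs else 0.0


def looks_bold(binary_rows, ratio_threshold=0.15):
    """Flag a crop as bold when strokes are thick relative to the crop height."""
    height = len(binary_rows)
    if height == 0:
        return False
    return median_stroke_width(binary_rows) / height >= ratio_threshold
```

Normalizing by crop height keeps the check independent of font size; the median is used so that a few long horizontal strokes do not dominate the estimate.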