Add Vision-LLM OCR Fusion (Step 4) for degraded scans
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 27s
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 27s
New module vision_ocr_fusion.py: Sends scan image + OCR word coordinates + document type to Qwen2.5-VL 32B. The LLM reads the image visually while using OCR positions as structural hints. Key features: - Document-type-aware prompts (Vokabelseite, Woerterbuch, etc.) - OCR words grouped into lines with x/y coordinates in prompt - Low-confidence words marked with (?) for LLM attention - Continuation row merging instructions in prompt - JSON response parsing with markdown code block handling - Fallback to original OCR on any error Frontend (admin-lehrer Grid Review): - "Vision-LLM" checkbox toggle - "Typ" dropdown (Vokabelseite, Woerterbuch, etc.) - Steps 1-3 defaults set to inactive Activate: Check "Vision-LLM", select document type, click "OCR neu + Grid". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -67,6 +67,10 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
|
||||
setOcrMaxCols,
|
||||
ocrMinConf,
|
||||
setOcrMinConf,
|
||||
visionFusion,
|
||||
setVisionFusion,
|
||||
documentCategory,
|
||||
setDocumentCategory,
|
||||
rerunOcr,
|
||||
} = useGridEditor(sessionId)
|
||||
|
||||
@@ -291,6 +295,22 @@ export function StepGridReview({ sessionId, onNext, saveRef }: StepGridReviewPro
|
||||
</select>
|
||||
</label>
|
||||
|
||||
<span className="text-gray-400 dark:text-gray-500">|</span>
|
||||
<label className="flex items-center gap-1 cursor-pointer" title="Step 4: Vision-LLM Fusion — Qwen2.5-VL korrigiert OCR anhand des Bildes">
|
||||
<input type="checkbox" checked={visionFusion} onChange={(e) => setVisionFusion(e.target.checked)} className="rounded w-3 h-3 accent-orange-500" />
|
||||
<span className={`${visionFusion ? 'text-orange-500 dark:text-orange-400 font-medium' : 'text-gray-500 dark:text-gray-400'}`}>Vision-LLM</span>
|
||||
</label>
|
||||
<label className="flex items-center gap-1" title="Dokumenttyp fuer Vision-LLM Prompt">
|
||||
<span className="text-gray-500 dark:text-gray-400">Typ:</span>
|
||||
<select value={documentCategory} onChange={(e) => setDocumentCategory(e.target.value)} className="px-1 py-0.5 text-xs rounded border border-gray-200 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-300">
|
||||
<option value="vokabelseite">Vokabelseite</option>
|
||||
<option value="woerterbuch">Woerterbuch</option>
|
||||
<option value="arbeitsblatt">Arbeitsblatt</option>
|
||||
<option value="buchseite">Buchseite</option>
|
||||
<option value="sonstiges">Sonstiges</option>
|
||||
</select>
|
||||
</label>
|
||||
|
||||
<div className="ml-auto flex items-center gap-2">
|
||||
<button
|
||||
onClick={() => {
|
||||
|
||||
Reference in New Issue
Block a user