breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	545c8676b0	Add A/B testing toggles for OCR quality steps Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 30s Details CI / test-go-edu-search (push) Successful in 28s Details CI / test-python-klausur (push) Failing after 2m33s Details CI / test-python-agent-core (push) Successful in 26s Details CI / test-nodejs-website (push) Successful in 18s Details Each quality improvement step can now be toggled independently: - CLAHE checkbox (Step 3: image enhancement on/off) - MaxCols dropdown (Step 2: 0=unlimited, 2-5) - MinConf dropdown (Step 1: auto/20/30/40/50/60) Backend: Query params enhance, max_cols, min_conf on process-single-page. Response includes active_steps dict showing which steps are enabled. Frontend: Toggle controls in VocabularyTab above the table. This allows empirical A/B testing of each step on the same scan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 15:27:26 +02:00
Benjamin Admin	2f34ee9ede	Add scan quality scoring, column limit, image enhancement (Steps 1-3) Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 26s Details CI / test-go-edu-search (push) Successful in 32s Details CI / test-python-klausur (push) Failing after 2m21s Details CI / test-python-agent-core (push) Successful in 28s Details CI / test-nodejs-website (push) Successful in 20s Details Step 1: scan_quality.py — Laplacian blur + contrast scoring, adjusts OCR confidence threshold (40 for good scans, 30 for degraded). Quality report included in API response + shown in frontend. Step 2: max_columns parameter in cv_words_first.py — limits column detection to 3 for vocab tables, preventing phantom columns D/E from degraded OCR fragments. Step 3: ocr_image_enhance.py — CLAHE contrast + bilateral filter denoising + unsharp mask, only for degraded scans (gated by quality score). Pattern from handwriting_htr_api.py. Frontend: quality info shown in extraction status after processing. Reprocess button now derives pages from vocabulary data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 14:58:39 +02:00
Benjamin Admin	9b0e310978	Fix: reprocess button works after session resume + apply merge logic Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 46s Details CI / test-python-klausur (push) Failing after 2m37s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 34s Details Two bugs fixed: 1. reprocessPages() failed silently after session resume because successfulPages was empty. Now derives pages from vocabulary source_page or selectedPages as fallback. 2. process-single-page endpoint built vocabulary entries WITHOUT applying merge logic (_merge_wrapped_rows, _merge_continuation_rows). Now applies full merge pipeline after vocabulary extraction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:46:15 +02:00
Benjamin Admin	909d0729f6	Add SmartSpellChecker + refactor vocab-worksheet page.tsx Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m51s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 37s Details SmartSpellChecker (klausur-service): - Language-aware OCR post-correction without LLMs - Dual-dictionary heuristic for EN/DE language detection - Context-based a/I disambiguation via bigram lookup - Multi-digit substitution (sch00l→school) - Cross-language guard (don't false-correct DE words in EN column) - Umlaut correction (Schuler→Schüler, uber→über) - Integrated into spell_review_entries_sync() pipeline - 31 tests, 9ms/100 corrections Vocab-worksheet refactoring (studio-v2): - Split 2337-line page.tsx into 14 files - Custom hook useVocabWorksheet.ts (all state + logic) - 9 components in components/ directory - types.ts, constants.ts for shared definitions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:25:01 +02:00

4 Commits