breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	f860eb66e6	Add German IPA support (wiki-pronunciation-dict + epitran) Hybrid approach mirroring English IPA: - Primary: wiki-pronunciation-dict (636k entries, CC-BY-SA, Wiktionary) - Fallback: epitran rule-based G2P (MIT license) IPA modes now use language-appropriate dictionaries: - auto/en: English IPA (Britfone + eng_to_ipa) - de: German IPA (wiki-pronunciation-dict + epitran) - all: EN column gets English IPA, other columns get German IPA - none: disabled Frontend shows CC-BY-SA attribution when German IPA is active. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 22:18:20 +01:00
Benjamin Admin	364086b86e	feat: auto-insert syllable dividers via pyphen on dictionary pages OCR engines don't detect | pipe chars used as syllable dividers in dictionaries. After dictionary detection (is_dict=True), use pyphen (MIT) to insert syllable breaks into headword cells. Tries DE first, then EN. Skips IPA content, short words, and cells already containing |. Also adds pyphen>=0.16.0 to requirements.txt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 14:17:26 +01:00
Benjamin Admin	be7f5f1872	feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management D2: TrOCR ONNX export script (printed + handwritten, int8 quantization) D3: PP-DocLayout ONNX export script (download or Docker-based conversion) B3: Model Management admin page (PyTorch vs ONNX status, benchmarks, config) A4: TrOCR ONNX service with runtime routing (auto/pytorch/onnx via TROCR_BACKEND) A5: PP-DocLayout ONNX detection with OpenCV fallback (via GRAPHIC_DETECT_BACKEND) B4: Structure Detection UI toggle (OpenCV vs PP-DocLayout) with class color coding C3: TrOCR-ONNX.md documentation C4: OCR-Pipeline.md ONNX section added C5: mkdocs.yml nav updated, optimum added to requirements.txt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 09:53:02 +01:00
Benjamin Admin	ab3ecc7c08	feat: OCR pipeline v2.1 – narrow column OCR, dewarp automation, Fabric.js editor Proposal B: Adaptive padding, crop upscaling, PSM selection, row-strip re-OCR for narrow columns (<15% width) – expected accuracy boost 60-70% → 85-90%. Proposal A: New text-line straightness detector (Method D), quality gate (rejects counterproductive corrections), 2-pass projection refinement, higher confidence thresholds – expected manual dewarp reduction to <10%. Proposal C: Fabric.js canvas editor with drag/drop, inline editing, undo/redo, opacity slider, zoom, PDF/DOCX export endpoints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 22:44:14 +01:00
Benjamin Admin	21ea458fcf	feat(ocr-review): replace LLM with rule-based spell-checker (REVIEW_ENGINE=spell) - Add pyspellchecker (MIT) to requirements for EN+DE dictionary lookup - New spell_review_entries_sync() + spell_review_entries_streaming(): - Dictionary-backed substitution: checks if corrected word is known - Structural rule: digit at pos 0 + lowercase rest → most likely letter (e.g. "8en"→"Ben", "8uch"→"Buch", "5ee"→"See", "6eld"→"Geld") - Pattern rule: "|." → "1." for numbered list prefixes - Standalone "|" → "I" (capital I) - IPA entries still protected via existing _entry_needs_review filter - Headings/untranslated words (e.g. "Story") are untouched (no susp. chars) - llm_review_entries + llm_review_entries_streaming: route via REVIEW_ENGINE env var ("spell" default, "llm" to restore previous behaviour) - docker-compose.yml: REVIEW_ENGINE=${REVIEW_ENGINE:-spell} - LLM code preserved for fallback (set REVIEW_ENGINE=llm in .env) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 15:04:27 +01:00
Benjamin Admin	d481e0087b	deps: add eng-to-ipa for IPA dictionary lookup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 20:23:40 +01:00
Benjamin Admin	4ec7c20490	feat(ocr-pipeline): add rapidocr + onnxruntime to requirements RapidOCR uses PaddleOCR models on ONNX Runtime, works natively on ARM64. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:08:21 +01:00
Benjamin Boenisch	5a31f52310	Initial commit: breakpilot-lehrer - Lehrer KI Platform Services: Admin-Lehrer, Backend-Lehrer, Studio v2, Website, Klausur-Service, School-Service, Voice-Service, Geo-Service, BreakPilot Drive, Agent-Core Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 23:47:26 +01:00

8 Commits