breakpilot-lehrer

Benjamin Admin 364086b86e feat: auto-insert syllable dividers via pyphen on dictionary pages

OCR engines don't detect | pipe chars used as syllable dividers in
dictionaries. After dictionary detection (is_dict=True), use pyphen
(MIT) to insert syllable breaks into headword cells. Tries DE first,
then EN. Skips IPA content, short words, and cells already containing |.

Also adds pyphen>=0.16.0 to requirements.txt.
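
The skip rules and the DE-then-EN fallback described above might look roughly like the sketch below. It is not the code from grid_editor_api.py: the names insert_syllable_dividers, pyphen_hyphenators, MIN_WORD_LEN and IPA_HINT are hypothetical, and the length threshold, the IPA heuristic and the restriction to single alphabetic words are assumptions made for illustration. The hyphenators are passed in as plain callables so the core logic does not depend on pyphen being installed.

```python
import re
from functools import partial
from typing import Callable, Iterable, List

# A hyphenator maps a word to the same word with "|" at syllable breaks.
Hyphenator = Callable[[str], str]

MIN_WORD_LEN = 4  # assumed cut-off for "short words"; not taken from the commit
# Assumed heuristic for IPA content: brackets, slashes, stress marks, common IPA letters.
IPA_HINT = re.compile(r"[\[\]/ˈˌːəɪʊɛæɔʌθðʃʒŋ]")


def insert_syllable_dividers(text: str, hyphenators: Iterable[Hyphenator]) -> str:
    """Return the cell text with "|" dividers, or unchanged if it is skipped."""
    word = text.strip()
    # Skip cells that already contain a divider, short words, and IPA content.
    if "|" in word or len(word) < MIN_WORD_LEN or IPA_HINT.search(word):
        return text
    # Simplification in this sketch: only single alphabetic words are handled.
    if not word.isalpha():
        return text
    # Try each language in order and keep the first result that has a break.
    for hyphenate in hyphenators:
        candidate = hyphenate(word)
        if "|" in candidate:
            return candidate
    return text


def pyphen_hyphenators() -> List[Hyphenator]:
    """Build DE-then-EN hyphenators from pyphen (third-party, imported lazily)."""
    import pyphen

    return [
        partial(pyphen.Pyphen(lang=lang).inserted, hyphen="|")
        for lang in ("de_DE", "en_US")
    ]
```

With pyphen installed, a headword cell would be processed as insert_syllable_dividers(cell_text, pyphen_hyphenators()). Where the breaks land depends on pyphen's hyphenation dictionaries, so results should be checked against the printed dictionary page.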

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-24 14:17:26 +01:00

data

feat(ocr-pipeline): British/American IPA pronunciation choice

2026-03-01 11:08:52 +01:00

mail

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

migrations

feat: Sprint 1 — IPA hardening, regression framework, ground-truth review

2026-03-23 09:21:27 +01:00

models

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

policies

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

routes

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

services

feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management

2026-03-23 09:53:02 +01:00

tests

fix(ocr-pipeline): improve page crop spine detection and cell assignment

2026-03-24 09:23:30 +01:00

admin_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

config.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

country_metadata.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

cv_box_detect.py

feat: run shading-based box detection alongside line detection

2026-03-16 08:12:52 +01:00

cv_cell_grid.py

fix: also store word_boxes for wide columns (full-page OCR)

2026-03-11 20:41:29 +01:00

cv_color_detect.py

Fix bullet overlap disambiguation + raise red threshold to 90

2026-03-20 18:21:00 +01:00

cv_doclayout_detect.py

feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management

2026-03-23 09:53:02 +01:00

cv_graphic_detect.py

feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management

2026-03-23 09:53:02 +01:00

cv_layout.py

feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns

2026-03-24 07:45:39 +01:00

cv_ocr_engines.py

fix: preserve pipe syllable dividers + detect alphabet sidebar columns

2026-03-24 13:52:11 +01:00

cv_preprocessing.py

fix: swap 90°/270° rotation direction in orientation detection

2026-03-17 16:39:15 +01:00

cv_review.py

fix: move _group_words_into_lines to cv_ocr_engines.py

2026-03-09 15:24:56 +01:00

cv_vocab_pipeline.py

feat: Words-First Grid Builder (bottom-up alternative to cell_grid_v2)

2026-03-12 06:46:05 +01:00

cv_vocab_types.py

feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns

2026-03-24 07:45:39 +01:00

cv_words_first.py

fix(ocr-pipeline): improve page crop spine detection and cell assignment

2026-03-24 09:23:30 +01:00

dsfa_corpus_ingestion.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

dsfa_rag_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

eh_pipeline.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

eh_templates.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

embedding_client.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

full_compliance_pipeline.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

full_reingestion.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

github_crawler.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

grid_editor_api.py

feat: auto-insert syllable dividers via pyphen on dictionary pages

2026-03-24 14:17:26 +01:00

handwriting_htr_api.py

feat(klausur): remove handwriting + implement exam HTR

2026-03-03 12:04:26 +01:00

hybrid_search.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

hybrid_vocab_extractor.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

hyde.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

legal_corpus_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

legal_corpus_ingestion.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

legal_corpus_robust.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

legal_templates_ingestion.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

main.py

feat: add Excel-like grid editor for OCR overlay (Kombi mode step 6)

2026-03-14 23:41:03 +01:00

metrics_db.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

migrate_rag_chunks.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

minio_storage.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

nibis_ingestion.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

nru_worksheet_generator.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

ocr_labeling_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

ocr_pipeline_api.py

Add Ground Truth regression test system for OCR pipeline

2026-03-18 13:46:48 +01:00

ocr_pipeline_auto.py

refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules

2026-03-18 08:42:00 +01:00

ocr_pipeline_common.py

feat: add Woerterbuch category + column add/delete in grid editor

2026-03-23 16:27:12 +01:00

ocr_pipeline_geometry.py

Invalidate grid_editor_result when exclude regions change

2026-03-19 09:19:09 +01:00

ocr_pipeline_ocr_merge.py

refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules

2026-03-18 08:42:00 +01:00

ocr_pipeline_overlays.py

Preserve alphabetic marker columns, broaden junk filter, enable IPA in grid

2026-03-18 11:08:23 +01:00

ocr_pipeline_postprocess.py

refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules

2026-03-18 08:42:00 +01:00

ocr_pipeline_regression.py

feat: save automatic grid snapshot before manual edits for GT comparison

2026-03-24 13:16:44 +01:00

ocr_pipeline_rows.py

refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules

2026-03-18 08:42:00 +01:00

ocr_pipeline_session_store.py

Add Ground Truth regression test system for OCR pipeline

2026-03-18 13:46:48 +01:00

ocr_pipeline_sessions.py

feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns

2026-03-24 07:45:39 +01:00

ocr_pipeline_words.py

refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules

2026-03-18 08:42:00 +01:00

orientation_crop_api.py

Add double-page spread detection to frontend pipeline

2026-03-24 11:09:44 +01:00

page_crop.py

fix(ocr-pipeline): improve page crop spine detection and cell assignment

2026-03-24 09:23:30 +01:00

pdf_export.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

pdf_extraction.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

pipeline_checkpoints.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

pyproject.toml

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

qdrant_service.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

rag_evaluation.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

rbac.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

requirements.txt

feat: auto-insert syllable dividers via pyphen on dictionary pages

2026-03-24 14:17:26 +01:00

reranker.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

self_rag.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

storage.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

template_sources.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

tesseract_vocab_extractor.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

training_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

training_export_service.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

trocr_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

upload_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

vocab_session_store.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

vocab_worksheet_api.py

fix: ignore edge gaps in _split_broad_columns + return tuple on empty result

2026-03-07 22:16:29 +01:00

worksheet_cleanup_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

worksheet_editor_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

zeugnis_api.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

zeugnis_crawler.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

zeugnis_models.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00

zeugnis_seed_data.py

Initial commit: breakpilot-lehrer - Lehrer KI Platform

2026-02-11 23:47:26 +01:00