breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	e1ae5d5fa9	fix: Edge-Gaps in _split_broad_columns ignorieren + return-Tuple bei leerem Ergebnis Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 28s Details CI / test-go-edu-search (push) Successful in 25s Details CI / test-python-klausur (push) Failing after 1m57s Details CI / test-python-agent-core (push) Successful in 14s Details CI / test-nodejs-website (push) Successful in 16s Details Gaps die den Spaltenrand beruehren (Margins) werden jetzt ausgeschlossen, nur interne Gaps werden als Split-Kandidaten betrachtet. Behebt das Problem dass trailing whitespace faelschlich als groesster Gap gewaehlt wurde. Early-return in _run_ocr_pipeline_for_page gibt jetzt korrekt ([], rotation) statt [] zurueck. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 22:16:29 +01:00
Benjamin Admin	4e8ea77140	fix: leere Spalten als strukturell behandeln + 2-Spalten-Layout korrekt labeln Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 24s Details CI / test-go-edu-search (push) Successful in 27s Details CI / test-python-klausur (push) Failing after 1m50s Details CI / test-python-agent-core (push) Successful in 15s Details CI / test-nodejs-website (push) Successful in 16s Details Spalten mit <=2 Woertern und <15% Breite werden jetzt als column_marker statt als content-Spalte klassifiziert. Bei 2 breiten Content-Spalten wird die rechte als column_example statt column_de gelabelt, da die linke Spalte EN+DE kombiniert enthaelt. OSD-Zoom von 1.0 auf 2.0 erhoeht fuer zuverlaessigere Orientierungserkennung. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 19:35:21 +01:00
Benjamin Admin	e8ba5ec073	fix: Orientierungserkennung beim PDF-Upload statt erst bei OCR Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 23s Details CI / test-go-edu-search (push) Successful in 23s Details CI / test-python-klausur (push) Failing after 1m47s Details CI / test-python-agent-core (push) Successful in 17s Details CI / test-nodejs-website (push) Successful in 17s Details Rotation wird jetzt in upload_pdf_get_info() erkannt, damit Thumbnails bei der Seitenauswahl bereits richtig herum angezeigt werden. Debug-Logging fuer _split_broad_columns hinzugefuegt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 19:11:45 +01:00
Benjamin Admin	02631dc4e0	feat: breite Spalten per Word-Gap splitten + gedrehte Scans im Frontend anzeigen Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 26s Details CI / test-go-edu-search (push) Successful in 25s Details CI / test-python-klausur (push) Failing after 1m52s Details CI / test-python-agent-core (push) Successful in 16s Details CI / test-nodejs-website (push) Successful in 15s Details _split_broad_columns() erkennt EN/DE-Gemisch in breiten Spalten via Word-Coverage-Analyse und trennt sie am groessten Luecken-Gap. Thumbnails und Page-Images werden serverseitig per fitz rotiert, Frontend laedt Thumbnails nach OCR-Processing neu. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 18:16:32 +01:00
Benjamin Admin	a5635e0c43	feat: automatische Orientierungserkennung fuer umgedrehte Scans Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 23s Details CI / test-go-edu-search (push) Successful in 25s Details CI / test-python-klausur (push) Failing after 1m50s Details CI / test-python-agent-core (push) Successful in 17s Details CI / test-nodejs-website (push) Successful in 15s Details Tesseract OSD erkennt 0/90/180/270° Rotation und korrigiert automatisch vor dem Deskew. Loest das Problem mit Buchscannern, bei denen jede 2. Seite auf dem Kopf steht. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 17:26:21 +01:00
Benjamin Admin	7a1bd5e82d	refactor: positional_column_regions auch in OCR Pipeline verwenden Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 24s Details CI / test-go-edu-search (push) Successful in 24s Details CI / test-python-klausur (push) Failing after 1m48s Details CI / test-python-agent-core (push) Successful in 16s Details CI / test-nodejs-website (push) Successful in 16s Details Shared Funktion positional_column_regions() in cv_vocab_pipeline.py, wird jetzt von beiden Pfaden (Vocab-Worksheet + OCR Pipeline Admin) genutzt. classify_column_types() bleibt als Legacy erhalten. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 17:20:51 +01:00
Benjamin Admin	a5df2b6e15	fix: Spaltenklassifikation im Vocab-Worksheet durch positionsbasierte Zuordnung ersetzen Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 33s Details CI / test-go-edu-search (push) Successful in 25s Details CI / test-python-klausur (push) Failing after 1m47s Details CI / test-python-agent-core (push) Successful in 15s Details CI / test-nodejs-website (push) Successful in 20s Details Sprachbasiertes Scoring (classify_column_types) verursachte vertauschte Spalten auf Seite 3 bei Beispielsaetzen mit vielen englischen Funktionswoertern. Neue _positional_column_regions() ordnet Spalten rein geometrisch (links→rechts) zu. OCR Pipeline Admin bleibt unveraendert. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 17:07:11 +01:00
Benjamin Admin	d39d249daa	feat: add pass 3 text-line regression to deskew pipeline Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 24s Details CI / test-go-edu-search (push) Successful in 26s Details CI / test-python-klausur (push) Failing after 1m53s Details CI / test-python-agent-core (push) Successful in 15s Details CI / test-nodejs-website (push) Successful in 15s Details After iterative projection (pass 1) and word-alignment (pass 2), a third pass uses Tesseract word positions + linear regression per text line to measure and correct residual rotation. This catches cases where passes 1-2 leave significant slope (e.g. 1.7° residual on heavily skewed scans). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:53:11 +01:00
Benjamin Admin	538d5c732e	feat: two-pass deskew with wider angle range and residual correction Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 24s Details CI / test-go-edu-search (push) Successful in 25s Details CI / test-python-klausur (push) Failing after 1m52s Details CI / test-python-agent-core (push) Successful in 15s Details CI / test-nodejs-website (push) Successful in 16s Details - Increase iterative deskew coarse_range from ±2° to ±5° to handle heavily skewed scans - New deskew_two_pass(): runs iterative projection first, then word-alignment on the corrected image to detect/fix residual skew (applied when residual ≥ 0.3°) - OCR pipeline API auto_deskew now uses deskew_two_pass by default - Vocab worksheet _run_ocr_pipeline_for_page uses deskew_two_pass - Deskew result now includes angle_residual and two_pass_debug Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:34:57 +01:00
Benjamin Admin	b7ae36e92b	feat: use OCR pipeline instead of LLM vision for vocab worksheet extraction Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 26s Details CI / test-go-edu-search (push) Successful in 25s Details CI / test-python-klausur (push) Failing after 1m52s Details CI / test-python-agent-core (push) Successful in 18s Details CI / test-nodejs-website (push) Successful in 17s Details process-single-page now runs the full CV pipeline (deskew → dewarp → columns → rows → cell-first OCR v2 → LLM review) for much better extraction quality. Falls back to LLM vision if pipeline imports are unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 15:35:44 +01:00
Benjamin Boenisch	5a31f52310	Initial commit: breakpilot-lehrer - Lehrer KI Platform Services: Admin-Lehrer, Backend-Lehrer, Studio v2, Website, Klausur-Service, School-Service, Voice-Service, Geo-Service, BreakPilot Drive, Agent-Core Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 23:47:26 +01:00

11 Commits