breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	00eb9f26f6	Add "OCR neu + Grid" button to Grid Review. New endpoint POST /sessions/{id}/rerun-ocr-and-build-grid that: (1) runs scan quality assessment; (2) applies CLAHE enhancement if the scan is degraded (controlled by the enhance toggle); (3) re-runs dual-engine OCR (RapidOCR + Tesseract) with the min_conf filter; (4) merges OCR results and stores the updated word_result; (5) builds the grid with the max_columns constraint. Frontend: orange "OCR neu + Grid" button in GridToolbar. Unlike "Neu berechnen" (which only rebuilds the grid from existing words), this button re-runs the full OCR pipeline with quality settings. The CLAHE toggle now actually has an effect: it enhances the image before OCR runs, not after. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:55:01 +02:00
Benjamin Admin	141f69ceaa	Fix: max_columns now works in OCR Kombi build-grid pipeline. The max_columns parameter was only implemented in cv_words_first.py (vocab-worksheet path) but NOT in _build_grid_core, which is what the admin OCR Kombi pipeline uses. The Kombi pipeline uses grid_editor_helpers._cluster_columns_by_alignment(), which has its own column detection. Fix: post-processing step 5k merges the narrowest columns after grid building when a zone has more columns than max_columns. Cells from merged columns get their text appended to the target column. min_conf word filtering was already working (applied before grid build). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:40:39 +02:00
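The post-processing merge described in 141f69ceaa could look roughly like the sketch below. This is not the code from grid_build_core.py: the `Column` class, the function name, and the choice of the left neighbour as merge target are assumptions made for illustration. Only the rule "merge the narrowest column and append its cell text to the target" comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical column record; the real pipeline uses its own structures."""
    x: float                                   # left edge in pixels
    width: float                               # column width in pixels
    cells: dict = field(default_factory=dict)  # row index -> cell text


def merge_narrowest_columns(columns, max_columns):
    """Merge the narrowest column into a neighbour until the limit is met.

    Note: the Column objects passed in are modified in place.
    """
    cols = sorted(columns, key=lambda c: c.x)
    if max_columns <= 0:  # 0 is treated as "no limit", as in the MaxCol toggle
        return cols
    while len(cols) > max_columns and len(cols) > 1:
        i = min(range(len(cols)), key=lambda k: cols[k].width)
        j = i - 1 if i > 0 else 1  # assumption: prefer the left neighbour
        src, dst = cols[i], cols[j]
        for row, text in src.cells.items():
            existing = dst.cells.get(row, "")
            # Keep left-to-right reading order when joining the two texts.
            parts = [existing, text] if j < i else [text, existing]
            dst.cells[row] = " ".join(p for p in parts if p)
        right = max(src.x + src.width, dst.x + dst.width)
        dst.x = min(src.x, dst.x)
        dst.width = right - dst.x
        del cols[i]
    return cols
```

With four detected columns and `max_columns=3`, the 15 px sliver on the right is folded into its left neighbour and the other columns are left alone.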
Benjamin Admin	2baad68060	Remove A/B testing toggles from studio-v2 (customer frontend). Dev-only toggles belong in admin-lehrer (port 3002) only. The customer frontend runs the pipeline with optimal defaults and shows only the finished results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:18:44 +02:00
Benjamin Admin	25e5a7415a	Add A/B testing toggles to OCR Kombi Grid Review. Quality step toggles in admin-lehrer StepGridReview (port 3002): CLAHE checkbox (Step 3: image enhancement); MaxCol dropdown (Step 2: column limit, 0=off); MinConf dropdown (Step 1: OCR confidence, 0=auto). Parameters flow through: StepGridReview → useGridEditor → build-grid endpoint → _build_grid_core. MinConf filters words before grid building. Toggle settings, then click "Neu berechnen" to test each step individually. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:09:17 +02:00
Benjamin Admin	545c8676b0	Add A/B testing toggles for OCR quality steps. Each quality improvement step can now be toggled independently: CLAHE checkbox (Step 3: image enhancement on/off); MaxCols dropdown (Step 2: 0=unlimited, 2-5); MinConf dropdown (Step 1: auto/20/30/40/50/60). Backend: query params enhance, max_cols, min_conf on process-single-page. Response includes an active_steps dict showing which steps are enabled. Frontend: toggle controls in VocabularyTab above the table. This allows empirical A/B testing of each step on the same scan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 15:27:26 +02:00
Benjamin Admin	2f34ee9ede	Add scan quality scoring, column limit, image enhancement (Steps 1-3). Step 1: scan_quality.py, Laplacian blur + contrast scoring; adjusts the OCR confidence threshold (40 for good scans, 30 for degraded). Quality report included in API response + shown in frontend. Step 2: max_columns parameter in cv_words_first.py; limits column detection to 3 for vocab tables, preventing phantom columns D/E from degraded OCR fragments. Step 3: ocr_image_enhance.py, CLAHE contrast + bilateral filter denoising + unsharp mask, only for degraded scans (gated by quality score). Pattern from handwriting_htr_api.py. Frontend: quality info shown in extraction status after processing. Reprocess button now derives pages from vocabulary data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 14:58:39 +02:00
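Step 1 of 2f34ee9ede scores blur with a Laplacian and picks the OCR confidence threshold from the result. The sketch below shows the idea in plain Python on a grayscale image given as a list of rows. It is not scan_quality.py: the function names and the two cut-off values (`blur_threshold`, `contrast_threshold`) are assumptions, and only the 40/30 confidence thresholds are taken from the commit message.

```python
from statistics import pstdev, pvariance


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian; low values suggest a blurry scan."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def assess_scan(img, blur_threshold=100.0, contrast_threshold=40.0):
    """Return a small quality report; both thresholds are hypothetical values."""
    sharpness = laplacian_variance(img)
    contrast = pstdev([p for row in img for p in row])
    degraded = sharpness < blur_threshold or contrast < contrast_threshold
    return {
        "sharpness": sharpness,
        "contrast": contrast,
        "degraded": degraded,
        # From the commit message: 40 for good scans, 30 for degraded ones.
        "min_conf": 30 if degraded else 40,
    }
```

A uniformly gray image scores zero on both measures and is flagged as degraded, while a high-contrast checkerboard passes and keeps the stricter threshold of 40.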
Benjamin Admin	5a154b744d	fix: migrate ocr-pipeline types to ocr-kombi after page deletion. Types from the deleted ocr-pipeline/types.ts inlined into ocr-kombi/types.ts. All imports updated across components/ocr-kombi/ and components/ocr-pipeline/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 14:22:09 +02:00
Benjamin Admin	f39cbe9283	refactor: remove unused pages and backends (model-management, OCR legacy, GPU/vast.ai, video-chat, matrix). Deleted pages: /ai/model-management (mock data only, no real backend); /ai/ocr-compare (old /vocab/ backend, replaced by ocr-kombi); /ai/ocr-pipeline (minimal session browser, redundant); /ai/ocr-overlay (legacy monolith, redundant); /ai/gpu (vast.ai GPU management, no longer used); /infrastructure/gpu (same); /communication/video-chat (moved to core); /communication/matrix (moved to core). Deleted backends: backend-lehrer/infra/vast_client.py + vast_power.py; backend-lehrer/meetings_api.py + jitsi_api.py; website/app/api/admin/gpu/; edu-search-service/scripts/vast_ai_extractor.py. Total: ~7,800 LOC removed. All code preserved in git history. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 13:14:12 +02:00
Benjamin Admin	5abdfa202e	chore: install refactoring guardrails (Phase 0) [guardrail-change]. scripts/check-loc.sh: LOC budget checker (500 LOC hard cap); .claude/rules/architecture.md: split triggers, patterns per language; .claude/rules/loc-exceptions.txt: documented escape hatches; AGENTS.python.md: FastAPI conventions (routes thin, service layer); AGENTS.go.md: Go/Gin conventions (handler ≤40 LOC); AGENTS.typescript.md: Next.js conventions (page.tsx ≤250 LOC, colocation); CLAUDE.md extended with guardrail section + commit markers. 273 files currently exceed 500 LOC, to be addressed phase by phase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 12:25:36 +02:00
Benjamin Admin	9b0e310978	Fix: reprocess button works after session resume + apply merge logic. Two bugs fixed: (1) reprocessPages() failed silently after session resume because successfulPages was empty. It now derives pages from vocabulary source_page, or from selectedPages as fallback. (2) The process-single-page endpoint built vocabulary entries WITHOUT applying merge logic (_merge_wrapped_rows, _merge_continuation_rows). It now applies the full merge pipeline after vocabulary extraction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:46:15 +02:00
Benjamin Admin	46c2acb2f4	Add "Neu verarbeiten" button to VocabularyTab. Allows reprocessing pages from the vocabulary view to apply new merge logic without navigating back to page selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:37:13 +02:00
Benjamin Admin	b8f1b71652	Fix: merge cell-wrap continuation rows in vocabulary extraction. When textbook authors wrap text within a cell (e.g. long German translations), OCR treats each physical line as a separate row. New _merge_wrapped_rows() detects this by checking if the primary column (EN) is empty, which indicates a continuation, not a new entry. Handles: empty EN + DE text, empty EN + example text, parenthetical continuations like "(bei)", triple wraps, comma-separated lists. 12 tests added covering all cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:32:45 +02:00
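The continuation rule from b8f1b71652 (an empty EN column means the row continues the previous entry) can be sketched as follows. The dict keys `en`, `de`, and `example` and the function name are assumptions for illustration; the real _merge_wrapped_rows() may differ in data model and edge cases.

```python
def merge_wrapped_rows(rows):
    """Fold rows with an empty primary (EN) column into the previous entry."""
    merged = []
    for row in rows:
        en = row.get("en", "").strip()
        if merged and not en:
            # Continuation line: append its text to the previous entry.
            prev = merged[-1]
            for key in ("de", "example"):
                extra = row.get(key, "").strip()
                if extra:
                    prev[key] = (prev.get(key, "") + " " + extra).strip()
            continue
        merged.append(dict(row))  # copy so the input rows stay untouched
    return merged
```

For a wrapped translation such as "sich etwas leihen," followed by a second physical line "(bei)", the two OCR rows collapse into one entry. Repeated continuations (triple wraps) keep appending to the same entry.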
Benjamin Admin	6a165b36e5	Add Phase 5.1: LearningProgress dashboard widget. Eltern-Dashboard widget showing per-unit learning stats: accuracy ring, coins, crowns, streak, and recent unit list. Uses ProgressRing and CrownBadge gamification components. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:26:44 +02:00
Benjamin Admin	9dddd80d7a	Add Phases 3.2-4.3: STT, stories, syllables, gamification. Phase 3.2, MicrophoneInput.tsx: Browser Web Speech API for speech-to-text recognition (EN+DE), integrated for pronunciation practice. Phase 4.1, Story Generator: LLM-powered mini-stories using vocabulary words, with highlighted vocab in HTML output. Backend endpoint POST /learning-units/{id}/generate-story + frontend /learn/[unitId]/story. Phase 4.2, SyllableBow.tsx: SVG arc component for syllable visualization under words, clickable for per-syllable TTS. Phase 4.3, Gamification system: CoinAnimation.tsx (floating coin rewards with accumulator); CrownBadge.tsx (crown/medal display for milestones); ProgressRing.tsx (circular progress indicator); progress_api.py (backend tracking coins, crowns, streaks per unit). Also adds "Geschichte" exercise type button to UnitCard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:22:52 +02:00
Benjamin Admin	20a0585eb1	Add interactive learning modules MVP (Phases 1-3.1). New feature: after OCR vocabulary extraction, users can generate interactive learning modules (flashcards, quiz, type trainer) with one click. Frontend (studio-v2): Fortune Sheet spreadsheet editor tab in vocab-worksheet; "Lernmodule generieren" button in ExportTab; /learn page with unit overview and exercise type cards; /learn/[unitId]/flashcards (flip-card trainer with Leitner spaced repetition); /learn/[unitId]/quiz (multiple choice quiz with explanations); /learn/[unitId]/type (type-in trainer with Levenshtein distance feedback); AudioButton component using Web Speech API for EN+DE TTS. Backend (klausur-service): vocab_learn_bridge.py converts VocabularyEntry[] to analysis_data format; POST /sessions/{id}/generate-learning-unit endpoint. Backend (backend-lehrer): generate-qa, generate-mc, generate-cloze endpoints on learning units; get-qa/mc/cloze data retrieval endpoints; Leitner progress update + next review items endpoints. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:13:23 +02:00
Benjamin Admin	4561320e0d	Fix SmartSpellChecker: preserve leading non-alpha text like "(=". The tokenizer regex only matches alphabetic characters, so text before the first word match (like "(= " in "(= I won...") was silently dropped when reassembling the corrected text. Now preserves text[:first_match_start] as a leading prefix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:41:33 +02:00
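The reassembly fix in 4561320e0d can be illustrated with a small sketch: correct only the alphabetic tokens and copy everything else through unchanged, including the text before the first match. The regex, the function name, and the `fix_word` callback are assumptions for illustration and not the SmartSpellChecker API.

```python
import re

# Assumed tokenizer: runs of letters only (ASCII plus German umlauts and ß).
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def correct_text(text, fix_word):
    """Apply fix_word to each alphabetic token and keep all other text verbatim."""
    matches = list(WORD_RE.finditer(text))
    if not matches:
        return text
    # Leading prefix: the part that was dropped before the fix.
    parts = [text[:matches[0].start()]]
    for i, m in enumerate(matches):
        parts.append(fix_word(m.group()))
        # Copy the separator up to the next token (or to the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts.append(text[m.end():end])
    return "".join(parts)
```

Because every gap between matches is copied as-is, a leading "(= " and a trailing ".)" both survive the round trip.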
Benjamin Admin	596864431b	Rule (a2): switch from allow-list to block-list for symbol removal. Instead of keeping only specific symbols (_KEEP_SYMBOLS), now only removes explicitly decorative symbols (_REMOVE_SYMBOLS: > < ~ \ ^ etc). All other punctuation (= ( ) ; : - etc.) is preserved by default. This is more robust: any new symbol used in textbooks will be kept unless it's in the small block-list of known decorative artifacts. Fixes: (= token still being removed on page 5 despite being in the allow-list (possibly due to Unicode variants or whitespace). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:34:21 +02:00
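A block-list filter in the spirit of 596864431b might look like the sketch below. The set contains only the five symbols named in the commit message (the actual _REMOVE_SYMBOLS list is longer, per the "etc"), and the function names are hypothetical.

```python
# Only the symbols named in the commit message; the real list may be longer.
REMOVE_SYMBOLS = frozenset("><~\\^")


def is_decorative_artifact(token):
    """True if the token consists solely of block-listed decorative symbols."""
    stripped = token.strip()
    return bool(stripped) and all(ch in REMOVE_SYMBOLS for ch in stripped)


def filter_word_boxes(tokens):
    """Drop decorative artifacts and keep everything else, including = and (=."""
    return [t for t in tokens if not is_decorative_artifact(t)]
```

The practical difference to the earlier allow-list is the default: a symbol nobody anticipated is now kept rather than removed.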
Benjamin Admin	c8027eb7f9	Fix: preserve = ; : - and other meaningful symbols in word_boxes. Rule (a2) in Step 5i removed word_boxes with no letters/digits as "graphic OCR artifacts". This incorrectly removed = signs used as definition markers in textbooks ("film = 1. Film; 2. filmen"). Added exception list _KEEP_SYMBOLS for meaningful punctuation: = (= =) ; : - – — / + • · ( ) & * → ← ↔ The root cause: PaddleOCR returns "film = 1. Film; 2. filmen" as one block, which gets split into word_boxes ["film", "=", "1.", ...]. The "=" word_box had no alphanumeric chars and was removed as artifact. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:18:35 +02:00
Benjamin Admin	ba0f659d1e	Preserve = and (= tokens in grid build and cell text cleanup. = signs are used as definition markers in textbooks ("film = 1. Film"). They were incorrectly removed by two filters: (1) grid_build_core.py Step 5j-pre: _PURE_JUNK_RE matched "=" as artifact noise. Now exempts =, (=, ;, :, - and similar meaningful punctuation tokens. (2) cv_ocr_engines.py _is_noise_tail_token: the "pure non-alpha" check removed trailing = tokens. Now exempts meaningful punctuation. Fixes: "film = 1. Film; 2. filmen" losing the = sign, "(= I won and he lost.)" losing the (=. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:04:27 +02:00
Benjamin Admin	50bfd6e902	Fix gutter repair: don't suggest corrections for words with parentheses. Words like "probieren)" or "Englisch)" were incorrectly flagged as gutter OCR errors because the closing parenthesis wasn't stripped before dictionary lookup. The spellchecker then suggested "probierend" (replacing ) with d, edit distance 1). Two fixes: (1) Strip trailing/leading parentheses in _try_spell_fix before checking if the bare word is valid, and skip correction if it is. (2) Add )( to the rstrip characters in the analysis phase so "probieren)" becomes "probieren" for the known-word check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 22:38:22 +02:00
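The first fix in 50bfd6e902 amounts to checking the bare word before asking the spellchecker for a suggestion. A minimal sketch, assuming a set-based dictionary and a `suggest` callback (both hypothetical, not the real _try_spell_fix signature):

```python
def try_spell_fix(word, known_words, suggest):
    """Return a suggested correction, or None if the word needs no fix."""
    bare = word.strip("()")  # drop leading/trailing parentheses only
    if bare.lower() in known_words:
        return None  # valid once the parenthesis is removed: leave it alone
    return suggest(word)
```

With this guard, "probieren)" is recognized as the valid word "probieren" and no longer turned into "probierend".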
Benjamin Admin	0599c72cc1	Fix IPA continuation: don't replace normal text with IPA. Text like "Betonung auf der 1. Silbe: profit ['profit]" was incorrectly detected as garbled IPA and replaced with generated IPA transcription of the previous row's example sentence. Added guard: if the cell text contains >=3 recognizable words (3+ letter alpha tokens), it's normal text, not garbled IPA. Garbled IPA is typically short and has no real dictionary words. Fixes: Row 13 C3 showing IPA instead of pronunciation hint text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 22:28:58 +02:00
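The guard added in 0599c72cc1 counts alphabetic tokens of three or more letters. A minimal sketch of that check, with an assumed regex and function name (the commit message only specifies the ">=3 tokens of 3+ letters" rule):

```python
import re

# Assumed definition of a "recognizable word": 3+ consecutive letters.
_ALPHA_TOKEN_RE = re.compile(r"[A-Za-zÄÖÜäöüß]{3,}")


def looks_like_normal_text(cell_text, min_words=3):
    """True if the cell has enough word-like tokens to rule out garbled IPA."""
    return len(_ALPHA_TOKEN_RE.findall(cell_text)) >= min_words
```

A caller would skip the IPA replacement whenever this returns True, so the pronunciation hint from the commit message stays untouched.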
Benjamin Admin	5fad2d420d	test+docs(rag): tests and developer docs for the RAG Landkarte (RAG map). 44 Vitest tests: JSON structure, industry mapping, applicability notes, document-type distribution, no duplicates. MkDocs page: architecture, 10 industries, mapping logic, integration into other projects, data sources. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 20:47:54 +02:00
Benjamin Admin	c8e5e498b5	feat(rag): Applicability Notes UI + industry review. Matrix rows are expandable: a click shows the industry-relevance explanation, description, and validity date. 27 industry mappings corrected: OWASP/NIST/CISA/SBOM standards → all (customers develop software); BSI-TR-03161 → empty (DiGA, not a target market); BSI 200-4, ENISA Supply Chain → all (CRA/NIS2 obligation); EAA/BFSG → +automotive (digital interfaces). 264 horizontal, 42 sector-specific, 14 not applicable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 19:15:01 +02:00
Benjamin Admin	261f686dac	Add OCR Pipeline Extensions developer docs + update vocab-worksheet docs. New: .claude/rules/ocr-pipeline-extensions.md: complete documentation for SmartSpellChecker, Box-Grid-Review (Step 11), Ansicht/Spreadsheet (Step 12), Unified Grid; all 14 pipeline steps listed; backend/frontend file structure with line counts; 66 tests documented; API endpoints, data flow, formatting rules. Updated: .claude/rules/vocab-worksheet.md: added Frontend Refactoring section (page.tsx → 14 files); updated format extension instructions (constants.ts instead of page.tsx). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 18:35:16 +02:00
Benjamin Admin	3d3c2b30db	Add tests for unified_grid and cv_box_layout. test_unified_grid.py (10 tests): dominant row height calculation (regular, gaps filtered, single row); box classification (full-width, partial left/right, text line count); unified grid building (content-only, box integration, cell tagging). test_box_layout.py (13 tests): layout classification (header_only, flowing, bullet_list); line grouping by y-proximity; flowing layout indent grouping (bullet + continuations → \n); row/column field completeness for GridTable compatibility. Total: 66 tests passing (43 smart_spell + 13 box_layout + 10 unified). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 18:18:52 +02:00
Benjamin Admin	1d22f649ae	fix(rag): industries corrected to 10 VDMA/VDA/BDI sectors. Replaced the old 17 "industries" (incl. IoT, AI, HR, KRITIS) with 10 real industrial sectors: automotive, mechanical engineering, electrical engineering, chemicals, metals, energy, transport, trade, consumer goods, construction. Mapping logic: 244 horizontal (all), 65 sector-specific, 11 not applicable (finance/medical/platforms). 102 applicability_notes with a rationale per regulation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:56:28 +02:00
Benjamin Admin	610825ac14	SpreadsheetView: add bullet marker (•) for multi-line cells. Multi-line cells (containing \n) that don't already start with a bullet character get • prepended in the frontend. This ensures bullet points are visible regardless of whether the backend inserted them (depends on when boxes were last rebuilt). Skips header rows and cells that already have •, -, or – prefix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:53:54 +02:00
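The rule from 610825ac14 lives in the TypeScript frontend. The sketch below restates it in Python purely to show the decision logic; the function name, the `is_header` flag, and the space after the bullet are assumptions.

```python
BULLET_PREFIXES = ("•", "-", "–")


def with_bullet_marker(text, is_header=False):
    """Prepend a bullet to multi-line cells that do not already start with one."""
    if is_header or "\n" not in text:
        return text  # header rows and single-line cells are left as-is
    if text.lstrip().startswith(BULLET_PREFIXES):
        return text  # already has a bullet or dash prefix
    return "• " + text
```

Only multi-line cells without an existing marker are changed, so a cell that the backend already bulleted is not marked twice.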
Benjamin Admin	6aec4742e5	SpreadsheetView: keep bullets as single cells with text-wrap. Revert row expansion: multi-line bullet cells stay as single cells with \n and text-wrap (tb='2'). This way the text reflows when the user resizes the column, like normal Excel behavior. Row height auto-scales by line count (24px * lines). Vertical alignment: top (vt=0) for multi-line cells. Removed leading-space indentation hack (didn't work reliably). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:07:07 +02:00
Benjamin Admin	0491c2eb84	feat(rag): dynamic industry-regulation matrix from JSON. Replaced hardcoded REGULATIONS/INDUSTRIES/INDUSTRY_REGULATION_MAP with a JSON import. 320 documents in 17 categories with collapsible sections per doc_type. page.tsx reduced from 3672 to 2655 lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:01:51 +02:00
Benjamin Admin	f2bc62b4f5	SpreadsheetView: bullet indentation, expanded rows, box borders. Multi-line cells (\n): expanded into separate rows so each line gets its own cell. Continuation lines (after •) indented with leading spaces. Bullet marker lines (•) are bold. Font-size detection: cells with word_box height >1.3x median get bold and larger font (fs=12) for box titles. Headers: is_header rows always bold with light background tint. Box borders: thick colored outside border + thin inner grid lines. Content zone: light gray grid borders. Auto-fit column widths from longest text per column. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:15:43 +02:00
Benjamin Admin	674c9e949e	SpreadsheetView: auto-fit column widths to longest text. Column widths now calculated from the longest text in each column (~7.5px per character + padding). Takes the maximum of auto-fit width and scaled original pixel width. Multi-line cells: uses the longest line for width calculation. Spanning header cells excluded from width calculation (they span multiple columns and would inflate single-column widths). Minimum column width: 60px. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 09:43:50 +02:00
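The width rule from 674c9e949e can be written down as a short calculation. The component itself is TypeScript; this Python sketch only restates the arithmetic. The 7.5 px per character and the 60 px minimum come from the commit message, while the padding value and the function name are assumptions. Spanning header cells are expected to be excluded by the caller.

```python
CHAR_PX = 7.5        # approximate width per character (from the commit message)
MIN_WIDTH_PX = 60    # minimum column width (from the commit message)
PADDING_PX = 16      # assumed padding; the commit does not state the value


def auto_fit_width(cell_texts, scaled_original_px=0.0):
    """Column width from the longest single line, floored by the other limits."""
    longest = max(
        (len(line) for text in cell_texts for line in text.split("\n")),
        default=0,
    )
    fitted = longest * CHAR_PX + PADDING_PX
    return max(fitted, scaled_original_px, MIN_WIDTH_PX)
```

For multi-line cells only the longest line counts, so a wrapped translation does not inflate the column beyond what its widest line needs.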
Benjamin Admin	e131aa719e	SpreadsheetView: formatting improvements for Excel-like display. Height: sheet height auto-calculated from row count (26px/row + toolbar), no more cutoff at 21 rows. Row count set to exact (no padding). Box borders: thick colored outside border + thin inner grid lines. Content zone: light gray grid lines on all cells. Headers: bold (bl=1) for is_header rows. Larger font detected via word_box height comparison (>1.3x median → fs=12 + bold). Box cells: light tinted background from box_bg_hex. Header cells in boxes: slightly stronger tint. Multi-line cells: text wrap enabled (tb='2'), \n preserved. Bullet points (•) and indentation preserved in cell text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 09:29:50 +02:00
Benjamin Admin	17f0fdb2ed	Refactor: extract _build_grid_core into grid_build_core.py + clean StepAnsicht. grid_editor_api.py: 2411 → 474 lines. Extracted _build_grid_core() (1892 lines) into grid_build_core.py; the API file now only contains endpoints (build, save, get, gutter, box, unified). StepAnsicht.tsx: 212 → 112 lines. Removed useGridEditor imports (not needed for read-only spreadsheet); removed unified grid fetch/build (not used with multi-sheet approach); removed Spreadsheet/Grid toggle (only spreadsheet mode now). Simple: fetch grid-editor data → pass to SpreadsheetView. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 08:54:55 +02:00
Benjamin Admin	d4353d76fb	SpreadsheetView: multi-sheet tabs instead of unified single sheet. Each zone becomes its own Excel sheet tab with independent column widths: Sheet "Vokabeln" (main content zone with EN/DE/example columns); Sheet "Pounds and euros" (Box 1 with its own 4-column layout); Sheet "German leihen" (Box 2 with single column for flowing text). This solves the column-width conflict: boxes have different column widths optimized for their content, which is impossible in a single unified sheet (Excel limitation: column width is per-column, not per-cell). Sheet tabs visible at bottom (showSheetTabs: true). Box sheets get colored tab (from box_bg_hex). First sheet active by default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:51:21 +02:00
Benjamin Admin	b42f394833	Integrate Fortune Sheet spreadsheet editor in StepAnsicht CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 36s Details CI / test-go-edu-search (push) Successful in 31s Details CI / test-python-klausur (push) Failing after 2m40s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 33s Details Install @fortune-sheet/react (MIT, v1.0.4) as Excel-like spreadsheet component. New SpreadsheetView.tsx converts unified grid data to Fortune Sheet format (celldata, merge config, column/row sizes). StepAnsicht now has Spreadsheet/Grid toggle: - Spreadsheet mode: full Fortune Sheet with toolbar (bold, italic, color, borders, merge cells, text wrap, undo/redo) - Grid mode: existing GridTable for quick editing Box-origin cells get light tinted background in spreadsheet view. Colspan cells converted to Fortune Sheet merge format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:08:03 +02:00
Benjamin Admin	c1a903537b	Unified Grid: merge all zones into single Excel-like grid CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 32s Details CI / test-go-edu-search (push) Successful in 45s Details CI / test-python-klausur (push) Failing after 2m35s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 33s Details Backend (unified_grid.py): - build_unified_grid(): merges content + box zones into one zone - Dominant row height from median of content row spacings - Full-width boxes: rows integrated directly - Partial-width boxes: extra rows inserted when box has more text lines than standard rows fit (e.g., 7 lines in 5-row height) - Box-origin cells tagged with source_zone_type + box_region metadata Backend (grid_editor_api.py): - POST /sessions/{id}/build-unified-grid → persists as unified_grid_result - GET /sessions/{id}/unified-grid → retrieve persisted result Frontend: - GridEditorCell: added source_zone_type, box_region fields - GridTable: box-origin cells get tinted background + left border - StepAnsicht: split-view with original image (left) + editable unified GridTable (right). Auto-builds on first load. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 23:37:55 +02:00
Benjamin Admin	7085c87618	StepAnsicht: dominant row height for content + proportional box rows CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 33s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m35s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 31s Details Content sections: use dominant (median) row height from all content rows instead of per-section average. This ensures uniform row height above and below boxes (the standard case on textbook pages). Box sections: distribute height proportionally by text line count per row. A header (1 line) gets 1/7 of box height, a bullet with 3 lines gets 3/7. Fixes Box 2 where row 3 was cut off because even distribution didn't account for multi-line cells. Removed overflow:hidden from box container to prevent clipping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:43:02 +02:00
Benjamin Admin	1b7e095176	StepAnsicht: fix row filtering for partial-width boxes CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 27s Details CI / test-python-klausur (push) Failing after 2m34s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 36s Details Content rows were incorrectly filtered out when their Y overlapped with a box, even if the box only covered the right half of the page. Now checks both Y AND X overlap — rows are only excluded if they start within the box's horizontal range. Fixes: rows next to Box 2 (lend, coconut, taste) were missing from reconstruction because Box 2 (x=871, w=525) only covers the right side, but left-side content rows at x≈148 were being filtered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:00:28 +02:00
Benjamin Admin	dcb873db35	StepAnsicht: section-based layout with averaged row heights CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 38s Details CI / test-go-edu-search (push) Successful in 38s Details CI / test-python-klausur (push) Failing after 2m28s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 40s Details Major rewrite of reconstruction rendering: - Page split into vertical sections (content/box) around box boundaries - Content sections: uniform row height = (last_row - first_row) / (n-1) - Box sections: rows evenly distributed within box height - Content rows positioned absolutely at original y-coordinates - Font size derived from row height (55% of row height) - Multi-line cells (bullets) get expanded height with indentation - Boxes render at exact bbox position with colored border - Preparation for unified grid where boxes become part of main grid Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:29:40 +02:00
Benjamin Admin	fd39d13d06	StepAnsicht: use server-rendered OCR overlay image CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 40s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m38s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 24s Details Replace manual word_box positioning (wild/unsnapped) with the server-rendered words-overlay image from the OCR step endpoint. This shows the same cleanly snapped red letters as the OCR step. Endpoint: /sessions/{id}/image/words-overlay Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 23:26:54 +02:00
Benjamin Admin	c5733a171b	StepAnsicht: fix font size and row spacing to match original CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 43s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-nodejs-website (push) Has been cancelled Details CI / test-python-agent-core (push) Has been cancelled Details CI / test-python-klausur (push) Has been cancelled Details - Font: use font_size_suggestion_px * scale directly (removed 0.85 factor) - Row height: calculate from row-to-row spacing (y_min of next row minus y_min of current row) instead of text height (y_max - y_min). This produces correct line spacing matching the original layout. - Multi-line cells: height multiplied by line count Content zone should now span from ~250 to ~2050 matching the original. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 23:24:27 +02:00
Benjamin Admin	18213f0bde	StepAnsicht: split-view with coordinate grid for comparison CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-python-klausur (push) Failing after 2m37s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 36s Details Left panel: Original scan + OCR word overlay (red text at exact word_box positions) + coordinate grid Right panel: Reconstructed layout + same coordinate grid Features: - Coordinate grid toggle with 50/100/200px spacing options - Grid lines labeled with pixel coordinates in original image space - Both panels share the same scale for direct visual comparison - OCR overlay shows detected text in red mono font at original positions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 23:00:22 +02:00
Benjamin Admin	cd8eb6ce46	Add Ansicht step (Step 12) — read-only page layout preview CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 39s Details CI / test-go-edu-search (push) Successful in 49s Details CI / test-python-klausur (push) Failing after 2m33s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 36s Details New pipeline step showing the reconstructed page with all zones positioned at their original coordinates: - Content zones with vocabulary grid cells - Box zones with colored borders (from structure detection) - Colspan cells rendered across multiple columns - Multi-line cells (bullets) with pre-wrap whitespace - Toggle to overlay original scan image at 15% opacity - Proportionally scaled to viewport width - Pure CSS positioning (no canvas/Fabric.js) Pipeline: 14 steps (0-13), Ground Truth moved to Step 13. Added colspan field to GridEditorCell type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 22:42:33 +02:00
Benjamin Admin	2c2bdf903a	Fix GridTable: replace ternary chain with IIFE for cell rendering CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 44s Details CI / test-go-edu-search (push) Successful in 36s Details CI / test-python-klausur (push) Failing after 2m28s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 31s Details Chained ternary (colored ? div : multiline ? textarea : input) caused webpack SWC parser issues. Replaced with IIFE {(() => { if/return })()} which is more robust and readable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 18:10:22 +02:00
Benjamin Admin	947ff6bdcb	Fix JSX ternary nesting for textarea/input in GridTable CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 41s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m32s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 28s Details Remove extra curly braces around the textarea/input ternary that caused webpack syntax error. The ternary is now a chained condition: hasColoredWords ? <div> : text.includes('\n') ? <textarea> : <input> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 18:02:22 +02:00
Benjamin Admin	92e4021898	Fix GridTable JSX syntax error in colspan rendering CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 31s Details CI / test-go-edu-search (push) Successful in 42s Details CI / test-python-klausur (push) Failing after 2m43s Details CI / test-python-agent-core (push) Successful in 33s Details CI / test-nodejs-website (push) Successful in 39s Details Mismatched closing tags from previous colspan edit caused webpack build failure. Cleaned up spanning cell map() return structure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:52:26 +02:00
Benjamin Admin	108f1b1a2a	GridTable: render multi-line cells with textarea CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 46s Details CI / test-go-edu-search (push) Successful in 46s Details CI / test-python-klausur (push) Failing after 2m53s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 34s Details Cells containing \n (bullet items with continuation lines) now use <textarea> instead of <input type=text>, making all lines visible. Row height auto-expands based on line count in the cell. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:17:29 +02:00
Benjamin Admin	48de4d98cd	Fix infinite loop in StepBoxGridReview auto-build CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 41s Details CI / test-go-edu-search (push) Successful in 35s Details CI / test-python-klausur (push) Failing after 2m41s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 35s Details Auto-build was triggering on every grid.zones.length change, which happens on every rebuild (zone indices increment). Now uses a ref to ensure auto-build fires only once. Also removed boxZones.length===0 condition that could trigger unnecessary builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:06:11 +02:00
Benjamin Admin	b5900f1aff	Bullet indentation detection: group continuation lines into bullets CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m49s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 34s Details Flowing/bullet_list layout now analyzes left-edge indentation: - Lines at minimum indent = bullet start / main level - Lines indented >15px more = continuation (belongs to previous bullet) - Continuation lines merged with \n into parent bullet cell - Missing bullet markers (•) auto-added when pattern is clear Example: 7 OCR lines → 3 items (1 header + 2 bullets × 3 lines each) "German leihen" header, then two bullet groups with indented examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:57:16 +02:00
Benjamin Admin	baac98f837	Filter false-positive boxes in header/footer margins CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 55s Details CI / test-go-edu-search (push) Successful in 1m0s Details CI / test-python-klausur (push) Failing after 2m35s Details CI / test-python-agent-core (push) Successful in 27s Details CI / test-nodejs-website (push) Successful in 27s Details Boxes whose vertical center falls within top/bottom 7% of image height are filtered out (page numbers, unit headers, running footers). At typical scan resolutions, 7% ≈ 2.5cm margin. Fixes: "Box 1" containing just "3" from "Unit 3" page header being incorrectly treated as an embedded box. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 14:38:53 +02:00
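
The margin filter described in commit baac98f837 can be sketched as follows. This is a minimal illustration, not the repository's code: the `Box` type, the function name `filter_margin_boxes`, and the `margin_ratio` parameter are hypothetical. Only the 7% threshold and the vertical-center rule come from the commit message, and keeping boxes that sit exactly on the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical bounding box in image pixel coordinates."""
    x: int
    y: int
    w: int
    h: int


def filter_margin_boxes(
    boxes: List[Box], image_height: int, margin_ratio: float = 0.07
) -> List[Box]:
    """Drop boxes whose vertical center lies in the top or bottom margin.

    margin_ratio=0.07 mirrors the 7% figure from the commit message.
    Boxes centered exactly on a threshold are kept (an assumption).
    """
    top_limit = image_height * margin_ratio
    bottom_limit = image_height * (1.0 - margin_ratio)
    kept = []
    for box in boxes:
        center_y = box.y + box.h / 2
        if top_limit <= center_y <= bottom_limit:
            kept.append(box)
    return kept


# Example: a page-header fragment, a real embedded box, and a footer.
page_boxes = [
    Box(x=1200, y=40, w=80, h=60),     # "3" from a "Unit 3" page header
    Box(x=871, y=900, w=525, h=300),   # embedded box on the right half
    Box(x=100, y=1900, w=200, h=50),   # running footer
]
content_boxes = filter_margin_boxes(page_boxes, image_height=2000)
```

Filtering on the box center rather than its top edge means a tall box that merely starts near the page edge is still kept, while small header and footer fragments are dropped.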

620 Commits