breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	6a9eb048da	Fix: use generic callback type for startActivity in useSessionHandlers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 14:38:24 +02:00
Benjamin Admin	c92f2dc7a7	Fix: startActivity parameter type mismatch in useSessionHandlers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 14:29:40 +02:00
Benjamin Admin	4e27c9b35a	Fix: conversation.muted also needs nullish coalescing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 14:16:25 +02:00
Benjamin Admin	8631971821	Fix: conversation.pinned may be undefined, use nullish coalescing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 14:02:13 +02:00
Benjamin Admin	713f0e7570	Fix: Rename .ts files containing JSX to .tsx - studio-v2/app/meet/_components/types.ts → types.tsx - website/app/admin/mail/constants.ts → constants.tsx - website/app/admin/compliance/types.ts → types.tsx Turbopack requires .tsx extension for files with JSX syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 13:42:03 +02:00
Benjamin Admin	df3f6e65c2	Fix: Remove unused useState import in EHWizardSteps Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 13:13:11 +02:00
Benjamin Admin	37db47fcd9	[guardrail-change] Install pre-commit LOC budget hook - Rewrote scripts/check-loc.sh: fixed macOS compat, added --staged mode, optimized --all mode with find+wc pipeline - Added .git/hooks/pre-commit that runs check-loc.sh --staged - Extended loc-exceptions.txt with glob patterns for test files (test) and blog content pages (blog/*/page.tsx) The hook blocks commits containing staged files >500 LOC unless exempted. Bypass for emergencies: git commit --no-verify Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 10:28:33 +02:00
Benjamin Admin	bd4b956e3c	[split-required] Split final 43 files (500-668 LOC) to complete refactoring klausur-service (11 files): - cv_gutter_repair, ocr_pipeline_regression, upload_api - ocr_pipeline_sessions, smart_spell, nru_worksheet_generator - ocr_pipeline_overlays, mail/aggregator, zeugnis_api - cv_syllable_detect, self_rag backend-lehrer (17 files): - classroom_engine/suggestions, generators/quiz_generator - worksheets_api, llm_gateway/comparison, state_engine_api - classroom/models (→ 4 submodules), services/file_processor - alerts_agent/api/wizard+digests+routes, content_generators/pdf - classroom/routes/sessions, llm_gateway/inference - classroom_engine/analytics, auth/keycloak_auth - alerts_agent/processing/rule_engine, ai_processor/print_versions agent-core (5 files): - brain/memory_store, brain/knowledge_graph, brain/context_manager - orchestrator/supervisor, sessions/session_manager admin-lehrer (5 components): - GridOverlay, StepGridReview, DevOpsPipelineSidebar - DataFlowDiagram, sbom/wizard/page website (2 files): - DependencyMap, lehrer/abitur-archiv Other: nibis_ingestion, grid_detection_service, export-doclayout-onnx Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:41:42 +02:00
Benjamin Admin	451365a312	[split-required] Split remaining 500-680 LOC files (final batch) website (17 pages + 3 components): - multiplayer/wizard, middleware/wizard+test-wizard, communication - builds/wizard, staff-search, voice, sbom/wizard - foerderantrag, mail/tasks, tools/communication, sbom - compliance/evidence, uni-crawler, brandbook (already done) - CollectionsTab, IngestionTab, RiskHeatmap backend-lehrer (5 files): - letters_api (641 → 2), certificates_api (636 → 2) - alerts_agent/db/models (636 → 3) - llm_gateway/communication_service (614 → 2) - game/database already done in prior batch klausur-service (2 files): - hybrid_vocab_extractor (664 → 2) - klausur-service/frontend: api.ts (620 → 3), EHUploadWizard (591 → 2) voice-service (3 files): - bqas/rag_judge (618 → 3), runner (529 → 2) - enhanced_task_orchestrator (519 → 2) studio-v2 (6 files): - korrektur/[klausurId] (578 → 4), fairness (569 → 2) - AlertsWizard (552 → 2), OnboardingWizard (513 → 2) - korrektur/api.ts (506 → 3), geo-lernwelt (501 → 2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:56:45 +02:00
Benjamin Admin	b4613e26f3	[split-required] Split 500-850 LOC files (batch 2) backend-lehrer (10 files): - game/database.py (785 → 5), correction_api.py (683 → 4) - classroom_engine/antizipation.py (676 → 5) - llm_gateway schools/edu_search already done in prior batch klausur-service (12 files): - orientation_crop_api.py (694 → 5), pdf_export.py (677 → 4) - zeugnis_crawler.py (676 → 5), grid_editor_api.py (671 → 5) - eh_templates.py (658 → 5), mail/api.py (651 → 5) - qdrant_service.py (638 → 5), training_api.py (625 → 4) website (6 pages): - middleware (696 → 8), mail (733 → 6), consent (628 → 8) - compliance/risks (622 → 5), export (502 → 5), brandbook (629 → 7) studio-v2 (3 components): - B2BMigrationWizard (848 → 3), CleanupPanel (765 → 2) - dashboard-experimental (739 → 2) admin-lehrer (4 files): - uebersetzungen (769 → 4), manager (670 → 2) - ChunkBrowserQA (675 → 6), dsfa/page (674 → 5) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:24:01 +02:00
Benjamin Admin	34da9f4cda	[split-required] Split 700-870 LOC files across all services backend-lehrer (11 files): - llm_gateway/routes/schools.py (867 → 5), recording_api.py (848 → 6) - messenger_api.py (840 → 5), print_generator.py (824 → 5) - unit_analytics_api.py (751 → 5), classroom/routes/context.py (726 → 4) - llm_gateway/routes/edu_search_seeds.py (710 → 4) klausur-service (12 files): - ocr_labeling_api.py (845 → 4), metrics_db.py (833 → 4) - legal_corpus_api.py (790 → 4), page_crop.py (758 → 3) - mail/ai_service.py (747 → 4), github_crawler.py (767 → 3) - trocr_service.py (730 → 4), full_compliance_pipeline.py (723 → 4) - dsfa_rag_api.py (715 → 4), ocr_pipeline_auto.py (705 → 4) website (6 pages): - audit-checklist (867 → 8), content (806 → 6) - screen-flow (790 → 4), scraper (789 → 5) - zeugnisse (776 → 5), modules (745 → 4) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:01:18 +02:00
Benjamin Admin	b6983ab1dc	[split-required] Split 500-1000 LOC files across all services backend-lehrer (5 files): - alerts_agent/db/repository.py (992 → 5), abitur_docs_api.py (956 → 3) - teacher_dashboard_api.py (951 → 3), services/pdf_service.py (916 → 3) - mail/mail_db.py (987 → 6) klausur-service (5 files): - legal_templates_ingestion.py (942 → 3), ocr_pipeline_postprocess.py (929 → 4) - ocr_pipeline_words.py (876 → 3), ocr_pipeline_ocr_merge.py (616 → 2) - KorrekturPage.tsx (956 → 6) website (5 pages): - mail (985 → 9), edu-search (958 → 8), mac-mini (950 → 7) - ocr-labeling (946 → 7), audit-workspace (871 → 4) studio-v2 (5 files + 1 deleted): - page.tsx (946 → 5), MessagesContext.tsx (925 → 4) - korrektur (914 → 6), worksheet-cleanup (899 → 6) - useVocabWorksheet.ts (888 → 3) - Deleted dead page-original.tsx (934 LOC) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 23:35:37 +02:00
Benjamin Admin	6811264756	[split-required] Split final batch of monoliths >1000 LOC Python (6 files in klausur-service): - rbac.py (1,132 → 4), admin_api.py (1,012 → 4) - routes/eh.py (1,111 → 4), ocr_pipeline_geometry.py (1,105 → 5) Python (2 files in backend-lehrer): - unit_api.py (1,226 → 6), game_api.py (1,129 → 5) Website (6 page files): - 4x klausur-korrektur pages (1,249-1,328 LOC each) → shared components in website/components/klausur-korrektur/ (17 shared files) - companion (1,057 → 10), magic-help (1,017 → 8) All re-export barrels preserve backward compatibility. Zero import errors verified. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 23:17:30 +02:00
Benjamin Admin	b2a0126f14	[split-required] Split remaining Python monoliths (Phase 1 continued) klausur-service (7 monoliths): - grid_editor_helpers.py (1,737 → 5 files: columns, filters, headers, zones) - cv_cell_grid.py (1,675 → 7 files: build, legacy, streaming, merge, vocab) - worksheet_editor_api.py (1,305 → 4 files: models, AI, reconstruct, routes) - legal_corpus_ingestion.py (1,280 → 3 files: registry, chunking, ingestion) - cv_review.py (1,248 → 4 files: pipeline, spell, LLM, barrel) - cv_preprocessing.py (1,166 → 3 files: deskew, dewarp, barrel) - rbac.py, admin_api.py, routes/eh.py remain (next batch) backend-lehrer (1 monolith): - classroom_engine/repository.py (1,705 → 7 files by domain) All re-export barrels preserve backward compatibility. Zero import errors verified. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 22:47:59 +02:00
Benjamin Admin	0b37c5e692	[split-required] Split website + studio-v2 monoliths (Phase 3 continued) Website (14 monoliths split): - compliance/page.tsx (1,519 → 9), docs/audit (1,262 → 20) - quality (1,231 → 16), alerts (1,203 → 10), docs (1,202 → 11) - i18n.ts (1,173 → 8 language files) - unity-bridge (1,094 → 12), backlog (1,087 → 6) - training (1,066 → 8), rag (1,063 → 8) - Deleted index_original.ts (4,899 LOC dead backup) Studio-v2 (5 monoliths split): - meet/page.tsx (1,481 → 9), messages (1,166 → 9) - AlertsB2BContext.tsx (1,165 → 5 modules) - alerts-b2b/page.tsx (1,019 → 6), korrektur/archiv (1,001 → 6) All existing imports preserved. Zero new TypeScript errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 17:52:36 +02:00
Benjamin Admin	b681ddb131	[split-required] Split 58 monoliths across Python, Go, TypeScript (Phases 1-3) Phase 1 — Python (klausur-service): 5 monoliths → 36 files - dsfa_corpus_ingestion.py (1,828 LOC → 5 files) - cv_ocr_engines.py (2,102 LOC → 7 files) - cv_layout.py (3,653 LOC → 10 files) - vocab_worksheet_api.py (2,783 LOC → 8 files) - grid_build_core.py (1,958 LOC → 6 files) Phase 2 — Go (edu-search-service, school-service): 8 monoliths → 19 files - staff_crawler.go (1,402 → 4), policy/store.go (1,168 → 3) - policy_handlers.go (700 → 2), repository.go (684 → 2) - search.go (592 → 2), ai_extraction_handlers.go (554 → 2) - seed_data.go (591 → 2), grade_service.go (646 → 2) Phase 3 — TypeScript (admin-lehrer): 45 monoliths → 220+ files - sdk/types.ts (2,108 → 16 domain files) - ai/rag/page.tsx (2,686 → 14 files) - 22 page.tsx files split into _components/ + _hooks/ - 11 component files split into sub-components - 10 SDK data catalogs added to loc-exceptions - Deleted dead backup index_original.ts (4,899 LOC) All original public APIs preserved via re-export facades. Zero new errors: Python imports verified, Go builds clean, TypeScript tsc --noEmit shows only pre-existing errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 17:28:57 +02:00
Benjamin Admin	9ba420fa91	Fix: Remove broken getKlausurApiUrl and clean up empty lines sed replacement left orphaned hostname references in story page and empty lines in getApiBase functions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 16:02:04 +02:00
Benjamin Admin	b07f802c24	Fix: Use Next.js API proxy to avoid mixed-content/CORS errors HTTPS pages cannot fetch from HTTP backend ports. Added Next.js API route proxies for /api/vocabulary, /api/learning-units, /api/progress that forward to backend-lehrer internally (same Docker network, HTTP). All frontend pages now use same-origin requests (getApiBase = '') instead of direct port:8001 connections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 15:49:52 +02:00
Benjamin Admin	0dbfa87058	Fix: pg_trgm optional, table creation no longer fails without it Trigram extension and index are now created in a separate try/catch so table creation succeeds even without pg_trgm. Search falls back to ILIKE when trigram functions are not available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 13:51:09 +02:00
Benjamin Admin	c0b723e3b5	Fix: asyncpg needs postgresql:// not postgresql+asyncpg:// Strip SQLAlchemy dialect prefix from DATABASE_URL for asyncpg. Set search_path via server_settings on pool creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 13:45:26 +02:00
Benjamin Admin	7ff9860c69	Add Vocabulary Learning Platform (Phase 1: DB + API + Editor) Strategic pivot: Studio-v2 becomes a language learning platform. Compliance guardrail added to CLAUDE.md: no scan/OCR of third-party content in customer frontend. Upload of OWN materials remains allowed. Phase 1.1 (vocabulary_db.py): PostgreSQL model for 160k+ words with english, german, IPA, syllables, examples, images, audio, difficulty, tags, translations (multilingual). Trigram search index. Phase 1.2 (vocabulary_api.py): Search, browse, filters, bulk import, learning unit creation from word selection. Creates QA items with enhanced fields (IPA, syllables, image, audio) for flashcards. Phase 1.3 (/vocabulary page): Search bar with POS/difficulty filters, word cards with audio buttons, unit builder sidebar. Teacher selects words → creates learning unit → redirects to flashcards. Sidebar: Added "Woerterbuch" (/vocabulary) and "Lernmodule" (/learn). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 13:36:28 +02:00
Benjamin Admin	7fc5464df7	Switch Vision-LLM Fusion to llama3.2-vision:11b qwen2.5vl:32b needs ~100GB RAM and crashes Ollama. llama3.2-vision:11b is already installed and fits in memory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:44:59 +02:00
Benjamin Admin	5fbf0f4ee2	Fix: _merge_paddle_tesseract takes 2 args not 4 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:33:49 +02:00
Benjamin Admin	2f8270f77b	Add Vision-LLM OCR Fusion (Step 4) for degraded scans New module vision_ocr_fusion.py: Sends scan image + OCR word coordinates + document type to Qwen2.5-VL 32B. The LLM reads the image visually while using OCR positions as structural hints. Key features: - Document-type-aware prompts (Vokabelseite, Woerterbuch, etc.) - OCR words grouped into lines with x/y coordinates in prompt - Low-confidence words marked with (?) for LLM attention - Continuation row merging instructions in prompt - JSON response parsing with markdown code block handling - Fallback to original OCR on any error Frontend (admin-lehrer Grid Review): - "Vision-LLM" checkbox toggle - "Typ" dropdown (Vokabelseite, Woerterbuch, etc.) - Steps 1-3 defaults set to inactive Activate: Check "Vision-LLM", select document type, click "OCR neu + Grid". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:24:22 +02:00
Benjamin Admin	00eb9f26f6	Add "OCR neu + Grid" button to Grid Review New endpoint POST /sessions/{id}/rerun-ocr-and-build-grid that: 1. Runs scan quality assessment 2. Applies CLAHE enhancement if degraded (controlled by enhance toggle) 3. Re-runs dual-engine OCR (RapidOCR + Tesseract) with min_conf filter 4. Merges OCR results and stores updated word_result 5. Builds grid with max_columns constraint Frontend: Orange "OCR neu + Grid" button in GridToolbar. Unlike "Neu berechnen" (which only rebuilds grid from existing words), this button re-runs the full OCR pipeline with quality settings. Now CLAHE toggle actually has an effect: it enhances the image before OCR runs, not after. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:55:01 +02:00
Benjamin Admin	141f69ceaa	Fix: max_columns now works in OCR Kombi build-grid pipeline The max_columns parameter was only implemented in cv_words_first.py (vocab-worksheet path) but NOT in _build_grid_core which is what the admin OCR Kombi pipeline uses. The Kombi pipeline uses grid_editor_helpers._cluster_columns_by_alignment() which has its own column detection. Fix: Post-processing step 5k merges narrowest columns after grid building when zone has more columns than max_columns. Cells from merged columns get their text appended to the target column. min_conf word filtering was already working (applied before grid build). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:40:39 +02:00
Benjamin Admin	2baad68060	Remove A/B testing toggles from studio-v2 (customer frontend) Dev-only toggles belong in admin-lehrer (port 3002) only. The customer frontend runs the pipeline with optimal defaults and shows only the finished results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:18:44 +02:00
Benjamin Admin	25e5a7415a	Add A/B testing toggles to OCR Kombi Grid Review Quality step toggles in admin-lehrer StepGridReview (port 3002): - CLAHE checkbox (Step 3: image enhancement) - MaxCol dropdown (Step 2: column limit, 0=off) - MinConf dropdown (Step 1: OCR confidence, 0=auto) Parameters flow through: StepGridReview → useGridEditor → build-grid endpoint → _build_grid_core. MinConf filters words before grid building. Toggle settings, click "Neu berechnen" to test each step individually. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:09:17 +02:00
Benjamin Admin	545c8676b0	Add A/B testing toggles for OCR quality steps Each quality improvement step can now be toggled independently: - CLAHE checkbox (Step 3: image enhancement on/off) - MaxCols dropdown (Step 2: 0=unlimited, 2-5) - MinConf dropdown (Step 1: auto/20/30/40/50/60) Backend: Query params enhance, max_cols, min_conf on process-single-page. Response includes active_steps dict showing which steps are enabled. Frontend: Toggle controls in VocabularyTab above the table. This allows empirical A/B testing of each step on the same scan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 15:27:26 +02:00
Benjamin Admin	2f34ee9ede	Add scan quality scoring, column limit, image enhancement (Steps 1-3) Step 1 (scan_quality.py): Laplacian blur + contrast scoring, adjusts OCR confidence threshold (40 for good scans, 30 for degraded). Quality report included in API response + shown in frontend. Step 2 (max_columns parameter in cv_words_first.py): limits column detection to 3 for vocab tables, preventing phantom columns D/E from degraded OCR fragments. Step 3 (ocr_image_enhance.py): CLAHE contrast + bilateral filter denoising + unsharp mask, only for degraded scans (gated by quality score). Pattern from handwriting_htr_api.py. Frontend: quality info shown in extraction status after processing. Reprocess button now derives pages from vocabulary data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 14:58:39 +02:00
Benjamin Admin	5a154b744d	fix: migrate ocr-pipeline types to ocr-kombi after page deletion Types from deleted ocr-pipeline/types.ts inlined into ocr-kombi/types.ts. All imports updated across components/ocr-kombi/ and components/ocr-pipeline/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 14:22:09 +02:00
Benjamin Admin	f39cbe9283	refactor: remove unused pages and backends (model-management, OCR legacy, GPU/vast.ai, video-chat, matrix) Deleted pages: - /ai/model-management (mock data only, no real backend) - /ai/ocr-compare (old /vocab/ backend, replaced by ocr-kombi) - /ai/ocr-pipeline (minimal session browser, redundant) - /ai/ocr-overlay (legacy monolith, redundant) - /ai/gpu (vast.ai GPU management, no longer used) - /infrastructure/gpu (same) - /communication/video-chat (moved to core) - /communication/matrix (moved to core) Deleted backends: - backend-lehrer/infra/vast_client.py + vast_power.py - backend-lehrer/meetings_api.py + jitsi_api.py - website/app/api/admin/gpu/ - edu-search-service/scripts/vast_ai_extractor.py Total: ~7,800 LOC removed. All code preserved in git history. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 13:14:12 +02:00
Benjamin Admin	5abdfa202e	chore: install refactoring guardrails (Phase 0) [guardrail-change] - scripts/check-loc.sh: LOC budget checker (500 LOC hard cap) - .claude/rules/architecture.md: split triggers, patterns per language - .claude/rules/loc-exceptions.txt: documented escape hatches - AGENTS.python.md: FastAPI conventions (routes thin, service layer) - AGENTS.go.md: Go/Gin conventions (handler ≤40 LOC) - AGENTS.typescript.md: Next.js conventions (page.tsx ≤250 LOC, colocation) - CLAUDE.md extended with guardrail section + commit markers 273 files currently exceed 500 LOC — to be addressed phase by phase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 12:25:36 +02:00
Benjamin Admin	9b0e310978	Fix: reprocess button works after session resume + apply merge logic Two bugs fixed: 1. reprocessPages() failed silently after session resume because successfulPages was empty. Now derives pages from vocabulary source_page or selectedPages as fallback. 2. process-single-page endpoint built vocabulary entries WITHOUT applying merge logic (_merge_wrapped_rows, _merge_continuation_rows). Now applies full merge pipeline after vocabulary extraction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 00:46:15 +02:00
Benjamin Admin	46c2acb2f4	Add "Neu verarbeiten" button to VocabularyTab Allows reprocessing pages from the vocabulary view to apply new merge logic without navigating back to page selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:37:13 +02:00
Benjamin Admin	b8f1b71652	Fix: merge cell-wrap continuation rows in vocabulary extraction When textbook authors wrap text within a cell (e.g. long German translations), OCR treats each physical line as a separate row. New _merge_wrapped_rows() detects this by checking if the primary column (EN) is empty, which indicates a continuation, not a new entry. Handles: empty EN + DE text, empty EN + example text, parenthetical continuations like "(bei)", triple wraps, comma-separated lists. 12 tests added covering all cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:32:45 +02:00
Benjamin Admin	6a165b36e5	Add Phase 5.1: LearningProgress dashboard widget Eltern-Dashboard widget showing per-unit learning stats: accuracy ring, coins, crowns, streak, and recent unit list. Uses ProgressRing and CrownBadge gamification components. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:26:44 +02:00
Benjamin Admin	9dddd80d7a	Add Phases 3.2-4.3: STT, stories, syllables, gamification Phase 3.2 (MicrophoneInput.tsx): Browser Web Speech API for speech-to-text recognition (EN+DE), integrated for pronunciation practice. Phase 4.1 (Story Generator): LLM-powered mini-stories using vocabulary words, with highlighted vocab in HTML output. Backend endpoint POST /learning-units/{id}/generate-story + frontend /learn/[unitId]/story. Phase 4.2 (SyllableBow.tsx): SVG arc component for syllable visualization under words, clickable for per-syllable TTS. Phase 4.3 (Gamification system): - CoinAnimation.tsx: Floating coin rewards with accumulator - CrownBadge.tsx: Crown/medal display for milestones - ProgressRing.tsx: Circular progress indicator - progress_api.py: Backend tracking coins, crowns, streaks per unit Also adds "Geschichte" exercise type button to UnitCard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:22:52 +02:00
Benjamin Admin	20a0585eb1	Add interactive learning modules MVP (Phases 1-3.1) New feature: After OCR vocabulary extraction, users can generate interactive learning modules (flashcards, quiz, type trainer) with one click. Frontend (studio-v2): - Fortune Sheet spreadsheet editor tab in vocab-worksheet - "Lernmodule generieren" button in ExportTab - /learn page with unit overview and exercise type cards - /learn/[unitId]/flashcards: Flip-card trainer with Leitner spaced repetition - /learn/[unitId]/quiz: Multiple choice quiz with explanations - /learn/[unitId]/type: Type-in trainer with Levenshtein distance feedback - AudioButton component using Web Speech API for EN+DE TTS Backend (klausur-service): - vocab_learn_bridge.py: Converts VocabularyEntry[] to analysis_data format - POST /sessions/{id}/generate-learning-unit endpoint Backend (backend-lehrer): - generate-qa, generate-mc, generate-cloze endpoints on learning units - get-qa/mc/cloze data retrieval endpoints - Leitner progress update + next review items endpoints Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:13:23 +02:00
Benjamin Admin	4561320e0d	Fix SmartSpellChecker: preserve leading non-alpha text like (= The tokenizer regex only matches alphabetic characters, so text before the first word match (like "(= " in "(= I won...") was silently dropped when reassembling the corrected text. Now preserves text[:first_match_start] as a leading prefix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:41:33 +02:00
Benjamin Admin	596864431b	Rule (a2): switch from allow-list to block-list for symbol removal Instead of keeping only specific symbols (_KEEP_SYMBOLS), now only removes explicitly decorative symbols (_REMOVE_SYMBOLS: > < ~ \ ^ etc). All other punctuation (= ( ) ; : - etc.) is preserved by default. This is more robust: any new symbol used in textbooks will be kept unless it's in the small block-list of known decorative artifacts. Fixes: (= token still being removed on page 5 despite being in the allow-list (possibly due to Unicode variants or whitespace). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:34:21 +02:00
Benjamin Admin	c8027eb7f9	Fix: preserve = ; : - and other meaningful symbols in word_boxes Rule (a2) in Step 5i removed word_boxes with no letters/digits as "graphic OCR artifacts". This incorrectly removed = signs used as definition markers in textbooks ("film = 1. Film; 2. filmen"). Added exception list _KEEP_SYMBOLS for meaningful punctuation: = (= =) ; : - – — / + • · ( ) & * → ← ↔ The root cause: PaddleOCR returns "film = 1. Film; 2. filmen" as one block, which gets split into word_boxes ["film", "=", "1.", ...]. The "=" word_box had no alphanumeric chars and was removed as artifact. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:18:35 +02:00
Benjamin Admin	ba0f659d1e	Preserve = and (= tokens in grid build and cell text cleanup = signs are used as definition markers in textbooks ("film = 1. Film"). They were incorrectly removed by two filters: 1. grid_build_core.py Step 5j-pre: _PURE_JUNK_RE matched "=" as artifact noise. Now exempts =, (=, ;, :, - and similar meaningful punctuation tokens. 2. cv_ocr_engines.py _is_noise_tail_token: "pure non-alpha" check removed trailing = tokens. Now exempts meaningful punctuation. Fixes: "film = 1. Film; 2. filmen" losing the = sign, "(= I won and he lost.)" losing the (=. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:04:27 +02:00
Benjamin Admin	50bfd6e902	Fix gutter repair: don't suggest corrections for words with parentheses Words like "probieren)" or "Englisch)" were incorrectly flagged as gutter OCR errors because the closing parenthesis wasn't stripped before dictionary lookup. The spellchecker then suggested "probierend" (replacing ) with d, edit distance 1). Two fixes: 1. Strip trailing/leading parentheses in _try_spell_fix before checking if the bare word is valid — skip correction if it is 2. Add )( to the rstrip characters in the analysis phase so "probieren)" becomes "probieren" for the known-word check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 22:38:22 +02:00
Benjamin Admin	0599c72cc1	Fix IPA continuation: don't replace normal text with IPA Text like "Betonung auf der 1. Silbe: profit ['profit]" was incorrectly detected as garbled IPA and replaced with generated IPA transcription of the previous row's example sentence. Added guard: if the cell text contains >=3 recognizable words (3+ letter alpha tokens), it's normal text, not garbled IPA. Garbled IPA is typically short and has no real dictionary words. Fixes: Row 13 C3 showing IPA instead of pronunciation hint text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 22:28:58 +02:00
Benjamin Admin	5fad2d420d	test+docs(rag): Tests and developer docs for RAG Landkarte - 44 Vitest tests: JSON structure, industry mapping, applicability notes, document-type distribution, no duplicates - MkDocs page: architecture, 10 industries, mapping logic, integration into other projects, data sources Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 20:47:54 +02:00
Benjamin Admin	c8e5e498b5	feat(rag): Applicability Notes UI + industry review - Matrix rows are expandable: a click shows the industry-relevance explanation, description, and validity date - 27 industry mappings corrected: - OWASP/NIST/CISA/SBOM standards → all (customers develop software) - BSI-TR-03161 → empty (DiGA, not a target market) - BSI 200-4, ENISA Supply Chain → all (CRA/NIS2 obligation) - EAA/BFSG → +automotive (digital interfaces) - 264 horizontal, 42 sector-specific, 14 not applicable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 19:15:01 +02:00
Benjamin Admin	261f686dac	Add OCR Pipeline Extensions developer docs + update vocab-worksheet docs New: .claude/rules/ocr-pipeline-extensions.md - Complete documentation for SmartSpellChecker, Box-Grid-Review (Step 11), Ansicht/Spreadsheet (Step 12), Unified Grid - All 14 pipeline steps listed - Backend/frontend file structure with line counts - 66 tests documented - API endpoints, data flow, formatting rules Updated: .claude/rules/vocab-worksheet.md - Added Frontend Refactoring section (page.tsx → 14 files) - Updated format extension instructions (constants.ts instead of page.tsx) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 18:35:16 +02:00
Benjamin Admin	3d3c2b30db	Add tests for unified_grid and cv_box_layout test_unified_grid.py (10 tests): - Dominant row height calculation (regular, gaps filtered, single row) - Box classification (full-width, partial left/right, text line count) - Unified grid building (content-only, box integration, cell tagging) test_box_layout.py (13 tests): - Layout classification (header_only, flowing, bullet_list) - Line grouping by y-proximity - Flowing layout indent grouping (bullet + continuations → \n) - Row/column field completeness for GridTable compatibility Total: 66 tests passing (43 smart_spell + 13 box_layout + 10 unified) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 18:18:52 +02:00
Benjamin Admin	1d22f649ae	fix(rag): Corrected industries to 10 VDMA/VDA/BDI sectors Replaced the old 17 "industries" (incl. IoT, AI, HR, KRITIS) with 10 real industrial sectors: automotive, mechanical engineering, electrical engineering, chemicals, metals, energy, transport, trade, consumer goods, construction. Mapping logic: 244 horizontal (all), 65 sector-specific, 11 not applicable (finance/medical/platforms). 102 applicability_notes with a justification per regulation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:56:28 +02:00


644 Commits