breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	156a818246	refactor: Move crop after deskew/dewarp + content-based book-scan crop New pipeline order: Orientation → Deskew → Dewarp → Crop → Columns... Crop now operates on the already-straightened image, which gives better results. page_crop.py completely replaced: adaptive threshold + 4-edge detection (book-spine shadow on the left, ink projection for all margins) instead of Otsu + largest contour. Backend: step numbers, input images, and reprocess cascade adjusted. Frontend: PIPELINE_STEPS reordered; switch cases and "before" images updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 08:52:11 +01:00
Benjamin Admin	2763631711	feat: Orientation + crop as steps 1-2 in OCR pipeline Two new wizard steps before deskew: - Step 1: Orientation detection (0/90/180/270° via Tesseract OSD) - Step 2: Page-margin detection and cropping (remove scanner borders) Backend: - orientation_crop_api.py: POST /orientation, POST /crop, POST /crop/skip - page_crop.py: detect_and_crop_page() with format detection (A4/A5/Letter) - Session store: orientation_result, crop_result fields - Pipeline uses the cropped image for deskew/dewarp Frontend: - StepOrientation.tsx: upload + auto-orientation + before/after - StepCrop.tsx: auto-crop + format badge + skip option - Pipeline stepper: 10 steps (was 8) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 23:55:23 +01:00
Benjamin Admin	931ab92c92	feat: Integrate orientation detection into OCR pipeline deskew detect_and_fix_orientation() now runs before the deskew step in the OCR pipeline, so scans rotated by 90/180/270° are corrected automatically. Frontend shows the orientation correction as an info banner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 22:31:36 +01:00
Benjamin Admin	0f821afb23	feat(sbom): Lehrer-specific: 17 core/compliance entries removed, descriptions adjusted Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 20:34:20 +01:00
Benjamin Admin	2ad391e4e4	feat: Fine-tuning with 7 sliders for deskew/dewarp New collapsible panel below the dewarp step with individual sliders: - 3 rotation sliders (P1 Iterative, P2 Word-Alignment, P3 Textline) - 4 shear sliders (methods A-D) with radio selection - Combined preview and ground-truth saving - Backend: POST /sessions/{id}/adjust-combined endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 18:22:33 +01:00
Benjamin Admin	b9c3c47a37	refactor: LLM Compare removed completely, Video/Voice/Alerts sidebar added - LLM Compare pages, configs, and all references deleted - "Kommunikation" category in sidebar with Video & Chat, Voice Service, Alerts - Compliance SDK category removed from sidebar Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:34:54 +01:00
Benjamin Admin	9912997187	refactor: Jitsi/Matrix/Voice taken over from Core, Camunda/BPMN deleted, Kommunikation nav - Voice service moved from Core to Lehrer (bp-lehrer-voice-service) - 4 Jitsi services + 2 Synapse services added to docker-compose.yml - Camunda deleted completely: workflow pages, workflow-config.ts, bpmn-js deps - CAMUNDA_URL removed from backend-lehrer environment - Sidebar: categories "Compliance SDK" + "Katalogverwaltung" removed - Sidebar: new category "Kommunikation" with Video & Chat, Voice Service, Alerts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:01:47 +01:00
Benjamin Admin	9ea77ba157	fix: Abschliessen button returns to session list on last pipeline step handleNext() did nothing on the last step (early return). Now resets session, steps and navigates back to the session overview. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 15:05:48 +01:00
Benjamin Admin	cd12755da6	feat: OCR umlaut confusion correction + bold detection via stroke-width - Add umlaut confusion rules (i→ü, a→ä, o→ö, u→ü) to _spell_fix_token for German text; fixes "iberqueren" → "überqueren" etc. - Add _detect_bold() using OpenCV stroke-width analysis on cell crops - Integrate bold detection in both narrow (cell-crop) and broad (word-lookup) paths - Add is_bold field to GridCell TypeScript interface - Render bold text in StepGroundTruth reconstruction view Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 12:06:57 +01:00
Benjamin Admin	e6858010c2	feat: RAG Chunk Browser: all collections + 59 EDPB/WP29/DSFA entries - rag-constants.ts: 11 → 59 EDPB/WP29/EDPS + 20 DSFA mandatory lists (Muss-Listen) - ChunkBrowserQA: dropdown extended from 3 to 7 collections (+ bp_dsfa_corpus, bp_compliance_recht, bp_legal_templates, bp_nibis_eh) - page.tsx: collection totals updated (datenschutz 17459, dsfa 8666) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 11:01:14 +01:00
Benjamin Admin	1cc69d6b5e	feat: OCR pipeline step 8: validation view with image detection & generation Replaces the stub StepGroundTruth with a full side-by-side Original vs Reconstruction view. Adds VLM-based image region detection (qwen2.5vl), mflux image generation proxy, sync scroll/zoom, manual region drawing, and score/notes persistence. New backend endpoints: detect-images, generate-image, validate, get validation. New standalone mflux-service (scripts/mflux-service.py) for Metal GPU generation. Dockerfile.base: adds fonts-liberation (Apache-2.0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 10:40:37 +01:00
Benjamin Admin	293e7914d8	feat: improved OCR pipeline session manager with categories, thumbnails, pipeline logging - Add document_category (10 types) and pipeline_log JSONB columns - Session list: thumbnails, copyable IDs, category/doc_type badges - Inline category dropdown, bulk delete, pipeline step logging - New endpoints: thumbnail, delete-all, pipeline-log, categories - Cleared all 22 old test sessions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 09:44:38 +01:00
Benjamin Admin	9cbf0fb278	fix: Fake Compliance Advisor removed from Lehrer KI-Admin The Compliance Advisor belongs in the Compliance SDK (macmini:3007/sdk/agents), not in Lehrer-Admin. The remaining 5 agents (TutorAgent, GraderAgent, QualityJudge, AlertAgent, Orchestrator) are kept. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:15:50 +01:00
Benjamin Admin	29c74a9962	feat: cell-first OCR + document type detection + dynamic pipeline steps Cell-First OCR (v2): Each cell is cropped and OCR'd in isolation, eliminating neighbour bleeding (e.g. "to", "ps" in marker columns). Uses ThreadPoolExecutor for parallel Tesseract calls. Document type detection: Classifies pages as vocab_table, full_text, or generic_table using projection profiles (<2s, no OCR needed). Frontend dynamically skips columns/rows steps for full-text pages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:52:38 +01:00
Benjamin Admin	c484a89b78	fix: dewarp UI shows detection details, quality gate status, confidence bars - Add DewarpDetection type with per-method results - Expand method labels for all 4 detectors (A-D) - Show green/amber banner: applied vs quality-gate-rejected - Expandable "Details" panel showing all 4 methods with confidence bars - Visual confidence bars instead of plain percentage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 08:39:55 +01:00
Benjamin Admin	a610bc75ba	fix: rename LLM-Korrektur to Korrektur in wizard stepper and types	2026-03-03 17:56:46 +01:00
Benjamin Admin	ccba2bb887	fix(ocr-pipeline): show sub-columns in reconstruction and LLM review steps - Add marker/bbox_marker fields to WordEntry type - Add page_ref/column_marker colors to StepReconstruction - Make StepLlmReview table dynamic based on columns_used metadata, showing all detected columns (EN, DE, Example, page_ref, marker) instead of hardcoded EN/DE/Beispiel only Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 10:36:27 +01:00
Benjamin Admin	dea3349b23	fix(ocr-pipeline): preserve sub-column data in vocab table display Three fixes for sub-columns disappearing at end of streaming: 1. Backend: add column_marker mapping in _cells_to_vocab_entries() so marker text is included in vocab entries (not silently dropped) 2. Frontend types: add source_page and bbox_ref to WordEntry interface 3. Frontend table: show page_ref column (Seite) in vocab table when entries have source_page data, instead of only EN/DE/Example Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 08:06:15 +01:00
Benjamin Admin	e718353d9f	feat(ocr-pipeline): 6 systematic improvements for robustness, performance & UX 1. Unit tests: 76 new parametrized tests for noise filter, phonetic detection, cell text cleaning, and row merging (116 total, all green) 2. Continuation-row merge: detect multi-line vocab entries where text wraps (lowercase EN + empty DE) and merge into previous entry 3. Empty DE fallback: secondary PSM=7 OCR pass for cells missed by PSM=6 4. Batch-OCR: collect empty cells per column, run single Tesseract call on column strip instead of per-cell (~66% fewer calls for 3+ empty cells) 5. StepReconstruction UI: font scaling via naturalHeight, empty EN/DE field highlighting, undo/redo (Ctrl+Z), per-cell reset button 6. Session reprocess: POST /sessions/{id}/reprocess endpoint to re-run from any step, with reprocess button on completed pipeline steps Also fixes pre-existing dewarp_image tuple unpacking bug in run_cv_pipeline and updates dewarp tests to match current (image, info) return signature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 14:46:38 +01:00
Benjamin Admin	dbf0db0c13	feat(ocr-pipeline): improve LLM review UI + add reconstruction step StepLlmReview: Show full vocab table with image overlay, row-level status tracking (pending/active/reviewed/corrected/skipped), and auto-scroll during SSE streaming. Load previous results on mount. StepReconstruction: New step 7 with editable text fields at original bbox positions over dewarped image. Zoom controls, tab navigation, color-coded columns, save to backend. Backend: Add POST /sessions/{id}/reconstruction endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 12:19:21 +01:00
Benjamin Admin	938d1d69cf	feat(ocr-pipeline): add LLM-based OCR correction step (Step 6) Replace the placeholder "Koordinaten" step with an LLM review step that sends vocab entries to qwen3:30b-a3b via Ollama for OCR error correction (e.g. "8en" → "Ben"). Teachers can review, accept/reject individual corrections in a diff table before applying them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 11:13:17 +01:00
Benjamin Admin	27b895a848	feat(ocr-pipeline): generic cell-grid with optional vocab mapping Extract build_cell_grid() as layout-agnostic foundation from build_word_grid(). Step 5 now produces a generic cell grid (columns x rows) and auto-detects whether vocab layout is present. Frontend dynamically switches between vocab table (EN/DE/Example) and generic cell table based on layout type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 17:22:56 +01:00
Benjamin Admin	854d8b431b	feat(rag-qa): add 14 missing PDF mappings for EDPB, ENISA, EDPS, TMG, UrhG Adds entries for all regulation codes in REGULATIONS_IN_RAG that were missing from RAG_PDF_MAPPING, fixing "Kein PDF-Mapping" messages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 11:10:09 +01:00
Benjamin Admin	954d21e469	fix: use local Inter font to avoid Google Fonts timeout in Docker build The Docker container cannot reach Google Fonts, causing build failures. Switch to bundled local font file using next/font/local. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:26:34 +01:00
Benjamin Admin	e3aa8e899e	feat(rag-qa): add fullscreen mode for split-view chunk browser Allows viewing chunks side-by-side with original PDF in fullscreen mode for large screen QA review. Toggle via button or close with Escape key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:23:32 +01:00
Benjamin Admin	266b9dfad3	Fix PDF 404: default to bp_compliance_ce collection, add PDF existence check Default collection changed from bp_compliance_gesetze (DE/AT/CH laws where PDFs need manual download) to bp_compliance_ce (EU regulations where PDFs are auto-downloaded). Added HEAD request check so missing PDFs show a clear "PDF nicht vorhanden" message instead of a 404 in the iframe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:13:26 +01:00
Benjamin Admin	b48cd8bb46	Fix ChunkBrowserQA layout: proper height constraints, remove bottom nav duplication - Root container uses calc(100vh - 220px) for fixed viewport height - All flex children use min-h-0 to enable proper overflow scrolling - Removed duplicate bottom nav buttons (Zurueck/Weiter) that appeared in the middle of the chunk text — navigation is only in the header now - Chunk text panel scrolls internally with fixed header - Added prominent article/section badges in header and panel header - Added chunk length quality indicator (warns on very short/long chunks) - Structural metadata keys (article, section, pages) sorted first - Sidebar shows regulation name instead of code for better readability - PDF viewer uses pages metadata from payload when available Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 20:24:50 +01:00
Benjamin Admin	e7fb9d59f1	Fix ChunkBrowserQA: use regulation_id from Qdrant payload instead of regulation_code The Qdrant collections use regulation_id (e.g. eu_2016_679) as the filter key, not regulation_code (e.g. GDPR). Updated rag-constants.ts with correct qdrant_id mappings from actual Qdrant data, fixed API to filter on regulation_id, and updated ChunkBrowserQA to pass qdrant_id values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:22:12 +01:00
Benjamin Admin	8c42fefa77	feat(rag): add QA Split-View Chunk-Browser for ingestion verification New ChunkBrowserQA component replaces inline chunk browser with: - Document sidebar with live chunk counts per regulation (batched Qdrant count API) - Sequential chunk navigation with arrow keys (1/N through all chunks of a document) - Overlap display showing previous/next chunk boundaries (amber-highlighted) - Split-view with original PDF via iframe (estimated page from chunk index) - Adjustable chunks-per-page ratio for PDF page estimation Extracts REGULATIONS_IN_RAG and REGULATION_INFO to shared rag-constants.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:46:11 +01:00
Benjamin Admin	45435f226f	feat(ocr-pipeline): line grouping fix + RapidOCR integration Fix A: Use _group_words_into_lines() with adaptive Y-tolerance to correctly order words in multi-line cells (fixes word reordering bug). RapidOCR: Add as alternative OCR engine (PaddleOCR models on ONNX Runtime, native ARM64). Engine selectable via dropdown in UI or ?engine= query param. Auto mode prefers RapidOCR when available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:13:58 +01:00
Benjamin Admin	17604b8eb2	test: add tests for API proxy scroll/collection-count and Chunk-Browser logic 42 tests covering: - Qdrant scroll endpoint proxy (offset, limit, filters, text search) - Collection-count endpoint - REGULATION_SOURCES URL validation (IFRS, EFRAG, ENISA, NIST, OECD) - Chunk-Browser collections, text search filtering, pagination state Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 16:46:42 +01:00
Benjamin Admin	491df4e1b0	feat: add Chunk-Browser tab to RAG page - New 'Chunk-Browser' tab for sequential chunk browsing - Qdrant scroll API proxy (scroll + collection-count actions) - Pagination with prev/next through all chunks in a collection - Text search filter with highlighting - Click to expand chunk and see all metadata - 'In Chunks suchen' button now navigates to Chunk-Browser with correct collection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:35:52 +01:00
Benjamin Admin	954103cdf2	feat(ocr-pipeline): add Step 5 word recognition (grid from columns × rows) Backend: build_word_grid() intersects column regions with content rows, OCRs each cell with language-specific Tesseract, and returns vocabulary entries with percent-based bounding boxes. New endpoints: POST /words, GET /image/words-overlay, ground-truth save/retrieve for words. Frontend: StepWordRecognition with overview + step-through labeling modes, goToStep callback for row correction feedback loop. MkDocs: OCR Pipeline documentation added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 02:18:29 +01:00
Benjamin Admin	47dc2e6f7a	feat(rag): source URLs, low-chunk warnings & IFRS/EFRAG entries - Add REGULATION_SOURCES map with 88 original document URLs for all regulations (EUR-Lex, gesetze-im-internet.de, RIS, Fedlex, etc.) - Render "Originalquelle →" link in regulation detail panel - Add amber warning indicator for suspiciously low chunk counts (<10) - Add EU_IFRS_DE, EU_IFRS_EN, EFRAG_ENDORSEMENT to RAG tracking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:56:09 +01:00
Benjamin Admin	b58aecd081	feat(ocr-pipeline): add Step 4 row detection UI in admin frontend Insert rows step between columns and words in the pipeline wizard. Shows overlay image, row list with type badges, and ground truth controls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:28:05 +01:00
Benjamin Admin	c7ae44ff17	feat(rag): add 42 new regulations to RAG overview + update collection totals New regulations across bp_compliance_ce (11), bp_compliance_gesetze (31), and bp_compliance_datenschutz (1). Collection totals updated: gesetze 58304, ce 18183, datenschutz 2448, total 103912. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:04:27 +01:00
Benjamin Admin	b03cb0a1e6	Fix Landkarte tab crash: variable name shadowed isInRag function Local variables named 'isInRag' shadowed the outer function, causing "isInRag is not a function" error. Renamed to regInRag/codeInRag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 00:01:01 +01:00
Benjamin Admin	5a45cbf605	Update RAG page: Chunks/Status columns use hardcoded data, Key Intersections show RAG status - Chunks column now uses getKnownChunks() instead of API-based getRegulationChunks() - Status column uses isInRag() check (green/red) instead of ratio-based calculation - Key Intersections chips show green/red with checkmark/cross based on RAG status Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 23:53:21 +01:00
Benjamin Admin	2297f66edb	feat(rag): Add RAG status indicators and 4 new EU regulations - Add REGULATIONS_IN_RAG Set tracking all 42 regulations currently in Qdrant - Add 4 new regulation entries: E-Commerce-RL, Verbraucherrechte-RL, Digitale-Inhalte-RL, DMA (all ingested Feb 2026) - Add RAG column to regulations table with green check/red x indicators - Update Landkarte tab: green/x on industry cards, thematic clusters, and regulation matrix - Replace old "Integrated Regulations" section with full RAG coverage overview - Update hardcoded chunk counts (Templates: 7689, NiBiS: 7996) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 23:23:52 +01:00
Benjamin Admin	587b066a40	feat(ocr-pipeline): ground-truth comparison tool for column detection Side-by-side view: auto result (readonly) vs GT editor where teacher draws correct columns. Diff table shows Auto vs GT with IoU matching. GT data persisted per session for algorithm tuning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:48:37 +01:00
Benjamin Admin	bb879a03a8	feat(ocr-pipeline): add column_ignore type for margins/empty areas Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:51:56 +01:00
Benjamin Admin	7a3570fe46	feat(ocr-pipeline): manual column editor for Step 3 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:27:54 +01:00
Benjamin Admin	1393a994f9	Flexible content-based column detection (2-phase) Replaces hardcoded position rules with a two-stage system: Phase A detects column geometry (clustering), Phase B classifies types by content (language/role) with a 3-level fallback chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 23:33:35 +01:00
Benjamin Admin	cf27a95308	feat(ocr-pipeline): word-based 5-column detection for vocabulary pages Replace projection-profile layout analysis with Tesseract word bounding box clustering to detect 5-column vocabulary layouts (page_ref, EN, DE, markers, examples). Falls back to projection profiles when < 3 clusters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 23:08:14 +01:00
Benjamin Admin	aa06ae0f61	feat: Persistent sessions (PostgreSQL) + column detection (Step 3) Sessions are now stored in PostgreSQL instead of in memory. New session list with name, date, step. Sessions survive browser refresh and container restart. Step 3 uses analyze_layout() for automatic column detection with a colored overlay. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 22:16:37 +01:00
Benjamin Admin	09b820efbe	refactor(dewarp): replace displacement map with affine shear correction The old displacement-map approach shifted entire rows by a parabolic profile, creating a circle/barrel distortion. The actual problem is a linear vertical shear: after deskew aligns horizontal lines, the vertical column edges are still tilted by ~0.5°. New approach: - Detect shear angle from strongest vertical edge slope (not curvature) - Apply cv2.warpAffine shear to straighten vertical features - Manual slider: -2.0° to +2.0° in 0.05° steps - Slider initializes to auto-detected shear angle - Ground truth question: "Spalten vertikal ausgerichtet?" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:23:04 +01:00
Benjamin Admin	9df745574b	fix(ocr-pipeline): dewarp visibility, grid on both sides, session persistence - Fix dewarp method selection: prefer methods with >5px curvature over higher confidence (vertical_edge 79px was being ignored for text_baseline 2px) - Add grid overlay on left image in Dewarp step for side-by-side comparison - Add GET /sessions/{id} endpoint to reload session data - StepDeskew accepts sessionId prop to restore state when navigating back - SessionInfo type extended with optional deskew_result and dewarp_result Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:29:53 +01:00
Benjamin Admin	589d2f811a	feat: Dewarp correction as step 2 in OCR pipeline (7 steps) Implements book-curvature dewarping with two methods: - Method A: vertical-edge analysis (Sobel + 2nd-degree polynomial) - Method B: text-line baseline (Tesseract + baseline curvature) The best method is chosen automatically; manual slider (-3 to +3). Backend: 3 new endpoints (auto/manual dewarp, ground truth) Frontend: StepDewarp + DewarpControls, pipeline from 6 to 7 steps Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:46:41 +01:00
Benjamin Admin	d552fd8b6b	feat: OCR pipeline with 6-step wizard for page reconstruction New route /ai/ocr-pipeline with step-by-step straightening (deskew), grid overlay, and ground truth. Steps 2-6 as placeholders. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:38:08 +01:00
Benjamin Boenisch	6a53f8d79c	refactor: Remove all SDK/compliance pages and API routes from admin-lehrer SDK/compliance content belongs exclusively in admin-compliance (port 3007). Removed: - All (sdk)/ pages (document-crawler, dsb-portal, industry-templates, multi-tenant, sso) - All api/sdk/ proxy routes - All developers/sdk/ documentation pages - Unused lib/sdk/ modules (kept: catalog-manager + its deps for dashboard) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 09:24:36 +01:00
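
Commit cd12755da6 above describes umlaut confusion rules (i→ü, a→ä, o→ö, u→ü) applied during spell fixing. The following is only a minimal sketch of how such a rule could work, not the repository's _spell_fix_token implementation: the function name fix_umlaut_confusion, the single-substitution strategy, and the tiny vocabulary are all assumptions made for illustration.

```python
# Hypothetical sketch: all names here are illustrative, not taken from the repo.
# Confusion pairs as listed in the commit message: OCR reads the plain vowel
# where the printed word has an umlaut.
UMLAUT_CONFUSIONS = {"i": "ü", "a": "ä", "o": "ö", "u": "ü"}


def fix_umlaut_confusion(token: str, vocabulary: set[str]) -> str:
    """Try one umlaut substitution per position; return the first known word.

    `vocabulary` is assumed to hold lowercase German words. If the token is
    already known, or no single substitution yields a known word, the token
    is returned unchanged.
    """
    if token.lower() in vocabulary:
        return token
    for pos, char in enumerate(token):
        replacement = UMLAUT_CONFUSIONS.get(char.lower())
        if replacement is None:
            continue
        if char.isupper():
            replacement = replacement.upper()
        candidate = token[:pos] + replacement + token[pos + 1:]
        if candidate.lower() in vocabulary:
            return candidate
    return token


# Tiny stand-in vocabulary (assumption); a real system would use a dictionary.
vocab = {"überqueren", "mädchen"}
print(fix_umlaut_confusion("iberqueren", vocab))
print(fix_umlaut_confusion("Madchen", vocab))
```

Checking against a vocabulary before accepting a substitution keeps the rule from rewriting words that are already valid; whether the actual code does this is not stated in the commit message.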


54 Commits
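
Commit 587b066a40 mentions a diff table that compares automatically detected columns with ground-truth (GT) columns using IoU matching. As a rough sketch under stated assumptions, the idea could look like the code below. Representing columns as (x_start, x_end) intervals, the greedy matching strategy, the 0.5 threshold, and the names interval_iou and match_columns are all assumptions and are not taken from the repository.

```python
# Hypothetical sketch: columns are modelled as 1-D (x_start, x_end) intervals.
Interval = tuple[float, float]


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two 1-D intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def match_columns(auto: list[Interval], gt: list[Interval], threshold: float = 0.5):
    """Greedily pair each auto column with its best unused GT column.

    Returns (auto_index, gt_index_or_None, best_iou) tuples; gt_index is None
    when no unused GT column reaches the threshold.
    """
    used: set[int] = set()
    matches = []
    for i, auto_col in enumerate(auto):
        best_j, best_score = None, 0.0
        for j, gt_col in enumerate(gt):
            if j in used:
                continue
            score = interval_iou(auto_col, gt_col)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            used.add(best_j)
            matches.append((i, best_j, best_score))
        else:
            matches.append((i, None, best_score))
    return matches


auto_columns = [(0, 100), (100, 300), (500, 600)]
gt_columns = [(0, 90), (120, 300)]
for row in match_columns(auto_columns, gt_columns):
    print(row)
```

A greedy pass is the simplest choice for a handful of columns per page; an optimal assignment (for example the Hungarian method) would be an alternative if columns overlap heavily.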