breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	49a36364a8	Add double-page split support to OCR Overlay (Kombi 7 Schritte) The page-split detection was only implemented in the regular pipeline page but not in the OCR Overlay page where the user actually tests with Kombi mode. Now the overlay page has full sub-session support: - openSession: handles sub_sessions, parent_session_id, skip logic for page-split vs crop-based sub-sessions, preserves current mode - handleOrientationComplete: async, fetches API to detect sub-sessions - BoxSessionTabs: shown between stepper and step content - handleNext: returns to parent after sub-session completion - handleSessionChange/handleBoxSessionsCreated: session switching Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 11:48:26 +01:00
Benjamin Admin	14fd8e0b1e	Fix page-split: fetch sub-sessions from API instead of React state handleOrientationComplete was checking subSessions from React state, but due to batching the state was still empty when the user clicked "Seiten verarbeiten". Now fetches session data directly from the API to reliably detect sub-sessions and auto-open the first one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 11:22:15 +01:00
Benjamin Admin	247b79674d	Add double-page spread detection to frontend pipeline After orientation detection, the frontend now automatically calls the page-split endpoint. When a double-page book spread is detected, two sub-sessions are created and each goes through the full pipeline (deskew/dewarp/crop) independently, which is essential because each page of a spread tilts differently due to the spine. Frontend changes: - StepOrientation: calls POST /page-split after orientation, shows split info ("Doppelseite erkannt"), notifies parent of sub-sessions - page.tsx: distinguishes page-split sub-sessions (current_step < 5) from crop-based sub-sessions (current_step >= 5). Page-split subs only skip orientation, not deskew/dewarp/crop. - page.tsx: handleOrientationComplete opens first sub-session when page-split was detected Backend changes (orientation_crop_api.py): - page-split endpoint falls back to original image when orientation rotated a landscape spread to portrait - start_step parameter: 1 if split from original, 2 if from oriented Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 11:09:44 +01:00
Benjamin Admin	65f4ce1947	feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns - New ImageLayoutEditor: SVG overlay on original scan with draggable column dividers, horizontal guidelines (margins/header/footer), double-click to add columns, x-button to delete - GridTable: MIN_COL_WIDTH 40→80px for better readability - Arrow up/down keys navigate between rows in the grid editor - Ctrl+Click for multi-cell selection, Ctrl+B to toggle bold on selection - getAdjacentCell works for cells that don't exist yet (new rows/cols) - deleteColumn now merges x-boundaries correctly - Session restore fix: grid_editor_result/structure_result in session GET - Footer row 3-state cycle, auto-create cells for empty footer rows - Grid save/build/GT-mark now advance current_step=11 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 07:45:39 +01:00
Benjamin Admin	4e668660a7	feat: add Woerterbuch category + column add/delete in grid editor - New document category "Woerterbuch" (frontend type + backend validation) - Column delete: hover column header → red "x" button (with confirmation) - Column add: hover column header → "+" button inserts after that column - Both operations support undo/redo, update cell IDs and summary - Available in both GridEditor and StepGridReview (Kombi last step) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 16:27:12 +01:00
Benjamin Admin	7a6eadde8b	feat: integrate Ground Truth review into Kombi Pipeline last step - New StepGridReview component: split-view (scan image left, grid right), confidence stats, row-accept buttons, zoom controls - Kombi Pipeline case 6 now uses StepGridReview instead of plain GridEditor - Kombi step label changed to "Review & GT" - Ground Truth queue page simplified to overview/navigation only (links to Kombi pipeline for actual review work) - Deep-link support: /ai/ocr-overlay?session=xxx&mode=kombi Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 15:04:23 +01:00
Benjamin Admin	4e809c3860	fix: ground-truth crash on col_type + remove AIToolsSidebarResponsive from model-management - Ground-truth: zone.columns use 'label' not 'col_type'; calling .replace() on undefined crashed the page after grid data loaded - Model-management: same AIToolsSidebarResponsive wrapper bug as the other pages (it does not render children) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 10:14:02 +01:00
Benjamin Admin	dccbb909bc	fix: remove AIToolsSidebarResponsive wrapper from ground-truth and regression pages AIToolsSidebarResponsive does not accept children; it renders only a sidebar nav. Using it as a wrapper caused page content to never render. Replaced with plain div, matching the pattern used by ocr-pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 09:57:52 +01:00
Benjamin Admin	be7f5f1872	feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management D2: TrOCR ONNX export script (printed + handwritten, int8 quantization) D3: PP-DocLayout ONNX export script (download or Docker-based conversion) B3: Model Management admin page (PyTorch vs ONNX status, benchmarks, config) A4: TrOCR ONNX service with runtime routing (auto/pytorch/onnx via TROCR_BACKEND) A5: PP-DocLayout ONNX detection with OpenCV fallback (via GRAPHIC_DETECT_BACKEND) B4: Structure Detection UI toggle (OpenCV vs PP-DocLayout) with class color coding C3: TrOCR-ONNX.md documentation C4: OCR-Pipeline.md ONNX section added C5: mkdocs.yml nav updated, optimum added to requirements.txt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 09:53:02 +01:00
Benjamin Admin	c695b659fb	fix: PagePurpose props on ground-truth and regression pages Both pages passed `moduleId` which is not a valid prop for PagePurpose. The component expects explicit title/purpose/audience; calling audience.join() on undefined caused the client-side crash. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 09:43:36 +01:00
Benjamin Admin	a1e079b911	feat: Sprint 1: IPA hardening, regression framework, ground-truth review Track A (Backend): - Compound word IPA decomposition (schoolbag→school+bag) - Trailing garbled IPA fragment removal after brackets (R21 fix) - Regression runner with DB persistence, history endpoints - Page crop determinism verified with tests Track B (Frontend): - OCR Regression dashboard (/ai/ocr-regression) - Ground Truth Review workflow (/ai/ocr-ground-truth) with split-view, confidence highlighting, inline edit, batch mark, progress tracking Track C (Docs): - OCR-Pipeline.md v5.0 (Steps 5e-5h) - Regression testing guide - mkdocs.yml nav update Track D (Infra): - TrOCR baseline benchmark script - run-regression.sh shell script - Migration 008: regression_runs table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 09:21:27 +01:00
Benjamin Admin	f9d71d50d1	Add exclude region marking in Structure step Users can now draw rectangles on the document image in the Structure Detection step to mark areas (e.g. header graphics, alphabet strips) that should be excluded from OCR results during grid building. - Backend: PUT/DELETE endpoints for exclude regions stored in structure_result - Backend: _build_grid_core() filters all words inside user-defined exclude regions - Frontend: Interactive rectangle drawing with visual overlay and delete buttons - Preserve exclude regions when re-running structure detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 09:08:30 +01:00
Benjamin Admin	a3e2a7f994	Add GT button to OCR overlay, prominent category picker, track pipeline - Ground Truth button on last step of Pipeline/Kombi modes in ocr-overlay - Prominent category picker in active session info bar (pulses when unset) - GT badge shown when session has ground truth reference - Backend: auto-detect pipeline from ocr_engine, store in GT snapshot - Pipeline info shown in GT session list and regression reports - Also pass pipeline param from ocr-pipeline StepGroundTruth Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 14:49:02 +01:00
Benjamin Admin	b1cdb2531c	feat: CSS Grid editor with OCR-measured column widths and row heights Backend: add layout_metrics (avg_row_height_px, font_size_suggestion_px) to build-grid response for faithful grid reconstruction. Frontend: rewrite GridTable from HTML <table> to CSS Grid layout. Column widths are now proportional to the OCR-measured x_min/x_max positions. Row heights use the average content row height from the scan. Column and row resize via drag handles (Excel-like). Font: add Noto Sans (supports IPA characters) via next/font/google. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 13:48:47 +01:00
Benjamin Admin	729ebff63c	feat: add border ghost filter + graphic detection tests + structure overlay - Add _filter_border_ghost_words() to remove OCR artefacts from box borders (vertical + horizontal edge detection, column cleanup, re-indexing) - Add 20 tests for border ghost filter (basic filtering + column cleanup) - Add 24 tests for cv_graphic_detect (color detection, word overlap, boxes) - Clean up cv_graphic_detect.py logging (per-candidate → DEBUG) - Add structure overlay layer to StepReconstruction (boxes + graphics toggle) - Show border_ghosts_removed badge in StepStructureDetection - Update MkDocs with structure detection documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 18:28:53 +01:00
Benjamin Admin	3aa4a63257	fix: move Struktur step after OCR so word boxes are available for exclusion Graphic detection needs word positions to exclude text from the ink mask. Previously Struktur ran before OCR, causing every word to be detected as a graphic element. Now: - Pipeline: Struktur at index 7 (after Wörter) - Kombi: Struktur at index 5 (after PP-OCRv5+Tesseract, before Tabelle) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 13:38:58 +01:00
Benjamin Admin	6b9b280ba3	feat: integrate graphic element detection into structure step Add cv_graphic_detect.py for detecting non-text visual elements (arrows, circles, lines, exclamation marks, icons, illustrations). Draw detected graphics on structure overlay image and display them in the frontend StepStructureDetection component with shape counts and individual listings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 13:21:55 +01:00
Benjamin Admin	1d34785e2b	feat: add Structure step to Kombi mode in OCR Overlay page Insert the Struktur detection step between Zuschneiden and PP-OCRv5+Tesseract in the Kombi pipeline on /ai/ocr-overlay. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 12:59:05 +01:00
Benjamin Admin	5b5213c2b9	feat: add Structure Detection step to OCR pipeline New pipeline step between Crop and Columns that visualizes detected document structure: boxes (line-based + shading), page zones, and color regions. Shows original image on the left, annotated overlay on the right. Backend: POST /detect-structure endpoint + /image/structure-overlay Frontend: StepStructureDetection component with zone/box/color details Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 12:31:09 +01:00
Benjamin Admin	4a8d43fd71	feat: display detected text colors in grid editor UI - Add color/color_name/recovered fields to OcrWordBox type - GridTable: show colored text + left-edge color indicator strip - GridEditor: show color stats and recovered count in summary bar Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 01:03:09 +01:00
Benjamin Admin	c3f1547e32	feat: add Excel-like grid editor for OCR overlay (Kombi mode step 6) Backend: new grid_editor_api.py with build-grid endpoint that detects bordered boxes, splits page into zones, clusters columns/rows per zone from Kombi word positions. New DB column grid_editor_result JSONB. Frontend: GridEditor component with editable HTML tables per zone, column bold toggle, header row toggle, undo/redo, keyboard navigation (Tab/Enter/Arrow), image overlay verification, and save/load. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 23:41:03 +01:00
Benjamin Admin	4a15d46dfd	refactor: rename PaddleOCR → PP-OCRv5 in frontend, remove Kombi-Vergleich tab Since ocr_region_paddle() now runs RapidOCR locally (same PP-OCRv5 models), the "PaddleOCR (Hetzner)" labels were misleading. Renamed to "PP-OCRv5 (lokal)". Removed the Kombi-Vergleich tab since both sides would produce identical results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 09:11:26 +01:00
Benjamin Admin	a994ddee83	feat: add Kombi-Vergleich mode for side-by-side Paddle vs RapidOCR comparison Add /rapid-kombi backend endpoint using local RapidOCR + Tesseract merge, KombiCompareStep component for parallel execution and side-by-side overlay, and wordResultOverride prop on OverlayReconstruction for direct data injection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 07:59:06 +01:00
Benjamin Admin	e9ccd1e35c	feat: add Kombi-Modus (PaddleOCR + Tesseract) for OCR Overlay Runs both OCR engines on the preprocessed image and merges results: word boxes matched by IoU, coordinates averaged by confidence weight. Unmatched Tesseract words (bullets, symbols) are added for better coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 20:05:50 +01:00
Benjamin Admin	c743a38eaf	fix: Paddle Direct keeps preprocessing (orient/deskew/dewarp/crop) Uses the cropped/dewarped image instead of the original so the overlay shows the correctly oriented page. 5 steps instead of 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 16:56:18 +01:00
Benjamin Admin	90c1efd9b0	feat: Paddle Direct: 1-click OCR without deskew/dewarp/crop New 2-step mode (Upload → PaddleOCR+Overlay) alongside the existing 7-step pipeline. Backend endpoint runs PaddleOCR on the original image and clusters words into rows/cells directly. Frontend adds a mode toggle and PaddleDirectStep component. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 16:41:55 +01:00
Benjamin Admin	3cc496f7f3	feat(rag): Update Verbraucherschutz docs + chunk counts + Landkarte - Update chunk counts for 8 successfully ingested DE laws (Phase H1) - Add 6 new BGB-Teile entries (AGB, Fernabsatz, Kaufrecht, Widerruf, Digital) - Add EGBGB Widerrufsbelehrung entry - Update COLLECTION_TOTALS: gesetze 58304→63567 (+5263 Phase H chunks) - Add Verbraucherschutz thematic group to Landkarte - Extend ecommerce industry map with consumer protection regulations - Update date to March 2026 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 09:54:20 +01:00
Benjamin Admin	2fdf3ff868	feat(rag): Register Verbraucherschutz laws + EU directives in RAG constants Add 15 new regulations from Phase H ingestion: - DE: PAngV, VSBG, ProdHaftG, VerpackG, ElektroG, BattDG, BFSG, UWG, GewO - EU: Warenkauf-RL, Klausel-RL, UGP-RL, Preisangaben-RL, Omnibus-RL, BattVO Chunk counts set to 0 (will be updated after successful ingestion). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 06:43:19 +01:00
Benjamin Admin	0ee92e7210	feat: OCR word_boxes for pixel-accurate overlay positioning Backend: _ocr_cell_crop now stores word_boxes with the exact Tesseract/RapidOCR word coordinates (left, top, width, height) in the cell result. These are absolute image coordinates, already mapped back. Frontend: the slide hook uses word_boxes directly when present, so each word is placed exactly at its OCR position. No pixel scanning needed. Falls back to the old slide logic when no boxes are available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 19:39:49 +01:00
Benjamin Admin	8a60f4bf30	fix: position overlay cells without _heal_row_gaps (skip_heal_gaps) _heal_row_gaps shifts cell positions after artifact rows are removed, which causes a visible offset in the overlay (e.g. 23px for "badge"). The new skip_heal_gaps parameter in build_cell_grid_v2 and the words endpoint preserves the exact row positions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 08:59:50 +01:00
Benjamin Admin	e3ee1de790	Revert "fix: skip row regularization in the overlay (generic for mixed content)" This reverts commit `b91f799ccf`.	2026-03-11 08:44:07 +01:00
Benjamin Admin	b91f799ccf	fix: skip row regularization in the overlay (generic for mixed content) Pages with info boxes (different row height) cause _regularize_row_grid to distort the row positions. The new skip_regularize parameter uses the gap-based rows instead, which follow the actual page geometry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 08:29:06 +01:00
Benjamin Admin	2cbdfc56f3	feat: OCR Overlay: full-page reconstruction without column detection New route /ai/ocr-overlay with a simplified 7-step pipeline (orientation, deskew, dewarp, crop, rows, words, overlay). Reuses the existing step components and skips columns/LLM review/ground truth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 00:08:05 +01:00
Benjamin Admin	9047339f0d	fix: sub-sessions start directly at columns, skipping preprocessing Box sub-sessions already have a cropped image. Orientation, deskew, dewarp and crop are marked as skipped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 22:51:16 +01:00
Benjamin Admin	2592ef233b	feat: frontend sub-sessions (boxes) in OCR pipeline UI - BoxSessionTabs: tab bar for switching between main and box sessions - StepColumnDetection: box info + "Box-Sessions erstellen" button - page.tsx: session switching, sub-session state, auto-return after completion - types.ts: SubSession, PageZone, extended SessionInfo/ColumnResult Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 20:33:59 +01:00
Benjamin Admin	156a818246	refactor: move crop after deskew/dewarp + content-based book-scan crop New pipeline order: orientation → deskew → dewarp → crop → columns... Crop now operates on the already straightened image, which gives better results. page_crop.py completely replaced: adaptive threshold + 4-edge detection (book-spine shadow on the left, ink projection for all margins) instead of Otsu + largest contour. Backend: step numbers, input images and reprocess cascade adjusted. Frontend: PIPELINE_STEPS reordered, switch cases and before images updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 08:52:11 +01:00
Benjamin Admin	2763631711	feat: Orientation + crop as steps 1-2 in OCR pipeline Two new wizard steps before deskew: - Step 1: Orientation detection (0/90/180/270° via Tesseract OSD) - Step 2: Page-margin detection and cropping (remove scanner borders) Backend: - orientation_crop_api.py: POST /orientation, POST /crop, POST /crop/skip - page_crop.py: detect_and_crop_page() with format detection (A4/A5/Letter) - Session store: orientation_result, crop_result fields - Pipeline uses the cropped image for deskew/dewarp Frontend: - StepOrientation.tsx: upload + auto-orientation + before/after - StepCrop.tsx: auto-crop + format badge + skip option - Pipeline stepper: 10 steps (was 8) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 23:55:23 +01:00
Benjamin Admin	931ab92c92	feat: Integrate orientation detection into OCR pipeline deskew detect_and_fix_orientation() now runs before the deskew step in the OCR pipeline, so scans rotated by 90/180/270° are corrected automatically. Frontend shows the orientation correction as an info banner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 22:31:36 +01:00
Benjamin Admin	0f821afb23	feat(sbom): Lehrer-specific: 17 core/compliance entries removed, descriptions adjusted Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 20:34:20 +01:00
Benjamin Admin	2ad391e4e4	feat: Fine-tuning with 7 sliders for deskew/dewarp New collapsible panel under dewarp with individual sliders: - 3 rotation sliders (P1 Iterative, P2 Word-Alignment, P3 Textline) - 4 shear sliders (methods A-D) with radio selection - Combined preview and ground-truth saving - Backend: POST /sessions/{id}/adjust-combined endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 18:22:33 +01:00
Benjamin Admin	b9c3c47a37	refactor: LLM Compare removed entirely, Video/Voice/Alerts sidebar added - LLM Compare pages, configs, and all references deleted - "Kommunikation" category in sidebar with Video & Chat, Voice Service, Alerts - Compliance SDK category removed from sidebar Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:34:54 +01:00
Benjamin Admin	9912997187	refactor: Jitsi/Matrix/Voice taken over from Core, Camunda/BPMN deleted, Kommunikation nav - Voice service moved from Core to Lehrer (bp-lehrer-voice-service) - 4 Jitsi services + 2 Synapse services added to docker-compose.yml - Camunda deleted entirely: workflow pages, workflow-config.ts, bpmn-js deps - CAMUNDA_URL removed from backend-lehrer environment - Sidebar: categories "Compliance SDK" + "Katalogverwaltung" removed - Sidebar: new category "Kommunikation" with Video & Chat, Voice Service, Alerts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 17:01:47 +01:00
Benjamin Admin	9ea77ba157	fix: Abschliessen button returns to session list on last pipeline step handleNext() did nothing on the last step (early return). Now resets session, steps and navigates back to the session overview. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 15:05:48 +01:00
Benjamin Admin	cd12755da6	feat: OCR umlaut confusion correction + bold detection via stroke-width - Add umlaut confusion rules (i→ü, a→ä, o→ö, u→ü) to _spell_fix_token for German text; fixes "iberqueren" → "überqueren" etc. - Add _detect_bold() using OpenCV stroke-width analysis on cell crops - Integrate bold detection in both narrow (cell-crop) and broad (word-lookup) paths - Add is_bold field to GridCell TypeScript interface - Render bold text in StepGroundTruth reconstruction view Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 12:06:57 +01:00
Benjamin Admin	e6858010c2	feat: RAG Chunk Browser, all collections + 59 EDPB/WP29/DSFA entries - rag-constants.ts: 11 → 59 EDPB/WP29/EDPS + 20 DSFA mandatory lists (Muss-Listen) - ChunkBrowserQA: dropdown expanded from 3 to 7 collections (+ bp_dsfa_corpus, bp_compliance_recht, bp_legal_templates, bp_nibis_eh) - page.tsx: collection totals updated (datenschutz 17459, dsfa 8666) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 11:01:14 +01:00
Benjamin Admin	1cc69d6b5e	feat: OCR pipeline step 8: validation view with image detection & generation Replaces the stub StepGroundTruth with a full side-by-side Original vs Reconstruction view. Adds VLM-based image region detection (qwen2.5vl), mflux image generation proxy, sync scroll/zoom, manual region drawing, and score/notes persistence. New backend endpoints: detect-images, generate-image, validate, get validation. New standalone mflux-service (scripts/mflux-service.py) for Metal GPU generation. Dockerfile.base: adds fonts-liberation (Apache-2.0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 10:40:37 +01:00
Benjamin Admin	293e7914d8	feat: improved OCR pipeline session manager with categories, thumbnails, pipeline logging - Add document_category (10 types) and pipeline_log JSONB columns - Session list: thumbnails, copyable IDs, category/doc_type badges - Inline category dropdown, bulk delete, pipeline step logging - New endpoints: thumbnail, delete-all, pipeline-log, categories - Cleared all 22 old test sessions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 09:44:38 +01:00
Benjamin Admin	9cbf0fb278	fix: Fake Compliance Advisor removed from Lehrer AI admin The Compliance Advisor belongs in the Compliance SDK (macmini:3007/sdk/agents), not in the Lehrer admin. The remaining 5 agents (TutorAgent, GraderAgent, QualityJudge, AlertAgent, Orchestrator) are kept. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:15:50 +01:00
Benjamin Admin	29c74a9962	feat: cell-first OCR + document type detection + dynamic pipeline steps Cell-First OCR (v2): Each cell is cropped and OCR'd in isolation, eliminating neighbour bleeding (e.g. "to", "ps" in marker columns). Uses ThreadPoolExecutor for parallel Tesseract calls. Document type detection: Classifies pages as vocab_table, full_text, or generic_table using projection profiles (<2s, no OCR needed). Frontend dynamically skips columns/rows steps for full-text pages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:52:38 +01:00
Benjamin Admin	c484a89b78	fix: dewarp UI shows detection details, quality gate status, confidence bars - Add DewarpDetection type with per-method results - Expand method labels for all 4 detectors (A-D) - Show green/amber banner: applied vs quality-gate-rejected - Expandable "Details" panel showing all 4 methods with confidence bars - Visual confidence bars instead of plain percentage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 08:39:55 +01:00

89 Commits