breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	8507e2e035	fix(ocr-pipeline): split oversized cells before OCR to capture all text For cells taller than 1.5× median row height, split vertically into sub-cells and OCR each separately. This fixes RapidOCR losing text at the bottom of tall cells (e.g. "floor/Fußboden" below "egg/Ei" in a merged row). Generic fix — works for any oversized cell. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 11:32:10 +01:00
Benjamin Admin	854d8b431b	feat(rag-qa): add 14 missing PDF mappings for EDPB, ENISA, EDPS, TMG, UrhG Adds entries for all regulation codes in REGULATIONS_IN_RAG that were missing from RAG_PDF_MAPPING, fixing "Kein PDF-Mapping" messages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 11:10:09 +01:00
Benjamin Admin	f2521d2b9e	feat(ocr-pipeline): British/American IPA pronunciation choice - Integrate Britfone dictionary (MIT, 15k British English IPA entries) - Add pronunciation parameter: 'british' (default) or 'american' - British uses Britfone (Received Pronunciation), falls back to CMU - American uses eng_to_ipa/CMU, falls back to Britfone - Frontend: dropdown to switch pronunciation, default = British - API: ?pronunciation=british|american query parameter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 11:08:52 +01:00
Benjamin Admin	954d21e469	fix: use local Inter font to avoid Google Fonts timeout in Docker build The Docker container cannot reach Google Fonts, causing build failures. Switch to bundled local font file using next/font/local. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:26:34 +01:00
Benjamin Admin	010616be5a	fix(ocr-pipeline): generic example attachment + cell padding 1. Semantic example matching: instead of attaching example sentences to the immediately preceding entry, find the vocab entry whose English word(s) appear in the example. "a broken arm" → matches "broken" via word overlap, not "egg/Ei". Uses stem matching for word form variants (break/broken share stem "bro"). 2. Cell padding: add 8px padding to each cell region so words at column/row edges don't get clipped by OCR (fixes "er wollte" missing at cell boundaries). 3. Treat very short DE text (≤2 chars) as OCR noise, not real translation — prevents false positives in example detection. All fixes are generic and deterministic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:24:28 +01:00
Benjamin Admin	e3aa8e899e	feat(rag-qa): add fullscreen mode for split-view chunk browser Allows viewing chunks side-by-side with original PDF in fullscreen mode for large screen QA review. Toggle via button or close with Escape key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:23:32 +01:00
Benjamin Admin	266b9dfad3	Fix PDF 404: default to bp_compliance_ce collection, add PDF existence check Default collection changed from bp_compliance_gesetze (DE/AT/CH laws where PDFs need manual download) to bp_compliance_ce (EU regulations where PDFs are auto-downloaded). Added HEAD request check so missing PDFs show a clear "PDF nicht vorhanden" message instead of a 404 in the iframe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:13:26 +01:00
Benjamin Admin	ab294d5a6f	feat(ocr-pipeline): deterministic post-processing pipeline Add 4 post-processing steps after OCR (no LLM needed): 1. Character confusion fix: I/1/l/| correction using cross-language context (if DE has "Ich", EN "1" → "I") 2. IPA dictionary replacement: detect [phonetics] brackets, look up correct IPA from eng_to_ipa (MIT, 134k words); replaces OCR'd phonetic symbols with dictionary-correct transcription 3. Comma-split: "break, broke, broken" / "brechen, brach, gebrochen" → 3 individual entries when part counts match 4. Example sentence attachment: rows with EN but no DE translation get attached as examples to the preceding vocab entry All fixes are deterministic and generic: no hardcoded word lists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:00:09 +01:00
Benjamin Admin	b48cd8bb46	Fix ChunkBrowserQA layout: proper height constraints, remove bottom nav duplication - Root container uses calc(100vh - 220px) for fixed viewport height - All flex children use min-h-0 to enable proper overflow scrolling - Removed duplicate bottom nav buttons (Zurueck/Weiter) that appeared in the middle of the chunk text — navigation is only in the header now - Chunk text panel scrolls internally with fixed header - Added prominent article/section badges in header and panel header - Added chunk length quality indicator (warns on very short/long chunks) - Structural metadata keys (article, section, pages) sorted first - Sidebar shows regulation name instead of code for better readability - PDF viewer uses pages metadata from payload when available Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 20:24:50 +01:00
Benjamin Admin	d481e0087b	deps: add eng-to-ipa for IPA dictionary lookup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 20:23:40 +01:00
Benjamin Admin	f7e0f2bb4f	feat(ocr-pipeline): line breaks, hyphen rejoin & oversized row splitting - Preserve \n between visual lines within cells (instead of joining with space) - Rejoin hyphenated words split across line breaks (e.g. Fuß-\nboden → Fußboden) - Split oversized rows (>1.5× median height) into sub-entries when EN/DE line counts match — deterministic fix for missed Step 4 row boundaries - Frontend: render \n as <br/>, use textarea for multiline editing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:49:28 +01:00
Benjamin Admin	e7fb9d59f1	Fix ChunkBrowserQA: use regulation_id from Qdrant payload instead of regulation_code The Qdrant collections use regulation_id (e.g. eu_2016_679) as the filter key, not regulation_code (e.g. GDPR). Updated rag-constants.ts with correct qdrant_id mappings from actual Qdrant data, fixed API to filter on regulation_id, and updated ChunkBrowserQA to pass qdrant_id values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:22:12 +01:00
Benjamin Admin	859342300e	fix(ocr-pipeline): configure RapidOCR for German + tighter word detection - Switch to PP-OCRv5 Latin model (supports ä, ö, ü, ß) - Use SERVER model for better accuracy - Lower Det.unclip_ratio 1.6→1.3 to reduce word merging - Raise Det.box_thresh 0.5→0.6 for stricter detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:17:49 +01:00
Benjamin Admin	8c42fefa77	feat(rag): add QA Split-View Chunk-Browser for ingestion verification New ChunkBrowserQA component replaces inline chunk browser with: - Document sidebar with live chunk counts per regulation (batched Qdrant count API) - Sequential chunk navigation with arrow keys (1/N through all chunks of a document) - Overlap display showing previous/next chunk boundaries (amber-highlighted) - Split-view with original PDF via iframe (estimated page from chunk index) - Adjustable chunks-per-page ratio for PDF page estimation Extracts REGULATIONS_IN_RAG and REGULATION_INFO to shared rag-constants.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:46:11 +01:00
Benjamin Admin	984dfab975	fix(ocr-pipeline): add libgl1 for RapidOCR OpenCV dependency RapidOCR pulls in full opencv-python which requires libGL.so.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:30:12 +01:00
Benjamin Admin	45435f226f	feat(ocr-pipeline): line grouping fix + RapidOCR integration Fix A: Use _group_words_into_lines() with adaptive Y-tolerance to correctly order words in multi-line cells (fixes word reordering bug). RapidOCR: Add as alternative OCR engine (PaddleOCR models on ONNX Runtime, native ARM64). Engine selectable via dropdown in UI or ?engine= query param. Auto mode prefers RapidOCR when available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:13:58 +01:00
Benjamin Admin	4ec7c20490	feat(ocr-pipeline): add rapidocr + onnxruntime to requirements RapidOCR uses PaddleOCR models on ONNX Runtime, works natively on ARM64. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 17:08:21 +01:00
Benjamin Admin	17604b8eb2	test: add tests for API proxy scroll/collection-count and Chunk-Browser logic 42 tests covering: - Qdrant scroll endpoint proxy (offset, limit, filters, text search) - Collection-count endpoint - REGULATION_SOURCES URL validation (IFRS, EFRAG, ENISA, NIST, OECD) - Chunk-Browser collections, text search filtering, pagination state Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 16:46:42 +01:00
Benjamin Admin	f39314fb27	docs: add Chunk-Browser documentation - Document Chunk-Browser tab functionality and API - Cover scroll endpoint, text search, pagination - Document Originalquelle links and low-chunk warnings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:50:36 +01:00
Benjamin Admin	356d39d6ee	fix(ocr-pipeline): use PSM 6 (block) for multi-line cell OCR in word grid PSM 7 (single line) missed the second line in cells with two lines. PSM 6 handles multi-line content. Also fix sort order to Y-then-X for correct reading order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:40:04 +01:00
Benjamin Admin	491df4e1b0	feat: add Chunk-Browser tab to RAG page - New 'Chunk-Browser' tab for sequential chunk browsing - Qdrant scroll API proxy (scroll + collection-count actions) - Pagination with prev/next through all chunks in a collection - Text search filter with highlighting - Click to expand chunk and see all metadata - 'In Chunks suchen' button now navigates to Chunk-Browser with correct collection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:35:52 +01:00
Benjamin Admin	954103cdf2	feat(ocr-pipeline): add Step 5 word recognition (grid from columns × rows) Backend: build_word_grid() intersects column regions with content rows, OCRs each cell with language-specific Tesseract, and returns vocabulary entries with percent-based bounding boxes. New endpoints: POST /words, GET /image/words-overlay, ground-truth save/retrieve for words. Frontend: StepWordRecognition with overview + step-through labeling modes, goToStep callback for row correction feedback loop. MkDocs: OCR Pipeline documentation added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 02:18:29 +01:00
Benjamin Admin	47dc2e6f7a	feat(rag): source URLs, low-chunk warnings & IFRS/EFRAG entries - Add REGULATION_SOURCES map with 88 original document URLs for all regulations (EUR-Lex, gesetze-im-internet.de, RIS, Fedlex, etc.) - Render "Originalquelle →" link in regulation detail panel - Add amber warning indicator for suspiciously low chunk counts (<10) - Add EU_IFRS_DE, EU_IFRS_EN, EFRAG_ENDORSEMENT to RAG tracking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:56:09 +01:00
Benjamin Admin	203b3c0e2d	fix(ocr-pipeline): mask out images in row detection horizontal projection Build a word-coverage mask so only pixels near Tesseract word bounding boxes contribute to the horizontal projection. Image regions (high ink but no words) are treated as white, preventing illustrations from merging multiple vocabulary rows into one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:39:20 +01:00
Benjamin Admin	b58aecd081	feat(ocr-pipeline): add Step 4 row detection UI in admin frontend Insert rows step between columns and words in the pipeline wizard. Shows overlay image, row list with type badges, and ground truth controls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:28:05 +01:00
Benjamin Admin	04b83d5f46	feat(ocr-pipeline): add row detection step with horizontal gap analysis Add Step 4 (row detection) between column detection and word recognition. Uses horizontal projection profiles + whitespace gaps (same method as columns). Includes header/footer classification via gap-size heuristics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:14:31 +01:00
Benjamin Admin	c7ae44ff17	feat(rag): add 42 new regulations to RAG overview + update collection totals New regulations across bp_compliance_ce (11), bp_compliance_gesetze (31), and bp_compliance_datenschutz (1). Collection totals updated: gesetze 58304, ce 18183, datenschutz 2448, total 103912. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:04:27 +01:00
Benjamin Admin	ce0815007e	feat(ocr-pipeline): replace clustering column detection with whitespace-gap analysis Column detection now uses vertical projection profiles to find whitespace gaps between columns, then validates gaps against word bounding boxes to prevent splitting through words. Old clustering algorithm extracted as fallback (_detect_columns_by_clustering) for pages with < 2 detected gaps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 00:36:28 +01:00
Benjamin Admin	b03cb0a1e6	Fix Landkarte tab crash: variable name shadowed isInRag function Local variables named 'isInRag' shadowed the outer function, causing "isInRag is not a function" error. Renamed to regInRag/codeInRag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 00:01:01 +01:00
Benjamin Admin	5a45cbf605	Update RAG page: Chunks/Status columns use hardcoded data, Key Intersections show RAG status - Chunks column now uses getKnownChunks() instead of API-based getRegulationChunks() - Status column uses isInRag() check (green/red) instead of ratio-based calculation - Key Intersections chips show green/red with checkmark/cross based on RAG status Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 23:53:21 +01:00
Benjamin Admin	164b35c06a	fix(ocr-pipeline): tighten page_ref constraints based on live testing - Reduce left-side threshold from 35% to 20% of content width - Strong language signal (eng/deu > 0.3) now prevents page_ref assignment - Increase column_ignore word threshold from 3 to 8 for edge columns - Apply language guard to Level 1 and Level 2 classification Fixes: column with deu=0.921 was misclassified as page_ref because reference score check ran before language analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 23:33:11 +01:00
Benjamin Admin	2297f66edb	feat(rag): Add RAG status indicators and 4 new EU regulations - Add REGULATIONS_IN_RAG Set tracking all 42 regulations currently in Qdrant - Add 4 new regulation entries: E-Commerce-RL, Verbraucherrechte-RL, Digitale-Inhalte-RL, DMA (all ingested Feb 2026) - Add RAG column to regulations table with green check/red x indicators - Update Landkarte tab: green/x on industry cards, thematic clusters, and regulation matrix - Replace old "Integrated Regulations" section with full RAG coverage overview - Update hardcoded chunk counts (Templates: 7689, NiBiS: 7996) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 23:23:52 +01:00
Benjamin Admin	db8327f039	fix(ocr-pipeline): tune column detection based on GT comparison Address 5 weaknesses found via ground-truth comparison on session df3548d1: - Add column_ignore for edge columns with < 3 words (margin detection) - Absorb tiny clusters (< 5% width) into neighbors post-merge - Restrict page_ref to left 35% of content area across all 3 levels - Loosen marker thresholds (width < 6%, words <= 15) and add strong marker score for very narrow non-edge columns (< 4%) - Add EN/DE position tiebreaker when language signals are both weak Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 23:16:31 +01:00
Benjamin Admin	587b066a40	feat(ocr-pipeline): ground-truth comparison tool for column detection Side-by-side view: auto result (readonly) vs GT editor where teacher draws correct columns. Diff table shows Auto vs GT with IoU matching. GT data persisted per session for algorithm tuning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:48:37 +01:00
Benjamin Admin	03fa186fec	fix(ocr-pipeline): increase merge distance to 6% for better column merging Sub-alignments within a column (indented words, etc.) were 60-90px apart and not getting merged at 3%. On a typical 5-col page (~1500px), 6% = ~90px merges sub-alignments while keeping real column boundaries (~300px) separate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 20:19:09 +01:00
Benjamin Admin	1040729874	fix(ocr-pipeline): avoid backslash in f-string for Python 3.11 compat Use format() instead of nested f-strings with escaped quotes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 20:06:20 +01:00
Benjamin Admin	4f37afa222	feat(ocr-pipeline): verticality filter for column detection Clusters now track Y-positions of their words and filter by vertical coverage (>=30% primary, >=15%+5words secondary) to reject noise from indentations or page numbers. Merge distance widened to 3% content width. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 19:57:13 +01:00
Benjamin Admin	bb879a03a8	feat(ocr-pipeline): add column_ignore type for margins/empty areas Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:51:56 +01:00
Benjamin Admin	f535d3c967	fix(ocr-pipeline): manual editor layout + no re-detection on cached result - ManualColumnEditor now uses grid-cols-2 layout (image left, controls right) matching the normal view size so the image doesn't zoom in - StepColumnDetection only runs auto-detection when no cached result exists; revisiting step 3 loads cached columns without re-running detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:45:49 +01:00
Benjamin Admin	7a3570fe46	feat(ocr-pipeline): manual column editor for Step 3 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 08:27:54 +01:00
Benjamin Admin	1393a994f9	Flexible content-based column detection (2-phase) Replaces hardcoded position rules with a two-stage system: Phase A detects column geometry (clustering), Phase B classifies column types by content (language/role) with a 3-level fallback chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 23:33:35 +01:00
Benjamin Admin	cf27a95308	feat(ocr-pipeline): word-based 5-column detection for vocabulary pages Replace projection-profile layout analysis with Tesseract word bounding box clustering to detect 5-column vocabulary layouts (page_ref, EN, DE, markers, examples). Falls back to projection profiles when < 3 clusters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 23:08:14 +01:00
Benjamin Admin	aa06ae0f61	feat: persistent sessions (PostgreSQL) + column detection (Step 3) Sessions are now stored in PostgreSQL instead of in memory. New session list with name, date, step. Sessions survive browser refresh and container restart. Step 3 uses analyze_layout() for automatic column detection with a colored overlay. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 22:16:37 +01:00
Benjamin Admin	09b820efbe	refactor(dewarp): replace displacement map with affine shear correction The old displacement-map approach shifted entire rows by a parabolic profile, creating a circle/barrel distortion. The actual problem is a linear vertical shear: after deskew aligns horizontal lines, the vertical column edges are still tilted by ~0.5°. New approach: - Detect shear angle from strongest vertical edge slope (not curvature) - Apply cv2.warpAffine shear to straighten vertical features - Manual slider: -2.0° to +2.0° in 0.05° steps - Slider initializes to auto-detected shear angle - Ground truth question: "Spalten vertikal ausgerichtet?" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:23:04 +01:00
Benjamin Admin	ff2bb79a91	fix(dewarp): change manual slider to percentage (0-200%) instead of raw multiplier The old -3.0 to +3.0 scale multiplied the full displacement map (up to ~79px) directly, causing extreme distortion at values >1. New slider: - 0% = no correction - 100% = auto-detected correction (default) - 200% = double correction - Step size: 5% Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:10:34 +01:00
Benjamin Admin	fb496c5e34	perf(klausur-service): split Dockerfile into base + app layer Tesseract OCR + 70 Debian packages + pip dependencies are now in a separate base image (klausur-base:latest) that is built once and reused. A --no-cache build now only rebuilds the code layer (~seconds) instead of re-downloading 33 MB of system packages (~9 minutes). Rebuild base when requirements.txt or system deps change: docker build -f klausur-service/Dockerfile.base -t klausur-base:latest klausur-service/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:43:24 +01:00
Benjamin Admin	9df745574b	fix(ocr-pipeline): dewarp visibility, grid on both sides, session persistence - Fix dewarp method selection: prefer methods with >5px curvature over higher confidence (vertical_edge 79px was being ignored for text_baseline 2px) - Add grid overlay on left image in Dewarp step for side-by-side comparison - Add GET /sessions/{id} endpoint to reload session data - StepDeskew accepts sessionId prop to restore state when navigating back - SessionInfo type extended with optional deskew_result and dewarp_result Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:29:53 +01:00
Benjamin Admin	44e8c573af	fix: restrict deskew ground-truth question to rotation "Korrekt ausgerichtet?" → "Rotation korrekt?" with a note that curvature/distortion is corrected in the next step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:16:24 +01:00
Benjamin Admin	589d2f811a	feat: dewarp correction as Step 2 of the OCR pipeline (7 steps) Implements book-curvature dewarping with two methods: - Method A: vertical edge analysis (Sobel + 2nd-degree polynomial) - Method B: text-line baseline (Tesseract + baseline curvature) The best method is chosen automatically; manual slider (-3 to +3). Backend: 3 new endpoints (auto/manual dewarp, ground truth) Frontend: StepDewarp + DewarpControls, pipeline extended from 6 to 7 steps Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:46:41 +01:00
Benjamin Admin	d552fd8b6b	feat: OCR pipeline with 6-step wizard for page reconstruction New route /ai/ocr-pipeline with step-by-step straightening (deskew), grid overlay and ground truth. Steps 2-6 as placeholders. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:38:08 +01:00

72 Commits
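
Commit 8507e2e035 describes splitting cells taller than 1.5× the median row height into sub-cells before OCR. The following is a minimal sketch of that idea, not the repository's code: the function name `split_oversized_cells`, the `(x, y, width, height)` box layout, and the choice of `round(height / median)` sub-cells are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def split_oversized_cells(cells: List[Box], factor: float = 1.5) -> List[Box]:
    """Split cells taller than `factor` x median height into stacked sub-cells."""
    if not cells:
        return []
    med = median(h for _, _, _, h in cells)
    result: List[Box] = []
    for x, y, w, h in cells:
        if med <= 0 or h <= factor * med:
            # Normal-sized cell: keep unchanged.
            result.append((x, y, w, h))
            continue
        # Assumed rule: one sub-cell per median-height row, at least two.
        parts = max(2, round(h / med))
        # Integer edges guarantee the sub-cells tile the original cell exactly.
        edges = [y + (h * i) // parts for i in range(parts + 1)]
        result.extend((x, top, w, bottom - top) for top, bottom in zip(edges, edges[1:]))
    return result


cells = [(0, 0, 100, 40), (0, 40, 100, 40), (0, 80, 100, 80)]
sub_cells = split_oversized_cells(cells)
```

Each returned box would then be cropped and passed to the OCR engine separately, so text at the bottom of a merged row is no longer lost.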