Commit Graph

49 Commits

Author SHA1 Message Date
Benjamin Admin
123b7ada0b fix(columns): filter phantom narrow columns + rename step to OCR-Zeichenkorrektur
Phantom column fix:
Adjacent tiny gaps (e.g. 11px + 35px) can create very narrow columns
(< 3% of content width) with 0 words. These are scan artefacts, not
real columns. New Step 9 in detect_column_geometry():
- Filter columns where width < max(20px, 3% content_w) AND words < 3
- After filtering, extend each remaining column to close the gap with
  its right neighbor, and re-assign words to correct column

Example from logs: 5 columns → 4 columns (phantom at x=710, width=36px
eliminated; neighbors expanded to cover the gap)

UI rename:
- 'Schritt 6: LLM-Korrektur' → 'Schritt 6: OCR-Zeichenkorrektur'
- 'LLM-Korrektur starten' → 'Zeichenkorrektur starten'
- Error message updated accordingly
(No LLM involved anymore — spell-checker is the active engine)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 16:06:59 +01:00
Benjamin Admin
ccba2bb887 fix(ocr-pipeline): show sub-columns in reconstruction and LLM review steps
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 21s
- Add marker/bbox_marker fields to WordEntry type
- Add page_ref/column_marker colors to StepReconstruction
- Make StepLlmReview table dynamic based on columns_used metadata,
  showing all detected columns (EN, DE, Example, page_ref, marker)
  instead of hardcoded EN/DE/Beispiel only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 10:36:27 +01:00
Benjamin Admin
4d428980c1 refactor(word-step): make table fully generic and fix marker-only row filter
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 24s
CI / test-python-klausur (push) Failing after 1m43s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 17s
Frontend: Replace hardcoded EN/DE/Example vocab table with unified dynamic
table driven by columns_used from backend. Labeling, confirmation, counts,
and summary badges are now all cell-based instead of branching on isVocab.

Backend: Change _cells_to_vocab_entries() entry filter from checking only
english/german/example to checking ANY mapped field. This preserves rows
with only marker or source_page content, fixing the issue where marker
sub-columns disappeared at the end of OCR processing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 08:45:24 +01:00
Benjamin Admin
dea3349b23 fix(ocr-pipeline): preserve sub-column data in vocab table display
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m51s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 16s
Three fixes for sub-columns disappearing at end of streaming:

1. Backend: add column_marker mapping in _cells_to_vocab_entries()
   so marker text is included in vocab entries (not silently dropped)

2. Frontend types: add source_page and bbox_ref to WordEntry interface

3. Frontend table: show page_ref column (Seite) in vocab table when
   entries have source_page data, instead of only EN/DE/Example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 08:06:15 +01:00
Benjamin Admin
e718353d9f feat(ocr-pipeline): 6 systematic improvements for robustness, performance & UX
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 37s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 21s
1. Unit tests: 76 new parametrized tests for noise filter, phonetic detection,
   cell text cleaning, and row merging (116 total, all green)
2. Continuation-row merge: detect multi-line vocab entries where text wraps
   (lowercase EN + empty DE) and merge into previous entry
3. Empty DE fallback: secondary PSM=7 OCR pass for cells missed by PSM=6
4. Batch-OCR: collect empty cells per column, run single Tesseract call on
   column strip instead of per-cell (~66% fewer calls for 3+ empty cells)
5. StepReconstruction UI: font scaling via naturalHeight, empty EN/DE field
   highlighting, undo/redo (Ctrl+Z), per-cell reset button
6. Session reprocess: POST /sessions/{id}/reprocess endpoint to re-run from
   any step, with reprocess button on completed pipeline steps

Also fixes pre-existing dewarp_image tuple unpacking bug in run_cv_pipeline
and updates dewarp tests to match current (image, info) return signature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 14:46:38 +01:00
Benjamin Admin
dbf0db0c13 feat(ocr-pipeline): improve LLM review UI + add reconstruction step
StepLlmReview: Show full vocab table with image overlay, row-level
status tracking (pending/active/reviewed/corrected/skipped), and
auto-scroll during SSE streaming. Load previous results on mount.

StepReconstruction: New step 7 with editable text fields at original
bbox positions over dewarped image. Zoom controls, tab navigation,
color-coded columns, save to backend.

Backend: Add POST /sessions/{id}/reconstruction endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 12:19:21 +01:00
Benjamin Admin
2a493890b6 feat(ocr-pipeline): add SSE streaming and phonetic filter to LLM review
- Stream LLM review results batch-by-batch (8 entries per batch) via SSE
- Frontend shows live progress bar, batch log, and corrections appearing
- Skip entries with IPA phonetic transcriptions (already dictionary-corrected)
- Refactor llm_review_entries into reusable helpers for both streaming and non-streaming paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 11:46:06 +01:00
Benjamin Admin
938d1d69cf feat(ocr-pipeline): add LLM-based OCR correction step (Step 6)
Replace the placeholder "Koordinaten" step with an LLM review step that
sends vocab entries to qwen3:30b-a3b via Ollama for OCR error correction
(e.g. "8en" → "Ben"). Teachers can review, accept/reject individual
corrections in a diff table before applying them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 11:13:17 +01:00
Benjamin Admin
6db3c02db4 fix(admin-lehrer): force unique build ID to bust browser caches
Next.js was producing the same chunk hash across builds, causing
browsers to serve stale cached JS even after redeployment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 08:54:05 +01:00
Benjamin Admin
50ad06f43a fix(ocr-pipeline): always run fresh word detection, skip stale cache
Word-lookup is now ~0.03s (vs seconds with per-cell Tesseract), so
always re-run detection when entering Step 5 instead of showing
potentially stale cached word_result from the session DB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 08:05:13 +01:00
Benjamin Admin
7f27783008 feat(ocr-pipeline): add SSE streaming for word recognition (Step 5)
Cells now appear one-by-one in the UI as they are OCR'd, with a live
progress bar, instead of waiting for the full result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:54:20 +01:00
Benjamin Admin
27b895a848 feat(ocr-pipeline): generic cell-grid with optional vocab mapping
Extract build_cell_grid() as layout-agnostic foundation from
build_word_grid(). Step 5 now produces a generic cell grid (columns x
rows) and auto-detects whether vocab layout is present. Frontend
dynamically switches between vocab table (EN/DE/Example) and generic
cell table based on layout type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:22:56 +01:00
Benjamin Admin
854d8b431b feat(rag-qa): add 14 missing PDF mappings for EDPB, ENISA, EDPS, TMG, UrhG
Adds entries for all regulation codes in REGULATIONS_IN_RAG that were
missing from RAG_PDF_MAPPING, fixing "Kein PDF-Mapping" messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 11:10:09 +01:00
Benjamin Admin
f2521d2b9e feat(ocr-pipeline): British/American IPA pronunciation choice
- Integrate Britfone dictionary (MIT, 15k British English IPA entries)
- Add pronunciation parameter: 'british' (default) or 'american'
- British uses Britfone (Received Pronunciation), falls back to CMU
- American uses eng_to_ipa/CMU, falls back to Britfone
- Frontend: dropdown to switch pronunciation, default = British
- API: ?pronunciation=british|american query parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 11:08:52 +01:00
Benjamin Admin
954d21e469 fix: use local Inter font to avoid Google Fonts timeout in Docker build
The Docker container cannot reach Google Fonts, causing build failures.
Switch to bundled local font file using next/font/local.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:26:34 +01:00
Benjamin Admin
e3aa8e899e feat(rag-qa): add fullscreen mode for split-view chunk browser
Allows viewing chunks side-by-side with original PDF in fullscreen mode
for large screen QA review. Toggle via button or close with Escape key.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:23:32 +01:00
Benjamin Admin
266b9dfad3 Fix PDF 404: default to bp_compliance_ce collection, add PDF existence check
Default collection changed from bp_compliance_gesetze (DE/AT/CH laws where
PDFs need manual download) to bp_compliance_ce (EU regulations where PDFs
are auto-downloaded). Added HEAD request check so missing PDFs show a clear
"PDF nicht vorhanden" message instead of a 404 in the iframe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:13:26 +01:00
Benjamin Admin
b48cd8bb46 Fix ChunkBrowserQA layout: proper height constraints, remove bottom nav duplication
- Root container uses calc(100vh - 220px) for fixed viewport height
- All flex children use min-h-0 to enable proper overflow scrolling
- Removed duplicate bottom nav buttons (Zurueck/Weiter) that appeared
  in the middle of the chunk text — navigation is only in the header now
- Chunk text panel scrolls internally with fixed header
- Added prominent article/section badges in header and panel header
- Added chunk length quality indicator (warns on very short/long chunks)
- Structural metadata keys (article, section, pages) sorted first
- Sidebar shows regulation name instead of code for better readability
- PDF viewer uses pages metadata from payload when available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 20:24:50 +01:00
Benjamin Admin
f7e0f2bb4f feat(ocr-pipeline): line breaks, hyphen rejoin & oversized row splitting
- Preserve \n between visual lines within cells (instead of joining with space)
- Rejoin hyphenated words split across line breaks (e.g. Fuß-\nboden → Fußboden)
- Split oversized rows (>1.5× median height) into sub-entries when EN/DE
  line counts match — deterministic fix for missed Step 4 row boundaries
- Frontend: render \n as <br/>, use textarea for multiline editing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 18:49:28 +01:00
Benjamin Admin
e7fb9d59f1 Fix ChunkBrowserQA: use regulation_id from Qdrant payload instead of regulation_code
The Qdrant collections use regulation_id (e.g. eu_2016_679) as the filter key,
not regulation_code (e.g. GDPR). Updated rag-constants.ts with correct qdrant_id
mappings from actual Qdrant data, fixed API to filter on regulation_id, and updated
ChunkBrowserQA to pass qdrant_id values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 18:22:12 +01:00
Benjamin Admin
8c42fefa77 feat(rag): add QA Split-View Chunk-Browser for ingestion verification
New ChunkBrowserQA component replaces inline chunk browser with:
- Document sidebar with live chunk counts per regulation (batched Qdrant count API)
- Sequential chunk navigation with arrow keys (1/N through all chunks of a document)
- Overlap display showing previous/next chunk boundaries (amber-highlighted)
- Split-view with original PDF via iframe (estimated page from chunk index)
- Adjustable chunks-per-page ratio for PDF page estimation

Extracts REGULATIONS_IN_RAG and REGULATION_INFO to shared rag-constants.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 17:46:11 +01:00
Benjamin Admin
45435f226f feat(ocr-pipeline): line grouping fix + RapidOCR integration
Fix A: Use _group_words_into_lines() with adaptive Y-tolerance to
correctly order words in multi-line cells (fixes word reordering bug).

RapidOCR: Add as alternative OCR engine (PaddleOCR models on ONNX
Runtime, native ARM64). Engine selectable via dropdown in UI or
?engine= query param. Auto mode prefers RapidOCR when available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 17:13:58 +01:00
Benjamin Admin
17604b8eb2 test: add tests for API proxy scroll/collection-count and Chunk-Browser logic
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m41s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 19s
42 tests covering:
- Qdrant scroll endpoint proxy (offset, limit, filters, text search)
- Collection-count endpoint
- REGULATION_SOURCES URL validation (IFRS, EFRAG, ENISA, NIST, OECD)
- Chunk-Browser collections, text search filtering, pagination state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 16:46:42 +01:00
Benjamin Admin
491df4e1b0 feat: add Chunk-Browser tab to RAG page
- New 'Chunk-Browser' tab for sequential chunk browsing
- Qdrant scroll API proxy (scroll + collection-count actions)
- Pagination with prev/next through all chunks in a collection
- Text search filter with highlighting
- Click to expand chunk and see all metadata
- 'In Chunks suchen' button now navigates to Chunk-Browser with correct collection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 09:35:52 +01:00
Benjamin Admin
954103cdf2 feat(ocr-pipeline): add Step 5 word recognition (grid from columns × rows)
Backend: build_word_grid() intersects column regions with content rows,
OCRs each cell with language-specific Tesseract, and returns vocabulary
entries with percent-based bounding boxes. New endpoints: POST /words,
GET /image/words-overlay, ground-truth save/retrieve for words.
Frontend: StepWordRecognition with overview + step-through labeling modes,
goToStep callback for row correction feedback loop.
MkDocs: OCR Pipeline documentation added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 02:18:29 +01:00
Benjamin Admin
47dc2e6f7a feat(rag): source URLs, low-chunk warnings & IFRS/EFRAG entries
- Add REGULATION_SOURCES map with 88 original document URLs for all
  regulations (EUR-Lex, gesetze-im-internet.de, RIS, Fedlex, etc.)
- Render "Originalquelle →" link in regulation detail panel
- Add amber warning indicator for suspiciously low chunk counts (<10)
- Add EU_IFRS_DE, EU_IFRS_EN, EFRAG_ENDORSEMENT to RAG tracking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:56:09 +01:00
Benjamin Admin
b58aecd081 feat(ocr-pipeline): add Step 4 row detection UI in admin frontend
Insert rows step between columns and words in the pipeline wizard.
Shows overlay image, row list with type badges, and ground truth controls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:28:05 +01:00
Benjamin Admin
c7ae44ff17 feat(rag): add 42 new regulations to RAG overview + update collection totals
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m46s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 23s
New regulations across bp_compliance_ce (11), bp_compliance_gesetze (31),
and bp_compliance_datenschutz (1). Collection totals updated:
gesetze 58304, ce 18183, datenschutz 2448, total 103912.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:04:27 +01:00
Benjamin Admin
b03cb0a1e6 Fix Landkarte tab crash: variable name shadowed isInRag function
Local variables named 'isInRag' shadowed the outer function, causing
"isInRag is not a function" error. Renamed to regInRag/codeInRag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 00:01:01 +01:00
Benjamin Admin
5a45cbf605 Update RAG page: Chunks/Status columns use hardcoded data, Key Intersections show RAG status
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 24s
CI / test-python-klausur (push) Failing after 1m36s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 15s
- Chunks column now uses getKnownChunks() instead of API-based getRegulationChunks()
- Status column uses isInRag() check (green/red) instead of ratio-based calculation
- Key Intersections chips show green/red with checkmark/cross based on RAG status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:53:21 +01:00
Benjamin Admin
2297f66edb feat(rag): Add RAG status indicators and 4 new EU regulations
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m39s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 23s
- Add REGULATIONS_IN_RAG Set tracking all 42 regulations currently in Qdrant
- Add 4 new regulation entries: E-Commerce-RL, Verbraucherrechte-RL,
  Digitale-Inhalte-RL, DMA (all ingested Feb 2026)
- Add RAG column to regulations table with green check/red x indicators
- Update Landkarte tab: green/x on industry cards, thematic clusters,
  and regulation matrix
- Replace old "Integrated Regulations" section with full RAG coverage overview
- Update hardcoded chunk counts (Templates: 7689, NiBiS: 7996)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:23:52 +01:00
Benjamin Admin
587b066a40 feat(ocr-pipeline): ground-truth comparison tool for column detection
Side-by-side view: auto result (readonly) vs GT editor where teacher
draws correct columns. Diff table shows Auto vs GT with IoU matching.
GT data persisted per session for algorithm tuning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:48:37 +01:00
Benjamin Admin
bb879a03a8 feat(ocr-pipeline): add column_ignore type for margins/empty areas
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:51:56 +01:00
Benjamin Admin
f535d3c967 fix(ocr-pipeline): manual editor layout + no re-detection on cached result
- ManualColumnEditor now uses grid-cols-2 layout (image left, controls right)
  matching the normal view size so the image doesn't zoom in
- StepColumnDetection only runs auto-detection when no cached result exists;
  revisiting step 3 loads cached columns without re-running detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:45:49 +01:00
Benjamin Admin
7a3570fe46 feat(ocr-pipeline): manual column editor for Step 3
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:27:54 +01:00
Benjamin Admin
1393a994f9 Flexible inhaltsbasierte Spaltenerkennung (2-Phasen)
Ersetzt hardcodierte Positionsregeln durch ein zweistufiges System:
Phase A erkennt Spaltengeometrie (Clustering), Phase B klassifiziert
Typen per Inhalt (Sprache/Rolle) mit 3-stufiger Fallback-Kette.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:33:35 +01:00
Benjamin Admin
cf27a95308 feat(ocr-pipeline): word-based 5-column detection for vocabulary pages
Replace projection-profile layout analysis with Tesseract word bounding
box clustering to detect 5-column vocabulary layouts (page_ref, EN, DE,
markers, examples). Falls back to projection profiles when < 3 clusters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:08:14 +01:00
Benjamin Admin
aa06ae0f61 feat: Persistente Sessions (PostgreSQL) + Spaltenerkennung (Step 3)
Sessions werden jetzt in PostgreSQL gespeichert statt in-memory.
Neue Session-Liste mit Name, Datum, Schritt. Sessions ueberleben
Browser-Refresh und Container-Neustart. Step 3 nutzt analyze_layout()
fuer automatische Spaltenerkennung mit farbigem Overlay.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 22:16:37 +01:00
Benjamin Admin
09b820efbe refactor(dewarp): replace displacement map with affine shear correction
The old displacement-map approach shifted entire rows by a parabolic
profile, creating a circle/barrel distortion. The actual problem is
a linear vertical shear: after deskew aligns horizontal lines, the
vertical column edges are still tilted by ~0.5°.

New approach:
- Detect shear angle from strongest vertical edge slope (not curvature)
- Apply cv2.warpAffine shear to straighten vertical features
- Manual slider: -2.0° to +2.0° in 0.05° steps
- Slider initializes to auto-detected shear angle
- Ground truth question: "Spalten vertikal ausgerichtet?"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:23:04 +01:00
Benjamin Admin
ff2bb79a91 fix(dewarp): change manual slider to percentage (0-200%) instead of raw multiplier
The old -3.0 to +3.0 scale multiplied the full displacement map (up to ~79px)
directly, causing extreme distortion at values >1. New slider:
- 0% = no correction
- 100% = auto-detected correction (default)
- 200% = double correction
- Step size: 5%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:10:34 +01:00
Benjamin Admin
9df745574b fix(ocr-pipeline): dewarp visibility, grid on both sides, session persistence
- Fix dewarp method selection: prefer methods with >5px curvature over
  higher confidence (vertical_edge 79px was being ignored for text_baseline 2px)
- Add grid overlay on left image in Dewarp step for side-by-side comparison
- Add GET /sessions/{id} endpoint to reload session data
- StepDeskew accepts sessionId prop to restore state when navigating back
- SessionInfo type extended with optional deskew_result and dewarp_result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:29:53 +01:00
Benjamin Admin
44e8c573af fix: Deskew Ground Truth Frage auf Rotation beschraenken
"Korrekt ausgerichtet?" → "Rotation korrekt?" mit Hinweis,
dass Woelbung/Verzerrung im naechsten Schritt korrigiert wird.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:16:24 +01:00
Benjamin Admin
589d2f811a feat: Dewarp-Korrektur als Schritt 2 in OCR Pipeline (7 Schritte)
Implementiert Buchwoelbungs-Entzerrung mit zwei Methoden:
- Methode A: Vertikale-Kanten-Analyse (Sobel + Polynom 2. Grades)
- Methode B: Textzeilen-Baseline (Tesseract + Baseline-Kruemmung)
Beste Methode wird automatisch gewaehlt, manueller Slider (-3 bis +3).

Backend: 3 neue Endpoints (auto/manual dewarp, ground truth)
Frontend: StepDewarp + DewarpControls, Pipeline von 6 auf 7 Schritte

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:46:41 +01:00
Benjamin Admin
d552fd8b6b feat: OCR Pipeline mit 6-Schritt-Wizard fuer Seitenrekonstruktion
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Successful in 1m46s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 22s
Neue Route /ai/ocr-pipeline mit schrittweiser Begradigung (Deskew),
Raster-Overlay und Ground Truth. Schritte 2-6 als Platzhalter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:38:08 +01:00
Benjamin Boenisch
6a53f8d79c refactor: Remove all SDK/compliance pages and API routes from admin-lehrer
SDK/compliance content belongs exclusively in admin-compliance (port 3007).
Removed:
- All (sdk)/ pages (document-crawler, dsb-portal, industry-templates, multi-tenant, sso)
- All api/sdk/ proxy routes
- All developers/sdk/ documentation pages
- Unused lib/sdk/ modules (kept: catalog-manager + its deps for dashboard)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 09:24:36 +01:00
Benjamin Boenisch
27f1899428 feat: Sync SDK modules, API routes, blog and docs from admin-v2
- DSB Portal, Industry Templates, Multi-Tenant, SSO frontend pages
- All SDK API proxy routes (academy, crawler, incidents, vendors, whistleblower, etc.)
- Blog section with compliance articles
- BYOEH system documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:12:30 +01:00
Benjamin Boenisch
1bd1a0002a feat: Sync Document Crawler frontend page and API proxy (Phase 1.4)
Synced from admin-v2: 4-tab crawler page and catch-all API proxy route.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:35:29 +01:00
Benjamin Boenisch
8c1f7a783f refactor(admin-lehrer): Rename to Admin Lehrer KI and remove migrated categories
Remove communication, infrastructure, and development categories from
navigation (now in Admin Core on port 3008). Rename Admin v2 to
Admin Lehrer KI in sidebar, header, and browser title.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 20:23:16 +01:00
Benjamin Boenisch
5a31f52310 Initial commit: breakpilot-lehrer - Lehrer KI Platform
Services: Admin-Lehrer, Backend-Lehrer, Studio v2, Website,
Klausur-Service, School-Service, Voice-Service, Geo-Service,
BreakPilot Drive, Agent-Core

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:47:26 +01:00