- LLM Compare Seiten, Configs und alle Referenzen geloescht
- Kommunikation-Kategorie in Sidebar mit Video & Chat, Voice Service, Alerts
- Compliance SDK Kategorie aus Sidebar entfernt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
handleNext() did nothing on the last step (early return). Now resets
session, steps and navigates back to the session overview.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The side-by-side panels used calc(100vh - 380px) pushing the Speichern/
Abschliessen buttons below the viewport. Reduced to calc(100vh - 580px)
and made the action bar sticky at the bottom.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bold detection:
- Replace absolute threshold with page-level relative comparison
- Measure stroke width for all cells, then mark cells >1.4× median as bold
- Adapts automatically to font, DPI and scan quality
Save buttons:
- Fix status stuck on 'error' preventing re-click
- Better error messages with response body
- Fallback score to 0 when null
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add umlaut confusion rules (i→ü, a→ä, o→ö, u→ü) to _spell_fix_token
for German text — fixes "iberqueren" → "überqueren" etc.
- Add _detect_bold() using OpenCV stroke-width analysis on cell crops
- Integrate bold detection in both narrow (cell-crop) and broad (word-lookup) paths
- Add is_bold field to GridCell TypeScript interface
- Render bold text in StepGroundTruth reconstruction view
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Prepend /klausur-api prefix to original image URL (nginx proxy)
- Remove colored column background stripes, use white background
- Change cell text color to black instead of per-column-type colors
- Calculate font size dynamically from cell bbox height via ResizeObserver
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the stub StepGroundTruth with a full side-by-side Original vs
Reconstruction view. Adds VLM-based image region detection (qwen2.5vl),
mflux image generation proxy, sync scroll/zoom, manual region drawing,
and score/notes persistence.
New backend endpoints: detect-images, generate-image, validate, get validation.
New standalone mflux-service (scripts/mflux-service.py) for Metal GPU generation.
Dockerfile.base: adds fonts-liberation (Apache-2.0).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Frontend: retry /words POST once after 2s delay if it gets 400/404,
which happens when navigating via wizard after container restart
(session cache not yet warm).
Backend: log when session needs DB reload and when dewarped_bgr is missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Der Compliance Advisor gehoert ins Compliance SDK (macmini:3007/sdk/agents),
nicht ins Lehrer-Admin. Die verbleibenden 5 Agenten (TutorAgent, GraderAgent,
QualityJudge, AlertAgent, Orchestrator) bleiben erhalten.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cell-First OCR (v2): Each cell is cropped and OCR'd in isolation,
eliminating neighbour bleeding (e.g. "to", "ps" in marker columns).
Uses ThreadPoolExecutor for parallel Tesseract calls.
Document type detection: Classifies pages as vocab_table, full_text,
or generic_table using projection profiles (<2s, no OCR needed).
Frontend dynamically skips columns/rows steps for full-text pages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add DewarpDetection type with per-method results
- Expand method labels for all 4 detectors (A-D)
- Show green/amber banner: applied vs quality-gate-rejected
- Expandable "Details" panel showing all 4 methods with confidence bars
- Visual confidence bars instead of plain percentage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace setBackgroundImage() with backgroundImage property (v6 breaking change)
- Replace setWidth/setHeight with Canvas constructor options
- Fix opacity handler to use direct property access
- Update CLAUDE.md: use git -C and docker compose -f instead of cd
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Rename Step 6 label to "Korrektur" (was "OCR-Zeichenkorrektur")
2. Move _fix_character_confusion from pipeline Step 1 into
llm_review_entries_streaming so corrections are visible in the UI:
char changes (| → I, 1 → I, 8 → B) are now emitted as a batch event
right after the meta event, appearing in the corrections list
3. StepReconstruction: all cells (including empty) are now rendered as
editable inputs — removed filter that hid empty cells from the editor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phantom column fix:
Adjacent tiny gaps (e.g. 11px + 35px) can create very narrow columns
(< 3% of content width) with 0 words. These are scan artefacts, not
real columns. New Step 9 in detect_column_geometry():
- Filter columns where width < max(20px, 3% content_w) AND words < 3
- After filtering, extend each remaining column to close the gap with
its right neighbor, and re-assign words to correct column
Example from logs: 5 columns → 4 columns (phantom at x=710, width=36px
eliminated; neighbors expanded to cover the gap)
UI rename:
- 'Schritt 6: LLM-Korrektur' → 'Schritt 6: OCR-Zeichenkorrektur'
- 'LLM-Korrektur starten' → 'Zeichenkorrektur starten'
- Error message updated accordingly
(No LLM involved anymore — spell-checker is the active engine)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add marker/bbox_marker fields to WordEntry type
- Add page_ref/column_marker colors to StepReconstruction
- Make StepLlmReview table dynamic based on columns_used metadata,
showing all detected columns (EN, DE, Example, page_ref, marker)
instead of hardcoded EN/DE/Beispiel only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Frontend: Replace hardcoded EN/DE/Example vocab table with unified dynamic
table driven by columns_used from backend. Labeling, confirmation, counts,
and summary badges are now all cell-based instead of branching on isVocab.
Backend: Change _cells_to_vocab_entries() entry filter from checking only
english/german/example to checking ANY mapped field. This preserves rows
with only marker or source_page content, fixing the issue where marker
sub-columns disappeared at the end of OCR processing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes for sub-columns disappearing at end of streaming:
1. Backend: add column_marker mapping in _cells_to_vocab_entries()
so marker text is included in vocab entries (not silently dropped)
2. Frontend types: add source_page and bbox_ref to WordEntry interface
3. Frontend table: show page_ref column (Seite) in vocab table when
entries have source_page data, instead of only EN/DE/Example
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Unit tests: 76 new parametrized tests for noise filter, phonetic detection,
cell text cleaning, and row merging (116 total, all green)
2. Continuation-row merge: detect multi-line vocab entries where text wraps
(lowercase EN + empty DE) and merge into previous entry
3. Empty DE fallback: secondary PSM=7 OCR pass for cells missed by PSM=6
4. Batch-OCR: collect empty cells per column, run single Tesseract call on
column strip instead of per-cell (~66% fewer calls for 3+ empty cells)
5. StepReconstruction UI: font scaling via naturalHeight, empty EN/DE field
highlighting, undo/redo (Ctrl+Z), per-cell reset button
6. Session reprocess: POST /sessions/{id}/reprocess endpoint to re-run from
any step, with reprocess button on completed pipeline steps
Also fixes pre-existing dewarp_image tuple unpacking bug in run_cv_pipeline
and updates dewarp tests to match current (image, info) return signature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StepLlmReview: Show full vocab table with image overlay, row-level
status tracking (pending/active/reviewed/corrected/skipped), and
auto-scroll during SSE streaming. Load previous results on mount.
StepReconstruction: New step 7 with editable text fields at original
bbox positions over dewarped image. Zoom controls, tab navigation,
color-coded columns, save to backend.
Backend: Add POST /sessions/{id}/reconstruction endpoint.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Stream LLM review results batch-by-batch (8 entries per batch) via SSE
- Frontend shows live progress bar, batch log, and corrections appearing
- Skip entries with IPA phonetic transcriptions (already dictionary-corrected)
- Refactor llm_review_entries into reusable helpers for both streaming and non-streaming paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the placeholder "Koordinaten" step with an LLM review step that
sends vocab entries to qwen3:30b-a3b via Ollama for OCR error correction
(e.g. "8en" → "Ben"). Teachers can review, accept/reject individual
corrections in a diff table before applying them.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Next.js was producing the same chunk hash across builds, causing
browsers to serve stale cached JS even after redeployment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Word-lookup is now ~0.03s (vs seconds with per-cell Tesseract), so
always re-run detection when entering Step 5 instead of showing
potentially stale cached word_result from the session DB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cells now appear one-by-one in the UI as they are OCR'd, with a live
progress bar, instead of waiting for the full result.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract build_cell_grid() as layout-agnostic foundation from
build_word_grid(). Step 5 now produces a generic cell grid (columns x
rows) and auto-detects whether vocab layout is present. Frontend
dynamically switches between vocab table (EN/DE/Example) and generic
cell table based on layout type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds entries for all regulation codes in REGULATIONS_IN_RAG that were
missing from RAG_PDF_MAPPING, fixing "Kein PDF-Mapping" messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Integrate Britfone dictionary (MIT, 15k British English IPA entries)
- Add pronunciation parameter: 'british' (default) or 'american'
- British uses Britfone (Received Pronunciation), falls back to CMU
- American uses eng_to_ipa/CMU, falls back to Britfone
- Frontend: dropdown to switch pronunciation, default = British
- API: ?pronunciation=british|american query parameter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Docker container cannot reach Google Fonts, causing build failures.
Switch to bundled local font file using next/font/local.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows viewing chunks side-by-side with original PDF in fullscreen mode
for large screen QA review. Toggle via button or close with Escape key.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default collection changed from bp_compliance_gesetze (DE/AT/CH laws where
PDFs need manual download) to bp_compliance_ce (EU regulations where PDFs
are auto-downloaded). Added HEAD request check so missing PDFs show a clear
"PDF nicht vorhanden" message instead of a 404 in the iframe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Root container uses calc(100vh - 220px) for fixed viewport height
- All flex children use min-h-0 to enable proper overflow scrolling
- Removed duplicate bottom nav buttons (Zurueck/Weiter) that appeared
in the middle of the chunk text — navigation is only in the header now
- Chunk text panel scrolls internally with fixed header
- Added prominent article/section badges in header and panel header
- Added chunk length quality indicator (warns on very short/long chunks)
- Structural metadata keys (article, section, pages) sorted first
- Sidebar shows regulation name instead of code for better readability
- PDF viewer uses pages metadata from payload when available
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Preserve \n between visual lines within cells (instead of joining with space)
- Rejoin hyphenated words split across line breaks (e.g. Fuß-\nboden → Fußboden)
- Split oversized rows (>1.5× median height) into sub-entries when EN/DE
line counts match — deterministic fix for missed Step 4 row boundaries
- Frontend: render \n as <br/>, use textarea for multiline editing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Qdrant collections use regulation_id (e.g. eu_2016_679) as the filter key,
not regulation_code (e.g. GDPR). Updated rag-constants.ts with correct qdrant_id
mappings from actual Qdrant data, fixed API to filter on regulation_id, and updated
ChunkBrowserQA to pass qdrant_id values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New ChunkBrowserQA component replaces inline chunk browser with:
- Document sidebar with live chunk counts per regulation (batched Qdrant count API)
- Sequential chunk navigation with arrow keys (1/N through all chunks of a document)
- Overlap display showing previous/next chunk boundaries (amber-highlighted)
- Split-view with original PDF via iframe (estimated page from chunk index)
- Adjustable chunks-per-page ratio for PDF page estimation
Extracts REGULATIONS_IN_RAG and REGULATION_INFO to shared rag-constants.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix A: Use _group_words_into_lines() with adaptive Y-tolerance to
correctly order words in multi-line cells (fixes word reordering bug).
RapidOCR: Add as alternative OCR engine (PaddleOCR models on ONNX
Runtime, native ARM64). Engine selectable via dropdown in UI or
?engine= query param. Auto mode prefers RapidOCR when available.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New 'Chunk-Browser' tab for sequential chunk browsing
- Qdrant scroll API proxy (scroll + collection-count actions)
- Pagination with prev/next through all chunks in a collection
- Text search filter with highlighting
- Click to expand chunk and see all metadata
- 'In Chunks suchen' button now navigates to Chunk-Browser with correct collection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: build_word_grid() intersects column regions with content rows,
OCRs each cell with language-specific Tesseract, and returns vocabulary
entries with percent-based bounding boxes. New endpoints: POST /words,
GET /image/words-overlay, ground-truth save/retrieve for words.
Frontend: StepWordRecognition with overview + step-through labeling modes,
goToStep callback for row correction feedback loop.
MkDocs: OCR Pipeline documentation added.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add REGULATION_SOURCES map with 88 original document URLs for all
regulations (EUR-Lex, gesetze-im-internet.de, RIS, Fedlex, etc.)
- Render "Originalquelle →" link in regulation detail panel
- Add amber warning indicator for suspiciously low chunk counts (<10)
- Add EU_IFRS_DE, EU_IFRS_EN, EFRAG_ENDORSEMENT to RAG tracking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Insert rows step between columns and words in the pipeline wizard.
Shows overlay image, row list with type badges, and ground truth controls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New regulations across bp_compliance_ce (11), bp_compliance_gesetze (31),
and bp_compliance_datenschutz (1). Collection totals updated:
gesetze 58304, ce 18183, datenschutz 2448, total 103912.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local variables named 'isInRag' shadowed the outer function, causing
"isInRag is not a function" error. Renamed to regInRag/codeInRag.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Chunks column now uses getKnownChunks() instead of API-based getRegulationChunks()
- Status column uses isInRag() check (green/red) instead of ratio-based calculation
- Key Intersections chips show green/red with checkmark/cross based on RAG status
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add REGULATIONS_IN_RAG Set tracking all 42 regulations currently in Qdrant
- Add 4 new regulation entries: E-Commerce-RL, Verbraucherrechte-RL,
Digitale-Inhalte-RL, DMA (all ingested Feb 2026)
- Add RAG column to regulations table with green check/red x indicators
- Update Landkarte tab: green/x on industry cards, thematic clusters,
and regulation matrix
- Replace old "Integrated Regulations" section with full RAG coverage overview
- Update hardcoded chunk counts (Templates: 7689, NiBiS: 7996)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>