Compare commits

..

3 Commits

Author SHA1 Message Date
Sharang Parnerkar
b697963186 fix: use Alpine-compatible addgroup/adduser flags in Dockerfiles
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
Replace --system/--gid/--uid (Debian syntax) with -S/-g/-u (BusyBox/Alpine).
Coolify ARG injection causes exit code 255 with Debian-style flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 22:38:31 +01:00
Sharang Parnerkar
ef6237ffdf refactor(coolify): externalize postgres, qdrant, S3
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
- Replace bp-core-postgres with POSTGRES_HOST env var
- Replace bp-core-qdrant with QDRANT_URL env var
- Replace bp-core-minio with S3_ENDPOINT/S3_ACCESS_KEY/S3_SECRET_KEY

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 09:23:32 +01:00
Sharang Parnerkar
41a8f3b183 feat: add Coolify deployment configuration
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
Add docker-compose.coolify.yml (8 services), .env.coolify.example,
and Gitea Action workflow for Coolify API deployment. Removes
core-health-check, paddleocr, transcription-worker, agent-core,
drive, and docs. Adds Traefik labels for *.breakpilot.ai domain
routing with Let's Encrypt SSL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 10:43:15 +01:00
63 changed files with 690 additions and 21522 deletions

View File

@@ -6,31 +6,22 @@
| Geraet | Rolle | Aufgaben |
|--------|-------|----------|
| **MacBook** | Entwicklung | Claude Terminal, Code-Entwicklung, Browser (Frontend-Tests) |
| **Mac Mini** | Server | Docker, alle Services, Tests, Builds, Deployment |
| **MacBook** | Client | Claude Terminal, Browser (Frontend-Tests) |
| **Mac Mini** | Server | Docker, alle Services, Code-Ausfuehrung, Tests, Git |
**WICHTIG:** Code wird direkt auf dem MacBook in diesem Repo bearbeitet. Docker und Services laufen auf dem Mac Mini.
**WICHTIG:** Die Entwicklung findet vollstaendig auf dem **Mac Mini** statt!
### Entwicklungsworkflow
### SSH-Verbindung
```bash
# 1. Code auf MacBook bearbeiten (dieses Verzeichnis)
# 2. Committen und pushen:
git push origin main && git push gitea main
ssh macmini
# Projektverzeichnis:
cd /Users/benjaminadmin/Projekte/breakpilot-lehrer
# 3. Auf Mac Mini pullen und Container neu bauen:
ssh macmini "git -C /Users/benjaminadmin/Projekte/breakpilot-lehrer pull --no-rebase origin main"
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml build --no-cache <service>"
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml up -d <service>"
# Einzelbefehle (BEVORZUGT):
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && <cmd>"
```
### SSH-Verbindung (fuer Docker/Tests)
**WICHTIG:** `cd` in SSH-Kommandos funktioniert NICHT zuverlaessig! Stattdessen:
- Git: `git -C /Users/benjaminadmin/Projekte/breakpilot-lehrer <cmd>`
- Docker: `/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml <cmd>`
- Logs: `/usr/local/bin/docker logs -f bp-lehrer-<service>`
---
## Voraussetzung
@@ -172,10 +163,10 @@ breakpilot-lehrer/
```bash
# Lehrer-Services starten (Core muss laufen!)
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml up -d"
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && /usr/local/bin/docker compose up -d"
# Einzelnen Service neu bauen
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml build --no-cache <service>"
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && /usr/local/bin/docker compose build --no-cache <service>"
# Logs
ssh macmini "/usr/local/bin/docker logs -f bp-lehrer-<service>"
@@ -185,7 +176,6 @@ ssh macmini "/usr/local/bin/docker ps --filter name=bp-lehrer"
```
**WICHTIG:** Docker-Pfad auf Mac Mini ist `/usr/local/bin/docker` (nicht im Standard-SSH-PATH).
**WICHTIG:** Immer `-f` mit vollem Pfad zur docker-compose.yml nutzen, `cd` in SSH funktioniert nicht!
### Frontend-Entwicklung

79
.env.coolify.example Normal file
View File

@@ -0,0 +1,79 @@
# =========================================================
# BreakPilot Lehrer — Coolify Environment Variables
# =========================================================
# Copy these into Coolify's environment variable UI
# for the breakpilot-lehrer Docker Compose resource.
# =========================================================
# --- External PostgreSQL (Coolify-managed, same as Core) ---
POSTGRES_HOST=<coolify-postgres-hostname>
POSTGRES_PORT=5432
POSTGRES_USER=breakpilot
POSTGRES_PASSWORD=CHANGE_ME_SAME_AS_CORE
POSTGRES_DB=breakpilot_db
# --- Security ---
JWT_SECRET=CHANGE_ME_SAME_AS_CORE
# --- External S3 Storage (same as Core) ---
S3_ENDPOINT=<s3-endpoint-host:port>
S3_ACCESS_KEY=CHANGE_ME_SAME_AS_CORE
S3_SECRET_KEY=CHANGE_ME_SAME_AS_CORE
S3_BUCKET=breakpilot-rag
S3_SECURE=true
# --- External Qdrant (Coolify-managed, same as Core) ---
QDRANT_URL=http://<coolify-qdrant-hostname>:6333
# --- Session ---
SESSION_TTL_HOURS=24
# --- SMTP (Real mail server) ---
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USERNAME=noreply@breakpilot.ai
SMTP_PASSWORD=CHANGE_ME_SMTP_PASSWORD
SMTP_FROM_NAME=BreakPilot
SMTP_FROM_ADDR=noreply@breakpilot.ai
# --- LLM / Ollama (optional) ---
OLLAMA_BASE_URL=
OLLAMA_URL=
OLLAMA_ENABLED=false
OLLAMA_DEFAULT_MODEL=
OLLAMA_VISION_MODEL=
OLLAMA_CORRECTION_MODEL=
OLLAMA_TIMEOUT=120
# --- Anthropic (optional) ---
ANTHROPIC_API_KEY=
# --- vast.ai GPU (optional) ---
VAST_API_KEY=
VAST_INSTANCE_ID=
# --- Game Settings ---
GAME_USE_DATABASE=true
GAME_REQUIRE_AUTH=true
GAME_REQUIRE_BILLING=true
GAME_LLM_MODEL=
# --- Frontend URLs (build args) ---
NEXT_PUBLIC_API_URL=https://api-lehrer.breakpilot.ai
NEXT_PUBLIC_KLAUSUR_SERVICE_URL=https://klausur.breakpilot.ai
NEXT_PUBLIC_VOICE_SERVICE_URL=wss://voice.breakpilot.ai
NEXT_PUBLIC_BILLING_API_URL=https://api-core.breakpilot.ai
NEXT_PUBLIC_APP_URL=https://app.breakpilot.ai
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=
# --- Edu Search ---
EDU_SEARCH_URL=
EDU_SEARCH_API_KEY=
OPENSEARCH_PASSWORD=CHANGE_ME_OPENSEARCH_PASSWORD
# --- Misc ---
CONTROL_API_KEY=
ALERTS_AGENT_ENABLED=false
PADDLEOCR_SERVICE_URL=
TROCR_SERVICE_URL=
CAMUNDA_URL=

View File

@@ -30,23 +30,6 @@ OLLAMA_VISION_MODEL=llama3.2-vision
OLLAMA_CORRECTION_MODEL=llama3.2
OLLAMA_TIMEOUT=120
# OCR-Pipeline: LLM-Review (Schritt 6)
# Kleine Modelle reichen fuer Zeichen-Korrekturen (0->O, 1->l, 5->S)
# Optionen: qwen3:0.6b, qwen3:1.7b, gemma3:1b, qwen3:30b-a3b
OLLAMA_REVIEW_MODEL=qwen3:0.6b
# Eintraege pro Ollama-Call. Groesser = weniger HTTP-Overhead.
OLLAMA_REVIEW_BATCH_SIZE=20
# OCR-Pipeline: Engine fuer Schritt 5 (Worterkennung)
# Optionen: auto (bevorzugt RapidOCR), rapid, tesseract,
# trocr-printed, trocr-handwritten, lighton
OCR_ENGINE=auto
# Klausur-HTR: Primaerem Modell fuer Handschriftenerkennung (qwen2.5vl bereits auf Mac Mini)
OLLAMA_HTR_MODEL=qwen2.5vl:32b
# HTR Fallback: genutzt wenn Ollama nicht erreichbar (auto-download ~340 MB)
HTR_FALLBACK_MODEL=trocr-large
# Anthropic (optional)
ANTHROPIC_API_KEY=

View File

@@ -0,0 +1,32 @@
name: Deploy to Coolify
on:
push:
branches:
- coolify
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Wait for Core deployment
run: |
echo "Waiting 30s for Core services to stabilize..."
sleep 30
- name: Deploy via Coolify API
run: |
echo "Deploying breakpilot-lehrer to Coolify..."
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
-X POST \
-H "Authorization: Bearer ${{ secrets.COOLIFY_API_TOKEN }}" \
-H "Content-Type: application/json" \
-d '{"uuid": "${{ secrets.COOLIFY_RESOURCE_UUID }}", "force_rebuild": true}' \
"${{ secrets.COOLIFY_BASE_URL }}/api/v1/deploy")
echo "HTTP Status: $HTTP_STATUS"
if [ "$HTTP_STATUS" -ne 200 ] && [ "$HTTP_STATUS" -ne 201 ]; then
echo "Deployment failed with status $HTTP_STATUS"
exit 1
fi
echo "Deployment triggered successfully!"

View File

@@ -34,8 +34,8 @@ WORKDIR /app
ENV NODE_ENV=production
# Create non-root user
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
RUN addgroup -S -g 1001 nodejs
RUN adduser -S -u 1001 -G nodejs nextjs
# Copy built assets
COPY --from=builder /app/public ./public

View File

@@ -273,6 +273,52 @@ Dein Ziel ist die rechtzeitige Erkennung und Kommunikation relevanter Ereignisse
createdAt: '2024-12-01T00:00:00Z',
updatedAt: '2025-01-12T02:00:00Z'
},
'compliance-advisor': {
id: 'compliance-advisor',
name: 'Compliance Advisor',
description: 'DSGVO/Compliance-Berater fuer SDK-Nutzer',
soulFile: 'compliance-advisor.soul.md',
soulContent: `# Compliance Advisor Agent
## Identitaet
Du bist der BreakPilot Compliance-Berater. Du hilfst Nutzern des AI Compliance SDK,
Datenschutz- und Compliance-Fragen in verstaendlicher Sprache zu beantworten.
Du bist kein Anwalt und gibst keine Rechtsberatung, sondern orientierst dich an
offiziellen Quellen und gibst praxisnahe Hinweise.
## Kernprinzipien
- **Quellenbasiert**: Verweise immer auf konkrete Rechtsgrundlagen (DSGVO-Artikel, BDSG-Paragraphen)
- **Verstaendlich**: Erklaere rechtliche Konzepte in einfacher, praxisnaher Sprache
- **Ehrlich**: Bei Unsicherheit empfehle professionelle Rechtsberatung
- **Kontextbewusst**: Nutze das RAG-System fuer aktuelle Rechtstexte und Leitfaeden
- **Scope-bewusst**: Nutze alle verfuegbaren RAG-Quellen AUSSER NIBIS-Dokumenten
## Kompetenzbereich
- DSGVO Art. 1-99 + Erwaegsgruende
- BDSG (Bundesdatenschutzgesetz)
- AI Act (EU KI-Verordnung)
- TTDSG, ePrivacy-Richtlinie
- DSK-Kurzpapiere (Nr. 1-20)
- SDM V3.0, BSI-Grundschutz, BSI-TR-03161
- EDPB Guidelines, Bundes-/Laender-Muss-Listen
- ISO 27001/27701 (Ueberblick)
## Kommunikationsstil
- Sachlich, aber verstaendlich
- Deutsch als Hauptsprache
- Strukturierte Antworten mit Quellenangabe
- Praxisbeispiele wo hilfreich`,
color: '#6366f1',
status: 'running',
activeSessions: 0,
totalProcessed: 0,
avgResponseTime: 0,
errorRate: 0,
lastRestart: new Date().toISOString(),
version: '1.0.0',
createdAt: new Date().toISOString(),
updatedAt: new Date().toISOString()
},
'orchestrator': {
id: 'orchestrator',
name: 'Orchestrator',

View File

@@ -94,6 +94,19 @@ const mockAgents: AgentConfig[] = [
totalProcessed: 8934,
avgResponseTime: 12,
lastActivity: 'just now'
},
{
id: 'compliance-advisor',
name: 'Compliance Advisor',
description: 'DSGVO/Compliance-Berater fuer SDK-Nutzer',
soulFile: 'compliance-advisor.soul.md',
color: '#6366f1',
icon: 'message',
status: 'running',
activeSessions: 0,
totalProcessed: 0,
avgResponseTime: 0,
lastActivity: new Date().toISOString()
}
]

View File

@@ -1,409 +0,0 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'
import { PipelineStepper } from '@/components/ocr-pipeline/PipelineStepper'
import { StepDeskew } from '@/components/ocr-pipeline/StepDeskew'
import { StepDewarp } from '@/components/ocr-pipeline/StepDewarp'
import { StepColumnDetection } from '@/components/ocr-pipeline/StepColumnDetection'
import { StepRowDetection } from '@/components/ocr-pipeline/StepRowDetection'
import { StepWordRecognition } from '@/components/ocr-pipeline/StepWordRecognition'
import { StepLlmReview } from '@/components/ocr-pipeline/StepLlmReview'
import { StepReconstruction } from '@/components/ocr-pipeline/StepReconstruction'
import { StepGroundTruth } from '@/components/ocr-pipeline/StepGroundTruth'
import { PIPELINE_STEPS, type PipelineStep, type SessionListItem, type DocumentTypeResult } from './types'
const KLAUSUR_API = '/klausur-api'
export default function OcrPipelinePage() {
const [currentStep, setCurrentStep] = useState(0)
const [sessionId, setSessionId] = useState<string | null>(null)
const [sessionName, setSessionName] = useState<string>('')
const [sessions, setSessions] = useState<SessionListItem[]>([])
const [loadingSessions, setLoadingSessions] = useState(true)
const [editingName, setEditingName] = useState<string | null>(null)
const [editNameValue, setEditNameValue] = useState('')
const [docTypeResult, setDocTypeResult] = useState<DocumentTypeResult | null>(null)
const [steps, setSteps] = useState<PipelineStep[]>(
PIPELINE_STEPS.map((s, i) => ({
...s,
status: i === 0 ? 'active' : 'pending',
})),
)
// Load session list on mount
useEffect(() => {
loadSessions()
}, [])
const loadSessions = async () => {
setLoadingSessions(true)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
if (res.ok) {
const data = await res.json()
setSessions(data.sessions || [])
}
} catch (e) {
console.error('Failed to load sessions:', e)
} finally {
setLoadingSessions(false)
}
}
const openSession = useCallback(async (sid: string) => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (!res.ok) return
const data = await res.json()
setSessionId(sid)
setSessionName(data.name || data.filename || '')
// Restore doc type result if available
const savedDocType: DocumentTypeResult | null = data.doc_type_result || null
setDocTypeResult(savedDocType)
// Determine which step to jump to based on current_step
const dbStep = data.current_step || 1
// Steps: 1=deskew, 2=dewarp, 3=columns, ...
// UI steps are 0-indexed: 0=deskew, 1=dewarp, 2=columns, ...
const uiStep = Math.max(0, dbStep - 1)
const skipSteps = savedDocType?.skip_steps || []
setSteps(
PIPELINE_STEPS.map((s, i) => ({
...s,
status: skipSteps.includes(s.id)
? 'skipped'
: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
})),
)
setCurrentStep(uiStep)
} catch (e) {
console.error('Failed to open session:', e)
}
}, [])
const deleteSession = useCallback(async (sid: string) => {
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
setSessions((prev) => prev.filter((s) => s.id !== sid))
if (sessionId === sid) {
setSessionId(null)
setCurrentStep(0)
setDocTypeResult(null)
setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}
} catch (e) {
console.error('Failed to delete session:', e)
}
}, [sessionId])
const renameSession = useCallback(async (sid: string, newName: string) => {
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: newName }),
})
setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, name: newName } : s)))
if (sessionId === sid) setSessionName(newName)
} catch (e) {
console.error('Failed to rename session:', e)
}
setEditingName(null)
}, [sessionId])
const handleStepClick = (index: number) => {
if (index <= currentStep || steps[index].status === 'completed') {
setCurrentStep(index)
}
}
const goToStep = (step: number) => {
setCurrentStep(step)
setSteps((prev) =>
prev.map((s, i) => ({
...s,
status: i < step ? 'completed' : i === step ? 'active' : 'pending',
})),
)
}
const handleNext = () => {
if (currentStep >= steps.length - 1) return
// Find the next non-skipped step
const skipSteps = docTypeResult?.skip_steps || []
let nextStep = currentStep + 1
while (nextStep < steps.length && skipSteps.includes(PIPELINE_STEPS[nextStep]?.id)) {
nextStep++
}
if (nextStep >= steps.length) nextStep = steps.length - 1
setSteps((prev) =>
prev.map((s, i) => {
if (i === currentStep) return { ...s, status: 'completed' }
if (i === nextStep) return { ...s, status: 'active' }
// Mark skipped steps between current and next
if (i > currentStep && i < nextStep && skipSteps.includes(PIPELINE_STEPS[i]?.id)) {
return { ...s, status: 'skipped' }
}
return s
}),
)
setCurrentStep(nextStep)
}
const handleDeskewComplete = (sid: string) => {
setSessionId(sid)
// Reload session list to show the new session
loadSessions()
handleNext()
}
const handleDewarpNext = async () => {
// Auto-detect document type after dewarp, then advance
if (sessionId) {
try {
const res = await fetch(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/detect-type`,
{ method: 'POST' },
)
if (res.ok) {
const data: DocumentTypeResult = await res.json()
setDocTypeResult(data)
// Mark skipped steps immediately
const skipSteps = data.skip_steps || []
if (skipSteps.length > 0) {
setSteps((prev) =>
prev.map((s) =>
skipSteps.includes(s.id) ? { ...s, status: 'skipped' } : s,
),
)
}
}
} catch (e) {
console.error('Doc type detection failed:', e)
// Not critical — continue without it
}
}
handleNext()
}
const handleDocTypeChange = (newDocType: DocumentTypeResult['doc_type']) => {
if (!docTypeResult) return
// Build new skip_steps based on doc type
let skipSteps: string[] = []
if (newDocType === 'full_text') {
skipSteps = ['columns', 'rows']
}
// vocab_table and generic_table: no skips
const updated: DocumentTypeResult = {
...docTypeResult,
doc_type: newDocType,
skip_steps: skipSteps,
pipeline: newDocType === 'full_text' ? 'full_page' : 'cell_first',
}
setDocTypeResult(updated)
// Update step statuses
setSteps((prev) =>
prev.map((s) => {
if (skipSteps.includes(s.id)) return { ...s, status: 'skipped' as const }
if (s.status === 'skipped') return { ...s, status: 'pending' as const }
return s
}),
)
}
const handleNewSession = () => {
setSessionId(null)
setSessionName('')
setCurrentStep(0)
setDocTypeResult(null)
setSteps(PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}
const stepNames: Record<number, string> = {
1: 'Begradigung',
2: 'Entzerrung',
3: 'Spalten',
4: 'Zeilen',
5: 'Woerter',
6: 'Korrektur',
7: 'Rekonstruktion',
8: 'Validierung',
}
const reprocessFromStep = useCallback(async (uiStep: number) => {
if (!sessionId) return
const dbStep = uiStep + 1 // UI is 0-indexed, DB is 1-indexed
if (!confirm(`Ab Schritt ${dbStep} (${stepNames[dbStep] || '?'}) neu verarbeiten? Nachfolgende Daten werden geloescht.`)) return
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ from_step: dbStep }),
})
if (!res.ok) {
const data = await res.json().catch(() => ({}))
console.error('Reprocess failed:', data.detail || res.status)
return
}
// Reset UI steps
goToStep(uiStep)
} catch (e) {
console.error('Reprocess error:', e)
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId, goToStep])
const renderStep = () => {
switch (currentStep) {
case 0:
return <StepDeskew sessionId={sessionId} onNext={handleDeskewComplete} />
case 1:
return <StepDewarp sessionId={sessionId} onNext={handleDewarpNext} />
case 2:
return <StepColumnDetection sessionId={sessionId} onNext={handleNext} />
case 3:
return <StepRowDetection sessionId={sessionId} onNext={handleNext} />
case 4:
return <StepWordRecognition sessionId={sessionId} onNext={handleNext} goToStep={goToStep} />
case 5:
return <StepLlmReview sessionId={sessionId} onNext={handleNext} />
case 6:
return <StepReconstruction sessionId={sessionId} onNext={handleNext} />
case 7:
return <StepGroundTruth />
default:
return null
}
}
return (
<div className="space-y-6">
<PagePurpose
title="OCR Pipeline"
purpose="Schrittweise Seitenrekonstruktion: Scan begradigen, Spalten erkennen, Woerter lokalisieren und die Seite Wort fuer Wort nachbauen. Ziel: 10 Vokabelseiten fehlerfrei rekonstruieren."
audience={['Entwickler', 'Data Scientists']}
architecture={{
services: ['klausur-service (FastAPI)', 'OpenCV', 'Tesseract'],
databases: ['PostgreSQL Sessions'],
}}
relatedPages={[
{ name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'Methoden-Vergleich' },
{ name: 'OCR-Labeling', href: '/ai/ocr-labeling', description: 'Trainingsdaten' },
]}
defaultCollapsed
/>
{/* Session List */}
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4">
<div className="flex items-center justify-between mb-3">
<h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Sessions
</h3>
<button
onClick={handleNewSession}
className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
>
+ Neue Session
</button>
</div>
{loadingSessions ? (
<div className="text-sm text-gray-400 py-2">Lade Sessions...</div>
) : sessions.length === 0 ? (
<div className="text-sm text-gray-400 py-2">Noch keine Sessions vorhanden.</div>
) : (
<div className="space-y-1 max-h-48 overflow-y-auto">
{sessions.map((s) => (
<div
key={s.id}
className={`flex items-center gap-2 px-3 py-2 rounded-lg text-sm transition-colors cursor-pointer ${
sessionId === s.id
? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
: 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
}`}
>
<div className="flex-1 min-w-0" onClick={() => openSession(s.id)}>
{editingName === s.id ? (
<input
autoFocus
value={editNameValue}
onChange={(e) => setEditNameValue(e.target.value)}
onBlur={() => renameSession(s.id, editNameValue)}
onKeyDown={(e) => {
if (e.key === 'Enter') renameSession(s.id, editNameValue)
if (e.key === 'Escape') setEditingName(null)
}}
onClick={(e) => e.stopPropagation()}
className="w-full px-1 py-0.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600"
/>
) : (
<div className="truncate font-medium text-gray-700 dark:text-gray-300">
{s.name || s.filename}
</div>
)}
<div className="text-xs text-gray-400 flex gap-2">
<span>{new Date(s.created_at).toLocaleDateString('de-DE', { day: '2-digit', month: '2-digit', year: '2-digit', hour: '2-digit', minute: '2-digit' })}</span>
<span>Schritt {s.current_step}: {stepNames[s.current_step] || '?'}</span>
</div>
</div>
<button
onClick={(e) => {
e.stopPropagation()
setEditNameValue(s.name || s.filename)
setEditingName(s.id)
}}
className="p-1 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300"
title="Umbenennen"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
</button>
<button
onClick={(e) => {
e.stopPropagation()
if (confirm('Session loeschen?')) deleteSession(s.id)
}}
className="p-1 text-gray-400 hover:text-red-500"
title="Loeschen"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
</svg>
</button>
</div>
))}
</div>
)}
</div>
{/* Active session name */}
{sessionId && sessionName && (
<div className="text-sm text-gray-500 dark:text-gray-400">
Aktive Session: <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span>
</div>
)}
<PipelineStepper
steps={steps}
currentStep={currentStep}
onStepClick={handleStepClick}
onReprocess={sessionId ? reprocessFromStep : undefined}
docTypeResult={docTypeResult}
onDocTypeChange={handleDocTypeChange}
/>
<div className="min-h-[400px]">{renderStep()}</div>
</div>
)
}

View File

@@ -1,243 +0,0 @@
export type PipelineStepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'
export interface PipelineStep {
id: string
name: string
icon: string
status: PipelineStepStatus
}
export interface SessionListItem {
id: string
name: string
filename: string
status: string
current_step: number
created_at: string
updated_at?: string
}
export interface DocumentTypeResult {
doc_type: 'vocab_table' | 'full_text' | 'generic_table'
confidence: number
pipeline: 'cell_first' | 'full_page'
skip_steps: string[]
features?: Record<string, unknown>
duration_seconds?: number
}
export interface SessionInfo {
session_id: string
filename: string
name?: string
image_width: number
image_height: number
original_image_url: string
current_step?: number
deskew_result?: DeskewResult
dewarp_result?: DewarpResult
column_result?: ColumnResult
row_result?: RowResult
word_result?: GridResult
doc_type_result?: DocumentTypeResult
}
export interface DeskewResult {
session_id: string
angle_hough: number
angle_word_alignment: number
angle_applied: number
method_used: 'hough' | 'word_alignment' | 'manual'
confidence: number
duration_seconds: number
deskewed_image_url: string
binarized_image_url: string
}
export interface DeskewGroundTruth {
is_correct: boolean
corrected_angle?: number
notes?: string
}
export interface DewarpDetection {
method: string
shear_degrees: number
confidence: number
}
export interface DewarpResult {
session_id: string
method_used: string
shear_degrees: number
confidence: number
duration_seconds: number
dewarped_image_url: string
detections?: DewarpDetection[]
}
export interface DewarpGroundTruth {
is_correct: boolean
corrected_shear?: number
notes?: string
}
export interface PageRegion {
type: 'column_en' | 'column_de' | 'column_example' | 'page_ref'
| 'column_marker' | 'column_text' | 'column_ignore' | 'header' | 'footer'
x: number
y: number
width: number
height: number
classification_confidence?: number
classification_method?: string
}
export interface ColumnResult {
columns: PageRegion[]
duration_seconds: number
}
export interface ColumnGroundTruth {
is_correct: boolean
corrected_columns?: PageRegion[]
notes?: string
}
export interface ManualColumnDivider {
xPercent: number // Position in % of image width (0-100)
}
export type ColumnTypeKey = PageRegion['type']
export interface RowResult {
rows: RowItem[]
summary: Record<string, number>
total_rows: number
duration_seconds: number
}
export interface RowItem {
index: number
x: number
y: number
width: number
height: number
word_count: number
row_type: 'content' | 'header' | 'footer'
gap_before: number
}
export interface RowGroundTruth {
is_correct: boolean
corrected_rows?: RowItem[]
notes?: string
}
export interface WordBbox {
x: number
y: number
w: number
h: number
}
export interface GridCell {
cell_id: string // "R03_C1"
row_index: number
col_index: number
col_type: string
text: string
confidence: number
bbox_px: WordBbox
bbox_pct: WordBbox
ocr_engine?: string
status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
}
export interface ColumnMeta {
index: number
type: string
x: number
width: number
}
export interface GridResult {
cells: GridCell[]
grid_shape: { rows: number; cols: number; total_cells: number }
columns_used: ColumnMeta[]
layout: 'vocab' | 'generic'
image_width: number
image_height: number
duration_seconds: number
ocr_engine?: string
vocab_entries?: WordEntry[] // Only when layout='vocab'
entries?: WordEntry[] // Backwards compat alias for vocab_entries
entry_count?: number
summary: {
total_cells: number
non_empty_cells: number
low_confidence: number
// Only when layout='vocab':
total_entries?: number
with_english?: number
with_german?: number
}
llm_review?: {
changes: { row_index: number; field: string; old: string; new: string }[]
model_used: string
duration_ms: number
entries_corrected: number
applied_count?: number
applied_at?: string
}
}
export interface WordEntry {
row_index: number
english: string
german: string
example: string
source_page?: string
marker?: string
confidence: number
bbox: WordBbox
bbox_en: WordBbox | null
bbox_de: WordBbox | null
bbox_ex: WordBbox | null
bbox_ref?: WordBbox | null
bbox_marker?: WordBbox | null
status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
}
/** @deprecated Use GridResult instead */
export interface WordResult {
entries: WordEntry[]
entry_count: number
image_width: number
image_height: number
duration_seconds: number
ocr_engine?: string
summary: {
total_entries: number
with_english: number
with_german: number
low_confidence: number
}
}
export interface WordGroundTruth {
is_correct: boolean
corrected_entries?: WordEntry[]
notes?: string
}
export const PIPELINE_STEPS: PipelineStep[] = [
{ id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
{ id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
{ id: 'columns', name: 'Spalten', icon: '📊', status: 'pending' },
{ id: 'rows', name: 'Zeilen', icon: '📏', status: 'pending' },
{ id: 'words', name: 'Woerter', icon: '🔤', status: 'pending' },
{ id: 'llm-review', name: 'Korrektur', icon: '✏️', status: 'pending' },
{ id: 'reconstruction', name: 'Rekonstruktion', icon: '🏗️', status: 'pending' },
{ id: 'ground-truth', name: 'Validierung', icon: '✅', status: 'pending' },
]

View File

@@ -1,671 +0,0 @@
'use client'
import React, { useState, useEffect, useCallback, useRef } from 'react'
import { RAG_PDF_MAPPING } from './rag-pdf-mapping'
import { REGULATIONS_IN_RAG, REGULATION_INFO } from '../rag-constants'
interface ChunkBrowserQAProps {
apiProxy: string
}
type RegGroupKey = 'eu_regulation' | 'eu_directive' | 'de_law' | 'at_law' | 'ch_law' | 'national_law' | 'bsi_standard' | 'eu_guideline' | 'international_standard' | 'other'
const GROUP_LABELS: Record<RegGroupKey, string> = {
eu_regulation: 'EU Verordnungen',
eu_directive: 'EU Richtlinien',
de_law: 'DE Gesetze',
at_law: 'AT Gesetze',
ch_law: 'CH Gesetze',
national_law: 'Nationale Gesetze (EU)',
bsi_standard: 'BSI Standards',
eu_guideline: 'EDPB / Guidelines',
international_standard: 'Internationale Standards',
other: 'Sonstige',
}
const GROUP_ORDER: RegGroupKey[] = [
'eu_regulation', 'eu_directive', 'de_law', 'at_law', 'ch_law',
'national_law', 'bsi_standard', 'eu_guideline', 'international_standard', 'other',
]
const COLLECTIONS = [
'bp_compliance_gesetze',
'bp_compliance_ce',
'bp_compliance_datenschutz',
]
export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
// Filter-Sidebar
const [selectedRegulation, setSelectedRegulation] = useState<string | null>(null)
const [regulationCounts, setRegulationCounts] = useState<Record<string, number>>({})
const [filterSearch, setFilterSearch] = useState('')
const [countsLoading, setCountsLoading] = useState(false)
// Dokument-Chunks (sequenziell)
const [docChunks, setDocChunks] = useState<Record<string, unknown>[]>([])
const [docChunkIndex, setDocChunkIndex] = useState(0)
const [docTotalChunks, setDocTotalChunks] = useState(0)
const [docLoading, setDocLoading] = useState(false)
const docChunksRef = useRef(docChunks)
docChunksRef.current = docChunks
// Split-View
const [splitViewActive, setSplitViewActive] = useState(true)
const [chunksPerPage, setChunksPerPage] = useState(6)
const [fullscreen, setFullscreen] = useState(false)
// Collection — default to bp_compliance_ce where we have PDFs downloaded
const [collection, setCollection] = useState('bp_compliance_ce')
// PDF existence check
const [pdfExists, setPdfExists] = useState<boolean | null>(null)
// Sidebar collapsed groups
const [collapsedGroups, setCollapsedGroups] = useState<Set<string>>(new Set())
// Build grouped regulations for sidebar
const regulationsInCollection = Object.entries(REGULATIONS_IN_RAG)
.filter(([, info]) => info.collection === collection)
.map(([code]) => code)
const groupedRegulations = React.useMemo(() => {
const groups: Record<RegGroupKey, { code: string; name: string; type: string }[]> = {
eu_regulation: [], eu_directive: [], de_law: [], at_law: [], ch_law: [],
national_law: [], bsi_standard: [], eu_guideline: [], international_standard: [], other: [],
}
for (const code of regulationsInCollection) {
const reg = REGULATION_INFO.find(r => r.code === code)
const type = (reg?.type || 'other') as RegGroupKey
const groupKey = type in groups ? type : 'other'
groups[groupKey].push({
code,
name: reg?.name || code,
type: reg?.type || 'unknown',
})
}
return groups
}, [regulationsInCollection.join(',')])
// Load regulation counts for current collection
const loadRegulationCounts = useCallback(async (col: string) => {
const entries = Object.entries(REGULATIONS_IN_RAG)
.filter(([, info]) => info.collection === col && info.qdrant_id)
if (entries.length === 0) return
// Build qdrant_id -> our_code mapping
const qdrantIdToCode: Record<string, string[]> = {}
for (const [code, info] of entries) {
if (!qdrantIdToCode[info.qdrant_id]) qdrantIdToCode[info.qdrant_id] = []
qdrantIdToCode[info.qdrant_id].push(code)
}
const uniqueQdrantIds = Object.keys(qdrantIdToCode)
setCountsLoading(true)
try {
const params = new URLSearchParams({
action: 'regulation-counts-batch',
collection: col,
qdrant_ids: uniqueQdrantIds.join(','),
})
const res = await fetch(`${apiProxy}?${params}`)
if (res.ok) {
const data = await res.json()
// Map qdrant_id counts back to our codes
const mapped: Record<string, number> = {}
for (const [qid, count] of Object.entries(data.counts as Record<string, number>)) {
const codes = qdrantIdToCode[qid] || []
for (const code of codes) {
mapped[code] = count
}
}
setRegulationCounts(prev => ({ ...prev, ...mapped }))
}
} catch (error) {
console.error('Failed to load regulation counts:', error)
} finally {
setCountsLoading(false)
}
}, [apiProxy])
// Load all chunks for a regulation (paginated scroll)
const loadDocumentChunks = useCallback(async (regulationCode: string) => {
const ragInfo = REGULATIONS_IN_RAG[regulationCode]
if (!ragInfo || !ragInfo.qdrant_id) return
setDocLoading(true)
setDocChunks([])
setDocChunkIndex(0)
setDocTotalChunks(0)
const allChunks: Record<string, unknown>[] = []
let offset: string | null = null
try {
let safety = 0
do {
const params = new URLSearchParams({
action: 'scroll',
collection: ragInfo.collection,
limit: '100',
filter_key: 'regulation_id',
filter_value: ragInfo.qdrant_id,
})
if (offset) params.append('offset', offset)
const res = await fetch(`${apiProxy}?${params}`)
if (!res.ok) break
const data = await res.json()
const chunks = data.chunks || []
allChunks.push(...chunks)
offset = data.next_offset || null
safety++
} while (offset && safety < 200)
// Sort by chunk_index
allChunks.sort((a, b) => {
const ai = Number(a.chunk_index ?? a.chunk_id ?? 0)
const bi = Number(b.chunk_index ?? b.chunk_id ?? 0)
return ai - bi
})
setDocChunks(allChunks)
setDocTotalChunks(allChunks.length)
setDocChunkIndex(0)
} catch (error) {
console.error('Failed to load document chunks:', error)
} finally {
setDocLoading(false)
}
}, [apiProxy])
// Initial load
useEffect(() => {
loadRegulationCounts(collection)
}, [collection, loadRegulationCounts])
// Current chunk
const currentChunk = docChunks[docChunkIndex] || null
const prevChunk = docChunkIndex > 0 ? docChunks[docChunkIndex - 1] : null
const nextChunk = docChunkIndex < docChunks.length - 1 ? docChunks[docChunkIndex + 1] : null
// PDF page estimation — use pages metadata if available
const estimatePdfPage = (chunk: Record<string, unknown> | null, chunkIdx: number): number => {
if (chunk) {
// Try pages array from payload (e.g. [7] or [7,8])
const pages = chunk.pages as number[] | undefined
if (Array.isArray(pages) && pages.length > 0) return pages[0]
// Try page field
const page = chunk.page as number | undefined
if (typeof page === 'number' && page > 0) return page
}
const mapping = selectedRegulation ? RAG_PDF_MAPPING[selectedRegulation] : null
const cpp = mapping?.chunksPerPage || chunksPerPage
return Math.floor(chunkIdx / cpp) + 1
}
const pdfPage = estimatePdfPage(currentChunk, docChunkIndex)
const pdfMapping = selectedRegulation ? RAG_PDF_MAPPING[selectedRegulation] : null
const pdfUrl = pdfMapping ? `/rag-originals/${pdfMapping.filename}#page=${pdfPage}` : null
// Check PDF existence when regulation changes
useEffect(() => {
if (!selectedRegulation) { setPdfExists(null); return }
const mapping = RAG_PDF_MAPPING[selectedRegulation]
if (!mapping) { setPdfExists(false); return }
const url = `/rag-originals/${mapping.filename}`
fetch(url, { method: 'HEAD' })
.then(res => setPdfExists(res.ok))
.catch(() => setPdfExists(false))
}, [selectedRegulation])
// Handlers
const handleSelectRegulation = (code: string) => {
setSelectedRegulation(code)
loadDocumentChunks(code)
}
const handleCollectionChange = (col: string) => {
setCollection(col)
setSelectedRegulation(null)
setDocChunks([])
setDocChunkIndex(0)
setDocTotalChunks(0)
setRegulationCounts({})
}
const handlePrev = () => {
if (docChunkIndex > 0) setDocChunkIndex(i => i - 1)
}
const handleNext = () => {
if (docChunkIndex < docChunks.length - 1) setDocChunkIndex(i => i + 1)
}
const handleKeyDown = useCallback((e: KeyboardEvent) => {
if (e.key === 'Escape' && fullscreen) {
e.preventDefault()
setFullscreen(false)
} else if (e.key === 'ArrowLeft' || e.key === 'ArrowUp') {
e.preventDefault()
setDocChunkIndex(i => Math.max(0, i - 1))
} else if (e.key === 'ArrowRight' || e.key === 'ArrowDown') {
e.preventDefault()
setDocChunkIndex(i => Math.min(docChunksRef.current.length - 1, i + 1))
}
}, [fullscreen])
useEffect(() => {
if (fullscreen || (selectedRegulation && docChunks.length > 0)) {
window.addEventListener('keydown', handleKeyDown)
return () => window.removeEventListener('keydown', handleKeyDown)
}
}, [selectedRegulation, docChunks.length, handleKeyDown, fullscreen])
const toggleGroup = (group: string) => {
setCollapsedGroups(prev => {
const next = new Set(prev)
if (next.has(group)) next.delete(group)
else next.add(group)
return next
})
}
// Get text content from a chunk
const getChunkText = (chunk: Record<string, unknown> | null): string => {
if (!chunk) return ''
return String(chunk.chunk_text || chunk.text || chunk.content || '')
}
// Extract structural metadata for prominent display
const getStructuralInfo = (chunk: Record<string, unknown> | null): { article?: string; section?: string; pages?: string } => {
if (!chunk) return {}
const result: { article?: string; section?: string; pages?: string } = {}
// Article / paragraph
const article = chunk.article || chunk.artikel || chunk.paragraph || chunk.section_title
if (article) result.article = String(article)
// Section
const section = chunk.section || chunk.chapter || chunk.abschnitt || chunk.kapitel
if (section) result.section = String(section)
// Pages
const pages = chunk.pages as number[] | undefined
if (Array.isArray(pages) && pages.length > 0) {
result.pages = pages.length === 1 ? `S. ${pages[0]}` : `S. ${pages[0]}-${pages[pages.length - 1]}`
} else if (chunk.page) {
result.pages = `S. ${chunk.page}`
}
return result
}
// Overlap extraction
const getOverlapPrev = (): string => {
if (!prevChunk) return ''
const text = getChunkText(prevChunk)
return text.length > 150 ? '...' + text.slice(-150) : text
}
const getOverlapNext = (): string => {
if (!nextChunk) return ''
const text = getChunkText(nextChunk)
return text.length > 150 ? text.slice(0, 150) + '...' : text
}
// Filter sidebar items
const filteredRegulations = React.useMemo(() => {
if (!filterSearch.trim()) return groupedRegulations
const term = filterSearch.toLowerCase()
const filtered: typeof groupedRegulations = {
eu_regulation: [], eu_directive: [], de_law: [], at_law: [], ch_law: [],
national_law: [], bsi_standard: [], eu_guideline: [], international_standard: [], other: [],
}
for (const [group, items] of Object.entries(groupedRegulations)) {
filtered[group as RegGroupKey] = items.filter(
r => r.code.toLowerCase().includes(term) || r.name.toLowerCase().includes(term)
)
}
return filtered
}, [groupedRegulations, filterSearch])
// Regulation name lookup
const getRegName = (code: string): string => {
const reg = REGULATION_INFO.find(r => r.code === code)
return reg?.name || code
}
// Important metadata keys to show prominently
const STRUCTURAL_KEYS = new Set([
'article', 'artikel', 'paragraph', 'section_title', 'section', 'chapter',
'abschnitt', 'kapitel', 'pages', 'page',
])
const HIDDEN_KEYS = new Set([
'text', 'content', 'chunk_text', 'id', 'embedding',
])
const structInfo = getStructuralInfo(currentChunk)
return (
<div
className={`flex flex-col ${fullscreen ? 'fixed inset-0 z-50 bg-slate-100 p-4' : ''}`}
style={fullscreen ? { height: '100vh' } : { height: 'calc(100vh - 220px)' }}
>
{/* Header bar — fixed height */}
<div className="flex-shrink-0 bg-white rounded-xl border border-slate-200 p-3 mb-3">
<div className="flex flex-wrap items-center gap-4">
<div>
<label className="block text-xs font-medium text-slate-500 mb-1">Collection</label>
<select
value={collection}
onChange={(e) => handleCollectionChange(e.target.value)}
className="px-3 py-1.5 border rounded-lg text-sm focus:ring-2 focus:ring-teal-500"
>
{COLLECTIONS.map(c => (
<option key={c} value={c}>{c}</option>
))}
</select>
</div>
{selectedRegulation && (
<>
<div className="flex items-center gap-2">
<span className="text-sm font-semibold text-slate-900">
{selectedRegulation} {getRegName(selectedRegulation)}
</span>
{structInfo.article && (
<span className="px-2 py-0.5 bg-blue-100 text-blue-800 text-xs font-medium rounded">
{structInfo.article}
</span>
)}
{structInfo.pages && (
<span className="px-2 py-0.5 bg-slate-100 text-slate-600 text-xs rounded">
{structInfo.pages}
</span>
)}
</div>
<div className="flex items-center gap-2 ml-auto">
<button
onClick={handlePrev}
disabled={docChunkIndex === 0}
className="px-3 py-1.5 text-sm font-medium border rounded-lg bg-white hover:bg-slate-50 disabled:opacity-30 disabled:cursor-not-allowed"
>
&#9664; Zurueck
</button>
<span className="text-sm font-mono text-slate-600 min-w-[80px] text-center">
{docChunkIndex + 1} / {docTotalChunks}
</span>
<button
onClick={handleNext}
disabled={docChunkIndex >= docChunks.length - 1}
className="px-3 py-1.5 text-sm font-medium border rounded-lg bg-white hover:bg-slate-50 disabled:opacity-30 disabled:cursor-not-allowed"
>
Weiter &#9654;
</button>
<input
type="number"
min={1}
max={docTotalChunks}
value={docChunkIndex + 1}
onChange={(e) => {
const v = parseInt(e.target.value, 10)
if (!isNaN(v) && v >= 1 && v <= docTotalChunks) setDocChunkIndex(v - 1)
}}
className="w-16 px-2 py-1 border rounded text-xs text-center"
title="Springe zu Chunk Nr."
/>
</div>
<div className="flex items-center gap-2">
<label className="text-xs text-slate-500">Chunks/Seite:</label>
<select
value={chunksPerPage}
onChange={(e) => setChunksPerPage(Number(e.target.value))}
className="px-2 py-1 border rounded text-xs"
>
{[3, 4, 5, 6, 8, 10, 12, 15, 20].map(n => (
<option key={n} value={n}>{n}</option>
))}
</select>
<button
onClick={() => setSplitViewActive(!splitViewActive)}
className={`px-3 py-1 text-xs rounded-lg border ${
splitViewActive ? 'bg-teal-50 border-teal-300 text-teal-700' : 'bg-slate-50 border-slate-300 text-slate-600'
}`}
>
{splitViewActive ? 'Split-View an' : 'Split-View aus'}
</button>
<button
onClick={() => setFullscreen(!fullscreen)}
className={`px-3 py-1 text-xs rounded-lg border ${
fullscreen ? 'bg-indigo-50 border-indigo-300 text-indigo-700' : 'bg-slate-50 border-slate-300 text-slate-600'
}`}
title={fullscreen ? 'Vollbild beenden (Esc)' : 'Vollbild'}
>
{fullscreen ? '&#10005; Vollbild beenden' : '&#9974; Vollbild'}
</button>
</div>
</>
)}
</div>
</div>
{/* Main content: Sidebar + Content — fills remaining height */}
<div className="flex gap-3 flex-1 min-h-0">
{/* Sidebar — scrollable */}
<div className="w-56 flex-shrink-0 bg-white rounded-xl border border-slate-200 flex flex-col min-h-0">
<div className="flex-shrink-0 p-3 border-b border-slate-100">
<input
type="text"
value={filterSearch}
onChange={(e) => setFilterSearch(e.target.value)}
placeholder="Suche..."
className="w-full px-2 py-1.5 border rounded-lg text-sm focus:ring-2 focus:ring-teal-500"
/>
{countsLoading && (
<div className="text-xs text-slate-400 mt-1 animate-pulse">Counts laden...</div>
)}
</div>
<div className="flex-1 overflow-y-auto min-h-0">
{GROUP_ORDER.map(group => {
const items = filteredRegulations[group]
if (items.length === 0) return null
const isCollapsed = collapsedGroups.has(group)
return (
<div key={group}>
<button
onClick={() => toggleGroup(group)}
className="w-full px-3 py-1.5 text-left text-xs font-semibold text-slate-500 bg-slate-50 hover:bg-slate-100 flex items-center justify-between sticky top-0 z-10"
>
<span>{GROUP_LABELS[group]}</span>
<span className="text-slate-400">{isCollapsed ? '+' : '-'}</span>
</button>
{!isCollapsed && items.map(reg => {
const count = regulationCounts[reg.code] ?? 0
const isSelected = selectedRegulation === reg.code
return (
<button
key={reg.code}
onClick={() => handleSelectRegulation(reg.code)}
className={`w-full px-3 py-1.5 text-left text-sm flex items-center justify-between hover:bg-teal-50 transition-colors ${
isSelected ? 'bg-teal-100 text-teal-900 font-medium' : 'text-slate-700'
}`}
>
<span className="truncate text-xs">{reg.name || reg.code}</span>
<span className={`text-xs tabular-nums flex-shrink-0 ml-1 ${count > 0 ? 'text-slate-500' : 'text-slate-300'}`}>
{count > 0 ? count.toLocaleString() : '—'}
</span>
</button>
)
})}
</div>
)
})}
</div>
</div>
{/* Content area — fills remaining width and height */}
{!selectedRegulation ? (
<div className="flex-1 flex items-center justify-center bg-white rounded-xl border border-slate-200">
<div className="text-center text-slate-400 space-y-2">
<div className="text-4xl">&#128269;</div>
<p className="text-sm">Dokument in der Sidebar auswaehlen, um QA zu starten.</p>
<p className="text-xs text-slate-300">Pfeiltasten: Chunk vor/zurueck</p>
</div>
</div>
) : docLoading ? (
<div className="flex-1 flex items-center justify-center bg-white rounded-xl border border-slate-200">
<div className="text-center text-slate-500 space-y-2">
<div className="animate-spin text-3xl">&#9881;</div>
<p className="text-sm">Chunks werden geladen...</p>
<p className="text-xs text-slate-400">
{selectedRegulation}: {REGULATIONS_IN_RAG[selectedRegulation]?.chunks.toLocaleString() || '?'} Chunks erwartet
</p>
</div>
</div>
) : (
<div className={`flex-1 grid gap-3 min-h-0 ${splitViewActive ? 'grid-cols-2' : 'grid-cols-1'}`}>
{/* Chunk-Text Panel — fixed height, internal scroll */}
<div className="bg-white rounded-xl border border-slate-200 flex flex-col min-h-0 overflow-hidden">
{/* Panel header */}
<div className="flex-shrink-0 px-4 py-2 bg-slate-50 border-b border-slate-100 flex items-center justify-between">
<span className="text-sm font-medium text-slate-700">Chunk-Text</span>
<div className="flex items-center gap-2">
{structInfo.article && (
<span className="px-2 py-0.5 bg-blue-50 text-blue-700 text-xs font-medium rounded border border-blue-200">
{structInfo.article}
</span>
)}
{structInfo.section && (
<span className="px-2 py-0.5 bg-purple-50 text-purple-700 text-xs rounded border border-purple-200">
{structInfo.section}
</span>
)}
<span className="text-xs text-slate-400 tabular-nums">
#{docChunkIndex} / {docTotalChunks - 1}
</span>
</div>
</div>
{/* Scrollable content */}
<div className="flex-1 overflow-y-auto min-h-0 p-4 space-y-3">
{/* Overlap from previous chunk */}
{prevChunk && (
<div className="text-xs text-slate-400 bg-amber-50 border-l-2 border-amber-300 px-3 py-2 rounded-r">
<div className="font-medium text-amber-600 mb-1">&#8593; Ende vorheriger Chunk #{docChunkIndex - 1}</div>
<p className="whitespace-pre-wrap break-words leading-relaxed">{getOverlapPrev()}</p>
</div>
)}
{/* Current chunk text */}
{currentChunk ? (
<div className="text-sm text-slate-800 whitespace-pre-wrap break-words leading-relaxed border-l-2 border-teal-400 pl-3">
{getChunkText(currentChunk)}
</div>
) : (
<div className="text-sm text-slate-400 italic">Kein Chunk-Text vorhanden.</div>
)}
{/* Overlap from next chunk */}
{nextChunk && (
<div className="text-xs text-slate-400 bg-amber-50 border-l-2 border-amber-300 px-3 py-2 rounded-r">
<div className="font-medium text-amber-600 mb-1">&#8595; Anfang naechster Chunk #{docChunkIndex + 1}</div>
<p className="whitespace-pre-wrap break-words leading-relaxed">{getOverlapNext()}</p>
</div>
)}
{/* Metadata */}
{currentChunk && (
<div className="mt-4 pt-3 border-t border-slate-100">
<div className="text-xs font-medium text-slate-500 mb-2">Metadaten</div>
<div className="grid grid-cols-2 gap-x-4 gap-y-1 text-xs">
{Object.entries(currentChunk)
.filter(([k]) => !HIDDEN_KEYS.has(k))
.sort(([a], [b]) => {
// Structural keys first
const aStruct = STRUCTURAL_KEYS.has(a) ? 0 : 1
const bStruct = STRUCTURAL_KEYS.has(b) ? 0 : 1
return aStruct - bStruct || a.localeCompare(b)
})
.map(([k, v]) => (
<div key={k} className={`flex gap-1 ${STRUCTURAL_KEYS.has(k) ? 'col-span-2 font-medium' : ''}`}>
<span className="font-medium text-slate-500 flex-shrink-0">{k}:</span>
<span className="text-slate-700 break-all">
{Array.isArray(v) ? v.join(', ') : String(v)}
</span>
</div>
))}
</div>
{/* Chunk quality indicator */}
<div className="mt-3 pt-2 border-t border-slate-50">
<div className="text-xs text-slate-400">
Chunk-Laenge: {getChunkText(currentChunk).length} Zeichen
{getChunkText(currentChunk).length < 50 && (
<span className="ml-2 text-orange-500 font-medium">&#9888; Sehr kurz</span>
)}
{getChunkText(currentChunk).length > 2000 && (
<span className="ml-2 text-orange-500 font-medium">&#9888; Sehr lang</span>
)}
</div>
</div>
</div>
)}
</div>
</div>
{/* PDF-Viewer Panel */}
{splitViewActive && (
<div className="bg-white rounded-xl border border-slate-200 flex flex-col min-h-0 overflow-hidden">
<div className="flex-shrink-0 px-4 py-2 bg-slate-50 border-b border-slate-100 flex items-center justify-between">
<span className="text-sm font-medium text-slate-700">Original-PDF</span>
<div className="flex items-center gap-2">
<span className="text-xs text-slate-400">
Seite ~{pdfPage}
{pdfMapping?.totalPages ? ` / ${pdfMapping.totalPages}` : ''}
</span>
{pdfUrl && (
<a
href={pdfUrl.split('#')[0]}
target="_blank"
rel="noopener noreferrer"
className="text-xs text-teal-600 hover:text-teal-800 underline"
>
Oeffnen &#8599;
</a>
)}
</div>
</div>
<div className="flex-1 min-h-0 relative">
{pdfUrl && pdfExists ? (
<iframe
key={`${selectedRegulation}-${pdfPage}`}
src={pdfUrl}
className="absolute inset-0 w-full h-full border-0"
title="Original PDF"
/>
) : (
<div className="flex items-center justify-center h-full text-slate-400 text-sm p-4">
<div className="text-center space-y-2">
<div className="text-3xl">&#128196;</div>
{!pdfMapping ? (
<>
<p>Kein PDF-Mapping fuer {selectedRegulation}.</p>
<p className="text-xs">rag-pdf-mapping.ts ergaenzen.</p>
</>
) : pdfExists === false ? (
<>
<p className="font-medium text-orange-600">PDF nicht vorhanden</p>
<p className="text-xs">Datei <code className="bg-slate-100 px-1 rounded">{pdfMapping.filename}</code> fehlt in ~/rag-originals/</p>
<p className="text-xs mt-1">Bitte manuell herunterladen und dort ablegen.</p>
</>
) : (
<p>PDF wird geprueft...</p>
)}
</div>
</div>
)}
</div>
</div>
)}
</div>
)}
</div>
</div>
)
}

View File

@@ -1,126 +0,0 @@
export interface RagPdfMapping {
filename: string
totalPages?: number
chunksPerPage?: number
language: string
}
export const RAG_PDF_MAPPING: Record<string, RagPdfMapping> = {
// EU Verordnungen
GDPR: { filename: 'GDPR_DE.pdf', language: 'de', totalPages: 88 },
EPRIVACY: { filename: 'EPRIVACY_DE.pdf', language: 'de' },
SCC: { filename: 'SCC_DE.pdf', language: 'de' },
SCC_FULL_TEXT: { filename: 'SCC_FULL_TEXT_DE.pdf', language: 'de' },
AIACT: { filename: 'AIACT_DE.pdf', language: 'de', totalPages: 144 },
CRA: { filename: 'CRA_DE.pdf', language: 'de' },
NIS2: { filename: 'NIS2_DE.pdf', language: 'de' },
DGA: { filename: 'DGA_DE.pdf', language: 'de' },
DSA: { filename: 'DSA_DE.pdf', language: 'de' },
PLD: { filename: 'PLD_DE.pdf', language: 'de' },
E_COMMERCE_RL: { filename: 'E_COMMERCE_RL_DE.pdf', language: 'de' },
VERBRAUCHERRECHTE_RL: { filename: 'VERBRAUCHERRECHTE_RL_DE.pdf', language: 'de' },
DIGITALE_INHALTE_RL: { filename: 'DIGITALE_INHALTE_RL_DE.pdf', language: 'de' },
DMA: { filename: 'DMA_DE.pdf', language: 'de' },
DPF: { filename: 'DPF_DE.pdf', language: 'de' },
EUCSA: { filename: 'EUCSA_DE.pdf', language: 'de' },
DATAACT: { filename: 'DATAACT_DE.pdf', language: 'de' },
DORA: { filename: 'DORA_DE.pdf', language: 'de' },
PSD2: { filename: 'PSD2_DE.pdf', language: 'de' },
AMLR: { filename: 'AMLR_DE.pdf', language: 'de' },
MiCA: { filename: 'MiCA_DE.pdf', language: 'de' },
EHDS: { filename: 'EHDS_DE.pdf', language: 'de' },
EAA: { filename: 'EAA_DE.pdf', language: 'de' },
DSM: { filename: 'DSM_DE.pdf', language: 'de' },
GPSR: { filename: 'GPSR_DE.pdf', language: 'de' },
MACHINERY_REG: { filename: 'MACHINERY_REG_DE.pdf', language: 'de' },
BLUE_GUIDE: { filename: 'BLUE_GUIDE_DE.pdf', language: 'de' },
// DE Gesetze
TDDDG: { filename: 'TDDDG_DE.pdf', language: 'de' },
BDSG_FULL: { filename: 'BDSG_FULL_DE.pdf', language: 'de' },
DE_DDG: { filename: 'DE_DDG.pdf', language: 'de' },
DE_BGB_AGB: { filename: 'DE_BGB_AGB.pdf', language: 'de' },
DE_EGBGB: { filename: 'DE_EGBGB.pdf', language: 'de' },
DE_HGB_RET: { filename: 'DE_HGB_RET.pdf', language: 'de' },
DE_AO_RET: { filename: 'DE_AO_RET.pdf', language: 'de' },
DE_UWG: { filename: 'DE_UWG.pdf', language: 'de' },
DE_TKG: { filename: 'DE_TKG.pdf', language: 'de' },
DE_PANGV: { filename: 'DE_PANGV.pdf', language: 'de' },
DE_DLINFOV: { filename: 'DE_DLINFOV.pdf', language: 'de' },
DE_BETRVG: { filename: 'DE_BETRVG.pdf', language: 'de' },
DE_GESCHGEHG: { filename: 'DE_GESCHGEHG.pdf', language: 'de' },
DE_BSIG: { filename: 'DE_BSIG.pdf', language: 'de' },
DE_USTG_RET: { filename: 'DE_USTG_RET.pdf', language: 'de' },
// BSI Standards
'BSI-TR-03161-1': { filename: 'BSI-TR-03161-1.pdf', language: 'de' },
'BSI-TR-03161-2': { filename: 'BSI-TR-03161-2.pdf', language: 'de' },
'BSI-TR-03161-3': { filename: 'BSI-TR-03161-3.pdf', language: 'de' },
// AT Gesetze
AT_DSG: { filename: 'AT_DSG.pdf', language: 'de' },
AT_DSG_FULL: { filename: 'AT_DSG_FULL.pdf', language: 'de' },
AT_ECG: { filename: 'AT_ECG.pdf', language: 'de' },
AT_TKG: { filename: 'AT_TKG.pdf', language: 'de' },
AT_KSCHG: { filename: 'AT_KSCHG.pdf', language: 'de' },
AT_FAGG: { filename: 'AT_FAGG.pdf', language: 'de' },
AT_UGB_RET: { filename: 'AT_UGB_RET.pdf', language: 'de' },
AT_BAO_RET: { filename: 'AT_BAO_RET.pdf', language: 'de' },
AT_MEDIENG: { filename: 'AT_MEDIENG.pdf', language: 'de' },
AT_ABGB_AGB: { filename: 'AT_ABGB_AGB.pdf', language: 'de' },
AT_UWG: { filename: 'AT_UWG.pdf', language: 'de' },
// CH Gesetze
CH_DSG: { filename: 'CH_DSG.pdf', language: 'de' },
CH_DSV: { filename: 'CH_DSV.pdf', language: 'de' },
CH_OR_AGB: { filename: 'CH_OR_AGB.pdf', language: 'de' },
CH_UWG: { filename: 'CH_UWG.pdf', language: 'de' },
CH_FMG: { filename: 'CH_FMG.pdf', language: 'de' },
CH_GEBUV: { filename: 'CH_GEBUV.pdf', language: 'de' },
CH_ZERTES: { filename: 'CH_ZERTES.pdf', language: 'de' },
CH_ZGB_PERS: { filename: 'CH_ZGB_PERS.pdf', language: 'de' },
// LI
LI_DSG: { filename: 'LI_DSG.pdf', language: 'de' },
// Nationale DSG (andere EU)
ES_LOPDGDD: { filename: 'ES_LOPDGDD.pdf', language: 'es' },
IT_CODICE_PRIVACY: { filename: 'IT_CODICE_PRIVACY.pdf', language: 'it' },
NL_UAVG: { filename: 'NL_UAVG.pdf', language: 'nl' },
FR_CNIL_GUIDE: { filename: 'FR_CNIL_GUIDE.pdf', language: 'fr' },
IE_DPA_2018: { filename: 'IE_DPA_2018.pdf', language: 'en' },
UK_DPA_2018: { filename: 'UK_DPA_2018.pdf', language: 'en' },
UK_GDPR: { filename: 'UK_GDPR.pdf', language: 'en' },
NO_PERSONOPPLYSNINGSLOVEN: { filename: 'NO_PERSONOPPLYSNINGSLOVEN.pdf', language: 'no' },
SE_DATASKYDDSLAG: { filename: 'SE_DATASKYDDSLAG.pdf', language: 'sv' },
PL_UODO: { filename: 'PL_UODO.pdf', language: 'pl' },
CZ_ZOU: { filename: 'CZ_ZOU.pdf', language: 'cs' },
HU_INFOTV: { filename: 'HU_INFOTV.pdf', language: 'hu' },
BE_DPA_LAW: { filename: 'BE_DPA_LAW.pdf', language: 'nl' },
FI_TIETOSUOJALAKI: { filename: 'FI_TIETOSUOJALAKI.pdf', language: 'fi' },
DK_DATABESKYTTELSESLOVEN: { filename: 'DK_DATABESKYTTELSESLOVEN.pdf', language: 'da' },
LU_DPA_LAW: { filename: 'LU_DPA_LAW.pdf', language: 'fr' },
// DE Gesetze (zusaetzlich)
TMG_KOMPLETT: { filename: 'TMG_KOMPLETT.pdf', language: 'de' },
DE_URHG: { filename: 'DE_URHG.pdf', language: 'de' },
// EDPB Guidelines
EDPB_GUIDELINES_5_2020: { filename: 'EDPB_GUIDELINES_5_2020.pdf', language: 'en' },
EDPB_GUIDELINES_7_2020: { filename: 'EDPB_GUIDELINES_7_2020.pdf', language: 'en' },
EDPB_GUIDELINES_1_2020: { filename: 'EDPB_GUIDELINES_1_2020.pdf', language: 'en' },
EDPB_GUIDELINES_1_2022: { filename: 'EDPB_GUIDELINES_1_2022.pdf', language: 'en' },
EDPB_GUIDELINES_2_2023: { filename: 'EDPB_GUIDELINES_2_2023.pdf', language: 'en' },
EDPB_GUIDELINES_2_2024: { filename: 'EDPB_GUIDELINES_2_2024.pdf', language: 'en' },
EDPB_GUIDELINES_4_2019: { filename: 'EDPB_GUIDELINES_4_2019.pdf', language: 'en' },
EDPB_GUIDELINES_9_2022: { filename: 'EDPB_GUIDELINES_9_2022.pdf', language: 'en' },
EDPB_DPIA_LIST: { filename: 'EDPB_DPIA_LIST.pdf', language: 'en' },
EDPB_LEGITIMATE_INTEREST: { filename: 'EDPB_LEGITIMATE_INTEREST.pdf', language: 'en' },
// EDPS
EDPS_DPIA_LIST: { filename: 'EDPS_DPIA_LIST.pdf', language: 'en' },
// Frameworks
ENISA_SECURE_BY_DESIGN: { filename: 'ENISA_SECURE_BY_DESIGN.pdf', language: 'en' },
ENISA_SUPPLY_CHAIN: { filename: 'ENISA_SUPPLY_CHAIN.pdf', language: 'en' },
ENISA_THREAT_LANDSCAPE: { filename: 'ENISA_THREAT_LANDSCAPE.pdf', language: 'en' },
ENISA_ICS_SCADA: { filename: 'ENISA_ICS_SCADA.pdf', language: 'en' },
ENISA_CYBERSECURITY_2024: { filename: 'ENISA_CYBERSECURITY_2024.pdf', language: 'en' },
NIST_SSDF: { filename: 'NIST_SSDF.pdf', language: 'en' },
NIST_CSF_2: { filename: 'NIST_CSF_2.pdf', language: 'en' },
OECD_AI_PRINCIPLES: { filename: 'OECD_AI_PRINCIPLES.pdf', language: 'en' },
// EU-IFRS / EFRAG
EU_IFRS_DE: { filename: 'EU_IFRS_DE.pdf', language: 'de' },
EU_IFRS_EN: { filename: 'EU_IFRS_EN.pdf', language: 'en' },
EFRAG_ENDORSEMENT: { filename: 'EFRAG_ENDORSEMENT.pdf', language: 'en' },
}

View File

@@ -11,8 +11,6 @@ import React, { useState, useEffect, useCallback } from 'react'
import Link from 'next/link'
import { PagePurpose } from '@/components/common/PagePurpose'
import { AIModuleSidebarResponsive } from '@/components/ai/AIModuleSidebar'
import { REGULATIONS_IN_RAG } from './rag-constants'
import { ChunkBrowserQA } from './components/ChunkBrowserQA'
// API uses local proxy route to klausur-service
const API_PROXY = '/api/legal-corpus'
@@ -75,7 +73,7 @@ interface DsfaCorpusStatus {
type RegulationCategory = 'regulations' | 'dsfa' | 'nibis' | 'templates'
// Tab definitions
type TabId = 'overview' | 'regulations' | 'map' | 'search' | 'chunks' | 'data' | 'ingestion' | 'pipeline'
type TabId = 'overview' | 'regulations' | 'map' | 'search' | 'data' | 'ingestion' | 'pipeline'
// Custom document type
interface CustomDocument {
@@ -1013,264 +1011,8 @@ const REGULATIONS = [
keyTopics: ['Bussgeldberechnung', 'Schweregrad', 'Milderungsgruende', 'Bussgeldrahmen'],
effectiveDate: '2022'
},
// =====================================================================
// Neu ingestierte EU-Richtlinien (Februar 2026)
// =====================================================================
{
code: 'E_COMMERCE_RL',
name: 'E-Commerce-Richtlinie',
fullName: 'Richtlinie 2000/31/EG ueber den elektronischen Geschaeftsverkehr',
type: 'eu_directive',
expected: 30,
description: 'EU-Richtlinie ueber den elektronischen Geschaeftsverkehr (E-Commerce). Regelt Herkunftslandprinzip, Informationspflichten, Haftungsprivilegien fuer Vermittler (Mere Conduit, Caching, Hosting).',
relevantFor: ['Online-Dienste', 'E-Commerce', 'Hosting-Anbieter', 'Plattformen'],
keyTopics: ['Herkunftslandprinzip', 'Haftungsprivileg', 'Informationspflichten', 'Spam-Verbot', 'Vermittlerhaftung'],
effectiveDate: '17. Juli 2000'
},
{
code: 'VERBRAUCHERRECHTE_RL',
name: 'Verbraucherrechte-Richtlinie',
fullName: 'Richtlinie 2011/83/EU ueber die Rechte der Verbraucher',
type: 'eu_directive',
expected: 25,
description: 'EU-weite Harmonisierung der Verbraucherrechte bei Fernabsatz und aussergeschaeftlichen Vertraegen. 14-Tage-Widerrufsrecht, Informationspflichten, digitale Inhalte.',
relevantFor: ['Online-Shops', 'E-Commerce', 'Fernabsatz', 'Dienstleister'],
keyTopics: ['Widerrufsrecht 14 Tage', 'Informationspflichten', 'Fernabsatzvertraege', 'Digitale Inhalte'],
effectiveDate: '13. Juni 2014'
},
{
code: 'DIGITALE_INHALTE_RL',
name: 'Digitale-Inhalte-Richtlinie',
fullName: 'Richtlinie (EU) 2019/770 ueber digitale Inhalte und Dienstleistungen',
type: 'eu_directive',
expected: 20,
description: 'Gewaehrleistungsrecht fuer digitale Inhalte und Dienstleistungen. Regelt Maengelhaftung, Updates, Vertragsmaessigkeit und Kuendigungsrechte bei digitalen Produkten.',
relevantFor: ['SaaS-Anbieter', 'App-Entwickler', 'Cloud-Dienste', 'Streaming-Anbieter', 'Software-Hersteller'],
keyTopics: ['Digitale Gewaehrleistung', 'Update-Pflicht', 'Vertragsmaessigkeit', 'Kuendigungsrecht', 'Datenportabilitaet'],
effectiveDate: '1. Januar 2022'
},
{
code: 'DMA',
name: 'Digital Markets Act',
fullName: 'Verordnung (EU) 2022/1925 - Digital Markets Act',
type: 'eu_regulation',
expected: 50,
description: 'Reguliert digitale Gatekeeper-Plattformen. Stellt Verhaltensregeln fuer grosse Plattformen auf (Apple, Google, Meta, Amazon, Microsoft). Verbietet Selbstbevorzugung und erzwingt Interoperabilitaet.',
relevantFor: ['Grosse Plattformen', 'App-Stores', 'Suchmaschinen', 'Social Media', 'Messenger-Dienste'],
keyTopics: ['Gatekeeper-Pflichten', 'Interoperabilitaet', 'Selbstbevorzugung', 'App-Store-Regeln', 'Datenportabilitaet'],
effectiveDate: '2. Mai 2023'
},
// === Industrie-Compliance (2026-02-28) ===
{
code: 'MACHINERY_REG',
name: 'Maschinenverordnung',
fullName: 'Verordnung (EU) 2023/1230 ueber Maschinen (Machinery Regulation)',
type: 'eu_regulation',
expected: 100,
description: 'Loest die alte Maschinenrichtlinie 2006/42/EG ab. Regelt Sicherheitsanforderungen fuer Maschinen und zugehoerige Produkte, CE-Kennzeichnung, Konformitaetsbewertung und Marktaufsicht. Neu: Cybersecurity-Anforderungen fuer vernetzte Maschinen.',
relevantFor: ['Maschinenbau', 'Industrie 4.0', 'Automatisierung', 'Hersteller', 'Importeure'],
keyTopics: ['CE-Kennzeichnung', 'Konformitaetsbewertung', 'Risikobeurteilung', 'Cybersecurity', 'Betriebsanleitung'],
effectiveDate: '20. Januar 2027'
},
{
code: 'BLUE_GUIDE',
name: 'Blue Guide',
fullName: 'Leitfaden fuer die Umsetzung der EU-Produktvorschriften (Blue Guide 2022)',
type: 'eu_guideline',
expected: 200,
description: 'Umfassender Leitfaden der EU-Kommission zur Umsetzung von Produktvorschriften. Erklaert CE-Kennzeichnung, Konformitaetsbewertungsverfahren, notifizierte Stellen, Marktaufsicht und den New Legislative Framework.',
relevantFor: ['Hersteller', 'Importeure', 'Haendler', 'Notifizierte Stellen', 'Marktaufsichtsbehoerden'],
keyTopics: ['CE-Kennzeichnung', 'Konformitaetserklaerung', 'Notifizierte Stellen', 'Marktaufsicht', 'New Legislative Framework'],
effectiveDate: '29. Juni 2022'
},
{
code: 'ENISA_SECURE_BY_DESIGN',
name: 'ENISA Secure by Design',
fullName: 'ENISA Secure Software Development Best Practices',
type: 'eu_guideline',
expected: 50,
description: 'ENISA-Leitfaden fuer sichere Softwareentwicklung. Beschreibt Best Practices fuer Security by Design, sichere Entwicklungsprozesse und Schwachstellenmanagement.',
relevantFor: ['Softwareentwickler', 'DevOps', 'IT-Sicherheit', 'Produktmanagement'],
keyTopics: ['Security by Design', 'SDLC', 'Schwachstellenmanagement', 'Secure Coding', 'Threat Modeling'],
effectiveDate: '2023'
},
{
code: 'ENISA_SUPPLY_CHAIN',
name: 'ENISA Supply Chain Security',
fullName: 'ENISA Threat Landscape for Supply Chain Attacks',
type: 'eu_guideline',
expected: 50,
description: 'ENISA-Analyse der Bedrohungslandschaft fuer Supply-Chain-Angriffe. Beschreibt Angriffsvektoren, Taxonomie und Empfehlungen zur Absicherung von Software-Lieferketten.',
relevantFor: ['IT-Sicherheit', 'Beschaffung', 'Softwareentwickler', 'CISO'],
keyTopics: ['Supply Chain Security', 'SolarWinds', 'SBOM', 'Lieferantenrisiko', 'Third-Party Risk'],
effectiveDate: '2021'
},
{
code: 'NIST_SSDF',
name: 'NIST SSDF',
fullName: 'NIST SP 800-218 — Secure Software Development Framework (SSDF)',
type: 'international_standard',
expected: 40,
description: 'NIST-Framework fuer sichere Softwareentwicklung. Definiert Praktiken und Aufgaben in vier Gruppen: Prepare, Protect, Produce, Respond. Weit verbreitet als Referenz fuer Software Supply Chain Security.',
relevantFor: ['Softwareentwickler', 'DevSecOps', 'IT-Sicherheit', 'Compliance-Manager'],
keyTopics: ['SSDF', 'Secure SDLC', 'Software Supply Chain', 'Vulnerability Management', 'Code Review'],
effectiveDate: '3. Februar 2022'
},
{
code: 'NIST_CSF_2',
name: 'NIST CSF 2.0',
fullName: 'NIST Cybersecurity Framework (CSF) 2.0',
type: 'international_standard',
expected: 50,
description: 'Version 2.0 des NIST Cybersecurity Framework. Neue Kernfunktion "Govern" ergaenzt Identify, Protect, Detect, Respond, Recover. Erweitert den Anwendungsbereich ueber kritische Infrastruktur hinaus auf alle Organisationen.',
relevantFor: ['CISO', 'IT-Sicherheit', 'Risikomanagement', 'Geschaeftsfuehrung', 'Alle Branchen'],
keyTopics: ['Govern', 'Identify', 'Protect', 'Detect', 'Respond', 'Recover', 'Cybersecurity'],
effectiveDate: '26. Februar 2024'
},
{
code: 'OECD_AI_PRINCIPLES',
name: 'OECD AI Principles',
fullName: 'OECD Recommendation on Artificial Intelligence (AI Principles)',
type: 'international_standard',
expected: 20,
description: 'OECD-Empfehlung zu Kuenstlicher Intelligenz. Definiert fuenf Prinzipien fuer verantwortungsvolle KI: Inklusives Wachstum, Menschenzentrierte Werte, Transparenz, Robustheit und Rechenschaftspflicht. Von 46 Laendern angenommen.',
relevantFor: ['KI-Entwickler', 'Policy-Maker', 'Ethik-Kommissionen', 'Geschaeftsfuehrung'],
keyTopics: ['AI Ethics', 'Transparenz', 'Accountability', 'Trustworthy AI', 'Human-Centered AI'],
effectiveDate: '22. Mai 2019'
},
{
code: 'EU_IFRS',
name: 'EU-IFRS',
fullName: 'Verordnung (EU) 2023/1803 — International Financial Reporting Standards',
type: 'eu_regulation',
expected: 500,
description: 'Konsolidierte Fassung der von der EU uebernommenen IFRS/IAS/IFRIC/SIC. Rechtsverbindlich fuer boersennotierte EU-Unternehmen. Enthalt IFRS 1-17, IAS 1-41, IFRIC 1-23 und SIC 7-32 in der EU-endorsed Fassung (Stand Okt 2023). ACHTUNG: Neuere IASB-Standards sind moeglicherweise noch nicht EU-endorsed.',
relevantFor: ['Rechnungswesen', 'Wirtschaftspruefer', 'boersennotierte Unternehmen', 'Finanzberichterstattung', 'CFO'],
keyTopics: ['IFRS 16 Leasing', 'IFRS 9 Finanzinstrumente', 'IAS 1 Darstellung', 'IFRS 15 Erloese', 'IFRS 17 Versicherungsvertraege', 'Konsolidierung'],
effectiveDate: '16. Oktober 2023'
},
{
code: 'EFRAG_ENDORSEMENT',
name: 'EFRAG Endorsement Status',
fullName: 'EFRAG EU Endorsement Status Report (Dezember 2025)',
type: 'eu_guideline',
expected: 30,
description: 'Uebersicht des European Financial Reporting Advisory Group (EFRAG) ueber den EU-Endorsement-Stand aller IFRS/IAS-Standards. Zeigt welche Standards von der EU uebernommen wurden und welche noch ausstehend sind. Relevant fuer internationale Ausschreibungen und Compliance-Pruefung.',
relevantFor: ['Rechnungswesen', 'Wirtschaftspruefer', 'Compliance Officer', 'internationale Ausschreibungen'],
keyTopics: ['EU Endorsement', 'IFRS 18', 'IFRS S1/S2 Sustainability', 'Endorsement Status', 'IASB Updates'],
effectiveDate: '18. Dezember 2025'
},
]
// Source URLs for original documents (click to view original)
const REGULATION_SOURCES: Record<string, string> = {
// EU Verordnungen/Richtlinien (EUR-Lex)
GDPR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32016R0679',
EPRIVACY: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32002L0058',
SCC: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32021D0914',
DPF: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023D1795',
AIACT: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024R1689',
CRA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024R2847',
NIS2: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022L2555',
EUCSA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019R0881',
DATAACT: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R2854',
DGA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R0868',
DSA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R2065',
EAA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019L0882',
DSM: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019L0790',
PLD: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024L2853',
GPSR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R0988',
DORA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R2554',
PSD2: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32015L2366',
AMLR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024R1624',
MiCA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1114',
EHDS: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32025R0327',
SCC_FULL_TEXT: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32021D0914',
E_COMMERCE_RL: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32000L0031',
VERBRAUCHERRECHTE_RL: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32011L0083',
DIGITALE_INHALTE_RL: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019L0770',
DMA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R1925',
MACHINERY_REG: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1230',
BLUE_GUIDE: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:52022XC0629(04)',
EU_IFRS: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1803',
// EDPB Guidelines
EDPB_GUIDELINES_2_2019: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-22019-processing-personal-data-under-article-61b_en',
EDPB_GUIDELINES_3_2019: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-32019-processing-personal-data-through-video_en',
EDPB_GUIDELINES_5_2020: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-052020-consent-under-regulation-2016679_en',
EDPB_GUIDELINES_7_2020: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-072020-concepts-controller-and-processor-gdpr_en',
EDPB_GUIDELINES_1_2022: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-042022-calculation-administrative-fines-under-gdpr_en',
// BSI Technische Richtlinien
'BSI-TR-03161-1': 'https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/TechnischeRichtlinien/TR03161/BSI-TR-03161-1.html',
'BSI-TR-03161-2': 'https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/TechnischeRichtlinien/TR03161/BSI-TR-03161-2.html',
'BSI-TR-03161-3': 'https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/TechnischeRichtlinien/TR03161/BSI-TR-03161-3.html',
// Nationale Datenschutzgesetze
AT_DSG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001597',
BDSG_FULL: 'https://www.gesetze-im-internet.de/bdsg_2018/',
CH_DSG: 'https://www.fedlex.admin.ch/eli/cc/2022/491/de',
LI_DSG: 'https://www.gesetze.li/konso/2018.272',
BE_DPA_LAW: 'https://www.autoriteprotectiondonnees.be/citoyen/la-loi-du-30-juillet-2018',
NL_UAVG: 'https://wetten.overheid.nl/BWBR0040940/',
FR_CNIL_GUIDE: 'https://www.cnil.fr/fr/rgpd-par-ou-commencer',
ES_LOPDGDD: 'https://www.boe.es/buscar/act.php?id=BOE-A-2018-16673',
IT_CODICE_PRIVACY: 'https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/9042678',
IE_DPA_2018: 'https://www.irishstatutebook.ie/eli/2018/act/7/enacted/en/html',
UK_DPA_2018: 'https://www.legislation.gov.uk/ukpga/2018/12/contents',
UK_GDPR: 'https://www.legislation.gov.uk/eur/2016/679/contents',
NO_PERSONOPPLYSNINGSLOVEN: 'https://lovdata.no/dokument/NL/lov/2018-06-15-38',
SE_DATASKYDDSLAG: 'https://www.riksdagen.se/sv/dokument-och-lagar/dokument/svensk-forfattningssamling/lag-2018218-med-kompletterande-bestammelser_sfs-2018-218/',
FI_TIETOSUOJALAKI: 'https://www.finlex.fi/fi/laki/ajantasa/2018/20181050',
PL_UODO: 'https://isap.sejm.gov.pl/isap.nsf/DocDetails.xsp?id=WDU20180001000',
CZ_ZOU: 'https://www.zakonyprolidi.cz/cs/2019-110',
HU_INFOTV: 'https://net.jogtar.hu/jogszabaly?docid=a1100112.tv',
LU_DPA_LAW: 'https://legilux.public.lu/eli/etat/leg/loi/2018/08/01/a686/jo',
DK_DATABESKYTTELSESLOVEN: 'https://www.retsinformation.dk/eli/lta/2018/502',
// Deutschland — Weitere Gesetze
TDDDG: 'https://www.gesetze-im-internet.de/tdddg/',
DE_DDG: 'https://www.gesetze-im-internet.de/ddg/',
DE_BGB_AGB: 'https://www.gesetze-im-internet.de/bgb/__305.html',
DE_EGBGB: 'https://www.gesetze-im-internet.de/bgbeg/art_246.html',
DE_UWG: 'https://www.gesetze-im-internet.de/uwg_2004/',
DE_HGB_RET: 'https://www.gesetze-im-internet.de/hgb/__257.html',
DE_AO_RET: 'https://www.gesetze-im-internet.de/ao_1977/__147.html',
DE_TKG: 'https://www.gesetze-im-internet.de/tkg_2021/',
DE_PANGV: 'https://www.gesetze-im-internet.de/pangv_2022/',
DE_DLINFOV: 'https://www.gesetze-im-internet.de/dlinfov/',
DE_BETRVG: 'https://www.gesetze-im-internet.de/betrvg/__87.html',
DE_GESCHGEHG: 'https://www.gesetze-im-internet.de/geschgehg/',
DE_BSIG: 'https://www.gesetze-im-internet.de/bsig_2009/',
DE_USTG_RET: 'https://www.gesetze-im-internet.de/ustg_1980/__14b.html',
// Oesterreich — Weitere Gesetze
AT_ECG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20001703',
AT_TKG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20007898',
AT_KSCHG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10002462',
AT_FAGG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20008783',
AT_UGB_RET: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001702',
AT_BAO_RET: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10003940',
AT_MEDIENG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10000719',
AT_ABGB_AGB: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001622',
AT_UWG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10002665',
// Schweiz
CH_DSV: 'https://www.fedlex.admin.ch/eli/cc/2022/568/de',
CH_OR_AGB: 'https://www.fedlex.admin.ch/eli/cc/27/317_321_377/de',
CH_UWG: 'https://www.fedlex.admin.ch/eli/cc/1988/223_223_223/de',
CH_FMG: 'https://www.fedlex.admin.ch/eli/cc/1997/2187_2187_2187/de',
CH_GEBUV: 'https://www.fedlex.admin.ch/eli/cc/2002/249/de',
CH_ZERTES: 'https://www.fedlex.admin.ch/eli/cc/2016/752/de',
CH_ZGB_PERS: 'https://www.fedlex.admin.ch/eli/cc/24/233_245_233/de',
// Industrie-Compliance
ENISA_SECURE_BY_DESIGN: 'https://www.enisa.europa.eu/publications/secure-development-best-practices',
ENISA_SUPPLY_CHAIN: 'https://www.enisa.europa.eu/publications/threat-landscape-for-supply-chain-attacks',
NIST_SSDF: 'https://csrc.nist.gov/pubs/sp/800/218/final',
NIST_CSF_2: 'https://www.nist.gov/cyberframework',
OECD_AI_PRINCIPLES: 'https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449',
// IFRS / EFRAG
EU_IFRS_DE: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1803',
EU_IFRS_EN: 'https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32023R1803',
EFRAG_ENDORSEMENT: 'https://www.efrag.org/activities/endorsement-status-report',
// Full-text Datenschutzgesetz AT
AT_DSG_FULL: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001597',
}
// License info for each regulation
const REGULATION_LICENSES: Record<string, { license: string; licenseNote: string }> = {
GDPR: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk der EU — frei verwendbar' },
@@ -1321,18 +1063,6 @@ const REGULATION_LICENSES: Record<string, { license: string; licenseNote: string
EDPB_GUIDELINES_3_2019: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
EDPB_GUIDELINES_5_2020: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
EDPB_GUIDELINES_7_2020: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
// Industrie-Compliance (2026-02-28)
MACHINERY_REG: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
BLUE_GUIDE: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Leitfaden — amtliches Werk der Kommission' },
ENISA_SECURE_BY_DESIGN: { license: 'CC-BY-4.0', licenseNote: 'ENISA Publication — CC BY 4.0' },
ENISA_SUPPLY_CHAIN: { license: 'CC-BY-4.0', licenseNote: 'ENISA Publication — CC BY 4.0' },
NIST_SSDF: { license: 'PUBLIC_DOMAIN', licenseNote: 'US Government Work — Public Domain' },
NIST_CSF_2: { license: 'PUBLIC_DOMAIN', licenseNote: 'US Government Work — Public Domain' },
OECD_AI_PRINCIPLES: { license: 'PUBLIC_DOMAIN', licenseNote: 'OECD Legal Instrument — Reuse Notice' },
// EU-IFRS / EFRAG (2026-02-28)
EU_IFRS_DE: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
EU_IFRS_EN: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
EFRAG_ENDORSEMENT: { license: 'PUBLIC_DOMAIN', licenseNote: 'EFRAG — oeffentliches Dokument' },
// DACH National Laws — Deutschland
DE_DDG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_BGB_AGB: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
@@ -1369,35 +1099,6 @@ const REGULATION_LICENSES: Record<string, { license: string; licenseNote: string
LU_DPA_LAW: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Luxemburg — frei verwendbar' },
DK_DATABESKYTTELSESLOVEN: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Daenemark — frei verwendbar' },
EDPB_GUIDELINES_1_2022: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
// Neue EU-Richtlinien (Februar 2026 ingestiert)
E_COMMERCE_RL: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
VERBRAUCHERRECHTE_RL: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
DIGITALE_INHALTE_RL: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
DMA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
}
// REGULATIONS_IN_RAG is imported from ./rag-constants.ts
// Helper: Check if regulation is in RAG
const isInRag = (code: string): boolean => code in REGULATIONS_IN_RAG
// Helper: Get known chunk count for a regulation
const getKnownChunks = (code: string): number => REGULATIONS_IN_RAG[code]?.chunks || 0
// Known collection totals (updated: 2026-02-28)
// Note: bp_compliance_ce and bp_compliance_datenschutz will increase after
// industry compliance ingestion (Machinery Reg, Blue Guide, ENISA, NIST, OECD).
// Update chunk counts after running ingest-industry-compliance.sh.
const COLLECTION_TOTALS = {
bp_compliance_gesetze: 58304,
bp_compliance_ce: 18183,
bp_legal_templates: 7689,
bp_compliance_datenschutz: 2448,
bp_dsfa_corpus: 7867,
bp_compliance_recht: 1425,
bp_nibis_eh: 7996,
total_legal: 76487, // gesetze + ce
total_all: 103912,
}
// License display labels
@@ -1743,8 +1444,6 @@ export default function RAGPage() {
const [autoRefresh, setAutoRefresh] = useState(true)
const [elapsedTime, setElapsedTime] = useState<string>('')
// Chunk browser state is now in ChunkBrowserQA component
// DSFA corpus state
const [dsfaSources, setDsfaSources] = useState<DsfaSource[]>([])
const [dsfaStatus, setDsfaStatus] = useState<DsfaCorpusStatus | null>(null)
@@ -1990,8 +1689,6 @@ export default function RAGPage() {
return () => clearInterval(interval)
}, [pipelineState?.started_at, pipelineState?.status])
// Chunk browser functions are now in ChunkBrowserQA component
const handleSearch = async () => {
if (!searchQuery.trim()) return
@@ -2077,7 +1774,6 @@ export default function RAGPage() {
{ id: 'regulations' as TabId, name: 'Regulierungen', icon: '📜' },
{ id: 'map' as TabId, name: 'Landkarte', icon: '🗺️' },
{ id: 'search' as TabId, name: 'Suche', icon: '🔍' },
{ id: 'chunks' as TabId, name: 'Chunk-Browser', icon: '🧩' },
{ id: 'data' as TabId, name: 'Daten', icon: '📁' },
{ id: 'ingestion' as TabId, name: 'Ingestion', icon: '⚙️' },
{ id: 'pipeline' as TabId, name: 'Pipeline', icon: '🔄' },
@@ -2108,7 +1804,7 @@ export default function RAGPage() {
{/* Page Purpose */}
<PagePurpose
title="Daten & RAG"
purpose={`Verwalten und durchsuchen Sie 7 RAG-Collections mit ${REGULATIONS.length} Regulierungen (${Object.keys(REGULATIONS_IN_RAG).length} im RAG). Legal Corpus, DSFA Corpus (70+ Quellen), NiBiS EH (Bildungsinhalte) und Legal Templates. Teil der KI-Daten-Pipeline fuer Compliance und Klausur-Korrektur.`}
purpose="Verwalten und durchsuchen Sie 4 RAG-Collections: Legal Corpus (24 Regulierungen), DSFA Corpus (70+ Quellen inkl. internationaler Datenschutzgesetze), NiBiS EH (Bildungsinhalte) und Legal Templates (Dokumentvorlagen). Teil der KI-Daten-Pipeline fuer Compliance und Klausur-Korrektur."
audience={['DSB', 'Compliance Officer', 'Entwickler']}
gdprArticles={['§5 UrhG (Amtliche Werke)', 'Art. 5 DSGVO (Rechenschaftspflicht)']}
architecture={{
@@ -2130,8 +1826,8 @@ export default function RAGPage() {
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-6">
<div className="bg-white rounded-xl p-4 border border-slate-200">
<p className="text-xs font-medium text-blue-600 uppercase mb-1">Legal Corpus</p>
<p className="text-2xl font-bold text-slate-900">{COLLECTION_TOTALS.total_legal.toLocaleString()}</p>
<p className="text-xs text-slate-500">Chunks &middot; {Object.keys(REGULATIONS_IN_RAG).length}/{REGULATIONS.length} im RAG</p>
<p className="text-2xl font-bold text-slate-900">{loading ? '-' : getTotalChunks().toLocaleString()}</p>
<p className="text-xs text-slate-500">Chunks &middot; {REGULATIONS.length} Regulierungen</p>
</div>
<div className="bg-white rounded-xl p-4 border border-slate-200">
<p className="text-xs font-medium text-purple-600 uppercase mb-1">DSFA Corpus</p>
@@ -2140,12 +1836,12 @@ export default function RAGPage() {
</div>
<div className="bg-white rounded-xl p-4 border border-slate-200">
<p className="text-xs font-medium text-emerald-600 uppercase mb-1">NiBiS EH</p>
<p className="text-2xl font-bold text-slate-900">7.996</p>
<p className="text-2xl font-bold text-slate-900">28.662</p>
<p className="text-xs text-slate-500">Chunks &middot; Bildungs-Erwartungshorizonte</p>
</div>
<div className="bg-white rounded-xl p-4 border border-slate-200">
<p className="text-xs font-medium text-orange-600 uppercase mb-1">Legal Templates</p>
<p className="text-2xl font-bold text-slate-900">7.689</p>
<p className="text-2xl font-bold text-slate-900">824</p>
<p className="text-xs text-slate-500">Chunks &middot; Dokumentvorlagen</p>
</div>
</div>
@@ -2180,8 +1876,8 @@ export default function RAGPage() {
className="p-4 rounded-lg border border-blue-200 bg-blue-50 hover:bg-blue-100 transition-colors text-left"
>
<p className="text-xs font-medium text-blue-600 uppercase">Gesetze & Regulierungen</p>
<p className="text-2xl font-bold text-slate-900 mt-1">{COLLECTION_TOTALS.total_legal.toLocaleString()}</p>
<p className="text-xs text-slate-500 mt-1">{Object.keys(REGULATIONS_IN_RAG).length}/{REGULATIONS.length} im RAG</p>
<p className="text-2xl font-bold text-slate-900 mt-1">{loading ? '-' : getTotalChunks().toLocaleString()}</p>
<p className="text-xs text-slate-500 mt-1">{REGULATIONS.length} Regulierungen (EU, DE, BSI)</p>
</button>
<button
onClick={() => { setRegulationCategory('dsfa'); setActiveTab('regulations') }}
@@ -2193,12 +1889,12 @@ export default function RAGPage() {
</button>
<div className="p-4 rounded-lg border border-emerald-200 bg-emerald-50 text-left">
<p className="text-xs font-medium text-emerald-600 uppercase">NiBiS EH</p>
<p className="text-2xl font-bold text-slate-900 mt-1">7.996</p>
<p className="text-2xl font-bold text-slate-900 mt-1">28.662</p>
<p className="text-xs text-slate-500 mt-1">Chunks &middot; Bildungs-Erwartungshorizonte</p>
</div>
<div className="p-4 rounded-lg border border-orange-200 bg-orange-50 text-left">
<p className="text-xs font-medium text-orange-600 uppercase">Legal Templates</p>
<p className="text-2xl font-bold text-slate-900 mt-1">7.689</p>
<p className="text-2xl font-bold text-slate-900 mt-1">824</p>
<p className="text-xs text-slate-500 mt-1">Chunks &middot; Dokumentvorlagen (VVT, TOM, DSFA)</p>
</div>
</div>
@@ -2208,13 +1904,12 @@ export default function RAGPage() {
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
{Object.entries(TYPE_LABELS).map(([type, label]) => {
const regs = REGULATIONS.filter((r) => r.type === type)
const inRagCount = regs.filter((r) => isInRag(r.code)).length
const totalChunks = regs.reduce((sum, r) => sum + getKnownChunks(r.code), 0)
const totalChunks = regs.reduce((sum, r) => sum + getRegulationChunks(r.code), 0)
return (
<div key={type} className="bg-white rounded-xl p-4 border border-slate-200">
<div className="flex items-center gap-2 mb-2">
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[type]}`}>{label}</span>
<span className="text-slate-500 text-sm">{inRagCount}/{regs.length} im RAG</span>
<span className="text-slate-500 text-sm">{regs.length} Dok.</span>
</div>
<p className="text-xl font-bold text-slate-900">{totalChunks.toLocaleString()} Chunks</p>
</div>
@@ -2228,25 +1923,20 @@ export default function RAGPage() {
<h3 className="font-semibold text-slate-900">Top Regulierungen (nach Chunks)</h3>
</div>
<div className="divide-y">
{[...REGULATIONS].sort((a, b) => getKnownChunks(b.code) - getKnownChunks(a.code))
.slice(0, 10)
{REGULATIONS.sort((a, b) => getRegulationChunks(b.code) - getRegulationChunks(a.code))
.slice(0, 5)
.map((reg) => {
const chunks = getKnownChunks(reg.code)
const chunks = getRegulationChunks(reg.code)
return (
<div key={reg.code} className="px-4 py-3 flex items-center justify-between">
<div className="flex items-center gap-3">
{isInRag(reg.code) ? (
<span className="text-green-500 text-sm"></span>
) : (
<span className="text-red-400 text-sm"></span>
)}
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[reg.type]}`}>
{TYPE_LABELS[reg.type]}
</span>
<span className="font-medium text-slate-900">{reg.name}</span>
<span className="text-slate-500 text-sm">({reg.code})</span>
</div>
<span className={`font-bold ${chunks > 0 ? 'text-teal-600' : 'text-slate-300'}`}>{chunks > 0 ? chunks.toLocaleString() + ' Chunks' : '—'}</span>
<span className="font-bold text-teal-600">{chunks.toLocaleString()} Chunks</span>
</div>
)
})}
@@ -2305,13 +1995,7 @@ export default function RAGPage() {
{regulationCategory === 'regulations' && (
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50 flex items-center justify-between">
<h3 className="font-semibold text-slate-900">
Alle {REGULATIONS.length} Regulierungen
<span className="ml-2 text-sm font-normal text-slate-500">
({REGULATIONS.filter(r => isInRag(r.code)).length} im RAG,{' '}
{REGULATIONS.filter(r => !isInRag(r.code)).length} ausstehend)
</span>
</h3>
<h3 className="font-semibold text-slate-900">Alle {REGULATIONS.length} Regulierungen</h3>
<button
onClick={fetchStatus}
className="text-sm text-teal-600 hover:text-teal-700"
@@ -2323,7 +2007,6 @@ export default function RAGPage() {
<table className="w-full">
<thead className="bg-slate-50 border-b">
<tr>
<th className="px-4 py-3 text-center text-xs font-medium text-slate-500 uppercase w-12">RAG</th>
<th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Code</th>
<th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Typ</th>
<th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Name</th>
@@ -2334,10 +2017,17 @@ export default function RAGPage() {
</thead>
<tbody className="divide-y">
{REGULATIONS.map((reg) => {
const chunks = getKnownChunks(reg.code)
const inRag = isInRag(reg.code)
let statusColor = inRag ? 'text-green-500' : 'text-red-500'
let statusIcon = inRag ? '✓' : '❌'
const chunks = getRegulationChunks(reg.code)
const ratio = chunks / (reg.expected * 10) // Rough estimate: 10 chunks per requirement
let statusColor = 'text-red-500'
let statusIcon = '❌'
if (ratio > 0.5) {
statusColor = 'text-green-500'
statusIcon = '✓'
} else if (ratio > 0.1) {
statusColor = 'text-yellow-500'
statusIcon = '⚠'
}
const isExpanded = expandedRegulation === reg.code
return (
@@ -2346,13 +2036,6 @@ export default function RAGPage() {
onClick={() => setExpandedRegulation(isExpanded ? null : reg.code)}
className="hover:bg-slate-50 cursor-pointer transition-colors"
>
<td className="px-4 py-3 text-center">
{isInRag(reg.code) ? (
<span className="inline-flex items-center justify-center w-6 h-6 bg-green-100 text-green-600 rounded-full text-xs font-bold" title="Im RAG vorhanden"></span>
) : (
<span className="inline-flex items-center justify-center w-6 h-6 bg-red-50 text-red-400 rounded-full text-xs font-bold" title="Nicht im RAG"></span>
)}
</td>
<td className="px-4 py-3 font-mono font-medium text-teal-600">
<span className="inline-flex items-center gap-2">
<span className={`transform transition-transform ${isExpanded ? 'rotate-90' : ''}`}></span>
@@ -2365,20 +2048,13 @@ export default function RAGPage() {
</span>
</td>
<td className="px-4 py-3 text-slate-900">{reg.name}</td>
<td className="px-4 py-3 text-right font-bold">
<span className={chunks > 0 && chunks < 10 && reg.expected >= 10 ? 'text-amber-600' : ''}>
{chunks.toLocaleString()}
{chunks > 0 && chunks < 10 && reg.expected >= 10 && (
<span className="ml-1 inline-block w-4 h-4 text-[10px] leading-4 text-center bg-amber-100 text-amber-700 rounded-full" title="Verdaechtig niedrig — Ingestion pruefen"></span>
)}
</span>
</td>
<td className="px-4 py-3 text-right font-bold">{chunks.toLocaleString()}</td>
<td className="px-4 py-3 text-right text-slate-500">{reg.expected}</td>
<td className={`px-4 py-3 text-center ${statusColor}`}>{statusIcon}</td>
</tr>
{isExpanded && (
<tr key={`${reg.code}-detail`} className="bg-slate-50">
<td colSpan={7} className="px-4 py-4">
<td colSpan={6} className="px-4 py-4">
<div className="bg-white rounded-lg border border-slate-200 p-4 space-y-3">
<div>
<h4 className="font-semibold text-slate-900 mb-1">{reg.fullName}</h4>
@@ -2418,28 +2094,16 @@ export default function RAGPage() {
</span>
)}
</div>
<div className="flex items-center gap-3">
{REGULATION_SOURCES[reg.code] && (
<a
href={REGULATION_SOURCES[reg.code]}
target="_blank"
rel="noopener noreferrer"
onClick={(e) => e.stopPropagation()}
className="text-blue-600 hover:text-blue-700 font-medium"
>
Originalquelle
</a>
)}
<button
onClick={(e) => {
e.stopPropagation()
setActiveTab('chunks')
}}
className="text-teal-600 hover:text-teal-700 font-medium"
>
In Chunks suchen
</button>
</div>
<button
onClick={(e) => {
e.stopPropagation()
setSearchQuery(reg.name)
setActiveTab('search')
}}
className="text-teal-600 hover:text-teal-700 font-medium"
>
In Chunks suchen
</button>
</div>
</div>
</td>
@@ -2568,7 +2232,7 @@ export default function RAGPage() {
<div className="grid grid-cols-3 gap-4 mb-4">
<div className="bg-emerald-50 rounded-lg p-4 border border-emerald-200">
<p className="text-sm text-emerald-600 font-medium">Chunks</p>
<p className="text-2xl font-bold text-slate-900">7.996</p>
<p className="text-2xl font-bold text-slate-900">28.662</p>
</div>
<div className="bg-emerald-50 rounded-lg p-4 border border-emerald-200">
<p className="text-sm text-emerald-600 font-medium">Vector Size</p>
@@ -2600,7 +2264,7 @@ export default function RAGPage() {
<div className="grid grid-cols-3 gap-4 mb-4">
<div className="bg-orange-50 rounded-lg p-4 border border-orange-200">
<p className="text-sm text-orange-600 font-medium">Chunks</p>
<p className="text-2xl font-bold text-slate-900">7.689</p>
<p className="text-2xl font-bold text-slate-900">824</p>
</div>
<div className="bg-orange-50 rounded-lg p-4 border border-orange-200">
<p className="text-sm text-orange-600 font-medium">Vector Size</p>
@@ -2668,28 +2332,20 @@ export default function RAGPage() {
</div>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-3">
{regs.map((reg) => {
const regInRag = isInRag(reg.code)
return (
{regs.map((reg) => (
<div
key={reg.code}
className={`bg-white p-3 rounded-lg border ${regInRag ? 'border-green-200' : 'border-slate-200'}`}
className="bg-white p-3 rounded-lg border border-slate-200"
>
<div className="flex items-center gap-2 mb-1">
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[reg.type]}`}>
{reg.code}
</span>
{regInRag ? (
<span className="px-1.5 py-0.5 text-[10px] font-bold bg-green-100 text-green-600 rounded">RAG</span>
) : (
<span className="px-1.5 py-0.5 text-[10px] font-bold bg-red-50 text-red-400 rounded"></span>
)}
</div>
<div className="font-medium text-sm text-slate-900">{reg.name}</div>
<div className="text-xs text-slate-500 mt-1 line-clamp-2">{reg.description}</div>
</div>
)
})}
))}
</div>
</>
)
@@ -2716,22 +2372,17 @@ export default function RAGPage() {
<div className="flex flex-wrap gap-2">
{group.regulations.map((code) => {
const reg = REGULATIONS.find(r => r.code === code)
const codeInRag = isInRag(code)
return (
<span
key={code}
className={`px-3 py-1.5 rounded-full text-sm font-medium cursor-pointer ${
codeInRag
? 'bg-green-100 text-green-700 hover:bg-green-200'
: 'bg-slate-100 text-slate-700 hover:bg-slate-200'
}`}
className="px-3 py-1.5 bg-slate-100 rounded-full text-sm font-medium text-slate-700 hover:bg-slate-200 cursor-pointer"
onClick={() => {
setActiveTab('regulations')
setExpandedRegulation(code)
}}
title={`${reg?.fullName || code}${codeInRag ? ' (im RAG)' : ' (nicht im RAG)'}`}
title={reg?.fullName || code}
>
{codeInRag ? '✓ ' : '✗ '}{code}
{code}
</span>
)
})}
@@ -2755,13 +2406,9 @@ export default function RAGPage() {
{intersection.regulations.map((code) => (
<span
key={code}
className={`px-2 py-0.5 text-xs font-medium rounded ${
isInRag(code)
? 'bg-green-100 text-green-700'
: 'bg-red-50 text-red-500'
}`}
className="px-2 py-0.5 text-xs font-medium bg-teal-100 text-teal-700 rounded"
>
{isInRag(code) ? '✓ ' : '✗ '}{code}
{code}
</span>
))}
</div>
@@ -2796,15 +2443,8 @@ export default function RAGPage() {
<tbody className="divide-y">
{REGULATIONS.map((reg) => (
<tr key={reg.code} className="hover:bg-slate-50">
<td className="px-2 py-2 font-medium sticky left-0 bg-white">
<span className="flex items-center gap-1">
{isInRag(reg.code) ? (
<span className="text-green-500 text-[10px]"></span>
) : (
<span className="text-red-300 text-[10px]"></span>
)}
<span className="text-teal-600">{reg.code}</span>
</span>
<td className="px-2 py-2 font-medium text-teal-600 sticky left-0 bg-white">
{reg.code}
</td>
{INDUSTRIES.filter(i => i.id !== 'all').map((industry) => {
const applies = INDUSTRY_REGULATION_MAP[industry.id]?.includes(reg.code)
@@ -2891,32 +2531,26 @@ export default function RAGPage() {
</div>
</div>
{/* RAG Coverage Overview */}
{/* Integrated Regulations */}
<div className="bg-white rounded-xl border border-slate-200 p-6">
<div className="flex items-center gap-3 mb-4">
<span className="text-2xl"></span>
<div>
<h3 className="font-semibold text-slate-900">RAG-Abdeckung ({Object.keys(REGULATIONS_IN_RAG).length} von {REGULATIONS.length} Regulierungen)</h3>
<p className="text-sm text-slate-500">Stand: Februar 2026 Alle im RAG-System verfuegbaren Regulierungen</p>
<h3 className="font-semibold text-slate-900">Neu integrierte Regulierungen</h3>
<p className="text-sm text-slate-500">Jetzt im RAG-System verfuegbar (Stand: Januar 2025)</p>
</div>
</div>
<div className="flex flex-wrap gap-2">
{REGULATIONS.filter(r => isInRag(r.code)).map((reg) => (
<span key={reg.code} className="px-2.5 py-1 text-xs font-medium bg-green-100 text-green-700 rounded-full border border-green-200">
{reg.code}
</span>
))}
</div>
<div className="mt-4 pt-4 border-t border-slate-100">
<p className="text-xs font-medium text-slate-500 mb-2">Noch nicht im RAG:</p>
<div className="flex flex-wrap gap-2">
{REGULATIONS.filter(r => !isInRag(r.code)).map((reg) => (
<span key={reg.code} className="px-2.5 py-1 text-xs font-medium bg-red-50 text-red-400 rounded-full border border-red-100">
{reg.code}
<div className="grid grid-cols-2 md:grid-cols-5 gap-3">
{INTEGRATED_REGULATIONS.map((reg) => (
<div key={reg.code} className="rounded-lg border border-green-200 bg-green-50 p-3 text-center">
<span className="px-2 py-1 text-sm font-bold bg-green-100 text-green-700 rounded">
{reg.code}
</span>
))}
</div>
<p className="text-xs text-slate-600 mt-2">{reg.name}</p>
<p className="text-xs text-green-600 mt-1">Im RAG</p>
</div>
))}
</div>
</div>
@@ -3080,10 +2714,6 @@ export default function RAGPage() {
</div>
)}
{activeTab === 'chunks' && (
<ChunkBrowserQA apiProxy={API_PROXY} />
)}
{activeTab === 'data' && (
<div className="space-y-6">
{/* Upload Document */}
@@ -3269,7 +2899,7 @@ export default function RAGPage() {
<span className="flex items-center gap-2 text-teal-600">
<svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.7.689 3 7.938l3-2.647z" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
Ingestion laeuft...
</span>
@@ -3339,7 +2969,7 @@ export default function RAGPage() {
{pipelineStarting ? (
<svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.7.689 3 7.938l3-2.647z" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
) : (
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
@@ -3358,7 +2988,7 @@ export default function RAGPage() {
{pipelineLoading ? (
<svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.7.689 3 7.938l3-2.647z" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
) : (
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
@@ -3391,7 +3021,7 @@ export default function RAGPage() {
<>
<svg className="animate-spin h-5 w-5" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.7.689 3 7.938l3-2.647z" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
Startet...
</>
@@ -3428,7 +3058,7 @@ export default function RAGPage() {
{pipelineState.status === 'running' && (
<svg className="w-6 h-6 text-blue-600 animate-spin" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.7.689 3 7.938l3-2.647z" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
)}
{pipelineState.status === 'failed' && (

View File

@@ -1,241 +0,0 @@
/**
* Shared RAG constants used by both page.tsx and ChunkBrowserQA.
* REGULATIONS_IN_RAG maps regulation codes to their Qdrant collection, chunk count, and qdrant_id.
* The qdrant_id is the actual `regulation_id` value stored in Qdrant payloads.
* REGULATION_INFO provides minimal metadata (code, name, type) for all regulations.
*/
export interface RagRegulationEntry {
collection: string
chunks: number
qdrant_id: string // The actual regulation_id value in Qdrant payload
}
export const REGULATIONS_IN_RAG: Record<string, RagRegulationEntry> = {
// === EU Verordnungen/Richtlinien (bp_compliance_ce) ===
GDPR: { collection: 'bp_compliance_ce', chunks: 423, qdrant_id: 'eu_2016_679' },
EPRIVACY: { collection: 'bp_compliance_ce', chunks: 134, qdrant_id: 'eu_2002_58' },
SCC: { collection: 'bp_compliance_ce', chunks: 330, qdrant_id: 'eu_2021_914' },
SCC_FULL_TEXT: { collection: 'bp_compliance_ce', chunks: 330, qdrant_id: 'eu_2021_914' },
AIACT: { collection: 'bp_compliance_ce', chunks: 726, qdrant_id: 'eu_2024_1689' },
CRA: { collection: 'bp_compliance_ce', chunks: 429, qdrant_id: 'eu_2024_2847' },
NIS2: { collection: 'bp_compliance_ce', chunks: 342, qdrant_id: 'eu_2022_2555' },
DGA: { collection: 'bp_compliance_ce', chunks: 508, qdrant_id: 'eu_2022_868' },
DSA: { collection: 'bp_compliance_ce', chunks: 1106, qdrant_id: 'eu_2022_2065' },
PLD: { collection: 'bp_compliance_ce', chunks: 44, qdrant_id: 'eu_1985_374' },
E_COMMERCE_RL: { collection: 'bp_compliance_ce', chunks: 197, qdrant_id: 'eu_2000_31' },
VERBRAUCHERRECHTE_RL: { collection: 'bp_compliance_ce', chunks: 266, qdrant_id: 'eu_2011_83' },
DIGITALE_INHALTE_RL: { collection: 'bp_compliance_ce', chunks: 321, qdrant_id: 'eu_2019_770' },
DMA: { collection: 'bp_compliance_ce', chunks: 701, qdrant_id: 'eu_2022_1925' },
DPF: { collection: 'bp_compliance_ce', chunks: 2464, qdrant_id: 'dpf' },
EUCSA: { collection: 'bp_compliance_ce', chunks: 558, qdrant_id: 'eucsa' },
DATAACT: { collection: 'bp_compliance_ce', chunks: 809, qdrant_id: 'dataact' },
DORA: { collection: 'bp_compliance_ce', chunks: 823, qdrant_id: 'dora' },
PSD2: { collection: 'bp_compliance_ce', chunks: 796, qdrant_id: 'psd2' },
AMLR: { collection: 'bp_compliance_ce', chunks: 1182, qdrant_id: 'amlr' },
MiCA: { collection: 'bp_compliance_ce', chunks: 1640, qdrant_id: 'mica' },
EHDS: { collection: 'bp_compliance_ce', chunks: 1212, qdrant_id: 'ehds' },
EAA: { collection: 'bp_compliance_ce', chunks: 433, qdrant_id: 'eaa' },
DSM: { collection: 'bp_compliance_ce', chunks: 416, qdrant_id: 'dsm' },
GPSR: { collection: 'bp_compliance_ce', chunks: 509, qdrant_id: 'gpsr' },
MACHINERY_REG: { collection: 'bp_compliance_ce', chunks: 1271, qdrant_id: 'eu_2023_1230' },
BLUE_GUIDE: { collection: 'bp_compliance_ce', chunks: 2271, qdrant_id: 'eu_blue_guide_2022' },
EU_IFRS_DE: { collection: 'bp_compliance_ce', chunks: 34388, qdrant_id: 'eu_2023_1803' },
EU_IFRS_EN: { collection: 'bp_compliance_ce', chunks: 34388, qdrant_id: 'eu_2023_1803' },
// International standards in bp_compliance_ce
NIST_SSDF: { collection: 'bp_compliance_ce', chunks: 111, qdrant_id: 'nist_sp_800_218' },
NIST_CSF_2: { collection: 'bp_compliance_ce', chunks: 67, qdrant_id: 'nist_csf_2_0' },
OECD_AI_PRINCIPLES: { collection: 'bp_compliance_ce', chunks: 34, qdrant_id: 'oecd_ai_principles' },
ENISA_SECURE_BY_DESIGN: { collection: 'bp_compliance_ce', chunks: 97, qdrant_id: 'cisa_secure_by_design' },
ENISA_SUPPLY_CHAIN: { collection: 'bp_compliance_ce', chunks: 110, qdrant_id: 'enisa_supply_chain_good_practices' },
ENISA_THREAT_LANDSCAPE: { collection: 'bp_compliance_ce', chunks: 118, qdrant_id: 'enisa_threat_landscape_supply_chain' },
ENISA_ICS_SCADA: { collection: 'bp_compliance_ce', chunks: 195, qdrant_id: 'enisa_ics_scada_dependencies' },
ENISA_CYBERSECURITY_2024: { collection: 'bp_compliance_ce', chunks: 22, qdrant_id: 'enisa_cybersecurity_state_2024' },
// === DE Gesetze (bp_compliance_gesetze) ===
TDDDG: { collection: 'bp_compliance_gesetze', chunks: 5, qdrant_id: 'tdddg_25' },
TMG_KOMPLETT: { collection: 'bp_compliance_gesetze', chunks: 108, qdrant_id: 'tmg_komplett' },
BDSG_FULL: { collection: 'bp_compliance_gesetze', chunks: 1056, qdrant_id: 'bdsg_2018_komplett' },
DE_DDG: { collection: 'bp_compliance_gesetze', chunks: 40, qdrant_id: 'ddg_5' },
DE_BGB_AGB: { collection: 'bp_compliance_gesetze', chunks: 4024, qdrant_id: 'bgb_komplett' },
DE_EGBGB: { collection: 'bp_compliance_gesetze', chunks: 36, qdrant_id: 'egbgb_widerruf' },
DE_HGB_RET: { collection: 'bp_compliance_gesetze', chunks: 11363, qdrant_id: 'hgb_komplett' },
DE_AO_RET: { collection: 'bp_compliance_gesetze', chunks: 9669, qdrant_id: 'ao_komplett' },
DE_TKG: { collection: 'bp_compliance_gesetze', chunks: 1631, qdrant_id: 'de_tkg' },
DE_DLINFOV: { collection: 'bp_compliance_gesetze', chunks: 21, qdrant_id: 'de_dlinfov' },
DE_BETRVG: { collection: 'bp_compliance_gesetze', chunks: 498, qdrant_id: 'de_betrvg' },
DE_GESCHGEHG: { collection: 'bp_compliance_gesetze', chunks: 63, qdrant_id: 'de_geschgehg' },
DE_USTG_RET: { collection: 'bp_compliance_gesetze', chunks: 1071, qdrant_id: 'de_ustg_ret' },
DE_URHG: { collection: 'bp_compliance_gesetze', chunks: 626, qdrant_id: 'urhg_komplett' },
// === BSI Standards (bp_compliance_gesetze) ===
'BSI-TR-03161-1': { collection: 'bp_compliance_gesetze', chunks: 138, qdrant_id: 'bsi_tr_03161_1' },
'BSI-TR-03161-2': { collection: 'bp_compliance_gesetze', chunks: 124, qdrant_id: 'bsi_tr_03161_2' },
'BSI-TR-03161-3': { collection: 'bp_compliance_gesetze', chunks: 121, qdrant_id: 'bsi_tr_03161_3' },
// === AT Gesetze (bp_compliance_gesetze) ===
AT_DSG: { collection: 'bp_compliance_gesetze', chunks: 805, qdrant_id: 'at_dsg' },
AT_DSG_FULL: { collection: 'bp_compliance_gesetze', chunks: 6, qdrant_id: 'at_dsg_full' },
AT_ECG: { collection: 'bp_compliance_gesetze', chunks: 120, qdrant_id: 'at_ecg' },
AT_TKG: { collection: 'bp_compliance_gesetze', chunks: 4348, qdrant_id: 'at_tkg' },
AT_KSCHG: { collection: 'bp_compliance_gesetze', chunks: 402, qdrant_id: 'at_kschg' },
AT_FAGG: { collection: 'bp_compliance_gesetze', chunks: 2, qdrant_id: 'at_fagg' },
AT_UGB_RET: { collection: 'bp_compliance_gesetze', chunks: 2828, qdrant_id: 'at_ugb_ret' },
AT_BAO_RET: { collection: 'bp_compliance_gesetze', chunks: 2246, qdrant_id: 'at_bao_ret' },
AT_MEDIENG: { collection: 'bp_compliance_gesetze', chunks: 571, qdrant_id: 'at_medieng' },
AT_ABGB_AGB: { collection: 'bp_compliance_gesetze', chunks: 2521, qdrant_id: 'at_abgb_agb' },
AT_UWG: { collection: 'bp_compliance_gesetze', chunks: 403, qdrant_id: 'at_uwg' },
// === CH Gesetze (bp_compliance_gesetze) ===
CH_DSG: { collection: 'bp_compliance_gesetze', chunks: 180, qdrant_id: 'ch_revdsg' },
CH_DSV: { collection: 'bp_compliance_gesetze', chunks: 5, qdrant_id: 'ch_dsv' },
CH_OR_AGB: { collection: 'bp_compliance_gesetze', chunks: 5, qdrant_id: 'ch_or_agb' },
CH_GEBUV: { collection: 'bp_compliance_gesetze', chunks: 5, qdrant_id: 'ch_gebuv' },
CH_ZERTES: { collection: 'bp_compliance_gesetze', chunks: 5, qdrant_id: 'ch_zertes' },
CH_ZGB_PERS: { collection: 'bp_compliance_gesetze', chunks: 5, qdrant_id: 'ch_zgb_pers' },
// === Nationale Gesetze (andere EU) in bp_compliance_gesetze ===
ES_LOPDGDD: { collection: 'bp_compliance_gesetze', chunks: 782, qdrant_id: 'es_lopdgdd' },
IT_CODICE_PRIVACY: { collection: 'bp_compliance_gesetze', chunks: 59, qdrant_id: 'it_codice_privacy' },
NL_UAVG: { collection: 'bp_compliance_gesetze', chunks: 523, qdrant_id: 'nl_uavg' },
FR_CNIL_GUIDE: { collection: 'bp_compliance_gesetze', chunks: 562, qdrant_id: 'fr_loi_informatique' },
IE_DPA_2018: { collection: 'bp_compliance_gesetze', chunks: 64, qdrant_id: 'ie_dpa_2018' },
UK_DPA_2018: { collection: 'bp_compliance_gesetze', chunks: 156, qdrant_id: 'uk_dpa_2018' },
UK_GDPR: { collection: 'bp_compliance_gesetze', chunks: 45, qdrant_id: 'uk_gdpr' },
NO_PERSONOPPLYSNINGSLOVEN: { collection: 'bp_compliance_gesetze', chunks: 41, qdrant_id: 'no_pol' },
SE_DATASKYDDSLAG: { collection: 'bp_compliance_gesetze', chunks: 56, qdrant_id: 'se_dataskyddslag' },
PL_UODO: { collection: 'bp_compliance_gesetze', chunks: 39, qdrant_id: 'pl_ustawa' },
CZ_ZOU: { collection: 'bp_compliance_gesetze', chunks: 238, qdrant_id: 'cz_zakon' },
HU_INFOTV: { collection: 'bp_compliance_gesetze', chunks: 747, qdrant_id: 'hu_info_tv' },
LU_DPA_LAW: { collection: 'bp_compliance_gesetze', chunks: 2, qdrant_id: 'lu_dpa_law' },
// === EDPB Guidelines (bp_compliance_datenschutz) ===
EDPB_GUIDELINES_5_2020: { collection: 'bp_compliance_datenschutz', chunks: 236, qdrant_id: 'edpb_05_2020' },
EDPB_GUIDELINES_7_2020: { collection: 'bp_compliance_datenschutz', chunks: 347, qdrant_id: 'edpb_guidelines_7_2020' },
EDPB_GUIDELINES_1_2020: { collection: 'bp_compliance_datenschutz', chunks: 337, qdrant_id: 'edpb_01_2020' },
EDPB_GUIDELINES_1_2022: { collection: 'bp_compliance_datenschutz', chunks: 510, qdrant_id: 'edpb_01_2022' },
EDPB_GUIDELINES_2_2023: { collection: 'bp_compliance_datenschutz', chunks: 94, qdrant_id: 'edpb_02_2023' },
EDPB_GUIDELINES_2_2024: { collection: 'bp_compliance_datenschutz', chunks: 79, qdrant_id: 'edpb_02_2024' },
EDPB_GUIDELINES_4_2019: { collection: 'bp_compliance_datenschutz', chunks: 202, qdrant_id: 'edpb_04_2019' },
EDPB_GUIDELINES_9_2022: { collection: 'bp_compliance_datenschutz', chunks: 243, qdrant_id: 'edpb_09_2022' },
EDPB_DPIA_LIST: { collection: 'bp_compliance_datenschutz', chunks: 29, qdrant_id: 'edpb_dpia_list' },
EDPB_LEGITIMATE_INTEREST: { collection: 'bp_compliance_datenschutz', chunks: 336, qdrant_id: 'edpb_legitimate_interest' },
EDPS_DPIA_LIST: { collection: 'bp_compliance_datenschutz', chunks: 35, qdrant_id: 'edps_dpia_list' },
}
/**
* Minimal regulation info for sidebar display.
* Full REGULATIONS array with descriptions remains in page.tsx.
*/
export interface RegulationInfo {
code: string
name: string
type: string
}
export const REGULATION_INFO: RegulationInfo[] = [
// EU Verordnungen
{ code: 'GDPR', name: 'DSGVO', type: 'eu_regulation' },
{ code: 'EPRIVACY', name: 'ePrivacy-Richtlinie', type: 'eu_directive' },
{ code: 'SCC', name: 'Standardvertragsklauseln', type: 'eu_regulation' },
{ code: 'SCC_FULL_TEXT', name: 'SCC Volltext', type: 'eu_regulation' },
{ code: 'DPF', name: 'EU-US Data Privacy Framework', type: 'eu_regulation' },
{ code: 'AIACT', name: 'EU AI Act', type: 'eu_regulation' },
{ code: 'CRA', name: 'Cyber Resilience Act', type: 'eu_regulation' },
{ code: 'NIS2', name: 'NIS2-Richtlinie', type: 'eu_directive' },
{ code: 'EUCSA', name: 'EU Cybersecurity Act', type: 'eu_regulation' },
{ code: 'DATAACT', name: 'Data Act', type: 'eu_regulation' },
{ code: 'DGA', name: 'Data Governance Act', type: 'eu_regulation' },
{ code: 'DSA', name: 'Digital Services Act', type: 'eu_regulation' },
{ code: 'DMA', name: 'Digital Markets Act', type: 'eu_regulation' },
{ code: 'EAA', name: 'European Accessibility Act', type: 'eu_directive' },
{ code: 'DSM', name: 'DSM-Urheberrechtsrichtlinie', type: 'eu_directive' },
{ code: 'PLD', name: 'Produkthaftungsrichtlinie', type: 'eu_directive' },
{ code: 'GPSR', name: 'General Product Safety', type: 'eu_regulation' },
{ code: 'E_COMMERCE_RL', name: 'E-Commerce-Richtlinie', type: 'eu_directive' },
{ code: 'VERBRAUCHERRECHTE_RL', name: 'Verbraucherrechte-RL', type: 'eu_directive' },
{ code: 'DIGITALE_INHALTE_RL', name: 'Digitale-Inhalte-RL', type: 'eu_directive' },
// Financial
{ code: 'DORA', name: 'DORA', type: 'eu_regulation' },
{ code: 'PSD2', name: 'PSD2', type: 'eu_directive' },
{ code: 'AMLR', name: 'AML-Verordnung', type: 'eu_regulation' },
{ code: 'MiCA', name: 'MiCA', type: 'eu_regulation' },
{ code: 'EHDS', name: 'EHDS', type: 'eu_regulation' },
{ code: 'MACHINERY_REG', name: 'Maschinenverordnung', type: 'eu_regulation' },
{ code: 'BLUE_GUIDE', name: 'Blue Guide', type: 'eu_regulation' },
{ code: 'EU_IFRS_DE', name: 'EU-IFRS (DE)', type: 'eu_regulation' },
{ code: 'EU_IFRS_EN', name: 'EU-IFRS (EN)', type: 'eu_regulation' },
// DE Gesetze
{ code: 'TDDDG', name: 'TDDDG', type: 'de_law' },
{ code: 'TMG_KOMPLETT', name: 'TMG', type: 'de_law' },
{ code: 'BDSG_FULL', name: 'BDSG', type: 'de_law' },
{ code: 'DE_DDG', name: 'DDG', type: 'de_law' },
{ code: 'DE_BGB_AGB', name: 'BGB/AGB', type: 'de_law' },
{ code: 'DE_EGBGB', name: 'EGBGB', type: 'de_law' },
{ code: 'DE_HGB_RET', name: 'HGB', type: 'de_law' },
{ code: 'DE_AO_RET', name: 'AO', type: 'de_law' },
{ code: 'DE_TKG', name: 'TKG', type: 'de_law' },
{ code: 'DE_DLINFOV', name: 'DL-InfoV', type: 'de_law' },
{ code: 'DE_BETRVG', name: 'BetrVG', type: 'de_law' },
{ code: 'DE_GESCHGEHG', name: 'GeschGehG', type: 'de_law' },
{ code: 'DE_USTG_RET', name: 'UStG', type: 'de_law' },
{ code: 'DE_URHG', name: 'UrhG', type: 'de_law' },
// BSI
{ code: 'BSI-TR-03161-1', name: 'BSI-TR Teil 1', type: 'bsi_standard' },
{ code: 'BSI-TR-03161-2', name: 'BSI-TR Teil 2', type: 'bsi_standard' },
{ code: 'BSI-TR-03161-3', name: 'BSI-TR Teil 3', type: 'bsi_standard' },
// AT
{ code: 'AT_DSG', name: 'DSG Oesterreich', type: 'at_law' },
{ code: 'AT_DSG_FULL', name: 'DSG Volltext', type: 'at_law' },
{ code: 'AT_ECG', name: 'ECG', type: 'at_law' },
{ code: 'AT_TKG', name: 'TKG AT', type: 'at_law' },
{ code: 'AT_KSCHG', name: 'KSchG', type: 'at_law' },
{ code: 'AT_FAGG', name: 'FAGG', type: 'at_law' },
{ code: 'AT_UGB_RET', name: 'UGB', type: 'at_law' },
{ code: 'AT_BAO_RET', name: 'BAO', type: 'at_law' },
{ code: 'AT_MEDIENG', name: 'MedienG', type: 'at_law' },
{ code: 'AT_ABGB_AGB', name: 'ABGB/AGB', type: 'at_law' },
{ code: 'AT_UWG', name: 'UWG AT', type: 'at_law' },
// CH
{ code: 'CH_DSG', name: 'DSG Schweiz', type: 'ch_law' },
{ code: 'CH_DSV', name: 'DSV', type: 'ch_law' },
{ code: 'CH_OR_AGB', name: 'OR/AGB', type: 'ch_law' },
{ code: 'CH_GEBUV', name: 'GeBuV', type: 'ch_law' },
{ code: 'CH_ZERTES', name: 'ZertES', type: 'ch_law' },
{ code: 'CH_ZGB_PERS', name: 'ZGB', type: 'ch_law' },
// Andere EU nationale
{ code: 'ES_LOPDGDD', name: 'LOPDGDD Spanien', type: 'national_law' },
{ code: 'IT_CODICE_PRIVACY', name: 'Codice Privacy Italien', type: 'national_law' },
{ code: 'NL_UAVG', name: 'UAVG Niederlande', type: 'national_law' },
{ code: 'FR_CNIL_GUIDE', name: 'CNIL Guide RGPD', type: 'national_law' },
{ code: 'IE_DPA_2018', name: 'DPA 2018 Ireland', type: 'national_law' },
{ code: 'UK_DPA_2018', name: 'DPA 2018 UK', type: 'national_law' },
{ code: 'UK_GDPR', name: 'UK GDPR', type: 'national_law' },
{ code: 'NO_PERSONOPPLYSNINGSLOVEN', name: 'Personopplysningsloven', type: 'national_law' },
{ code: 'SE_DATASKYDDSLAG', name: 'Dataskyddslag Schweden', type: 'national_law' },
{ code: 'PL_UODO', name: 'UODO Polen', type: 'national_law' },
{ code: 'CZ_ZOU', name: 'Zakon Tschechien', type: 'national_law' },
{ code: 'HU_INFOTV', name: 'Infotv. Ungarn', type: 'national_law' },
{ code: 'LU_DPA_LAW', name: 'Datenschutzgesetz Luxemburg', type: 'national_law' },
// EDPB
{ code: 'EDPB_GUIDELINES_5_2020', name: 'EDPB GL Einwilligung', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_7_2020', name: 'EDPB GL C/P Konzepte', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_1_2020', name: 'EDPB GL Fahrzeuge', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_1_2022', name: 'EDPB GL Bussgelder', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_2_2023', name: 'EDPB GL Art. 37 Scope', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_2_2024', name: 'EDPB GL 2024', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_4_2019', name: 'EDPB GL Art. 25 DPbD', type: 'eu_guideline' },
{ code: 'EDPB_GUIDELINES_9_2022', name: 'EDPB GL Datenschutzverletzung', type: 'eu_guideline' },
{ code: 'EDPB_DPIA_LIST', name: 'EDPB DPIA-Liste', type: 'eu_guideline' },
{ code: 'EDPB_LEGITIMATE_INTEREST', name: 'EDPB Berecht. Interesse', type: 'eu_guideline' },
{ code: 'EDPS_DPIA_LIST', name: 'EDPS DPIA-Liste', type: 'eu_guideline' },
// International Standards
{ code: 'NIST_SSDF', name: 'NIST SSDF', type: 'international_standard' },
{ code: 'NIST_CSF_2', name: 'NIST CSF 2.0', type: 'international_standard' },
{ code: 'OECD_AI_PRINCIPLES', name: 'OECD AI Principles', type: 'international_standard' },
{ code: 'ENISA_SECURE_BY_DESIGN', name: 'CISA Secure by Design', type: 'international_standard' },
{ code: 'ENISA_SUPPLY_CHAIN', name: 'ENISA Supply Chain', type: 'international_standard' },
{ code: 'ENISA_THREAT_LANDSCAPE', name: 'ENISA Threat Landscape', type: 'international_standard' },
{ code: 'ENISA_ICS_SCADA', name: 'ENISA ICS/SCADA', type: 'international_standard' },
{ code: 'ENISA_CYBERSECURITY_2024', name: 'ENISA Cybersecurity 2024', type: 'international_standard' },
]

View File

@@ -1,163 +0,0 @@
import { describe, it, expect, vi, beforeEach } from 'vitest'
/**
* Tests for Chunk-Browser logic:
* - Collection dropdown has all 10 collections
* - COLLECTION_TOTALS has expected keys
* - Text search highlighting logic
* - Pagination state management
*/
// Replicate the COMPLIANCE_COLLECTIONS from the dropdown
const COMPLIANCE_COLLECTIONS = [
'bp_compliance_gesetze',
'bp_compliance_ce',
'bp_compliance_datenschutz',
'bp_dsfa_corpus',
'bp_compliance_recht',
'bp_legal_templates',
'bp_compliance_gdpr',
'bp_compliance_schulrecht',
'bp_dsfa_templates',
'bp_dsfa_risks',
] as const
// Replicate COLLECTION_TOTALS from page.tsx
const COLLECTION_TOTALS: Record<string, number> = {
bp_compliance_gesetze: 58304,
bp_compliance_ce: 18183,
bp_legal_templates: 7689,
bp_compliance_datenschutz: 2448,
bp_dsfa_corpus: 7867,
bp_compliance_recht: 1425,
bp_nibis_eh: 7996,
total_legal: 76487,
total_all: 103912,
}
describe('Chunk-Browser Logic', () => {
describe('COMPLIANCE_COLLECTIONS', () => {
it('should have exactly 10 collections', () => {
expect(COMPLIANCE_COLLECTIONS).toHaveLength(10)
})
it('should include bp_compliance_ce for IFRS documents', () => {
expect(COMPLIANCE_COLLECTIONS).toContain('bp_compliance_ce')
})
it('should include bp_compliance_datenschutz for EFRAG/ENISA', () => {
expect(COMPLIANCE_COLLECTIONS).toContain('bp_compliance_datenschutz')
})
it('should include bp_compliance_gesetze as default', () => {
expect(COMPLIANCE_COLLECTIONS[0]).toBe('bp_compliance_gesetze')
})
it('should have all collection names starting with bp_', () => {
COMPLIANCE_COLLECTIONS.forEach((col) => {
expect(col).toMatch(/^bp_/)
})
})
})
describe('COLLECTION_TOTALS', () => {
it('should have bp_compliance_ce key', () => {
expect(COLLECTION_TOTALS).toHaveProperty('bp_compliance_ce')
})
it('should have bp_compliance_datenschutz key', () => {
expect(COLLECTION_TOTALS).toHaveProperty('bp_compliance_datenschutz')
})
it('should have positive counts for all collections', () => {
Object.values(COLLECTION_TOTALS).forEach((count) => {
expect(count).toBeGreaterThan(0)
})
})
it('total_all should be greater than total_legal', () => {
expect(COLLECTION_TOTALS.total_all).toBeGreaterThan(COLLECTION_TOTALS.total_legal)
})
})
describe('Text search filtering logic', () => {
const mockChunks = [
{ id: '1', text: 'DSGVO Artikel 1 Datenschutz', regulation_code: 'GDPR' },
{ id: '2', text: 'IFRS 16 Leasing Standard', regulation_code: 'EU_IFRS' },
{ id: '3', text: 'Datenschutz Grundverordnung', regulation_code: 'GDPR' },
{ id: '4', text: 'ENISA Supply Chain Security', regulation_code: 'ENISA' },
]
it('should filter chunks by text search (case insensitive)', () => {
const search = 'datenschutz'
const filtered = mockChunks.filter((c) =>
c.text.toLowerCase().includes(search.toLowerCase())
)
expect(filtered).toHaveLength(2)
})
it('should return all chunks when search is empty', () => {
const search = ''
const filtered = search
? mockChunks.filter((c) => c.text.toLowerCase().includes(search.toLowerCase()))
: mockChunks
expect(filtered).toHaveLength(4)
})
it('should return 0 chunks when no match', () => {
const search = 'blockchain'
const filtered = mockChunks.filter((c) =>
c.text.toLowerCase().includes(search.toLowerCase())
)
expect(filtered).toHaveLength(0)
})
it('should match IFRS chunks', () => {
const search = 'IFRS'
const filtered = mockChunks.filter((c) =>
c.text.toLowerCase().includes(search.toLowerCase())
)
expect(filtered).toHaveLength(1)
expect(filtered[0].regulation_code).toBe('EU_IFRS')
})
})
describe('Pagination state', () => {
it('should start at page 0', () => {
const currentPage = 0
expect(currentPage).toBe(0)
})
it('should increment page on next', () => {
let currentPage = 0
currentPage += 1
expect(currentPage).toBe(1)
})
it('should maintain offset history for back navigation', () => {
const history: (string | null)[] = []
history.push(null) // page 0 offset
history.push('uuid-20') // page 1 offset
history.push('uuid-40') // page 2 offset
// Go back to page 1
const prevOffset = history[history.length - 2]
expect(prevOffset).toBe('uuid-20')
})
it('should reset state on collection change', () => {
let chunkOffset: string | null = 'some-offset'
let chunkHistory: (string | null)[] = [null, 'uuid-1']
let chunkCurrentPage = 3
// Simulate collection change
chunkOffset = null
chunkHistory = []
chunkCurrentPage = 0
expect(chunkOffset).toBeNull()
expect(chunkHistory).toHaveLength(0)
expect(chunkCurrentPage).toBe(0)
})
})
})

View File

@@ -1,90 +0,0 @@
import { describe, it, expect } from 'vitest'
/**
* Tests for RAG page constants - REGULATIONS_IN_RAG, REGULATION_SOURCES, REGULATION_LICENSES
*
* These are defined inline in page.tsx, so we test the data structures
* by importing a subset of the expected values.
*/
// Expected IFRS entries in REGULATIONS_IN_RAG
const EXPECTED_IFRS_ENTRIES = {
EU_IFRS_DE: { collection: 'bp_compliance_ce', chunks: 0 },
EU_IFRS_EN: { collection: 'bp_compliance_ce', chunks: 0 },
EFRAG_ENDORSEMENT: { collection: 'bp_compliance_datenschutz', chunks: 0 },
}
// Expected REGULATION_SOURCES URLs
const EXPECTED_SOURCES = {
GDPR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32016R0679',
EU_IFRS_DE: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1803',
EU_IFRS_EN: 'https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32023R1803',
EFRAG_ENDORSEMENT: 'https://www.efrag.org/activities/endorsement-status-report',
ENISA_SECURE_DEV: 'https://www.enisa.europa.eu/publications/secure-development-best-practices',
NIST_SSDF: 'https://csrc.nist.gov/pubs/sp/800/218/final',
NIST_CSF: 'https://www.nist.gov/cyberframework',
OECD_AI: 'https://oecd.ai/en/ai-principles',
}
describe('RAG Page Constants', () => {
describe('IFRS entries in REGULATIONS_IN_RAG', () => {
it('should have EU_IFRS_DE entry with bp_compliance_ce collection', () => {
expect(EXPECTED_IFRS_ENTRIES.EU_IFRS_DE.collection).toBe('bp_compliance_ce')
})
it('should have EU_IFRS_EN entry with bp_compliance_ce collection', () => {
expect(EXPECTED_IFRS_ENTRIES.EU_IFRS_EN.collection).toBe('bp_compliance_ce')
})
it('should have EFRAG_ENDORSEMENT entry with bp_compliance_datenschutz collection', () => {
expect(EXPECTED_IFRS_ENTRIES.EFRAG_ENDORSEMENT.collection).toBe('bp_compliance_datenschutz')
})
})
describe('REGULATION_SOURCES URLs', () => {
it('should have valid EUR-Lex URLs for EU regulations', () => {
expect(EXPECTED_SOURCES.GDPR).toMatch(/^https:\/\/eur-lex\.europa\.eu/)
expect(EXPECTED_SOURCES.EU_IFRS_DE).toMatch(/^https:\/\/eur-lex\.europa\.eu/)
expect(EXPECTED_SOURCES.EU_IFRS_EN).toMatch(/^https:\/\/eur-lex\.europa\.eu/)
})
it('should have correct CELEX for IFRS DE (32023R1803)', () => {
expect(EXPECTED_SOURCES.EU_IFRS_DE).toContain('32023R1803')
})
it('should have correct CELEX for IFRS EN (32023R1803)', () => {
expect(EXPECTED_SOURCES.EU_IFRS_EN).toContain('32023R1803')
})
it('should have DE language for IFRS DE', () => {
expect(EXPECTED_SOURCES.EU_IFRS_DE).toContain('/DE/')
})
it('should have EN language for IFRS EN', () => {
expect(EXPECTED_SOURCES.EU_IFRS_EN).toContain('/EN/')
})
it('should have EFRAG URL for endorsement status', () => {
expect(EXPECTED_SOURCES.EFRAG_ENDORSEMENT).toMatch(/^https:\/\/www\.efrag\.org/)
})
it('should have ENISA URL for secure development', () => {
expect(EXPECTED_SOURCES.ENISA_SECURE_DEV).toMatch(/^https:\/\/www\.enisa\.europa\.eu/)
})
it('should have NIST URLs for SSDF and CSF', () => {
expect(EXPECTED_SOURCES.NIST_SSDF).toMatch(/nist\.gov/)
expect(EXPECTED_SOURCES.NIST_CSF).toMatch(/nist\.gov/)
})
it('should have OECD URL for AI principles', () => {
expect(EXPECTED_SOURCES.OECD_AI).toMatch(/oecd\.ai/)
})
it('should all be valid HTTPS URLs', () => {
Object.values(EXPECTED_SOURCES).forEach((url) => {
expect(url).toMatch(/^https:\/\//)
})
})
})
})

View File

@@ -1,249 +0,0 @@
import { describe, it, expect, vi, beforeEach } from 'vitest'
// Mock fetch globally
const mockFetch = vi.fn()
global.fetch = mockFetch
// Mock NextRequest and NextResponse
vi.mock('next/server', () => ({
NextRequest: class MockNextRequest {
url: string
constructor(url: string) {
this.url = url
}
},
NextResponse: {
json: (data: unknown, init?: { status?: number }) => ({
data,
status: init?.status || 200,
}),
},
}))
describe('Legal Corpus API Proxy', () => {
beforeEach(() => {
mockFetch.mockClear()
})
describe('scroll action', () => {
it('should call Qdrant scroll endpoint with correct collection', async () => {
const mockScrollResponse = {
result: {
points: [
{ id: 'uuid-1', payload: { text: 'DSGVO Artikel 1', regulation_code: 'GDPR' } },
{ id: 'uuid-2', payload: { text: 'DSGVO Artikel 2', regulation_code: 'GDPR' } },
],
next_page_offset: 'uuid-3',
},
}
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve(mockScrollResponse),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_ce&limit=20' }
const response = await GET(request as any)
expect(mockFetch).toHaveBeenCalledTimes(1)
const calledUrl = mockFetch.mock.calls[0][0]
expect(calledUrl).toContain('/collections/bp_compliance_ce/points/scroll')
const body = JSON.parse(mockFetch.mock.calls[0][1].body)
expect(body.limit).toBe(20)
expect(body.with_payload).toBe(true)
expect(body.with_vector).toBe(false)
})
it('should pass offset parameter to Qdrant', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({ result: { points: [], next_page_offset: null } }),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_gesetze&offset=some-uuid' }
await GET(request as any)
const body = JSON.parse(mockFetch.mock.calls[0][1].body)
expect(body.offset).toBe('some-uuid')
})
it('should limit chunks to max 100', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({ result: { points: [], next_page_offset: null } }),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_ce&limit=500' }
await GET(request as any)
const body = JSON.parse(mockFetch.mock.calls[0][1].body)
expect(body.limit).toBe(100)
})
it('should apply text_search filter client-side', async () => {
const mockScrollResponse = {
result: {
points: [
{ id: 'uuid-1', payload: { text: 'DSGVO Artikel 1 Datenschutz' } },
{ id: 'uuid-2', payload: { text: 'IFRS Standard 16 Leasing' } },
{ id: 'uuid-3', payload: { text: 'Datenschutz Grundverordnung' } },
],
next_page_offset: null,
},
}
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve(mockScrollResponse),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_ce&text_search=Datenschutz' }
const response = await GET(request as any)
// Should filter to only chunks containing "Datenschutz"
expect((response as any).data.chunks).toHaveLength(2)
expect((response as any).data.chunks[0].text).toContain('Datenschutz')
})
it('should flatten payload into chunk objects', async () => {
const mockScrollResponse = {
result: {
points: [
{
id: 'uuid-1',
payload: {
text: 'IFRS 16 Leasing',
regulation_code: 'EU_IFRS',
language: 'de',
celex: '32023R1803',
},
},
],
next_page_offset: null,
},
}
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve(mockScrollResponse),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_ce' }
const response = await GET(request as any)
const chunk = (response as any).data.chunks[0]
expect(chunk.id).toBe('uuid-1')
expect(chunk.text).toBe('IFRS 16 Leasing')
expect(chunk.regulation_code).toBe('EU_IFRS')
expect(chunk.language).toBe('de')
})
it('should return next_offset from Qdrant response', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({
result: { points: [], next_page_offset: 'next-uuid' },
}),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_ce' }
const response = await GET(request as any)
expect((response as any).data.next_offset).toBe('next-uuid')
})
it('should handle Qdrant scroll failure', async () => {
mockFetch.mockResolvedValueOnce({
ok: false,
status: 404,
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=nonexistent' }
const response = await GET(request as any)
expect((response as any).status).toBe(404)
})
it('should apply filter when filter_key and filter_value provided', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({ result: { points: [], next_page_offset: null } }),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll&collection=bp_compliance_ce&filter_key=language&filter_value=de' }
await GET(request as any)
const body = JSON.parse(mockFetch.mock.calls[0][1].body)
expect(body.filter).toEqual({
must: [{ key: 'language', match: { value: 'de' } }],
})
})
it('should default collection to bp_compliance_gesetze', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({ result: { points: [], next_page_offset: null } }),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=scroll' }
await GET(request as any)
const calledUrl = mockFetch.mock.calls[0][0]
expect(calledUrl).toContain('/collections/bp_compliance_gesetze/')
})
})
describe('collection-count action', () => {
it('should return points_count from Qdrant collection info', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({
result: { points_count: 55053 },
}),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=collection-count&collection=bp_compliance_ce' }
const response = await GET(request as any)
expect((response as any).data.count).toBe(55053)
})
it('should return 0 when Qdrant is unavailable', async () => {
mockFetch.mockResolvedValueOnce({
ok: false,
status: 500,
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=collection-count&collection=bp_compliance_ce' }
const response = await GET(request as any)
expect((response as any).data.count).toBe(0)
})
it('should default to bp_compliance_gesetze collection', async () => {
mockFetch.mockResolvedValueOnce({
ok: true,
json: () => Promise.resolve({ result: { points_count: 1234 } }),
})
const { GET } = await import('../route')
const request = { url: 'http://localhost/api/legal-corpus?action=collection-count' }
await GET(request as any)
const calledUrl = mockFetch.mock.calls[0][0]
expect(calledUrl).toContain('/collections/bp_compliance_gesetze')
})
})
})

View File

@@ -66,99 +66,6 @@ export async function GET(request: NextRequest) {
url += `/traceability?chunk_id=${encodeURIComponent(chunkId || '')}&regulation=${encodeURIComponent(regulation || '')}`
break
}
case 'scroll': {
const collection = searchParams.get('collection') || 'bp_compliance_gesetze'
const limit = parseInt(searchParams.get('limit') || '20', 10)
const offsetParam = searchParams.get('offset')
const filterKey = searchParams.get('filter_key')
const filterValue = searchParams.get('filter_value')
const textSearch = searchParams.get('text_search')
const scrollBody: Record<string, unknown> = {
limit: Math.min(limit, 100),
with_payload: true,
with_vector: false,
}
if (offsetParam) {
scrollBody.offset = offsetParam
}
if (filterKey && filterValue) {
scrollBody.filter = {
must: [{ key: filterKey, match: { value: filterValue } }],
}
}
const scrollRes = await fetch(`${QDRANT_URL}/collections/${encodeURIComponent(collection)}/points/scroll`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(scrollBody),
cache: 'no-store',
})
if (!scrollRes.ok) {
return NextResponse.json({ error: 'Qdrant scroll failed' }, { status: scrollRes.status })
}
const scrollData = await scrollRes.json()
const points = (scrollData.result?.points || []).map((p: { id: string; payload?: Record<string, unknown> }) => ({
id: p.id,
...p.payload,
}))
// Client-side text search filter
let filtered = points
if (textSearch && textSearch.trim()) {
const term = textSearch.toLowerCase()
filtered = points.filter((p: Record<string, unknown>) => {
const text = String(p.text || p.content || p.chunk_text || '')
return text.toLowerCase().includes(term)
})
}
return NextResponse.json({
chunks: filtered,
next_offset: scrollData.result?.next_page_offset || null,
total_in_page: points.length,
})
}
case 'regulation-counts-batch': {
const col = searchParams.get('collection') || 'bp_compliance_gesetze'
// Accept qdrant_ids (actual regulation_id values in Qdrant payload)
const qdrantIds = (searchParams.get('qdrant_ids') || '').split(',').filter(Boolean)
const results: Record<string, number> = {}
for (let i = 0; i < qdrantIds.length; i += 10) {
const batch = qdrantIds.slice(i, i + 10)
await Promise.all(batch.map(async (qid) => {
try {
const res = await fetch(`${QDRANT_URL}/collections/${encodeURIComponent(col)}/points/count`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
filter: { must: [{ key: 'regulation_id', match: { value: qid } }] },
exact: true,
}),
cache: 'no-store',
})
if (res.ok) {
const data = await res.json()
results[qid] = data.result?.count || 0
}
} catch { /* skip failed counts */ }
}))
}
return NextResponse.json({ counts: results })
}
case 'collection-count': {
const col = searchParams.get('collection') || 'bp_compliance_gesetze'
const countRes = await fetch(`${QDRANT_URL}/collections/${encodeURIComponent(col)}`, {
cache: 'no-store',
})
if (!countRes.ok) {
return NextResponse.json({ count: 0 })
}
const countData = await countRes.json()
return NextResponse.json({
count: countData.result?.points_count || 0,
})
}
default:
return NextResponse.json({ error: 'Unknown action' }, { status: 400 })
}

View File

@@ -1,12 +1,8 @@
import type { Metadata } from 'next'
import localFont from 'next/font/local'
import { Inter } from 'next/font/google'
import './globals.css'
const inter = localFont({
src: '../public/fonts/Inter-VariableFont.woff2',
variable: '--font-inter',
display: 'swap',
})
const inter = Inter({ subsets: ['latin'] })
export const metadata: Metadata = {
title: 'BreakPilot Admin Lehrer KI',

View File

@@ -1,320 +0,0 @@
'use client'
import { useState, useMemo } from 'react'
import type { ColumnResult, ColumnGroundTruth, PageRegion } from '@/app/(admin)/ai/ocr-pipeline/types'
interface ColumnControlsProps {
columnResult: ColumnResult | null
onRerun: () => void
onManualMode: () => void
onGtMode: () => void
onGroundTruth: (gt: ColumnGroundTruth) => void
onNext: () => void
isDetecting: boolean
savedGtColumns: PageRegion[] | null
}
const TYPE_COLORS: Record<string, string> = {
column_en: 'bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400',
column_de: 'bg-green-100 text-green-700 dark:bg-green-900/30 dark:text-green-400',
column_example: 'bg-orange-100 text-orange-700 dark:bg-orange-900/30 dark:text-orange-400',
column_text: 'bg-cyan-100 text-cyan-700 dark:bg-cyan-900/30 dark:text-cyan-400',
page_ref: 'bg-purple-100 text-purple-700 dark:bg-purple-900/30 dark:text-purple-400',
column_marker: 'bg-red-100 text-red-700 dark:bg-red-900/30 dark:text-red-400',
column_ignore: 'bg-gray-100 text-gray-500 dark:bg-gray-700/30 dark:text-gray-500',
header: 'bg-gray-100 text-gray-600 dark:bg-gray-700/50 dark:text-gray-400',
footer: 'bg-gray-100 text-gray-600 dark:bg-gray-700/50 dark:text-gray-400',
}
const TYPE_LABELS: Record<string, string> = {
column_en: 'EN',
column_de: 'DE',
column_example: 'Beispiel',
column_text: 'Text',
page_ref: 'Seite',
column_marker: 'Marker',
column_ignore: 'Ignorieren',
header: 'Header',
footer: 'Footer',
}
const METHOD_LABELS: Record<string, string> = {
content: 'Inhalt',
position_enhanced: 'Position',
position_fallback: 'Fallback',
}
interface DiffRow {
index: number
autoCol: PageRegion | null
gtCol: PageRegion | null
diffX: number | null
diffW: number | null
typeMismatch: boolean
}
/** Match auto columns to GT columns by overlap on X-axis (IoU > 50%) */
function computeDiff(autoCols: PageRegion[], gtCols: PageRegion[]): DiffRow[] {
const rows: DiffRow[] = []
const usedGt = new Set<number>()
const usedAuto = new Set<number>()
// Match auto → GT by best X-axis overlap
for (let ai = 0; ai < autoCols.length; ai++) {
const a = autoCols[ai]
let bestIdx = -1
let bestIoU = 0
for (let gi = 0; gi < gtCols.length; gi++) {
if (usedGt.has(gi)) continue
const g = gtCols[gi]
const overlapStart = Math.max(a.x, g.x)
const overlapEnd = Math.min(a.x + a.width, g.x + g.width)
const overlap = Math.max(0, overlapEnd - overlapStart)
const union = (a.width + g.width) - overlap
const iou = union > 0 ? overlap / union : 0
if (iou > bestIoU) {
bestIoU = iou
bestIdx = gi
}
}
if (bestIdx >= 0 && bestIoU > 0.3) {
usedGt.add(bestIdx)
usedAuto.add(ai)
const g = gtCols[bestIdx]
rows.push({
index: rows.length + 1,
autoCol: a,
gtCol: g,
diffX: g.x - a.x,
diffW: g.width - a.width,
typeMismatch: a.type !== g.type,
})
}
}
// Unmatched auto columns
for (let ai = 0; ai < autoCols.length; ai++) {
if (usedAuto.has(ai)) continue
rows.push({
index: rows.length + 1,
autoCol: autoCols[ai],
gtCol: null,
diffX: null,
diffW: null,
typeMismatch: false,
})
}
// Unmatched GT columns
for (let gi = 0; gi < gtCols.length; gi++) {
if (usedGt.has(gi)) continue
rows.push({
index: rows.length + 1,
autoCol: null,
gtCol: gtCols[gi],
diffX: null,
diffW: null,
typeMismatch: false,
})
}
return rows
}
export function ColumnControls({ columnResult, onRerun, onManualMode, onGtMode, onGroundTruth, onNext, isDetecting, savedGtColumns }: ColumnControlsProps) {
const [gtSaved, setGtSaved] = useState(false)
const diffRows = useMemo(() => {
if (!columnResult || !savedGtColumns) return null
const autoCols = columnResult.columns.filter(c => c.type.startsWith('column') || c.type === 'page_ref')
const gtCols = savedGtColumns.filter(c => c.type.startsWith('column') || c.type === 'page_ref')
return computeDiff(autoCols, gtCols)
}, [columnResult, savedGtColumns])
if (!columnResult) return null
const columns = columnResult.columns.filter((c: PageRegion) => c.type.startsWith('column') || c.type === 'page_ref')
const headerFooter = columnResult.columns.filter((c: PageRegion) => !c.type.startsWith('column') && c.type !== 'page_ref')
const handleGt = (isCorrect: boolean) => {
onGroundTruth({ is_correct: isCorrect })
setGtSaved(true)
}
return (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-4">
{/* Summary */}
<div className="flex items-center gap-3 flex-wrap">
<div className="text-sm text-gray-600 dark:text-gray-400">
<span className="font-medium text-gray-800 dark:text-gray-200">{columns.length} Spalten</span> erkannt
{columnResult.duration_seconds > 0 && (
<span className="ml-2 text-xs">({columnResult.duration_seconds}s)</span>
)}
</div>
<button
onClick={onRerun}
disabled={isDetecting}
className="text-xs px-2 py-1 bg-gray-100 dark:bg-gray-700 rounded hover:bg-gray-200 dark:hover:bg-gray-600 transition-colors disabled:opacity-50"
>
Erneut erkennen
</button>
<button
onClick={onManualMode}
className="text-xs px-2 py-1 bg-teal-100 text-teal-700 dark:bg-teal-900/30 dark:text-teal-400 rounded hover:bg-teal-200 dark:hover:bg-teal-900/50 transition-colors"
>
Manuell markieren
</button>
<button
onClick={onGtMode}
className="text-xs px-2 py-1 bg-amber-100 text-amber-700 dark:bg-amber-900/30 dark:text-amber-400 rounded hover:bg-amber-200 dark:hover:bg-amber-900/50 transition-colors"
>
{savedGtColumns ? 'Ground Truth bearbeiten' : 'Ground Truth eintragen'}
</button>
</div>
{/* Column list */}
<div className="space-y-2">
{columns.map((col: PageRegion, i: number) => (
<div key={i} className="flex items-center gap-3 text-sm">
<span className={`px-2 py-0.5 rounded text-xs font-medium ${TYPE_COLORS[col.type] || ''}`}>
{TYPE_LABELS[col.type] || col.type}
</span>
{col.classification_confidence != null && col.classification_confidence < 1.0 && (
<span className="text-xs font-medium text-gray-600 dark:text-gray-300">
{Math.round(col.classification_confidence * 100)}%
</span>
)}
{col.classification_method && (
<span className="text-xs text-gray-400 dark:text-gray-500">
({METHOD_LABELS[col.classification_method] || col.classification_method})
</span>
)}
<span className="text-gray-500 dark:text-gray-400 text-xs font-mono">
x={col.x} y={col.y} {col.width}x{col.height}px
</span>
</div>
))}
{headerFooter.map((r: PageRegion, i: number) => (
<div key={`hf-${i}`} className="flex items-center gap-3 text-sm">
<span className={`px-2 py-0.5 rounded text-xs font-medium ${TYPE_COLORS[r.type] || ''}`}>
{TYPE_LABELS[r.type] || r.type}
</span>
<span className="text-gray-500 dark:text-gray-400 text-xs font-mono">
x={r.x} y={r.y} {r.width}x{r.height}px
</span>
</div>
))}
</div>
{/* Diff table (Auto vs GT) */}
{diffRows && diffRows.length > 0 && (
<div className="border-t border-gray-100 dark:border-gray-700 pt-3">
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-2">
Vergleich: Auto vs Ground Truth
</div>
<div className="overflow-x-auto">
<table className="w-full text-xs">
<thead>
<tr className="text-gray-500 dark:text-gray-400 border-b border-gray-100 dark:border-gray-700">
<th className="text-left py-1 pr-2">#</th>
<th className="text-left py-1 pr-2">Auto (Typ, x, w)</th>
<th className="text-left py-1 pr-2">GT (Typ, x, w)</th>
<th className="text-right py-1 pr-2">Diff X</th>
<th className="text-right py-1">Diff W</th>
</tr>
</thead>
<tbody>
{diffRows.map((row) => (
<tr
key={row.index}
className={
!row.autoCol || !row.gtCol || row.typeMismatch
? 'bg-red-50 dark:bg-red-900/10'
: (row.diffX !== null && Math.abs(row.diffX) > 20) || (row.diffW !== null && Math.abs(row.diffW) > 20)
? 'bg-amber-50 dark:bg-amber-900/10'
: ''
}
>
<td className="py-1 pr-2 font-mono text-gray-400">{row.index}</td>
<td className="py-1 pr-2 font-mono">
{row.autoCol ? (
<span>
<span className={`inline-block px-1 rounded ${TYPE_COLORS[row.autoCol.type] || ''}`}>
{TYPE_LABELS[row.autoCol.type] || row.autoCol.type}
</span>
{' '}{row.autoCol.x}, {row.autoCol.width}
</span>
) : (
<span className="text-red-400">fehlt</span>
)}
</td>
<td className="py-1 pr-2 font-mono">
{row.gtCol ? (
<span>
<span className={`inline-block px-1 rounded ${TYPE_COLORS[row.gtCol.type] || ''}`}>
{TYPE_LABELS[row.gtCol.type] || row.gtCol.type}
</span>
{' '}{row.gtCol.x}, {row.gtCol.width}
</span>
) : (
<span className="text-red-400">fehlt</span>
)}
</td>
<td className="py-1 pr-2 text-right font-mono">
{row.diffX !== null ? (
<span className={Math.abs(row.diffX) > 20 ? 'text-amber-600 dark:text-amber-400' : 'text-gray-500'}>
{row.diffX > 0 ? '+' : ''}{row.diffX}
</span>
) : '—'}
</td>
<td className="py-1 text-right font-mono">
{row.diffW !== null ? (
<span className={Math.abs(row.diffW) > 20 ? 'text-amber-600 dark:text-amber-400' : 'text-gray-500'}>
{row.diffW > 0 ? '+' : ''}{row.diffW}
</span>
) : '—'}
</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
)}
{/* Ground Truth + Navigation */}
<div className="flex items-center justify-between pt-2 border-t border-gray-100 dark:border-gray-700">
<div className="flex items-center gap-2">
<span className="text-sm text-gray-500 dark:text-gray-400">Spalten korrekt?</span>
{gtSaved ? (
<span className="text-xs text-green-600 dark:text-green-400">Gespeichert</span>
) : (
<>
<button
onClick={() => handleGt(true)}
className="text-xs px-3 py-1 bg-green-100 text-green-700 dark:bg-green-900/30 dark:text-green-400 rounded hover:bg-green-200 dark:hover:bg-green-900/50 transition-colors"
>
Ja
</button>
<button
onClick={() => handleGt(false)}
className="text-xs px-3 py-1 bg-red-100 text-red-700 dark:bg-red-900/30 dark:text-red-400 rounded hover:bg-red-200 dark:hover:bg-red-900/50 transition-colors"
>
Nein
</button>
</>
)}
</div>
<button
onClick={onNext}
className="px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm font-medium"
>
Weiter
</button>
</div>
</div>
)
}

View File

@@ -1,209 +0,0 @@
'use client'
import { useState } from 'react'
import type { DeskewResult, DeskewGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
interface DeskewControlsProps {
deskewResult: DeskewResult | null
showBinarized: boolean
onToggleBinarized: () => void
showGrid: boolean
onToggleGrid: () => void
onManualDeskew: (angle: number) => void
onGroundTruth: (gt: DeskewGroundTruth) => void
onNext: () => void
isApplying: boolean
}
const METHOD_LABELS: Record<string, string> = {
hough: 'Hough-Linien',
word_alignment: 'Wortausrichtung',
manual: 'Manuell',
}
export function DeskewControls({
deskewResult,
showBinarized,
onToggleBinarized,
showGrid,
onToggleGrid,
onManualDeskew,
onGroundTruth,
onNext,
isApplying,
}: DeskewControlsProps) {
const [manualAngle, setManualAngle] = useState(0)
const [gtFeedback, setGtFeedback] = useState<'correct' | 'incorrect' | null>(null)
const [gtNotes, setGtNotes] = useState('')
const [gtSaved, setGtSaved] = useState(false)
const handleGroundTruth = (isCorrect: boolean) => {
setGtFeedback(isCorrect ? 'correct' : 'incorrect')
if (isCorrect) {
onGroundTruth({ is_correct: true })
setGtSaved(true)
}
}
const handleGroundTruthIncorrect = () => {
onGroundTruth({
is_correct: false,
corrected_angle: manualAngle !== 0 ? manualAngle : undefined,
notes: gtNotes || undefined,
})
setGtSaved(true)
}
return (
<div className="space-y-4">
{/* Results */}
{deskewResult && (
<div className="bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 p-4">
<div className="flex flex-wrap items-center gap-3 text-sm">
<div>
<span className="text-gray-500">Winkel:</span>{' '}
<span className="font-mono font-medium">{deskewResult.angle_applied}°</span>
</div>
<div className="h-4 w-px bg-gray-300 dark:bg-gray-600" />
<div>
<span className="text-gray-500">Methode:</span>{' '}
<span className="inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium bg-teal-100 text-teal-700 dark:bg-teal-900/40 dark:text-teal-300">
{METHOD_LABELS[deskewResult.method_used] || deskewResult.method_used}
</span>
</div>
<div className="h-4 w-px bg-gray-300 dark:bg-gray-600" />
<div>
<span className="text-gray-500">Konfidenz:</span>{' '}
<span className="font-mono">{Math.round(deskewResult.confidence * 100)}%</span>
</div>
<div className="h-4 w-px bg-gray-300 dark:bg-gray-600" />
<div className="text-gray-400 text-xs">
Hough: {deskewResult.angle_hough}° | WA: {deskewResult.angle_word_alignment}°
</div>
</div>
{/* Toggles */}
<div className="flex gap-3 mt-3">
<button
onClick={onToggleBinarized}
className={`text-xs px-3 py-1 rounded-full border transition-colors ${
showBinarized
? 'bg-teal-100 border-teal-300 text-teal-700 dark:bg-teal-900/40 dark:border-teal-600 dark:text-teal-300'
: 'border-gray-300 text-gray-500 dark:border-gray-600 dark:text-gray-400'
}`}
>
Binarisiert anzeigen
</button>
<button
onClick={onToggleGrid}
className={`text-xs px-3 py-1 rounded-full border transition-colors ${
showGrid
? 'bg-teal-100 border-teal-300 text-teal-700 dark:bg-teal-900/40 dark:border-teal-600 dark:text-teal-300'
: 'border-gray-300 text-gray-500 dark:border-gray-600 dark:text-gray-400'
}`}
>
Raster anzeigen
</button>
</div>
</div>
)}
{/* Manual angle */}
{deskewResult && (
<div className="bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 p-4">
<div className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">Manuelle Korrektur</div>
<div className="flex items-center gap-3">
<span className="text-xs text-gray-400 w-8 text-right">-5°</span>
<input
type="range"
min={-5}
max={5}
step={0.1}
value={manualAngle}
onChange={(e) => setManualAngle(parseFloat(e.target.value))}
className="flex-1 h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer dark:bg-gray-700 accent-teal-500"
/>
<span className="text-xs text-gray-400 w-8">+5°</span>
<span className="font-mono text-sm w-14 text-right">{manualAngle.toFixed(1)}°</span>
<button
onClick={() => onManualDeskew(manualAngle)}
disabled={isApplying}
className="px-3 py-1.5 text-sm bg-teal-600 text-white rounded-md hover:bg-teal-700 disabled:opacity-50 transition-colors"
>
{isApplying ? '...' : 'Anwenden'}
</button>
</div>
</div>
)}
{/* Ground Truth */}
{deskewResult && (
<div className="bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 p-4">
<div className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
Rotation korrekt?
</div>
<p className="text-xs text-gray-400 mb-2">Nur die Drehung bewerten Woelbung/Verzerrung wird im naechsten Schritt korrigiert.</p>
{!gtSaved ? (
<div className="space-y-3">
<div className="flex gap-2">
<button
onClick={() => handleGroundTruth(true)}
className={`px-4 py-1.5 rounded-md text-sm font-medium transition-colors ${
gtFeedback === 'correct'
? 'bg-green-100 text-green-700 ring-2 ring-green-400'
: 'bg-gray-100 text-gray-600 hover:bg-green-50 dark:bg-gray-700 dark:text-gray-300'
}`}
>
Ja
</button>
<button
onClick={() => handleGroundTruth(false)}
className={`px-4 py-1.5 rounded-md text-sm font-medium transition-colors ${
gtFeedback === 'incorrect'
? 'bg-red-100 text-red-700 ring-2 ring-red-400'
: 'bg-gray-100 text-gray-600 hover:bg-red-50 dark:bg-gray-700 dark:text-gray-300'
}`}
>
Nein
</button>
</div>
{gtFeedback === 'incorrect' && (
<div className="space-y-2">
<textarea
value={gtNotes}
onChange={(e) => setGtNotes(e.target.value)}
placeholder="Notizen zur Korrektur..."
className="w-full text-sm border border-gray-300 dark:border-gray-600 rounded-md p-2 bg-white dark:bg-gray-900 text-gray-800 dark:text-gray-200"
rows={2}
/>
<button
onClick={handleGroundTruthIncorrect}
className="text-sm px-3 py-1 bg-red-600 text-white rounded-md hover:bg-red-700 transition-colors"
>
Feedback speichern
</button>
</div>
)}
</div>
) : (
<div className="text-sm text-green-600 dark:text-green-400">
Feedback gespeichert
</div>
)}
</div>
)}
{/* Next button */}
{deskewResult && (
<div className="flex justify-end">
<button
onClick={onNext}
className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium transition-colors"
>
Uebernehmen & Weiter &rarr;
</button>
</div>
)}
</div>
)
}

View File

@@ -1,309 +0,0 @@
'use client'
import { useEffect, useState } from 'react'
import type { DewarpResult, DewarpDetection, DewarpGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
interface DewarpControlsProps {
dewarpResult: DewarpResult | null
showGrid: boolean
onToggleGrid: () => void
onManualDewarp: (shearDegrees: number) => void
onGroundTruth: (gt: DewarpGroundTruth) => void
onNext: () => void
isApplying: boolean
}
const METHOD_LABELS: Record<string, string> = {
vertical_edge: 'A: Vertikale Kanten',
projection: 'B: Projektions-Varianz',
hough_lines: 'C: Hough-Linien',
text_lines: 'D: Textzeilenanalyse',
manual: 'Manuell',
none: 'Keine Korrektur',
}
/** Colour for a confidence value (0-1). */
function confColor(conf: number): string {
if (conf >= 0.7) return 'text-green-600 dark:text-green-400'
if (conf >= 0.5) return 'text-yellow-600 dark:text-yellow-400'
return 'text-gray-400'
}
/** Short confidence bar (visual). */
function ConfBar({ value }: { value: number }) {
const pct = Math.round(value * 100)
const bg = value >= 0.7 ? 'bg-green-500' : value >= 0.5 ? 'bg-yellow-500' : 'bg-gray-400'
return (
<div className="flex items-center gap-1.5">
<div className="w-16 h-1.5 bg-gray-200 dark:bg-gray-700 rounded-full overflow-hidden">
<div className={`h-full rounded-full ${bg}`} style={{ width: `${pct}%` }} />
</div>
<span className={`text-xs font-mono ${confColor(value)}`}>{pct}%</span>
</div>
)
}
export function DewarpControls({
dewarpResult,
showGrid,
onToggleGrid,
onManualDewarp,
onGroundTruth,
onNext,
isApplying,
}: DewarpControlsProps) {
const [manualShear, setManualShear] = useState(0)
const [gtFeedback, setGtFeedback] = useState<'correct' | 'incorrect' | null>(null)
const [gtNotes, setGtNotes] = useState('')
const [gtSaved, setGtSaved] = useState(false)
const [showDetails, setShowDetails] = useState(false)
// Initialize slider to auto-detected value when result arrives
useEffect(() => {
if (dewarpResult && dewarpResult.shear_degrees !== undefined) {
setManualShear(dewarpResult.shear_degrees)
}
}, [dewarpResult?.shear_degrees])
const handleGroundTruth = (isCorrect: boolean) => {
setGtFeedback(isCorrect ? 'correct' : 'incorrect')
if (isCorrect) {
onGroundTruth({ is_correct: true })
setGtSaved(true)
}
}
const handleGroundTruthIncorrect = () => {
onGroundTruth({
is_correct: false,
corrected_shear: manualShear !== 0 ? manualShear : undefined,
notes: gtNotes || undefined,
})
setGtSaved(true)
}
const wasRejected = dewarpResult && dewarpResult.method_used === 'none' && (dewarpResult.detections || []).length > 0
const wasApplied = dewarpResult && dewarpResult.method_used !== 'none' && dewarpResult.method_used !== 'manual'
const detections = dewarpResult?.detections || []
return (
<div className="space-y-4">
{/* Summary banner */}
{dewarpResult && (
<div className={`rounded-lg border p-4 ${
wasRejected
? 'bg-amber-50 border-amber-200 dark:bg-amber-900/20 dark:border-amber-700'
: wasApplied
? 'bg-green-50 border-green-200 dark:bg-green-900/20 dark:border-green-700'
: 'bg-white border-gray-200 dark:bg-gray-800 dark:border-gray-700'
}`}>
{/* Status line */}
<div className="flex items-center gap-2 mb-3">
<span className={`text-lg ${wasRejected ? '' : wasApplied ? '' : ''}`}>
{wasRejected ? '\u26A0\uFE0F' : wasApplied ? '\u2705' : '\u2796'}
</span>
<span className="text-sm font-medium text-gray-800 dark:text-gray-200">
{wasRejected
? 'Quality Gate: Korrektur verworfen (Projektion nicht verbessert)'
: wasApplied
? `Korrektur angewendet: ${dewarpResult.shear_degrees.toFixed(2)}°`
: dewarpResult.method_used === 'manual'
? `Manuelle Korrektur: ${dewarpResult.shear_degrees.toFixed(2)}°`
: 'Keine Korrektur noetig'}
</span>
</div>
{/* Key metrics */}
<div className="flex flex-wrap items-center gap-4 text-sm">
<div>
<span className="text-gray-500">Scherung:</span>{' '}
<span className="font-mono font-medium">{dewarpResult.shear_degrees.toFixed(2)}°</span>
</div>
<div className="h-4 w-px bg-gray-300 dark:bg-gray-600" />
<div>
<span className="text-gray-500">Methode:</span>{' '}
<span className="inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium bg-teal-100 text-teal-700 dark:bg-teal-900/40 dark:text-teal-300">
{dewarpResult.method_used.includes('+')
? `Ensemble (${dewarpResult.method_used.split('+').map(m => METHOD_LABELS[m] || m).join(' + ')})`
: METHOD_LABELS[dewarpResult.method_used] || dewarpResult.method_used}
</span>
</div>
<div className="h-4 w-px bg-gray-300 dark:bg-gray-600" />
<div className="flex items-center gap-1.5">
<span className="text-gray-500">Konfidenz:</span>
<ConfBar value={dewarpResult.confidence} />
</div>
</div>
{/* Toggles row */}
<div className="flex gap-2 mt-3">
<button
onClick={onToggleGrid}
className={`text-xs px-3 py-1 rounded-full border transition-colors ${
showGrid
? 'bg-teal-100 border-teal-300 text-teal-700 dark:bg-teal-900/40 dark:border-teal-600 dark:text-teal-300'
: 'border-gray-300 text-gray-500 dark:border-gray-600 dark:text-gray-400'
}`}
>
Raster
</button>
{detections.length > 0 && (
<button
onClick={() => setShowDetails(v => !v)}
className={`text-xs px-3 py-1 rounded-full border transition-colors ${
showDetails
? 'bg-blue-100 border-blue-300 text-blue-700 dark:bg-blue-900/40 dark:border-blue-600 dark:text-blue-300'
: 'border-gray-300 text-gray-500 dark:border-gray-600 dark:text-gray-400'
}`}
>
Details ({detections.length} Methoden)
</button>
)}
</div>
{/* Detailed detections */}
{showDetails && detections.length > 0 && (
<div className="mt-3 pt-3 border-t border-gray-200 dark:border-gray-700">
<div className="text-xs text-gray-500 mb-2">Einzelne Detektoren:</div>
<div className="space-y-1.5">
{detections.map((d: DewarpDetection) => {
const isUsed = dewarpResult.method_used.includes(d.method)
const aboveThreshold = d.confidence >= 0.5
return (
<div
key={d.method}
className={`flex items-center gap-3 text-xs px-2 py-1.5 rounded ${
isUsed
? 'bg-teal-50 dark:bg-teal-900/20'
: 'bg-gray-50 dark:bg-gray-800'
}`}
>
<span className="w-4 text-center">
{isUsed ? '\u2713' : aboveThreshold ? '\u2012' : '\u2717'}
</span>
<span className={`w-40 ${isUsed ? 'font-medium text-gray-800 dark:text-gray-200' : 'text-gray-500'}`}>
{METHOD_LABELS[d.method] || d.method}
</span>
<span className="font-mono w-16 text-right">
{d.shear_degrees.toFixed(2)}°
</span>
<ConfBar value={d.confidence} />
{!aboveThreshold && (
<span className="text-gray-400 ml-1">(unter Schwelle)</span>
)}
</div>
)
})}
</div>
{wasRejected && (
<div className="mt-2 text-xs text-amber-600 dark:text-amber-400">
Die Korrektur wurde verworfen, weil die horizontale Projektions-Varianz nach Anwendung nicht besser war als vorher.
</div>
)}
</div>
)}
</div>
)}
{/* Manual shear angle slider */}
{dewarpResult && (
<div className="bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 p-4">
<div className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">Scherwinkel (manuell)</div>
<div className="flex items-center gap-3">
<span className="text-xs text-gray-400 w-10 text-right">-2.0°</span>
<input
type="range"
min={-200}
max={200}
step={5}
value={Math.round(manualShear * 100)}
onChange={(e) => setManualShear(parseInt(e.target.value) / 100)}
className="flex-1 h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer dark:bg-gray-700 accent-teal-500"
/>
<span className="text-xs text-gray-400 w-10">+2.0°</span>
<span className="font-mono text-sm w-16 text-right">{manualShear.toFixed(2)}°</span>
<button
onClick={() => onManualDewarp(manualShear)}
disabled={isApplying}
className="px-3 py-1.5 text-sm bg-teal-600 text-white rounded-md hover:bg-teal-700 disabled:opacity-50 transition-colors"
>
{isApplying ? '...' : 'Anwenden'}
</button>
</div>
<p className="text-xs text-gray-400 mt-1">
Scherung der vertikalen Achse in Grad. Positiv = Spalten nach rechts kippen, negativ = nach links.
</p>
</div>
)}
{/* Ground Truth */}
{dewarpResult && (
<div className="bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 p-4">
<div className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
Spalten vertikal ausgerichtet?
</div>
<p className="text-xs text-gray-400 mb-2">Pruefen ob die Spaltenraender jetzt senkrecht zum Raster stehen.</p>
{!gtSaved ? (
<div className="space-y-3">
<div className="flex gap-2">
<button
onClick={() => handleGroundTruth(true)}
className={`px-4 py-1.5 rounded-md text-sm font-medium transition-colors ${
gtFeedback === 'correct'
? 'bg-green-100 text-green-700 ring-2 ring-green-400'
: 'bg-gray-100 text-gray-600 hover:bg-green-50 dark:bg-gray-700 dark:text-gray-300'
}`}
>
Ja
</button>
<button
onClick={() => handleGroundTruth(false)}
className={`px-4 py-1.5 rounded-md text-sm font-medium transition-colors ${
gtFeedback === 'incorrect'
? 'bg-red-100 text-red-700 ring-2 ring-red-400'
: 'bg-gray-100 text-gray-600 hover:bg-red-50 dark:bg-gray-700 dark:text-gray-300'
}`}
>
Nein
</button>
</div>
{gtFeedback === 'incorrect' && (
<div className="space-y-2">
<textarea
value={gtNotes}
onChange={(e) => setGtNotes(e.target.value)}
placeholder="Notizen zur Korrektur..."
className="w-full text-sm border border-gray-300 dark:border-gray-600 rounded-md p-2 bg-white dark:bg-gray-900 text-gray-800 dark:text-gray-200"
rows={2}
/>
<button
onClick={handleGroundTruthIncorrect}
className="text-sm px-3 py-1 bg-red-600 text-white rounded-md hover:bg-red-700 transition-colors"
>
Feedback speichern
</button>
</div>
)}
</div>
) : (
<div className="text-sm text-green-600 dark:text-green-400">
Feedback gespeichert
</div>
)}
</div>
)}
{/* Next button */}
{dewarpResult && (
<div className="flex justify-end">
<button
onClick={onNext}
className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium transition-colors"
>
Uebernehmen & Weiter &rarr;
</button>
</div>
)}
</div>
)
}

View File

@@ -1,403 +0,0 @@
'use client'
import { useCallback, useEffect, useRef, useState } from 'react'
import type { GridCell } from '@/app/(admin)/ai/ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
// Column type → colour mapping
const COL_TYPE_COLORS: Record<string, string> = {
column_en: '#3b82f6', // blue-500
column_de: '#22c55e', // green-500
column_example: '#f97316', // orange-500
column_text: '#a855f7', // purple-500
page_ref: '#06b6d4', // cyan-500
column_marker: '#6b7280', // gray-500
}
interface FabricReconstructionCanvasProps {
sessionId: string
cells: GridCell[]
onCellsChanged: (updates: { cell_id: string; text: string }[]) => void
}
// Fabric.js types (subset used here)
interface FabricCanvas {
add: (...objects: FabricObject[]) => FabricCanvas
remove: (...objects: FabricObject[]) => FabricCanvas
setBackgroundImage: (img: FabricImage, callback: () => void) => void
renderAll: () => void
getObjects: () => FabricObject[]
dispose: () => void
on: (event: string, handler: (e: FabricEvent) => void) => void
setWidth: (w: number) => void
setHeight: (h: number) => void
getActiveObject: () => FabricObject | null
discardActiveObject: () => FabricCanvas
requestRenderAll: () => void
setZoom: (z: number) => void
getZoom: () => number
}
interface FabricObject {
type?: string
left?: number
top?: number
width?: number
height?: number
text?: string
set: (props: Record<string, unknown>) => FabricObject
get: (prop: string) => unknown
data?: Record<string, unknown>
selectable?: boolean
on?: (event: string, handler: () => void) => void
setCoords?: () => void
}
interface FabricImage extends FabricObject {
width?: number
height?: number
scaleX?: number
scaleY?: number
}
interface FabricEvent {
target?: FabricObject
e?: MouseEvent
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
type FabricModule = any
export function FabricReconstructionCanvas({
sessionId,
cells,
onCellsChanged,
}: FabricReconstructionCanvasProps) {
const canvasElRef = useRef<HTMLCanvasElement>(null)
const fabricRef = useRef<FabricCanvas | null>(null)
const fabricModuleRef = useRef<FabricModule>(null)
const [ready, setReady] = useState(false)
const [opacity, setOpacity] = useState(30)
const [zoom, setZoom] = useState(100)
const [selectedCell, setSelectedCell] = useState<string | null>(null)
const [error, setError] = useState('')
// Undo/Redo
const undoStackRef = useRef<{ cellId: string; oldText: string; newText: string }[]>([])
const redoStackRef = useRef<{ cellId: string; oldText: string; newText: string }[]>([])
// ---- Initialise Fabric.js ----
useEffect(() => {
let disposed = false
async function init() {
try {
const fabricModule = await import('fabric')
if (disposed) return
fabricModuleRef.current = fabricModule
const canvasEl = canvasElRef.current
if (!canvasEl) return
// Load background image first to get dimensions
const imgUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/dewarped`
const bgImg = await fabricModule.FabricImage.fromURL(imgUrl, { crossOrigin: 'anonymous' }) as FabricImage
if (disposed) return
const imgW = (bgImg.width || 800) * (bgImg.scaleX || 1)
const imgH = (bgImg.height || 600) * (bgImg.scaleY || 1)
bgImg.set({ opacity: opacity / 100, selectable: false, evented: false } as Record<string, unknown>)
const canvas = new fabricModule.Canvas(canvasEl, {
width: imgW,
height: imgH,
selection: true,
preserveObjectStacking: true,
backgroundImage: bgImg,
}) as unknown as FabricCanvas
fabricRef.current = canvas
canvas.renderAll()
// Add cell objects
addCellObjects(canvas, fabricModule, cells, imgW, imgH)
// Listen for text changes
canvas.on('object:modified', (e: FabricEvent) => {
if (e.target?.data?.cellId) {
const cellId = e.target.data.cellId as string
const newText = (e.target.text || '') as string
onCellsChanged([{ cell_id: cellId, text: newText }])
}
})
// Selection tracking
canvas.on('selection:created', (e: FabricEvent) => {
if (e.target?.data?.cellId) setSelectedCell(e.target.data.cellId as string)
})
canvas.on('selection:updated', (e: FabricEvent) => {
if (e.target?.data?.cellId) setSelectedCell(e.target.data.cellId as string)
})
canvas.on('selection:cleared', () => setSelectedCell(null))
setReady(true)
} catch (err) {
if (!disposed) setError(err instanceof Error ? err.message : 'Fabric.js konnte nicht geladen werden')
}
}
init()
return () => {
disposed = true
fabricRef.current?.dispose()
fabricRef.current = null
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId])
function addCellObjects(
canvas: FabricCanvas,
fabricModule: FabricModule,
gridCells: GridCell[],
imgW: number,
imgH: number,
) {
for (const cell of gridCells) {
const color = COL_TYPE_COLORS[cell.col_type] || '#6b7280'
const x = (cell.bbox_pct.x / 100) * imgW
const y = (cell.bbox_pct.y / 100) * imgH
const w = (cell.bbox_pct.w / 100) * imgW
const h = (cell.bbox_pct.h / 100) * imgH
const fontSize = Math.max(8, Math.min(18, h * 0.55))
const textObj = new fabricModule.IText(cell.text || '', {
left: x,
top: y,
width: w,
fontSize,
fontFamily: 'monospace',
fill: '#000000',
backgroundColor: `${color}22`,
padding: 2,
editable: true,
selectable: true,
lockScalingFlip: true,
data: {
cellId: cell.cell_id,
colType: cell.col_type,
rowIndex: cell.row_index,
colIndex: cell.col_index,
originalText: cell.text,
},
})
// Border colour matches column type
textObj.set({
borderColor: color,
cornerColor: color,
cornerSize: 6,
transparentCorners: false,
} as Record<string, unknown>)
canvas.add(textObj)
}
canvas.renderAll()
}
// ---- Opacity slider ----
const handleOpacityChange = useCallback((val: number) => {
setOpacity(val)
const canvas = fabricRef.current
if (!canvas) return
// Fabric v6: backgroundImage is a direct property on the canvas
const bgImg = (canvas as unknown as { backgroundImage?: FabricObject }).backgroundImage
if (bgImg) {
bgImg.set({ opacity: val / 100 })
canvas.renderAll()
}
}, [])
// ---- Zoom ----
const handleZoomChange = useCallback((val: number) => {
setZoom(val)
const canvas = fabricRef.current
if (!canvas) return
;(canvas as unknown as { zoom: number }).zoom = val / 100
canvas.requestRenderAll()
}, [])
// ---- Undo / Redo via keyboard ----
useEffect(() => {
const handler = (e: KeyboardEvent) => {
if (!(e.metaKey || e.ctrlKey) || e.key !== 'z') return
e.preventDefault()
const canvas = fabricRef.current
if (!canvas) return
if (e.shiftKey) {
// Redo
const action = redoStackRef.current.pop()
if (!action) return
undoStackRef.current.push(action)
const obj = canvas.getObjects().find(
(o: FabricObject) => o.data?.cellId === action.cellId
)
if (obj) {
obj.set({ text: action.newText } as Record<string, unknown>)
canvas.renderAll()
onCellsChanged([{ cell_id: action.cellId, text: action.newText }])
}
} else {
// Undo
const action = undoStackRef.current.pop()
if (!action) return
redoStackRef.current.push(action)
const obj = canvas.getObjects().find(
(o: FabricObject) => o.data?.cellId === action.cellId
)
if (obj) {
obj.set({ text: action.oldText } as Record<string, unknown>)
canvas.renderAll()
onCellsChanged([{ cell_id: action.cellId, text: action.oldText }])
}
}
}
document.addEventListener('keydown', handler)
return () => document.removeEventListener('keydown', handler)
}, [onCellsChanged])
// ---- Delete selected cell (via context-menu or Delete key) ----
useEffect(() => {
const handler = (e: KeyboardEvent) => {
if (e.key !== 'Delete' && e.key !== 'Backspace') return
// Only delete if not currently editing text inside an IText
const canvas = fabricRef.current
if (!canvas) return
const active = canvas.getActiveObject()
if (!active) return
// If the IText is in editing mode, let the keypress pass through
if ((active as unknown as Record<string, boolean>).isEditing) return
e.preventDefault()
canvas.remove(active)
canvas.discardActiveObject()
canvas.renderAll()
}
document.addEventListener('keydown', handler)
return () => document.removeEventListener('keydown', handler)
}, [])
// ---- Export helpers ----
const handleExportPdf = useCallback(() => {
window.open(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reconstruction/export/pdf`,
'_blank'
)
}, [sessionId])
const handleExportDocx = useCallback(() => {
window.open(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reconstruction/export/docx`,
'_blank'
)
}, [sessionId])
if (error) {
return (
<div className="flex flex-col items-center justify-center py-8 text-red-500 text-sm">
<p>Fabric.js Editor konnte nicht geladen werden:</p>
<p className="text-xs mt-1 text-gray-400">{error}</p>
</div>
)
}
return (
<div className="space-y-2">
{/* Toolbar */}
<div className="flex items-center gap-3 bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 px-3 py-2 text-xs">
{/* Opacity slider */}
<label className="flex items-center gap-1.5 text-gray-500">
Hintergrund
<input
type="range"
min={0} max={100}
value={opacity}
onChange={e => handleOpacityChange(Number(e.target.value))}
className="w-20 h-1 accent-teal-500"
/>
<span className="w-8 text-right">{opacity}%</span>
</label>
<div className="w-px h-5 bg-gray-300 dark:bg-gray-600" />
{/* Zoom */}
<label className="flex items-center gap-1.5 text-gray-500">
Zoom
<button onClick={() => handleZoomChange(Math.max(25, zoom - 25))}
className="px-1.5 py-0.5 border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700">
&minus;
</button>
<span className="w-8 text-center">{zoom}%</span>
<button onClick={() => handleZoomChange(Math.min(200, zoom + 25))}
className="px-1.5 py-0.5 border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700">
+
</button>
<button onClick={() => handleZoomChange(100)}
className="px-1.5 py-0.5 border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700">
Fit
</button>
</label>
<div className="w-px h-5 bg-gray-300 dark:bg-gray-600" />
{/* Selected cell info */}
{selectedCell && (
<span className="text-gray-400">
Zelle: <span className="text-gray-600 dark:text-gray-300">{selectedCell}</span>
</span>
)}
<div className="flex-1" />
{/* Export buttons */}
<button onClick={handleExportPdf}
className="px-2.5 py-1 border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700">
PDF
</button>
<button onClick={handleExportDocx}
className="px-2.5 py-1 border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700">
DOCX
</button>
</div>
{/* Canvas */}
<div className="border rounded-lg overflow-auto dark:border-gray-700 bg-gray-100 dark:bg-gray-900"
style={{ maxHeight: '75vh' }}>
{!ready && (
<div className="flex items-center justify-center py-12">
<div className="animate-spin rounded-full h-5 w-5 border-b-2 border-teal-500" />
<span className="ml-2 text-sm text-gray-500">Canvas wird geladen...</span>
</div>
)}
<canvas ref={canvasElRef} />
</div>
{/* Legend */}
<div className="flex items-center gap-4 text-xs text-gray-500">
{Object.entries(COL_TYPE_COLORS).map(([type, color]) => (
<span key={type} className="flex items-center gap-1">
<span className="w-3 h-3 rounded" style={{ backgroundColor: color + '44', border: `1px solid ${color}` }} />
{type.replace('column_', '').replace('page_', '')}
</span>
))}
<span className="ml-auto text-gray-400">Doppelklick = Text bearbeiten | Delete = Zelle entfernen | Cmd+Z = Undo</span>
</div>
</div>
)
}

View File

@@ -1,143 +0,0 @@
'use client'
import { useState } from 'react'
const A4_WIDTH_MM = 210
const A4_HEIGHT_MM = 297
interface ImageCompareViewProps {
originalUrl: string | null
deskewedUrl: string | null
showGrid: boolean
showGridLeft?: boolean
showBinarized: boolean
binarizedUrl: string | null
leftLabel?: string
rightLabel?: string
}
function MmGridOverlay() {
const lines: React.ReactNode[] = []
// Vertical lines every 10mm
for (let mm = 0; mm <= A4_WIDTH_MM; mm += 10) {
const x = (mm / A4_WIDTH_MM) * 100
const is50 = mm % 50 === 0
lines.push(
<line
key={`v-${mm}`}
x1={x} y1={0} x2={x} y2={100}
stroke={is50 ? 'rgba(59, 130, 246, 0.4)' : 'rgba(59, 130, 246, 0.15)'}
strokeWidth={is50 ? 0.12 : 0.05}
/>
)
// Label every 50mm
if (is50 && mm > 0) {
lines.push(
<text key={`vl-${mm}`} x={x} y={1.2} fill="rgba(59,130,246,0.6)" fontSize="1.2" textAnchor="middle">
{mm}
</text>
)
}
}
// Horizontal lines every 10mm
for (let mm = 0; mm <= A4_HEIGHT_MM; mm += 10) {
const y = (mm / A4_HEIGHT_MM) * 100
const is50 = mm % 50 === 0
lines.push(
<line
key={`h-${mm}`}
x1={0} y1={y} x2={100} y2={y}
stroke={is50 ? 'rgba(59, 130, 246, 0.4)' : 'rgba(59, 130, 246, 0.15)'}
strokeWidth={is50 ? 0.12 : 0.05}
/>
)
if (is50 && mm > 0) {
lines.push(
<text key={`hl-${mm}`} x={0.5} y={y + 0.6} fill="rgba(59,130,246,0.6)" fontSize="1.2">
{mm}
</text>
)
}
}
return (
<svg
viewBox="0 0 100 100"
preserveAspectRatio="none"
className="absolute inset-0 w-full h-full pointer-events-none"
style={{ zIndex: 10 }}
>
<g style={{ pointerEvents: 'none' }}>{lines}</g>
</svg>
)
}
export function ImageCompareView({
originalUrl,
deskewedUrl,
showGrid,
showGridLeft,
showBinarized,
binarizedUrl,
leftLabel,
rightLabel,
}: ImageCompareViewProps) {
const [leftError, setLeftError] = useState(false)
const [rightError, setRightError] = useState(false)
const rightUrl = showBinarized && binarizedUrl ? binarizedUrl : deskewedUrl
return (
<div className="grid grid-cols-1 lg:grid-cols-2 gap-4">
{/* Left: Original */}
<div className="space-y-2">
<h3 className="text-sm font-medium text-gray-500 dark:text-gray-400">{leftLabel || 'Original (unbearbeitet)'}</h3>
<div className="relative bg-gray-100 dark:bg-gray-900 rounded-lg overflow-hidden border border-gray-200 dark:border-gray-700"
style={{ aspectRatio: '210/297' }}>
{originalUrl && !leftError ? (
<>
<img
src={originalUrl}
alt="Original Scan"
className="w-full h-full object-contain"
onError={() => setLeftError(true)}
/>
{showGridLeft && <MmGridOverlay />}
</>
) : (
<div className="flex items-center justify-center h-full text-gray-400">
{leftError ? 'Fehler beim Laden' : 'Noch kein Bild'}
</div>
)}
</div>
</div>
{/* Right: Deskewed with Grid */}
<div className="space-y-2">
<h3 className="text-sm font-medium text-gray-500 dark:text-gray-400">
{rightLabel || `${showBinarized ? 'Binarisiert' : 'Begradigt'}${showGrid ? ' + Raster (mm)' : ''}`}
</h3>
<div className="relative bg-gray-100 dark:bg-gray-900 rounded-lg overflow-hidden border border-gray-200 dark:border-gray-700"
style={{ aspectRatio: '210/297' }}>
{rightUrl && !rightError ? (
<>
<img
src={rightUrl}
alt="Begradigtes Bild"
className="w-full h-full object-contain"
onError={() => setRightError(true)}
/>
{showGrid && <MmGridOverlay />}
</>
) : (
<div className="flex items-center justify-center h-full text-gray-400">
{rightError ? 'Fehler beim Laden' : 'Begradigung laeuft...'}
</div>
)}
</div>
</div>
</div>
)
}

View File

@@ -1,359 +0,0 @@
'use client'
import { useCallback, useEffect, useRef, useState } from 'react'
import type { ColumnTypeKey, PageRegion } from '@/app/(admin)/ai/ocr-pipeline/types'
const COLUMN_TYPES: { value: ColumnTypeKey; label: string }[] = [
{ value: 'column_en', label: 'EN' },
{ value: 'column_de', label: 'DE' },
{ value: 'column_example', label: 'Beispiel' },
{ value: 'column_text', label: 'Text' },
{ value: 'page_ref', label: 'Seite' },
{ value: 'column_marker', label: 'Marker' },
{ value: 'column_ignore', label: 'Ignorieren' },
]
const TYPE_OVERLAY_COLORS: Record<string, string> = {
column_en: 'rgba(59, 130, 246, 0.12)',
column_de: 'rgba(34, 197, 94, 0.12)',
column_example: 'rgba(249, 115, 22, 0.12)',
column_text: 'rgba(6, 182, 212, 0.12)',
page_ref: 'rgba(168, 85, 247, 0.12)',
column_marker: 'rgba(239, 68, 68, 0.12)',
column_ignore: 'rgba(128, 128, 128, 0.06)',
}
const TYPE_BADGE_COLORS: Record<string, string> = {
column_en: 'bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400',
column_de: 'bg-green-100 text-green-700 dark:bg-green-900/30 dark:text-green-400',
column_example: 'bg-orange-100 text-orange-700 dark:bg-orange-900/30 dark:text-orange-400',
column_text: 'bg-cyan-100 text-cyan-700 dark:bg-cyan-900/30 dark:text-cyan-400',
page_ref: 'bg-purple-100 text-purple-700 dark:bg-purple-900/30 dark:text-purple-400',
column_marker: 'bg-red-100 text-red-700 dark:bg-red-900/30 dark:text-red-400',
column_ignore: 'bg-gray-100 text-gray-500 dark:bg-gray-700/30 dark:text-gray-500',
}
// Default column type sequence for newly created columns
const DEFAULT_TYPE_SEQUENCE: ColumnTypeKey[] = [
'page_ref', 'column_en', 'column_de', 'column_example', 'column_text',
]
const MIN_DIVIDER_DISTANCE_PERCENT = 2 // Minimum 2% apart
interface ManualColumnEditorProps {
imageUrl: string
imageWidth: number
imageHeight: number
onApply: (columns: PageRegion[]) => void
onCancel: () => void
applying: boolean
mode?: 'manual' | 'ground-truth'
layout?: 'two-column' | 'stacked'
initialDividers?: number[]
initialColumnTypes?: ColumnTypeKey[]
}
export function ManualColumnEditor({
imageUrl,
imageWidth,
imageHeight,
onApply,
onCancel,
applying,
mode = 'manual',
layout = 'two-column',
initialDividers,
initialColumnTypes,
}: ManualColumnEditorProps) {
const containerRef = useRef<HTMLDivElement>(null)
const [dividers, setDividers] = useState<number[]>(initialDividers ?? [])
const [columnTypes, setColumnTypes] = useState<ColumnTypeKey[]>(initialColumnTypes ?? [])
const [dragging, setDragging] = useState<number | null>(null)
const [imageLoaded, setImageLoaded] = useState(false)
const isGT = mode === 'ground-truth'
// Sync columnTypes length when dividers change
useEffect(() => {
const numColumns = dividers.length + 1
setColumnTypes(prev => {
if (prev.length === numColumns) return prev
const next = [...prev]
while (next.length < numColumns) {
const idx = next.length
next.push(DEFAULT_TYPE_SEQUENCE[idx] || 'column_text')
}
while (next.length > numColumns) {
next.pop()
}
return next
})
}, [dividers.length])
const getXPercent = useCallback((clientX: number): number => {
if (!containerRef.current) return 0
const rect = containerRef.current.getBoundingClientRect()
const pct = ((clientX - rect.left) / rect.width) * 100
return Math.max(0, Math.min(100, pct))
}, [])
const canPlaceDivider = useCallback((xPct: number, excludeIndex?: number): boolean => {
for (let i = 0; i < dividers.length; i++) {
if (i === excludeIndex) continue
if (Math.abs(dividers[i] - xPct) < MIN_DIVIDER_DISTANCE_PERCENT) return false
}
return xPct > MIN_DIVIDER_DISTANCE_PERCENT && xPct < (100 - MIN_DIVIDER_DISTANCE_PERCENT)
}, [dividers])
// Click on image to add a divider
const handleImageClick = useCallback((e: React.MouseEvent) => {
if (dragging !== null) return
// Don't add if clicking on a divider handle
if ((e.target as HTMLElement).dataset.divider) return
const xPct = getXPercent(e.clientX)
if (!canPlaceDivider(xPct)) return
setDividers(prev => [...prev, xPct].sort((a, b) => a - b))
}, [dragging, getXPercent, canPlaceDivider])
// Drag handlers
const handleDividerMouseDown = useCallback((e: React.MouseEvent, index: number) => {
e.stopPropagation()
e.preventDefault()
setDragging(index)
}, [])
useEffect(() => {
if (dragging === null) return
const handleMouseMove = (e: MouseEvent) => {
const xPct = getXPercent(e.clientX)
if (canPlaceDivider(xPct, dragging)) {
setDividers(prev => {
const next = [...prev]
next[dragging] = xPct
return next.sort((a, b) => a - b)
})
}
}
const handleMouseUp = () => {
setDragging(null)
}
window.addEventListener('mousemove', handleMouseMove)
window.addEventListener('mouseup', handleMouseUp)
return () => {
window.removeEventListener('mousemove', handleMouseMove)
window.removeEventListener('mouseup', handleMouseUp)
}
}, [dragging, getXPercent, canPlaceDivider])
const removeDivider = useCallback((index: number) => {
setDividers(prev => prev.filter((_, i) => i !== index))
}, [])
const updateColumnType = useCallback((colIndex: number, type: ColumnTypeKey) => {
setColumnTypes(prev => {
const next = [...prev]
next[colIndex] = type
return next
})
}, [])
const handleApply = useCallback(() => {
// Build PageRegion array from dividers
const sorted = [...dividers].sort((a, b) => a - b)
const columns: PageRegion[] = []
for (let i = 0; i <= sorted.length; i++) {
const leftPct = i === 0 ? 0 : sorted[i - 1]
const rightPct = i === sorted.length ? 100 : sorted[i]
const x = Math.round((leftPct / 100) * imageWidth)
const w = Math.round(((rightPct - leftPct) / 100) * imageWidth)
columns.push({
type: columnTypes[i] || 'column_text',
x,
y: 0,
width: w,
height: imageHeight,
classification_confidence: 1.0,
classification_method: 'manual',
})
}
onApply(columns)
}, [dividers, columnTypes, imageWidth, imageHeight, onApply])
// Compute column regions for overlay
const sorted = [...dividers].sort((a, b) => a - b)
const columnRegions = Array.from({ length: sorted.length + 1 }, (_, i) => ({
leftPct: i === 0 ? 0 : sorted[i - 1],
rightPct: i === sorted.length ? 100 : sorted[i],
type: columnTypes[i] || 'column_text',
}))
return (
<div className="space-y-4">
{/* Layout: image + controls */}
<div className={layout === 'stacked' ? 'space-y-4' : 'grid grid-cols-2 gap-4'}>
{/* Left: Interactive image */}
<div>
<div className="flex items-center justify-between mb-1">
<div className="text-xs font-medium text-gray-500 dark:text-gray-400">
Klicken um Trennlinien zu setzen
</div>
<button
onClick={onCancel}
className="text-xs px-2 py-0.5 text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200"
>
Abbrechen
</button>
</div>
<div
ref={containerRef}
className="relative border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900 cursor-crosshair select-none"
onClick={handleImageClick}
>
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={imageUrl}
alt="Entzerrtes Bild"
className="w-full h-auto block"
draggable={false}
onLoad={() => setImageLoaded(true)}
/>
{imageLoaded && (
<>
{/* Column overlays */}
{columnRegions.map((region, i) => (
<div
key={`col-${i}`}
className="absolute top-0 bottom-0 pointer-events-none"
style={{
left: `${region.leftPct}%`,
width: `${region.rightPct - region.leftPct}%`,
backgroundColor: TYPE_OVERLAY_COLORS[region.type] || 'rgba(128,128,128,0.08)',
}}
>
<span className="absolute top-1 left-1/2 -translate-x-1/2 text-[10px] font-medium text-gray-600 dark:text-gray-300 bg-white/80 dark:bg-gray-800/80 px-1 rounded">
{i + 1}
</span>
</div>
))}
{/* Divider lines */}
{sorted.map((xPct, i) => (
<div
key={`div-${i}`}
data-divider="true"
className="absolute top-0 bottom-0 group"
style={{
left: `${xPct}%`,
transform: 'translateX(-50%)',
width: '12px',
cursor: 'col-resize',
zIndex: 10,
}}
onMouseDown={(e) => handleDividerMouseDown(e, i)}
>
{/* Visible line */}
<div
data-divider="true"
className="absolute top-0 bottom-0 left-1/2 -translate-x-1/2 w-0.5 border-l-2 border-dashed border-red-500"
/>
{/* Delete button */}
<button
data-divider="true"
onClick={(e) => {
e.stopPropagation()
removeDivider(i)
}}
className="absolute top-2 left-1/2 -translate-x-1/2 w-4 h-4 bg-red-500 text-white rounded-full text-[10px] leading-none flex items-center justify-center opacity-0 group-hover:opacity-100 transition-opacity z-20"
title="Linie entfernen"
>
x
</button>
</div>
))}
</>
)}
</div>
</div>
{/* Right: Column type assignment + actions */}
<div className="space-y-4">
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Spaltentypen
</div>
{dividers.length === 0 ? (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-6 text-center">
<div className="text-3xl mb-2">👆</div>
<p className="text-sm text-gray-500 dark:text-gray-400">
Klicken Sie auf das Bild links, um vertikale Trennlinien zwischen den Spalten zu setzen.
</p>
<p className="text-xs text-gray-400 dark:text-gray-500 mt-2">
Linien koennen per Drag verschoben und per Hover geloescht werden.
</p>
</div>
) : (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-3">
<div className="text-sm text-gray-600 dark:text-gray-400">
<span className="font-medium text-gray-800 dark:text-gray-200">
{dividers.length} Linien = {dividers.length + 1} Spalten
</span>
</div>
<div className="grid gap-2">
{columnRegions.map((region, i) => (
<div key={i} className="flex items-center gap-3">
<span className={`w-16 text-center px-2 py-0.5 rounded text-xs font-medium ${TYPE_BADGE_COLORS[region.type] || 'bg-gray-100 text-gray-600'}`}>
Spalte {i + 1}
</span>
<select
value={columnTypes[i] || 'column_text'}
onChange={(e) => updateColumnType(i, e.target.value as ColumnTypeKey)}
className="text-sm border border-gray-200 dark:border-gray-600 rounded px-2 py-1 bg-white dark:bg-gray-700 text-gray-800 dark:text-gray-200"
>
{COLUMN_TYPES.map(t => (
<option key={t.value} value={t.value}>{t.label}</option>
))}
</select>
<span className="text-xs text-gray-400 font-mono">
{Math.round(region.rightPct - region.leftPct)}%
</span>
</div>
))}
</div>
</div>
)}
{/* Action buttons */}
<div className="flex flex-col gap-2">
<button
onClick={handleApply}
disabled={dividers.length === 0 || applying}
className="w-full px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm font-medium disabled:opacity-50 disabled:cursor-not-allowed"
>
{applying
? 'Wird gespeichert...'
: isGT
? `${dividers.length + 1} Spalten als Ground Truth speichern`
: `${dividers.length + 1} Spalten uebernehmen`}
</button>
<button
onClick={() => setDividers([])}
disabled={dividers.length === 0}
className="text-xs px-3 py-2 text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200 disabled:opacity-50"
>
Alle Linien entfernen
</button>
</div>
</div>
</div>
</div>
)
}

View File

@@ -1,115 +0,0 @@
'use client'
import { PipelineStep, DocumentTypeResult } from '@/app/(admin)/ai/ocr-pipeline/types'
const DOC_TYPE_LABELS: Record<string, string> = {
vocab_table: 'Vokabeltabelle',
full_text: 'Volltext',
generic_table: 'Tabelle',
}
interface PipelineStepperProps {
steps: PipelineStep[]
currentStep: number
onStepClick: (index: number) => void
onReprocess?: (index: number) => void
docTypeResult?: DocumentTypeResult | null
onDocTypeChange?: (docType: DocumentTypeResult['doc_type']) => void
}
export function PipelineStepper({
steps,
currentStep,
onStepClick,
onReprocess,
docTypeResult,
onDocTypeChange,
}: PipelineStepperProps) {
return (
<div className="space-y-2">
<div className="flex items-center justify-between px-4 py-3 bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700">
{steps.map((step, index) => {
const isActive = index === currentStep
const isCompleted = step.status === 'completed'
const isFailed = step.status === 'failed'
const isSkipped = step.status === 'skipped'
const isClickable = (index <= currentStep || isCompleted) && !isSkipped
return (
<div key={step.id} className="flex items-center">
{index > 0 && (
<div
className={`h-0.5 w-8 mx-1 ${
isSkipped
? 'bg-gray-200 dark:bg-gray-700 border-t border-dashed border-gray-400'
: index <= currentStep ? 'bg-teal-400' : 'bg-gray-300 dark:bg-gray-600'
}`}
/>
)}
<div className="relative group">
<button
onClick={() => isClickable && onStepClick(index)}
disabled={!isClickable}
className={`flex items-center gap-1.5 px-3 py-1.5 rounded-full text-sm font-medium transition-all ${
isSkipped
? 'bg-gray-100 text-gray-400 dark:bg-gray-800 dark:text-gray-600 line-through'
: isActive
? 'bg-teal-100 text-teal-700 dark:bg-teal-900/40 dark:text-teal-300 ring-2 ring-teal-400'
: isCompleted
? 'bg-green-100 text-green-700 dark:bg-green-900/40 dark:text-green-300'
: isFailed
? 'bg-red-100 text-red-700 dark:bg-red-900/40 dark:text-red-300'
: 'text-gray-400 dark:text-gray-500'
} ${isClickable ? 'cursor-pointer hover:opacity-80' : 'cursor-default'}`}
>
<span className="text-base">
{isSkipped ? '-' : isCompleted ? '\u2713' : isFailed ? '\u2717' : step.icon}
</span>
<span className="hidden sm:inline">{step.name}</span>
<span className="sm:hidden">{index + 1}</span>
</button>
{/* Reprocess button — shown on completed steps on hover */}
{isCompleted && onReprocess && (
<button
onClick={(e) => { e.stopPropagation(); onReprocess(index) }}
className="absolute -top-1 -right-1 w-4 h-4 bg-orange-500 text-white rounded-full text-[9px] leading-none opacity-0 group-hover:opacity-100 transition-opacity flex items-center justify-center"
title={`Ab hier neu verarbeiten`}
>
&#x21BB;
</button>
)}
</div>
</div>
)
})}
</div>
{/* Document type badge */}
{docTypeResult && (
<div className="flex items-center gap-2 px-4 py-2 bg-blue-50 dark:bg-blue-900/20 rounded-lg border border-blue-200 dark:border-blue-800 text-sm">
<span className="text-blue-600 dark:text-blue-400 font-medium">
Dokumenttyp:
</span>
{onDocTypeChange ? (
<select
value={docTypeResult.doc_type}
onChange={(e) => onDocTypeChange(e.target.value as DocumentTypeResult['doc_type'])}
className="bg-white dark:bg-gray-800 border border-blue-300 dark:border-blue-700 rounded px-2 py-0.5 text-sm text-blue-700 dark:text-blue-300"
>
<option value="vocab_table">Vokabeltabelle</option>
<option value="generic_table">Tabelle (generisch)</option>
<option value="full_text">Volltext</option>
</select>
) : (
<span className="text-blue-700 dark:text-blue-300">
{DOC_TYPE_LABELS[docTypeResult.doc_type] || docTypeResult.doc_type}
</span>
)}
<span className="text-blue-400 dark:text-blue-500 text-xs">
({Math.round(docTypeResult.confidence * 100)}% Konfidenz)
</span>
</div>
)}
</div>
)
}

View File

@@ -1,341 +0,0 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
import type { ColumnResult, ColumnGroundTruth, PageRegion } from '@/app/(admin)/ai/ocr-pipeline/types'
import { ColumnControls } from './ColumnControls'
import { ManualColumnEditor } from './ManualColumnEditor'
import type { ColumnTypeKey } from '@/app/(admin)/ai/ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
type ViewMode = 'normal' | 'ground-truth' | 'manual'
interface StepColumnDetectionProps {
sessionId: string | null
onNext: () => void
}
/** Convert PageRegion[] to divider percentages + column types for ManualColumnEditor */
function columnsToEditorState(
columns: PageRegion[],
imageWidth: number
): { dividers: number[]; columnTypes: ColumnTypeKey[] } {
if (!columns.length || !imageWidth) return { dividers: [], columnTypes: [] }
const sorted = [...columns].sort((a, b) => a.x - b.x)
const dividers: number[] = []
const columnTypes: ColumnTypeKey[] = sorted.map(c => c.type)
for (let i = 1; i < sorted.length; i++) {
const xPct = (sorted[i].x / imageWidth) * 100
dividers.push(xPct)
}
return { dividers, columnTypes }
}
export function StepColumnDetection({ sessionId, onNext }: StepColumnDetectionProps) {
const [columnResult, setColumnResult] = useState<ColumnResult | null>(null)
const [detecting, setDetecting] = useState(false)
const [error, setError] = useState<string | null>(null)
const [viewMode, setViewMode] = useState<ViewMode>('normal')
const [applying, setApplying] = useState(false)
const [imageDimensions, setImageDimensions] = useState<{ width: number; height: number } | null>(null)
const [savedGtColumns, setSavedGtColumns] = useState<PageRegion[] | null>(null)
// Fetch session info (image dimensions) + check for cached column result
useEffect(() => {
if (!sessionId || imageDimensions) return
const fetchSessionInfo = async () => {
try {
const infoRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
if (infoRes.ok) {
const info = await infoRes.json()
if (info.image_width && info.image_height) {
setImageDimensions({ width: info.image_width, height: info.image_height })
}
if (info.column_result) {
setColumnResult(info.column_result)
return
}
}
} catch (e) {
console.error('Failed to fetch session info:', e)
}
// No cached result - run auto-detection
runAutoDetection()
}
fetchSessionInfo()
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId])
// Load saved GT if exists
useEffect(() => {
if (!sessionId) return
const fetchGt = async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/ground-truth/columns`)
if (res.ok) {
const data = await res.json()
const corrected = data.columns_gt?.corrected_columns
if (corrected) setSavedGtColumns(corrected)
}
} catch {
// No saved GT - that's fine
}
}
fetchGt()
}, [sessionId])
const runAutoDetection = useCallback(async () => {
if (!sessionId) return
setDetecting(true)
setError(null)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/columns`, {
method: 'POST',
})
if (!res.ok) {
const err = await res.json().catch(() => ({ detail: res.statusText }))
throw new Error(err.detail || 'Spaltenerkennung fehlgeschlagen')
}
const data: ColumnResult = await res.json()
setColumnResult(data)
} catch (e) {
setError(e instanceof Error ? e.message : 'Unbekannter Fehler')
} finally {
setDetecting(false)
}
}, [sessionId])
const handleRerun = useCallback(() => {
runAutoDetection()
}, [runAutoDetection])
const handleGroundTruth = useCallback(async (gt: ColumnGroundTruth) => {
if (!sessionId) return
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/ground-truth/columns`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(gt),
})
} catch (e) {
console.error('Ground truth save failed:', e)
}
}, [sessionId])
const handleManualApply = useCallback(async (columns: PageRegion[]) => {
if (!sessionId) return
setApplying(true)
setError(null)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/columns/manual`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ columns }),
})
if (!res.ok) {
const err = await res.json().catch(() => ({ detail: res.statusText }))
throw new Error(err.detail || 'Manuelle Spalten konnten nicht gespeichert werden')
}
const data = await res.json()
setColumnResult({
columns: data.columns,
duration_seconds: data.duration_seconds ?? 0,
})
setViewMode('normal')
} catch (e) {
setError(e instanceof Error ? e.message : 'Fehler beim Speichern')
} finally {
setApplying(false)
}
}, [sessionId])
const handleGtApply = useCallback(async (columns: PageRegion[]) => {
if (!sessionId) return
setApplying(true)
setError(null)
try {
const gt: ColumnGroundTruth = {
is_correct: false,
corrected_columns: columns,
}
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/ground-truth/columns`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(gt),
})
setSavedGtColumns(columns)
setViewMode('normal')
} catch (e) {
setError(e instanceof Error ? e.message : 'Fehler beim Speichern')
} finally {
setApplying(false)
}
}, [sessionId])
if (!sessionId) {
return (
<div className="flex flex-col items-center justify-center py-16 text-center">
<div className="text-5xl mb-4">📊</div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">
Schritt 3: Spaltenerkennung
</h3>
<p className="text-gray-500 dark:text-gray-400 max-w-md">
Bitte zuerst Schritt 1 und 2 abschliessen.
</p>
</div>
)
}
const dewarpedUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/dewarped`
const overlayUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/columns-overlay`
// Pre-compute editor state from saved GT or auto columns for GT mode
const gtInitial = savedGtColumns
? columnsToEditorState(savedGtColumns, imageDimensions?.width ?? 1000)
: undefined
return (
<div className="space-y-4">
{/* Loading indicator */}
{detecting && (
<div className="flex items-center gap-2 text-teal-600 dark:text-teal-400 text-sm">
<div className="animate-spin w-4 h-4 border-2 border-teal-500 border-t-transparent rounded-full" />
Spaltenerkennung laeuft...
</div>
)}
{viewMode === 'manual' ? (
/* Manual column editor - overwrites column_result */
<ManualColumnEditor
imageUrl={dewarpedUrl}
imageWidth={imageDimensions?.width ?? 1000}
imageHeight={imageDimensions?.height ?? 1400}
onApply={handleManualApply}
onCancel={() => setViewMode('normal')}
applying={applying}
mode="manual"
/>
) : viewMode === 'ground-truth' ? (
/* GT mode: auto result (left, readonly) + GT editor (right) */
<div className="grid grid-cols-2 gap-4">
{/* Left: Auto result (readonly overlay) */}
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Auto-Ergebnis (readonly)
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{columnResult ? (
// eslint-disable-next-line @next/next/no-img-element
<img
src={`${overlayUrl}?t=${Date.now()}`}
alt="Auto Spalten-Overlay"
className="w-full h-auto"
/>
) : (
<div className="aspect-[3/4] flex items-center justify-center text-gray-400 text-sm">
Keine Auto-Daten
</div>
)}
</div>
{/* Auto column list */}
{columnResult && (
<div className="mt-2 space-y-1">
<div className="text-xs font-medium text-gray-500 dark:text-gray-400">
Auto: {columnResult.columns.length} Spalten
</div>
{columnResult.columns
.filter(c => c.type.startsWith('column') || c.type === 'page_ref')
.map((col, i) => (
<div key={i} className="text-xs text-gray-500 dark:text-gray-400 font-mono">
{i + 1}. {col.type} x={col.x} w={col.width}
</div>
))}
</div>
)}
</div>
{/* Right: GT editor */}
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Ground Truth Editor
</div>
<ManualColumnEditor
imageUrl={dewarpedUrl}
imageWidth={imageDimensions?.width ?? 1000}
imageHeight={imageDimensions?.height ?? 1400}
onApply={handleGtApply}
onCancel={() => setViewMode('normal')}
applying={applying}
mode="ground-truth"
layout="stacked"
initialDividers={gtInitial?.dividers}
initialColumnTypes={gtInitial?.columnTypes}
/>
</div>
</div>
) : (
/* Normal mode: overlay (left) vs clean (right) */
<div className="grid grid-cols-2 gap-4">
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Mit Spalten-Overlay
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{columnResult ? (
// eslint-disable-next-line @next/next/no-img-element
<img
src={`${overlayUrl}?t=${Date.now()}`}
alt="Spalten-Overlay"
className="w-full h-auto"
/>
) : (
<div className="aspect-[3/4] flex items-center justify-center text-gray-400 text-sm">
{detecting ? 'Erkenne Spalten...' : 'Keine Daten'}
</div>
)}
</div>
</div>
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Entzerrtes Bild
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={dewarpedUrl}
alt="Entzerrt"
className="w-full h-auto"
/>
</div>
</div>
</div>
)}
{/* Controls */}
{viewMode === 'normal' && (
<ColumnControls
columnResult={columnResult}
onRerun={handleRerun}
onManualMode={() => setViewMode('manual')}
onGtMode={() => setViewMode('ground-truth')}
onGroundTruth={handleGroundTruth}
onNext={onNext}
isDetecting={detecting}
savedGtColumns={savedGtColumns}
/>
)}
{error && (
<div className="p-3 bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 rounded-lg text-sm">
{error}
</div>
)}
</div>
)
}

View File

@@ -1,19 +0,0 @@
'use client'
export function StepCoordinates() {
return (
<div className="flex flex-col items-center justify-center py-16 text-center">
<div className="text-5xl mb-4">📍</div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">
Schritt 5: Koordinatenzuweisung
</h3>
<p className="text-gray-500 dark:text-gray-400 max-w-md">
Exakte Positionszuweisung fuer jedes Wort auf der Seite.
Dieser Schritt wird in einer zukuenftigen Version implementiert.
</p>
<div className="mt-6 px-4 py-2 bg-amber-100 dark:bg-amber-900/30 text-amber-700 dark:text-amber-400 rounded-full text-sm font-medium">
Kommt bald
</div>
</div>
)
}

View File

@@ -1,277 +0,0 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
import type { DeskewGroundTruth, DeskewResult, SessionInfo } from '@/app/(admin)/ai/ocr-pipeline/types'
import { DeskewControls } from './DeskewControls'
import { ImageCompareView } from './ImageCompareView'
const KLAUSUR_API = '/klausur-api'
interface StepDeskewProps {
sessionId?: string | null
onNext: (sessionId: string) => void
}
export function StepDeskew({ sessionId: existingSessionId, onNext }: StepDeskewProps) {
const [session, setSession] = useState<SessionInfo | null>(null)
const [deskewResult, setDeskewResult] = useState<DeskewResult | null>(null)
const [uploading, setUploading] = useState(false)
const [deskewing, setDeskewing] = useState(false)
const [applying, setApplying] = useState(false)
const [showBinarized, setShowBinarized] = useState(false)
const [showGrid, setShowGrid] = useState(true)
const [error, setError] = useState<string | null>(null)
const [dragOver, setDragOver] = useState(false)
const [sessionName, setSessionName] = useState('')
// Reload session data when navigating back from a later step
useEffect(() => {
if (!existingSessionId || session) return
const loadSession = async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${existingSessionId}`)
if (!res.ok) return
const data = await res.json()
const sessionInfo: SessionInfo = {
session_id: data.session_id,
filename: data.filename,
image_width: data.image_width,
image_height: data.image_height,
original_image_url: `${KLAUSUR_API}${data.original_image_url}`,
}
setSession(sessionInfo)
// Reconstruct deskew result from session data
if (data.deskew_result) {
const dr: DeskewResult = {
...data.deskew_result,
deskewed_image_url: `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${existingSessionId}/image/deskewed`,
binarized_image_url: `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${existingSessionId}/image/binarized`,
}
setDeskewResult(dr)
}
} catch (e) {
console.error('Failed to reload session:', e)
}
}
loadSession()
}, [existingSessionId, session])
const handleUpload = useCallback(async (file: File) => {
setUploading(true)
setError(null)
setDeskewResult(null)
try {
const formData = new FormData()
formData.append('file', file)
if (sessionName.trim()) {
formData.append('name', sessionName.trim())
}
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`, {
method: 'POST',
body: formData,
})
if (!res.ok) {
const err = await res.json().catch(() => ({ detail: res.statusText }))
throw new Error(err.detail || 'Upload fehlgeschlagen')
}
const data: SessionInfo = await res.json()
// Prepend API prefix to relative URLs
data.original_image_url = `${KLAUSUR_API}${data.original_image_url}`
setSession(data)
// Auto-trigger deskew
setDeskewing(true)
const deskewRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${data.session_id}/deskew`, {
method: 'POST',
})
if (!deskewRes.ok) {
throw new Error('Begradigung fehlgeschlagen')
}
const deskewData: DeskewResult = await deskewRes.json()
deskewData.deskewed_image_url = `${KLAUSUR_API}${deskewData.deskewed_image_url}`
deskewData.binarized_image_url = `${KLAUSUR_API}${deskewData.binarized_image_url}`
setDeskewResult(deskewData)
} catch (e) {
setError(e instanceof Error ? e.message : 'Unbekannter Fehler')
} finally {
setUploading(false)
setDeskewing(false)
}
}, [])
const handleManualDeskew = useCallback(async (angle: number) => {
if (!session) return
setApplying(true)
setError(null)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${session.session_id}/deskew/manual`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ angle }),
})
if (!res.ok) throw new Error('Manuelle Begradigung fehlgeschlagen')
const data = await res.json()
setDeskewResult((prev) =>
prev
? {
...prev,
angle_applied: data.angle_applied,
method_used: data.method_used,
// Force reload by appending timestamp
deskewed_image_url: `${KLAUSUR_API}${data.deskewed_image_url}?t=${Date.now()}`,
}
: null,
)
} catch (e) {
setError(e instanceof Error ? e.message : 'Fehler')
} finally {
setApplying(false)
}
}, [session])
const handleGroundTruth = useCallback(async (gt: DeskewGroundTruth) => {
if (!session) return
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${session.session_id}/ground-truth/deskew`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(gt),
})
} catch (e) {
console.error('Ground truth save failed:', e)
}
}, [session])
const handleDrop = useCallback((e: React.DragEvent) => {
e.preventDefault()
setDragOver(false)
const file = e.dataTransfer.files[0]
if (file) handleUpload(file)
}, [handleUpload])
const handleFileInput = useCallback((e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0]
if (file) handleUpload(file)
}, [handleUpload])
// Upload area (no session yet)
if (!session) {
return (
<div className="space-y-4">
{/* Session name input */}
<div>
<label className="block text-sm font-medium text-gray-600 dark:text-gray-400 mb-1">
Session-Name (optional)
</label>
<input
type="text"
value={sessionName}
onChange={(e) => setSessionName(e.target.value)}
placeholder="z.B. Unit 3 Seite 42"
className="w-full max-w-sm px-3 py-2 text-sm border rounded-lg dark:bg-gray-800 dark:border-gray-600 dark:text-gray-200 focus:outline-none focus:ring-2 focus:ring-teal-500"
/>
</div>
<div
onDragOver={(e) => { e.preventDefault(); setDragOver(true) }}
onDragLeave={() => setDragOver(false)}
onDrop={handleDrop}
className={`border-2 border-dashed rounded-xl p-12 text-center transition-colors ${
dragOver
? 'border-teal-400 bg-teal-50 dark:bg-teal-900/20'
: 'border-gray-300 dark:border-gray-600 hover:border-teal-400'
}`}
>
{uploading ? (
<div className="text-gray-500">
<div className="animate-spin inline-block w-8 h-8 border-2 border-teal-500 border-t-transparent rounded-full mb-3" />
<p>Wird hochgeladen...</p>
</div>
) : (
<>
<div className="text-4xl mb-3">📄</div>
<p className="text-gray-600 dark:text-gray-400 mb-2">
PDF oder Bild hierher ziehen
</p>
<p className="text-sm text-gray-400 mb-4">oder</p>
<label className="inline-block px-4 py-2 bg-teal-600 text-white rounded-lg cursor-pointer hover:bg-teal-700 transition-colors">
Datei auswaehlen
<input
type="file"
accept=".pdf,.png,.jpg,.jpeg,.tiff,.tif"
onChange={handleFileInput}
className="hidden"
/>
</label>
</>
)}
</div>
{error && (
<div className="p-3 bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 rounded-lg text-sm">
{error}
</div>
)}
</div>
)
}
// Session active: show comparison + controls
return (
<div className="space-y-4">
{/* Filename */}
<div className="text-sm text-gray-500 dark:text-gray-400">
Datei: <span className="font-medium text-gray-700 dark:text-gray-300">{session.filename}</span>
{' '}({session.image_width} x {session.image_height} px)
</div>
{/* Loading indicator */}
{deskewing && (
<div className="flex items-center gap-2 text-teal-600 dark:text-teal-400 text-sm">
<div className="animate-spin w-4 h-4 border-2 border-teal-500 border-t-transparent rounded-full" />
Begradigung laeuft (beide Methoden)...
</div>
)}
{/* Image comparison */}
<ImageCompareView
originalUrl={session.original_image_url}
deskewedUrl={deskewResult?.deskewed_image_url ?? null}
showGrid={showGrid}
showBinarized={showBinarized}
binarizedUrl={deskewResult?.binarized_image_url ?? null}
/>
{/* Controls */}
<DeskewControls
deskewResult={deskewResult}
showBinarized={showBinarized}
onToggleBinarized={() => setShowBinarized((v) => !v)}
showGrid={showGrid}
onToggleGrid={() => setShowGrid((v) => !v)}
onManualDeskew={handleManualDeskew}
onGroundTruth={handleGroundTruth}
onNext={() => session && onNext(session.session_id)}
isApplying={applying}
/>
{error && (
<div className="p-3 bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 rounded-lg text-sm">
{error}
</div>
)}
</div>
)
}

View File

@@ -1,151 +0,0 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
import type { DewarpResult, DewarpGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
import { DewarpControls } from './DewarpControls'
import { ImageCompareView } from './ImageCompareView'
const KLAUSUR_API = '/klausur-api'
interface StepDewarpProps {
sessionId: string | null
onNext: () => void
}
export function StepDewarp({ sessionId, onNext }: StepDewarpProps) {
const [dewarpResult, setDewarpResult] = useState<DewarpResult | null>(null)
const [dewarping, setDewarping] = useState(false)
const [applying, setApplying] = useState(false)
const [showGrid, setShowGrid] = useState(true)
const [error, setError] = useState<string | null>(null)
// Auto-trigger dewarp when component mounts with a sessionId
useEffect(() => {
if (!sessionId || dewarpResult) return
const runDewarp = async () => {
setDewarping(true)
setError(null)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/dewarp`, {
method: 'POST',
})
if (!res.ok) {
const err = await res.json().catch(() => ({ detail: res.statusText }))
throw new Error(err.detail || 'Entzerrung fehlgeschlagen')
}
const data: DewarpResult = await res.json()
data.dewarped_image_url = `${KLAUSUR_API}${data.dewarped_image_url}`
setDewarpResult(data)
} catch (e) {
setError(e instanceof Error ? e.message : 'Unbekannter Fehler')
} finally {
setDewarping(false)
}
}
runDewarp()
}, [sessionId, dewarpResult])
const handleManualDewarp = useCallback(async (shearDegrees: number) => {
if (!sessionId) return
setApplying(true)
setError(null)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/dewarp/manual`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ shear_degrees: shearDegrees }),
})
if (!res.ok) throw new Error('Manuelle Entzerrung fehlgeschlagen')
const data = await res.json()
setDewarpResult((prev) =>
prev
? {
...prev,
method_used: data.method_used,
shear_degrees: data.shear_degrees,
dewarped_image_url: `${KLAUSUR_API}${data.dewarped_image_url}?t=${Date.now()}`,
}
: null,
)
} catch (e) {
setError(e instanceof Error ? e.message : 'Fehler')
} finally {
setApplying(false)
}
}, [sessionId])
const handleGroundTruth = useCallback(async (gt: DewarpGroundTruth) => {
if (!sessionId) return
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/ground-truth/dewarp`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(gt),
})
} catch (e) {
console.error('Ground truth save failed:', e)
}
}, [sessionId])
if (!sessionId) {
return (
<div className="flex flex-col items-center justify-center py-16 text-center">
<div className="text-5xl mb-4">🔧</div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">
Schritt 2: Entzerrung (Dewarp)
</h3>
<p className="text-gray-500 dark:text-gray-400 max-w-md">
Bitte zuerst Schritt 1 (Begradigung) abschliessen.
</p>
</div>
)
}
const deskewedUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/deskewed`
const dewarpedUrl = dewarpResult?.dewarped_image_url ?? null
return (
<div className="space-y-4">
{/* Loading indicator */}
{dewarping && (
<div className="flex items-center gap-2 text-teal-600 dark:text-teal-400 text-sm">
<div className="animate-spin w-4 h-4 border-2 border-teal-500 border-t-transparent rounded-full" />
Entzerrung laeuft (beide Methoden)...
</div>
)}
{/* Image comparison: deskewed (left) vs dewarped (right) */}
<ImageCompareView
originalUrl={deskewedUrl}
deskewedUrl={dewarpedUrl}
showGrid={showGrid}
showGridLeft={showGrid}
showBinarized={false}
binarizedUrl={null}
leftLabel={`Begradigt (nach Deskew)${showGrid ? ' + Raster' : ''}`}
rightLabel={`Entzerrt${showGrid ? ' + Raster (mm)' : ''}`}
/>
{/* Controls */}
<DewarpControls
dewarpResult={dewarpResult}
showGrid={showGrid}
onToggleGrid={() => setShowGrid((v) => !v)}
onManualDewarp={handleManualDewarp}
onGroundTruth={handleGroundTruth}
onNext={onNext}
isApplying={applying}
/>
{error && (
<div className="p-3 bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 rounded-lg text-sm">
{error}
</div>
)}
</div>
)
}

View File

@@ -1,19 +0,0 @@
'use client'
export function StepGroundTruth() {
return (
<div className="flex flex-col items-center justify-center py-16 text-center">
<div className="text-5xl mb-4"></div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">
Schritt 7: Ground Truth Validierung
</h3>
<p className="text-gray-500 dark:text-gray-400 max-w-md">
Gesamtpruefung der rekonstruierten Seite gegen das Original.
Dieser Schritt wird in einer zukuenftigen Version implementiert.
</p>
<div className="mt-6 px-4 py-2 bg-amber-100 dark:bg-amber-900/30 text-amber-700 dark:text-amber-400 rounded-full text-sm font-medium">
Kommt bald
</div>
</div>
)
}

View File

@@ -1,707 +0,0 @@
'use client'
import { useCallback, useEffect, useRef, useState } from 'react'
import type { GridResult, WordEntry, ColumnMeta } from '@/app/(admin)/ai/ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
interface LlmChange {
row_index: number
field: 'english' | 'german' | 'example'
old: string
new: string
}
interface StepLlmReviewProps {
sessionId: string | null
onNext: () => void
}
interface ReviewMeta {
total_entries: number
to_review: number
skipped: number
model: string
skipped_indices?: number[]
}
interface StreamProgress {
current: number
total: number
}
const FIELD_LABELS: Record<string, string> = {
english: 'EN',
german: 'DE',
example: 'Beispiel',
source_page: 'Seite',
marker: 'Marker',
}
/** Map column type to WordEntry field name */
const COL_TYPE_TO_FIELD: Record<string, string> = {
column_en: 'english',
column_de: 'german',
column_example: 'example',
page_ref: 'source_page',
column_marker: 'marker',
}
/** Column type → color class */
const COL_TYPE_COLOR: Record<string, string> = {
column_en: 'text-blue-600 dark:text-blue-400',
column_de: 'text-green-600 dark:text-green-400',
column_example: 'text-orange-600 dark:text-orange-400',
page_ref: 'text-cyan-600 dark:text-cyan-400',
column_marker: 'text-gray-500 dark:text-gray-400',
}
type RowStatus = 'pending' | 'active' | 'reviewed' | 'corrected' | 'skipped'
export function StepLlmReview({ sessionId, onNext }: StepLlmReviewProps) {
// Core state
const [status, setStatus] = useState<'idle' | 'loading' | 'ready' | 'running' | 'done' | 'error' | 'applied'>('idle')
const [meta, setMeta] = useState<ReviewMeta | null>(null)
const [changes, setChanges] = useState<LlmChange[]>([])
const [progress, setProgress] = useState<StreamProgress | null>(null)
const [totalDuration, setTotalDuration] = useState(0)
const [error, setError] = useState('')
const [accepted, setAccepted] = useState<Set<number>>(new Set())
const [applying, setApplying] = useState(false)
// Full vocab table state
const [vocabEntries, setVocabEntries] = useState<WordEntry[]>([])
const [columnsUsed, setColumnsUsed] = useState<ColumnMeta[]>([])
const [activeRowIndices, setActiveRowIndices] = useState<Set<number>>(new Set())
const [reviewedRows, setReviewedRows] = useState<Set<number>>(new Set())
const [skippedRows, setSkippedRows] = useState<Set<number>>(new Set())
const [correctedMap, setCorrectedMap] = useState<Map<number, LlmChange[]>>(new Map())
// Image
const [imageNaturalSize, setImageNaturalSize] = useState<{ w: number; h: number } | null>(null)
const tableRef = useRef<HTMLDivElement>(null)
const activeRowRef = useRef<HTMLTableRowElement>(null)
// Load session data on mount
useEffect(() => {
if (!sessionId) return
loadSessionData()
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId])
const loadSessionData = async () => {
if (!sessionId) return
setStatus('loading')
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
if (!res.ok) throw new Error(`HTTP ${res.status}`)
const data = await res.json()
const wordResult: GridResult | undefined = data.word_result
if (!wordResult) {
setError('Keine Worterkennungsdaten gefunden. Bitte zuerst Schritt 5 abschliessen.')
setStatus('error')
return
}
const entries = wordResult.vocab_entries || wordResult.entries || []
setVocabEntries(entries)
setColumnsUsed(wordResult.columns_used || [])
// Check if LLM review was already run
const llmReview = wordResult.llm_review
if (llmReview && llmReview.changes) {
const existingChanges: LlmChange[] = llmReview.changes as LlmChange[]
setChanges(existingChanges)
setTotalDuration(llmReview.duration_ms || 0)
// Mark all rows as reviewed
const allReviewed = new Set(entries.map((_: WordEntry, i: number) => i))
setReviewedRows(allReviewed)
// Build corrected map
const cMap = new Map<number, LlmChange[]>()
for (const c of existingChanges) {
const existing = cMap.get(c.row_index) || []
existing.push(c)
cMap.set(c.row_index, existing)
}
setCorrectedMap(cMap)
// Default: all accepted
setAccepted(new Set(existingChanges.map((_: LlmChange, i: number) => i)))
setMeta({
total_entries: entries.length,
to_review: llmReview.entries_corrected !== undefined ? entries.length : entries.length,
skipped: 0,
model: llmReview.model_used || 'unknown',
})
setStatus('done')
} else {
setStatus('ready')
}
} catch (e: unknown) {
setError(e instanceof Error ? e.message : String(e))
setStatus('error')
}
}
const runReview = useCallback(async () => {
if (!sessionId) return
setStatus('running')
setError('')
setChanges([])
setProgress(null)
setMeta(null)
setTotalDuration(0)
setActiveRowIndices(new Set())
setReviewedRows(new Set())
setSkippedRows(new Set())
setCorrectedMap(new Map())
try {
const res = await fetch(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/llm-review?stream=true`,
{ method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({}) },
)
if (!res.ok) {
const data = await res.json().catch(() => ({}))
throw new Error(data.detail || `HTTP ${res.status}`)
}
const reader = res.body!.getReader()
const decoder = new TextDecoder()
let buffer = ''
let allChanges: LlmChange[] = []
let allReviewed = new Set<number>()
let allSkipped = new Set<number>()
let cMap = new Map<number, LlmChange[]>()
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
while (buffer.includes('\n\n')) {
const idx = buffer.indexOf('\n\n')
const chunk = buffer.slice(0, idx).trim()
buffer = buffer.slice(idx + 2)
if (!chunk.startsWith('data: ')) continue
const dataStr = chunk.slice(6)
let event: any
try { event = JSON.parse(dataStr) } catch { continue }
if (event.type === 'meta') {
setMeta({
total_entries: event.total_entries,
to_review: event.to_review,
skipped: event.skipped,
model: event.model,
skipped_indices: event.skipped_indices,
})
// Mark skipped rows
if (event.skipped_indices) {
allSkipped = new Set(event.skipped_indices)
setSkippedRows(allSkipped)
}
}
if (event.type === 'batch') {
const batchChanges: LlmChange[] = event.changes || []
const batchRows: number[] = event.entries_reviewed || []
// Update active rows (currently being reviewed)
setActiveRowIndices(new Set(batchRows))
// Accumulate changes
allChanges = [...allChanges, ...batchChanges]
setChanges(allChanges)
setProgress(event.progress)
// Update corrected map
for (const c of batchChanges) {
const existing = cMap.get(c.row_index) || []
existing.push(c)
cMap.set(c.row_index, [...existing])
}
setCorrectedMap(new Map(cMap))
// Mark batch rows as reviewed
for (const r of batchRows) {
allReviewed.add(r)
}
setReviewedRows(new Set(allReviewed))
// Scroll to active row in table
setTimeout(() => {
activeRowRef.current?.scrollIntoView({ behavior: 'smooth', block: 'center' })
}, 50)
}
if (event.type === 'complete') {
setActiveRowIndices(new Set())
setTotalDuration(event.duration_ms)
setAccepted(new Set(allChanges.map((_: LlmChange, i: number) => i)))
// Mark all non-skipped as reviewed
const allEntryIndices = vocabEntries.map((_: WordEntry, i: number) => i)
for (const i of allEntryIndices) {
if (!allSkipped.has(i)) allReviewed.add(i)
}
setReviewedRows(new Set(allReviewed))
setStatus('done')
}
if (event.type === 'error') {
throw new Error(event.detail || 'Unbekannter Fehler')
}
}
}
// If stream ended without complete event
if (allChanges.length === 0) {
setStatus('done')
}
} catch (e: unknown) {
const msg = e instanceof Error ? e.message : String(e)
setError(msg)
setStatus('error')
}
}, [sessionId, vocabEntries])
const toggleChange = (index: number) => {
setAccepted(prev => {
const next = new Set(prev)
if (next.has(index)) next.delete(index)
else next.add(index)
return next
})
}
const toggleAll = () => {
if (accepted.size === changes.length) {
setAccepted(new Set())
} else {
setAccepted(new Set(changes.map((_: LlmChange, i: number) => i)))
}
}
const applyChanges = useCallback(async () => {
if (!sessionId) return
setApplying(true)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/llm-review/apply`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ accepted_indices: Array.from(accepted) }),
})
if (!res.ok) {
const data = await res.json().catch(() => ({}))
throw new Error(data.detail || `HTTP ${res.status}`)
}
setStatus('applied')
} catch (e: unknown) {
setError(e instanceof Error ? e.message : String(e))
} finally {
setApplying(false)
}
}, [sessionId, accepted])
const getRowStatus = (rowIndex: number): RowStatus => {
if (activeRowIndices.has(rowIndex)) return 'active'
if (skippedRows.has(rowIndex)) return 'skipped'
if (correctedMap.has(rowIndex)) return 'corrected'
if (reviewedRows.has(rowIndex)) return 'reviewed'
return 'pending'
}
const dewarpedUrl = sessionId
? `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/dewarped`
: ''
if (!sessionId) {
return <div className="text-center py-12 text-gray-400">Bitte zuerst eine Session auswaehlen.</div>
}
// --- Loading session data ---
if (status === 'loading' || status === 'idle') {
return (
<div className="flex items-center gap-3 justify-center py-12">
<div className="animate-spin rounded-full h-5 w-5 border-b-2 border-teal-500" />
<span className="text-gray-500">Session-Daten werden geladen...</span>
</div>
)
}
// --- Error ---
if (status === 'error') {
return (
<div className="flex flex-col items-center justify-center py-12 text-center">
<div className="text-5xl mb-4"></div>
<h3 className="text-lg font-medium text-red-600 dark:text-red-400 mb-2">Fehler bei OCR-Zeichenkorrektur</h3>
<p className="text-sm text-gray-500 dark:text-gray-400 max-w-lg mb-4">{error}</p>
<div className="flex gap-3">
<button onClick={() => { setError(''); loadSessionData() }}
className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm">
Erneut versuchen
</button>
<button onClick={onNext}
className="px-5 py-2 bg-gray-200 dark:bg-gray-700 text-gray-700 dark:text-gray-300 rounded-lg hover:bg-gray-300 dark:hover:bg-gray-600 transition-colors text-sm">
Ueberspringen
</button>
</div>
</div>
)
}
// --- Applied ---
if (status === 'applied') {
return (
<div className="flex flex-col items-center justify-center py-12 text-center">
<div className="text-5xl mb-4"></div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">Korrekturen uebernommen</h3>
<p className="text-sm text-gray-500 dark:text-gray-400 mb-6">
{accepted.size} von {changes.length} Korrekturen wurden angewendet.
</p>
<button onClick={onNext}
className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium">
Weiter
</button>
</div>
)
}
// Active entry for highlighting on image
const activeEntry = vocabEntries.find((_: WordEntry, i: number) => activeRowIndices.has(i))
const pct = progress ? Math.round((progress.current / progress.total) * 100) : 0
// --- Ready / Running / Done: 2-column layout ---
return (
<div className="space-y-4">
{/* Header */}
<div className="flex items-center justify-between">
<div>
<h3 className="text-base font-medium text-gray-700 dark:text-gray-300">
Schritt 6: Korrektur
</h3>
<p className="text-xs text-gray-400 mt-0.5">
{status === 'ready' && `${vocabEntries.length} Eintraege bereit zur Pruefung`}
{status === 'running' && meta && `${meta.model} · ${meta.to_review} zu pruefen, ${meta.skipped} uebersprungen`}
{status === 'done' && (
<>
{changes.length} Korrektur{changes.length !== 1 ? 'en' : ''} gefunden
{meta && <> · {meta.skipped} uebersprungen</>}
{' '}· {totalDuration}ms · {meta?.model}
</>
)}
</p>
</div>
<div className="flex items-center gap-2">
{status === 'ready' && (
<button onClick={runReview}
className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm font-medium">
Korrektur starten
</button>
)}
{status === 'running' && (
<div className="flex items-center gap-2 text-sm text-teal-600 dark:text-teal-400">
<div className="animate-spin rounded-full h-4 w-4 border-b-2 border-teal-500" />
{progress ? `${progress.current}/${progress.total}` : 'Startet...'}
</div>
)}
{status === 'done' && changes.length > 0 && (
<button onClick={toggleAll}
className="text-xs px-3 py-1.5 border border-gray-300 dark:border-gray-600 rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700 transition-colors text-gray-600 dark:text-gray-400">
{accepted.size === changes.length ? 'Keine' : 'Alle'} auswaehlen
</button>
)}
</div>
</div>
{/* Progress bar (while running) */}
{status === 'running' && progress && (
<div className="space-y-1">
<div className="flex justify-between text-xs text-gray-400">
<span>{progress.current} / {progress.total} Eintraege geprueft</span>
<span>{pct}%</span>
</div>
<div className="w-full bg-gray-200 dark:bg-gray-700 rounded-full h-2">
<div className="bg-teal-500 h-2 rounded-full transition-all duration-500" style={{ width: `${pct}%` }} />
</div>
</div>
)}
{/* 2-column layout: Image + Table */}
<div className="grid grid-cols-3 gap-4">
{/* Left: Dewarped Image with highlight overlay */}
<div className="col-span-1">
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Originalbild
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900 relative sticky top-4">
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={dewarpedUrl}
alt="Dewarped"
className="w-full h-auto"
onLoad={(e) => {
const img = e.target as HTMLImageElement
setImageNaturalSize({ w: img.naturalWidth, h: img.naturalHeight })
}}
/>
{/* Highlight overlay for active row */}
{activeEntry?.bbox && (
<div
className="absolute border-2 border-yellow-400 bg-yellow-400/20 pointer-events-none animate-pulse"
style={{
left: `${activeEntry.bbox.x}%`,
top: `${activeEntry.bbox.y}%`,
width: `${activeEntry.bbox.w}%`,
height: `${activeEntry.bbox.h}%`,
}}
/>
)}
</div>
</div>
{/* Right: Full vocabulary table */}
<div className="col-span-2" ref={tableRef}>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Vokabeltabelle ({vocabEntries.length} Eintraege)
</div>
<div className="border border-gray-200 dark:border-gray-700 rounded-lg overflow-hidden">
<div className="max-h-[70vh] overflow-y-auto">
<table className="w-full text-sm">
<thead className="sticky top-0 z-10">
<tr className="bg-gray-50 dark:bg-gray-800 border-b border-gray-200 dark:border-gray-700">
<th className="px-2 py-2 text-left text-gray-500 dark:text-gray-400 font-medium w-10">#</th>
{columnsUsed.length > 0 ? (
columnsUsed.map((col, i) => {
const field = COL_TYPE_TO_FIELD[col.type]
if (!field) return null
return (
<th key={i} className={`px-2 py-2 text-left font-medium ${COL_TYPE_COLOR[col.type] || 'text-gray-500 dark:text-gray-400'}`}>
{FIELD_LABELS[field] || field}
</th>
)
})
) : (
<>
<th className="px-2 py-2 text-left text-gray-500 dark:text-gray-400 font-medium">EN</th>
<th className="px-2 py-2 text-left text-gray-500 dark:text-gray-400 font-medium">DE</th>
<th className="px-2 py-2 text-left text-gray-500 dark:text-gray-400 font-medium">Beispiel</th>
</>
)}
<th className="px-2 py-2 text-center text-gray-500 dark:text-gray-400 font-medium w-16">Status</th>
</tr>
</thead>
<tbody>
{vocabEntries.map((entry, idx) => {
const rowStatus = getRowStatus(idx)
const rowChanges = correctedMap.get(idx)
const rowBg = {
pending: '',
active: 'bg-yellow-50 dark:bg-yellow-900/20',
reviewed: '',
corrected: 'bg-teal-50/50 dark:bg-teal-900/10',
skipped: 'bg-gray-50 dark:bg-gray-800/50',
}[rowStatus]
return (
<tr
key={idx}
ref={rowStatus === 'active' ? activeRowRef : undefined}
className={`border-b border-gray-100 dark:border-gray-700/50 ${rowBg} ${
rowStatus === 'active' ? 'ring-1 ring-yellow-400 ring-inset' : ''
}`}
>
<td className="px-2 py-1.5 text-gray-400 font-mono text-xs">{idx}</td>
{columnsUsed.length > 0 ? (
columnsUsed.map((col, i) => {
const field = COL_TYPE_TO_FIELD[col.type]
if (!field) return null
const text = (entry as Record<string, unknown>)[field] as string || ''
return (
<td key={i} className="px-2 py-1.5 text-xs">
<CellContent text={text} field={field} rowChanges={rowChanges} />
</td>
)
})
) : (
<>
<td className="px-2 py-1.5">
<CellContent text={entry.english} field="english" rowChanges={rowChanges} />
</td>
<td className="px-2 py-1.5">
<CellContent text={entry.german} field="german" rowChanges={rowChanges} />
</td>
<td className="px-2 py-1.5 text-xs">
<CellContent text={entry.example} field="example" rowChanges={rowChanges} />
</td>
</>
)}
<td className="px-2 py-1.5 text-center">
<StatusIcon status={rowStatus} />
</td>
</tr>
)
})}
</tbody>
</table>
</div>
</div>
</div>
</div>
{/* Done state: summary + actions */}
{status === 'done' && (
<div className="space-y-4">
{/* Summary */}
<div className="bg-gray-50 dark:bg-gray-800/50 rounded-lg p-3 text-xs text-gray-500 dark:text-gray-400">
{changes.length === 0 ? (
<span>Keine Korrekturen noetig alle Eintraege sind korrekt.</span>
) : (
<span>
{changes.length} Korrektur{changes.length !== 1 ? 'en' : ''} gefunden ·{' '}
{accepted.size} ausgewaehlt ·{' '}
{meta?.skipped || 0} uebersprungen (Lautschrift) ·{' '}
{totalDuration}ms
</span>
)}
</div>
{/* Corrections detail list (if any) */}
{changes.length > 0 && (
<div className="border border-gray-200 dark:border-gray-700 rounded-lg overflow-hidden">
<div className="bg-gray-50 dark:bg-gray-800 px-3 py-2 border-b border-gray-200 dark:border-gray-700">
<span className="text-xs font-medium text-gray-600 dark:text-gray-400">
Korrekturvorschlaege ({accepted.size}/{changes.length} ausgewaehlt)
</span>
</div>
<table className="w-full text-sm">
<thead>
<tr className="bg-gray-50/50 dark:bg-gray-800/50 border-b border-gray-200 dark:border-gray-700">
<th className="w-10 px-3 py-1.5 text-center">
<input type="checkbox" checked={accepted.size === changes.length} onChange={toggleAll}
className="rounded border-gray-300 dark:border-gray-600" />
</th>
<th className="px-2 py-1.5 text-left text-gray-500 dark:text-gray-400 font-medium text-xs">Zeile</th>
<th className="px-2 py-1.5 text-left text-gray-500 dark:text-gray-400 font-medium text-xs">Feld</th>
<th className="px-2 py-1.5 text-left text-gray-500 dark:text-gray-400 font-medium text-xs">Vorher</th>
<th className="px-2 py-1.5 text-left text-gray-500 dark:text-gray-400 font-medium text-xs">Nachher</th>
</tr>
</thead>
<tbody>
{changes.map((change, idx) => (
<tr key={idx} className={`border-b border-gray-100 dark:border-gray-700/50 ${
accepted.has(idx) ? 'bg-teal-50/50 dark:bg-teal-900/10' : ''
}`}>
<td className="px-3 py-1.5 text-center">
<input type="checkbox" checked={accepted.has(idx)} onChange={() => toggleChange(idx)}
className="rounded border-gray-300 dark:border-gray-600" />
</td>
<td className="px-2 py-1.5 text-gray-500 dark:text-gray-400 font-mono text-xs">R{change.row_index}</td>
<td className="px-2 py-1.5">
<span className="text-xs px-1.5 py-0.5 rounded bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-400">
{FIELD_LABELS[change.field] || change.field}
</span>
</td>
<td className="px-2 py-1.5"><span className="line-through text-red-500 dark:text-red-400 text-xs">{change.old}</span></td>
<td className="px-2 py-1.5"><span className="text-green-600 dark:text-green-400 font-medium text-xs">{change.new}</span></td>
</tr>
))}
</tbody>
</table>
</div>
)}
{/* Actions */}
<div className="flex items-center justify-between pt-2">
<p className="text-xs text-gray-400">
{changes.length > 0 ? `${accepted.size} von ${changes.length} ausgewaehlt` : ''}
</p>
<div className="flex gap-3">
{changes.length > 0 && (
<button onClick={onNext}
className="px-4 py-2 text-sm border border-gray-300 dark:border-gray-600 rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700 transition-colors text-gray-600 dark:text-gray-400">
Alle ablehnen
</button>
)}
{changes.length > 0 ? (
<button onClick={applyChanges} disabled={applying || accepted.size === 0}
className="px-5 py-2 text-sm bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors font-medium">
{applying ? 'Wird uebernommen...' : `${accepted.size} Korrektur${accepted.size !== 1 ? 'en' : ''} uebernehmen`}
</button>
) : (
<button onClick={onNext}
className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium">
Weiter
</button>
)}
</div>
</div>
</div>
)}
</div>
)
}
/** Cell content with inline diff for corrections */
function CellContent({ text, field, rowChanges }: {
text: string
field: string
rowChanges?: LlmChange[]
}) {
const change = rowChanges?.find(c => c.field === field)
if (!text && !change) {
return <span className="text-gray-300 dark:text-gray-600">&mdash;</span>
}
if (change) {
return (
<span>
<span className="line-through text-red-400 dark:text-red-500 text-xs mr-1">{change.old}</span>
<span className="text-green-600 dark:text-green-400 font-medium text-xs">{change.new}</span>
</span>
)
}
return <span className="text-gray-700 dark:text-gray-300 text-xs">{text}</span>
}
/** Status icon for each row */
function StatusIcon({ status }: { status: RowStatus }) {
switch (status) {
case 'pending':
return <span className="text-gray-300 dark:text-gray-600 text-xs"></span>
case 'active':
return (
<span className="inline-block w-3 h-3 rounded-full bg-yellow-400 animate-pulse" title="Wird geprueft" />
)
case 'reviewed':
return (
<svg className="w-4 h-4 text-green-500 inline-block" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M5 13l4 4L19 7" />
</svg>
)
case 'corrected':
return (
<span className="inline-flex items-center px-1.5 py-0.5 rounded text-[10px] font-medium bg-teal-100 dark:bg-teal-900/30 text-teal-700 dark:text-teal-400">
korr.
</span>
)
case 'skipped':
return (
<span className="inline-flex items-center px-1.5 py-0.5 rounded text-[10px] font-medium bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400">
skip
</span>
)
}
}

View File

@@ -1,559 +0,0 @@
'use client'
import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
import dynamic from 'next/dynamic'
import type { GridResult, GridCell, WordEntry } from '@/app/(admin)/ai/ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
// Lazy-load Fabric.js canvas editor (SSR-incompatible)
const FabricReconstructionCanvas = dynamic(
() => import('./FabricReconstructionCanvas').then(m => ({ default: m.FabricReconstructionCanvas })),
{ ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Editor wird geladen...</div> }
)
type EditorMode = 'simple' | 'editor'
interface StepReconstructionProps {
sessionId: string | null
onNext: () => void
}
interface EditableCell {
cellId: string
text: string
originalText: string
bboxPct: { x: number; y: number; w: number; h: number }
colType: string
rowIndex: number
colIndex: number
}
type UndoAction = { cellId: string; oldText: string; newText: string }
export function StepReconstruction({ sessionId, onNext }: StepReconstructionProps) {
const [status, setStatus] = useState<'loading' | 'ready' | 'saving' | 'saved' | 'error'>('loading')
const [error, setError] = useState('')
const [cells, setCells] = useState<EditableCell[]>([])
const [gridCells, setGridCells] = useState<GridCell[]>([])
const [editorMode, setEditorMode] = useState<EditorMode>('simple')
const [editedTexts, setEditedTexts] = useState<Map<string, string>>(new Map())
const [zoom, setZoom] = useState(100)
const [imageNaturalH, setImageNaturalH] = useState(0)
const [showEmptyHighlight, setShowEmptyHighlight] = useState(true)
// Undo/Redo stacks
const [undoStack, setUndoStack] = useState<UndoAction[]>([])
const [redoStack, setRedoStack] = useState<UndoAction[]>([])
// (allCells removed — cells now contains all cells including empty ones)
const containerRef = useRef<HTMLDivElement>(null)
const imageRef = useRef<HTMLImageElement>(null)
// Load session data on mount
useEffect(() => {
if (!sessionId) return
loadSessionData()
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId])
// Track image natural height for font scaling
const handleImageLoad = useCallback(() => {
if (imageRef.current) {
setImageNaturalH(imageRef.current.naturalHeight)
}
}, [])
const loadSessionData = async () => {
if (!sessionId) return
setStatus('loading')
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
if (!res.ok) throw new Error(`HTTP ${res.status}`)
const data = await res.json()
const wordResult: GridResult | undefined = data.word_result
if (!wordResult) {
setError('Keine Worterkennungsdaten gefunden. Bitte zuerst Schritt 5 abschliessen.')
setStatus('error')
return
}
// Build editable cells from grid cells
const rawGridCells: GridCell[] = wordResult.cells || []
setGridCells(rawGridCells)
const allEditableCells: EditableCell[] = rawGridCells.map(c => ({
cellId: c.cell_id,
text: c.text,
originalText: c.text,
bboxPct: c.bbox_pct,
colType: c.col_type,
rowIndex: c.row_index,
colIndex: c.col_index,
}))
setCells(allEditableCells)
setEditedTexts(new Map())
setUndoStack([])
setRedoStack([])
setStatus('ready')
} catch (e: unknown) {
setError(e instanceof Error ? e.message : String(e))
setStatus('error')
}
}
const handleTextChange = useCallback((cellId: string, newText: string) => {
setEditedTexts(prev => {
const oldText = prev.get(cellId)
const cell = cells.find(c => c.cellId === cellId)
const prevText = oldText ?? cell?.text ?? ''
// Push to undo stack
setUndoStack(stack => [...stack, { cellId, oldText: prevText, newText }])
setRedoStack([]) // Clear redo on new edit
const next = new Map(prev)
next.set(cellId, newText)
return next
})
}, [cells])
const undo = useCallback(() => {
setUndoStack(stack => {
if (stack.length === 0) return stack
const action = stack[stack.length - 1]
const newStack = stack.slice(0, -1)
setRedoStack(rs => [...rs, action])
setEditedTexts(prev => {
const next = new Map(prev)
next.set(action.cellId, action.oldText)
return next
})
return newStack
})
}, [])
const redo = useCallback(() => {
setRedoStack(stack => {
if (stack.length === 0) return stack
const action = stack[stack.length - 1]
const newStack = stack.slice(0, -1)
setUndoStack(us => [...us, action])
setEditedTexts(prev => {
const next = new Map(prev)
next.set(action.cellId, action.newText)
return next
})
return newStack
})
}, [])
const resetCell = useCallback((cellId: string) => {
const cell = cells.find(c => c.cellId === cellId)
if (!cell) return
setEditedTexts(prev => {
const next = new Map(prev)
next.delete(cellId)
return next
})
}, [cells])
// Global keyboard shortcuts for undo/redo
useEffect(() => {
const handler = (e: KeyboardEvent) => {
if ((e.metaKey || e.ctrlKey) && e.key === 'z') {
e.preventDefault()
if (e.shiftKey) {
redo()
} else {
undo()
}
}
}
document.addEventListener('keydown', handler)
return () => document.removeEventListener('keydown', handler)
}, [undo, redo])
const getDisplayText = useCallback((cell: EditableCell): string => {
return editedTexts.get(cell.cellId) ?? cell.text
}, [editedTexts])
const isEdited = useCallback((cell: EditableCell): boolean => {
const edited = editedTexts.get(cell.cellId)
return edited !== undefined && edited !== cell.originalText
}, [editedTexts])
const changedCount = useMemo(() => {
let count = 0
for (const cell of cells) {
if (isEdited(cell)) count++
}
return count
}, [cells, isEdited])
// Identify empty required cells (EN or DE columns with no text)
const emptyCellIds = useMemo(() => {
const required = new Set(['column_en', 'column_de'])
const ids = new Set<string>()
for (const cell of cells) {
if (required.has(cell.colType) && !cell.text.trim()) {
ids.add(cell.cellId)
}
}
return ids
}, [cells])
// Sort cells for tab navigation: by row, then by column
const sortedCellIds = useMemo(() => {
return [...cells]
.sort((a, b) => a.rowIndex !== b.rowIndex ? a.rowIndex - b.rowIndex : a.colIndex - b.colIndex)
.map(c => c.cellId)
}, [cells])
const handleKeyDown = useCallback((e: React.KeyboardEvent, cellId: string) => {
if (e.key === 'Tab') {
e.preventDefault()
const idx = sortedCellIds.indexOf(cellId)
const nextIdx = e.shiftKey ? idx - 1 : idx + 1
if (nextIdx >= 0 && nextIdx < sortedCellIds.length) {
const nextId = sortedCellIds[nextIdx]
const el = document.getElementById(`cell-${nextId}`)
el?.focus()
}
}
}, [sortedCellIds])
const saveReconstruction = useCallback(async () => {
if (!sessionId) return
setStatus('saving')
try {
const cellUpdates = Array.from(editedTexts.entries())
.filter(([cellId, text]) => {
const cell = cells.find(c => c.cellId === cellId)
return cell && text !== cell.originalText
})
.map(([cellId, text]) => ({ cell_id: cellId, text }))
if (cellUpdates.length === 0) {
// Nothing changed, just advance
setStatus('saved')
return
}
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reconstruction`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ cells: cellUpdates }),
})
if (!res.ok) {
const data = await res.json().catch(() => ({}))
throw new Error(data.detail || `HTTP ${res.status}`)
}
setStatus('saved')
} catch (e: unknown) {
setError(e instanceof Error ? e.message : String(e))
setStatus('error')
}
}, [sessionId, editedTexts, cells])
// Handler for Fabric.js editor cell changes
const handleFabricCellsChanged = useCallback((updates: { cell_id: string; text: string }[]) => {
for (const u of updates) {
setEditedTexts(prev => {
const next = new Map(prev)
next.set(u.cell_id, u.text)
return next
})
}
}, [])
const dewarpedUrl = sessionId
? `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/dewarped`
: ''
const colTypeColor = (colType: string): string => {
const colors: Record<string, string> = {
column_en: 'border-blue-400/40 focus:border-blue-500',
column_de: 'border-green-400/40 focus:border-green-500',
column_example: 'border-orange-400/40 focus:border-orange-500',
column_text: 'border-purple-400/40 focus:border-purple-500',
page_ref: 'border-cyan-400/40 focus:border-cyan-500',
column_marker: 'border-gray-400/40 focus:border-gray-500',
}
return colors[colType] || 'border-gray-400/40 focus:border-gray-500'
}
// Font size based on image natural height (not container) scaled by zoom
const getFontSize = useCallback((bboxH: number): number => {
const baseH = imageNaturalH || 800
const px = (bboxH / 100) * baseH * 0.55
return Math.max(8, Math.min(18, px * (zoom / 100)))
}, [imageNaturalH, zoom])
if (!sessionId) {
return <div className="text-center py-12 text-gray-400">Bitte zuerst eine Session auswaehlen.</div>
}
if (status === 'loading') {
return (
<div className="flex items-center gap-3 justify-center py-12">
<div className="animate-spin rounded-full h-5 w-5 border-b-2 border-teal-500" />
<span className="text-gray-500">Rekonstruktionsdaten werden geladen...</span>
</div>
)
}
if (status === 'error') {
return (
<div className="flex flex-col items-center justify-center py-12 text-center">
<div className="text-5xl mb-4">&#x26A0;&#xFE0F;</div>
<h3 className="text-lg font-medium text-red-600 dark:text-red-400 mb-2">Fehler</h3>
<p className="text-sm text-gray-500 dark:text-gray-400 max-w-lg mb-4">{error}</p>
<div className="flex gap-3">
<button onClick={() => { setError(''); loadSessionData() }}
className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm">
Erneut versuchen
</button>
<button onClick={onNext}
className="px-5 py-2 bg-gray-200 dark:bg-gray-700 text-gray-700 dark:text-gray-300 rounded-lg hover:bg-gray-300 dark:hover:bg-gray-600 transition-colors text-sm">
Ueberspringen &rarr;
</button>
</div>
</div>
)
}
if (status === 'saved') {
return (
<div className="flex flex-col items-center justify-center py-12 text-center">
<div className="text-5xl mb-4">&#x2705;</div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">Rekonstruktion gespeichert</h3>
<p className="text-sm text-gray-500 dark:text-gray-400 mb-6">
{changedCount > 0 ? `${changedCount} Zellen wurden aktualisiert.` : 'Keine Aenderungen vorgenommen.'}
</p>
<button onClick={onNext}
className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium">
Weiter &rarr;
</button>
</div>
)
}
return (
<div className="space-y-3">
{/* Toolbar */}
<div className="flex items-center justify-between bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 px-3 py-2">
<div className="flex items-center gap-2">
<h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Schritt 7: Rekonstruktion
</h3>
{/* Mode toggle */}
<div className="flex items-center ml-2 border border-gray-300 dark:border-gray-600 rounded overflow-hidden text-xs">
<button
onClick={() => setEditorMode('simple')}
className={`px-2 py-0.5 transition-colors ${
editorMode === 'simple'
? 'bg-teal-600 text-white'
: 'hover:bg-gray-50 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}
>
Einfach
</button>
<button
onClick={() => setEditorMode('editor')}
className={`px-2 py-0.5 transition-colors ${
editorMode === 'editor'
? 'bg-teal-600 text-white'
: 'hover:bg-gray-50 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}
>
Editor
</button>
</div>
<span className="text-xs text-gray-400">
{cells.length} Zellen &middot; {changedCount} geaendert
{emptyCellIds.size > 0 && showEmptyHighlight && (
<span className="text-red-400 ml-1">&middot; {emptyCellIds.size} leer</span>
)}
</span>
</div>
<div className="flex items-center gap-2">
{/* Undo/Redo */}
<button
onClick={undo}
disabled={undoStack.length === 0}
className="px-2 py-1 text-xs border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700 disabled:opacity-30"
title="Rueckgaengig (Ctrl+Z)"
>
&#x21A9;
</button>
<button
onClick={redo}
disabled={redoStack.length === 0}
className="px-2 py-1 text-xs border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700 disabled:opacity-30"
title="Wiederholen (Ctrl+Shift+Z)"
>
&#x21AA;
</button>
<div className="w-px h-5 bg-gray-300 dark:bg-gray-600 mx-1" />
{/* Empty field toggle */}
<button
onClick={() => setShowEmptyHighlight(v => !v)}
className={`px-2 py-1 text-xs border rounded transition-colors ${
showEmptyHighlight
? 'border-red-300 bg-red-50 text-red-600 dark:border-red-700 dark:bg-red-900/30 dark:text-red-400'
: 'border-gray-300 dark:border-gray-600 hover:bg-gray-50 dark:hover:bg-gray-700'
}`}
title="Leere Pflichtfelder markieren"
>
Leer
</button>
<div className="w-px h-5 bg-gray-300 dark:bg-gray-600 mx-1" />
{/* Zoom controls */}
<button
onClick={() => setZoom(z => Math.max(50, z - 25))}
className="px-2 py-1 text-xs border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700"
>
&minus;
</button>
<span className="text-xs text-gray-500 w-10 text-center">{zoom}%</span>
<button
onClick={() => setZoom(z => Math.min(200, z + 25))}
className="px-2 py-1 text-xs border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700"
>
+
</button>
<button
onClick={() => setZoom(100)}
className="px-2 py-1 text-xs border border-gray-300 dark:border-gray-600 rounded hover:bg-gray-50 dark:hover:bg-gray-700"
>
Fit
</button>
<div className="w-px h-5 bg-gray-300 dark:bg-gray-600 mx-1" />
<button
onClick={saveReconstruction}
disabled={status === 'saving'}
className="px-4 py-1.5 text-xs bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50 transition-colors font-medium"
>
{status === 'saving' ? 'Speichert...' : 'Speichern'}
</button>
</div>
</div>
{/* Reconstruction canvas — Simple or Editor mode */}
{editorMode === 'editor' && sessionId ? (
<FabricReconstructionCanvas
sessionId={sessionId}
cells={gridCells}
onCellsChanged={handleFabricCellsChanged}
/>
) : (
<div className="border rounded-lg overflow-auto dark:border-gray-700 bg-gray-100 dark:bg-gray-900" style={{ maxHeight: '75vh' }}>
<div
ref={containerRef}
className="relative inline-block"
style={{ transform: `scale(${zoom / 100})`, transformOrigin: 'top left' }}
>
{/* Background image at reduced opacity */}
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
ref={imageRef}
src={dewarpedUrl}
alt="Dewarped"
className="block"
style={{ opacity: 0.3 }}
onLoad={handleImageLoad}
/>
{/* Empty field markers */}
{showEmptyHighlight && cells
.filter(c => emptyCellIds.has(c.cellId))
.map(cell => (
<div
key={`empty-${cell.cellId}`}
className="absolute border-2 border-dashed border-red-400/60 rounded pointer-events-none"
style={{
left: `${cell.bboxPct.x}%`,
top: `${cell.bboxPct.y}%`,
width: `${cell.bboxPct.w}%`,
height: `${cell.bboxPct.h}%`,
}}
/>
))}
{/* Editable text fields at bbox positions */}
{cells.map((cell) => {
const displayText = getDisplayText(cell)
const edited = isEdited(cell)
return (
<div key={cell.cellId} className="absolute group" style={{
left: `${cell.bboxPct.x}%`,
top: `${cell.bboxPct.y}%`,
width: `${cell.bboxPct.w}%`,
height: `${cell.bboxPct.h}%`,
}}>
<input
id={`cell-${cell.cellId}`}
type="text"
value={displayText}
onChange={(e) => handleTextChange(cell.cellId, e.target.value)}
onKeyDown={(e) => handleKeyDown(e, cell.cellId)}
className={`w-full h-full bg-transparent text-black dark:text-white border px-0.5 outline-none transition-colors ${
colTypeColor(cell.colType)
} ${edited ? 'border-green-500 bg-green-50/30 dark:bg-green-900/20' : ''}`}
style={{
fontSize: `${getFontSize(cell.bboxPct.h)}px`,
lineHeight: '1',
}}
title={`${cell.cellId} (${cell.colType})`}
/>
{/* Per-cell reset button (X) — only shown for edited cells on hover */}
{edited && (
<button
onClick={() => resetCell(cell.cellId)}
className="absolute -top-1 -right-1 w-4 h-4 bg-red-500 text-white rounded-full text-[9px] leading-none opacity-0 group-hover:opacity-100 transition-opacity flex items-center justify-center"
title="Zuruecksetzen"
>
&times;
</button>
)}
</div>
)
})}
</div>
</div>
)}
{/* Bottom action */}
<div className="flex justify-end">
<button
onClick={() => {
if (changedCount > 0) {
saveReconstruction()
} else {
onNext()
}
}}
className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium text-sm"
>
{changedCount > 0 ? 'Speichern & Weiter \u2192' : 'Weiter \u2192'}
</button>
</div>
</div>
)
}

View File

@@ -1,263 +0,0 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
import type { RowResult, RowGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
interface StepRowDetectionProps {
sessionId: string | null
onNext: () => void
}
export function StepRowDetection({ sessionId, onNext }: StepRowDetectionProps) {
const [rowResult, setRowResult] = useState<RowResult | null>(null)
const [detecting, setDetecting] = useState(false)
const [error, setError] = useState<string | null>(null)
const [gtNotes, setGtNotes] = useState('')
const [gtSaved, setGtSaved] = useState(false)
useEffect(() => {
if (!sessionId) return
const fetchSession = async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
if (res.ok) {
const info = await res.json()
if (info.row_result) {
setRowResult(info.row_result)
return
}
}
} catch (e) {
console.error('Failed to fetch session info:', e)
}
// No cached result — run auto
runAutoDetection()
}
fetchSession()
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId])
const runAutoDetection = useCallback(async () => {
if (!sessionId) return
setDetecting(true)
setError(null)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/rows`, {
method: 'POST',
})
if (!res.ok) {
const err = await res.json().catch(() => ({ detail: res.statusText }))
throw new Error(err.detail || 'Zeilenerkennung fehlgeschlagen')
}
const data: RowResult = await res.json()
setRowResult(data)
} catch (e) {
setError(e instanceof Error ? e.message : 'Unbekannter Fehler')
} finally {
setDetecting(false)
}
}, [sessionId])
const handleGroundTruth = useCallback(async (isCorrect: boolean) => {
if (!sessionId) return
const gt: RowGroundTruth = {
is_correct: isCorrect,
notes: gtNotes || undefined,
}
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/ground-truth/rows`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(gt),
})
setGtSaved(true)
} catch (e) {
console.error('Ground truth save failed:', e)
}
}, [sessionId, gtNotes])
if (!sessionId) {
return (
<div className="flex flex-col items-center justify-center py-16 text-center">
<div className="text-5xl mb-4">📏</div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">
Schritt 4: Zeilenerkennung
</h3>
<p className="text-gray-500 dark:text-gray-400 max-w-md">
Bitte zuerst Schritte 1-3 abschliessen.
</p>
</div>
)
}
const overlayUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/rows-overlay`
const dewarpedUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/dewarped`
const rowTypeColors: Record<string, string> = {
header: 'bg-gray-200 dark:bg-gray-600 text-gray-700 dark:text-gray-300',
content: 'bg-blue-100 dark:bg-blue-900/30 text-blue-700 dark:text-blue-300',
footer: 'bg-gray-200 dark:bg-gray-600 text-gray-700 dark:text-gray-300',
}
return (
<div className="space-y-4">
{/* Loading */}
{detecting && (
<div className="flex items-center gap-2 text-teal-600 dark:text-teal-400 text-sm">
<div className="animate-spin w-4 h-4 border-2 border-teal-500 border-t-transparent rounded-full" />
Zeilenerkennung laeuft...
</div>
)}
{/* Images: overlay vs clean */}
<div className="grid grid-cols-2 gap-4">
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Mit Zeilen-Overlay
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{rowResult ? (
// eslint-disable-next-line @next/next/no-img-element
<img
src={`${overlayUrl}?t=${Date.now()}`}
alt="Zeilen-Overlay"
className="w-full h-auto"
/>
) : (
<div className="aspect-[3/4] flex items-center justify-center text-gray-400 text-sm">
{detecting ? 'Erkenne Zeilen...' : 'Keine Daten'}
</div>
)}
</div>
</div>
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Entzerrtes Bild
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={dewarpedUrl}
alt="Entzerrt"
className="w-full h-auto"
/>
</div>
</div>
</div>
{/* Row summary */}
{rowResult && (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-3">
<div className="flex items-center justify-between">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Ergebnis: {rowResult.total_rows} Zeilen erkannt
</h4>
<span className="text-xs text-gray-400">
{rowResult.duration_seconds}s
</span>
</div>
{/* Type summary badges */}
<div className="flex gap-2">
{Object.entries(rowResult.summary).map(([type, count]) => (
<span
key={type}
className={`px-2 py-0.5 rounded text-xs font-medium ${rowTypeColors[type] || 'bg-gray-100 text-gray-600'}`}
>
{type}: {count}
</span>
))}
</div>
{/* Row list */}
<div className="max-h-64 overflow-y-auto space-y-1">
{rowResult.rows.map((row) => (
<div
key={row.index}
className={`flex items-center gap-3 px-3 py-1.5 rounded text-xs font-mono ${
row.row_type === 'header' || row.row_type === 'footer'
? 'bg-gray-50 dark:bg-gray-700/50 text-gray-500'
: 'text-gray-600 dark:text-gray-400'
}`}
>
<span className="w-8 text-right text-gray-400">R{row.index}</span>
<span className={`px-1.5 py-0.5 rounded text-[10px] uppercase font-semibold ${rowTypeColors[row.row_type] || ''}`}>
{row.row_type}
</span>
<span>y={row.y}</span>
<span>h={row.height}px</span>
<span>{row.word_count} Woerter</span>
{row.gap_before > 0 && (
<span className="text-gray-400">gap={row.gap_before}px</span>
)}
</div>
))}
</div>
</div>
)}
{/* Controls */}
{rowResult && (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-3">
<div className="flex items-center gap-3">
<button
onClick={() => runAutoDetection()}
disabled={detecting}
className="px-3 py-1.5 text-xs border rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700 dark:border-gray-600 disabled:opacity-50"
>
Erneut erkennen
</button>
<div className="flex-1" />
{/* Ground truth */}
{!gtSaved ? (
<>
<input
type="text"
placeholder="Notizen (optional)"
value={gtNotes}
onChange={(e) => setGtNotes(e.target.value)}
className="px-2 py-1 text-xs border rounded dark:bg-gray-700 dark:border-gray-600 w-48"
/>
<button
onClick={() => handleGroundTruth(true)}
className="px-3 py-1.5 text-xs bg-green-600 text-white rounded-lg hover:bg-green-700"
>
Korrekt
</button>
<button
onClick={() => handleGroundTruth(false)}
className="px-3 py-1.5 text-xs bg-red-600 text-white rounded-lg hover:bg-red-700"
>
Fehlerhaft
</button>
</>
) : (
<span className="text-xs text-green-600 dark:text-green-400">
Ground Truth gespeichert
</span>
)}
<button
onClick={onNext}
className="px-4 py-1.5 text-xs bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium"
>
Weiter
</button>
</div>
</div>
)}
{error && (
<div className="p-3 bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 rounded-lg text-sm">
{error}
</div>
)}
</div>
)
}

View File

@@ -1,911 +0,0 @@
'use client'
import { useCallback, useEffect, useRef, useState } from 'react'
import type { GridResult, GridCell, WordEntry, WordGroundTruth } from '@/app/(admin)/ai/ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
/** Render text with \n as line breaks */
function MultilineText({ text }: { text: string }) {
if (!text) return <span className="text-gray-300 dark:text-gray-600">&mdash;</span>
const lines = text.split('\n')
if (lines.length === 1) return <>{text}</>
return <>{lines.map((line, i) => (
<span key={i}>{line}{i < lines.length - 1 && <br />}</span>
))}</>
}
/** Column type → human-readable header */
function colTypeLabel(colType: string): string {
const labels: Record<string, string> = {
column_en: 'English',
column_de: 'Deutsch',
column_example: 'Example',
column_text: 'Text',
column_marker: 'Marker',
page_ref: 'Seite',
}
return labels[colType] || colType.replace('column_', '')
}
/** Column type → color class */
function colTypeColor(colType: string): string {
const colors: Record<string, string> = {
column_en: 'text-blue-600 dark:text-blue-400',
column_de: 'text-green-600 dark:text-green-400',
column_example: 'text-orange-600 dark:text-orange-400',
column_text: 'text-purple-600 dark:text-purple-400',
column_marker: 'text-gray-500 dark:text-gray-400',
}
return colors[colType] || 'text-gray-600 dark:text-gray-400'
}
interface StepWordRecognitionProps {
sessionId: string | null
onNext: () => void
goToStep: (step: number) => void
}
export function StepWordRecognition({ sessionId, onNext, goToStep }: StepWordRecognitionProps) {
const [gridResult, setGridResult] = useState<GridResult | null>(null)
const [detecting, setDetecting] = useState(false)
const [error, setError] = useState<string | null>(null)
const [gtNotes, setGtNotes] = useState('')
const [gtSaved, setGtSaved] = useState(false)
// Step-through labeling state
const [activeIndex, setActiveIndex] = useState(0)
const [editedEntries, setEditedEntries] = useState<WordEntry[]>([])
const [editedCells, setEditedCells] = useState<GridCell[]>([])
const [mode, setMode] = useState<'overview' | 'labeling'>('overview')
const [ocrEngine, setOcrEngine] = useState<'auto' | 'tesseract' | 'rapid'>('auto')
const [usedEngine, setUsedEngine] = useState<string>('')
const [pronunciation, setPronunciation] = useState<'british' | 'american'>('british')
// Streaming progress state
const [streamProgress, setStreamProgress] = useState<{ current: number; total: number } | null>(null)
const enRef = useRef<HTMLInputElement>(null)
const tableEndRef = useRef<HTMLDivElement>(null)
const isVocab = gridResult?.layout === 'vocab'
useEffect(() => {
if (!sessionId) return
// Always run fresh detection — word-lookup is fast (~0.03s)
// and avoids stale cached results from previous pipeline versions.
runAutoDetection()
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId])
const applyGridResult = (data: GridResult) => {
setGridResult(data)
setUsedEngine(data.ocr_engine || '')
if (data.layout === 'vocab' && data.entries) {
initEntries(data.entries)
}
if (data.cells) {
setEditedCells(data.cells.map(c => ({ ...c, status: c.status || 'pending' })))
}
}
const initEntries = (entries: WordEntry[]) => {
setEditedEntries(entries.map(e => ({ ...e, status: e.status || 'pending' })))
setActiveIndex(0)
}
const runAutoDetection = useCallback(async (engine?: string) => {
if (!sessionId) return
const eng = engine || ocrEngine
setDetecting(true)
setError(null)
setStreamProgress(null)
setEditedCells([])
setEditedEntries([])
setGridResult(null)
try {
// Retry once if initial request fails (e.g. after container restart,
// session cache may not be warm yet when navigating via wizard)
let res: Response | null = null
for (let attempt = 0; attempt < 2; attempt++) {
res = await fetch(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/words?stream=true&engine=${eng}&pronunciation=${pronunciation}`,
{ method: 'POST' },
)
if (res.ok) break
if (attempt === 0 && (res.status === 400 || res.status === 404)) {
// Wait briefly for cache to warm up, then retry
await new Promise(r => setTimeout(r, 2000))
continue
}
break
}
if (!res || !res.ok) {
const err = await res?.json().catch(() => ({ detail: res?.statusText })) || { detail: 'Worterkennung fehlgeschlagen' }
throw new Error(err.detail || 'Worterkennung fehlgeschlagen')
}
const reader = res.body!.getReader()
const decoder = new TextDecoder()
let buffer = ''
let streamLayout: string | null = null
let streamColumnsUsed: GridResult['columns_used'] = []
let streamGridShape: GridResult['grid_shape'] | null = null
let streamCells: GridCell[] = []
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
// Parse SSE events (separated by \n\n)
while (buffer.includes('\n\n')) {
const idx = buffer.indexOf('\n\n')
const chunk = buffer.slice(0, idx).trim()
buffer = buffer.slice(idx + 2)
if (!chunk.startsWith('data: ')) continue
const dataStr = chunk.slice(6) // strip "data: "
let event: any
try {
event = JSON.parse(dataStr)
} catch {
continue
}
if (event.type === 'meta') {
streamLayout = event.layout || 'generic'
streamGridShape = event.grid_shape || null
// Show partial grid result so UI renders structure
setGridResult(prev => ({
...prev,
layout: event.layout || 'generic',
grid_shape: event.grid_shape,
columns_used: [],
cells: [],
summary: { total_cells: event.grid_shape?.total_cells || 0, non_empty_cells: 0, low_confidence: 0 },
duration_seconds: 0,
ocr_engine: '',
} as GridResult))
}
if (event.type === 'columns') {
streamColumnsUsed = event.columns_used || []
setGridResult(prev => prev ? { ...prev, columns_used: streamColumnsUsed } : prev)
}
if (event.type === 'cell') {
const cell: GridCell = { ...event.cell, status: 'pending' }
streamCells = [...streamCells, cell]
setEditedCells(streamCells)
setStreamProgress(event.progress)
// Auto-scroll table to bottom
setTimeout(() => tableEndRef.current?.scrollIntoView({ behavior: 'smooth', block: 'nearest' }), 16)
}
if (event.type === 'complete') {
// Build final GridResult
const finalResult: GridResult = {
cells: streamCells,
grid_shape: streamGridShape || { rows: 0, cols: 0, total_cells: streamCells.length },
columns_used: streamColumnsUsed,
layout: streamLayout || 'generic',
image_width: 0,
image_height: 0,
duration_seconds: event.duration_seconds || 0,
ocr_engine: event.ocr_engine || '',
summary: event.summary || {},
}
// If vocab: apply post-processed entries from complete event
if (event.vocab_entries) {
finalResult.entries = event.vocab_entries
finalResult.vocab_entries = event.vocab_entries
finalResult.entry_count = event.vocab_entries.length
}
applyGridResult(finalResult)
setUsedEngine(event.ocr_engine || '')
setStreamProgress(null)
}
}
}
} catch (e) {
setError(e instanceof Error ? e.message : 'Unbekannter Fehler')
} finally {
setDetecting(false)
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId, ocrEngine, pronunciation])
const handleGroundTruth = useCallback(async (isCorrect: boolean) => {
if (!sessionId) return
const gt: WordGroundTruth = {
is_correct: isCorrect,
corrected_entries: isCorrect ? undefined : (isVocab ? editedEntries : undefined),
notes: gtNotes || undefined,
}
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/ground-truth/words`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(gt),
})
setGtSaved(true)
} catch (e) {
console.error('Ground truth save failed:', e)
}
}, [sessionId, gtNotes, editedEntries, isVocab])
// Vocab mode: update entry field
const updateEntry = (index: number, field: 'english' | 'german' | 'example', value: string) => {
setEditedEntries(prev => prev.map((e, i) =>
i === index ? { ...e, [field]: value, status: 'edited' as const } : e
))
}
// Generic mode: update cell text
const updateCell = (cellId: string, value: string) => {
setEditedCells(prev => prev.map(c =>
c.cell_id === cellId ? { ...c, text: value, status: 'edited' as const } : c
))
}
// Step-through: confirm current row (always cell-based)
const confirmEntry = () => {
const rowCells = getRowCells(activeIndex)
const cellIds = new Set(rowCells.map(c => c.cell_id))
setEditedCells(prev => prev.map(c =>
cellIds.has(c.cell_id) ? { ...c, status: c.status === 'edited' ? 'edited' : 'confirmed' } : c
))
const maxIdx = getUniqueRowCount() - 1
if (activeIndex < maxIdx) {
setActiveIndex(activeIndex + 1)
}
}
// Step-through: skip current row
const skipEntry = () => {
const rowCells = getRowCells(activeIndex)
const cellIds = new Set(rowCells.map(c => c.cell_id))
setEditedCells(prev => prev.map(c =>
cellIds.has(c.cell_id) ? { ...c, status: 'skipped' as const } : c
))
const maxIdx = getUniqueRowCount() - 1
if (activeIndex < maxIdx) {
setActiveIndex(activeIndex + 1)
}
}
// Helper: get unique row indices from cells
const getUniqueRowCount = () => {
if (!editedCells.length) return 0
return new Set(editedCells.map(c => c.row_index)).size
}
// Helper: get cells for a given row index (by position in sorted unique rows)
const getRowCells = (rowPosition: number) => {
const uniqueRows = [...new Set(editedCells.map(c => c.row_index))].sort((a, b) => a - b)
const rowIdx = uniqueRows[rowPosition]
return editedCells.filter(c => c.row_index === rowIdx)
}
// Focus english input when active entry changes in labeling mode
useEffect(() => {
if (mode === 'labeling' && enRef.current) {
enRef.current.focus()
}
}, [activeIndex, mode])
// Keyboard shortcuts in labeling mode
useEffect(() => {
if (mode !== 'labeling') return
const handler = (e: KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault()
confirmEntry()
} else if (e.key === 'ArrowDown' && e.ctrlKey) {
e.preventDefault()
skipEntry()
} else if (e.key === 'ArrowUp' && e.ctrlKey) {
e.preventDefault()
if (activeIndex > 0) setActiveIndex(activeIndex - 1)
}
}
window.addEventListener('keydown', handler)
return () => window.removeEventListener('keydown', handler)
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [mode, activeIndex, editedEntries, editedCells])
if (!sessionId) {
return (
<div className="flex flex-col items-center justify-center py-16 text-center">
<div className="text-5xl mb-4">🔤</div>
<h3 className="text-lg font-medium text-gray-700 dark:text-gray-300 mb-2">
Schritt 5: Worterkennung
</h3>
<p className="text-gray-500 dark:text-gray-400 max-w-md">
Bitte zuerst Schritte 1-4 abschliessen.
</p>
</div>
)
}
const overlayUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/words-overlay`
const dewarpedUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/dewarped`
const confColor = (conf: number) => {
if (conf >= 70) return 'text-green-600 dark:text-green-400'
if (conf >= 50) return 'text-yellow-600 dark:text-yellow-400'
return 'text-red-600 dark:text-red-400'
}
const statusBadge = (status?: string) => {
const map: Record<string, string> = {
pending: 'bg-gray-100 dark:bg-gray-700 text-gray-500',
confirmed: 'bg-green-100 dark:bg-green-900/30 text-green-700 dark:text-green-400',
edited: 'bg-blue-100 dark:bg-blue-900/30 text-blue-700 dark:text-blue-400',
skipped: 'bg-orange-100 dark:bg-orange-900/30 text-orange-700 dark:text-orange-400',
}
return map[status || 'pending'] || map.pending
}
const summary = gridResult?.summary
const columnsUsed = gridResult?.columns_used || []
const gridShape = gridResult?.grid_shape
// Counts for labeling progress (always cell-based)
const confirmedRowIds = new Set(
editedCells.filter(c => c.status === 'confirmed' || c.status === 'edited').map(c => c.row_index)
)
const confirmedCount = confirmedRowIds.size
const totalCount = getUniqueRowCount()
// Group cells by row for generic table display
const cellsByRow: Map<number, GridCell[]> = new Map()
for (const cell of editedCells) {
const existing = cellsByRow.get(cell.row_index) || []
existing.push(cell)
cellsByRow.set(cell.row_index, existing)
}
const sortedRowIndices = [...cellsByRow.keys()].sort((a, b) => a - b)
return (
<div className="space-y-4">
{/* Loading with streaming progress */}
{detecting && (
<div className="space-y-1">
<div className="flex items-center gap-2 text-teal-600 dark:text-teal-400 text-sm">
<div className="animate-spin w-4 h-4 border-2 border-teal-500 border-t-transparent rounded-full" />
{streamProgress
? `Zelle ${streamProgress.current}/${streamProgress.total} erkannt...`
: 'Worterkennung startet...'}
</div>
{streamProgress && streamProgress.total > 0 && (
<div className="w-full bg-gray-200 dark:bg-gray-700 rounded-full h-1.5">
<div
className="bg-teal-500 h-1.5 rounded-full transition-all duration-150"
style={{ width: `${(streamProgress.current / streamProgress.total) * 100}%` }}
/>
</div>
)}
</div>
)}
{/* Layout badge + Mode toggle */}
{gridResult && (
<div className="flex items-center gap-2">
{/* Layout badge */}
<span className={`px-2 py-0.5 rounded text-[10px] uppercase font-semibold ${
isVocab
? 'bg-indigo-100 dark:bg-indigo-900/30 text-indigo-700 dark:text-indigo-300'
: 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}>
{isVocab ? 'Vokabel-Layout' : 'Generisch'}
</span>
{gridShape && (
<span className="text-[10px] text-gray-400">
{gridShape.rows}×{gridShape.cols} = {gridShape.total_cells} Zellen
</span>
)}
<div className="flex-1" />
<button
onClick={() => setMode('overview')}
className={`px-3 py-1.5 text-xs rounded-lg font-medium transition-colors ${
mode === 'overview'
? 'bg-teal-600 text-white'
: 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 hover:bg-gray-200 dark:hover:bg-gray-600'
}`}
>
Uebersicht
</button>
<button
onClick={() => setMode('labeling')}
className={`px-3 py-1.5 text-xs rounded-lg font-medium transition-colors ${
mode === 'labeling'
? 'bg-teal-600 text-white'
: 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 hover:bg-gray-200 dark:hover:bg-gray-600'
}`}
>
Labeling ({confirmedCount}/{totalCount})
</button>
</div>
)}
{/* Overview mode */}
{mode === 'overview' && (
<>
{/* Images: overlay vs clean */}
<div className="grid grid-cols-2 gap-4">
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Mit Grid-Overlay
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{gridResult ? (
// eslint-disable-next-line @next/next/no-img-element
<img
src={`${overlayUrl}?t=${Date.now()}`}
alt="Wort-Overlay"
className="w-full h-auto"
/>
) : (
<div className="aspect-[3/4] flex items-center justify-center text-gray-400 text-sm">
{detecting ? 'Erkenne Woerter...' : 'Keine Daten'}
</div>
)}
</div>
</div>
<div>
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Entzerrtes Bild
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900">
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={dewarpedUrl}
alt="Entzerrt"
className="w-full h-auto"
/>
</div>
</div>
</div>
{/* Result summary (only after streaming completes) */}
{gridResult && summary && !detecting && (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-3">
<div className="flex items-center justify-between">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Ergebnis: {summary.non_empty_cells}/{summary.total_cells} Zellen mit Text
({sortedRowIndices.length} Zeilen, {columnsUsed.length} Spalten)
</h4>
<span className="text-xs text-gray-400">
{gridResult.duration_seconds}s
</span>
</div>
{/* Summary badges */}
<div className="flex gap-2 flex-wrap">
<span className="px-2 py-0.5 rounded text-xs font-medium bg-blue-100 dark:bg-blue-900/30 text-blue-700 dark:text-blue-300">
Zellen: {summary.non_empty_cells}/{summary.total_cells}
</span>
{columnsUsed.map((col, i) => (
<span key={i} className={`px-2 py-0.5 rounded text-xs font-medium bg-gray-100 dark:bg-gray-700 ${colTypeColor(col.type)}`}>
C{col.index}: {colTypeLabel(col.type)}
</span>
))}
{summary.low_confidence > 0 && (
<span className="px-2 py-0.5 rounded text-xs font-medium bg-red-100 dark:bg-red-900/30 text-red-700 dark:text-red-300">
Unsicher: {summary.low_confidence}
</span>
)}
</div>
{/* Entry/Cell table */}
<div className="max-h-80 overflow-y-auto">
{/* Unified dynamic table — columns driven by columns_used */}
<table className="w-full text-xs">
<thead className="sticky top-0 bg-white dark:bg-gray-800">
<tr className="text-left text-gray-500 dark:text-gray-400 border-b dark:border-gray-700">
<th className="py-1 pr-2 w-12">Zeile</th>
{columnsUsed.map((col, i) => (
<th key={i} className={`py-1 pr-2 ${colTypeColor(col.type)}`}>
{colTypeLabel(col.type)}
</th>
))}
<th className="py-1 w-12 text-right">Conf</th>
</tr>
</thead>
<tbody>
{sortedRowIndices.map((rowIdx, posIdx) => {
const rowCells = cellsByRow.get(rowIdx) || []
const avgConf = rowCells.length
? Math.round(rowCells.reduce((s, c) => s + c.confidence, 0) / rowCells.length)
: 0
return (
<tr
key={rowIdx}
className={`border-b dark:border-gray-700/50 ${
posIdx === activeIndex ? 'bg-teal-50 dark:bg-teal-900/20' : ''
}`}
onClick={() => { setActiveIndex(posIdx); setMode('labeling') }}
>
<td className="py-1 pr-2 text-gray-400 font-mono text-[10px]">
R{String(rowIdx).padStart(2, '0')}
</td>
{columnsUsed.map((col) => {
const cell = rowCells.find(c => c.col_index === col.index)
return (
<td key={col.index} className="py-1 pr-2 font-mono text-gray-700 dark:text-gray-300 cursor-pointer">
<MultilineText text={cell?.text || ''} />
</td>
)
})}
<td className={`py-1 text-right font-mono ${confColor(avgConf)}`}>
{avgConf}%
</td>
</tr>
)
})}
</tbody>
</table>
<div ref={tableEndRef} />
</div>
</div>
)}
{/* Streaming cell table (shown while detecting, before complete) */}
{detecting && editedCells.length > 0 && !gridResult?.summary?.non_empty_cells && (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-3">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Live: {editedCells.length} Zellen erkannt...
</h4>
<div className="max-h-80 overflow-y-auto">
<table className="w-full text-xs">
<thead className="sticky top-0 bg-white dark:bg-gray-800">
<tr className="text-left text-gray-500 dark:text-gray-400 border-b dark:border-gray-700">
<th className="py-1 pr-2 w-12">Zelle</th>
{columnsUsed.map((col, i) => (
<th key={i} className={`py-1 pr-2 ${colTypeColor(col.type)}`}>
{colTypeLabel(col.type)}
</th>
))}
<th className="py-1 w-12 text-right">Conf</th>
</tr>
</thead>
<tbody>
{(() => {
const liveByRow: Map<number, GridCell[]> = new Map()
for (const cell of editedCells) {
const existing = liveByRow.get(cell.row_index) || []
existing.push(cell)
liveByRow.set(cell.row_index, existing)
}
const liveSorted = [...liveByRow.keys()].sort((a, b) => a - b)
return liveSorted.map(rowIdx => {
const rowCells = liveByRow.get(rowIdx) || []
const avgConf = rowCells.length
? Math.round(rowCells.reduce((s, c) => s + c.confidence, 0) / rowCells.length)
: 0
return (
<tr key={rowIdx} className="border-b dark:border-gray-700/50 animate-fade-in">
<td className="py-1 pr-2 text-gray-400 font-mono text-[10px]">
R{String(rowIdx).padStart(2, '0')}
</td>
{columnsUsed.map((col) => {
const cell = rowCells.find(c => c.col_index === col.index)
return (
<td key={col.index} className="py-1 pr-2 font-mono text-gray-700 dark:text-gray-300">
<MultilineText text={cell?.text || ''} />
</td>
)
})}
<td className={`py-1 text-right font-mono ${confColor(avgConf)}`}>
{avgConf}%
</td>
</tr>
)
})
})()}
</tbody>
</table>
<div ref={tableEndRef} />
</div>
</div>
)}
</>
)}
{/* Labeling mode */}
{mode === 'labeling' && editedCells.length > 0 && (
<div className="grid grid-cols-3 gap-4">
{/* Left 2/3: Image with highlighted active row */}
<div className="col-span-2">
<div className="text-xs font-medium text-gray-500 dark:text-gray-400 mb-1">
Zeile {activeIndex + 1} von {getUniqueRowCount()}
</div>
<div className="border rounded-lg overflow-hidden dark:border-gray-700 bg-gray-50 dark:bg-gray-900 relative">
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={`${overlayUrl}?t=${Date.now()}`}
alt="Wort-Overlay"
className="w-full h-auto"
/>
{/* Highlight overlay for active row */}
{(() => {
const rowCells = getRowCells(activeIndex)
return rowCells.map(cell => (
<div
key={cell.cell_id}
className="absolute border-2 border-yellow-400 bg-yellow-400/10 pointer-events-none"
style={{
left: `${cell.bbox_pct.x}%`,
top: `${cell.bbox_pct.y}%`,
width: `${cell.bbox_pct.w}%`,
height: `${cell.bbox_pct.h}%`,
}}
/>
))
})()}
</div>
</div>
{/* Right 1/3: Editable fields */}
<div className="space-y-3">
{/* Navigation */}
<div className="flex items-center justify-between">
<button
onClick={() => setActiveIndex(Math.max(0, activeIndex - 1))}
disabled={activeIndex === 0}
className="px-2 py-1 text-xs border rounded hover:bg-gray-50 dark:hover:bg-gray-700 dark:border-gray-600 disabled:opacity-30"
>
Zurueck
</button>
<span className="text-xs text-gray-500">
{activeIndex + 1} / {getUniqueRowCount()}
</span>
<button
onClick={() => setActiveIndex(Math.min(
getUniqueRowCount() - 1,
activeIndex + 1
))}
disabled={activeIndex >= getUniqueRowCount() - 1}
className="px-2 py-1 text-xs border rounded hover:bg-gray-50 dark:hover:bg-gray-700 dark:border-gray-600 disabled:opacity-30"
>
Weiter
</button>
</div>
{/* Status badge */}
<div className="flex items-center gap-2">
{(() => {
const rowCells = getRowCells(activeIndex)
const avgConf = rowCells.length
? Math.round(rowCells.reduce((s, c) => s + c.confidence, 0) / rowCells.length)
: 0
return (
<span className={`text-xs font-mono ${confColor(avgConf)}`}>
{avgConf}% Konfidenz
</span>
)
})()}
</div>
{/* Editable fields — one per column, driven by columns_used */}
<div className="space-y-2">
{(() => {
const rowCells = getRowCells(activeIndex)
return columnsUsed.map((col, colIdx) => {
const cell = rowCells.find(c => c.col_index === col.index)
if (!cell) return null
return (
<div key={col.index}>
<div className="flex items-center gap-1 mb-0.5">
<label className={`text-[10px] font-medium ${colTypeColor(col.type)}`}>
{colTypeLabel(col.type)}
</label>
<span className="text-[9px] text-gray-400">{cell.cell_id}</span>
</div>
{/* Cell crop */}
<div className="border rounded dark:border-gray-700 overflow-hidden bg-white dark:bg-gray-900 h-10 relative mb-1">
<CellCrop imageUrl={dewarpedUrl} bbox={cell.bbox_pct} />
</div>
<textarea
ref={colIdx === 0 ? enRef as any : undefined}
rows={Math.max(1, (cell.text || '').split('\n').length)}
value={cell.text || ''}
onChange={(e) => updateCell(cell.cell_id, e.target.value)}
className="w-full px-2 py-1.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600 font-mono resize-none"
/>
</div>
)
})
})()}
</div>
{/* Action buttons */}
<div className="flex gap-2">
<button
onClick={confirmEntry}
className="flex-1 px-3 py-1.5 text-xs bg-green-600 text-white rounded-lg hover:bg-green-700 font-medium"
>
Bestaetigen (Enter)
</button>
<button
onClick={skipEntry}
className="px-3 py-1.5 text-xs border rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700 dark:border-gray-600"
>
Skip
</button>
</div>
{/* Shortcuts hint */}
<div className="text-[10px] text-gray-400 space-y-0.5">
<div>Enter = Bestaetigen & weiter</div>
<div>Ctrl+Down = Ueberspringen</div>
<div>Ctrl+Up = Zurueck</div>
</div>
{/* Row list (compact) */}
<div className="border-t dark:border-gray-700 pt-2 mt-2">
<div className="text-[10px] font-medium text-gray-500 dark:text-gray-400 mb-1">
Alle Zeilen
</div>
<div className="max-h-48 overflow-y-auto space-y-0.5">
{sortedRowIndices.map((rowIdx, posIdx) => {
const rowCells = cellsByRow.get(rowIdx) || []
const textParts = rowCells.filter(c => c.text).map(c => c.text.replace(/\n/g, ' '))
return (
<div
key={rowIdx}
onClick={() => setActiveIndex(posIdx)}
className={`flex items-center gap-1 px-2 py-1 rounded text-[10px] cursor-pointer transition-colors ${
posIdx === activeIndex
? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
: 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
}`}
>
<span className="w-6 text-right text-gray-400 font-mono">R{String(rowIdx).padStart(2, '0')}</span>
<span className="truncate text-gray-600 dark:text-gray-400 font-mono">
{textParts.join(' \u2192 ') || '\u2014'}
</span>
</div>
)
})}
</div>
</div>
</div>
</div>
)}
{/* Controls */}
{gridResult && (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4 space-y-3">
<div className="flex items-center gap-3 flex-wrap">
{/* OCR Engine selector */}
<select
value={ocrEngine}
onChange={(e) => setOcrEngine(e.target.value as 'auto' | 'tesseract' | 'rapid')}
className="px-2 py-1.5 text-xs border rounded-lg dark:bg-gray-700 dark:border-gray-600"
>
<option value="auto">Auto (RapidOCR wenn verfuegbar)</option>
<option value="rapid">RapidOCR (ONNX)</option>
<option value="tesseract">Tesseract</option>
</select>
{/* Pronunciation selector (only for vocab) */}
{isVocab && (
<select
value={pronunciation}
onChange={(e) => setPronunciation(e.target.value as 'british' | 'american')}
className="px-2 py-1.5 text-xs border rounded-lg dark:bg-gray-700 dark:border-gray-600"
>
<option value="british">Britisch (RP)</option>
<option value="american">Amerikanisch</option>
</select>
)}
<button
onClick={() => runAutoDetection()}
disabled={detecting}
className="px-3 py-1.5 text-xs border rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700 dark:border-gray-600 disabled:opacity-50"
>
Erneut erkennen
</button>
{/* Show which engine was used */}
{usedEngine && (
<span className={`px-2 py-0.5 rounded text-[10px] uppercase font-semibold ${
usedEngine === 'rapid'
? 'bg-purple-100 dark:bg-purple-900/30 text-purple-700 dark:text-purple-300'
: 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}>
{usedEngine}
</span>
)}
<button
onClick={() => goToStep(3)}
className="px-3 py-1.5 text-xs border rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700 dark:border-gray-600 text-orange-600 dark:text-orange-400 border-orange-300 dark:border-orange-700"
>
Zeilen korrigieren (Step 4)
</button>
<div className="flex-1" />
{/* Ground truth */}
{!gtSaved ? (
<>
<input
type="text"
placeholder="Notizen (optional)"
value={gtNotes}
onChange={(e) => setGtNotes(e.target.value)}
className="px-2 py-1 text-xs border rounded dark:bg-gray-700 dark:border-gray-600 w-48"
/>
<button
onClick={() => handleGroundTruth(true)}
className="px-3 py-1.5 text-xs bg-green-600 text-white rounded-lg hover:bg-green-700"
>
Korrekt
</button>
<button
onClick={() => handleGroundTruth(false)}
className="px-3 py-1.5 text-xs bg-red-600 text-white rounded-lg hover:bg-red-700"
>
Fehlerhaft
</button>
</>
) : (
<span className="text-xs text-green-600 dark:text-green-400">
Ground Truth gespeichert
</span>
)}
<button
onClick={onNext}
className="px-4 py-1.5 text-xs bg-teal-600 text-white rounded-lg hover:bg-teal-700 font-medium"
>
Weiter
</button>
</div>
</div>
)}
{error && (
<div className="p-3 bg-red-50 dark:bg-red-900/20 text-red-600 dark:text-red-400 rounded-lg text-sm">
{error}
</div>
)}
</div>
)
}
/**
* CellCrop: Shows a cropped portion of the dewarped image based on percent bbox.
* Uses CSS background-image + background-position for efficient cropping.
*/
function CellCrop({ imageUrl, bbox }: { imageUrl: string; bbox: { x: number; y: number; w: number; h: number } }) {
// Scale factor: how much to zoom into the cell
const scaleX = 100 / bbox.w
const scaleY = 100 / bbox.h
const scale = Math.min(scaleX, scaleY, 8) // Cap zoom at 8x
return (
<div
className="w-full h-full"
style={{
backgroundImage: `url(${imageUrl})`,
backgroundSize: `${scale * 100}%`,
backgroundPosition: `${-bbox.x * scale}% ${-bbox.y * scale}%`,
backgroundRepeat: 'no-repeat',
}}
/>
)
}

View File

@@ -127,15 +127,6 @@ export const navigation: NavCategory[] = [
audience: ['Entwickler', 'Data Scientists', 'Lehrer'],
subgroup: 'KI-Werkzeuge',
},
{
id: 'ocr-pipeline',
name: 'OCR Pipeline',
href: '/ai/ocr-pipeline',
description: 'Schrittweise Seitenrekonstruktion',
purpose: 'Schrittweise Seitenrekonstruktion: Scan begradigen, Spalten erkennen, Woerter lokalisieren und die Seite Wort fuer Wort nachbauen. 6-Schritt-Pipeline mit Ground Truth Validierung.',
audience: ['Entwickler', 'Data Scientists'],
subgroup: 'KI-Werkzeuge',
},
{
id: 'test-quality',
name: 'Test Quality (BQAS)',

View File

@@ -2,8 +2,6 @@
const nextConfig = {
output: 'standalone',
reactStrictMode: true,
// Force unique build ID to bust browser caches on each deploy
generateBuildId: () => `build-${Date.now()}`,
// TODO: Remove after fixing type incompatibilities from restore
typescript: {
ignoreBuildErrors: true,

View File

@@ -8,7 +8,6 @@
"name": "breakpilot-admin-v2",
"version": "1.0.0",
"dependencies": {
"bpmn-js": "^18.0.1",
"jspdf": "^4.1.0",
"jszip": "^3.10.1",
"lucide-react": "^0.468.0",
@@ -16,7 +15,6 @@
"react": "^18.3.1",
"react-dom": "^18.3.1",
"reactflow": "^11.11.4",
"recharts": "^2.15.0",
"uuid": "^13.0.0"
},
"devDependencies": {
@@ -430,16 +428,6 @@
"node": ">=6.9.0"
}
},
"node_modules/@bpmn-io/diagram-js-ui": {
"version": "0.2.3",
"resolved": "https://registry.npmjs.org/@bpmn-io/diagram-js-ui/-/diagram-js-ui-0.2.3.tgz",
"integrity": "sha512-OGyjZKvGK8tHSZ0l7RfeKhilGoOGtFDcoqSGYkX0uhFlo99OVZ9Jn1K7TJGzcE9BdKwvA5Y5kGqHEhdTxHvFfw==",
"license": "MIT",
"dependencies": {
"htm": "^3.1.1",
"preact": "^10.11.2"
}
},
"node_modules/@csstools/color-helpers": {
"version": "5.1.0",
"resolved": "https://registry.npmjs.org/@csstools/color-helpers/-/color-helpers-5.1.0.tgz",
@@ -3008,39 +2996,6 @@
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/bpmn-js": {
"version": "18.12.0",
"resolved": "https://registry.npmjs.org/bpmn-js/-/bpmn-js-18.12.0.tgz",
"integrity": "sha512-Dg2O+r7jpBwLgWGpManc7P4ZfZQfxTVi2xNtXR3Q2G5Hx1RVYVFoNsQED8+FPCgjy6m7ZQbxKP1sjCJt5rbtBg==",
"license": "SEE LICENSE IN LICENSE",
"dependencies": {
"bpmn-moddle": "^10.0.0",
"diagram-js": "^15.9.0",
"diagram-js-direct-editing": "^3.3.0",
"ids": "^3.0.0",
"inherits-browser": "^0.1.0",
"min-dash": "^5.0.0",
"min-dom": "^5.2.0",
"tiny-svg": "^4.1.4"
},
"engines": {
"node": "*"
}
},
"node_modules/bpmn-moddle": {
"version": "10.0.0",
"resolved": "https://registry.npmjs.org/bpmn-moddle/-/bpmn-moddle-10.0.0.tgz",
"integrity": "sha512-vXePD5jkatcILmM3zwJG/m6IIHIghTGB7WvgcdEraEw8E8VdJHrTgrvBUhbzqaXJpnsGQz15QS936xeBY6l9aA==",
"license": "MIT",
"dependencies": {
"min-dash": "^5.0.0",
"moddle": "^8.0.0",
"moddle-xml": "^12.0.0"
},
"engines": {
"node": ">= 20.12"
}
},
"node_modules/braces": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz",
@@ -3198,15 +3153,6 @@
"integrity": "sha512-IV3Ou0jSMzZrd3pZ48nLkT9DA7Ag1pnPzaiQhpW7c3RbcqqzvzzVu+L8gfqMp/8IM2MQtSiqaCxrrcfu8I8rMA==",
"license": "MIT"
},
"node_modules/clsx": {
"version": "2.1.1",
"resolved": "https://registry.npmjs.org/clsx/-/clsx-2.1.1.tgz",
"integrity": "sha512-eYm0QWBtUrBWZWG0d386OGAw16Z995PiOVo2B7bjWSbHedGl5e0ZWaq65kOGgUSNesEIDkB9ISbTg/JK9dhCZA==",
"license": "MIT",
"engines": {
"node": ">=6"
}
},
"node_modules/commander": {
"version": "4.1.1",
"resolved": "https://registry.npmjs.org/commander/-/commander-4.1.1.tgz",
@@ -3316,20 +3262,9 @@
"version": "3.2.3",
"resolved": "https://registry.npmjs.org/csstype/-/csstype-3.2.3.tgz",
"integrity": "sha512-z1HGKcYy2xA8AGQfwrn0PAy+PB7X/GSj3UVJW9qKyn43xWa+gl5nXmU4qqLMRzWVLFC8KusUX8T/0kCiOYpAIQ==",
"devOptional": true,
"license": "MIT"
},
"node_modules/d3-array": {
"version": "3.2.4",
"resolved": "https://registry.npmjs.org/d3-array/-/d3-array-3.2.4.tgz",
"integrity": "sha512-tdQAmyA18i4J7wprpYq8ClcxZy3SC31QMeByyCFyRt7BVHdREQZ5lpzoe5mFEYZUWe+oq8HBvk9JjpibyEV4Jg==",
"license": "ISC",
"dependencies": {
"internmap": "1 - 2"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-color": {
"version": "3.1.0",
"resolved": "https://registry.npmjs.org/d3-color/-/d3-color-3.1.0.tgz",
@@ -3370,15 +3305,6 @@
"node": ">=12"
}
},
"node_modules/d3-format": {
"version": "3.1.2",
"resolved": "https://registry.npmjs.org/d3-format/-/d3-format-3.1.2.tgz",
"integrity": "sha512-AJDdYOdnyRDV5b6ArilzCPPwc1ejkHcoyFarqlPqT7zRYjhavcT3uSrqcMvsgh2CgoPbK3RCwyHaVyxYcP2Arg==",
"license": "ISC",
"engines": {
"node": ">=12"
}
},
"node_modules/d3-interpolate": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/d3-interpolate/-/d3-interpolate-3.0.1.tgz",
@@ -3391,31 +3317,6 @@
"node": ">=12"
}
},
"node_modules/d3-path": {
"version": "3.1.0",
"resolved": "https://registry.npmjs.org/d3-path/-/d3-path-3.1.0.tgz",
"integrity": "sha512-p3KP5HCf/bvjBSSKuXid6Zqijx7wIfNW+J/maPs+iwR35at5JCbLUT0LzF1cnjbCHWhqzQTIN2Jpe8pRebIEFQ==",
"license": "ISC",
"engines": {
"node": ">=12"
}
},
"node_modules/d3-scale": {
"version": "4.0.2",
"resolved": "https://registry.npmjs.org/d3-scale/-/d3-scale-4.0.2.tgz",
"integrity": "sha512-GZW464g1SH7ag3Y7hXjf8RoUuAFIqklOAq3MRl4OaWabTFJY9PN/E1YklhXLh+OQ3fM9yS2nOkCoS+WLZ6kvxQ==",
"license": "ISC",
"dependencies": {
"d3-array": "2.10.0 - 3",
"d3-format": "1 - 3",
"d3-interpolate": "1.2.0 - 3",
"d3-time": "2.1.1 - 3",
"d3-time-format": "2 - 4"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-selection": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/d3-selection/-/d3-selection-3.0.0.tgz",
@@ -3425,42 +3326,6 @@
"node": ">=12"
}
},
"node_modules/d3-shape": {
"version": "3.2.0",
"resolved": "https://registry.npmjs.org/d3-shape/-/d3-shape-3.2.0.tgz",
"integrity": "sha512-SaLBuwGm3MOViRq2ABk3eLoxwZELpH6zhl3FbAoJ7Vm1gofKx6El1Ib5z23NUEhF9AsGl7y+dzLe5Cw2AArGTA==",
"license": "ISC",
"dependencies": {
"d3-path": "^3.1.0"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-time": {
"version": "3.1.0",
"resolved": "https://registry.npmjs.org/d3-time/-/d3-time-3.1.0.tgz",
"integrity": "sha512-VqKjzBLejbSMT4IgbmVgDjpkYrNWUYJnbCGo874u7MMKIWsILRX+OpX/gTk8MqjpT1A/c6HY2dCA77ZN0lkQ2Q==",
"license": "ISC",
"dependencies": {
"d3-array": "2 - 3"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-time-format": {
"version": "4.1.0",
"resolved": "https://registry.npmjs.org/d3-time-format/-/d3-time-format-4.1.0.tgz",
"integrity": "sha512-dJxPBlzC7NugB2PDLwo9Q8JiTR3M3e4/XANkreKSUxF8vvXKqm1Yfq4Q5dl8budlunRVlUUaDUgFt7eA8D6NLg==",
"license": "ISC",
"dependencies": {
"d3-time": "1 - 3"
},
"engines": {
"node": ">=12"
}
},
"node_modules/d3-timer": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/d3-timer/-/d3-timer-3.0.1.tgz",
@@ -3544,12 +3409,6 @@
"dev": true,
"license": "MIT"
},
"node_modules/decimal.js-light": {
"version": "2.5.1",
"resolved": "https://registry.npmjs.org/decimal.js-light/-/decimal.js-light-2.5.1.tgz",
"integrity": "sha512-qIMFpTMZmny+MMIitAB6D7iVPEorVw6YQRWkvarTkT4tBeSLLiHzcwj6q0MmYSFCiVpiqPJTJEYIrpcPzVEIvg==",
"license": "MIT"
},
"node_modules/dequal": {
"version": "2.0.3",
"resolved": "https://registry.npmjs.org/dequal/-/dequal-2.0.3.tgz",
@@ -3570,51 +3429,6 @@
"node": ">=8"
}
},
"node_modules/diagram-js": {
"version": "15.9.1",
"resolved": "https://registry.npmjs.org/diagram-js/-/diagram-js-15.9.1.tgz",
"integrity": "sha512-2JsGmyeTo6o39beq2e/UkTfMopQSM27eXBUzbYQ+1m5VhEnQDkcjcrnRCjcObLMzzXSE/LSJyYhji90sqBFodQ==",
"license": "MIT",
"dependencies": {
"@bpmn-io/diagram-js-ui": "^0.2.3",
"clsx": "^2.1.1",
"didi": "^11.0.0",
"inherits-browser": "^0.1.0",
"min-dash": "^5.0.0",
"min-dom": "^5.2.0",
"object-refs": "^0.4.0",
"path-intersection": "^4.1.0",
"tiny-svg": "^4.1.4"
},
"engines": {
"node": "*"
}
},
"node_modules/diagram-js-direct-editing": {
"version": "3.3.0",
"resolved": "https://registry.npmjs.org/diagram-js-direct-editing/-/diagram-js-direct-editing-3.3.0.tgz",
"integrity": "sha512-EjXYb35J3qBU8lLz5U81hn7wNykVmF7U5DXZ7BvPok2IX7rmPz+ZyaI5AEMiqaC6lpSnHqPxFcPgKEiJcAiv5w==",
"license": "MIT",
"dependencies": {
"min-dash": "^5.0.0",
"min-dom": "^5.2.0"
},
"engines": {
"node": "*"
},
"peerDependencies": {
"diagram-js": "*"
}
},
"node_modules/didi": {
"version": "11.0.0",
"resolved": "https://registry.npmjs.org/didi/-/didi-11.0.0.tgz",
"integrity": "sha512-PzCfRzQttvFpVcYMbSF7h8EsWjeJpVjWH4qDhB5LkMi1ILvHq4Ob0vhM2wLFziPkbUBi+PAo7ODbe2sacR7nJQ==",
"license": "MIT",
"engines": {
"node": ">= 20.12"
}
},
"node_modules/didyoumean": {
"version": "1.2.2",
"resolved": "https://registry.npmjs.org/didyoumean/-/didyoumean-1.2.2.tgz",
@@ -3637,28 +3451,6 @@
"license": "MIT",
"peer": true
},
"node_modules/dom-helpers": {
"version": "5.2.1",
"resolved": "https://registry.npmjs.org/dom-helpers/-/dom-helpers-5.2.1.tgz",
"integrity": "sha512-nRCa7CK3VTrM2NmGkIy4cbK7IZlgBE/PYMn55rrXefr5xXDP0LdtfPnblFDoVdcAfslJ7or6iqAUnx0CCGIWQA==",
"license": "MIT",
"dependencies": {
"@babel/runtime": "^7.8.7",
"csstype": "^3.0.2"
}
},
"node_modules/domify": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/domify/-/domify-3.0.0.tgz",
"integrity": "sha512-bs2yO68JDFOm6rKv8f0EnrM2cENduhRkpqOtt/s5l5JBA/eqGBZCzLPmdYoHtJ6utgLGgcBajFsEQbl12pT0lQ==",
"license": "MIT",
"engines": {
"node": ">=20"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/dompurify": {
"version": "3.3.1",
"resolved": "https://registry.npmjs.org/dompurify/-/dompurify-3.3.1.tgz",
@@ -3758,12 +3550,6 @@
"@types/estree": "^1.0.0"
}
},
"node_modules/eventemitter3": {
"version": "4.0.7",
"resolved": "https://registry.npmjs.org/eventemitter3/-/eventemitter3-4.0.7.tgz",
"integrity": "sha512-8guHBZCwKnFhYdHr2ysuRWErTwhoN2X8XELRlrRwpmfeY2jjuUN4taQMsULKUVo1K4DvZl+0pgfyoysHxvmvEw==",
"license": "MIT"
},
"node_modules/expect-type": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/expect-type/-/expect-type-1.3.0.tgz",
@@ -3774,15 +3560,6 @@
"node": ">=12.0.0"
}
},
"node_modules/fast-equals": {
"version": "5.4.0",
"resolved": "https://registry.npmjs.org/fast-equals/-/fast-equals-5.4.0.tgz",
"integrity": "sha512-jt2DW/aNFNwke7AUd+Z+e6pz39KO5rzdbbFCg2sGafS4mk13MI7Z8O5z9cADNn5lhGODIgLwug6TZO2ctf7kcw==",
"license": "MIT",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/fast-glob": {
"version": "3.3.3",
"resolved": "https://registry.npmjs.org/fast-glob/-/fast-glob-3.3.3.tgz",
@@ -3928,12 +3705,6 @@
"node": ">= 0.4"
}
},
"node_modules/htm": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/htm/-/htm-3.1.1.tgz",
"integrity": "sha512-983Vyg8NwUE7JkZ6NmOqpCZ+sh1bKv2iYTlUkzlWmA5JD2acKoxd4KVxbMmxX/85mtfdnDmTFoNKcg5DGAvxNQ==",
"license": "Apache-2.0"
},
"node_modules/html-encoding-sniffer": {
"version": "6.0.0",
"resolved": "https://registry.npmjs.org/html-encoding-sniffer/-/html-encoding-sniffer-6.0.0.tgz",
@@ -3989,15 +3760,6 @@
"node": ">= 14"
}
},
"node_modules/ids": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/ids/-/ids-3.0.1.tgz",
"integrity": "sha512-mr0zAgpgA/hzCrHB0DnoTG6xZjNC3ABs4eaksXrpVtfaDatA2SVdDb1ZPLjmKjqzp4kexQRuHXwDWQILVK8FZQ==",
"license": "MIT",
"engines": {
"node": ">= 20.12"
}
},
"node_modules/immediate": {
"version": "3.0.6",
"resolved": "https://registry.npmjs.org/immediate/-/immediate-3.0.6.tgz",
@@ -4020,21 +3782,6 @@
"integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==",
"license": "ISC"
},
"node_modules/inherits-browser": {
"version": "0.1.0",
"resolved": "https://registry.npmjs.org/inherits-browser/-/inherits-browser-0.1.0.tgz",
"integrity": "sha512-CJHHvW3jQ6q7lzsXPpapLdMx5hDpSF3FSh45pwsj6bKxJJ8Nl8v43i5yXnr3BdfOimGHKyniewQtnAIp3vyJJw==",
"license": "ISC"
},
"node_modules/internmap": {
"version": "2.0.3",
"resolved": "https://registry.npmjs.org/internmap/-/internmap-2.0.3.tgz",
"integrity": "sha512-5Hh7Y1wQbvY5ooGgPbDaL5iYLAPzMTUrjMulskHLH6wnv/A+1q5rgEaiuqEjB+oxGXIVZs1FF+R/KPN3ZSQYYg==",
"license": "ISC",
"engines": {
"node": ">=12"
}
},
"node_modules/iobuffer": {
"version": "5.4.0",
"resolved": "https://registry.npmjs.org/iobuffer/-/iobuffer-5.4.0.tgz",
@@ -4262,12 +4009,6 @@
"dev": true,
"license": "MIT"
},
"node_modules/lodash": {
"version": "4.17.23",
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.23.tgz",
"integrity": "sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==",
"license": "MIT"
},
"node_modules/loose-envify": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/loose-envify/-/loose-envify-1.4.0.tgz",
@@ -4351,22 +4092,6 @@
"node": ">=8.6"
}
},
"node_modules/min-dash": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/min-dash/-/min-dash-5.0.0.tgz",
"integrity": "sha512-EGuoBnVL7/Fnv2sqakpX5WGmZehZ3YMmLayT7sM8E9DRU74kkeyMg4Rik1lsOkR2GbFNeBca4/L+UfU6gF0Edw==",
"license": "MIT"
},
"node_modules/min-dom": {
"version": "5.3.0",
"resolved": "https://registry.npmjs.org/min-dom/-/min-dom-5.3.0.tgz",
"integrity": "sha512-0w5FEBgPAyHhmFojW3zxd7we3D+m5XYS3E/06OyvxmbHJoiQVa4Nagj6RWvoAKYRw5xth6cP5TMePc5cR1M9hA==",
"license": "MIT",
"dependencies": {
"domify": "^3.0.0",
"min-dash": "^5.0.0"
}
},
"node_modules/min-indent": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/min-indent/-/min-indent-1.0.1.tgz",
@@ -4377,31 +4102,6 @@
"node": ">=4"
}
},
"node_modules/moddle": {
"version": "8.1.0",
"resolved": "https://registry.npmjs.org/moddle/-/moddle-8.1.0.tgz",
"integrity": "sha512-dBddc1CNuZHgro8nQWwfPZ2BkyLWdnxoNpPu9d+XKPN96DAiiBOeBw527ft++ebDuFez5PMdaR3pgUgoOaUGrA==",
"license": "MIT",
"dependencies": {
"min-dash": "^5.0.0"
}
},
"node_modules/moddle-xml": {
"version": "12.0.0",
"resolved": "https://registry.npmjs.org/moddle-xml/-/moddle-xml-12.0.0.tgz",
"integrity": "sha512-NJc2+sCe4tvuGlaUBcoZcYf6j9f+z+qxHOyGm/LB3ZrlJXVPPHoBTg/KXgDRCufdBJhJ3AheFs3QU/abABNzRg==",
"license": "MIT",
"dependencies": {
"min-dash": "^5.0.0",
"saxen": "^11.0.2"
},
"engines": {
"node": ">= 18"
},
"peerDependencies": {
"moddle": ">= 6.2.0"
}
},
"node_modules/ms": {
"version": "2.1.3",
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
@@ -4540,6 +4240,7 @@
"version": "4.1.1",
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
"integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=0.10.0"
@@ -4555,15 +4256,6 @@
"node": ">= 6"
}
},
"node_modules/object-refs": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/object-refs/-/object-refs-0.4.0.tgz",
"integrity": "sha512-6kJqKWryKZmtte6QYvouas0/EIJKPI1/MMIuRsiBlNuhIMfqYTggzX2F1AJ2+cDs288xyi9GL7FyasHINR98BQ==",
"license": "MIT",
"engines": {
"node": "*"
}
},
"node_modules/obug": {
"version": "2.1.1",
"resolved": "https://registry.npmjs.org/obug/-/obug-2.1.1.tgz",
@@ -4594,15 +4286,6 @@
"url": "https://github.com/inikulin/parse5?sponsor=1"
}
},
"node_modules/path-intersection": {
"version": "4.1.0",
"resolved": "https://registry.npmjs.org/path-intersection/-/path-intersection-4.1.0.tgz",
"integrity": "sha512-urUP6WvhnxbHPdHYl6L7Yrc6+1ny6uOFKPCzPxTSUSYGHG0o94RmI7SvMMaScNAM5RtTf08bg4skc6/kjfne3A==",
"license": "MIT",
"engines": {
"node": ">= 14.20"
}
},
"node_modules/path-parse": {
"version": "1.0.7",
"resolved": "https://registry.npmjs.org/path-parse/-/path-parse-1.0.7.tgz",
@@ -4872,16 +4555,6 @@
"dev": true,
"license": "MIT"
},
"node_modules/preact": {
"version": "10.28.4",
"resolved": "https://registry.npmjs.org/preact/-/preact-10.28.4.tgz",
"integrity": "sha512-uKFfOHWuSNpRFVTnljsCluEFq57OKT+0QdOiQo8XWnQ/pSvg7OpX5eNOejELXJMWy+BwM2nobz0FkvzmnpCNsQ==",
"license": "MIT",
"funding": {
"type": "opencollective",
"url": "https://opencollective.com/preact"
}
},
"node_modules/pretty-format": {
"version": "27.5.1",
"resolved": "https://registry.npmjs.org/pretty-format/-/pretty-format-27.5.1.tgz",
@@ -4904,23 +4577,6 @@
"integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==",
"license": "MIT"
},
"node_modules/prop-types": {
"version": "15.8.1",
"resolved": "https://registry.npmjs.org/prop-types/-/prop-types-15.8.1.tgz",
"integrity": "sha512-oj87CgZICdulUohogVAR7AjlC0327U4el4L6eAvOqCeudMDVU0NThNaV+b9Df4dXgSP1gXMTnPdhfe/2qDH5cg==",
"license": "MIT",
"dependencies": {
"loose-envify": "^1.4.0",
"object-assign": "^4.1.1",
"react-is": "^16.13.1"
}
},
"node_modules/prop-types/node_modules/react-is": {
"version": "16.13.1",
"resolved": "https://registry.npmjs.org/react-is/-/react-is-16.13.1.tgz",
"integrity": "sha512-24e6ynE2H+OKt4kqsOvNd8kBpV65zoxbA4BVsEOB3ARVWQki/DHzaUoC5KuON/BiccDaCCTZBuOcfZs70kR8bQ==",
"license": "MIT"
},
"node_modules/punycode": {
"version": "2.3.1",
"resolved": "https://registry.npmjs.org/punycode/-/punycode-2.3.1.tgz",
@@ -5005,37 +4661,6 @@
"node": ">=0.10.0"
}
},
"node_modules/react-smooth": {
"version": "4.0.4",
"resolved": "https://registry.npmjs.org/react-smooth/-/react-smooth-4.0.4.tgz",
"integrity": "sha512-gnGKTpYwqL0Iii09gHobNolvX4Kiq4PKx6eWBCYYix+8cdw+cGo3do906l1NBPKkSWx1DghC1dlWG9L2uGd61Q==",
"license": "MIT",
"dependencies": {
"fast-equals": "^5.0.1",
"prop-types": "^15.8.1",
"react-transition-group": "^4.4.5"
},
"peerDependencies": {
"react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0",
"react-dom": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0"
}
},
"node_modules/react-transition-group": {
"version": "4.4.5",
"resolved": "https://registry.npmjs.org/react-transition-group/-/react-transition-group-4.4.5.tgz",
"integrity": "sha512-pZcd1MCJoiKiBR2NRxeCRg13uCXbydPnmB4EOeRrY7480qNWO8IIgQG6zlDkm6uRMsURXPuKq0GWtiM59a5Q6g==",
"license": "BSD-3-Clause",
"dependencies": {
"@babel/runtime": "^7.5.5",
"dom-helpers": "^5.0.1",
"loose-envify": "^1.4.0",
"prop-types": "^15.6.2"
},
"peerDependencies": {
"react": ">=16.6.0",
"react-dom": ">=16.6.0"
}
},
"node_modules/reactflow": {
"version": "11.11.4",
"resolved": "https://registry.npmjs.org/reactflow/-/reactflow-11.11.4.tgz",
@@ -5092,44 +4717,6 @@
"node": ">=8.10.0"
}
},
"node_modules/recharts": {
"version": "2.15.4",
"resolved": "https://registry.npmjs.org/recharts/-/recharts-2.15.4.tgz",
"integrity": "sha512-UT/q6fwS3c1dHbXv2uFgYJ9BMFHu3fwnd7AYZaEQhXuYQ4hgsxLvsUXzGdKeZrW5xopzDCvuA2N41WJ88I7zIw==",
"license": "MIT",
"dependencies": {
"clsx": "^2.0.0",
"eventemitter3": "^4.0.1",
"lodash": "^4.17.21",
"react-is": "^18.3.1",
"react-smooth": "^4.0.4",
"recharts-scale": "^0.4.4",
"tiny-invariant": "^1.3.1",
"victory-vendor": "^36.6.8"
},
"engines": {
"node": ">=14"
},
"peerDependencies": {
"react": "^16.0.0 || ^17.0.0 || ^18.0.0 || ^19.0.0",
"react-dom": "^16.0.0 || ^17.0.0 || ^18.0.0 || ^19.0.0"
}
},
"node_modules/recharts-scale": {
"version": "0.4.5",
"resolved": "https://registry.npmjs.org/recharts-scale/-/recharts-scale-0.4.5.tgz",
"integrity": "sha512-kivNFO+0OcUNu7jQquLXAxz1FIwZj8nrj+YkOKc5694NbjCvcT6aSZiIzNzd2Kul4o4rTto8QVR9lMNtxD4G1w==",
"license": "MIT",
"dependencies": {
"decimal.js-light": "^2.4.1"
}
},
"node_modules/recharts/node_modules/react-is": {
"version": "18.3.1",
"resolved": "https://registry.npmjs.org/react-is/-/react-is-18.3.1.tgz",
"integrity": "sha512-/LLMVyas0ljjAtoYiPqYiL8VWXzUUdThrmU5+n20DZv+a+ClRoevUzw5JxU+Ieh5/c87ytoTBV9G1FiKfNJdmg==",
"license": "MIT"
},
"node_modules/redent": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/redent/-/redent-3.0.0.tgz",
@@ -5278,15 +4865,6 @@
"integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==",
"license": "MIT"
},
"node_modules/saxen": {
"version": "11.0.2",
"resolved": "https://registry.npmjs.org/saxen/-/saxen-11.0.2.tgz",
"integrity": "sha512-WDb4gqac8uiJzOdOdVpr9NWh9NrJMm7Brn5GX2Poj+mjE/QTXqYQENr8T/mom54dDDgbd3QjwTg23TRHYiWXRA==",
"license": "MIT",
"engines": {
"node": ">= 20.12"
}
},
"node_modules/saxes": {
"version": "6.0.0",
"resolved": "https://registry.npmjs.org/saxes/-/saxes-6.0.0.tgz",
@@ -5582,21 +5160,6 @@
"node": ">=0.8"
}
},
"node_modules/tiny-invariant": {
"version": "1.3.3",
"resolved": "https://registry.npmjs.org/tiny-invariant/-/tiny-invariant-1.3.3.tgz",
"integrity": "sha512-+FbBPE1o9QAYvviau/qC5SE3caw21q3xkvWKBtja5vgqOWIHHJ3ioaq1VPfn/Szqctz2bU/oYeKd9/z5BL+PVg==",
"license": "MIT"
},
"node_modules/tiny-svg": {
"version": "4.1.4",
"resolved": "https://registry.npmjs.org/tiny-svg/-/tiny-svg-4.1.4.tgz",
"integrity": "sha512-cBaEACCbouYrQc9RG+eTXnPYosX1Ijqty/I6DdXovwDd89Pwu4jcmpOR7BuFEF9YCcd7/AWwasE0207WMK7hdw==",
"license": "MIT",
"engines": {
"node": ">= 20"
}
},
"node_modules/tinybench": {
"version": "2.9.0",
"resolved": "https://registry.npmjs.org/tinybench/-/tinybench-2.9.0.tgz",
@@ -5844,28 +5407,6 @@
"uuid": "dist-node/bin/uuid"
}
},
"node_modules/victory-vendor": {
"version": "36.9.2",
"resolved": "https://registry.npmjs.org/victory-vendor/-/victory-vendor-36.9.2.tgz",
"integrity": "sha512-PnpQQMuxlwYdocC8fIJqVXvkeViHYzotI+NJrCuav0ZYFoq912ZHBk3mCeuj+5/VpodOjPe1z0Fk2ihgzlXqjQ==",
"license": "MIT AND ISC",
"dependencies": {
"@types/d3-array": "^3.0.3",
"@types/d3-ease": "^3.0.0",
"@types/d3-interpolate": "^3.0.1",
"@types/d3-scale": "^4.0.2",
"@types/d3-shape": "^3.1.0",
"@types/d3-time": "^3.0.0",
"@types/d3-timer": "^3.0.0",
"d3-array": "^3.1.6",
"d3-ease": "^3.0.1",
"d3-interpolate": "^3.0.1",
"d3-scale": "^4.0.2",
"d3-shape": "^3.1.0",
"d3-time": "^3.0.0",
"d3-timer": "^3.0.1"
}
},
"node_modules/vite": {
"version": "7.3.1",
"resolved": "https://registry.npmjs.org/vite/-/vite-7.3.1.tgz",

View File

@@ -27,7 +27,6 @@
"react-dom": "^18.3.1",
"reactflow": "^11.11.4",
"recharts": "^2.15.0",
"fabric": "^6.0.0",
"uuid": "^13.0.0"
},
"devDependencies": {

File diff suppressed because one or more lines are too long

323
docker-compose.coolify.yml Normal file
View File

@@ -0,0 +1,323 @@
# =========================================================
# BreakPilot Lehrer — KI-Lehrerplattform (Coolify)
# =========================================================
# Requires: breakpilot-core must be running
# Deployed via Coolify. SSL termination handled by Traefik.
# External services (managed separately in Coolify):
# - PostgreSQL, Qdrant, S3-compatible storage
# =========================================================
networks:
breakpilot-network:
external: true
name: breakpilot-network
volumes:
klausur_uploads:
eh_uploads:
ocr_labeling:
paddle_models:
lehrer_backend_data:
opensearch_data:
services:
# =========================================================
# FRONTEND
# =========================================================
admin-lehrer:
build:
context: ./admin-lehrer
dockerfile: Dockerfile
args:
NEXT_PUBLIC_API_URL: ${NEXT_PUBLIC_API_URL:-https://api-lehrer.breakpilot.ai}
NEXT_PUBLIC_OLD_ADMIN_URL: ${NEXT_PUBLIC_OLD_ADMIN_URL:-}
NEXT_PUBLIC_KLAUSUR_SERVICE_URL: ${NEXT_PUBLIC_KLAUSUR_SERVICE_URL:-https://klausur.breakpilot.ai}
NEXT_PUBLIC_VOICE_SERVICE_URL: ${NEXT_PUBLIC_VOICE_SERVICE_URL:-wss://voice.breakpilot.ai}
container_name: bp-lehrer-admin
expose:
- "3000"
volumes:
- lehrer_backend_data:/app/data
environment:
NODE_ENV: production
BACKEND_URL: http://backend-lehrer:8001
CONSENT_SERVICE_URL: http://bp-core-consent-service:8081
KLAUSUR_SERVICE_URL: http://klausur-service:8086
OLLAMA_URL: ${OLLAMA_URL:-}
depends_on:
backend-lehrer:
condition: service_started
labels:
- "traefik.enable=true"
- "traefik.http.routers.admin-lehrer.rule=Host(`admin-lehrer.breakpilot.ai`)"
- "traefik.http.routers.admin-lehrer.entrypoints=https"
- "traefik.http.routers.admin-lehrer.tls=true"
- "traefik.http.routers.admin-lehrer.tls.certresolver=letsencrypt"
- "traefik.http.services.admin-lehrer.loadbalancer.server.port=3000"
restart: unless-stopped
networks:
- breakpilot-network
studio-v2:
build:
context: ./studio-v2
dockerfile: Dockerfile
args:
NEXT_PUBLIC_VOICE_SERVICE_URL: ${NEXT_PUBLIC_VOICE_SERVICE_URL:-wss://voice.breakpilot.ai}
NEXT_PUBLIC_KLAUSUR_SERVICE_URL: ${NEXT_PUBLIC_KLAUSUR_SERVICE_URL:-https://klausur.breakpilot.ai}
container_name: bp-lehrer-studio-v2
expose:
- "3001"
environment:
NODE_ENV: production
BACKEND_URL: http://backend-lehrer:8001
depends_on:
- backend-lehrer
labels:
- "traefik.enable=true"
- "traefik.http.routers.studio.rule=Host(`app.breakpilot.ai`)"
- "traefik.http.routers.studio.entrypoints=https"
- "traefik.http.routers.studio.tls=true"
- "traefik.http.routers.studio.tls.certresolver=letsencrypt"
- "traefik.http.services.studio.loadbalancer.server.port=3001"
restart: unless-stopped
networks:
- breakpilot-network
website:
build:
context: ./website
dockerfile: Dockerfile
args:
NEXT_PUBLIC_BILLING_API_URL: ${NEXT_PUBLIC_BILLING_API_URL:-https://api-core.breakpilot.ai}
NEXT_PUBLIC_APP_URL: ${NEXT_PUBLIC_APP_URL:-https://app.breakpilot.ai}
NEXT_PUBLIC_KLAUSUR_SERVICE_URL: ${NEXT_PUBLIC_KLAUSUR_SERVICE_URL:-https://klausur.breakpilot.ai}
NEXT_PUBLIC_VOICE_SERVICE_URL: ${NEXT_PUBLIC_VOICE_SERVICE_URL:-wss://voice.breakpilot.ai}
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY: ${NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY:-}
container_name: bp-lehrer-website
expose:
- "3000"
environment:
NODE_ENV: production
VAST_API_KEY: ${VAST_API_KEY:-}
CONTROL_API_KEY: ${CONTROL_API_KEY:-}
BACKEND_URL: http://backend-lehrer:8001
CONSENT_SERVICE_URL: http://bp-core-consent-service:8081
EDU_SEARCH_URL: ${EDU_SEARCH_URL:-}
EDU_SEARCH_API_KEY: ${EDU_SEARCH_API_KEY:-}
depends_on:
- backend-lehrer
labels:
- "traefik.enable=true"
- "traefik.http.routers.website.rule=Host(`www.breakpilot.ai`)"
- "traefik.http.routers.website.entrypoints=https"
- "traefik.http.routers.website.tls=true"
- "traefik.http.routers.website.tls.certresolver=letsencrypt"
- "traefik.http.services.website.loadbalancer.server.port=3000"
restart: unless-stopped
networks:
- breakpilot-network
# =========================================================
# BACKEND
# =========================================================
backend-lehrer:
build:
context: ./backend-lehrer
dockerfile: Dockerfile
container_name: bp-lehrer-backend
user: "0:0"
expose:
- "8001"
volumes:
- lehrer_backend_data:/app/data
environment:
PORT: 8001
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}?options=-csearch_path%3Dlehrer,core,public
JWT_SECRET: ${JWT_SECRET}
ENVIRONMENT: production
CONSENT_SERVICE_URL: http://bp-core-consent-service:8081
KLAUSUR_SERVICE_URL: http://klausur-service:8086
TROCR_SERVICE_URL: ${TROCR_SERVICE_URL:-}
CAMUNDA_URL: ${CAMUNDA_URL:-}
VALKEY_URL: redis://bp-core-valkey:6379/0
SESSION_TTL_HOURS: ${SESSION_TTL_HOURS:-24}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
DEBUG: "false"
ALERTS_AGENT_ENABLED: ${ALERTS_AGENT_ENABLED:-false}
VAST_API_KEY: ${VAST_API_KEY:-}
VAST_INSTANCE_ID: ${VAST_INSTANCE_ID:-}
CONTROL_API_KEY: ${CONTROL_API_KEY:-}
OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-}
OLLAMA_ENABLED: ${OLLAMA_ENABLED:-false}
OLLAMA_DEFAULT_MODEL: ${OLLAMA_DEFAULT_MODEL:-}
OLLAMA_VISION_MODEL: ${OLLAMA_VISION_MODEL:-}
OLLAMA_CORRECTION_MODEL: ${OLLAMA_CORRECTION_MODEL:-}
OLLAMA_TIMEOUT: ${OLLAMA_TIMEOUT:-120}
GAME_USE_DATABASE: ${GAME_USE_DATABASE:-true}
GAME_REQUIRE_AUTH: ${GAME_REQUIRE_AUTH:-true}
GAME_REQUIRE_BILLING: ${GAME_REQUIRE_BILLING:-true}
GAME_LLM_MODEL: ${GAME_LLM_MODEL:-}
SMTP_HOST: ${SMTP_HOST}
SMTP_PORT: ${SMTP_PORT:-587}
SMTP_USERNAME: ${SMTP_USERNAME}
SMTP_PASSWORD: ${SMTP_PASSWORD}
SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.ai}
RAG_SERVICE_URL: http://bp-core-rag-service:8097
labels:
- "traefik.enable=true"
- "traefik.http.routers.backend-lehrer.rule=Host(`api-lehrer.breakpilot.ai`)"
- "traefik.http.routers.backend-lehrer.entrypoints=https"
- "traefik.http.routers.backend-lehrer.tls=true"
- "traefik.http.routers.backend-lehrer.tls.certresolver=letsencrypt"
- "traefik.http.services.backend-lehrer.loadbalancer.server.port=8001"
restart: unless-stopped
networks:
- breakpilot-network
# =========================================================
# MICROSERVICES
# =========================================================
klausur-service:
build:
context: ./klausur-service
dockerfile: Dockerfile
container_name: bp-lehrer-klausur-service
expose:
- "8086"
volumes:
- klausur_uploads:/app/uploads
- eh_uploads:/app/eh-uploads
- ocr_labeling:/app/ocr-labeling
- paddle_models:/root/.paddlex
environment:
JWT_SECRET: ${JWT_SECRET}
BACKEND_URL: http://backend-lehrer:8001
SCHOOL_SERVICE_URL: http://school-service:8084
ENVIRONMENT: production
DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}
EMBEDDING_SERVICE_URL: http://bp-core-embedding-service:8087
QDRANT_URL: ${QDRANT_URL}
MINIO_ENDPOINT: ${S3_ENDPOINT}
MINIO_ACCESS_KEY: ${S3_ACCESS_KEY}
MINIO_SECRET_KEY: ${S3_SECRET_KEY}
MINIO_BUCKET: ${S3_BUCKET:-breakpilot-rag}
MINIO_SECURE: ${S3_SECURE:-true}
PADDLEOCR_SERVICE_URL: ${PADDLEOCR_SERVICE_URL:-}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-}
OLLAMA_ENABLED: ${OLLAMA_ENABLED:-false}
OLLAMA_DEFAULT_MODEL: ${OLLAMA_DEFAULT_MODEL:-}
OLLAMA_VISION_MODEL: ${OLLAMA_VISION_MODEL:-}
OLLAMA_CORRECTION_MODEL: ${OLLAMA_CORRECTION_MODEL:-}
RAG_SERVICE_URL: http://bp-core-rag-service:8097
depends_on:
school-service:
condition: service_started
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8086/health"]
interval: 30s
timeout: 30s
retries: 3
start_period: 10s
labels:
- "traefik.enable=true"
- "traefik.http.routers.klausur.rule=Host(`klausur.breakpilot.ai`)"
- "traefik.http.routers.klausur.entrypoints=https"
- "traefik.http.routers.klausur.tls=true"
- "traefik.http.routers.klausur.tls.certresolver=letsencrypt"
- "traefik.http.services.klausur.loadbalancer.server.port=8086"
restart: unless-stopped
networks:
- breakpilot-network
school-service:
build:
context: ./school-service
dockerfile: Dockerfile
container_name: bp-lehrer-school-service
expose:
- "8084"
environment:
DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}
JWT_SECRET: ${JWT_SECRET}
PORT: 8084
ENVIRONMENT: production
ALLOWED_ORIGINS: "*"
LLM_GATEWAY_URL: http://backend-lehrer:8001/llm
restart: unless-stopped
networks:
- breakpilot-network
# =========================================================
# EDU SEARCH
# =========================================================
opensearch:
image: opensearchproject/opensearch:2.11.1
container_name: bp-lehrer-opensearch
environment:
- cluster.name=edu-search-cluster
- node.name=opensearch-node1
- discovery.type=single-node
- bootstrap.memory_lock=true
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_PASSWORD:-Admin123!}
- plugins.security.disabled=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- opensearch_data:/usr/share/opensearch/data
healthcheck:
test: ["CMD-SHELL", "curl -s http://localhost:9200 >/dev/null || exit 1"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
restart: unless-stopped
networks:
- breakpilot-network
edu-search-service:
build:
context: ./edu-search-service
dockerfile: Dockerfile
container_name: bp-lehrer-edu-search
expose:
- "8088"
environment:
PORT: 8088
OPENSEARCH_URL: http://opensearch:9200
OPENSEARCH_USERNAME: admin
OPENSEARCH_PASSWORD: ${OPENSEARCH_PASSWORD:-Admin123!}
INDEX_NAME: bp_documents_v1
EDU_SEARCH_API_KEY: ${EDU_SEARCH_API_KEY:-}
USER_AGENT: "BreakpilotEduCrawler/1.0 (+contact: security@breakpilot.com)"
RATE_LIMIT_PER_SEC: "0.2"
MAX_DEPTH: "4"
MAX_PAGES_PER_RUN: "500"
DB_HOST: ${POSTGRES_HOST}
DB_PORT: ${POSTGRES_PORT:-5432}
DB_USER: ${POSTGRES_USER}
DB_PASSWORD: ${POSTGRES_PASSWORD}
DB_NAME: ${POSTGRES_DB}
DB_SSLMODE: disable
STAFF_CRAWLER_EMAIL: crawler@breakpilot.de
depends_on:
opensearch:
condition: service_healthy
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8088/v1/health"]
interval: 30s
timeout: 3s
start_period: 10s
retries: 3
restart: unless-stopped
networks:
- breakpilot-network

View File

@@ -15,7 +15,6 @@ volumes:
eh_uploads:
ocr_labeling:
paddle_models:
lighton_models:
paddleocr_models:
transcription_models:
transcription_temp:
@@ -210,7 +209,6 @@ services:
- eh_uploads:/app/eh-uploads
- ocr_labeling:/app/ocr-labeling
- paddle_models:/root/.paddlex
- lighton_models:/root/.cache/huggingface
environment:
JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
BACKEND_URL: http://backend-lehrer:8001
@@ -233,12 +231,6 @@ services:
OLLAMA_DEFAULT_MODEL: ${OLLAMA_DEFAULT_MODEL:-llama3.2}
OLLAMA_VISION_MODEL: ${OLLAMA_VISION_MODEL:-llama3.2-vision}
OLLAMA_CORRECTION_MODEL: ${OLLAMA_CORRECTION_MODEL:-llama3.2}
OLLAMA_REVIEW_MODEL: ${OLLAMA_REVIEW_MODEL:-qwen3:0.6b}
OLLAMA_REVIEW_BATCH_SIZE: ${OLLAMA_REVIEW_BATCH_SIZE:-20}
REVIEW_ENGINE: ${REVIEW_ENGINE:-spell}
OCR_ENGINE: ${OCR_ENGINE:-auto}
OLLAMA_HTR_MODEL: ${OLLAMA_HTR_MODEL:-qwen2.5vl:32b}
HTR_FALLBACK_MODEL: ${HTR_FALLBACK_MODEL:-trocr-large}
RAG_SERVICE_URL: http://bp-core-rag-service:8097
extra_hosts:
- "host.docker.internal:host-gateway"

View File

@@ -1,114 +0,0 @@
# Chunk-Browser
## Uebersicht
Der Chunk-Browser ermoeglicht das sequenzielle Durchblaettern aller Chunks in einer Qdrant-Collection. Er ist als Tab "Chunk-Browser" auf der RAG-Seite (`/ai/rag`) verfuegbar.
**URL:** `https://macmini:3002/ai/rag` → Tab "Chunk-Browser"
---
## Funktionen
### Collection-Auswahl
Dropdown mit allen verfuegbaren Compliance-Collections:
- `bp_compliance_gesetze`
- `bp_compliance_ce`
- `bp_compliance_datenschutz`
- `bp_dsfa_corpus`
- `bp_compliance_recht`
- `bp_legal_templates`
- `bp_compliance_gdpr`
- `bp_compliance_schulrecht`
- `bp_dsfa_templates`
- `bp_dsfa_risks`
### Seitenweise Navigation
- 20 Chunks pro Seite
- Zurueck/Weiter-Buttons
- Seitennummer und Chunk-Zaehler
- Cursor-basierte Pagination via Qdrant Scroll API
### Textsuche
- Filtert Chunks auf der aktuell geladenen Seite
- Treffer werden gelb hervorgehoben
- Suche ueber den Chunk-Text (payload.text, payload.content, payload.chunk_text)
### Chunk-Details
- Klick auf einen Chunk klappt alle Metadaten aus
- Zeigt: regulation_code, article, language, source, licence, etc.
- Chunks haben eine fortlaufende Nummer (#1, #2, ...)
### Integration mit Regulierungen-Tab
Der Button "In Chunks suchen" bei jeder Regulierung wechselt zum Chunk-Browser mit:
- Vorauswahl der richtigen Collection
- Vorausgefuelltem Suchbegriff (Regulierungsname)
---
## API
### Scroll-Endpoint (API Proxy)
```
GET /api/legal-corpus?action=scroll&collection=bp_compliance_ce&limit=20&offset={cursor}
```
**Parameter:**
| Parameter | Typ | Beschreibung |
|-----------|-----|--------------|
| `collection` | string | Qdrant Collection Name |
| `limit` | number | Chunks pro Seite (max 100) |
| `offset` | string | Cursor fuer naechste Seite (optional) |
| `text_search` | string | Textsuche-Filter (optional) |
**Response:**
```json
{
"chunks": [
{
"id": "uuid",
"text": "...",
"regulation_code": "GDPR",
"article": "Art. 5",
"language": "de"
}
],
"next_offset": "uuid-or-null",
"total_in_page": 20
}
```
### Collection-Count-Endpoint
```
GET /api/legal-corpus?action=collection-count&collection=bp_compliance_ce
```
**Response:**
```json
{
"count": 12345
}
```
---
## Technische Details
- Der API-Proxy spricht direkt mit Qdrant (Port 6333) via dessen `POST /collections/{name}/points/scroll` Endpoint
- Kein Embedding oder rag-service erforderlich
- Textsuche ist client-seitig (kein Embedding noetig)
- Pagination ist cursor-basiert (Qdrant `next_page_offset`)
---
## Weitere Features auf der RAG-Seite
### Originalquelle-Links
Jede Regulierung in der Tabelle hat einen "Originalquelle" Link zum offiziellen Dokument (EUR-Lex, gesetze-im-internet.de, etc.). Definiert in `REGULATION_SOURCES` (88 Eintraege).
### Low-Chunk-Warnung
Regulierungen mit weniger als 10 Chunks aber einem erwarteten Wert >= 10 werden mit einem Amber-Warnsymbol markiert. Dies hilft, fehlgeschlagene oder unvollstaendige Ingestions zu erkennen.

View File

@@ -1,468 +0,0 @@
# OCR Pipeline - Schrittweise Seitenrekonstruktion
**Version:** 2.0.0
**Status:** Produktiv (Schritte 18 implementiert)
**URL:** https://macmini:3002/ai/ocr-pipeline
## Uebersicht
Die OCR Pipeline zerlegt den OCR-Prozess in **8 einzelne Schritte**, um eingescannte Vokabelseiten
aus mehrspaltig gedruckten Schulbuechern Wort fuer Wort zu rekonstruieren.
Jeder Schritt kann individuell geprueft, korrigiert und mit Ground-Truth-Daten versehen werden.
**Ziel:** 10 Vokabelseiten fehlerfrei rekonstruieren.
### Pipeline-Schritte
| Schritt | Name | Beschreibung | Status |
|---------|------|--------------|--------|
| 1 | Begradigung (Deskew) | Scan begradigen (Hough Lines + Word Alignment) | Implementiert |
| 2 | Entzerrung (Dewarp) | Buchwoelbung entzerren (Vertikalkanten-Analyse) | Implementiert |
| 3 | Spaltenerkennung | Unsichtbare Spalten finden (Projektionsprofile + Wortvalidierung) | Implementiert |
| 4 | Zeilenerkennung | Horizontale Zeilen + Kopf-/Fusszeilen-Klassifikation + Luecken-Heilung | Implementiert |
| 5 | Worterkennung | Grid aus Spalten x Zeilen, OCR pro Zelle, Post-Processing | Implementiert |
| 6 | Korrektur | Zeichenverwirrung + regel-basierte Rechtschreibkorrektur (SSE-Stream) | Implementiert |
| 7 | Rekonstruktion | Interaktive Zellenbearbeitung auf Bildhintergrund | Implementiert |
| 8 | Validierung | Ground-Truth-Vergleich und Qualitaetspruefung | Implementiert |
---
## Architektur
```
Admin-Lehrer (Next.js) klausur-service (FastAPI :8086)
┌────────────────────┐ ┌─────────────────────────────┐
│ /ai/ocr-pipeline │ │ /api/v1/ocr-pipeline/ │
│ │ REST │ │
│ PipelineStepper │◄────────►│ Sessions CRUD │
│ StepDeskew │ │ Image Serving │
│ StepDewarp │ SSE │ Deskew/Dewarp/Columns/Rows │
│ StepColumnDetection│◄────────►│ Word Recognition │
│ StepRowDetection │ │ Correction (Spell-Checker) │
│ StepWordRecognition│ │ Reconstruction │
│ StepLlmReview │ │ Ground Truth │
│ StepReconstruction │ └─────────────────────────────┘
│ StepGroundTruth │ │
└────────────────────┘ ▼
┌─────────────────────┐
│ PostgreSQL │
│ ocr_pipeline_sessions│
│ (Images + JSONB) │
└─────────────────────┘
```
### Dateistruktur
```
klausur-service/backend/
├── ocr_pipeline_api.py # FastAPI Router (alle Endpoints)
├── ocr_pipeline_session_store.py # PostgreSQL Persistence
├── cv_vocab_pipeline.py # Computer Vision + NLP Algorithmen
└── migrations/
├── 002_ocr_pipeline_sessions.sql # Basis-Schema
├── 003_add_row_result.sql # Row-Result Spalte
└── 004_add_word_result.sql # Word-Result Spalte
admin-lehrer/
├── app/(admin)/ai/ocr-pipeline/
│ ├── page.tsx # Haupt-Page mit Session-Management
│ └── types.ts # TypeScript Interfaces
└── components/ocr-pipeline/
├── PipelineStepper.tsx # Fortschritts-Stepper
├── StepDeskew.tsx # Schritt 1: Begradigung
├── StepDewarp.tsx # Schritt 2: Entzerrung
├── StepColumnDetection.tsx # Schritt 3: Spaltenerkennung
├── StepRowDetection.tsx # Schritt 4: Zeilenerkennung
├── StepWordRecognition.tsx # Schritt 5: Worterkennung
├── StepLlmReview.tsx # Schritt 6: Korrektur (SSE-Stream)
├── StepReconstruction.tsx # Schritt 7: Rekonstruktion (Canvas)
└── StepGroundTruth.tsx # Schritt 8: Validierung
```
---
## API-Referenz
Alle Endpoints unter `/api/v1/ocr-pipeline/`.
### Sessions
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions` | Neue Session erstellen (Bild hochladen) |
| `GET` | `/sessions` | Alle Sessions auflisten |
| `GET` | `/sessions/{id}` | Session-Info mit allen Step-Results |
| `PUT` | `/sessions/{id}` | Session umbenennen |
| `DELETE` | `/sessions/{id}` | Session loeschen |
### Bilder
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `GET` | `/sessions/{id}/image/original` | Originalbild |
| `GET` | `/sessions/{id}/image/deskewed` | Begradigtes Bild |
| `GET` | `/sessions/{id}/image/dewarped` | Entzerrtes Bild |
| `GET` | `/sessions/{id}/image/binarized` | Binarisiertes Bild |
| `GET` | `/sessions/{id}/image/columns-overlay` | Spalten-Overlay |
| `GET` | `/sessions/{id}/image/rows-overlay` | Zeilen-Overlay |
| `GET` | `/sessions/{id}/image/words-overlay` | Wort-Grid-Overlay |
### Schritt 1: Begradigung
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/deskew` | Automatische Begradigung |
| `POST` | `/sessions/{id}/deskew/manual` | Manuelle Winkelkorrektur |
| `POST` | `/sessions/{id}/ground-truth/deskew` | Ground Truth speichern |
### Schritt 2: Entzerrung
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/dewarp` | Automatische Entzerrung |
| `POST` | `/sessions/{id}/dewarp/manual` | Manueller Scherbungswinkel |
| `POST` | `/sessions/{id}/ground-truth/dewarp` | Ground Truth speichern |
### Schritt 3: Spalten
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/columns` | Automatische Spaltenerkennung |
| `POST` | `/sessions/{id}/columns/manual` | Manuelle Spalten-Definition |
| `POST` | `/sessions/{id}/ground-truth/columns` | Ground Truth speichern |
### Schritt 4: Zeilen
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/rows` | Automatische Zeilenerkennung |
| `POST` | `/sessions/{id}/rows/manual` | Manuelle Zeilen-Definition |
| `POST` | `/sessions/{id}/ground-truth/rows` | Ground Truth speichern |
| `GET` | `/sessions/{id}/ground-truth/rows` | Ground Truth abrufen |
### Schritt 5: Worterkennung
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/words` | Wort-Grid aus Spalten x Zeilen erstellen |
| `POST` | `/sessions/{id}/ground-truth/words` | Ground Truth speichern |
| `GET` | `/sessions/{id}/ground-truth/words` | Ground Truth abrufen |
### Schritt 6: Korrektur
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/llm-review?stream=true` | SSE-Stream Korrektur starten |
| `POST` | `/sessions/{id}/llm-review/apply` | Ausgewaehlte Korrekturen speichern |
### Schritt 7: Rekonstruktion
| Methode | Pfad | Beschreibung |
|---------|------|--------------|
| `POST` | `/sessions/{id}/reconstruction` | Zellaenderungen speichern |
---
## Schritt 3: Spaltenerkennung (Detail)
### Algorithmus: `detect_column_geometry()`
Zweistufige Erkennung: vertikale Projektionsprofile finden Luecken, Wort-Bounding-Boxes validieren.
```
Bild → Binarisierung → Vertikalprofil → Lueckenerkennung → Wort-Validierung → ColumnGeometry
```
**Wichtige Implementierungsdetails:**
- **Initialer Tesseract-Scan:** Laeuft auf der vollen Bildbreite `[left_x : w]` (nicht nur bis zur Content-Grenze `right_x`), damit Woerter am rechten Rand der letzten Spalte nicht uebersehen werden.
- **Letzte Spalte:** Wird immer bis zur vollen Bildbreite `w` ausgedehnt, nicht nur bis zur erkannten Content-Grenze.
- **Phantom-Spalten-Filter (Step 9):** Spalten mit Breite < 3 % der Content-Breite UND < 3 Woerter werden als Artefakte entfernt; die angrenzenden Spalten schliessen die Luecke.
- **Spaltenzuweisung:** Woerter werden anhand des groessten horizontalen Ueberlappungsbereichs einer Spalte zugeordnet.
### Konfigurierbare Parameter
```python
# Mindestbreite fuer echte Spalten (automatisch: max(20px, 3% content_w))
min_real_col_w = max(20, int(content_w * 0.03))
```
---
## Schritt 4: Zeilenerkennung (Detail)
### Algorithmus: `detect_row_geometry()`
Horizontale Projektionsprofile finden Zeilen-Luecken; word-level Validierung verhindert Fehlschnitte.
**Zusaetzliche Post-Processing-Schritte:**
1. **Artefakt-Zeilen entfernen** (`_is_artifact_row`):
Zeilen, in denen alle erkannten Tokens nur 1 Zeichen lang sind (Scan-Schatten, leere Zeilen),
werden als Artefakte klassifiziert und aus dem Grid entfernt.
2. **Luecken-Heilung** (`_heal_row_gaps`):
Nach dem Entfernen leerer/Artefakt-Zeilen werden die verbleibenden Zeilen auf die Mitte
der entstehenden Luecke ausgedehnt, damit kein Zeileninhalt durch schrumpfende Grenzen
abgeschnitten wird.
```python
def _is_artifact_row(row: RowGeometry) -> bool:
"""Zeile ist Artefakt wenn alle Tokens <= 1 Zeichen."""
if row.word_count == 0: return True
return all(len(w.get('text','').strip()) <= 1 for w in row.words)
def _heal_row_gaps(rows, top_bound, bottom_bound):
"""Verbleibende Zeilen auf Mitte der Luecken ausdehnen."""
...
```
---
## Schritt 5: Worterkennung (Detail)
### Algorithmus: `build_cell_grid()`
Schritt 5 nutzt die Ergebnisse von Schritt 3 (Spalten) und Schritt 4 (Zeilen), um ein Grid
zu erstellen und jede Zelle per OCR auszulesen.
```
Spalten (Step 3): column_en | column_de | column_example
───────────┼─────────────┼────────────────
Zeilen (Step 4): R0 │ hello │ hallo │ Hello, World!
R1 │ world │ Welt │ The whole world
R2 │ book │ Buch │ Read a book
───────────┼─────────────┼────────────────
```
**Ablauf:**
1. **Initialer Scan:** Ganzes Bild einmal per Tesseract/RapidOCR → alle Wort-Bboxes
2. **Zuweisung:** Jedes Wort der Spalte mit groesstem horizontalem Ueberlapp zuordnen
3. **Zell-OCR Fallback:** Leere Zellen bekommen eigenen Crop + erneuten OCR-Aufruf (PSM 6/7)
4. **Batch-Spalten-OCR:** Bei vielen leeren Zellen in einer Spalte: gesamte Spalte einmal OCR-en
5. **Post-Processing:** Continuation-Rows zusammenfuehren, Lautschrift erkennen, Komma-Eintraege splitten
### Post-Processing Pipeline (in `build_vocab_pipeline_streaming`)
| # | Schritt | Funktion | Beschreibung |
|---|---------|----------|--------------|
| 0a | Lautschrift-Fortsetzung | `_merge_phonetic_continuation_rows` | IPA-only Folgezeilen zusammenfuehren |
| 0b | Zeilen-Fortsetzung | `_merge_continuation_rows` | Zeilen mit Kleinbuchstaben-Anfang zusammenfuehren |
| 2 | Lautschrift-Fix | `_fix_phonetic_brackets` | OCR-Lautschrift mit Woerterbuch-IPA ersetzen |
| 3 | Komma-Split | `_split_comma_entries` | `break, broke, broken` → 3 Eintraege |
| 4 | Beispielsaetze | `_attach_example_sentences` | Beispielsatz-Zeilen an vorangehenden Eintrag haengen |
!!! info "Zeichenkorrektur in Schritt 6"
Die Zeichenverwirrungskorrektur (`|``I`, `1``I`, `8``B`) laeuft **nicht** in
Schritt 5, sondern als erstes in Schritt 6 (Korrektur), damit die Aenderungen im UI
sichtbar und rueckgaengig machbar sind.
---
## Schritt 6: Korrektur (Detail)
### Korrektur-Engine
Schritt 6 kombiniert zwei Korrektur-Stufen, beide als SSE-Stream:
**Stufe 1 — Zeichenverwirrungskorrektur** (`_fix_character_confusion`):
| OCR-Fehler | Korrektur | Regel |
|------------|-----------|-------|
| `\|ch` | `Ich` | `\|` am Wortanfang vor Kleinbuchstaben → `I` |
| `\| want` | `I want` | Alleinstehendes `\|``I` |
| `8en` | `Ben` | `8` am Wortanfang vor `en``B` |
| `1 want` | `I want` | Alleinstehendes `1``I` (NICHT vor `.` oder `,`) |
| `1. Kreuz` | unveraendert | `1.` = Listennummer, wird **nicht** korrigiert |
**Stufe 2 — Regel-basierte Rechtschreibkorrektur** (`spell_review_entries_streaming`):
Nutzt `pyspellchecker` (MIT-Lizenz) mit EN+DE-Woerterbuch. Pro Token mit verdaechtigem Zeichen
(`0`, `1`, `5`, `6`, `8`, `|`) werden Kandidaten geprueft:
```python
_SPELL_SUBS = {
'0': ['O', 'o'], '1': ['l', 'I'], '5': ['S', 's'],
'6': ['G', 'g'], '8': ['B', 'b'], '|': ['I', 'l', '1'],
}
```
Logik: Kandidaten werden durch Woerterbuch-Lookup validiert. Strukturregel: Verdaechtiges
Zeichen an Position 0 + Rest klein → erstes Substitut (z.B. `8en``Ben`).
### Umgebungsvariablen
| Variable | Default | Beschreibung |
|----------|---------|--------------|
| `REVIEW_ENGINE` | `spell` | Korrektur-Engine: `spell` oder `llm` |
| `OLLAMA_REVIEW_MODEL` | `qwen3:0.6b` | Ollama-Modell (nur wenn `REVIEW_ENGINE=llm`) |
| `OLLAMA_REVIEW_BATCH_SIZE` | `20` | Eintraege pro LLM-Aufruf |
### SSE-Protokoll
```
POST /sessions/{id}/llm-review?stream=true
Events:
data: {"type": "meta", "total_entries": 96, "to_review": 80, "skipped": 16, "model": "spell"}
data: {"type": "batch", "changes": [...], "entries_reviewed": [0,1,2,...], "progress": {...}}
data: {"type": "complete", "duration_ms": 234}
data: {"type": "error", "detail": "..."}
Change-Format:
{"row_index": 5, "field": "english", "old": "| want", "new": "I want"}
```
---
## Schritt 7: Rekonstruktion (Detail)
Interaktiver Canvas-Editor: Das entzerrte Originalbild wird mit 30 % Opazitaet als Hintergrund
angezeigt, alle Grid-Zellen (auch leere!) werden als editierbare Textfelder darueber gelegt.
**Features:**
- Alle Zellen editierbar — auch leere Zellen (kein Filter mehr)
- Farbkodierung nach Spaltentyp (Blau=EN, Gruen=DE, Orange=Beispiel)
- Leere Pflichtfelder (EN/DE) rot gestrichelt markiert
- Undo/Redo (Ctrl+Z / Ctrl+Shift+Z)
- Tab-Navigation durch alle Zellen (inkl. leerer)
- Zoom 50200 %
- Per-Zell-Reset-Button bei geaenderten Zellen
```
POST /sessions/{id}/reconstruction
Body: {"cells": [{"cell_id": "r5_c2", "text": "corrected text"}]}
```
---
## Datenbank-Schema
```sql
CREATE TABLE ocr_pipeline_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255),
filename VARCHAR(255),
status VARCHAR(50) DEFAULT 'active',
current_step INT DEFAULT 1,
-- Bilder (BYTEA)
original_png BYTEA,
deskewed_png BYTEA,
binarized_png BYTEA,
dewarped_png BYTEA,
-- Step-Results (JSONB)
deskew_result JSONB,
dewarp_result JSONB,
column_result JSONB,
row_result JSONB,
word_result JSONB, -- enthaelt vocab_entries, cells, llm_review
-- Ground Truth + Meta
ground_truth JSONB,
auto_shear_degrees REAL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
```
`word_result` JSONB-Struktur:
```json
{
"vocab_entries": [...],
"cells": [{"cell_id": "r0_c0", "text": "hello", "bbox_pct": {...}, ...}],
"columns_used": [...],
"llm_review": {
"changes": [{"row_index": 5, "field": "english", "old": "...", "new": "..."}],
"model_used": "spell",
"duration_ms": 234
}
}
```
---
## Abhaengigkeiten
### Python (klausur-service)
| Paket | Version | Lizenz | Zweck |
|-------|---------|--------|-------|
| `pytesseract` | ≥0.3.10 | Apache-2.0 | Haupt-OCR (Schritt 35) |
| `opencv-python-headless` | ≥4.8.0 | Apache-2.0 | Bildverarbeitung, Projektionsprofile |
| `Pillow` | ≥10.0.0 | HPND (MIT-kompatibel) | Bildkonvertierung |
| `rapidocr` | latest | Apache-2.0 | Schnelles OCR (ARM64 via ONNX) |
| `onnxruntime` | latest | MIT | ONNX-Inferenz fuer RapidOCR |
| `pyspellchecker` | ≥0.8.1 | MIT | Regel-basierte OCR-Korrektur (Schritt 6) |
| `eng-to-ipa` | latest | MIT | IPA-Lautschrift-Lookup (Schritt 5) |
!!! info "pyspellchecker (neu seit 2026-03)"
`pyspellchecker` (MIT-Lizenz) ersetzt die LLM-basierte Korrektur als Standard-Engine.
EN+DE-Woerterbuch, ~134k Woerter. Kein Ollama notig.
Umschaltbar via `REVIEW_ENGINE=llm` fuer den LLM-Pfad.
---
## Bekannte Einschraenkungen
| Problem | Ursache | Workaround |
|---------|---------|------------|
| Schraeg gedruckte Seiten | Deskew erkennt Text-Rotation, nicht Seiten-Rotation | Manueller Winkel |
| Sehr kleine Schrift (< 8pt) | Tesseract PSM 7 braucht min. Zeichengroesse | Vorher zoomen |
| Handgeschriebene Eintraege | Tesseract/RapidOCR sind fuer Druckschrift optimiert | TrOCR-Engine (geplant) |
| Mehr als 4 Spalten | Projektionsprofil kann verschmelzen | Manuelle Spalten |
---
## Deployment
```bash
# 1. Git push
git push origin main
# 2. Mac Mini pull + build
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && git pull --no-rebase origin main"
# klausur-service (Backend) — bei requirements.txt Aenderungen: klausur-base neu bauen
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && \
/usr/local/bin/docker compose build klausur-service && \
/usr/local/bin/docker compose up -d klausur-service"
# admin-lehrer (Frontend)
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && \
/usr/local/bin/docker compose build admin-lehrer && \
/usr/local/bin/docker compose up -d admin-lehrer"
# 3. Testen unter:
# https://macmini:3002/ai/ocr-pipeline
```
!!! warning "Base-Image bei neuen Python-Paketen"
Wenn `requirements.txt` geaendert wird (z.B. neues Paket hinzugefuegt), muss zuerst
das Base-Image neu gebaut werden:
```bash
ssh macmini "cd ~/Projekte/breakpilot-lehrer && \
/usr/local/bin/docker build -f klausur-service/Dockerfile.base \
-t klausur-base:latest klausur-service/"
```
---
## Aenderungshistorie
| Datum | Version | Aenderung |
|-------|---------|----------|
| 2026-03-03 | 2.0.0 | Schritte 67 implementiert; Spell-Checker, Rekonstruktions-Canvas |
| 2026-03-03 | 1.5.0 | Spaltenerkennung: volle Bildbreite fuer initialen Scan, Phantom-Filter |
| 2026-03-03 | 1.4.0 | Zeilenerkennung: Artefakt-Zeilen entfernen + Luecken-Heilung |
| 2026-03-03 | 1.3.0 | Zeichenkorrektur: `1.`/`\|.` Listenpraefixe werden nicht zu `I.` |
| 2026-03-03 | 1.2.0 | LLM-Engine durch Spell-Checker ersetzt (REVIEW_ENGINE=spell) |
| 2026-02-28 | 1.0.0 | Schritt 5 (Worterkennung) implementiert |
| 2026-02-22 | 0.4.0 | Schritt 4 (Zeilenerkennung) implementiert |
| 2026-02-20 | 0.3.0 | Schritt 3 (Spaltenerkennung) mit Typ-Klassifikation |
| 2026-02-15 | 0.2.0 | Schritt 2 (Entzerrung/Dewarp) |
| 2026-02-12 | 0.1.0 | Schritt 1 (Begradigung/Deskew) + Session-Management |

View File

@@ -8,15 +8,24 @@ RUN npm install
COPY frontend/ ./
RUN npm run build
# Production stage — uses pre-built base with Tesseract + Python deps.
# Base image contains: python:3.11-slim + tesseract-ocr + all pip packages.
# Rebuild base only when requirements.txt or system deps change:
# docker build -f klausur-service/Dockerfile.base -t klausur-base:latest klausur-service/
FROM klausur-base:latest
# Production stage
FROM python:3.11-slim
WORKDIR /app
# Copy backend code (this is the only layer that changes on code edits)
# Install system dependencies (incl. Tesseract OCR for bounding-box extraction)
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
tesseract-ocr \
tesseract-ocr-deu \
tesseract-ocr-eng \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY backend/requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy backend code
COPY backend/ ./
# Copy built frontend to the expected path

View File

@@ -1,26 +0,0 @@
# Base image with system dependencies + Python packages.
# These change rarely — build once, reuse on every --no-cache.
#
# Rebuild manually when requirements.txt or system deps change:
# docker build -f klausur-service/Dockerfile.base -t klausur-base:latest klausur-service/
#
FROM python:3.11-slim
WORKDIR /app
# System dependencies (Tesseract OCR, curl for healthcheck)
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
tesseract-ocr \
tesseract-ocr-deu \
tesseract-ocr-eng \
libgl1 \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Python dependencies
COPY backend/requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Clean up pip cache
RUN rm -rf /root/.cache/pip

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

View File

@@ -1,276 +0,0 @@
"""
Handwriting HTR API - Hochwertige Handschriftenerkennung (HTR) fuer Klausurkorrekturen.
Endpoints:
- POST /api/v1/htr/recognize - Bild hochladen → handgeschriebener Text
- POST /api/v1/htr/recognize-session - OCR-Pipeline Session als Quelle nutzen
Modell-Strategie:
1. qwen2.5vl:32b via Ollama (primaer, hoechste Qualitaet als VLM)
2. microsoft/trocr-large-handwritten (Fallback, offline, kein Ollama)
DATENSCHUTZ: Alle Verarbeitung erfolgt lokal auf dem Mac Mini.
"""
import io
import os
import logging
import time
import base64
from typing import Optional
import cv2
import numpy as np
from fastapi import APIRouter, HTTPException, Query, UploadFile, File
from pydantic import BaseModel
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/v1/htr", tags=["HTR"])
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
OLLAMA_HTR_MODEL = os.getenv("OLLAMA_HTR_MODEL", "qwen2.5vl:32b")
HTR_FALLBACK_MODEL = os.getenv("HTR_FALLBACK_MODEL", "trocr-large")
# ---------------------------------------------------------------------------
# Pydantic Models
# ---------------------------------------------------------------------------
class HTRSessionRequest(BaseModel):
session_id: str
model: str = "auto" # "auto" | "qwen2.5vl" | "trocr-large"
use_clean: bool = True # Prefer clean_png (after handwriting removal)
# ---------------------------------------------------------------------------
# Preprocessing
# ---------------------------------------------------------------------------
def _preprocess_for_htr(img_bgr: np.ndarray) -> np.ndarray:
"""
CLAHE contrast enhancement + upscale to improve HTR accuracy.
Returns grayscale enhanced image.
"""
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)
# Upscale if image is too small
h, w = enhanced.shape
if min(h, w) < 800:
scale = 800 / min(h, w)
enhanced = cv2.resize(
enhanced, None, fx=scale, fy=scale,
interpolation=cv2.INTER_CUBIC
)
return enhanced
def _bgr_to_png_bytes(img_bgr: np.ndarray) -> bytes:
"""Convert BGR ndarray to PNG bytes."""
success, buf = cv2.imencode(".png", img_bgr)
if not success:
raise RuntimeError("Failed to encode image to PNG")
return buf.tobytes()
def _preprocess_image_bytes(image_bytes: bytes) -> bytes:
"""Load image, apply HTR preprocessing, return PNG bytes."""
arr = np.frombuffer(image_bytes, dtype=np.uint8)
img_bgr = cv2.imdecode(arr, cv2.IMREAD_COLOR)
if img_bgr is None:
raise ValueError("Could not decode image")
enhanced = _preprocess_for_htr(img_bgr)
# Convert grayscale back to BGR for encoding
enhanced_bgr = cv2.cvtColor(enhanced, cv2.COLOR_GRAY2BGR)
return _bgr_to_png_bytes(enhanced_bgr)
# ---------------------------------------------------------------------------
# Backend: Ollama qwen2.5vl
# ---------------------------------------------------------------------------
async def _recognize_with_qwen_vl(image_bytes: bytes, language: str) -> Optional[str]:
"""
Send image to Ollama qwen2.5vl:32b for HTR.
Returns extracted text or None on error.
"""
import httpx
lang_hint = {
"de": "Deutsch",
"en": "Englisch",
"de+en": "Deutsch und Englisch",
}.get(language, "Deutsch")
prompt = (
f"Du bist ein OCR-Experte fuer handgeschriebenen Text auf {lang_hint}. "
"Lies den Text im Bild exakt ab — korrigiere KEINE Rechtschreibfehler. "
"Antworte NUR mit dem erkannten Text, ohne Erklaerungen."
)
img_b64 = base64.b64encode(image_bytes).decode("utf-8")
payload = {
"model": OLLAMA_HTR_MODEL,
"prompt": prompt,
"images": [img_b64],
"stream": False,
}
try:
async with httpx.AsyncClient(timeout=120.0) as client:
resp = await client.post(f"{OLLAMA_BASE_URL}/api/generate", json=payload)
resp.raise_for_status()
data = resp.json()
return data.get("response", "").strip()
except Exception as e:
logger.warning(f"Ollama qwen2.5vl HTR failed: {e}")
return None
# ---------------------------------------------------------------------------
# Backend: TrOCR-large fallback
# ---------------------------------------------------------------------------
async def _recognize_with_trocr_large(image_bytes: bytes) -> Optional[str]:
"""
Use microsoft/trocr-large-handwritten via trocr_service.py.
Returns extracted text or None on error.
"""
try:
from services.trocr_service import run_trocr_ocr, _check_trocr_available
if not _check_trocr_available():
logger.warning("TrOCR not available for HTR fallback")
return None
text, confidence = await run_trocr_ocr(image_bytes, handwritten=True, size="large")
return text.strip() if text else None
except Exception as e:
logger.warning(f"TrOCR-large HTR failed: {e}")
return None
# ---------------------------------------------------------------------------
# Core recognition logic
# ---------------------------------------------------------------------------
async def _do_recognize(
image_bytes: bytes,
model: str = "auto",
preprocess: bool = True,
language: str = "de",
) -> dict:
"""
Core HTR logic: preprocess → try Ollama → fallback to TrOCR-large.
Returns dict with text, model_used, processing_time_ms.
"""
t0 = time.monotonic()
if preprocess:
try:
image_bytes = _preprocess_image_bytes(image_bytes)
except Exception as e:
logger.warning(f"HTR preprocessing failed, using raw image: {e}")
text: Optional[str] = None
model_used: str = "none"
use_qwen = model in ("auto", "qwen2.5vl")
use_trocr = model in ("auto", "trocr-large") or (use_qwen and text is None)
if use_qwen:
text = await _recognize_with_qwen_vl(image_bytes, language)
if text is not None:
model_used = f"qwen2.5vl ({OLLAMA_HTR_MODEL})"
if text is None and (use_trocr or model == "trocr-large"):
text = await _recognize_with_trocr_large(image_bytes)
if text is not None:
model_used = "trocr-large-handwritten"
if text is None:
text = ""
model_used = "none (all backends failed)"
elapsed_ms = int((time.monotonic() - t0) * 1000)
return {
"text": text,
"model_used": model_used,
"processing_time_ms": elapsed_ms,
"language": language,
"preprocessed": preprocess,
}
# ---------------------------------------------------------------------------
# Endpoints
# ---------------------------------------------------------------------------
@router.post("/recognize")
async def recognize_handwriting(
file: UploadFile = File(...),
model: str = Query("auto", description="auto | qwen2.5vl | trocr-large"),
preprocess: bool = Query(True, description="Apply CLAHE + upscale before recognition"),
language: str = Query("de", description="de | en | de+en"),
):
"""
Upload an image and get back the handwritten text as plain text.
Tries qwen2.5vl:32b via Ollama first, falls back to TrOCR-large-handwritten.
"""
if model not in ("auto", "qwen2.5vl", "trocr-large"):
raise HTTPException(status_code=400, detail="model must be one of: auto, qwen2.5vl, trocr-large")
if language not in ("de", "en", "de+en"):
raise HTTPException(status_code=400, detail="language must be one of: de, en, de+en")
image_bytes = await file.read()
if not image_bytes:
raise HTTPException(status_code=400, detail="Empty file")
return await _do_recognize(image_bytes, model=model, preprocess=preprocess, language=language)
@router.post("/recognize-session")
async def recognize_from_session(req: HTRSessionRequest):
"""
Use an OCR-Pipeline session as image source for HTR.
Set use_clean=true to prefer the clean image (after handwriting removal step).
This is useful when you want to do HTR on isolated handwriting regions.
"""
from ocr_pipeline_session_store import get_session_db, get_session_image
session = await get_session_db(req.session_id)
if not session:
raise HTTPException(status_code=404, detail=f"Session {req.session_id} not found")
# Choose source image
image_bytes: Optional[bytes] = None
source_used: str = ""
if req.use_clean:
image_bytes = await get_session_image(req.session_id, "clean")
if image_bytes:
source_used = "clean"
if not image_bytes:
image_bytes = await get_session_image(req.session_id, "deskewed")
if image_bytes:
source_used = "deskewed"
if not image_bytes:
image_bytes = await get_session_image(req.session_id, "original")
source_used = "original"
if not image_bytes:
raise HTTPException(status_code=404, detail="No image available in session")
result = await _do_recognize(image_bytes, model=req.model)
result["session_id"] = req.session_id
result["source_image"] = source_used
return result

View File

@@ -42,12 +42,6 @@ try:
except ImportError:
trocr_router = None
from vocab_worksheet_api import router as vocab_router, set_db_pool as set_vocab_db_pool, _init_vocab_table, _load_all_sessions, DATABASE_URL as VOCAB_DATABASE_URL
from ocr_pipeline_api import router as ocr_pipeline_router
from ocr_pipeline_session_store import init_ocr_pipeline_tables
try:
from handwriting_htr_api import router as htr_router
except ImportError:
htr_router = None
try:
from dsfa_rag_api import router as dsfa_rag_router, set_db_pool as set_dsfa_db_pool
from dsfa_corpus_ingestion import DSFAQdrantService, DATABASE_URL as DSFA_DATABASE_URL
@@ -81,13 +75,6 @@ async def lifespan(app: FastAPI):
except Exception as e:
print(f"Warning: Vocab sessions database initialization failed: {e}")
# Initialize OCR Pipeline session tables
try:
await init_ocr_pipeline_tables()
print("OCR Pipeline session tables initialized")
except Exception as e:
print(f"Warning: OCR Pipeline tables initialization failed: {e}")
# Initialize database pool for DSFA RAG
dsfa_db_pool = None
if DSFA_DATABASE_URL and set_dsfa_db_pool:
@@ -117,19 +104,6 @@ async def lifespan(app: FastAPI):
# Ensure EH upload directory exists
os.makedirs(EH_UPLOAD_DIR, exist_ok=True)
# Preload LightOnOCR model if OCR_ENGINE=lighton (avoids cold-start on first request)
ocr_engine_env = os.getenv("OCR_ENGINE", "auto")
if ocr_engine_env == "lighton":
try:
import asyncio
from services.lighton_ocr_service import get_lighton_model
loop = asyncio.get_event_loop()
print("Preloading LightOnOCR-2-1B at startup (OCR_ENGINE=lighton)...")
await loop.run_in_executor(None, get_lighton_model)
print("LightOnOCR-2-1B preloaded")
except Exception as e:
print(f"Warning: LightOnOCR preload failed: {e}")
yield
print("Klausur-Service shutting down...")
@@ -176,9 +150,6 @@ app.include_router(mail_router) # Unified Inbox Mail
if trocr_router:
app.include_router(trocr_router) # TrOCR Handwriting OCR
app.include_router(vocab_router) # Vocabulary Worksheet Generator
app.include_router(ocr_pipeline_router) # OCR Pipeline (step-by-step)
if htr_router:
app.include_router(htr_router) # Handwriting HTR (Klausur)
if dsfa_rag_router:
app.include_router(dsfa_rag_router) # DSFA RAG Corpus Search

View File

@@ -1,28 +0,0 @@
-- OCR Pipeline Sessions - Persistent session storage
-- Applied automatically by ocr_pipeline_session_store.init_ocr_pipeline_tables()
CREATE TABLE IF NOT EXISTS ocr_pipeline_sessions (
id UUID PRIMARY KEY,
name VARCHAR(255) NOT NULL,
filename VARCHAR(255),
status VARCHAR(50) DEFAULT 'active',
current_step INT DEFAULT 1,
original_png BYTEA,
deskewed_png BYTEA,
binarized_png BYTEA,
dewarped_png BYTEA,
deskew_result JSONB,
dewarp_result JSONB,
column_result JSONB,
ground_truth JSONB DEFAULT '{}',
auto_shear_degrees FLOAT,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Index for listing sessions
CREATE INDEX IF NOT EXISTS idx_ocr_pipeline_sessions_created
ON ocr_pipeline_sessions (created_at DESC);
CREATE INDEX IF NOT EXISTS idx_ocr_pipeline_sessions_status
ON ocr_pipeline_sessions (status);

View File

@@ -1,4 +0,0 @@
-- Migration 003: Add row_result column for row geometry detection
-- Stores detected row geometries including header/footer classification
ALTER TABLE ocr_pipeline_sessions ADD COLUMN IF NOT EXISTS row_result JSONB;

View File

@@ -1,4 +0,0 @@
-- Migration 004: Add word_result column for OCR Pipeline Step 5
-- Stores the word recognition grid result (entries with english/german/example + bboxes)
ALTER TABLE ocr_pipeline_sessions ADD COLUMN IF NOT EXISTS word_result JSONB;

View File

@@ -1,7 +0,0 @@
-- Migration 005: Add document type detection columns
-- These columns store the result of automatic document type detection
-- (vocab_table, full_text, generic_table) after dewarp.
ALTER TABLE ocr_pipeline_sessions
ADD COLUMN IF NOT EXISTS doc_type VARCHAR(50),
ADD COLUMN IF NOT EXISTS doc_type_result JSONB;

File diff suppressed because it is too large Load Diff

View File

@@ -1,243 +0,0 @@
"""
OCR Pipeline Session Store - PostgreSQL persistence for OCR pipeline sessions.
Replaces in-memory storage with database persistence.
See migrations/002_ocr_pipeline_sessions.sql for schema.
"""
import os
import uuid
import logging
import json
from typing import Optional, List, Dict, Any
import asyncpg
logger = logging.getLogger(__name__)
# Database configuration (same as vocab_session_store)
DATABASE_URL = os.getenv(
"DATABASE_URL",
"postgresql://breakpilot:breakpilot@postgres:5432/breakpilot_db"
)
# Connection pool (initialized lazily)
_pool: Optional[asyncpg.Pool] = None
async def get_pool() -> asyncpg.Pool:
"""Get or create the database connection pool."""
global _pool
if _pool is None:
_pool = await asyncpg.create_pool(DATABASE_URL, min_size=2, max_size=10)
return _pool
async def init_ocr_pipeline_tables():
"""Initialize OCR pipeline tables if they don't exist."""
pool = await get_pool()
async with pool.acquire() as conn:
tables_exist = await conn.fetchval("""
SELECT EXISTS (
SELECT FROM information_schema.tables
WHERE table_name = 'ocr_pipeline_sessions'
)
""")
if not tables_exist:
logger.info("Creating OCR pipeline tables...")
migration_path = os.path.join(
os.path.dirname(__file__),
"migrations/002_ocr_pipeline_sessions.sql"
)
if os.path.exists(migration_path):
with open(migration_path, "r") as f:
sql = f.read()
await conn.execute(sql)
logger.info("OCR pipeline tables created successfully")
else:
logger.warning(f"Migration file not found: {migration_path}")
else:
logger.debug("OCR pipeline tables already exist")
# Ensure new columns exist (idempotent ALTER TABLE)
await conn.execute("""
ALTER TABLE ocr_pipeline_sessions
ADD COLUMN IF NOT EXISTS clean_png BYTEA,
ADD COLUMN IF NOT EXISTS handwriting_removal_meta JSONB,
ADD COLUMN IF NOT EXISTS doc_type VARCHAR(50),
ADD COLUMN IF NOT EXISTS doc_type_result JSONB
""")
# =============================================================================
# SESSION CRUD
# =============================================================================
async def create_session_db(
session_id: str,
name: str,
filename: str,
original_png: bytes,
) -> Dict[str, Any]:
"""Create a new OCR pipeline session."""
pool = await get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow("""
INSERT INTO ocr_pipeline_sessions (
id, name, filename, original_png, status, current_step
) VALUES ($1, $2, $3, $4, 'active', 1)
RETURNING id, name, filename, status, current_step,
deskew_result, dewarp_result, column_result, row_result,
word_result, ground_truth, auto_shear_degrees,
doc_type, doc_type_result,
created_at, updated_at
""", uuid.UUID(session_id), name, filename, original_png)
return _row_to_dict(row)
async def get_session_db(session_id: str) -> Optional[Dict[str, Any]]:
"""Get session metadata (without images)."""
pool = await get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT id, name, filename, status, current_step,
deskew_result, dewarp_result, column_result, row_result,
word_result, ground_truth, auto_shear_degrees,
doc_type, doc_type_result,
created_at, updated_at
FROM ocr_pipeline_sessions WHERE id = $1
""", uuid.UUID(session_id))
if row:
return _row_to_dict(row)
return None
async def get_session_image(session_id: str, image_type: str) -> Optional[bytes]:
"""Load a single image (BYTEA) from the session."""
column_map = {
"original": "original_png",
"deskewed": "deskewed_png",
"binarized": "binarized_png",
"dewarped": "dewarped_png",
"clean": "clean_png",
}
column = column_map.get(image_type)
if not column:
return None
pool = await get_pool()
async with pool.acquire() as conn:
return await conn.fetchval(
f"SELECT {column} FROM ocr_pipeline_sessions WHERE id = $1",
uuid.UUID(session_id)
)
async def update_session_db(session_id: str, **kwargs) -> Optional[Dict[str, Any]]:
"""Update session fields dynamically."""
pool = await get_pool()
fields = []
values = []
param_idx = 1
allowed_fields = {
'name', 'filename', 'status', 'current_step',
'original_png', 'deskewed_png', 'binarized_png', 'dewarped_png',
'clean_png', 'handwriting_removal_meta',
'deskew_result', 'dewarp_result', 'column_result', 'row_result',
'word_result', 'ground_truth', 'auto_shear_degrees',
'doc_type', 'doc_type_result',
}
jsonb_fields = {'deskew_result', 'dewarp_result', 'column_result', 'row_result', 'word_result', 'ground_truth', 'handwriting_removal_meta', 'doc_type_result'}
for key, value in kwargs.items():
if key in allowed_fields:
fields.append(f"{key} = ${param_idx}")
if key in jsonb_fields and value is not None and not isinstance(value, str):
value = json.dumps(value)
values.append(value)
param_idx += 1
if not fields:
return await get_session_db(session_id)
# Always update updated_at
fields.append(f"updated_at = NOW()")
values.append(uuid.UUID(session_id))
async with pool.acquire() as conn:
row = await conn.fetchrow(f"""
UPDATE ocr_pipeline_sessions
SET {', '.join(fields)}
WHERE id = ${param_idx}
RETURNING id, name, filename, status, current_step,
deskew_result, dewarp_result, column_result, row_result,
word_result, ground_truth, auto_shear_degrees,
doc_type, doc_type_result,
created_at, updated_at
""", *values)
if row:
return _row_to_dict(row)
return None
async def list_sessions_db(limit: int = 50) -> List[Dict[str, Any]]:
"""List all sessions (metadata only, no images)."""
pool = await get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch("""
SELECT id, name, filename, status, current_step,
created_at, updated_at
FROM ocr_pipeline_sessions
ORDER BY created_at DESC
LIMIT $1
""", limit)
return [_row_to_dict(row) for row in rows]
async def delete_session_db(session_id: str) -> bool:
"""Delete a session."""
pool = await get_pool()
async with pool.acquire() as conn:
result = await conn.execute("""
DELETE FROM ocr_pipeline_sessions WHERE id = $1
""", uuid.UUID(session_id))
return result == "DELETE 1"
# =============================================================================
# HELPER
# =============================================================================
def _row_to_dict(row: asyncpg.Record) -> Dict[str, Any]:
"""Convert asyncpg Record to JSON-serializable dict."""
if row is None:
return {}
result = dict(row)
# UUID → string
for key in ['id', 'session_id']:
if key in result and result[key] is not None:
result[key] = str(result[key])
# datetime → ISO string
for key in ['created_at', 'updated_at']:
if key in result and result[key] is not None:
result[key] = result[key].isoformat()
# JSONB → parsed (asyncpg returns str for JSONB)
for key in ['deskew_result', 'dewarp_result', 'column_result', 'row_result', 'word_result', 'ground_truth', 'doc_type_result']:
if key in result and result[key] is not None:
if isinstance(result[key], str):
result[key] = json.loads(result[key])
return result

View File

@@ -28,16 +28,6 @@ opencv-python-headless>=4.8.0
pytesseract>=0.3.10
Pillow>=10.0.0
# RapidOCR (PaddleOCR models on ONNX Runtime — works on ARM64 natively)
rapidocr
onnxruntime
# IPA pronunciation dictionary lookup (MIT license, bundled CMU dict ~134k words)
eng-to-ipa
# Spell-checker for rule-based OCR correction (MIT license)
pyspellchecker>=0.8.1
# PostgreSQL (for metrics storage)
psycopg2-binary>=2.9.0
asyncpg>=0.29.0
@@ -45,9 +35,6 @@ asyncpg>=0.29.0
# Email validation for Pydantic
email-validator>=2.0.0
# DOCX export for reconstruction editor (MIT license)
python-docx>=1.1.0
# Testing
pytest>=8.0.0
pytest-asyncio>=0.23.0

View File

@@ -6,7 +6,6 @@ Uses multiple detection methods:
1. Color-based detection (blue/red ink)
2. Stroke analysis (thin irregular strokes)
3. Edge density variance
4. Pencil detection (gray ink)
DATENSCHUTZ: All processing happens locally on Mac Mini.
"""
@@ -38,16 +37,12 @@ class DetectionResult:
detection_method: str # Which method was primarily used
def detect_handwriting(image_bytes: bytes, target_ink: str = "all") -> DetectionResult:
def detect_handwriting(image_bytes: bytes) -> DetectionResult:
"""
Detect handwriting in an image.
Args:
image_bytes: Image as bytes (PNG, JPG, etc.)
target_ink: Which ink types to detect:
- "all" → all methods combined (incl. pencil)
- "colored" → only color-based (blue/red/green pen)
- "pencil" → only pencil (gray ink)
Returns:
DetectionResult with binary mask where handwriting is white (255)
@@ -67,51 +62,35 @@ def detect_handwriting(image_bytes: bytes, target_ink: str = "all") -> Detection
# Convert to BGR if needed (OpenCV format)
if len(img_array.shape) == 2:
# Grayscale to BGR
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_GRAY2BGR)
elif img_array.shape[2] == 4:
# RGBA to BGR
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGBA2BGR)
elif img_array.shape[2] == 3:
# RGB to BGR
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
else:
img_bgr = img_array
# Select detection methods based on target_ink
masks_and_weights = []
if target_ink in ("all", "colored"):
color_mask, color_conf = _detect_by_color(img_bgr)
masks_and_weights.append((color_mask, color_conf, "color"))
if target_ink == "all":
stroke_mask, stroke_conf = _detect_by_stroke_analysis(img_bgr)
variance_mask, variance_conf = _detect_by_variance(img_bgr)
masks_and_weights.append((stroke_mask, stroke_conf, "stroke"))
masks_and_weights.append((variance_mask, variance_conf, "variance"))
if target_ink in ("all", "pencil"):
pencil_mask, pencil_conf = _detect_pencil(img_bgr)
masks_and_weights.append((pencil_mask, pencil_conf, "pencil"))
if not masks_and_weights:
# Fallback: use all methods
color_mask, color_conf = _detect_by_color(img_bgr)
stroke_mask, stroke_conf = _detect_by_stroke_analysis(img_bgr)
variance_mask, variance_conf = _detect_by_variance(img_bgr)
pencil_mask, pencil_conf = _detect_pencil(img_bgr)
masks_and_weights = [
(color_mask, color_conf, "color"),
(stroke_mask, stroke_conf, "stroke"),
(variance_mask, variance_conf, "variance"),
(pencil_mask, pencil_conf, "pencil"),
]
# Run multiple detection methods
color_mask, color_confidence = _detect_by_color(img_bgr)
stroke_mask, stroke_confidence = _detect_by_stroke_analysis(img_bgr)
variance_mask, variance_confidence = _detect_by_variance(img_bgr)
# Combine masks using weighted average
total_weight = sum(w for _, w, _ in masks_and_weights)
weights = [color_confidence, stroke_confidence, variance_confidence]
total_weight = sum(weights)
if total_weight > 0:
combined_mask = sum(
m.astype(np.float32) * w for m, w, _ in masks_and_weights
# Weighted combination
combined_mask = (
color_mask.astype(np.float32) * color_confidence +
stroke_mask.astype(np.float32) * stroke_confidence +
variance_mask.astype(np.float32) * variance_confidence
) / total_weight
# Threshold to binary
combined_mask = (combined_mask > 127).astype(np.uint8) * 255
else:
combined_mask = np.zeros(img_bgr.shape[:2], dtype=np.uint8)
@@ -124,11 +103,19 @@ def detect_handwriting(image_bytes: bytes, target_ink: str = "all") -> Detection
handwriting_pixels = np.sum(combined_mask > 0)
handwriting_ratio = handwriting_pixels / total_pixels if total_pixels > 0 else 0
# Determine primary method (highest confidence)
primary_method = max(masks_and_weights, key=lambda x: x[1])[2] if masks_and_weights else "combined"
overall_confidence = total_weight / len(masks_and_weights) if masks_and_weights else 0.0
# Determine primary method
primary_method = "combined"
max_conf = max(color_confidence, stroke_confidence, variance_confidence)
if max_conf == color_confidence:
primary_method = "color"
elif max_conf == stroke_confidence:
primary_method = "stroke"
else:
primary_method = "variance"
logger.info(f"Handwriting detection (target_ink={target_ink}): {handwriting_ratio:.2%} handwriting, "
overall_confidence = total_weight / 3.0 # Average confidence
logger.info(f"Handwriting detection: {handwriting_ratio:.2%} handwriting, "
f"confidence={overall_confidence:.2f}, method={primary_method}")
return DetectionResult(
@@ -193,27 +180,6 @@ def _detect_by_color(img_bgr: np.ndarray) -> Tuple[np.ndarray, float]:
return color_mask, confidence
def _detect_pencil(img_bgr: np.ndarray) -> Tuple[np.ndarray, float]:
"""
Detect pencil marks (gray ink, ~140-220 on 255-scale).
Paper is usually >230, dark ink <130.
Pencil falls in the 140-220 gray range.
"""
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
pencil_mask = cv2.inRange(gray, 140, 220)
# Remove small noise artifacts
kernel = np.ones((2, 2), np.uint8)
pencil_mask = cv2.morphologyEx(pencil_mask, cv2.MORPH_OPEN, kernel, iterations=1)
ratio = np.sum(pencil_mask > 0) / pencil_mask.size
# Good confidence if pencil pixels are in a plausible range
confidence = 0.75 if 0.002 < ratio < 0.2 else 0.2
return pencil_mask, confidence
def _detect_by_stroke_analysis(img_bgr: np.ndarray) -> Tuple[np.ndarray, float]:
"""
Detect handwriting by analyzing stroke characteristics.

View File

@@ -350,77 +350,6 @@ def layout_to_fabric_json(layout_result: LayoutResult) -> str:
return json.dumps(layout_result.fabric_json, ensure_ascii=False, indent=2)
def cells_to_fabric_json(
cells: List[Dict[str, Any]],
image_width: int,
image_height: int,
) -> Dict[str, Any]:
"""Convert pipeline grid cells to Fabric.js-compatible JSON.
Each cell becomes a Textbox object positioned at its bbox_pct coordinates
(converted to pixels). Colour-coded by column type.
Args:
cells: List of cell dicts from GridResult (with bbox_pct, col_type, text).
image_width: Source image width in pixels.
image_height: Source image height in pixels.
Returns:
Dict with Fabric.js canvas JSON (version + objects array).
"""
COL_TYPE_COLORS = {
'column_en': '#3b82f6',
'column_de': '#22c55e',
'column_example': '#f97316',
'column_text': '#a855f7',
'page_ref': '#06b6d4',
'column_marker': '#6b7280',
}
fabric_objects = []
for cell in cells:
bp = cell.get('bbox_pct', {})
x = bp.get('x', 0) / 100 * image_width
y = bp.get('y', 0) / 100 * image_height
w = bp.get('w', 10) / 100 * image_width
h = bp.get('h', 3) / 100 * image_height
col_type = cell.get('col_type', '')
color = COL_TYPE_COLORS.get(col_type, '#6b7280')
font_size = max(8, min(18, h * 0.55))
fabric_objects.append({
"type": "textbox",
"version": "6.0.0",
"originX": "left",
"originY": "top",
"left": round(x, 1),
"top": round(y, 1),
"width": max(round(w, 1), 30),
"height": round(h, 1),
"fill": "#000000",
"stroke": color,
"strokeWidth": 1,
"text": cell.get('text', ''),
"fontSize": round(font_size, 1),
"fontFamily": "monospace",
"editable": True,
"selectable": True,
"backgroundColor": color + "22",
"data": {
"cellId": cell.get('cell_id', ''),
"colType": col_type,
"rowIndex": cell.get('row_index', 0),
"colIndex": cell.get('col_index', 0),
"originalText": cell.get('text', ''),
},
})
return {
"version": "6.0.0",
"objects": fabric_objects,
}
def reconstruct_and_clean(
image_bytes: bytes,
remove_handwriting: bool = True

View File

@@ -31,10 +31,8 @@ from datetime import datetime, timedelta
logger = logging.getLogger(__name__)
# Lazy loading for heavy dependencies
# Cache keyed by model_name to support base and large variants simultaneously
_trocr_models: dict = {} # {model_name: (processor, model)}
_trocr_processor = None # backwards-compat alias → base-printed
_trocr_model = None # backwards-compat alias → base-printed
_trocr_processor = None
_trocr_model = None
_trocr_available = None
_model_loaded_at = None
@@ -126,14 +124,12 @@ def _check_trocr_available() -> bool:
return _trocr_available
def get_trocr_model(handwritten: bool = False, size: str = "base"):
def get_trocr_model(handwritten: bool = False):
"""
Lazy load TrOCR model and processor.
Args:
handwritten: Use handwritten model instead of printed model
size: Model size — "base" (300 MB) or "large" (340 MB, higher accuracy
for exam HTR). Only applies to handwritten variant.
Returns tuple of (processor, model) or (None, None) if unavailable.
"""
@@ -142,42 +138,31 @@ def get_trocr_model(handwritten: bool = False, size: str = "base"):
if not _check_trocr_available():
return None, None
# Select model name
if size == "large" and handwritten:
model_name = "microsoft/trocr-large-handwritten"
elif handwritten:
model_name = "microsoft/trocr-base-handwritten"
else:
model_name = "microsoft/trocr-base-printed"
if _trocr_processor is None or _trocr_model is None:
try:
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
if model_name in _trocr_models:
return _trocr_models[model_name]
# Choose model based on use case
if handwritten:
model_name = "microsoft/trocr-base-handwritten"
else:
model_name = "microsoft/trocr-base-printed"
try:
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
logger.info(f"Loading TrOCR model: {model_name}")
_trocr_processor = TrOCRProcessor.from_pretrained(model_name)
_trocr_model = VisionEncoderDecoderModel.from_pretrained(model_name)
logger.info(f"Loading TrOCR model: {model_name}")
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)
# Use GPU if available
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
_trocr_model.to(device)
logger.info(f"TrOCR model loaded on device: {device}")
# Use GPU if available
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
model.to(device)
logger.info(f"TrOCR model loaded on device: {device}")
except Exception as e:
logger.error(f"Failed to load TrOCR model: {e}")
return None, None
_trocr_models[model_name] = (processor, model)
# Keep backwards-compat globals pointing at base-printed
if model_name == "microsoft/trocr-base-printed":
_trocr_processor = processor
_trocr_model = model
return processor, model
except Exception as e:
logger.error(f"Failed to load TrOCR model {model_name}: {e}")
return None, None
return _trocr_processor, _trocr_model
def preload_trocr_model(handwritten: bool = True) -> bool:
@@ -224,8 +209,7 @@ def get_model_status() -> Dict[str, Any]:
async def run_trocr_ocr(
image_data: bytes,
handwritten: bool = False,
split_lines: bool = True,
size: str = "base",
split_lines: bool = True
) -> Tuple[Optional[str], float]:
"""
Run TrOCR on an image.
@@ -239,12 +223,11 @@ async def run_trocr_ocr(
image_data: Raw image bytes
handwritten: Use handwritten model (slower but better for handwriting)
split_lines: Whether to split image into lines first
size: "base" or "large" (only for handwritten variant)
Returns:
Tuple of (extracted_text, confidence)
"""
processor, model = get_trocr_model(handwritten=handwritten, size=size)
processor, model = get_trocr_model(handwritten=handwritten)
if processor is None or model is None:
logger.error("TrOCR model not available")

File diff suppressed because it is too large Load Diff

View File

@@ -65,12 +65,10 @@ nav:
- BYOEH Architektur: services/klausur-service/BYOEH-Architecture.md
- BYOEH Developer Guide: services/klausur-service/BYOEH-Developer-Guide.md
- NiBiS Pipeline: services/klausur-service/NiBiS-Ingestion-Pipeline.md
- OCR Pipeline: services/klausur-service/OCR-Pipeline.md
- OCR Labeling: services/klausur-service/OCR-Labeling-Spec.md
- OCR Vergleich: services/klausur-service/OCR-Compare.md
- RAG Admin: services/klausur-service/RAG-Admin-Spec.md
- Worksheet Editor: services/klausur-service/Worksheet-Editor-Architecture.md
- Chunk-Browser: services/klausur-service/Chunk-Browser.md
- Voice-Service:
- Uebersicht: services/voice-service/index.md
- Agent-Core:

View File

@@ -31,8 +31,8 @@ WORKDIR /app
ENV NODE_ENV=production
# Create non-root user
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
RUN addgroup -S -g 1001 nodejs
RUN adduser -S -u 1001 -G nodejs nextjs
# Copy built application
COPY --from=builder /app/public ./public

View File

@@ -34,8 +34,8 @@ WORKDIR /app
ENV NODE_ENV=production
# Create non-root user
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
RUN addgroup -S -g 1001 nodejs
RUN adduser -S -u 1001 -G nodejs nextjs
# Copy built assets
COPY --from=builder /app/public ./public