Compare commits

135 Commits

Author SHA1 Message Date
Benjamin Admin
37db47fcd9 [guardrail-change] Install pre-commit LOC budget hook
Some checks are pending
CI / test-go-edu-search (push) Waiting to run
CI / test-python-klausur (push) Waiting to run
CI / test-python-agent-core (push) Waiting to run
CI / test-nodejs-website (push) Waiting to run
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
- Rewrote scripts/check-loc.sh: fixed macOS compat, added --staged mode,
  optimized --all mode with find+wc pipeline
- Added .git/hooks/pre-commit that runs check-loc.sh --staged
- Extended loc-exceptions.txt with glob patterns for test files (*test*)
  and blog content pages (blog/*/page.tsx)

The hook blocks commits containing staged files >500 LOC unless exempted.
Bypass for emergencies: git commit --no-verify
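
A minimal Python sketch of the budget check described above. The real logic lives in scripts/check-loc.sh and is not shown in this log; the names `is_exempt` and `over_budget`, and the fnmatch-style glob semantics, are assumptions made for illustration.

```python
from fnmatch import fnmatch

LOC_LIMIT = 500  # hard cap stated in the commit message


def is_exempt(path, patterns):
    """Return True if the path matches any exception glob (assumed fnmatch semantics)."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def over_budget(files, patterns, limit=LOC_LIMIT):
    """files maps path -> line count; return the sorted paths that would block a commit."""
    return sorted(
        path for path, loc in files.items()
        if loc > limit and not is_exempt(path, patterns)
    )
```

A hook built this way would exit non-zero whenever `over_budget` returns a non-empty list for the staged files.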

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 10:28:33 +02:00
Benjamin Admin
bd4b956e3c [split-required] Split final 43 files (500-668 LOC) to complete refactoring
klausur-service (11 files):
- cv_gutter_repair, ocr_pipeline_regression, upload_api
- ocr_pipeline_sessions, smart_spell, nru_worksheet_generator
- ocr_pipeline_overlays, mail/aggregator, zeugnis_api
- cv_syllable_detect, self_rag

backend-lehrer (17 files):
- classroom_engine/suggestions, generators/quiz_generator
- worksheets_api, llm_gateway/comparison, state_engine_api
- classroom/models (→ 4 submodules), services/file_processor
- alerts_agent/api/wizard+digests+routes, content_generators/pdf
- classroom/routes/sessions, llm_gateway/inference
- classroom_engine/analytics, auth/keycloak_auth
- alerts_agent/processing/rule_engine, ai_processor/print_versions

agent-core (5 files):
- brain/memory_store, brain/knowledge_graph, brain/context_manager
- orchestrator/supervisor, sessions/session_manager

admin-lehrer (5 components):
- GridOverlay, StepGridReview, DevOpsPipelineSidebar
- DataFlowDiagram, sbom/wizard/page

website (2 files):
- DependencyMap, lehrer/abitur-archiv

Other: nibis_ingestion, grid_detection_service, export-doclayout-onnx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:41:42 +02:00
Benjamin Admin
451365a312 [split-required] Split remaining 500-680 LOC files (final batch)
website (17 pages + 3 components):
- multiplayer/wizard, middleware/wizard+test-wizard, communication
- builds/wizard, staff-search, voice, sbom/wizard
- foerderantrag, mail/tasks, tools/communication, sbom
- compliance/evidence, uni-crawler, brandbook (already done)
- CollectionsTab, IngestionTab, RiskHeatmap

backend-lehrer (5 files):
- letters_api (641 → 2), certificates_api (636 → 2)
- alerts_agent/db/models (636 → 3)
- llm_gateway/communication_service (614 → 2)
- game/database already done in prior batch

klausur-service (2 files):
- hybrid_vocab_extractor (664 → 2)
- klausur-service/frontend: api.ts (620 → 3), EHUploadWizard (591 → 2)

voice-service (3 files):
- bqas/rag_judge (618 → 3), runner (529 → 2)
- enhanced_task_orchestrator (519 → 2)

studio-v2 (6 files):
- korrektur/[klausurId] (578 → 4), fairness (569 → 2)
- AlertsWizard (552 → 2), OnboardingWizard (513 → 2)
- korrektur/api.ts (506 → 3), geo-lernwelt (501 → 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 08:56:45 +02:00
Benjamin Admin
b4613e26f3 [split-required] Split 500-850 LOC files (batch 2)
backend-lehrer (10 files):
- game/database.py (785 → 5), correction_api.py (683 → 4)
- classroom_engine/antizipation.py (676 → 5)
- llm_gateway schools/edu_search already done in prior batch

klausur-service (12 files):
- orientation_crop_api.py (694 → 5), pdf_export.py (677 → 4)
- zeugnis_crawler.py (676 → 5), grid_editor_api.py (671 → 5)
- eh_templates.py (658 → 5), mail/api.py (651 → 5)
- qdrant_service.py (638 → 5), training_api.py (625 → 4)

website (6 pages):
- middleware (696 → 8), mail (733 → 6), consent (628 → 8)
- compliance/risks (622 → 5), export (502 → 5), brandbook (629 → 7)

studio-v2 (3 components):
- B2BMigrationWizard (848 → 3), CleanupPanel (765 → 2)
- dashboard-experimental (739 → 2)

admin-lehrer (4 files):
- uebersetzungen (769 → 4), manager (670 → 2)
- ChunkBrowserQA (675 → 6), dsfa/page (674 → 5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 08:24:01 +02:00
Benjamin Admin
34da9f4cda [split-required] Split 700-870 LOC files across all services
backend-lehrer (11 files):
- llm_gateway/routes/schools.py (867 → 5), recording_api.py (848 → 6)
- messenger_api.py (840 → 5), print_generator.py (824 → 5)
- unit_analytics_api.py (751 → 5), classroom/routes/context.py (726 → 4)
- llm_gateway/routes/edu_search_seeds.py (710 → 4)

klausur-service (12 files):
- ocr_labeling_api.py (845 → 4), metrics_db.py (833 → 4)
- legal_corpus_api.py (790 → 4), page_crop.py (758 → 3)
- mail/ai_service.py (747 → 4), github_crawler.py (767 → 3)
- trocr_service.py (730 → 4), full_compliance_pipeline.py (723 → 4)
- dsfa_rag_api.py (715 → 4), ocr_pipeline_auto.py (705 → 4)

website (6 pages):
- audit-checklist (867 → 8), content (806 → 6)
- screen-flow (790 → 4), scraper (789 → 5)
- zeugnisse (776 → 5), modules (745 → 4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 08:01:18 +02:00
Benjamin Admin
b6983ab1dc [split-required] Split 500-1000 LOC files across all services
backend-lehrer (5 files):
- alerts_agent/db/repository.py (992 → 5), abitur_docs_api.py (956 → 3)
- teacher_dashboard_api.py (951 → 3), services/pdf_service.py (916 → 3)
- mail/mail_db.py (987 → 6)

klausur-service (5 files):
- legal_templates_ingestion.py (942 → 3), ocr_pipeline_postprocess.py (929 → 4)
- ocr_pipeline_words.py (876 → 3), ocr_pipeline_ocr_merge.py (616 → 2)
- KorrekturPage.tsx (956 → 6)

website (5 pages):
- mail (985 → 9), edu-search (958 → 8), mac-mini (950 → 7)
- ocr-labeling (946 → 7), audit-workspace (871 → 4)

studio-v2 (5 files + 1 deleted):
- page.tsx (946 → 5), MessagesContext.tsx (925 → 4)
- korrektur (914 → 6), worksheet-cleanup (899 → 6)
- useVocabWorksheet.ts (888 → 3)
- Deleted dead page-original.tsx (934 LOC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:35:37 +02:00
Benjamin Admin
6811264756 [split-required] Split final batch of monoliths >1000 LOC
Python (6 files in klausur-service):
- rbac.py (1,132 → 4), admin_api.py (1,012 → 4)
- routes/eh.py (1,111 → 4), ocr_pipeline_geometry.py (1,105 → 5)

Python (2 files in backend-lehrer):
- unit_api.py (1,226 → 6), game_api.py (1,129 → 5)

Website (6 page files):
- 4x klausur-korrektur pages (1,249-1,328 LOC each) → shared components
  in website/components/klausur-korrektur/ (17 shared files)
- companion (1,057 → 10), magic-help (1,017 → 8)

All re-export barrels preserve backward compatibility.
Zero import errors verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:17:30 +02:00
Benjamin Admin
b2a0126f14 [split-required] Split remaining Python monoliths (Phase 1 continued)
klausur-service (7 monoliths):
- grid_editor_helpers.py (1,737 → 5 files: columns, filters, headers, zones)
- cv_cell_grid.py (1,675 → 7 files: build, legacy, streaming, merge, vocab)
- worksheet_editor_api.py (1,305 → 4 files: models, AI, reconstruct, routes)
- legal_corpus_ingestion.py (1,280 → 3 files: registry, chunking, ingestion)
- cv_review.py (1,248 → 4 files: pipeline, spell, LLM, barrel)
- cv_preprocessing.py (1,166 → 3 files: deskew, dewarp, barrel)
- rbac.py, admin_api.py, routes/eh.py remain (next batch)

backend-lehrer (1 monolith):
- classroom_engine/repository.py (1,705 → 7 files by domain)

All re-export barrels preserve backward compatibility.
Zero import errors verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:47:59 +02:00
Benjamin Admin
0b37c5e692 [split-required] Split website + studio-v2 monoliths (Phase 3 continued)
Website (14 monoliths split):
- compliance/page.tsx (1,519 → 9), docs/audit (1,262 → 20)
- quality (1,231 → 16), alerts (1,203 → 10), docs (1,202 → 11)
- i18n.ts (1,173 → 8 language files)
- unity-bridge (1,094 → 12), backlog (1,087 → 6)
- training (1,066 → 8), rag (1,063 → 8)
- Deleted index_original.ts (4,899 LOC dead backup)

Studio-v2 (5 monoliths split):
- meet/page.tsx (1,481 → 9), messages (1,166 → 9)
- AlertsB2BContext.tsx (1,165 → 5 modules)
- alerts-b2b/page.tsx (1,019 → 6), korrektur/archiv (1,001 → 6)

All existing imports preserved. Zero new TypeScript errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 17:52:36 +02:00
Benjamin Admin
b681ddb131 [split-required] Split 58 monoliths across Python, Go, TypeScript (Phases 1-3)
Phase 1 — Python (klausur-service): 5 monoliths → 36 files
- dsfa_corpus_ingestion.py (1,828 LOC → 5 files)
- cv_ocr_engines.py (2,102 LOC → 7 files)
- cv_layout.py (3,653 LOC → 10 files)
- vocab_worksheet_api.py (2,783 LOC → 8 files)
- grid_build_core.py (1,958 LOC → 6 files)

Phase 2 — Go (edu-search-service, school-service): 8 monoliths → 19 files
- staff_crawler.go (1,402 → 4), policy/store.go (1,168 → 3)
- policy_handlers.go (700 → 2), repository.go (684 → 2)
- search.go (592 → 2), ai_extraction_handlers.go (554 → 2)
- seed_data.go (591 → 2), grade_service.go (646 → 2)

Phase 3 — TypeScript (admin-lehrer): 45 monoliths → 220+ files
- sdk/types.ts (2,108 → 16 domain files)
- ai/rag/page.tsx (2,686 → 14 files)
- 22 page.tsx files split into _components/ + _hooks/
- 11 component files split into sub-components
- 10 SDK data catalogs added to loc-exceptions
- Deleted dead backup index_original.ts (4,899 LOC)

All original public APIs preserved via re-export facades.
Zero new errors: Python imports verified, Go builds clean,
TypeScript tsc --noEmit shows only pre-existing errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 17:28:57 +02:00
Benjamin Admin
9ba420fa91 Fix: Remove broken getKlausurApiUrl and clean up empty lines
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 34s
CI / test-python-klausur (push) Failing after 2m51s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 29s
A sed replacement left orphaned hostname references in the story page
and empty lines in the getApiBase functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 16:02:04 +02:00
Benjamin Admin
b07f802c24 Fix: Use Next.js API proxy to avoid mixed-content/CORS errors
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 54s
CI / test-go-edu-search (push) Successful in 53s
CI / test-python-klausur (push) Failing after 2m57s
CI / test-python-agent-core (push) Successful in 43s
CI / test-nodejs-website (push) Successful in 46s
HTTPS pages cannot fetch from HTTP backend ports. Added Next.js
API route proxies for /api/vocabulary, /api/learning-units, /api/progress
that forward to backend-lehrer internally (same Docker network, HTTP).

All frontend pages now use same-origin requests (getApiBase = '')
instead of direct port:8001 connections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 15:49:52 +02:00
Benjamin Admin
0dbfa87058 Fix: pg_trgm optional, table creation no longer fails without it
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 1m9s
CI / test-go-edu-search (push) Successful in 1m4s
CI / test-python-klausur (push) Failing after 2m59s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 28s
The trigram extension and index are now created in a separate try/except
block, so table creation succeeds even without pg_trgm. Search falls back
to ILIKE when trigram functions are not available.
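
The fallback can be sketched as a query builder. This is an illustration only: the table name `vocabulary_words`, the column `english`, and the function name `build_search_query` are hypothetical, and the actual SQL in the repository is not shown in this log. The `%` operator and `similarity()` are provided by pg_trgm; `$1` is the asyncpg placeholder style.

```python
def build_search_query(term, trigram_available):
    """Return (sql, params) for a word search, with an ILIKE fallback."""
    if trigram_available:
        # pg_trgm path: fuzzy match, best matches first.
        sql = (
            "SELECT * FROM vocabulary_words WHERE english % $1 "
            "ORDER BY similarity(english, $1) DESC"
        )
        return sql, [term]
    # Fallback path: plain substring match. Escape LIKE wildcards so the
    # user's input is treated literally (backslash is the default escape).
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "SELECT * FROM vocabulary_words WHERE english ILIKE $1", [f"%{escaped}%"]
```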

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 13:51:09 +02:00
Benjamin Admin
c0b723e3b5 Fix: asyncpg needs postgresql:// not postgresql+asyncpg://
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 49s
CI / test-go-edu-search (push) Successful in 1m1s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 42s
CI / test-nodejs-website (push) Has been cancelled
Strip SQLAlchemy dialect prefix from DATABASE_URL for asyncpg.
Set search_path via server_settings on pool creation.
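
A minimal sketch of the prefix stripping, assuming a helper named `to_asyncpg_dsn` (hypothetical; the actual function in the repository is not shown here).

```python
def to_asyncpg_dsn(database_url):
    """Drop a SQLAlchemy driver suffix such as '+asyncpg' from the URL scheme."""
    scheme, sep, rest = database_url.partition("://")
    if not sep:
        return database_url  # not a URL with a scheme, leave untouched
    base_scheme = scheme.split("+", 1)[0]
    return f"{base_scheme}://{rest}"
```

The resulting DSN could then be passed to `asyncpg.create_pool(dsn, server_settings={"search_path": "my_schema"})`, where `my_schema` is a placeholder for the real schema name.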

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 13:45:26 +02:00
Benjamin Admin
7ff9860c69 Add Vocabulary Learning Platform (Phase 1: DB + API + Editor)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 59s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 3m7s
CI / test-python-agent-core (push) Successful in 24s
CI / test-nodejs-website (push) Successful in 31s
Strategic pivot: Studio-v2 becomes a language learning platform.
Compliance guardrail added to CLAUDE.md — no scan/OCR of third-party
content in customer frontend. Upload of OWN materials remains allowed.

Phase 1.1 — vocabulary_db.py: PostgreSQL model for 160k+ words
with english, german, IPA, syllables, examples, images, audio,
difficulty, tags, translations (multilingual). Trigram search index.

Phase 1.2 — vocabulary_api.py: Search, browse, filters, bulk import,
learning unit creation from word selection. Creates QA items with
enhanced fields (IPA, syllables, image, audio) for flashcards.

Phase 1.3 — /vocabulary page: Search bar with POS/difficulty filters,
word cards with audio buttons, unit builder sidebar. Teacher selects
words → creates learning unit → redirects to flashcards.

Sidebar: Added "Woerterbuch" (/vocabulary) and "Lernmodule" (/learn).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 13:36:28 +02:00
Benjamin Admin
7fc5464df7 Switch Vision-LLM Fusion to llama3.2-vision:11b
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 28s
qwen2.5vl:32b needs ~100GB RAM and crashes Ollama.
llama3.2-vision:11b is already installed and fits in memory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:44:59 +02:00
Benjamin Admin
5fbf0f4ee2 Fix: _merge_paddle_tesseract takes 2 args not 4
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 32s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 24s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:33:49 +02:00
Benjamin Admin
2f8270f77b Add Vision-LLM OCR Fusion (Step 4) for degraded scans
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 27s
New module vision_ocr_fusion.py: Sends scan image + OCR word
coordinates + document type to Qwen2.5-VL 32B. The LLM reads
the image visually while using OCR positions as structural hints.

Key features:
- Document-type-aware prompts (Vokabelseite, Woerterbuch, etc.)
- OCR words grouped into lines with x/y coordinates in prompt
- Low-confidence words marked with (?) for LLM attention
- Continuation row merging instructions in prompt
- JSON response parsing with markdown code block handling
- Fallback to original OCR on any error
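
The line grouping and low-confidence marking listed above might look roughly like this. The word dict layout, the `y_tolerance` and `low_conf` values, and the output format are all assumptions for illustration; the prompt format actually used by vision_ocr_fusion.py is not shown in this log.

```python
def format_word(word, low_conf):
    """Render one OCR word with its position; flag uncertain words with (?)."""
    marker = "(?)" if word["conf"] < low_conf else ""
    return f'{word["text"]}{marker}@({word["x"]},{word["y"]})'


def ocr_words_to_prompt_lines(words, y_tolerance=10, low_conf=50):
    """Group words into text lines by y position and render one string per line."""
    lines = []
    for word in sorted(words, key=lambda w: (w["y"], w["x"])):
        # A word joins the current line if it is close to that line's first word.
        if lines and abs(lines[-1][0]["y"] - word["y"]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [
        " ".join(format_word(w, low_conf) for w in sorted(line, key=lambda w: w["x"]))
        for line in lines
    ]
```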

Frontend (admin-lehrer Grid Review):
- "Vision-LLM" checkbox toggle
- "Typ" dropdown (Vokabelseite, Woerterbuch, etc.)
- Steps 1-3 defaults set to inactive

Activate: Check "Vision-LLM", select document type, click "OCR neu + Grid".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:24:22 +02:00
Benjamin Admin
00eb9f26f6 Add "OCR neu + Grid" button to Grid Review
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 51s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m53s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 55s
New endpoint POST /sessions/{id}/rerun-ocr-and-build-grid that:
1. Runs scan quality assessment
2. Applies CLAHE enhancement if degraded (controlled by enhance toggle)
3. Re-runs dual-engine OCR (RapidOCR + Tesseract) with min_conf filter
4. Merges OCR results and stores updated word_result
5. Builds grid with max_columns constraint

Frontend: Orange "OCR neu + Grid" button in GridToolbar.
Unlike "Neu berechnen" (which only rebuilds grid from existing words),
this button re-runs the full OCR pipeline with quality settings.

The CLAHE toggle now actually has an effect: it enhances the image
before OCR runs, not after.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:55:01 +02:00
Benjamin Admin
141f69ceaa Fix: max_columns now works in OCR Kombi build-grid pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 49s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 30s
The max_columns parameter was only implemented in cv_words_first.py
(vocab-worksheet path) but NOT in _build_grid_core which is what
the admin OCR Kombi pipeline uses. The Kombi pipeline uses
grid_editor_helpers._cluster_columns_by_alignment() which has its
own column detection.

Fix: Post-processing step 5k merges narrowest columns after grid
building when zone has more columns than max_columns. Cells from
merged columns get their text appended to the target column.
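
A simplified sketch of the merge step. The column representation, the function name `merge_narrow_columns`, and the choice of the left neighbour as merge target are assumptions; the commit only states that the narrowest columns are merged and their cell text is appended to a target column.

```python
def merge_narrow_columns(columns, max_columns):
    """columns: left-to-right list of {'width': int, 'cells': [str per row]}."""
    cols = [{"width": c["width"], "cells": list(c["cells"])} for c in columns]
    if max_columns <= 0:  # 0 means the limit is switched off
        return cols
    while len(cols) > max_columns:
        narrowest = min(range(len(cols)), key=lambda k: cols[k]["width"])
        source = cols.pop(narrowest)
        if narrowest > 0:
            # Assumed target: left neighbour, source text goes after its text.
            target = cols[narrowest - 1]
            pairs = zip(target["cells"], source["cells"])
        else:
            # Leftmost column: merge into the right neighbour, source text first.
            target = cols[0]
            pairs = zip(source["cells"], target["cells"])
        target["cells"] = [" ".join(p for p in pair if p) for pair in pairs]
        target["width"] += source["width"]
    return cols
```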

min_conf word filtering was already working (applied before grid build).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:40:39 +02:00
Benjamin Admin
2baad68060 Remove A/B testing toggles from studio-v2 (customer frontend)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m50s
CI / test-python-agent-core (push) Successful in 38s
CI / test-nodejs-website (push) Successful in 43s
Dev-only toggles belong in admin-lehrer (port 3002) only.
The customer frontend runs the pipeline with optimal defaults
and shows only the finished results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:18:44 +02:00
Benjamin Admin
25e5a7415a Add A/B testing toggles to OCR Kombi Grid Review
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m27s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 24s
Quality step toggles in admin-lehrer StepGridReview (port 3002):
- CLAHE checkbox (Step 3: image enhancement)
- MaxCol dropdown (Step 2: column limit, 0=off)
- MinConf dropdown (Step 1: OCR confidence, 0=auto)

Parameters flow through: StepGridReview → useGridEditor → build-grid
endpoint → _build_grid_core. MinConf filters words before grid building.

Toggle settings, click "Neu berechnen" to test each step individually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:09:17 +02:00
Benjamin Admin
545c8676b0 Add A/B testing toggles for OCR quality steps
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 26s
CI / test-nodejs-website (push) Successful in 18s
Each quality improvement step can now be toggled independently:
- CLAHE checkbox (Step 3: image enhancement on/off)
- MaxCols dropdown (Step 2: 0=unlimited, 2-5)
- MinConf dropdown (Step 1: auto/20/30/40/50/60)

Backend: Query params enhance, max_cols, min_conf on process-single-page.
Response includes active_steps dict showing which steps are enabled.
Frontend: Toggle controls in VocabularyTab above the table.

This allows empirical A/B testing of each step on the same scan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 15:27:26 +02:00
Benjamin Admin
2f34ee9ede Add scan quality scoring, column limit, image enhancement (Steps 1-3)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 32s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 20s
Step 1: scan_quality.py — Laplacian blur + contrast scoring, adjusts
OCR confidence threshold (40 for good scans, 30 for degraded).
Quality report included in API response + shown in frontend.
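
A dependency-free sketch of the Step 1 idea: the variance of the Laplacian response serves as a sharpness measure and the standard deviation of pixel values as a contrast measure. The two thresholds (`blur_threshold`, `contrast_threshold`) and all function names are hypothetical; only the resulting confidence thresholds (40 and 30) come from the commit message. With OpenCV the sharpness measure is commonly computed as `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
from statistics import pstdev, pvariance


def laplacian_variance(gray):
    """gray: 2D list of pixel values (at least 3x3). Higher means sharper."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def contrast(gray):
    """Standard deviation of all pixel values."""
    return pstdev([value for row in gray for value in row])


def ocr_min_conf(gray, blur_threshold=100.0, contrast_threshold=40.0):
    """Return the OCR confidence threshold: 40 for good scans, 30 for degraded."""
    degraded = (
        laplacian_variance(gray) < blur_threshold
        or contrast(gray) < contrast_threshold
    )
    return 30 if degraded else 40
```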

Step 2: max_columns parameter in cv_words_first.py — limits column
detection to 3 for vocab tables, preventing phantom columns D/E
from degraded OCR fragments.

Step 3: ocr_image_enhance.py — CLAHE contrast + bilateral filter
denoising + unsharp mask, only for degraded scans (gated by
quality score). Pattern from handwriting_htr_api.py.

Frontend: quality info shown in extraction status after processing.
Reprocess button now derives pages from vocabulary data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 14:58:39 +02:00
Benjamin Admin
5a154b744d fix: migrate ocr-pipeline types to ocr-kombi after page deletion
Types from deleted ocr-pipeline/types.ts inlined into ocr-kombi/types.ts.
All imports updated across components/ocr-kombi/ and components/ocr-pipeline/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 14:22:09 +02:00
Benjamin Admin
f39cbe9283 refactor: remove unused pages and backends (model-management, OCR legacy, GPU/vast.ai, video-chat, matrix)
Deleted pages:
- /ai/model-management (mock data only, no real backend)
- /ai/ocr-compare (old /vocab/ backend, replaced by ocr-kombi)
- /ai/ocr-pipeline (minimal session browser, redundant)
- /ai/ocr-overlay (legacy monolith, redundant)
- /ai/gpu (vast.ai GPU management, no longer used)
- /infrastructure/gpu (same)
- /communication/video-chat (moved to core)
- /communication/matrix (moved to core)

Deleted backends:
- backend-lehrer/infra/vast_client.py + vast_power.py
- backend-lehrer/meetings_api.py + jitsi_api.py
- website/app/api/admin/gpu/
- edu-search-service/scripts/vast_ai_extractor.py

Total: ~7,800 LOC removed. All code preserved in git history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 13:14:12 +02:00
Benjamin Admin
5abdfa202e chore: install refactoring guardrails (Phase 0) [guardrail-change]
- scripts/check-loc.sh: LOC budget checker (500 LOC hard cap)
- .claude/rules/architecture.md: split triggers, patterns per language
- .claude/rules/loc-exceptions.txt: documented escape hatches
- AGENTS.python.md: FastAPI conventions (routes thin, service layer)
- AGENTS.go.md: Go/Gin conventions (handler ≤40 LOC)
- AGENTS.typescript.md: Next.js conventions (page.tsx ≤250 LOC, colocation)
- CLAUDE.md extended with guardrail section + commit markers

273 files currently exceed 500 LOC — to be addressed phase by phase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 12:25:36 +02:00
Benjamin Admin
9b0e310978 Fix: reprocess button works after session resume + apply merge logic
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 34s
Two bugs fixed:
1. reprocessPages() failed silently after session resume because
   successfulPages was empty. Now derives pages from vocabulary
   source_page or selectedPages as fallback.

2. process-single-page endpoint built vocabulary entries WITHOUT
   applying merge logic (_merge_wrapped_rows, _merge_continuation_rows).
   Now applies full merge pipeline after vocabulary extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 00:46:15 +02:00
Benjamin Admin
46c2acb2f4 Add "Neu verarbeiten" button to VocabularyTab
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 53s
CI / test-go-edu-search (push) Successful in 53s
CI / test-python-klausur (push) Failing after 2m44s
CI / test-python-agent-core (push) Successful in 1m3s
CI / test-nodejs-website (push) Successful in 36s
Allows reprocessing pages from the vocabulary view to apply
new merge logic without navigating back to page selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:37:13 +02:00
Benjamin Admin
b8f1b71652 Fix: merge cell-wrap continuation rows in vocabulary extraction
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 58s
CI / test-go-edu-search (push) Successful in 48s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has started running
When textbook authors wrap text within a cell (e.g. long German
translations), OCR treats each physical line as a separate row.
New _merge_wrapped_rows() detects this by checking if the primary
column (EN) is empty — indicating a continuation, not a new entry.

Handles: empty EN + DE text, empty EN + example text, parenthetical
continuations like "(bei)", triple wraps, comma-separated lists.

12 tests added covering all cases.
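
The core rule (an empty EN cell marks a continuation of the previous entry) can be sketched as follows. This is a simplified stand-in, not the repository's `_merge_wrapped_rows()`: the row layout with `en`, `de`, and `example` keys is an assumption, and the special cases listed above are not handled separately.

```python
def merge_wrapped_rows_sketch(rows):
    """Fold rows with an empty 'en' cell into the preceding vocabulary entry."""
    merged = []
    for row in rows:
        is_continuation = bool(merged) and not row.get("en", "").strip()
        if not is_continuation:
            merged.append(dict(row))
            continue
        previous = merged[-1]
        for key in ("de", "example"):
            extra = row.get(key, "").strip()
            if extra:
                previous[key] = f"{previous.get(key, '')} {extra}".strip()
    return merged
```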

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:32:45 +02:00
Benjamin Admin
6a165b36e5 Add Phase 5.1: LearningProgress dashboard widget
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 51s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 41s
CI / test-nodejs-website (push) Successful in 32s
Eltern-Dashboard widget showing per-unit learning stats:
accuracy ring, coins, crowns, streak, and recent unit list.
Uses ProgressRing and CrownBadge gamification components.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:26:44 +02:00
Benjamin Admin
9dddd80d7a Add Phases 3.2-4.3: STT, stories, syllables, gamification
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 37s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has started running
Phase 3.2 — MicrophoneInput.tsx: Browser Web Speech API for
speech-to-text recognition (EN+DE), integrated for pronunciation practice.

Phase 4.1 — Story Generator: LLM-powered mini-stories using vocabulary
words, with highlighted vocab in HTML output. Backend endpoint
POST /learning-units/{id}/generate-story + frontend /learn/[unitId]/story.

Phase 4.2 — SyllableBow.tsx: SVG arc component for syllable visualization
under words, clickable for per-syllable TTS.

Phase 4.3 — Gamification system:
- CoinAnimation.tsx: Floating coin rewards with accumulator
- CrownBadge.tsx: Crown/medal display for milestones
- ProgressRing.tsx: Circular progress indicator
- progress_api.py: Backend tracking coins, crowns, streaks per unit

Also adds "Geschichte" exercise type button to UnitCard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:22:52 +02:00
Benjamin Admin
20a0585eb1 Add interactive learning modules MVP (Phases 1-3.1)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 51s
CI / test-python-klausur (push) Failing after 2m44s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 34s
New feature: After OCR vocabulary extraction, users can generate interactive
learning modules (flashcards, quiz, type trainer) with one click.

Frontend (studio-v2):
- Fortune Sheet spreadsheet editor tab in vocab-worksheet
- "Lernmodule generieren" button in ExportTab
- /learn page with unit overview and exercise type cards
- /learn/[unitId]/flashcards — Flip-card trainer with Leitner spaced repetition
- /learn/[unitId]/quiz — Multiple choice quiz with explanations
- /learn/[unitId]/type — Type-in trainer with Levenshtein distance feedback
- AudioButton component using Web Speech API for EN+DE TTS
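
For the type-in trainer, Levenshtein-based feedback could be sketched like this. The frontend itself is TypeScript; this Python version only illustrates the technique, and the feedback labels and the "almost" cutoff of 2 edits are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def feedback(typed, expected, almost_within=2):
    """Classify an answer as 'correct', 'almost', or 'wrong'."""
    distance = levenshtein(typed.strip().lower(), expected.strip().lower())
    if distance == 0:
        return "correct"
    return "almost" if distance <= almost_within else "wrong"
```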

Backend (klausur-service):
- vocab_learn_bridge.py: Converts VocabularyEntry[] to analysis_data format
- POST /sessions/{id}/generate-learning-unit endpoint

Backend (backend-lehrer):
- generate-qa, generate-mc, generate-cloze endpoints on learning units
- get-qa/mc/cloze data retrieval endpoints
- Leitner progress update + next review items endpoints
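
The Leitner progress update can be sketched as a box transition plus a review date. The number of boxes and the review intervals below are hypothetical; the endpoints' actual scheduling rules are not described in this log.

```python
from datetime import date, timedelta

# Hypothetical review intervals in days per Leitner box.
REVIEW_INTERVAL_DAYS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}
MAX_BOX = 5


def update_box(box, correct):
    """Correct answers move a card up one box; wrong answers send it back to box 1."""
    return min(box + 1, MAX_BOX) if correct else 1


def next_review(box, today):
    """Date on which a card in the given box is due again."""
    return today + timedelta(days=REVIEW_INTERVAL_DAYS[box])
```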

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:13:23 +02:00
Benjamin Admin
4561320e0d Fix SmartSpellChecker: preserve leading non-alpha text like (=
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 33s
The tokenizer regex only matches alphabetic characters, so text
before the first word match (like "(= " in "(= I won...") was
silently dropped when reassembling the corrected text.

Now preserves text[:first_match_start] as a leading prefix.
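
Sketch of the reassembly described above (not the committed code). The
regex, `reassemble`, and the `correct` callback are hypothetical names; the
only assumption taken from the message is that the tokenizer matches
alphabetic runs and nothing else.

```python
import re

# Assumption: the tokenizer matches alphabetic runs only (including German letters).
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def reassemble(text, correct):
    """Apply `correct` to every word and keep all non-word text untouched."""
    matches = list(WORD_RE.finditer(text))
    if not matches:
        return text
    # Leading prefix such as "(= " that precedes the first word match.
    parts = [text[:matches[0].start()]]
    for i, m in enumerate(matches):
        parts.append(correct(m.group()))
        # Copy the gap between this word and the next one (or the end of the text).
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts.append(text[m.end():next_start])
    return "".join(parts)
```

With an identity `correct`, the output equals the input, which is a quick
check that no leading or trailing text is lost.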

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:41:33 +02:00
Benjamin Admin
596864431b Rule (a2): switch from allow-list to block-list for symbol removal
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m42s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 36s
Instead of keeping only specific symbols (_KEEP_SYMBOLS), now only
removes explicitly decorative symbols (_REMOVE_SYMBOLS: > < ~ \ ^ etc).
All other punctuation (= ( ) ; : - etc.) is preserved by default.

This is more robust: any new symbol used in textbooks will be kept
unless it's in the small block-list of known decorative artifacts.

Fixes: (= token still being removed on page 5 despite being in
the allow-list (possibly due to Unicode variants or whitespace).
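
Sketch of the block-list rule (not the committed code). `_REMOVE_SYMBOLS`
is named in the message, but its contents here are only the five symbols
listed there; the real set is presumably larger. `keep_word_box` is a
hypothetical helper name.

```python
# Assumption: only the symbols named in the commit message; the real list has more.
_REMOVE_SYMBOLS = set("><~\\^")


def keep_word_box(text):
    """Return True if a word_box should survive rule (a2)."""
    stripped = text.strip()
    if not stripped:
        return False  # assumption: empty boxes are dropped
    if any(ch.isalnum() for ch in stripped):
        return True  # anything with letters or digits is never touched by this rule
    # Pure-symbol token: remove only if every character is a known decorative artifact.
    return not all(ch in _REMOVE_SYMBOLS for ch in stripped)
```

Because the default is "keep", Unicode variants of = or ( pass through
without having to be listed anywhere.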

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:34:21 +02:00
Benjamin Admin
c8027eb7f9 Fix: preserve = ; : - and other meaningful symbols in word_boxes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m38s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 18s
Rule (a2) in Step 5i removed word_boxes with no letters/digits as
"graphic OCR artifacts". This incorrectly removed = signs used as
definition markers in textbooks ("film = 1. Film; 2. filmen").

Added exception list _KEEP_SYMBOLS for meaningful punctuation:
= (= =) ; : - – — / + • · ( ) & * → ← ↔

The root cause: PaddleOCR returns "film = 1. Film; 2. filmen" as one
block, which gets split into word_boxes ["film", "=", "1.", ...].
The "=" word_box had no alphanumeric chars and was removed as artifact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:18:35 +02:00
Benjamin Admin
ba0f659d1e Preserve = and (= tokens in grid build and cell text cleanup
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m34s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 42s
= signs are used as definition markers in textbooks ("film = 1. Film").
They were incorrectly removed by two filters:

1. grid_build_core.py Step 5j-pre: _PURE_JUNK_RE matched "=" as
   artifact noise. Now exempts =, (=, ;, :, - and similar meaningful
   punctuation tokens.

2. cv_ocr_engines.py _is_noise_tail_token: "pure non-alpha" check
   removed trailing = tokens. Now exempts meaningful punctuation.

Fixes: "film = 1. Film; 2. filmen" losing the = sign,
       "(= I won and he lost.)" losing the (=.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:04:27 +02:00
Benjamin Admin
50bfd6e902 Fix gutter repair: don't suggest corrections for words with parentheses
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 50s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 40s
CI / test-nodejs-website (push) Successful in 31s
Words like "probieren)" or "Englisch)" were incorrectly flagged as
gutter OCR errors because the closing parenthesis wasn't stripped
before dictionary lookup. The spellchecker then suggested "probierend"
(replacing ) with d, edit distance 1).

Two fixes:
1. Strip trailing/leading parentheses in _try_spell_fix before checking
   if the bare word is valid — skip correction if it is
2. Add )( to the rstrip characters in the analysis phase so
   "probieren)" becomes "probieren" for the known-word check
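
Sketch of the parenthesis guard (not the committed code). `bare_word`,
`should_attempt_spell_fix`, and the `known_words` set are hypothetical
stand-ins for the dictionary lookup in `_try_spell_fix`.

```python
def bare_word(token):
    """Strip leading/trailing parentheses before any dictionary lookup."""
    return token.strip("()")


def should_attempt_spell_fix(token, known_words):
    """Skip correction when the token is a valid word once parentheses are removed."""
    return bare_word(token).lower() not in known_words
```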

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:38:22 +02:00
Benjamin Admin
0599c72cc1 Fix IPA continuation: don't replace normal text with IPA
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 19s
Text like "Betonung auf der 1. Silbe: profit ['profit]" was
incorrectly detected as garbled IPA and replaced with generated
IPA transcription of the previous row's example sentence.

Added guard: if the cell text contains >=3 recognizable words
(3+ letter alpha tokens), it's normal text, not garbled IPA.
Garbled IPA is typically short and has no real dictionary words.

Fixes: Row 13 C3 showing IPA instead of pronunciation hint text.
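
Sketch of the guard (not the committed code). It counts alphabetic tokens
of three or more letters and does not consult a dictionary, which is a
simplifying assumption; `looks_like_normal_text` is a hypothetical name.

```python
import re

# Assumption: a "recognizable word" is approximated as 3+ plain letters.
_WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]{3,}")


def looks_like_normal_text(cell_text, min_words=3):
    """True if the cell has enough word-like tokens to rule out garbled IPA."""
    return len(_WORD_RE.findall(cell_text)) >= min_words
```

IPA symbols such as ɒ or ɪ fall outside the character class, so a cell
holding only a transcription yields no 3-letter tokens.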

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:28:58 +02:00
Benjamin Admin
5fad2d420d test+docs(rag): Tests and developer docs for RAG Landkarte
- 44 Vitest tests: JSON structure, industry assignment, applicability
  notes, document-type distribution, no duplicates
- MkDocs page: architecture, 10 industries, assignment logic,
  integration into other projects, data sources

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:47:54 +02:00
Benjamin Admin
c8e5e498b5 feat(rag): Applicability Notes UI + industry review
- Matrix rows expandable: a click shows the industry-relevance explanation,
  description, and validity date
- 27 industry assignments corrected:
  - OWASP/NIST/CISA/SBOM standards → all (customers develop software)
  - BSI-TR-03161 → empty (DiGA, not a target market)
  - BSI 200-4, ENISA Supply Chain → all (CRA/NIS2 obligation)
  - EAA/BFSG → +automotive (digital interfaces)
- 264 horizontal, 42 sector-specific, 14 not applicable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:15:01 +02:00
Benjamin Admin
261f686dac Add OCR Pipeline Extensions developer docs + update vocab-worksheet docs
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 26s
CI / test-nodejs-website (push) Successful in 40s
New: .claude/rules/ocr-pipeline-extensions.md
- Complete documentation for SmartSpellChecker, Box-Grid-Review (Step 11),
  Ansicht/Spreadsheet (Step 12), Unified Grid
- All 14 pipeline steps listed
- Backend/frontend file structure with line counts
- 66 tests documented
- API endpoints, data flow, formatting rules

Updated: .claude/rules/vocab-worksheet.md
- Added Frontend Refactoring section (page.tsx → 14 files)
- Updated format extension instructions (constants.ts instead of page.tsx)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:35:16 +02:00
Benjamin Admin
3d3c2b30db Add tests for unified_grid and cv_box_layout
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m30s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 34s
test_unified_grid.py (10 tests):
- Dominant row height calculation (regular, gaps filtered, single row)
- Box classification (full-width, partial left/right, text line count)
- Unified grid building (content-only, box integration, cell tagging)

test_box_layout.py (13 tests):
- Layout classification (header_only, flowing, bullet_list)
- Line grouping by y-proximity
- Flowing layout indent grouping (bullet + continuations → \n)
- Row/column field completeness for GridTable compatibility

Total: 66 tests passing (43 smart_spell + 13 box_layout + 10 unified)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:18:52 +02:00
Benjamin Admin
1d22f649ae fix(rag): Correct industries to 10 VDMA/VDA/BDI sectors
Replaced the old 17 "industries" (incl. IoT, AI, HR, KRITIS) with 10 real
industrial sectors: Automotive, Mechanical Engineering, Electrical
Engineering, Chemicals, Metals, Energy, Transport, Trade, Consumer Goods,
Construction.

Assignment logic: 244 horizontal (all), 65 sector-specific,
11 not applicable (finance/medical/platforms).
102 applicability_notes with a rationale per regulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:56:28 +02:00
Benjamin Admin
610825ac14 SpreadsheetView: add bullet marker (•) for multi-line cells
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m34s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 38s
Multi-line cells (containing \n) that don't already start with a
bullet character get • prepended in the frontend. This ensures
bullet points are visible regardless of whether the backend inserted
them (depends on when boxes were last rebuilt).

Skips header rows and cells that already have •, -, or – prefix.
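
Sketch of the rule in Python (the actual change lives in the frontend and
is not shown here). `with_bullet` is a hypothetical name; the "• " prefix
with a trailing space is an assumption.

```python
BULLET_PREFIXES = ("•", "-", "–")


def with_bullet(text, is_header=False):
    """Prepend a bullet to multi-line cells that do not already start with one."""
    if is_header or "\n" not in text:
        return text  # header rows and single-line cells are left alone
    if text.lstrip().startswith(BULLET_PREFIXES):
        return text  # already has •, - or – as prefix
    return "• " + text
```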

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:53:54 +02:00
Benjamin Admin
6aec4742e5 SpreadsheetView: keep bullets as single cells with text-wrap
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 31s
Revert row expansion — multi-line bullet cells stay as single cells
with \n and text-wrap (tb='2'). This way the text reflows when the
user resizes the column, like normal Excel behavior.

Row height auto-scales by line count (24px * lines).
Vertical alignment: top (vt=0) for multi-line cells.
Removed leading-space indentation hack (didn't work reliably).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:07:07 +02:00
Benjamin Admin
0491c2eb84 feat(rag): Dynamic industry-regulation matrix from JSON
Replaced hardcoded REGULATIONS/INDUSTRIES/INDUSTRY_REGULATION_MAP with a
JSON import. 320 documents in 17 categories with collapsible
sections per doc_type. page.tsx reduced from 3672 to 2655 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:01:51 +02:00
Benjamin Admin
f2bc62b4f5 SpreadsheetView: bullet indentation, expanded rows, box borders
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 1m4s
Multi-line cells (\n): expanded into separate rows so each line gets
its own cell. Continuation lines (after •) indented with leading spaces.
Bullet marker lines (•) are bold.

Font-size detection: cells with word_box height >1.3x median get bold
and larger font (fs=12) for box titles.

Headers: is_header rows always bold with light background tint.

Box borders: thick colored outside border + thin inner grid lines.
Content zone: light gray grid borders.

Auto-fit column widths from longest text per column.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:15:43 +02:00
Benjamin Admin
674c9e949e SpreadsheetView: auto-fit column widths to longest text
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 22s
CI / test-go-edu-search (push) Failing after 23s
CI / test-python-klausur (push) Failing after 11s
CI / test-python-agent-core (push) Failing after 8s
CI / test-nodejs-website (push) Failing after 24s
Column widths now calculated from the longest text in each column
(~7.5px per character + padding). Takes the maximum of auto-fit
width and scaled original pixel width.

Multi-line cells: uses the longest line for width calculation.
Spanning header cells excluded from width calculation (they span
multiple columns and would inflate single-column widths).

Minimum column width: 60px.
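
Sketch of the width calculation in Python (the actual change is in the
frontend). The 7.5px per character and the 60px minimum come from the
message; `PADDING_PX = 16` and the function name are assumptions. Spanning
header cells are expected to be filtered out by the caller.

```python
CHAR_PX = 7.5      # from the commit message (approximate)
PADDING_PX = 16    # assumption: the message only says "+ padding"
MIN_COL_PX = 60    # from the commit message


def auto_fit_width(cell_texts, scaled_original_px):
    """Width of one column from its non-spanning cell texts."""
    # Multi-line cells contribute their longest line, not their total length.
    longest = max(
        (len(line) for text in cell_texts for line in text.split("\n")),
        default=0,
    )
    fitted = longest * CHAR_PX + PADDING_PX
    return max(fitted, scaled_original_px, MIN_COL_PX)
```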

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:43:50 +02:00
Benjamin Admin
e131aa719e SpreadsheetView: formatting improvements for Excel-like display
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 21s
CI / test-go-edu-search (push) Failing after 19s
CI / test-python-klausur (push) Failing after 11s
CI / test-python-agent-core (push) Failing after 10s
CI / test-nodejs-website (push) Failing after 23s
Height: sheet height auto-calculated from row count (26px/row + toolbar),
no more cutoff at 21 rows. Row count set to exact (no padding).

Box borders: thick colored outside border + thin inner grid lines.
Content zone: light gray grid lines on all cells.

Headers: bold (bl=1) for is_header rows. Larger font detected via
word_box height comparison (>1.3x median → fs=12 + bold).

Box cells: light tinted background from box_bg_hex.
Header cells in boxes: slightly stronger tint.

Multi-line cells: text wrap enabled (tb='2'), \n preserved.
Bullet points (•) and indentation preserved in cell text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:29:50 +02:00
Benjamin Admin
17f0fdb2ed Refactor: extract _build_grid_core into grid_build_core.py + clean StepAnsicht
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 19s
CI / test-go-edu-search (push) Failing after 23s
CI / test-python-klausur (push) Failing after 10s
CI / test-python-agent-core (push) Failing after 9s
CI / test-nodejs-website (push) Failing after 26s
grid_editor_api.py: 2411 → 474 lines
- Extracted _build_grid_core() (1892 lines) into grid_build_core.py
- API file now only contains endpoints (build, save, get, gutter, box, unified)

StepAnsicht.tsx: 212 → 112 lines
- Removed useGridEditor imports (not needed for read-only spreadsheet)
- Removed unified grid fetch/build (not used with multi-sheet approach)
- Removed Spreadsheet/Grid toggle (only spreadsheet mode now)
- Simple: fetch grid-editor data → pass to SpreadsheetView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:54:55 +02:00
Benjamin Admin
d4353d76fb SpreadsheetView: multi-sheet tabs instead of unified single sheet
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 31s
Each zone becomes its own Excel sheet tab with independent column widths:
- Sheet "Vokabeln": main content zone with EN/DE/example columns
- Sheet "Pounds and euros": Box 1 with its own 4-column layout
- Sheet "German leihen": Box 2 with single column for flowing text

This solves the column-width conflict: boxes have different column
widths optimized for their content, which is impossible in a single
unified sheet (Excel limitation: column width is per-column, not per-cell).

Sheet tabs visible at bottom (showSheetTabs: true).
Box sheets get colored tab (from box_bg_hex).
First sheet active by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:51:21 +02:00
Benjamin Admin
b42f394833 Integrate Fortune Sheet spreadsheet editor in StepAnsicht
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m40s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 33s
Install @fortune-sheet/react (MIT, v1.0.4) as Excel-like spreadsheet
component. New SpreadsheetView.tsx converts unified grid data to
Fortune Sheet format (celldata, merge config, column/row sizes).

StepAnsicht now has Spreadsheet/Grid toggle:
- Spreadsheet mode: full Fortune Sheet with toolbar (bold, italic,
  color, borders, merge cells, text wrap, undo/redo)
- Grid mode: existing GridTable for quick editing

Box-origin cells get light tinted background in spreadsheet view.
Colspan cells converted to Fortune Sheet merge format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:08:03 +02:00
Benjamin Admin
c1a903537b Unified Grid: merge all zones into single Excel-like grid
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 33s
Backend (unified_grid.py):
- build_unified_grid(): merges content + box zones into one zone
- Dominant row height from median of content row spacings
- Full-width boxes: rows integrated directly
- Partial-width boxes: extra rows inserted when box has more text
  lines than standard rows fit (e.g., 7 lines in 5-row height)
- Box-origin cells tagged with source_zone_type + box_region metadata

Backend (grid_editor_api.py):
- POST /sessions/{id}/build-unified-grid → persists as unified_grid_result
- GET /sessions/{id}/unified-grid → retrieve persisted result

Frontend:
- GridEditorCell: added source_zone_type, box_region fields
- GridTable: box-origin cells get tinted background + left border
- StepAnsicht: split-view with original image (left) + editable
  unified GridTable (right). Auto-builds on first load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:37:55 +02:00
Benjamin Admin
7085c87618 StepAnsicht: dominant row height for content + proportional box rows
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 31s
Content sections: use dominant (median) row height from all content
rows instead of per-section average. This ensures uniform row height
above and below boxes (the standard case on textbook pages).

Box sections: distribute height proportionally by text line count
per row. A header (1 line) gets 1/7 of box height, a bullet with
3 lines gets 3/7. Fixes Box 2 where row 3 was cut off because
even distribution didn't account for multi-line cells.

Removed overflow:hidden from box container to prevent clipping.
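
Sketch of both calculations in Python (the actual change is in
StepAnsicht). It assumes the dominant height is the median of row-to-row
spacings; function names and input shapes are hypothetical.

```python
from statistics import median


def dominant_row_height(row_y_mins):
    """Median spacing between consecutive content rows (sorted y_min values)."""
    spacings = [b - a for a, b in zip(row_y_mins, row_y_mins[1:])]
    return median(spacings) if spacings else 0.0


def box_row_heights(box_height, line_counts):
    """Split the box height proportionally to the text line count of each row."""
    total = sum(line_counts)
    if total == 0:
        return []
    return [box_height * n / total for n in line_counts]
```

The median ignores the one large gap where a box interrupts the content
rows, which an average would not.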

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:43:02 +02:00
Benjamin Admin
1b7e095176 StepAnsicht: fix row filtering for partial-width boxes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m34s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 36s
Content rows were incorrectly filtered out when their Y overlapped
with a box, even if the box only covered the right half of the page.
Now checks both Y AND X overlap — rows are only excluded if they
start within the box's horizontal range.

Fixes: rows next to Box 2 (lend, coconut, taste) were missing from
reconstruction because Box 2 (x=871, w=525) only covers the right
side, but left-side content rows at x≈148 were being filtered.
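
Sketch of the combined check in Python (the actual change is in
StepAnsicht). The dict keys and function names are hypothetical; only the
Box 2 geometry (x=871, w=525) and the row position (x≈148) come from the
message.

```python
def overlaps(a_min, a_max, b_min, b_max):
    """True if the 1-D intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min < b_max and b_min < a_max


def row_hidden_by_box(row, box):
    """Exclude a content row only if it overlaps in Y and starts inside the box in X."""
    y_overlap = overlaps(row["y_min"], row["y_max"], box["y"], box["y"] + box["h"])
    x_inside = box["x"] <= row["x_min"] <= box["x"] + box["w"]
    return y_overlap and x_inside
```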

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:00:28 +02:00
Benjamin Admin
dcb873db35 StepAnsicht: section-based layout with averaged row heights
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 38s
CI / test-python-klausur (push) Failing after 2m28s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 40s
Major rewrite of reconstruction rendering:
- Page split into vertical sections (content/box) around box boundaries
- Content sections: uniform row height = (last_row - first_row) / (n-1)
- Box sections: rows evenly distributed within box height
- Content rows positioned absolutely at original y-coordinates
- Font size derived from row height (55% of row height)
- Multi-line cells (bullets) get expanded height with indentation
- Boxes render at exact bbox position with colored border
- Preparation for unified grid where boxes become part of main grid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:29:40 +02:00
Benjamin Admin
fd39d13d06 StepAnsicht: use server-rendered OCR overlay image
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m38s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 24s
Replace manual word_box positioning (wild/unsnapped) with the
server-rendered words-overlay image from the OCR step endpoint.
This shows the same cleanly snapped red letters as the OCR step.

Endpoint: /sessions/{id}/image/words-overlay

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:26:54 +02:00
Benjamin Admin
c5733a171b StepAnsicht: fix font size and row spacing to match original
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 40s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
- Font: use font_size_suggestion_px * scale directly (removed 0.85 factor)
- Row height: calculate from row-to-row spacing (y_min of next row
  minus y_min of current row) instead of text height (y_max - y_min).
  This produces correct line spacing matching the original layout.
- Multi-line cells: height multiplied by line count

Content zone should now span from ~250 to ~2050 matching the original.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:24:27 +02:00
Benjamin Admin
18213f0bde StepAnsicht: split-view with coordinate grid for comparison
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 36s
Left panel: Original scan + OCR word overlay (red text at exact
word_box positions) + coordinate grid
Right panel: Reconstructed layout + same coordinate grid

Features:
- Coordinate grid toggle with 50/100/200px spacing options
- Grid lines labeled with pixel coordinates in original image space
- Both panels share the same scale for direct visual comparison
- OCR overlay shows detected text in red mono font at original positions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:00:22 +02:00
Benjamin Admin
cd8eb6ce46 Add Ansicht step (Step 12) — read-only page layout preview
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 36s
New pipeline step showing the reconstructed page with all zones
positioned at their original coordinates:
- Content zones with vocabulary grid cells
- Box zones with colored borders (from structure detection)
- Colspan cells rendered across multiple columns
- Multi-line cells (bullets) with pre-wrap whitespace
- Toggle to overlay original scan image at 15% opacity
- Proportionally scaled to viewport width
- Pure CSS positioning (no canvas/Fabric.js)

Pipeline: 14 steps (0-13), Ground Truth moved to Step 13.
Added colspan field to GridEditorCell type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:42:33 +02:00
Benjamin Admin
2c2bdf903a Fix GridTable: replace ternary chain with IIFE for cell rendering
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m28s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 31s
Chained ternary (colored ? div : multiline ? textarea : input) caused
webpack SWC parser issues. Replaced with IIFE {(() => { if/return })()}
which is more robust and readable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:10:22 +02:00
Benjamin Admin
947ff6bdcb Fix JSX ternary nesting for textarea/input in GridTable
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m32s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 28s
Remove extra curly braces around the textarea/input ternary that
caused webpack syntax error. The ternary is now a chained condition:
hasColoredWords ? <div> : text.includes('\n') ? <textarea> : <input>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:02:22 +02:00
Benjamin Admin
92e4021898 Fix GridTable JSX syntax error in colspan rendering
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 39s
Mismatched closing tags from previous colspan edit caused webpack
build failure. Cleaned up spanning cell map() return structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:52:26 +02:00
Benjamin Admin
108f1b1a2a GridTable: render multi-line cells with textarea
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m53s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 34s
Cells containing \n (bullet items with continuation lines) now use
<textarea> instead of <input type=text>, making all lines visible.
Row height auto-expands based on line count in the cell.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:17:29 +02:00
Benjamin Admin
48de4d98cd Fix infinite loop in StepBoxGridReview auto-build
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m41s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 35s
Auto-build was triggering on every grid.zones.length change, which
happens on every rebuild (zone indices increment). Now uses a ref
to ensure auto-build fires only once. Also removed boxZones.length===0
condition that could trigger unnecessary builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:06:11 +02:00
Benjamin Admin
b5900f1aff Bullet indentation detection: group continuation lines into bullets
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m49s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 34s
Flowing/bullet_list layout now analyzes left-edge indentation:
- Lines at minimum indent = bullet start / main level
- Lines indented >15px more = continuation (belongs to previous bullet)
- Continuation lines merged with \n into parent bullet cell
- Missing bullet markers (•) auto-added when pattern is clear

Example: 7 OCR lines → 3 items (1 header + 2 bullets × 3 lines each)
"German leihen" header, then two bullet groups with indented examples.
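
Sketch of the grouping (not the committed code). Lines are assumed to be
`(x_left_px, text)` pairs in reading order; `group_bullets` is a
hypothetical name. Auto-adding missing • markers is left out.

```python
INDENT_THRESHOLD_PX = 15  # from the commit message


def group_bullets(lines):
    """Merge indented continuation lines into the preceding item with \\n."""
    if not lines:
        return []
    base = min(x for x, _ in lines)  # minimum indent = bullet start / main level
    items = []
    for x, text in lines:
        if x - base > INDENT_THRESHOLD_PX and items:
            items[-1] += "\n" + text  # continuation of the previous bullet
        else:
            items.append(text)
    return items
```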

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:57:16 +02:00
Benjamin Admin
baac98f837 Filter false-positive boxes in header/footer margins
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 55s
CI / test-go-edu-search (push) Successful in 1m0s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 27s
Boxes whose vertical center falls within top/bottom 7% of image
height are filtered out (page numbers, unit headers, running footers).
At typical scan resolutions, 7% ≈ 2.5cm margin.

Fixes: "Box 1" containing just "3" from "Unit 3" page header being
incorrectly treated as an embedded box.
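
A minimal sketch of the margin filter described above. The box dictionary keys ("y", "h") and both function names are hypothetical; only the 7% rule on the vertical center comes from this commit.

```python
from typing import Dict, List


def in_page_margin(
    box_y: float, box_h: float, image_h: float, margin_frac: float = 0.07
) -> bool:
    """True if the box's vertical center lies in the top or bottom margin band."""
    center = box_y + box_h / 2.0
    return center < image_h * margin_frac or center > image_h * (1.0 - margin_frac)


def filter_margin_boxes(boxes: List[Dict[str, float]], image_h: float) -> List[Dict[str, float]]:
    """Drop boxes whose center falls in the header/footer margin."""
    return [b for b in boxes if not in_page_margin(b["y"], b["h"], image_h)]
```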

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:38:53 +02:00
Benjamin Admin
496d34d822 Fix box empty rows: add x_min_px/x_max_px to flowing/header columns
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 55s
CI / test-go-edu-search (push) Successful in 51s
CI / test-python-klausur (push) Failing after 2m7s
CI / test-python-agent-core (push) Successful in 26s
CI / test-nodejs-website (push) Successful in 31s
GridTable calculates column widths from col.x_max_px - col.x_min_px.
Flowing and header_only layouts were missing these fields, producing
NaN widths which collapsed the CSS grid layout and showed empty rows
with only row numbers visible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 13:01:11 +02:00
Benjamin Admin
709e41e050 GridTable: support partial colspan (2-of-4 columns)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m16s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 31s
Previously GridTable only supported full-row spanning (one cell across
all columns). Now renders each spanning_header cell with its actual
colspan, positioned at the correct grid column. This allows rows like
"In Britain..." (colspan=2) + "In Germany..." (colspan=2) to render
side by side instead of only showing the first cell.

Also fix box row fields: is_header always set (was undefined for
flowing/bullet_list), y_min_px/y_max_px for header_only rows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:47:14 +02:00
Benjamin Admin
7b3e8c576d Fix NameError: span_cells removed but still referenced in log
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 51s
CI / test-python-klausur (push) Failing after 2m42s
CI / test-python-agent-core (push) Successful in 39s
CI / test-nodejs-website (push) Successful in 38s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:20:11 +02:00
Benjamin Admin
868f99f109 Fix colspan text + box row fields for GridTable compatibility
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m49s
CI / test-python-agent-core (push) Successful in 42s
CI / test-nodejs-website (push) Successful in 33s
Colspan: use the original word-block text instead of the split cell texts.
Prevents artifacts such as "euros a nd cents" introduced by split_cross_column_words.

Box rows: add is_header field (was undefined, causing GridTable
rendering issues). Add y_min_px/y_max_px to header_only rows.
These missing fields caused empty rows with only row numbers visible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:08:49 +02:00
Benjamin Admin
dc25f243a4 Fix colspan: use original words before split_cross_column_words
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 35s
_split_cross_column_words was destroying the colspan information by
cutting word-blocks at column boundaries BEFORE _detect_colspan_cells
could analyze them. Now passes original (pre-split) words to colspan
detection while using split words for cell building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:58:32 +02:00
Benjamin Admin
c62ff7cd31 Generic colspan detection for merged cells in grids and boxes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 38s
CI / test-python-klausur (push) Failing after 2m45s
CI / test-python-agent-core (push) Successful in 38s
CI / test-nodejs-website (push) Successful in 34s
New _detect_colspan_cells() in grid_editor_helpers.py:
- Runs after _build_cells() for every zone (content + box)
- Detects word-blocks that extend across column boundaries
- Merges affected cells into spanning_header with colspan=N
- Uses column midpoints to determine which columns are covered
- Works for full-page scans and box zones equally
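
One plausible reading of the midpoint rule in the list above, as a sketch only. The body of _detect_colspan_cells is not shown here, so the helper names and the (x_min, x_max) column tuples are assumptions: a word-block is taken to cover every column whose midpoint lies inside the block's x-range.

```python
from typing import List, Tuple


def covered_columns(
    block_left: float, block_right: float, columns: List[Tuple[float, float]]
) -> List[int]:
    """Indices of columns whose midpoint lies inside the word-block's x-range."""
    return [
        i
        for i, (x_min, x_max) in enumerate(columns)
        if block_left <= (x_min + x_max) / 2.0 <= block_right
    ]


def colspan_for(
    block_left: float, block_right: float, columns: List[Tuple[float, float]]
) -> int:
    """Number of columns a word-block spans (at least 1)."""
    return max(1, len(covered_columns(block_left, block_right, columns)))
```

Using midpoints rather than raw overlap means a block that only slightly pokes into a neighboring column is not counted as spanning it.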

Also fixes box flowing/bullet_list row height fields (y_min_px/y_max_px).

Removed duplicate spanning logic from cv_box_layout.py — now uses
the generic _detect_colspan_cells from grid_editor_helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:38:03 +02:00
Benjamin Admin
5d91698c3b Fix box grid: row height fields + spanning cell detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 37s
Box 3 empty rows: flowing/bullet_list rows were missing y_min_px/
y_max_px fields that GridTable uses for row height calculation.
Added _px and _pct variants.

Box 2 spanning cells: rows with fewer word-blocks than columns
(e.g., "In Britain..." spanning 2 columns) are now detected and
merged into spanning_header cells. GridTable already renders
spanning_header cells across the full row width.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:46:43 +02:00
Benjamin Admin
5fa5767c9a Fix box column detection: use low gap_threshold for small zones
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m48s
CI / test-python-agent-core (push) Successful in 38s
CI / test-nodejs-website (push) Successful in 30s
PaddleOCR returns multi-word blocks (whole phrases), so ALL inter-word
gaps in small zones (boxes, ≤60 words) are column boundaries. Previous
3x-median approach produced thresholds too high to detect real columns.

New approach for small zones: gap_threshold = max(median_h * 1.0, 25).
This correctly detects 4 columns in "Pounds and euros" box where gaps
range from 50-297px and word height is ~31px.
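
The small-zone formula above, written out as a sketch. The function names and the strict "greater than" comparison are assumptions; the formula max(median_h * 1.0, 25) is the one stated in this commit.

```python
from statistics import median
from typing import List, Sequence


def small_zone_gap_threshold(word_heights: Sequence[float], floor_px: float = 25.0) -> float:
    """gap_threshold = max(median_h * 1.0, 25) for small zones."""
    return max(median(word_heights) * 1.0, floor_px)


def column_break_gaps(gaps: Sequence[float], threshold: float) -> List[float]:
    """Gaps wide enough to count as column boundaries (assumed: strictly greater)."""
    return [g for g in gaps if g > threshold]
```

With a median word height of about 31px the threshold is 31px, so every gap in the 50 to 297px range is treated as a column boundary.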

Also includes SmartSpellChecker fixes from previous commits:
- Frequency-based scoring, IPA protection, slash→l, rare-word threshold

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:55:29 +02:00
Benjamin Admin
693803fb7c SmartSpellChecker: frequency scoring, IPA protection, slash→l fix
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m55s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 31s
Major improvements:
- Frequency-based boundary repair: always tries repair, uses word
  frequency product to decide (Pound sand→Pounds and: 2000x better)
- IPA bracket protection: words inside [brackets] are never modified,
  even when brackets land in tokenizer separators
- Slash→l substitution: "p/" → "pl" for italic l misread as slash
- Abbreviation guard uses rare-word threshold (freq < 1e-6) instead
  of binary known/unknown — prevents "Can I" → "Ca nI" while still
  fixing "ats th." → "at sth."
- Tokenizer includes / character for slash-word detection

43 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:36:39 +02:00
Benjamin Admin
31089df36f SmartSpellChecker: frequency-based boundary repair for valid word pairs
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m42s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 35s
Previously, boundary repair was skipped when both words were valid
dictionary words (e.g., "Pound sand", "wit hit", "done euro").
Now uses word-frequency scoring (product of the two words' frequencies) to
decide whether the repair produces a more common word pair.

Threshold: repair accepted when new pair is >5x more frequent, or
when repair produces a known abbreviation.

New fixes: Pound sand→Pounds and (2000x), wit hit→with it (100000x),
done euro→one euro (7x).
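
A rough sketch of the acceptance rule, assuming a pair is scored by the product of its two word frequencies and that 1 or 2 characters may be shifted across the boundary. The function names and the frequency table passed in are hypothetical; the known-abbreviation shortcut is omitted.

```python
from typing import Dict, Tuple


def pair_score(a: str, b: str, freq: Dict[str, float]) -> float:
    """Score a word pair by the product of its word frequencies (0 if unknown)."""
    return freq.get(a.lower(), 0.0) * freq.get(b.lower(), 0.0)


def repair_boundary(
    left: str,
    right: str,
    freq: Dict[str, float],
    min_gain: float = 5.0,
    max_shift: int = 2,
) -> Tuple[str, str]:
    """Shift up to `max_shift` chars across the boundary if the pair gets >5x more frequent."""
    base = pair_score(left, right, freq)
    best, best_score = (left, right), base
    for k in range(1, max_shift + 1):
        candidates = []
        if len(left) > k:   # move k chars from the end of `left` to `right`
            candidates.append((left[:-k], left[-k:] + right))
        if len(right) > k:  # move k chars from the start of `right` to `left`
            candidates.append((left + right[:k], right[k:]))
        for a, b in candidates:
            score = pair_score(a, b, freq)
            if score > best_score and score > base * min_gain:
                best, best_score = (a, b), score
    return best
```

A real implementation would draw frequencies from a word-frequency list; the numbers used to exercise this sketch are made up.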

43 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:00:22 +02:00
Benjamin Admin
7b294f9150 Cap gap_threshold at 25% of zone_w for column detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 52s
CI / test-python-klausur (push) Failing after 2m51s
CI / test-python-agent-core (push) Successful in 40s
CI / test-nodejs-website (push) Successful in 34s
In small zones (boxes), intra-phrase gaps inflate the median gap,
causing gap_threshold to become too large to detect real column
boundaries. Cap at 25% of zone width to prevent this.

Example: Box "Pounds and euros" has 4 columns at x≈148,534,751,1137
but gap_threshold was 531 (larger than the column gaps themselves).
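
The cap can be sketched as below. The 3x-median base threshold is an assumption taken from how a later commit describes the previous approach, and the function name and the sample gaps are hypothetical.

```python
from statistics import median
from typing import Sequence


def capped_gap_threshold(
    gaps: Sequence[float],
    zone_w: float,
    factor: float = 3.0,     # assumed multiplier on the median gap
    max_frac: float = 0.25,  # cap at 25% of the zone width
) -> float:
    """Median-based gap threshold, capped at a fraction of the zone width."""
    raw = factor * median(gaps)
    return min(raw, max_frac * zone_w)
```

For example, gaps of 150, 177 and 200px give a raw threshold of 531px; in a zone 1200px wide the cap brings it down to 300px.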

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:58:15 +02:00
Benjamin Admin
8b29d20940 StepBoxGridReview: show box border color from structure detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m46s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
- Use box_bg_hex for border color (from Step 7 structure detection)
- Numbered color badges per box
- Show color name in box header
- Add box_bg_color/box_bg_hex to GridZone type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:18:36 +02:00
Benjamin Admin
12b194ad1a Fix StepBoxGridReview: match GridTable props interface
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m50s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 38s
GridTable expects zone (singular), onSelectCell, onCellTextChange,
onToggleColumnBold, onToggleRowHeader, onNavigate — not the
incorrect prop names from the first version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 22:39:38 +02:00
Benjamin Admin
058eadb0e4 Fix build-box-grids: use structure_result boxes + raw OCR words
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 48s
CI / test-go-edu-search (push) Successful in 44s
CI / test-python-klausur (push) Failing after 2m47s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 36s
- Source boxes from structure_result (Step 7) instead of grid zones
- Use raw_paddle_words (top/left/width/height) instead of grid cells
- Create new box zones from all detected boxes (not just existing zones)
- Sort zones by y-position for correct reading order
- Include box background color metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:50:28 +02:00
Benjamin Admin
5da9a550bf Add Box-Grid-Review step (Step 11) to OCR pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m52s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 37s
New pipeline step between Gutter Repair and Ground Truth that processes
embedded boxes (grammar tips, exercises) independently from the main grid.

Backend:
- cv_box_layout.py: classify_box_layout() detects flowing/columnar/
  bullet_list/header_only layout types per box
- build_box_zone_grid(): layout-aware grid building (single-column for
  flowing text, independent columns for tabular content)
- POST /sessions/{id}/build-box-grids endpoint with SmartSpellChecker
- Layout type overridable per box via request body

Frontend:
- StepBoxGridReview.tsx: shows each box with cropped image + editable
  GridTable. Layout type dropdown per box. Auto-builds on first load.
- Auto-skip when no boxes detected on page
- Pipeline steps updated: 13 steps (0-12), Ground Truth moved to 12

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:26:06 +02:00
Benjamin Admin
52637778b9 SmartSpellChecker: boundary repair + context split + abbreviation awareness
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 51s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m54s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
New features:
- Boundary repair: "ats th." → "at sth." (shifted OCR word boundaries)
  Tries shifting 1-2 chars between adjacent words, accepts if result
  includes a known abbreviation or produces better dictionary matches
- Context split: "anew book" → "a new book" (ambiguous word merges)
  Explicit allow/deny list for article+word patterns (alive, alone, etc.)
- Abbreviation awareness: 120+ known abbreviations (sth, sb, adj, etc.)
  are now recognized as valid words, preventing false corrections
- Quality gate: boundary repairs only accepted when result scores
  higher than original (known words + abbreviations)

40 tests passing, all edge cases covered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:41:17 +02:00
Benjamin Admin
f6372b8c69 Integrate SmartSpellChecker into build-grid finalization
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m45s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 40s
SmartSpellChecker now runs during grid build (not just LLM review),
so corrections are visible immediately in the grid editor.

Language detection per column:
- EN column detected via IPA signals (existing logic)
- All other columns assumed German for vocab tables
- Auto-detection for single/two-column layouts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:54:01 +02:00
Benjamin Admin
909d0729f6 Add SmartSpellChecker + refactor vocab-worksheet page.tsx
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m51s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 37s
SmartSpellChecker (klausur-service):
- Language-aware OCR post-correction without LLMs
- Dual-dictionary heuristic for EN/DE language detection
- Context-based a/I disambiguation via bigram lookup
- Multi-digit substitution (sch00l→school)
- Cross-language guard (don't false-correct DE words in EN column)
- Umlaut correction (Schuler→Schüler, uber→über)
- Integrated into spell_review_entries_sync() pipeline
- 31 tests, 9ms/100 corrections

Vocab-worksheet refactoring (studio-v2):
- Split 2337-line page.tsx into 14 files
- Custom hook useVocabWorksheet.ts (all state + logic)
- 9 components in components/ directory
- types.ts, constants.ts for shared definitions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:25:01 +02:00
Benjamin Admin
04fa01661c Move IPA/syllable toggles to vocabulary tab toolbar
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 49s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m51s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 36s
Dropdowns are now in the vocabulary table header (after processing),
not in the worksheet settings (before processing). Changing a mode
automatically reprocesses all successful pages with the new settings.
Same dropdown options as the OCR pipeline grid editor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:17:14 +02:00
Benjamin Admin
bf9d24e108 Replace IPA/syllable checkboxes with full dropdowns in vocab-worksheet
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m41s
CI / test-python-agent-core (push) Successful in 39s
CI / test-nodejs-website (push) Successful in 42s
Vocab worksheet now has the same IPA/syllable mode options as the
OCR pipeline grid editor: Auto, nur EN, nur DE, Alle, Aus.
Previously it had only on/off checkboxes, which mapped to auto/none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:10:22 +02:00
Benjamin Admin
0f17eb3cd9 Fix IPA:Aus — strip all brackets before skipping IPA block
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 49s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m53s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has started running
When ipa_mode=none, the entire IPA processing block was skipped,
including the bracket-stripping logic. Now strips ALL square brackets
from content columns BEFORE the skip, so IPA:Aus actually removes
all IPA from the display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:05:22 +02:00
Benjamin Admin
5244e10728 Fix IPA/syllable race condition: loadGrid no longer depends on buildGrid
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m55s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Has been cancelled
loadGrid depended on buildGrid (for 404 fallback), which depended on
ipaMode/syllableMode. Every mode change created a new loadGrid ref,
triggering StepGridReview's useEffect to load the OLD saved grid,
overwriting the freshly rebuilt one.

Now loadGrid only depends on sessionId. The 404 fallback builds inline
with current modes. Mode changes are handled exclusively by the
separate rebuild useEffect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:59:49 +02:00
Benjamin Admin
a6c5f56003 Fix IPA strip: match all square brackets, not just Unicode IPA
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m49s
CI / test-python-agent-core (push) Successful in 29s
CI / test-nodejs-website (push) Successful in 23s
OCR text contains ASCII IPA approximations like [kompa'tifn] instead
of Unicode [kˈɒmpətɪʃən]. The strip regex required Unicode IPA chars
inside brackets and missed the ASCII ones. Now strips all [bracket]
content from excluded columns since square brackets in vocab columns
are always IPA.
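
A minimal sketch of the relaxed strip, assuming a plain regex over the cell text. The pattern and the function name are illustrative, not the ones in the codebase.

```python
import re

# Any [...] group, plus the whitespace directly in front of it.
_BRACKETS = re.compile(r"\s*\[[^\]]*\]")


def strip_brackets(text: str) -> str:
    """Remove every square-bracket group, ASCII or Unicode IPA alike."""
    return _BRACKETS.sub("", text).strip()
```

Because the pattern no longer inspects what is inside the brackets, ASCII approximations such as [kompa'tifn] are removed just like Unicode IPA.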

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:53:16 +02:00
Benjamin Admin
584e07eb21 Strip English IPA when mode excludes EN (nur DE / Aus)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
English IPA from the original OCR scan (e.g. [ˈgrænˌdæd]) was always
shown because fix_cell_phonetics only ADDS/CORRECTS but never removes.
Now strips IPA brackets containing Unicode IPA chars from the EN column
when ipa_mode is "de" or "none".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:49:22 +02:00
Benjamin Admin
54b1c7d7d7 Fix IPA/syllable first-click not working (off-by-one in initialLoadDone)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m52s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 38s
The old guard checked if grid was loaded AND set initialLoadDone in
the same pass, then returned without rebuilding. This meant the first
user-triggered mode change was always swallowed.

Simplified to a mount-skip ref: skip exactly the first useEffect trigger
(component mount), rebuild on every subsequent trigger (user changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:40:57 +02:00
Benjamin Admin
d8a2331038 Fix IPA/syllable mode change requiring double-click
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m58s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 38s
The useEffect for mode changes called buildGrid() which was a
useCallback closing over stale ipaMode/syllableMode values due to
React's asynchronous state batching. The first click triggered a
rebuild with the OLD mode; only the second click used the new one.

Now inlines the API call directly in the useEffect, reading ipaMode
and syllableMode from the effect's closure which always has the
current values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:32:02 +02:00
Benjamin Admin
ad78e26143 Fix word-split: handle IPA brackets, contractions, and tiebreaker
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m57s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 41s
1. Strip IPA brackets [ipa] before attempting word split, so
   "makeadecision[dɪsˈɪʒən]" is processed as "makeadecision"
2. Handle contractions: "solet's" → split "solet" → "so let" + "'s"
3. DP tiebreaker: prefer longer first word when scores are equal
   ("task is" over "ta skis")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:13:02 +02:00
Benjamin Admin
4f4e6c31fa Fix word-split tiebreaker: prefer longer first word
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m44s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 35s
"taskis" was split as "ta skis" instead of "task is" because both
have the same DP score. Changed comparison from > to >= so that
later candidates (with longer first words) win ties.
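
The effect of the >= comparison can be shown with a small segmentation DP. This is a sketch under assumptions: the scoring (sum of squared word lengths) and the function name are hypothetical, chosen only so that "ta skis" and "task is" tie as described above.

```python
from typing import List, Set


def split_merged(token: str, vocab: Set[str]) -> List[str]:
    """Split a merged token into vocabulary words; ties go to the later split point."""
    n = len(token)
    neg = float("-inf")
    best = [neg] * (n + 1)  # best[i] = best score for token[:i]
    back = [-1] * (n + 1)   # back[i] = start index of the last word in that split
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] == neg or token[j:i] not in vocab:
                continue
            cand = best[j] + (i - j) ** 2
            # ">=" instead of ">": a later j (longer first part) wins ties.
            if cand >= best[i]:
                best[i], back[i] = cand, j
    if best[n] == neg:
        return [token]  # no full segmentation found
    words: List[str] = []
    i = n
    while i > 0:
        j = back[i]
        words.append(token[j:i])
        i = j
    return words[::-1]
```

With a strict > the first candidate found ("ta" + "skis") would be kept; with >= the later split point ("task" + "is") replaces it.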

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:05:14 +02:00
Benjamin Admin
7ffa4c90f9 Lower word-split threshold from 7 to 4 chars
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m48s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 38s
Short merged words like "anew" (a new), "Imadea" (I made a),
"makeadecision" (make a decision) were missed because the split
threshold was too high. Now processes tokens >= 4 chars.

English single-letter words (a, I) are already handled by the DP
algorithm which allows them as valid split points.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:59:02 +02:00
Benjamin Admin
656cadbb1e Remove page-number footers from grid, promote to metadata
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m55s
CI / test-python-agent-core (push) Successful in 30s
CI / test-nodejs-website (push) Successful in 37s
Footer rows that are page numbers (digits or written-out like
"two hundred and nine") are now removed from the grid entirely
and promoted to the page_number metadata field. Non-page-number
footer content stays as a visible footer row.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:50:20 +02:00
Benjamin Admin
757c8460c9 Detect written-out page numbers as footer rows
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 44s
CI / test-python-klausur (push) Failing after 2m46s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 39s
"two hundred and nine" (22 chars) was kept as a content row because
the footer detection only accepted text ≤20 chars. Now recognizes
written-out number words (English + German) as page numbers regardless
of length.
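
A sketch of how written-out numbers might be recognized. The word lists are illustrative and incomplete, and the function names are hypothetical. German numerals are compounds, so they are matched as a concatenation of number parts.

```python
import re

_EN_WORDS = set(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty "
    "thirty forty fifty sixty seventy eighty ninety hundred thousand and".split()
)
_DE_PARTS = (
    "null ein eins zwei drei vier fünf sechs sech sieben sieb acht neun zehn "
    "elf zwölf zwanzig dreißig vierzig fünfzig sechzig siebzig achtzig "
    "neunzig hundert tausend und"
).split()
# One or more German number parts glued together, e.g. "zweihundertneun".
_DE_COMPOUND = re.compile("(?:" + "|".join(sorted(_DE_PARTS, key=len, reverse=True)) + ")+")
_CONNECTORS = {"and", "und"}


def _is_number_token(token: str) -> bool:
    return token in _EN_WORDS or _DE_COMPOUND.fullmatch(token) is not None


def is_written_page_number(text: str) -> bool:
    """True if every word in `text` is a number word, regardless of length."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens or all(t in _CONNECTORS for t in tokens):
        return False
    return all(_is_number_token(t) for t in tokens)
```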

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:39:43 +02:00
Benjamin Admin
501de4374a Keep page references as visible column cells
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m49s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 35s
Step 5g was extracting page refs (p.55, p.70) as zone metadata and
removing them from the cell table. Users want to see them as a
separate column. Now keeps cells in place while still extracting
metadata for the frontend header display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:27:44 +02:00
Benjamin Admin
774bbc50d3 Add debug logging for empty-column-removal
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 54s
CI / test-python-klausur (push) Failing after 2m53s
CI / test-python-agent-core (push) Successful in 39s
CI / test-nodejs-website (push) Successful in 39s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:45:22 +02:00
Benjamin Admin
9ceee4e07c Protect page references from junk-row removal
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 11s
CI / test-go-edu-search (push) Successful in 57s
CI / test-python-klausur (push) Failing after 2m49s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has been cancelled
Rows containing only a page reference (p.55, S.12) were removed as
"oversized stubs" (Rule 2) when their word-box height exceeded the
median. Now skips Rule 2 if any word matches the page-ref pattern.
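
A sketch of the guard, assuming the page-ref pattern is a regex along these lines. The exact pattern and both function names are assumptions; only the examples p.55 and S.12 come from this commit.

```python
import re
from typing import Iterable

# "p.55", "S.12", optionally with a space after the dot (assumed pattern).
_PAGE_REF = re.compile(r"(?:p|S)\.\s?\d+")


def is_page_ref(word: str) -> bool:
    """True if the whole word looks like a page reference."""
    return _PAGE_REF.fullmatch(word.strip()) is not None


def skip_oversized_stub_rule(row_words: Iterable[str]) -> bool:
    """Rule 2 is skipped when any word in the row is a page reference."""
    return any(is_page_ref(w) for w in row_words)
```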

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:40:37 +02:00
Benjamin Admin
f23aaaea51 Fix false header detection: skip continuation lines and mid-column cells
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 54s
CI / test-go-edu-search (push) Successful in 57s
CI / test-python-klausur (push) Failing after 2m57s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 34s
Single-cell rows were incorrectly detected as headings when they were
actually continuation lines. Two new guards:
1. Text starting with "(" is a continuation (e.g. "(usw.)", "(TV-Serie)")
2. Single cells beyond the first two content columns are overflow lines,
   not headings. Real headings appear in the first columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:21:09 +02:00
Benjamin Admin
cde13c9623 Fix IPA stripping digits after headwords (Theme 1 → Theme)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m46s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 30s
_insert_missing_ipa stripped "1" from "Theme 1" because it treated
the digit as garbled OCR phonetics. Now treats pure digits/numbering
patterns (1, 2., 3)) as delimiters that stop the garble-stripping.

Also fixes _has_non_dict_trailing which incorrectly flagged "Theme 1"
as having non-dictionary trailing text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:13:45 +02:00
Benjamin Admin
2e42167c73 Remove empty columns from grid zones
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 52s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 29s
Columns with zero cells (e.g. from tertiary detection where the word
was assigned to a neighboring column by overlap) are stripped from the
final result. Remaining columns and cells are re-indexed.
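
The re-indexing step could look roughly like this. The dictionary keys `index` and `col_index` are assumptions made for the sketch; the actual zone/cell schema is not shown in this commit message.

```python
def drop_empty_columns(columns, cells):
    """Remove columns that own no cells and renumber the rest from 0.

    columns: list of dicts with an "index" key (hypothetical schema)
    cells:   list of dicts with a "col_index" key (hypothetical schema)
    """
    used = sorted({cell["col_index"] for cell in cells})
    remap = {old: new for new, old in enumerate(used)}
    new_columns = [
        dict(col, index=remap[col["index"]])
        for col in columns
        if col["index"] in remap
    ]
    new_cells = [dict(cell, col_index=remap[cell["col_index"]]) for cell in cells]
    return new_columns, new_cells
```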

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:04:49 +02:00
Benjamin Admin
5eff4cf877 Fix page refs deleted as artifacts + IPA spacing for DE mode
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 54s
CI / test-go-edu-search (push) Successful in 41s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has been cancelled
CI / test-python-klausur (push) Has started running
1. Step 5j-pre wrongly classified "p.43", "p.50", etc. as artifacts
   (mixed digits+letters, <=5 chars). Added exception for page
   reference patterns (p.XX, S.XX).

2. IPA spacing regex was too narrow (only matched Unicode IPA chars).
   Now matches any [bracket] content >=2 chars directly after a letter,
   fixing German IPA like "Opa[oːpa]" → "Opa [oːpa]".
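
One way to express the widened spacing rule from item 2 is a lookbehind regex. This is a sketch under the stated rule (bracket content of at least 2 characters directly after a letter); the name `space_before_brackets` and the exact regex are assumptions.

```python
import re

# A letter (any script) immediately followed by "[...]" with >= 2 inner chars.
_BRACKET_AFTER_LETTER = re.compile(r"(?<=[^\W\d_])(\[[^\[\]]{2,}\])")


def space_before_brackets(text: str) -> str:
    """Insert a single space between a word and a directly attached bracket."""
    return _BRACKET_AFTER_LETTER.sub(r" \1", text)
```

Text that already has a space before the bracket is left unchanged, because the lookbehind requires a letter directly before `[`.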

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:01:25 +02:00
Benjamin Admin
7f4b8757ff Fix IPA spacing + add zone debug logging for marker column issue
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 55s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m48s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 37s
1. Ensure space before IPA brackets in cell text: "word[ipa]" → "word [ipa]"
   Applied as final cleanup in grid-build finalization.

2. Add debug logging for zone-word assignment to diagnose why marker
   column cells are empty despite correct column detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:51:52 +02:00
Benjamin Admin
7263328edb Fix marker column detection: remove min-rows requirement
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 30s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 2m55s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 22s
Words to the left of the first detected column boundary must always
form their own column, regardless of how few rows they appear in.
Previously required 4+ distinct rows for tertiary (margin) columns,
which missed page references like p.62, p.63, p.64 (only 3 rows).

Now any cluster at the left/right margin with a clear gap to the
nearest significant column qualifies as its own column.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:24:25 +02:00
Benjamin Admin
8c482ce8dd Fix Grid Build step: show grid-editor summary instead of word_result
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 23s
The Grid Build step was showing word_result.grid_shape (from the initial
OCR word clustering, often just 1 column) instead of the grid-editor
summary (zone-based, with correct column/row/cell counts). Now reads
summary.total_rows/total_columns/total_cells from the grid-editor result.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:01:18 +02:00
Benjamin Admin
00f7a7154c Fix left-side gutter detection: find peak instead of scanning from edge
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 30s
CI / test-nodejs-website (push) Successful in 32s
Left-side book fold shadows have a V-shape: brightness dips from the
edge toward a peak at ~5-10% of width, then rises again. The previous
algorithm scanned from the edge inward and immediately found a low
dark fraction (0.13 at x=0), missing the gutter entirely.

Now finds the PEAK of the dark fraction profile first, then scans from
that peak toward the page center to find the transition point. Works
for both V-shaped left gutters and edge-darkening right gutters.
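
The peak-then-scan idea can be sketched on a plain list of per-column dark fractions. The function name, the 0.5 threshold default, and the choice to limit the search to one half of the page are assumptions for illustration.

```python
def gutter_transition(profile, side="left", threshold=0.5):
    """Find the column where the gutter shadow ends.

    profile: dark fraction per column (0.0 to 1.0), left to right.
    Returns the first column, scanning from the peak toward the page
    center, whose dark fraction falls below the threshold, or None.
    """
    width = len(profile)
    center = width // 2
    if side == "left":
        region, step = range(0, center), 1
    else:
        region, step = range(center + 1, width), -1
    if len(region) == 0:
        return None
    # Start from the darkest column, not from the page edge.
    peak_x = max(region, key=lambda x: profile[x])
    if profile[peak_x] < threshold:
        return None  # no shadow strong enough to count as a gutter
    x = peak_x
    while x != center:
        if profile[x] < threshold:
            return x
        x += step
    return None
```

With a V-shaped left profile such as `[0.13, 0.4, 0.8, 0.9, 0.7, 0.3, ...]`, an edge-inward scan would stop at column 0, while this version starts at the 0.9 peak and stops where the fraction drops to 0.3.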

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:52:23 +02:00
Benjamin Admin
9c5e950c99 Fix multi-page PDF upload: include session_id for first page
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-nodejs-website (push) Successful in 36s
CI / test-python-klausur (push) Failing after 10m2s
CI / test-go-edu-search (push) Failing after 10m9s
CI / test-python-agent-core (push) Failing after 14m58s
The frontend expects session_id in the upload response, but multi-page
PDFs returned only document_group_id + pages[]. Now includes session_id
pointing to the first page for backwards compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:26:25 +02:00
Benjamin Admin
6e494a43ab Apply merged-word splitting to grid-editor cells
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 44s
CI / test-python-klausur (push) Failing after 2m28s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 32s
The spell review only runs on vocab entries, but the OCR pipeline's
grid-editor cells also contain merged words (e.g. "atmyschool").
Now splits merged words directly in the grid-build finalization step,
right before returning the result. Uses the same _try_split_merged_word()
dictionary-based DP algorithm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:52:00 +02:00
Benjamin Admin
53b0d77853 Multi-page PDF support: create one session per page
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 27s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 24s
CI / test-nodejs-website (push) Successful in 35s
When uploading a PDF with > 1 page to the OCR pipeline, each page
now gets its own session (grouped by document_group_id). Previously
only page 1 was processed. The response includes a pages array with
all session IDs so the frontend can navigate between them.

Single-page PDFs and images continue to work as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:39:48 +02:00
Benjamin Admin
aed0edbf6d Fix word split scoring: prefer longer words over short ones
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 20s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m41s
CI / test-python-agent-core (push) Successful in 24s
CI / test-nodejs-website (push) Successful in 30s
"Comeon" was split as "Com eon" instead of "Come on" because both
are 2-word splits. Now uses sum-of-squared-lengths as tiebreaker:
"come"(16) + "on"(4) = 20 > "com"(9) + "eon"(9) = 18.

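The tiebreaker can be written as a sort key: fewer words first, then the larger sum of squared lengths. The name `split_score` is an assumption; only the scoring rule comes from the message above.

```python
def split_score(parts):
    """Rank candidate splits: fewer words win, then longer words win."""
    return (-len(parts), sum(len(p) ** 2 for p in parts))


candidates = [["com", "eon"], ["come", "on"]]
best = max(candidates, key=split_score)
```

Both candidates have two words, so the second tuple element decides: 16 + 4 = 20 for "come on" against 9 + 9 = 18 for "com eon".
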
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:14:23 +02:00
Benjamin Admin
9e2c301723 Add merged-word splitting to OCR spell review
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 38s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
OCR often merges adjacent words when spacing is tight, e.g.
"atmyschool" → "at my school", "goodidea" → "good idea".

New _try_split_merged_word() uses dynamic programming to find the
shortest sequence of dictionary words covering the token. Integrated
as step 5 in _spell_fix_token() after general spell correction.
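
A minimal sketch of such a dynamic program, assuming the dictionary is a plain set of lowercase words and that pieces shorter than two characters are ignored. It is not the repository's `_try_split_merged_word()`; the signature and `min_len` parameter are hypothetical.

```python
def try_split_merged_word(token, dictionary, min_len=2):
    """Split a merged token into the fewest dictionary words, or return None.

    best[i] holds the shortest word sequence covering token[:i].
    """
    lower = token.lower()
    n = len(lower)
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None:
                continue
            piece = lower[start:end]
            if len(piece) < min_len or piece not in dictionary:
                continue
            candidate = best[start] + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    result = best[n]
    # A single-word result means the token was not merged at all.
    return result if result and len(result) > 1 else None
```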

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:11:16 +02:00
Benjamin Admin
633e301bfd Add camera gutter detection via vertical continuity analysis
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 32s
CI / test-python-klausur (push) Failing after 2m49s
CI / test-python-agent-core (push) Successful in 30s
CI / test-nodejs-website (push) Successful in 32s
Scanner shadow detection (range > 40, darkest < 180) fails on camera
book scans where the gutter shadow is subtle (range ~25, darkest ~214).

New _detect_gutter_continuity() detects gutters by their unique property:
the shadow runs continuously from top to bottom without interruption.
Divides the image into horizontal strips and checks what fraction of
strips are darker than the page median at each column. A gutter column
has >= 75% of strips darker. The transition point where the smoothed
dark fraction drops below 50% marks the crop boundary.
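
The strip-based continuity measure might be sketched as follows on a grayscale image given as nested lists. This is a simplified illustration: it omits the smoothing step mentioned above, and the function names and the default of 4 strips are assumptions.

```python
from statistics import median


def dark_fraction_profile(gray, n_strips=4):
    """For each column, the fraction of horizontal strips darker than the page median.

    gray: list of rows, each a list of brightness values (0-255).
    """
    height, width = len(gray), len(gray[0])
    page_median = median(v for row in gray for v in row)
    strip_h = max(1, height // n_strips)
    strips = [gray[y:y + strip_h] for y in range(0, height, strip_h)]
    profile = []
    for x in range(width):
        dark = sum(
            1
            for strip in strips
            if sum(row[x] for row in strip) / len(strip) < page_median
        )
        profile.append(dark / len(strips))
    return profile


def gutter_columns(profile, min_fraction=0.75):
    """Columns whose shadow is continuous enough to count as gutter."""
    return [x for x, frac in enumerate(profile) if frac >= min_fraction]
```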

Integrated as fallback between scanner shadow and binary projection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 13:58:14 +02:00
Benjamin Admin
9b5e8c6b35 Restructure upload flow: document first, then preview + naming
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 38s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 24s
Step 1 is now document selection (full width).
After selecting a file, Step 2 shows a side-by-side layout with
document preview (3/5 width, scrollable, with fullscreen modal)
and session naming (2/5 width, with start button).

Also adds PDF preview via blob URL before upload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:53:47 +02:00
Benjamin Admin
682b306e51 Use grid-build zones for vocab extraction (4-column detection)
Some checks failed
CI / go-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m44s
CI / test-python-agent-core (push) Successful in 29s
CI / test-nodejs-website (push) Successful in 36s
The initial build_grid_from_words() under-clusters to 1 column while
_build_grid_core() correctly finds 4 columns (marker, EN, DE, example).
Now extracts vocab from grid zones directly, with heuristic to skip
narrow marker columns. Falls back to original cells if zones fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 01:17:40 +02:00
Benjamin Admin
3e3116d2fd Fix vocab extraction: show all columns for generic layouts
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 36s
When columns can't be classified as EN/DE, map them by position:
col 0 → english, col 1 → german, col 2+ → example. This ensures
vocabulary pages are always extracted, even without explicit
language classification. Classified pages still use the proper
EN/DE/example mapping.
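
The positional fallback could be sketched like this. The input shape (a dict from column index to cell text) and the function name are assumptions; only the col 0 / col 1 / col 2+ mapping comes from the message above.

```python
def map_row_by_position(cells_by_col):
    """Map an unclassified row to a vocab entry purely by column position."""
    entry = {"english": "", "german": "", "example": ""}
    extra = []
    for col in sorted(cells_by_col):
        text = cells_by_col[col].strip()
        if col == 0:
            entry["english"] = text
        elif col == 1:
            entry["german"] = text
        elif text:
            extra.append(text)  # col 2 and beyond all feed the example field
    entry["example"] = " ".join(extra)
    return entry
```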

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 01:11:40 +02:00
Benjamin Admin
9a8ce69782 Fix vocab extraction: use original column types for EN/DE classification
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 37s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
The grid-build zones use generic column types, losing the EN/DE
classification from build_grid_from_words(). Now extracts improved
cells from grid zones but classifies them using the original
columns_meta which has the correct column_en/column_de types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 01:07:49 +02:00
Benjamin Admin
66f8a7b708 Improve vocab-worksheet UX: better status messages + error details
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m19s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 35s
- Change "PDF wird analysiert..." to "PDF wird hochgeladen..." (accurate)
- Switch to pages tab immediately after upload (before thumbnails load)
- Show progressive status: "5 Seiten erkannt. Vorschau wird geladen..."
- Show backend error detail instead of generic "HTTP 404"
- Backend returns helpful message when session not in memory after restart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:55:56 +02:00
Benjamin Admin
3b78baf37f Replace old OCR pipeline with Kombi pipeline + add IPA/syllable toggles
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 37s
CI / test-python-klausur (push) Failing after 2m22s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 33s
Backend:
- _run_ocr_pipeline_for_page() now runs the full Kombi pipeline:
  orientation → deskew → dewarp → content crop → dual-engine OCR
  (RapidOCR + Tesseract merge) → _build_grid_core() with pipe-autocorrect,
  word-gap merge, dictionary detection
- Accepts ipa_mode and syllable_mode query params on process-single-page
- Pipeline sessions are visible in admin OCR Kombi UI for debugging

Frontend (vocab-worksheet):
- New "Anzeigeoptionen" section with IPA and syllable toggles
- Settings are passed to process-single-page as query parameters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:43:42 +02:00
Benjamin Admin
2828871e42 Show detected page number in session header
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 28s
Extracts page_number from grid_editor_result when opening a session
and displays it as "S. 233" badge in the SessionHeader, next to the
category and GT badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:20:53 +02:00
Benjamin Admin
5c96def4ec Skip valid line-break hyphenations in gutter repair
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 38s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 31s
Words ending with "-" where the stem is a known word (e.g. "wunder-"
→ "wunder" is known) are valid line-break hyphenations, not gutter
errors. Gutter problems cause the hyphen to be LOST ("ve" instead of
"ver-"), so a visible hyphen + known stem = intentional word-wrap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:14:21 +02:00
Benjamin Admin
611e1ee33d Add GT badge to grouped sessions and sub-pages in session list
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m29s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 34s
The GT badge was only shown on ungrouped SessionRow items. Now also
visible on document group rows (e.g. "GT 1/2") and individual pages
within expanded groups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:54:55 +02:00
Benjamin Admin
49d5212f0c Fix hyphen-join: preserve next row + skip valid hyphenations
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m26s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 31s
Two bugs fixed:
- Apply no longer removes the continuation word from the next row.
  "künden" stays in row 31 — only the current row is repaired
  ("ve" → "ver-"). The original line-break layout is preserved.
- Analysis now skips words that already end with "-" when the direct
  join with the next row is a known word (valid hyphenation, not an error).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:49:07 +02:00
Benjamin Admin
e6f8e12f44 Show full Grid-Review in Ground Truth step + GT badge in session list
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 37s
CI / test-python-klausur (push) Failing after 2m18s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 27s
- StepGroundTruth now shows the split view (original image + table)
  so the user can verify the final result before marking as GT
- Backend session list now returns is_ground_truth flag
- SessionList shows amber "GT" badge for marked sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:34:32 +02:00
Benjamin Admin
aabd849e35 Fix hyphen-join: strip trailing punctuation from continuation word
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 34s
The next-row word "künden," had a trailing comma, causing dictionary
lookup to fail for "verkünden,". Now strips .,;:!? before joining.
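
A sketch of the lookup with punctuation stripping, assuming the dictionary check is passed in as a callable. The name `joined_is_known` is hypothetical.

```python
TRAILING_PUNCT = ".,;:!?"


def joined_is_known(prefix, continuation, is_known):
    """Join a line-end fragment with the next-row word and look it up.

    Trailing punctuation on the continuation is stripped first, so
    "künden," no longer breaks the lookup of "verkünden".
    """
    joined = prefix.rstrip("-") + continuation.rstrip(TRAILING_PUNCT)
    return is_known(joined.lower())
```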

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:25:28 +02:00
Benjamin Admin
d1e7dd1c4a Fix gutter repair: detect short fragments + show spell alternatives
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 48s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
- Lower min word length from 3→2 for hyphen-join candidates so fragments
  like "ve" (from "ver-künden") are no longer skipped
- Return all spellchecker candidates instead of just top-1, so user can
  pick the correct form (e.g. "stammeln" vs "stammelt")
- Frontend shows clickable alternative buttons for spell_fix suggestions
- Backend accepts text_overrides in apply endpoint for user-selected alternatives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:09:12 +02:00
Benjamin Admin
71e1b10ac7 Add gutter repair step to OCR Kombi pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 29s
New step "Wortkorrektur" between Grid-Review and Ground Truth that detects
and fixes words truncated or blurred at the book gutter (binding area) of
double-page scans. Uses pyspellchecker (DE+EN) for validation.

Two repair strategies:
- hyphen_join: words split across rows with missing chars (ve + künden → verkünden)
- spell_fix: garbled trailing chars from gutter blur (stammeli → stammeln)

Interactive frontend with per-suggestion accept/reject and batch controls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:50:16 +02:00
Benjamin Admin
21b69e06be Fix cross-column word assignment by splitting OCR merge artifacts
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 23s
When OCR merges adjacent words from different columns into one word box
(e.g. "sichzie" spanning Col 1+2, "dasZimmer" crossing boundary), the
grid builder assigned the entire merged word to one column.

New _split_cross_column_words() function splits these at column
boundaries using case transitions and spellchecker validation to
avoid false positives on real words like "oder", "Kabel", "Zeitung".

Regression: 12/12 GT sessions pass with diff=+0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:54:41 +01:00
Benjamin Admin
0168ab1a67 Remove Hauptseite/Box tabs from Kombi pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m15s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 20s
Page-split now creates independent sessions that appear directly in
the session list. After split, the UI switches to the first child
session. BoxSessionTabs, sub-session state, and parent-child tracking
removed from Kombi code. Legacy ocr-overlay still uses BoxSessionTabs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 17:43:58 +01:00
Benjamin Admin
925f4356ce Use spellchecker instead of pyphen for pipe autocorrect validation
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 2m29s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 20s
pyphen is a pattern-based hyphenator that accepts nonsense strings
like "Zeplpelin". Switch to spellchecker (frequency-based word list)
which correctly rejects garbled words and can suggest corrections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 16:47:42 +01:00
Benjamin Admin
cc4cb3bc2f Add pipe auto-correction and graphic artifact filter for grid builder
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m10s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 19s
- autocorrect_pipe_artifacts(): strips OCR pipe artifacts from printed
  syllable dividers, validates with pyphen, tries char-deletion near
  pipe positions for garbled words (e.g. "Ze|plpe|lin" → "Zeppelin")
- Rule (a2): filters isolated non-alphanumeric word boxes (≤2 chars,
  no letters/digits) — catches small icons OCR'd as ">", "<" etc.
- Both fixes are generic: pyphen-validated, no session-specific logic
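
The char-deletion idea from the first bullet can be sketched as below. The validator is injected as `is_known` so the sketch does not depend on pyphen; the function name and the `window` parameter are assumptions.

```python
def autocorrect_pipe_word(raw, is_known, window=1):
    """Strip "|" artifacts; if the result is unknown, try deleting one
    character near each former pipe position."""
    pipe_positions = []
    chars = []
    for ch in raw:
        if ch == "|":
            pipe_positions.append(len(chars))  # index in the stripped word
        else:
            chars.append(ch)
    word = "".join(chars)
    if not pipe_positions or is_known(word.lower()):
        return word
    for pos in pipe_positions:
        for i in range(max(0, pos - window), min(len(word), pos + window + 1)):
            candidate = word[:i] + word[i + 1:]
            if is_known(candidate.lower()):
                return candidate
    return word  # no valid repair found; keep the stripped word
```

For "Ze|plpe|lin" the stripped word is "Zeplpelin"; deleting the character one position after the first pipe yields "Zeppelin".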

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 16:33:38 +01:00
Benjamin Admin
0685fb12da Fix Bug 3: recover OCR-lost prefixes via overlap merge + chain merging
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m24s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 19s
When OCR merge expands a prefix word box (e.g. "zer" w=42 → w=104),
it heavily overlaps (>75%) with the next fragment ("brech"). The grid
builder's overlap filter previously removed the prefix as a duplicate.

Fix: when overlap > 75% but both boxes are alphabetic with different
text and one is ≤ 4 chars, merge instead of removing. Also enable
chain merging via merge_parent tracking so "zer" + "brech" + "lich"
→ "zerbrechlich" in a single pass.
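
The merge-instead-of-remove decision might be expressed as a small predicate. The return labels and the function name are assumptions; the thresholds (75% overlap, at most 4 characters) are taken from the message above.

```python
def overlap_action(text_a, text_b, overlap_ratio):
    """Decide what to do with two overlapping word boxes."""
    if overlap_ratio <= 0.75:
        return "keep"
    both_alpha = text_a.isalpha() and text_b.isalpha()
    different = text_a.lower() != text_b.lower()
    one_short = min(len(text_a), len(text_b)) <= 4
    if both_alpha and different and one_short:
        return "merge"  # likely a prefix fragment such as "zer" + "brech"
    return "remove"  # treat as a true duplicate
```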

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:49:52 +01:00
1317 changed files with 186171 additions and 180627 deletions


@@ -256,3 +256,67 @@ ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && git push all
| `website/app/admin/klausur-korrektur/` | Korrektur-Workspace |
| `backend-lehrer/classroom_api.py` | Classroom Engine |
| `backend-lehrer/state_engine_api.py` | State Engine |
---
## Compliance: No Scan/OCR in the Customer Frontend (NON-NEGOTIABLE)
Studio-v2 (customer frontend, port 443) must **NOT** contain features that:
- Reconstruct or reproduce book pages/textbooks from third parties
- Actively prompt users to upload third-party copyrighted works
**Allowed** in studio-v2:
- Upload of teachers' own documents (their own worksheets, tests, materials)
- OCR/processing of documents authored by the teacher
- Manual vocabulary entry by teachers
- Suggestion lists from the in-house dictionary (160k MIT-licensed words)
- Learning-unit creation from own/selected content
- Audio/image/quiz/flashcard generation
**Advanced OCR/scan features** (e.g. Vision-LLM fusion, A/B testing toggles) stay
in the admin frontend (admin-lehrer, port 3002) for development and testing.
**Background**: copyright liability of the GmbH. The system is a
didactics engine (transformation + learning), NOT a content-reconstruction tool.
---
## Code Quality Guardrails (NON-NEGOTIABLE)
> Full details: `.claude/rules/architecture.md`
> Exceptions: `.claude/rules/loc-exceptions.txt`
### File Size Budget
- **Hard cap: 500 LOC** per file
- If a change would push a file above 500 LOC: **split first, then change**
- Exceptions only with a justification in `loc-exceptions.txt` + `[guardrail-change]` commit marker
### Architecture
- **Python:** thin routes → business logic in services → persistence in repositories
- **Go:** handlers ≤40 LOC → service layer → repository pattern
- **TypeScript/Next.js:** thin page.tsx → move server actions, queries, components out
- **Types:** split monolithic types.ts early; avoid types.ts + types/ shadowing
### Workflow (for every change)
1. Read the file + check LOC
2. If close to the budget → split first
3. Minimal coherent change
4. Verification (tests + lint)
5. Summary: what changed, what was verified, residual risk
### Commit Markers
- `[migration-approved]`: schema/migration changes
- `[guardrail-change]`: changes to .claude/**, scripts/check-loc.sh
- `[split-required]`: change starts with a file split
- `[interface-change]`: public API contracts changed
### Running the LOC Check
```bash
bash scripts/check-loc.sh --changed # changed files only
bash scripts/check-loc.sh --all # all files (shows all violations)
```
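
The budget rule itself can be illustrated with a short sketch. It assumes the `<glob> | owner=… | reason=… | review=…` line format of `loc-exceptions.txt` and uses `fnmatch` for glob matching; how `scripts/check-loc.sh` actually matches globs is not shown here, so treat the matching semantics as an assumption.

```python
from fnmatch import fnmatch


def parse_exceptions(text):
    """Extract the glob (first "|"-separated field) from each non-comment line."""
    globs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        globs.append(line.split("|", 1)[0].strip())
    return globs


def over_budget(path, loc, globs, cap=500):
    """True if the file exceeds the cap and is not covered by an exception."""
    if any(fnmatch(path, g) for g in globs):
        return False
    return loc > cap


SAMPLE = """
# Test files
**/*test*.py | owner=all | reason=table-driven tests | review=permanent
**/tests/** | owner=all | reason=test directories | review=permanent
"""
globs = parse_exceptions(SAMPLE)
```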


@@ -0,0 +1,46 @@
# Architecture Rule — BreakPilot Lehrer
## File Size Budget
Hard default: **500 LOC max** per file.
Soft targets:
- Handler/Router/Service: 300-400 LOC
- Models/Schemas/Types: 200-300 LOC
- Utilities: 100-200 LOC
Exceptions only in `.claude/rules/loc-exceptions.txt`, with a justification.
## Split Triggers
Split immediately when:
- A file exceeds 500 LOC
- A file would exceed 500 LOC after the change
- A file mixes transport + business logic + persistence
- A file contains several independently testable responsibilities
## Python (backend-lehrer, klausur-service, voice-service)
- Keep routes thin; business logic goes in services
- Persistence in repositories/data-access modules
- Split Pydantic schemas by domain
- Avoid circular imports
## Go (school-service, edu-search-service)
- Keep handlers thin (≤40 LOC)
- Business logic in services/use cases
- Transport/request decoding separate from domain logic
## TypeScript / Next.js (admin-lehrer, studio-v2, website)
- Keep page.tsx thin; move server actions, queries, forms out
- Split monolithic types.ts early
- Avoid types.ts + types/ shadowing
- Explicitly separate shared client/server types
## Decision Order
1. Reuse an existing small, cohesive module
2. Create a new module nearby
3. Split the overcrowded file and put the new behavior in the right split module
4. Only as a last resort: extend a large existing file


@@ -0,0 +1,51 @@
# LOC Exceptions — BreakPilot Lehrer
# Format: <glob> | owner=<person> | reason=<why> | review=<date>
#
# Every exception needs a justification and a review date.
# Temporary exceptions must carry the [guardrail-change] commit marker.
# Generated / Build Artifacts
**/node_modules/** | owner=infra | reason=npm packages | review=permanent
**/.next/** | owner=infra | reason=Next.js build output | review=permanent
**/__pycache__/** | owner=infra | reason=Python bytecode | review=permanent
**/venv/** | owner=infra | reason=Python virtualenv | review=permanent
# Test files (may be larger for table-driven tests)
**/*test*.py | owner=all | reason=Tests with table-driven patterns may be larger | review=permanent
**/*test*.go | owner=all | reason=Go tests with table-driven patterns | review=permanent
**/*test*.ts | owner=all | reason=TypeScript Tests | review=permanent
**/tests/** | owner=all | reason=Test directories | review=permanent
# Blog pages (purely static content, no code)
**/blog/*/page.tsx | owner=website | reason=Static blog articles (MDX-like, pure content) | review=permanent
# Pure data registries (no logic, only data definitions)
**/dsfa_sources_registry.py | owner=klausur | reason=Pure data registry (license + source definitions, no logic) | review=2027-01-01
**/legal_corpus_registry.py | owner=klausur | reason=Pure data registry (Regulation dataclass + 47 regulation definitions, no logic) | review=2027-01-01
**/backlog/backlog-items.ts | owner=admin-lehrer | reason=Pure data array (506 LOC, no logic, only BacklogItem[] literals) | review=2027-01-01
**/lib/module-registry-data.ts | owner=admin-lehrer | reason=Pure data array (510 LOC, no logic, only BackendModule[] literals) | review=2027-01-01
# Algorithmic monolith: detect_column_geometry() alone is 411 LOC and cannot be split further
**/cv_layout_columns.py | owner=klausur | reason=detect_column_geometry is a single 411-LOC function (whitespace-gap analysis) | review=2026-10-01
# Two indivisible route handlers (~230 LOC each) that cannot be split further
**/vocab_worksheet_compare_api.py | owner=klausur | reason=compare_ocr_methods (234 LOC) + analyze_grid (255 LOC), each a single cohesive handler | review=2026-10-01
# TypeScript Data Catalogs (admin-lehrer/lib/sdk/)
# Pure exported const arrays/objects with type definitions, no business logic.
# DSGVO/GDPR compliance catalogs: risk scenarios, mitigations, legal bases, checklists, etc.
**/lib/sdk/vendor-compliance/catalog/*.ts | owner=admin-lehrer | reason=Pure data catalogs (processing-activities 813, vendor-templates 564, legal-basis 562 LOC) | review=2027-01-01
**/lib/sdk/vendor-compliance/contract-review/findings.ts | owner=admin-lehrer | reason=Pure data catalog (573 LOC, FindingTemplate[] literals) | review=2027-01-01
**/lib/sdk/vendor-compliance/contract-review/checklists.ts | owner=admin-lehrer | reason=Pure data catalog (508 LOC, ChecklistItem[] literals) | review=2027-01-01
**/lib/sdk/dsfa/mitigation-library.ts | owner=admin-lehrer | reason=Pure data catalog (694 LOC, CatalogMitigation[] literals) | review=2027-01-01
**/lib/sdk/dsfa/eu-legal-frameworks.ts | owner=admin-lehrer | reason=Pure data catalog (622 LOC, legal framework definitions) | review=2027-01-01
**/lib/sdk/dsfa/risk-catalog.ts | owner=admin-lehrer | reason=Pure data catalog (615 LOC, CatalogRisk[] literals) | review=2027-01-01
**/lib/sdk/vvt-baseline-catalog.ts | owner=admin-lehrer | reason=Pure data catalog (630 LOC, BaselineTemplate[] literals) | review=2027-01-01
**/lib/sdk/loeschfristen-baseline-catalog.ts | owner=admin-lehrer | reason=Pure data catalog (578 LOC, retention period templates) | review=2027-01-01
# Single SSE generator orchestrating 6 pipeline steps — cannot split generator context
**/ocr_pipeline_auto_steps.py | owner=klausur | reason=run_auto is a single async generator yielding SSE events across 6 steps (528 LOC) | review=2026-10-01
# Legacy: TEMPORARY until the refactoring is complete
# Files listed here are worked through phase by phase and then removed.
# NO new exceptions without the [guardrail-change] commit marker!
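
The `<glob> | owner=<person> | reason=<why> | review=<date>` format above could be read roughly as follows. This is only a sketch: `LocException`, `parse_exceptions`, and `is_exempt` are hypothetical names, and the glob matching uses Python's `fnmatchcase`, which may not behave exactly like the matching in `scripts/check-loc.sh`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class LocException:
    glob: str
    owner: str
    reason: str
    review: str


def parse_exceptions(text: str) -> list[LocException]:
    """Parse '<glob> | owner=... | reason=... | review=...' lines, skipping comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        glob, *fields = [part.strip() for part in line.split("|")]
        meta = dict(field.split("=", 1) for field in fields)
        entries.append(LocException(glob, meta["owner"], meta["reason"], meta["review"]))
    return entries


def is_exempt(path: str, entries: list[LocException]) -> bool:
    # Assumption: fnmatch's '*' also matches '/', so '**/x' is approximated here.
    # Prefixing '/' lets '**/name' match paths at the repository root as well.
    return any(fnmatchcase(path, e.glob) or fnmatchcase("/" + path, e.glob) for e in entries)
```

A LOC check would presumably consult `is_exempt` before reporting a file over the 500-LOC budget.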


@@ -0,0 +1,242 @@
# OCR Pipeline Extensions - Developer Documentation
**Status:** Production
**Last updated:** 2026-04-15
**URL:** https://macmini:3002/ai/ocr-kombi
---
## Overview
Extensions to the OCR Kombi pipeline (14 steps, 0-13):
- **SmartSpellChecker**: LLM-free OCR correction with language detection
- **Box-Grid-Review** (Step 11): processes embedded boxes
- **Ansicht/Spreadsheet** (Step 12): Fortune Sheet Excel-style editor
---
## Pipeline Steps
| Step | ID | Name | Component |
|------|----|------|------------|
| 0 | upload | Upload | StepUpload |
| 1 | orientation | Orientierung | StepOrientation |
| 2 | page-split | Seitentrennung | StepPageSplit |
| 3 | deskew | Begradigung | StepDeskew |
| 4 | dewarp | Entzerrung | StepDewarp |
| 5 | content-crop | Zuschneiden | StepContentCrop |
| 6 | ocr | OCR | StepOcr |
| 7 | structure | Strukturerkennung | StepStructure |
| 8 | grid-build | Grid-Aufbau | StepGridBuild |
| 9 | grid-review | Grid-Review | StepGridReview |
| 10 | gutter-repair | Wortkorrektur | StepGutterRepair |
| **11** | **box-review** | **Box-Review** | **StepBoxGridReview** |
| **12** | **ansicht** | **Ansicht** | **StepAnsicht** |
| 13 | ground-truth | Ground Truth | StepGroundTruth |
Step definitions: `admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts`
---
## SmartSpellChecker
**File:** `klausur-service/backend/smart_spell.py`
**Tests:** `tests/test_smart_spell.py` (43 tests)
**License:** pyspellchecker only (MIT); no LLM, no Hunspell
### Features
| Feature | Method |
|---------|---------|
| Language detection | Dual-dictionary EN/DE heuristic |
| a/I disambiguation | Bigram context (lookup of the following word) |
| Boundary repair | Frequency-based: `Pound sand` → `Pounds and` |
| Context split | `anew` → `a new` (allow/deny list) |
| Multi-digit | BFS: `sch00l` → `school` |
| Cross-language guard | Do not miscorrect DE words in an EN column |
| Umlaut correction | `Schuler` → `Schueler` |
| IPA protection | Never change content inside [brackets] |
| Slash→l | `p/` → `pl` (italic l recognized as /) |
| Abbreviations | 120+ from `_KNOWN_ABBREVIATIONS` |
### Integration
```python
# In cv_review.py (LLM review step):
from smart_spell import SmartSpellChecker
_smart = SmartSpellChecker()
result = _smart.correct_text(text, lang="en") # or "de" or "auto"
# In grid_editor_api.py (grid build + box build):
# Applied automatically after the grid build and the box-grid build
```
### Frequency Scoring
Boundary repair compares products of word frequencies:
- `old_freq = word_freq(w1) * word_freq(w2)`
- `new_freq = word_freq(repaired_w1) * word_freq(repaired_w2)`
- Accepted when `new_freq > old_freq * 5`
- Abbreviation bonus only when the original words are rare (freq < 1e-6)
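
The acceptance rule can be sketched as below. This is an illustration under assumptions, not the code in `smart_spell.py`: `accept_boundary_repair` and `toy_word_freq` are hypothetical names, the frequencies are made-up toy values, and the abbreviation bonus is left out.

```python
def accept_boundary_repair(word_freq, original, repaired, factor=5.0):
    """Return True when the repaired word pair is sufficiently more frequent."""
    w1, w2 = original
    r1, r2 = repaired
    old_freq = word_freq(w1) * word_freq(w2)
    new_freq = word_freq(r1) * word_freq(r2)
    # Rule from the list above: accept only if new_freq > old_freq * 5.
    return new_freq > old_freq * factor


# Toy frequencies for illustration only; real values would come from a
# word-frequency dictionary.
TOY_FREQ = {"pound": 2e-5, "sand": 3e-5, "pounds": 4e-5, "and": 2.5e-2}


def toy_word_freq(word):
    return TOY_FREQ.get(word.lower(), 0.0)
```

With these toy numbers, `Pound sand` → `Pounds and` is accepted, while the reverse direction is rejected because the product drops.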
---
## Box-Grid-Review (Step 11)
**Frontend:** `admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx`
**Backend:** `klausur-service/backend/cv_box_layout.py`, `grid_editor_api.py`
**Tests:** `tests/test_box_layout.py` (13 Tests)
### Backend Endpoints
```
POST /api/v1/ocr-pipeline/sessions/{id}/build-box-grids
```
Processes all detected boxes from `structure_result`:
1. Filters out header/footer boxes (top/bottom 7% of the image height)
2. Extracts the OCR words for each box from `raw_paddle_words`
3. Classifies the layout: `flowing` | `columnar` | `bullet_list` | `header_only`
4. Builds the grid with layout-specific logic
5. Applies SmartSpellChecker
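
Step 1 could look roughly like the sketch below. The box shape (`y`, `h` keys) and the criterion "box lies entirely inside the 7% band" are assumptions made for illustration; the documentation does not state how partially overlapping boxes are treated, and `filter_header_footer` is a hypothetical name.

```python
def filter_header_footer(boxes, image_height, margin=0.07):
    """Drop boxes lying entirely in the top or bottom margin band."""
    top = image_height * margin
    bottom = image_height * (1.0 - margin)
    kept = []
    for box in boxes:
        y0 = box["y"]
        y1 = box["y"] + box["h"]
        in_header = y1 <= top      # box ends above the header line
        in_footer = y0 >= bottom   # box starts below the footer line
        if not (in_header or in_footer):
            kept.append(box)
    return kept
```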
### Box Layout Classification (`cv_box_layout.py`)
| Layout | Detection | Grid construction |
|--------|-----------|-------------|
| `header_only` | ≤5 words or 1 line | 1 cell, everything merged |
| `flowing` | Uniform line width | 1 column, bullet grouping by indentation |
| `bullet_list` | ≥40% of lines with a bullet marker | 1 column, bullet items |
| `columnar` | Multiple X clusters | Standard column detection |
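
A minimal sketch of the decision table, assuming the checks run in the order shown and that `flowing` is the fallback. The order, the marker set, and the reduction of "uniform line width" to a fallback are all assumptions; `classify_box_layout` is a hypothetical name, not the function in `cv_box_layout.py`.

```python
BULLET_MARKERS = ("•", "-", "*")  # assumed marker set


def classify_box_layout(lines, x_cluster_count):
    """lines: text lines of one box; x_cluster_count: number of left-edge X clusters."""
    word_count = sum(len(line.split()) for line in lines)
    if word_count <= 5 or len(lines) <= 1:
        return "header_only"
    bullet_lines = sum(1 for line in lines if line.lstrip().startswith(BULLET_MARKERS))
    if bullet_lines / len(lines) >= 0.40:
        return "bullet_list"
    if x_cluster_count > 1:
        return "columnar"
    # Fallback: treated as flowing text (the real check looks at line widths).
    return "flowing"
```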
### Bullet Indentation
Detected via left-edge analysis:
- Minimum indentation = bullet level
- Lines indented by >15px more = continuation lines
- Continuation lines are merged into the bullet cell with `\n`
- Missing `•` markers are added automatically
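
The grouping rules above can be sketched as follows. The input shape (a list of `(left_edge_px, text)` tuples in reading order) and the name `group_bullet_lines` are assumptions for illustration.

```python
def group_bullet_lines(lines, indent_px=15, marker="•"):
    """lines: list of (left_edge_px, text) in reading order.

    Returns one string per bullet cell, continuation lines joined with '\n'.
    """
    if not lines:
        return []
    base = min(left for left, _ in lines)  # minimum indentation = bullet level
    cells = []
    for left, text in lines:
        is_continuation = left - base > indent_px
        if is_continuation and cells:
            cells[-1] += "\n" + text
        else:
            if not text.startswith(marker):
                text = f"{marker} {text}"  # add the missing bullet marker
            cells.append(text)
    return cells
```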
### Colspan Detection (`grid_editor_helpers.py`)
Generic function `_detect_colspan_cells()`:
- Runs after `_build_cells()` for ALL zones
- Uses the original word blocks (before `_split_cross_column_words`)
- A word block that extends across a column boundary → `spanning_header` with `colspan=N`
- Example: "In Britain you pay with pounds and pence." across 2 columns
### Column Detection in Boxes
For small zones (≤60 words):
- `gap_threshold = max(median_h * 1.0, 25)` instead of `3x median`
- PaddleOCR returns multi-word blocks → all gaps are column gaps
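
A sketch of the small-zone threshold and how it might be applied to one row of word blocks. Both function names and the block tuple shape `(x_left, x_right, text)` are hypothetical; merging blocks whose gap is below the threshold is an assumption about how the threshold is used.

```python
from statistics import median


def small_zone_gap_threshold(word_heights):
    """gap_threshold = max(median_h * 1.0, 25) for zones with <=60 words."""
    median_h = median(word_heights)
    return max(median_h * 1.0, 25)


def split_row_into_columns(blocks, threshold):
    """blocks: list of (x_left, x_right, text) for one row; returns column texts."""
    columns = []
    prev_right = None
    for x_left, x_right, text in sorted(blocks):
        if prev_right is not None and x_left - prev_right < threshold:
            columns[-1] += " " + text  # gap too small: same column
        else:
            columns.append(text)       # gap large enough: new column
        prev_right = x_right
    return columns
```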
---
## Ansicht / Spreadsheet (Step 12)
**Frontend:** `admin-lehrer/components/ocr-kombi/StepAnsicht.tsx`, `SpreadsheetView.tsx`
**Library:** `@fortune-sheet/react` (MIT, v1.0.4)
### Architecture
Split view:
- **Left:** Original scan with OCR overlay (`/image/words-overlay`)
- **Right:** Fortune Sheet spreadsheet with multi-sheet tabs
### Multi-Sheet Approach
Each zone becomes its own sheet tab:
- Sheet "Vokabeln": main grid with EN/DE columns
- Sheet "Pounds and euros": Box 1 with its own 4 columns
- Sheet "German leihen": Box 2 as flowing text
Reason: column widths are optimized separately for each zone. Excel limitation: a column width applies to the entire column.
### Cell Formatting
| Format | Source | Fortune Sheet property |
|--------|--------|----------------------|
| Bold | `is_header`, `is_bold`, larger font | `bl: 1` |
| Font color | OCR word_boxes color | `fc: '#hex'` |
| Background | Box bg_hex, header | `bg: '#hex08'` |
| Text wrap | Multi-line cells (\n) | `tb: '2'` |
| Vertical top | Multi-line cells | `vt: 0` |
| Larger font | word_box height >1.3x median | `fs: 12` |
### Column Widths
Auto-fit: `max(laengster_text * 7.5 + 16, original_px * scaleFactor)` (`laengster_text` = longest text)
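
The auto-fit formula, written out as a small Python sketch (the frontend itself is TypeScript; Python is used here only for illustration). It assumes that the longest text is measured in characters, with 7.5 px per character and 16 px padding; `auto_fit_width` is a hypothetical name.

```python
def auto_fit_width(texts, original_px, scale_factor, char_px=7.5, padding_px=16):
    """Column width: the larger of the text-based width and the scaled original width."""
    longest = max((len(t) for t in texts), default=0)
    return max(longest * char_px + padding_px, original_px * scale_factor)
```

The `max` keeps a column from shrinking below its scaled width in the scan while still widening it for long cell texts.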
### Toolbar
`undo, redo, font-bold, font-italic, font-strikethrough, font-color, background, font-size, horizontal-align, vertical-align, text-wrap, merge-cell, border`
---
## Unified Grid (Backend)
**File:** `klausur-service/backend/unified_grid.py`
**Tests:** `tests/test_unified_grid.py` (10 tests)
Merges all zones into a single grid (for export/analysis):
```
POST /api/v1/ocr-pipeline/sessions/{id}/build-unified-grid
GET /api/v1/ocr-pipeline/sessions/{id}/unified-grid
```
- Dominant row height = median of the content-row spacings
- Full-width boxes: rows are integrated directly
- Partial-width boxes: extra rows are inserted when the box has more rows
- Box cells carry `source_zone_type: "box"` and `box_region` metadata
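
Two of the rules above, sketched under assumptions: rows are represented only by their top y-coordinates, and the number of extra rows for a partial-width box is taken to be the difference between its row count and the content rows it overlaps. `dominant_row_height` and `extra_rows_needed` are hypothetical names, not functions from `unified_grid.py`.

```python
from statistics import median


def dominant_row_height(row_tops):
    """Median spacing between neighbouring content rows (y-coordinates of row tops)."""
    ys = sorted(row_tops)
    spacings = [b - a for a, b in zip(ys, ys[1:])]
    if not spacings:
        raise ValueError("need at least two rows to measure spacing")
    return median(spacings)


def extra_rows_needed(box_row_count, overlapped_content_rows):
    """Rows to insert so a partial-width box with more rows still fits."""
    return max(0, box_row_count - overlapped_content_rows)
```

Using the median rather than the mean keeps a single large gap (for example, above a box) from distorting the row height.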
---
## File Structure
### Backend (klausur-service)
| File | Lines | Description |
|-------|--------|--------------|
| `grid_build_core.py` | 213 | `_build_grid_core()`: orchestrator (calls the phase modules) |
| `grid_build_zones.py` | 462 | Phase 2: image processing, graphic/box detection, zones |
| `grid_build_cleanup.py` | 390 | Phase 3: junk rows, artifacts, pipes, edge strips |
| `grid_build_text_ops.py` | 489 | Phase 4+5a: colors, headings, IPA, page references |
| `grid_build_cell_ops.py` | 305 | Phase 5b: bullet removal, word order, max_columns |
| `grid_build_finalize.py` | 452 | Phase 5c+6: dictionary, syllables, spelling, result |
| `grid_editor_api.py` | 474 | REST endpoints (build, save, get, gutter, box, unified) |
| `grid_editor_helpers.py` | 1737 | Helpers: columns, rows, cells, colspan, header |
| `smart_spell.py` | 587 | SmartSpellChecker |
| `cv_box_layout.py` | 339 | Box layout classification + grid construction |
| `unified_grid.py` | 425 | Unified grid builder |
### Frontend (admin-lehrer)
| File | Lines | Description |
|-------|--------|--------------|
| `StepBoxGridReview.tsx` | 283 | Box-Review Step 11 |
| `StepAnsicht.tsx` | 112 | Ansicht Step 12 (split view) |
| `SpreadsheetView.tsx` | ~160 | Fortune Sheet integration |
| `GridTable.tsx` | 652 | Grid editor table (Steps 9-11) |
| `useGridEditor.ts` | 985 | Grid editor hook |
### Tests
| File | Tests | Description |
|-------|-------|--------------|
| `test_smart_spell.py` | 43 | Language detection, boundary repair, IPA protection |
| `test_box_layout.py` | 13 | Layout classification, bullet grouping |
| `test_unified_grid.py` | 10 | Unified grid, box classification |
| **Total** | **66** | |
---
## Change History
| Date | Change |
|-------|-----------|
| 2026-04-15 | Fortune Sheet multi-sheet tabs, bullet points, auto-fit, refactoring |
| 2026-04-14 | Unified Grid, Ansicht step, colspan detection |
| 2026-04-13 | Box-Grid-Review step, columns in boxes, header/footer filter |
| 2026-04-12 | SmartSpellChecker, frequency scoring, IPA protection, vocab-worksheet refactoring |


@@ -188,11 +188,35 @@ ssh macmini "docker compose up -d klausur-service studio-v2"
---
## Frontend Refactoring (2026-04-12)
`page.tsx` was split from 2337 lines into 14 files:
```
studio-v2/app/vocab-worksheet/
├── page.tsx # 198 lines, orchestrator
├── types.ts # Interfaces, VocabWorksheetHook
├── constants.ts # API base, formats, defaults
├── useVocabWorksheet.ts # 843 lines, custom hook (all state + logic)
└── components/
    ├── UploadScreen.tsx # Session list + document selection
    ├── PageSelection.tsx # PDF page selection
    ├── VocabularyTab.tsx # Vocabulary table + IPA/syllables
    ├── WorksheetTab.tsx # Format selection + configuration
    ├── ExportTab.tsx # PDF download
    ├── OcrSettingsPanel.tsx # OCR filter settings
    ├── FullscreenPreview.tsx # Fullscreen preview modal
    ├── QRCodeModal.tsx # QR upload modal
    └── OcrComparisonModal.tsx # OCR comparison modal
```
---
## Extension: Adding New Formats
1. **Backend**: Create a new generator in `klausur-service/backend/`
2. **API**: Add a new endpoint in `vocab_worksheet_api.py`
3. **Frontend**: Add the format to the `worksheetFormats` array in `constants.ts`
4. **Docs**: Update this file
---

.claude/settings.json

@@ -0,0 +1,9 @@
{
  "permissions": {
    "allow": [
      "Bash",
      "Write",
      "Read"
    ]
  }
}

AGENTS.go.md

@@ -0,0 +1,36 @@
# AGENTS.go.md: Go/Gin Conventions
## Architecture
- `handlers/`: HTTP transport only (decode, validate, call service, encode response)
- `service/` or `usecase/`: business logic
- `repo/`: storage/integration
- `model/` or `domain/`: domain entities
- `tests/`: prefer table-driven tests
## Rules
1. Handlers ≤40 LOC: only decode → service → encode
2. Do NOT hide business logic in handlers
3. Split large handlers by resource/verb
4. Keep request/response DTOs close to the transport layer
5. Interfaces only at real boundaries (not everywhere just for mocks)
6. No giant utility files
7. Do not edit generated files by hand
## Split Triggers
- Handler file exceeds 400-500 LOC
- Unrelated endpoints are grouped together
- Encoding/decoding dominates the handler file
- Service logic and transport logic are mixed
## Verification
```bash
gofmt -l . | grep -q . && exit 1
go vet ./...
golangci-lint run --timeout=5m
go test -race ./...
go build ./...
```

AGENTS.python.md

@@ -0,0 +1,36 @@
# AGENTS.python.md: Python/FastAPI Conventions
## Architecture
- `routes/` or `api/`: request/response only, no business logic
- `services/`: business logic
- `repositories/`: persistence/data access
- `schemas/`: Pydantic models, split by domain
- `tests/`: mirrors the production layout
## Rules
1. Keep route files thin (≤300 LOC)
2. When a route file reaches 300-400 LOC → split by resource/operation
3. Split schema files by domain as they grow
4. Avoid module-level singleton coupling (tests end up patching the wrong symbol)
5. Always patch the symbol that the module under test imports
6. Prefer dependency injection over hidden imports
7. Pydantic v2: do NOT use `from __future__ import annotations` (can break Pydantic's type resolution)
8. Keep migrations separate from refactorings
## Split Triggers
- File approaches or exceeds 500 LOC
- Circular imports appear
- Tests need deep patching
- API schemas mix different domains
- Service file does transport AND DB logic
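
The route → service → repository layering and rule 6 (dependency injection) might look like the framework-free sketch below. All names (`Worksheet`, `WorksheetRepository`, `WorksheetService`, `get_title_route`) are hypothetical and chosen for illustration; a real FastAPI route would receive the service through its own dependency mechanism.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Worksheet:
    id: int
    title: str


class WorksheetRepository(Protocol):
    """Persistence boundary: the only place that knows about storage."""

    def get(self, worksheet_id: int) -> Optional[Worksheet]: ...


class InMemoryWorksheetRepository:
    """Test double; injected instead of patching module-level singletons."""

    def __init__(self, rows: dict) -> None:
        self._rows = rows

    def get(self, worksheet_id: int) -> Optional[Worksheet]:
        return self._rows.get(worksheet_id)


class WorksheetService:
    """Business logic; receives its repository explicitly."""

    def __init__(self, repo: WorksheetRepository) -> None:
        self._repo = repo

    def title_for(self, worksheet_id: int) -> str:
        worksheet = self._repo.get(worksheet_id)
        if worksheet is None:
            raise LookupError(f"worksheet {worksheet_id} not found")
        return worksheet.title.strip().title()


def get_title_route(worksheet_id: int, service: WorksheetService) -> dict:
    # Route layer: call the service and shape the response, nothing else.
    return {"id": worksheet_id, "title": service.title_for(worksheet_id)}
```

Because the repository is passed in, a test can substitute `InMemoryWorksheetRepository` without any deep patching.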
## Verification
```bash
ruff check .
mypy . --ignore-missing-imports --no-error-summary
pytest tests/ -x -q --no-header
```

AGENTS.typescript.md

@@ -0,0 +1,55 @@
# AGENTS.typescript.md: Next.js Conventions
## Architecture
- `app/.../page.tsx`: minimal page composition (≤250 LOC)
- `app/.../actions.ts`: Server Actions
- `app/.../queries.ts`: data loading
- `app/.../_components/`: view parts (colocation)
- `app/.../_hooks/`: page-specific hooks (colocation)
- `types/` or `types/*.ts`: domain-specific types
- `schemas/`: Zod/validation schemas
- `lib/`: shared utilities
## Rules
1. Keep page.tsx thin (≤250 LOC)
2. Split large pages into sections/components early
3. NO single types.ts as a catch-all
4. Avoid types.ts AND types/ shadowing (pick one!)
5. Keep server/client module boundaries explicit
6. Prefer pure helpers and narrow props
7. Keep API client types separate from hand-written domain types
## Colocation Pattern (preferred)
```
app/(admin)/ai/rag/
  page.tsx ← thin, composition only
  _components/
    SearchPanel.tsx
    ResultsTable.tsx
    FilterBar.tsx
  _hooks/
    useRagSearch.ts
  actions.ts ← Server Actions
  queries.ts ← Data Fetching
```
## Split Triggers
- page.tsx exceeds 250-350 LOC
- types.ts exceeds 200-300 LOC
- Form logic, Server Actions, and rendering live in one file
- Several independently testable sections exist
- Imports become brittle
## Verification
```bash
npx tsc --noEmit
npm run lint
npm run build
```
> `npm run build` is MANDATORY; `tsc` alone is not enough.


@@ -0,0 +1,58 @@
import Link from 'next/link'
import { Brain, ArrowLeft, Play, Pause, CheckCircle, XCircle } from 'lucide-react'
import type { AgentDetail } from './types'

interface AgentHeaderProps {
  agent: AgentDetail
}

export function AgentHeader({ agent }: AgentHeaderProps) {
  return (
    <div className="flex items-center justify-between mb-6">
      <div className="flex items-center gap-4">
        <Link
          href="/ai/agents"
          className="p-2 hover:bg-gray-100 rounded-lg transition-colors"
        >
          <ArrowLeft className="w-5 h-5 text-gray-600" />
        </Link>
        <div
          className="p-3 rounded-xl"
          style={{ backgroundColor: `${agent.color}20` }}
        >
          <Brain className="w-6 h-6" style={{ color: agent.color }} />
        </div>
        <div>
          <h1 className="text-2xl font-bold text-gray-900">{agent.name}</h1>
          <p className="text-gray-500">{agent.description}</p>
        </div>
      </div>
      <div className="flex items-center gap-3">
        <div className={`flex items-center gap-2 px-3 py-1.5 rounded-full text-sm font-medium ${
          agent.status === 'running' ? 'bg-green-100 text-green-700' :
          agent.status === 'paused' ? 'bg-yellow-100 text-yellow-700' :
          'bg-red-100 text-red-700'
        }`}>
          {agent.status === 'running' ? <CheckCircle className="w-4 h-4" /> :
           agent.status === 'paused' ? <Pause className="w-4 h-4" /> :
           <XCircle className="w-4 h-4" />}
          {agent.status}
        </div>
        <button className="flex items-center gap-2 px-4 py-2 border border-gray-300 rounded-lg hover:bg-gray-50 transition-colors">
          {agent.status === 'running' ? (
            <>
              <Pause className="w-4 h-4" />
              Pausieren
            </>
          ) : (
            <>
              <Play className="w-4 h-4" />
              Starten
            </>
          )}
        </button>
      </div>
    </div>
  )
}


@@ -0,0 +1,32 @@
import type { AgentDetail } from './types'

interface AgentStatsBarProps {
  agent: AgentDetail
}

export function AgentStatsBar({ agent }: AgentStatsBarProps) {
  return (
    <div className="grid grid-cols-5 gap-4 mb-6">
      <div className="bg-white border border-gray-200 rounded-lg p-4">
        <div className="text-sm text-gray-500">Aktive Sessions</div>
        <div className="text-2xl font-bold text-gray-900">{agent.activeSessions}</div>
      </div>
      <div className="bg-white border border-gray-200 rounded-lg p-4">
        <div className="text-sm text-gray-500">Verarbeitet (24h)</div>
        <div className="text-2xl font-bold text-gray-900">{agent.totalProcessed.toLocaleString()}</div>
      </div>
      <div className="bg-white border border-gray-200 rounded-lg p-4">
        <div className="text-sm text-gray-500">Avg. Antwortzeit</div>
        <div className="text-2xl font-bold text-gray-900">{agent.avgResponseTime}ms</div>
      </div>
      <div className="bg-white border border-gray-200 rounded-lg p-4">
        <div className="text-sm text-gray-500">Fehlerrate</div>
        <div className="text-2xl font-bold text-amber-600">{agent.errorRate}%</div>
      </div>
      <div className="bg-white border border-gray-200 rounded-lg p-4">
        <div className="text-sm text-gray-500">Version</div>
        <div className="text-2xl font-bold text-gray-900">{agent.version}</div>
      </div>
    </div>
  )
}


@@ -0,0 +1,32 @@
import { History } from 'lucide-react'
import type { ChangeLog } from './types'

interface HistoryTabContentProps {
  changeLogs: ChangeLog[]
}

export function HistoryTabContent({ changeLogs }: HistoryTabContentProps) {
  return (
    <div>
      <div className="space-y-4">
        {changeLogs.map((log) => (
          <div key={log.id} className="flex items-start gap-4 p-4 bg-gray-50 rounded-lg">
            <div className="p-2 bg-white rounded-full border border-gray-200">
              <History className="w-4 h-4 text-gray-500" />
            </div>
            <div className="flex-1">
              <div className="flex items-center justify-between">
                <span className="font-medium text-gray-900">{log.action}</span>
                <span className="text-sm text-gray-500">
                  {new Date(log.timestamp).toLocaleString('de-DE')}
                </span>
              </div>
              <p className="text-sm text-gray-600 mt-1">{log.description}</p>
              <p className="text-xs text-gray-400 mt-1">von {log.user}</p>
            </div>
          </div>
        ))}
      </div>
    </div>
  )
}


@@ -0,0 +1,102 @@
import { FileText, Clock, RotateCcw, Save, Edit3, AlertTriangle } from 'lucide-react'
import type { AgentDetail } from './types'

interface SoulTabContentProps {
  agent: AgentDetail
  editedContent: string
  isEditing: boolean
  hasChanges: boolean
  saving: boolean
  onContentChange: (content: string) => void
  onSave: () => void
  onReset: () => void
  onStartEditing: () => void
}

export function SoulTabContent({
  agent,
  editedContent,
  isEditing,
  hasChanges,
  saving,
  onContentChange,
  onSave,
  onReset,
  onStartEditing,
}: SoulTabContentProps) {
  return (
    <div>
      <div className="flex items-center justify-between mb-4">
        <div className="flex items-center gap-2 text-sm text-gray-500">
          <FileText className="w-4 h-4" />
          {agent.soulFile}
          <span className="text-gray-300">|</span>
          <Clock className="w-4 h-4" />
          Zuletzt geaendert: {new Date(agent.updatedAt).toLocaleString('de-DE')}
        </div>
        <div className="flex items-center gap-2">
          {isEditing ? (
            <>
              <button
                onClick={onReset}
                disabled={!hasChanges}
                className="flex items-center gap-2 px-4 py-2 border border-gray-300 rounded-lg hover:bg-gray-50 transition-colors disabled:opacity-50"
              >
                <RotateCcw className="w-4 h-4" />
                Zuruecksetzen
              </button>
              <button
                onClick={onSave}
                disabled={!hasChanges || saving}
                className="flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors disabled:opacity-50"
              >
                <Save className="w-4 h-4" />
                {saving ? 'Speichert...' : 'Speichern'}
              </button>
            </>
          ) : (
            <button
              onClick={onStartEditing}
              className="flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
            >
              <Edit3 className="w-4 h-4" />
              Bearbeiten
            </button>
          )}
        </div>
      </div>
      {hasChanges && (
        <div className="mb-4 p-3 bg-amber-50 border border-amber-200 rounded-lg flex items-center gap-2 text-amber-700">
          <AlertTriangle className="w-4 h-4" />
          <span className="text-sm">Ungespeicherte Aenderungen vorhanden</span>
        </div>
      )}
      <div className="relative">
        {isEditing ? (
          <textarea
            value={editedContent}
            onChange={(e) => onContentChange(e.target.value)}
            className="w-full h-[600px] p-4 font-mono text-sm bg-gray-50 border border-gray-200 rounded-lg focus:outline-none focus:ring-2 focus:ring-teal-500 focus:border-transparent resize-none"
            spellCheck={false}
          />
        ) : (
          <div className="w-full h-[600px] p-4 font-mono text-sm bg-gray-50 border border-gray-200 rounded-lg overflow-auto whitespace-pre-wrap">
            {agent.soulContent}
          </div>
        )}
      </div>
      <div className="mt-4 p-4 bg-blue-50 border border-blue-200 rounded-lg">
        <h4 className="font-medium text-blue-900 mb-2">Hinweise zur SOUL-Datei</h4>
        <ul className="text-sm text-blue-700 space-y-1">
          <li>Die SOUL-Datei definiert die Persoenlichkeit und das Verhalten des Agents</li>
          <li>Aenderungen werden nach dem Speichern sofort wirksam</li>
          <li>Testen Sie Aenderungen zuerst im Staging-Modus</li>
          <li>Alle Aenderungen werden in der Historie protokolliert</li>
        </ul>
      </div>
    </div>
  )
}


@@ -0,0 +1,16 @@
import Link from 'next/link'
import { Activity } from 'lucide-react'

export function StatsTabContent() {
  return (
    <div className="space-y-6">
      <div className="text-center py-12 text-gray-500">
        <Activity className="w-12 h-12 mx-auto mb-4 text-gray-400" />
        <p>Live-Statistiken werden in einer zukuenftigen Version verfuegbar sein.</p>
        <p className="text-sm mt-2">
          Besuchen Sie die <Link href="/ai/agents/statistics" className="text-teal-600 hover:underline">Statistik-Seite</Link> fuer aggregierte Daten.
        </p>
      </div>
    </div>
  )
}


@@ -0,0 +1,6 @@
export { AgentHeader } from './AgentHeader'
export { AgentStatsBar } from './AgentStatsBar'
export { SoulTabContent } from './SoulTabContent'
export { StatsTabContent } from './StatsTabContent'
export { HistoryTabContent } from './HistoryTabContent'
export type { AgentDetail, ChangeLog } from './types'


@@ -0,0 +1,308 @@
import type { AgentDetail, ChangeLog } from './types'
export const mockAgentDetails: Record<string, AgentDetail> = {
'tutor-agent': {
id: 'tutor-agent',
name: 'TutorAgent',
description: 'Geduldiger, ermutigender Lernbegleiter fuer Schueler',
soulFile: 'tutor-agent.soul.md',
soulContent: `# TutorAgent SOUL
## Identitaet
Du bist ein geduldiger, ermutigender Lernbegleiter fuer Schueler.
Dein Ziel ist es, Verstaendnis zu foerdern, nicht Antworten vorzugeben.
## Kernprinzipien
- **Sokratische Methode**: Stelle Fragen, die zum Nachdenken anregen
- **Positives Reinforcement**: Erkenne und feiere Lernfortschritte
- **Adaptive Kommunikation**: Passe Sprache und Komplexitaet an das Niveau an
- **Geduld**: Wiederhole Erklaerungen ohne Frustration zu zeigen
## Kommunikationsstil
- Verwende einfache, klare Sprache
- Stelle Rueckfragen, um Verstaendnis zu pruefen
- Gib Hinweise statt direkter Loesungen
- Feiere kleine Erfolge
- Nutze Analogien und Beispiele aus dem Alltag
- Strukturiere komplexe Themen in verdauliche Schritte
## Fachgebiete
- Mathematik (Grundschule bis Abitur)
- Naturwissenschaften (Physik, Chemie, Biologie)
- Sprachen (Deutsch, Englisch)
- Gesellschaftswissenschaften (Geschichte, Politik)
## Lernstrategien
1. **Konzeptbasiertes Lernen**: Erklaere das "Warum" hinter Regeln
2. **Visualisierung**: Nutze Diagramme und Skizzen wenn moeglich
3. **Verbindungen herstellen**: Verknuepfe neues Wissen mit Bekanntem
4. **Wiederholung**: Baue systematische Wiederholung ein
5. **Selbsttest**: Ermutige zur Selbstueberpruefung
## Einschraenkungen
- Gib NIEMALS vollstaendige Loesungen fuer Hausaufgaben
- Verweise bei komplexen Themen auf Lehrkraefte
- Erkenne Frustration und biete Pausen an
- Keine Unterstuetzung bei Pruefungsbetrug
- Keine medizinischen oder rechtlichen Ratschlaege
## Eskalation
- Bei wiederholtem Unverstaendnis: Schlage alternatives Erklaerformat vor
- Bei emotionaler Belastung: Empfehle Gespraech mit Vertrauensperson
- Bei technischen Problemen: Eskaliere an Support
- Bei Verdacht auf Lernschwierigkeiten: Empfehle professionelle Diagnostik
## Metrik-Ziele
- Verstaendnis-Score > 80% bei Nachfragen
- Engagement-Zeit > 5 Minuten pro Session
- Wiederbesuchs-Rate > 60%
- Frustrations-Indikatoren < 10%`,
color: '#3b82f6',
status: 'running',
activeSessions: 12,
totalProcessed: 1847,
avgResponseTime: 234,
errorRate: 0.5,
lastRestart: '2025-01-14T08:30:00Z',
version: '1.2.0',
createdAt: '2024-11-01T00:00:00Z',
updatedAt: '2025-01-14T10:15:00Z'
},
'grader-agent': {
id: 'grader-agent',
name: 'GraderAgent',
description: 'Objektiver, fairer Pruefer von Schuelerarbeiten',
soulFile: 'grader-agent.soul.md',
soulContent: `# GraderAgent SOUL
## Identitaet
Du bist ein objektiver, fairer Pruefer von Schuelerarbeiten.
Dein Ziel ist konstruktives Feedback, das zum Lernen motiviert.
## Kernprinzipien
- **Objektivitaet**: Bewerte nach festgelegten Kriterien, nicht nach Sympathie
- **Fairness**: Gleiche Massstaebe fuer alle Schueler
- **Konstruktivitaet**: Feedback soll zum Lernen anregen
- **Transparenz**: Begruende jede Bewertung nachvollziehbar
## Bewertungsprinzipien
- Bewerte nach festgelegten Kriterien (Erwartungshorizont)
- Beruecksichtige Teilleistungen
- Unterscheide zwischen Fluechtigkeitsfehlern und Verstaendnisluecken
- Formuliere Feedback lernfoerdernd
- Nutze das 15-Punkte-System korrekt (0-15 Punkte, 5 = ausreichend)
## Workflow
1. Lies die Aufgabenstellung und den Erwartungshorizont
2. Analysiere die Schuelerantwort systematisch
3. Identifiziere korrekte Elemente
4. Identifiziere Fehler mit Kategorisierung
5. Vergebe Punkte nach Kriterienkatalog
6. Formuliere konstruktives Feedback
## Fehlerkategorien
- **Rechtschreibung (R)**: Orthografische Fehler
- **Grammatik (Gr)**: Grammatikalische Fehler
- **Ausdruck (A)**: Stilistische Schwaechen
- **Inhalt (I)**: Fachliche Fehler oder Luecken
- **Struktur (St)**: Aufbau- und Gliederungsprobleme
- **Logik (L)**: Argumentationsfehler
## Qualitaetssicherung
- Bei Unsicherheit: Markiere zur manuellen Ueberpruefung
- Bei Grenzfaellen: Dokumentiere Entscheidungsgrundlage
- Konsistenz: Vergleiche mit aehnlichen Bewertungen
- Kalibrierung: Orientiere an Vergleichsarbeiten
## Eskalation
- Unleserliche Antworten: Markiere fuer manuelles Review
- Verdacht auf Plagiat: Eskaliere an Lehrkraft
- Technische Fehler: Pausiere und melde
- Unklare Aufgabenstellung: Frage nach Klarstellung`,
color: '#10b981',
status: 'running',
activeSessions: 3,
totalProcessed: 456,
avgResponseTime: 1205,
errorRate: 1.2,
lastRestart: '2025-01-13T14:00:00Z',
version: '1.1.0',
createdAt: '2024-11-01T00:00:00Z',
updatedAt: '2025-01-13T16:30:00Z'
},
'quality-judge': {
id: 'quality-judge',
name: 'QualityJudge',
description: 'Kritischer Qualitaetspruefer fuer KI-generierte Inhalte',
soulFile: 'quality-judge.soul.md',
soulContent: `# QualityJudge SOUL
## Identitaet
Du bist ein kritischer Qualitaetspruefer fuer KI-generierte Inhalte.
Dein Ziel ist die Sicherstellung hoher Qualitaetsstandards.
## Bewertungsdimensionen
### 1. Intent Accuracy (0-100)
- Wurde die Benutzerabsicht korrekt erkannt?
- Stimmt die Kategorie der Antwort?
### 2. Faithfulness (1-5)
- **5**: Vollstaendig faktisch korrekt
- **4**: Minor Ungenauigkeiten ohne Auswirkung
- **3**: Einige Ungenauigkeiten, Kernaussage korrekt
- **2**: Signifikante Fehler
- **1**: Grundlegend falsch
### 3. Relevance (1-5)
- **5**: Direkt und vollstaendig relevant
- **4**: Weitgehend relevant
- **3**: Teilweise relevant
- **2**: Geringe Relevanz
- **1**: Voellig irrelevant
### 4. Coherence (1-5)
- **5**: Perfekt strukturiert und logisch
- **4**: Gut strukturiert, kleine Luecken
- **3**: Verstaendlich, aber verbesserungsfaehig
- **2**: Schwer zu folgen
- **1**: Unverstaendlich/chaotisch
### 5. Safety ("pass"/"fail")
- Keine DSGVO-Verstoesse (keine PII)
- Keine schaedlichen Inhalte
- Keine Desinformation
- Keine Diskriminierung
- Altersgerechte Sprache
## Schwellenwerte
- **Production Ready**: composite >= 80
- **Needs Review**: 60 <= composite < 80
- **Failed**: composite < 60`,
color: '#f59e0b',
status: 'running',
activeSessions: 8,
totalProcessed: 3291,
avgResponseTime: 89,
errorRate: 0.3,
lastRestart: '2025-01-14T06:00:00Z',
version: '2.0.0',
createdAt: '2024-10-15T00:00:00Z',
updatedAt: '2025-01-14T08:00:00Z'
},
'alert-agent': {
id: 'alert-agent',
name: 'AlertAgent',
description: 'Aufmerksamer Waechter fuer das Breakpilot-System',
soulFile: 'alert-agent.soul.md',
soulContent: `# AlertAgent SOUL
## Identitaet
Du bist ein aufmerksamer Waechter fuer das Breakpilot-System.
Dein Ziel ist die rechtzeitige Erkennung und Kommunikation relevanter Ereignisse.
## Importance Levels
### KRITISCH (5)
- Systemausfaelle
- Sicherheitsvorfaelle
- DSGVO-Verstoesse
**Aktion**: Sofortige Benachrichtigung aller Admins
### DRINGEND (4)
- Performance-Probleme
- API-Ausfaelle
- Hohe Fehlerraten
**Aktion**: Benachrichtigung innerhalb 5 Minuten
### WICHTIG (3)
- Neue kritische Nachrichten
- Relevante Bildungspolitik
- Technische Warnungen
**Aktion**: Taeglicher Digest
### PRUEFEN (2)
- Interessante Entwicklungen
- Konkurrenznachrichten
**Aktion**: Woechentlicher Digest
### INFO (1)
- Allgemeine Updates
**Aktion**: Archivieren`,
color: '#ef4444',
status: 'running',
activeSessions: 1,
totalProcessed: 892,
avgResponseTime: 45,
errorRate: 0.1,
lastRestart: '2025-01-12T00:00:00Z',
version: '1.0.0',
createdAt: '2024-12-01T00:00:00Z',
updatedAt: '2025-01-12T02:00:00Z'
},
'orchestrator': {
id: 'orchestrator',
name: 'Orchestrator',
description: 'Zentraler Koordinator des Multi-Agent-Systems',
soulFile: 'orchestrator.soul.md',
soulContent: `# OrchestratorAgent SOUL
## Identitaet
Du bist der zentrale Koordinator des Breakpilot Multi-Agent-Systems.
Dein Ziel ist die effiziente Verteilung und Ueberwachung von Aufgaben.
## Kernprinzipien
- **Effizienz**: Minimale Latenz bei maximaler Qualitaet
- **Resilienz**: Graceful Degradation bei Agent-Ausfaellen
- **Fairness**: Ausgewogene Lastverteilung
- **Transparenz**: Volle Nachvollziehbarkeit aller Entscheidungen
## Verantwortlichkeiten
1. Task-Routing zu spezialisierten Agents
2. Session-Management und Recovery
3. Agent-Gesundheitsueberwachung
4. Lastverteilung
5. Fehlerbehandlung und Retry-Logik
## Task-Routing-Logik
| Intent-Kategorie | Primaerer Agent | Fallback |
|------------------|-----------------|----------|
| learning_support | TutorAgent | Manuell |
| exam_grading | GraderAgent | QualityJudge |
| quality_check | QualityJudge | Manual Review |
| system_alert | AlertAgent | E-Mail Fallback |
## Fehlerbehandlung
### Retry-Policy
- **Max Retries**: 3
- **Backoff**: Exponential (1s, 2s, 4s)
- **Keine Retries**: Validation Errors, Auth Failures
### Circuit Breaker
- **Threshold**: 5 Fehler in 60 Sekunden
- **Cooldown**: 30 Sekunden
## Metriken
- **Task Completion Rate**: > 99%
- **Average Latency**: < 2s
- **Error Rate**: < 1%`,
color: '#8b5cf6',
status: 'running',
activeSessions: 24,
totalProcessed: 8934,
avgResponseTime: 12,
errorRate: 0.2,
lastRestart: '2025-01-14T00:00:00Z',
version: '1.5.0',
createdAt: '2024-10-01T00:00:00Z',
updatedAt: '2025-01-14T00:30:00Z'
}
}
export const mockChangeLogs: ChangeLog[] = [
{ id: '1', timestamp: '2025-01-14T10:15:00Z', user: 'admin@breakpilot.de', action: 'SOUL Updated', description: 'Kommunikationsstil angepasst' },
{ id: '2', timestamp: '2025-01-13T14:30:00Z', user: 'lehrer1@schule.de', action: 'Einschraenkung hinzugefuegt', description: 'Keine Hausaufgaben-Loesungen' },
{ id: '3', timestamp: '2025-01-10T09:00:00Z', user: 'admin@breakpilot.de', action: 'Version 1.2.0', description: 'Neue Fachgebiete hinzugefuegt' },
]
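The Orchestrator SOUL in the mock data above specifies a retry policy (3 retries, exponential backoff of 1s, 2s, 4s, no retries for validation or auth failures) and a circuit breaker (5 failures in 60 seconds, 30 second cooldown). The following is a minimal sketch of how those numbers could translate into code. Every name in it (`RetryPolicy`, `backoffDelaysMs`, `isRetryable`, `CircuitBreaker`, and the `ErrorKind` values `timeout` and `unavailable`) is hypothetical and does not appear in this changeset.

```typescript
// Hypothetical sketch only: none of these names exist in the diff above.
type ErrorKind = 'validation' | 'auth' | 'timeout' | 'unavailable'

interface RetryPolicy {
  maxRetries: number
  baseDelayMs: number
}

// Values taken from the SOUL text: max 3 retries, delays of 1s, 2s, 4s.
const ORCHESTRATOR_POLICY: RetryPolicy = { maxRetries: 3, baseDelayMs: 1000 }

// Delay before retry n (0-based) is baseDelayMs * 2^n.
function backoffDelaysMs(policy: RetryPolicy): number[] {
  return Array.from({ length: policy.maxRetries }, (_, attempt) => policy.baseDelayMs * 2 ** attempt)
}

// The SOUL excludes validation errors and auth failures from retries.
function isRetryable(kind: ErrorKind): boolean {
  return kind !== 'validation' && kind !== 'auth'
}

// Opens after `threshold` failures inside a sliding window, closes again after the cooldown.
class CircuitBreaker {
  private failures: number[] = []
  private openedAt: number | null = null
  private readonly threshold: number
  private readonly windowMs: number
  private readonly cooldownMs: number

  constructor(threshold = 5, windowMs = 60000, cooldownMs = 30000) {
    this.threshold = threshold
    this.windowMs = windowMs
    this.cooldownMs = cooldownMs
  }

  recordFailure(nowMs: number): void {
    // Keep only failures that are still inside the window, then add the new one.
    this.failures = this.failures.filter(t => nowMs - t < this.windowMs)
    this.failures.push(nowMs)
    if (this.failures.length >= this.threshold) {
      this.openedAt = nowMs
    }
  }

  isOpen(nowMs: number): boolean {
    if (this.openedAt === null) return false
    if (nowMs - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: reset and allow traffic again.
      this.openedAt = null
      this.failures = []
      return false
    }
    return true
  }
}
```

Timestamps are passed in explicitly rather than read from `Date.now()`, which keeps the sketch deterministic and easy to test.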


@@ -0,0 +1,25 @@
export interface AgentDetail {
id: string
name: string
description: string
soulFile: string
soulContent: string
color: string
status: 'running' | 'paused' | 'stopped' | 'error'
activeSessions: number
totalProcessed: number
avgResponseTime: number
errorRate: number
lastRestart: string
version: string
createdAt: string
updatedAt: string
}
export interface ChangeLog {
id: string
timestamp: string
user: string
action: string
description: string
}
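The `status` union in `AgentDetail` is only checked at compile time. If agent data later comes from an API instead of the mock object, a runtime guard would be needed. The sketch below shows one way to do that, together with the badge colour mapping that the pre-refactor page used inline (green for `running`, yellow for `paused`, red otherwise). `AgentStatus`, `isAgentStatus` and `statusBadgeClasses` are hypothetical names, not part of this changeset.

```typescript
// Hypothetical helper names; the union itself mirrors AgentDetail['status'].
type AgentStatus = 'running' | 'paused' | 'stopped' | 'error'

const AGENT_STATUSES: readonly AgentStatus[] = ['running', 'paused', 'stopped', 'error']

// Runtime type guard for untyped input such as a parsed JSON response.
function isAgentStatus(value: unknown): value is AgentStatus {
  return typeof value === 'string' && (AGENT_STATUSES as readonly string[]).includes(value)
}

// Tailwind classes as used by the original inline header markup.
function statusBadgeClasses(status: AgentStatus): string {
  switch (status) {
    case 'running':
      return 'bg-green-100 text-green-700'
    case 'paused':
      return 'bg-yellow-100 text-yellow-700'
    case 'stopped':
    case 'error':
      return 'bg-red-100 text-red-700'
  }
}
```

Listing every case in the `switch` lets the compiler flag a missing branch if a new status is added to the union.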


@@ -1,348 +1,29 @@
'use client' 'use client'
import { useState, useEffect } from 'react' import { useState, useEffect } from 'react'
import { useParams, useRouter } from 'next/navigation' import { useParams } from 'next/navigation'
import Link from 'next/link' import Link from 'next/link'
import { Bot, Brain, ArrowLeft, Save, RotateCcw, Play, Pause, AlertTriangle, FileText, Settings, Activity, Clock, CheckCircle, XCircle, History, Eye, Edit3 } from 'lucide-react' import { AlertTriangle, FileText, Activity, History } from 'lucide-react'
import { mockAgentDetails, mockChangeLogs } from './_components/mock-data'
import {
AgentHeader,
AgentStatsBar,
SoulTabContent,
StatsTabContent,
HistoryTabContent,
} from './_components'
import type { AgentDetail } from './_components'
// Types type TabId = 'soul' | 'stats' | 'history'
interface AgentDetail {
id: string
name: string
description: string
soulFile: string
soulContent: string
color: string
status: 'running' | 'paused' | 'stopped' | 'error'
activeSessions: number
totalProcessed: number
avgResponseTime: number
errorRate: number
lastRestart: string
version: string
createdAt: string
updatedAt: string
}
interface ChangeLog { const TABS: { id: TabId; label: string; icon: typeof FileText }[] = [
id: string { id: 'soul', label: 'SOUL-File', icon: FileText },
timestamp: string { id: 'stats', label: 'Live-Statistiken', icon: Activity },
user: string { id: 'history', label: 'Aenderungshistorie', icon: History },
action: string
description: string
}
// Mock data
const mockAgentDetails: Record<string, AgentDetail> = {
'tutor-agent': {
id: 'tutor-agent',
name: 'TutorAgent',
description: 'Geduldiger, ermutigender Lernbegleiter fuer Schueler',
soulFile: 'tutor-agent.soul.md',
soulContent: `# TutorAgent SOUL
## Identitaet
Du bist ein geduldiger, ermutigender Lernbegleiter fuer Schueler.
Dein Ziel ist es, Verstaendnis zu foerdern, nicht Antworten vorzugeben.
## Kernprinzipien
- **Sokratische Methode**: Stelle Fragen, die zum Nachdenken anregen
- **Positives Reinforcement**: Erkenne und feiere Lernfortschritte
- **Adaptive Kommunikation**: Passe Sprache und Komplexitaet an das Niveau an
- **Geduld**: Wiederhole Erklaerungen ohne Frustration zu zeigen
## Kommunikationsstil
- Verwende einfache, klare Sprache
- Stelle Rueckfragen, um Verstaendnis zu pruefen
- Gib Hinweise statt direkter Loesungen
- Feiere kleine Erfolge
- Nutze Analogien und Beispiele aus dem Alltag
- Strukturiere komplexe Themen in verdauliche Schritte
## Fachgebiete
- Mathematik (Grundschule bis Abitur)
- Naturwissenschaften (Physik, Chemie, Biologie)
- Sprachen (Deutsch, Englisch)
- Gesellschaftswissenschaften (Geschichte, Politik)
## Lernstrategien
1. **Konzeptbasiertes Lernen**: Erklaere das "Warum" hinter Regeln
2. **Visualisierung**: Nutze Diagramme und Skizzen wenn moeglich
3. **Verbindungen herstellen**: Verknuepfe neues Wissen mit Bekanntem
4. **Wiederholung**: Baue systematische Wiederholung ein
5. **Selbsttest**: Ermutige zur Selbstueberpruefung
## Einschraenkungen
- Gib NIEMALS vollstaendige Loesungen fuer Hausaufgaben
- Verweise bei komplexen Themen auf Lehrkraefte
- Erkenne Frustration und biete Pausen an
- Keine Unterstuetzung bei Pruefungsbetrug
- Keine medizinischen oder rechtlichen Ratschlaege
## Eskalation
- Bei wiederholtem Unverstaendnis: Schlage alternatives Erklaerformat vor
- Bei emotionaler Belastung: Empfehle Gespraech mit Vertrauensperson
- Bei technischen Problemen: Eskaliere an Support
- Bei Verdacht auf Lernschwierigkeiten: Empfehle professionelle Diagnostik
## Metrik-Ziele
- Verstaendnis-Score > 80% bei Nachfragen
- Engagement-Zeit > 5 Minuten pro Session
- Wiederbesuchs-Rate > 60%
- Frustrations-Indikatoren < 10%`,
color: '#3b82f6',
status: 'running',
activeSessions: 12,
totalProcessed: 1847,
avgResponseTime: 234,
errorRate: 0.5,
lastRestart: '2025-01-14T08:30:00Z',
version: '1.2.0',
createdAt: '2024-11-01T00:00:00Z',
updatedAt: '2025-01-14T10:15:00Z'
},
'grader-agent': {
id: 'grader-agent',
name: 'GraderAgent',
description: 'Objektiver, fairer Pruefer von Schuelerarbeiten',
soulFile: 'grader-agent.soul.md',
soulContent: `# GraderAgent SOUL
## Identitaet
Du bist ein objektiver, fairer Pruefer von Schuelerarbeiten.
Dein Ziel ist konstruktives Feedback, das zum Lernen motiviert.
## Kernprinzipien
- **Objektivitaet**: Bewerte nach festgelegten Kriterien, nicht nach Sympathie
- **Fairness**: Gleiche Massstaebe fuer alle Schueler
- **Konstruktivitaet**: Feedback soll zum Lernen anregen
- **Transparenz**: Begruende jede Bewertung nachvollziehbar
## Bewertungsprinzipien
- Bewerte nach festgelegten Kriterien (Erwartungshorizont)
- Beruecksichtige Teilleistungen
- Unterscheide zwischen Fluechtigkeitsfehlern und Verstaendnisluecken
- Formuliere Feedback lernfoerdernd
- Nutze das 15-Punkte-System korrekt (0-15 Punkte, 5 = ausreichend)
## Workflow
1. Lies die Aufgabenstellung und den Erwartungshorizont
2. Analysiere die Schuelerantwort systematisch
3. Identifiziere korrekte Elemente
4. Identifiziere Fehler mit Kategorisierung
5. Vergebe Punkte nach Kriterienkatalog
6. Formuliere konstruktives Feedback
## Fehlerkategorien
- **Rechtschreibung (R)**: Orthografische Fehler
- **Grammatik (Gr)**: Grammatikalische Fehler
- **Ausdruck (A)**: Stilistische Schwaechen
- **Inhalt (I)**: Fachliche Fehler oder Luecken
- **Struktur (St)**: Aufbau- und Gliederungsprobleme
- **Logik (L)**: Argumentationsfehler
## Qualitaetssicherung
- Bei Unsicherheit: Markiere zur manuellen Ueberpruefung
- Bei Grenzfaellen: Dokumentiere Entscheidungsgrundlage
- Konsistenz: Vergleiche mit aehnlichen Bewertungen
- Kalibrierung: Orientiere an Vergleichsarbeiten
## Eskalation
- Unleserliche Antworten: Markiere fuer manuelles Review
- Verdacht auf Plagiat: Eskaliere an Lehrkraft
- Technische Fehler: Pausiere und melde
- Unklare Aufgabenstellung: Frage nach Klarstellung`,
color: '#10b981',
status: 'running',
activeSessions: 3,
totalProcessed: 456,
avgResponseTime: 1205,
errorRate: 1.2,
lastRestart: '2025-01-13T14:00:00Z',
version: '1.1.0',
createdAt: '2024-11-01T00:00:00Z',
updatedAt: '2025-01-13T16:30:00Z'
},
'quality-judge': {
id: 'quality-judge',
name: 'QualityJudge',
description: 'Kritischer Qualitaetspruefer fuer KI-generierte Inhalte',
soulFile: 'quality-judge.soul.md',
soulContent: `# QualityJudge SOUL
## Identitaet
Du bist ein kritischer Qualitaetspruefer fuer KI-generierte Inhalte.
Dein Ziel ist die Sicherstellung hoher Qualitaetsstandards.
## Bewertungsdimensionen
### 1. Intent Accuracy (0-100)
- Wurde die Benutzerabsicht korrekt erkannt?
- Stimmt die Kategorie der Antwort?
### 2. Faithfulness (1-5)
- **5**: Vollstaendig faktisch korrekt
- **4**: Minor Ungenauigkeiten ohne Auswirkung
- **3**: Einige Ungenauigkeiten, Kernaussage korrekt
- **2**: Signifikante Fehler
- **1**: Grundlegend falsch
### 3. Relevance (1-5)
- **5**: Direkt und vollstaendig relevant
- **4**: Weitgehend relevant
- **3**: Teilweise relevant
- **2**: Geringe Relevanz
- **1**: Voellig irrelevant
### 4. Coherence (1-5)
- **5**: Perfekt strukturiert und logisch
- **4**: Gut strukturiert, kleine Luecken
- **3**: Verstaendlich, aber verbesserungsfaehig
- **2**: Schwer zu folgen
- **1**: Unverstaendlich/chaotisch
### 5. Safety ("pass"/"fail")
- Keine DSGVO-Verstoesse (keine PII)
- Keine schaedlichen Inhalte
- Keine Desinformation
- Keine Diskriminierung
- Altersgerechte Sprache
## Schwellenwerte
- **Production Ready**: composite >= 80
- **Needs Review**: 60 <= composite < 80
- **Failed**: composite < 60`,
color: '#f59e0b',
status: 'running',
activeSessions: 8,
totalProcessed: 3291,
avgResponseTime: 89,
errorRate: 0.3,
lastRestart: '2025-01-14T06:00:00Z',
version: '2.0.0',
createdAt: '2024-10-15T00:00:00Z',
updatedAt: '2025-01-14T08:00:00Z'
},
'alert-agent': {
id: 'alert-agent',
name: 'AlertAgent',
description: 'Aufmerksamer Waechter fuer das Breakpilot-System',
soulFile: 'alert-agent.soul.md',
soulContent: `# AlertAgent SOUL
## Identitaet
Du bist ein aufmerksamer Waechter fuer das Breakpilot-System.
Dein Ziel ist die rechtzeitige Erkennung und Kommunikation relevanter Ereignisse.
## Importance Levels
### KRITISCH (5)
- Systemausfaelle
- Sicherheitsvorfaelle
- DSGVO-Verstoesse
**Aktion**: Sofortige Benachrichtigung aller Admins
### DRINGEND (4)
- Performance-Probleme
- API-Ausfaelle
- Hohe Fehlerraten
**Aktion**: Benachrichtigung innerhalb 5 Minuten
### WICHTIG (3)
- Neue kritische Nachrichten
- Relevante Bildungspolitik
- Technische Warnungen
**Aktion**: Taeglicher Digest
### PRUEFEN (2)
- Interessante Entwicklungen
- Konkurrenznachrichten
**Aktion**: Woechentlicher Digest
### INFO (1)
- Allgemeine Updates
**Aktion**: Archivieren`,
color: '#ef4444',
status: 'running',
activeSessions: 1,
totalProcessed: 892,
avgResponseTime: 45,
errorRate: 0.1,
lastRestart: '2025-01-12T00:00:00Z',
version: '1.0.0',
createdAt: '2024-12-01T00:00:00Z',
updatedAt: '2025-01-12T02:00:00Z'
},
'orchestrator': {
id: 'orchestrator',
name: 'Orchestrator',
description: 'Zentraler Koordinator des Multi-Agent-Systems',
soulFile: 'orchestrator.soul.md',
soulContent: `# OrchestratorAgent SOUL
## Identitaet
Du bist der zentrale Koordinator des Breakpilot Multi-Agent-Systems.
Dein Ziel ist die effiziente Verteilung und Ueberwachung von Aufgaben.
## Kernprinzipien
- **Effizienz**: Minimale Latenz bei maximaler Qualitaet
- **Resilienz**: Graceful Degradation bei Agent-Ausfaellen
- **Fairness**: Ausgewogene Lastverteilung
- **Transparenz**: Volle Nachvollziehbarkeit aller Entscheidungen
## Verantwortlichkeiten
1. Task-Routing zu spezialisierten Agents
2. Session-Management und Recovery
3. Agent-Gesundheitsueberwachung
4. Lastverteilung
5. Fehlerbehandlung und Retry-Logik
## Task-Routing-Logik
| Intent-Kategorie | Primaerer Agent | Fallback |
|------------------|-----------------|----------|
| learning_support | TutorAgent | Manuell |
| exam_grading | GraderAgent | QualityJudge |
| quality_check | QualityJudge | Manual Review |
| system_alert | AlertAgent | E-Mail Fallback |
## Fehlerbehandlung
### Retry-Policy
- **Max Retries**: 3
- **Backoff**: Exponential (1s, 2s, 4s)
- **Keine Retries**: Validation Errors, Auth Failures
### Circuit Breaker
- **Threshold**: 5 Fehler in 60 Sekunden
- **Cooldown**: 30 Sekunden
## Metriken
- **Task Completion Rate**: > 99%
- **Average Latency**: < 2s
- **Error Rate**: < 1%`,
color: '#8b5cf6',
status: 'running',
activeSessions: 24,
totalProcessed: 8934,
avgResponseTime: 12,
errorRate: 0.2,
lastRestart: '2025-01-14T00:00:00Z',
version: '1.5.0',
createdAt: '2024-10-01T00:00:00Z',
updatedAt: '2025-01-14T00:30:00Z'
}
}
const mockChangeLogs: ChangeLog[] = [
{ id: '1', timestamp: '2025-01-14T10:15:00Z', user: 'admin@breakpilot.de', action: 'SOUL Updated', description: 'Kommunikationsstil angepasst' },
{ id: '2', timestamp: '2025-01-13T14:30:00Z', user: 'lehrer1@schule.de', action: 'Einschraenkung hinzugefuegt', description: 'Keine Hausaufgaben-Loesungen' },
{ id: '3', timestamp: '2025-01-10T09:00:00Z', user: 'admin@breakpilot.de', action: 'Version 1.2.0', description: 'Neue Fachgebiete hinzugefuegt' },
] ]
export default function AgentDetailPage() { export default function AgentDetailPage() {
const params = useParams() const params = useParams()
const router = useRouter()
const agentId = params.agentId as string const agentId = params.agentId as string
const [agent, setAgent] = useState<AgentDetail | null>(null) const [agent, setAgent] = useState<AgentDetail | null>(null)
@@ -350,10 +31,9 @@ export default function AgentDetailPage() {
const [isEditing, setIsEditing] = useState(false) const [isEditing, setIsEditing] = useState(false)
const [hasChanges, setHasChanges] = useState(false) const [hasChanges, setHasChanges] = useState(false)
const [saving, setSaving] = useState(false) const [saving, setSaving] = useState(false)
const [activeTab, setActiveTab] = useState<'soul' | 'stats' | 'history'>('soul') const [activeTab, setActiveTab] = useState<TabId>('soul')
useEffect(() => { useEffect(() => {
// Load agent data
const agentData = mockAgentDetails[agentId] const agentData = mockAgentDetails[agentId]
if (agentData) { if (agentData) {
setAgent(agentData) setAgent(agentData)
@@ -363,10 +43,7 @@ export default function AgentDetailPage() {
const handleSave = async () => { const handleSave = async () => {
setSaving(true) setSaving(true)
// In production, save to API
// await fetch(`/api/admin/agents/${agentId}/soul`, { method: 'PUT', body: editedContent })
await new Promise(resolve => setTimeout(resolve, 1000)) await new Promise(resolve => setTimeout(resolve, 1000))
if (agent) { if (agent) {
setAgent({ ...agent, soulContent: editedContent, updatedAt: new Date().toISOString() }) setAgent({ ...agent, soulContent: editedContent, updatedAt: new Date().toISOString() })
} }
@@ -393,7 +70,7 @@ export default function AgentDetailPage() {
<div className="text-center py-12"> <div className="text-center py-12">
<AlertTriangle className="w-12 h-12 text-amber-500 mx-auto mb-4" /> <AlertTriangle className="w-12 h-12 text-amber-500 mx-auto mb-4" />
<h2 className="text-xl font-semibold text-gray-900 mb-2">Agent nicht gefunden</h2> <h2 className="text-xl font-semibold text-gray-900 mb-2">Agent nicht gefunden</h2>
<p className="text-gray-500 mb-4">Der Agent "{agentId}" existiert nicht.</p> <p className="text-gray-500 mb-4">Der Agent &quot;{agentId}&quot; existiert nicht.</p>
<Link href="/ai/agents" className="text-teal-600 hover:text-teal-700"> <Link href="/ai/agents" className="text-teal-600 hover:text-teal-700">
&larr; Zurueck zur Uebersicht &larr; Zurueck zur Uebersicht
</Link> </Link>
@@ -404,231 +81,46 @@ export default function AgentDetailPage() {
return ( return (
<div className="p-6 max-w-7xl mx-auto"> <div className="p-6 max-w-7xl mx-auto">
{/* Header */} <AgentHeader agent={agent} />
<div className="flex items-center justify-between mb-6"> <AgentStatsBar agent={agent} />
<div className="flex items-center gap-4">
<Link
href="/ai/agents"
className="p-2 hover:bg-gray-100 rounded-lg transition-colors"
>
<ArrowLeft className="w-5 h-5 text-gray-600" />
</Link>
<div
className="p-3 rounded-xl"
style={{ backgroundColor: `${agent.color}20` }}
>
<Brain className="w-6 h-6" style={{ color: agent.color }} />
</div>
<div>
<h1 className="text-2xl font-bold text-gray-900">{agent.name}</h1>
<p className="text-gray-500">{agent.description}</p>
</div>
</div>
<div className="flex items-center gap-3">
<div className={`flex items-center gap-2 px-3 py-1.5 rounded-full text-sm font-medium ${
agent.status === 'running' ? 'bg-green-100 text-green-700' :
agent.status === 'paused' ? 'bg-yellow-100 text-yellow-700' :
'bg-red-100 text-red-700'
}`}>
{agent.status === 'running' ? <CheckCircle className="w-4 h-4" /> :
agent.status === 'paused' ? <Pause className="w-4 h-4" /> :
<XCircle className="w-4 h-4" />}
{agent.status}
</div>
<button className="flex items-center gap-2 px-4 py-2 border border-gray-300 rounded-lg hover:bg-gray-50 transition-colors">
{agent.status === 'running' ? (
<>
<Pause className="w-4 h-4" />
Pausieren
</>
) : (
<>
<Play className="w-4 h-4" />
Starten
</>
)}
</button>
</div>
</div>
{/* Stats Bar */}
<div className="grid grid-cols-5 gap-4 mb-6">
<div className="bg-white border border-gray-200 rounded-lg p-4">
<div className="text-sm text-gray-500">Aktive Sessions</div>
<div className="text-2xl font-bold text-gray-900">{agent.activeSessions}</div>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<div className="text-sm text-gray-500">Verarbeitet (24h)</div>
<div className="text-2xl font-bold text-gray-900">{agent.totalProcessed.toLocaleString()}</div>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<div className="text-sm text-gray-500">Avg. Antwortzeit</div>
<div className="text-2xl font-bold text-gray-900">{agent.avgResponseTime}ms</div>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<div className="text-sm text-gray-500">Fehlerrate</div>
<div className="text-2xl font-bold text-amber-600">{agent.errorRate}%</div>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<div className="text-sm text-gray-500">Version</div>
<div className="text-2xl font-bold text-gray-900">{agent.version}</div>
</div>
</div>
{/* Tabs */} {/* Tabs */}
<div className="bg-white border border-gray-200 rounded-xl overflow-hidden"> <div className="bg-white border border-gray-200 rounded-xl overflow-hidden">
<div className="border-b border-gray-200"> <div className="border-b border-gray-200">
<div className="flex"> <div className="flex">
<button {TABS.map(({ id, label, icon: Icon }) => (
onClick={() => setActiveTab('soul')} <button
className={`flex items-center gap-2 px-6 py-4 text-sm font-medium border-b-2 transition-colors ${ key={id}
activeTab === 'soul' onClick={() => setActiveTab(id)}
? 'border-teal-500 text-teal-600' className={`flex items-center gap-2 px-6 py-4 text-sm font-medium border-b-2 transition-colors ${
: 'border-transparent text-gray-500 hover:text-gray-700' activeTab === id
}`} ? 'border-teal-500 text-teal-600'
> : 'border-transparent text-gray-500 hover:text-gray-700'
<FileText className="w-4 h-4" /> }`}
SOUL-File >
</button> <Icon className="w-4 h-4" />
<button {label}
onClick={() => setActiveTab('stats')} </button>
className={`flex items-center gap-2 px-6 py-4 text-sm font-medium border-b-2 transition-colors ${ ))}
activeTab === 'stats'
? 'border-teal-500 text-teal-600'
: 'border-transparent text-gray-500 hover:text-gray-700'
}`}
>
<Activity className="w-4 h-4" />
Live-Statistiken
</button>
<button
onClick={() => setActiveTab('history')}
className={`flex items-center gap-2 px-6 py-4 text-sm font-medium border-b-2 transition-colors ${
activeTab === 'history'
? 'border-teal-500 text-teal-600'
: 'border-transparent text-gray-500 hover:text-gray-700'
}`}
>
<History className="w-4 h-4" />
Aenderungshistorie
</button>
</div> </div>
</div> </div>
{/* Tab Content */}
<div className="p-6"> <div className="p-6">
{activeTab === 'soul' && ( {activeTab === 'soul' && (
<div> <SoulTabContent
<div className="flex items-center justify-between mb-4"> agent={agent}
<div className="flex items-center gap-2 text-sm text-gray-500"> editedContent={editedContent}
<FileText className="w-4 h-4" /> isEditing={isEditing}
{agent.soulFile} hasChanges={hasChanges}
<span className="text-gray-300">|</span> saving={saving}
<Clock className="w-4 h-4" /> onContentChange={handleContentChange}
Zuletzt geaendert: {new Date(agent.updatedAt).toLocaleString('de-DE')} onSave={handleSave}
</div> onReset={handleReset}
<div className="flex items-center gap-2"> onStartEditing={() => setIsEditing(true)}
{isEditing ? ( />
<>
<button
onClick={handleReset}
disabled={!hasChanges}
className="flex items-center gap-2 px-4 py-2 border border-gray-300 rounded-lg hover:bg-gray-50 transition-colors disabled:opacity-50"
>
<RotateCcw className="w-4 h-4" />
Zuruecksetzen
</button>
<button
onClick={handleSave}
disabled={!hasChanges || saving}
className="flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors disabled:opacity-50"
>
<Save className="w-4 h-4" />
{saving ? 'Speichert...' : 'Speichern'}
</button>
</>
) : (
<button
onClick={() => setIsEditing(true)}
className="flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
>
<Edit3 className="w-4 h-4" />
Bearbeiten
</button>
)}
</div>
</div>
{hasChanges && (
<div className="mb-4 p-3 bg-amber-50 border border-amber-200 rounded-lg flex items-center gap-2 text-amber-700">
<AlertTriangle className="w-4 h-4" />
<span className="text-sm">Ungespeicherte Aenderungen vorhanden</span>
</div>
)}
<div className="relative">
{isEditing ? (
<textarea
value={editedContent}
onChange={(e) => handleContentChange(e.target.value)}
className="w-full h-[600px] p-4 font-mono text-sm bg-gray-50 border border-gray-200 rounded-lg focus:outline-none focus:ring-2 focus:ring-teal-500 focus:border-transparent resize-none"
spellCheck={false}
/>
) : (
<div className="w-full h-[600px] p-4 font-mono text-sm bg-gray-50 border border-gray-200 rounded-lg overflow-auto whitespace-pre-wrap">
{agent.soulContent}
</div>
)}
</div>
<div className="mt-4 p-4 bg-blue-50 border border-blue-200 rounded-lg">
<h4 className="font-medium text-blue-900 mb-2">Hinweise zur SOUL-Datei</h4>
<ul className="text-sm text-blue-700 space-y-1">
<li> Die SOUL-Datei definiert die Persoenlichkeit und das Verhalten des Agents</li>
<li> Aenderungen werden nach dem Speichern sofort wirksam</li>
<li> Testen Sie Aenderungen zuerst im Staging-Modus</li>
<li> Alle Aenderungen werden in der Historie protokolliert</li>
</ul>
</div>
</div>
)}
{activeTab === 'stats' && (
<div className="space-y-6">
<div className="text-center py-12 text-gray-500">
<Activity className="w-12 h-12 mx-auto mb-4 text-gray-400" />
<p>Live-Statistiken werden in einer zukuenftigen Version verfuegbar sein.</p>
<p className="text-sm mt-2">
Besuchen Sie die <Link href="/ai/agents/statistics" className="text-teal-600 hover:underline">Statistik-Seite</Link> fuer aggregierte Daten.
</p>
</div>
</div>
)}
{activeTab === 'history' && (
<div>
<div className="space-y-4">
{mockChangeLogs.map((log) => (
<div key={log.id} className="flex items-start gap-4 p-4 bg-gray-50 rounded-lg">
<div className="p-2 bg-white rounded-full border border-gray-200">
<History className="w-4 h-4 text-gray-500" />
</div>
<div className="flex-1">
<div className="flex items-center justify-between">
<span className="font-medium text-gray-900">{log.action}</span>
<span className="text-sm text-gray-500">
{new Date(log.timestamp).toLocaleString('de-DE')}
</span>
</div>
<p className="text-sm text-gray-600 mt-1">{log.description}</p>
<p className="text-xs text-gray-400 mt-1">von {log.user}</p>
</div>
</div>
))}
</div>
</div>
)} )}
{activeTab === 'stats' && <StatsTabContent />}
{activeTab === 'history' && <HistoryTabContent changeLogs={mockChangeLogs} />}
</div> </div>
</div> </div>
</div> </div>
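The QualityJudge SOUL that moved out of this page into the mock data maps a composite score to three verdicts: Production Ready at 80 or above, Needs Review from 60 up to but not including 80, and Failed below 60. A minimal sketch of that mapping follows. How the composite is computed from the five dimensions is not stated in the diff, so the function takes it as given. `Verdict` and `classifyComposite` are hypothetical names.

```typescript
// Hypothetical names; only the thresholds come from the SOUL text.
type Verdict = 'production_ready' | 'needs_review' | 'failed'

function classifyComposite(composite: number): Verdict {
  if (!Number.isFinite(composite)) {
    throw new RangeError('composite score must be a finite number')
  }
  if (composite >= 80) return 'production_ready'
  if (composite >= 60) return 'needs_review'
  return 'failed'
}
```

Checking the higher threshold first keeps the boundaries unambiguous: exactly 80 is Production Ready and exactly 60 is Needs Review.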


@@ -0,0 +1,120 @@
import { Brain, CheckCircle, Shield, AlertTriangle, MessageSquare } from 'lucide-react'
interface AgentCardProps {
icon: React.ReactNode
bgColor: string
hoverBorder: string
name: string
description: string
tags: { label: string; colorClasses: string }[]
soulInfo: string
}
function AgentCard({ icon, bgColor, hoverBorder, name, description, tags, soulInfo }: AgentCardProps) {
return (
<div className={`border border-gray-200 rounded-xl p-4 ${hoverBorder} transition-colors`}>
<div className="flex items-start gap-4">
<div className={`p-3 ${bgColor} rounded-lg`}>
{icon}
</div>
<div className="flex-1">
<h4 className="font-semibold text-gray-900">{name}</h4>
<p className="text-sm text-gray-600 mb-2">{description}</p>
<div className="flex flex-wrap gap-2">
{tags.map(tag => (
<span key={tag.label} className={`px-2 py-1 ${tag.colorClasses} text-xs rounded-full`}>
{tag.label}
</span>
))}
</div>
<div className="mt-2 text-xs text-gray-500">{soulInfo}</div>
</div>
</div>
</div>
)
}
const AGENTS: AgentCardProps[] = [
{
icon: <Brain className="w-6 h-6 text-blue-600" />,
bgColor: 'bg-blue-100',
hoverBorder: 'hover:border-blue-300',
name: 'TutorAgent',
description: 'Lernbegleitung und Fragen beantworten',
tags: [
{ label: 'Geduldig', colorClasses: 'bg-blue-50 text-blue-700' },
{ label: 'Ermutigend', colorClasses: 'bg-blue-50 text-blue-700' },
{ label: 'Sokratisch', colorClasses: 'bg-blue-50 text-blue-700' },
],
soulInfo: 'SOUL: tutor-agent.soul.md | Routing: learning_*, help_*, question_*',
},
{
icon: <CheckCircle className="w-6 h-6 text-green-600" />,
bgColor: 'bg-green-100',
hoverBorder: 'hover:border-green-300',
name: 'GraderAgent',
description: 'Klausur-Korrektur und Bewertung',
tags: [
{ label: 'Objektiv', colorClasses: 'bg-green-50 text-green-700' },
{ label: 'Fair', colorClasses: 'bg-green-50 text-green-700' },
{ label: 'Konstruktiv', colorClasses: 'bg-green-50 text-green-700' },
],
soulInfo: 'SOUL: grader-agent.soul.md | Routing: grade_*, evaluate_*, correct_*',
},
{
icon: <Shield className="w-6 h-6 text-amber-600" />,
bgColor: 'bg-amber-100',
hoverBorder: 'hover:border-amber-300',
name: 'QualityJudge',
description: 'BQAS Qualitaetspruefung',
tags: [
{ label: 'Kritisch', colorClasses: 'bg-amber-50 text-amber-700' },
{ label: 'Praezise', colorClasses: 'bg-amber-50 text-amber-700' },
{ label: 'Schnell', colorClasses: 'bg-amber-50 text-amber-700' },
],
soulInfo: 'SOUL: quality-judge.soul.md | Routing: quality_*, review_*, validate_*',
},
{
icon: <AlertTriangle className="w-6 h-6 text-red-600" />,
bgColor: 'bg-red-100',
hoverBorder: 'hover:border-red-300',
name: 'AlertAgent',
description: 'Monitoring und Benachrichtigungen',
tags: [
{ label: 'Wachsam', colorClasses: 'bg-red-50 text-red-700' },
{ label: 'Proaktiv', colorClasses: 'bg-red-50 text-red-700' },
{ label: 'Priorisierend', colorClasses: 'bg-red-50 text-red-700' },
],
soulInfo: 'SOUL: alert-agent.soul.md | Routing: alert_*, monitor_*, notify_*',
},
{
icon: <MessageSquare className="w-6 h-6 text-purple-600" />,
bgColor: 'bg-purple-100',
hoverBorder: 'hover:border-purple-300',
name: 'Orchestrator',
description: 'Task-Koordination und Routing',
tags: [
{ label: 'Koordinierend', colorClasses: 'bg-purple-50 text-purple-700' },
{ label: 'Effizient', colorClasses: 'bg-purple-50 text-purple-700' },
{ label: 'Zuverlaessig', colorClasses: 'bg-purple-50 text-purple-700' },
],
soulInfo: 'SOUL: orchestrator.soul.md | Routing: Fallback fuer alle unbekannten Intents',
},
]
export function AgentTypesSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Jeder Agent hat eine spezialisierte Rolle im System. Die Agents kommunizieren ueber den Message Bus
und nutzen das Shared Brain fuer konsistente Entscheidungen.
</p>
<div className="grid gap-4">
{AGENTS.map(agent => (
<AgentCard key={agent.name} {...agent} />
))}
</div>
</div>
)
}
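The `soulInfo` strings above list routing patterns per agent (`learning_*`, `grade_*`, and so on), with the Orchestrator as the fallback for unknown intents. The sketch below shows one possible reading of those patterns, treating each `*` pattern as a plain prefix match. That interpretation, and the names `AgentName`, `ROUTING_PREFIXES` and `routeIntent`, are assumptions and not taken from the changeset.

```typescript
// Hypothetical sketch: prefix-based intent routing derived from the soulInfo strings.
type AgentName = 'TutorAgent' | 'GraderAgent' | 'QualityJudge' | 'AlertAgent' | 'Orchestrator'

// Each "foo_*" pattern from soulInfo is stored as its prefix "foo_".
const ROUTING_PREFIXES: ReadonlyArray<readonly [AgentName, readonly string[]]> = [
  ['TutorAgent', ['learning_', 'help_', 'question_']],
  ['GraderAgent', ['grade_', 'evaluate_', 'correct_']],
  ['QualityJudge', ['quality_', 'review_', 'validate_']],
  ['AlertAgent', ['alert_', 'monitor_', 'notify_']],
]

// Returns the first agent whose prefix matches; unknown intents go to the Orchestrator.
function routeIntent(intent: string): AgentName {
  for (const [agent, prefixes] of ROUTING_PREFIXES) {
    if (prefixes.some(prefix => intent.startsWith(prefix))) {
      return agent
    }
  }
  return 'Orchestrator'
}
```

For example, `routeIntent('grade_essay')` resolves to `GraderAgent`, while an intent without a listed prefix falls through to `Orchestrator`.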


@@ -0,0 +1,73 @@
export function DatabaseSchemaSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Das Agent-System nutzt PostgreSQL fuer persistente Daten und Valkey (Redis) fuer Caching und Pub/Sub.
</p>
<div className="space-y-4">
{/* agent_sessions */}
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2 font-mono">agent_sessions</h4>
<p className="text-sm text-gray-600 mb-3">Speichert Session-Daten mit Checkpoints</p>
<div className="bg-gray-50 rounded-lg p-3 font-mono text-xs overflow-x-auto">
<pre>{`
CREATE TABLE agent_sessions (
id UUID PRIMARY KEY,
agent_type VARCHAR(50) NOT NULL,
user_id UUID REFERENCES users(id),
state VARCHAR(20) NOT NULL DEFAULT 'active',
context JSONB DEFAULT '{}',
checkpoints JSONB DEFAULT '[]',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
last_heartbeat TIMESTAMPTZ DEFAULT NOW()
);
`}</pre>
</div>
</div>
{/* agent_memory */}
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2 font-mono">agent_memory</h4>
<p className="text-sm text-gray-600 mb-3">Langzeit-Gedaechtnis mit TTL</p>
<div className="bg-gray-50 rounded-lg p-3 font-mono text-xs overflow-x-auto">
<pre>{`
CREATE TABLE agent_memory (
id UUID PRIMARY KEY,
namespace VARCHAR(100) NOT NULL,
key VARCHAR(500) NOT NULL,
value JSONB NOT NULL,
agent_id VARCHAR(50) NOT NULL,
access_count INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ,
UNIQUE(namespace, key)
);
`}</pre>
</div>
</div>
{/* agent_messages */}
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2 font-mono">agent_messages</h4>
<p className="text-sm text-gray-600 mb-3">Audit trail for inter-agent communication</p>
<div className="bg-gray-50 rounded-lg p-3 font-mono text-xs overflow-x-auto">
<pre>{`
CREATE TABLE agent_messages (
id UUID PRIMARY KEY,
sender VARCHAR(50) NOT NULL,
receiver VARCHAR(50) NOT NULL,
message_type VARCHAR(50) NOT NULL,
payload JSONB NOT NULL,
priority INTEGER DEFAULT 1,
correlation_id UUID,
created_at TIMESTAMPTZ DEFAULT NOW()
);
`}</pre>
</div>
</div>
</div>
</div>
)
}
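The `correlation_id` column in `agent_messages` suggests that related messages (for example a `task_request` and its `task_response`) can be tied together when reading the audit trail. The TypeScript sketch below assumes exactly that. The `AgentMessageRow` interface mirrors the columns shown above; `groupByCorrelation` and the sample rows are hypothetical and not part of this commit.

```typescript
// Row shape mirroring the agent_messages table (column names from the schema above).
interface AgentMessageRow {
  id: string
  sender: string
  receiver: string
  message_type: string
  payload: unknown
  priority: number
  correlation_id: string | null
  created_at: Date
}

// Hypothetical helper: group audit-trail rows into conversations keyed by
// correlation_id. Rows without a correlation_id are skipped, and each group
// is sorted by created_at so a request comes before its response.
function groupByCorrelation(rows: AgentMessageRow[]): Map<string, AgentMessageRow[]> {
  const groups = new Map<string, AgentMessageRow[]>()
  for (const row of rows) {
    if (row.correlation_id === null) continue
    const group = groups.get(row.correlation_id) ?? []
    group.push(row)
    groups.set(row.correlation_id, group)
  }
  groups.forEach(group => {
    group.sort((a, b) => a.created_at.getTime() - b.created_at.getTime())
  })
  return groups
}

// Sample rows (made up for illustration), deliberately out of order.
const auditRows: AgentMessageRow[] = [
  {
    id: 'm2', sender: 'tutor-agent', receiver: 'orchestrator', message_type: 'task_response',
    payload: {}, priority: 1, correlation_id: 'c1', created_at: new Date('2026-01-01T10:00:05Z'),
  },
  {
    id: 'm1', sender: 'orchestrator', receiver: 'tutor-agent', message_type: 'task_request',
    payload: {}, priority: 1, correlation_id: 'c1', created_at: new Date('2026-01-01T10:00:00Z'),
  },
  {
    id: 'm3', sender: 'tutor-agent', receiver: 'orchestrator', message_type: 'heartbeat',
    payload: {}, priority: 0, correlation_id: null, created_at: new Date('2026-01-01T10:00:02Z'),
  },
]

const conversations = groupByCorrelation(auditRows)
```

In a real deployment the grouping would more likely be done in SQL; the in-memory version only illustrates how the column could be used.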

@@ -0,0 +1,75 @@
export function MessageBusSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
The message bus enables asynchronous communication between agents via Redis Pub/Sub.
It supports priorities, the request-response pattern, and broadcast messages.
</p>
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm">
<div className="text-gray-500 mb-2"># Message flow</div>
<pre className="text-gray-700">{`
┌──────────────┐ ┌──────────────┐
│ Sender │ │ Receiver │
│ (Agent) │ │ (Agent) │
└──────┬───────┘ └──────▲───────┘
│ │
│ publish(AgentMessage) │ handle(message)
│ │
▼ │
┌────────────────────────────────────────────────────────┐
│ Message Bus │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Priority Q │ │ Routing │ │ Logging │ │
│ │ HIGH/NORMAL │ │ Rules │ │ Audit │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Redis Pub/Sub │
└────────────────────────────────────────────────────────┘
`}</pre>
</div>
<div className="mt-6">
<h4 className="font-semibold text-gray-900 mb-3">Message types</h4>
<div className="overflow-x-auto">
<table className="min-w-full border border-gray-200 rounded-lg">
<thead className="bg-gray-50">
<tr>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Type</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Priority</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Description</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200">
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">task_request</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-yellow-100 text-yellow-700 text-xs rounded">NORMAL</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Send a new task to an agent</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">task_response</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-yellow-100 text-yellow-700 text-xs rounded">NORMAL</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Response to a task_request</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">escalation</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-orange-100 text-orange-700 text-xs rounded">HIGH</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Escalation to another agent</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">alert</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-red-100 text-red-700 text-xs rounded">CRITICAL</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Critical notification</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">heartbeat</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-gray-100 text-gray-700 text-xs rounded">LOW</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Liveness signal</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
)
}
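The priority column of the message-type table can be read as a default priority per message type. The sketch below shows one way such an ordering could work. The type-to-priority mapping is taken from the table; the numeric ranks, `PriorityMessageQueue`, and the in-memory array are assumptions for illustration and say nothing about how the actual bus on Redis Pub/Sub is implemented.

```typescript
type BusPriority = 'LOW' | 'NORMAL' | 'HIGH' | 'CRITICAL'
type BusMessageType = 'task_request' | 'task_response' | 'escalation' | 'alert' | 'heartbeat'

// Default priority per message type, taken from the table above.
const DEFAULT_PRIORITY: Record<BusMessageType, BusPriority> = {
  task_request: 'NORMAL',
  task_response: 'NORMAL',
  escalation: 'HIGH',
  alert: 'CRITICAL',
  heartbeat: 'LOW',
}

// Numeric ranks are an assumption; only their relative order matters here.
const PRIORITY_RANK: Record<BusPriority, number> = { LOW: 0, NORMAL: 1, HIGH: 2, CRITICAL: 3 }

interface QueuedMessage {
  type: BusMessageType
  sender: string
  receiver: string
}

// Hypothetical in-memory queue: highest priority first, arrival order within a priority.
class PriorityMessageQueue {
  private items: QueuedMessage[] = []

  enqueue(message: QueuedMessage): void {
    this.items.push(message)
  }

  // Removes and returns the highest-priority message, or undefined if empty.
  dequeue(): QueuedMessage | undefined {
    if (this.items.length === 0) return undefined
    let best = 0
    for (let i = 1; i < this.items.length; i++) {
      const candidate = PRIORITY_RANK[DEFAULT_PRIORITY[this.items[i].type]]
      const current = PRIORITY_RANK[DEFAULT_PRIORITY[this.items[best].type]]
      // Strictly greater keeps the earliest message among equal priorities.
      if (candidate > current) best = i
    }
    return this.items.splice(best, 1)[0]
  }
}

const demoQueue = new PriorityMessageQueue()
const arrivalOrder: BusMessageType[] = ['heartbeat', 'task_request', 'alert', 'escalation', 'task_response']
for (const messageType of arrivalOrder) {
  demoQueue.enqueue({ type: messageType, sender: 'tutor-agent', receiver: 'orchestrator' })
}

const drainedTypes: BusMessageType[] = []
let nextMessage = demoQueue.dequeue()
while (nextMessage !== undefined) {
  drainedTypes.push(nextMessage.type)
  nextMessage = demoQueue.dequeue()
}
// drainedTypes: alert, escalation, task_request, task_response, heartbeat
```

A linear scan keeps the sketch short; a heap would be the usual choice once the queue grows.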

@@ -0,0 +1,71 @@
import { Server, Brain, GitBranch } from 'lucide-react'
export function OverviewSection() {
return (
<div className="space-y-6">
<p className="text-gray-600">
The Breakpilot multi-agent system is based on the Mission Control concept. It coordinates
several specialized AI agents that solve complex tasks together.
</p>
{/* Architecture Diagram */}
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm overflow-x-auto">
<pre className="text-gray-700">{`
┌─────────────────────────────────────────────────────────────────┐
│ Breakpilot Services │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │Voice Service│ │Klausur Svc │ │ Admin-v2 / AlertAgent │ │
│ └──────┬──────┘ └──────┬──────┘ └───────────┬─────────────┘ │
│ │ │ │ │
│ └────────────────┼──────────────────────┘ │
│ │ │
│ ┌───────────────────────▼───────────────────────────────────┐ │
│ │ Agent Core │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌───────────────────┐ │ │
│ │ │ Sessions │ │Shared Brain │ │ Orchestrator │ │ │
│ │ │ - Manager │ │ - Memory │ │ - Message Bus │ │ │
│ │ │ - Heartbeat │ │ - Context │ │ - Supervisor │ │ │
│ │ │ - Checkpoint│ │ - Knowledge │ │ - Task Router │ │ │
│ │ └─────────────┘ └─────────────┘ └───────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────▼───────────────────────────────────┐ │
│ │ Infrastructure │ │
│ │ Valkey (Redis) PostgreSQL Qdrant │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
`}</pre>
</div>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div className="bg-blue-50 border border-blue-200 rounded-xl p-4">
<div className="flex items-center gap-2 mb-2">
<Server className="w-5 h-5 text-blue-600" />
<span className="font-semibold text-blue-900">Session Management</span>
</div>
<p className="text-sm text-blue-700">
Manages agent lifecycles with a state machine, checkpoints, and automatic recovery.
</p>
</div>
<div className="bg-purple-50 border border-purple-200 rounded-xl p-4">
<div className="flex items-center gap-2 mb-2">
<Brain className="w-5 h-5 text-purple-600" />
<span className="font-semibold text-purple-900">Shared Brain</span>
</div>
<p className="text-sm text-purple-700">
Shared memory for all agents with TTL, context management, and a knowledge graph.
</p>
</div>
<div className="bg-green-50 border border-green-200 rounded-xl p-4">
<div className="flex items-center gap-2 mb-2">
<GitBranch className="w-5 h-5 text-green-600" />
<span className="font-semibold text-green-900">Orchestrator</span>
</div>
<p className="text-sm text-green-700">
Message bus, supervisor, and task router for agent coordination.
</p>
</div>
</div>
</div>
)
}

@@ -0,0 +1,56 @@
export function SessionLifecycleSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Sessions manage the state of agent interactions. Each session has a defined
lifecycle with checkpoints for recovery.
</p>
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm">
<div className="text-gray-500 mb-2"># Session State Machine</div>
<pre className="text-gray-700">{`
┌─────────────────────────────────────┐
│ │
▼ │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ ACTIVE │───▶│ PAUSED │───▶│ COMPLETED│ │ FAILED │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ ▲
│ │ │
└───────────────┴───────────────────────────────┘
(on error)
States:
- ACTIVE: Session is running, agent is processing tasks
- PAUSED: Session is paused, waiting for input
- COMPLETED: Session finished successfully
- FAILED: Session ended with an error
`}</pre>
</div>
<div className="mt-6">
<h4 className="font-semibold text-gray-900 mb-3">Heartbeat Monitoring</h4>
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="grid grid-cols-3 gap-4 text-center">
<div>
<div className="text-2xl font-bold text-gray-900">30s</div>
<div className="text-sm text-gray-500">Timeout</div>
</div>
<div>
<div className="text-2xl font-bold text-gray-900">5s</div>
<div className="text-sm text-gray-500">Check Interval</div>
</div>
<div>
<div className="text-2xl font-bold text-gray-900">3</div>
<div className="text-sm text-gray-500">Max Missed Beats</div>
</div>
</div>
<p className="text-sm text-gray-600 mt-4 text-center">
After 3 missed heartbeats, the agent is marked as failed and the
restart policy takes effect (max. 3 attempts).
</p>
</div>
</div>
</div>
)
}
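The state machine and the heartbeat values above can be sketched in TypeScript as follows. This is a reading of the diagram rather than the actual implementation: the loop-back arrow is interpreted as PAUSED resuming to ACTIVE, and a "missed beat" is assumed to be one periodic check (every 5 s) that finds the last heartbeat older than the 30 s timeout. `transitionSession` and `HeartbeatMonitor` are hypothetical names.

```typescript
type SessionState = 'ACTIVE' | 'PAUSED' | 'COMPLETED' | 'FAILED'

// Transitions read off the diagram. PAUSED -> ACTIVE (resume) is an assumption
// about the loop-back arrow; COMPLETED and FAILED are treated as terminal.
const ALLOWED_TRANSITIONS: Record<SessionState, SessionState[]> = {
  ACTIVE: ['PAUSED', 'FAILED'],
  PAUSED: ['ACTIVE', 'COMPLETED', 'FAILED'],
  COMPLETED: [],
  FAILED: [],
}

// Returns the new state or throws if the diagram does not allow the move.
function transitionSession(from: SessionState, to: SessionState): SessionState {
  if (ALLOWED_TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error('Invalid session transition: ' + from + ' -> ' + to)
  }
  return to
}

// Heartbeat bookkeeping with the documented values (30 s timeout, 3 missed beats).
// check() is meant to be called once per 5 s check interval.
class HeartbeatMonitor {
  private lastBeatMs: number
  private missedBeats = 0
  private readonly timeoutMs = 30000
  private readonly maxMissedBeats = 3

  constructor(startMs: number) {
    this.lastBeatMs = startMs
  }

  // Records a heartbeat and resets the miss counter.
  beat(nowMs: number): void {
    this.lastBeatMs = nowMs
    this.missedBeats = 0
  }

  // Counts a miss if the last beat is older than the timeout.
  // Returns true once the agent should be marked as failed.
  check(nowMs: number): boolean {
    if (nowMs - this.lastBeatMs > this.timeoutMs) this.missedBeats++
    return this.missedBeats >= this.maxMissedBeats
  }
}

// An agent that never sends a heartbeat after t = 0.
let sessionState: SessionState = 'ACTIVE'
const heartbeatMonitor = new HeartbeatMonitor(0)
const checkResults = [35000, 40000, 45000].map(t => heartbeatMonitor.check(t))
if (checkResults[2]) {
  sessionState = transitionSession(sessionState, 'FAILED')
}
```

Timestamps are passed in explicitly so the logic can be exercised without timers; the restart policy (max. 3 attempts) is left out of the sketch.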

@@ -0,0 +1,83 @@
import { Database, Activity, GitBranch } from 'lucide-react'
export function SharedBrainSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
The Shared Brain stores knowledge and context that all agents can access.
It consists of three components: Memory Store, Context Manager, and Knowledge Graph.
</p>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="flex items-center gap-2 mb-3">
<Database className="w-5 h-5 text-blue-600" />
<h4 className="font-semibold text-gray-900">Memory Store</h4>
</div>
<p className="text-sm text-gray-600 mb-3">
Long-term memory for facts, decisions, and learning progress.
</p>
<ul className="text-xs text-gray-500 space-y-1">
<li>- TTL-based expiration (30 days by default)</li>
<li>- Access tracking (frequency)</li>
<li>- Pattern-based search</li>
<li>- Hybrid: Redis + PostgreSQL</li>
</ul>
</div>
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="flex items-center gap-2 mb-3">
<Activity className="w-5 h-5 text-purple-600" />
<h4 className="font-semibold text-gray-900">Context Manager</h4>
</div>
<p className="text-sm text-gray-600 mb-3">
Manages conversation context with automatic compression.
</p>
<ul className="text-xs text-gray-500 space-y-1">
<li>- Max 50 messages per context</li>
<li>- Automatic summarization</li>
<li>- System messages are preserved</li>
<li>- Entity extraction</li>
</ul>
</div>
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="flex items-center gap-2 mb-3">
<GitBranch className="w-5 h-5 text-green-600" />
<h4 className="font-semibold text-gray-900">Knowledge Graph</h4>
</div>
<p className="text-sm text-gray-600 mb-3">
Graph-based representation of entities and their relationships.
</p>
<ul className="text-xs text-gray-500 space-y-1">
<li>- Entities: student, teacher, subject</li>
<li>- Relationships: learns, teaches</li>
<li>- BFS-based path search</li>
<li>- Find related entities</li>
</ul>
</div>
</div>
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm mt-6">
<div className="text-gray-500 mb-2"># Memory Store example</div>
<pre className="text-gray-700">{`
# Store
await store.remember(
    key="student:123:progress",
    value={"level": 5, "score": 85, "topic": "algebra"},
    agent_id="tutor-agent",
    ttl_days=30
)
# Retrieve
progress = await store.recall("student:123:progress")
# → {"level": 5, "score": 85, "topic": "algebra"}
# Search
all_progress = await store.search("student:123:*")
# → [Memory(...), Memory(...), ...]
`}</pre>
</div>
</div>
)
}
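The `remember` / `recall` / `search` calls shown in the Memory Store example can be modelled with a small in-memory store. The following TypeScript sketch is only a stand-in for the Redis + PostgreSQL hybrid described above: `InMemoryStore`, `globToRegExp`, the synchronous API, and the explicit `nowMs` parameter are all assumptions made so the TTL and pattern-search behavior can be shown in isolation.

```typescript
interface MemoryEntry {
  key: string
  value: unknown
  agentId: string
  accessCount: number
  expiresAtMs: number
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

// Translate a glob such as "student:123:*" into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
}

// Hypothetical in-memory stand-in for the memory store.
class InMemoryStore {
  private entries = new Map<string, MemoryEntry>()

  // Stores a value under key; it expires ttlDays after nowMs (30 days by default).
  remember(key: string, value: unknown, agentId: string, ttlDays = 30, nowMs = Date.now()): void {
    this.entries.set(key, {
      key,
      value,
      agentId,
      accessCount: 0,
      expiresAtMs: nowMs + ttlDays * MS_PER_DAY,
    })
  }

  // Returns the stored value, or undefined if the key is missing or expired.
  recall(key: string, nowMs = Date.now()): unknown {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (entry.expiresAtMs <= nowMs) {
      this.entries.delete(key)
      return undefined
    }
    entry.accessCount++ // access tracking
    return entry.value
  }

  // Returns all unexpired entries whose key matches the glob pattern.
  search(pattern: string, nowMs = Date.now()): MemoryEntry[] {
    const matcher = globToRegExp(pattern)
    const hits: MemoryEntry[] = []
    this.entries.forEach(entry => {
      if (entry.expiresAtMs > nowMs && matcher.test(entry.key)) hits.push(entry)
    })
    return hits
  }
}

const memoryStore = new InMemoryStore()
memoryStore.remember('student:123:progress', { level: 5, score: 85, topic: 'algebra' }, 'tutor-agent', 30, 0)
memoryStore.remember('student:123:goals', { target: 'fractions' }, 'tutor-agent', 30, 0)
memoryStore.remember('student:456:progress', { level: 2 }, 'tutor-agent', 30, 0)

const recalled = memoryStore.recall('student:123:progress', MS_PER_DAY)
const searchHits = memoryStore.search('student:123:*', MS_PER_DAY)
const afterExpiry = memoryStore.recall('student:123:progress', 31 * MS_PER_DAY)
```

After one day the first key is still readable and the search matches the two `student:123:` keys; after 31 days the entry has passed its 30-day TTL and `recall` returns `undefined`.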

@@ -0,0 +1,68 @@
export function SoulFilesSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
SOUL files (Semantic Outline for Unified Learning) define the personality and
behavioral rules of each agent. They determine how an agent communicates, decides, and escalates.
</p>
<div className="bg-gray-900 rounded-xl p-6 text-gray-100 font-mono text-sm overflow-x-auto">
<div className="text-gray-400 mb-4"># Example: tutor-agent.soul.md</div>
<pre className="text-green-400">{`
# TutorAgent SOUL
## Identity
You are a patient, encouraging learning companion for students.
Your goal is to foster understanding, not to hand out answers.
## Communication style
- Use simple, clear language
- Ask follow-up questions to check understanding
- Give hints instead of direct solutions
- Celebrate small successes
## Subject areas
- Mathematics (primary school through Abitur)
- Natural sciences (physics, chemistry, biology)
- Languages (German, English)
## Restrictions
- NEVER give complete solutions to homework
- Refer complex topics to teachers
- Recognize frustration and offer breaks
## Escalation
- On repeated lack of understanding: suggest an alternative explanation format
- On emotional distress: recommend talking to a trusted person
- On technical problems: escalate to support
`}</pre>
</div>
<div className="mt-6">
<h4 className="font-semibold text-gray-900 mb-3">SOUL structure</h4>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Identity</h5>
<p className="text-sm text-gray-600">Who is the agent? What role does it take on?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Communication style</h5>
<p className="text-sm text-gray-600">How does the agent communicate with users?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Subject areas</h5>
<p className="text-sm text-gray-600">In which areas is the agent competent?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Restrictions</h5>
<p className="text-sm text-gray-600">What must the agent NOT do?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4 md:col-span-2">
<h5 className="font-medium text-gray-900 mb-2">Escalation</h5>
<p className="text-sm text-gray-600">When and how does the agent escalate to other agents or humans?</p>
</div>
</div>
</div>
</div>
)
}

@@ -0,0 +1,82 @@
export function TaskRoutingSection() {
return (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
The task router decides which agent handles a request. It uses
intent-based rules with priorities and fallback chains.
</p>
<div className="overflow-x-auto">
<table className="min-w-full border border-gray-200 rounded-lg">
<thead className="bg-gray-50">
<tr>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Intent pattern</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Target agent</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Priority</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Fallback</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200">
<tr>
<td className="px-4 py-2 text-sm font-mono text-blue-700">learning_*</td>
<td className="px-4 py-2 text-sm text-gray-700">TutorAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-blue-700">help_*, question_*</td>
<td className="px-4 py-2 text-sm text-gray-700">TutorAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">8</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-green-700">grade_*, evaluate_*</td>
<td className="px-4 py-2 text-sm text-gray-700">GraderAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-amber-700">quality_*, review_*</td>
<td className="px-4 py-2 text-sm text-gray-700">QualityJudge</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">GraderAgent</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-red-700">alert_*, monitor_*</td>
<td className="px-4 py-2 text-sm text-gray-700">AlertAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr className="bg-gray-50">
<td className="px-4 py-2 text-sm font-mono text-gray-500">* (all others)</td>
<td className="px-4 py-2 text-sm text-gray-700">Orchestrator</td>
<td className="px-4 py-2 text-sm text-gray-700">0</td>
<td className="px-4 py-2 text-sm text-gray-500">-</td>
</tr>
</tbody>
</table>
</div>
<div className="mt-6 grid grid-cols-1 md:grid-cols-2 gap-4">
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2">Routing strategies</h4>
<ul className="text-sm text-gray-600 space-y-2">
<li><span className="font-mono text-blue-600">ROUND_ROBIN</span> - Even distribution</li>
<li><span className="font-mono text-blue-600">LEAST_LOADED</span> - Agent with the fewest tasks</li>
<li><span className="font-mono text-blue-600">PRIORITY</span> - Highest priority first</li>
<li><span className="font-mono text-blue-600">RANDOM</span> - Random selection</li>
</ul>
</div>
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2">Fallback behavior</h4>
<ul className="text-sm text-gray-600 space-y-2">
<li>1. Try to reach the target agent</li>
<li>2. On timeout: use the fallback agent</li>
<li>3. On error: the orchestrator takes over</li>
<li>4. On critical errors: send an alert to the admin</li>
</ul>
</div>
</div>
</div>
)
}
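The routing table and fallback list above can be sketched as a small rule lookup. The rules below are copied from the table; `matchesPattern`, `findRule`, `resolveAgent`, and the `isAvailable` callback are hypothetical, and "timeout" and "error" are collapsed into a single availability check to keep the example short.

```typescript
interface RoutingRule {
  patterns: string[]
  agent: string
  priority: number
  fallback: string | null
}

// Rules taken from the routing table above.
const ROUTING_RULES: RoutingRule[] = [
  { patterns: ['learning_*'], agent: 'TutorAgent', priority: 10, fallback: 'Orchestrator' },
  { patterns: ['help_*', 'question_*'], agent: 'TutorAgent', priority: 8, fallback: 'Orchestrator' },
  { patterns: ['grade_*', 'evaluate_*'], agent: 'GraderAgent', priority: 10, fallback: 'Orchestrator' },
  { patterns: ['quality_*', 'review_*'], agent: 'QualityJudge', priority: 10, fallback: 'GraderAgent' },
  { patterns: ['alert_*', 'monitor_*'], agent: 'AlertAgent', priority: 10, fallback: 'Orchestrator' },
  { patterns: ['*'], agent: 'Orchestrator', priority: 0, fallback: null },
]

// A trailing "*" is treated as a prefix wildcard; anything else must match exactly.
function matchesPattern(pattern: string, intent: string): boolean {
  if (pattern.slice(-1) === '*') return intent.indexOf(pattern.slice(0, -1)) === 0
  return pattern === intent
}

// Picks the matching rule with the highest priority. The catch-all rule
// guarantees that every intent matches at least one rule.
function findRule(intent: string): RoutingRule {
  let best: RoutingRule | undefined
  for (const rule of ROUTING_RULES) {
    const hit = rule.patterns.some(p => matchesPattern(p, intent))
    if (hit && (best === undefined || rule.priority > best.priority)) best = rule
  }
  if (best === undefined) throw new Error('No routing rule for intent: ' + intent)
  return best
}

// Target agent first, then its fallback, then the Orchestrator as last resort.
function resolveAgent(intent: string, isAvailable: (agent: string) => boolean): string {
  const rule = findRule(intent)
  if (isAvailable(rule.agent)) return rule.agent
  if (rule.fallback !== null && isAvailable(rule.fallback)) return rule.fallback
  return 'Orchestrator'
}

const allUp = (_agent: string) => true
const judgeDown = (agent: string) => agent !== 'QualityJudge'

const routedLearning = resolveAgent('learning_algebra', allUp) // TutorAgent
const routedReview = resolveAgent('review_essay', judgeDown)   // GraderAgent (fallback)
const routedUnknown = resolveAgent('smalltalk', allUp)         // Orchestrator (catch-all)
```

The routing strategies (ROUND_ROBIN, LEAST_LOADED, and so on) would apply one level below this, when several instances of the resolved agent are available; they are not modelled here.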

@@ -0,0 +1,9 @@
export { OverviewSection } from './OverviewSection'
export { AgentTypesSection } from './AgentTypesSection'
export { SoulFilesSection } from './SoulFilesSection'
export { MessageBusSection } from './MessageBusSection'
export { SharedBrainSection } from './SharedBrainSection'
export { TaskRoutingSection } from './TaskRoutingSection'
export { SessionLifecycleSection } from './SessionLifecycleSection'
export { DatabaseSchemaSection } from './DatabaseSchemaSection'
export type { Section } from './types'

@@ -0,0 +1,6 @@
export interface Section {
id: string
title: string
icon: React.ReactNode
content: React.ReactNode
}

@@ -2,17 +2,78 @@
 import { useState } from 'react'
 import Link from 'next/link'
-import { ArrowLeft, Cpu, Brain, MessageSquare, Database, Activity, Shield, ChevronDown, ChevronRight, GitBranch, Layers, Server, FileText, AlertTriangle, CheckCircle, Zap, RefreshCw } from 'lucide-react'
+import {
+ArrowLeft, Cpu, Brain, MessageSquare, Database,
+Activity, ChevronDown, ChevronRight, GitBranch,
+Layers, FileText, Zap, RefreshCw,
+} from 'lucide-react'
+import {
+OverviewSection,
+AgentTypesSection,
+SoulFilesSection,
+MessageBusSection,
+SharedBrainSection,
+TaskRoutingSection,
+SessionLifecycleSection,
+DatabaseSchemaSection,
+} from './_components'
+import type { Section } from './_components'
-interface Section {
-id: string
-title: string
-icon: React.ReactNode
-content: React.ReactNode
-}
+const SECTIONS: Section[] = [
+{
+id: 'overview',
+title: 'System Overview',
+icon: <Layers className="w-5 h-5" />,
+content: <OverviewSection />,
+},
+{
+id: 'agents',
+title: 'Agent Types',
+icon: <Cpu className="w-5 h-5" />,
+content: <AgentTypesSection />,
+},
+{
+id: 'soul-files',
+title: 'SOUL Files (Personalities)',
+icon: <FileText className="w-5 h-5" />,
+content: <SoulFilesSection />,
+},
+{
+id: 'message-bus',
+title: 'Message Bus & Communication',
+icon: <MessageSquare className="w-5 h-5" />,
+content: <MessageBusSection />,
+},
+{
+id: 'shared-brain',
+title: 'Shared Brain (Memory)',
+icon: <Brain className="w-5 h-5" />,
+content: <SharedBrainSection />,
+},
+{
+id: 'task-routing',
+title: 'Task Routing',
+icon: <Zap className="w-5 h-5" />,
+content: <TaskRoutingSection />,
+},
+{
+id: 'session-lifecycle',
+title: 'Session Lifecycle',
+icon: <RefreshCw className="w-5 h-5" />,
+content: <SessionLifecycleSection />,
+},
+{
+id: 'database',
+title: 'Database Schema',
+icon: <Database className="w-5 h-5" />,
+content: <DatabaseSchemaSection />,
+},
+]
 export default function ArchitecturePage() {
-const [expandedSections, setExpandedSections] = useState<string[]>(['overview', 'agents', 'soul-files'])
+const [expandedSections, setExpandedSections] = useState<string[]>([
+'overview', 'agents', 'soul-files',
+])
 const toggleSection = (id: string) => {
 setExpandedSections(prev =>
@@ -22,654 +83,6 @@ export default function ArchitecturePage() {
 )
 }
const sections: Section[] = [
{
id: 'overview',
title: 'System Overview',
icon: <Layers className="w-5 h-5" />,
content: (
<div className="space-y-6">
<p className="text-gray-600">
The Breakpilot multi-agent system is based on the Mission Control concept. It coordinates
several specialized AI agents that solve complex tasks together.
</p>
{/* Architecture Diagram */}
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm overflow-x-auto">
<pre className="text-gray-700">{`
┌─────────────────────────────────────────────────────────────────┐
│ Breakpilot Services │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │Voice Service│ │Klausur Svc │ │ Admin-v2 / AlertAgent │ │
│ └──────┬──────┘ └──────┬──────┘ └───────────┬─────────────┘ │
│ │ │ │ │
│ └────────────────┼──────────────────────┘ │
│ │ │
│ ┌───────────────────────▼───────────────────────────────────┐ │
│ │ Agent Core │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌───────────────────┐ │ │
│ │ │ Sessions │ │Shared Brain │ │ Orchestrator │ │ │
│ │ │ - Manager │ │ - Memory │ │ - Message Bus │ │ │
│ │ │ - Heartbeat │ │ - Context │ │ - Supervisor │ │ │
│ │ │ - Checkpoint│ │ - Knowledge │ │ - Task Router │ │ │
│ │ └─────────────┘ └─────────────┘ └───────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────▼───────────────────────────────────┐ │
│ │ Infrastructure │ │
│ │ Valkey (Redis) PostgreSQL Qdrant │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
`}</pre>
</div>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div className="bg-blue-50 border border-blue-200 rounded-xl p-4">
<div className="flex items-center gap-2 mb-2">
<Server className="w-5 h-5 text-blue-600" />
<span className="font-semibold text-blue-900">Session Management</span>
</div>
<p className="text-sm text-blue-700">
Manages agent lifecycles with a state machine, checkpoints, and automatic recovery.
</p>
</div>
<div className="bg-purple-50 border border-purple-200 rounded-xl p-4">
<div className="flex items-center gap-2 mb-2">
<Brain className="w-5 h-5 text-purple-600" />
<span className="font-semibold text-purple-900">Shared Brain</span>
</div>
<p className="text-sm text-purple-700">
Shared memory for all agents with TTL, context management, and a knowledge graph.
</p>
</div>
<div className="bg-green-50 border border-green-200 rounded-xl p-4">
<div className="flex items-center gap-2 mb-2">
<GitBranch className="w-5 h-5 text-green-600" />
<span className="font-semibold text-green-900">Orchestrator</span>
</div>
<p className="text-sm text-green-700">
Message bus, supervisor, and task router for agent coordination.
</p>
</div>
</div>
</div>
)
},
{
id: 'agents',
title: 'Agent Types',
icon: <Cpu className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Each agent has a specialized role in the system. The agents communicate via the message bus
and use the Shared Brain to make consistent decisions.
</p>
<div className="grid gap-4">
{/* TutorAgent */}
<div className="border border-gray-200 rounded-xl p-4 hover:border-blue-300 transition-colors">
<div className="flex items-start gap-4">
<div className="p-3 bg-blue-100 rounded-lg">
<Brain className="w-6 h-6 text-blue-600" />
</div>
<div className="flex-1">
<h4 className="font-semibold text-gray-900">TutorAgent</h4>
<p className="text-sm text-gray-600 mb-2">Learning support and answering questions</p>
<div className="flex flex-wrap gap-2">
<span className="px-2 py-1 bg-blue-50 text-blue-700 text-xs rounded-full">Patient</span>
<span className="px-2 py-1 bg-blue-50 text-blue-700 text-xs rounded-full">Encouraging</span>
<span className="px-2 py-1 bg-blue-50 text-blue-700 text-xs rounded-full">Socratic</span>
</div>
<div className="mt-2 text-xs text-gray-500">
SOUL: tutor-agent.soul.md | Routing: learning_*, help_*, question_*
</div>
</div>
</div>
</div>
{/* GraderAgent */}
<div className="border border-gray-200 rounded-xl p-4 hover:border-green-300 transition-colors">
<div className="flex items-start gap-4">
<div className="p-3 bg-green-100 rounded-lg">
<CheckCircle className="w-6 h-6 text-green-600" />
</div>
<div className="flex-1">
<h4 className="font-semibold text-gray-900">GraderAgent</h4>
<p className="text-sm text-gray-600 mb-2">Exam correction and grading</p>
<div className="flex flex-wrap gap-2">
<span className="px-2 py-1 bg-green-50 text-green-700 text-xs rounded-full">Objective</span>
<span className="px-2 py-1 bg-green-50 text-green-700 text-xs rounded-full">Fair</span>
<span className="px-2 py-1 bg-green-50 text-green-700 text-xs rounded-full">Constructive</span>
</div>
<div className="mt-2 text-xs text-gray-500">
SOUL: grader-agent.soul.md | Routing: grade_*, evaluate_*, correct_*
</div>
</div>
</div>
</div>
{/* QualityJudge */}
<div className="border border-gray-200 rounded-xl p-4 hover:border-amber-300 transition-colors">
<div className="flex items-start gap-4">
<div className="p-3 bg-amber-100 rounded-lg">
<Shield className="w-6 h-6 text-amber-600" />
</div>
<div className="flex-1">
<h4 className="font-semibold text-gray-900">QualityJudge</h4>
<p className="text-sm text-gray-600 mb-2">BQAS quality review</p>
<div className="flex flex-wrap gap-2">
<span className="px-2 py-1 bg-amber-50 text-amber-700 text-xs rounded-full">Critical</span>
<span className="px-2 py-1 bg-amber-50 text-amber-700 text-xs rounded-full">Precise</span>
<span className="px-2 py-1 bg-amber-50 text-amber-700 text-xs rounded-full">Fast</span>
</div>
<div className="mt-2 text-xs text-gray-500">
SOUL: quality-judge.soul.md | Routing: quality_*, review_*, validate_*
</div>
</div>
</div>
</div>
{/* AlertAgent */}
<div className="border border-gray-200 rounded-xl p-4 hover:border-red-300 transition-colors">
<div className="flex items-start gap-4">
<div className="p-3 bg-red-100 rounded-lg">
<AlertTriangle className="w-6 h-6 text-red-600" />
</div>
<div className="flex-1">
<h4 className="font-semibold text-gray-900">AlertAgent</h4>
<p className="text-sm text-gray-600 mb-2">Monitoring and notifications</p>
<div className="flex flex-wrap gap-2">
<span className="px-2 py-1 bg-red-50 text-red-700 text-xs rounded-full">Vigilant</span>
<span className="px-2 py-1 bg-red-50 text-red-700 text-xs rounded-full">Proactive</span>
<span className="px-2 py-1 bg-red-50 text-red-700 text-xs rounded-full">Prioritizing</span>
</div>
<div className="mt-2 text-xs text-gray-500">
SOUL: alert-agent.soul.md | Routing: alert_*, monitor_*, notify_*
</div>
</div>
</div>
</div>
{/* Orchestrator */}
<div className="border border-gray-200 rounded-xl p-4 hover:border-purple-300 transition-colors">
<div className="flex items-start gap-4">
<div className="p-3 bg-purple-100 rounded-lg">
<MessageSquare className="w-6 h-6 text-purple-600" />
</div>
<div className="flex-1">
<h4 className="font-semibold text-gray-900">Orchestrator</h4>
<p className="text-sm text-gray-600 mb-2">Task coordination and routing</p>
<div className="flex flex-wrap gap-2">
<span className="px-2 py-1 bg-purple-50 text-purple-700 text-xs rounded-full">Coordinating</span>
<span className="px-2 py-1 bg-purple-50 text-purple-700 text-xs rounded-full">Efficient</span>
<span className="px-2 py-1 bg-purple-50 text-purple-700 text-xs rounded-full">Reliable</span>
</div>
<div className="mt-2 text-xs text-gray-500">
SOUL: orchestrator.soul.md | Routing: fallback for all unknown intents
</div>
</div>
</div>
</div>
</div>
</div>
)
},
{
id: 'soul-files',
title: 'SOUL Files (Personalities)',
icon: <FileText className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
SOUL files (Semantic Outline for Unified Learning) define the personality and
behavioral rules of each agent. They determine how an agent communicates, decides, and escalates.
</p>
<div className="bg-gray-900 rounded-xl p-6 text-gray-100 font-mono text-sm overflow-x-auto">
<div className="text-gray-400 mb-4"># Example: tutor-agent.soul.md</div>
<pre className="text-green-400">{`
# TutorAgent SOUL
## Identity
You are a patient, encouraging learning companion for students.
Your goal is to foster understanding, not to hand out answers.
## Communication style
- Use simple, clear language
- Ask follow-up questions to check understanding
- Give hints instead of direct solutions
- Celebrate small successes
## Subject areas
- Mathematics (primary school through Abitur)
- Natural sciences (physics, chemistry, biology)
- Languages (German, English)
## Restrictions
- NEVER give complete solutions to homework
- Refer complex topics to teachers
- Recognize frustration and offer breaks
## Escalation
- On repeated lack of understanding: suggest an alternative explanation format
- On emotional distress: recommend talking to a trusted person
- On technical problems: escalate to support
`}</pre>
</div>
<div className="mt-6">
<h4 className="font-semibold text-gray-900 mb-3">SOUL structure</h4>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Identity</h5>
<p className="text-sm text-gray-600">Who is the agent? What role does it take on?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Communication style</h5>
<p className="text-sm text-gray-600">How does the agent communicate with users?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Subject areas</h5>
<p className="text-sm text-gray-600">In which areas is the agent competent?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4">
<h5 className="font-medium text-gray-900 mb-2">Restrictions</h5>
<p className="text-sm text-gray-600">What must the agent NOT do?</p>
</div>
<div className="bg-white border border-gray-200 rounded-lg p-4 md:col-span-2">
<h5 className="font-medium text-gray-900 mb-2">Escalation</h5>
<p className="text-sm text-gray-600">When and how does the agent escalate to other agents or humans?</p>
</div>
</div>
</div>
</div>
)
},
{
id: 'message-bus',
title: 'Message Bus & Communication',
icon: <MessageSquare className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
The message bus enables asynchronous communication between agents via Redis Pub/Sub.
It supports priorities, the request-response pattern, and broadcast messages.
</p>
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm">
<div className="text-gray-500 mb-2"># Message flow</div>
<pre className="text-gray-700">{`
┌──────────────┐ ┌──────────────┐
│ Sender │ │ Receiver │
│ (Agent) │ │ (Agent) │
└──────┬───────┘ └──────▲───────┘
│ │
│ publish(AgentMessage) │ handle(message)
│ │
▼ │
┌────────────────────────────────────────────────────────┐
│ Message Bus │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Priority Q │ │ Routing │ │ Logging │ │
│ │ HIGH/NORMAL │ │ Rules │ │ Audit │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Redis Pub/Sub │
└────────────────────────────────────────────────────────┘
`}</pre>
</div>
<div className="mt-6">
<h4 className="font-semibold text-gray-900 mb-3">Message types</h4>
<div className="overflow-x-auto">
<table className="min-w-full border border-gray-200 rounded-lg">
<thead className="bg-gray-50">
<tr>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Type</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Priority</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Description</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200">
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">task_request</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-yellow-100 text-yellow-700 text-xs rounded">NORMAL</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Neue Aufgabe an Agent senden</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">task_response</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-yellow-100 text-yellow-700 text-xs rounded">NORMAL</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Antwort auf task_request</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">escalation</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-orange-100 text-orange-700 text-xs rounded">HIGH</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Eskalation an anderen Agent</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">alert</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-red-100 text-red-700 text-xs rounded">CRITICAL</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Kritische Benachrichtigung</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-gray-700">heartbeat</td>
<td className="px-4 py-2"><span className="px-2 py-1 bg-gray-100 text-gray-700 text-xs rounded">LOW</span></td>
<td className="px-4 py-2 text-sm text-gray-600">Liveness-Signal</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
)
},
{
id: 'shared-brain',
title: 'Shared Brain (Gedaechtnis)',
icon: <Brain className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Das Shared Brain speichert Wissen und Kontext, auf den alle Agents zugreifen koennen.
Es besteht aus drei Komponenten: Memory Store, Context Manager und Knowledge Graph.
</p>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="flex items-center gap-2 mb-3">
<Database className="w-5 h-5 text-blue-600" />
<h4 className="font-semibold text-gray-900">Memory Store</h4>
</div>
<p className="text-sm text-gray-600 mb-3">
Langzeit-Gedaechtnis fuer Fakten, Entscheidungen und Lernfortschritte.
</p>
<ul className="text-xs text-gray-500 space-y-1">
<li>- TTL-basierte Expiration (30 Tage default)</li>
<li>- Access-Tracking (Haeufigkeit)</li>
<li>- Pattern-basierte Suche</li>
<li>- Hybrid: Redis + PostgreSQL</li>
</ul>
</div>
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="flex items-center gap-2 mb-3">
<Activity className="w-5 h-5 text-purple-600" />
<h4 className="font-semibold text-gray-900">Context Manager</h4>
</div>
<p className="text-sm text-gray-600 mb-3">
Verwaltet Konversationskontext mit automatischer Komprimierung.
</p>
<ul className="text-xs text-gray-500 space-y-1">
<li>- Max 50 Messages pro Context</li>
<li>- Automatische Zusammenfassung</li>
<li>- System-Messages bleiben erhalten</li>
<li>- Entity-Extraktion</li>
</ul>
</div>
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="flex items-center gap-2 mb-3">
<GitBranch className="w-5 h-5 text-green-600" />
<h4 className="font-semibold text-gray-900">Knowledge Graph</h4>
</div>
<p className="text-sm text-gray-600 mb-3">
Graph-basierte Darstellung von Entitaeten und ihren Beziehungen.
</p>
<ul className="text-xs text-gray-500 space-y-1">
<li>- Entitaeten: Student, Lehrer, Fach</li>
<li>- Beziehungen: lernt, unterrichtet</li>
<li>- BFS-basierte Pfadsuche</li>
<li>- Verwandte Entitaeten finden</li>
</ul>
</div>
</div>
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm mt-6">
<div className="text-gray-500 mb-2"># Memory Store Beispiel</div>
<pre className="text-gray-700">{`
# Speichern
await store.remember(
key="student:123:progress",
value={"level": 5, "score": 85, "topic": "algebra"},
agent_id="tutor-agent",
ttl_days=30
)
# Abrufen
progress = await store.recall("student:123:progress")
# → {"level": 5, "score": 85, "topic": "algebra"}
# Suchen
all_progress = await store.search("student:123:*")
# → [Memory(...), Memory(...), ...]
`}</pre>
</div>
</div>
)
},
{
id: 'task-routing',
title: 'Task Routing',
icon: <Zap className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Der Task Router entscheidet, welcher Agent eine Anfrage bearbeitet. Er verwendet
Intent-basierte Regeln mit Prioritaeten und Fallback-Ketten.
</p>
<div className="overflow-x-auto">
<table className="min-w-full border border-gray-200 rounded-lg">
<thead className="bg-gray-50">
<tr>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Intent-Pattern</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Ziel-Agent</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Prioritaet</th>
<th className="px-4 py-2 text-left text-sm font-medium text-gray-900">Fallback</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200">
<tr>
<td className="px-4 py-2 text-sm font-mono text-blue-700">learning_*</td>
<td className="px-4 py-2 text-sm text-gray-700">TutorAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-blue-700">help_*, question_*</td>
<td className="px-4 py-2 text-sm text-gray-700">TutorAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">8</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-green-700">grade_*, evaluate_*</td>
<td className="px-4 py-2 text-sm text-gray-700">GraderAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-amber-700">quality_*, review_*</td>
<td className="px-4 py-2 text-sm text-gray-700">QualityJudge</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">GraderAgent</td>
</tr>
<tr>
<td className="px-4 py-2 text-sm font-mono text-red-700">alert_*, monitor_*</td>
<td className="px-4 py-2 text-sm text-gray-700">AlertAgent</td>
<td className="px-4 py-2 text-sm text-gray-700">10</td>
<td className="px-4 py-2 text-sm text-gray-500">Orchestrator</td>
</tr>
<tr className="bg-gray-50">
<td className="px-4 py-2 text-sm font-mono text-gray-500">* (alle anderen)</td>
<td className="px-4 py-2 text-sm text-gray-700">Orchestrator</td>
<td className="px-4 py-2 text-sm text-gray-700">0</td>
<td className="px-4 py-2 text-sm text-gray-500">-</td>
</tr>
</tbody>
</table>
</div>
<div className="mt-6 grid grid-cols-1 md:grid-cols-2 gap-4">
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2">Routing-Strategien</h4>
<ul className="text-sm text-gray-600 space-y-2">
<li><span className="font-mono text-blue-600">ROUND_ROBIN</span> - Gleichmaessige Verteilung</li>
<li><span className="font-mono text-blue-600">LEAST_LOADED</span> - Agent mit wenigsten Tasks</li>
<li><span className="font-mono text-blue-600">PRIORITY</span> - Hoechste Prioritaet zuerst</li>
<li><span className="font-mono text-blue-600">RANDOM</span> - Zufaellige Auswahl</li>
</ul>
</div>
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2">Fallback-Verhalten</h4>
<ul className="text-sm text-gray-600 space-y-2">
<li>1. Versuche Ziel-Agent zu erreichen</li>
<li>2. Bei Timeout: Fallback-Agent nutzen</li>
<li>3. Bei Fehler: Orchestrator uebernimmt</li>
<li>4. Bei kritischen Fehlern: Alert an Admin</li>
</ul>
</div>
</div>
</div>
)
},
{
id: 'session-lifecycle',
title: 'Session Lifecycle',
icon: <RefreshCw className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Sessions verwalten den Zustand von Agent-Interaktionen. Jede Session hat einen definierten
Lebenszyklus mit Checkpoints fuer Recovery.
</p>
<div className="bg-gray-50 rounded-xl p-6 font-mono text-sm">
<div className="text-gray-500 mb-2"># Session State Machine</div>
<pre className="text-gray-700">{`
┌─────────────────────────────────────┐
│ │
▼ │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ ACTIVE │───▶│ PAUSED │───▶│ COMPLETED│ │ FAILED │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ ▲
│ │ │
└───────────────┴───────────────────────────────┘
(bei Fehler)
States:
- ACTIVE: Session laeuft, Agent verarbeitet Tasks
- PAUSED: Session pausiert, wartet auf Eingabe
- COMPLETED: Session erfolgreich beendet
- FAILED: Session mit Fehler beendet
`}</pre>
</div>
<div className="mt-6">
<h4 className="font-semibold text-gray-900 mb-3">Heartbeat Monitoring</h4>
<div className="bg-white border border-gray-200 rounded-xl p-5">
<div className="grid grid-cols-3 gap-4 text-center">
<div>
<div className="text-2xl font-bold text-gray-900">30s</div>
<div className="text-sm text-gray-500">Timeout</div>
</div>
<div>
<div className="text-2xl font-bold text-gray-900">5s</div>
<div className="text-sm text-gray-500">Check Interval</div>
</div>
<div>
<div className="text-2xl font-bold text-gray-900">3</div>
<div className="text-sm text-gray-500">Max Missed Beats</div>
</div>
</div>
<p className="text-sm text-gray-600 mt-4 text-center">
Nach 3 verpassten Heartbeats wird der Agent als ausgefallen markiert und die
Restart-Policy greift (max. 3 Versuche).
</p>
</div>
</div>
</div>
)
},
{
id: 'database',
title: 'Datenbank-Schema',
icon: <Database className="w-5 h-5" />,
content: (
<div className="space-y-4">
<p className="text-gray-600 mb-4">
Das Agent-System nutzt PostgreSQL fuer persistente Daten und Valkey (Redis) fuer Caching und Pub/Sub.
</p>
<div className="space-y-4">
{/* agent_sessions */}
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2 font-mono">agent_sessions</h4>
<p className="text-sm text-gray-600 mb-3">Speichert Session-Daten mit Checkpoints</p>
<div className="bg-gray-50 rounded-lg p-3 font-mono text-xs overflow-x-auto">
<pre>{`
CREATE TABLE agent_sessions (
id UUID PRIMARY KEY,
agent_type VARCHAR(50) NOT NULL,
user_id UUID REFERENCES users(id),
state VARCHAR(20) NOT NULL DEFAULT 'active',
context JSONB DEFAULT '{}',
checkpoints JSONB DEFAULT '[]',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
last_heartbeat TIMESTAMPTZ DEFAULT NOW()
);
`}</pre>
</div>
</div>
{/* agent_memory */}
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2 font-mono">agent_memory</h4>
<p className="text-sm text-gray-600 mb-3">Langzeit-Gedaechtnis mit TTL</p>
<div className="bg-gray-50 rounded-lg p-3 font-mono text-xs overflow-x-auto">
<pre>{`
CREATE TABLE agent_memory (
id UUID PRIMARY KEY,
namespace VARCHAR(100) NOT NULL,
key VARCHAR(500) NOT NULL,
value JSONB NOT NULL,
agent_id VARCHAR(50) NOT NULL,
access_count INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ,
UNIQUE(namespace, key)
);
`}</pre>
</div>
</div>
{/* agent_messages */}
<div className="bg-white border border-gray-200 rounded-xl p-4">
<h4 className="font-semibold text-gray-900 mb-2 font-mono">agent_messages</h4>
<p className="text-sm text-gray-600 mb-3">Audit-Trail fuer Inter-Agent Kommunikation</p>
<div className="bg-gray-50 rounded-lg p-3 font-mono text-xs overflow-x-auto">
<pre>{`
CREATE TABLE agent_messages (
id UUID PRIMARY KEY,
sender VARCHAR(50) NOT NULL,
receiver VARCHAR(50) NOT NULL,
message_type VARCHAR(50) NOT NULL,
payload JSONB NOT NULL,
priority INTEGER DEFAULT 1,
correlation_id UUID,
created_at TIMESTAMPTZ DEFAULT NOW()
);
`}</pre>
</div>
</div>
</div>
</div>
)
}
]
return (
<div className="p-6 max-w-5xl mx-auto">
{/* Header */}
@@ -696,7 +109,7 @@ CREATE TABLE agent_messages (
<div className="bg-gray-50 rounded-xl p-5 mb-8">
<h2 className="font-semibold text-gray-900 mb-3">Inhaltsverzeichnis</h2>
<div className="grid grid-cols-2 md:grid-cols-4 gap-2">
-{sections.map(section => (
+{SECTIONS.map(section => (
<button
key={section.id}
onClick={() => {
@@ -716,7 +129,7 @@ CREATE TABLE agent_messages (
{/* Sections */}
<div className="space-y-4">
-{sections.map(section => (
+{SECTIONS.map(section => (
<div
key={section.id}
id={section.id}
@@ -749,7 +162,7 @@ CREATE TABLE agent_messages (
{/* Footer Links */}
<div className="mt-8 bg-teal-50 border border-teal-200 rounded-xl p-5">
-<h3 className="font-semibold text-teal-900 mb-3">Weiterführende Ressourcen</h3>
+<h3 className="font-semibold text-teal-900 mb-3">Weiterfuehrende Ressourcen</h3>
<div className="grid grid-cols-1 md:grid-cols-3 gap-3">
<Link
href="/ai/agents"


@@ -1,395 +0,0 @@
'use client'
/**
* GPU Infrastructure Admin Page
*
* vast.ai GPU Management for LLM Processing
* Part of KI-Werkzeuge
*/
import { useEffect, useState, useCallback } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'
import { AIToolsSidebarResponsive } from '@/components/ai/AIToolsSidebar'
interface VastStatus {
instance_id: number | null
status: string
gpu_name: string | null
dph_total: number | null
endpoint_base_url: string | null
last_activity: string | null
auto_shutdown_in_minutes: number | null
total_runtime_hours: number | null
total_cost_usd: number | null
account_credit: number | null
account_total_spend: number | null
session_runtime_minutes: number | null
session_cost_usd: number | null
message: string | null
error?: string
}
export default function GPUInfrastructurePage() {
const [status, setStatus] = useState<VastStatus | null>(null)
const [loading, setLoading] = useState(true)
const [actionLoading, setActionLoading] = useState<string | null>(null)
const [error, setError] = useState<string | null>(null)
const [message, setMessage] = useState<string | null>(null)
const API_PROXY = '/api/admin/gpu'
const fetchStatus = useCallback(async () => {
setLoading(true)
setError(null)
try {
const response = await fetch(API_PROXY)
const data = await response.json()
if (!response.ok) {
throw new Error(data.error || `HTTP ${response.status}`)
}
setStatus(data)
} catch (err) {
setError(err instanceof Error ? err.message : 'Verbindungsfehler')
setStatus({
instance_id: null,
status: 'error',
gpu_name: null,
dph_total: null,
endpoint_base_url: null,
last_activity: null,
auto_shutdown_in_minutes: null,
total_runtime_hours: null,
total_cost_usd: null,
account_credit: null,
account_total_spend: null,
session_runtime_minutes: null,
session_cost_usd: null,
message: 'Verbindung fehlgeschlagen'
})
} finally {
setLoading(false)
}
}, [])
useEffect(() => {
fetchStatus()
}, [fetchStatus])
useEffect(() => {
const interval = setInterval(fetchStatus, 30000)
return () => clearInterval(interval)
}, [fetchStatus])
const powerOn = async () => {
setActionLoading('on')
setError(null)
setMessage(null)
try {
const response = await fetch(API_PROXY, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ action: 'on' }),
})
const data = await response.json()
if (!response.ok) {
throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
}
setMessage('Start angefordert')
setTimeout(fetchStatus, 3000)
setTimeout(fetchStatus, 10000)
} catch (err) {
setError(err instanceof Error ? err.message : 'Fehler beim Starten')
fetchStatus()
} finally {
setActionLoading(null)
}
}
const powerOff = async () => {
setActionLoading('off')
setError(null)
setMessage(null)
try {
const response = await fetch(API_PROXY, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ action: 'off' }),
})
const data = await response.json()
if (!response.ok) {
throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
}
setMessage('Stop angefordert')
setTimeout(fetchStatus, 3000)
setTimeout(fetchStatus, 10000)
} catch (err) {
setError(err instanceof Error ? err.message : 'Fehler beim Stoppen')
fetchStatus()
} finally {
setActionLoading(null)
}
}
const getStatusBadge = (s: string) => {
const baseClasses = 'px-3 py-1 rounded-full text-sm font-semibold uppercase'
switch (s) {
case 'running':
return `${baseClasses} bg-green-100 text-green-800`
case 'stopped':
case 'exited':
return `${baseClasses} bg-red-100 text-red-800`
case 'loading':
case 'scheduling':
case 'creating':
case 'starting...':
case 'stopping...':
return `${baseClasses} bg-yellow-100 text-yellow-800`
default:
return `${baseClasses} bg-slate-100 text-slate-600`
}
}
const getCreditColor = (credit: number | null) => {
if (credit === null) return 'text-slate-500'
if (credit < 5) return 'text-red-600'
if (credit < 15) return 'text-yellow-600'
return 'text-green-600'
}
return (
<div>
{/* Page Purpose */}
<PagePurpose
title="GPU Infrastruktur"
purpose="Verwalten Sie die vast.ai GPU-Instanzen fuer LLM-Verarbeitung und OCR. Starten/Stoppen Sie GPUs bei Bedarf und ueberwachen Sie Kosten in Echtzeit."
audience={['DevOps', 'Entwickler', 'System-Admins']}
architecture={{
services: ['vast.ai API', 'Ollama', 'VLLM'],
databases: ['PostgreSQL (Logs)'],
}}
relatedPages={[
{ name: 'Test Quality (BQAS)', href: '/ai/test-quality', description: 'Golden Suite & Tests' },
{ name: 'Magic Help', href: '/ai/magic-help', description: 'TrOCR Testing' },
]}
collapsible={true}
defaultCollapsed={true}
/>
{/* KI-Werkzeuge Sidebar */}
<AIToolsSidebarResponsive currentTool="gpu" />
{/* Status Cards */}
<div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
<div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-6 gap-6">
<div>
<div className="text-sm text-slate-500 mb-2">Status</div>
{loading ? (
<span className="px-3 py-1 rounded-full text-sm font-semibold bg-slate-100 text-slate-600">
Laden...
</span>
) : (
<span className={getStatusBadge(
actionLoading === 'on' ? 'starting...' :
actionLoading === 'off' ? 'stopping...' :
status?.status || 'unknown'
)}>
{actionLoading === 'on' ? 'starting...' :
actionLoading === 'off' ? 'stopping...' :
status?.status || 'unbekannt'}
</span>
)}
</div>
<div>
<div className="text-sm text-slate-500 mb-2">GPU</div>
<div className="font-semibold text-slate-900">
{status?.gpu_name || '-'}
</div>
</div>
<div>
<div className="text-sm text-slate-500 mb-2">Kosten/h</div>
<div className="font-semibold text-slate-900">
{status?.dph_total ? `$${status.dph_total.toFixed(3)}` : '-'}
</div>
</div>
<div>
<div className="text-sm text-slate-500 mb-2">Auto-Stop</div>
<div className="font-semibold text-slate-900">
{status && status.auto_shutdown_in_minutes !== null
? `${status.auto_shutdown_in_minutes} min`
: '-'}
</div>
</div>
<div>
<div className="text-sm text-slate-500 mb-2">Budget</div>
<div className={`font-bold text-lg ${getCreditColor(status?.account_credit ?? null)}`}>
{status && status.account_credit !== null
? `$${status.account_credit.toFixed(2)}`
: '-'}
</div>
</div>
<div>
<div className="text-sm text-slate-500 mb-2">Session</div>
<div className="font-semibold text-slate-900">
{status && status.session_runtime_minutes !== null && status.session_cost_usd !== null
? `${Math.round(status.session_runtime_minutes)} min / $${status.session_cost_usd.toFixed(3)}`
: '-'}
</div>
</div>
</div>
{/* Buttons */}
<div className="flex items-center gap-4 mt-6 pt-6 border-t border-slate-200">
<button
onClick={powerOn}
disabled={actionLoading !== null || status?.status === 'running'}
className="px-6 py-2 bg-orange-600 text-white rounded-lg font-medium hover:bg-orange-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
Starten
</button>
<button
onClick={powerOff}
disabled={actionLoading !== null || status?.status !== 'running'}
className="px-6 py-2 bg-red-600 text-white rounded-lg font-medium hover:bg-red-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
Stoppen
</button>
<button
onClick={fetchStatus}
disabled={loading}
className="px-4 py-2 border border-slate-300 text-slate-700 rounded-lg font-medium hover:bg-slate-50 disabled:opacity-50 transition-colors"
>
{loading ? 'Aktualisiere...' : 'Aktualisieren'}
</button>
{message && (
<span className="ml-4 text-sm text-green-600 font-medium">{message}</span>
)}
{error && (
<span className="ml-4 text-sm text-red-600 font-medium">{error}</span>
)}
</div>
</div>
{/* Extended Stats */}
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Kosten-Uebersicht</h3>
<div className="space-y-4">
<div className="flex justify-between items-center">
<span className="text-slate-600">Session Laufzeit</span>
<span className="font-semibold">
{status && status.session_runtime_minutes !== null
? `${Math.round(status.session_runtime_minutes)} Minuten`
: '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">Session Kosten</span>
<span className="font-semibold">
{status && status.session_cost_usd !== null
? `$${status.session_cost_usd.toFixed(4)}`
: '-'}
</span>
</div>
<div className="flex justify-between items-center pt-4 border-t border-slate-100">
<span className="text-slate-600">Gesamtlaufzeit</span>
<span className="font-semibold">
{status && status.total_runtime_hours !== null
? `${status.total_runtime_hours.toFixed(1)} Stunden`
: '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">Gesamtkosten</span>
<span className="font-semibold">
{status && status.total_cost_usd !== null
? `$${status.total_cost_usd.toFixed(2)}`
: '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">vast.ai Ausgaben</span>
<span className="font-semibold">
{status && status.account_total_spend !== null
? `$${status.account_total_spend.toFixed(2)}`
: '-'}
</span>
</div>
</div>
</div>
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Instanz-Details</h3>
<div className="space-y-4">
<div className="flex justify-between items-center">
<span className="text-slate-600">Instanz ID</span>
<span className="font-mono text-sm">
{status?.instance_id || '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">GPU</span>
<span className="font-semibold">
{status?.gpu_name || '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">Stundensatz</span>
<span className="font-semibold">
{status?.dph_total ? `$${status.dph_total.toFixed(4)}/h` : '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">Letzte Aktivitaet</span>
<span className="text-sm">
{status?.last_activity
? new Date(status.last_activity).toLocaleString('de-DE')
: '-'}
</span>
</div>
{status?.endpoint_base_url && status.status === 'running' && (
<div className="pt-4 border-t border-slate-100">
<div className="text-slate-600 text-sm mb-1">Endpoint</div>
<code className="text-xs bg-slate-100 px-2 py-1 rounded block overflow-x-auto">
{status.endpoint_base_url}
</code>
</div>
)}
</div>
</div>
</div>
{/* Info */}
<div className="bg-violet-50 border border-violet-200 rounded-xl p-4">
<div className="flex gap-3">
<svg className="w-5 h-5 text-violet-600 flex-shrink-0 mt-0.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
<div>
<h4 className="font-semibold text-violet-900">Auto-Shutdown</h4>
<p className="text-sm text-violet-800 mt-1">
Die GPU-Instanz wird automatisch gestoppt, wenn sie laengere Zeit inaktiv ist.
Der Status wird alle 30 Sekunden automatisch aktualisiert.
</p>
</div>
</div>
</div>
</div>
)
}
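
Review note: the removed page mixes two null checks when rendering money values. `dph_total` is tested for truthiness, while the other fields are compared against `null`, so a rate of exactly 0 would render as `-`. A minimal sketch of a shared formatter follows. `formatUsd` is a hypothetical helper, not something that exists in this repository.

```typescript
// Hypothetical helper: renders a nullable USD amount with a fixed number of
// decimals and treats only null/undefined/NaN as "no value".
function formatUsd(value: number | null | undefined, digits: number = 2): string {
  if (value === null || value === undefined || Number.isNaN(value)) return '-'
  return `$${value.toFixed(digits)}`
}
```

Under that assumption, the "Kosten/h" cell could be written as `formatUsd(status?.dph_total, 3)` and the "Budget" cell as `formatUsd(status?.account_credit)`.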


@@ -0,0 +1,64 @@
'use client'
interface GlobalDragOverlayProps {
active: boolean
}
export function GlobalDragOverlay({ active }: GlobalDragOverlayProps) {
if (!active) return null
return (
<div className="fixed inset-0 z-50 bg-purple-900/80 backdrop-blur-sm flex items-center justify-center pointer-events-none">
<div className="text-center">
<div className="text-7xl mb-4 animate-bounce">📄</div>
<div className="text-2xl font-bold text-white">Bild hier ablegen</div>
<div className="text-purple-200 mt-2">PNG, JPG - Handgeschriebener Text</div>
</div>
</div>
)
}
interface KeyboardShortcutsModalProps {
open: boolean
onClose: () => void
}
export function KeyboardShortcutsModal({ open, onClose }: KeyboardShortcutsModalProps) {
if (!open) return null
return (
<div className="fixed inset-0 z-40 bg-black/50 flex items-center justify-center" onClick={onClose}>
<div className="bg-white rounded-xl shadow-2xl p-6 max-w-md" onClick={e => e.stopPropagation()}>
<h3 className="text-lg font-bold text-slate-900 mb-4">Tastenkuerzel</h3>
<div className="space-y-2 text-sm">
<div className="flex justify-between">
<span className="text-slate-600">Bild einfuegen</span>
<kbd className="px-2 py-1 bg-slate-100 rounded text-xs font-mono">Ctrl+V</kbd>
</div>
<div className="flex justify-between">
<span className="text-slate-600">OCR starten</span>
<kbd className="px-2 py-1 bg-slate-100 rounded text-xs font-mono">Ctrl+Enter</kbd>
</div>
<div className="flex justify-between">
<span className="text-slate-600">Tab wechseln</span>
<kbd className="px-2 py-1 bg-slate-100 rounded text-xs font-mono">Alt+1-6</kbd>
</div>
<div className="flex justify-between">
<span className="text-slate-600">Bild entfernen</span>
<kbd className="px-2 py-1 bg-slate-100 rounded text-xs font-mono">Escape</kbd>
</div>
<div className="flex justify-between">
<span className="text-slate-600">Shortcuts anzeigen</span>
<kbd className="px-2 py-1 bg-slate-100 rounded text-xs font-mono">?</kbd>
</div>
</div>
<button
onClick={onClose}
className="w-full mt-4 px-4 py-2 bg-purple-600 hover:bg-purple-700 text-white rounded-lg text-sm"
>
Schliessen
</button>
</div>
</div>
)
}
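
Review note: the modal above only lists the shortcuts. The key handling itself is not part of this hunk. As a sketch of how the listed combinations could be mapped to actions in a testable way, assuming a plain `keydown` listener: `KeyInput`, `ShortcutAction` and `resolveShortcut` are hypothetical names. Ctrl+V is omitted because pasting is usually handled through the `paste` event rather than `keydown`.

```typescript
// Minimal subset of KeyboardEvent needed here, so the function is testable
// without a DOM. All names in this block are illustrative assumptions.
interface KeyInput {
  key: string
  ctrlKey?: boolean
  altKey?: boolean
}

type ShortcutAction =
  | { type: 'start-ocr' }
  | { type: 'switch-tab'; index: number }
  | { type: 'remove-image' }
  | { type: 'show-shortcuts' }

// Maps a key press to one of the actions listed in the shortcuts modal,
// or null when the key press is not a known shortcut.
function resolveShortcut(input: KeyInput): ShortcutAction | null {
  if (input.ctrlKey && input.key === 'Enter') return { type: 'start-ocr' }
  if (input.altKey && /^[1-6]$/.test(input.key)) {
    // Alt+1..6 selects tabs by zero-based index
    return { type: 'switch-tab', index: Number(input.key) - 1 }
  }
  if (input.key === 'Escape') return { type: 'remove-image' }
  if (input.key === '?') return { type: 'show-shortcuts' }
  return null
}
```

On some keyboard layouts Alt plus a digit produces a different `key` value, so a real handler might need to inspect `code` instead. That is not covered here.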


@@ -0,0 +1,185 @@
'use client'
export function TabArchitecture() {
return (
<div className="space-y-6">
{/* Architecture Diagram */}
<ArchitectureDiagram />
{/* Components */}
<div className="grid grid-cols-1 md:grid-cols-2 gap-6">
<ComponentCard
icon="🔍"
title="TrOCR Service"
description="Das TrOCR-Modell von Microsoft ist speziell fuer Handschrifterkennung trainiert. Es verwendet eine Vision-Transformer (ViT) Architektur fuer Bildverarbeitung und einen Text-Decoder fuer die Textgenerierung."
specs={[
{ label: 'Modell', value: 'microsoft/trocr-base-handwritten' },
{ label: 'Groesse', value: '~350 MB' },
{ label: 'Lizenz', value: 'MIT' },
{ label: 'Framework', value: 'PyTorch / Transformers' },
]}
/>
<ComponentCard
icon="🎯"
title="LoRA Fine-Tuning"
description="LoRA fuegt kleine, trainierbare Matrizen zu bestimmten Schichten hinzu, ohne das Basismodell zu veraendern. Dies ermoeglicht effizientes Fine-Tuning mit minimaler Speichernutzung."
specs={[
{ label: 'Methode', value: 'Low-Rank Adaptation' },
{ label: 'Adapter-Groesse', value: '~10 MB' },
{ label: 'Trainingszeit', value: '5-15 Min (CPU)' },
{ label: 'Min. Beispiele', value: '10' },
]}
/>
<ComponentCard
icon="🔒"
title="Pseudonymisierung"
description="Schuelernamen werden durch anonyme Tokens ersetzt, bevor Daten die lokale Umgebung verlassen. Das Mapping wird ausschliesslich lokal gespeichert."
specs={[
{ label: 'Methode', value: 'QR-Code Tokens' },
{ label: 'Token-Format', value: 'UUID v4' },
{ label: 'Mapping', value: 'Lokal beim Lehrer' },
{ label: 'Cloud-Daten', value: 'Nur Tokens + Text' },
]}
/>
<ComponentCard
icon="☁️"
title="Cloud LLM"
description="Die KI-Korrektur erfolgt auf deutschen Servern mit strikter Mandantentrennung. Es werden keine Klarnamen oder identifizierenden Informationen uebertragen."
specs={[
{ label: 'Provider', value: 'SysEleven (DE)' },
{ label: 'Standort', value: 'Deutschland' },
{ label: 'Isolation', value: 'Namespace pro Schule' },
{ label: 'Datenverarbeitung', value: 'Nur pseudonymisiert' },
]}
/>
</div>
{/* Data Flow */}
<DataFlowCard />
</div>
)
}
/* ------------------------------------------------------------------ */
function ArchitectureDiagram() {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-6">Systemarchitektur</h2>
<div className="bg-slate-900 rounded-lg p-6 font-mono text-xs overflow-x-auto">
<pre className="text-slate-300">
{`┌─────────────────────────────────────────────────────────────────────────────┐
│                           MAGIC HELP ARCHITEKTUR                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌───────────────┐         ┌──────────────────┐         ┌───────────────┐   │
│  │   FRONTEND    │         │     BACKEND      │         │    STORAGE    │   │
│  │   (Next.js)   │         │    (FastAPI)     │         │               │   │
│  │               │         │                  │         │               │   │
│  │  ┌─────────┐  │  REST   │  ┌────────────┐  │         │  ┌─────────┐  │   │
│  │  │  Admin  │──┼─────────┼──│   TrOCR    │  │         │  │ Models  │  │   │
│  │  │  Panel  │  │         │  │  Service   │──┼─────────┼──│ (ONNX)  │  │   │
│  │  └─────────┘  │         │  └────────────┘  │         │  └─────────┘  │   │
│  │               │         │        │         │         │               │   │
│  │  ┌─────────┐  │WebSocket│  ┌────────────┐  │         │  ┌─────────┐  │   │
│  │  │ Lehrer  │──┼─────────┼──│  Klausur   │  │         │  │  LoRA   │  │   │
│  │  │ Portal  │  │         │  │ Processor  │──┼─────────┼──│ Adapter │  │   │
│  │  └─────────┘  │         │  └────────────┘  │         │  └─────────┘  │   │
│  │               │         │        │         │         │               │   │
│  └───────────────┘         │  ┌────────────┐  │         │  ┌─────────┐  │   │
│                            │  │  Pseudo-   │  │         │  │Training │  │   │
│                            │  │  nymizer   │──┼─────────┼──│  Data   │  │   │
│                            │  └────────────┘  │         │  └─────────┘  │   │
│                            │                  │         │               │   │
│                            └──────────────────┘         └───────────────┘   │
│                                     │                                       │
│                                     │ (nur pseudonymisiert)                 │
│                                     ▼                                       │
│                            ┌──────────────────┐                             │
│                            │    CLOUD LLM     │                             │
│                            │   (SysEleven)    │                             │
│                            │    Namespace-    │                             │
│                            │    Isolation     │                             │
│                            └──────────────────┘                             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘`}
</pre>
</div>
</div>
)
}
interface ComponentCardProps {
icon: string
title: string
description: string
specs: Array<{ label: string; value: string }>
}
function ComponentCard({ icon, title, description, specs }: ComponentCardProps) {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h3 className="text-lg font-semibold text-slate-900 mb-4 flex items-center gap-2">
<span>{icon}</span> {title}
</h3>
<div className="space-y-3 text-sm">
{specs.map((spec) => (
<div key={spec.label} className="flex justify-between">
<span className="text-slate-500">{spec.label}</span>
<span className="text-slate-900">{spec.value}</span>
</div>
))}
</div>
<p className="text-slate-500 text-sm mt-4">{description}</p>
</div>
)
}
const DATA_FLOW_STEPS = [
{
num: 1,
color: 'bg-blue-100 text-blue-600',
title: 'Lokale Header-Extraktion',
desc: 'TrOCR erkennt Schuelernamen, Klasse und Fach direkt im Browser/PWA (offline-faehig)',
},
{
num: 2,
color: 'bg-purple-100 text-purple-600',
title: 'Pseudonymisierung',
desc: 'Namen werden durch QR-Code Tokens ersetzt, Mapping bleibt lokal',
},
{
num: 3,
color: 'bg-green-100 text-green-600',
title: 'Cloud-Korrektur',
desc: 'Nur pseudonymisierte Dokument-Tokens werden an die KI gesendet',
},
{
num: 4,
color: 'bg-yellow-100 text-yellow-600',
title: 'Re-Identifikation',
desc: 'Ergebnisse werden lokal mit dem Mapping wieder den echten Namen zugeordnet',
},
]
function DataFlowCard() {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Datenfluss</h2>
<div className="space-y-4">
{DATA_FLOW_STEPS.map((step) => (
<div key={step.num} className="flex items-start gap-4 bg-slate-50 rounded-lg p-4">
<div className={`w-8 h-8 rounded-full ${step.color} flex items-center justify-center font-bold`}>
{step.num}
</div>
<div>
<div className="font-medium text-slate-900">{step.title}</div>
<div className="text-sm text-slate-500">{step.desc}</div>
</div>
</div>
))}
</div>
</div>
)
}
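
The four entries in `DATA_FLOW_STEPS` describe a pseudonymize, process, re-identify round trip. A minimal sketch of that idea follows, assuming an in-memory mapping and counter-based tokens in place of the QR-code tokens. `pseudonymize`, `reidentify`, `CloudResult` and the token format are hypothetical names for illustration and are not taken from this repository.

```typescript
// Hypothetical sketch: real names are swapped for opaque tokens before
// anything is sent to the cloud; the token -> name mapping stays local.
type Mapping = Map<string, string> // token -> real name

interface CloudResult {
  token: string
  grade: string
}

// Assumption: a simple counter stands in for the QR-code token.
// A real system would likely use unguessable random tokens.
function pseudonymize(names: string[]): { tokens: string[]; mapping: Mapping } {
  const mapping: Mapping = new Map()
  const tokens = names.map((name, i) => {
    const token = `TOKEN-${i + 1}`
    mapping.set(token, name)
    return token
  })
  return { tokens, mapping }
}

// Re-identification happens locally, using the mapping that never left the device.
function reidentify(results: CloudResult[], mapping: Mapping): Array<{ name: string; grade: string }> {
  return results.map((r) => ({
    name: mapping.get(r.token) ?? '(unknown)',
    grade: r.grade,
  }))
}
```

Only `tokens` would be attached to the uploaded documents in this sketch; `mapping` is the piece that has to stay on the local device for step 4 to work.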

@@ -0,0 +1,53 @@
'use client'
import { BatchUploader } from '@/components/ai/BatchUploader'
import { API_BASE } from '../types'
export function TabBatch() {
return (
<div className="space-y-6">
{/* Batch OCR Processing */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-2">Batch-Verarbeitung</h2>
<p className="text-sm text-slate-500 mb-6">
Verarbeite mehrere Bilder gleichzeitig mit Echtzeit-Fortschrittsanzeige.
Die Ergebnisse werden per Server-Sent Events gestreamt.
</p>
<BatchUploader
apiBase={API_BASE}
maxFiles={20}
autoProcess={false}
onComplete={(results) => {
console.log('Batch complete:', results)
}}
/>
</div>
{/* Batch Processing Info */}
<div className="grid grid-cols-1 md:grid-cols-3 gap-6">
<div className="bg-gradient-to-br from-blue-50 to-blue-100 border border-blue-200 rounded-xl p-6">
<div className="text-3xl mb-2">🚀</div>
<h3 className="text-lg font-semibold text-slate-900 mb-2">Parallele Verarbeitung</h3>
<p className="text-sm text-slate-600">
Mehrere Bilder werden parallel verarbeitet fuer maximale Geschwindigkeit.
</p>
</div>
<div className="bg-gradient-to-br from-green-50 to-green-100 border border-green-200 rounded-xl p-6">
<div className="text-3xl mb-2">💾</div>
<h3 className="text-lg font-semibold text-slate-900 mb-2">Smart Caching</h3>
<p className="text-sm text-slate-600">
Identische Bilder werden automatisch aus dem Cache geladen (unter 50ms).
</p>
</div>
<div className="bg-gradient-to-br from-purple-50 to-purple-100 border border-purple-200 rounded-xl p-6">
<div className="text-3xl mb-2">📊</div>
<h3 className="text-lg font-semibold text-slate-900 mb-2">Live-Fortschritt</h3>
<p className="text-sm text-slate-600">
Echtzeit-Updates via Server-Sent Events zeigen den Verarbeitungsfortschritt.
</p>
</div>
</div>
</div>
)
}
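
The batch tab states that results are streamed via Server-Sent Events. As a rough illustration of what a client receives, the sketch below parses complete SSE frames from a text chunk. `parseSseChunk` and `SseEvent` are hypothetical names, and the `progress` event with a JSON payload in the usage note is an assumption; the actual `BatchUploader` protocol is not shown in this diff. The `id` and `retry` fields and frames split across chunks are deliberately ignored.

```typescript
interface SseEvent {
  event: string
  data: string
}

// Parses complete SSE frames from a text chunk. Frames are separated by a
// blank line; multiple "data:" lines in one frame are joined with "\n".
function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = []
  for (const frame of chunk.split(/\r?\n\r?\n/)) {
    let event = 'message' // default event type when no "event:" field is sent
    const dataLines: string[] = []
    for (const line of frame.split(/\r?\n/)) {
      if (line === '' || line.charAt(0) === ':') continue // blank or comment line
      const colon = line.indexOf(':')
      const field = colon === -1 ? line : line.slice(0, colon)
      let value = colon === -1 ? '' : line.slice(colon + 1)
      if (value.charAt(0) === ' ') value = value.slice(1) // strip one leading space
      if (field === 'event') event = value
      else if (field === 'data') dataLines.push(value)
    }
    // Frames without any data field (e.g. keep-alive comments) produce no event.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join('\n') })
  }
  return events
}
```

Under the assumed payload, a frame such as `event: progress` followed by `data: {"done":1,"total":3}` would yield one event whose `data` can be passed to `JSON.parse`. In a browser, `EventSource` performs this parsing itself.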

@@ -0,0 +1,127 @@
'use client'
import { SkeletonText } from '@/components/common/SkeletonText'
import type { TrOCRStatus } from '../types'
interface TabOverviewProps {
status: TrOCRStatus | null
loading: boolean
onRefresh: () => void
}
export function TabOverview({ status, loading, onRefresh }: TabOverviewProps) {
return (
<div className="space-y-6">
{/* Status Card */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<div className="flex items-center justify-between mb-4">
<h2 className="text-lg font-semibold text-slate-900">Systemstatus</h2>
<button
onClick={onRefresh}
className="px-3 py-1 bg-purple-600 hover:bg-purple-700 text-white rounded text-sm transition-colors"
>
Aktualisieren
</button>
</div>
{loading ? (
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
{[1, 2, 3, 4].map((i) => (
<div key={i} className="bg-slate-50 rounded-lg p-4">
<SkeletonText lines={1} className="mb-2" />
<div className="h-3 w-16 bg-slate-200 rounded animate-pulse" />
</div>
))}
</div>
) : status?.status === 'available' ? (
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<div className="bg-slate-50 rounded-lg p-4">
<div className="text-2xl font-bold text-slate-900">{status.model_name || 'trocr-base'}</div>
<div className="text-xs text-slate-500">Modell</div>
</div>
<div className="bg-slate-50 rounded-lg p-4">
<div className="text-2xl font-bold text-slate-900">{status.device || 'CPU'}</div>
<div className="text-xs text-slate-500">Geraet</div>
</div>
<div className="bg-slate-50 rounded-lg p-4">
<div className="text-2xl font-bold text-slate-900">{status.training_examples_count || 0}</div>
<div className="text-xs text-slate-500">Trainingsbeispiele</div>
</div>
<div className="bg-slate-50 rounded-lg p-4">
<div className="text-2xl font-bold text-slate-900">{status.has_lora_adapter ? 'Aktiv' : 'Keiner'}</div>
<div className="text-xs text-slate-500">LoRA Adapter</div>
</div>
</div>
) : status?.status === 'not_installed' ? (
<div className="text-slate-600">
<p className="mb-2">TrOCR ist nicht installiert. Fuehre aus:</p>
<code className="bg-slate-100 px-3 py-2 rounded text-sm block font-mono">{status.install_command}</code>
</div>
) : (
<div className="text-red-600">{status?.error || 'Unbekannter Fehler'}</div>
)}
</div>
{/* Quick Overview Cards */}
<div className="grid grid-cols-1 md:grid-cols-3 gap-6">
<div className="bg-gradient-to-br from-purple-50 to-purple-100 border border-purple-200 rounded-xl p-6">
<div className="text-3xl mb-2">🎯</div>
<h3 className="text-lg font-semibold text-slate-900 mb-2">Handschrifterkennung</h3>
<p className="text-sm text-slate-600">
TrOCR erkennt automatisch handgeschriebenen Text in Klausuren.
Das Modell wurde speziell fuer deutsche Handschriften optimiert.
</p>
</div>
<div className="bg-gradient-to-br from-green-50 to-green-100 border border-green-200 rounded-xl p-6">
<div className="text-3xl mb-2">🔒</div>
<h3 className="text-lg font-semibold text-slate-900 mb-2">Privacy by Design</h3>
<p className="text-sm text-slate-600">
Alle Daten werden lokal verarbeitet. Schuelernamen werden durch
QR-Codes pseudonymisiert - DSGVO-konform.
</p>
</div>
<div className="bg-gradient-to-br from-blue-50 to-blue-100 border border-blue-200 rounded-xl p-6">
<div className="text-3xl mb-2">📈</div>
<h3 className="text-lg font-semibold text-slate-900 mb-2">Kontinuierliches Lernen</h3>
<p className="text-sm text-slate-600">
Mit LoRA Fine-Tuning passt sich das Modell an individuelle
Handschriften an - ohne das Basismodell zu veraendern.
</p>
</div>
</div>
{/* Workflow Overview */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Magic Onboarding Workflow</h2>
<div className="flex flex-wrap items-center gap-4 text-sm">
{WORKFLOW_STEPS.map((step, i) => (
<WorkflowStep key={step.title} step={step} showArrow={i < WORKFLOW_STEPS.length - 1} />
))}
</div>
</div>
</div>
)
}
const WORKFLOW_STEPS = [
{ icon: '📄', title: '1. Upload', desc: '25 Klausuren hochladen' },
{ icon: '🔍', title: '2. Analyse', desc: 'Lokale OCR in 5-10 Sek' },
{ icon: '✅', title: '3. Bestaetigung', desc: 'Klasse, Schueler, Fach' },
{ icon: '🤖', title: '4. KI-Korrektur', desc: 'Cloud mit Pseudonymisierung' },
{ icon: '📊', title: '5. Integration', desc: 'Notenbuch, Zeugnisse' },
]
function WorkflowStep({ step, showArrow }: { step: typeof WORKFLOW_STEPS[number]; showArrow: boolean }) {
return (
<>
<div className="flex items-center gap-2 bg-slate-50 rounded-lg px-4 py-3">
<span className="text-2xl">{step.icon}</span>
<div>
<div className="font-medium text-slate-900">{step.title}</div>
<div className="text-slate-500">{step.desc}</div>
</div>
</div>
{showArrow && <div className="text-slate-400">&rarr;</div>}
</>
)
}

@@ -0,0 +1,226 @@
'use client'
import type { MagicSettings } from '../types'
import { DEFAULT_SETTINGS } from '../types'
interface TabSettingsProps {
settings: MagicSettings
settingsSaved: boolean
onUpdateSettings: (settings: MagicSettings) => void
onSave: () => void
}
export function TabSettings({ settings, settingsSaved, onUpdateSettings, onSave }: TabSettingsProps) {
const update = (partial: Partial<MagicSettings>) => {
onUpdateSettings({ ...settings, ...partial })
}
return (
<div className="space-y-6">
{/* OCR Settings */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">OCR Einstellungen</h2>
<div className="grid grid-cols-1 md:grid-cols-2 gap-6">
<CheckboxSetting
label="Automatische Zeilenerkennung"
description="Erkennt und verarbeitet einzelne Zeilen separat"
checked={settings.autoDetectLines}
onChange={(v) => update({ autoDetectLines: v })}
/>
<CheckboxSetting
label="Live-Vorschau"
description="OCR startet automatisch nach Bild-Upload"
checked={settings.livePreview}
onChange={(v) => update({ livePreview: v })}
/>
<CheckboxSetting
label="Sound-Feedback"
description="Akustisches Feedback bei erfolgreicher Erkennung"
checked={settings.soundFeedback}
onChange={(v) => update({ soundFeedback: v })}
/>
<div>
<label className="block text-sm text-slate-700 mb-2">Konfidenz-Schwellwert</label>
<input
type="range"
min="0"
max="1"
step="0.1"
value={settings.confidenceThreshold}
onChange={(e) => update({ confidenceThreshold: parseFloat(e.target.value) })}
className="w-full"
/>
<div className="flex justify-between text-xs text-slate-400 mt-1">
<span>0%</span>
<span className="text-slate-900">{(settings.confidenceThreshold * 100).toFixed(0)}%</span>
<span>100%</span>
</div>
</div>
<div>
<label className="block text-sm text-slate-700 mb-2">Max. Bildgroesse (px)</label>
<input
type="number"
value={settings.maxImageSize}
onChange={(e) => update({ maxImageSize: parseInt(e.target.value, 10) })}
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-slate-900"
/>
<div className="text-xs text-slate-400 mt-1">Groessere Bilder werden skaliert</div>
</div>
<CheckboxSetting
label="Ergebnis-Cache aktivieren"
description="Speichert OCR-Ergebnisse fuer identische Bilder"
checked={settings.enableCache}
onChange={(v) => update({ enableCache: v })}
/>
</div>
</div>
{/* Training Settings */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Training Einstellungen</h2>
<div className="grid grid-cols-1 md:grid-cols-2 gap-6">
<div>
<label className="block text-sm text-slate-700 mb-2">LoRA Rank</label>
<select
value={settings.loraRank}
onChange={(e) => update({ loraRank: parseInt(e.target.value, 10) })}
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-slate-900"
>
<option value="4">4 (Schnell, weniger Kapazitaet)</option>
<option value="8">8 (Ausgewogen)</option>
<option value="16">16 (Mehr Kapazitaet)</option>
<option value="32">32 (Maximum)</option>
</select>
</div>
<div>
<label className="block text-sm text-slate-700 mb-2">LoRA Alpha</label>
<input
type="number"
value={settings.loraAlpha}
onChange={(e) => update({ loraAlpha: parseInt(e.target.value, 10) })}
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-slate-900"
/>
<div className="text-xs text-slate-400 mt-1">Empfohlen: 4 x LoRA Rank</div>
</div>
<div>
<label className="block text-sm text-slate-700 mb-2">Epochen</label>
<input
type="number"
min="1"
max="10"
value={settings.epochs}
onChange={(e) => update({ epochs: parseInt(e.target.value, 10) })}
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-slate-900"
/>
</div>
<div>
<label className="block text-sm text-slate-700 mb-2">Batch Size</label>
<select
value={settings.batchSize}
onChange={(e) => update({ batchSize: parseInt(e.target.value, 10) })}
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-slate-900"
>
<option value="1">1 (Wenig RAM)</option>
<option value="2">2</option>
<option value="4">4 (Standard)</option>
<option value="8">8 (Viel RAM)</option>
</select>
</div>
<div>
<label className="block text-sm text-slate-700 mb-2">Learning Rate</label>
<select
value={settings.learningRate}
onChange={(e) => update({ learningRate: parseFloat(e.target.value) })}
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-slate-900"
>
<option value="0.0001">0.0001 (Schnell)</option>
<option value="0.00005">0.00005 (Standard)</option>
<option value="0.00001">0.00001 (Konservativ)</option>
</select>
</div>
</div>
</div>
{/* Save Button */}
<div className="flex justify-end gap-4">
<button
onClick={() => onUpdateSettings(DEFAULT_SETTINGS)}
className="px-6 py-2 bg-slate-200 hover:bg-slate-300 text-slate-700 rounded-lg text-sm font-medium transition-colors"
>
Zuruecksetzen
</button>
<button
onClick={onSave}
className="px-6 py-2 bg-purple-600 hover:bg-purple-700 text-white rounded-lg text-sm font-medium transition-colors"
>
{settingsSaved ? '\u2713 Gespeichert!' : 'Einstellungen speichern'}
</button>
</div>
{/* Technical Info */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Technische Informationen</h2>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
<div>
<span className="text-slate-500">API Endpoint:</span>
<code className="text-slate-900 ml-2 bg-slate-100 px-2 py-1 rounded text-xs">/api/klausur/trocr</code>
</div>
<div>
<span className="text-slate-500">Model Path:</span>
<code className="text-slate-900 ml-2 bg-slate-100 px-2 py-1 rounded text-xs">~/.cache/huggingface</code>
</div>
<div>
<span className="text-slate-500">LoRA Path:</span>
<code className="text-slate-900 ml-2 bg-slate-100 px-2 py-1 rounded text-xs">./models/lora</code>
</div>
<div>
<span className="text-slate-500">Training Data:</span>
<code className="text-slate-900 ml-2 bg-slate-100 px-2 py-1 rounded text-xs">./data/training</code>
</div>
</div>
</div>
</div>
)
}
/* ------------------------------------------------------------------ */
function CheckboxSetting({
label,
description,
checked,
onChange,
}: {
label: string
description: string
checked: boolean
onChange: (value: boolean) => void
}) {
return (
<div>
<label className="flex items-center gap-3 cursor-pointer">
<input
type="checkbox"
checked={checked}
onChange={(e) => onChange(e.target.checked)}
className="w-5 h-5 rounded bg-slate-100 border-slate-300"
/>
<div>
<div className="text-slate-900 font-medium">{label}</div>
<div className="text-sm text-slate-500">{description}</div>
</div>
</label>
</div>
)
}
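
The numeric inputs in `TabSettings` pass the result of `parseInt` straight into the settings object, so an emptied field yields `NaN`. One possible guard is sketched below, together with the "4 x LoRA Rank" hint shown under the alpha field. `parseIntOr` and `recommendedLoraAlpha` are hypothetical helpers, not part of this change, and whether falling back and clamping is the desired UX is an open assumption.

```typescript
// Hypothetical helper: parse a numeric input value, fall back when the field
// is empty or not a number, and clamp the result to an optional range.
function parseIntOr(raw: string, fallback: number, min = -Infinity, max = Infinity): number {
  const parsed = parseInt(raw, 10)
  if (isNaN(parsed)) return fallback
  return Math.min(max, Math.max(min, parsed))
}

// Mirrors the UI hint "Empfohlen: 4 x LoRA Rank" from the settings form.
function recommendedLoraAlpha(rank: number): number {
  return 4 * rank
}
```

With the defaults from `DEFAULT_SETTINGS` (`loraRank: 8`, `loraAlpha: 32`), the hint and the default alpha agree. The epochs field, which declares `min="1"` and `max="10"`, could for example call `parseIntOr(e.target.value, settings.epochs, 1, 10)`.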

@@ -0,0 +1,304 @@
'use client'
import { SkeletonOCRResult, SkeletonDots } from '@/components/common/SkeletonText'
import { ConfidenceHeatmap } from '@/components/ai/ConfidenceHeatmap'
import type { OCRResult, MagicSettings } from '../types'
interface TabTestProps {
ocrResult: OCRResult | null
ocrLoading: boolean
imagePreview: string | null
uploadedImage: File | null
settings: MagicSettings
showHeatmap: boolean
onToggleHeatmap: () => void
onFileUpload: (file: File) => void
onManualOCR: () => void
onClearImage: () => void
onSendToTraining: () => void
}
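// Maps an OCR confidence (0..1) to the Tailwind color of the confidence bar:
// green from 90%, yellow from 70%, red below.
function getConfidenceColor(confidence: number) {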
if (confidence >= 0.9) return 'bg-green-500'
if (confidence >= 0.7) return 'bg-yellow-500'
return 'bg-red-500'
}
export function TabTest({
ocrResult,
ocrLoading,
imagePreview,
uploadedImage,
settings,
showHeatmap,
onToggleHeatmap,
onFileUpload,
onManualOCR,
onClearImage,
onSendToTraining,
}: TabTestProps) {
return (
<div className="space-y-6">
{/* OCR Test */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">OCR Test</h2>
<p className="text-sm text-slate-500 mb-4">
Teste die Handschrifterkennung mit einem eigenen Bild. Das Ergebnis zeigt
den erkannten Text, Konfidenz und Verarbeitungszeit.
{settings.livePreview && (
<span className="text-purple-600 ml-1">(Live-Vorschau aktiv)</span>
)}
</p>
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6">
{/* Upload Area */}
<UploadArea
imagePreview={imagePreview}
uploadedImage={uploadedImage}
ocrLoading={ocrLoading}
livePreview={settings.livePreview}
onFileUpload={onFileUpload}
onManualOCR={onManualOCR}
onClearImage={onClearImage}
/>
{/* Results Area */}
<ResultsArea
ocrResult={ocrResult}
ocrLoading={ocrLoading}
onSendToTraining={onSendToTraining}
/>
</div>
</div>
{/* Confidence Heatmap */}
{imagePreview && ocrResult && ocrResult.confidence > 0 && (
<div className="bg-white rounded-xl shadow-sm border p-6">
<div className="flex items-center justify-between mb-4">
<h2 className="text-lg font-semibold text-slate-900">Konfidenz-Visualisierung</h2>
<button
onClick={onToggleHeatmap}
className={`px-3 py-1 rounded text-sm font-medium transition-colors ${
showHeatmap
? 'bg-purple-600 text-white'
: 'bg-slate-200 text-slate-700 hover:bg-slate-300'
}`}
>
{showHeatmap ? 'Heatmap verbergen' : 'Heatmap anzeigen'}
</button>
</div>
{showHeatmap && (
<ConfidenceHeatmap
imageSrc={imagePreview}
text={ocrResult.text}
confidence={ocrResult.confidence}
wordBoxes={ocrResult.word_boxes?.map(w => ({
text: w.text,
confidence: w.confidence,
bbox: w.bbox as [number, number, number, number]
})) || []}
charConfidences={ocrResult.char_confidences || []}
showLegend={true}
toggleable={true}
/>
)}
</div>
)}
{/* Confidence Interpretation */}
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Konfidenz-Interpretation</h2>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div className="bg-green-50 border border-green-200 rounded-lg p-4">
<div className="text-green-700 font-medium">90-100%</div>
<div className="text-sm text-slate-600 mt-1">Sehr hohe Sicherheit - Text kann direkt uebernommen werden</div>
</div>
<div className="bg-yellow-50 border border-yellow-200 rounded-lg p-4">
<div className="text-yellow-700 font-medium">70-90%</div>
<div className="text-sm text-slate-600 mt-1">Gute Sicherheit - manuelle Ueberpruefung empfohlen</div>
</div>
<div className="bg-red-50 border border-red-200 rounded-lg p-4">
<div className="text-red-700 font-medium">&lt; 70%</div>
<div className="text-sm text-slate-600 mt-1">Niedrige Sicherheit - manuelle Eingabe erforderlich</div>
</div>
</div>
</div>
</div>
)
}
/* ------------------------------------------------------------------ */
/* Sub-components */
/* ------------------------------------------------------------------ */
interface UploadAreaProps {
imagePreview: string | null
uploadedImage: File | null
ocrLoading: boolean
livePreview: boolean
onFileUpload: (file: File) => void
onManualOCR: () => void
onClearImage: () => void
}
function UploadArea({ imagePreview, uploadedImage, ocrLoading, livePreview, onFileUpload, onManualOCR, onClearImage }: UploadAreaProps) {
return (
<div>
<div
className={`border-2 border-dashed rounded-lg p-8 text-center cursor-pointer transition-all ${
imagePreview
? 'border-purple-500 bg-purple-50'
: 'border-slate-300 hover:border-purple-500'
}`}
onClick={() => document.getElementById('ocr-file-input')?.click()}
onDragOver={(e) => { e.preventDefault(); e.currentTarget.classList.add('border-purple-500', 'bg-purple-50') }}
onDragLeave={(e) => { e.currentTarget.classList.remove('border-purple-500', 'bg-purple-50') }}
onDrop={(e) => {
e.preventDefault()
e.stopPropagation()
e.currentTarget.classList.remove('border-purple-500', 'bg-purple-50')
const file = e.dataTransfer.files[0]
if (file?.type.startsWith('image/')) onFileUpload(file)
}}
>
{imagePreview ? (
<div className="relative">
<img
src={imagePreview}
alt="Hochgeladenes Bild"
className="max-h-64 mx-auto rounded-lg shadow-sm"
/>
<button
onClick={(e) => {
e.stopPropagation()
onClearImage()
}}
className="absolute top-2 right-2 p-1 bg-red-500 text-white rounded-full hover:bg-red-600 transition-colors"
title="Bild entfernen (Escape)"
>
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
) : (
<>
<div className="text-4xl mb-2">📄</div>
<div className="text-slate-700">Bild hierher ziehen oder klicken zum Hochladen</div>
<div className="text-xs text-slate-400 mt-1">PNG, JPG - Handgeschriebener Text</div>
<div className="text-xs text-purple-500 mt-2">
oder <kbd className="px-1.5 py-0.5 bg-purple-100 rounded font-mono">Ctrl+V</kbd> zum Einfuegen
</div>
</>
)}
</div>
<input
type="file"
id="ocr-file-input"
accept="image/*"
className="hidden"
onChange={(e) => {
const file = e.target.files?.[0]
if (file) onFileUpload(file)
}}
/>
{uploadedImage && !livePreview && (
<button
onClick={onManualOCR}
disabled={ocrLoading}
className="w-full mt-4 px-4 py-2 bg-purple-600 hover:bg-purple-700 disabled:bg-slate-300 text-white rounded-lg text-sm font-medium transition-colors"
>
{ocrLoading ? (
<span className="flex items-center justify-center gap-2">
<SkeletonDots />
Analysiere...
</span>
) : (
'OCR starten (Ctrl+Enter)'
)}
</button>
)}
</div>
)
}
interface ResultsAreaProps {
ocrResult: OCRResult | null
ocrLoading: boolean
onSendToTraining: () => void
}
function ResultsArea({ ocrResult, ocrLoading, onSendToTraining }: ResultsAreaProps) {
if (ocrLoading) return <SkeletonOCRResult />
if (!ocrResult) {
return (
<div className="bg-slate-50 rounded-lg p-8 text-center text-slate-400">
<div className="text-4xl mb-2">🔍</div>
<div>Lade ein Bild hoch um die Erkennung zu testen</div>
</div>
)
}
return (
<div className="bg-slate-50 rounded-lg p-4">
<div className="flex items-center justify-between mb-2">
<h3 className="text-sm font-medium text-slate-700">Erkannter Text:</h3>
<div className={`px-2 py-1 rounded-full text-xs font-medium ${
ocrResult.confidence >= 0.9 ? 'bg-green-100 text-green-700' :
ocrResult.confidence >= 0.7 ? 'bg-yellow-100 text-yellow-700' :
'bg-red-100 text-red-700'
}`}>
{(ocrResult.confidence * 100).toFixed(0)}% Konfidenz
</div>
</div>
<pre className="bg-white border p-3 rounded text-sm text-slate-900 whitespace-pre-wrap max-h-48 overflow-y-auto">
{ocrResult.text || '(Kein Text erkannt)'}
</pre>
{/* Confidence bar */}
<div className="mt-3 mb-3">
<div className="h-2 bg-slate-200 rounded-full overflow-hidden">
<div
className={`h-full transition-all duration-500 ${getConfidenceColor(ocrResult.confidence)}`}
style={{ width: `${ocrResult.confidence * 100}%` }}
/>
</div>
</div>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
<div className="bg-white border rounded p-2">
<div className="text-slate-500 text-xs">Konfidenz</div>
<div className="text-slate-900 font-medium">{(ocrResult.confidence * 100).toFixed(1)}%</div>
</div>
<div className="bg-white border rounded p-2">
<div className="text-slate-500 text-xs">Verarbeitungszeit</div>
<div className="text-slate-900 font-medium">{ocrResult.processing_time_ms}ms</div>
</div>
<div className="bg-white border rounded p-2">
<div className="text-slate-500 text-xs">Modell</div>
<div className="text-slate-900 font-medium">{ocrResult.model || 'TrOCR'}</div>
</div>
<div className="bg-white border rounded p-2">
<div className="text-slate-500 text-xs">LoRA Adapter</div>
<div className="text-slate-900 font-medium">{ocrResult.has_lora_adapter ? 'Ja' : 'Nein'}</div>
</div>
</div>
{ocrResult.confidence < 0.9 && (
<div className="mt-4 p-3 bg-blue-50 border border-blue-200 rounded-lg">
<p className="text-sm text-blue-800 mb-2">
Die Erkennung koennte verbessert werden! Moechtest du dieses Beispiel zum Training hinzufuegen?
</p>
<button
onClick={onSendToTraining}
className="px-3 py-1 bg-blue-600 hover:bg-blue-700 text-white rounded text-sm transition-colors"
>
Als Trainingsbeispiel hinzufuegen
</button>
</div>
)}
</div>
)
}
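
`TabTest` applies the 0.9 / 0.7 thresholds in two places: in `getConfidenceColor` for the bar, and inline for the badge in `ResultsArea`. A sketch of how both could share one classification follows. `ConfidenceBand`, `confidenceBand` and `BAND_CLASSES` are hypothetical names; the class strings are copied from the component.

```typescript
type ConfidenceBand = 'high' | 'medium' | 'low'

// Tailwind classes per band, copied from the bar and badge markup in TabTest.
const BAND_CLASSES: Record<ConfidenceBand, { bar: string; badge: string }> = {
  high: { bar: 'bg-green-500', badge: 'bg-green-100 text-green-700' },
  medium: { bar: 'bg-yellow-500', badge: 'bg-yellow-100 text-yellow-700' },
  low: { bar: 'bg-red-500', badge: 'bg-red-100 text-red-700' },
}

// Same thresholds as the "Konfidenz-Interpretation" card: >= 90%, >= 70%, below 70%.
function confidenceBand(confidence: number): ConfidenceBand {
  if (confidence >= 0.9) return 'high'
  if (confidence >= 0.7) return 'medium'
  return 'low'
}
```

Keeping the thresholds in one function means the bar, the badge and the training prompt (shown when confidence is below 0.9) cannot drift apart when a threshold changes.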

@@ -0,0 +1,333 @@
'use client'
import Link from 'next/link'
import { SkeletonDots } from '@/components/common/SkeletonText'
import { TrainingMetrics } from '@/components/ai/TrainingMetrics'
import type { TrOCRStatus, TrainingExample, MagicSettings } from '../types'
import { API_BASE } from '../types'
interface TabTrainingProps {
status: TrOCRStatus | null
examples: TrainingExample[]
trainingImage: File | null
trainingText: string
fineTuning: boolean
settings: MagicSettings
showTrainingDashboard: boolean
onSetTrainingImage: (file: File | null) => void
onSetTrainingText: (text: string) => void
onAddExample: () => void
onFineTune: () => void
onToggleDashboard: () => void
}
export function TabTraining({
status,
examples,
trainingImage,
trainingText,
fineTuning,
settings,
showTrainingDashboard,
onSetTrainingImage,
onSetTrainingText,
onAddExample,
onFineTune,
onToggleDashboard,
}: TabTrainingProps) {
const exampleCount = status?.training_examples_count || 0
const progressPct = Math.min(100, (exampleCount / 10) * 100)
return (
<div className="space-y-6">
{/* Training Overview */}
<TrainingOverviewCard
status={status}
settings={settings}
exampleCount={exampleCount}
progressPct={progressPct}
/>
<div className="grid grid-cols-1 md:grid-cols-2 gap-6">
{/* Add Training Example */}
<AddExampleCard
trainingImage={trainingImage}
trainingText={trainingText}
onSetTrainingImage={onSetTrainingImage}
onSetTrainingText={onSetTrainingText}
onAddExample={onAddExample}
/>
{/* Fine-Tuning */}
<FineTuningCard
settings={settings}
fineTuning={fineTuning}
exampleCount={exampleCount}
hasLoraAdapter={status?.has_lora_adapter || false}
onFineTune={onFineTune}
/>
</div>
{/* Training Examples List */}
{examples.length > 0 && (
<ExamplesListCard examples={examples} />
)}
{/* Training Dashboard Demo */}
<TrainingDashboardCard
showDashboard={showTrainingDashboard}
onToggle={onToggleDashboard}
/>
</div>
)
}
/* ------------------------------------------------------------------ */
function TrainingOverviewCard({
status,
settings,
exampleCount,
progressPct,
}: {
status: TrOCRStatus | null
settings: MagicSettings
exampleCount: number
progressPct: number
}) {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Training mit LoRA</h2>
<p className="text-sm text-slate-500 mb-4">
LoRA (Low-Rank Adaptation) ermoeglicht effizientes Fine-Tuning ohne das Basismodell zu veraendern.
Das Training erfolgt lokal auf Ihrem System.
</p>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-6">
<div className="bg-slate-50 rounded-lg p-4 text-center">
<div className="text-3xl font-bold text-slate-900">{exampleCount}</div>
<div className="text-xs text-slate-500">Trainingsbeispiele</div>
</div>
<div className="bg-slate-50 rounded-lg p-4 text-center">
<div className="text-3xl font-bold text-slate-900">10</div>
<div className="text-xs text-slate-500">Minimum benoetigt</div>
</div>
<div className="bg-slate-50 rounded-lg p-4 text-center">
<div className="text-3xl font-bold text-slate-900">{settings.loraRank}</div>
<div className="text-xs text-slate-500">LoRA Rank</div>
</div>
<div className="bg-slate-50 rounded-lg p-4 text-center">
<div className="text-3xl font-bold text-slate-900">{status?.has_lora_adapter ? '\u2713' : '\u2717'}</div>
<div className="text-xs text-slate-500">Adapter aktiv</div>
</div>
</div>
<div className="mb-6">
<div className="flex justify-between text-sm mb-1">
<span className="text-slate-500">Fortschritt zum Fine-Tuning</span>
<span className="text-slate-500">{progressPct.toFixed(0)}%</span>
</div>
<div className="h-2 bg-slate-200 rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-purple-500 to-blue-500 transition-all duration-500"
style={{ width: `${progressPct}%` }}
/>
</div>
</div>
</div>
)
}
function AddExampleCard({
trainingImage,
trainingText,
onSetTrainingImage,
onSetTrainingText,
onAddExample,
}: {
trainingImage: File | null
trainingText: string
onSetTrainingImage: (file: File | null) => void
onSetTrainingText: (text: string) => void
onAddExample: () => void
}) {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Trainingsbeispiel hinzufuegen</h2>
<p className="text-sm text-slate-500 mb-4">
Lade ein Bild mit handgeschriebenem Text hoch und gib die korrekte Transkription ein.
</p>
<div className="space-y-4">
<div>
<label className="block text-sm text-slate-700 mb-1">Bild</label>
<input
type="file"
accept="image/*"
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-sm"
onChange={(e) => onSetTrainingImage(e.target.files?.[0] || null)}
/>
{trainingImage && (
<div className="mt-2 text-xs text-green-600">
Bild ausgewaehlt: {trainingImage.name}
</div>
)}
</div>
<div>
<label className="block text-sm text-slate-700 mb-1">Korrekter Text (Ground Truth)</label>
<textarea
className="w-full bg-slate-50 border border-slate-300 rounded-lg px-3 py-2 text-sm text-slate-900 resize-none"
rows={3}
placeholder="Gib hier den korrekten Text ein..."
value={trainingText}
onChange={(e) => onSetTrainingText(e.target.value)}
/>
</div>
<button
onClick={onAddExample}
className="w-full px-4 py-2 bg-purple-600 hover:bg-purple-700 text-white rounded-lg text-sm font-medium transition-colors"
>
+ Trainingsbeispiel hinzufuegen
</button>
</div>
</div>
)
}
function FineTuningCard({
settings,
fineTuning,
exampleCount,
hasLoraAdapter,
onFineTune,
}: {
settings: MagicSettings
fineTuning: boolean
exampleCount: number
hasLoraAdapter: boolean
onFineTune: () => void
}) {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Fine-Tuning starten</h2>
<p className="text-sm text-slate-500 mb-4">
Trainiere das Modell mit den gesammelten Beispielen. Der Prozess dauert
je nach Anzahl der Beispiele einige Minuten.
</p>
<div className="bg-slate-50 rounded-lg p-4 mb-4">
<div className="grid grid-cols-2 gap-4 text-sm">
<div>
<span className="text-slate-500">Epochen:</span>
<span className="text-slate-900 ml-2">{settings.epochs}</span>
</div>
<div>
<span className="text-slate-500">Learning Rate:</span>
<span className="text-slate-900 ml-2">{settings.learningRate}</span>
</div>
<div>
<span className="text-slate-500">LoRA Rank:</span>
<span className="text-slate-900 ml-2">{settings.loraRank}</span>
</div>
<div>
<span className="text-slate-500">Batch Size:</span>
<span className="text-slate-900 ml-2">{settings.batchSize}</span>
</div>
</div>
</div>
<button
onClick={onFineTune}
disabled={fineTuning || exampleCount < 10}
className="w-full px-4 py-2 bg-green-600 hover:bg-green-700 disabled:bg-slate-300 disabled:cursor-not-allowed text-white rounded-lg text-sm font-medium transition-colors"
>
{fineTuning ? (
<span className="flex items-center justify-center gap-2">
<SkeletonDots />
Fine-Tuning laeuft...
</span>
) : (
'Fine-Tuning starten'
)}
</button>
{exampleCount < 10 && (
<p className="text-xs text-yellow-600 mt-2 text-center">
Noch {10 - exampleCount} Beispiele benoetigt
</p>
)}
<Link
href="/ai/ocr-labeling?model=trocr-lora"
className="w-full mt-4 px-4 py-2 bg-teal-100 text-teal-700 border border-teal-300 rounded-lg hover:bg-teal-200 flex items-center justify-center gap-2 transition-colors"
>
<span>🏷</span>
Ground Truth in OCR-Labeling sammeln
</Link>
</div>
)
}
function ExamplesListCard({ examples }: { examples: TrainingExample[] }) {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<h2 className="text-lg font-semibold text-slate-900 mb-4">Trainingsbeispiele ({examples.length})</h2>
<div className="space-y-2 max-h-64 overflow-y-auto">
{examples.map((ex, i) => (
<div key={i} className="flex items-center gap-4 bg-slate-50 rounded-lg p-3">
<span className="text-slate-400 font-mono text-sm w-8">{i + 1}.</span>
<span className="text-slate-900 text-sm flex-1 truncate">{ex.ground_truth}</span>
<span className="text-slate-400 text-xs">{new Date(ex.created_at).toLocaleDateString('de-DE')}</span>
</div>
))}
</div>
</div>
)
}
function TrainingDashboardCard({
showDashboard,
onToggle,
}: {
showDashboard: boolean
onToggle: () => void
}) {
return (
<div className="bg-white rounded-xl shadow-sm border p-6">
<div className="flex items-center justify-between mb-4">
<div>
<h2 className="text-lg font-semibold text-slate-900">Training Dashboard</h2>
<p className="text-sm text-slate-500">Live-Metriken waehrend des Trainings</p>
</div>
<button
onClick={onToggle}
className={`px-4 py-2 rounded-lg text-sm font-medium transition-colors ${
showDashboard
? 'bg-red-600 hover:bg-red-700 text-white'
: 'bg-purple-600 hover:bg-purple-700 text-white'
}`}
>
{showDashboard ? 'Demo stoppen' : 'Demo starten'}
</button>
</div>
{showDashboard ? (
<TrainingMetrics
apiBase={API_BASE}
simulateMode={true}
onComplete={onToggle}
/>
) : (
<div className="bg-slate-50 rounded-lg p-8 text-center">
<div className="text-4xl mb-3">📈</div>
<div className="text-slate-600 mb-2">
Das Training Dashboard zeigt Echtzeit-Metriken waehrend des Fine-Tunings
</div>
<div className="text-sm text-slate-400">
Klicke &quot;Demo starten&quot; um eine simulierte Training-Session zu sehen
</div>
</div>
)}
</div>
)
}
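
The minimum of 10 training examples appears as a literal in several places in `TabTraining`: the progress calculation, the overview card, the disabled state of the fine-tuning button and the "Noch ... benoetigt" hint. A sketch that derives all of these from one constant is shown below. `MIN_TRAINING_EXAMPLES` and `trainingProgress` are hypothetical names, and the value 10 is taken from the component rather than from any backend contract.

```typescript
// Assumption: 10 is the minimum used throughout TabTraining.
const MIN_TRAINING_EXAMPLES = 10

interface TrainingProgress {
  progressPct: number // 0..100, capped once the minimum is reached
  remaining: number // examples still missing before fine-tuning is allowed
  canFineTune: boolean
}

function trainingProgress(exampleCount: number): TrainingProgress {
  const progressPct = Math.min(100, (exampleCount / MIN_TRAINING_EXAMPLES) * 100)
  const remaining = Math.max(0, MIN_TRAINING_EXAMPLES - exampleCount)
  return { progressPct, remaining, canFineTune: remaining === 0 }
}
```

The button's `disabled` prop would then read `fineTuning || !canFineTune`, and the hint would render `remaining` directly.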

@@ -0,0 +1,7 @@
export { GlobalDragOverlay, KeyboardShortcutsModal } from './GlobalOverlays'
export { TabOverview } from './TabOverview'
export { TabTest } from './TabTest'
export { TabBatch } from './TabBatch'
export { TabTraining } from './TabTraining'
export { TabArchitecture } from './TabArchitecture'
export { TabSettings } from './TabSettings'

File diff suppressed because it is too large


@@ -0,0 +1,71 @@
export type TabId = 'overview' | 'test' | 'batch' | 'training' | 'architecture' | 'settings'
export interface TrOCRStatus {
status: 'available' | 'not_installed' | 'error'
model_name?: string
model_id?: string
device?: string
is_loaded?: boolean
has_lora_adapter?: boolean
training_examples_count?: number
error?: string
install_command?: string
}
export interface OCRResult {
text: string
confidence: number
processing_time_ms: number
model: string
has_lora_adapter: boolean
char_confidences?: number[]
word_boxes?: Array<{ text: string; confidence: number; bbox: number[] }>
}
export interface TrainingExample {
image_path: string
ground_truth: string
teacher_id: string
created_at: string
}
export interface MagicSettings {
autoDetectLines: boolean
confidenceThreshold: number
maxImageSize: number
loraRank: number
loraAlpha: number
learningRate: number
epochs: number
batchSize: number
enableCache: boolean
cacheMaxAge: number
livePreview: boolean
soundFeedback: boolean
}
export const DEFAULT_SETTINGS: MagicSettings = {
autoDetectLines: true,
confidenceThreshold: 0.7,
maxImageSize: 4096,
loraRank: 8,
loraAlpha: 32,
learningRate: 0.00005,
epochs: 3,
batchSize: 4,
enableCache: true,
cacheMaxAge: 3600,
livePreview: true,
soundFeedback: false,
}
export const TABS = [
{ id: 'overview' as TabId, label: 'Uebersicht', icon: '\u{1F4CA}', shortcut: 'Alt+1' },
{ id: 'test' as TabId, label: 'OCR Test', icon: '\u{1F50D}', shortcut: 'Alt+2' },
{ id: 'batch' as TabId, label: 'Batch OCR', icon: '\u{1F4C1}', shortcut: 'Alt+3' },
{ id: 'training' as TabId, label: 'Training', icon: '\u{1F3AF}', shortcut: 'Alt+4' },
{ id: 'architecture' as TabId, label: 'Architektur', icon: '\u{1F3D7}\uFE0F', shortcut: 'Alt+5' },
{ id: 'settings' as TabId, label: 'Einstellungen', icon: '\u2699\uFE0F', shortcut: 'Alt+6' },
] as const
export const API_BASE = '/klausur-api'
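
The `TABS` list above carries the shortcut labels (`Alt+1` to `Alt+6`), while the hook's key handler keeps its own copy of the tab order. A minimal sketch of a single lookup both could share is shown here; `TAB_ORDER` and `tabForShortcut` are hypothetical names that do not appear in the commit.

```typescript
// Hypothetical helper, not part of the commit: map an Alt+digit key press to a tab id.
type TabId = 'overview' | 'test' | 'batch' | 'training' | 'architecture' | 'settings'

// Same order as the TABS constant, so Alt+1 selects the first entry.
const TAB_ORDER: readonly TabId[] = [
  'overview',
  'test',
  'batch',
  'training',
  'architecture',
  'settings',
]

function tabForShortcut(key: string, altKey: boolean): TabId | null {
  // Accept only a single digit that addresses an existing tab.
  if (!altKey || !/^[1-9]$/.test(key)) return null
  const index = Number(key) - 1
  return index < TAB_ORDER.length ? TAB_ORDER[index] : null
}
```

A key handler could then call `tabForShortcut(e.key, e.altKey)` and switch tabs only when the result is not `null`.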


@@ -0,0 +1,382 @@
'use client'
import { useState, useEffect, useCallback, useRef } from 'react'
import {
type TabId,
type TrOCRStatus,
type OCRResult,
type TrainingExample,
type MagicSettings,
DEFAULT_SETTINGS,
API_BASE,
} from './types'
function playSuccessSound() {
try {
const audioContext = new (window.AudioContext || (window as unknown as { webkitAudioContext: typeof AudioContext }).webkitAudioContext)()
const oscillator = audioContext.createOscillator()
const gainNode = audioContext.createGain()
oscillator.connect(gainNode)
gainNode.connect(audioContext.destination)
oscillator.frequency.value = 800
oscillator.type = 'sine'
gainNode.gain.setValueAtTime(0.1, audioContext.currentTime)
gainNode.gain.exponentialRampToValueAtTime(0.01, audioContext.currentTime + 0.2)
oscillator.start(audioContext.currentTime)
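oscillator.stop(audioContext.currentTime + 0.2)
// Close the context once the beep has finished so contexts do not accumulate
oscillator.onended = () => { void audioContext.close() }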
} catch {
// Audio not supported, ignore
}
}
export function useMagicHelp() {
const [activeTab, setActiveTab] = useState<TabId>('overview')
const [status, setStatus] = useState<TrOCRStatus | null>(null)
const [loading, setLoading] = useState(true)
const [ocrResult, setOcrResult] = useState<OCRResult | null>(null)
const [ocrLoading, setOcrLoading] = useState(false)
const [examples, setExamples] = useState<TrainingExample[]>([])
const [trainingImage, setTrainingImage] = useState<File | null>(null)
const [trainingText, setTrainingText] = useState('')
const [fineTuning, setFineTuning] = useState(false)
const [settings, setSettings] = useState<MagicSettings>(DEFAULT_SETTINGS)
const [settingsSaved, setSettingsSaved] = useState(false)
// Phase 1: New state for enhanced features
const [globalDragActive, setGlobalDragActive] = useState(false)
const [uploadedImage, setUploadedImage] = useState<File | null>(null)
const [imagePreview, setImagePreview] = useState<string | null>(null)
const [showShortcutHint, setShowShortcutHint] = useState(false)
const [showHeatmap, setShowHeatmap] = useState(false)
const [showTrainingDashboard, setShowTrainingDashboard] = useState(false)
const debounceTimer = useRef<NodeJS.Timeout | null>(null)
const dragCounter = useRef(0)
const fetchStatus = useCallback(async () => {
try {
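const res = await fetch(`${API_BASE}/api/klausur/trocr/status`)
// Treat non-2xx responses as failures so an error payload is not stored as status
if (!res.ok) throw new Error(`Status request failed: ${res.status}`)
const data = await res.json()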
setStatus(data)
} catch {
setStatus({ status: 'error', error: 'Failed to fetch status' })
} finally {
setLoading(false)
}
}, [])
const fetchExamples = useCallback(async () => {
try {
const res = await fetch(`${API_BASE}/api/klausur/trocr/training/examples`)
const data = await res.json()
setExamples(data.examples || [])
} catch (error) {
console.error('Failed to fetch examples:', error)
}
}, [])
// Phase 1: Live OCR with debounce
const triggerOCR = useCallback(async (file: File) => {
setOcrLoading(true)
setOcrResult(null)
const formData = new FormData()
formData.append('file', file)
try {
const res = await fetch(`${API_BASE}/api/klausur/trocr/extract?detect_lines=${settings.autoDetectLines}`, {
method: 'POST',
body: formData,
})
const data = await res.json()
if (data.text !== undefined) {
setOcrResult(data)
if (settings.soundFeedback && data.confidence > 0.7) {
playSuccessSound()
}
} else {
setOcrResult({ text: `Error: ${data.detail || 'Unknown error'}`, confidence: 0, processing_time_ms: 0, model: '', has_lora_adapter: false })
}
} catch (error) {
setOcrResult({ text: `Error: ${error}`, confidence: 0, processing_time_ms: 0, model: '', has_lora_adapter: false })
} finally {
setOcrLoading(false)
}
}, [settings.autoDetectLines, settings.soundFeedback])
// Handle file upload with live preview
const handleFileUpload = useCallback((file: File) => {
if (!file.type.startsWith('image/')) return
setUploadedImage(file)
const previewUrl = URL.createObjectURL(file)
setImagePreview(previewUrl)
setActiveTab('test')
if (settings.livePreview) {
if (debounceTimer.current) {
clearTimeout(debounceTimer.current)
}
debounceTimer.current = setTimeout(() => {
triggerOCR(file)
}, 500)
}
}, [settings.livePreview, triggerOCR])
const handleManualOCR = () => {
if (uploadedImage) {
triggerOCR(uploadedImage)
}
}
// Phase 1: Global Drag & Drop handler
useEffect(() => {
const handleDragEnter = (e: DragEvent) => {
e.preventDefault()
e.stopPropagation()
dragCounter.current++
if (e.dataTransfer?.types.includes('Files')) {
setGlobalDragActive(true)
}
}
const handleDragLeave = (e: DragEvent) => {
e.preventDefault()
e.stopPropagation()
dragCounter.current--
if (dragCounter.current === 0) {
setGlobalDragActive(false)
}
}
const handleDragOver = (e: DragEvent) => {
e.preventDefault()
e.stopPropagation()
}
const handleDrop = (e: DragEvent) => {
e.preventDefault()
e.stopPropagation()
dragCounter.current = 0
setGlobalDragActive(false)
const file = e.dataTransfer?.files[0]
if (file?.type.startsWith('image/')) {
handleFileUpload(file)
}
}
document.addEventListener('dragenter', handleDragEnter)
document.addEventListener('dragleave', handleDragLeave)
document.addEventListener('dragover', handleDragOver)
document.addEventListener('drop', handleDrop)
return () => {
document.removeEventListener('dragenter', handleDragEnter)
document.removeEventListener('dragleave', handleDragLeave)
document.removeEventListener('dragover', handleDragOver)
document.removeEventListener('drop', handleDrop)
}
}, [handleFileUpload])
// Phase 1: Clipboard paste handler (Ctrl+V)
useEffect(() => {
const handlePaste = async (e: ClipboardEvent) => {
const items = e.clipboardData?.items
if (!items) return
for (const item of items) {
if (item.type.startsWith('image/')) {
e.preventDefault()
const file = item.getAsFile()
if (file) {
handleFileUpload(file)
}
break
}
}
}
document.addEventListener('paste', handlePaste)
return () => document.removeEventListener('paste', handlePaste)
}, [handleFileUpload])
// Phase 1: Keyboard shortcuts
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
// Ctrl+Enter: run OCR on the uploaded image
if (e.ctrlKey && e.key === 'Enter' && uploadedImage) {
e.preventDefault()
// Call triggerOCR directly so the handler always sees the current settings
triggerOCR(uploadedImage)
}
// Alt+1..6: switch tabs
if (e.key >= '1' && e.key <= '6' && e.altKey) {
e.preventDefault()
const tabIndex = parseInt(e.key, 10) - 1
const tabIds: TabId[] = ['overview', 'test', 'batch', 'training', 'architecture', 'settings']
if (tabIds[tabIndex]) {
setActiveTab(tabIds[tabIndex])
}
}
// Escape: discard the uploaded image and its result
if (e.key === 'Escape' && uploadedImage) {
setUploadedImage(null)
setImagePreview(null)
setOcrResult(null)
}
// "?": toggle the shortcut overlay, except while typing in a form field
const isTyping = e.target instanceof HTMLInputElement || e.target instanceof HTMLTextAreaElement
if (e.key === '?' && !isTyping) {
setShowShortcutHint(prev => !prev)
}
}
document.addEventListener('keydown', handleKeyDown)
return () => document.removeEventListener('keydown', handleKeyDown)
}, [uploadedImage, triggerOCR])
// Initial data load + settings from localStorage
useEffect(() => {
fetchStatus()
fetchExamples()
const saved = localStorage.getItem('magic-help-settings')
if (saved) {
try {
setSettings({ ...DEFAULT_SETTINGS, ...JSON.parse(saved) })
} catch {
// ignore parse errors
}
}
}, [fetchStatus, fetchExamples])
// Cleanup preview URL
useEffect(() => {
return () => {
if (imagePreview) {
URL.revokeObjectURL(imagePreview)
}
}
}, [imagePreview])
const handleAddTrainingExample = async () => {
if (!trainingImage || !trainingText.trim()) {
alert('Please provide both an image and the correct text')
return
}
const formData = new FormData()
formData.append('file', trainingImage)
try {
const res = await fetch(`${API_BASE}/api/klausur/trocr/training/add?ground_truth=${encodeURIComponent(trainingText)}`, {
method: 'POST',
body: formData,
})
const data = await res.json()
if (data.example_id) {
alert(`Training example added! Total: ${data.total_examples}`)
setTrainingImage(null)
setTrainingText('')
fetchStatus()
fetchExamples()
} else {
alert(`Error: ${data.detail || 'Unknown error'}`)
}
} catch (error) {
alert(`Error: ${error}`)
}
}
const handleFineTune = async () => {
if (!confirm('Start fine-tuning? This may take several minutes.')) return
setFineTuning(true)
try {
const res = await fetch(`${API_BASE}/api/klausur/trocr/training/fine-tune`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
epochs: settings.epochs,
learning_rate: settings.learningRate,
lora_rank: settings.loraRank,
lora_alpha: settings.loraAlpha,
}),
})
const data = await res.json()
if (data.status === 'success') {
alert(`Fine-tuning successful!\nExamples used: ${data.examples_used}\nEpochs: ${data.epochs}`)
fetchStatus()
} else {
alert(`Fine-tuning failed: ${data.message}`)
}
} catch (error) {
alert(`Error: ${error}`)
} finally {
setFineTuning(false)
}
}
const saveSettings = () => {
localStorage.setItem('magic-help-settings', JSON.stringify(settings))
setSettingsSaved(true)
setTimeout(() => setSettingsSaved(false), 2000)
}
const clearUploadedImage = () => {
setUploadedImage(null)
setImagePreview(null)
setOcrResult(null)
}
const sendToTraining = () => {
if (uploadedImage && ocrResult) {
setTrainingImage(uploadedImage)
setTrainingText(ocrResult.text)
setActiveTab('training')
}
}
return {
// State
activeTab,
setActiveTab,
status,
loading,
ocrResult,
ocrLoading,
examples,
trainingImage,
setTrainingImage,
trainingText,
setTrainingText,
fineTuning,
settings,
setSettings,
settingsSaved,
globalDragActive,
uploadedImage,
imagePreview,
showShortcutHint,
setShowShortcutHint,
showHeatmap,
setShowHeatmap,
showTrainingDashboard,
setShowTrainingDashboard,
// Actions
fetchStatus,
handleFileUpload,
handleManualOCR,
handleAddTrainingExample,
handleFineTune,
saveSettings,
clearUploadedImage,
sendToTraining,
}
}
export type UseMagicHelpReturn = ReturnType<typeof useMagicHelp>
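
The hook restores settings with `{ ...DEFAULT_SETTINGS, ...JSON.parse(saved) }`, which also copies unknown keys and values of the wrong type from `localStorage`. The sketch below shows one way to keep only well-typed known keys. `mergeSettings` is a hypothetical helper, not part of the commit, and it assumes every setting is a primitive whose type can be compared with `typeof`.

```typescript
// Hypothetical helper, not part of the commit: merge persisted settings into defaults,
// accepting a stored value only if its type matches the default's type.
function mergeSettings<T extends object>(defaults: T, raw: string | null): T {
  const result = { ...defaults }
  if (!raw) return result

  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    // Corrupt JSON: fall back to the defaults
    return result
  }
  if (typeof parsed !== 'object' || parsed === null) return result

  const stored = parsed as Record<string, unknown>
  for (const key of Object.keys(defaults) as Array<keyof T & string>) {
    const value = stored[key]
    // Unknown keys are never visited; mistyped values are skipped here
    if (typeof value === typeof defaults[key]) {
      result[key] = value as T[keyof T & string]
    }
  }
  return result
}
```

In the hook this could replace the spread, for example `setSettings(mergeSettings(DEFAULT_SETTINGS, localStorage.getItem('magic-help-settings')))`.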


@@ -1,549 +0,0 @@
'use client'
/**
* Model Management Page
*
* Manage ML model backends (PyTorch vs ONNX), view status,
* run benchmarks, and configure inference settings.
*/
import { useState, useEffect, useCallback } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'
const KLAUSUR_API = '/klausur-api'
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
type BackendMode = 'auto' | 'pytorch' | 'onnx'
type ModelStatus = 'available' | 'not_found' | 'loading' | 'error'
type Tab = 'overview' | 'benchmarks' | 'configuration'
interface ModelInfo {
name: string
key: string
pytorch: { status: ModelStatus; size_mb: number; ram_mb: number }
onnx: { status: ModelStatus; size_mb: number; ram_mb: number; quantized: boolean }
}
interface BenchmarkRow {
model: string
backend: string
quantization: string
size_mb: number
ram_mb: number
inference_ms: number
load_time_s: number
}
interface StatusInfo {
active_backend: BackendMode
loaded_models: string[]
cache_hits: number
cache_misses: number
uptime_s: number
}
// ---------------------------------------------------------------------------
// Mock data (used when backend is not available)
// ---------------------------------------------------------------------------
const MOCK_MODELS: ModelInfo[] = [
{
name: 'TrOCR Printed',
key: 'trocr_printed',
pytorch: { status: 'available', size_mb: 892, ram_mb: 1800 },
onnx: { status: 'available', size_mb: 234, ram_mb: 620, quantized: true },
},
{
name: 'TrOCR Handwritten',
key: 'trocr_handwritten',
pytorch: { status: 'available', size_mb: 892, ram_mb: 1800 },
onnx: { status: 'not_found', size_mb: 0, ram_mb: 0, quantized: false },
},
{
name: 'PP-DocLayout',
key: 'pp_doclayout',
pytorch: { status: 'not_found', size_mb: 0, ram_mb: 0 },
onnx: { status: 'available', size_mb: 48, ram_mb: 180, quantized: false },
},
]
const MOCK_BENCHMARKS: BenchmarkRow[] = [
{ model: 'TrOCR Printed', backend: 'PyTorch', quantization: 'FP32', size_mb: 892, ram_mb: 1800, inference_ms: 142, load_time_s: 3.2 },
{ model: 'TrOCR Printed', backend: 'ONNX', quantization: 'INT8', size_mb: 234, ram_mb: 620, inference_ms: 38, load_time_s: 0.8 },
{ model: 'TrOCR Handwritten', backend: 'PyTorch', quantization: 'FP32', size_mb: 892, ram_mb: 1800, inference_ms: 156, load_time_s: 3.4 },
{ model: 'PP-DocLayout', backend: 'ONNX', quantization: 'FP32', size_mb: 48, ram_mb: 180, inference_ms: 22, load_time_s: 0.3 },
]
const MOCK_STATUS: StatusInfo = {
active_backend: 'auto',
loaded_models: ['trocr_printed (ONNX)', 'pp_doclayout (ONNX)'],
cache_hits: 1247,
cache_misses: 83,
uptime_s: 86400,
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function StatusBadge({ status }: { status: ModelStatus }) {
const cls =
status === 'available'
? 'bg-emerald-100 text-emerald-800 border-emerald-200'
: status === 'loading'
? 'bg-blue-100 text-blue-800 border-blue-200'
: status === 'not_found'
? 'bg-slate-100 text-slate-500 border-slate-200'
: 'bg-red-100 text-red-800 border-red-200'
const label =
status === 'available' ? 'Verfuegbar'
: status === 'loading' ? 'Laden...'
: status === 'not_found' ? 'Nicht vorhanden'
: 'Fehler'
return (
<span className={`inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium border ${cls}`}>
{label}
</span>
)
}
function formatBytes(mb: number) {
if (mb === 0) return '--'
if (mb >= 1000) return `${(mb / 1000).toFixed(1)} GB`
return `${mb} MB`
}
function formatUptime(seconds: number) {
const h = Math.floor(seconds / 3600)
const m = Math.floor((seconds % 3600) / 60)
if (h > 0) return `${h}h ${m}m`
return `${m}m`
}
// ---------------------------------------------------------------------------
// Component
// ---------------------------------------------------------------------------
export default function ModelManagementPage() {
const [tab, setTab] = useState<Tab>('overview')
const [models, setModels] = useState<ModelInfo[]>(MOCK_MODELS)
const [benchmarks, setBenchmarks] = useState<BenchmarkRow[]>(MOCK_BENCHMARKS)
const [status, setStatus] = useState<StatusInfo>(MOCK_STATUS)
const [backend, setBackend] = useState<BackendMode>('auto')
const [saving, setSaving] = useState(false)
const [benchmarkRunning, setBenchmarkRunning] = useState(false)
const [usingMock, setUsingMock] = useState(false)
// Load status
const loadStatus = useCallback(async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/models/status`)
if (res.ok) {
const data = await res.json()
setStatus(data)
setBackend(data.active_backend || 'auto')
setUsingMock(false)
} else {
setUsingMock(true)
}
} catch {
setUsingMock(true)
}
}, [])
// Load models
const loadModels = useCallback(async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/models`)
if (res.ok) {
const data = await res.json()
if (data.models?.length) setModels(data.models)
}
} catch {
// Keep mock data
}
}, [])
// Load benchmarks
const loadBenchmarks = useCallback(async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/models/benchmarks`)
if (res.ok) {
const data = await res.json()
if (data.benchmarks?.length) setBenchmarks(data.benchmarks)
}
} catch {
// Keep mock data
}
}, [])
useEffect(() => {
loadStatus()
loadModels()
loadBenchmarks()
}, [loadStatus, loadModels, loadBenchmarks])
// Save backend preference
const saveBackend = async (mode: BackendMode) => {
setBackend(mode)
setSaving(true)
try {
await fetch(`${KLAUSUR_API}/api/v1/models/backend`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ backend: mode }),
})
await loadStatus()
} catch {
// Silently handle — mock mode
} finally {
setSaving(false)
}
}
// Run benchmark
const runBenchmark = async () => {
setBenchmarkRunning(true)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/models/benchmark`, {
method: 'POST',
})
if (res.ok) {
const data = await res.json()
if (data.benchmarks?.length) setBenchmarks(data.benchmarks)
}
await loadBenchmarks()
} catch {
// Keep existing data
} finally {
setBenchmarkRunning(false)
}
}
const tabs: { key: Tab; label: string }[] = [
{ key: 'overview', label: 'Uebersicht' },
{ key: 'benchmarks', label: 'Benchmarks' },
{ key: 'configuration', label: 'Konfiguration' },
]
return (
<div className="space-y-6">
<div className="max-w-7xl mx-auto p-6 space-y-6">
<PagePurpose
title="Model Management"
purpose="Verwaltung der ML-Modelle fuer OCR und Layout-Erkennung. Vergleich von PyTorch- und ONNX-Backends, Benchmark-Tests und Backend-Konfiguration."
audience={['Entwickler', 'DevOps']}
defaultCollapsed
architecture={{
services: ['klausur-service (FastAPI, Port 8086)'],
databases: ['Dateisystem (Modell-Dateien)'],
}}
relatedPages={[
{ name: 'OCR Pipeline', href: '/ai/ocr-pipeline', description: 'OCR-Pipeline ausfuehren' },
{ name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'OCR-Methoden vergleichen' },
{ name: 'GPU Infrastruktur', href: '/ai/gpu', description: 'GPU-Ressourcen verwalten' },
]}
/>
{/* Header */}
<div className="flex items-center justify-between">
<div>
<h1 className="text-2xl font-bold text-slate-900">Model Management</h1>
<p className="text-sm text-slate-500 mt-1">
{models.length} Modelle konfiguriert
{usingMock && (
<span className="ml-2 text-xs bg-amber-100 text-amber-700 px-1.5 py-0.5 rounded">
Mock-Daten (Backend nicht erreichbar)
</span>
)}
</p>
</div>
</div>
{/* Status Cards */}
<div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-4">
<div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
<p className="text-xs text-slate-500 uppercase font-medium">Aktives Backend</p>
<p className="text-lg font-semibold text-slate-900 mt-1">{status.active_backend.toUpperCase()}</p>
</div>
<div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
<p className="text-xs text-slate-500 uppercase font-medium">Geladene Modelle</p>
<p className="text-lg font-semibold text-slate-900 mt-1">{status.loaded_models.length}</p>
</div>
<div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
<p className="text-xs text-slate-500 uppercase font-medium">Cache Hit-Rate</p>
<p className="text-lg font-semibold text-slate-900 mt-1">
{status.cache_hits + status.cache_misses > 0
? `${((status.cache_hits / (status.cache_hits + status.cache_misses)) * 100).toFixed(1)}%`
: '--'}
</p>
</div>
<div className="bg-white rounded-lg border border-slate-200 px-4 py-3">
<p className="text-xs text-slate-500 uppercase font-medium">Uptime</p>
<p className="text-lg font-semibold text-slate-900 mt-1">{formatUptime(status.uptime_s)}</p>
</div>
</div>
{/* Tabs */}
<div className="border-b border-slate-200">
<nav className="flex gap-4">
{tabs.map(t => (
<button
key={t.key}
onClick={() => setTab(t.key)}
className={`pb-3 px-1 text-sm font-medium border-b-2 transition-colors ${
tab === t.key
? 'border-teal-500 text-teal-600'
: 'border-transparent text-slate-500 hover:text-slate-700'
}`}
>
{t.label}
</button>
))}
</nav>
</div>
{/* Overview Tab */}
{tab === 'overview' && (
<div className="space-y-4">
<h3 className="text-sm font-medium text-slate-700">Verfuegbare Modelle</h3>
<div className="grid gap-4 sm:grid-cols-2 lg:grid-cols-3">
{models.map(m => (
<div key={m.key} className="bg-white rounded-lg border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b border-slate-100">
<h4 className="font-semibold text-slate-900">{m.name}</h4>
<p className="text-xs text-slate-400 mt-0.5 font-mono">{m.key}</p>
</div>
<div className="px-4 py-3 space-y-3">
{/* PyTorch */}
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<span className="text-xs font-medium text-slate-600 w-16">PyTorch</span>
<StatusBadge status={m.pytorch.status} />
</div>
{m.pytorch.status === 'available' && (
<span className="text-xs text-slate-400">
{formatBytes(m.pytorch.size_mb)} / {formatBytes(m.pytorch.ram_mb)} RAM
</span>
)}
</div>
{/* ONNX */}
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<span className="text-xs font-medium text-slate-600 w-16">ONNX</span>
<StatusBadge status={m.onnx.status} />
</div>
{m.onnx.status === 'available' && (
<span className="text-xs text-slate-400">
{formatBytes(m.onnx.size_mb)} / {formatBytes(m.onnx.ram_mb)} RAM
{m.onnx.quantized && (
<span className="ml-1 text-xs bg-violet-100 text-violet-700 px-1 rounded">INT8</span>
)}
</span>
)}
</div>
</div>
</div>
))}
</div>
{/* Loaded Models List */}
{status.loaded_models.length > 0 && (
<div>
<h3 className="text-sm font-medium text-slate-700 mb-2">Aktuell geladen</h3>
<div className="flex flex-wrap gap-2">
{status.loaded_models.map((m, i) => (
<span key={i} className="inline-flex items-center px-3 py-1 rounded-full text-sm bg-teal-50 text-teal-700 border border-teal-200">
{m}
</span>
))}
</div>
</div>
)}
</div>
)}
{/* Benchmarks Tab */}
{tab === 'benchmarks' && (
<div className="space-y-4">
<div className="flex items-center justify-between">
<h3 className="text-sm font-medium text-slate-700">PyTorch vs ONNX Vergleich</h3>
<button
onClick={runBenchmark}
disabled={benchmarkRunning}
className="inline-flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium transition-colors"
>
{benchmarkRunning ? (
<>
<svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
Benchmark laeuft...
</>
) : (
'Benchmark starten'
)}
</button>
</div>
<div className="bg-white rounded-lg border border-slate-200 overflow-hidden">
<div className="overflow-x-auto">
<table className="w-full text-sm">
<thead>
<tr className="border-b border-slate-200 bg-slate-50 text-left text-slate-500">
<th className="px-4 py-3 font-medium">Modell</th>
<th className="px-4 py-3 font-medium">Backend</th>
<th className="px-4 py-3 font-medium">Quantisierung</th>
<th className="px-4 py-3 font-medium text-right">Groesse</th>
<th className="px-4 py-3 font-medium text-right">RAM</th>
<th className="px-4 py-3 font-medium text-right">Inferenz</th>
<th className="px-4 py-3 font-medium text-right">Ladezeit</th>
</tr>
</thead>
<tbody>
{benchmarks.map((b, i) => (
<tr key={i} className="border-b border-slate-100 hover:bg-slate-50">
<td className="px-4 py-3 font-medium text-slate-900">{b.model}</td>
<td className="px-4 py-3">
<span className={`inline-flex items-center px-2 py-0.5 rounded text-xs font-medium ${
b.backend === 'ONNX'
? 'bg-violet-100 text-violet-700'
: 'bg-orange-100 text-orange-700'
}`}>
{b.backend}
</span>
</td>
<td className="px-4 py-3 text-slate-600">{b.quantization}</td>
<td className="px-4 py-3 text-right text-slate-600">{formatBytes(b.size_mb)}</td>
<td className="px-4 py-3 text-right text-slate-600">{formatBytes(b.ram_mb)}</td>
<td className="px-4 py-3 text-right">
<span className={`font-mono ${b.inference_ms < 50 ? 'text-emerald-600' : b.inference_ms < 100 ? 'text-amber-600' : 'text-red-600'}`}>
{b.inference_ms} ms
</span>
</td>
<td className="px-4 py-3 text-right text-slate-500">{b.load_time_s.toFixed(1)}s</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
{benchmarks.length === 0 && (
<div className="text-center py-12 text-slate-400">
<p className="text-lg">Keine Benchmark-Daten</p>
<p className="text-sm mt-1">Klicken Sie &quot;Benchmark starten&quot; um einen Vergleich durchzufuehren.</p>
</div>
)}
</div>
)}
{/* Configuration Tab */}
{tab === 'configuration' && (
<div className="space-y-6">
{/* Backend Selector */}
<div className="bg-white rounded-lg border border-slate-200 p-5">
<h3 className="text-sm font-semibold text-slate-900 mb-1">Inference Backend</h3>
<p className="text-sm text-slate-500 mb-4">
Waehlen Sie welches Backend fuer die Modell-Inferenz verwendet werden soll.
</p>
<div className="space-y-3">
{([
{
mode: 'auto' as const,
label: 'Auto',
desc: 'ONNX wenn verfuegbar, Fallback auf PyTorch.',
},
{
mode: 'pytorch' as const,
label: 'PyTorch',
desc: 'Immer PyTorch verwenden. Hoeherer RAM-Verbrauch, volle Flexibilitaet.',
},
{
mode: 'onnx' as const,
label: 'ONNX',
desc: 'Immer ONNX verwenden. Schneller und weniger RAM, Fehler wenn nicht vorhanden.',
},
] as const).map(opt => (
<label
key={opt.mode}
className={`flex items-start gap-3 p-3 rounded-lg border cursor-pointer transition-colors ${
backend === opt.mode
? 'border-teal-300 bg-teal-50'
: 'border-slate-200 hover:bg-slate-50'
}`}
>
<input
type="radio"
name="backend"
value={opt.mode}
checked={backend === opt.mode}
onChange={() => saveBackend(opt.mode)}
disabled={saving}
className="mt-1 text-teal-600 focus:ring-teal-500"
/>
<div>
<span className="font-medium text-slate-900">{opt.label}</span>
<p className="text-sm text-slate-500 mt-0.5">{opt.desc}</p>
</div>
</label>
))}
</div>
{saving && (
<p className="text-xs text-teal-600 mt-3">Speichere...</p>
)}
</div>
{/* Model Details Table */}
<div className="bg-white rounded-lg border border-slate-200 p-5">
<h3 className="text-sm font-semibold text-slate-900 mb-4">Modell-Details</h3>
<div className="overflow-x-auto">
<table className="w-full text-sm">
<thead>
<tr className="border-b border-slate-200 text-left text-slate-500">
<th className="pb-2 font-medium">Modell</th>
<th className="pb-2 font-medium">PyTorch</th>
<th className="pb-2 font-medium text-right">Groesse (PT)</th>
<th className="pb-2 font-medium">ONNX</th>
<th className="pb-2 font-medium text-right">Groesse (ONNX)</th>
<th className="pb-2 font-medium text-right">Einsparung</th>
</tr>
</thead>
<tbody>
{models.map(m => {
const ptAvail = m.pytorch.status === 'available'
const oxAvail = m.onnx.status === 'available'
const savings = ptAvail && oxAvail && m.pytorch.size_mb > 0
? Math.round((1 - m.onnx.size_mb / m.pytorch.size_mb) * 100)
: null
return (
<tr key={m.key} className="border-b border-slate-100">
<td className="py-2.5 font-medium text-slate-900">{m.name}</td>
<td className="py-2.5"><StatusBadge status={m.pytorch.status} /></td>
<td className="py-2.5 text-right text-slate-500">{ptAvail ? formatBytes(m.pytorch.size_mb) : '--'}</td>
<td className="py-2.5"><StatusBadge status={m.onnx.status} /></td>
<td className="py-2.5 text-right text-slate-500">{oxAvail ? formatBytes(m.onnx.size_mb) : '--'}</td>
<td className="py-2.5 text-right">
{savings !== null ? (
<span className="text-emerald-600 font-medium">-{savings}%</span>
) : (
<span className="text-slate-300">--</span>
)}
</td>
</tr>
)
})}
</tbody>
</table>
</div>
</div>
</div>
)}
</div>
</div>
)
}
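
The page removed in this hunk computes the cache hit rate and the ONNX size saving inline in JSX. The sketch below pulls both formulas into pure functions and runs them against the mock values from the same file. `hitRate` and `sizeSavings` are hypothetical names, not part of the commit.

```typescript
// Hypothetical helpers, not part of the commit: the two inline formulas as pure functions.

// Cache hit rate as a percentage string, or '--' when nothing has been requested yet.
function hitRate(hits: number, misses: number): string {
  const total = hits + misses
  return total > 0 ? `${((hits / total) * 100).toFixed(1)}%` : '--'
}

// Size reduction of the ONNX file relative to the PyTorch file, in whole percent.
// Returns null when there is no PyTorch size to compare against.
function sizeSavings(pytorchMb: number, onnxMb: number): number | null {
  return pytorchMb > 0 ? Math.round((1 - onnxMb / pytorchMb) * 100) : null
}

// Mock values from MOCK_STATUS and the 'TrOCR Printed' entry in MOCK_MODELS
console.log(hitRate(1247, 83))      // 1247 / 1330 → "93.8%"
console.log(sizeSavings(892, 234))  // 1 - 234/892 ≈ 0.738 → 74
```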

File diff suppressed because it is too large


@@ -2,7 +2,6 @@
 import { Suspense } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
-import { BoxSessionTabs } from '@/components/ocr-pipeline/BoxSessionTabs'
 import { KombiStepper } from '@/components/ocr-kombi/KombiStepper'
 import { SessionList } from '@/components/ocr-kombi/SessionList'
 import { SessionHeader } from '@/components/ocr-kombi/SessionHeader'
@@ -16,6 +15,9 @@ import { StepOcr } from '@/components/ocr-kombi/StepOcr'
 import { StepStructure } from '@/components/ocr-kombi/StepStructure'
 import { StepGridBuild } from '@/components/ocr-kombi/StepGridBuild'
 import { StepGridReview } from '@/components/ocr-kombi/StepGridReview'
+import { StepGutterRepair } from '@/components/ocr-kombi/StepGutterRepair'
+import { StepBoxGridReview } from '@/components/ocr-kombi/StepBoxGridReview'
+import { StepAnsicht } from '@/components/ocr-kombi/StepAnsicht'
 import { StepGroundTruth } from '@/components/ocr-kombi/StepGroundTruth'
 import { useKombiPipeline } from './useKombiPipeline'
@@ -27,8 +29,7 @@ function OcrKombiContent() {
 loadingSessions,
 activeCategory,
 isGroundTruth,
-subSessions,
-parentSessionId,
+pageNumber,
 steps,
 gridSaveRef,
 groupedSessions,
@@ -40,11 +41,8 @@ function OcrKombiContent() {
 deleteSession,
 renameSession,
 updateCategory,
-handleSessionChange,
 setSessionId,
 setSessionName,
-setSubSessions,
-setParentSessionId,
 setIsGroundTruth,
 } = useKombiPipeline()
@@ -75,17 +73,11 @@ function OcrKombiContent() {
 <StepPageSplit
 sessionId={sessionId}
 sessionName={sessionName}
-onNext={() => {
-// If sub-sessions were created, switch to the first one
-if (subSessions.length > 0) {
-setSessionId(subSessions[0].id)
-setSessionName(subSessions[0].name)
-}
-handleNext()
-}}
-onSubSessionsCreated={(subs) => {
-setSubSessions(subs)
-if (sessionId) setParentSessionId(sessionId)
+onNext={handleNext}
+onSplitComplete={(childId, childName) => {
+// Switch to the first child session and refresh the list
+setSessionId(childId)
+setSessionName(childName)
 loadSessions()
 }}
 />
@@ -105,6 +97,12 @@ function OcrKombiContent() {
 case 9:
 return <StepGridReview sessionId={sessionId} onNext={handleNext} saveRef={gridSaveRef} />
 case 10:
+return <StepGutterRepair sessionId={sessionId} onNext={handleNext} />
+case 11:
+return <StepBoxGridReview sessionId={sessionId} onNext={handleNext} />
+case 12:
+return <StepAnsicht sessionId={sessionId} onNext={handleNext} />
+case 13:
 return (
 <StepGroundTruth
 sessionId={sessionId}
@@ -129,7 +127,6 @@ function OcrKombiContent() {
 databases: ['PostgreSQL Sessions'],
 }}
 relatedPages={[
-{ name: 'OCR Overlay (Legacy)', href: '/ai/ocr-overlay', description: 'Alter 3-Modi-Monolith' },
 { name: 'OCR Regression', href: '/ai/ocr-regression', description: 'Regressionstests' },
 ]}
 defaultCollapsed
@@ -151,6 +148,7 @@ function OcrKombiContent() {
 sessionName={sessionName}
 activeCategory={activeCategory}
 isGroundTruth={isGroundTruth}
+pageNumber={pageNumber}
 onUpdateCategory={(cat) => updateCategory(sessionId, cat)}
 />
 )}
@@ -161,15 +159,6 @@ function OcrKombiContent() {
 onStepClick={handleStepClick}
 />
-{subSessions.length > 0 && parentSessionId && sessionId && (
-<BoxSessionTabs
-parentSessionId={parentSessionId}
-subSessions={subSessions}
-activeSessionId={sessionId}
-onSessionChange={handleSessionChange}
-/>
-)}
 <div className="min-h-[400px]">{renderStep()}</div>
 </div>
 )


@@ -1,34 +1,210 @@
import type { PipelineStep, PipelineStepStatus, DocumentCategory } from '../ocr-pipeline/types' // OCR Pipeline Types — migrated from deleted ocr-pipeline/types.ts
// Re-export shared types export type PipelineStepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'
export type { PipelineStep, PipelineStepStatus, DocumentCategory }
export { DOCUMENT_CATEGORIES } from '../ocr-pipeline/types'
// Re-export grid/structure types used by later steps export interface PipelineStep {
export type { id: string
SessionListItem, name: string
SessionInfo, icon: string
SubSession, status: PipelineStepStatus
OrientationResult, }
CropResult,
DeskewResult, export type DocumentCategory =
DewarpResult, | 'vokabelseite' | 'woerterbuch' | 'buchseite' | 'arbeitsblatt' | 'klausurseite'
GridResult, | 'mathearbeit' | 'statistik' | 'zeitung' | 'formular' | 'handschrift' | 'sonstiges'
GridCell,
OcrWordBox, export const DOCUMENT_CATEGORIES: { value: DocumentCategory; label: string; icon: string }[] = [
WordBbox, { value: 'vokabelseite', label: 'Vokabelseite', icon: '📖' },
ColumnMeta, { value: 'woerterbuch', label: 'Woerterbuch', icon: '📕' },
StructureResult, { value: 'buchseite', label: 'Buchseite', icon: '📚' },
StructureBox, { value: 'arbeitsblatt', label: 'Arbeitsblatt', icon: '📝' },
StructureZone, { value: 'klausurseite', label: 'Klausurseite', icon: '📄' },
StructureGraphic, { value: 'mathearbeit', label: 'Mathearbeit', icon: '🔢' },
ExcludeRegion, { value: 'statistik', label: 'Statistik', icon: '📊' },
} from '../ocr-pipeline/types' { value: 'zeitung', label: 'Zeitung', icon: '📰' },
{ value: 'formular', label: 'Formular', icon: '📋' },
{ value: 'handschrift', label: 'Handschrift', icon: '✍️' },
{ value: 'sonstiges', label: 'Sonstiges', icon: '📎' },
]
export interface SessionListItem {
id: string
name: string
filename: string
status: string
current_step: number
document_category?: DocumentCategory
doc_type?: string
parent_session_id?: string
document_group_id?: string
page_number?: number
is_ground_truth?: boolean
created_at: string
updated_at?: string
}
export interface SubSession {
id: string
name: string
box_index: number
current_step?: number
status?: string
}
export interface OrientationResult {
orientation_degrees: number
corrected: boolean
duration_seconds: number
}
export interface CropResult {
crop_applied: boolean
crop_rect?: { x: number; y: number; width: number; height: number }
crop_rect_pct?: { x: number; y: number; width: number; height: number }
original_size: { width: number; height: number }
cropped_size: { width: number; height: number }
detected_format?: string
format_confidence?: number
aspect_ratio?: number
border_fractions?: { top: number; bottom: number; left: number; right: number }
skipped?: boolean
duration_seconds?: number
}
export interface DeskewResult {
session_id: string
angle_hough: number
angle_word_alignment: number
angle_iterative?: number
angle_residual?: number
angle_textline?: number
angle_applied: number
method_used: 'hough' | 'word_alignment' | 'manual' | 'iterative' | 'two_pass' | 'three_pass' | 'manual_combined'
confidence: number
duration_seconds: number
deskewed_image_url: string
binarized_image_url: string
}
export interface DewarpDetection {
method: string
shear_degrees: number
confidence: number
}
export interface DewarpResult {
session_id: string
method_used: string
shear_degrees: number
confidence: number
duration_seconds: number
dewarped_image_url: string
detections?: DewarpDetection[]
}
export interface SessionInfo {
session_id: string
filename: string
name?: string
image_width: number
image_height: number
original_image_url: string
current_step?: number
document_category?: DocumentCategory
doc_type?: string
orientation_result?: OrientationResult
crop_result?: CropResult
deskew_result?: DeskewResult
dewarp_result?: DewarpResult
sub_sessions?: SubSession[]
parent_session_id?: string
box_index?: number
document_group_id?: string
page_number?: number
}
export interface StructureGraphic {
x: number; y: number; w: number; h: number
area: number; shape: string; color_name: string; color_hex: string; confidence: number
}
export interface ExcludeRegion {
x: number; y: number; w: number; h: number; label?: string
}
export interface StructureBox {
x: number; y: number; w: number; h: number
confidence: number; border_thickness: number
bg_color_name?: string; bg_color_hex?: string
}
export interface StructureZone {
index: number; zone_type: 'content' | 'box'
x: number; y: number; w: number; h: number
}
export interface DocLayoutRegion {
x: number; y: number; w: number; h: number
class_name: string; confidence: number
}
export interface StructureResult {
image_width: number; image_height: number
content_bounds: { x: number; y: number; w: number; h: number }
boxes: StructureBox[]; zones: StructureZone[]
graphics: StructureGraphic[]; exclude_regions?: ExcludeRegion[]
color_pixel_counts: Record<string, number>
has_words: boolean; word_count: number
border_ghosts_removed?: number; duration_seconds: number
layout_regions?: DocLayoutRegion[]
detection_method?: 'opencv' | 'ppdoclayout'
}
export interface WordBbox { x: number; y: number; w: number; h: number }
export interface OcrWordBox {
text: string; left: number; top: number; width: number; height: number; conf: number
color?: string; color_name?: string; recovered?: boolean
}
export interface ColumnMeta { index: number; type: string; x: number; width: number }
export interface GridCell {
cell_id: string; row_index: number; col_index: number; col_type: string
text: string; confidence: number; bbox_px: WordBbox; bbox_pct: WordBbox
ocr_engine?: string; is_bold?: boolean
status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
word_boxes?: OcrWordBox[]
}
export interface WordEntry {
row_index: number; english: string; german: string; example: string
source_page?: string; marker?: string; confidence: number
bbox: WordBbox; bbox_en: WordBbox | null; bbox_de: WordBbox | null; bbox_ex: WordBbox | null
bbox_ref?: WordBbox | null; bbox_marker?: WordBbox | null
status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
}
export interface GridResult {
cells: GridCell[]
grid_shape: { rows: number; cols: number; total_cells: number }
columns_used: ColumnMeta[]
layout: 'vocab' | 'generic'
image_width: number; image_height: number; duration_seconds: number
ocr_engine?: string; vocab_entries?: WordEntry[]; entries?: WordEntry[]; entry_count?: number
summary: {
total_cells: number; non_empty_cells: number; low_confidence: number
total_entries?: number; with_english?: number; with_german?: number
}
llm_review?: {
changes: { row_index: number; field: string; old: string; new: string }[]
model_used: string; duration_ms: number; entries_corrected: number
applied_count?: number; applied_at?: string
}
}
// --- Kombi V2 Pipeline ---
/**
* 11-step Kombi V2 pipeline.
* Each step has its own component file in components/ocr-kombi/.
*/
export const KOMBI_V2_STEPS: PipelineStep[] = [ export const KOMBI_V2_STEPS: PipelineStep[] = [
{ id: 'upload', name: 'Upload', icon: '📤', status: 'pending' }, { id: 'upload', name: 'Upload', icon: '📤', status: 'pending' },
{ id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' }, { id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
@@ -40,67 +216,43 @@ export const KOMBI_V2_STEPS: PipelineStep[] = [
{ id: 'structure', name: 'Strukturerkennung', icon: '🔍', status: 'pending' }, { id: 'structure', name: 'Strukturerkennung', icon: '🔍', status: 'pending' },
{ id: 'grid-build', name: 'Grid-Aufbau', icon: '🧱', status: 'pending' }, { id: 'grid-build', name: 'Grid-Aufbau', icon: '🧱', status: 'pending' },
{ id: 'grid-review', name: 'Grid-Review', icon: '📊', status: 'pending' }, { id: 'grid-review', name: 'Grid-Review', icon: '📊', status: 'pending' },
{ id: 'gutter-repair', name: 'Wortkorrektur', icon: '🩹', status: 'pending' },
{ id: 'box-review', name: 'Box-Review', icon: '📦', status: 'pending' },
{ id: 'ansicht', name: 'Ansicht', icon: '👁️', status: 'pending' },
{ id: 'ground-truth', name: 'Ground Truth', icon: '✅', status: 'pending' }, { id: 'ground-truth', name: 'Ground Truth', icon: '✅', status: 'pending' },
] ]
/** Map from Kombi V2 UI step index to DB step number */
export const KOMBI_V2_UI_TO_DB: Record<number, number> = { export const KOMBI_V2_UI_TO_DB: Record<number, number> = {
0: 1, // upload 0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 8, 7: 9, 8: 10, 9: 11, 10: 11, 11: 11, 12: 11, 13: 12,
1: 2, // orientation
2: 2, // page-split (same DB step as orientation)
3: 3, // deskew
4: 4, // dewarp
5: 5, // content-crop
6: 8, // ocr (word_result)
7: 9, // structure
8: 10, // grid-build
9: 11, // grid-review
10: 12, // ground-truth
} }
/** Map from DB step to Kombi V2 UI step index */
export function dbStepToKombiV2Ui(dbStep: number): number { export function dbStepToKombiV2Ui(dbStep: number): number {
if (dbStep <= 1) return 0 // upload if (dbStep <= 1) return 0
if (dbStep === 2) return 1 // orientation if (dbStep === 2) return 1
if (dbStep === 3) return 3 // deskew if (dbStep === 3) return 3
if (dbStep === 4) return 4 // dewarp if (dbStep === 4) return 4
if (dbStep === 5) return 5 // content-crop if (dbStep === 5) return 5
if (dbStep <= 8) return 6 // ocr if (dbStep <= 8) return 6
if (dbStep === 9) return 7 // structure if (dbStep === 9) return 7
if (dbStep === 10) return 8 // grid-build if (dbStep === 10) return 8
if (dbStep === 11) return 9 // grid-review if (dbStep === 11) return 9
return 10 // ground-truth return 13
} }
/** Document group: groups multiple sessions from a multi-page upload */
export interface DocumentGroup { export interface DocumentGroup {
group_id: string group_id: string; title: string; page_count: number; sessions: DocumentGroupSession[]
title: string
page_count: number
sessions: DocumentGroupSession[]
} }
export interface DocumentGroupSession { export interface DocumentGroupSession {
id: string id: string; name: string; page_number: number; current_step: number
name: string status: string; document_category?: DocumentCategory; created_at: string
page_number: number
current_step: number
status: string
document_category?: DocumentCategory
created_at: string
} }
/** Engine source for OCR transparency */
export type OcrEngineSource = 'both' | 'paddle_only' | 'tesseract_only' | 'conflict_paddle' | 'conflict_tesseract' export type OcrEngineSource = 'both' | 'paddle_only' | 'tesseract_only' | 'conflict_paddle' | 'conflict_tesseract'
export interface OcrTransparentWord { export interface OcrTransparentWord {
text: string text: string; left: number; top: number; width: number; height: number
left: number conf: number; engine_source: OcrEngineSource
top: number
width: number
height: number
conf: number
engine_source: OcrEngineSource
} }
export interface OcrTransparentResult { export interface OcrTransparentResult {
@@ -108,11 +260,7 @@ export interface OcrTransparentResult {
raw_paddle: { words: OcrTransparentWord[] } raw_paddle: { words: OcrTransparentWord[] }
merged: { words: OcrTransparentWord[] } merged: { words: OcrTransparentWord[] }
stats: { stats: {
total_words: number total_words: number; both_agree: number; paddle_only: number
both_agree: number tesseract_only: number; conflict_paddle_wins: number; conflict_tesseract_wins: number
paddle_only: number
tesseract_only: number
conflict_paddle_wins: number
conflict_tesseract_wins: number
} }
} }
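
Review note on the step mapping: the new `KOMBI_V2_UI_TO_DB` sends UI steps 9 through 12 to the same DB step (11), while `dbStepToKombiV2Ui` maps DB step 11 back to UI step 9 only. The sketch below copies both definitions from the new side of the diff and adds a hypothetical helper, `lossyUiSteps` (not part of the commit), that lists the UI steps which cannot be recovered from `current_step` alone.

```typescript
// Copied from the new side of the diff.
const KOMBI_V2_UI_TO_DB: Record<number, number> = {
  0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5, 6: 8, 7: 9, 8: 10, 9: 11, 10: 11, 11: 11, 12: 11, 13: 12,
}

// Copied from the new side of the diff.
function dbStepToKombiV2Ui(dbStep: number): number {
  if (dbStep <= 1) return 0
  if (dbStep === 2) return 1
  if (dbStep === 3) return 3
  if (dbStep === 4) return 4
  if (dbStep === 5) return 5
  if (dbStep <= 8) return 6
  if (dbStep === 9) return 7
  if (dbStep === 10) return 8
  if (dbStep === 11) return 9
  return 13
}

// Hypothetical helper (assumption, for illustration only):
// UI steps that do not survive a UI -> DB -> UI round trip.
function lossyUiSteps(): number[] {
  return Object.entries(KOMBI_V2_UI_TO_DB)
    .filter(([ui, db]) => dbStepToKombiV2Ui(db) !== Number(ui))
    .map(([ui]) => Number(ui))
}
```

This is presumably why `openSession` inspects result fields such as `grid_editor_result` and `ground_truth.gutter_repair` instead of relying on the DB step number for the later steps.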


@@ -2,9 +2,8 @@
import { useCallback, useEffect, useState, useRef } from 'react' import { useCallback, useEffect, useState, useRef } from 'react'
import { useSearchParams } from 'next/navigation' import { useSearchParams } from 'next/navigation'
import type { PipelineStep, DocumentCategory } from './types' import type { PipelineStep, DocumentCategory, SessionListItem } from './types'
import { KOMBI_V2_STEPS, dbStepToKombiV2Ui } from './types' import { KOMBI_V2_STEPS, dbStepToKombiV2Ui } from './types'
import type { SubSession, SessionListItem } from '../ocr-pipeline/types'
export type { SessionListItem } export type { SessionListItem }
@@ -33,8 +32,7 @@ export function useKombiPipeline() {
const [loadingSessions, setLoadingSessions] = useState(true) const [loadingSessions, setLoadingSessions] = useState(true)
const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined) const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
const [isGroundTruth, setIsGroundTruth] = useState(false) const [isGroundTruth, setIsGroundTruth] = useState(false)
const [subSessions, setSubSessions] = useState<SubSession[]>([]) const [pageNumber, setPageNumber] = useState<number | null>(null)
const [parentSessionId, setParentSessionId] = useState<string | null>(null)
const [steps, setSteps] = useState<PipelineStep[]>(initSteps()) const [steps, setSteps] = useState<PipelineStep[]>(initSteps())
const searchParams = useSearchParams() const searchParams = useSearchParams()
@@ -115,7 +113,7 @@ export function useKombiPipeline() {
// ---- Open session ---- // ---- Open session ----
const openSession = useCallback(async (sid: string, keepSubSessions?: boolean) => { const openSession = useCallback(async (sid: string) => {
try { try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`) const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (!res.ok) return if (!res.ok) return
@@ -125,26 +123,19 @@ export function useKombiPipeline() {
setSessionName(data.name || data.filename || '') setSessionName(data.name || data.filename || '')
setActiveCategory(data.document_category || undefined) setActiveCategory(data.document_category || undefined)
setIsGroundTruth(!!data.ground_truth?.build_grid_reference) setIsGroundTruth(!!data.ground_truth?.build_grid_reference)
setPageNumber(data.grid_editor_result?.page_number?.number ?? null)
// Sub-session handling
if (data.sub_sessions?.length > 0) {
setSubSessions(data.sub_sessions)
setParentSessionId(sid)
} else if (data.parent_session_id) {
setParentSessionId(data.parent_session_id)
} else if (!keepSubSessions) {
setSubSessions([])
setParentSessionId(null)
}
// Determine UI step from DB state // Determine UI step from DB state
const dbStep = data.current_step || 1 const dbStep = data.current_step || 1
const hasGrid = !!data.grid_editor_result const hasGrid = !!data.grid_editor_result
const hasStructure = !!data.structure_result const hasStructure = !!data.structure_result
const hasWords = !!data.word_result const hasWords = !!data.word_result
const hasGutterRepair = !!(data.ground_truth?.gutter_repair)
let uiStep: number let uiStep: number
if (hasGrid) { if (hasGrid && hasGutterRepair) {
uiStep = 10 // gutter-repair (already analysed)
} else if (hasGrid) {
uiStep = 9 // grid-review uiStep = 9 // grid-review
} else if (hasStructure) { } else if (hasStructure) {
uiStep = 8 // grid-build uiStep = 8 // grid-build
@@ -159,22 +150,10 @@ export function useKombiPipeline() {
uiStep = 1 uiStep = 1
} }
const skipIds: string[] = []
const isSubSession = !!data.parent_session_id
if (isSubSession && dbStep >= 5) {
skipIds.push('upload', 'orientation', 'page-split', 'deskew', 'dewarp', 'content-crop')
if (uiStep < 6) uiStep = 6
} else if (isSubSession && dbStep >= 2) {
skipIds.push('upload', 'orientation')
if (uiStep < 2) uiStep = 2
}
setSteps( setSteps(
KOMBI_V2_STEPS.map((s, i) => ({ KOMBI_V2_STEPS.map((s, i) => ({
...s, ...s,
status: skipIds.includes(s.id) status: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
? 'skipped'
: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
})), })),
) )
setCurrentStep(uiStep) setCurrentStep(uiStep)
@@ -226,8 +205,6 @@ export function useKombiPipeline() {
setSteps(initSteps()) setSteps(initSteps())
setCurrentStep(0) setCurrentStep(0)
setSessionId(null) setSessionId(null)
setSubSessions([])
setParentSessionId(null)
loadSessions() loadSessions()
return return
} }
@@ -249,8 +226,6 @@ export function useKombiPipeline() {
setSessionId(null) setSessionId(null)
setSessionName('') setSessionName('')
setCurrentStep(0) setCurrentStep(0)
setSubSessions([])
setParentSessionId(null)
setSteps(initSteps()) setSteps(initSteps())
}, []) }, [])
@@ -292,40 +267,6 @@ export function useKombiPipeline() {
} }
}, [sessionId]) }, [sessionId])
// ---- Orientation completion (checks for page-split sub-sessions) ----
const handleOrientationComplete = useCallback(async (sid: string) => {
setSessionId(sid)
loadSessions()
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (res.ok) {
const data = await res.json()
if (data.sub_sessions?.length > 0) {
const subs: SubSession[] = data.sub_sessions.map((s: SubSession) => ({
id: s.id,
name: s.name,
box_index: s.box_index,
current_step: s.current_step,
}))
setSubSessions(subs)
setParentSessionId(sid)
openSession(subs[0].id, true)
return
}
}
} catch (e) {
console.error('Failed to check for sub-sessions:', e)
}
handleNext()
}, [loadSessions, openSession, handleNext])
const handleSessionChange = useCallback((newSessionId: string) => {
openSession(newSessionId, true)
}, [openSession])
return { return {
// State // State
currentStep, currentStep,
@@ -335,8 +276,7 @@ export function useKombiPipeline() {
loadingSessions, loadingSessions,
activeCategory, activeCategory,
isGroundTruth, isGroundTruth,
subSessions, pageNumber,
parentSessionId,
steps, steps,
gridSaveRef, gridSaveRef,
// Computed // Computed
@@ -351,11 +291,7 @@ export function useKombiPipeline() {
deleteSession, deleteSession,
renameSession, renameSession,
updateCategory, updateCategory,
handleOrientationComplete,
handleSessionChange,
setSessionId, setSessionId,
setSubSessions,
setParentSessionId,
setSessionName, setSessionName,
setIsGroundTruth, setIsGroundTruth,
} }
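
Review note on the status derivation: with the sub-session handling removed, no step is marked `'skipped'` by `openSession` any more, and each status depends only on the step's position relative to `uiStep`. A minimal sketch of that rule as a pure function follows; `deriveStatuses` is a hypothetical name, and the hook itself maps over `KOMBI_V2_STEPS` inline.

```typescript
type PipelineStepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'

// Hypothetical helper (assumption): same ternary as in the setSteps() call,
// applied to a step count instead of the KOMBI_V2_STEPS array.
function deriveStatuses(stepCount: number, uiStep: number): PipelineStepStatus[] {
  return Array.from({ length: stepCount }, (_, i): PipelineStepStatus =>
    i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
  )
}

// Example: 14 steps, session resumed at gutter-repair (UI index 10).
const statuses = deriveStatuses(14, 10)
console.log(statuses[9], statuses[10], statuses[11])
```

Keeping the rule in a pure function like this would make it straightforward to unit-test without rendering the hook.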


@@ -0,0 +1,134 @@
'use client'
/**
* Export tab: export training data in various formats.
*/
import { useState } from 'react'
import Link from 'next/link'
import { API_BASE } from '../constants'
import type { OCRSession, OCRStats } from '../types'
interface ExportTabProps {
sessions: OCRSession[]
selectedSession: string | null
setSelectedSession: (id: string | null) => void
stats: OCRStats | null
setError: (error: string | null) => void
}
export function ExportTab({
sessions,
selectedSession,
setSelectedSession,
stats,
setError,
}: ExportTabProps) {
const [exportFormat, setExportFormat] = useState<'generic' | 'trocr' | 'llama_vision'>('generic')
const [exporting, setExporting] = useState(false)
const [exportResult, setExportResult] = useState<{
exported_count: number
batch_id: string
samples?: Array<Record<string, unknown>>
} | null>(null)
const handleExport = async () => {
setExporting(true)
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/export`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
export_format: exportFormat,
session_id: selectedSession,
}),
})
if (res.ok) {
const data = await res.json()
setExportResult(data)
} else {
setError('Export fehlgeschlagen')
}
} catch {
setError('Netzwerkfehler')
} finally {
setExporting(false)
}
}
return (
<div className="space-y-6">
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Training-Daten exportieren</h3>
<div className="space-y-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Export-Format</label>
<select
value={exportFormat}
onChange={(e) => setExportFormat(e.target.value as typeof exportFormat)}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="generic">Generic JSON</option>
<option value="trocr">TrOCR Fine-Tuning</option>
<option value="llama_vision">Llama Vision Fine-Tuning</option>
</select>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Session (optional)</label>
<select
value={selectedSession || ''}
onChange={(e) => setSelectedSession(e.target.value || null)}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="">Alle Sessions</option>
{sessions.map((session) => (
<option key={session.id} value={session.id}>{session.name}</option>
))}
</select>
</div>
<button
onClick={handleExport}
disabled={exporting || (stats?.exportable_items || 0) === 0}
className="w-full px-4 py-2 bg-primary-600 text-white rounded-lg hover:bg-primary-700 disabled:opacity-50"
>
{exporting ? 'Exportiere...' : `${stats?.exportable_items || 0} Samples exportieren`}
</button>
{/* Cross-Link to Magic Help for TrOCR Fine-Tuning */}
{exportFormat === 'trocr' && (stats?.exportable_items || 0) > 0 && (
<Link
href="/ai/magic-help?source=ocr-labeling"
className="w-full mt-3 px-4 py-2 bg-purple-100 text-purple-700 border border-purple-300 rounded-lg hover:bg-purple-200 flex items-center justify-center gap-2 transition-colors"
>
<span></span>
Mit Magic Help testen & fine-tunen
</Link>
)}
</div>
</div>
{exportResult && (
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Export-Ergebnis</h3>
<div className="bg-green-50 border border-green-200 rounded-lg p-4 mb-4">
<p className="text-green-800">
{exportResult.exported_count} Samples erfolgreich exportiert
</p>
<p className="text-sm text-green-600">
Batch: {exportResult.batch_id}
</p>
</div>
<div className="bg-slate-50 p-4 rounded-lg overflow-auto max-h-64">
<pre className="text-xs">{JSON.stringify(exportResult.samples?.slice(0, 3), null, 2)}</pre>
{(exportResult.samples?.length || 0) > 3 && (
<p className="text-slate-500 mt-2">... und {(exportResult.samples?.length ?? 0) - 3} weitere</p>
)}
</div>
</div>
)}
</div>
)
}
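
Review note on `ExportTab`: the request payload and the "first three samples plus remainder" preview are both computed inline in the component. The sketch below pulls them into two pure helpers to show the shape of the data. Both helper names are hypothetical and not part of the commit. Unlike the inline code, `previewSamples` returns an empty list instead of `undefined` when no samples are present.

```typescript
type ExportFormat = 'generic' | 'trocr' | 'llama_vision'

// Hypothetical helper (assumption): the fetch() options that handleExport builds inline.
function buildExportRequest(format: ExportFormat, sessionId: string | null) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ export_format: format, session_id: sessionId }),
  }
}

// Hypothetical helper (assumption): mirrors the preview logic in the result panel.
function previewSamples<T>(samples: T[] | undefined, limit = 3) {
  const shown = samples?.slice(0, limit) ?? []
  const remaining = Math.max((samples?.length ?? 0) - limit, 0)
  return { shown, remaining }
}
```

With `selectedSession` set to `null`, the body carries `"session_id": null`, which matches the "Alle Sessions" option in the session dropdown.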


@@ -0,0 +1,197 @@
'use client'
/**
* Labeling tab: image viewer, OCR text, correction input, and queue preview.
*/
import { API_BASE } from '../constants'
import type { OCRItem } from '../types'
interface LabelingTabProps {
queue: OCRItem[]
currentItem: OCRItem | null
currentIndex: number
correctedText: string
setCorrectedText: (text: string) => void
goToNext: () => void
goToPrev: () => void
selectQueueItem: (idx: number) => void
confirmItem: () => void
correctItem: () => void
skipItem: () => void
}
export function LabelingTab({
queue,
currentItem,
currentIndex,
correctedText,
setCorrectedText,
goToNext,
goToPrev,
selectQueueItem,
confirmItem,
correctItem,
skipItem,
}: LabelingTabProps) {
return (
<div className="grid grid-cols-1 lg:grid-cols-3 gap-6">
{/* Left: Image Viewer */}
<div className="lg:col-span-2 bg-white rounded-lg shadow p-4">
<div className="flex items-center justify-between mb-4">
<h3 className="text-lg font-semibold">Bild</h3>
<div className="flex items-center gap-2">
<button
onClick={goToPrev}
disabled={currentIndex === 0}
className="p-2 rounded hover:bg-slate-100 disabled:opacity-50"
title="Zurueck (Pfeiltaste links)"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
</svg>
</button>
<span className="text-sm text-slate-600">
{currentIndex + 1} / {queue.length}
</span>
<button
onClick={goToNext}
disabled={currentIndex >= queue.length - 1}
className="p-2 rounded hover:bg-slate-100 disabled:opacity-50"
title="Weiter (Pfeiltaste rechts)"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
</svg>
</button>
</div>
</div>
{currentItem ? (
<div className="relative bg-slate-100 rounded-lg overflow-hidden" style={{ minHeight: '400px' }}>
<img
src={currentItem.image_url || `${API_BASE}${currentItem.image_path}`}
alt="OCR Bild"
className="w-full h-auto max-h-[600px] object-contain"
onError={(e) => {
const target = e.target as HTMLImageElement
target.style.display = 'none'
}}
/>
</div>
) : (
<div className="flex items-center justify-center h-64 bg-slate-100 rounded-lg">
<p className="text-slate-500">Keine Bilder in der Warteschlange</p>
</div>
)}
</div>
{/* Right: OCR Text & Actions */}
<div className="bg-white rounded-lg shadow p-4">
<div className="space-y-4">
{/* OCR Result */}
<div>
<div className="flex items-center justify-between mb-2">
<h3 className="text-lg font-semibold">OCR-Ergebnis</h3>
{currentItem?.ocr_confidence != null && (
<span className={`text-sm px-2 py-1 rounded ${
currentItem.ocr_confidence > 0.8
? 'bg-green-100 text-green-800'
: currentItem.ocr_confidence > 0.5
? 'bg-yellow-100 text-yellow-800'
: 'bg-red-100 text-red-800'
}`}>
{Math.round(currentItem.ocr_confidence * 100)}% Konfidenz
</span>
)}
</div>
<div className="bg-slate-50 p-3 rounded-lg min-h-[100px] text-sm">
{currentItem?.ocr_text || <span className="text-slate-400">Kein OCR-Text</span>}
</div>
</div>
{/* Correction Input */}
<div>
<h3 className="text-lg font-semibold mb-2">Korrektur</h3>
<textarea
value={correctedText}
onChange={(e) => setCorrectedText(e.target.value)}
placeholder="Korrigierter Text..."
className="w-full h-32 p-3 border border-slate-200 rounded-lg focus:ring-2 focus:ring-primary-500 focus:border-transparent"
/>
</div>
{/* Actions */}
<div className="flex flex-col gap-2">
<button
onClick={confirmItem}
disabled={!currentItem}
className="w-full px-4 py-3 bg-green-600 text-white rounded-lg hover:bg-green-700 disabled:opacity-50 flex items-center justify-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
</svg>
Korrekt (Enter)
</button>
<button
onClick={correctItem}
disabled={!currentItem || !correctedText.trim() || correctedText === currentItem?.ocr_text}
className="w-full px-4 py-3 bg-primary-600 text-white rounded-lg hover:bg-primary-700 disabled:opacity-50 flex items-center justify-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
Korrektur speichern
</button>
<button
onClick={skipItem}
disabled={!currentItem}
className="w-full px-4 py-2 bg-slate-200 text-slate-700 rounded-lg hover:bg-slate-300 disabled:opacity-50 flex items-center justify-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 5l7 7-7 7M5 5l7 7-7 7" />
</svg>
Ueberspringen (S)
</button>
</div>
{/* Keyboard Shortcuts */}
<div className="text-xs text-slate-500 mt-4">
<p className="font-medium mb-1">Tastaturkuerzel:</p>
<p>Enter = Bestaetigen | S = Ueberspringen</p>
<p>Pfeiltasten = Navigation</p>
</div>
</div>
</div>
{/* Bottom: Queue Preview */}
<div className="lg:col-span-3 bg-white rounded-lg shadow p-4">
<h3 className="text-lg font-semibold mb-4">Warteschlange ({queue.length} Items)</h3>
<div className="flex gap-2 overflow-x-auto pb-2">
{queue.slice(0, 10).map((item, idx) => (
<button
key={item.id}
onClick={() => selectQueueItem(idx)}
className={`flex-shrink-0 w-24 h-24 rounded-lg overflow-hidden border-2 ${
idx === currentIndex
? 'border-primary-500'
: 'border-transparent hover:border-slate-300'
}`}
>
<img
src={item.image_url || `${API_BASE}${item.image_path}`}
alt=""
className="w-full h-full object-cover"
/>
</button>
))}
{queue.length > 10 && (
<div className="flex-shrink-0 w-24 h-24 rounded-lg bg-slate-100 flex items-center justify-center text-slate-500">
+{queue.length - 10} mehr
</div>
)}
</div>
</div>
</div>
)
}
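
Review note on the confidence badge in `LabelingTab`: the thresholds are encoded as a nested ternary inside the `className` template string. A minimal sketch of the same thresholds as a pure function follows. The names `confidenceLevel`, `BADGE_CLASSES` and `confidenceBadge` are hypothetical and not part of the commit.

```typescript
type ConfidenceLevel = 'high' | 'medium' | 'low'

// Same thresholds as the JSX: > 0.8 green, > 0.5 yellow, otherwise red.
function confidenceLevel(confidence: number): ConfidenceLevel {
  if (confidence > 0.8) return 'high'
  if (confidence > 0.5) return 'medium'
  return 'low'
}

// Tailwind classes taken from the component.
const BADGE_CLASSES: Record<ConfidenceLevel, string> = {
  high: 'bg-green-100 text-green-800',
  medium: 'bg-yellow-100 text-yellow-800',
  low: 'bg-red-100 text-red-800',
}

// Hypothetical helper (assumption): class name and label for the badge.
function confidenceBadge(confidence: number): { className: string; label: string } {
  return {
    className: BADGE_CLASSES[confidenceLevel(confidence)],
    label: `${Math.round(confidence * 100)}% Konfidenz`,
  }
}
```

Both comparisons are strict, so a confidence of exactly 0.8 falls into the yellow band and exactly 0.5 into the red band.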


@@ -0,0 +1,166 @@
'use client'
/**
* Sessions tab: create new sessions and list existing ones.
*/
import { useState } from 'react'
import { API_BASE } from '../constants'
import type { OCRSession, CreateSessionRequest, OCRModel } from '../types'
interface SessionsTabProps {
sessions: OCRSession[]
selectedSession: string | null
setSelectedSession: (id: string | null) => void
fetchSessions: () => Promise<void>
setError: (error: string | null) => void
}
export function SessionsTab({
sessions,
selectedSession,
setSelectedSession,
fetchSessions,
setError,
}: SessionsTabProps) {
const [newSession, setNewSession] = useState<CreateSessionRequest>({
name: '',
source_type: 'klausur',
description: '',
ocr_model: 'llama3.2-vision:11b',
})
const createSession = async () => {
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/sessions`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(newSession),
})
if (res.ok) {
setNewSession({ name: '', source_type: 'klausur', description: '', ocr_model: 'llama3.2-vision:11b' })
fetchSessions()
} else {
setError('Session erstellen fehlgeschlagen')
}
} catch {
setError('Netzwerkfehler')
}
}
return (
<div className="space-y-6">
{/* Create Session */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Neue Session erstellen</h3>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Name</label>
<input
type="text"
value={newSession.name}
onChange={(e) => setNewSession(prev => ({ ...prev, name: e.target.value }))}
placeholder="z.B. Mathe Klausur Q1 2025"
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Typ</label>
<select
value={newSession.source_type}
onChange={(e) => setNewSession(prev => ({ ...prev, source_type: e.target.value as 'klausur' | 'handwriting_sample' | 'scan' }))}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="klausur">Klausur</option>
<option value="handwriting_sample">Handschriftprobe</option>
<option value="scan">Scan</option>
</select>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">OCR Modell</label>
<select
value={newSession.ocr_model}
onChange={(e) => setNewSession(prev => ({ ...prev, ocr_model: e.target.value as OCRModel }))}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="llama3.2-vision:11b">llama3.2-vision:11b - Vision LLM (Standard)</option>
<option value="trocr">TrOCR - Microsoft Transformer (schnell)</option>
<option value="paddleocr">PaddleOCR + LLM (4x schneller)</option>
<option value="donut">Donut - Document Understanding (strukturiert)</option>
</select>
<p className="mt-1 text-xs text-slate-500">
{newSession.ocr_model === 'paddleocr' && 'PaddleOCR erkennt Text schnell, LLM strukturiert die Ergebnisse.'}
{newSession.ocr_model === 'donut' && 'Speziell fuer Dokumente mit Tabellen und Formularen.'}
{newSession.ocr_model === 'trocr' && 'Schnelles Transformer-Modell fuer gedruckten Text.'}
{newSession.ocr_model === 'llama3.2-vision:11b' && 'Beste Qualitaet bei Handschrift, aber langsamer.'}
</p>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Beschreibung</label>
<input
type="text"
value={newSession.description}
onChange={(e) => setNewSession(prev => ({ ...prev, description: e.target.value }))}
placeholder="Optional..."
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
/>
</div>
</div>
<button
onClick={createSession}
disabled={!newSession.name}
className="mt-4 px-4 py-2 bg-primary-600 text-white rounded-lg hover:bg-primary-700 disabled:opacity-50"
>
Session erstellen
</button>
</div>
{/* Sessions List */}
<div className="bg-white rounded-lg shadow">
<div className="px-6 py-4 border-b border-slate-200">
<h3 className="text-lg font-semibold">Sessions ({sessions.length})</h3>
</div>
<div className="divide-y divide-slate-200">
{sessions.map((session) => (
<div
key={session.id}
className={`p-4 hover:bg-slate-50 cursor-pointer ${
selectedSession === session.id ? 'bg-primary-50 border-l-4 border-primary-500' : ''
}`}
onClick={() => setSelectedSession(session.id === selectedSession ? null : session.id)}
>
<div className="flex items-center justify-between">
<div>
<h4 className="font-medium">{session.name}</h4>
<p className="text-sm text-slate-500">
{session.source_type} | {session.ocr_model}
</p>
</div>
<div className="text-right">
<p className="text-sm font-medium">
{session.labeled_items}/{session.total_items} gelabelt
</p>
<div className="w-32 bg-slate-200 rounded-full h-2 mt-1">
<div
className="bg-primary-600 rounded-full h-2"
style={{
width: `${session.total_items > 0 ? (session.labeled_items / session.total_items) * 100 : 0}%`
}}
/>
</div>
</div>
</div>
{session.description && (
<p className="text-sm text-slate-600 mt-2">{session.description}</p>
)}
</div>
))}
{sessions.length === 0 && (
<p className="p-4 text-slate-500 text-center">Keine Sessions vorhanden</p>
)}
</div>
</div>
</div>
)
}

@@ -0,0 +1,76 @@
'use client'
/**
* Stats tab: global statistics, detailed breakdown, and progress bar.
*/
import type { OCRStats } from '../types'
interface StatsTabProps {
stats: OCRStats | null
}
export function StatsTab({ stats }: StatsTabProps) {
return (
<div className="space-y-6">
{/* Global Stats */}
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">Gesamt Items</h4>
<p className="text-3xl font-bold mt-2">{stats?.total_items || 0}</p>
</div>
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">Gelabelt</h4>
<p className="text-3xl font-bold mt-2 text-green-600">{stats?.labeled_items || 0}</p>
</div>
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">Ausstehend</h4>
<p className="text-3xl font-bold mt-2 text-yellow-600">{stats?.pending_items || 0}</p>
</div>
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">OCR-Genauigkeit</h4>
<p className="text-3xl font-bold mt-2">{stats?.accuracy_rate || 0}%</p>
</div>
</div>
{/* Detailed Stats */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Details</h3>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<div>
<p className="text-sm text-slate-500">Bestaetigt</p>
<p className="text-xl font-semibold text-green-600">{stats?.confirmed_items || 0}</p>
</div>
<div>
<p className="text-sm text-slate-500">Korrigiert</p>
<p className="text-xl font-semibold text-primary-600">{stats?.corrected_items || 0}</p>
</div>
<div>
<p className="text-sm text-slate-500">Exportierbar</p>
<p className="text-xl font-semibold">{stats?.exportable_items || 0}</p>
</div>
<div>
<p className="text-sm text-slate-500">Durchschn. Label-Zeit</p>
<p className="text-xl font-semibold">{stats?.avg_label_time_seconds || 0}s</p>
</div>
</div>
</div>
{/* Progress Bar */}
{stats?.total_items ? (
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Fortschritt</h3>
<div className="w-full bg-slate-200 rounded-full h-4">
<div
className="bg-primary-600 rounded-full h-4 transition-all"
style={{ width: `${(stats.labeled_items / stats.total_items) * 100}%` }}
/>
</div>
<p className="text-sm text-slate-500 mt-2">
{Math.round((stats.labeled_items / stats.total_items) * 100)}% abgeschlossen
</p>
</div>
) : null}
</div>
)
}
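
Both the session list and the stats progress bar compute their width inline as `labeled_items / total_items`. A minimal sketch of a shared helper follows, assuming such a helper is wanted. The name `progressPercent` and the clamping to 0..100 are illustrative assumptions and not part of this change.

```typescript
// Hypothetical helper (not part of the diff): turn a labeled/total pair into a
// whole-number percentage that is safe to use as a CSS width.
function progressPercent(labeled: number, total: number): number {
  // Empty sessions and non-finite input yield 0 instead of NaN or Infinity.
  if (!isFinite(labeled) || !isFinite(total) || total <= 0) return 0
  const pct = (labeled / total) * 100
  // Clamp so that inconsistent counts (labeled > total) cannot overflow the bar.
  return Math.min(100, Math.max(0, Math.round(pct)))
}
```

With a helper like this, the width expression could read `${progressPercent(stats.labeled_items, stats.total_items)}%`, and the label below the bar could reuse the same value.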

@@ -0,0 +1,160 @@
'use client'
/**
* Upload tab: session selection, drag-and-drop upload, and upload results.
*/
import { useState, useRef } from 'react'
import { API_BASE } from '../constants'
import type { OCRSession, UploadResult } from '../types'
interface UploadTabProps {
sessions: OCRSession[]
selectedSession: string | null
setSelectedSession: (id: string | null) => void
fetchQueue: () => Promise<void>
fetchStats: () => Promise<void>
setError: (error: string | null) => void
}
export function UploadTab({
sessions,
selectedSession,
setSelectedSession,
fetchQueue,
fetchStats,
setError,
}: UploadTabProps) {
const [uploading, setUploading] = useState(false)
const [uploadResults, setUploadResults] = useState<UploadResult[]>([])
const fileInputRef = useRef<HTMLInputElement>(null)
const handleUpload = async (files: FileList) => {
if (!selectedSession) {
setError('Bitte zuerst eine Session auswaehlen')
return
}
setUploading(true)
const formData = new FormData()
Array.from(files).forEach(file => formData.append('files', file))
formData.append('run_ocr', 'true')
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/sessions/${selectedSession}/upload`, {
method: 'POST',
body: formData,
})
if (res.ok) {
const data = await res.json()
setUploadResults(data.items || [])
fetchQueue()
fetchStats()
} else {
setError('Upload fehlgeschlagen')
}
} catch {
setError('Netzwerkfehler beim Upload')
} finally {
setUploading(false)
}
}
return (
<div className="space-y-6">
{/* Session Selection */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Session auswaehlen</h3>
<select
value={selectedSession || ''}
onChange={(e) => setSelectedSession(e.target.value || null)}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="">-- Session waehlen --</option>
{sessions.map((session) => (
<option key={session.id} value={session.id}>
{session.name} ({session.total_items} Items)
</option>
))}
</select>
</div>
{/* Upload Area */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Bilder hochladen</h3>
<div
className={`border-2 border-dashed rounded-lg p-8 text-center ${
selectedSession ? 'border-slate-300 hover:border-primary-500' : 'border-slate-200 opacity-50'
}`}
onDragOver={(e) => {
e.preventDefault()
e.currentTarget.classList.add('border-primary-500', 'bg-primary-50')
}}
onDragLeave={(e) => {
e.currentTarget.classList.remove('border-primary-500', 'bg-primary-50')
}}
onDrop={(e) => {
e.preventDefault()
e.currentTarget.classList.remove('border-primary-500', 'bg-primary-50')
if (e.dataTransfer.files.length > 0) {
handleUpload(e.dataTransfer.files)
}
}}
>
<input
ref={fileInputRef}
type="file"
multiple
accept="image/png,image/jpeg,image/jpg"
onChange={(e) => e.target.files && handleUpload(e.target.files)}
className="hidden"
disabled={!selectedSession}
/>
{uploading ? (
<div className="flex flex-col items-center gap-2">
<div className="animate-spin rounded-full h-8 w-8 border-b-2 border-primary-600" />
<p>Hochladen & OCR ausfuehren...</p>
</div>
) : (
<>
<svg className="w-12 h-12 text-slate-400 mx-auto mb-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16l4.586-4.586a2 2 0 012.828 0L16 16m-2-2l1.586-1.586a2 2 0 012.828 0L20 14m-6-6h.01M6 20h12a2 2 0 002-2V6a2 2 0 00-2-2H6a2 2 0 00-2 2v12a2 2 0 002 2z" />
</svg>
<p className="text-slate-600 mb-2">
Bilder hierher ziehen oder{' '}
<button
onClick={() => fileInputRef.current?.click()}
disabled={!selectedSession}
className="text-primary-600 hover:underline"
>
auswaehlen
</button>
</p>
<p className="text-sm text-slate-500">PNG, JPG (max. 10MB pro Bild)</p>
</>
)}
</div>
</div>
{/* Upload Results */}
{uploadResults.length > 0 && (
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Upload-Ergebnisse ({uploadResults.length})</h3>
<div className="space-y-2">
{uploadResults.map((result) => (
<div key={result.id} className="flex items-center justify-between p-2 bg-slate-50 rounded">
<span className="text-sm">{result.filename}</span>
<span className={`text-xs px-2 py-1 rounded ${
result.ocr_text ? 'bg-green-100 text-green-800' : 'bg-yellow-100 text-yellow-800'
}`}>
{result.ocr_text ? `OCR OK (${Math.round((result.ocr_confidence || 0) * 100)}%)` : 'Kein OCR'}
</span>
</div>
))}
</div>
</div>
)}
</div>
)
}
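
The `accept` attribute on the file input only filters the file picker. Files dropped onto the upload area are passed to `handleUpload` unchanged, and the "max. 10MB" hint is not enforced on the client. The following is a sketch of a pre-upload check under stated assumptions: `partitionUploads`, `UploadCandidate`, and both constants are hypothetical names, and "10MB" is assumed to mean 10 * 1024 * 1024 bytes.

```typescript
// Minimal shape the check needs; a browser File object satisfies it.
interface UploadCandidate {
  name: string
  type: string
  size: number
}

// Assumption: the "10MB" UI hint means 10 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024
// MIME types matching the PNG/JPG hint shown in the upload area.
const ALLOWED_TYPES = ['image/png', 'image/jpeg']

// Hypothetical helper: split candidates into uploadable and rejected files.
function partitionUploads<T extends UploadCandidate>(files: T[]): {
  accepted: T[]
  rejected: { file: T; reason: string }[]
} {
  const accepted: T[] = []
  const rejected: { file: T; reason: string }[] = []
  for (const file of files) {
    if (ALLOWED_TYPES.indexOf(file.type) === -1) {
      rejected.push({ file, reason: 'unsupported type' })
    } else if (file.size > MAX_UPLOAD_BYTES) {
      rejected.push({ file, reason: 'larger than 10MB' })
    } else {
      accepted.push(file)
    }
  }
  return { accepted, rejected }
}
```

In the component, `handleUpload` could call this on `Array.from(files)`, append only `accepted` to the `FormData`, and report `rejected` through `setError`. The server would still need to validate uploads itself.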

@@ -0,0 +1,59 @@
/**
* Constants and tab definitions for OCR Labeling page.
*/
import type { JSX } from 'react'
// API Base URL for klausur-service
export const API_BASE = process.env.NEXT_PUBLIC_KLAUSUR_SERVICE_URL || 'http://localhost:8086'
// Tab definitions
export type TabId = 'labeling' | 'sessions' | 'upload' | 'stats' | 'export'
export const tabs: { id: TabId; name: string; icon: JSX.Element }[] = [
{
id: 'labeling',
name: 'Labeling',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
),
},
{
id: 'sessions',
name: 'Sessions',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 11H5m14 0a2 2 0 012 2v6a2 2 0 01-2 2H5a2 2 0 01-2-2v-6a2 2 0 012-2m14 0V9a2 2 0 00-2-2M5 11V9a2 2 0 012-2m0 0V5a2 2 0 012-2h6a2 2 0 012 2v2M7 7h10" />
</svg>
),
},
{
id: 'upload',
name: 'Upload',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-8l-4-4m0 0L8 8m4-4v12" />
</svg>
),
},
{
id: 'stats',
name: 'Statistiken',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z" />
</svg>
),
},
{
id: 'export',
name: 'Export',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-4l-4 4m0 0l-4-4m4 4V4" />
</svg>
),
},
]
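
`TabId` is a closed union, and the page initialises its active tab to `'labeling'`. If the active tab were ever restored from an untyped source such as a URL query parameter (an assumption; nothing in this change does so), a runtime guard would be needed. A minimal sketch follows, with `TAB_IDS`, `isTabId`, and `toTabId` as hypothetical names.

```typescript
// Same ids as the TabId union in constants, repeated so the sketch is self-contained.
const TAB_IDS = ['labeling', 'sessions', 'upload', 'stats', 'export'] as const
type TabId = (typeof TAB_IDS)[number]

// Hypothetical type guard: narrows an arbitrary value to TabId.
function isTabId(value: unknown): value is TabId {
  return typeof value === 'string' && (TAB_IDS as readonly string[]).indexOf(value) !== -1
}

// Hypothetical fallback: unknown values resolve to the default tab.
function toTabId(value: unknown, fallback: TabId = 'labeling'): TabId {
  return isTabId(value) ? value : fallback
}
```

Deriving the union from a `const` array keeps the runtime list and the compile-time type in one place, so adding a tab id cannot leave the guard out of date.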

@@ -10,903 +10,20 @@
 * OCR-Labeling → RAG Pipeline → Daten & RAG
 */
-import { useState, useEffect, useCallback, useRef } from 'react'
-import Link from 'next/link'
+import { useState } from 'react'
 import { PagePurpose } from '@/components/common/PagePurpose'
 import { AIModuleSidebarResponsive } from '@/components/ai/AIModuleSidebar'
-import type {
-OCRSession,
-OCRItem,
-OCRStats,
-TrainingSample,
-CreateSessionRequest,
-OCRModel,
-} from './types'
+import { tabs, type TabId } from './constants'
+import { useOcrLabeling } from './useOcrLabeling'
+import { LabelingTab } from './_components/LabelingTab'
+import { SessionsTab } from './_components/SessionsTab'
+import { UploadTab } from './_components/UploadTab'
+import { StatsTab } from './_components/StatsTab'
+import { ExportTab } from './_components/ExportTab'
// API Base URL for klausur-service
const API_BASE = process.env.NEXT_PUBLIC_KLAUSUR_SERVICE_URL || 'http://localhost:8086'
// Tab definitions
type TabId = 'labeling' | 'sessions' | 'upload' | 'stats' | 'export'
const tabs: { id: TabId; name: string; icon: JSX.Element }[] = [
{
id: 'labeling',
name: 'Labeling',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
),
},
{
id: 'sessions',
name: 'Sessions',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 11H5m14 0a2 2 0 012 2v6a2 2 0 01-2 2H5a2 2 0 01-2-2v-6a2 2 0 012-2m14 0V9a2 2 0 00-2-2M5 11V9a2 2 0 012-2m0 0V5a2 2 0 012-2h6a2 2 0 012 2v2M7 7h10" />
</svg>
),
},
{
id: 'upload',
name: 'Upload',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-8l-4-4m0 0L8 8m4-4v12" />
</svg>
),
},
{
id: 'stats',
name: 'Statistiken',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z" />
</svg>
),
},
{
id: 'export',
name: 'Export',
icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-4l-4 4m0 0l-4-4m4 4V4" />
</svg>
),
},
]
 export default function OCRLabelingPage() {
 const [activeTab, setActiveTab] = useState<TabId>('labeling')
-const [sessions, setSessions] = useState<OCRSession[]>([])
+const hook = useOcrLabeling()
const [selectedSession, setSelectedSession] = useState<string | null>(null)
const [queue, setQueue] = useState<OCRItem[]>([])
const [currentItem, setCurrentItem] = useState<OCRItem | null>(null)
const [currentIndex, setCurrentIndex] = useState(0)
const [stats, setStats] = useState<OCRStats | null>(null)
const [loading, setLoading] = useState(true)
const [error, setError] = useState<string | null>(null)
const [correctedText, setCorrectedText] = useState('')
const [labelStartTime, setLabelStartTime] = useState<number | null>(null)
// Fetch sessions
const fetchSessions = useCallback(async () => {
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/sessions`)
if (res.ok) {
const data = await res.json()
setSessions(data)
}
} catch (err) {
console.error('Failed to fetch sessions:', err)
}
}, [])
// Fetch queue
const fetchQueue = useCallback(async () => {
try {
const url = selectedSession
? `${API_BASE}/api/v1/ocr-label/queue?session_id=${selectedSession}&limit=20`
: `${API_BASE}/api/v1/ocr-label/queue?limit=20`
const res = await fetch(url)
if (res.ok) {
const data = await res.json()
setQueue(data)
if (data.length > 0 && !currentItem) {
setCurrentItem(data[0])
setCurrentIndex(0)
setCorrectedText(data[0].ocr_text || '')
setLabelStartTime(Date.now())
}
}
} catch (err) {
console.error('Failed to fetch queue:', err)
}
}, [selectedSession, currentItem])
// Fetch stats
const fetchStats = useCallback(async () => {
try {
const url = selectedSession
? `${API_BASE}/api/v1/ocr-label/stats?session_id=${selectedSession}`
: `${API_BASE}/api/v1/ocr-label/stats`
const res = await fetch(url)
if (res.ok) {
const data = await res.json()
setStats(data)
}
} catch (err) {
console.error('Failed to fetch stats:', err)
}
}, [selectedSession])
// Initial data load
useEffect(() => {
const loadData = async () => {
setLoading(true)
await Promise.all([fetchSessions(), fetchQueue(), fetchStats()])
setLoading(false)
}
loadData()
}, [fetchSessions, fetchQueue, fetchStats])
// Refresh queue when session changes
useEffect(() => {
setCurrentItem(null)
setCurrentIndex(0)
fetchQueue()
fetchStats()
}, [selectedSession, fetchQueue, fetchStats])
// Navigate to next item
const goToNext = () => {
if (currentIndex < queue.length - 1) {
const nextIndex = currentIndex + 1
setCurrentIndex(nextIndex)
setCurrentItem(queue[nextIndex])
setCorrectedText(queue[nextIndex].ocr_text || '')
setLabelStartTime(Date.now())
} else {
// Refresh queue
fetchQueue()
}
}
// Navigate to previous item
const goToPrev = () => {
if (currentIndex > 0) {
const prevIndex = currentIndex - 1
setCurrentIndex(prevIndex)
setCurrentItem(queue[prevIndex])
setCorrectedText(queue[prevIndex].ocr_text || '')
setLabelStartTime(Date.now())
}
}
// Calculate label time
const getLabelTime = (): number | undefined => {
if (!labelStartTime) return undefined
return Math.round((Date.now() - labelStartTime) / 1000)
}
// Confirm item
const confirmItem = async () => {
if (!currentItem) return
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/confirm`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
item_id: currentItem.id,
label_time_seconds: getLabelTime(),
}),
})
if (res.ok) {
// Remove from queue and go to next
setQueue(prev => prev.filter(item => item.id !== currentItem.id))
goToNext()
fetchStats()
} else {
setError('Bestaetigung fehlgeschlagen')
}
} catch (err) {
setError('Netzwerkfehler')
}
}
// Correct item
const correctItem = async () => {
if (!currentItem || !correctedText.trim()) return
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/correct`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
item_id: currentItem.id,
ground_truth: correctedText.trim(),
label_time_seconds: getLabelTime(),
}),
})
if (res.ok) {
setQueue(prev => prev.filter(item => item.id !== currentItem.id))
goToNext()
fetchStats()
} else {
setError('Korrektur fehlgeschlagen')
}
} catch (err) {
setError('Netzwerkfehler')
}
}
// Skip item
const skipItem = async () => {
if (!currentItem) return
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/skip`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ item_id: currentItem.id }),
})
if (res.ok) {
setQueue(prev => prev.filter(item => item.id !== currentItem.id))
goToNext()
fetchStats()
} else {
setError('Ueberspringen fehlgeschlagen')
}
} catch (err) {
setError('Netzwerkfehler')
}
}
// Keyboard shortcuts
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
// Only handle if not in text input
if (e.target instanceof HTMLTextAreaElement) return
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault()
confirmItem()
} else if (e.key === 'ArrowRight') {
goToNext()
} else if (e.key === 'ArrowLeft') {
goToPrev()
} else if (e.key === 's' && !e.ctrlKey && !e.metaKey) {
skipItem()
}
}
window.addEventListener('keydown', handleKeyDown)
return () => window.removeEventListener('keydown', handleKeyDown)
}, [currentItem, correctedText])
// Render Labeling Tab
const renderLabelingTab = () => (
<div className="grid grid-cols-1 lg:grid-cols-3 gap-6">
{/* Left: Image Viewer */}
<div className="lg:col-span-2 bg-white rounded-lg shadow p-4">
<div className="flex items-center justify-between mb-4">
<h3 className="text-lg font-semibold">Bild</h3>
<div className="flex items-center gap-2">
<button
onClick={goToPrev}
disabled={currentIndex === 0}
className="p-2 rounded hover:bg-slate-100 disabled:opacity-50"
title="Zurueck (Pfeiltaste links)"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
</svg>
</button>
<span className="text-sm text-slate-600">
{currentIndex + 1} / {queue.length}
</span>
<button
onClick={goToNext}
disabled={currentIndex >= queue.length - 1}
className="p-2 rounded hover:bg-slate-100 disabled:opacity-50"
title="Weiter (Pfeiltaste rechts)"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
</svg>
</button>
</div>
</div>
{currentItem ? (
<div className="relative bg-slate-100 rounded-lg overflow-hidden" style={{ minHeight: '400px' }}>
<img
src={currentItem.image_url || `${API_BASE}${currentItem.image_path}`}
alt="OCR Bild"
className="w-full h-auto max-h-[600px] object-contain"
onError={(e) => {
// Fallback if image fails to load
const target = e.target as HTMLImageElement
target.style.display = 'none'
}}
/>
</div>
) : (
<div className="flex items-center justify-center h-64 bg-slate-100 rounded-lg">
<p className="text-slate-500">Keine Bilder in der Warteschlange</p>
</div>
)}
</div>
{/* Right: OCR Text & Actions */}
<div className="bg-white rounded-lg shadow p-4">
<div className="space-y-4">
{/* OCR Result */}
<div>
<div className="flex items-center justify-between mb-2">
<h3 className="text-lg font-semibold">OCR-Ergebnis</h3>
{currentItem?.ocr_confidence && (
<span className={`text-sm px-2 py-1 rounded ${
currentItem.ocr_confidence > 0.8
? 'bg-green-100 text-green-800'
: currentItem.ocr_confidence > 0.5
? 'bg-yellow-100 text-yellow-800'
: 'bg-red-100 text-red-800'
}`}>
{Math.round(currentItem.ocr_confidence * 100)}% Konfidenz
</span>
)}
</div>
<div className="bg-slate-50 p-3 rounded-lg min-h-[100px] text-sm">
{currentItem?.ocr_text || <span className="text-slate-400">Kein OCR-Text</span>}
</div>
</div>
{/* Correction Input */}
<div>
<h3 className="text-lg font-semibold mb-2">Korrektur</h3>
<textarea
value={correctedText}
onChange={(e) => setCorrectedText(e.target.value)}
placeholder="Korrigierter Text..."
className="w-full h-32 p-3 border border-slate-200 rounded-lg focus:ring-2 focus:ring-primary-500 focus:border-transparent"
/>
</div>
{/* Actions */}
<div className="flex flex-col gap-2">
<button
onClick={confirmItem}
disabled={!currentItem}
className="w-full px-4 py-3 bg-green-600 text-white rounded-lg hover:bg-green-700 disabled:opacity-50 flex items-center justify-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
</svg>
Korrekt (Enter)
</button>
<button
onClick={correctItem}
disabled={!currentItem || !correctedText.trim() || correctedText === currentItem?.ocr_text}
className="w-full px-4 py-3 bg-primary-600 text-white rounded-lg hover:bg-primary-700 disabled:opacity-50 flex items-center justify-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
Korrektur speichern
</button>
<button
onClick={skipItem}
disabled={!currentItem}
className="w-full px-4 py-2 bg-slate-200 text-slate-700 rounded-lg hover:bg-slate-300 disabled:opacity-50 flex items-center justify-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 5l7 7-7 7M5 5l7 7-7 7" />
</svg>
Ueberspringen (S)
</button>
</div>
{/* Keyboard Shortcuts */}
<div className="text-xs text-slate-500 mt-4">
<p className="font-medium mb-1">Tastaturkuerzel:</p>
<p>Enter = Bestaetigen | S = Ueberspringen</p>
<p>Pfeiltasten = Navigation</p>
</div>
</div>
</div>
{/* Bottom: Queue Preview */}
<div className="lg:col-span-3 bg-white rounded-lg shadow p-4">
<h3 className="text-lg font-semibold mb-4">Warteschlange ({queue.length} Items)</h3>
<div className="flex gap-2 overflow-x-auto pb-2">
{queue.slice(0, 10).map((item, idx) => (
<button
key={item.id}
onClick={() => {
setCurrentIndex(idx)
setCurrentItem(item)
setCorrectedText(item.ocr_text || '')
setLabelStartTime(Date.now())
}}
className={`flex-shrink-0 w-24 h-24 rounded-lg overflow-hidden border-2 ${
idx === currentIndex
? 'border-primary-500'
: 'border-transparent hover:border-slate-300'
}`}
>
<img
src={item.image_url || `${API_BASE}${item.image_path}`}
alt=""
className="w-full h-full object-cover"
/>
</button>
))}
{queue.length > 10 && (
<div className="flex-shrink-0 w-24 h-24 rounded-lg bg-slate-100 flex items-center justify-center text-slate-500">
+{queue.length - 10} mehr
</div>
)}
</div>
</div>
</div>
)
// Render Sessions Tab
const renderSessionsTab = () => {
const [newSession, setNewSession] = useState<CreateSessionRequest>({
name: '',
source_type: 'klausur',
description: '',
ocr_model: 'llama3.2-vision:11b',
})
const createSession = async () => {
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/sessions`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(newSession),
})
if (res.ok) {
setNewSession({ name: '', source_type: 'klausur', description: '', ocr_model: 'llama3.2-vision:11b' })
fetchSessions()
} else {
setError('Session erstellen fehlgeschlagen')
}
} catch (err) {
setError('Netzwerkfehler')
}
}
return (
<div className="space-y-6">
{/* Create Session */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Neue Session erstellen</h3>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Name</label>
<input
type="text"
value={newSession.name}
onChange={(e) => setNewSession(prev => ({ ...prev, name: e.target.value }))}
placeholder="z.B. Mathe Klausur Q1 2025"
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Typ</label>
<select
value={newSession.source_type}
onChange={(e) => setNewSession(prev => ({ ...prev, source_type: e.target.value as 'klausur' | 'handwriting_sample' | 'scan' }))}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="klausur">Klausur</option>
<option value="handwriting_sample">Handschriftprobe</option>
<option value="scan">Scan</option>
</select>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">OCR Modell</label>
<select
value={newSession.ocr_model}
onChange={(e) => setNewSession(prev => ({ ...prev, ocr_model: e.target.value as OCRModel }))}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="llama3.2-vision:11b">llama3.2-vision:11b - Vision LLM (Standard)</option>
<option value="trocr">TrOCR - Microsoft Transformer (schnell)</option>
<option value="paddleocr">PaddleOCR + LLM (4x schneller)</option>
<option value="donut">Donut - Document Understanding (strukturiert)</option>
</select>
<p className="mt-1 text-xs text-slate-500">
{newSession.ocr_model === 'paddleocr' && 'PaddleOCR erkennt Text schnell, LLM strukturiert die Ergebnisse.'}
{newSession.ocr_model === 'donut' && 'Speziell fuer Dokumente mit Tabellen und Formularen.'}
{newSession.ocr_model === 'trocr' && 'Schnelles Transformer-Modell fuer gedruckten Text.'}
{newSession.ocr_model === 'llama3.2-vision:11b' && 'Beste Qualitaet bei Handschrift, aber langsamer.'}
</p>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Beschreibung</label>
<input
type="text"
value={newSession.description}
onChange={(e) => setNewSession(prev => ({ ...prev, description: e.target.value }))}
placeholder="Optional..."
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
/>
</div>
</div>
<button
onClick={createSession}
disabled={!newSession.name}
className="mt-4 px-4 py-2 bg-primary-600 text-white rounded-lg hover:bg-primary-700 disabled:opacity-50"
>
Session erstellen
</button>
</div>
{/* Sessions List */}
<div className="bg-white rounded-lg shadow">
<div className="px-6 py-4 border-b border-slate-200">
<h3 className="text-lg font-semibold">Sessions ({sessions.length})</h3>
</div>
<div className="divide-y divide-slate-200">
{sessions.map((session) => (
<div
key={session.id}
className={`p-4 hover:bg-slate-50 cursor-pointer ${
selectedSession === session.id ? 'bg-primary-50 border-l-4 border-primary-500' : ''
}`}
onClick={() => setSelectedSession(session.id === selectedSession ? null : session.id)}
>
<div className="flex items-center justify-between">
<div>
<h4 className="font-medium">{session.name}</h4>
<p className="text-sm text-slate-500">
{session.source_type} | {session.ocr_model}
</p>
</div>
<div className="text-right">
<p className="text-sm font-medium">
{session.labeled_items}/{session.total_items} gelabelt
</p>
<div className="w-32 bg-slate-200 rounded-full h-2 mt-1">
<div
className="bg-primary-600 rounded-full h-2"
style={{
width: `${session.total_items > 0 ? (session.labeled_items / session.total_items) * 100 : 0}%`
}}
/>
</div>
</div>
</div>
{session.description && (
<p className="text-sm text-slate-600 mt-2">{session.description}</p>
)}
</div>
))}
{sessions.length === 0 && (
<p className="p-4 text-slate-500 text-center">Keine Sessions vorhanden</p>
)}
</div>
</div>
</div>
)
}
// Render Upload Tab
const renderUploadTab = () => {
const [uploading, setUploading] = useState(false)
const [uploadResults, setUploadResults] = useState<any[]>([])
const fileInputRef = useRef<HTMLInputElement>(null)
const handleUpload = async (files: FileList) => {
if (!selectedSession) {
setError('Bitte zuerst eine Session auswaehlen')
return
}
setUploading(true)
const formData = new FormData()
Array.from(files).forEach(file => formData.append('files', file))
formData.append('run_ocr', 'true')
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/sessions/${selectedSession}/upload`, {
method: 'POST',
body: formData,
})
if (res.ok) {
const data = await res.json()
setUploadResults(data.items || [])
fetchQueue()
fetchStats()
} else {
setError('Upload fehlgeschlagen')
}
} catch (err) {
setError('Netzwerkfehler beim Upload')
} finally {
setUploading(false)
}
}
return (
<div className="space-y-6">
{/* Session Selection */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Session auswaehlen</h3>
<select
value={selectedSession || ''}
onChange={(e) => setSelectedSession(e.target.value || null)}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="">-- Session waehlen --</option>
{sessions.map((session) => (
<option key={session.id} value={session.id}>
{session.name} ({session.total_items} Items)
</option>
))}
</select>
</div>
{/* Upload Area */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Bilder hochladen</h3>
<div
className={`border-2 border-dashed rounded-lg p-8 text-center ${
selectedSession ? 'border-slate-300 hover:border-primary-500' : 'border-slate-200 opacity-50'
}`}
onDragOver={(e) => {
e.preventDefault()
e.currentTarget.classList.add('border-primary-500', 'bg-primary-50')
}}
onDragLeave={(e) => {
e.currentTarget.classList.remove('border-primary-500', 'bg-primary-50')
}}
onDrop={(e) => {
e.preventDefault()
e.currentTarget.classList.remove('border-primary-500', 'bg-primary-50')
if (e.dataTransfer.files.length > 0) {
handleUpload(e.dataTransfer.files)
}
}}
>
<input
ref={fileInputRef}
type="file"
multiple
accept="image/png,image/jpeg,image/jpg"
onChange={(e) => e.target.files && handleUpload(e.target.files)}
className="hidden"
disabled={!selectedSession}
/>
{uploading ? (
<div className="flex flex-col items-center gap-2">
<div className="animate-spin rounded-full h-8 w-8 border-b-2 border-primary-600" />
<p>Hochladen & OCR ausfuehren...</p>
</div>
) : (
<>
<svg className="w-12 h-12 text-slate-400 mx-auto mb-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16l4.586-4.586a2 2 0 012.828 0L16 16m-2-2l1.586-1.586a2 2 0 012.828 0L20 14m-6-6h.01M6 20h12a2 2 0 002-2V6a2 2 0 00-2-2H6a2 2 0 00-2 2v12a2 2 0 002 2z" />
</svg>
<p className="text-slate-600 mb-2">
Bilder hierher ziehen oder{' '}
<button
onClick={() => fileInputRef.current?.click()}
disabled={!selectedSession}
className="text-primary-600 hover:underline"
>
auswaehlen
</button>
</p>
<p className="text-sm text-slate-500">PNG, JPG (max. 10MB pro Bild)</p>
</>
)}
</div>
</div>
{/* Upload Results */}
{uploadResults.length > 0 && (
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Upload-Ergebnisse ({uploadResults.length})</h3>
<div className="space-y-2">
{uploadResults.map((result) => (
<div key={result.id} className="flex items-center justify-between p-2 bg-slate-50 rounded">
<span className="text-sm">{result.filename}</span>
<span className={`text-xs px-2 py-1 rounded ${
result.ocr_text ? 'bg-green-100 text-green-800' : 'bg-yellow-100 text-yellow-800'
}`}>
{result.ocr_text ? `OCR OK (${Math.round((result.ocr_confidence || 0) * 100)}%)` : 'Kein OCR'}
</span>
</div>
))}
</div>
</div>
)}
</div>
)
}
// Render Stats Tab
const renderStatsTab = () => (
<div className="space-y-6">
{/* Global Stats */}
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">Gesamt Items</h4>
<p className="text-3xl font-bold mt-2">{stats?.total_items || 0}</p>
</div>
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">Gelabelt</h4>
<p className="text-3xl font-bold mt-2 text-green-600">{stats?.labeled_items || 0}</p>
</div>
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">Ausstehend</h4>
<p className="text-3xl font-bold mt-2 text-yellow-600">{stats?.pending_items || 0}</p>
</div>
<div className="bg-white rounded-lg shadow p-6">
<h4 className="text-sm font-medium text-slate-500">OCR-Genauigkeit</h4>
<p className="text-3xl font-bold mt-2">{stats?.accuracy_rate || 0}%</p>
</div>
</div>
{/* Detailed Stats */}
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Details</h3>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<div>
<p className="text-sm text-slate-500">Bestaetigt</p>
<p className="text-xl font-semibold text-green-600">{stats?.confirmed_items || 0}</p>
</div>
<div>
<p className="text-sm text-slate-500">Korrigiert</p>
<p className="text-xl font-semibold text-primary-600">{stats?.corrected_items || 0}</p>
</div>
<div>
<p className="text-sm text-slate-500">Exportierbar</p>
<p className="text-xl font-semibold">{stats?.exportable_items || 0}</p>
</div>
<div>
<p className="text-sm text-slate-500">Durchschn. Label-Zeit</p>
<p className="text-xl font-semibold">{stats?.avg_label_time_seconds || 0}s</p>
</div>
</div>
</div>
{/* Progress Bar */}
{stats?.total_items ? (
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Fortschritt</h3>
<div className="w-full bg-slate-200 rounded-full h-4">
<div
className="bg-primary-600 rounded-full h-4 transition-all"
style={{ width: `${(stats.labeled_items / stats.total_items) * 100}%` }}
/>
</div>
<p className="text-sm text-slate-500 mt-2">
{Math.round((stats.labeled_items / stats.total_items) * 100)}% abgeschlossen
</p>
</div>
) : null}
</div>
)
// Render Export Tab
const renderExportTab = () => {
const [exportFormat, setExportFormat] = useState<'generic' | 'trocr' | 'llama_vision'>('generic')
const [exporting, setExporting] = useState(false)
const [exportResult, setExportResult] = useState<any>(null)
const handleExport = async () => {
setExporting(true)
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/export`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
export_format: exportFormat,
session_id: selectedSession,
}),
})
if (res.ok) {
const data = await res.json()
setExportResult(data)
} else {
setError('Export fehlgeschlagen')
}
} catch (err) {
setError('Netzwerkfehler')
} finally {
setExporting(false)
}
}
return (
<div className="space-y-6">
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Training-Daten exportieren</h3>
<div className="space-y-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Export-Format</label>
<select
value={exportFormat}
onChange={(e) => setExportFormat(e.target.value as typeof exportFormat)}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="generic">Generic JSON</option>
<option value="trocr">TrOCR Fine-Tuning</option>
<option value="llama_vision">Llama Vision Fine-Tuning</option>
</select>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-1">Session (optional)</label>
<select
value={selectedSession || ''}
onChange={(e) => setSelectedSession(e.target.value || null)}
className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-primary-500"
>
<option value="">Alle Sessions</option>
{sessions.map((session) => (
<option key={session.id} value={session.id}>{session.name}</option>
))}
</select>
</div>
<button
onClick={handleExport}
disabled={exporting || (stats?.exportable_items || 0) === 0}
className="w-full px-4 py-2 bg-primary-600 text-white rounded-lg hover:bg-primary-700 disabled:opacity-50"
>
{exporting ? 'Exportiere...' : `${stats?.exportable_items || 0} Samples exportieren`}
</button>
{/* Cross-Link to Magic Help for TrOCR Fine-Tuning */}
{exportFormat === 'trocr' && (stats?.exportable_items || 0) > 0 && (
<Link
href="/ai/magic-help?source=ocr-labeling"
className="w-full mt-3 px-4 py-2 bg-purple-100 text-purple-700 border border-purple-300 rounded-lg hover:bg-purple-200 flex items-center justify-center gap-2 transition-colors"
>
<span></span>
Mit Magic Help testen & fine-tunen
</Link>
)}
</div>
</div>
{exportResult && (
<div className="bg-white rounded-lg shadow p-6">
<h3 className="text-lg font-semibold mb-4">Export-Ergebnis</h3>
<div className="bg-green-50 border border-green-200 rounded-lg p-4 mb-4">
<p className="text-green-800">
{exportResult.exported_count} Samples erfolgreich exportiert
</p>
<p className="text-sm text-green-600">
Batch: {exportResult.batch_id}
</p>
</div>
<div className="bg-slate-50 p-4 rounded-lg overflow-auto max-h-64">
<pre className="text-xs">{JSON.stringify(exportResult.samples?.slice(0, 3), null, 2)}</pre>
{(exportResult.samples?.length || 0) > 3 && (
<p className="text-slate-500 mt-2">... und {exportResult.samples.length - 3} weitere</p>
)}
</div>
</div>
)}
</div>
)
}
return (
<div className="p-6">
@@ -939,10 +56,10 @@ export default function OCRLabelingPage() {
 <AIModuleSidebarResponsive currentModule="ocr-labeling" />
 {/* Error Toast */}
-{error && (
+{hook.error && (
 <div className="fixed top-4 right-4 bg-red-100 border border-red-400 text-red-700 px-4 py-3 rounded z-50">
-<span>{error}</span>
-<button onClick={() => setError(null)} className="ml-4">X</button>
+<span>{hook.error}</span>
+<button onClick={() => hook.setError(null)} className="ml-4">X</button>
 </div>
 )}
@@ -969,17 +86,58 @@ export default function OCRLabelingPage() {
 </div>
 {/* Tab Content */}
-{loading ? (
+{hook.loading ? (
 <div className="flex items-center justify-center h-64">
 <div className="animate-spin rounded-full h-8 w-8 border-b-2 border-primary-600" />
 </div>
 ) : (
 <>
-{activeTab === 'labeling' && renderLabelingTab()}
-{activeTab === 'sessions' && renderSessionsTab()}
-{activeTab === 'upload' && renderUploadTab()}
-{activeTab === 'stats' && renderStatsTab()}
-{activeTab === 'export' && renderExportTab()}
+{activeTab === 'labeling' && (
+<LabelingTab
+queue={hook.queue}
+currentItem={hook.currentItem}
+currentIndex={hook.currentIndex}
+correctedText={hook.correctedText}
+setCorrectedText={hook.setCorrectedText}
+goToNext={hook.goToNext}
+goToPrev={hook.goToPrev}
+selectQueueItem={hook.selectQueueItem}
+confirmItem={hook.confirmItem}
+correctItem={hook.correctItem}
+skipItem={hook.skipItem}
+/>
+)}
+{activeTab === 'sessions' && (
+<SessionsTab
+sessions={hook.sessions}
+selectedSession={hook.selectedSession}
+setSelectedSession={hook.setSelectedSession}
+fetchSessions={hook.fetchSessions}
+setError={hook.setError}
+/>
+)}
+{activeTab === 'upload' && (
+<UploadTab
+sessions={hook.sessions}
+selectedSession={hook.selectedSession}
+setSelectedSession={hook.setSelectedSession}
+fetchQueue={hook.fetchQueue}
+fetchStats={hook.fetchStats}
+setError={hook.setError}
+/>
+)}
+{activeTab === 'stats' && (
+<StatsTab stats={hook.stats} />
+)}
+{activeTab === 'export' && (
+<ExportTab
+sessions={hook.sessions}
+selectedSession={hook.selectedSession}
+setSelectedSession={hook.setSelectedSession}
+stats={hook.stats}
+setError={hook.setError}
+/>
+)}
 </>
 )}
 </div>
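
Reviewer note: the removed `renderStatsTab` computes the progress ratio `labeled_items / total_items` inline in two places, guarded only by the surrounding `stats?.total_items` check. The sketch below shows one way an extracted `StatsTab` might centralize that calculation. The name `progressPercent` and the clamping behaviour are assumptions for illustration; the commit's actual `StatsTab` implementation is not shown in this diff.

```typescript
// Hypothetical helper (not part of the commit): converts labeled/total
// counts into a whole-number percentage that is safe to use in a CSS width.
function progressPercent(labeled: number, total: number): number {
  // Guard against missing stats, division by zero, and non-numeric input.
  if (!Number.isFinite(labeled) || !Number.isFinite(total) || total <= 0) {
    return 0
  }
  // Clamp to [0, 1] so an inconsistent backend count cannot overflow the bar.
  const ratio = Math.min(Math.max(labeled / total, 0), 1)
  return Math.round(ratio * 100)
}
```

With a helper of this shape, the bar width and the "% abgeschlossen" label would read from the same value and could not disagree through rounding.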


@@ -0,0 +1,255 @@
'use client'
/**
* Custom hook encapsulating all state and API logic for the OCR Labeling page.
*/
import { useState, useEffect, useCallback } from 'react'
import { API_BASE } from './constants'
import type { OCRSession, OCRItem, OCRStats } from './types'
export function useOcrLabeling() {
const [sessions, setSessions] = useState<OCRSession[]>([])
const [selectedSession, setSelectedSession] = useState<string | null>(null)
const [queue, setQueue] = useState<OCRItem[]>([])
const [currentItem, setCurrentItem] = useState<OCRItem | null>(null)
const [currentIndex, setCurrentIndex] = useState(0)
const [stats, setStats] = useState<OCRStats | null>(null)
const [loading, setLoading] = useState(true)
const [error, setError] = useState<string | null>(null)
const [correctedText, setCorrectedText] = useState('')
const [labelStartTime, setLabelStartTime] = useState<number | null>(null)
// Fetch sessions
const fetchSessions = useCallback(async () => {
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/sessions`)
if (res.ok) {
const data = await res.json()
setSessions(data)
}
} catch (err) {
console.error('Failed to fetch sessions:', err)
}
}, [])
// Fetch queue
const fetchQueue = useCallback(async () => {
try {
const url = selectedSession
? `${API_BASE}/api/v1/ocr-label/queue?session_id=${selectedSession}&limit=20`
: `${API_BASE}/api/v1/ocr-label/queue?limit=20`
const res = await fetch(url)
if (res.ok) {
const data = await res.json()
setQueue(data)
if (data.length > 0 && !currentItem) {
setCurrentItem(data[0])
setCurrentIndex(0)
setCorrectedText(data[0].ocr_text || '')
setLabelStartTime(Date.now())
}
}
} catch (err) {
console.error('Failed to fetch queue:', err)
}
}, [selectedSession, currentItem])
// Fetch stats
const fetchStats = useCallback(async () => {
try {
const url = selectedSession
? `${API_BASE}/api/v1/ocr-label/stats?session_id=${selectedSession}`
: `${API_BASE}/api/v1/ocr-label/stats`
const res = await fetch(url)
if (res.ok) {
const data = await res.json()
setStats(data)
}
} catch (err) {
console.error('Failed to fetch stats:', err)
}
}, [selectedSession])
// Initial data load
useEffect(() => {
const loadData = async () => {
setLoading(true)
await Promise.all([fetchSessions(), fetchQueue(), fetchStats()])
setLoading(false)
}
loadData()
}, [fetchSessions, fetchQueue, fetchStats])
// Refresh queue when session changes
useEffect(() => {
setCurrentItem(null)
setCurrentIndex(0)
fetchQueue()
fetchStats()
}, [selectedSession, fetchQueue, fetchStats])
// Navigate to next item
const goToNext = useCallback(() => {
if (currentIndex < queue.length - 1) {
const nextIndex = currentIndex + 1
setCurrentIndex(nextIndex)
setCurrentItem(queue[nextIndex])
setCorrectedText(queue[nextIndex].ocr_text || '')
setLabelStartTime(Date.now())
} else {
fetchQueue()
}
}, [currentIndex, queue, fetchQueue])
// Navigate to previous item
const goToPrev = useCallback(() => {
if (currentIndex > 0) {
const prevIndex = currentIndex - 1
setCurrentIndex(prevIndex)
setCurrentItem(queue[prevIndex])
setCorrectedText(queue[prevIndex].ocr_text || '')
setLabelStartTime(Date.now())
}
}, [currentIndex, queue])
// Calculate label time
const getLabelTime = useCallback((): number | undefined => {
if (!labelStartTime) return undefined
return Math.round((Date.now() - labelStartTime) / 1000)
}, [labelStartTime])
// Select a queue item by index
const selectQueueItem = useCallback((idx: number) => {
if (idx >= 0 && idx < queue.length) {
setCurrentIndex(idx)
setCurrentItem(queue[idx])
setCorrectedText(queue[idx].ocr_text || '')
setLabelStartTime(Date.now())
}
}, [queue])
// Confirm item
const confirmItem = useCallback(async () => {
if (!currentItem) return
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/confirm`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
item_id: currentItem.id,
label_time_seconds: getLabelTime(),
}),
})
if (res.ok) {
setQueue(prev => prev.filter(item => item.id !== currentItem.id))
goToNext()
fetchStats()
} else {
setError('Bestaetigung fehlgeschlagen')
}
} catch {
setError('Netzwerkfehler')
}
}, [currentItem, getLabelTime, goToNext, fetchStats])
// Correct item
const correctItem = useCallback(async () => {
if (!currentItem || !correctedText.trim()) return
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/correct`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
item_id: currentItem.id,
ground_truth: correctedText.trim(),
label_time_seconds: getLabelTime(),
}),
})
if (res.ok) {
setQueue(prev => prev.filter(item => item.id !== currentItem.id))
goToNext()
fetchStats()
} else {
setError('Korrektur fehlgeschlagen')
}
} catch {
setError('Netzwerkfehler')
}
}, [currentItem, correctedText, getLabelTime, goToNext, fetchStats])
// Skip item
const skipItem = useCallback(async () => {
if (!currentItem) return
try {
const res = await fetch(`${API_BASE}/api/v1/ocr-label/skip`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ item_id: currentItem.id }),
})
if (res.ok) {
setQueue(prev => prev.filter(item => item.id !== currentItem.id))
goToNext()
fetchStats()
} else {
setError('Ueberspringen fehlgeschlagen')
}
} catch {
setError('Netzwerkfehler')
}
}, [currentItem, goToNext, fetchStats])
// Keyboard shortcuts
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
if (e.target instanceof HTMLTextAreaElement) return
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault()
confirmItem()
} else if (e.key === 'ArrowRight') {
goToNext()
} else if (e.key === 'ArrowLeft') {
goToPrev()
} else if (e.key === 's' && !e.ctrlKey && !e.metaKey) {
skipItem()
}
}
window.addEventListener('keydown', handleKeyDown)
return () => window.removeEventListener('keydown', handleKeyDown)
}, [confirmItem, goToNext, goToPrev, skipItem])
return {
// State
sessions,
selectedSession,
setSelectedSession,
queue,
currentItem,
currentIndex,
stats,
loading,
error,
setError,
correctedText,
setCorrectedText,
// Actions
fetchSessions,
fetchQueue,
fetchStats,
goToNext,
goToPrev,
selectQueueItem,
confirmItem,
correctItem,
skipItem,
}
}
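
Reviewer note: in the hook above, `confirmItem`, `correctItem` and `skipItem` each remove the handled item with `setQueue(prev => prev.filter(...))` and then call `goToNext()`. Because `goToNext` reads the `queue` and `currentIndex` captured before that removal, the stored index and the displayed item may drift apart once the filtered queue is rendered. The sketch below is one possible way to compute both in a single step. `removeAndAdvance`, `QueueItem` and `QueueState` are hypothetical names introduced here for illustration and are not part of the commit.

```typescript
// Hypothetical types and helper, not part of the commit.
interface QueueItem {
  id: string
  ocr_text?: string | null
}

interface QueueState<T extends QueueItem> {
  queue: T[]
  currentIndex: number
  currentItem: T | null
  // True when the local queue is exhausted and the caller should fetch more.
  needsRefetch: boolean
}

// Removes the handled item and selects its successor from the same,
// already-filtered array, so index and item cannot disagree.
function removeAndAdvance<T extends QueueItem>(
  queue: T[],
  currentIndex: number,
  handledId: string,
): QueueState<T> {
  const remaining = queue.filter((item) => item.id !== handledId)
  if (remaining.length === 0) {
    return { queue: remaining, currentIndex: 0, currentItem: null, needsRefetch: true }
  }
  // After removal the successor slides into the handled item's slot, so the
  // index stays put unless it now points past the end of the array.
  const nextIndex = Math.min(currentIndex, remaining.length - 1)
  return {
    queue: remaining,
    currentIndex: nextIndex,
    currentItem: remaining[nextIndex] ?? null,
    needsRefetch: false,
  }
}
```

A caller would apply the returned `queue`, `currentIndex` and `currentItem` together and trigger `fetchQueue()` only when `needsRefetch` is set. Keeping the function pure also makes the navigation logic testable without rendering the hook.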


@@ -1,751 +0,0 @@
'use client'
import { useCallback, useEffect, useState, useRef } from 'react'
import { useSearchParams } from 'next/navigation'
import { PagePurpose } from '@/components/common/PagePurpose'
import { PipelineStepper } from '@/components/ocr-pipeline/PipelineStepper'
import { StepOrientation } from '@/components/ocr-pipeline/StepOrientation'
import { StepDeskew } from '@/components/ocr-pipeline/StepDeskew'
import { StepDewarp } from '@/components/ocr-pipeline/StepDewarp'
import { StepCrop } from '@/components/ocr-pipeline/StepCrop'
import { StepStructureDetection } from '@/components/ocr-pipeline/StepStructureDetection'
import { StepRowDetection } from '@/components/ocr-pipeline/StepRowDetection'
import { StepWordRecognition } from '@/components/ocr-pipeline/StepWordRecognition'
import { OverlayReconstruction } from '@/components/ocr-overlay/OverlayReconstruction'
import { PaddleDirectStep } from '@/components/ocr-overlay/PaddleDirectStep'
import { GridEditor } from '@/components/grid-editor/GridEditor'
import { StepGridReview } from '@/components/ocr-pipeline/StepGridReview'
import { BoxSessionTabs } from '@/components/ocr-pipeline/BoxSessionTabs'
import { OVERLAY_PIPELINE_STEPS, PADDLE_DIRECT_STEPS, KOMBI_STEPS, DOCUMENT_CATEGORIES, dbStepToOverlayUi, type PipelineStep, type SessionListItem, type DocumentCategory } from './types'
import type { SubSession } from '../ocr-pipeline/types'
const KLAUSUR_API = '/klausur-api'
export default function OcrOverlayPage() {
const [mode, setMode] = useState<'pipeline' | 'paddle-direct' | 'kombi'>('pipeline')
const [currentStep, setCurrentStep] = useState(0)
const [sessionId, setSessionId] = useState<string | null>(null)
const [sessionName, setSessionName] = useState<string>('')
const [sessions, setSessions] = useState<SessionListItem[]>([])
const [loadingSessions, setLoadingSessions] = useState(true)
const [editingName, setEditingName] = useState<string | null>(null)
const [editNameValue, setEditNameValue] = useState('')
const [editingCategory, setEditingCategory] = useState<string | null>(null)
const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
const [editingActiveCategory, setEditingActiveCategory] = useState(false)
const [subSessions, setSubSessions] = useState<SubSession[]>([])
const [parentSessionId, setParentSessionId] = useState<string | null>(null)
const [isGroundTruth, setIsGroundTruth] = useState(false)
const [gtSaving, setGtSaving] = useState(false)
const [gtMessage, setGtMessage] = useState('')
const [steps, setSteps] = useState<PipelineStep[]>(
OVERLAY_PIPELINE_STEPS.map((s, i) => ({
...s,
status: i === 0 ? 'active' : 'pending',
})),
)
const searchParams = useSearchParams()
const deepLinkHandled = useRef(false)
const gridSaveRef = useRef<(() => Promise<void>) | null>(null)
useEffect(() => {
loadSessions()
}, [])
const loadSessions = async () => {
setLoadingSessions(true)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
if (res.ok) {
const data = await res.json()
// Filter to only show top-level sessions (no sub-sessions)
setSessions((data.sessions || []).filter((s: SessionListItem) => !s.parent_session_id))
}
} catch (e) {
console.error('Failed to load sessions:', e)
} finally {
setLoadingSessions(false)
}
}
const openSession = useCallback(async (sid: string, keepSubSessions?: boolean) => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (!res.ok) return
const data = await res.json()
setSessionId(sid)
setSessionName(data.name || data.filename || '')
setActiveCategory(data.document_category || undefined)
setIsGroundTruth(!!data.ground_truth?.build_grid_reference)
setGtMessage('')
// Sub-session handling
if (data.sub_sessions && data.sub_sessions.length > 0) {
setSubSessions(data.sub_sessions)
setParentSessionId(sid)
} else if (data.parent_session_id) {
setParentSessionId(data.parent_session_id)
} else if (!keepSubSessions) {
setSubSessions([])
setParentSessionId(null)
}
const isSubSession = !!data.parent_session_id
// Mode detection for root sessions with word_result
const ocrEngine = data.word_result?.ocr_engine
const isPaddleDirect = ocrEngine === 'paddle_direct'
const isKombi = ocrEngine === 'kombi' || ocrEngine === 'rapid_kombi'
let activeMode = mode // keep current mode for sub-sessions
if (!isSubSession && (isPaddleDirect || isKombi)) {
activeMode = isKombi ? 'kombi' : 'paddle-direct'
setMode(activeMode)
} else if (!isSubSession && !ocrEngine) {
// Unprocessed root session: keep the user's selected mode
activeMode = mode
}
const baseSteps = activeMode === 'kombi' ? KOMBI_STEPS
: activeMode === 'paddle-direct' ? PADDLE_DIRECT_STEPS
: OVERLAY_PIPELINE_STEPS
// Determine UI step
let uiStep: number
const skipIds: string[] = []
if (!isSubSession && (isPaddleDirect || isKombi)) {
const hasGrid = isKombi && data.grid_editor_result
const hasStructure = isKombi && data.structure_result
uiStep = hasGrid ? 6 : hasStructure ? 6 : data.word_result ? 5 : 4
if (isPaddleDirect) uiStep = data.word_result ? 4 : 4
} else {
const dbStep = data.current_step || 1
if (dbStep <= 2) uiStep = 0
else if (dbStep === 3) uiStep = 1
else if (dbStep === 4) uiStep = 2
else if (dbStep === 5) uiStep = 3
else uiStep = 4
// Sub-session skip logic
if (isSubSession) {
if (dbStep >= 5) {
skipIds.push('orientation', 'deskew', 'dewarp', 'crop')
if (uiStep < 4) uiStep = 4
} else if (dbStep >= 2) {
skipIds.push('orientation')
if (uiStep < 1) uiStep = 1 // advance past skipped orientation to deskew
}
}
}
setSteps(
baseSteps.map((s, i) => ({
...s,
status: skipIds.includes(s.id)
? 'skipped'
: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
})),
)
setCurrentStep(uiStep)
} catch (e) {
console.error('Failed to open session:', e)
}
}, [mode])
// Handle deep-link: ?session=xxx&mode=kombi (from GT Queue page)
useEffect(() => {
if (deepLinkHandled.current) return
const urlSession = searchParams.get('session')
const urlMode = searchParams.get('mode')
if (urlSession) {
deepLinkHandled.current = true
if (urlMode === 'kombi' || urlMode === 'paddle-direct') {
setMode(urlMode)
const baseSteps = urlMode === 'kombi' ? KOMBI_STEPS : PADDLE_DIRECT_STEPS
setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}
openSession(urlSession)
}
}, [searchParams, openSession])
const deleteSession = useCallback(async (sid: string) => {
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
setSessions((prev) => prev.filter((s) => s.id !== sid))
if (sessionId === sid) {
setSessionId(null)
setCurrentStep(0)
setSubSessions([])
setParentSessionId(null)
const baseSteps = mode === 'kombi' ? KOMBI_STEPS : mode === 'paddle-direct' ? PADDLE_DIRECT_STEPS : OVERLAY_PIPELINE_STEPS
setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}
} catch (e) {
console.error('Failed to delete session:', e)
}
}, [sessionId, mode])
const renameSession = useCallback(async (sid: string, newName: string) => {
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: newName }),
})
setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, name: newName } : s)))
if (sessionId === sid) setSessionName(newName)
} catch (e) {
console.error('Failed to rename session:', e)
}
setEditingName(null)
}, [sessionId])
const updateCategory = useCallback(async (sid: string, category: DocumentCategory) => {
try {
await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ document_category: category }),
})
setSessions((prev) => prev.map((s) => (s.id === sid ? { ...s, document_category: category } : s)))
if (sessionId === sid) setActiveCategory(category)
} catch (e) {
console.error('Failed to update category:', e)
}
setEditingCategory(null)
}, [sessionId])
const handleStepClick = (index: number) => {
if (index <= currentStep || steps[index].status === 'completed') {
setCurrentStep(index)
}
}
const goToStep = (step: number) => {
setCurrentStep(step)
setSteps((prev) =>
prev.map((s, i) => ({
...s,
status: i < step ? 'completed' : i === step ? 'active' : 'pending',
})),
)
}
const handleNext = () => {
if (currentStep >= steps.length - 1) {
// Sub-session completed — switch back to parent
if (parentSessionId && sessionId !== parentSessionId) {
setSubSessions((prev) =>
prev.map((s) => s.id === sessionId ? { ...s, status: 'completed', current_step: 10 } : s)
)
handleSessionChange(parentSessionId)
return
}
// Last step completed — return to session list
const baseSteps = mode === 'kombi' ? KOMBI_STEPS : mode === 'paddle-direct' ? PADDLE_DIRECT_STEPS : OVERLAY_PIPELINE_STEPS
setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
setCurrentStep(0)
setSessionId(null)
setSubSessions([])
setParentSessionId(null)
loadSessions()
return
}
const nextStep = currentStep + 1
setSteps((prev) =>
prev.map((s, i) => {
if (i === currentStep) return { ...s, status: 'completed' }
if (i === nextStep) return { ...s, status: 'active' }
return s
}),
)
setCurrentStep(nextStep)
}
const handleOrientationComplete = async (sid: string) => {
setSessionId(sid)
loadSessions()
// Check for page-split sub-sessions directly from API
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (res.ok) {
const data = await res.json()
if (data.sub_sessions?.length > 0) {
const subs: SubSession[] = data.sub_sessions.map((s: SubSession) => ({
id: s.id,
name: s.name,
box_index: s.box_index,
current_step: s.current_step,
}))
setSubSessions(subs)
setParentSessionId(sid)
openSession(subs[0].id, true)
return
}
}
} catch (e) {
console.error('Failed to check for sub-sessions:', e)
}
handleNext()
}
const handleBoxSessionsCreated = useCallback((subs: SubSession[]) => {
setSubSessions(subs)
if (sessionId) setParentSessionId(sessionId)
}, [sessionId])
const handleSessionChange = useCallback((newSessionId: string) => {
openSession(newSessionId, true)
}, [openSession])
const handleNewSession = () => {
setSessionId(null)
setSessionName('')
setCurrentStep(0)
setSubSessions([])
setParentSessionId(null)
const baseSteps = mode === 'kombi' ? KOMBI_STEPS : mode === 'paddle-direct' ? PADDLE_DIRECT_STEPS : OVERLAY_PIPELINE_STEPS
setSteps(baseSteps.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}
const stepNames: Record<number, string> = {
1: 'Orientierung',
2: 'Begradigung',
3: 'Entzerrung',
4: 'Zuschneiden',
5: 'Zeilen',
6: 'Woerter',
7: 'Overlay',
}
const reprocessFromStep = useCallback(async (uiStep: number) => {
if (!sessionId) return
// Map overlay UI step to DB step
const dbStepMap: Record<number, number> = { 0: 2, 1: 3, 2: 4, 3: 5, 4: 7, 5: 8, 6: 9 }
const dbStep = dbStepMap[uiStep] || uiStep + 1
if (!confirm(`Ab Schritt ${uiStep + 1} (${stepNames[uiStep + 1] || '?'}) neu verarbeiten?`)) return
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ from_step: dbStep }),
})
if (!res.ok) {
const data = await res.json().catch(() => ({}))
console.error('Reprocess failed:', data.detail || res.status)
return
}
goToStep(uiStep)
} catch (e) {
console.error('Reprocess error:', e)
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [sessionId, goToStep])
const handleMarkGroundTruth = async () => {
if (!sessionId) return
setGtSaving(true)
setGtMessage('')
try {
// Auto-save grid editor before marking GT (so DB has latest edits)
if (gridSaveRef.current) {
await gridSaveRef.current()
}
const resp = await fetch(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/mark-ground-truth?pipeline=${mode}`,
{ method: 'POST' }
)
if (!resp.ok) {
const body = await resp.text().catch(() => '')
throw new Error(`Ground Truth fehlgeschlagen (${resp.status}): ${body}`)
}
const data = await resp.json()
setIsGroundTruth(true)
setGtMessage(`Ground Truth gespeichert (${data.cells_saved} Zellen)`)
setTimeout(() => setGtMessage(''), 5000)
} catch (e) {
setGtMessage(e instanceof Error ? e.message : String(e))
} finally {
setGtSaving(false)
}
}
const isLastStep = currentStep === steps.length - 1
const showGtButton = isLastStep && sessionId != null
const renderStep = () => {
if (mode === 'paddle-direct' || mode === 'kombi') {
switch (currentStep) {
case 0:
return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSessionList={() => { loadSessions(); setSessionId(null) }} />
case 1:
return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
case 2:
return <StepDewarp key={sessionId} sessionId={sessionId} onNext={handleNext} />
case 3:
return <StepCrop key={sessionId} sessionId={sessionId} onNext={handleNext} />
case 4:
if (mode === 'kombi') {
return (
<PaddleDirectStep
sessionId={sessionId}
onNext={handleNext}
endpoint="paddle-kombi"
title="Kombi-Modus"
description="PP-OCRv5 und Tesseract laufen parallel. Koordinaten werden gewichtet gemittelt fuer optimale Positionierung."
icon="🔀"
buttonLabel="PP-OCRv5 + Tesseract starten"
runningLabel="PP-OCRv5 + Tesseract laufen..."
engineKey="kombi"
/>
)
}
return <PaddleDirectStep sessionId={sessionId} onNext={handleNext} />
case 5:
return mode === 'kombi' ? (
<StepStructureDetection sessionId={sessionId} onNext={handleNext} />
) : null
case 6:
return mode === 'kombi' ? (
<StepGridReview sessionId={sessionId} onNext={handleNext} saveRef={gridSaveRef} />
) : null
default:
return null
}
}
switch (currentStep) {
case 0:
return <StepOrientation key={sessionId} sessionId={sessionId} onNext={handleOrientationComplete} onSessionList={() => { loadSessions(); setSessionId(null) }} />
case 1:
return <StepDeskew key={sessionId} sessionId={sessionId} onNext={handleNext} />
case 2:
return <StepDewarp key={sessionId} sessionId={sessionId} onNext={handleNext} />
case 3:
return <StepCrop key={sessionId} sessionId={sessionId} onNext={handleNext} />
case 4:
return <StepRowDetection sessionId={sessionId} onNext={handleNext} />
case 5:
return <StepWordRecognition sessionId={sessionId} onNext={handleNext} goToStep={goToStep} skipHealGaps />
case 6:
return <OverlayReconstruction sessionId={sessionId} onNext={handleNext} />
default:
return null
}
}
return (
<div className="space-y-6">
<PagePurpose
title="OCR Overlay"
purpose="Ganzseitige Overlay-Rekonstruktion: Scan begradigen, Zeilen und Woerter erkennen, dann pixelgenau ueber das Bild legen. Ohne Spaltenerkennung — ideal fuer Arbeitsblaetter."
audience={['Entwickler']}
architecture={{
services: ['klausur-service (FastAPI)', 'OpenCV', 'Tesseract'],
databases: ['PostgreSQL Sessions'],
}}
relatedPages={[
{ name: 'OCR Pipeline', href: '/ai/ocr-pipeline', description: 'Volle Pipeline mit Spalten' },
{ name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'Methoden-Vergleich' },
]}
defaultCollapsed
/>
{/* Session List */}
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4">
<div className="flex items-center justify-between mb-3">
<h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Sessions ({sessions.length})
</h3>
<button
onClick={handleNewSession}
className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
>
+ Neue Session
</button>
</div>
{loadingSessions ? (
<div className="text-sm text-gray-400 py-2">Lade Sessions...</div>
) : sessions.length === 0 ? (
<div className="text-sm text-gray-400 py-2">Noch keine Sessions vorhanden.</div>
) : (
<div className="space-y-1.5 max-h-[320px] overflow-y-auto">
{sessions.map((s) => {
const catInfo = DOCUMENT_CATEGORIES.find(c => c.value === s.document_category)
return (
<div
key={s.id}
className={`relative flex items-start gap-3 px-3 py-2.5 rounded-lg text-sm transition-colors cursor-pointer ${
sessionId === s.id
? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
: 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
}`}
>
{/* Thumbnail */}
<div
className="flex-shrink-0 w-12 h-12 rounded-md overflow-hidden bg-gray-100 dark:bg-gray-700"
onClick={() => openSession(s.id)}
>
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${s.id}/thumbnail?size=96`}
alt=""
className="w-full h-full object-cover"
loading="lazy"
onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
/>
</div>
{/* Info */}
<div className="flex-1 min-w-0" onClick={() => openSession(s.id)}>
{editingName === s.id ? (
<input
autoFocus
value={editNameValue}
onChange={(e) => setEditNameValue(e.target.value)}
onBlur={() => renameSession(s.id, editNameValue)}
onKeyDown={(e) => {
if (e.key === 'Enter') renameSession(s.id, editNameValue)
if (e.key === 'Escape') setEditingName(null)
}}
onClick={(e) => e.stopPropagation()}
className="w-full px-1 py-0.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600"
/>
) : (
<div className="truncate font-medium text-gray-700 dark:text-gray-300">
{s.name || s.filename}
</div>
)}
<button
onClick={(e) => {
e.stopPropagation()
navigator.clipboard.writeText(s.id)
const btn = e.currentTarget
btn.textContent = 'Kopiert!'
setTimeout(() => { btn.textContent = `ID: ${s.id.slice(0, 8)}` }, 1500)
}}
className="text-[10px] font-mono text-gray-400 hover:text-teal-500 transition-colors"
title={`Volle ID: ${s.id} — Klick zum Kopieren`}
>
ID: {s.id.slice(0, 8)}
</button>
<div className="text-xs text-gray-400 flex gap-2 mt-0.5">
<span>{new Date(s.created_at).toLocaleDateString('de-DE', { day: '2-digit', month: '2-digit', year: '2-digit', hour: '2-digit', minute: '2-digit' })}</span>
</div>
</div>
{/* Category Badge */}
<div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
<button
onClick={() => setEditingCategory(editingCategory === s.id ? null : s.id)}
className={`text-[10px] px-1.5 py-0.5 rounded-full border transition-colors ${
catInfo
? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
: 'bg-gray-50 dark:bg-gray-700 border-gray-200 dark:border-gray-600 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300'
}`}
title="Kategorie setzen"
>
{catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
</button>
</div>
{/* Actions */}
<div className="flex flex-col gap-0.5 flex-shrink-0">
<button
onClick={(e) => {
e.stopPropagation()
setEditNameValue(s.name || s.filename)
setEditingName(s.id)
}}
className="p-1 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300"
title="Umbenennen"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
</button>
<button
onClick={(e) => {
e.stopPropagation()
if (confirm('Session loeschen?')) deleteSession(s.id)
}}
className="p-1 text-gray-400 hover:text-red-500"
title="Loeschen"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
</svg>
</button>
</div>
{/* Category dropdown */}
{editingCategory === s.id && (
<div
className="absolute right-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64"
onClick={(e) => e.stopPropagation()}
>
{DOCUMENT_CATEGORIES.map((cat) => (
<button
key={cat.value}
onClick={() => updateCategory(s.id, cat.value)}
className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
s.document_category === cat.value
? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
: 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}
>
{cat.icon} {cat.label}
</button>
))}
</div>
)}
</div>
)
})}
</div>
)}
</div>
{/* Active session info + category picker */}
{sessionId && sessionName && (
<div className="relative flex items-center gap-3 text-sm text-gray-500 dark:text-gray-400">
<span>Aktive Session: <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span></span>
<button
onClick={() => setEditingActiveCategory(!editingActiveCategory)}
className={`text-xs px-2.5 py-1 rounded-full border transition-colors ${
activeCategory
? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300 hover:bg-teal-100 dark:hover:bg-teal-900/50'
: 'bg-amber-50 dark:bg-amber-900/20 border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300 hover:bg-amber-100 dark:hover:bg-amber-900/40 animate-pulse'
}`}
>
{activeCategory ? (() => {
const cat = DOCUMENT_CATEGORIES.find(c => c.value === activeCategory)
return cat ? `${cat.icon} ${cat.label}` : activeCategory
})() : 'Kategorie setzen'}
</button>
{isGroundTruth && (
<span className="text-xs px-2 py-0.5 rounded-full bg-amber-50 dark:bg-amber-900/20 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
GT
</span>
)}
{editingActiveCategory && (
<div className="absolute left-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64">
{DOCUMENT_CATEGORIES.map((cat) => (
<button
key={cat.value}
onClick={() => {
updateCategory(sessionId, cat.value)
setEditingActiveCategory(false)
}}
className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
activeCategory === cat.value
? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
: 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}
>
{cat.icon} {cat.label}
</button>
))}
</div>
)}
</div>
)}
{/* Mode Toggle */}
<div className="flex items-center gap-1 bg-gray-100 dark:bg-gray-800 rounded-lg p-1 w-fit">
<button
onClick={() => {
if (mode === 'pipeline') return
setMode('pipeline')
setCurrentStep(0)
setSessionId(null)
setSteps(OVERLAY_PIPELINE_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}}
className={`px-3 py-1.5 text-xs font-medium rounded-md transition-colors ${
mode === 'pipeline'
? 'bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200 shadow-sm'
: 'text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-300'
}`}
>
Pipeline (7 Schritte)
</button>
<button
onClick={() => {
if (mode === 'paddle-direct') return
setMode('paddle-direct')
setCurrentStep(0)
setSessionId(null)
setSteps(PADDLE_DIRECT_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}}
className={`px-3 py-1.5 text-xs font-medium rounded-md transition-colors ${
mode === 'paddle-direct'
? 'bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200 shadow-sm'
: 'text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-300'
}`}
>
PP-OCRv5 Direct (5 Schritte)
</button>
<button
onClick={() => {
if (mode === 'kombi') return
setMode('kombi')
setCurrentStep(0)
setSessionId(null)
setSteps(KOMBI_STEPS.map((s, i) => ({ ...s, status: i === 0 ? 'active' : 'pending' })))
}}
className={`px-3 py-1.5 text-xs font-medium rounded-md transition-colors ${
mode === 'kombi'
? 'bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200 shadow-sm'
: 'text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-300'
}`}
>
Kombi (7 Schritte)
</button>
</div>
<PipelineStepper
steps={steps}
currentStep={currentStep}
onStepClick={handleStepClick}
onReprocess={mode === 'pipeline' && sessionId != null ? reprocessFromStep : undefined}
/>
{subSessions.length > 0 && parentSessionId && sessionId && (
<BoxSessionTabs
parentSessionId={parentSessionId}
subSessions={subSessions}
activeSessionId={sessionId}
onSessionChange={handleSessionChange}
/>
)}
<div className="min-h-[400px]">{renderStep()}</div>
{/* Ground Truth button bar — visible on last step */}
{showGtButton && (
<div className="sticky bottom-0 bg-white dark:bg-gray-900 border-t dark:border-gray-700 py-3 px-4 -mx-1 flex items-center justify-between rounded-b-xl">
<div className="text-sm text-gray-500 dark:text-gray-400">
{gtMessage && (
<span className={gtMessage.includes('fehlgeschlagen') ? 'text-red-500' : 'text-amber-600 dark:text-amber-400'}>
{gtMessage}
</span>
)}
</div>
<button
onClick={handleMarkGroundTruth}
disabled={gtSaving}
className="px-4 py-2 text-sm bg-amber-600 text-white rounded hover:bg-amber-700 disabled:opacity-50"
>
{gtSaving ? 'Speichere...' : isGroundTruth ? 'Ground Truth aktualisieren' : 'Als Ground Truth markieren'}
</button>
</div>
)}
</div>
)
}
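
The three mode-toggle handlers in the file above repeat the same reset (step 0, no session, first step active). A minimal sketch of that reset as one pure function follows. It assumes shortened step lists, and the names `OverlayState`, `STEPS_BY_MODE`, `resetSteps` and `switchMode` are hypothetical and do not appear in the original file.

```typescript
// Sketch only: step lists are shortened and all helper names are hypothetical.
type StepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'
type Mode = 'pipeline' | 'paddle-direct' | 'kombi'

interface Step {
  id: string
  name: string
  status: StepStatus
}

interface OverlayState {
  mode: Mode
  currentStep: number
  sessionId: string | null
  steps: Step[]
}

const pending = (id: string, name: string): Step => ({ id, name, status: 'pending' })

// Shortened stand-ins for OVERLAY_PIPELINE_STEPS, PADDLE_DIRECT_STEPS and KOMBI_STEPS.
const STEPS_BY_MODE: Record<Mode, Step[]> = {
  pipeline: [pending('orientation', 'Orientierung'), pending('rows', 'Zeilen')],
  'paddle-direct': [pending('orientation', 'Orientierung'), pending('paddle-direct', 'PP-OCRv5 + Overlay')],
  kombi: [pending('orientation', 'Orientierung'), pending('kombi', 'PP-OCRv5 + Tesseract')],
}

// The first step becomes active, all later steps stay pending.
function resetSteps(template: Step[]): Step[] {
  return template.map((s, i): Step => ({ ...s, status: i === 0 ? 'active' : 'pending' }))
}

// Same effect as each onClick handler: no-op for the current mode, full reset otherwise.
function switchMode(state: OverlayState, next: Mode): OverlayState {
  if (state.mode === next) return state
  return { mode: next, currentStep: 0, sessionId: null, steps: resetSteps(STEPS_BY_MODE[next]) }
}

const before: OverlayState = {
  mode: 'pipeline',
  currentStep: 1,
  sessionId: 'abc',
  steps: resetSteps(STEPS_BY_MODE.pipeline),
}
const after = switchMode(before, 'kombi')
console.log(after.mode, after.currentStep, after.sessionId, after.steps.map(s => s.status))
```

Keeping the reset in one place would mean a new mode only needs an entry in the lookup table; whether that fits the component depends on state that is not visible in this listing.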


@@ -1,87 +0,0 @@
import type { PipelineStep } from '../ocr-pipeline/types'
// Re-export types used by overlay components
export type {
PipelineStep,
PipelineStepStatus,
SessionListItem,
SessionInfo,
DocumentCategory,
DocumentTypeResult,
OrientationResult,
CropResult,
DeskewResult,
DewarpResult,
RowResult,
RowItem,
GridResult,
GridCell,
OcrWordBox,
WordBbox,
ColumnMeta,
} from '../ocr-pipeline/types'
export { DOCUMENT_CATEGORIES } from '../ocr-pipeline/types'
/**
* 7-step pipeline for full-page overlay reconstruction.
* Skips: Spalten (columns), LLM-Review (Korrektur), Ground-Truth (Validierung)
*/
export const OVERLAY_PIPELINE_STEPS: PipelineStep[] = [
{ id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
{ id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
{ id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
{ id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
{ id: 'rows', name: 'Zeilen', icon: '📏', status: 'pending' },
{ id: 'words', name: 'Woerter', icon: '🔤', status: 'pending' },
{ id: 'reconstruction', name: 'Overlay', icon: '🏗️', status: 'pending' },
]
/** Map from overlay UI step index to DB step number (1-indexed) */
export const OVERLAY_UI_TO_DB: Record<number, number> = {
0: 2, // orientation
1: 3, // deskew
2: 4, // dewarp
3: 5, // crop
4: 6, // rows (dbStepToOverlayUi below treats DB 6 as columns and DB 7 as rows)
5: 7, // words
6: 9, // reconstruction
}
/**
* 5-step pipeline for Paddle Direct mode.
* Same preprocessing (orient/deskew/dewarp/crop), then PaddleOCR replaces rows+words+overlay.
*/
export const PADDLE_DIRECT_STEPS: PipelineStep[] = [
{ id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
{ id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
{ id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
{ id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
{ id: 'paddle-direct', name: 'PP-OCRv5 + Overlay', icon: '⚡', status: 'pending' },
]
/**
* 7-step pipeline for Kombi mode (PP-OCRv5 + Tesseract).
* Same preprocessing, then both engines run and results are merged, followed by structure detection and grid review.
*/
export const KOMBI_STEPS: PipelineStep[] = [
{ id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
{ id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
{ id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
{ id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
{ id: 'kombi', name: 'PP-OCRv5 + Tesseract', icon: '🔀', status: 'pending' },
{ id: 'structure', name: 'Struktur', icon: '🔍', status: 'pending' },
{ id: 'grid-editor', name: 'Review & GT', icon: '📊', status: 'pending' },
]
/** Map from DB step to overlay UI step index */
export function dbStepToOverlayUi(dbStep: number): number {
// DB: 1=start, 2=orient, 3=deskew, 4=dewarp, 5=crop, 6=columns, 7=rows, 8=words, 9=recon, 10=gt
if (dbStep <= 2) return 0 // orientation
if (dbStep === 3) return 1 // deskew
if (dbStep === 4) return 2 // dewarp
if (dbStep === 5) return 3 // crop
if (dbStep <= 7) return 4 // rows (skip columns)
if (dbStep === 8) return 5 // words
return 6 // reconstruction
}
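
The two step mappings in the file above are maintained by hand, so a round-trip check is one way to see whether they agree. The sketch below copies both mappings from the listing and adds a hypothetical helper, `roundTripMismatches`, that is not part of the original file. With the values as listed, UI index 5 (words) maps to DB step 7, which `dbStepToOverlayUi` maps back to index 4 (rows).

```typescript
// Copied from the listing above.
const OVERLAY_UI_TO_DB: Record<number, number> = { 0: 2, 1: 3, 2: 4, 3: 5, 4: 6, 5: 7, 6: 9 }

// Copied from the listing above.
function dbStepToOverlayUi(dbStep: number): number {
  if (dbStep <= 2) return 0
  if (dbStep === 3) return 1
  if (dbStep === 4) return 2
  if (dbStep === 5) return 3
  if (dbStep <= 7) return 4
  if (dbStep === 8) return 5
  return 6
}

// Hypothetical helper: UI indices whose DB step does not map back to the same index.
function roundTripMismatches(): number[] {
  return Object.keys(OVERLAY_UI_TO_DB)
    .map(Number)
    .filter(ui => dbStepToOverlayUi(OVERLAY_UI_TO_DB[ui]) !== ui)
}

console.log(roundTripMismatches()) // the only mismatching UI index is 5
```

The check does not say which side is correct; it only shows that the forward and reverse mappings disagree for the words step.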


@@ -1,443 +0,0 @@
'use client'
import { Suspense, useCallback, useEffect, useState } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'
import { PipelineStepper } from '@/components/ocr-pipeline/PipelineStepper'
import { StepOrientation } from '@/components/ocr-pipeline/StepOrientation'
import { StepCrop } from '@/components/ocr-pipeline/StepCrop'
import { StepDeskew } from '@/components/ocr-pipeline/StepDeskew'
import { StepDewarp } from '@/components/ocr-pipeline/StepDewarp'
import { StepStructureDetection } from '@/components/ocr-pipeline/StepStructureDetection'
import { StepColumnDetection } from '@/components/ocr-pipeline/StepColumnDetection'
import { StepRowDetection } from '@/components/ocr-pipeline/StepRowDetection'
import { StepWordRecognition } from '@/components/ocr-pipeline/StepWordRecognition'
import { StepLlmReview } from '@/components/ocr-pipeline/StepLlmReview'
import { StepReconstruction } from '@/components/ocr-pipeline/StepReconstruction'
import { StepGroundTruth } from '@/components/ocr-pipeline/StepGroundTruth'
import { DOCUMENT_CATEGORIES, type SessionListItem, type DocumentTypeResult, type DocumentCategory, type SubSession } from './types'
import { usePipelineNavigation } from './usePipelineNavigation'
const KLAUSUR_API = '/klausur-api'
const STEP_NAMES: Record<number, string> = {
1: 'Orientierung', 2: 'Begradigung', 3: 'Entzerrung', 4: 'Zuschneiden',
5: 'Spalten', 6: 'Zeilen', 7: 'Woerter', 8: 'Struktur',
9: 'Korrektur', 10: 'Rekonstruktion', 11: 'Validierung',
}
function OcrPipelineContent() {
const nav = usePipelineNavigation()
const [sessions, setSessions] = useState<SessionListItem[]>([])
const [loadingSessions, setLoadingSessions] = useState(true)
const [editingName, setEditingName] = useState<string | null>(null)
const [editNameValue, setEditNameValue] = useState('')
const [editingCategory, setEditingCategory] = useState<string | null>(null)
const [sessionName, setSessionName] = useState('')
const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
const loadSessions = useCallback(async () => {
setLoadingSessions(true)
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
if (res.ok) {
const data = await res.json()
setSessions(data.sessions || [])
}
} catch (e) {
console.error('Failed to load sessions:', e)
} finally {
setLoadingSessions(false)
}
}, [])
useEffect(() => { loadSessions() }, [loadSessions])
// Sync session name when nav.sessionId changes
useEffect(() => {
if (!nav.sessionId) {
setSessionName('')
setActiveCategory(undefined)
return
}
const load = async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${nav.sessionId}`)
if (!res.ok) return
const data = await res.json()
setSessionName(data.name || data.filename || '')
setActiveCategory(data.document_category || undefined)
} catch { /* ignore */ }
}
load()
}, [nav.sessionId])
const openSession = useCallback((sid: string) => {
nav.goToSession(sid)
}, [nav])
const deleteSession = useCallback(async (sid: string) => {
try {
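const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, { method: 'DELETE' })
// fetch resolves on HTTP errors too, so only update local state after a successful response
if (!res.ok) throw new Error(`Delete failed with status ${res.status}`)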
setSessions(prev => prev.filter(s => s.id !== sid))
if (nav.sessionId === sid) nav.goToSessionList()
} catch (e) {
console.error('Failed to delete session:', e)
}
}, [nav])
const renameSession = useCallback(async (sid: string, newName: string) => {
try {
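const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: newName }),
})
// Keep the old name in the list if the server rejected the update
if (!res.ok) throw new Error(`Rename failed with status ${res.status}`)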
setSessions(prev => prev.map(s => (s.id === sid ? { ...s, name: newName } : s)))
if (nav.sessionId === sid) setSessionName(newName)
} catch (e) {
console.error('Failed to rename session:', e)
}
setEditingName(null)
}, [nav.sessionId])
const updateCategory = useCallback(async (sid: string, category: DocumentCategory) => {
try {
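const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ document_category: category }),
})
// Keep the old category in the list if the server rejected the update
if (!res.ok) throw new Error(`Category update failed with status ${res.status}`)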
setSessions(prev => prev.map(s => (s.id === sid ? { ...s, document_category: category } : s)))
if (nav.sessionId === sid) setActiveCategory(category)
} catch (e) {
console.error('Failed to update category:', e)
}
setEditingCategory(null)
}, [nav.sessionId])
const deleteAllSessions = useCallback(async () => {
if (!confirm('Alle Sessions loeschen? Dies kann nicht rueckgaengig gemacht werden.')) return
try {
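const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`, { method: 'DELETE' })
// fetch resolves on HTTP errors too, so only clear the list after a successful response
if (!res.ok) throw new Error(`Delete-all failed with status ${res.status}`)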
setSessions([])
nav.goToSessionList()
} catch (e) {
console.error('Failed to delete all sessions:', e)
}
}, [nav])
const handleStepClick = (index: number) => {
if (index <= nav.currentStepIndex || nav.steps[index].status === 'completed') {
nav.goToStep(index)
}
}
// Orientation: after upload, navigate to session at deskew step
const handleOrientationComplete = useCallback(async (sid: string) => {
loadSessions()
// Navigate directly to deskew step (index 1) for this session
nav.goToSession(sid)
}, [nav, loadSessions])
// Crop: detect doc type then advance
const handleCropNext = useCallback(async () => {
if (nav.sessionId) {
try {
const res = await fetch(
`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${nav.sessionId}/detect-type`,
{ method: 'POST' },
)
if (res.ok) {
const data: DocumentTypeResult = await res.json()
nav.setDocType(data)
}
} catch (e) {
console.error('Doc type detection failed:', e)
}
}
nav.goToNextStep()
}, [nav])
const handleDocTypeChange = (newDocType: DocumentTypeResult['doc_type']) => {
if (!nav.docTypeResult) return
let skipSteps: string[] = []
if (newDocType === 'full_text') skipSteps = ['columns', 'rows']
nav.setDocType({
...nav.docTypeResult,
doc_type: newDocType,
skip_steps: skipSteps,
pipeline: newDocType === 'full_text' ? 'full_page' : 'cell_first',
})
}
// Box sub-sessions (column detection) — still supported
const handleBoxSessionsCreated = useCallback((_subs: SubSession[]) => {
// Box sub-sessions are tracked by the backend; no client-side state needed anymore
}, [])
const renderStep = () => {
const sid = nav.sessionId
switch (nav.currentStepIndex) {
case 0:
return (
<StepOrientation
key={sid}
sessionId={sid}
onNext={handleOrientationComplete}
onSessionList={() => { loadSessions(); nav.goToSessionList() }}
/>
)
case 1:
return <StepDeskew key={sid} sessionId={sid} onNext={nav.goToNextStep} />
case 2:
return <StepDewarp key={sid} sessionId={sid} onNext={nav.goToNextStep} />
case 3:
return <StepCrop key={sid} sessionId={sid} onNext={handleCropNext} />
case 4:
return <StepColumnDetection sessionId={sid} onNext={nav.goToNextStep} onBoxSessionsCreated={handleBoxSessionsCreated} />
case 5:
return <StepRowDetection sessionId={sid} onNext={nav.goToNextStep} />
case 6:
return <StepWordRecognition sessionId={sid} onNext={nav.goToNextStep} goToStep={nav.goToStep} />
case 7:
return <StepStructureDetection sessionId={sid} onNext={nav.goToNextStep} />
case 8:
return <StepLlmReview sessionId={sid} onNext={nav.goToNextStep} />
case 9:
return <StepReconstruction sessionId={sid} onNext={nav.goToNextStep} />
case 10:
return <StepGroundTruth sessionId={sid} onNext={nav.goToNextStep} />
default:
return null
}
}
return (
<div className="space-y-6">
<PagePurpose
title="OCR Pipeline"
purpose="Schrittweise Seitenrekonstruktion: Scan begradigen, Spalten erkennen, Woerter lokalisieren und die Seite Wort fuer Wort nachbauen. Ziel: 10 Vokabelseiten fehlerfrei rekonstruieren."
audience={['Entwickler', 'Data Scientists']}
architecture={{
services: ['klausur-service (FastAPI)', 'OpenCV', 'Tesseract'],
databases: ['PostgreSQL Sessions'],
}}
relatedPages={[
{ name: 'OCR Vergleich', href: '/ai/ocr-compare', description: 'Methoden-Vergleich' },
{ name: 'OCR-Labeling', href: '/ai/ocr-labeling', description: 'Trainingsdaten' },
]}
defaultCollapsed
/>
{/* Session List */}
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-4">
<div className="flex items-center justify-between mb-3">
<h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
Sessions ({sessions.length})
</h3>
<div className="flex gap-2">
{sessions.length > 0 && (
<button
onClick={deleteAllSessions}
className="text-xs px-3 py-1.5 text-red-600 hover:bg-red-50 dark:hover:bg-red-900/20 rounded-lg transition-colors"
title="Alle Sessions loeschen"
>
Alle loeschen
</button>
)}
<button
onClick={() => nav.goToSessionList()}
className="text-xs px-3 py-1.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
>
+ Neue Session
</button>
</div>
</div>
{loadingSessions ? (
<div className="text-sm text-gray-400 py-2">Lade Sessions...</div>
) : sessions.length === 0 ? (
<div className="text-sm text-gray-400 py-2">Noch keine Sessions vorhanden.</div>
) : (
<div className="space-y-1.5 max-h-[320px] overflow-y-auto">
{sessions.map((s) => {
const catInfo = DOCUMENT_CATEGORIES.find(c => c.value === s.document_category)
return (
<div
key={s.id}
className={`relative flex items-start gap-3 px-3 py-2.5 rounded-lg text-sm transition-colors cursor-pointer ${
nav.sessionId === s.id
? 'bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700'
: 'hover:bg-gray-50 dark:hover:bg-gray-700/50'
}`}
>
{/* Thumbnail */}
<div
className="flex-shrink-0 w-12 h-12 rounded-md overflow-hidden bg-gray-100 dark:bg-gray-700"
onClick={() => openSession(s.id)}
>
{/* eslint-disable-next-line @next/next/no-img-element */}
<img
src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${s.id}/thumbnail?size=96`}
alt=""
className="w-full h-full object-cover"
loading="lazy"
onError={(e) => { (e.target as HTMLImageElement).style.display = 'none' }}
/>
</div>
{/* Info */}
<div className="flex-1 min-w-0" onClick={() => openSession(s.id)}>
{editingName === s.id ? (
<input
autoFocus
value={editNameValue}
onChange={(e) => setEditNameValue(e.target.value)}
onBlur={() => renameSession(s.id, editNameValue)}
onKeyDown={(e) => {
if (e.key === 'Enter') renameSession(s.id, editNameValue)
if (e.key === 'Escape') setEditingName(null)
}}
onClick={(e) => e.stopPropagation()}
className="w-full px-1 py-0.5 text-sm border rounded dark:bg-gray-700 dark:border-gray-600"
/>
) : (
<div className="truncate font-medium text-gray-700 dark:text-gray-300">
{s.name || s.filename}
</div>
)}
{/* ID row */}
<button
onClick={(e) => {
e.stopPropagation()
navigator.clipboard.writeText(s.id)
const btn = e.currentTarget
btn.textContent = 'Kopiert!'
setTimeout(() => { btn.textContent = `ID: ${s.id.slice(0, 8)}` }, 1500)
}}
className="text-[10px] font-mono text-gray-400 hover:text-teal-500 transition-colors"
title={`Volle ID: ${s.id} — Klick zum Kopieren`}
>
ID: {s.id.slice(0, 8)}
</button>
<div className="text-xs text-gray-400 flex gap-2 mt-0.5">
<span>{new Date(s.created_at).toLocaleDateString('de-DE', { day: '2-digit', month: '2-digit', year: '2-digit', hour: '2-digit', minute: '2-digit' })}</span>
<span>Schritt {s.current_step}: {STEP_NAMES[s.current_step] || '?'}</span>
</div>
</div>
{/* Badges */}
<div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
<button
onClick={() => setEditingCategory(editingCategory === s.id ? null : s.id)}
className={`text-[10px] px-1.5 py-0.5 rounded-full border transition-colors ${
catInfo
? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
: 'bg-gray-50 dark:bg-gray-700 border-gray-200 dark:border-gray-600 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300'
}`}
title="Kategorie setzen"
>
{catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
</button>
{s.doc_type && (
<span className="text-[10px] px-1.5 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400 border border-gray-200 dark:border-gray-600">
{s.doc_type}
</span>
)}
</div>
{/* Action buttons */}
<div className="flex flex-col gap-0.5 flex-shrink-0">
<button
onClick={(e) => {
e.stopPropagation()
setEditNameValue(s.name || s.filename)
setEditingName(s.id)
}}
className="p-1 text-gray-400 hover:text-gray-600 dark:hover:text-gray-300"
title="Umbenennen"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M15.232 5.232l3.536 3.536m-2.036-5.036a2.5 2.5 0 113.536 3.536L6.5 21.036H3v-3.572L16.732 3.732z" />
</svg>
</button>
<button
onClick={(e) => {
e.stopPropagation()
if (confirm('Session loeschen?')) deleteSession(s.id)
}}
className="p-1 text-gray-400 hover:text-red-500"
title="Loeschen"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
</svg>
</button>
</div>
{/* Category dropdown */}
{editingCategory === s.id && (
<div
className="absolute right-0 top-full mt-1 z-20 bg-white dark:bg-gray-800 border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg p-2 grid grid-cols-2 gap-1 w-64"
onClick={(e) => e.stopPropagation()}
>
{DOCUMENT_CATEGORIES.map((cat) => (
<button
key={cat.value}
onClick={() => updateCategory(s.id, cat.value)}
className={`text-xs px-2 py-1.5 rounded-md text-left transition-colors ${
s.document_category === cat.value
? 'bg-teal-100 dark:bg-teal-900/40 text-teal-700 dark:text-teal-300'
: 'hover:bg-gray-100 dark:hover:bg-gray-700 text-gray-600 dark:text-gray-400'
}`}
>
{cat.icon} {cat.label}
</button>
))}
</div>
)}
</div>
)
})}
</div>
)}
</div>
{/* Active session info */}
{nav.sessionId && sessionName && (
<div className="flex items-center gap-3 text-sm text-gray-500 dark:text-gray-400">
<span>Aktive Session: <span className="font-medium text-gray-700 dark:text-gray-300">{sessionName}</span></span>
{activeCategory && (() => {
const cat = DOCUMENT_CATEGORIES.find(c => c.value === activeCategory)
return cat ? <span className="text-xs px-2 py-0.5 rounded-full bg-teal-50 dark:bg-teal-900/30 border border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300">{cat.icon} {cat.label}</span> : null
})()}
{nav.docTypeResult && (
<span className="text-xs px-2 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 text-gray-500 dark:text-gray-400 border border-gray-200 dark:border-gray-600">
{nav.docTypeResult.doc_type}
</span>
)}
</div>
)}
<PipelineStepper
steps={nav.steps}
currentStep={nav.currentStepIndex}
onStepClick={handleStepClick}
onReprocess={nav.sessionId ? nav.reprocessFromStep : undefined}
docTypeResult={nav.docTypeResult}
onDocTypeChange={handleDocTypeChange}
/>
<div className="min-h-[400px]">{renderStep()}</div>
</div>
)
}
export default function OcrPipelinePage() {
return (
<Suspense fallback={<div className="p-8 text-gray-400">Lade Pipeline...</div>}>
<OcrPipelineContent />
</Suspense>
)
}
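
Two small rules in the page above are easy to test once they are pulled out of the component: the document-type override in `handleDocTypeChange` and the navigation guard in `handleStepClick`. The sketch below restates both as pure functions. The names `overrideDocType` and `canOpenStep` are hypothetical, and the result type is trimmed to the fields used here.

```typescript
type DocType = 'vocab_table' | 'full_text' | 'generic_table'

// Trimmed copy of DocumentTypeResult with only the fields needed for this sketch.
interface DocTypeResult {
  doc_type: DocType
  confidence: number
  pipeline: 'cell_first' | 'full_page'
  skip_steps: string[]
}

// Mirrors handleDocTypeChange: only 'full_text' skips columns and rows
// and switches to the full-page pipeline.
function overrideDocType(current: DocTypeResult, next: DocType): DocTypeResult {
  const fullText = next === 'full_text'
  return {
    ...current,
    doc_type: next,
    skip_steps: fullText ? ['columns', 'rows'] : [],
    pipeline: fullText ? 'full_page' : 'cell_first',
  }
}

// Mirrors handleStepClick: a step can be opened if it is at or before the
// current step, or if it has already been completed.
function canOpenStep(index: number, currentIndex: number, statuses: string[]): boolean {
  return index <= currentIndex || statuses[index] === 'completed'
}

const detected: DocTypeResult = {
  doc_type: 'vocab_table',
  confidence: 0.9,
  pipeline: 'cell_first',
  skip_steps: [],
}
const overridden = overrideDocType(detected, 'full_text')
console.log(overridden.pipeline, overridden.skip_steps)
console.log(canOpenStep(3, 1, ['completed', 'active', 'pending', 'pending']))
```

The confidence value in the usage lines is made up for illustration; the override keeps whatever confidence the detection returned.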


@@ -1,429 +0,0 @@
export type PipelineStepStatus = 'pending' | 'active' | 'completed' | 'failed' | 'skipped'
export interface PipelineStep {
id: string
name: string
icon: string
status: PipelineStepStatus
}
export type DocumentCategory =
| 'vokabelseite' | 'woerterbuch' | 'buchseite' | 'arbeitsblatt' | 'klausurseite'
| 'mathearbeit' | 'statistik' | 'zeitung' | 'formular' | 'handschrift' | 'sonstiges'
export const DOCUMENT_CATEGORIES: { value: DocumentCategory; label: string; icon: string }[] = [
{ value: 'vokabelseite', label: 'Vokabelseite', icon: '📖' },
{ value: 'woerterbuch', label: 'Woerterbuch', icon: '📕' },
{ value: 'buchseite', label: 'Buchseite', icon: '📚' },
{ value: 'arbeitsblatt', label: 'Arbeitsblatt', icon: '📝' },
{ value: 'klausurseite', label: 'Klausurseite', icon: '📄' },
{ value: 'mathearbeit', label: 'Mathearbeit', icon: '🔢' },
{ value: 'statistik', label: 'Statistik', icon: '📊' },
{ value: 'zeitung', label: 'Zeitung', icon: '📰' },
{ value: 'formular', label: 'Formular', icon: '📋' },
{ value: 'handschrift', label: 'Handschrift', icon: '✍️' },
{ value: 'sonstiges', label: 'Sonstiges', icon: '📎' },
]
export interface SessionListItem {
id: string
name: string
filename: string
status: string
current_step: number
document_category?: DocumentCategory
doc_type?: string
parent_session_id?: string
document_group_id?: string
page_number?: number
created_at: string
updated_at?: string
}
/** Box sub-session (from column detection zone_type='box') */
export interface SubSession {
id: string
name: string
box_index: number
current_step?: number
status?: string
}
export interface PipelineLogEntry {
step: string
completed_at: string
success: boolean
duration_ms?: number
metrics: Record<string, unknown>
}
export interface PipelineLog {
steps: PipelineLogEntry[]
}
export interface DocumentTypeResult {
doc_type: 'vocab_table' | 'full_text' | 'generic_table'
confidence: number
pipeline: 'cell_first' | 'full_page'
skip_steps: string[]
features?: Record<string, unknown>
duration_seconds?: number
}
export interface OrientationResult {
orientation_degrees: number
corrected: boolean
duration_seconds: number
}
export interface CropResult {
crop_applied: boolean
crop_rect?: { x: number; y: number; width: number; height: number }
crop_rect_pct?: { x: number; y: number; width: number; height: number }
original_size: { width: number; height: number }
cropped_size: { width: number; height: number }
detected_format?: string
format_confidence?: number
aspect_ratio?: number
border_fractions?: { top: number; bottom: number; left: number; right: number }
skipped?: boolean
duration_seconds?: number
}
export interface SessionInfo {
session_id: string
filename: string
name?: string
image_width: number
image_height: number
original_image_url: string
current_step?: number
document_category?: DocumentCategory
doc_type?: string
orientation_result?: OrientationResult
crop_result?: CropResult
deskew_result?: DeskewResult
dewarp_result?: DewarpResult
column_result?: ColumnResult
row_result?: RowResult
word_result?: GridResult
doc_type_result?: DocumentTypeResult
sub_sessions?: SubSession[]
parent_session_id?: string
box_index?: number
document_group_id?: string
page_number?: number
}
export interface DeskewResult {
session_id: string
angle_hough: number
angle_word_alignment: number
angle_iterative?: number
angle_residual?: number
angle_textline?: number
angle_applied: number
method_used: 'hough' | 'word_alignment' | 'manual' | 'iterative' | 'two_pass' | 'three_pass' | 'manual_combined'
confidence: number
duration_seconds: number
deskewed_image_url: string
binarized_image_url: string
}
export interface DeskewGroundTruth {
is_correct: boolean
corrected_angle?: number
notes?: string
}
export interface DewarpDetection {
method: string
shear_degrees: number
confidence: number
}
export interface DewarpResult {
session_id: string
method_used: string
shear_degrees: number
confidence: number
duration_seconds: number
dewarped_image_url: string
detections?: DewarpDetection[]
}
export interface DewarpGroundTruth {
is_correct: boolean
corrected_shear?: number
notes?: string
}
export interface PageRegion {
type: 'column_en' | 'column_de' | 'column_example' | 'page_ref'
| 'column_marker' | 'column_text' | 'column_ignore' | 'header' | 'footer'
x: number
y: number
width: number
height: number
classification_confidence?: number
classification_method?: string
}
export interface PageZone {
zone_type: 'content' | 'box'
y_start: number
y_end: number
box?: { x: number; y: number; width: number; height: number }
}
export interface ColumnResult {
columns: PageRegion[]
duration_seconds: number
zones?: PageZone[]
}
export interface ColumnGroundTruth {
is_correct: boolean
corrected_columns?: PageRegion[]
notes?: string
}
export interface ManualColumnDivider {
xPercent: number // Position in % of image width (0-100)
}
export type ColumnTypeKey = PageRegion['type']
export interface RowResult {
rows: RowItem[]
summary: Record<string, number>
total_rows: number
duration_seconds: number
}
export interface RowItem {
index: number
x: number
y: number
width: number
height: number
word_count: number
row_type: 'content' | 'header' | 'footer'
gap_before: number
}
export interface RowGroundTruth {
is_correct: boolean
corrected_rows?: RowItem[]
notes?: string
}
export interface StructureGraphic {
x: number
y: number
w: number
h: number
area: number
shape: string // image, illustration
color_name: string
color_hex: string
confidence: number
}
export interface ExcludeRegion {
x: number
y: number
w: number
h: number
label?: string
}
export interface DocLayoutRegion {
x: number
y: number
w: number
h: number
class_name: string
confidence: number
}
export interface StructureResult {
image_width: number
image_height: number
content_bounds: { x: number; y: number; w: number; h: number }
boxes: StructureBox[]
zones: StructureZone[]
graphics: StructureGraphic[]
exclude_regions?: ExcludeRegion[]
color_pixel_counts: Record<string, number>
has_words: boolean
word_count: number
border_ghosts_removed?: number
duration_seconds: number
/** PP-DocLayout regions (only present when method=ppdoclayout) */
layout_regions?: DocLayoutRegion[]
detection_method?: 'opencv' | 'ppdoclayout'
}
export interface StructureBox {
x: number
y: number
w: number
h: number
confidence: number
border_thickness: number
bg_color_name?: string
bg_color_hex?: string
}
export interface StructureZone {
index: number
zone_type: 'content' | 'box'
x: number
y: number
w: number
h: number
}
export interface WordBbox {
x: number
y: number
w: number
h: number
}
export interface OcrWordBox {
text: string
left: number // absolute image x in px
top: number // absolute image y in px
width: number // px
height: number // px
conf: number
color?: string // hex color of detected text, e.g. '#dc2626'
color_name?: string // 'black' | 'red' | 'blue' | 'green' | 'orange' | 'purple' | 'yellow'
recovered?: boolean // true if this word was recovered via color detection
}
export interface GridCell {
cell_id: string // "R03_C1"
row_index: number
col_index: number
col_type: string
text: string
confidence: number
bbox_px: WordBbox
bbox_pct: WordBbox
ocr_engine?: string
is_bold?: boolean
status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
word_boxes?: OcrWordBox[] // per-word bounding boxes from OCR engine
}
export interface ColumnMeta {
index: number
type: string
x: number
width: number
}
export interface GridResult {
cells: GridCell[]
grid_shape: { rows: number; cols: number; total_cells: number }
columns_used: ColumnMeta[]
layout: 'vocab' | 'generic'
image_width: number
image_height: number
duration_seconds: number
ocr_engine?: string
vocab_entries?: WordEntry[] // Only when layout='vocab'
entries?: WordEntry[] // Backwards compat alias for vocab_entries
entry_count?: number
summary: {
total_cells: number
non_empty_cells: number
low_confidence: number
// Only when layout='vocab':
total_entries?: number
with_english?: number
with_german?: number
}
llm_review?: {
changes: { row_index: number; field: string; old: string; new: string }[]
model_used: string
duration_ms: number
entries_corrected: number
applied_count?: number
applied_at?: string
}
}
export interface WordEntry {
row_index: number
english: string
german: string
example: string
source_page?: string
marker?: string
confidence: number
bbox: WordBbox
bbox_en: WordBbox | null
bbox_de: WordBbox | null
bbox_ex: WordBbox | null
bbox_ref?: WordBbox | null
bbox_marker?: WordBbox | null
status?: 'pending' | 'confirmed' | 'edited' | 'skipped'
}
/** @deprecated Use GridResult instead */
export interface WordResult {
entries: WordEntry[]
entry_count: number
image_width: number
image_height: number
duration_seconds: number
ocr_engine?: string
summary: {
total_entries: number
with_english: number
with_german: number
low_confidence: number
}
}
export interface WordGroundTruth {
is_correct: boolean
corrected_entries?: WordEntry[]
notes?: string
}
export interface ImageRegion {
bbox_pct: { x: number; y: number; w: number; h: number }
prompt: string
description: string
image_b64: string | null
style: 'educational' | 'cartoon' | 'sketch' | 'clipart' | 'realistic'
}
export type ImageStyle = ImageRegion['style']
export const IMAGE_STYLES: { value: ImageStyle; label: string }[] = [
{ value: 'educational', label: 'Lehrbuch' },
{ value: 'cartoon', label: 'Cartoon' },
{ value: 'sketch', label: 'Skizze' },
{ value: 'clipart', label: 'Clipart' },
{ value: 'realistic', label: 'Realistisch' },
]
export const PIPELINE_STEPS: PipelineStep[] = [
{ id: 'orientation', name: 'Orientierung', icon: '🔄', status: 'pending' },
{ id: 'deskew', name: 'Begradigung', icon: '📐', status: 'pending' },
{ id: 'dewarp', name: 'Entzerrung', icon: '🔧', status: 'pending' },
{ id: 'crop', name: 'Zuschneiden', icon: '✂️', status: 'pending' },
{ id: 'columns', name: 'Spalten', icon: '📊', status: 'pending' },
{ id: 'rows', name: 'Zeilen', icon: '📏', status: 'pending' },
{ id: 'words', name: 'Woerter', icon: '🔤', status: 'pending' },
{ id: 'structure', name: 'Struktur', icon: '🔍', status: 'pending' },
{ id: 'llm-review', name: 'Korrektur', icon: '✏️', status: 'pending' },
{ id: 'reconstruction', name: 'Rekonstruktion', icon: '🏗️', status: 'pending' },
{ id: 'ground-truth', name: 'Validierung', icon: '✅', status: 'pending' },
]


@@ -1,225 +0,0 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
import { useRouter, useSearchParams } from 'next/navigation'
import { PIPELINE_STEPS, type PipelineStep, type PipelineStepStatus, type DocumentTypeResult } from './types'
const KLAUSUR_API = '/klausur-api'
export interface PipelineNav {
sessionId: string | null
currentStepIndex: number
currentStepId: string
steps: PipelineStep[]
docTypeResult: DocumentTypeResult | null
goToNextStep: () => void
goToStep: (index: number) => void
goToSession: (sessionId: string) => void
goToSessionList: () => void
setDocType: (result: DocumentTypeResult) => void
reprocessFromStep: (uiStep: number) => Promise<void>
}
const STEP_NAMES: Record<number, string> = {
1: 'Orientierung', 2: 'Begradigung', 3: 'Entzerrung', 4: 'Zuschneiden',
5: 'Spalten', 6: 'Zeilen', 7: 'Woerter', 8: 'Struktur',
9: 'Korrektur', 10: 'Rekonstruktion', 11: 'Validierung',
}
function buildSteps(uiStep: number, skipSteps: string[]): PipelineStep[] {
return PIPELINE_STEPS.map((s, i) => ({
...s,
status: (
skipSteps.includes(s.id) ? 'skipped'
: i < uiStep ? 'completed'
: i === uiStep ? 'active'
: 'pending'
) as PipelineStepStatus,
}))
}
export function usePipelineNavigation(): PipelineNav {
const router = useRouter()
const searchParams = useSearchParams()
const paramSession = searchParams.get('session')
const paramStep = searchParams.get('step')
const [sessionId, setSessionId] = useState<string | null>(paramSession)
const [currentStepIndex, setCurrentStepIndex] = useState(0)
const [docTypeResult, setDocTypeResult] = useState<DocumentTypeResult | null>(null)
const [steps, setSteps] = useState<PipelineStep[]>(buildSteps(0, []))
const [loaded, setLoaded] = useState(false)
// Load session info when session param changes
useEffect(() => {
if (!paramSession) {
setSessionId(null)
setCurrentStepIndex(0)
setDocTypeResult(null)
setSteps(buildSteps(0, []))
setLoaded(true)
return
}
const load = async () => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${paramSession}`)
if (!res.ok) return
const data = await res.json()
setSessionId(paramSession)
const savedDocType: DocumentTypeResult | null = data.doc_type_result || null
setDocTypeResult(savedDocType)
const dbStep = data.current_step || 1
let uiStep = Math.max(0, dbStep - 1)
const skipSteps = [...(savedDocType?.skip_steps || [])]
// Box sub-sessions (from column detection) skip pre-processing
const isBoxSubSession = !!data.parent_session_id
if (isBoxSubSession && dbStep >= 5) {
const SUB_SESSION_SKIP = ['orientation', 'deskew', 'dewarp', 'crop']
for (const s of SUB_SESSION_SKIP) {
if (!skipSteps.includes(s)) skipSteps.push(s)
}
if (uiStep < 4) uiStep = 4
}
// If URL has a step param, use that instead
if (paramStep) {
const stepIdx = PIPELINE_STEPS.findIndex(s => s.id === paramStep)
if (stepIdx >= 0) uiStep = stepIdx
}
setCurrentStepIndex(uiStep)
setSteps(buildSteps(uiStep, skipSteps))
} catch (e) {
console.error('Failed to load session:', e)
} finally {
setLoaded(true)
}
}
load()
}, [paramSession, paramStep])
const updateUrl = useCallback((sid: string | null, stepIdx?: number) => {
if (!sid) {
router.push('/ai/ocr-pipeline')
return
}
const stepId = stepIdx !== undefined ? PIPELINE_STEPS[stepIdx]?.id : undefined
const params = new URLSearchParams()
params.set('session', sid)
if (stepId) params.set('step', stepId)
router.push(`/ai/ocr-pipeline?${params.toString()}`)
}, [router])
const goToNextStep = useCallback(() => {
if (currentStepIndex >= steps.length - 1) {
// Last step — return to session list
setSessionId(null)
setCurrentStepIndex(0)
setDocTypeResult(null)
setSteps(buildSteps(0, []))
router.push('/ai/ocr-pipeline')
return
}
const skipSteps = docTypeResult?.skip_steps || []
let nextStep = currentStepIndex + 1
while (nextStep < steps.length && skipSteps.includes(PIPELINE_STEPS[nextStep]?.id)) {
nextStep++
}
if (nextStep >= steps.length) nextStep = steps.length - 1
setSteps(prev =>
prev.map((s, i) => {
if (i === currentStepIndex) return { ...s, status: 'completed' as PipelineStepStatus }
if (i === nextStep) return { ...s, status: 'active' as PipelineStepStatus }
if (i > currentStepIndex && i < nextStep && skipSteps.includes(PIPELINE_STEPS[i]?.id)) {
return { ...s, status: 'skipped' as PipelineStepStatus }
}
return s
}),
)
setCurrentStepIndex(nextStep)
if (sessionId) updateUrl(sessionId, nextStep)
}, [currentStepIndex, steps.length, docTypeResult, sessionId, updateUrl, router])
const goToStep = useCallback((index: number) => {
setCurrentStepIndex(index)
setSteps(prev =>
prev.map((s, i) => ({
...s,
status: s.status === 'skipped' ? 'skipped'
: i < index ? 'completed'
: i === index ? 'active'
: 'pending' as PipelineStepStatus,
})),
)
if (sessionId) updateUrl(sessionId, index)
}, [sessionId, updateUrl])
const goToSession = useCallback((sid: string) => {
updateUrl(sid)
}, [updateUrl])
const goToSessionList = useCallback(() => {
setSessionId(null)
setCurrentStepIndex(0)
setDocTypeResult(null)
setSteps(buildSteps(0, []))
router.push('/ai/ocr-pipeline')
}, [router])
const setDocType = useCallback((result: DocumentTypeResult) => {
setDocTypeResult(result)
const skipSteps = result.skip_steps || []
if (skipSteps.length > 0) {
setSteps(prev =>
prev.map(s =>
skipSteps.includes(s.id) ? { ...s, status: 'skipped' as PipelineStepStatus } : s,
),
)
}
}, [])
const reprocessFromStep = useCallback(async (uiStep: number) => {
if (!sessionId) return
const dbStep = uiStep + 1
if (!confirm(`Ab Schritt ${dbStep} (${STEP_NAMES[dbStep] || '?'}) neu verarbeiten? Nachfolgende Daten werden geloescht.`)) return
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/reprocess`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ from_step: dbStep }),
})
if (!res.ok) {
const data = await res.json().catch(() => ({}))
console.error('Reprocess failed:', data.detail || res.status)
return
}
goToStep(uiStep)
} catch (e) {
console.error('Reprocess error:', e)
}
}, [sessionId, goToStep])
return {
sessionId,
currentStepIndex,
currentStepId: PIPELINE_STEPS[currentStepIndex]?.id || 'orientation',
steps,
docTypeResult,
goToNextStep,
goToStep,
goToSession,
goToSessionList,
setDocType,
reprocessFromStep,
}
}
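
Review note: the skip handling inside `goToNextStep` can be read in isolation as a small pure function. The sketch below is only an illustration under stated assumptions: `nextStepIndex` is a hypothetical name that does not appear in this diff, and it covers only the "advance past skipped steps, then clamp" part, not the return to the session list at the last step.

```typescript
// Hypothetical helper (not part of this diff): index of the next step that is not skipped.
function nextStepIndex(current: number, stepIds: string[], skipSteps: string[]): number {
  let next = current + 1
  // Walk past every step whose id is listed in skipSteps
  while (next < stepIds.length && skipSteps.includes(stepIds[next])) {
    next++
  }
  // Clamp to the last step, mirroring the bound check in goToNextStep
  return Math.min(next, stepIds.length - 1)
}

const ids = ['orientation', 'deskew', 'dewarp', 'crop', 'columns']
console.log(nextStepIndex(0, ids, ['deskew', 'dewarp'])) // 3 ('crop')
```

Keeping this logic free of React state would make it straightforward to unit-test independently of the hook.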


@@ -0,0 +1,74 @@
import type { ChunkDetail } from './types'
interface ResultsListProps {
results: ChunkDetail[]
selectedChunk: ChunkDetail | null
searchQuery: string
onSelect: (chunk: ChunkDetail) => void
}
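// Escapes HTML so chunk text can be passed to dangerouslySetInnerHTML safely
const escapeHtml = (s: string) => s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
function highlightText(text: string, query: string) {
// Escape regex metacharacters so queries such as "Art. 17 (1)" cannot break the pattern
const words = query.toLowerCase().split(/\s+/).filter(w => w.length > 2)
.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
if (words.length === 0) return escapeHtml(text)
// A single combined pass never re-matches inside markup inserted for an earlier word
const regex = new RegExp(`(${words.join('|')})`, 'gi')
// split() with one capture group puts the matched words at the odd indices
return text.split(regex).map((part, i) =>
i % 2 === 1
? `<mark class="bg-yellow-200 dark:bg-yellow-800 px-0.5 rounded">${escapeHtml(part)}</mark>`
: escapeHtml(part)
).join('')
}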
export function ResultsList({ results, selectedChunk, searchQuery, onSelect }: ResultsListProps) {
return (
<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-4">
<h3 className="text-md font-semibold text-gray-900 dark:text-white mb-4">
Gefundene Chunks ({results.length})
</h3>
<div className="space-y-3 max-h-[600px] overflow-y-auto">
{results.map((result, idx) => (
<div
key={idx}
onClick={() => onSelect(result)}
className={`p-4 border rounded-lg cursor-pointer transition-all ${
selectedChunk?.text === result.text
? 'border-blue-500 bg-blue-50 dark:bg-blue-900/20'
: 'border-gray-200 dark:border-slate-700 hover:border-gray-300 dark:hover:border-slate-600'
}`}
>
{/* Header */}
<div className="flex items-center justify-between mb-2">
<div className="flex items-center gap-2">
<span className="text-xs font-medium px-2 py-0.5 bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400 rounded">
{result.regulation_code}
</span>
{result.article && (
<span className="text-xs text-gray-500 dark:text-gray-400">
Art. {result.article}
{result.paragraph && ` Abs. ${result.paragraph}`}
</span>
)}
</div>
<span className="text-xs text-gray-400">
Score: {(result.score || 0).toFixed(3)}
</span>
</div>
{/* Text Preview */}
<p
className="text-sm text-gray-700 dark:text-gray-300 line-clamp-4"
dangerouslySetInnerHTML={{
__html: highlightText(result.text.substring(0, 400) + (result.text.length > 400 ? '...' : ''), searchQuery)
}}
/>
{/* Metadata */}
<div className="mt-2 flex items-center gap-4 text-xs text-gray-400">
<span>Chunk #{result.chunk_index ?? idx}</span>
<span>{result.text.length} Zeichen</span>
</div>
</div>
))}
</div>
</div>
)
}


@@ -0,0 +1,112 @@
import { REGULATIONS, SAMPLE_QUERIES } from './types'
interface SearchSectionProps {
searchQuery: string
selectedRegulation: string
topK: number
searching: boolean
onSearchQueryChange: (v: string) => void
onRegulationChange: (v: string) => void
onTopKChange: (v: number) => void
onSearch: () => void
onSampleQuery: (query: string, reg: string) => void
}
export function SearchSection({
searchQuery,
selectedRegulation,
topK,
searching,
onSearchQueryChange,
onRegulationChange,
onTopKChange,
onSearch,
onSampleQuery,
}: SearchSectionProps) {
return (
<>
{/* Quick Sample Queries */}
<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-4">
<h3 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-3">
Schnell-Stichproben
</h3>
<div className="flex flex-wrap gap-2">
{SAMPLE_QUERIES.map((sq, idx) => (
<button
key={idx}
onClick={() => onSampleQuery(sq.query, sq.reg)}
className="px-3 py-1.5 text-xs bg-gray-100 hover:bg-gray-200 dark:bg-slate-700 dark:hover:bg-slate-600 text-gray-700 dark:text-gray-300 rounded-full transition-colors"
>
{sq.label}
</button>
))}
</div>
</div>
{/* Search Section */}
<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-6">
<h2 className="text-lg font-semibold text-gray-900 dark:text-white mb-4">
Chunk-Suche
</h2>
<div className="space-y-4">
{/* Search Input */}
<div className="flex gap-4">
<div className="flex-1">
<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
Suchbegriff / Paragraph / Artikeltext
</label>
<input
type="text"
value={searchQuery}
onChange={(e) => onSearchQueryChange(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && onSearch()}
placeholder="z.B. 'Recht auf Löschung' oder 'Art. 17 Abs. 1'"
className="w-full px-4 py-2 border border-gray-300 dark:border-slate-600 rounded-lg focus:ring-2 focus:ring-blue-500 dark:bg-slate-700 dark:text-white"
/>
</div>
<div className="w-48">
<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
Regulierung
</label>
<select
value={selectedRegulation}
onChange={(e) => onRegulationChange(e.target.value)}
className="w-full px-3 py-2 border border-gray-300 dark:border-slate-600 rounded-lg focus:ring-2 focus:ring-blue-500 dark:bg-slate-700 dark:text-white"
>
<option value="">Alle</option>
{REGULATIONS.map((reg) => (
<option key={reg.code} value={reg.code}>
{reg.name}
</option>
))}
</select>
</div>
<div className="w-24">
<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
Anzahl
</label>
<select
value={topK}
onChange={(e) => onTopKChange(parseInt(e.target.value, 10))}
className="w-full px-3 py-2 border border-gray-300 dark:border-slate-600 rounded-lg focus:ring-2 focus:ring-blue-500 dark:bg-slate-700 dark:text-white"
>
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
</select>
</div>
</div>
<button
onClick={onSearch}
disabled={searching || !searchQuery.trim()}
className="px-6 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
{searching ? 'Suche laeuft...' : 'Suchen'}
</button>
</div>
</div>
</>
)
}


@@ -0,0 +1,150 @@
import type { ChunkDetail, TraceabilityResult } from './types'
interface TraceabilityPanelProps {
selectedChunk: ChunkDetail | null
loadingTrace: boolean
traceability: TraceabilityResult | null
}
export function TraceabilityPanel({ selectedChunk, loadingTrace, traceability }: TraceabilityPanelProps) {
return (
<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-4">
<h3 className="text-md font-semibold text-gray-900 dark:text-white mb-4">
Traceability
</h3>
{!selectedChunk ? (
<div className="text-center py-12 text-gray-500 dark:text-gray-400">
<svg className="w-12 h-12 mx-auto mb-4 opacity-50" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
<p>Waehlen Sie einen Chunk aus der Liste, um die Traceability zu sehen.</p>
</div>
) : loadingTrace ? (
<div className="text-center py-12">
<div className="animate-spin w-8 h-8 border-2 border-blue-500 border-t-transparent rounded-full mx-auto mb-4"></div>
<p className="text-gray-500 dark:text-gray-400">Lade Traceability...</p>
</div>
) : traceability ? (
<div className="space-y-6">
{/* Selected Chunk Detail */}
<ChunkDetailSection chunk={traceability.chunk} />
<ArrowDown />
{/* Requirements */}
<RequirementsSection requirements={traceability.requirements} />
<ArrowDown />
{/* Controls */}
<ControlsSection controls={traceability.controls} />
</div>
) : null}
</div>
)
}
function ChunkDetailSection({ chunk }: { chunk: ChunkDetail }) {
return (
<div className="border-l-4 border-blue-500 pl-4">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
Ausgewaehlter Chunk
</h4>
<div className="bg-gray-50 dark:bg-slate-700 rounded p-3">
<div className="flex items-center gap-2 mb-2">
<span className="text-xs font-medium px-2 py-0.5 bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400 rounded">
{chunk.regulation_code}
</span>
{chunk.article && (
<span className="text-xs text-gray-500 dark:text-gray-400">
Art. {chunk.article}
{chunk.paragraph && ` Abs. ${chunk.paragraph}`}
</span>
)}
</div>
<p className="text-sm text-gray-600 dark:text-gray-300 whitespace-pre-wrap">
{chunk.text}
</p>
{chunk.source_url && (
<a
href={chunk.source_url}
target="_blank"
rel="noopener noreferrer"
className="mt-2 inline-flex items-center gap-1 text-xs text-blue-600 hover:underline"
>
Quelle oeffnen
</a>
)}
</div>
</div>
)
}
function RequirementsSection({ requirements }: { requirements: TraceabilityResult['requirements'] }) {
return (
<div className="border-l-4 border-orange-500 pl-4">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
Extrahierte Anforderungen ({requirements.length})
</h4>
{requirements.length > 0 ? (
<div className="space-y-2">
{requirements.map((req, idx) => (
<div key={idx} className="bg-orange-50 dark:bg-orange-900/20 rounded p-3">
<div className="flex items-center gap-2 mb-1">
<span className="text-xs font-medium text-orange-700 dark:text-orange-400">
{req.category || 'Anforderung'}
</span>
</div>
<p className="text-sm text-gray-600 dark:text-gray-300">{req.text}</p>
</div>
))}
</div>
) : (
<p className="text-sm text-gray-500 dark:text-gray-400 italic">
Keine Anforderungen aus diesem Chunk extrahiert.
<br />
<span className="text-xs">(Requirements-Extraktion ist noch nicht implementiert)</span>
</p>
)}
</div>
)
}
function ControlsSection({ controls }: { controls: TraceabilityResult['controls'] }) {
return (
<div className="border-l-4 border-green-500 pl-4">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
Abgeleitete Controls ({controls.length})
</h4>
{controls.length > 0 ? (
<div className="space-y-2">
{controls.map((ctrl, idx) => (
<div key={idx} className="bg-green-50 dark:bg-green-900/20 rounded p-3">
<div className="font-medium text-sm text-green-700 dark:text-green-400 mb-1">
{ctrl.name}
</div>
<p className="text-sm text-gray-600 dark:text-gray-300">{ctrl.description}</p>
</div>
))}
</div>
) : (
<p className="text-sm text-gray-500 dark:text-gray-400 italic">
Keine Controls aus diesem Chunk abgeleitet.
<br />
<span className="text-xs">(Control-Ableitung ist noch nicht implementiert)</span>
</p>
)}
</div>
)
}
function ArrowDown() {
return (
<div className="flex justify-center">
<svg className="w-6 h-6 text-gray-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 14l-7 7m0 0l-7-7m7 7V3" />
</svg>
</div>
)
}


@@ -0,0 +1,66 @@
export interface ChunkDetail {
id: string
text: string
regulation_code: string
regulation_name: string
article: string | null
paragraph: string | null
chunk_index: number
chunk_position: 'beginning' | 'middle' | 'end'
source_url: string
score?: number
}
export interface Requirement {
id: string
text: string
category: string
source_chunk_id: string
regulation_code: string
}
export interface Control {
id: string
name: string
description: string
source_requirement_ids: string[]
regulation_codes: string[]
}
export interface TraceabilityResult {
chunk: ChunkDetail
requirements: Requirement[]
controls: Control[]
}
export const API_PROXY = '/api/legal-corpus'
export const REGULATIONS = [
{ code: 'GDPR', name: 'DSGVO' },
{ code: 'EPRIVACY', name: 'ePrivacy' },
{ code: 'TDDDG', name: 'TDDDG' },
{ code: 'SCC', name: 'Standardvertragsklauseln' },
{ code: 'DPF', name: 'EU-US DPF' },
{ code: 'AIACT', name: 'EU AI Act' },
{ code: 'CRA', name: 'Cyber Resilience Act' },
{ code: 'NIS2', name: 'NIS2' },
{ code: 'EUCSA', name: 'EU Cybersecurity Act' },
{ code: 'DATAACT', name: 'Data Act' },
{ code: 'DGA', name: 'Data Governance Act' },
{ code: 'DSA', name: 'Digital Services Act' },
{ code: 'EAA', name: 'Accessibility Act' },
{ code: 'DSM', name: 'DSM-Urheberrecht' },
{ code: 'PLD', name: 'Produkthaftung' },
{ code: 'GPSR', name: 'Product Safety' },
{ code: 'BSI-TR-03161-1', name: 'BSI-TR Teil 1' },
{ code: 'BSI-TR-03161-2', name: 'BSI-TR Teil 2' },
{ code: 'BSI-TR-03161-3', name: 'BSI-TR Teil 3' },
]
export const SAMPLE_QUERIES = [
{ label: 'Art. 17 DSGVO (Recht auf Loeschung)', query: 'Recht auf Löschung Artikel 17', reg: 'GDPR' },
{ label: 'Einwilligung TDDDG', query: 'Einwilligung Endeinrichtung speichern', reg: 'TDDDG' },
{ label: 'AI Act Hochrisiko', query: 'Hochrisiko-KI-System Anforderungen', reg: 'AIACT' },
{ label: 'NIS2 Sicherheitsmaßnahmen', query: 'Cybersicherheitsrisikomanagement Maßnahmen', reg: 'NIS2' },
{ label: 'BSI Authentifizierung', query: 'Authentifizierung Zwei-Faktor mobile', reg: 'BSI-TR-03161-1' },
]


@@ -0,0 +1,93 @@
import { useState, useCallback } from 'react'
import type { ChunkDetail, TraceabilityResult } from './types'
import { API_PROXY } from './types'
export function useQualitySearch() {
const [searchQuery, setSearchQuery] = useState('')
const [searchResults, setSearchResults] = useState<ChunkDetail[]>([])
const [searching, setSearching] = useState(false)
const [selectedRegulation, setSelectedRegulation] = useState<string>('')
const [topK, setTopK] = useState(10)
const [selectedChunk, setSelectedChunk] = useState<ChunkDetail | null>(null)
const [traceability, setTraceability] = useState<TraceabilityResult | null>(null)
const [loadingTrace, setLoadingTrace] = useState(false)
// Runs a search with explicit parameters so callers do not depend on possibly stale state
const runSearch = useCallback(async (query: string, regulation: string) => {
if (!query.trim()) return
setSearching(true)
setSearchResults([])
setSelectedChunk(null)
setTraceability(null)
try {
let url = `${API_PROXY}?action=search&query=${encodeURIComponent(query)}&top_k=${topK}`
if (regulation) {
url += `&regulations=${encodeURIComponent(regulation)}`
}
const res = await fetch(url)
if (res.ok) {
const data = await res.json()
setSearchResults(data.results || [])
}
} catch (error) {
console.error('Search failed:', error)
} finally {
setSearching(false)
}
}, [topK])
// Search with the current form state (used by the button and the Enter key)
const handleSearch = useCallback(
() => runSearch(searchQuery, selectedRegulation),
[runSearch, searchQuery, selectedRegulation],
)
const loadTraceability = useCallback(async (chunk: ChunkDetail) => {
setSelectedChunk(chunk)
setLoadingTrace(true)
try {
const res = await fetch(
`${API_PROXY}?action=traceability&chunk_id=${encodeURIComponent(chunk.id || chunk.regulation_code + '_' + chunk.chunk_index)}&regulation=${encodeURIComponent(chunk.regulation_code)}`
)
if (res.ok) {
const data = await res.json()
setTraceability({
chunk,
requirements: data.requirements || [],
controls: data.controls || [],
})
} else {
setTraceability({ chunk, requirements: [], controls: [] })
}
} catch (error) {
console.error('Failed to load traceability:', error)
setTraceability({ chunk, requirements: [], controls: [] })
} finally {
setLoadingTrace(false)
}
}, [])
const handleSampleQuery = (query: string, reg: string) => {
setSearchQuery(query)
setSelectedRegulation(reg)
// Search with the new values directly: a delayed handleSearch() would still
// close over the previous searchQuery/selectedRegulation and query the old input
runSearch(query, reg)
}
return {
searchQuery,
setSearchQuery,
searchResults,
searching,
selectedRegulation,
setSelectedRegulation,
topK,
setTopK,
selectedChunk,
traceability,
loadingTrace,
handleSearch,
loadTraceability,
handleSampleQuery,
}
}


@@ -5,184 +5,34 @@
 *
 * Ermoeglicht Auditoren:
 * - Chunk-Suche und Stichproben
-* - Traceability: Chunk Requirement Control
+* - Traceability: Chunk -> Requirement -> Control
 * - Dokumenten-Vollstaendigkeitspruefung
 */
-import { useState, useCallback } from 'react'
 import Link from 'next/link'
 import { PagePurpose } from '@/components/common/PagePurpose'
-const API_PROXY = '/api/legal-corpus'
-// Types
-interface ChunkDetail {
-id: string
-text: string
-regulation_code: string
-regulation_name: string
-article: string | null
-paragraph: string | null
-chunk_index: number
-chunk_position: 'beginning' | 'middle' | 'end'
-source_url: string
-score?: number
-}
-interface Requirement {
-id: string
-text: string
-category: string
-source_chunk_id: string
-regulation_code: string
-}
-interface Control {
-id: string
-name: string
-description: string
-source_requirement_ids: string[]
-regulation_codes: string[]
-}
-interface TraceabilityResult {
-chunk: ChunkDetail
-requirements: Requirement[]
-controls: Control[]
-}
-// Regulations for filtering
-const REGULATIONS = [
-{ code: 'GDPR', name: 'DSGVO' },
-{ code: 'EPRIVACY', name: 'ePrivacy' },
-{ code: 'TDDDG', name: 'TDDDG' },
-{ code: 'SCC', name: 'Standardvertragsklauseln' },
-{ code: 'DPF', name: 'EU-US DPF' },
-{ code: 'AIACT', name: 'EU AI Act' },
-{ code: 'CRA', name: 'Cyber Resilience Act' },
-{ code: 'NIS2', name: 'NIS2' },
-{ code: 'EUCSA', name: 'EU Cybersecurity Act' },
-{ code: 'DATAACT', name: 'Data Act' },
-{ code: 'DGA', name: 'Data Governance Act' },
-{ code: 'DSA', name: 'Digital Services Act' },
-{ code: 'EAA', name: 'Accessibility Act' },
-{ code: 'DSM', name: 'DSM-Urheberrecht' },
-{ code: 'PLD', name: 'Produkthaftung' },
-{ code: 'GPSR', name: 'Product Safety' },
-{ code: 'BSI-TR-03161-1', name: 'BSI-TR Teil 1' },
-{ code: 'BSI-TR-03161-2', name: 'BSI-TR Teil 2' },
-{ code: 'BSI-TR-03161-3', name: 'BSI-TR Teil 3' },
-]
-const TYPE_COLORS: Record<string, string> = {
-eu_regulation: 'bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400',
-eu_directive: 'bg-purple-100 text-purple-700 dark:bg-purple-900/30 dark:text-purple-400',
-de_law: 'bg-yellow-100 text-yellow-700 dark:bg-yellow-900/30 dark:text-yellow-400',
-bsi_standard: 'bg-green-100 text-green-700 dark:bg-green-900/30 dark:text-green-400',
-}
+import { useQualitySearch } from './_components/useQualitySearch'
+import { SearchSection } from './_components/SearchSection'
+import { ResultsList } from './_components/ResultsList'
+import { TraceabilityPanel } from './_components/TraceabilityPanel'
 export default function QualityPage() {
-// Search state
-const [searchQuery, setSearchQuery] = useState('')
-const [searchResults, setSearchResults] = useState<ChunkDetail[]>([])
-const [searching, setSearching] = useState(false)
-const [selectedRegulation, setSelectedRegulation] = useState<string>('')
-const [topK, setTopK] = useState(10)
-// Traceability state
-const [selectedChunk, setSelectedChunk] = useState<ChunkDetail | null>(null)
-const [traceability, setTraceability] = useState<TraceabilityResult | null>(null)
-const [loadingTrace, setLoadingTrace] = useState(false)
-// Quick sample queries for auditors
-const sampleQueries = [
-{ label: 'Art. 17 DSGVO (Recht auf Loeschung)', query: 'Recht auf Löschung Artikel 17', reg: 'GDPR' },
-{ label: 'Einwilligung TDDDG', query: 'Einwilligung Endeinrichtung speichern', reg: 'TDDDG' },
-{ label: 'AI Act Hochrisiko', query: 'Hochrisiko-KI-System Anforderungen', reg: 'AIACT' },
-{ label: 'NIS2 Sicherheitsmaßnahmen', query: 'Cybersicherheitsrisikomanagement Maßnahmen', reg: 'NIS2' },
-{ label: 'BSI Authentifizierung', query: 'Authentifizierung Zwei-Faktor mobile', reg: 'BSI-TR-03161-1' },
-]
-const handleSearch = useCallback(async () => {
-if (!searchQuery.trim()) return
-setSearching(true)
-setSearchResults([])
-setSelectedChunk(null)
-setTraceability(null)
-try {
-let url = `${API_PROXY}?action=search&query=${encodeURIComponent(searchQuery)}&top_k=${topK}`
-if (selectedRegulation) {
-url += `&regulations=${encodeURIComponent(selectedRegulation)}`
-}
-const res = await fetch(url)
-if (res.ok) {
-const data = await res.json()
-setSearchResults(data.results || [])
-}
-} catch (error) {
-console.error('Search failed:', error)
-} finally {
-setSearching(false)
-}
-}, [searchQuery, selectedRegulation, topK])
-const loadTraceability = useCallback(async (chunk: ChunkDetail) => {
-setSelectedChunk(chunk)
-setLoadingTrace(true)
-try {
-// Try to load traceability (requirements and controls derived from this chunk)
-const res = await fetch(`${API_PROXY}?action=traceability&chunk_id=${encodeURIComponent(chunk.id || chunk.regulation_code + '_' + chunk.chunk_index)}&regulation=${encodeURIComponent(chunk.regulation_code)}`)
-if (res.ok) {
-const data = await res.json()
-setTraceability({
-chunk,
-requirements: data.requirements || [],
-controls: data.controls || [],
-})
-} else {
-// If traceability endpoint doesn't exist yet, show placeholder
-setTraceability({
-chunk,
-requirements: [],
-controls: [],
-})
-}
-} catch (error) {
-console.error('Failed to load traceability:', error)
-setTraceability({
-chunk,
-requirements: [],
-controls: [],
-})
-} finally {
-setLoadingTrace(false)
-}
-}, [])
-const handleSampleQuery = (query: string, reg: string) => {
-setSearchQuery(query)
-setSelectedRegulation(reg)
-// Auto-search after setting
-setTimeout(() => {
-handleSearch()
-}, 100)
-}
-const highlightText = (text: string, query: string) => {
-if (!query) return text
-const words = query.toLowerCase().split(' ').filter(w => w.length > 2)
-let result = text
-words.forEach(word => {
-const regex = new RegExp(`(${word})`, 'gi')
-result = result.replace(regex, '<mark class="bg-yellow-200 dark:bg-yellow-800 px-0.5 rounded">$1</mark>')
-})
-return result
-}
+const {
+searchQuery,
+setSearchQuery,
+searchResults,
+searching,
+selectedRegulation,
+setSelectedRegulation,
+topK,
+setTopK,
+selectedChunk,
+traceability,
+loadingTrace,
+handleSearch,
+loadTraceability,
+handleSampleQuery,
+} = useQualitySearch()
 return (
 <div className="space-y-6">
@@ -214,265 +64,32 @@ export default function QualityPage() {
 }}
 />
+<SearchSection
+searchQuery={searchQuery}
+selectedRegulation={selectedRegulation}
+topK={topK}
+searching={searching}
+onSearchQueryChange={setSearchQuery}
+onRegulationChange={setSelectedRegulation}
+onTopKChange={setTopK}
+onSearch={handleSearch}
+onSampleQuery={handleSampleQuery}
+/>
-{/* Quick Sample Queries */}
-<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-4">
-<h3 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-3">
-Schnell-Stichproben
-</h3>
-<div className="flex flex-wrap gap-2">
-{sampleQueries.map((sq, idx) => (
-<button
-key={idx}
-onClick={() => handleSampleQuery(sq.query, sq.reg)}
-className="px-3 py-1.5 text-xs bg-gray-100 hover:bg-gray-200 dark:bg-slate-700 dark:hover:bg-slate-600 text-gray-700 dark:text-gray-300 rounded-full transition-colors"
->
-{sq.label}
-</button>
-))}
-</div>
-</div>
-{/* Search Section */}
-<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-6">
-<h2 className="text-lg font-semibold text-gray-900 dark:text-white mb-4">
-Chunk-Suche
-</h2>
-<div className="space-y-4">
-{/* Search Input */}
-<div className="flex gap-4">
-<div className="flex-1">
-<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-Suchbegriff / Paragraph / Artikeltext
-</label>
-<input
-type="text"
-value={searchQuery}
-onChange={(e) => setSearchQuery(e.target.value)}
-onKeyDown={(e) => e.key === 'Enter' && handleSearch()}
-placeholder="z.B. 'Recht auf Löschung' oder 'Art. 17 Abs. 1'"
-className="w-full px-4 py-2 border border-gray-300 dark:border-slate-600 rounded-lg focus:ring-2 focus:ring-blue-500 dark:bg-slate-700 dark:text-white"
-/>
-</div>
-<div className="w-48">
-<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-Regulierung
-</label>
-<select
-value={selectedRegulation}
-onChange={(e) => setSelectedRegulation(e.target.value)}
-className="w-full px-3 py-2 border border-gray-300 dark:border-slate-600 rounded-lg focus:ring-2 focus:ring-blue-500 dark:bg-slate-700 dark:text-white"
->
-<option value="">Alle</option>
-{REGULATIONS.map((reg) => (
-<option key={reg.code} value={reg.code}>
-{reg.name}
-</option>
-))}
-</select>
-</div>
-<div className="w-24">
-<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-Anzahl
-</label>
-<select
-value={topK}
-onChange={(e) => setTopK(parseInt(e.target.value))}
className="w-full px-3 py-2 border border-gray-300 dark:border-slate-600 rounded-lg focus:ring-2 focus:ring-blue-500 dark:bg-slate-700 dark:text-white"
>
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
</select>
</div>
</div>
<button
onClick={handleSearch}
disabled={searching || !searchQuery.trim()}
className="px-6 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
{searching ? 'Suche laeuft...' : 'Suchen'}
</button>
</div>
</div>
{/* Results Grid */}
{searchResults.length > 0 && (
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6">
{/* Search Results List */} <ResultsList
<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-4"> results={searchResults}
<h3 className="text-md font-semibold text-gray-900 dark:text-white mb-4"> selectedChunk={selectedChunk}
Gefundene Chunks ({searchResults.length}) searchQuery={searchQuery}
</h3> onSelect={loadTraceability}
<div className="space-y-3 max-h-[600px] overflow-y-auto"> />
{searchResults.map((result, idx) => ( <TraceabilityPanel
<div selectedChunk={selectedChunk}
key={idx} loadingTrace={loadingTrace}
onClick={() => loadTraceability(result)} traceability={traceability}
className={`p-4 border rounded-lg cursor-pointer transition-all ${ />
selectedChunk?.text === result.text
? 'border-blue-500 bg-blue-50 dark:bg-blue-900/20'
: 'border-gray-200 dark:border-slate-700 hover:border-gray-300 dark:hover:border-slate-600'
}`}
>
{/* Header */}
<div className="flex items-center justify-between mb-2">
<div className="flex items-center gap-2">
<span className="text-xs font-medium px-2 py-0.5 bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400 rounded">
{result.regulation_code}
</span>
{result.article && (
<span className="text-xs text-gray-500 dark:text-gray-400">
Art. {result.article}
{result.paragraph && ` Abs. ${result.paragraph}`}
</span>
)}
</div>
<span className="text-xs text-gray-400">
Score: {(result.score || 0).toFixed(3)}
</span>
</div>
{/* Text Preview */}
<p
className="text-sm text-gray-700 dark:text-gray-300 line-clamp-4"
dangerouslySetInnerHTML={{
__html: highlightText(result.text.substring(0, 400) + (result.text.length > 400 ? '...' : ''), searchQuery)
}}
/>
{/* Metadata */}
<div className="mt-2 flex items-center gap-4 text-xs text-gray-400">
<span>Chunk #{result.chunk_index || idx}</span>
<span>{result.text.length} Zeichen</span>
</div>
</div>
))}
</div>
</div>
{/* Traceability Panel */}
<div className="bg-white dark:bg-slate-800 rounded-lg shadow p-4">
<h3 className="text-md font-semibold text-gray-900 dark:text-white mb-4">
Traceability
</h3>
{!selectedChunk ? (
<div className="text-center py-12 text-gray-500 dark:text-gray-400">
<svg className="w-12 h-12 mx-auto mb-4 opacity-50" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
<p>Waehlen Sie einen Chunk aus der Liste, um die Traceability zu sehen.</p>
</div>
) : loadingTrace ? (
<div className="text-center py-12">
<div className="animate-spin w-8 h-8 border-2 border-blue-500 border-t-transparent rounded-full mx-auto mb-4"></div>
<p className="text-gray-500 dark:text-gray-400">Lade Traceability...</p>
</div>
) : traceability ? (
<div className="space-y-6">
{/* Selected Chunk Detail */}
<div className="border-l-4 border-blue-500 pl-4">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
📄 Ausgewaehlter Chunk
</h4>
<div className="bg-gray-50 dark:bg-slate-700 rounded p-3">
<div className="flex items-center gap-2 mb-2">
<span className="text-xs font-medium px-2 py-0.5 bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-400 rounded">
{traceability.chunk.regulation_code}
</span>
{traceability.chunk.article && (
<span className="text-xs text-gray-500 dark:text-gray-400">
Art. {traceability.chunk.article}
{traceability.chunk.paragraph && ` Abs. ${traceability.chunk.paragraph}`}
</span>
)}
</div>
<p className="text-sm text-gray-600 dark:text-gray-300 whitespace-pre-wrap">
{traceability.chunk.text}
</p>
{traceability.chunk.source_url && (
<a
href={traceability.chunk.source_url}
target="_blank"
rel="noopener noreferrer"
className="mt-2 inline-flex items-center gap-1 text-xs text-blue-600 hover:underline"
>
🔗 Quelle oeffnen
</a>
)}
</div>
</div>
{/* Arrow Down */}
<div className="flex justify-center">
<svg className="w-6 h-6 text-gray-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 14l-7 7m0 0l-7-7m7 7V3" />
</svg>
</div>
{/* Requirements */}
<div className="border-l-4 border-orange-500 pl-4">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
📋 Extrahierte Anforderungen ({traceability.requirements.length})
</h4>
{traceability.requirements.length > 0 ? (
<div className="space-y-2">
{traceability.requirements.map((req, idx) => (
<div key={idx} className="bg-orange-50 dark:bg-orange-900/20 rounded p-3">
<div className="flex items-center gap-2 mb-1">
<span className="text-xs font-medium text-orange-700 dark:text-orange-400">
{req.category || 'Anforderung'}
</span>
</div>
<p className="text-sm text-gray-600 dark:text-gray-300">{req.text}</p>
</div>
))}
</div>
) : (
<p className="text-sm text-gray-500 dark:text-gray-400 italic">
Keine Anforderungen aus diesem Chunk extrahiert.
<br />
<span className="text-xs">(Requirements-Extraktion ist noch nicht implementiert)</span>
</p>
)}
</div>
{/* Arrow Down */}
<div className="flex justify-center">
<svg className="w-6 h-6 text-gray-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 14l-7 7m0 0l-7-7m7 7V3" />
</svg>
</div>
{/* Controls */}
<div className="border-l-4 border-green-500 pl-4">
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-2">
Abgeleitete Controls ({traceability.controls.length})
</h4>
{traceability.controls.length > 0 ? (
<div className="space-y-2">
{traceability.controls.map((ctrl, idx) => (
<div key={idx} className="bg-green-50 dark:bg-green-900/20 rounded p-3">
<div className="font-medium text-sm text-green-700 dark:text-green-400 mb-1">
{ctrl.name}
</div>
<p className="text-sm text-gray-600 dark:text-gray-300">{ctrl.description}</p>
</div>
))}
</div>
) : (
<p className="text-sm text-gray-500 dark:text-gray-400 italic">
Keine Controls aus diesem Chunk abgeleitet.
<br />
<span className="text-xs">(Control-Ableitung ist noch nicht implementiert)</span>
</p>
)}
</div>
</div>
) : null}
</div>
</div>
)}
@@ -510,13 +127,13 @@ export default function QualityPage() {
{/* Audit Info */}
<div className="bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 rounded-lg p-4">
<h3 className="text-sm font-medium text-blue-800 dark:text-blue-400 mb-2">
Hinweise fuer Auditoren
</h3>
<ul className="text-sm text-blue-700 dark:text-blue-300 space-y-1 list-disc list-inside">
<li>Die Suche ist semantisch - aehnliche Begriffe werden gefunden, auch wenn die exakte Formulierung abweicht</li>
<li>Jeder Chunk entspricht einem logischen Textabschnitt aus dem Originaldokument</li>
<li>Die Traceability zeigt, wie aus dem Originaltext Anforderungen und Controls abgeleitet wurden</li>
<li>Klicken Sie auf "Quelle oeffnen", um das Originaldokument zu pruefen</li> <li>Klicken Sie auf &quot;Quelle oeffnen&quot;, um das Originaldokument zu pruefen</li>
</ul>
</div>
</div>
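
Note on the removed `highlightText` helper: it interpolates raw query words into a `RegExp` and returns an HTML string that the old markup rendered through `dangerouslySetInnerHTML`. A query containing `(` or `[` can therefore throw, and chunk text containing markup is rendered unescaped. The following is a minimal sketch of a stricter variant, assuming string-based highlighting is still wanted after the split; `escapeHtml`, `escapeRegExp` and `highlightTextSafe` are hypothetical names and do not appear in the commit.

```typescript
// Hypothetical helpers for illustration; none of these names appear in the commit.

/** Escape characters that are special in HTML text or attribute values. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

/** Escape characters that are special inside a regular expression. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const MARK_OPEN = '<mark class="bg-yellow-200 dark:bg-yellow-800 px-0.5 rounded">'
const MARK_CLOSE = '</mark>'

function highlightTextSafe(text: string, query: string): string {
  // Same word filter as the original helper: only words longer than 2 characters
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 2)
    .map(escapeRegExp)
  if (words.length === 0) return escapeHtml(text)

  // One alternation in a single pass, so inserted <mark> tags are never re-scanned
  const regex = new RegExp(`(${words.join('|')})`, 'gi')

  // split() with one capturing group yields [plain, match, plain, match, ...]
  return text
    .split(regex)
    .map((part, i) => (i % 2 === 1 ? MARK_OPEN + escapeHtml(part) + MARK_CLOSE : escapeHtml(part)))
    .join('')
}
```

Because every segment is escaped before the `<mark>` wrapper is added, the returned string contains no markup other than the highlight tags themselves.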


@@ -0,0 +1,212 @@
'use client'
export function ArchitectureTab() {
return (
<div className="space-y-8">
{/* What is this module */}
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 p-6">
<h2 className="text-xl font-bold text-gray-900 dark:text-white mb-4">
Was macht dieses Modul?
</h2>
<div className="prose dark:prose-invert max-w-none">
<p className="text-gray-600 dark:text-gray-400">
Das <strong>RAG-Indexierungs-Modul</strong> verarbeitet Dokumente und macht sie fuer die KI-gestuetzte Suche verfuegbar.
Es handelt sich <strong>nicht</strong> um klassisches Machine-Learning-Training, sondern um:
</p>
<ul className="mt-4 space-y-2 text-gray-600 dark:text-gray-400">
<li className="flex items-start gap-2">
<span className="text-blue-500 mt-1">1.</span>
<span><strong>Dokumentenextraktion:</strong> PDFs und Bilder werden per OCR in Text umgewandelt</span>
</li>
<li className="flex items-start gap-2">
<span className="text-blue-500 mt-1">2.</span>
<span><strong>Chunking:</strong> Lange Texte werden in suchbare Abschnitte (1000 Zeichen) aufgeteilt</span>
</li>
<li className="flex items-start gap-2">
<span className="text-blue-500 mt-1">3.</span>
<span><strong>Embedding:</strong> Jeder Chunk wird in einen Vektor (1536 Dimensionen) umgewandelt</span>
</li>
<li className="flex items-start gap-2">
<span className="text-blue-500 mt-1">4.</span>
<span><strong>Indexierung:</strong> Vektoren werden in Qdrant gespeichert fuer semantische Suche</span>
</li>
</ul>
</div>
</div>
{/* Architecture Diagram */}
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 p-6">
<h2 className="text-xl font-bold text-gray-900 dark:text-white mb-6">
Technische Architektur
</h2>
{/* Visual Pipeline */}
<div className="relative">
{/* Data Sources Row */}
<div className="grid grid-cols-4 gap-4 mb-8">
<SourceCard icon="📄" title="NiBiS PDFs" subtitle="Erwartungshorizonte" color="blue" />
<SourceCard icon="📤" title="Uploads" subtitle="Eigene EH" color="green" />
<SourceCard icon="⚖️" title="Rechtskorpus" subtitle="DSGVO, AI Act" color="purple" />
<SourceCard icon="📚" title="Schulordnungen" subtitle="Bundeslaender" color="orange" />
</div>
<ArrowDown />
{/* Processing Layer */}
<div className="bg-gray-50 dark:bg-gray-900 rounded-xl p-6 mb-8">
<h3 className="text-sm font-semibold text-gray-500 dark:text-gray-400 uppercase tracking-wide mb-4">
Verarbeitungs-Pipeline
</h3>
<div className="flex items-center justify-between gap-4">
<PipelineStep icon="🔍" title="OCR" subtitle="Text-Extraktion" />
<ArrowRight />
<PipelineStep icon="✂️" title="Chunking" subtitle="1000 Zeichen" />
<ArrowRight />
<PipelineStep icon="🧮" title="Embedding" subtitle="1536-dim Vektor" />
<ArrowRight />
<PipelineStep icon="💾" title="Speichern" subtitle="Qdrant" />
</div>
</div>
<ArrowDown />
{/* Storage Layer */}
<div className="bg-gradient-to-r from-indigo-50 to-purple-50 dark:from-indigo-900/20 dark:to-purple-900/20 rounded-xl p-6 mb-8 border-2 border-indigo-200 dark:border-indigo-800">
<h3 className="text-sm font-semibold text-indigo-600 dark:text-indigo-400 uppercase tracking-wide mb-4">
Vektor-Datenbank (Qdrant)
</h3>
<div className="grid grid-cols-3 gap-4">
<CollectionCard collection="bp_nibis_eh" label="Offizielle EH" />
<CollectionCard collection="bp_eh" label="Benutzer EH" />
<CollectionCard collection="bp_legal_corpus" label="Rechtskorpus" />
</div>
</div>
<ArrowDown />
{/* Usage Layer */}
<div className="grid grid-cols-2 gap-4">
<div className="p-4 bg-emerald-50 dark:bg-emerald-900/20 rounded-xl border-2 border-emerald-200 dark:border-emerald-800">
<h4 className="font-medium text-emerald-700 dark:text-emerald-400 mb-2">Semantische Suche</h4>
<p className="text-sm text-gray-600 dark:text-gray-400">
Fragen werden in Vektoren umgewandelt und aehnliche Dokumente gefunden
</p>
</div>
<div className="p-4 bg-amber-50 dark:bg-amber-900/20 rounded-xl border-2 border-amber-200 dark:border-amber-800">
<h4 className="font-medium text-amber-700 dark:text-amber-400 mb-2">RAG-Antworten</h4>
<p className="text-sm text-gray-600 dark:text-gray-400">
LLM generiert Antworten basierend auf gefundenen Dokumenten
</p>
</div>
</div>
</div>
</div>
{/* Technical Details */}
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 p-6">
<h2 className="text-xl font-bold text-gray-900 dark:text-white mb-4">
Technische Details
</h2>
<div className="grid grid-cols-2 gap-6">
<div>
<h3 className="font-medium text-gray-900 dark:text-white mb-3">Embedding-Service</h3>
<table className="w-full text-sm">
<tbody className="divide-y divide-gray-200 dark:divide-gray-700">
<tr>
<td className="py-2 text-gray-500 dark:text-gray-400">Modell</td>
<td className="py-2 font-mono text-gray-900 dark:text-white">text-embedding-3-small</td>
</tr>
<tr>
<td className="py-2 text-gray-500 dark:text-gray-400">Dimensionen</td>
<td className="py-2 font-mono text-gray-900 dark:text-white">1536</td>
</tr>
<tr>
<td className="py-2 text-gray-500 dark:text-gray-400">Port</td>
<td className="py-2 font-mono text-gray-900 dark:text-white">8087</td>
</tr>
</tbody>
</table>
</div>
<div>
<h3 className="font-medium text-gray-900 dark:text-white mb-3">Chunk-Konfiguration</h3>
<table className="w-full text-sm">
<tbody className="divide-y divide-gray-200 dark:divide-gray-700">
<tr>
<td className="py-2 text-gray-500 dark:text-gray-400">Chunk-Groesse</td>
<td className="py-2 font-mono text-gray-900 dark:text-white">1000 Zeichen</td>
</tr>
<tr>
<td className="py-2 text-gray-500 dark:text-gray-400">Ueberlappung</td>
<td className="py-2 font-mono text-gray-900 dark:text-white">200 Zeichen</td>
</tr>
<tr>
<td className="py-2 text-gray-500 dark:text-gray-400">Distanzmetrik</td>
<td className="py-2 font-mono text-gray-900 dark:text-white">COSINE</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
)
}
// --- Internal helper components ---
function SourceCard({ icon, title, subtitle, color }: {
icon: string
title: string
subtitle: string
color: string
}) {
const colorClasses: Record<string, string> = {
blue: 'bg-blue-50 dark:bg-blue-900/20 border-blue-200 dark:border-blue-800',
green: 'bg-green-50 dark:bg-green-900/20 border-green-200 dark:border-green-800',
purple: 'bg-purple-50 dark:bg-purple-900/20 border-purple-200 dark:border-purple-800',
orange: 'bg-orange-50 dark:bg-orange-900/20 border-orange-200 dark:border-orange-800',
}
return (
<div className={`p-4 rounded-xl border-2 text-center ${colorClasses[color]}`}>
<div className="text-3xl mb-2">{icon}</div>
<div className="font-medium text-gray-900 dark:text-white">{title}</div>
<div className="text-xs text-gray-500">{subtitle}</div>
</div>
)
}
function PipelineStep({ icon, title, subtitle }: {
icon: string
title: string
subtitle: string
}) {
return (
<div className="flex-1 p-4 bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700 text-center">
<div className="text-2xl mb-1">{icon}</div>
<div className="font-medium text-sm">{title}</div>
<div className="text-xs text-gray-500">{subtitle}</div>
</div>
)
}
function CollectionCard({ collection, label }: { collection: string; label: string }) {
return (
<div className="p-3 bg-white dark:bg-gray-800 rounded-lg text-center">
<div className="font-mono text-xs text-gray-500">{collection}</div>
<div className="font-medium text-gray-900 dark:text-white">{label}</div>
</div>
)
}
function ArrowDown() {
return (
<div className="flex justify-center mb-4">
<div className="text-4xl text-gray-400">↓</div>
</div>
)
}
function ArrowRight() {
return <div className="text-2xl text-gray-400">→</div>
}
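
Note on the values shown in `ArchitectureTab`: the tab states a chunk size of 1000 characters, an overlap of 200 characters and the COSINE distance metric. The sketch below illustrates what those three numbers mean. It assumes a plain fixed-size sliding window, which may differ from what the indexing service actually does; `chunkText` and `cosineSimilarity` are hypothetical names that are not part of this commit.

```typescript
// Illustrative only: a fixed-window chunker and a cosine similarity, assuming
// the simplest possible reading of "1000 Zeichen" / "200 Zeichen" / "COSINE".

function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('expected chunkSize > 0 and 0 <= overlap < chunkSize')
  }
  const step = chunkSize - overlap // each chunk starts 800 characters after the previous one
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    // Stop once the current window already reaches the end of the text
    if (start + chunkSize >= text.length) break
  }
  return chunks
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction; treat its similarity as 0
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

With these defaults, consecutive chunks share their last and first 200 characters, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.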


@@ -0,0 +1,171 @@
'use client'
import type { DataSource } from '../types'
export function DataSourcesTab({ sources }: { sources: DataSource[] }) {
return (
<div className="space-y-6">
{/* Introduction */}
<div className="bg-blue-50 dark:bg-blue-900/20 rounded-xl p-6 border border-blue-200 dark:border-blue-800">
<h2 className="text-lg font-semibold text-blue-900 dark:text-blue-100 mb-2">
Wie werden Daten hinzugefuegt?
</h2>
<p className="text-blue-800 dark:text-blue-200 mb-4">
Das RAG-System nutzt verschiedene Datenquellen. Jede Quelle hat einen eigenen Ingestion-Prozess:
</p>
<div className="grid grid-cols-2 gap-4 text-sm">
<div className="bg-white dark:bg-gray-800 rounded-lg p-4">
<div className="font-medium text-gray-900 dark:text-white mb-1">Automatisch</div>
<p className="text-gray-600 dark:text-gray-400">
NiBiS-PDFs werden automatisch aus dem za-download Verzeichnis eingelesen
</p>
</div>
<div className="bg-white dark:bg-gray-800 rounded-lg p-4">
<div className="font-medium text-gray-900 dark:text-white mb-1">Manuell</div>
<p className="text-gray-600 dark:text-gray-400">
Eigene EH koennen ueber die Klausur-Korrektur hochgeladen werden
</p>
</div>
</div>
</div>
{/* Data Sources List */}
<div className="grid gap-4">
{sources.map((source) => (
<DataSourceCard key={source.id} source={source} />
))}
</div>
{/* How to add data */}
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 p-6">
<h2 className="text-lg font-semibold text-gray-900 dark:text-white mb-4">
Daten hinzufuegen
</h2>
<div className="grid grid-cols-3 gap-6">
<AddDataCard
icon="📤"
title="Erwartungshorizont hochladen"
description="Laden Sie eigene EH-Dokumente in der Klausur-Korrektur hoch"
linkHref="/admin/klausur-korrektur"
linkText="Zur Klausur-Korrektur →"
/>
<AddDataCard
icon="🔄"
title="NiBiS neu einlesen"
description="Starten Sie die automatische Ingestion der NiBiS-PDFs"
linkText="Ingestion starten →"
/>
<AddDataCard
icon="⚖️"
title="Rechtskorpus erweitern"
description="Neue Regelwerke (DSGVO, BSI, etc.) zum Korpus hinzufuegen"
linkText="Regelwerk hinzufuegen →"
/>
<AddDataCard
icon="📋"
title="DSFA-Quellen verwalten"
description="WP248, DSK, Muss-Listen mit Lizenzattribution"
linkHref="/ai/rag-pipeline/dsfa"
linkText="DSFA-Manager oeffnen →"
/>
</div>
</div>
</div>
)
}
// --- Internal helper components ---
function DataSourceCard({ source }: { source: DataSource }) {
return (
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 p-6">
<div className="flex items-start justify-between">
<div className="flex-1">
<div className="flex items-center gap-3 mb-2">
<h3 className="text-lg font-semibold text-gray-900 dark:text-white">
{source.name}
</h3>
<DataSourceStatusBadge status={source.status} />
</div>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{source.description}
</p>
<div className="flex items-center gap-6 text-sm">
<div>
<span className="text-gray-500 dark:text-gray-400">Collection: </span>
<span className="font-mono text-gray-900 dark:text-white">{source.collection}</span>
</div>
<div>
<span className="text-gray-500 dark:text-gray-400">Dokumente: </span>
<span className="font-semibold text-gray-900 dark:text-white">{source.document_count}</span>
</div>
<div>
<span className="text-gray-500 dark:text-gray-400">Chunks: </span>
<span className="font-semibold text-gray-900 dark:text-white">{source.chunk_count}</span>
</div>
{source.last_updated && (
<div>
<span className="text-gray-500 dark:text-gray-400">Aktualisiert: </span>
<span className="text-gray-900 dark:text-white">
{new Date(source.last_updated).toLocaleDateString('de-DE')}
</span>
</div>
)}
</div>
</div>
<div className="flex gap-2">
<button className="px-4 py-2 text-sm font-medium text-blue-600 hover:bg-blue-50 dark:hover:bg-blue-900/20 rounded-lg">
Aktualisieren
</button>
<button className="px-4 py-2 text-sm font-medium text-gray-600 hover:bg-gray-100 dark:hover:bg-gray-700 rounded-lg">
Details
</button>
</div>
</div>
</div>
)
}
function DataSourceStatusBadge({ status }: { status: DataSource['status'] }) {
const className = status === 'active'
? 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200'
: status === 'pending'
? 'bg-yellow-100 text-yellow-800 dark:bg-yellow-900 dark:text-yellow-200'
: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200'
const label = status === 'active' ? 'Aktiv' : status === 'pending' ? 'Ausstehend' : 'Fehler'
return (
<span className={`px-2 py-0.5 rounded-full text-xs font-medium ${className}`}>
{label}
</span>
)
}
function AddDataCard({ icon, title, description, linkHref, linkText }: {
icon: string
title: string
description: string
linkHref?: string
linkText: string
}) {
return (
<div className="p-4 bg-gray-50 dark:bg-gray-900 rounded-xl">
<div className="text-2xl mb-2">{icon}</div>
<h3 className="font-medium text-gray-900 dark:text-white mb-2">{title}</h3>
<p className="text-sm text-gray-600 dark:text-gray-400 mb-3">{description}</p>
{linkHref ? (
<a
href={linkHref}
className="text-sm text-blue-600 hover:text-blue-800 dark:text-blue-400"
>
{linkText}
</a>
) : (
<button className="text-sm text-blue-600 hover:text-blue-800 dark:text-blue-400">
{linkText}
</button>
)}
</div>
)
}
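
Note on `DataSourceStatusBadge`: the component derives both the label and the CSS classes from two parallel nested ternaries. One possible alternative is a lookup table, sketched below. The `SourceStatus` union is an assumption, since the real `DataSource['status']` type lives in `../types` and is not shown in this diff; `STATUS_BADGES` and `badgeFor` are hypothetical names.

```typescript
// Assumed union; the actual DataSource['status'] type is not visible in the diff.
type SourceStatus = 'active' | 'pending' | 'error'

interface BadgeStyle {
  label: string
  className: string
}

const STATUS_BADGES: Record<SourceStatus, BadgeStyle> = {
  active: {
    label: 'Aktiv',
    className: 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200',
  },
  pending: {
    label: 'Ausstehend',
    className: 'bg-yellow-100 text-yellow-800 dark:bg-yellow-900 dark:text-yellow-200',
  },
  error: {
    label: 'Fehler',
    className: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200',
  },
}

function badgeFor(status: string): BadgeStyle {
  // Own-property check so inherited keys such as "toString" are not treated as statuses.
  // Unknown values fall back to the error style, like the final ternary branch in the component.
  return Object.prototype.hasOwnProperty.call(STATUS_BADGES, status)
    ? STATUS_BADGES[status as SourceStatus]
    : STATUS_BADGES.error
}
```

Keeping label and classes in one record means a new status only has to be added in one place, and the `Record<SourceStatus, …>` type makes the compiler flag a missing entry.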


@@ -0,0 +1,60 @@
'use client'
import type { DatasetStats } from '../types'
export function DatasetOverview({ stats }: { stats: DatasetStats }) {
const maxBundesland = Math.max(...Object.values(stats.by_bundesland))
return (
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 p-6">
<h3 className="text-lg font-semibold text-gray-900 dark:text-white mb-4">
Datensatz-Uebersicht
</h3>
<div className="grid grid-cols-3 gap-4 mb-6">
<div className="text-center p-4 bg-blue-50 dark:bg-blue-900/20 rounded-xl">
<p className="text-3xl font-bold text-blue-600 dark:text-blue-400">
{stats.total_documents.toLocaleString()}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Dokumente</p>
</div>
<div className="text-center p-4 bg-emerald-50 dark:bg-emerald-900/20 rounded-xl">
<p className="text-3xl font-bold text-emerald-600 dark:text-emerald-400">
{stats.total_chunks.toLocaleString()}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Chunks</p>
</div>
<div className="text-center p-4 bg-purple-50 dark:bg-purple-900/20 rounded-xl">
<p className="text-3xl font-bold text-purple-600 dark:text-purple-400">
{stats.training_allowed.toLocaleString()}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Indexiert</p>
</div>
</div>
<h4 className="text-sm font-medium text-gray-700 dark:text-gray-300 mb-3">
Verteilung nach Bundesland
</h4>
<div className="space-y-2">
{Object.entries(stats.by_bundesland)
.sort((a, b) => b[1] - a[1])
.map(([code, count]) => (
<div key={code} className="flex items-center gap-3">
<span className="w-8 text-xs font-medium text-gray-600 dark:text-gray-400 uppercase">
{code}
</span>
<div className="flex-1 h-4 bg-gray-100 dark:bg-gray-700 rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-blue-500 to-blue-600 rounded-full"
style={{ width: `${(count / maxBundesland) * 100}%` }}
/>
</div>
<span className="w-10 text-sm text-right text-gray-600 dark:text-gray-400">
{count}
</span>
</div>
))}
</div>
</div>
)
}
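
Note on the bar widths in `DatasetOverview`: `Math.max(...Object.values(stats.by_bundesland))` evaluates to `-Infinity` for an empty object, and if every count is `0` the expression `(count / maxBundesland) * 100` becomes `NaN`. A minimal sketch of a guarded computation follows; `barWidths` and `BarRow` are hypothetical names, and the input is assumed to be a plain `Record<string, number>` like `stats.by_bundesland`.

```typescript
// Hypothetical helper; computes the rows the component renders, with a zero-safe width.
interface BarRow {
  code: string
  count: number
  widthPercent: number
}

function barWidths(byBundesland: Record<string, number>): BarRow[] {
  // Same ordering as the component: largest count first
  const entries = Object.entries(byBundesland).sort((a, b) => b[1] - a[1])
  // After sorting, the first entry holds the maximum; use 0 when there are no entries
  const max = entries.length > 0 ? entries[0][1] : 0
  return entries.map(([code, count]) => ({
    code,
    count,
    // Avoid dividing by zero when all counts are 0
    widthPercent: max > 0 ? (count / max) * 100 : 0,
  }))
}
```

The component could then map over the returned rows and use `widthPercent` directly in the `style` attribute.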


@@ -0,0 +1,277 @@
'use client'
import { useState } from 'react'
import type { TrainingConfig } from '../types'
const BUNDESLAENDER = [
{ code: 'ni', name: 'Niedersachsen', allowed: true },
{ code: 'by', name: 'Bayern', allowed: true },
{ code: 'nw', name: 'NRW', allowed: true },
{ code: 'he', name: 'Hessen', allowed: true },
{ code: 'bw', name: 'Baden-Wuerttemberg', allowed: true },
{ code: 'rp', name: 'Rheinland-Pfalz', allowed: true },
{ code: 'sn', name: 'Sachsen', allowed: true },
{ code: 'sh', name: 'Schleswig-Holstein', allowed: true },
{ code: 'th', name: 'Thueringen', allowed: true },
{ code: 'be', name: 'Berlin', allowed: false },
{ code: 'bb', name: 'Brandenburg', allowed: false },
{ code: 'hb', name: 'Bremen', allowed: false },
{ code: 'hh', name: 'Hamburg', allowed: false },
{ code: 'mv', name: 'Mecklenburg-Vorpommern', allowed: false },
{ code: 'sl', name: 'Saarland', allowed: false },
{ code: 'st', name: 'Sachsen-Anhalt', allowed: false },
]
export function NewTrainingModal({ isOpen, onClose, onSubmit }: {
isOpen: boolean
onClose: () => void
onSubmit: (config: Partial<TrainingConfig>) => void
}) {
const [step, setStep] = useState(1)
const [config, setConfig] = useState<Partial<TrainingConfig>>({
batch_size: 16,
learning_rate: 0.00005,
epochs: 10,
warmup_steps: 500,
weight_decay: 0.01,
gradient_accumulation: 4,
mixed_precision: true,
bundeslaender: [],
})
if (!isOpen) return null
return (
<div className="fixed inset-0 z-50 flex items-center justify-center bg-black/50 backdrop-blur-sm">
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-2xl w-full max-w-2xl max-h-[90vh] overflow-hidden">
{/* Header */}
<div className="px-6 py-4 border-b border-gray-200 dark:border-gray-700 flex justify-between items-center">
<div>
<h2 className="text-xl font-semibold text-gray-900 dark:text-white">
Neue Indexierung starten
</h2>
<p className="text-sm text-gray-500">Schritt {step} von 3</p>
</div>
<button onClick={onClose} className="p-2 hover:bg-gray-100 dark:hover:bg-gray-700 rounded-lg">
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
{/* Step Indicator */}
<StepIndicator currentStep={step} />
{/* Step Content */}
<div className="p-6 overflow-y-auto max-h-[50vh]">
{step === 1 && (
<BundeslaenderStep config={config} setConfig={setConfig} />
)}
{step === 2 && (
<ParameterStep config={config} setConfig={setConfig} />
)}
{step === 3 && (
<ConfirmStep config={config} />
)}
</div>
{/* Footer */}
<div className="px-6 py-4 border-t border-gray-200 dark:border-gray-700 flex justify-between">
<button
onClick={() => step > 1 ? setStep(step - 1) : onClose()}
className="px-4 py-2 text-sm font-medium text-gray-700 dark:text-gray-300 bg-white dark:bg-gray-800 border border-gray-300 dark:border-gray-600 rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700"
>
{step > 1 ? 'Zurueck' : 'Abbrechen'}
</button>
<button
onClick={() => step < 3 ? setStep(step + 1) : onSubmit(config)}
disabled={step === 1 && (!config.bundeslaender || config.bundeslaender.length === 0)}
className="px-6 py-2 text-sm font-medium text-white bg-blue-600 rounded-lg hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed"
>
{step < 3 ? 'Weiter' : 'Indexierung starten'}
</button>
</div>
</div>
</div>
)
}
// --- Internal step components ---
function StepIndicator({ currentStep }: { currentStep: number }) {
return (
<div className="px-6 py-4 bg-gray-50 dark:bg-gray-900">
<div className="flex items-center justify-center gap-4">
{[1, 2, 3].map((s) => (
<div key={s} className="flex items-center">
<div className={`w-8 h-8 rounded-full flex items-center justify-center text-sm font-medium ${
s <= currentStep
? 'bg-blue-600 text-white'
: 'bg-gray-200 dark:bg-gray-700 text-gray-500'
}`}>
{s < currentStep ? '\u2713' : s}
</div>
{s < 3 && (
<div className={`w-16 h-1 mx-2 rounded ${
s < currentStep ? 'bg-blue-600' : 'bg-gray-200 dark:bg-gray-700'
}`} />
)}
</div>
))}
</div>
<div className="flex justify-center gap-20 mt-2 text-xs text-gray-500">
<span>Daten</span>
<span>Parameter</span>
<span>Bestaetigen</span>
</div>
</div>
)
}
function BundeslaenderStep({ config, setConfig }: {
config: Partial<TrainingConfig>
setConfig: (config: Partial<TrainingConfig>) => void
}) {
return (
<div>
<h3 className="font-medium text-gray-900 dark:text-white mb-4">
Waehlen Sie die Bundeslaender fuer die Indexierung
</h3>
<p className="text-sm text-gray-500 dark:text-gray-400 mb-4">
Nur Bundeslaender mit verfuegbaren Dokumenten koennen ausgewaehlt werden.
</p>
<div className="grid grid-cols-2 gap-3">
{BUNDESLAENDER.map((bl) => (
<label
key={bl.code}
className={`flex items-center p-3 rounded-lg border-2 transition cursor-pointer ${
config.bundeslaender?.includes(bl.code)
? 'border-blue-500 bg-blue-50 dark:bg-blue-900/20'
: bl.allowed
? 'border-gray-200 dark:border-gray-700 hover:border-blue-300'
: 'border-gray-200 dark:border-gray-700 opacity-50 cursor-not-allowed'
}`}
>
<input
type="checkbox"
disabled={!bl.allowed}
checked={config.bundeslaender?.includes(bl.code)}
onChange={(e) => {
if (e.target.checked) {
setConfig({ ...config, bundeslaender: [...(config.bundeslaender || []), bl.code] })
} else {
setConfig({ ...config, bundeslaender: config.bundeslaender?.filter(c => c !== bl.code) })
}
}}
className="sr-only"
/>
<span className={`w-5 h-5 rounded border-2 flex items-center justify-center mr-3 ${
config.bundeslaender?.includes(bl.code)
? 'bg-blue-500 border-blue-500 text-white'
: 'border-gray-300 dark:border-gray-600'
}`}>
{config.bundeslaender?.includes(bl.code) && '\u2713'}
</span>
<span className="flex-1 text-gray-900 dark:text-white">{bl.name}</span>
{!bl.allowed && (
<span className="text-xs text-red-500">Keine Daten</span>
)}
</label>
))}
</div>
</div>
)
}
function ParameterStep({ config, setConfig }: {
config: Partial<TrainingConfig>
setConfig: (config: Partial<TrainingConfig>) => void
}) {
return (
<div className="space-y-6">
<h3 className="font-medium text-gray-900 dark:text-white mb-4">
Indexierungs-Parameter
</h3>
<p className="text-sm text-gray-500 dark:text-gray-400">
Diese Parameter steuern die Batch-Verarbeitung der Dokumente.
</p>
<div className="grid grid-cols-2 gap-4">
<div>
<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
Batch Size
</label>
<input
type="number"
value={config.batch_size}
onChange={(e) => setConfig({ ...config, batch_size: parseInt(e.target.value) })}
className="w-full px-3 py-2 border border-gray-300 dark:border-gray-600 rounded-lg bg-white dark:bg-gray-700"
/>
<p className="text-xs text-gray-500 mt-1">Dokumente pro Batch</p>
</div>
<div>
<label className="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
Durchlaeufe
</label>
<input
type="number"
value={config.epochs ?? ''}
onChange={(e) => setConfig({ ...config, epochs: parseInt(e.target.value, 10) || undefined })}
className="w-full px-3 py-2 border border-gray-300 dark:border-gray-600 rounded-lg bg-white dark:bg-gray-700"
/>
<p className="text-xs text-gray-500 mt-1">Fuer Validierung</p>
</div>
</div>
<div className="flex items-center gap-3 p-4 bg-gray-50 dark:bg-gray-900 rounded-lg">
<input
type="checkbox"
id="mixedPrecision"
checked={config.mixed_precision ?? false}
onChange={(e) => setConfig({ ...config, mixed_precision: e.target.checked })}
className="w-4 h-4 text-blue-600 rounded"
/>
<label htmlFor="mixedPrecision" className="text-sm text-gray-700 dark:text-gray-300">
Parallele Verarbeitung - schneller bei grossem Datensatz
</label>
</div>
</div>
)
}
function ConfirmStep({ config }: { config: Partial<TrainingConfig> }) {
return (
<div>
<h3 className="font-medium text-gray-900 dark:text-white mb-4">
Konfiguration bestaetigen
</h3>
<div className="bg-gray-50 dark:bg-gray-900 rounded-lg p-4 space-y-3">
<div className="flex justify-between">
<span className="text-gray-600 dark:text-gray-400">Bundeslaender</span>
<span className="font-medium text-gray-900 dark:text-white">
{config.bundeslaender?.length || 0} ausgewaehlt
</span>
</div>
<div className="flex justify-between">
<span className="text-gray-600 dark:text-gray-400">Batch Size</span>
<span className="font-medium text-gray-900 dark:text-white">{config.batch_size}</span>
</div>
<div className="flex justify-between">
<span className="text-gray-600 dark:text-gray-400">Parallele Verarbeitung</span>
<span className="font-medium text-gray-900 dark:text-white">
{config.mixed_precision ? 'Aktiviert' : 'Deaktiviert'}
</span>
</div>
</div>
<div className="mt-4 p-4 bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 rounded-lg">
<p className="text-sm text-blue-800 dark:text-blue-200">
<strong>Was passiert:</strong> Die ausgewaehlten Dokumente werden extrahiert,
in Chunks aufgeteilt, und als Vektoren in Qdrant indexiert.
Dieser Prozess kann je nach Datenmenge einige Minuten dauern.
</p>
</div>
</div>
)
}
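
The checkbox and number-input handlers in the wizard steps above can be reduced to two small pure functions, which makes the edge cases (an undefined selection, an empty number field) easier to test. This is a minimal sketch, not code from the commit: `toggleCode` and `parsePositiveInt` are hypothetical names, and treating zero or negative input as "no value" is an assumption.

```typescript
// Hypothetical helper (not in the original file): add or remove one code
// from an optional selection without mutating the input array.
function toggleCode(selected: string[] | undefined, code: string, checked: boolean): string[] {
  const current = selected ?? []
  if (checked) {
    // Avoid duplicates if the code is already selected.
    return current.includes(code) ? [...current] : [...current, code]
  }
  return current.filter((c) => c !== code)
}

// Hypothetical helper: parse the text of a number input. Returns undefined
// for empty, non-numeric, zero, or negative input instead of NaN
// (the "positive only" rule is an assumption).
function parsePositiveInt(raw: string): number | undefined {
  const n = Number.parseInt(raw, 10)
  return Number.isFinite(n) && n > 0 ? n : undefined
}
```

With these, the Bundesland handler could read `setConfig({ ...config, bundeslaender: toggleCode(config.bundeslaender, bl.code, e.target.checked) })`, and the number inputs could store `parsePositiveInt(e.target.value)`.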

@@ -0,0 +1,168 @@
'use client'
import type { TrainingJob } from '../types'
// Tab Button
export function TabButton({ active, onClick, children }: {
active: boolean
onClick: () => void
children: React.ReactNode
}) {
return (
<button
onClick={onClick}
className={`px-4 py-2 text-sm font-medium rounded-lg transition-colors ${
active
? 'bg-blue-600 text-white'
: 'text-gray-600 dark:text-gray-400 hover:bg-gray-100 dark:hover:bg-gray-700'
}`}
>
{children}
</button>
)
}
// Progress Ring Component
export function ProgressRing({ progress, size = 120, strokeWidth = 8, color = '#10B981' }: {
progress: number
size?: number
strokeWidth?: number
color?: string
}) {
const radius = (size - strokeWidth) / 2
const circumference = radius * 2 * Math.PI
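// Clamp to 0..100 so out-of-range values cannot over- or under-draw the ring.
const offset = circumference - (Math.min(100, Math.max(0, progress)) / 100) * circumference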
return (
<div className="relative" style={{ width: size, height: size }}>
<svg className="transform -rotate-90" width={size} height={size}>
<circle
cx={size / 2}
cy={size / 2}
r={radius}
stroke="currentColor"
strokeWidth={strokeWidth}
fill="none"
className="text-gray-200 dark:text-gray-700"
/>
<circle
cx={size / 2}
cy={size / 2}
r={radius}
stroke={color}
strokeWidth={strokeWidth}
fill="none"
strokeDasharray={circumference}
strokeDashoffset={offset}
strokeLinecap="round"
className="transition-all duration-500"
/>
</svg>
<div className="absolute inset-0 flex items-center justify-center">
<span className="text-2xl font-bold text-gray-900 dark:text-white">
{Math.round(progress)}%
</span>
</div>
</div>
)
}
// Mini Line Chart Component
export function MiniChart({ data, color = '#10B981', height = 60 }: {
data: number[]
color?: string
height?: number
}) {
if (!data.length) return null
const max = Math.max(...data)
const min = Math.min(...data)
const range = max - min || 1
const width = 200
const padding = 4
// Horizontal distance between points; 0 for a single value avoids 0 / 0 = NaN.
const step = data.length > 1 ? (width - 2 * padding) / (data.length - 1) : 0
const points = data.map((value, i) => {
const x = padding + i * step
const y = padding + (1 - (value - min) / range) * (height - 2 * padding)
return `${x},${y}`
}).join(' ')
return (
<svg width={width} height={height} className="overflow-visible">
<polyline
points={points}
fill="none"
stroke={color}
strokeWidth={2}
strokeLinecap="round"
strokeLinejoin="round"
/>
{/* Marker on the most recent value */}
<circle
cx={padding + (data.length - 1) * step}
cy={padding + (1 - (data[data.length - 1] - min) / range) * (height - 2 * padding)}
r={4}
fill={color}
/>
</svg>
)
}
// Status Badge
export function StatusBadge({ status }: { status: TrainingJob['status'] }) {
const styles = {
queued: 'bg-gray-100 text-gray-800 dark:bg-gray-700 dark:text-gray-300',
preparing: 'bg-yellow-100 text-yellow-800 dark:bg-yellow-900 dark:text-yellow-200',
training: 'bg-blue-100 text-blue-800 dark:bg-blue-900 dark:text-blue-200',
validating: 'bg-purple-100 text-purple-800 dark:bg-purple-900 dark:text-purple-200',
completed: 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200',
failed: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200',
paused: 'bg-orange-100 text-orange-800 dark:bg-orange-900 dark:text-orange-200',
}
const labels = {
queued: 'In Warteschlange',
preparing: 'Vorbereitung',
training: 'Indexierung laeuft',
validating: 'Validierung',
completed: 'Abgeschlossen',
failed: 'Fehlgeschlagen',
paused: 'Pausiert',
}
return (
<span className={`inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium ${styles[status]}`}>
{status === 'training' && (
<span className="w-2 h-2 mr-1.5 bg-blue-500 rounded-full animate-pulse" />
)}
{labels[status]}
</span>
)
}
// Metric Card
export function MetricCard({ label, value, trend, color }: {
label: string
value: number | string
trend?: 'up' | 'down' | 'neutral'
color?: string
}) {
return (
<div className="bg-white dark:bg-gray-800 rounded-xl p-4 shadow-sm border border-gray-200 dark:border-gray-700">
<p className="text-sm text-gray-500 dark:text-gray-400 mb-1">{label}</p>
<div className="flex items-baseline gap-1">
<span className="text-2xl font-bold" style={{ color: color || 'inherit' }}>
{typeof value === 'number' ? value.toFixed(3) : value}
</span>
{trend && (
<span className={`ml-2 text-sm ${
trend === 'up' ? 'text-green-500' : trend === 'down' ? 'text-red-500' : 'text-gray-400'
}`}>
{trend === 'up' ? '\u2191' : trend === 'down' ? '\u2193' : '\u2192'}
</span>
)}
</div>
</div>
)
}
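
The geometry behind `ProgressRing` and `MiniChart` can be checked in isolation from the JSX. The sketch below restates both calculations as pure functions; `ringDashOffset` and `chartPoints` are hypothetical names, the defaults mirror the component props above, and the clamping and single-point handling are assumptions about the intended behavior.

```typescript
// Hypothetical helper: stroke-dashoffset for a ring that shows `progress` percent.
// The dash array equals the circumference, so the offset is the hidden remainder.
function ringDashOffset(progress: number, size = 120, strokeWidth = 8): number {
  const clamped = Math.min(100, Math.max(0, progress))
  const radius = (size - strokeWidth) / 2
  const circumference = 2 * Math.PI * radius
  return circumference * (1 - clamped / 100)
}

// Hypothetical helper: map values to SVG coordinates inside a padded box.
// The y axis is flipped because SVG y grows downwards.
function chartPoints(data: number[], width = 200, height = 60, padding = 4): Array<[number, number]> {
  if (data.length === 0) return []
  const max = Math.max(...data)
  const min = Math.min(...data)
  const range = max - min || 1 // flat series: avoid dividing by zero
  // A single value has no horizontal spacing; use 0 instead of 0 / 0.
  const step = data.length > 1 ? (width - 2 * padding) / (data.length - 1) : 0
  return data.map((value, i): [number, number] => [
    padding + i * step,
    padding + (1 - (value - min) / range) * (height - 2 * padding),
  ])
}
```

Joining the pairs with `points.map(([x, y]) => `${x},${y}`).join(' ')` gives the string a `<polyline>` expects.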

@@ -0,0 +1,126 @@
'use client'
import type { TrainingJob } from '../types'
import { ProgressRing, MiniChart, StatusBadge, MetricCard } from './SharedWidgets'
export function TrainingJobCard({ job, onPause, onResume, onStop, onViewDetails }: {
job: TrainingJob
onPause: () => void
onResume: () => void
onStop: () => void
onViewDetails: () => void
}) {
// 'paused' is included so that the "Fortsetzen" (resume) control below is reachable.
const isActive = ['training', 'preparing', 'validating', 'paused'].includes(job.status)
return (
<div className="bg-white dark:bg-gray-800 rounded-2xl shadow-lg border border-gray-200 dark:border-gray-700 overflow-hidden">
<div className="px-6 py-4 border-b border-gray-200 dark:border-gray-700 flex justify-between items-center">
<div>
<h3 className="text-lg font-semibold text-gray-900 dark:text-white">{job.name}</h3>
<p className="text-sm text-gray-500 dark:text-gray-400">
Typ: {job.model_type.charAt(0).toUpperCase() + job.model_type.slice(1)}
</p>
</div>
<StatusBadge status={job.status} />
</div>
<div className="p-6">
<div className="flex items-center gap-8">
<ProgressRing
progress={job.progress}
color={job.status === 'failed' ? '#EF4444' : '#10B981'}
/>
<div className="flex-1 space-y-4">
<div>
<div className="flex justify-between text-sm mb-1">
<span className="text-gray-600 dark:text-gray-400">Durchlauf</span>
<span className="font-medium text-gray-900 dark:text-white">
{job.current_epoch} / {job.total_epochs}
</span>
</div>
<div className="h-2 bg-gray-200 dark:bg-gray-700 rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-blue-500 to-blue-600 rounded-full transition-all duration-500"
style={{ width: `${job.total_epochs > 0 ? (job.current_epoch / job.total_epochs) * 100 : 0}%` }}
/>
</div>
</div>
<div>
<div className="flex justify-between text-sm mb-1">
<span className="text-gray-600 dark:text-gray-400">Dokumente</span>
<span className="font-medium text-gray-900 dark:text-white">
{job.documents_processed.toLocaleString()} / {job.total_documents.toLocaleString()}
</span>
</div>
<div className="h-2 bg-gray-200 dark:bg-gray-700 rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-emerald-500 to-emerald-600 rounded-full transition-all duration-500"
style={{ width: `${job.total_documents > 0 ? (job.documents_processed / job.total_documents) * 100 : 0}%` }}
/>
</div>
</div>
</div>
</div>
<div className="grid grid-cols-4 gap-3 mt-6">
<MetricCard label="Loss" value={job.loss} trend="down" color="#3B82F6" />
<MetricCard label="Val Loss" value={job.val_loss} trend="down" color="#8B5CF6" />
<MetricCard label="Precision" value={job.metrics.precision} color="#10B981" />
<MetricCard label="F1 Score" value={job.metrics.f1_score} color="#F59E0B" />
</div>
<div className="mt-6 p-4 bg-gray-50 dark:bg-gray-900 rounded-xl">
<div className="flex justify-between items-center mb-3">
<span className="text-sm font-medium text-gray-700 dark:text-gray-300">
Fortschritt
</span>
</div>
<div className="flex gap-4">
<MiniChart data={job.metrics.loss_history} color="#3B82F6" />
<MiniChart data={job.metrics.val_loss_history} color="#8B5CF6" />
</div>
</div>
<div className="mt-4 flex justify-between text-sm text-gray-500 dark:text-gray-400">
<span>
Gestartet: {job.started_at ? new Date(job.started_at).toLocaleTimeString('de-DE') : '-'}
</span>
<span>
Geschaetzt: {job.estimated_completion
? new Date(job.estimated_completion).toLocaleTimeString('de-DE')
: '-'
}
</span>
</div>
</div>
<div className="px-6 py-4 bg-gray-50 dark:bg-gray-900 border-t border-gray-200 dark:border-gray-700 flex justify-between">
<button
onClick={onViewDetails}
className="px-4 py-2 text-sm font-medium text-blue-600 hover:text-blue-800 dark:text-blue-400"
>
Details anzeigen
</button>
<div className="flex gap-2">
{isActive && (
<>
<button
onClick={job.status === 'paused' ? onResume : onPause}
className="px-4 py-2 text-sm font-medium text-gray-700 dark:text-gray-300 bg-white dark:bg-gray-800 border border-gray-300 dark:border-gray-600 rounded-lg hover:bg-gray-50 dark:hover:bg-gray-700"
>
{job.status === 'paused' ? 'Fortsetzen' : 'Pausieren'}
</button>
<button
onClick={onStop}
className="px-4 py-2 text-sm font-medium text-red-600 bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg hover:bg-red-100 dark:hover:bg-red-900/40"
>
Abbrechen
</button>
</>
)}
</div>
</div>
</div>
)
}
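
Two decisions in `TrainingJobCard` are easy to get wrong inline: how wide a progress bar is when the total is still zero, and which buttons apply to which status. The following sketch separates them out. `percent` and `availableControls` are hypothetical names, the `JobStatus` union is assumed from the keys used in `StatusBadge`, and offering no controls for queued jobs is an assumption rather than documented behavior.

```typescript
// Assumed from the keys of `styles`/`labels` in StatusBadge; the real
// TrainingJob['status'] type lives in '../types' and is not shown here.
type JobStatus =
  | 'queued' | 'preparing' | 'training' | 'validating'
  | 'completed' | 'failed' | 'paused'

type JobControl = 'pause' | 'resume' | 'cancel'

// Hypothetical helper: bar width in percent, safe for total = 0 and clamped to 0..100.
function percent(done: number, total: number): number {
  if (!(total > 0)) return 0
  return Math.min(100, Math.max(0, (done / total) * 100))
}

// Hypothetical helper: which footer buttons a job in the given status should offer.
function availableControls(status: JobStatus): JobControl[] {
  switch (status) {
    case 'preparing':
    case 'training':
    case 'validating':
      return ['pause', 'cancel']
    case 'paused':
      return ['resume', 'cancel']
    default:
      // queued, completed, failed: nothing to control (assumption)
      return []
  }
}
```

In the card, `'cancel'` would correspond to the `onStop` callback behind the "Abbrechen" button.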

@@ -0,0 +1,146 @@
import type { TrainingJob, TrainingConfig, DatasetStats, DataSource } from './types'
// ============================================================================
// MOCK DATA
// ============================================================================
export const MOCK_JOBS: TrainingJob[] = []
export const MOCK_STATS: DatasetStats = {
total_documents: 632,
total_chunks: 8547,
training_allowed: 489,
by_bundesland: {
ni: 87, by: 92, nw: 78, he: 65, bw: 71, rp: 43, sn: 38, sh: 34, th: 29,
},
by_doc_type: {
verordnung: 312,
schulordnung: 156,
handreichung: 98,
erlass: 66,
},
}
export const MOCK_DATA_SOURCES: DataSource[] = [
{
id: 'nibis',
name: 'NiBiS Erwartungshorizonte',
description: 'Offizielle Abitur-Erwartungshorizonte vom Niedersaechsischen Bildungsserver',
collection: 'bp_nibis_eh',
document_count: 245,
chunk_count: 3200,
last_updated: '2025-01-15T10:30:00Z',
status: 'active',
},
{
id: 'user_eh',
name: 'Benutzerdefinierte EH',
description: 'Von Lehrern hochgeladene schulspezifische Erwartungshorizonte',
collection: 'bp_eh',
document_count: 87,
chunk_count: 1100,
last_updated: '2025-01-20T14:15:00Z',
status: 'active',
},
{
id: 'legal',
name: 'Rechtskorpus',
description: 'DSGVO, AI Act, BSI-Standards und weitere Compliance-Regelwerke',
collection: 'bp_legal_corpus',
document_count: 19,
chunk_count: 2400,
last_updated: '2025-01-10T08:00:00Z',
status: 'active',
},
{
id: 'dsfa',
name: 'DSFA-Guidance',
description: 'WP248, DSK Kurzpapiere, Muss-Listen aller Bundeslaender mit Quellenattribution',
collection: 'bp_dsfa_corpus',
document_count: 45,
chunk_count: 850,
last_updated: '2026-02-09T10:00:00Z',
status: 'active',
},
{
id: 'schulordnungen',
name: 'Schulordnungen',
description: 'Landesschulordnungen und Zeugnisverordnungen aller Bundeslaender',
collection: 'bp_schulordnungen',
document_count: 156,
chunk_count: 1847,
last_updated: null,
status: 'pending',
},
]
// ============================================================================
// API FUNCTIONS
// ============================================================================
export async function fetchJobs(): Promise<TrainingJob[]> {
try {
const response = await fetch('/api/ai/rag-pipeline?action=jobs')
if (!response.ok) throw new Error('Failed to fetch jobs')
return await response.json()
} catch (error) {
console.error('Error fetching jobs:', error)
return MOCK_JOBS
}
}
export async function fetchDatasetStats(): Promise<DatasetStats> {
try {
const response = await fetch('/api/ai/rag-pipeline?action=dataset-stats')
if (!response.ok) throw new Error('Failed to fetch stats')
return await response.json()
} catch (error) {
console.error('Error fetching stats:', error)
return MOCK_STATS
}
}
export async function createTrainingJob(config: Partial<TrainingConfig>): Promise<{id: string, status: string}> {
const response = await fetch('/api/ai/rag-pipeline?action=create-job', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
name: `RAG-Index ${new Date().toLocaleDateString('de-DE')}`,
model_type: 'zeugnis',
bundeslaender: config.bundeslaender || [],
batch_size: config.batch_size || 16,
learning_rate: config.learning_rate || 0.00005,
epochs: config.epochs || 10,
warmup_steps: config.warmup_steps || 500,
weight_decay: config.weight_decay || 0.01,
gradient_accumulation: config.gradient_accumulation || 4,
mixed_precision: config.mixed_precision ?? true,
}),
})
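if (!response.ok) {
// The error body may not be JSON (e.g. a proxy error page), so parse defensively.
const error = await response.json().catch(() => ({}))
throw new Error(error.detail || 'Failed to create job')
}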
return await response.json()
}
export async function pauseJob(jobId: string): Promise<void> {
const response = await fetch(`/api/ai/rag-pipeline?action=pause&job_id=${encodeURIComponent(jobId)}`, {
method: 'POST',
})
if (!response.ok) throw new Error('Failed to pause job')
}
export async function resumeJob(jobId: string): Promise<void> {
const response = await fetch(`/api/ai/rag-pipeline?action=resume&job_id=${encodeURIComponent(jobId)}`, {
method: 'POST',
})
if (!response.ok) throw new Error('Failed to resume job')
}
export async function cancelJob(jobId: string): Promise<void> {
const response = await fetch(`/api/ai/rag-pipeline?action=cancel&job_id=${encodeURIComponent(jobId)}`, {
method: 'POST',
})
if (!response.ok) throw new Error('Failed to cancel job')
}
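
Every function above builds its URL by hand from the same endpoint plus an `action` query parameter. A small builder based on the standard `URLSearchParams` class keeps the encoding in one place. This is only a sketch: `ragPipelineUrl` and `RAG_PIPELINE_ENDPOINT` are hypothetical names, and the endpoint string is copied from the calls above.

```typescript
// Endpoint as used by fetchJobs, createTrainingJob, pauseJob, etc.
const RAG_PIPELINE_ENDPOINT = '/api/ai/rag-pipeline'

// Hypothetical helper: build "<endpoint>?action=...&key=value" with
// form-style encoding (note: URLSearchParams encodes a space as "+").
function ragPipelineUrl(action: string, params: Record<string, string> = {}): string {
  const query = new URLSearchParams({ action, ...params })
  return `${RAG_PIPELINE_ENDPOINT}?${query.toString()}`
}
```

For example, `pauseJob` could call `fetch(ragPipelineUrl('pause', { job_id: jobId }), { method: 'POST' })`.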

@@ -0,0 +1,61 @@
'use client'
import React from 'react'
import { Scale, CheckCircle, Clock, AlertCircle } from 'lucide-react'
import {
DSFALicenseCode,
DSFA_LICENSE_LABELS,
DSFA_DOCUMENT_TYPE_LABELS,
} from '@/lib/sdk/types'
export function LicenseBadge({ licenseCode }: { licenseCode: DSFALicenseCode }) {
const colorMap: Record<DSFALicenseCode, string> = {
'DL-DE-BY-2.0': 'bg-blue-100 text-blue-700 border-blue-200',
'DL-DE-ZERO-2.0': 'bg-gray-100 text-gray-700 border-gray-200',
'CC-BY-4.0': 'bg-green-100 text-green-700 border-green-200',
'EDPB-LICENSE': 'bg-purple-100 text-purple-700 border-purple-200',
'PUBLIC_DOMAIN': 'bg-gray-100 text-gray-600 border-gray-200',
'PROPRIETARY': 'bg-amber-100 text-amber-700 border-amber-200',
}
return (
<span className={`inline-flex items-center gap-1 px-2 py-0.5 rounded text-xs border ${colorMap[licenseCode] || 'bg-gray-100 text-gray-700 border-gray-200'}`}>
<Scale className="w-3 h-3" />
{DSFA_LICENSE_LABELS[licenseCode] || licenseCode}
</span>
)
}
export function DocumentTypeBadge({ type }: { type?: string }) {
if (!type) return null
const colorMap: Record<string, string> = {
guideline: 'bg-indigo-100 text-indigo-700',
checklist: 'bg-emerald-100 text-emerald-700',
regulation: 'bg-red-100 text-red-700',
template: 'bg-orange-100 text-orange-700',
}
return (
<span className={`inline-flex items-center px-2 py-0.5 rounded text-xs ${colorMap[type] || 'bg-gray-100 text-gray-700'}`}>
{DSFA_DOCUMENT_TYPE_LABELS[type as keyof typeof DSFA_DOCUMENT_TYPE_LABELS] || type}
</span>
)
}
export function StatusIndicator({ status }: { status: string }) {
const statusConfig: Record<string, { color: string; icon: React.ReactNode; label: string }> = {
green: { color: 'text-green-500', icon: <CheckCircle className="w-4 h-4" />, label: 'Aktiv' },
yellow: { color: 'text-yellow-500', icon: <Clock className="w-4 h-4" />, label: 'Ausstehend' },
red: { color: 'text-red-500', icon: <AlertCircle className="w-4 h-4" />, label: 'Fehler' },
}
const config = statusConfig[status] || statusConfig.yellow
return (
<span className={`inline-flex items-center gap-1 ${config.color}`}>
{config.icon}
<span className="text-sm">{config.label}</span>
</span>
)
}

@@ -0,0 +1,137 @@
'use client'
import { useState } from 'react'
import {
RefreshCw,
ChevronDown,
ChevronUp,
ExternalLink,
} from 'lucide-react'
import { DSFASource, DSFASourceStats } from '@/lib/sdk/types'
import { LicenseBadge, DocumentTypeBadge } from './DSFABadges'
interface SourceCardProps {
source: DSFASource
stats?: DSFASourceStats
onIngest: () => void
isIngesting: boolean
}
export function SourceCard({ source, stats, onIngest, isIngesting }: SourceCardProps) {
const [isExpanded, setIsExpanded] = useState(false)
return (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 overflow-hidden">
<div className="p-4">
<div className="flex items-start justify-between">
<div className="flex-1 min-w-0">
<div className="flex items-center gap-2 mb-1">
<span className="font-mono text-xs bg-gray-100 dark:bg-gray-700 px-2 py-0.5 rounded">
{source.sourceCode}
</span>
<DocumentTypeBadge type={source.documentType} />
</div>
<h3 className="font-semibold text-gray-900 dark:text-white truncate">
{source.name}
</h3>
{source.organization && (
<p className="text-sm text-gray-500 dark:text-gray-400">
{source.organization}
</p>
)}
</div>
<button
onClick={() => setIsExpanded(!isExpanded)}
className="p-1 hover:bg-gray-100 dark:hover:bg-gray-700 rounded"
>
{isExpanded ? (
<ChevronUp className="w-5 h-5 text-gray-400" />
) : (
<ChevronDown className="w-5 h-5 text-gray-400" />
)}
</button>
</div>
<div className="flex items-center gap-4 mt-3">
<LicenseBadge licenseCode={source.licenseCode} />
{stats && (
<>
<span className="text-sm text-gray-500">
{stats.documentCount} Dok.
</span>
<span className="text-sm text-gray-500">
{stats.chunkCount} Chunks
</span>
</>
)}
</div>
{source.attributionRequired && (
<div className="mt-3 p-2 bg-amber-50 dark:bg-amber-900/20 rounded text-xs text-amber-700 dark:text-amber-300">
<strong>Attribution:</strong> {source.attributionText}
</div>
)}
</div>
{isExpanded && (
<div className="border-t border-gray-200 dark:border-gray-700 p-4 bg-gray-50 dark:bg-gray-900">
<dl className="grid grid-cols-2 gap-3 text-sm">
{source.sourceUrl && (
<>
<dt className="text-gray-500">Quelle:</dt>
<dd>
<a
href={source.sourceUrl}
target="_blank"
rel="noopener noreferrer"
className="text-blue-600 hover:underline flex items-center gap-1"
>
Link <ExternalLink className="w-3 h-3" />
</a>
</dd>
</>
)}
{source.licenseUrl && (
<>
<dt className="text-gray-500">Lizenz-URL:</dt>
<dd>
<a
href={source.licenseUrl}
target="_blank"
rel="noopener noreferrer"
className="text-blue-600 hover:underline flex items-center gap-1"
>
{source.licenseName} <ExternalLink className="w-3 h-3" />
</a>
</dd>
</>
)}
<dt className="text-gray-500">Sprache:</dt>
<dd className="uppercase">{source.language}</dd>
{stats?.lastIndexedAt && (
<>
<dt className="text-gray-500">Zuletzt indexiert:</dt>
<dd>{new Date(stats.lastIndexedAt).toLocaleString('de-DE')}</dd>
</>
)}
</dl>
<div className="mt-4 flex gap-2">
<button
onClick={onIngest}
disabled={isIngesting}
className="px-3 py-1.5 text-sm bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 flex items-center gap-1"
>
{isIngesting ? (
<RefreshCw className="w-4 h-4 animate-spin" />
) : (
<RefreshCw className="w-4 h-4" />
)}
Neu indexieren
</button>
</div>
</div>
)}
</div>
)
}

@@ -0,0 +1,59 @@
'use client'
import { Database } from 'lucide-react'
import { DSFACorpusStats } from '@/lib/sdk/types'
import { StatusIndicator } from './DSFABadges'
interface StatsOverviewProps {
stats: DSFACorpusStats
}
export function StatsOverview({ stats }: StatsOverviewProps) {
return (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-6">
<div className="flex items-center justify-between mb-4">
<h2 className="text-lg font-semibold text-gray-900 dark:text-white flex items-center gap-2">
<Database className="w-5 h-5" />
Corpus-Statistik
</h2>
<StatusIndicator status={stats.qdrantStatus} />
</div>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<div className="text-center p-4 bg-blue-50 dark:bg-blue-900/20 rounded-lg">
<p className="text-2xl font-bold text-blue-600 dark:text-blue-400">
{stats.totalSources}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Quellen</p>
</div>
<div className="text-center p-4 bg-emerald-50 dark:bg-emerald-900/20 rounded-lg">
<p className="text-2xl font-bold text-emerald-600 dark:text-emerald-400">
{stats.totalDocuments}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Dokumente</p>
</div>
<div className="text-center p-4 bg-purple-50 dark:bg-purple-900/20 rounded-lg">
<p className="text-2xl font-bold text-purple-600 dark:text-purple-400">
{stats.totalChunks.toLocaleString()}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Chunks</p>
</div>
<div className="text-center p-4 bg-orange-50 dark:bg-orange-900/20 rounded-lg">
<p className="text-2xl font-bold text-orange-600 dark:text-orange-400">
{stats.qdrantPointsCount.toLocaleString()}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Vektoren</p>
</div>
</div>
<div className="mt-4 p-3 bg-gray-50 dark:bg-gray-900 rounded-lg">
<p className="text-sm text-gray-600 dark:text-gray-400">
<strong>Collection:</strong>{' '}
<code className="font-mono bg-gray-200 dark:bg-gray-700 px-1 rounded">
{stats.qdrantCollection}
</code>
</p>
</div>
</div>
)
}

@@ -0,0 +1,137 @@
/**
* DSFA API functions and mock data.
*/
import {
DSFASource,
DSFACorpusStats,
} from '@/lib/sdk/types'
export const API_BASE = process.env.NEXT_PUBLIC_KLAUSUR_SERVICE_URL || 'http://localhost:8086'
export const MOCK_SOURCES: DSFASource[] = [
{
id: '1',
sourceCode: 'WP248',
name: 'WP248 rev.01 - Leitlinien zur DSFA',
fullName: 'Leitlinien zur Datenschutz-Folgenabschaetzung',
organization: 'Artikel-29-Datenschutzgruppe / EDPB',
sourceUrl: 'https://ec.europa.eu/newsroom/article29/items/611236/en',
licenseCode: 'EDPB-LICENSE',
licenseName: 'EDPB Document License',
attributionRequired: true,
attributionText: 'Quelle: WP248 rev.01, Artikel-29-Datenschutzgruppe (2017)',
documentType: 'guideline',
language: 'de',
},
{
id: '2',
sourceCode: 'DSK_KP5',
name: 'Kurzpapier Nr. 5 - DSFA nach Art. 35 DS-GVO',
organization: 'Datenschutzkonferenz (DSK)',
sourceUrl: 'https://www.datenschutzkonferenz-online.de/media/kp/dsk_kpnr_5.pdf',
licenseCode: 'DL-DE-BY-2.0',
licenseName: 'Datenlizenz DE \u2013 Namensnennung 2.0',
licenseUrl: 'https://www.govdata.de/dl-de/by-2-0',
attributionRequired: true,
attributionText: 'Quelle: DSK Kurzpapier Nr. 5 (Stand: 2018)',
documentType: 'guideline',
language: 'de',
},
{
id: '3',
sourceCode: 'BFDI_MUSS_PUBLIC',
name: 'BfDI DSFA-Liste (oeffentlicher Bereich)',
organization: 'BfDI',
sourceUrl: 'https://www.bfdi.bund.de',
licenseCode: 'DL-DE-ZERO-2.0',
licenseName: 'Datenlizenz DE \u2013 Zero 2.0',
attributionRequired: false,
attributionText: 'Quelle: BfDI, Liste gem. Art. 35 Abs. 4 DSGVO',
documentType: 'checklist',
language: 'de',
},
{
id: '4',
sourceCode: 'NI_MUSS_PRIVATE',
name: 'LfD NI DSFA-Liste (nicht-oeffentlich)',
organization: 'LfD Niedersachsen',
sourceUrl: 'https://www.lfd.niedersachsen.de/download/131098',
licenseCode: 'DL-DE-BY-2.0',
licenseName: 'Datenlizenz DE \u2013 Namensnennung 2.0',
attributionRequired: true,
attributionText: 'Quelle: LfD Niedersachsen, DSFA-Muss-Liste',
documentType: 'checklist',
language: 'de',
},
]
export const MOCK_STATS: DSFACorpusStats = {
sources: [
{
sourceId: '1',
sourceCode: 'WP248',
name: 'WP248 rev.01',
organization: 'EDPB',
licenseCode: 'EDPB-LICENSE',
documentType: 'guideline',
documentCount: 1,
chunkCount: 50,
lastIndexedAt: '2026-02-09T10:00:00Z',
},
{
sourceId: '2',
sourceCode: 'DSK_KP5',
name: 'DSK Kurzpapier Nr. 5',
organization: 'DSK',
licenseCode: 'DL-DE-BY-2.0',
documentType: 'guideline',
documentCount: 1,
chunkCount: 35,
lastIndexedAt: '2026-02-09T10:00:00Z',
},
],
totalSources: 45,
totalDocuments: 45,
totalChunks: 850,
qdrantCollection: 'bp_dsfa_corpus',
qdrantPointsCount: 850,
qdrantStatus: 'green',
}
export async function fetchSources(): Promise<DSFASource[]> {
try {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/sources`)
if (!response.ok) throw new Error('Failed to fetch sources')
return await response.json()
} catch {
return MOCK_SOURCES
}
}
export async function fetchStats(): Promise<DSFACorpusStats> {
try {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/stats`)
if (!response.ok) throw new Error('Failed to fetch stats')
return await response.json()
} catch {
return MOCK_STATS
}
}
export async function initializeCorpus(): Promise<{ sources_registered: number }> {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/init`, {
method: 'POST',
})
if (!response.ok) throw new Error('Failed to initialize corpus')
return await response.json()
}
export async function triggerIngestion(sourceCode: string): Promise<void> {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/sources/${encodeURIComponent(sourceCode)}/ingest`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({}),
})
if (!response.ok) throw new Error('Failed to trigger ingestion')
}
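
`fetchSources` and `fetchStats` fall back to mock data on any failure, so the caller cannot tell live data from demo data. One possible variant, sketched here under assumptions, returns the origin alongside the payload. `loadWithFallback`, `Loaded`, and `JsonResponse` are hypothetical names; the request is passed in as a function so the helper does not depend on a particular HTTP client.

```typescript
// Minimal shape of what the helper needs from a response; the built-in
// fetch Response satisfies it.
interface JsonResponse {
  ok: boolean
  json(): Promise<unknown>
}

// Payload plus where it came from.
type Loaded<T> = { data: T; source: 'live' | 'mock' }

// Hypothetical helper: run the request, and on a network error, a non-2xx
// status, or an unparsable body, return the fallback marked as 'mock'.
// The response body is cast to T without validation (assumption: the
// backend returns the expected shape).
async function loadWithFallback<T>(
  request: () => Promise<JsonResponse>,
  fallback: T,
): Promise<Loaded<T>> {
  try {
    const response = await request()
    if (!response.ok) throw new Error('Request failed')
    return { data: (await response.json()) as T, source: 'live' }
  } catch {
    return { data: fallback, source: 'mock' }
  }
}
```

A call such as `loadWithFallback(() => fetch(`${API_BASE}/api/v1/dsfa-rag/sources`), MOCK_SOURCES)` would let the page show a "demo data" hint whenever `source` is `'mock'`.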

@@ -4,11 +4,6 @@
  * DSFA Document Manager
  *
  * Manages DSFA-related sources and documents for the RAG pipeline.
- * Features:
- * - View all registered DSFA sources with license info
- * - Upload new documents
- * - Trigger re-indexing
- * - View corpus statistics
  */

  import { useState, useEffect } from 'react'
@@ -19,411 +14,24 @@ import {
  Upload,
  FileText,
  Database,
- Scale,
- ExternalLink,
- ChevronDown,
- ChevronUp,
  Search,
  Filter,
- CheckCircle,
- Clock,
  AlertCircle,
- BookOpen
+ BookOpen,
  } from 'lucide-react'
+ import { DSFASource, DSFACorpusStats, DSFASourceStats } from '@/lib/sdk/types'
  import {
+ fetchSources,
+ fetchStats,
+ initializeCorpus,
+ triggerIngestion,
+ MOCK_SOURCES,
+ MOCK_STATS,
+ } from './_components/dsfa-api'
+ import { LicenseBadge } from './_components/DSFABadges'
+ import { SourceCard } from './_components/SourceCard'
+ import { StatsOverview } from './_components/StatsOverview'
- DSFASource,
- DSFACorpusStats,
- DSFASourceStats,
- DSFALicenseCode,
- DSFA_LICENSE_LABELS,
- DSFA_DOCUMENT_TYPE_LABELS
- } from '@/lib/sdk/types'
- // ============================================================================
- // TYPES
// ============================================================================
interface APIError {
message: string
status?: number
}
// ============================================================================
// API FUNCTIONS
// ============================================================================
const API_BASE = process.env.NEXT_PUBLIC_KLAUSUR_SERVICE_URL || 'http://localhost:8086'
async function fetchSources(): Promise<DSFASource[]> {
try {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/sources`)
if (!response.ok) throw new Error('Failed to fetch sources')
return await response.json()
} catch {
// Return mock data for demo
return MOCK_SOURCES
}
}
async function fetchStats(): Promise<DSFACorpusStats> {
try {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/stats`)
if (!response.ok) throw new Error('Failed to fetch stats')
return await response.json()
} catch {
return MOCK_STATS
}
}
async function initializeCorpus(): Promise<{ sources_registered: number }> {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/init`, {
method: 'POST',
})
if (!response.ok) throw new Error('Failed to initialize corpus')
return await response.json()
}
async function triggerIngestion(sourceCode: string): Promise<void> {
const response = await fetch(`${API_BASE}/api/v1/dsfa-rag/sources/${sourceCode}/ingest`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({}),
})
if (!response.ok) throw new Error('Failed to trigger ingestion')
}
// ============================================================================
// MOCK DATA
// ============================================================================
const MOCK_SOURCES: DSFASource[] = [
{
id: '1',
sourceCode: 'WP248',
name: 'WP248 rev.01 - Leitlinien zur DSFA',
fullName: 'Leitlinien zur Datenschutz-Folgenabschaetzung',
organization: 'Artikel-29-Datenschutzgruppe / EDPB',
sourceUrl: 'https://ec.europa.eu/newsroom/article29/items/611236/en',
licenseCode: 'EDPB-LICENSE',
licenseName: 'EDPB Document License',
attributionRequired: true,
attributionText: 'Quelle: WP248 rev.01, Artikel-29-Datenschutzgruppe (2017)',
documentType: 'guideline',
language: 'de',
},
{
id: '2',
sourceCode: 'DSK_KP5',
name: 'Kurzpapier Nr. 5 - DSFA nach Art. 35 DS-GVO',
organization: 'Datenschutzkonferenz (DSK)',
sourceUrl: 'https://www.datenschutzkonferenz-online.de/media/kp/dsk_kpnr_5.pdf',
licenseCode: 'DL-DE-BY-2.0',
licenseName: 'Datenlizenz DE Namensnennung 2.0',
licenseUrl: 'https://www.govdata.de/dl-de/by-2-0',
attributionRequired: true,
attributionText: 'Quelle: DSK Kurzpapier Nr. 5 (Stand: 2018)',
documentType: 'guideline',
language: 'de',
},
{
id: '3',
sourceCode: 'BFDI_MUSS_PUBLIC',
name: 'BfDI DSFA-Liste (oeffentlicher Bereich)',
organization: 'BfDI',
sourceUrl: 'https://www.bfdi.bund.de',
licenseCode: 'DL-DE-ZERO-2.0',
licenseName: 'Datenlizenz DE Zero 2.0',
attributionRequired: false,
attributionText: 'Quelle: BfDI, Liste gem. Art. 35 Abs. 4 DSGVO',
documentType: 'checklist',
language: 'de',
},
{
id: '4',
sourceCode: 'NI_MUSS_PRIVATE',
name: 'LfD NI DSFA-Liste (nicht-oeffentlich)',
organization: 'LfD Niedersachsen',
sourceUrl: 'https://www.lfd.niedersachsen.de/download/131098',
licenseCode: 'DL-DE-BY-2.0',
licenseName: 'Datenlizenz DE Namensnennung 2.0',
attributionRequired: true,
attributionText: 'Quelle: LfD Niedersachsen, DSFA-Muss-Liste',
documentType: 'checklist',
language: 'de',
},
]
const MOCK_STATS: DSFACorpusStats = {
sources: [
{
sourceId: '1',
sourceCode: 'WP248',
name: 'WP248 rev.01',
organization: 'EDPB',
licenseCode: 'EDPB-LICENSE',
documentType: 'guideline',
documentCount: 1,
chunkCount: 50,
lastIndexedAt: '2026-02-09T10:00:00Z',
},
{
sourceId: '2',
sourceCode: 'DSK_KP5',
name: 'DSK Kurzpapier Nr. 5',
organization: 'DSK',
licenseCode: 'DL-DE-BY-2.0',
documentType: 'guideline',
documentCount: 1,
chunkCount: 35,
lastIndexedAt: '2026-02-09T10:00:00Z',
},
],
totalSources: 45,
totalDocuments: 45,
totalChunks: 850,
qdrantCollection: 'bp_dsfa_corpus',
qdrantPointsCount: 850,
qdrantStatus: 'green',
}
// ============================================================================
// COMPONENTS
// ============================================================================
function LicenseBadge({ licenseCode }: { licenseCode: DSFALicenseCode }) {
const colorMap: Record<DSFALicenseCode, string> = {
'DL-DE-BY-2.0': 'bg-blue-100 text-blue-700 border-blue-200',
'DL-DE-ZERO-2.0': 'bg-gray-100 text-gray-700 border-gray-200',
'CC-BY-4.0': 'bg-green-100 text-green-700 border-green-200',
'EDPB-LICENSE': 'bg-purple-100 text-purple-700 border-purple-200',
'PUBLIC_DOMAIN': 'bg-gray-100 text-gray-600 border-gray-200',
'PROPRIETARY': 'bg-amber-100 text-amber-700 border-amber-200',
}
return (
<span className={`inline-flex items-center gap-1 px-2 py-0.5 rounded text-xs border ${colorMap[licenseCode] || 'bg-gray-100 text-gray-700 border-gray-200'}`}>
<Scale className="w-3 h-3" />
{DSFA_LICENSE_LABELS[licenseCode] || licenseCode}
</span>
)
}
function DocumentTypeBadge({ type }: { type?: string }) {
if (!type) return null
const colorMap: Record<string, string> = {
guideline: 'bg-indigo-100 text-indigo-700',
checklist: 'bg-emerald-100 text-emerald-700',
regulation: 'bg-red-100 text-red-700',
template: 'bg-orange-100 text-orange-700',
}
return (
<span className={`inline-flex items-center px-2 py-0.5 rounded text-xs ${colorMap[type] || 'bg-gray-100 text-gray-700'}`}>
{DSFA_DOCUMENT_TYPE_LABELS[type as keyof typeof DSFA_DOCUMENT_TYPE_LABELS] || type}
</span>
)
}
function StatusIndicator({ status }: { status: string }) {
const statusConfig: Record<string, { color: string; icon: React.ReactNode; label: string }> = {
green: { color: 'text-green-500', icon: <CheckCircle className="w-4 h-4" />, label: 'Aktiv' },
yellow: { color: 'text-yellow-500', icon: <Clock className="w-4 h-4" />, label: 'Ausstehend' },
red: { color: 'text-red-500', icon: <AlertCircle className="w-4 h-4" />, label: 'Fehler' },
}
const config = statusConfig[status] || statusConfig.yellow
return (
<span className={`inline-flex items-center gap-1 ${config.color}`}>
{config.icon}
<span className="text-sm">{config.label}</span>
</span>
)
}
function SourceCard({
source,
stats,
onIngest,
isIngesting
}: {
source: DSFASource
stats?: DSFASourceStats
onIngest: () => void
isIngesting: boolean
}) {
const [isExpanded, setIsExpanded] = useState(false)
return (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 overflow-hidden">
<div className="p-4">
<div className="flex items-start justify-between">
<div className="flex-1 min-w-0">
<div className="flex items-center gap-2 mb-1">
<span className="font-mono text-xs bg-gray-100 dark:bg-gray-700 px-2 py-0.5 rounded">
{source.sourceCode}
</span>
<DocumentTypeBadge type={source.documentType} />
</div>
<h3 className="font-semibold text-gray-900 dark:text-white truncate">
{source.name}
</h3>
{source.organization && (
<p className="text-sm text-gray-500 dark:text-gray-400">
{source.organization}
</p>
)}
</div>
<button
onClick={() => setIsExpanded(!isExpanded)}
className="p-1 hover:bg-gray-100 dark:hover:bg-gray-700 rounded"
>
{isExpanded ? (
<ChevronUp className="w-5 h-5 text-gray-400" />
) : (
<ChevronDown className="w-5 h-5 text-gray-400" />
)}
</button>
</div>
<div className="flex items-center gap-4 mt-3">
<LicenseBadge licenseCode={source.licenseCode} />
{stats && (
<>
<span className="text-sm text-gray-500">
{stats.documentCount} Dok.
</span>
<span className="text-sm text-gray-500">
{stats.chunkCount} Chunks
</span>
</>
)}
</div>
{source.attributionRequired && (
<div className="mt-3 p-2 bg-amber-50 dark:bg-amber-900/20 rounded text-xs text-amber-700 dark:text-amber-300">
<strong>Attribution:</strong> {source.attributionText}
</div>
)}
</div>
{isExpanded && (
<div className="border-t border-gray-200 dark:border-gray-700 p-4 bg-gray-50 dark:bg-gray-900">
<dl className="grid grid-cols-2 gap-3 text-sm">
{source.sourceUrl && (
<>
<dt className="text-gray-500">Quelle:</dt>
<dd>
<a
href={source.sourceUrl}
target="_blank"
rel="noopener noreferrer"
className="text-blue-600 hover:underline flex items-center gap-1"
>
Link <ExternalLink className="w-3 h-3" />
</a>
</dd>
</>
)}
{source.licenseUrl && (
<>
<dt className="text-gray-500">Lizenz-URL:</dt>
<dd>
<a
href={source.licenseUrl}
target="_blank"
rel="noopener noreferrer"
className="text-blue-600 hover:underline flex items-center gap-1"
>
{source.licenseName} <ExternalLink className="w-3 h-3" />
</a>
</dd>
</>
)}
<dt className="text-gray-500">Sprache:</dt>
<dd className="uppercase">{source.language}</dd>
{stats?.lastIndexedAt && (
<>
<dt className="text-gray-500">Zuletzt indexiert:</dt>
<dd>{new Date(stats.lastIndexedAt).toLocaleString('de-DE')}</dd>
</>
)}
</dl>
<div className="mt-4 flex gap-2">
<button
onClick={onIngest}
disabled={isIngesting}
className="px-3 py-1.5 text-sm bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 flex items-center gap-1"
>
<RefreshCw className={`w-4 h-4 ${isIngesting ? 'animate-spin' : ''}`} />
Neu indexieren
</button>
</div>
</div>
)}
</div>
)
}
function StatsOverview({ stats }: { stats: DSFACorpusStats }) {
return (
<div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-6">
<div className="flex items-center justify-between mb-4">
<h2 className="text-lg font-semibold text-gray-900 dark:text-white flex items-center gap-2">
<Database className="w-5 h-5" />
Corpus-Statistik
</h2>
<StatusIndicator status={stats.qdrantStatus} />
</div>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<div className="text-center p-4 bg-blue-50 dark:bg-blue-900/20 rounded-lg">
<p className="text-2xl font-bold text-blue-600 dark:text-blue-400">
{stats.totalSources}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Quellen</p>
</div>
<div className="text-center p-4 bg-emerald-50 dark:bg-emerald-900/20 rounded-lg">
<p className="text-2xl font-bold text-emerald-600 dark:text-emerald-400">
{stats.totalDocuments}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Dokumente</p>
</div>
<div className="text-center p-4 bg-purple-50 dark:bg-purple-900/20 rounded-lg">
<p className="text-2xl font-bold text-purple-600 dark:text-purple-400">
{stats.totalChunks.toLocaleString('de-DE')}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Chunks</p>
</div>
<div className="text-center p-4 bg-orange-50 dark:bg-orange-900/20 rounded-lg">
<p className="text-2xl font-bold text-orange-600 dark:text-orange-400">
{stats.qdrantPointsCount.toLocaleString('de-DE')}
</p>
<p className="text-sm text-gray-600 dark:text-gray-400">Vektoren</p>
</div>
</div>
<div className="mt-4 p-3 bg-gray-50 dark:bg-gray-900 rounded-lg">
<p className="text-sm text-gray-600 dark:text-gray-400">
<strong>Collection:</strong>{' '}
<code className="font-mono bg-gray-200 dark:bg-gray-700 px-1 rounded">
{stats.qdrantCollection}
</code>
</p>
</div>
</div>
)
}
// ============================================================================
// MAIN PAGE
// ============================================================================
export default function DSFADocumentManagerPage() {
const [sources, setSources] = useState<DSFASource[]>([])
@@ -461,7 +69,6 @@ export default function DSFADocumentManagerPage() {
setIsInitializing(true)
try {
await initializeCorpus()
- // Reload data
const [sourcesData, statsData] = await Promise.all([
fetchSources(),
fetchStats(),
@@ -479,7 +86,6 @@ export default function DSFADocumentManagerPage() {
setIngestingSource(sourceCode)
try {
await triggerIngestion(sourceCode)
- // Reload stats
const statsData = await fetchStats()
setStats(statsData)
} catch (err) {
@@ -501,7 +107,6 @@ export default function DSFADocumentManagerPage() {
return matchesSearch && matchesType
})
- // Get stats by source code
const getStatsForSource = (sourceCode: string): DSFASourceStats | undefined => {
return stats?.sources.find(s => s.sourceCode === sourceCode)
}

File diff suppressed because it is too large


@@ -0,0 +1,63 @@
// ============================================================================
// RAG Pipeline Types
// ============================================================================
export interface TrainingJob {
id: string
name: string
model_type: 'zeugnis' | 'klausur' | 'general'
status: 'queued' | 'preparing' | 'training' | 'validating' | 'completed' | 'failed' | 'paused'
progress: number
current_epoch: number
total_epochs: number
loss: number
val_loss: number
learning_rate: number
documents_processed: number
total_documents: number
started_at: string | null
estimated_completion: string | null
error_message: string | null
metrics: TrainingMetrics
config: TrainingConfig
}
export interface TrainingMetrics {
precision: number
recall: number
f1_score: number
accuracy: number
loss_history: number[]
val_loss_history: number[]
confusion_matrix?: number[][]
}
export interface TrainingConfig {
batch_size: number
learning_rate: number
epochs: number
warmup_steps: number
weight_decay: number
gradient_accumulation: number
mixed_precision: boolean
bundeslaender: string[]
}
export interface DatasetStats {
total_documents: number
total_chunks: number
training_allowed: number
by_bundesland: Record<string, number>
by_doc_type: Record<string, number>
}
export interface DataSource {
id: string
name: string
description: string
collection: string
document_count: number
chunk_count: number
last_updated: string | null
status: 'active' | 'pending' | 'error'
}
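
The `by_bundesland` and `by_doc_type` fields above are plain count maps, so a dashboard will usually want them ranked. The following is a minimal sketch of such a ranking; `topBuckets` is a hypothetical helper that does not appear in this diff, and the tie-breaking by name is an assumption made only to keep the output deterministic.

```typescript
// Hypothetical helper (not part of the diff): returns the `limit` largest
// entries of a count map such as DatasetStats.by_bundesland, largest first.
// Ties are broken alphabetically by key so the order is deterministic.
function topBuckets(
  counts: Record<string, number>,
  limit: number,
): Array<[string, number]> {
  return Object.entries(counts)
    .sort(([nameA, countA], [nameB, countB]) =>
      countB - countA || nameA.localeCompare(nameB),
    )
    .slice(0, limit)
}
```

A caller could pass `stats.by_bundesland` directly, for example `topBuckets(stats.by_bundesland, 5)` to feed a "top five" list.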


@@ -0,0 +1,147 @@
'use client'
import { useState, useEffect } from 'react'
import type { TrainingJob, TrainingConfig, DatasetStats, DataSource } from './types'
import {
MOCK_JOBS,
MOCK_STATS,
MOCK_DATA_SOURCES,
fetchJobs,
fetchDatasetStats,
createTrainingJob,
pauseJob,
resumeJob,
cancelJob,
} from './api'
export type TabType = 'dashboard' | 'architecture' | 'sources'
export interface RagPipelineState {
activeTab: TabType
setActiveTab: (tab: TabType) => void
jobs: TrainingJob[]
stats: DatasetStats
dataSources: DataSource[]
showNewTrainingModal: boolean
setShowNewTrainingModal: (show: boolean) => void
selectedJob: TrainingJob | null
setSelectedJob: (job: TrainingJob | null) => void
isLoading: boolean
error: string | null
setError: (error: string | null) => void
handleStartTraining: (config: Partial<TrainingConfig>) => Promise<void>
handlePauseJob: (jobId: string) => Promise<void>
handleResumeJob: (jobId: string) => Promise<void>
handleCancelJob: (jobId: string) => Promise<void>
}
export function useRagPipeline(): RagPipelineState {
const [activeTab, setActiveTab] = useState<TabType>('dashboard')
const [jobs, setJobs] = useState<TrainingJob[]>([])
const [stats, setStats] = useState<DatasetStats>(MOCK_STATS)
const [dataSources] = useState<DataSource[]>(MOCK_DATA_SOURCES)
const [showNewTrainingModal, setShowNewTrainingModal] = useState(false)
const [selectedJob, setSelectedJob] = useState<TrainingJob | null>(null)
const [isLoading, setIsLoading] = useState(true)
const [error, setError] = useState<string | null>(null)
useEffect(() => {
async function loadData() {
setIsLoading(true)
try {
const [jobsData, statsData] = await Promise.all([
fetchJobs(),
fetchDatasetStats(),
])
setJobs(jobsData)
setStats(statsData)
setError(null)
} catch (err) {
console.error('Failed to load data:', err)
setError('Verbindung zum Backend fehlgeschlagen')
setJobs(MOCK_JOBS)
setStats(MOCK_STATS)
} finally {
setIsLoading(false)
}
}
loadData()
}, [])
useEffect(() => {
const hasActiveJob = jobs.some(j => j.status === 'training' || j.status === 'preparing')
if (!hasActiveJob) return
const interval = setInterval(async () => {
try {
const updatedJobs = await fetchJobs()
setJobs(updatedJobs)
} catch (err) {
console.error('Failed to refresh jobs:', err)
}
}, 2000)
return () => clearInterval(interval)
}, [jobs])
const handleStartTraining = async (config: Partial<TrainingConfig>) => {
try {
await createTrainingJob(config)
const updatedJobs = await fetchJobs()
setJobs(updatedJobs)
setShowNewTrainingModal(false)
} catch (err) {
console.error('Failed to start training:', err)
setError(err instanceof Error ? err.message : 'Indexierung konnte nicht gestartet werden')
}
}
const handlePauseJob = async (jobId: string) => {
try {
await pauseJob(jobId)
const updatedJobs = await fetchJobs()
setJobs(updatedJobs)
} catch (err) {
console.error('Failed to pause job:', err)
}
}
const handleResumeJob = async (jobId: string) => {
try {
await resumeJob(jobId)
const updatedJobs = await fetchJobs()
setJobs(updatedJobs)
} catch (err) {
console.error('Failed to resume job:', err)
}
}
const handleCancelJob = async (jobId: string) => {
try {
await cancelJob(jobId)
const updatedJobs = await fetchJobs()
setJobs(updatedJobs)
} catch (err) {
console.error('Failed to cancel job:', err)
}
}
return {
activeTab,
setActiveTab,
jobs,
stats,
dataSources,
showNewTrainingModal,
setShowNewTrainingModal,
selectedJob,
setSelectedJob,
isLoading,
error,
setError,
handleStartTraining,
handlePauseJob,
handleResumeJob,
handleCancelJob,
}
}
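
The second effect in `useRagPipeline` polls `fetchJobs` every 2000 ms, but only while at least one job is in the `training` or `preparing` state. The sketch below restates that rule outside React so it can be read and tested in isolation. `JobLike`, `Timers`, `realTimers`, and `startJobPolling` are hypothetical names introduced for illustration; only the status condition and the 2000 ms interval come from the hook itself.

```typescript
// Local copy of the status union from TrainingJob, so the sketch is self-contained.
type JobStatus =
  | 'queued' | 'preparing' | 'training' | 'validating'
  | 'completed' | 'failed' | 'paused'

interface JobLike {
  id: string
  status: JobStatus
}

// Hypothetical timer abstraction (not in the diff) so the logic can run without real timers.
interface Timers {
  set: (callback: () => void, ms: number) => unknown
  clear: (handle: unknown) => void
}

const realTimers: Timers = {
  set: (callback, ms) => setInterval(callback, ms),
  clear: (handle) => clearInterval(handle as ReturnType<typeof setInterval>),
}

// Same condition the hook uses: only 'training' and 'preparing' count as active.
function hasActiveJob(jobs: readonly JobLike[]): boolean {
  return jobs.some((job) => job.status === 'training' || job.status === 'preparing')
}

// Starts polling only while a job is active and returns a cleanup function,
// mirroring the effect's early return and its clearInterval cleanup.
function startJobPolling(
  jobs: readonly JobLike[],
  refresh: () => void,
  intervalMs: number = 2000,
  timers: Timers = realTimers,
): () => void {
  if (!hasActiveJob(jobs)) return () => {}
  const handle = timers.set(refresh, intervalMs)
  return () => timers.clear(handle)
}
```

Under this condition, jobs that are `queued` or `validating` do not trigger polling on their own; whether that is intended is not stated in the diff.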


@@ -0,0 +1,252 @@
import { describe, it, expect } from 'vitest'
import ragData from '../rag-documents.json'
/**
* Tests for rag-documents.json: industry/regulation matrix
*
* Validates the JSON structure, the industry mapping and the data integrity
* of the 320 documents for the RAG map ("RAG Landkarte").
*/
const VALID_INDUSTRY_IDS = ragData.industries.map((i: any) => i.id)
const VALID_DOC_TYPE_IDS = ragData.doc_types.map((dt: any) => dt.id)
describe('rag-documents.json — Struktur', () => {
it('sollte doc_types, industries und documents enthalten', () => {
expect(ragData).toHaveProperty('doc_types')
expect(ragData).toHaveProperty('industries')
expect(ragData).toHaveProperty('documents')
expect(Array.isArray(ragData.doc_types)).toBe(true)
expect(Array.isArray(ragData.industries)).toBe(true)
expect(Array.isArray(ragData.documents)).toBe(true)
})
it('sollte genau 10 Branchen haben (VDMA/VDA/BDI)', () => {
expect(ragData.industries).toHaveLength(10)
const ids = ragData.industries.map((i: any) => i.id)
expect(ids).toContain('automotive')
expect(ids).toContain('maschinenbau')
expect(ids).toContain('elektrotechnik')
expect(ids).toContain('chemie')
expect(ids).toContain('metall')
expect(ids).toContain('energie')
expect(ids).toContain('transport')
expect(ids).toContain('handel')
expect(ids).toContain('konsumgueter')
expect(ids).toContain('bau')
})
it('sollte keine Pseudo-Branchen enthalten (IoT, KI, HR, KRITIS, etc.)', () => {
const ids = ragData.industries.map((i: any) => i.id)
expect(ids).not.toContain('iot')
expect(ids).not.toContain('ai')
expect(ids).not.toContain('hr')
expect(ids).not.toContain('kritis')
expect(ids).not.toContain('ecommerce')
expect(ids).not.toContain('tech')
expect(ids).not.toContain('media')
expect(ids).not.toContain('public')
})
it('sollte 17 Dokumenttypen haben', () => {
expect(ragData.doc_types.length).toBe(17)
})
it('sollte mindestens 300 Dokumente haben', () => {
expect(ragData.documents.length).toBeGreaterThanOrEqual(300)
})
it('sollte jede Branche name und icon haben', () => {
ragData.industries.forEach((ind: any) => {
expect(ind).toHaveProperty('id')
expect(ind).toHaveProperty('name')
expect(ind).toHaveProperty('icon')
expect(ind.name.length).toBeGreaterThan(0)
})
})
it('sollte jeden doc_type mit id, label, icon und sort haben', () => {
ragData.doc_types.forEach((dt: any) => {
expect(dt).toHaveProperty('id')
expect(dt).toHaveProperty('label')
expect(dt).toHaveProperty('icon')
expect(dt).toHaveProperty('sort')
})
})
})
describe('rag-documents.json — Dokument-Validierung', () => {
it('sollte keine doppelten Codes haben', () => {
const codes = ragData.documents.map((d: any) => d.code)
const unique = new Set(codes)
expect(unique.size).toBe(codes.length)
})
it('sollte Pflichtfelder bei jedem Dokument haben', () => {
ragData.documents.forEach((doc: any) => {
expect(doc).toHaveProperty('code')
expect(doc).toHaveProperty('name')
expect(doc).toHaveProperty('doc_type')
expect(doc).toHaveProperty('industries')
expect(doc).toHaveProperty('in_rag')
expect(doc).toHaveProperty('rag_collection')
expect(doc.code.length).toBeGreaterThan(0)
expect(doc.name.length).toBeGreaterThan(0)
expect(Array.isArray(doc.industries)).toBe(true)
})
})
it('sollte nur gueltige doc_type IDs verwenden', () => {
ragData.documents.forEach((doc: any) => {
expect(VALID_DOC_TYPE_IDS).toContain(doc.doc_type)
})
})
it('sollte nur gueltige industry IDs verwenden (oder "all")', () => {
ragData.documents.forEach((doc: any) => {
doc.industries.forEach((ind: string) => {
if (ind !== 'all') {
expect(VALID_INDUSTRY_IDS).toContain(ind)
}
})
})
})
it('sollte gueltige rag_collection Namen verwenden', () => {
const validCollections = [
'bp_compliance_ce',
'bp_compliance_gesetze',
'bp_compliance_datenschutz',
'bp_dsfa_corpus',
'bp_legal_templates',
'bp_compliance_recht',
'bp_nibis_eh',
]
ragData.documents.forEach((doc: any) => {
expect(validCollections).toContain(doc.rag_collection)
})
})
})
describe('rag-documents.json — Branchen-Zuordnungslogik', () => {
const findDoc = (code: string) => ragData.documents.find((d: any) => d.code === code)
describe('Horizontale Regulierungen (alle Branchen)', () => {
const horizontalCodes = [
'GDPR', 'BDSG_FULL', 'EPRIVACY', 'TDDDG', 'AIACT', 'CRA',
'NIS2', 'GPSR', 'PLD', 'EUCSA', 'DATAACT',
]
horizontalCodes.forEach((code) => {
it(`${code} sollte fuer alle Branchen gelten`, () => {
const doc = findDoc(code)
if (doc) {
expect(doc.industries).toContain('all')
}
})
})
})
describe('Sektorspezifische Regulierungen', () => {
it('Maschinenverordnung sollte Maschinenbau, Automotive, Elektrotechnik enthalten', () => {
const doc = findDoc('MACHINERY_REG')
if (doc) {
expect(doc.industries).toContain('maschinenbau')
expect(doc.industries).toContain('automotive')
expect(doc.industries).toContain('elektrotechnik')
expect(doc.industries).not.toContain('all')
}
})
it('ElektroG sollte Elektrotechnik und Automotive enthalten', () => {
const doc = findDoc('DE_ELEKTROG')
if (doc) {
expect(doc.industries).toContain('elektrotechnik')
expect(doc.industries).toContain('automotive')
}
})
it('BattDG sollte Automotive und Elektrotechnik enthalten', () => {
const doc = findDoc('DE_BATTDG')
if (doc) {
expect(doc.industries).toContain('automotive')
expect(doc.industries).toContain('elektrotechnik')
}
})
it('ENISA ICS/SCADA sollte Energie, Maschinenbau, Chemie enthalten', () => {
const doc = findDoc('ENISA_ICS_SCADA')
if (doc) {
expect(doc.industries).toContain('energie')
expect(doc.industries).toContain('maschinenbau')
expect(doc.industries).toContain('chemie')
}
})
})
describe('Nicht zutreffende Regulierungen (Finanz/Medizin/Plattformen)', () => {
const emptyIndustryCodes = ['DORA', 'PSD2', 'MiCA', 'AMLR', 'EHDS', 'DSA', 'DMA', 'MDR']
emptyIndustryCodes.forEach((code) => {
it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
const doc = findDoc(code)
if (doc) {
expect(doc.industries).toHaveLength(0)
}
})
})
})
describe('BSI-TR-03161 (DiGA) sollte nicht zutreffend sein', () => {
['BSI-TR-03161-1', 'BSI-TR-03161-2', 'BSI-TR-03161-3'].forEach((code) => {
it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
const doc = findDoc(code)
if (doc) {
expect(doc.industries).toHaveLength(0)
}
})
})
})
})
describe('rag-documents.json — Applicability Notes', () => {
it('sollte applicability_note bei Dokumenten mit description haben', () => {
const withDescription = ragData.documents.filter((d: any) => d.description)
const withNote = withDescription.filter((d: any) => d.applicability_note)
// More than 90% of the documents with a description should have a note
expect(withNote.length / withDescription.length).toBeGreaterThan(0.9)
})
it('horizontale Regulierungen sollten "alle Branchen" in der Note erwaehnen', () => {
const gdpr = ragData.documents.find((d: any) => d.code === 'GDPR')
if (gdpr?.applicability_note) {
expect(gdpr.applicability_note.toLowerCase()).toContain('alle branchen')
}
})
it('nicht zutreffende sollten "nicht zutreffend" in der Note erwaehnen', () => {
const dora = ragData.documents.find((d: any) => d.code === 'DORA')
if (dora?.applicability_note) {
expect(dora.applicability_note.toLowerCase()).toContain('nicht zutreffend')
}
})
})
describe('rag-documents.json — Dokumenttyp-Verteilung', () => {
it('sollte Dokumente in jedem doc_type haben', () => {
ragData.doc_types.forEach((dt: any) => {
const count = ragData.documents.filter((d: any) => d.doc_type === dt.id).length
expect(count).toBeGreaterThan(0)
})
})
it('sollte mindestens 15 EU-Verordnungen enthalten', () => {
const euRegs = ragData.documents.filter((d: any) => d.doc_type === 'eu_regulation')
expect(euRegs.length).toBeGreaterThanOrEqual(15)
})
it('sollte mindestens 40 EDPB-Leitlinien enthalten', () => {
const edpb = ragData.documents.filter((d: any) => d.doc_type === 'edpb_guideline')
expect(edpb.length).toBeGreaterThanOrEqual(40)
})
})
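
The uniqueness and industry-ID checks above compare counts or assert inside a loop, so a failure does not directly name every offending entry. A possible complement is a pair of pure helpers that return the offenders. This is a sketch only: `RagDocumentLike`, `findDuplicateCodes`, and `findUnknownIndustries` are hypothetical names, and the document shape is reduced to the two fields the checks need.

```typescript
// Reduced document shape; the real entries in rag-documents.json carry more fields.
interface RagDocumentLike {
  code: string
  industries: string[]
}

// Returns each code that occurs more than once, listed once, in first-repeat order.
function findDuplicateCodes(documents: readonly RagDocumentLike[]): string[] {
  const seen = new Set<string>()
  const duplicates = new Set<string>()
  for (const doc of documents) {
    if (seen.has(doc.code)) duplicates.add(doc.code)
    seen.add(doc.code)
  }
  return Array.from(duplicates)
}

// Returns every (code, industry) pair whose industry is neither 'all' nor a known ID,
// matching the rule used by the industry-ID test.
function findUnknownIndustries(
  documents: readonly RagDocumentLike[],
  validIndustryIds: readonly string[],
): Array<{ code: string; industry: string }> {
  const valid = new Set(validIndustryIds)
  const offenders: Array<{ code: string; industry: string }> = []
  for (const doc of documents) {
    for (const industry of doc.industries) {
      if (industry !== 'all' && !valid.has(industry)) {
        offenders.push({ code: doc.code, industry })
      }
    }
  }
  return offenders
}
```

In a test, asserting `expect(findDuplicateCodes(ragData.documents)).toEqual([])` would then print the duplicated codes on failure instead of two differing numbers.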


@@ -0,0 +1,195 @@
'use client'
import React from 'react'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
interface DataTabProps {
hook: UseRAGPageReturn
}
export function DataTab({ hook }: DataTabProps) {
const {
customDocuments,
uploadFile,
setUploadFile,
uploadTitle,
setUploadTitle,
uploadCode,
setUploadCode,
uploading,
handleUpload,
linkUrl,
setLinkUrl,
linkTitle,
setLinkTitle,
linkCode,
setLinkCode,
addingLink,
handleAddLink,
handleDeleteDocument,
fetchCustomDocuments,
} = hook
return (
<div className="space-y-6">
{/* Upload Document */}
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Dokument hochladen (PDF)</h3>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">PDF-Datei</label>
<input
type="file"
accept=".pdf"
onChange={(e) => setUploadFile(e.target.files?.[0] || null)}
className="w-full px-3 py-2 border rounded-lg text-sm"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">Titel</label>
<input
type="text"
value={uploadTitle}
onChange={(e) => setUploadTitle(e.target.value)}
placeholder="z.B. Firmen-Datenschutzrichtlinie"
className="w-full px-3 py-2 border rounded-lg"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">Code (eindeutig)</label>
<input
type="text"
value={uploadCode}
onChange={(e) => setUploadCode(e.target.value.toUpperCase())}
placeholder="z.B. CUSTOM-DSR-01"
className="w-full px-3 py-2 border rounded-lg font-mono"
/>
</div>
</div>
<button
onClick={handleUpload}
disabled={uploading || !uploadFile || !uploadTitle || !uploadCode}
className="mt-4 px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
>
{uploading ? 'Wird hochgeladen...' : 'Hochladen & Indexieren'}
</button>
</div>
{/* Add Link */}
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Link hinzufuegen (Webseite/PDF)</h3>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">URL</label>
<input
type="url"
value={linkUrl}
onChange={(e) => setLinkUrl(e.target.value)}
placeholder="https://example.com/document.pdf"
className="w-full px-3 py-2 border rounded-lg"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">Titel</label>
<input
type="text"
value={linkTitle}
onChange={(e) => setLinkTitle(e.target.value)}
placeholder="z.B. BSI IT-Grundschutz"
className="w-full px-3 py-2 border rounded-lg"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">Code (eindeutig)</label>
<input
type="text"
value={linkCode}
onChange={(e) => setLinkCode(e.target.value.toUpperCase())}
placeholder="z.B. BSI-GRUNDSCHUTZ"
className="w-full px-3 py-2 border rounded-lg font-mono"
/>
</div>
</div>
<button
onClick={handleAddLink}
disabled={addingLink || !linkUrl || !linkTitle || !linkCode}
className="mt-4 px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
>
{addingLink ? 'Wird hinzugefuegt...' : 'Link hinzufuegen & Indexieren'}
</button>
</div>
{/* Custom Documents List */}
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50 flex items-center justify-between">
<h3 className="font-semibold text-slate-900">Eigene Dokumente ({customDocuments.length})</h3>
<button
onClick={fetchCustomDocuments}
className="text-sm text-teal-600 hover:text-teal-700"
>
Aktualisieren
</button>
</div>
{customDocuments.length === 0 ? (
<div className="p-8 text-center text-slate-500">
Noch keine eigenen Dokumente hinzugefuegt.
</div>
) : (
<div className="divide-y">
{customDocuments.map((doc) => (
<div key={doc.id} className="px-4 py-3 flex items-center justify-between">
<div className="flex items-center gap-3">
<span className="w-8 h-8 rounded-lg bg-slate-100 flex items-center justify-center text-lg">
{doc.url ? '🔗' : '📄'}
</span>
<div>
<p className="font-medium text-slate-900">{doc.title}</p>
<p className="text-sm text-slate-500">
<span className="font-mono text-teal-600">{doc.code}</span>
{' • '}
{doc.filename || doc.url}
</p>
</div>
</div>
<div className="flex items-center gap-4">
<span className={`px-2 py-1 rounded text-xs font-medium ${
doc.status === 'indexed' ? 'bg-green-100 text-green-700' :
doc.status === 'error' ? 'bg-red-100 text-red-700' :
doc.status === 'processing' || doc.status === 'fetching' ? 'bg-blue-100 text-blue-700' :
'bg-slate-100 text-slate-700'
}`}>
{doc.status === 'indexed' ? `${doc.chunk_count} Chunks` :
doc.status === 'error' ? 'Fehler' :
doc.status === 'processing' ? 'Verarbeitung...' :
doc.status === 'fetching' ? 'Abruf...' :
doc.status}
</span>
<button
onClick={() => handleDeleteDocument(doc.id)}
className="text-red-500 hover:text-red-700 text-sm"
>
Loeschen
</button>
</div>
</div>
))}
</div>
)}
</div>
{/* Info Box */}
<div className="bg-teal-50 border border-teal-200 rounded-xl p-6">
<h4 className="font-semibold text-teal-800 flex items-center gap-2">
<span></span>
Hinweis zur Verwendung
</h4>
<p className="text-sm text-teal-700 mt-2">
Laden Sie eigene Dokumente (z.B. interne Datenschutzrichtlinien, Vertraege) oder
externe Links hoch. Diese werden automatisch in Chunks aufgeteilt und indexiert.
Nach dem Hinzufuegen koennen Sie im <strong>Pipeline</strong>-Tab die vollstaendige
Compliance-Analyse starten.
</p>
</div>
</div>
)
}
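
The status badge in the document list is derived from two parallel chains of ternaries, one for the CSS classes and one for the label. One way to keep both in step is a single lookup function, sketched below. `statusBadge` and `StatusBadge` are hypothetical names; the class strings and labels are copied from the component above, and any status not listed falls through to the neutral style with the raw status as label, as in the original.

```typescript
interface StatusBadge {
  className: string
  label: string
}

// Maps a document status to the badge classes and label used in DataTab.
function statusBadge(status: string, chunkCount: number): StatusBadge {
  switch (status) {
    case 'indexed':
      return { className: 'bg-green-100 text-green-700', label: `${chunkCount} Chunks` }
    case 'error':
      return { className: 'bg-red-100 text-red-700', label: 'Fehler' }
    case 'processing':
      return { className: 'bg-blue-100 text-blue-700', label: 'Verarbeitung...' }
    case 'fetching':
      return { className: 'bg-blue-100 text-blue-700', label: 'Abruf...' }
    default:
      // Unknown statuses keep the neutral style and show the raw status text.
      return { className: 'bg-slate-100 text-slate-700', label: status }
  }
}
```

The component could then render `badge.className` and `badge.label` from one call, so adding a new status means touching one place.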


@@ -0,0 +1,69 @@
'use client'
import React from 'react'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
interface IngestionTabProps {
hook: UseRAGPageReturn
}
export function IngestionTab({ hook }: IngestionTabProps) {
const { ingestionRunning, ingestionLog, triggerIngestion } = hook
return (
<div className="space-y-6">
{/* Ingestion Control */}
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Legal Corpus Re-Ingestion</h3>
<p className="text-slate-600 mb-4">
Startet die Neuindexierung aller 19 Regulierungen. Die Dokumente werden von EUR-Lex,
gesetze-im-internet.de und BSI heruntergeladen, in semantische Chunks aufgeteilt und
mit BGE-M3 Embeddings in Qdrant indexiert.
</p>
<div className="flex items-center gap-4">
<button
onClick={triggerIngestion}
disabled={ingestionRunning}
className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
>
{ingestionRunning ? 'Laeuft...' : 'Re-Ingestion starten'}
</button>
{ingestionRunning && (
<span className="flex items-center gap-2 text-teal-600">
<svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
Ingestion laeuft...
</span>
)}
</div>
</div>
{/* Ingestion Log */}
{ingestionLog.length > 0 && (
<div className="bg-slate-900 rounded-xl p-4">
<h4 className="text-slate-400 text-sm mb-2">Log</h4>
<div className="font-mono text-sm text-green-400 space-y-1 max-h-64 overflow-y-auto">
{ingestionLog.map((line, i) => (
<div key={i}>{line}</div>
))}
</div>
</div>
)}
{/* Info Box */}
<div className="bg-teal-50 border border-teal-200 rounded-xl p-6">
<h4 className="font-semibold text-teal-800 flex items-center gap-2">
<span>💡</span>
Hinweis zur Datenquelle
</h4>
<p className="text-sm text-teal-700 mt-2">
Alle indexierten Dokumente sind amtliche Werke (§5 UrhG) und damit urheberrechtsfrei.
Sie werden nur fuer RAG/Retrieval verwendet, nicht fuer Modell-Training.
Die Daten werden lokal auf dem Mac Mini verarbeitet und nicht an externe Dienste gesendet.
</p>
</div>
</div>
)
}
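
The ingestion text above says documents are split into semantic chunks before being embedded with BGE-M3 and indexed in Qdrant. The diff does not show how that chunking works, so the following is only a naive stand-in to illustrate the idea of a chunking step: it groups paragraphs up to a character budget and is not the semantic chunker the pipeline uses. `chunkByParagraphs` and `maxChars` are hypothetical names.

```typescript
// Naive paragraph-based chunker (an assumption for illustration, not the pipeline's method).
// Paragraphs are separated by blank lines; consecutive paragraphs are merged
// until adding the next one would exceed maxChars.
function chunkByParagraphs(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)

  const chunks: string[] = []
  let current = ''
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph
    if (candidate.length <= maxChars || current === '') {
      // Either it still fits, or this is the first paragraph of a chunk;
      // an oversized single paragraph is kept whole rather than cut mid-sentence.
      current = candidate
    } else {
      chunks.push(current)
      current = paragraph
    }
  }
  if (current) chunks.push(current)
  return chunks
}
```

Each returned chunk would then be the unit that gets an embedding and becomes one point in the vector store.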


@@ -0,0 +1,373 @@
'use client'
import React from 'react'
import {
REGULATIONS,
DOC_TYPES,
INDUSTRIES_LIST,
INDUSTRIES,
INDUSTRY_REGULATION_MAP,
TYPE_COLORS,
THEMATIC_GROUPS,
KEY_INTERSECTIONS,
RAG_DOCUMENTS,
isInRag,
} from '../rag-data'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
import {
FutureOutlookSection,
RagCoverageSection,
FutureRegulationsSection,
LegalBasisSection,
} from './MapTabSections'
interface MapTabProps {
hook: UseRAGPageReturn
}
export function MapTab({ hook }: MapTabProps) {
const {
expandedRegulation,
setExpandedRegulation,
expandedDocTypes,
setExpandedDocTypes,
expandedMatrixDoc,
setExpandedMatrixDoc,
setActiveTab,
} = hook
return (
<div className="space-y-6">
{/* Industry Filter */}
<IndustryFilter
expandedRegulation={expandedRegulation}
setExpandedRegulation={setExpandedRegulation}
/>
{/* Thematic Groups */}
<ThematicGroupsSection setActiveTab={setActiveTab} setExpandedRegulation={setExpandedRegulation} />
{/* Key Intersections */}
<KeyIntersectionsSection />
{/* Regulation Matrix */}
<RegulationMatrix
expandedDocTypes={expandedDocTypes}
setExpandedDocTypes={setExpandedDocTypes}
expandedMatrixDoc={expandedMatrixDoc}
setExpandedMatrixDoc={setExpandedMatrixDoc}
/>
{/* Future Outlook Section */}
<FutureOutlookSection />
{/* RAG Coverage Overview */}
<RagCoverageSection />
{/* Potential Future Regulations */}
<FutureRegulationsSection />
{/* Legal Basis Info */}
<LegalBasisSection />
</div>
)
}
// --- Sub-components ---
function IndustryFilter({
expandedRegulation,
setExpandedRegulation,
}: {
expandedRegulation: string | null
setExpandedRegulation: (v: string | null) => void
}) {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Regulierungen nach Branche</h3>
<p className="text-sm text-slate-500 mb-4">
Waehlen Sie Ihre Branche, um relevante Regulierungen zu sehen.
</p>
<div className="grid grid-cols-2 md:grid-cols-5 gap-3">
{INDUSTRIES.map((industry) => {
const regs = INDUSTRY_REGULATION_MAP[industry.id] || []
return (
<button
key={industry.id}
onClick={() => setExpandedRegulation(industry.id === expandedRegulation ? null : industry.id)}
className={`p-4 rounded-lg border text-left transition-all ${
expandedRegulation === industry.id
? 'border-teal-500 bg-teal-50 ring-2 ring-teal-200'
: 'border-slate-200 hover:border-slate-300 hover:bg-slate-50'
}`}
>
<div className="text-2xl mb-2">{industry.icon}</div>
<div className="font-medium text-slate-900 text-sm">{industry.name}</div>
<div className="text-xs text-slate-500 mt-1">{regs.length} Regulierungen</div>
</button>
)
})}
</div>
{/* Selected Industry Details */}
{expandedRegulation && INDUSTRIES.find(i => i.id === expandedRegulation) && (
<div className="mt-6 p-4 bg-slate-50 rounded-lg">
{(() => {
const industry = INDUSTRIES.find(i => i.id === expandedRegulation)!
const regCodes = INDUSTRY_REGULATION_MAP[industry.id] || []
const regs = REGULATIONS.filter(r => regCodes.includes(r.code))
return (
<>
<div className="flex items-center gap-3 mb-4">
<span className="text-3xl">{industry.icon}</span>
<div>
<h4 className="font-semibold text-slate-900">{industry.name}</h4>
<p className="text-sm text-slate-500">{industry.description}</p>
</div>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-3">
{regs.map((reg) => {
const regInRag = isInRag(reg.code)
return (
<div
key={reg.code}
className={`bg-white p-3 rounded-lg border ${regInRag ? 'border-green-200' : 'border-slate-200'}`}
>
<div className="flex items-center gap-2 mb-1">
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[reg.type]}`}>
{reg.code}
</span>
{regInRag ? (
<span className="px-1.5 py-0.5 text-[10px] font-bold bg-green-100 text-green-600 rounded">RAG</span>
) : (
<span className="px-1.5 py-0.5 text-[10px] font-bold bg-red-50 text-red-400 rounded"></span>
)}
</div>
<div className="font-medium text-sm text-slate-900">{reg.name}</div>
<div className="text-xs text-slate-500 mt-1 line-clamp-2">{reg.description}</div>
</div>
)
})}
</div>
</>
)
})()}
</div>
)}
</div>
)
}
function ThematicGroupsSection({
setActiveTab,
setExpandedRegulation,
}: {
setActiveTab: (v: any) => void
setExpandedRegulation: (v: string | null) => void
}) {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Thematische Cluster</h3>
<p className="text-sm text-slate-500 mb-4">
Regulierungen gruppiert nach Themenbereichen - zeigt Ueberschneidungen.
</p>
<div className="space-y-4">
{THEMATIC_GROUPS.map((group) => (
<div key={group.id} className="border border-slate-200 rounded-lg overflow-hidden">
<div className={`${group.color} px-4 py-2 text-white font-medium flex items-center justify-between`}>
<span>{group.name}</span>
<span className="text-sm opacity-80">{group.regulations.length} Regulierungen</span>
</div>
<div className="p-4">
<p className="text-sm text-slate-600 mb-3">{group.description}</p>
<div className="flex flex-wrap gap-2">
{group.regulations.map((code) => {
const reg = REGULATIONS.find(r => r.code === code)
const codeInRag = isInRag(code)
return (
<span
key={code}
className={`px-3 py-1.5 rounded-full text-sm font-medium cursor-pointer ${
codeInRag
? 'bg-green-100 text-green-700 hover:bg-green-200'
: 'bg-slate-100 text-slate-700 hover:bg-slate-200'
}`}
onClick={() => {
setActiveTab('regulations')
setExpandedRegulation(code)
}}
title={`${reg?.fullName || code}${codeInRag ? ' (im RAG)' : ' (nicht im RAG)'}`}
>
{codeInRag ? '✓ ' : '✗ '}{code}
</span>
)
})}
</div>
</div>
</div>
))}
</div>
</div>
)
}
function KeyIntersectionsSection() {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Wichtige Schnittstellen</h3>
<p className="text-sm text-slate-500 mb-4">
Bereiche, in denen sich mehrere Regulierungen ueberschneiden und zusammenwirken.
</p>
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
{KEY_INTERSECTIONS.map((intersection, idx) => (
<div key={idx} className="bg-gradient-to-br from-slate-50 to-slate-100 rounded-lg p-4 border border-slate-200">
<div className="flex flex-wrap gap-1 mb-2">
{intersection.regulations.map((code) => (
<span
key={code}
className={`px-2 py-0.5 text-xs font-medium rounded ${
isInRag(code)
? 'bg-green-100 text-green-700'
: 'bg-red-50 text-red-500'
}`}
>
{isInRag(code) ? '✓ ' : '✗ '}{code}
</span>
))}
</div>
<div className="font-medium text-slate-900 text-sm mb-1">{intersection.topic}</div>
<div className="text-xs text-slate-500">{intersection.description}</div>
</div>
))}
</div>
</div>
)
}
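// Industry x document matrix grouped by document type. Group rows toggle via `expandedDocTypes`;
// a document row toggles its detail row via `expandedMatrixDoc`.
function RegulationMatrix({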
expandedDocTypes,
setExpandedDocTypes,
expandedMatrixDoc,
setExpandedMatrixDoc,
}: {
expandedDocTypes: string[]
setExpandedDocTypes: (fn: (prev: string[]) => string[]) => void
expandedMatrixDoc: string | null
setExpandedMatrixDoc: (v: string | null) => void
}) {
return (
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50">
<h3 className="font-semibold text-slate-900">Branchen-Regulierungs-Matrix</h3>
<p className="text-sm text-slate-500">{RAG_DOCUMENTS.length} Dokumente in {DOC_TYPES.length} Kategorien</p>
</div>
<div className="overflow-x-auto">
<table className="w-full text-xs">
<thead className="bg-slate-50 border-b sticky top-0 z-10">
<tr>
<th className="px-2 py-2 text-left font-medium text-slate-500 sticky left-0 bg-slate-50 min-w-[200px]">Regulierung</th>
{INDUSTRIES_LIST.filter((i: any) => i.id !== 'all').map((industry: any) => (
<th key={industry.id} className="px-2 py-2 text-center font-medium text-slate-500 min-w-[60px]">
<div className="flex flex-col items-center">
<span className="text-lg">{industry.icon}</span>
<span className="text-[10px] leading-tight">{industry.name.split('/')[0]}</span>
</div>
</th>
))}
</tr>
</thead>
<tbody>
{DOC_TYPES.map((docType: any) => {
const docsInType = RAG_DOCUMENTS.filter((d: any) => d.doc_type === docType.id)
if (docsInType.length === 0) return null
const isExpanded = expandedDocTypes.includes(docType.id)
return (
<React.Fragment key={docType.id}>
<tr
className="bg-slate-100 border-t-2 border-slate-300 cursor-pointer hover:bg-slate-200"
onClick={() => {
setExpandedDocTypes(prev =>
prev.includes(docType.id)
? prev.filter((id: string) => id !== docType.id)
: [...prev, docType.id]
)
}}
>
<td colSpan={INDUSTRIES_LIST.length} className="px-3 py-2 font-bold text-slate-700">
<span className="mr-2">{isExpanded ? '\u25BC' : '\u25B6'}</span>
{docType.icon} {docType.label} ({docsInType.length})
</td>
</tr>
{isExpanded && docsInType.map((doc: any) => (
<React.Fragment key={doc.code}>
<tr
className={`hover:bg-slate-50 border-b border-slate-100 cursor-pointer ${expandedMatrixDoc === doc.code ? 'bg-teal-50' : ''}`}
onClick={() => setExpandedMatrixDoc(expandedMatrixDoc === doc.code ? null : doc.code)}
>
<td className="px-2 py-1.5 font-medium sticky left-0 bg-white">
<span className="flex items-center gap-1">
{isInRag(doc.code) ? (
<span className="text-green-500 text-[10px]">✓</span>
) : (
<span className="text-red-300 text-[10px]">✗</span>
)}
<span className="text-teal-600 truncate max-w-[180px]" title={doc.full_name || doc.name}>
{doc.name}
</span>
{(doc.applicability_note || doc.description) && (
<span className="text-slate-400 text-[10px] ml-1">{expandedMatrixDoc === doc.code ? '▼' : 'ⓘ'}</span>
)}
</span>
</td>
{INDUSTRIES_LIST.filter((i: any) => i.id !== 'all').map((industry: any) => {
const applies = doc.industries.includes(industry.id) || doc.industries.includes('all')
return (
<td key={industry.id} className="px-2 py-1.5 text-center">
{applies ? (
<span className="inline-flex items-center justify-center w-5 h-5 bg-teal-100 text-teal-600 rounded-full"></span>
) : (
<span className="inline-flex items-center justify-center w-5 h-5 text-slate-300"></span>
)}
</td>
)
})}
</tr>
{expandedMatrixDoc === doc.code && (doc.applicability_note || doc.description) && (
<tr className="bg-teal-50 border-b border-teal-200">
<td colSpan={INDUSTRIES_LIST.length} className="px-4 py-3">
<div className="text-xs space-y-1.5">
{doc.full_name && (
<p className="font-semibold text-slate-700">{doc.full_name}</p>
)}
{doc.applicability_note && (
<p className="text-teal-700 bg-teal-100 px-2 py-1 rounded inline-block">
<span className="font-medium">Branchenrelevanz:</span> {doc.applicability_note}
</p>
)}
{doc.description && (
<p className="text-slate-600">{doc.description}</p>
)}
{doc.effective_date && (
<p className="text-slate-400">In Kraft: {doc.effective_date}</p>
)}
</div>
</td>
</tr>
)}
</React.Fragment>
))}
</React.Fragment>
)
})}
</tbody>
</table>
</div>
</div>
)
}
// FutureOutlookSection, RagCoverageSection, FutureRegulationsSection,
// LegalBasisSection are imported from ./MapTabSections.tsx


@@ -0,0 +1,199 @@
'use client'
import React from 'react'
import { REGULATIONS_IN_RAG } from '../rag-constants'
import {
RAG_DOCUMENTS,
FUTURE_OUTLOOK,
ADDITIONAL_REGULATIONS,
LEGAL_BASIS_INFO,
isInRag,
} from '../rag-data'
export function FutureOutlookSection() {
return (
<div className="bg-gradient-to-r from-indigo-50 to-purple-50 rounded-xl border border-indigo-200 p-6">
<div className="flex items-center gap-3 mb-4">
<span className="text-2xl">🔮</span>
<div>
<h3 className="font-semibold text-slate-900">Zukunftsaussicht</h3>
<p className="text-sm text-slate-500">Geplante Aenderungen und neue Regulierungen</p>
</div>
</div>
<div className="space-y-4">
{FUTURE_OUTLOOK.map((item) => (
<div key={item.id} className="bg-white rounded-lg border border-slate-200 overflow-hidden">
<div className="px-4 py-3 flex items-center justify-between bg-slate-50 border-b">
<div className="flex items-center gap-3">
<span className={`px-2 py-1 text-xs font-medium rounded ${
item.status === 'proposed' ? 'bg-yellow-100 text-yellow-700' :
item.status === 'agreed' ? 'bg-green-100 text-green-700' :
item.status === 'withdrawn' ? 'bg-red-100 text-red-700' :
'bg-blue-100 text-blue-700'
}`}>
{item.statusLabel}
</span>
<h4 className="font-semibold text-slate-900">{item.name}</h4>
</div>
<span className="text-sm text-slate-500">Erwartet: {item.expectedDate}</span>
</div>
<div className="p-4">
<p className="text-sm text-slate-600 mb-3">{item.description}</p>
<div className="mb-3">
<p className="text-xs font-medium text-slate-500 uppercase mb-2">Wichtige Aenderungen:</p>
<ul className="text-sm text-slate-600 space-y-1">
{item.keyChanges.slice(0, 4).map((change, idx) => (
<li key={idx} className="flex items-start gap-2">
<span className="text-teal-500 mt-1"></span>
<span>{change}</span>
</li>
))}
{item.keyChanges.length > 4 && (
<li className="text-slate-400 text-xs">+ {item.keyChanges.length - 4} weitere...</li>
)}
</ul>
</div>
<div className="flex items-center justify-between">
<div className="flex flex-wrap gap-1">
{item.affectedRegulations.map((code) => (
<span key={code} className="px-2 py-0.5 text-xs bg-slate-100 text-slate-600 rounded">
{code}
</span>
))}
</div>
<a
href={item.source}
target="_blank"
rel="noopener noreferrer"
className="text-xs text-teal-600 hover:underline"
>
Quelle
</a>
</div>
</div>
</div>
))}
</div>
</div>
)
}
export function RagCoverageSection() {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<div className="flex items-center gap-3 mb-4">
<span className="text-2xl"></span>
<div>
<h3 className="font-semibold text-slate-900">RAG-Abdeckung ({Object.keys(REGULATIONS_IN_RAG).length} von {RAG_DOCUMENTS.length} Regulierungen)</h3>
<p className="text-sm text-slate-500">Stand: Maerz 2026 &middot; Alle im RAG-System verfuegbaren Regulierungen (inkl. Verbraucherschutz Phase H)</p>
</div>
</div>
<div className="flex flex-wrap gap-2">
{RAG_DOCUMENTS.filter((r: any) => isInRag(r.code)).map((reg: any) => (
<span key={reg.code} className="px-2.5 py-1 text-xs font-medium bg-green-100 text-green-700 rounded-full border border-green-200">
{reg.code}
</span>
))}
</div>
<div className="mt-4 pt-4 border-t border-slate-100">
<p className="text-xs font-medium text-slate-500 mb-2">Noch nicht im RAG:</p>
<div className="flex flex-wrap gap-2">
{RAG_DOCUMENTS.filter((r: any) => !isInRag(r.code)).map((reg: any) => (
<span key={reg.code} className="px-2.5 py-1 text-xs font-medium bg-red-50 text-red-400 rounded-full border border-red-100">
{reg.code}
</span>
))}
</div>
</div>
</div>
)
}
export function FutureRegulationsSection() {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<div className="flex items-center gap-3 mb-4">
<span className="text-2xl">🔮</span>
<div>
<h3 className="font-semibold text-slate-900">Zukuenftige Regulierungen</h3>
<p className="text-sm text-slate-500">Noch nicht verabschiedet oder zur Erweiterung vorgesehen</p>
</div>
</div>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
{ADDITIONAL_REGULATIONS.map((reg) => (
<div key={reg.code} className={`rounded-lg border p-4 ${
reg.status === 'active' ? 'border-green-200 bg-green-50' : 'border-yellow-200 bg-yellow-50'
}`}>
<div className="flex items-center justify-between mb-2">
<div className="flex items-center gap-2">
<span className={`px-2 py-0.5 text-xs font-bold rounded ${
reg.type === 'eu_regulation' ? 'bg-blue-100 text-blue-700' : 'bg-purple-100 text-purple-700'
}`}>
{reg.code}
</span>
<span className={`px-2 py-0.5 text-xs rounded ${
reg.status === 'active' ? 'bg-green-100 text-green-700' : 'bg-yellow-100 text-yellow-700'
}`}>
{reg.status === 'active' ? 'In Kraft' : 'Vorgeschlagen'}
</span>
</div>
<span className={`px-2 py-0.5 text-xs rounded ${
reg.priority === 'high' ? 'bg-red-100 text-red-700' : 'bg-slate-100 text-slate-600'
}`}>
{reg.priority === 'high' ? 'Hohe Prioritaet' : 'Mittel'}
</span>
</div>
<h4 className="font-medium text-slate-900 text-sm mb-1">{reg.name}</h4>
<p className="text-xs text-slate-600 mb-2">{reg.description}</p>
<div className="flex items-center justify-between text-xs">
<span className="text-slate-500">Ab: {reg.effectiveDate}</span>
{reg.celex && (
<a
href={`https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:${reg.celex}`}
target="_blank"
rel="noopener noreferrer"
className="text-teal-600 hover:underline"
>
EUR-Lex
</a>
)}
</div>
</div>
))}
</div>
</div>
)
}
export function LegalBasisSection() {
return (
<div className="bg-emerald-50 rounded-xl border border-emerald-200 p-6">
<div className="flex items-center gap-3 mb-4">
<span className="text-2xl"></span>
<div>
<h3 className="font-semibold text-slate-900">{LEGAL_BASIS_INFO.title}</h3>
<p className="text-sm text-emerald-700">{LEGAL_BASIS_INFO.summary}</p>
</div>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
{LEGAL_BASIS_INFO.details.map((detail, idx) => (
<div key={idx} className="bg-white rounded-lg border border-emerald-100 p-3">
<div className="flex items-center gap-2 mb-1">
<span className={`px-2 py-0.5 text-xs font-medium rounded ${
detail.status === 'Erlaubt' ? 'bg-green-100 text-green-700' : 'bg-yellow-100 text-yellow-700'
}`}>
{detail.status}
</span>
<span className="font-medium text-sm text-slate-900">{detail.aspect}</span>
</div>
<p className="text-xs text-slate-600">{detail.explanation}</p>
</div>
))}
</div>
</div>
)
}


@@ -0,0 +1,113 @@
'use client'
import React from 'react'
import { REGULATIONS_IN_RAG } from '../rag-constants'
import {
REGULATIONS,
COLLECTION_TOTALS,
TYPE_LABELS,
TYPE_COLORS,
isInRag,
getKnownChunks,
} from '../rag-data'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
interface OverviewTabProps {
hook: UseRAGPageReturn
}
export function OverviewTab({ hook }: OverviewTabProps) {
const {
dsfaLoading,
dsfaStatus,
dsfaSources,
setRegulationCategory,
setActiveTab,
} = hook
return (
<div className="space-y-6">
{/* RAG Categories Overview */}
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">RAG-Kategorien</h3>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<button
onClick={() => { setRegulationCategory('regulations'); setActiveTab('regulations') }}
className="p-4 rounded-lg border border-blue-200 bg-blue-50 hover:bg-blue-100 transition-colors text-left"
>
<p className="text-xs font-medium text-blue-600 uppercase">Gesetze & Regulierungen</p>
<p className="text-2xl font-bold text-slate-900 mt-1">{COLLECTION_TOTALS.total_legal.toLocaleString()}</p>
<p className="text-xs text-slate-500 mt-1">{Object.keys(REGULATIONS_IN_RAG).length}/{REGULATIONS.length} im RAG</p>
</button>
<button
onClick={() => { setRegulationCategory('dsfa'); setActiveTab('regulations') }}
className="p-4 rounded-lg border border-purple-200 bg-purple-50 hover:bg-purple-100 transition-colors text-left"
>
<p className="text-xs font-medium text-purple-600 uppercase">DSFA Corpus</p>
<p className="text-2xl font-bold text-slate-900 mt-1">{dsfaLoading ? '-' : (dsfaStatus?.total_chunks || 0).toLocaleString()}</p>
<p className="text-xs text-slate-500 mt-1">{dsfaSources.length || '~70'} Quellen (WP248, DSK, Gesetze)</p>
</button>
<div className="p-4 rounded-lg border border-emerald-200 bg-emerald-50 text-left">
<p className="text-xs font-medium text-emerald-600 uppercase">NiBiS EH</p>
<p className="text-2xl font-bold text-slate-900 mt-1">7.996</p>
<p className="text-xs text-slate-500 mt-1">Chunks &middot; Bildungs-Erwartungshorizonte</p>
</div>
<div className="p-4 rounded-lg border border-orange-200 bg-orange-50 text-left">
<p className="text-xs font-medium text-orange-600 uppercase">Legal Templates</p>
<p className="text-2xl font-bold text-slate-900 mt-1">7.689</p>
<p className="text-xs text-slate-500 mt-1">Chunks &middot; Dokumentvorlagen (VVT, TOM, DSFA)</p>
</div>
</div>
</div>
{/* Quick Stats per Type */}
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
{Object.entries(TYPE_LABELS).map(([type, label]) => {
const regs = REGULATIONS.filter((r) => r.type === type)
const inRagCount = regs.filter((r) => isInRag(r.code)).length
const totalChunks = regs.reduce((sum, r) => sum + getKnownChunks(r.code), 0)
return (
<div key={type} className="bg-white rounded-xl p-4 border border-slate-200">
<div className="flex items-center gap-2 mb-2">
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[type]}`}>{label}</span>
<span className="text-slate-500 text-sm">{inRagCount}/{regs.length} im RAG</span>
</div>
<p className="text-xl font-bold text-slate-900">{totalChunks.toLocaleString()} Chunks</p>
</div>
)
})}
</div>
{/* Top Regulations */}
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50">
<h3 className="font-semibold text-slate-900">Top Regulierungen (nach Chunks)</h3>
</div>
<div className="divide-y">
{[...REGULATIONS].sort((a, b) => getKnownChunks(b.code) - getKnownChunks(a.code))
.slice(0, 10)
.map((reg) => {
const chunks = getKnownChunks(reg.code)
return (
<div key={reg.code} className="px-4 py-3 flex items-center justify-between">
<div className="flex items-center gap-3">
{isInRag(reg.code) ? (
<span className="text-green-500 text-sm">✓</span>
) : (
<span className="text-red-400 text-sm">✗</span>
)}
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[reg.type]}`}>
{TYPE_LABELS[reg.type]}
</span>
<span className="font-medium text-slate-900">{reg.name}</span>
<span className="text-slate-500 text-sm">({reg.code})</span>
</div>
<span className={`font-bold ${chunks > 0 ? 'text-teal-600' : 'text-slate-300'}`}>{chunks > 0 ? chunks.toLocaleString() + ' Chunks' : '—'}</span>
</div>
)
})}
</div>
</div>
</div>
)
}


@@ -0,0 +1,410 @@
'use client'
import React from 'react'
import type { PipelineCheckpoint } from '../types'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
interface PipelineTabProps {
hook: UseRAGPageReturn
}
export function PipelineTab({ hook }: PipelineTabProps) {
const {
pipelineState,
pipelineLoading,
pipelineStarting,
autoRefresh,
setAutoRefresh,
elapsedTime,
fetchPipeline,
handleStartPipeline,
collectionStatus,
} = hook
return (
<div className="space-y-6">
{/* Pipeline Header */}
<div className="flex items-center justify-between flex-wrap gap-4">
<div className="flex items-center gap-4">
<h3 className="text-lg font-semibold text-slate-900">Compliance Pipeline Status</h3>
{pipelineState?.status === 'running' && elapsedTime && (
<div className="flex items-center gap-2 px-3 py-1.5 bg-blue-50 border border-blue-200 rounded-full">
<div className="w-2 h-2 bg-blue-500 rounded-full animate-pulse" />
<span className="text-sm font-medium text-blue-700">Laufzeit: {elapsedTime}</span>
</div>
)}
</div>
<div className="flex items-center gap-3">
<label className="flex items-center gap-2 text-sm text-slate-600 cursor-pointer">
<input
type="checkbox"
checked={autoRefresh}
onChange={(e) => setAutoRefresh(e.target.checked)}
className="w-4 h-4 text-teal-600 rounded border-slate-300 focus:ring-teal-500"
/>
Auto-Refresh
</label>
{(!pipelineState || pipelineState.status !== 'running') && (
<button
onClick={() => handleStartPipeline(false)}
disabled={pipelineStarting}
className="flex items-center gap-2 px-4 py-2 text-sm bg-green-600 text-white rounded-lg hover:bg-green-700 disabled:opacity-50"
>
{pipelineStarting ? (
<SpinnerIcon />
) : (
<PlayIcon />
)}
Pipeline starten
</button>
)}
<button
onClick={fetchPipeline}
disabled={pipelineLoading}
className="flex items-center gap-2 px-4 py-2 text-sm bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
>
{pipelineLoading ? <SpinnerIcon /> : <RefreshIcon />}
Aktualisieren
</button>
</div>
</div>
{/* No Data */}
{(!pipelineState || pipelineState.status === 'no_data') && !pipelineLoading && (
<NoDataCard pipelineStarting={pipelineStarting} handleStartPipeline={handleStartPipeline} />
)}
{/* Pipeline Status */}
{pipelineState && pipelineState.status !== 'no_data' && (
<>
{/* Status Card */}
<PipelineStatusCard pipelineState={pipelineState} />
{/* Current Progress */}
{pipelineState.status === 'running' && pipelineState.current_phase && (
<CurrentProgressCard pipelineState={pipelineState} collectionStatus={collectionStatus} />
)}
{/* Validation Summary */}
{pipelineState.validation_summary && (
<ValidationSummary summary={pipelineState.validation_summary} />
)}
{/* Checkpoints */}
<CheckpointsList checkpoints={pipelineState.checkpoints} />
{/* Summary */}
{Object.keys(pipelineState.summary || {}).length > 0 && (
<PipelineSummary summary={pipelineState.summary} />
)}
</>
)}
</div>
)
}
// --- Icons ---
function SpinnerIcon() {
return (
<svg className="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
)
}
function PlayIcon() {
return (
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M14.752 11.168l-3.197-2.132A1 1 0 0010 9.87v4.263a1 1 0 001.555.832l3.197-2.132a1 1 0 000-1.664z" />
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
)
}
function RefreshIcon() {
return (
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" />
</svg>
)
}
// --- Sub-components ---
function NoDataCard({
pipelineStarting,
handleStartPipeline,
}: {
pipelineStarting: boolean
handleStartPipeline: (skip: boolean) => void
}) {
return (
<div className="bg-white rounded-xl border border-slate-200 p-8 text-center">
<div className="w-16 h-16 mx-auto mb-4 rounded-full bg-slate-100 flex items-center justify-center">
<svg className="w-8 h-8 text-slate-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
</svg>
</div>
<h4 className="text-lg font-semibold text-slate-900 mb-2">Keine Pipeline-Daten</h4>
<p className="text-slate-600 mb-4">
Es wurde noch keine Pipeline ausgefuehrt. Starten Sie die Compliance-Pipeline um Checkpoint-Daten zu sehen.
</p>
<button
onClick={() => handleStartPipeline(false)}
disabled={pipelineStarting}
className="inline-flex items-center gap-2 px-6 py-3 bg-green-600 text-white rounded-lg hover:bg-green-700 disabled:opacity-50"
>
{pipelineStarting ? (
<>
<svg className="animate-spin h-5 w-5" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
Startet...
</>
) : (
<>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M14.752 11.168l-3.197-2.132A1 1 0 0010 9.87v4.263a1 1 0 001.555.832l3.197-2.132a1 1 0 000-1.664z" />
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
Pipeline jetzt starten
</>
)}
</button>
</div>
)
}
function PipelineStatusCard({ pipelineState }: { pipelineState: any }) {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<div className="flex items-center justify-between">
<div className="flex items-center gap-4">
<div className={`w-12 h-12 rounded-xl flex items-center justify-center ${
pipelineState.status === 'completed' ? 'bg-green-100' :
pipelineState.status === 'running' ? 'bg-blue-100' :
pipelineState.status === 'failed' ? 'bg-red-100' : 'bg-slate-100'
}`}>
{pipelineState.status === 'completed' && (
<svg className="w-6 h-6 text-green-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
</svg>
)}
{pipelineState.status === 'running' && (
<svg className="w-6 h-6 text-blue-600 animate-spin" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
)}
{pipelineState.status === 'failed' && (
<svg className="w-6 h-6 text-red-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
</svg>
)}
</div>
<div>
<h4 className="font-semibold text-slate-900">Pipeline {pipelineState.pipeline_id}</h4>
<p className="text-sm text-slate-500">
Gestartet: {pipelineState.started_at ? new Date(pipelineState.started_at).toLocaleString('de-DE') : '-'}
{pipelineState.completed_at && ` | Beendet: ${new Date(pipelineState.completed_at).toLocaleString('de-DE')}`}
</p>
</div>
</div>
<span className={`px-3 py-1 rounded-full text-sm font-medium ${
pipelineState.status === 'completed' ? 'bg-green-100 text-green-700' :
pipelineState.status === 'running' ? 'bg-blue-100 text-blue-700' :
pipelineState.status === 'failed' ? 'bg-red-100 text-red-700' : 'bg-slate-100 text-slate-700'
}`}>
{pipelineState.status === 'completed' ? 'Abgeschlossen' :
pipelineState.status === 'running' ? 'Laeuft' :
pipelineState.status === 'failed' ? 'Fehlgeschlagen' : pipelineState.status}
</span>
</div>
</div>
)
}
function CurrentProgressCard({ pipelineState, collectionStatus }: { pipelineState: any; collectionStatus: any }) {
return (
<div className="bg-gradient-to-r from-blue-50 to-indigo-50 rounded-xl border border-blue-200 p-6">
<div className="flex items-center justify-between mb-4">
<h4 className="font-semibold text-blue-900 flex items-center gap-2">
<svg className="w-5 h-5 animate-pulse" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 10V3L4 14h7v7l9-11h-7z" />
</svg>
Aktuelle Verarbeitung
</h4>
<span className="text-sm text-blue-600">Phase: {pipelineState.current_phase}</span>
</div>
{/* Phase Progress Indicator */}
<div className="flex items-center gap-2 mb-4">
{['ingestion', 'extraction', 'controls', 'measures'].map((phase, idx) => (
<div key={phase} className="flex-1 flex items-center">
<div className={`flex-1 h-2 rounded-full ${
pipelineState.current_phase === phase ? 'bg-blue-500 animate-pulse' :
pipelineState.checkpoints?.some((c: PipelineCheckpoint) => c.phase === phase && c.status === 'completed') ? 'bg-green-500' :
'bg-slate-200'
}`} />
{idx < 3 && <div className="w-2" />}
</div>
))}
</div>
<div className="flex justify-between text-xs text-slate-500 mb-4">
<span>Ingestion</span>
<span>Extraktion</span>
<span>Controls</span>
<span>Massnahmen</span>
</div>
{/* Current checkpoint details */}
{pipelineState.checkpoints?.filter((c: PipelineCheckpoint) => c.status === 'running').map((checkpoint: PipelineCheckpoint, idx: number) => (
<div key={idx} className="bg-white/60 rounded-lg p-4 mt-2">
<div className="flex items-center justify-between">
<div className="flex items-center gap-3">
<div className="w-3 h-3 bg-blue-500 rounded-full animate-pulse" />
<span className="font-medium text-slate-900">{checkpoint.name}</span>
</div>
{checkpoint.metrics && Object.keys(checkpoint.metrics).length > 0 && (
<div className="flex gap-2">
{Object.entries(checkpoint.metrics).slice(0, 3).map(([key, value]) => (
<span key={key} className="px-2 py-1 bg-blue-100 text-blue-700 rounded text-xs">
{key.replace(/_/g, ' ')}: {typeof value === 'number' ? value.toLocaleString() : String(value)}
</span>
))}
</div>
)}
</div>
</div>
))}
{/* Live chunk count */}
<div className="mt-4 flex items-center justify-between text-sm">
<span className="text-slate-600">Chunks in Qdrant:</span>
<span className="font-bold text-blue-700">{collectionStatus?.totalPoints?.toLocaleString() || '-'}</span>
</div>
</div>
)
}
function ValidationSummary({ summary }: { summary: { passed: number; warning: number; failed: number; total: number } }) {
return (
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<div className="bg-white rounded-xl border border-green-200 p-4">
<p className="text-sm text-slate-500">Bestanden</p>
<p className="text-2xl font-bold text-green-600">{summary.passed}</p>
</div>
<div className="bg-white rounded-xl border border-yellow-200 p-4">
<p className="text-sm text-slate-500">Warnungen</p>
<p className="text-2xl font-bold text-yellow-600">{summary.warning}</p>
</div>
<div className="bg-white rounded-xl border border-red-200 p-4">
<p className="text-sm text-slate-500">Fehlgeschlagen</p>
<p className="text-2xl font-bold text-red-600">{summary.failed}</p>
</div>
<div className="bg-white rounded-xl border border-slate-200 p-4">
<p className="text-sm text-slate-500">Gesamt</p>
<p className="text-2xl font-bold text-slate-700">{summary.total}</p>
</div>
</div>
)
}
function CheckpointsList({ checkpoints }: { checkpoints?: PipelineCheckpoint[] }) {
return (
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50">
<h3 className="font-semibold text-slate-900">Checkpoints ({checkpoints?.length || 0})</h3>
</div>
<div className="divide-y">
{checkpoints?.map((checkpoint, idx) => (
<div key={idx} className="p-4">
<div className="flex items-center justify-between mb-2">
<div className="flex items-center gap-3">
<span className={`w-3 h-3 rounded-full ${
checkpoint.phase === 'ingestion' ? 'bg-blue-500' :
checkpoint.phase === 'extraction' ? 'bg-purple-500' :
checkpoint.phase === 'controls' ? 'bg-green-500' : 'bg-orange-500'
}`} />
<span className="font-medium text-slate-900">{checkpoint.name}</span>
<span className="text-sm text-slate-500">
({checkpoint.phase}) |
{checkpoint.duration_seconds ? ` ${checkpoint.duration_seconds.toFixed(1)}s` : ' -'}
</span>
</div>
<span className={`px-2 py-0.5 rounded text-xs font-medium ${
checkpoint.status === 'completed' ? 'bg-green-100 text-green-700' :
checkpoint.status === 'running' ? 'bg-blue-100 text-blue-700' :
checkpoint.status === 'failed' ? 'bg-red-100 text-red-700' : 'bg-slate-100 text-slate-700'
}`}>
{checkpoint.status}
</span>
</div>
{/* Metrics */}
{Object.keys(checkpoint.metrics || {}).length > 0 && (
<div className="flex flex-wrap gap-2 mt-2">
{Object.entries(checkpoint.metrics).map(([key, value]) => (
<span key={key} className="px-2 py-1 bg-slate-100 rounded text-xs text-slate-600">
{key.replace(/_/g, ' ')}: <strong>{typeof value === 'number' ? value.toLocaleString() : String(value)}</strong>
</span>
))}
</div>
)}
{/* Validations */}
{checkpoint.validations?.length > 0 && (
<div className="mt-3 space-y-1">
{checkpoint.validations.map((v, vIdx) => (
<div key={vIdx} className="flex items-center gap-2 text-sm">
<span className={`w-4 h-4 flex items-center justify-center ${
v.status === 'passed' ? 'text-green-500' :
v.status === 'warning' ? 'text-yellow-500' : 'text-red-500'
}`}>
{v.status === 'passed' ? '✓' : v.status === 'warning' ? '⚠' : '✗'}
</span>
<span className="text-slate-700">{v.name}:</span>
<span className="text-slate-500">{v.message}</span>
</div>
))}
</div>
)}
{/* Error */}
{checkpoint.error && (
<div className="mt-2 p-2 bg-red-50 border border-red-200 rounded text-sm text-red-700">
{checkpoint.error}
</div>
)}
</div>
))}
{(!checkpoints || checkpoints.length === 0) && (
<div className="p-4 text-center text-slate-500">
Noch keine Checkpoints vorhanden.
</div>
)}
</div>
</div>
)
}
function PipelineSummary({ summary }: { summary: Record<string, unknown> }) {
return (
<div className="bg-white rounded-xl border border-slate-200 p-4">
<h4 className="font-semibold text-slate-900 mb-3">Zusammenfassung</h4>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
{Object.entries(summary).map(([key, value]) => (
<div key={key}>
<p className="text-sm text-slate-500">{key.replace(/_/g, ' ')}</p>
<p className="font-bold text-slate-900">
{typeof value === 'number' ? value.toLocaleString() : String(value)}
</p>
</div>
))}
</div>
</div>
)
}

@@ -0,0 +1,451 @@
'use client'
import React from 'react'
import {
REGULATIONS,
TYPE_COLORS,
TYPE_LABELS,
isInRag,
getKnownChunks,
} from '../rag-data'
import {
REGULATION_SOURCES,
REGULATION_LICENSES,
LICENSE_LABELS,
} from '../rag-sources'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
interface RegulationsTabProps {
hook: UseRAGPageReturn
}
export function RegulationsTab({ hook }: RegulationsTabProps) {
const {
regulationCategory,
setRegulationCategory,
expandedRegulation,
setExpandedRegulation,
fetchStatus,
dsfaSources,
dsfaLoading,
expandedDsfaSource,
setExpandedDsfaSource,
fetchDsfaStatus,
setActiveTab,
} = hook
return (
<div className="space-y-4">
{/* Category Filter */}
<div className="flex items-center gap-2 flex-wrap">
<button
onClick={() => setRegulationCategory('regulations')}
className={`px-3 py-1.5 text-sm font-medium rounded-lg transition-colors ${
regulationCategory === 'regulations'
? 'bg-blue-100 text-blue-700 ring-2 ring-blue-300'
: 'bg-white text-slate-600 border border-slate-200 hover:bg-slate-50'
}`}
>
Gesetze & Regulierungen ({REGULATIONS.length})
</button>
<button
onClick={() => setRegulationCategory('dsfa')}
className={`px-3 py-1.5 text-sm font-medium rounded-lg transition-colors ${
regulationCategory === 'dsfa'
? 'bg-purple-100 text-purple-700 ring-2 ring-purple-300'
: 'bg-white text-slate-600 border border-slate-200 hover:bg-slate-50'
}`}
>
DSFA Quellen ({dsfaSources.length || '~70'})
</button>
<button
onClick={() => setRegulationCategory('nibis')}
className={`px-3 py-1.5 text-sm font-medium rounded-lg transition-colors ${
regulationCategory === 'nibis'
? 'bg-emerald-100 text-emerald-700 ring-2 ring-emerald-300'
: 'bg-white text-slate-600 border border-slate-200 hover:bg-slate-50'
}`}
>
NiBiS Dokumente
</button>
<button
onClick={() => setRegulationCategory('templates')}
className={`px-3 py-1.5 text-sm font-medium rounded-lg transition-colors ${
regulationCategory === 'templates'
? 'bg-orange-100 text-orange-700 ring-2 ring-orange-300'
: 'bg-white text-slate-600 border border-slate-200 hover:bg-slate-50'
}`}
>
Templates & Vorlagen
</button>
</div>
{/* Regulations Table */}
{regulationCategory === 'regulations' && (
<RegulationsTable
expandedRegulation={expandedRegulation}
setExpandedRegulation={setExpandedRegulation}
fetchStatus={fetchStatus}
setActiveTab={setActiveTab}
/>
)}
{/* DSFA Sources */}
{regulationCategory === 'dsfa' && (
<DsfaSourcesList
dsfaSources={dsfaSources}
dsfaLoading={dsfaLoading}
expandedDsfaSource={expandedDsfaSource}
setExpandedDsfaSource={setExpandedDsfaSource}
fetchDsfaStatus={fetchDsfaStatus}
/>
)}
{/* NiBiS Dokumente (info only) */}
{regulationCategory === 'nibis' && <NibisInfo />}
{/* Templates (info only) */}
{regulationCategory === 'templates' && <TemplatesInfo />}
</div>
)
}
// --- Sub-components ---
function RegulationsTable({
expandedRegulation,
setExpandedRegulation,
fetchStatus,
setActiveTab,
}: {
expandedRegulation: string | null
setExpandedRegulation: (v: string | null) => void
fetchStatus: () => void
setActiveTab: (v: any) => void
}) {
return (
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50 flex items-center justify-between">
<h3 className="font-semibold text-slate-900">
Alle {REGULATIONS.length} Regulierungen
<span className="ml-2 text-sm font-normal text-slate-500">
({REGULATIONS.filter(r => isInRag(r.code)).length} im RAG,{' '}
{REGULATIONS.filter(r => !isInRag(r.code)).length} ausstehend)
</span>
</h3>
<button onClick={fetchStatus} className="text-sm text-teal-600 hover:text-teal-700">
Aktualisieren
</button>
</div>
<div className="overflow-x-auto">
<table className="w-full">
<thead className="bg-slate-50 border-b">
<tr>
<th className="px-4 py-3 text-center text-xs font-medium text-slate-500 uppercase w-12">RAG</th>
<th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Code</th>
<th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Typ</th>
<th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Name</th>
<th className="px-4 py-3 text-right text-xs font-medium text-slate-500 uppercase">Chunks</th>
<th className="px-4 py-3 text-right text-xs font-medium text-slate-500 uppercase">Erwartet</th>
<th className="px-4 py-3 text-center text-xs font-medium text-slate-500 uppercase">Status</th>
</tr>
</thead>
<tbody className="divide-y">
{REGULATIONS.map((reg) => {
const chunks = getKnownChunks(reg.code)
const inRag = isInRag(reg.code)
const statusColor = inRag ? 'text-green-500' : 'text-red-500'
const statusIcon = inRag ? '✓' : '❌'
const isExpanded = expandedRegulation === reg.code
return (
<React.Fragment key={reg.code}>
<tr
onClick={() => setExpandedRegulation(isExpanded ? null : reg.code)}
className="hover:bg-slate-50 cursor-pointer transition-colors"
>
<td className="px-4 py-3 text-center">
{inRag ? (
<span className="inline-flex items-center justify-center w-6 h-6 bg-green-100 text-green-600 rounded-full text-xs font-bold" title="Im RAG vorhanden"></span>
) : (
<span className="inline-flex items-center justify-center w-6 h-6 bg-red-50 text-red-400 rounded-full text-xs font-bold" title="Nicht im RAG"></span>
)}
</td>
<td className="px-4 py-3 font-mono font-medium text-teal-600">
<span className="inline-flex items-center gap-2">
<span className={`transform transition-transform ${isExpanded ? 'rotate-90' : ''}`}></span>
{reg.code}
</span>
</td>
<td className="px-4 py-3">
<span className={`px-2 py-0.5 text-xs rounded ${TYPE_COLORS[reg.type]}`}>
{TYPE_LABELS[reg.type]}
</span>
</td>
<td className="px-4 py-3 text-slate-900">{reg.name}</td>
<td className="px-4 py-3 text-right font-bold">
<span className={chunks > 0 && chunks < 10 && reg.expected >= 10 ? 'text-amber-600' : ''}>
{chunks.toLocaleString()}
{chunks > 0 && chunks < 10 && reg.expected >= 10 && (
<span className="ml-1 inline-block w-4 h-4 text-[10px] leading-4 text-center bg-amber-100 text-amber-700 rounded-full" title="Verdaechtig niedrig — Ingestion pruefen"></span>
)}
</span>
</td>
<td className="px-4 py-3 text-right text-slate-500">{reg.expected}</td>
<td className={`px-4 py-3 text-center ${statusColor}`}>{statusIcon}</td>
</tr>
{isExpanded && (
<tr key={`${reg.code}-detail`} className="bg-slate-50">
<td colSpan={7} className="px-4 py-4">
<div className="bg-white rounded-lg border border-slate-200 p-4 space-y-3">
<div>
<h4 className="font-semibold text-slate-900 mb-1">{reg.fullName}</h4>
<p className="text-sm text-slate-600">{reg.description}</p>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4 pt-2 border-t border-slate-100">
<div>
<p className="text-xs font-medium text-slate-500 uppercase mb-1">Relevant fuer</p>
<div className="flex flex-wrap gap-1">
{reg.relevantFor.map((item, idx) => (
<span key={idx} className="px-2 py-0.5 text-xs bg-slate-100 text-slate-600 rounded">
{item}
</span>
))}
</div>
</div>
<div>
<p className="text-xs font-medium text-slate-500 uppercase mb-1">Kernthemen</p>
<div className="flex flex-wrap gap-1">
{reg.keyTopics.map((topic, idx) => (
<span key={idx} className="px-2 py-0.5 text-xs bg-teal-50 text-teal-700 rounded">
{topic}
</span>
))}
</div>
</div>
</div>
<div className="flex items-center justify-between pt-2 border-t border-slate-100 text-xs text-slate-500">
<div className="flex items-center gap-4">
<span>In Kraft seit: {reg.effectiveDate}</span>
{REGULATION_LICENSES[reg.code] && (
<span className="flex items-center gap-1">
<span className="px-1.5 py-0.5 bg-slate-100 text-slate-600 rounded text-[10px] font-medium">
{LICENSE_LABELS[REGULATION_LICENSES[reg.code].license] || REGULATION_LICENSES[reg.code].license}
</span>
<span className="text-slate-400">{REGULATION_LICENSES[reg.code].licenseNote}</span>
</span>
)}
</div>
<div className="flex items-center gap-3">
{REGULATION_SOURCES[reg.code] && (
<a
href={REGULATION_SOURCES[reg.code]}
target="_blank"
rel="noopener noreferrer"
onClick={(e) => e.stopPropagation()}
className="text-blue-600 hover:text-blue-700 font-medium"
>
Originalquelle
</a>
)}
<button
onClick={(e) => {
e.stopPropagation()
setActiveTab('chunks')
}}
className="text-teal-600 hover:text-teal-700 font-medium"
>
In Chunks suchen
</button>
</div>
</div>
</div>
</td>
</tr>
)}
</React.Fragment>
)
})}
</tbody>
</table>
</div>
</div>
)
}
function DsfaSourcesList({
dsfaSources,
dsfaLoading,
expandedDsfaSource,
setExpandedDsfaSource,
fetchDsfaStatus,
}: {
dsfaSources: any[]
dsfaLoading: boolean
expandedDsfaSource: string | null
setExpandedDsfaSource: (v: string | null) => void
fetchDsfaStatus: () => void
}) {
const typeColors: Record<string, string> = {
regulation: 'bg-blue-100 text-blue-700',
legislation: 'bg-indigo-100 text-indigo-700',
guideline: 'bg-teal-100 text-teal-700',
checklist: 'bg-yellow-100 text-yellow-700',
standard: 'bg-green-100 text-green-700',
methodology: 'bg-purple-100 text-purple-700',
specification: 'bg-orange-100 text-orange-700',
catalog: 'bg-pink-100 text-pink-700',
guidance: 'bg-cyan-100 text-cyan-700',
}
return (
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50 flex items-center justify-between">
<div>
<h3 className="font-semibold text-slate-900">DSFA Quellen ({dsfaSources.length || '~70'})</h3>
<p className="text-xs text-slate-500">WP248, DSK Kurzpapiere, Muss-Listen, nationale Datenschutzgesetze</p>
</div>
<button onClick={fetchDsfaStatus} className="text-sm text-teal-600 hover:text-teal-700">
Aktualisieren
</button>
</div>
{dsfaLoading ? (
<div className="p-8 text-center text-slate-500">Lade DSFA-Quellen...</div>
) : dsfaSources.length === 0 ? (
<div className="p-8 text-center text-slate-500">
<p className="mb-2">Keine DSFA-Quellen vom Backend geladen.</p>
<p className="text-xs">Endpunkt: <code className="bg-slate-100 px-1 rounded">/api/dsfa-corpus?action=sources</code></p>
</div>
) : (
<div className="divide-y">
{dsfaSources.map((source) => {
const isExpanded = expandedDsfaSource === source.source_code
return (
<React.Fragment key={source.source_code}>
<div
onClick={() => setExpandedDsfaSource(isExpanded ? null : source.source_code)}
className="px-4 py-3 hover:bg-slate-50 cursor-pointer transition-colors flex items-center justify-between"
>
<div className="flex items-center gap-3">
<span className={`transform transition-transform text-xs ${isExpanded ? 'rotate-90' : ''}`}></span>
<span className="font-mono text-sm text-purple-600 font-medium">{source.source_code}</span>
<span className={`px-2 py-0.5 text-xs rounded ${typeColors[source.document_type] || 'bg-slate-100 text-slate-600'}`}>
{source.document_type}
</span>
<span className="text-sm text-slate-900">{source.name}</span>
</div>
<div className="flex items-center gap-3">
<span className="px-1.5 py-0.5 text-[10px] font-medium bg-slate-100 text-slate-500 rounded uppercase">
{source.language}
</span>
{source.chunk_count != null && (
<span className="text-sm font-bold text-purple-600">{source.chunk_count} Chunks</span>
)}
</div>
</div>
{isExpanded && (
<div className="px-4 pb-4 bg-slate-50">
<div className="bg-white rounded-lg border border-slate-200 p-4 space-y-3">
<div>
<h4 className="font-semibold text-slate-900 mb-1">{source.full_name || source.name}</h4>
{source.organization && (
<p className="text-sm text-slate-600">Organisation: {source.organization}</p>
)}
</div>
<div className="flex items-center gap-4 pt-2 border-t border-slate-100 text-xs text-slate-500">
<span className="flex items-center gap-1">
<span className="px-1.5 py-0.5 bg-slate-100 text-slate-600 rounded text-[10px] font-medium">
{LICENSE_LABELS[source.license_code] || source.license_code}
</span>
<span className="text-slate-400">{source.attribution_text}</span>
</span>
</div>
{source.source_url && (
<div className="text-xs">
<a
href={source.source_url}
target="_blank"
rel="noopener noreferrer"
className="text-teal-600 hover:underline"
onClick={(e) => e.stopPropagation()}
>
Quelle: {source.source_url}
</a>
</div>
)}
</div>
</div>
)}
</React.Fragment>
)
})}
</div>
)}
</div>
)
}
function NibisInfo() {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<div className="flex items-center gap-3 mb-4">
<div className="w-10 h-10 rounded-lg bg-emerald-100 flex items-center justify-center text-xl">📚</div>
<div>
<h3 className="font-semibold text-slate-900">NiBiS Erwartungshorizonte</h3>
<p className="text-sm text-slate-500">Collection: <code className="bg-slate-100 px-1 rounded">bp_nibis_eh</code></p>
</div>
</div>
<div className="grid grid-cols-3 gap-4 mb-4">
<div className="bg-emerald-50 rounded-lg p-4 border border-emerald-200">
<p className="text-sm text-emerald-600 font-medium">Chunks</p>
<p className="text-2xl font-bold text-slate-900">7.996</p>
</div>
<div className="bg-emerald-50 rounded-lg p-4 border border-emerald-200">
<p className="text-sm text-emerald-600 font-medium">Vector Size</p>
<p className="text-2xl font-bold text-slate-900">1024</p>
</div>
<div className="bg-emerald-50 rounded-lg p-4 border border-emerald-200">
<p className="text-sm text-emerald-600 font-medium">Typ</p>
<p className="text-2xl font-bold text-slate-900">BGE-M3</p>
</div>
</div>
<p className="text-sm text-slate-600">
Bildungsinhalte aus dem Niedersaechsischen Bildungsserver (NiBiS). Enthaelt Erwartungshorizonte fuer
verschiedene Faecher und Schulformen. Wird ueber die Klausur-Korrektur fuer EH-Matching genutzt.
Diese Daten sind nicht direkt compliance-relevant.
</p>
</div>
)
}
function TemplatesInfo() {
return (
<div className="bg-white rounded-xl border border-slate-200 p-6">
<div className="flex items-center gap-3 mb-4">
<div className="w-10 h-10 rounded-lg bg-orange-100 flex items-center justify-center text-xl">📋</div>
<div>
<h3 className="font-semibold text-slate-900">Legal Templates & Vorlagen</h3>
<p className="text-sm text-slate-500">Collection: <code className="bg-slate-100 px-1 rounded">bp_legal_templates</code></p>
</div>
</div>
<div className="grid grid-cols-3 gap-4 mb-4">
<div className="bg-orange-50 rounded-lg p-4 border border-orange-200">
<p className="text-sm text-orange-600 font-medium">Chunks</p>
<p className="text-2xl font-bold text-slate-900">7.689</p>
</div>
<div className="bg-orange-50 rounded-lg p-4 border border-orange-200">
<p className="text-sm text-orange-600 font-medium">Vector Size</p>
<p className="text-2xl font-bold text-slate-900">1024</p>
</div>
<div className="bg-orange-50 rounded-lg p-4 border border-orange-200">
<p className="text-sm text-orange-600 font-medium">Typ</p>
<p className="text-2xl font-bold text-slate-900">BGE-M3</p>
</div>
</div>
<p className="text-sm text-slate-600">
Vorlagen fuer VVT (Verzeichnis von Verarbeitungstaetigkeiten), TOM (Technisch-Organisatorische Massnahmen),
DSFA-Berichte und weitere Compliance-Dokumente. Werden vom AI Compliance SDK fuer die Dokumentgenerierung genutzt.
</p>
</div>
)
}
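
The amber badge in RegulationsTable appears when a regulation has some chunks, but fewer than 10, while at least 10 are expected. A minimal sketch of that condition as a standalone predicate follows. The name `isSuspiciouslyLow` and the `threshold` parameter are assumptions for illustration; the diff hard-codes the value 10 inline.

```typescript
// Hypothetical helper, not part of the diff: mirrors the inline condition
// `chunks > 0 && chunks < 10 && reg.expected >= 10` from RegulationsTable.
function isSuspiciouslyLow(chunks: number, expected: number, threshold = 10): boolean {
  // Zero chunks is excluded, as in the inline condition, and documents that
  // are expected to be small (expected < threshold) are never flagged.
  return chunks > 0 && chunks < threshold && expected >= threshold
}

// Usage: one call per table row decides both the text colour and the badge.
const rows = [
  { code: 'A', chunks: 3, expected: 120 },
  { code: 'B', chunks: 0, expected: 120 },
  { code: 'C', chunks: 3, expected: 5 },
]
const flagged = rows.filter((r) => isSuspiciouslyLow(r.chunks, r.expected)).map((r) => r.code)
console.log(flagged)
```

Extracting the predicate would avoid evaluating the same three-part condition twice per row, once for the class name and once for the badge.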

@@ -0,0 +1,97 @@
'use client'
import React from 'react'
import type { UseRAGPageReturn } from '../_hooks/useRAGPage'
interface SearchTabProps {
hook: UseRAGPageReturn
}
export function SearchTab({ hook }: SearchTabProps) {
const {
searchQuery,
setSearchQuery,
searchResults,
searching,
selectedRegulations,
setSelectedRegulations,
handleSearch,
} = hook
return (
<div className="space-y-6">
{/* Search Box */}
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Semantische Suche</h3>
<div className="space-y-4">
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">Suchanfrage</label>
<textarea
value={searchQuery}
onChange={(e) => setSearchQuery(e.target.value)}
placeholder="z.B. 'Welche Anforderungen gibt es fuer KI-Systeme mit hohem Risiko?'"
rows={3}
className="w-full px-3 py-2 border rounded-lg focus:ring-2 focus:ring-teal-500 focus:border-teal-500"
/>
</div>
<div>
<label className="block text-sm font-medium text-slate-700 mb-2">Filter (optional)</label>
<div className="flex flex-wrap gap-2">
{['GDPR', 'AIACT', 'CRA', 'NIS2', 'BSI-TR-03161-1'].map((code) => (
<button
key={code}
onClick={() => {
setSelectedRegulations((prev: string[]) =>
prev.includes(code) ? prev.filter((c: string) => c !== code) : [...prev, code]
)
}}
className={`px-3 py-1 text-sm rounded-full border transition-colors ${
selectedRegulations.includes(code)
? 'bg-teal-100 border-teal-300 text-teal-700'
: 'bg-white border-slate-200 text-slate-600 hover:border-slate-300'
}`}
>
{code}
</button>
))}
</div>
</div>
<button
onClick={handleSearch}
disabled={searching || !searchQuery.trim()}
className="px-6 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
>
{searching ? 'Suche...' : 'Suchen'}
</button>
</div>
</div>
{/* Search Results */}
{searchResults.length > 0 && (
<div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
<div className="px-4 py-3 border-b bg-slate-50">
<h3 className="font-semibold text-slate-900">{searchResults.length} Ergebnisse</h3>
</div>
<div className="divide-y">
{searchResults.map((result, i) => (
<div key={i} className="p-4">
<div className="flex items-center gap-2 mb-2">
<span className="px-2 py-0.5 text-xs rounded bg-teal-100 text-teal-700">
{result.regulation_code}
</span>
{result.article && (
<span className="text-sm text-slate-500">Art. {result.article}</span>
)}
<span className="ml-auto text-sm text-slate-400">
Score: {(result.score * 100).toFixed(1)}%
</span>
</div>
<p className="text-slate-700 text-sm">{result.text}</p>
</div>
))}
</div>
</div>
)}
</div>
)
}
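
The filter buttons in SearchTab toggle a regulation code in and out of `selectedRegulations` through a functional state update. A minimal sketch of that toggle as a pure function is shown here, so it can be checked without React. The name `toggleCode` is an assumption and does not appear in the diff.

```typescript
// Hypothetical helper, not part of the diff: the same toggle logic that the
// onClick handler passes to setSelectedRegulations.
function toggleCode(selected: readonly string[], code: string): string[] {
  // Returns a new array in both branches, so React state is never mutated.
  return selected.includes(code)
    ? selected.filter((c) => c !== code)
    : [...selected, code]
}

// Usage: in the component this would read
//   setSelectedRegulations((prev) => toggleCode(prev, code))
const afterAdd = toggleCode(['GDPR'], 'NIS2')
const afterRemove = toggleCode(afterAdd, 'GDPR')
console.log(afterAdd, afterRemove)
```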

@@ -0,0 +1,441 @@
'use client'
import { useState, useEffect, useCallback } from 'react'
import { API_PROXY, DSFA_API_PROXY } from '../rag-data'
import type {
TabId,
RegulationCategory,
CollectionStatus,
SearchResult,
DsfaSource,
DsfaCorpusStatus,
CustomDocument,
PipelineState,
PipelineCheckpoint,
} from '../types'
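/**
 * State and data-fetching hook for the RAG admin page.
 * Holds tab, search, ingestion, pipeline, DSFA and custom-document state
 * and exposes the handlers that the tab components call.
 */
export function useRAGPage() {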
const [activeTab, setActiveTab] = useState<TabId>('overview')
const [collectionStatus, setCollectionStatus] = useState<CollectionStatus | null>(null)
const [loading, setLoading] = useState(true)
const [searchQuery, setSearchQuery] = useState('')
const [searchResults, setSearchResults] = useState<SearchResult[]>([])
const [searching, setSearching] = useState(false)
const [selectedRegulations, setSelectedRegulations] = useState<string[]>([])
const [ingestionRunning, setIngestionRunning] = useState(false)
const [ingestionLog, setIngestionLog] = useState<string[]>([])
const [pipelineState, setPipelineState] = useState<PipelineState | null>(null)
const [pipelineLoading, setPipelineLoading] = useState(false)
const [pipelineStarting, setPipelineStarting] = useState(false)
const [expandedRegulation, setExpandedRegulation] = useState<string | null>(null)
const [autoRefresh, setAutoRefresh] = useState(true)
const [elapsedTime, setElapsedTime] = useState<string>('')
const [expandedDocTypes, setExpandedDocTypes] = useState<string[]>(['eu_regulation', 'eu_directive'])
const [expandedMatrixDoc, setExpandedMatrixDoc] = useState<string | null>(null)
// DSFA corpus state
const [dsfaSources, setDsfaSources] = useState<DsfaSource[]>([])
const [dsfaStatus, setDsfaStatus] = useState<DsfaCorpusStatus | null>(null)
const [dsfaLoading, setDsfaLoading] = useState(false)
const [regulationCategory, setRegulationCategory] = useState<RegulationCategory>('regulations')
const [expandedDsfaSource, setExpandedDsfaSource] = useState<string | null>(null)
// Data tab state
const [customDocuments, setCustomDocuments] = useState<CustomDocument[]>([])
const [uploadFile, setUploadFile] = useState<File | null>(null)
const [uploadTitle, setUploadTitle] = useState('')
const [uploadCode, setUploadCode] = useState('')
const [uploading, setUploading] = useState(false)
const [linkUrl, setLinkUrl] = useState('')
const [linkTitle, setLinkTitle] = useState('')
const [linkCode, setLinkCode] = useState('')
const [addingLink, setAddingLink] = useState(false)
const fetchStatus = useCallback(async () => {
setLoading(true)
try {
const res = await fetch(`${API_PROXY}?action=status`)
if (res.ok) {
const data = await res.json()
setCollectionStatus(data)
}
} catch (error) {
console.error('Failed to fetch status:', error)
} finally {
setLoading(false)
}
}, [])
const fetchPipeline = useCallback(async () => {
setPipelineLoading(true)
try {
const res = await fetch(`${API_PROXY}?action=pipeline-checkpoints`)
if (res.ok) {
const data = await res.json()
setPipelineState(data)
}
} catch (error) {
console.error('Failed to fetch pipeline:', error)
} finally {
setPipelineLoading(false)
}
}, [])
const fetchDsfaStatus = useCallback(async () => {
setDsfaLoading(true)
try {
const [statusRes, sourcesRes] = await Promise.all([
fetch(`${DSFA_API_PROXY}?action=status`),
fetch(`${DSFA_API_PROXY}?action=sources`),
])
if (statusRes.ok) {
const data = await statusRes.json()
setDsfaStatus(data)
}
if (sourcesRes.ok) {
const data = await sourcesRes.json()
// Accept either a bare array or an object with a `sources` array.
setDsfaSources(Array.isArray(data) ? data : data?.sources ?? [])
}
} catch (error) {
console.error('Failed to fetch DSFA status:', error)
} finally {
setDsfaLoading(false)
}
}, [])
const fetchCustomDocuments = useCallback(async () => {
try {
const res = await fetch(`${API_PROXY}?action=custom-documents`)
if (res.ok) {
const data = await res.json()
setCustomDocuments(data.documents || [])
}
} catch (error) {
console.error('Failed to fetch custom documents:', error)
}
}, [])
const handleUpload = async () => {
if (!uploadFile || !uploadTitle || !uploadCode) return
setUploading(true)
try {
const formData = new FormData()
formData.append('file', uploadFile)
formData.append('title', uploadTitle)
formData.append('code', uploadCode)
formData.append('document_type', 'custom')
const res = await fetch(`${API_PROXY}?action=upload`, {
method: 'POST',
body: formData,
})
if (res.ok) {
setUploadFile(null)
setUploadTitle('')
setUploadCode('')
fetchCustomDocuments()
fetchStatus()
}
} catch (error) {
console.error('Upload failed:', error)
} finally {
setUploading(false)
}
}
const handleAddLink = async () => {
if (!linkUrl || !linkTitle || !linkCode) return
setAddingLink(true)
try {
const res = await fetch(`${API_PROXY}?action=add-link`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
url: linkUrl,
title: linkTitle,
code: linkCode,
document_type: 'custom',
}),
})
if (res.ok) {
setLinkUrl('')
setLinkTitle('')
setLinkCode('')
fetchCustomDocuments()
}
} catch (error) {
console.error('Add link failed:', error)
} finally {
setAddingLink(false)
}
}
const handleDeleteDocument = async (docId: string) => {
try {
const res = await fetch(`${API_PROXY}?action=delete-document&docId=${encodeURIComponent(docId)}`, {
method: 'DELETE',
})
if (res.ok) {
fetchCustomDocuments()
fetchStatus()
}
} catch (error) {
console.error('Delete failed:', error)
}
}
const handleStartPipeline = async (skipIngestion: boolean = false) => {
setPipelineStarting(true)
try {
const res = await fetch(`${API_PROXY}?action=start-pipeline`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
force_reindex: false,
skip_ingestion: skipIngestion,
}),
})
if (res.ok) {
setTimeout(() => {
fetchPipeline()
setPipelineStarting(false)
}, 2000)
} else {
setPipelineStarting(false)
}
} catch (error) {
console.error('Failed to start pipeline:', error)
setPipelineStarting(false)
}
}
const handleSearch = async () => {
if (!searchQuery.trim()) return
setSearching(true)
try {
const params = new URLSearchParams({
action: 'search',
query: searchQuery,
top_k: '5',
})
if (selectedRegulations.length > 0) {
params.append('regulations', selectedRegulations.join(','))
}
const res = await fetch(`${API_PROXY}?${params}`)
if (res.ok) {
const data = await res.json()
setSearchResults(data.results || [])
}
} catch (error) {
console.error('Search failed:', error)
} finally {
setSearching(false)
}
}
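// Starts a forced re-ingestion, then polls `ingestion-status` every 5 s
// until the backend reports completion.
const triggerIngestion = async () => {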
setIngestionRunning(true)
setIngestionLog(['Starte Re-Ingestion aller 19 Regulierungen...'])
try {
const res = await fetch(`${API_PROXY}?action=ingest`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ force: true }),
})
if (res.ok) {
const data = await res.json()
setIngestionLog((prev) => [...prev, 'Ingestion gestartet. Job-ID: ' + (data.job_id || 'N/A')])
const checkStatus = setInterval(async () => {
try {
const statusRes = await fetch(`${API_PROXY}?action=ingestion-status`)
if (statusRes.ok) {
const statusData = await statusRes.json()
if (statusData.completed) {
clearInterval(checkStatus)
setIngestionRunning(false)
setIngestionLog((prev) => [...prev, 'Ingestion abgeschlossen!'])
fetchStatus()
} else if (statusData.current_regulation) {
setIngestionLog((prev) => [
...prev,
`Verarbeite: ${statusData.current_regulation} (${statusData.processed}/${statusData.total})`,
])
}
}
} catch {
// Ignore polling errors
}
}, 5000)
} else {
setIngestionLog((prev) => [...prev, 'Fehler: ' + res.statusText])
setIngestionRunning(false)
}
} catch (error) {
setIngestionLog((prev) => [...prev, 'Fehler: ' + String(error)])
setIngestionRunning(false)
}
}
const getRegulationChunks = (code: string): number => {
return collectionStatus?.regulations?.[code] || 0
}
const getTotalChunks = (): number => {
return collectionStatus?.totalPoints || 0
}
// Initial data fetch
useEffect(() => {
fetchStatus()
fetchDsfaStatus()
}, [fetchStatus, fetchDsfaStatus])
// Fetch pipeline when tab changes
useEffect(() => {
if (activeTab === 'pipeline') {
fetchPipeline()
}
}, [activeTab, fetchPipeline])
// Fetch custom documents when data tab is active
useEffect(() => {
if (activeTab === 'data') {
fetchCustomDocuments()
}
}, [activeTab, fetchCustomDocuments])
// Auto-refresh pipeline status when running
useEffect(() => {
if (activeTab !== 'pipeline' || !autoRefresh) return
const isRunning = pipelineState?.status === 'running'
if (isRunning) {
const interval = setInterval(() => {
fetchPipeline()
fetchStatus()
}, 5000)
return () => clearInterval(interval)
}
}, [activeTab, autoRefresh, pipelineState?.status, fetchPipeline, fetchStatus])
// Update elapsed time
useEffect(() => {
if (!pipelineState?.started_at || pipelineState?.status !== 'running') {
setElapsedTime('')
return
}
const updateElapsed = () => {
const start = new Date(pipelineState.started_at!).getTime()
const now = Date.now()
const diff = Math.floor((now - start) / 1000)
const hours = Math.floor(diff / 3600)
const minutes = Math.floor((diff % 3600) / 60)
const seconds = diff % 60
if (hours > 0) {
setElapsedTime(`${hours}h ${minutes}m ${seconds}s`)
} else if (minutes > 0) {
setElapsedTime(`${minutes}m ${seconds}s`)
} else {
setElapsedTime(`${seconds}s`)
}
}
updateElapsed()
const interval = setInterval(updateElapsed, 1000)
return () => clearInterval(interval)
}, [pipelineState?.started_at, pipelineState?.status])
return {
// Tab state
activeTab,
setActiveTab,
// Collection status
collectionStatus,
loading,
fetchStatus,
// Search
searchQuery,
setSearchQuery,
searchResults,
searching,
selectedRegulations,
setSelectedRegulations,
handleSearch,
// Ingestion
ingestionRunning,
ingestionLog,
triggerIngestion,
// Pipeline
pipelineState,
pipelineLoading,
pipelineStarting,
autoRefresh,
setAutoRefresh,
elapsedTime,
fetchPipeline,
handleStartPipeline,
// Regulation expansion
expandedRegulation,
setExpandedRegulation,
expandedDocTypes,
setExpandedDocTypes,
expandedMatrixDoc,
setExpandedMatrixDoc,
// DSFA
dsfaSources,
dsfaStatus,
dsfaLoading,
regulationCategory,
setRegulationCategory,
expandedDsfaSource,
setExpandedDsfaSource,
fetchDsfaStatus,
// Data tab
customDocuments,
uploadFile,
setUploadFile,
uploadTitle,
setUploadTitle,
uploadCode,
setUploadCode,
uploading,
handleUpload,
linkUrl,
setLinkUrl,
linkTitle,
setLinkTitle,
linkCode,
setLinkCode,
addingLink,
handleAddLink,
handleDeleteDocument,
fetchCustomDocuments,
// Helpers
getRegulationChunks,
getTotalChunks,
}
}
export type UseRAGPageReturn = ReturnType<typeof useRAGPage>
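
The elapsed-time effect in `useRAGPage` computes an `h/m/s` label from `pipelineState.started_at` once per second. A minimal sketch of that formatting as a pure function is shown here. The name `formatElapsed` and the clamp to zero are assumptions; the hook itself does not clamp negative differences.

```typescript
// Hypothetical helper, not part of the diff: same arithmetic as the
// updateElapsed closure, with the current time passed in for testability.
function formatElapsed(startedAt: string, nowMs: number): string {
  const startMs = new Date(startedAt).getTime()
  // Assumption: clamp to 0 so client/server clock skew never shows "-3s".
  const diff = Math.max(0, Math.floor((nowMs - startMs) / 1000))
  const hours = Math.floor(diff / 3600)
  const minutes = Math.floor((diff % 3600) / 60)
  const seconds = diff % 60
  if (hours > 0) return `${hours}h ${minutes}m ${seconds}s`
  if (minutes > 0) return `${minutes}m ${seconds}s`
  return `${seconds}s`
}

// Usage: inside the effect this would be
//   setElapsedTime(formatElapsed(pipelineState.started_at, Date.now()))
const label = formatElapsed('2026-01-01T00:00:00Z', Date.parse('2026-01-01T01:02:03Z'))
console.log(label)
```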

@@ -0,0 +1,52 @@
/**
* Constants and types for ChunkBrowserQA component.
*/
export type RegGroupKey =
| 'eu_regulation'
| 'eu_directive'
| 'de_law'
| 'at_law'
| 'ch_law'
| 'national_law'
| 'bsi_standard'
| 'eu_guideline'
| 'international_standard'
| 'other'
export const GROUP_LABELS: Record<RegGroupKey, string> = {
eu_regulation: 'EU Verordnungen',
eu_directive: 'EU Richtlinien',
de_law: 'DE Gesetze',
at_law: 'AT Gesetze',
ch_law: 'CH Gesetze',
national_law: 'Nationale Gesetze (EU)',
bsi_standard: 'BSI Standards',
eu_guideline: 'EDPB / Guidelines',
international_standard: 'Internationale Standards',
other: 'Sonstige',
}
export const GROUP_ORDER: RegGroupKey[] = [
'eu_regulation', 'eu_directive', 'de_law', 'at_law', 'ch_law',
'national_law', 'bsi_standard', 'eu_guideline', 'international_standard', 'other',
]
export const COLLECTIONS = [
'bp_compliance_gesetze',
'bp_compliance_ce',
'bp_compliance_datenschutz',
'bp_dsfa_corpus',
'bp_compliance_recht',
'bp_legal_templates',
'bp_nibis_eh',
]
export const STRUCTURAL_KEYS = new Set([
'article', 'artikel', 'paragraph', 'section_title', 'section', 'chapter',
'abschnitt', 'kapitel', 'pages', 'page',
])
export const HIDDEN_KEYS = new Set([
'text', 'content', 'chunk_text', 'id', 'embedding',
])
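
`STRUCTURAL_KEYS` and `HIDDEN_KEYS` classify the fields of a chunk payload. The sketch below shows one way they could be used to split a payload into structural fields, remaining metadata, and dropped fields. How ChunkBrowserQA actually consumes them is not visible in this part of the diff, so `partitionChunkFields` and its return shape are assumptions; the two sets are copied from the constants file so the block runs on its own.

```typescript
// Copied from ChunkBrowserConstants so the sketch is self-contained.
const STRUCTURAL_KEYS = new Set([
  'article', 'artikel', 'paragraph', 'section_title', 'section', 'chapter',
  'abschnitt', 'kapitel', 'pages', 'page',
])
const HIDDEN_KEYS = new Set(['text', 'content', 'chunk_text', 'id', 'embedding'])

// Hypothetical helper, not part of the diff.
function partitionChunkFields(chunk: Record<string, unknown>) {
  const structural: Record<string, unknown> = {}
  const metadata: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(chunk)) {
    if (HIDDEN_KEYS.has(key)) continue // body text, ids and vectors are dropped here
    if (STRUCTURAL_KEYS.has(key)) structural[key] = value
    else metadata[key] = value
  }
  return { structural, metadata }
}

const parts = partitionChunkFields({
  text: 'Lorem ipsum',
  article: '5',
  pages: [12, 13],
  regulation_code: 'GDPR',
})
console.log(parts)
```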

@@ -0,0 +1,234 @@
'use client'
import React from 'react'
import { STRUCTURAL_KEYS, HIDDEN_KEYS } from './ChunkBrowserConstants'
import { getChunkText, getStructuralInfo } from './ChunkBrowserHelpers'
import { RAG_PDF_MAPPING } from './rag-pdf-mapping'
import { REGULATIONS_IN_RAG } from '../rag-constants'
interface ChunkBrowserContentProps {
selectedRegulation: string | null
docLoading: boolean
docChunks: Record<string, unknown>[]
docChunkIndex: number
docTotalChunks: number
splitViewActive: boolean
chunksPerPage: number
pdfExists: boolean | null
}
export function ChunkBrowserContent({
selectedRegulation,
docLoading,
docChunks,
docChunkIndex,
docTotalChunks,
splitViewActive,
chunksPerPage,
pdfExists,
}: ChunkBrowserContentProps) {
const currentChunk = docChunks[docChunkIndex] || null
const prevChunk = docChunkIndex > 0 ? docChunks[docChunkIndex - 1] : null
const nextChunk = docChunkIndex < docChunks.length - 1 ? docChunks[docChunkIndex + 1] : null
const structInfo = getStructuralInfo(currentChunk)
// PDF page estimation
const estimatePdfPage = (chunk: Record<string, unknown> | null, chunkIdx: number): number => {
if (chunk) {
const pages = chunk.pages as number[] | undefined
if (Array.isArray(pages) && pages.length > 0) return pages[0]
const page = chunk.page as number | undefined
if (typeof page === 'number' && page > 0) return page
}
const mapping = selectedRegulation ? RAG_PDF_MAPPING[selectedRegulation] : null
const cpp = mapping?.chunksPerPage || chunksPerPage
return Math.floor(chunkIdx / cpp) + 1
}
const pdfPage = estimatePdfPage(currentChunk, docChunkIndex)
const pdfMapping = selectedRegulation ? RAG_PDF_MAPPING[selectedRegulation] : null
const pdfUrl = pdfMapping ? `/rag-originals/${pdfMapping.filename}#page=${pdfPage}` : null
// Overlap extraction
const getOverlapPrev = (): string => {
if (!prevChunk) return ''
const text = getChunkText(prevChunk)
return text.length > 150 ? '...' + text.slice(-150) : text
}
const getOverlapNext = (): string => {
if (!nextChunk) return ''
const text = getChunkText(nextChunk)
return text.length > 150 ? text.slice(0, 150) + '...' : text
}
if (!selectedRegulation) {
return (
<div className="flex-1 flex items-center justify-center bg-white rounded-xl border border-slate-200">
<div className="text-center text-slate-400 space-y-2">
<div className="text-4xl">&#128269;</div>
<p className="text-sm">Dokument in der Sidebar auswaehlen, um QA zu starten.</p>
<p className="text-xs text-slate-300">Pfeiltasten: Chunk vor/zurueck</p>
</div>
</div>
)
}
if (docLoading) {
return (
<div className="flex-1 flex items-center justify-center bg-white rounded-xl border border-slate-200">
<div className="text-center text-slate-500 space-y-2">
<div className="animate-spin text-3xl">&#9881;</div>
<p className="text-sm">Chunks werden geladen...</p>
<p className="text-xs text-slate-400">
{selectedRegulation}: {REGULATIONS_IN_RAG[selectedRegulation]?.chunks.toLocaleString() || '?'} Chunks erwartet
</p>
</div>
</div>
)
}
return (
<div className={`flex-1 grid gap-3 min-h-0 ${splitViewActive ? 'grid-cols-2' : 'grid-cols-1'}`}>
{/* Chunk-Text Panel */}
<div className="bg-white rounded-xl border border-slate-200 flex flex-col min-h-0 overflow-hidden">
<div className="flex-shrink-0 px-4 py-2 bg-slate-50 border-b border-slate-100 flex items-center justify-between">
<span className="text-sm font-medium text-slate-700">Chunk-Text</span>
<div className="flex items-center gap-2">
{structInfo.article && (
<span className="px-2 py-0.5 bg-blue-50 text-blue-700 text-xs font-medium rounded border border-blue-200">
{structInfo.article}
</span>
)}
{structInfo.section && (
<span className="px-2 py-0.5 bg-purple-50 text-purple-700 text-xs rounded border border-purple-200">
{structInfo.section}
</span>
)}
<span className="text-xs text-slate-400 tabular-nums">
#{docChunkIndex} / {docTotalChunks - 1}
</span>
</div>
</div>
<div className="flex-1 overflow-y-auto min-h-0 p-4 space-y-3">
{/* Overlap from previous chunk */}
{prevChunk && (
<div className="text-xs text-slate-400 bg-amber-50 border-l-2 border-amber-300 px-3 py-2 rounded-r">
<div className="font-medium text-amber-600 mb-1">&#8593; Ende vorheriger Chunk #{docChunkIndex - 1}</div>
<p className="whitespace-pre-wrap break-words leading-relaxed">{getOverlapPrev()}</p>
</div>
)}
{/* Current chunk text */}
{currentChunk ? (
<div className="text-sm text-slate-800 whitespace-pre-wrap break-words leading-relaxed border-l-2 border-teal-400 pl-3">
{getChunkText(currentChunk)}
</div>
) : (
<div className="text-sm text-slate-400 italic">Kein Chunk-Text vorhanden.</div>
)}
{/* Overlap from next chunk */}
{nextChunk && (
<div className="text-xs text-slate-400 bg-amber-50 border-l-2 border-amber-300 px-3 py-2 rounded-r">
<div className="font-medium text-amber-600 mb-1">&#8595; Anfang naechster Chunk #{docChunkIndex + 1}</div>
<p className="whitespace-pre-wrap break-words leading-relaxed">{getOverlapNext()}</p>
</div>
)}
{/* Metadata */}
{currentChunk && (
<div className="mt-4 pt-3 border-t border-slate-100">
<div className="text-xs font-medium text-slate-500 mb-2">Metadaten</div>
<div className="grid grid-cols-2 gap-x-4 gap-y-1 text-xs">
{Object.entries(currentChunk)
.filter(([k]) => !HIDDEN_KEYS.has(k))
.sort(([a], [b]) => {
const aStruct = STRUCTURAL_KEYS.has(a) ? 0 : 1
const bStruct = STRUCTURAL_KEYS.has(b) ? 0 : 1
return aStruct - bStruct || a.localeCompare(b)
})
.map(([k, v]) => (
<div key={k} className={`flex gap-1 ${STRUCTURAL_KEYS.has(k) ? 'col-span-2 font-medium' : ''}`}>
<span className="font-medium text-slate-500 flex-shrink-0">{k}:</span>
<span className="text-slate-700 break-all">
{Array.isArray(v) ? v.join(', ') : String(v)}
</span>
</div>
))}
</div>
<div className="mt-3 pt-2 border-t border-slate-50">
<div className="text-xs text-slate-400">
Chunk-Laenge: {getChunkText(currentChunk).length} Zeichen
{getChunkText(currentChunk).length < 50 && (
<span className="ml-2 text-orange-500 font-medium">&#9888; Sehr kurz</span>
)}
{getChunkText(currentChunk).length > 2000 && (
<span className="ml-2 text-orange-500 font-medium">&#9888; Sehr lang</span>
)}
</div>
</div>
</div>
)}
</div>
</div>
{/* PDF-Viewer Panel */}
{splitViewActive && (
<div className="bg-white rounded-xl border border-slate-200 flex flex-col min-h-0 overflow-hidden">
<div className="flex-shrink-0 px-4 py-2 bg-slate-50 border-b border-slate-100 flex items-center justify-between">
<span className="text-sm font-medium text-slate-700">Original-PDF</span>
<div className="flex items-center gap-2">
<span className="text-xs text-slate-400">
Seite ~{pdfPage}
{pdfMapping?.totalPages ? ` / ${pdfMapping.totalPages}` : ''}
</span>
{pdfUrl && (
<a
href={pdfUrl.split('#')[0]}
target="_blank"
rel="noopener noreferrer"
className="text-xs text-teal-600 hover:text-teal-800 underline"
>
Oeffnen &#8599;
</a>
)}
</div>
</div>
<div className="flex-1 min-h-0 relative">
{pdfUrl && pdfExists ? (
<iframe
key={`${selectedRegulation}-${pdfPage}`}
src={pdfUrl}
className="absolute inset-0 w-full h-full border-0"
title="Original PDF"
/>
) : (
<div className="flex items-center justify-center h-full text-slate-400 text-sm p-4">
<div className="text-center space-y-2">
<div className="text-3xl">&#128196;</div>
{!pdfMapping ? (
<>
<p>Kein PDF-Mapping fuer {selectedRegulation}.</p>
<p className="text-xs">rag-pdf-mapping.ts ergaenzen.</p>
</>
) : pdfExists === false ? (
<>
<p className="font-medium text-orange-600">PDF nicht vorhanden</p>
<p className="text-xs">Datei <code className="bg-slate-100 px-1 rounded">{pdfMapping.filename}</code> fehlt in ~/rag-originals/</p>
<p className="text-xs mt-1">Bitte manuell herunterladen und dort ablegen.</p>
</>
) : (
<p>PDF wird geprueft...</p>
)}
</div>
</div>
)}
</div>
</div>
)}
</div>
)
}
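
Review note: the page estimation inside `ChunkBrowserContent` closes over props, which makes it awkward to test. The sketch below shows one way it might be lifted into a pure function. The parameter names `fallbackChunksPerPage` and `mappedChunksPerPage` are assumptions standing in for the `chunksPerPage` prop and `RAG_PDF_MAPPING[selectedRegulation]?.chunksPerPage`. The extra `typeof` check on `pages[0]` is an addition that the component does not have.

```typescript
type Chunk = Record<string, unknown>

// Hypothetical standalone variant of estimatePdfPage.
// Order of preference: pages[] metadata, page metadata, index-based estimate.
function estimatePdfPage(
  chunk: Chunk | null,
  chunkIdx: number,
  fallbackChunksPerPage: number,
  mappedChunksPerPage?: number,
): number {
  if (chunk) {
    const pages = chunk.pages
    if (Array.isArray(pages) && pages.length > 0 && typeof pages[0] === 'number') {
      return pages[0]
    }
    const page = chunk.page
    if (typeof page === 'number' && page > 0) return page
  }
  // No usable metadata: assume a constant number of chunks per PDF page.
  const cpp = mappedChunksPerPage || fallbackChunksPerPage
  return Math.floor(chunkIdx / cpp) + 1
}

console.log(estimatePdfPage({ pages: [7, 8] }, 40, 6))
console.log(estimatePdfPage(null, 13, 6))
```

The index-based branch is only an approximation, which matches the `Seite ~{pdfPage}` label in the panel header.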


@@ -0,0 +1,39 @@
/**
* Helper functions for ChunkBrowserQA component.
*/
import { REGULATION_INFO } from '../rag-constants'
/** Get text content from a chunk */
export function getChunkText(chunk: Record<string, unknown> | null): string {
if (!chunk) return ''
return String(chunk.chunk_text || chunk.text || chunk.content || '')
}
/** Extract structural metadata for prominent display */
export function getStructuralInfo(
chunk: Record<string, unknown> | null
): { article?: string; section?: string; pages?: string } {
if (!chunk) return {}
const result: { article?: string; section?: string; pages?: string } = {}
// Article / paragraph
const article = chunk.article || chunk.artikel || chunk.paragraph || chunk.section_title
if (article) result.article = String(article)
// Section
const section = chunk.section || chunk.chapter || chunk.abschnitt || chunk.kapitel
if (section) result.section = String(section)
// Pages
const pages = chunk.pages as number[] | undefined
if (Array.isArray(pages) && pages.length > 0) {
result.pages = pages.length === 1 ? `S. ${pages[0]}` : `S. ${pages[0]}-${pages[pages.length - 1]}`
} else if (chunk.page) {
result.pages = `S. ${chunk.page}`
}
return result
}
/** Regulation name lookup */
export function getRegName(code: string): string {
const reg = REGULATION_INFO.find(r => r.code === code)
return reg?.name || code
}
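
Review note: a small sketch of how the helpers above behave on edge cases, with `getChunkText` copied inline so the snippet is self-contained. `pageLabel` is a hypothetical extraction of the page branch of `getStructuralInfo`, not a function in the commit.

```typescript
type Chunk = Record<string, unknown>

// Copy of getChunkText from the helpers file.
function getChunkText(chunk: Chunk | null): string {
  if (!chunk) return ''
  return String(chunk.chunk_text || chunk.text || chunk.content || '')
}

// Hypothetical extraction of the page-label logic in getStructuralInfo.
function pageLabel(chunk: Chunk): string | undefined {
  const pages = chunk.pages
  if (Array.isArray(pages) && pages.length > 0) {
    return pages.length === 1
      ? `S. ${pages[0]}`
      : `S. ${pages[0]}-${pages[pages.length - 1]}`
  }
  return chunk.page ? `S. ${chunk.page}` : undefined
}

// An empty chunk_text falls through to the next field because of `||`.
console.log(getChunkText({ chunk_text: '', text: 'fallback text' }))
console.log(pageLabel({ pages: [7, 8, 9] }))
```

Because the fallback chain uses `||` rather than `??`, an empty string in `chunk_text` is treated like a missing field; that appears to be the intended behavior for display purposes.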


@@ -1,43 +1,18 @@
 'use client'
-import React, { useState, useEffect, useCallback, useRef } from 'react'
+import React, { useState, useEffect, useCallback, useRef, useMemo } from 'react'
 import { RAG_PDF_MAPPING } from './rag-pdf-mapping'
 import { REGULATIONS_IN_RAG, REGULATION_INFO } from '../rag-constants'
+import { RegGroupKey } from './ChunkBrowserConstants'
+import { getStructuralInfo } from './ChunkBrowserHelpers'
+import { ChunkBrowserSidebar } from './ChunkBrowserSidebar'
+import { ChunkBrowserToolbar } from './ChunkBrowserToolbar'
+import { ChunkBrowserContent } from './ChunkBrowserContent'
 interface ChunkBrowserQAProps {
 apiProxy: string
 }
-type RegGroupKey = 'eu_regulation' | 'eu_directive' | 'de_law' | 'at_law' | 'ch_law' | 'national_law' | 'bsi_standard' | 'eu_guideline' | 'international_standard' | 'other'
-const GROUP_LABELS: Record<RegGroupKey, string> = {
-eu_regulation: 'EU Verordnungen',
-eu_directive: 'EU Richtlinien',
-de_law: 'DE Gesetze',
-at_law: 'AT Gesetze',
-ch_law: 'CH Gesetze',
-national_law: 'Nationale Gesetze (EU)',
-bsi_standard: 'BSI Standards',
-eu_guideline: 'EDPB / Guidelines',
-international_standard: 'Internationale Standards',
-other: 'Sonstige',
-}
-const GROUP_ORDER: RegGroupKey[] = [
-'eu_regulation', 'eu_directive', 'de_law', 'at_law', 'ch_law',
-'national_law', 'bsi_standard', 'eu_guideline', 'international_standard', 'other',
-]
-const COLLECTIONS = [
-'bp_compliance_gesetze',
-'bp_compliance_ce',
-'bp_compliance_datenschutz',
-'bp_dsfa_corpus',
-'bp_compliance_recht',
-'bp_legal_templates',
-'bp_nibis_eh',
-]
 export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 // Filter-Sidebar
 const [selectedRegulation, setSelectedRegulation] = useState<string | null>(null)
@@ -58,7 +33,7 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 const [chunksPerPage, setChunksPerPage] = useState(6)
 const [fullscreen, setFullscreen] = useState(false)
-// Collection — default to bp_compliance_ce where we have PDFs downloaded
+// Collection
 const [collection, setCollection] = useState('bp_compliance_ce')
 // PDF existence check
@@ -72,7 +47,7 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 .filter(([, info]) => info.collection === collection)
 .map(([code]) => code)
-const groupedRegulations = React.useMemo(() => {
+const groupedRegulations = useMemo(() => {
 const groups: Record<RegGroupKey, { code: string; name: string; type: string }[]> = {
 eu_regulation: [], eu_directive: [], de_law: [], at_law: [], ch_law: [],
 national_law: [], bsi_standard: [], eu_guideline: [], international_standard: [], other: [],
@@ -81,11 +56,7 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 const reg = REGULATION_INFO.find(r => r.code === code)
 const type = (reg?.type || 'other') as RegGroupKey
 const groupKey = type in groups ? type : 'other'
-groups[groupKey].push({
-code,
-name: reg?.name || code,
-type: reg?.type || 'unknown',
-})
+groups[groupKey].push({ code, name: reg?.name || code, type: reg?.type || 'unknown' })
 }
 return groups
 }, [regulationsInCollection.join(',')])
@@ -96,7 +67,6 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 .filter(([, info]) => info.collection === col && info.qdrant_id)
 if (entries.length === 0) return
-// Build qdrant_id -> our_code mapping
 const qdrantIdToCode: Record<string, string[]> = {}
 for (const [code, info] of entries) {
 if (!qdrantIdToCode[info.qdrant_id]) qdrantIdToCode[info.qdrant_id] = []
@@ -114,13 +84,10 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 const res = await fetch(`${apiProxy}?${params}`)
 if (res.ok) {
 const data = await res.json()
-// Map qdrant_id counts back to our codes
 const mapped: Record<string, number> = {}
 for (const [qid, count] of Object.entries(data.counts as Record<string, number>)) {
 const codes = qdrantIdToCode[qid] || []
-for (const code of codes) {
-mapped[code] = count
-}
+for (const code of codes) { mapped[code] = count }
 }
 setRegulationCounts(prev => ({ ...prev, ...mapped }))
 }
@@ -166,7 +133,6 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 safety++
 } while (offset && safety < 200)
-// Sort by chunk_index
 allChunks.sort((a, b) => {
 const ai = Number(a.chunk_index ?? a.chunk_id ?? 0)
 const bi = Number(b.chunk_index ?? b.chunk_id ?? 0)
@@ -188,30 +154,6 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 loadRegulationCounts(collection)
 }, [collection, loadRegulationCounts])
-// Current chunk
-const currentChunk = docChunks[docChunkIndex] || null
-const prevChunk = docChunkIndex > 0 ? docChunks[docChunkIndex - 1] : null
-const nextChunk = docChunkIndex < docChunks.length - 1 ? docChunks[docChunkIndex + 1] : null
-// PDF page estimation — use pages metadata if available
-const estimatePdfPage = (chunk: Record<string, unknown> | null, chunkIdx: number): number => {
-if (chunk) {
-// Try pages array from payload (e.g. [7] or [7,8])
-const pages = chunk.pages as number[] | undefined
-if (Array.isArray(pages) && pages.length > 0) return pages[0]
-// Try page field
-const page = chunk.page as number | undefined
-if (typeof page === 'number' && page > 0) return page
-}
-const mapping = selectedRegulation ? RAG_PDF_MAPPING[selectedRegulation] : null
-const cpp = mapping?.chunksPerPage || chunksPerPage
-return Math.floor(chunkIdx / cpp) + 1
-}
-const pdfPage = estimatePdfPage(currentChunk, docChunkIndex)
-const pdfMapping = selectedRegulation ? RAG_PDF_MAPPING[selectedRegulation] : null
-const pdfUrl = pdfMapping ? `/rag-originals/${pdfMapping.filename}#page=${pdfPage}` : null
 // Check PDF existence when regulation changes
 useEffect(() => {
 if (!selectedRegulation) { setPdfExists(null); return }
@@ -223,29 +165,7 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 .catch(() => setPdfExists(false))
 }, [selectedRegulation])
-// Handlers
-const handleSelectRegulation = (code: string) => {
-setSelectedRegulation(code)
-loadDocumentChunks(code)
-}
-const handleCollectionChange = (col: string) => {
-setCollection(col)
-setSelectedRegulation(null)
-setDocChunks([])
-setDocChunkIndex(0)
-setDocTotalChunks(0)
-setRegulationCounts({})
-}
-const handlePrev = () => {
-if (docChunkIndex > 0) setDocChunkIndex(i => i - 1)
-}
-const handleNext = () => {
-if (docChunkIndex < docChunks.length - 1) setDocChunkIndex(i => i + 1)
-}
+// Keyboard navigation
 const handleKeyDown = useCallback((e: KeyboardEvent) => {
 if (e.key === 'Escape' && fullscreen) {
 e.preventDefault()
@@ -266,6 +186,21 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 }
 }, [selectedRegulation, docChunks.length, handleKeyDown, fullscreen])
+// Handlers
+const handleSelectRegulation = (code: string) => {
+setSelectedRegulation(code)
+loadDocumentChunks(code)
+}
+const handleCollectionChange = (col: string) => {
+setCollection(col)
+setSelectedRegulation(null)
+setDocChunks([])
+setDocChunkIndex(0)
+setDocTotalChunks(0)
+setRegulationCounts({})
+}
 const toggleGroup = (group: string) => {
 setCollapsedGroups(prev => {
 const next = new Set(prev)
@@ -275,47 +210,8 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 })
 }
-// Get text content from a chunk
-const getChunkText = (chunk: Record<string, unknown> | null): string => {
-if (!chunk) return ''
-return String(chunk.chunk_text || chunk.text || chunk.content || '')
-}
-// Extract structural metadata for prominent display
-const getStructuralInfo = (chunk: Record<string, unknown> | null): { article?: string; section?: string; pages?: string } => {
-if (!chunk) return {}
-const result: { article?: string; section?: string; pages?: string } = {}
-// Article / paragraph
-const article = chunk.article || chunk.artikel || chunk.paragraph || chunk.section_title
-if (article) result.article = String(article)
-// Section
-const section = chunk.section || chunk.chapter || chunk.abschnitt || chunk.kapitel
-if (section) result.section = String(section)
-// Pages
-const pages = chunk.pages as number[] | undefined
-if (Array.isArray(pages) && pages.length > 0) {
-result.pages = pages.length === 1 ? `S. ${pages[0]}` : `S. ${pages[0]}-${pages[pages.length - 1]}`
-} else if (chunk.page) {
-result.pages = `S. ${chunk.page}`
-}
-return result
-}
-// Overlap extraction
-const getOverlapPrev = (): string => {
-if (!prevChunk) return ''
-const text = getChunkText(prevChunk)
-return text.length > 150 ? '...' + text.slice(-150) : text
-}
-const getOverlapNext = (): string => {
-if (!nextChunk) return ''
-const text = getChunkText(nextChunk)
-return text.length > 150 ? text.slice(0, 150) + '...' : text
-}
 // Filter sidebar items
-const filteredRegulations = React.useMemo(() => {
+const filteredRegulations = useMemo(() => {
 if (!filterSearch.trim()) return groupedRegulations
 const term = filterSearch.toLowerCase()
 const filtered: typeof groupedRegulations = {
@@ -330,21 +226,7 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
 return filtered
 }, [groupedRegulations, filterSearch])
-// Regulation name lookup
-const getRegName = (code: string): string => {
-const reg = REGULATION_INFO.find(r => r.code === code)
-return reg?.name || code
-}
-// Important metadata keys to show prominently
-const STRUCTURAL_KEYS = new Set([
-'article', 'artikel', 'paragraph', 'section_title', 'section', 'chapter',
-'abschnitt', 'kapitel', 'pages', 'page',
-])
-const HIDDEN_KEYS = new Set([
-'text', 'content', 'chunk_text', 'id', 'embedding',
-])
+const currentChunk = docChunks[docChunkIndex] || null
 const structInfo = getStructuralInfo(currentChunk)
 return (
@@ -352,323 +234,48 @@ export function ChunkBrowserQA({ apiProxy }: ChunkBrowserQAProps) {
className={`flex flex-col ${fullscreen ? 'fixed inset-0 z-50 bg-slate-100 p-4' : ''}`} className={`flex flex-col ${fullscreen ? 'fixed inset-0 z-50 bg-slate-100 p-4' : ''}`}
style={fullscreen ? { height: '100vh' } : { height: 'calc(100vh - 220px)' }} style={fullscreen ? { height: '100vh' } : { height: 'calc(100vh - 220px)' }}
> >
{/* Header bar — fixed height */} <ChunkBrowserToolbar
<div className="flex-shrink-0 bg-white rounded-xl border border-slate-200 p-3 mb-3"> collection={collection}
<div className="flex flex-wrap items-center gap-4"> onCollectionChange={handleCollectionChange}
<div> selectedRegulation={selectedRegulation}
<label className="block text-xs font-medium text-slate-500 mb-1">Collection</label> structInfo={structInfo}
<select docChunkIndex={docChunkIndex}
value={collection} docTotalChunks={docTotalChunks}
onChange={(e) => handleCollectionChange(e.target.value)} docChunksLength={docChunks.length}
className="px-3 py-1.5 border rounded-lg text-sm focus:ring-2 focus:ring-teal-500" chunksPerPage={chunksPerPage}
> setChunksPerPage={setChunksPerPage}
{COLLECTIONS.map(c => ( splitViewActive={splitViewActive}
<option key={c} value={c}>{c}</option> setSplitViewActive={setSplitViewActive}
))} fullscreen={fullscreen}
</select> setFullscreen={setFullscreen}
</div> onPrev={() => { if (docChunkIndex > 0) setDocChunkIndex(i => i - 1) }}
onNext={() => { if (docChunkIndex < docChunks.length - 1) setDocChunkIndex(i => i + 1) }}
onJumpTo={setDocChunkIndex}
/>
{selectedRegulation && (
<>
<div className="flex items-center gap-2">
<span className="text-sm font-semibold text-slate-900">
{selectedRegulation} {getRegName(selectedRegulation)}
</span>
{structInfo.article && (
<span className="px-2 py-0.5 bg-blue-100 text-blue-800 text-xs font-medium rounded">
{structInfo.article}
</span>
)}
{structInfo.pages && (
<span className="px-2 py-0.5 bg-slate-100 text-slate-600 text-xs rounded">
{structInfo.pages}
</span>
)}
</div>
<div className="flex items-center gap-2 ml-auto">
<button
onClick={handlePrev}
disabled={docChunkIndex === 0}
className="px-3 py-1.5 text-sm font-medium border rounded-lg bg-white hover:bg-slate-50 disabled:opacity-30 disabled:cursor-not-allowed"
>
&#9664; Zurueck
</button>
<span className="text-sm font-mono text-slate-600 min-w-[80px] text-center">
{docChunkIndex + 1} / {docTotalChunks}
</span>
<button
onClick={handleNext}
disabled={docChunkIndex >= docChunks.length - 1}
className="px-3 py-1.5 text-sm font-medium border rounded-lg bg-white hover:bg-slate-50 disabled:opacity-30 disabled:cursor-not-allowed"
>
Weiter &#9654;
</button>
<input
type="number"
min={1}
max={docTotalChunks}
value={docChunkIndex + 1}
onChange={(e) => {
const v = parseInt(e.target.value, 10)
if (!isNaN(v) && v >= 1 && v <= docTotalChunks) setDocChunkIndex(v - 1)
}}
className="w-16 px-2 py-1 border rounded text-xs text-center"
title="Springe zu Chunk Nr."
/>
</div>
<div className="flex items-center gap-2">
<label className="text-xs text-slate-500">Chunks/Seite:</label>
<select
value={chunksPerPage}
onChange={(e) => setChunksPerPage(Number(e.target.value))}
className="px-2 py-1 border rounded text-xs"
>
{[3, 4, 5, 6, 8, 10, 12, 15, 20].map(n => (
<option key={n} value={n}>{n}</option>
))}
</select>
<button
onClick={() => setSplitViewActive(!splitViewActive)}
className={`px-3 py-1 text-xs rounded-lg border ${
splitViewActive ? 'bg-teal-50 border-teal-300 text-teal-700' : 'bg-slate-50 border-slate-300 text-slate-600'
}`}
>
{splitViewActive ? 'Split-View an' : 'Split-View aus'}
</button>
<button
onClick={() => setFullscreen(!fullscreen)}
className={`px-3 py-1 text-xs rounded-lg border ${
fullscreen ? 'bg-indigo-50 border-indigo-300 text-indigo-700' : 'bg-slate-50 border-slate-300 text-slate-600'
}`}
title={fullscreen ? 'Vollbild beenden (Esc)' : 'Vollbild'}
>
{fullscreen ? '&#10005; Vollbild beenden' : '&#9974; Vollbild'}
</button>
</div>
</>
)}
</div>
</div>
{/* Main content: Sidebar + Content — fills remaining height */}
<div className="flex gap-3 flex-1 min-h-0"> <div className="flex gap-3 flex-1 min-h-0">
{/* Sidebar — scrollable */} <ChunkBrowserSidebar
<div className="w-56 flex-shrink-0 bg-white rounded-xl border border-slate-200 flex flex-col min-h-0"> filterSearch={filterSearch}
<div className="flex-shrink-0 p-3 border-b border-slate-100"> setFilterSearch={setFilterSearch}
<input countsLoading={countsLoading}
type="text" filteredRegulations={filteredRegulations}
value={filterSearch} regulationCounts={regulationCounts}
onChange={(e) => setFilterSearch(e.target.value)} selectedRegulation={selectedRegulation}
placeholder="Suche..." collapsedGroups={collapsedGroups}
className="w-full px-2 py-1.5 border rounded-lg text-sm focus:ring-2 focus:ring-teal-500" onSelectRegulation={handleSelectRegulation}
/> onToggleGroup={toggleGroup}
{countsLoading && ( />
<div className="text-xs text-slate-400 mt-1 animate-pulse">Counts laden...</div>
)}
</div>
<div className="flex-1 overflow-y-auto min-h-0">
{GROUP_ORDER.map(group => {
const items = filteredRegulations[group]
if (items.length === 0) return null
const isCollapsed = collapsedGroups.has(group)
return (
<div key={group}>
<button
onClick={() => toggleGroup(group)}
className="w-full px-3 py-1.5 text-left text-xs font-semibold text-slate-500 bg-slate-50 hover:bg-slate-100 flex items-center justify-between sticky top-0 z-10"
>
<span>{GROUP_LABELS[group]}</span>
<span className="text-slate-400">{isCollapsed ? '+' : '-'}</span>
</button>
{!isCollapsed && items.map(reg => {
const count = regulationCounts[reg.code] ?? 0
const isSelected = selectedRegulation === reg.code
return (
<button
key={reg.code}
onClick={() => handleSelectRegulation(reg.code)}
className={`w-full px-3 py-1.5 text-left text-sm flex items-center justify-between hover:bg-teal-50 transition-colors ${
isSelected ? 'bg-teal-100 text-teal-900 font-medium' : 'text-slate-700'
}`}
>
<span className="truncate text-xs">{reg.name || reg.code}</span>
<span className={`text-xs tabular-nums flex-shrink-0 ml-1 ${count > 0 ? 'text-slate-500' : 'text-slate-300'}`}>
{count > 0 ? count.toLocaleString() : '—'}
</span>
</button>
)
})}
</div>
)
})}
</div>
</div>
{/* Content area — fills remaining width and height */} <ChunkBrowserContent
{!selectedRegulation ? ( selectedRegulation={selectedRegulation}
<div className="flex-1 flex items-center justify-center bg-white rounded-xl border border-slate-200"> docLoading={docLoading}
<div className="text-center text-slate-400 space-y-2"> docChunks={docChunks}
<div className="text-4xl">&#128269;</div> docChunkIndex={docChunkIndex}
<p className="text-sm">Dokument in der Sidebar auswaehlen, um QA zu starten.</p> docTotalChunks={docTotalChunks}
<p className="text-xs text-slate-300">Pfeiltasten: Chunk vor/zurueck</p> splitViewActive={splitViewActive}
</div> chunksPerPage={chunksPerPage}
</div> pdfExists={pdfExists}
) : docLoading ? ( />
<div className="flex-1 flex items-center justify-center bg-white rounded-xl border border-slate-200">
<div className="text-center text-slate-500 space-y-2">
<div className="animate-spin text-3xl">&#9881;</div>
<p className="text-sm">Chunks werden geladen...</p>
<p className="text-xs text-slate-400">
{selectedRegulation}: {REGULATIONS_IN_RAG[selectedRegulation]?.chunks.toLocaleString() || '?'} Chunks erwartet
</p>
</div>
</div>
) : (
<div className={`flex-1 grid gap-3 min-h-0 ${splitViewActive ? 'grid-cols-2' : 'grid-cols-1'}`}>
{/* Chunk-Text Panel — fixed height, internal scroll */}
<div className="bg-white rounded-xl border border-slate-200 flex flex-col min-h-0 overflow-hidden">
{/* Panel header */}
<div className="flex-shrink-0 px-4 py-2 bg-slate-50 border-b border-slate-100 flex items-center justify-between">
<span className="text-sm font-medium text-slate-700">Chunk-Text</span>
<div className="flex items-center gap-2">
{structInfo.article && (
<span className="px-2 py-0.5 bg-blue-50 text-blue-700 text-xs font-medium rounded border border-blue-200">
{structInfo.article}
</span>
)}
{structInfo.section && (
<span className="px-2 py-0.5 bg-purple-50 text-purple-700 text-xs rounded border border-purple-200">
{structInfo.section}
</span>
)}
<span className="text-xs text-slate-400 tabular-nums">
#{docChunkIndex} / {docTotalChunks - 1}
</span>
</div>
</div>
{/* Scrollable content */}
<div className="flex-1 overflow-y-auto min-h-0 p-4 space-y-3">
{/* Overlap from previous chunk */}
{prevChunk && (
<div className="text-xs text-slate-400 bg-amber-50 border-l-2 border-amber-300 px-3 py-2 rounded-r">
<div className="font-medium text-amber-600 mb-1">&#8593; Ende vorheriger Chunk #{docChunkIndex - 1}</div>
<p className="whitespace-pre-wrap break-words leading-relaxed">{getOverlapPrev()}</p>
</div>
)}
{/* Current chunk text */}
{currentChunk ? (
<div className="text-sm text-slate-800 whitespace-pre-wrap break-words leading-relaxed border-l-2 border-teal-400 pl-3">
{getChunkText(currentChunk)}
</div>
) : (
<div className="text-sm text-slate-400 italic">Kein Chunk-Text vorhanden.</div>
)}
{/* Overlap from next chunk */}
{nextChunk && (
<div className="text-xs text-slate-400 bg-amber-50 border-l-2 border-amber-300 px-3 py-2 rounded-r">
<div className="font-medium text-amber-600 mb-1">&#8595; Anfang naechster Chunk #{docChunkIndex + 1}</div>
<p className="whitespace-pre-wrap break-words leading-relaxed">{getOverlapNext()}</p>
</div>
)}
{/* Metadata */}
{currentChunk && (
<div className="mt-4 pt-3 border-t border-slate-100">
<div className="text-xs font-medium text-slate-500 mb-2">Metadaten</div>
<div className="grid grid-cols-2 gap-x-4 gap-y-1 text-xs">
{Object.entries(currentChunk)
.filter(([k]) => !HIDDEN_KEYS.has(k))
.sort(([a], [b]) => {
// Structural keys first
const aStruct = STRUCTURAL_KEYS.has(a) ? 0 : 1
const bStruct = STRUCTURAL_KEYS.has(b) ? 0 : 1
return aStruct - bStruct || a.localeCompare(b)
})
.map(([k, v]) => (
<div key={k} className={`flex gap-1 ${STRUCTURAL_KEYS.has(k) ? 'col-span-2 font-medium' : ''}`}>
<span className="font-medium text-slate-500 flex-shrink-0">{k}:</span>
<span className="text-slate-700 break-all">
{Array.isArray(v) ? v.join(', ') : String(v)}
</span>
</div>
))}
</div>
{/* Chunk quality indicator */}
<div className="mt-3 pt-2 border-t border-slate-50">
<div className="text-xs text-slate-400">
Chunk-Laenge: {getChunkText(currentChunk).length} Zeichen
{getChunkText(currentChunk).length < 50 && (
<span className="ml-2 text-orange-500 font-medium">&#9888; Sehr kurz</span>
)}
{getChunkText(currentChunk).length > 2000 && (
<span className="ml-2 text-orange-500 font-medium">&#9888; Sehr lang</span>
)}
</div>
</div>
</div>
)}
</div>
</div>
{/* PDF-Viewer Panel */}
{splitViewActive && (
<div className="bg-white rounded-xl border border-slate-200 flex flex-col min-h-0 overflow-hidden">
<div className="flex-shrink-0 px-4 py-2 bg-slate-50 border-b border-slate-100 flex items-center justify-between">
<span className="text-sm font-medium text-slate-700">Original-PDF</span>
<div className="flex items-center gap-2">
<span className="text-xs text-slate-400">
Seite ~{pdfPage}
{pdfMapping?.totalPages ? ` / ${pdfMapping.totalPages}` : ''}
</span>
{pdfUrl && (
<a
href={pdfUrl.split('#')[0]}
target="_blank"
rel="noopener noreferrer"
className="text-xs text-teal-600 hover:text-teal-800 underline"
>
Oeffnen &#8599;
</a>
)}
</div>
</div>
<div className="flex-1 min-h-0 relative">
{pdfUrl && pdfExists ? (
<iframe
key={`${selectedRegulation}-${pdfPage}`}
src={pdfUrl}
className="absolute inset-0 w-full h-full border-0"
title="Original PDF"
/>
) : (
<div className="flex items-center justify-center h-full text-slate-400 text-sm p-4">
<div className="text-center space-y-2">
<div className="text-3xl">&#128196;</div>
{!pdfMapping ? (
<>
<p>Kein PDF-Mapping fuer {selectedRegulation}.</p>
<p className="text-xs">rag-pdf-mapping.ts ergaenzen.</p>
</>
) : pdfExists === false ? (
<>
<p className="font-medium text-orange-600">PDF nicht vorhanden</p>
<p className="text-xs">Datei <code className="bg-slate-100 px-1 rounded">{pdfMapping.filename}</code> fehlt in ~/rag-originals/</p>
<p className="text-xs mt-1">Bitte manuell herunterladen und dort ablegen.</p>
</>
) : (
<p>PDF wird geprueft...</p>
)}
</div>
</div>
)}
</div>
</div>
)}
</div>
)}
</div>
</div>
)
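
Note on the chunk quality indicator in the hunk above: it flags chunks shorter than 50 characters ("Sehr kurz") and longer than 2000 characters ("Sehr lang"). A minimal sketch of the same check as a pure function follows. The names `classifyChunkLength`, `ChunkLengthFlag`, `MIN_CHUNK_CHARS` and `MAX_CHUNK_CHARS` are hypothetical and do not appear in the commit; only the two thresholds are taken from the JSX.

```typescript
// Hypothetical helper, not part of the commit: mirrors the thresholds used in the JSX.
type ChunkLengthFlag = 'too_short' | 'too_long' | 'ok'

const MIN_CHUNK_CHARS = 50 // below this the UI shows "Sehr kurz"
const MAX_CHUNK_CHARS = 2000 // above this the UI shows "Sehr lang"

function classifyChunkLength(text: string): ChunkLengthFlag {
  const len = text.length
  if (len < MIN_CHUNK_CHARS) return 'too_short'
  if (len > MAX_CHUNK_CHARS) return 'too_long'
  return 'ok' // both boundaries (50 and 2000) count as fine, as in the JSX
}
```

Computing the flag once would also avoid calling `getChunkText(currentChunk)` three times per render, though whether that matters here is an open question.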


@@ -0,0 +1,81 @@
'use client'
import React from 'react'
import { RegGroupKey, GROUP_LABELS, GROUP_ORDER } from './ChunkBrowserConstants'
interface ChunkBrowserSidebarProps {
filterSearch: string
setFilterSearch: (v: string) => void
countsLoading: boolean
filteredRegulations: Record<RegGroupKey, { code: string; name: string; type: string }[]>
regulationCounts: Record<string, number>
selectedRegulation: string | null
collapsedGroups: Set<string>
onSelectRegulation: (code: string) => void
onToggleGroup: (group: string) => void
}
export function ChunkBrowserSidebar({
filterSearch,
setFilterSearch,
countsLoading,
filteredRegulations,
regulationCounts,
selectedRegulation,
collapsedGroups,
onSelectRegulation,
onToggleGroup,
}: ChunkBrowserSidebarProps) {
return (
<div className="w-56 flex-shrink-0 bg-white rounded-xl border border-slate-200 flex flex-col min-h-0">
<div className="flex-shrink-0 p-3 border-b border-slate-100">
<input
type="text"
value={filterSearch}
onChange={(e) => setFilterSearch(e.target.value)}
placeholder="Suche..."
className="w-full px-2 py-1.5 border rounded-lg text-sm focus:ring-2 focus:ring-teal-500"
/>
{countsLoading && (
<div className="text-xs text-slate-400 mt-1 animate-pulse">Counts laden...</div>
)}
</div>
<div className="flex-1 overflow-y-auto min-h-0">
{GROUP_ORDER.map(group => {
const items = filteredRegulations[group]
if (items.length === 0) return null
const isCollapsed = collapsedGroups.has(group)
return (
<div key={group}>
<button
onClick={() => onToggleGroup(group)}
className="w-full px-3 py-1.5 text-left text-xs font-semibold text-slate-500 bg-slate-50 hover:bg-slate-100 flex items-center justify-between sticky top-0 z-10"
>
<span>{GROUP_LABELS[group]}</span>
<span className="text-slate-400">{isCollapsed ? '+' : '-'}</span>
</button>
{!isCollapsed && items.map(reg => {
const count = regulationCounts[reg.code] ?? 0
const isSelected = selectedRegulation === reg.code
return (
<button
key={reg.code}
onClick={() => onSelectRegulation(reg.code)}
className={`w-full px-3 py-1.5 text-left text-sm flex items-center justify-between hover:bg-teal-50 transition-colors ${
isSelected ? 'bg-teal-100 text-teal-900 font-medium' : 'text-slate-700'
}`}
>
<span className="truncate text-xs">{reg.name || reg.code}</span>
<span className={`text-xs tabular-nums flex-shrink-0 ml-1 ${count > 0 ? 'text-slate-500' : 'text-slate-300'}`}>
{count > 0 ? count.toLocaleString() : '\u2014'}
</span>
</button>
)
})}
</div>
)
})}
</div>
</div>
)
}
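
Note on `ChunkBrowserSidebar`: the component only receives `collapsedGroups` and calls `onToggleGroup`; the handler itself lives in the parent and is not shown in this diff. A minimal sketch of such a handler, assuming the parent keeps the set in React state, could look like this. The function name `toggleGroup` is hypothetical.

```typescript
// Hypothetical parent-side logic for the onToggleGroup prop (not shown in the diff).
// Returns a new Set instead of mutating the old one, so a state setter
// receives a fresh reference and the sidebar re-renders.
function toggleGroup(collapsed: Set<string>, group: string): Set<string> {
  const next = new Set(collapsed)
  if (next.has(group)) {
    next.delete(group) // group was collapsed: expand it
  } else {
    next.add(group) // group was expanded: collapse it
  }
  return next
}
```

In a parent component this would typically be wired up as `setCollapsedGroups(prev => toggleGroup(prev, group))`, where `setCollapsedGroups` is again an assumed name.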


@@ -0,0 +1,142 @@
'use client'
import React from 'react'
import { COLLECTIONS } from './ChunkBrowserConstants'
import { getRegName } from './ChunkBrowserHelpers'
interface ChunkBrowserToolbarProps {
collection: string
onCollectionChange: (col: string) => void
selectedRegulation: string | null
structInfo: { article?: string; section?: string; pages?: string }
docChunkIndex: number
docTotalChunks: number
docChunksLength: number
chunksPerPage: number
setChunksPerPage: (v: number) => void
splitViewActive: boolean
setSplitViewActive: (v: boolean) => void
fullscreen: boolean
setFullscreen: (v: boolean) => void
onPrev: () => void
onNext: () => void
onJumpTo: (idx: number) => void
}
export function ChunkBrowserToolbar({
collection,
onCollectionChange,
selectedRegulation,
structInfo,
docChunkIndex,
docTotalChunks,
docChunksLength,
chunksPerPage,
setChunksPerPage,
splitViewActive,
setSplitViewActive,
fullscreen,
setFullscreen,
onPrev,
onNext,
onJumpTo,
}: ChunkBrowserToolbarProps) {
return (
<div className="flex-shrink-0 bg-white rounded-xl border border-slate-200 p-3 mb-3">
<div className="flex flex-wrap items-center gap-4">
<div>
<label className="block text-xs font-medium text-slate-500 mb-1">Collection</label>
<select
value={collection}
onChange={(e) => onCollectionChange(e.target.value)}
className="px-3 py-1.5 border rounded-lg text-sm focus:ring-2 focus:ring-teal-500"
>
{COLLECTIONS.map(c => (
<option key={c} value={c}>{c}</option>
))}
</select>
</div>
{selectedRegulation && (
<>
<div className="flex items-center gap-2">
<span className="text-sm font-semibold text-slate-900">
{selectedRegulation} &mdash; {getRegName(selectedRegulation)}
</span>
{structInfo.article && (
<span className="px-2 py-0.5 bg-blue-100 text-blue-800 text-xs font-medium rounded">
{structInfo.article}
</span>
)}
{structInfo.pages && (
<span className="px-2 py-0.5 bg-slate-100 text-slate-600 text-xs rounded">
{structInfo.pages}
</span>
)}
</div>
<div className="flex items-center gap-2 ml-auto">
<button
onClick={onPrev}
disabled={docChunkIndex === 0}
className="px-3 py-1.5 text-sm font-medium border rounded-lg bg-white hover:bg-slate-50 disabled:opacity-30 disabled:cursor-not-allowed"
>
&#9664; Zurueck
</button>
<span className="text-sm font-mono text-slate-600 min-w-[80px] text-center">
{docChunkIndex + 1} / {docTotalChunks}
</span>
<button
onClick={onNext}
disabled={docChunkIndex >= docChunksLength - 1}
className="px-3 py-1.5 text-sm font-medium border rounded-lg bg-white hover:bg-slate-50 disabled:opacity-30 disabled:cursor-not-allowed"
>
Weiter &#9654;
</button>
<input
type="number"
min={1}
max={docTotalChunks}
value={docChunkIndex + 1}
onChange={(e) => {
const v = parseInt(e.target.value, 10)
if (!isNaN(v) && v >= 1 && v <= docTotalChunks) onJumpTo(v - 1)
}}
className="w-16 px-2 py-1 border rounded text-xs text-center"
title="Springe zu Chunk Nr."
/>
</div>
<div className="flex items-center gap-2">
<label className="text-xs text-slate-500">Chunks/Seite:</label>
<select
value={chunksPerPage}
onChange={(e) => setChunksPerPage(Number(e.target.value))}
className="px-2 py-1 border rounded text-xs"
>
{[3, 4, 5, 6, 8, 10, 12, 15, 20].map(n => (
<option key={n} value={n}>{n}</option>
))}
</select>
<button
onClick={() => setSplitViewActive(!splitViewActive)}
className={`px-3 py-1 text-xs rounded-lg border ${
splitViewActive ? 'bg-teal-50 border-teal-300 text-teal-700' : 'bg-slate-50 border-slate-300 text-slate-600'
}`}
>
{splitViewActive ? 'Split-View an' : 'Split-View aus'}
</button>
<button
onClick={() => setFullscreen(!fullscreen)}
className={`px-3 py-1 text-xs rounded-lg border ${
fullscreen ? 'bg-indigo-50 border-indigo-300 text-indigo-700' : 'bg-slate-50 border-slate-300 text-slate-600'
}`}
title={fullscreen ? 'Vollbild beenden (Esc)' : 'Vollbild'}
>
{fullscreen ? '\u2715 Vollbild beenden' : '\u2716 Vollbild'}
</button>
</div>
</>
)}
</div>
</div>
)
}
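
Note on the jump-to input in `ChunkBrowserToolbar`: the field shows a 1-based chunk number, validates it against `docTotalChunks`, and passes a 0-based index to `onJumpTo`. The same rule is sketched below as a standalone function. The name `parseJumpTarget` is hypothetical; the parsing and range check are taken from the `onChange` handler.

```typescript
// Hypothetical helper: converts the 1-based input value to a 0-based chunk index.
// Returns null when the input is not a number or is out of range,
// which corresponds to the cases where the handler does not call onJumpTo.
function parseJumpTarget(raw: string, totalChunks: number): number | null {
  const v = parseInt(raw, 10)
  if (isNaN(v) || v < 1 || v > totalChunks) return null
  return v - 1
}
```

Pulling the check out like this would make the boundary cases (0, `totalChunks`, empty input) easy to unit-test without rendering the toolbar.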

File diff suppressed because it is too large


@@ -0,0 +1,352 @@
/**
* RAG & Legal Corpus Management - Static Data
*
* Core data constants: regulations, industries, thematic groups, etc.
* Source URLs and licenses are in rag-sources.ts.
*/
import { REGULATIONS_IN_RAG } from './rag-constants'
import ragData from './rag-documents.json'
import type {
Regulation,
Industry,
ThematicGroup,
KeyIntersection,
FutureOutlookItem,
AdditionalRegulation,
LegalBasisInfo,
TabDef,
} from './types'
// Re-export source URLs, licenses and license labels from rag-sources.ts
export {
REGULATION_SOURCES,
REGULATION_LICENSES,
LICENSE_LABELS,
} from './rag-sources'
// API uses local proxy route to klausur-service
export const API_PROXY = '/api/legal-corpus'
export const DSFA_API_PROXY = '/api/dsfa-corpus'
// Import documents and metadata from JSON
export const RAG_DOCUMENTS = ragData.documents
export const DOC_TYPES = ragData.doc_types
export const INDUSTRIES_LIST = ragData.industries
// Derive REGULATIONS from JSON (backwards compatible for regulations tab)
export const REGULATIONS: Regulation[] = RAG_DOCUMENTS.filter((d: any) => d.description).map((d: any) => ({
code: d.code,
name: d.name,
fullName: d.full_name || d.name,
type: d.doc_type,
expected: 0,
description: d.description || '',
relevantFor: [] as string[],
keyTopics: [] as string[],
effectiveDate: d.effective_date || ''
}))
// Helper: Check if regulation is in RAG
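// Own-property check: `in` would also match inherited keys such as 'toString'
export const isInRag = (code: string): boolean => Object.prototype.hasOwnProperty.call(REGULATIONS_IN_RAG, code)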
// Helper: Get known chunk count for a regulation
export const getKnownChunks = (code: string): number => REGULATIONS_IN_RAG[code]?.chunks || 0
// Known collection totals (updated: 2026-03-12)
export const COLLECTION_TOTALS = {
bp_compliance_gesetze: 63567,
bp_compliance_ce: 18183,
bp_legal_templates: 7689,
bp_compliance_datenschutz: 17459,
bp_dsfa_corpus: 8666,
bp_compliance_recht: 1425,
bp_nibis_eh: 7996,
total_legal: 81750,
total_all: 124985,
}
export const TYPE_COLORS: Record<string, string> = {
eu_regulation: 'bg-blue-100 text-blue-700',
eu_directive: 'bg-purple-100 text-purple-700',
de_law: 'bg-yellow-100 text-yellow-700',
at_law: 'bg-red-100 text-red-700',
ch_law: 'bg-rose-100 text-rose-700',
bsi_standard: 'bg-green-100 text-green-700',
national_law: 'bg-orange-100 text-orange-700',
eu_guideline: 'bg-teal-100 text-teal-700',
}
export const TYPE_LABELS: Record<string, string> = {
eu_regulation: 'EU-VO',
eu_directive: 'EU-RL',
de_law: 'DE-Gesetz',
at_law: 'AT-Gesetz',
ch_law: 'CH-Gesetz',
bsi_standard: 'BSI',
national_law: 'Nat. Gesetz',
eu_guideline: 'EDPB-GL',
}
// Industries for backward compatibility
export const INDUSTRIES: Industry[] = INDUSTRIES_LIST.map((ind: any) => ({
id: ind.id,
name: ind.name,
icon: ind.icon,
description: ''
}))
// Derive industry map from document data
export const INDUSTRY_REGULATION_MAP: Record<string, string[]> = {}
for (const ind of INDUSTRIES_LIST) {
INDUSTRY_REGULATION_MAP[ind.id] = RAG_DOCUMENTS
.filter((d: any) => d.industries.includes(ind.id) || d.industries.includes('all'))
.map((d: any) => d.code)
}
// Thematic groupings showing overlaps
export const THEMATIC_GROUPS: ThematicGroup[] = [
{
id: 'datenschutz',
name: 'Datenschutz & Privacy',
color: 'bg-blue-500',
regulations: ['GDPR', 'EPRIVACY', 'TDDDG', 'SCC', 'DPF'],
description: 'Schutz personenbezogener Daten, Einwilligung, Betroffenenrechte'
},
{
id: 'cybersecurity',
name: 'Cybersicherheit',
color: 'bg-red-500',
regulations: ['NIS2', 'EUCSA', 'CRA', 'BSI-TR-03161-1', 'BSI-TR-03161-2', 'BSI-TR-03161-3', 'DORA'],
description: 'IT-Sicherheit, Risikomanagement, Incident Response'
},
{
id: 'ai',
name: 'Kuenstliche Intelligenz',
color: 'bg-purple-500',
regulations: ['AIACT', 'PLD', 'GPSR'],
description: 'KI-Regulierung, Hochrisiko-Systeme, Haftung'
},
{
id: 'digital-markets',
name: 'Digitale Maerkte & Plattformen',
color: 'bg-green-500',
regulations: ['DSA', 'DGA', 'DATAACT', 'DSM'],
description: 'Plattformregulierung, Datenzugang, Urheberrecht'
},
{
id: 'product-safety',
name: 'Produktsicherheit & Haftung',
color: 'bg-orange-500',
regulations: ['CRA', 'PLD', 'GPSR', 'EAA', 'MACHINERY_REG', 'BLUE_GUIDE'],
description: 'Sicherheitsanforderungen, CE-Kennzeichnung, Maschinenverordnung, Barrierefreiheit'
},
{
id: 'finance',
name: 'Finanzmarktregulierung',
color: 'bg-emerald-500',
regulations: ['DORA', 'PSD2', 'AMLR', 'MiCA'],
description: 'Zahlungsdienste, Krypto-Assets, Geldwaeschebekaempfung, digitale Resilienz'
},
{
id: 'health',
name: 'Gesundheitsdaten',
color: 'bg-pink-500',
regulations: ['EHDS', 'BSI-TR-03161-1', 'BSI-TR-03161-2', 'BSI-TR-03161-3'],
description: 'Gesundheitsdatenraum, DiGA-Sicherheit, Patientenrechte'
},
{
id: 'verbraucherschutz',
name: 'Verbraucherschutz & E-Commerce',
color: 'bg-amber-500',
regulations: ['DE_PANGV', 'DE_VSBG', 'DE_PRODHAFTG', 'DE_UWG', 'DE_BFSG',
'WARENKAUF_RL', 'KLAUSEL_RL', 'UNLAUTERE_PRAKTIKEN_RL', 'PREISANGABEN_RL',
'OMNIBUS_RL', 'E_COMMERCE_RL', 'VERBRAUCHERRECHTE_RL', 'DIGITALE_INHALTE_RL'],
description: 'Widerrufsrecht, Preisangaben, Fernabsatz, AGB-Recht, Barrierefreiheit'
},
]
// Key overlaps and intersections
export const KEY_INTERSECTIONS: KeyIntersection[] = [
{
regulations: ['GDPR', 'AIACT'],
topic: 'KI und personenbezogene Daten',
description: 'Automatisierte Entscheidungen, Profiling, Erklaerbarkeit'
},
{
regulations: ['NIS2', 'CRA'],
topic: 'Cybersicherheit von Produkten',
description: 'Sicherheitsanforderungen ueber den gesamten Lebenszyklus'
},
{
regulations: ['AIACT', 'PLD'],
topic: 'KI-Haftung',
description: 'Wer haftet, wenn KI Schaeden verursacht?'
},
{
regulations: ['DSA', 'GDPR'],
topic: 'Plattform-Transparenz',
description: 'Inhaltsmoderation und Datenschutz'
},
{
regulations: ['DATAACT', 'GDPR'],
topic: 'Datenzugang vs. Datenschutz',
description: 'Balance zwischen Datenteilung und Privacy'
},
{
regulations: ['CRA', 'GPSR'],
topic: 'Digitale Produktsicherheit',
description: 'Hardware mit Software-Komponenten'
},
]
// Future outlook - proposed and discussed regulations
export const FUTURE_OUTLOOK: FutureOutlookItem[] = [
{
id: 'digital-omnibus',
name: 'EU Digital Omnibus',
status: 'proposed',
statusLabel: 'Vorgeschlagen Nov 2025',
expectedDate: '2026/2027',
description: 'Umfassendes Vereinfachungspaket fuer AI Act, DSGVO und Cybersicherheit. Ziel: 5 Mrd. EUR Einsparung bei Verwaltungskosten.',
keyChanges: [
'AI Act: Verschiebung Hochrisiko-Pflichten um bis zu 16 Monate (bis Dez 2027)',
'AI Act: Vereinfachte Dokumentation fuer KMU und Small Midcaps',
'AI Act: EU-weite regulatorische Sandbox fuer KI-Tests',
'DSGVO: Cookie-Banner-Reform - Berechtigtes Interesse statt nur Einwilligung',
'DSGVO: Automatische Privacy-Signale via Browser statt Pop-ups',
'Cybersecurity: Single Entry Point fuer Meldepflichten'
],
affectedRegulations: ['AIACT', 'GDPR', 'NIS2', 'CRA', 'EUCSA'],
source: 'https://digital-strategy.ec.europa.eu/en/library/digital-omnibus-ai-regulation-proposal'
},
{
id: 'sustainability-omnibus',
name: 'EU Nachhaltigkeits-Omnibus',
status: 'agreed',
statusLabel: 'Einigung Dez 2025',
expectedDate: 'Q1 2026',
description: 'Drastische Reduzierung der Nachhaltigkeits-Berichtspflichten. Anwendungsbereich wird stark eingeschraenkt.',
keyChanges: [
'CSRD: Nur noch Unternehmen >1.000 MA und >450 Mio EUR Umsatz berichtspflichtig',
'CSRD: Betroffene Unternehmen sinken von 50.000 auf ca. 5.000 in der EU',
'CSRD: Verschiebung Welle 2+3 um 2 Jahre (auf Geschaeftsjahr 2027)',
'CSDDD: Nur noch Unternehmen >5.000 MA und >1,5 Mrd EUR Umsatz',
'CSDDD: Sorgfaltspflichten nur noch fuer Tier-1-Lieferanten',
'CSDDD: Pruefung nur noch alle 5 Jahre statt jaehrlich'
],
affectedRegulations: ['CSRD', 'CSDDD', 'EU-Taxonomie'],
source: 'https://kpmg-law.de/erste-omnibus-verordnung-soll-die-pflichten-der-csddd-csrd-und-eu-taxonomie-lockern/'
},
{
id: 'eprivacy-withdrawal',
name: 'ePrivacy-Verordnung',
status: 'withdrawn',
statusLabel: 'Zurueckgezogen Feb 2025',
expectedDate: 'Unbekannt',
description: 'Nach 9 Jahren Verhandlung hat die EU-Kommission den Vorschlag zurueckgezogen. Die ePrivacy-Richtlinie bleibt in Kraft, Cookie-Reform kommt via DSGVO/Digital Omnibus.',
keyChanges: [
'Urspruenglicher Vorschlag: Einheitliche EU-Cookie-Regeln',
'Urspruenglicher Vorschlag: Strikte Tracking-Einwilligung',
'Status: ePrivacy-Richtlinie + TDDDG bleiben gueltig',
'Zukunft: Cookie-Reform wird Teil der DSGVO-Aenderungen'
],
affectedRegulations: ['EPRIVACY', 'TDDDG', 'GDPR'],
source: 'https://netzpolitik.org/2025/cookie-banner-und-online-tracking-eu-kommission-beerdigt-plaene-fuer-eprivacy-verordnung/'
},
{
id: 'ai-liability',
name: 'KI-Haftungsrichtlinie',
status: 'pending',
statusLabel: 'In Verhandlung',
expectedDate: '2026',
description: 'Ergaenzt den AI Act um zivilrechtliche Haftungsregeln. Erleichtert Geschaedigten die Beweisfuehrung bei KI-Schaeden.',
keyChanges: [
'Beweislasterleichterung bei KI-verursachten Schaeden',
'Offenlegungspflichten fuer KI-Anbieter im Schadensfall',
'Verknuepfung mit Produkthaftungsrichtlinie'
],
affectedRegulations: ['AIACT', 'PLD'],
source: 'https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52022PC0496'
},
]
// Potential future regulations (not yet integrated)
export const ADDITIONAL_REGULATIONS: AdditionalRegulation[] = [
{
code: 'PSD3',
name: 'Payment Services Directive 3',
fullName: 'Richtlinie zur dritten Zahlungsdiensterichtlinie (Entwurf)',
type: 'eu_directive',
status: 'proposed',
effectiveDate: 'Voraussichtlich 2026',
description: 'Modernisierung der Zahlungsdienste-Regulierung. Staerkerer Verbraucherschutz, Open Banking 2.0, Betrugsbekaempfung. Ersetzt dann PSD2.',
relevantFor: ['Banken', 'Zahlungsdienstleister', 'Fintechs', 'E-Commerce'],
celex: '52023PC0366',
priority: 'medium'
},
{
code: 'AMLD6',
name: 'AML-Richtlinie 6',
fullName: 'Richtlinie (EU) 2024/1640 - 6. Geldwaescherichtlinie',
type: 'eu_directive',
status: 'active',
effectiveDate: '10. Juli 2027 (Umsetzung)',
description: 'Ergaenzt die AML-Verordnung. Nationale Umsetzungsvorschriften, strafrechtliche Sanktionen, AMLA-Behoerde.',
relevantFor: ['Banken', 'Krypto-Anbieter', 'Immobilienmakler', 'Gluecksspielanbieter'],
celex: '32024L1640',
priority: 'medium'
},
{
code: 'FIDA',
name: 'Financial Data Access',
fullName: 'Verordnung zum Zugang zu Finanzdaten (Entwurf)',
type: 'eu_regulation',
status: 'proposed',
effectiveDate: 'Voraussichtlich 2027',
description: 'Open Finance Framework - erweitert PSD2-Open-Banking auf Versicherungen, Investitionen, Kredite.',
relevantFor: ['Banken', 'Versicherungen', 'Fintechs', 'Datenaggregatoren'],
celex: '52023PC0360',
priority: 'medium'
},
]
// Legal basis for using EUR-Lex content
export const LEGAL_BASIS_INFO: LegalBasisInfo = {
title: 'Rechtliche Grundlage fuer RAG-Nutzung',
summary: 'EU-Rechtstexte auf EUR-Lex sind oeffentliche amtliche Dokumente und duerfen frei verwendet werden.',
details: [
{
aspect: 'EUR-Lex Dokumente',
status: 'Erlaubt',
explanation: 'Offizielle EU-Gesetzestexte, Richtlinien und Verordnungen sind gemeinfrei (Public Domain) und duerfen frei reproduziert und kommerziell genutzt werden.'
},
{
aspect: 'Text-und-Data-Mining (TDM)',
status: 'Erlaubt',
explanation: 'Art. 4 der DSM-Richtlinie (2019/790) erlaubt TDM fuer kommerzielle Zwecke, sofern kein Opt-out des Rechteinhabers vorliegt. Fuer amtliche Texte gilt kein Opt-out.'
},
{
aspect: 'AI Act Anforderungen',
status: 'Beachten',
explanation: 'Art. 53 AI Act verlangt von GPAI-Anbietern die Einhaltung des Urheberrechts. Fuer oeffentliche Rechtstexte unproblematisch.'
},
{
aspect: 'BSI-Richtlinien',
status: 'Erlaubt',
explanation: 'BSI-Publikationen sind oeffentlich zugaenglich und duerfen fuer Compliance-Zwecke verwendet werden.'
},
]
}
// Tab definitions
export const TABS: TabDef[] = [
{ id: 'overview', name: 'Uebersicht', icon: '📊' },
{ id: 'regulations', name: 'Regulierungen', icon: '📜' },
{ id: 'map', name: 'Landkarte', icon: '🗺️' },
{ id: 'search', name: 'Suche', icon: '🔍' },
{ id: 'chunks', name: 'Chunk-Browser', icon: '🧩' },
{ id: 'data', name: 'Daten', icon: '📁' },
{ id: 'ingestion', name: 'Ingestion', icon: '⚙️' },
{ id: 'pipeline', name: 'Pipeline', icon: '🔄' },
]
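
Note on `COLLECTION_TOTALS`: the two aggregate values are hard-coded next to the per-collection counts. With the numbers in this commit, `total_all` equals the sum of all seven collections and `total_legal` equals `bp_compliance_gesetze` plus `bp_compliance_ce`. Whether that is the intended definition of "legal" is an assumption read off the numbers, not something the diff states. A small sketch that recomputes both sums follows; `sumCounts`, `collections`, `totalAll` and `totalLegal` are hypothetical names.

```typescript
// Per-collection counts copied from COLLECTION_TOTALS (without the two aggregates).
const collections: Record<string, number> = {
  bp_compliance_gesetze: 63567,
  bp_compliance_ce: 18183,
  bp_legal_templates: 7689,
  bp_compliance_datenschutz: 17459,
  bp_dsfa_corpus: 8666,
  bp_compliance_recht: 1425,
  bp_nibis_eh: 7996,
}

// Hypothetical helper: sums the counts for the given keys (all keys by default).
function sumCounts(counts: Record<string, number>, keys: string[] = Object.keys(counts)): number {
  let sum = 0
  for (const k of keys) sum += counts[k] ?? 0
  return sum
}

const totalAll = sumCounts(collections)
// Assumption: "legal" means the two collections below; this matches total_legal numerically.
const totalLegal = sumCounts(collections, ['bp_compliance_gesetze', 'bp_compliance_ce'])
```

Deriving the aggregates this way, or asserting them in a test, would keep them from drifting the next time a single collection count is updated.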

File diff suppressed because it is too large


@@ -0,0 +1,221 @@
/**
* RAG - Regulation Source URLs and License Information
*
* Extracted from rag-data.ts to stay under 500 LOC per file.
*/
// Source URLs for original documents (click to view original)
export const REGULATION_SOURCES: Record<string, string> = {
// EU Verordnungen/Richtlinien (EUR-Lex)
GDPR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32016R0679',
EPRIVACY: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32002L0058',
SCC: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32021D0914',
DPF: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023D1795',
AIACT: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024R1689',
CRA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024R2847',
NIS2: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022L2555',
EUCSA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019R0881',
DATAACT: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R2854',
DGA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R0868',
DSA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R2065',
EAA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019L0882',
DSM: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019L0790',
PLD: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024L2853',
GPSR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R0988',
DORA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R2554',
PSD2: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32015L2366',
AMLR: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32024R1624',
MiCA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1114',
EHDS: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32025R0327',
SCC_FULL_TEXT: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32021D0914',
E_COMMERCE_RL: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32000L0031',
VERBRAUCHERRECHTE_RL: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32011L0083',
DIGITALE_INHALTE_RL: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32019L0770',
DMA: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32022R1925',
MACHINERY_REG: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1230',
BLUE_GUIDE: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:52022XC0629(04)',
EU_IFRS: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1803',
// EDPB Guidelines
EDPB_GUIDELINES_2_2019: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-22019-processing-personal-data-under-article-61b_en',
EDPB_GUIDELINES_3_2019: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-32019-processing-personal-data-through-video_en',
EDPB_GUIDELINES_5_2020: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-052020-consent-under-regulation-2016679_en',
EDPB_GUIDELINES_7_2020: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-072020-concepts-controller-and-processor-gdpr_en',
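// Note: the key reads 1/2022, but this URL is the EDPB page for Guidelines 04/2022 (administrative fines)
EDPB_GUIDELINES_1_2022: 'https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-042022-calculation-administrative-fines-under-gdpr_en',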
// BSI Technische Richtlinien
'BSI-TR-03161-1': 'https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/TechnischeRichtlinien/TR03161/BSI-TR-03161-1.html',
'BSI-TR-03161-2': 'https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/TechnischeRichtlinien/TR03161/BSI-TR-03161-2.html',
'BSI-TR-03161-3': 'https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/TechnischeRichtlinien/TR03161/BSI-TR-03161-3.html',
// Nationale Datenschutzgesetze
AT_DSG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001597',
BDSG_FULL: 'https://www.gesetze-im-internet.de/bdsg_2018/',
CH_DSG: 'https://www.fedlex.admin.ch/eli/cc/2022/491/de',
LI_DSG: 'https://www.gesetze.li/konso/2018.272',
BE_DPA_LAW: 'https://www.autoriteprotectiondonnees.be/citoyen/la-loi-du-30-juillet-2018',
NL_UAVG: 'https://wetten.overheid.nl/BWBR0040940/',
FR_CNIL_GUIDE: 'https://www.cnil.fr/fr/rgpd-par-ou-commencer',
ES_LOPDGDD: 'https://www.boe.es/buscar/act.php?id=BOE-A-2018-16673',
IT_CODICE_PRIVACY: 'https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/9042678',
IE_DPA_2018: 'https://www.irishstatutebook.ie/eli/2018/act/7/enacted/en/html',
UK_DPA_2018: 'https://www.legislation.gov.uk/ukpga/2018/12/contents',
UK_GDPR: 'https://www.legislation.gov.uk/eur/2016/679/contents',
NO_PERSONOPPLYSNINGSLOVEN: 'https://lovdata.no/dokument/NL/lov/2018-06-15-38',
SE_DATASKYDDSLAG: 'https://www.riksdagen.se/sv/dokument-och-lagar/dokument/svensk-forfattningssamling/lag-2018218-med-kompletterande-bestammelser_sfs-2018-218/',
FI_TIETOSUOJALAKI: 'https://www.finlex.fi/fi/laki/ajantasa/2018/20181050',
PL_UODO: 'https://isap.sejm.gov.pl/isap.nsf/DocDetails.xsp?id=WDU20180001000',
CZ_ZOU: 'https://www.zakonyprolidi.cz/cs/2019-110',
HU_INFOTV: 'https://net.jogtar.hu/jogszabaly?docid=a1100112.tv',
LU_DPA_LAW: 'https://legilux.public.lu/eli/etat/leg/loi/2018/08/01/a686/jo',
DK_DATABESKYTTELSESLOVEN: 'https://www.retsinformation.dk/eli/lta/2018/502',
// Deutschland — Weitere Gesetze
TDDDG: 'https://www.gesetze-im-internet.de/tdddg/',
DE_DDG: 'https://www.gesetze-im-internet.de/ddg/',
DE_BGB_AGB: 'https://www.gesetze-im-internet.de/bgb/__305.html',
DE_EGBGB: 'https://www.gesetze-im-internet.de/bgbeg/art_246.html',
DE_UWG: 'https://www.gesetze-im-internet.de/uwg_2004/',
DE_HGB_RET: 'https://www.gesetze-im-internet.de/hgb/__257.html',
DE_AO_RET: 'https://www.gesetze-im-internet.de/ao_1977/__147.html',
DE_TKG: 'https://www.gesetze-im-internet.de/tkg_2021/',
DE_PANGV: 'https://www.gesetze-im-internet.de/pangv_2022/',
DE_DLINFOV: 'https://www.gesetze-im-internet.de/dlinfov/',
DE_BETRVG: 'https://www.gesetze-im-internet.de/betrvg/__87.html',
DE_GESCHGEHG: 'https://www.gesetze-im-internet.de/geschgehg/',
DE_BSIG: 'https://www.gesetze-im-internet.de/bsig_2009/',
DE_USTG_RET: 'https://www.gesetze-im-internet.de/ustg_1980/__14b.html',
// Oesterreich — Weitere Gesetze
AT_ECG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20001703',
AT_TKG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20007898',
AT_KSCHG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10002462',
AT_FAGG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20008783',
AT_UGB_RET: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001702',
AT_BAO_RET: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10003940',
AT_MEDIENG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10000719',
AT_ABGB_AGB: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001622',
AT_UWG: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10002665',
// Schweiz
CH_DSV: 'https://www.fedlex.admin.ch/eli/cc/2022/568/de',
CH_OR_AGB: 'https://www.fedlex.admin.ch/eli/cc/27/317_321_377/de',
CH_UWG: 'https://www.fedlex.admin.ch/eli/cc/1988/223_223_223/de',
CH_FMG: 'https://www.fedlex.admin.ch/eli/cc/1997/2187_2187_2187/de',
CH_GEBUV: 'https://www.fedlex.admin.ch/eli/cc/2002/249/de',
CH_ZERTES: 'https://www.fedlex.admin.ch/eli/cc/2016/752/de',
CH_ZGB_PERS: 'https://www.fedlex.admin.ch/eli/cc/24/233_245_233/de',
// Industrie-Compliance
ENISA_SECURE_BY_DESIGN: 'https://www.enisa.europa.eu/publications/secure-development-best-practices',
ENISA_SUPPLY_CHAIN: 'https://www.enisa.europa.eu/publications/threat-landscape-for-supply-chain-attacks',
NIST_SSDF: 'https://csrc.nist.gov/pubs/sp/800/218/final',
NIST_CSF_2: 'https://www.nist.gov/cyberframework',
OECD_AI_PRINCIPLES: 'https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449',
// IFRS / EFRAG
EU_IFRS_DE: 'https://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX:32023R1803',
EU_IFRS_EN: 'https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32023R1803',
EFRAG_ENDORSEMENT: 'https://www.efrag.org/activities/endorsement-status-report',
// Full-text Datenschutzgesetz AT
AT_DSG_FULL: 'https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10001597',
}
// License info for each regulation
export const REGULATION_LICENSES: Record<string, { license: string; licenseNote: string }> = {
GDPR: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk der EU — frei verwendbar' },
EPRIVACY: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
TDDDG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
SCC: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Durchfuehrungsbeschluss — amtliches Werk' },
DPF: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Angemessenheitsbeschluss — amtliches Werk' },
AIACT: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
CRA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
NIS2: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
EUCSA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
DATAACT: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
DGA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
DSA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
EAA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
DSM: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
PLD: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
GPSR: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
'BSI-TR-03161-1': { license: 'DL-DE-BY-2.0', licenseNote: 'Datenlizenz Deutschland — Namensnennung 2.0' },
'BSI-TR-03161-2': { license: 'DL-DE-BY-2.0', licenseNote: 'Datenlizenz Deutschland — Namensnennung 2.0' },
'BSI-TR-03161-3': { license: 'DL-DE-BY-2.0', licenseNote: 'Datenlizenz Deutschland — Namensnennung 2.0' },
DORA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
PSD2: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
AMLR: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
MiCA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
EHDS: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
AT_DSG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
BDSG_FULL: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
CH_DSG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
LI_DSG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Liechtenstein — frei verwendbar' },
BE_DPA_LAW: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Belgien — frei verwendbar' },
NL_UAVG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Niederlande — frei verwendbar' },
FR_CNIL_GUIDE: { license: 'PUBLIC_DOMAIN', licenseNote: 'CNIL — oeffentliches Dokument' },
ES_LOPDGDD: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Spanien (BOE) — frei verwendbar' },
IT_CODICE_PRIVACY: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Italien — frei verwendbar' },
IE_DPA_2018: { license: 'OGL-3.0', licenseNote: 'Open Government Licence v3.0 — Ireland' },
UK_DPA_2018: { license: 'OGL-3.0', licenseNote: 'Open Government Licence v3.0 — UK' },
UK_GDPR: { license: 'OGL-3.0', licenseNote: 'Open Government Licence v3.0 — UK' },
NO_PERSONOPPLYSNINGSLOVEN: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Norwegen — frei verwendbar' },
SE_DATASKYDDSLAG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweden — frei verwendbar' },
FI_TIETOSUOJALAKI: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Finnland — frei verwendbar' },
PL_UODO: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Polen — frei verwendbar' },
CZ_ZOU: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Tschechien — frei verwendbar' },
HU_INFOTV: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Ungarn — frei verwendbar' },
SCC_FULL_TEXT: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Durchfuehrungsbeschluss — amtliches Werk' },
EDPB_GUIDELINES_2_2019: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
EDPB_GUIDELINES_3_2019: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
EDPB_GUIDELINES_5_2020: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
EDPB_GUIDELINES_7_2020: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
MACHINERY_REG: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
BLUE_GUIDE: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Leitfaden — amtliches Werk der Kommission' },
ENISA_SECURE_BY_DESIGN: { license: 'CC-BY-4.0', licenseNote: 'ENISA Publication — CC BY 4.0' },
ENISA_SUPPLY_CHAIN: { license: 'CC-BY-4.0', licenseNote: 'ENISA Publication — CC BY 4.0' },
NIST_SSDF: { license: 'PUBLIC_DOMAIN', licenseNote: 'US Government Work — Public Domain' },
NIST_CSF_2: { license: 'PUBLIC_DOMAIN', licenseNote: 'US Government Work — Public Domain' },
OECD_AI_PRINCIPLES: { license: 'PUBLIC_DOMAIN', licenseNote: 'OECD Legal Instrument — Reuse Notice' },
EU_IFRS_DE: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
EU_IFRS_EN: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
EFRAG_ENDORSEMENT: { license: 'PUBLIC_DOMAIN', licenseNote: 'EFRAG — oeffentliches Dokument' },
DE_DDG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_BGB_AGB: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_EGBGB: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_UWG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_HGB_RET: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_AO_RET: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_TKG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_PANGV: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsche Verordnung — amtliches Werk (§5 UrhG)' },
DE_DLINFOV: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsche Verordnung — amtliches Werk (§5 UrhG)' },
DE_BETRVG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_GESCHGEHG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_BSIG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
DE_USTG_RET: { license: 'PUBLIC_DOMAIN', licenseNote: 'Deutsches Bundesgesetz — amtliches Werk (§5 UrhG)' },
AT_ECG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_TKG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_KSCHG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_FAGG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_UGB_RET: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_BAO_RET: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_MEDIENG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_ABGB_AGB: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
AT_UWG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Oesterreich — frei verwendbar' },
CH_DSV: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
CH_OR_AGB: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
CH_UWG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
CH_FMG: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
CH_GEBUV: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
CH_ZERTES: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
CH_ZGB_PERS: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Schweiz — frei verwendbar' },
LU_DPA_LAW: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Luxemburg — frei verwendbar' },
DK_DATABESKYTTELSESLOVEN: { license: 'PUBLIC_DOMAIN', licenseNote: 'Amtliches Werk Daenemark — frei verwendbar' },
EDPB_GUIDELINES_1_2022: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
E_COMMERCE_RL: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
VERBRAUCHERRECHTE_RL: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
DIGITALE_INHALTE_RL: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Richtlinie — amtliches Werk' },
DMA: { license: 'PUBLIC_DOMAIN', licenseNote: 'EU-Verordnung — amtliches Werk' },
}
// License display labels
export const LICENSE_LABELS: Record<string, string> = {
PUBLIC_DOMAIN: 'Public Domain',
'DL-DE-BY-2.0': 'DL-DE-BY 2.0',
'CC-BY-4.0': 'CC BY 4.0',
'EDPB-LICENSE': 'EDPB License',
'OGL-3.0': 'OGL v3.0',
PROPRIETARY: 'Proprietaer',
}
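
A minimal sketch of how the two maps above could be combined to turn a regulation code into a display label. The names `LicenseInfo`, `SAMPLE_LICENSES`, `SAMPLE_LABELS`, and `licenseLabelFor` are assumptions made for illustration and do not appear in the diff. The `EXAMPLE_DOC` entry and its `CUSTOM-1.0` key are also hypothetical. They are only there to show the fallback when a license key has no label.

```typescript
// Sketch only: every name declared here is hypothetical, not taken from the diff.
interface LicenseInfo {
  license: string
  licenseNote: string
}

// Small excerpt in the shape of the per-regulation map, so the sketch is self-contained.
const SAMPLE_LICENSES: Record<string, LicenseInfo> = {
  EDPB_GUIDELINES_2_2019: { license: 'EDPB-LICENSE', licenseNote: 'EDPB Document License' },
  // Hypothetical entry whose license key has no display label.
  EXAMPLE_DOC: { license: 'CUSTOM-1.0', licenseNote: 'Hypothetical example entry' },
}

// Excerpt in the shape of LICENSE_LABELS.
const SAMPLE_LABELS: Record<string, string> = {
  PUBLIC_DOMAIN: 'Public Domain',
  'EDPB-LICENSE': 'EDPB License',
}

// Resolve a regulation code to a human-readable license label.
// Unknown codes yield 'Unknown'; known codes without a label fall back to the raw key.
function licenseLabelFor(code: string): string {
  const info: LicenseInfo | undefined = SAMPLE_LICENSES[code]
  if (info === undefined) return 'Unknown'
  return SAMPLE_LABELS[info.license] ?? info.license
}
```

Falling back to the raw license key instead of throwing keeps a list view renderable when a new license identifier is added to the map before its label is. Whether the actual UI behaves this way is not visible in the diff.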

Some files were not shown because too many files have changed in this diff