Compare commits: 96ea23164d...main (108 commits)
.claude/rules/ocr-pipeline-extensions.md (new file, 237 lines)
@@ -0,0 +1,237 @@
# OCR Pipeline Extensions - Developer Documentation

**Status:** Production
**Last updated:** 2026-04-15
**URL:** https://macmini:3002/ai/ocr-kombi

---

## Overview

Extensions to the OCR Kombi pipeline (14 steps, 0-13):

- **SmartSpellChecker** — LLM-free OCR correction with language detection
- **Box-Grid-Review** (step 11) — processes embedded boxes
- **Ansicht/Spreadsheet** (step 12) — Fortune Sheet Excel editor

---
## Pipeline Steps

| Step | ID | Name | Component |
|------|----|------|-----------|
| 0 | upload | Upload | StepUpload |
| 1 | orientation | Orientation | StepOrientation |
| 2 | page-split | Page split | StepPageSplit |
| 3 | deskew | Deskew | StepDeskew |
| 4 | dewarp | Dewarp | StepDewarp |
| 5 | content-crop | Crop | StepContentCrop |
| 6 | ocr | OCR | StepOcr |
| 7 | structure | Structure detection | StepStructure |
| 8 | grid-build | Grid build | StepGridBuild |
| 9 | grid-review | Grid review | StepGridReview |
| 10 | gutter-repair | Word correction | StepGutterRepair |
| **11** | **box-review** | **Box review** | **StepBoxGridReview** |
| **12** | **ansicht** | **Ansicht** | **StepAnsicht** |
| 13 | ground-truth | Ground truth | StepGroundTruth |

Step definitions: `admin-lehrer/app/(admin)/ai/ocr-kombi/types.ts`

---
## SmartSpellChecker

**File:** `klausur-service/backend/smart_spell.py`
**Tests:** `tests/test_smart_spell.py` (43 tests)
**License:** pyspellchecker only (MIT) — no LLM, no Hunspell

### Features

| Feature | Method |
|---------|--------|
| Language detection | dual-dictionary EN/DE heuristic |
| a/I disambiguation | bigram context (next-word lookup) |
| Boundary repair | frequency-based: `Pound sand` → `Pounds and` |
| Context split | `anew` → `a new` (allow/deny list) |
| Multi-digit | BFS: `sch00l` → `school` |
| Cross-language guard | DE words in the EN column are not miscorrected |
| Umlaut correction | `Schuler` → `Schueler` |
| IPA protection | content in [brackets] is never changed |
| Slash → l | `p/` → `pl` (italic l recognized as /) |
| Abbreviations | 120+ from `_KNOWN_ABBREVIATIONS` |

### Integration

```python
# In cv_review.py (LLM review step):
from smart_spell import SmartSpellChecker
_smart = SmartSpellChecker()
result = _smart.correct_text(text, lang="en")  # or "de" or "auto"

# In grid_editor_api.py (grid build + box build):
# runs automatically after grid build and box-grid build
```

### Frequency Scoring

Boundary repair compares word-frequency products:
- `old_freq = word_freq(w1) * word_freq(w2)`
- `new_freq = word_freq(repaired_w1) * word_freq(repaired_w2)`
- a repair is accepted when `new_freq > old_freq * 5`
- the abbreviation bonus applies only when the original words are rare (freq < 1e-6)
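The acceptance rule above can be sketched as follows. This is a minimal illustration, not the actual `smart_spell.py` code: `word_freq` and the example frequencies are stand-ins for a real unigram-frequency lookup such as pyspellchecker's.

```python
# Minimal sketch of the boundary-repair acceptance rule described above.
# FREQS holds illustrative unigram frequencies; the production code looks
# these up in a real dictionary.

FREQS = {
    "pound": 2e-5, "sand": 1.5e-5,
    "pounds": 1.2e-5, "and": 2.8e-2,
}

def word_freq(w: str) -> float:
    return FREQS.get(w.lower(), 0.0)

def accept_boundary_repair(w1: str, w2: str, new_w1: str, new_w2: str,
                           factor: float = 5.0) -> bool:
    """Accept a word-boundary repair only if the repaired pair is
    clearly more frequent than the original pair."""
    old_freq = word_freq(w1) * word_freq(w2)
    new_freq = word_freq(new_w1) * word_freq(new_w2)
    return new_freq > old_freq * factor

# "Pound sand" -> "Pounds and": the repaired pair wins by a wide margin,
# because "and" is vastly more frequent than "sand".
print(accept_boundary_repair("Pound", "sand", "Pounds", "and"))  # True
```

The factor of 5 keeps the repair conservative: a marginally more frequent pair is not enough to override what OCR actually read.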
---

## Box-Grid-Review (Step 11)

**Frontend:** `admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx`
**Backend:** `klausur-service/backend/cv_box_layout.py`, `grid_editor_api.py`
**Tests:** `tests/test_box_layout.py` (13 tests)

### Backend Endpoints

```
POST /api/v1/ocr-pipeline/sessions/{id}/build-box-grids
```

Processes all detected boxes from `structure_result`:
1. filters header/footer boxes (top/bottom 7% of the image height)
2. extracts the OCR words per box from `raw_paddle_words`
3. classifies the layout: `flowing` | `columnar` | `bullet_list` | `header_only`
4. builds the grid with layout-specific logic
5. applies the SmartSpellChecker

### Box Layout Classification (`cv_box_layout.py`)

| Layout | Detection | Grid build |
|--------|-----------|------------|
| `header_only` | ≤5 words or 1 line | 1 cell, everything together |
| `flowing` | uniform line width | 1 column, bullet grouping by indentation |
| `bullet_list` | ≥40% of lines carry a bullet marker | 1 column, bullet items |
| `columnar` | multiple x-clusters | standard column detection |
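The decision rules in the table can be sketched like this. This is an illustration only: the real `cv_box_layout.py` works on raw OCR word boxes, whereas this sketch assumes the summary statistics (word count, line count, bullet ratio, x-cluster count) have already been computed.

```python
# Sketch of the layout classification rules from the table above.
# Thresholds (5 words, 40% bullet lines) are the ones quoted in the table;
# the check order is an assumption.

def classify_box_layout(word_count: int, line_count: int,
                        bullet_line_ratio: float, x_clusters: int) -> str:
    if word_count <= 5 or line_count == 1:
        return "header_only"          # too little content for a grid
    if bullet_line_ratio >= 0.4:
        return "bullet_list"          # ≥40% of lines start with a marker
    if x_clusters > 1:
        return "columnar"             # words cluster at several x positions
    return "flowing"                  # default: uniform running text
```

A usage example: a box with 40 words on 8 lines, no bullets, and two x-clusters would classify as `columnar`.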
### Bullet Indentation

Detection via left-edge analysis:
- minimum indentation = bullet level
- lines indented >15 px further = continuation lines
- continuation lines are merged into the bullet cell with `\n`
- missing `•` markers are added automatically
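The left-edge grouping above can be sketched as a small function. This is illustrative, not the production code; the 15 px tolerance is the value quoted above, and the `(left_edge_px, text)` input shape is an assumption.

```python
# Sketch of the left-edge grouping described above: lines at the minimum
# indent start a new bullet item, lines indented >15 px further are merged
# into the previous item with "\n", and missing markers are added.

def group_bullets(lines: list[tuple[int, str]],
                  indent_tolerance: int = 15) -> list[str]:
    """lines: (left_edge_px, text) pairs in reading order."""
    if not lines:
        return []
    base = min(left for left, _ in lines)     # bullet-level indentation
    items: list[str] = []
    for left, text in lines:
        if left > base + indent_tolerance and items:
            items[-1] += "\n" + text          # continuation line
        else:
            if not text.lstrip().startswith("•"):
                text = "• " + text            # add missing bullet marker
            items.append(text)
    return items

# One bullet with a wrapped second line, then a second bullet:
print(group_bullets([(10, "first item"), (40, "wrapped line"), (12, "• second")]))
```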
### Colspan Detection (`grid_editor_helpers.py`)

Generic function `_detect_colspan_cells()`:
- runs after `_build_cells()` for ALL zones
- uses the original word blocks (before `_split_cross_column_words`)
- a word block that reaches across a column boundary becomes a `spanning_header` with `colspan=N`
- example: "In Britain you pay with pounds and pence." spanning 2 columns
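The core of that rule is an overlap count: how many columns does a word block's x-span touch? A simplified sketch (not `_detect_colspan_cells()` itself, whose real logic is more involved):

```python
# Sketch of the colspan rule above: a word block whose x-span overlaps
# several columns becomes a spanning cell with colspan = number of
# columns it covers.

def block_colspan(block_x0: float, block_x1: float,
                  col_bounds: list[float]) -> int:
    """col_bounds: x positions of the N+1 edges of N columns."""
    spanned = 0
    for left, right in zip(col_bounds, col_bounds[1:]):
        if block_x0 < right and block_x1 > left:  # interval overlap test
            spanned += 1
    return spanned
```

With columns at `[0, 100, 200]`, a block spanning x = 10..150 overlaps both columns and would get `colspan=2`.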
### Column Detection in Boxes

For small zones (≤60 words):
- `gap_threshold = max(median_h * 1.0, 25)` instead of `3x median`
- PaddleOCR returns multi-word blocks, so every gap is a column gap
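The small-zone threshold quoted above, written out (assumption: `median_h` is the median word height in pixels):

```python
# Gap threshold for column detection in small zones (≤60 words), as
# quoted above: the median word height, but never less than 25 px.

def small_zone_gap_threshold(median_h: float) -> float:
    return max(median_h * 1.0, 25.0)
```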
---

## Ansicht / Spreadsheet (Step 12)

**Frontend:** `admin-lehrer/components/ocr-kombi/StepAnsicht.tsx`, `SpreadsheetView.tsx`
**Library:** `@fortune-sheet/react` (MIT, v1.0.4)

### Architecture

Split view:
- **Left:** original scan with OCR overlay (`/image/words-overlay`)
- **Right:** Fortune Sheet spreadsheet with multi-sheet tabs

### Multi-Sheet Approach

Each zone becomes its own sheet tab:
- sheet "Vokabeln" — main grid with EN/DE columns
- sheet "Pounds and euros" — box 1 with its own 4 columns
- sheet "German leihen" — box 2 as flowing text

Reason: column widths are optimized per zone, and Excel's limitation is that a column width applies to the whole column.

### Cell Formatting

| Format | Source | Fortune Sheet property |
|--------|--------|------------------------|
| Bold | `is_header`, `is_bold`, larger font | `bl: 1` |
| Font color | OCR word_boxes color | `fc: '#hex'` |
| Background | box bg_hex, header | `bg: '#hex08'` |
| Text wrap | multi-line cells (\n) | `tb: '2'` |
| Vertical top | multi-line cells | `vt: 0` |
| Larger font | word_box height >1.3x median | `fs: 12` |

### Column Widths

Auto-fit: `max(longest_text * 7.5 + 16, original_px * scaleFactor)`
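The auto-fit formula above, sketched in Python for clarity (the actual code lives in the frontend; 7.5 px per character and 16 px padding are the constants quoted in the formula):

```python
# Auto-fit column width: character-based estimate vs. the scaled
# original pixel width, whichever is larger.

def autofit_width(longest_text: str, original_px: float,
                  scale_factor: float) -> float:
    return max(len(longest_text) * 7.5 + 16, original_px * scale_factor)
```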
### Toolbar

`undo, redo, font-bold, font-italic, font-strikethrough, font-color, background, font-size, horizontal-align, vertical-align, text-wrap, merge-cell, border`

---
## Unified Grid (Backend)

**File:** `klausur-service/backend/unified_grid.py`
**Tests:** `tests/test_unified_grid.py` (10 tests)

Merges all zones into a single grid (for export/analysis):

```
POST /api/v1/ocr-pipeline/sessions/{id}/build-unified-grid
GET  /api/v1/ocr-pipeline/sessions/{id}/unified-grid
```

- dominant row height = median of the content-row gaps
- full-width boxes: rows integrated directly
- partial-width boxes: extra rows inserted when the box has more lines
- box cells carry `source_zone_type: "box"` and `box_region` metadata
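The first rule above can be sketched in a few lines (illustrative only; the input shape, a list of row top positions, is an assumption):

```python
# Sketch of "dominant row height = median of the content-row gaps":
# take the gaps between consecutive row tops and return their median.

import statistics

def dominant_row_height(row_tops: list[float]) -> float:
    gaps = [b - a for a, b in zip(row_tops, row_tops[1:])]
    return statistics.median(gaps)
```

Using the median rather than the mean keeps one oversized box row from skewing the estimate.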
---

## File Structure

### Backend (klausur-service)

| File | Lines | Description |
|------|-------|-------------|
| `grid_build_core.py` | 1943 | `_build_grid_core()` — main grid build |
| `grid_editor_api.py` | 474 | REST endpoints (build, save, get, gutter, box, unified) |
| `grid_editor_helpers.py` | 1737 | helpers: columns, rows, cells, colspan, headers |
| `smart_spell.py` | 587 | SmartSpellChecker |
| `cv_box_layout.py` | 339 | box layout classification + grid build |
| `unified_grid.py` | 425 | unified grid builder |

### Frontend (admin-lehrer)

| File | Lines | Description |
|------|-------|-------------|
| `StepBoxGridReview.tsx` | 283 | box review, step 11 |
| `StepAnsicht.tsx` | 112 | Ansicht, step 12 (split view) |
| `SpreadsheetView.tsx` | ~160 | Fortune Sheet integration |
| `GridTable.tsx` | 652 | grid editor table (steps 9-11) |
| `useGridEditor.ts` | 985 | grid editor hook |

### Tests

| File | Tests | Description |
|------|-------|-------------|
| `test_smart_spell.py` | 43 | language detection, boundary repair, IPA protection |
| `test_box_layout.py` | 13 | layout classification, bullet grouping |
| `test_unified_grid.py` | 10 | unified grid, box classification |
| **Total** | **66** | |

---

## Change History

| Date | Change |
|------|--------|
| 2026-04-15 | Fortune Sheet multi-sheet tabs, bullet points, auto-fit, refactoring |
| 2026-04-14 | unified grid, Ansicht step, colspan detection |
| 2026-04-13 | box-grid review step, columns in boxes, header/footer filter |
| 2026-04-12 | SmartSpellChecker, frequency scoring, IPA protection, vocab-worksheet refactoring |
@@ -188,11 +188,35 @@ ssh macmini "docker compose up -d klausur-service studio-v2"

---

## Frontend Refactoring (2026-04-12)

`page.tsx` was split from 2337 lines into 14 files:

```
studio-v2/app/vocab-worksheet/
├── page.tsx                 # 198 lines — orchestrator
├── types.ts                 # interfaces, VocabWorksheetHook
├── constants.ts             # API base, formats, defaults
├── useVocabWorksheet.ts     # 843 lines — custom hook (all state + logic)
└── components/
    ├── UploadScreen.tsx         # session list + document selection
    ├── PageSelection.tsx        # PDF page selection
    ├── VocabularyTab.tsx        # vocabulary table + IPA/syllables
    ├── WorksheetTab.tsx         # format selection + configuration
    ├── ExportTab.tsx            # PDF download
    ├── OcrSettingsPanel.tsx     # OCR filter settings
    ├── FullscreenPreview.tsx    # fullscreen preview modal
    ├── QRCodeModal.tsx          # QR upload modal
    └── OcrComparisonModal.tsx   # OCR comparison modal
```

---

## Extension: Adding New Formats

1. **Backend**: create a new generator in `klausur-service/backend/`
2. **API**: add a new endpoint in `vocab_worksheet_api.py`
3. **Frontend**: add the format to the `worksheetFormats` array in `constants.ts`
4. **Docs**: update this file

---
@@ -2,7 +2,6 @@
import { Suspense } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'
import { BoxSessionTabs } from '@/components/ocr-pipeline/BoxSessionTabs'
import { KombiStepper } from '@/components/ocr-kombi/KombiStepper'
import { SessionList } from '@/components/ocr-kombi/SessionList'
import { SessionHeader } from '@/components/ocr-kombi/SessionHeader'
@@ -16,6 +15,9 @@ import { StepOcr } from '@/components/ocr-kombi/StepOcr'
import { StepStructure } from '@/components/ocr-kombi/StepStructure'
import { StepGridBuild } from '@/components/ocr-kombi/StepGridBuild'
import { StepGridReview } from '@/components/ocr-kombi/StepGridReview'
import { StepGutterRepair } from '@/components/ocr-kombi/StepGutterRepair'
import { StepBoxGridReview } from '@/components/ocr-kombi/StepBoxGridReview'
import { StepAnsicht } from '@/components/ocr-kombi/StepAnsicht'
import { StepGroundTruth } from '@/components/ocr-kombi/StepGroundTruth'
import { useKombiPipeline } from './useKombiPipeline'

@@ -27,8 +29,7 @@ function OcrKombiContent() {
loadingSessions,
activeCategory,
isGroundTruth,
subSessions,
parentSessionId,
pageNumber,
steps,
gridSaveRef,
groupedSessions,
@@ -40,11 +41,8 @@ function OcrKombiContent() {
deleteSession,
renameSession,
updateCategory,
handleSessionChange,
setSessionId,
setSessionName,
setSubSessions,
setParentSessionId,
setIsGroundTruth,
} = useKombiPipeline()

@@ -75,17 +73,11 @@ function OcrKombiContent() {
<StepPageSplit
sessionId={sessionId}
sessionName={sessionName}
onNext={() => {
// If sub-sessions were created, switch to the first one
if (subSessions.length > 0) {
setSessionId(subSessions[0].id)
setSessionName(subSessions[0].name)
}
handleNext()
}}
onSubSessionsCreated={(subs) => {
setSubSessions(subs)
if (sessionId) setParentSessionId(sessionId)
onNext={handleNext}
onSplitComplete={(childId, childName) => {
// Switch to the first child session and refresh the list
setSessionId(childId)
setSessionName(childName)
loadSessions()
}}
/>
@@ -105,6 +97,12 @@ function OcrKombiContent() {
case 9:
return <StepGridReview sessionId={sessionId} onNext={handleNext} saveRef={gridSaveRef} />
case 10:
return <StepGutterRepair sessionId={sessionId} onNext={handleNext} />
case 11:
return <StepBoxGridReview sessionId={sessionId} onNext={handleNext} />
case 12:
return <StepAnsicht sessionId={sessionId} onNext={handleNext} />
case 13:
return (
<StepGroundTruth
sessionId={sessionId}
@@ -151,6 +149,7 @@ function OcrKombiContent() {
sessionName={sessionName}
activeCategory={activeCategory}
isGroundTruth={isGroundTruth}
pageNumber={pageNumber}
onUpdateCategory={(cat) => updateCategory(sessionId, cat)}
/>
)}
@@ -161,15 +160,6 @@ function OcrKombiContent() {
onStepClick={handleStepClick}
/>

{subSessions.length > 0 && parentSessionId && sessionId && (
<BoxSessionTabs
parentSessionId={parentSessionId}
subSessions={subSessions}
activeSessionId={sessionId}
onSessionChange={handleSessionChange}
/>
)}

<div className="min-h-[400px]">{renderStep()}</div>
</div>
)

@@ -8,7 +8,6 @@ export { DOCUMENT_CATEGORIES } from '../ocr-pipeline/types'
export type {
SessionListItem,
SessionInfo,
SubSession,
OrientationResult,
CropResult,
DeskewResult,
@@ -40,6 +39,9 @@ export const KOMBI_V2_STEPS: PipelineStep[] = [
{ id: 'structure', name: 'Strukturerkennung', icon: '🔍', status: 'pending' },
{ id: 'grid-build', name: 'Grid-Aufbau', icon: '🧱', status: 'pending' },
{ id: 'grid-review', name: 'Grid-Review', icon: '📊', status: 'pending' },
{ id: 'gutter-repair', name: 'Wortkorrektur', icon: '🩹', status: 'pending' },
{ id: 'box-review', name: 'Box-Review', icon: '📦', status: 'pending' },
{ id: 'ansicht', name: 'Ansicht', icon: '👁️', status: 'pending' },
{ id: 'ground-truth', name: 'Ground Truth', icon: '✅', status: 'pending' },
]

@@ -55,7 +57,10 @@ export const KOMBI_V2_UI_TO_DB: Record<number, number> = {
7: 9, // structure
8: 10, // grid-build
9: 11, // grid-review
10: 12, // ground-truth
10: 11, // gutter-repair (shares DB step with grid-review)
11: 11, // box-review (shares DB step with grid-review)
12: 11, // ansicht (shares DB step with grid-review)
13: 12, // ground-truth
}

/** Map from DB step to Kombi V2 UI step index */
@@ -69,7 +74,7 @@ export function dbStepToKombiV2Ui(dbStep: number): number {
if (dbStep === 9) return 7 // structure
if (dbStep === 10) return 8 // grid-build
if (dbStep === 11) return 9 // grid-review
return 10 // ground-truth
return 13 // ground-truth
}

/** Document group: groups multiple sessions from a multi-page upload */

@@ -4,7 +4,7 @@ import { useCallback, useEffect, useState, useRef } from 'react'
import { useSearchParams } from 'next/navigation'
import type { PipelineStep, DocumentCategory } from './types'
import { KOMBI_V2_STEPS, dbStepToKombiV2Ui } from './types'
import type { SubSession, SessionListItem } from '../ocr-pipeline/types'
import type { SessionListItem } from '../ocr-pipeline/types'

export type { SessionListItem }

@@ -33,8 +33,7 @@ export function useKombiPipeline() {
const [loadingSessions, setLoadingSessions] = useState(true)
const [activeCategory, setActiveCategory] = useState<DocumentCategory | undefined>(undefined)
const [isGroundTruth, setIsGroundTruth] = useState(false)
const [subSessions, setSubSessions] = useState<SubSession[]>([])
const [parentSessionId, setParentSessionId] = useState<string | null>(null)
const [pageNumber, setPageNumber] = useState<number | null>(null)
const [steps, setSteps] = useState<PipelineStep[]>(initSteps())

const searchParams = useSearchParams()
@@ -115,7 +114,7 @@

// ---- Open session ----

const openSession = useCallback(async (sid: string, keepSubSessions?: boolean) => {
const openSession = useCallback(async (sid: string) => {
try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (!res.ok) return
@@ -125,26 +124,19 @@
setSessionName(data.name || data.filename || '')
setActiveCategory(data.document_category || undefined)
setIsGroundTruth(!!data.ground_truth?.build_grid_reference)

// Sub-session handling
if (data.sub_sessions?.length > 0) {
setSubSessions(data.sub_sessions)
setParentSessionId(sid)
} else if (data.parent_session_id) {
setParentSessionId(data.parent_session_id)
} else if (!keepSubSessions) {
setSubSessions([])
setParentSessionId(null)
}
setPageNumber(data.grid_editor_result?.page_number?.number ?? null)

// Determine UI step from DB state
const dbStep = data.current_step || 1
const hasGrid = !!data.grid_editor_result
const hasStructure = !!data.structure_result
const hasWords = !!data.word_result
const hasGutterRepair = !!(data.ground_truth?.gutter_repair)

let uiStep: number
if (hasGrid) {
if (hasGrid && hasGutterRepair) {
uiStep = 10 // gutter-repair (already analysed)
} else if (hasGrid) {
uiStep = 9 // grid-review
} else if (hasStructure) {
uiStep = 8 // grid-build
@@ -159,22 +151,10 @@
uiStep = 1
}

const skipIds: string[] = []
const isSubSession = !!data.parent_session_id
if (isSubSession && dbStep >= 5) {
skipIds.push('upload', 'orientation', 'page-split', 'deskew', 'dewarp', 'content-crop')
if (uiStep < 6) uiStep = 6
} else if (isSubSession && dbStep >= 2) {
skipIds.push('upload', 'orientation')
if (uiStep < 2) uiStep = 2
}

setSteps(
KOMBI_V2_STEPS.map((s, i) => ({
...s,
status: skipIds.includes(s.id)
? 'skipped'
: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
status: i < uiStep ? 'completed' : i === uiStep ? 'active' : 'pending',
})),
)
setCurrentStep(uiStep)
@@ -226,8 +206,6 @@
setSteps(initSteps())
setCurrentStep(0)
setSessionId(null)
setSubSessions([])
setParentSessionId(null)
loadSessions()
return
}
@@ -249,8 +227,6 @@
setSessionId(null)
setSessionName('')
setCurrentStep(0)
setSubSessions([])
setParentSessionId(null)
setSteps(initSteps())
}, [])

@@ -292,40 +268,6 @@
}
}, [sessionId])

// ---- Orientation completion (checks for page-split sub-sessions) ----

const handleOrientationComplete = useCallback(async (sid: string) => {
setSessionId(sid)
loadSessions()

try {
const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sid}`)
if (res.ok) {
const data = await res.json()
if (data.sub_sessions?.length > 0) {
const subs: SubSession[] = data.sub_sessions.map((s: SubSession) => ({
id: s.id,
name: s.name,
box_index: s.box_index,
current_step: s.current_step,
}))
setSubSessions(subs)
setParentSessionId(sid)
openSession(subs[0].id, true)
return
}
}
} catch (e) {
console.error('Failed to check for sub-sessions:', e)
}

handleNext()
}, [loadSessions, openSession, handleNext])

const handleSessionChange = useCallback((newSessionId: string) => {
openSession(newSessionId, true)
}, [openSession])

return {
// State
currentStep,
@@ -335,8 +277,7 @@
loadingSessions,
activeCategory,
isGroundTruth,
subSessions,
parentSessionId,
pageNumber,
steps,
gridSaveRef,
// Computed
@@ -351,11 +292,7 @@
deleteSession,
renameSession,
updateCategory,
handleOrientationComplete,
handleSessionChange,
setSessionId,
setSubSessions,
setParentSessionId,
setSessionName,
setIsGroundTruth,
}

@@ -36,6 +36,7 @@ export interface SessionListItem {
parent_session_id?: string
document_group_id?: string
page_number?: number
is_ground_truth?: boolean
created_at: string
updated_at?: string
}
admin-lehrer/app/(admin)/ai/rag/__tests__/rag-documents.test.ts (new file, 252 lines)
@@ -0,0 +1,252 @@
import { describe, it, expect } from 'vitest'
import ragData from '../rag-documents.json'

/**
 * Tests for rag-documents.json — industry regulation matrix
 *
 * Validates the JSON structure, industry assignment, and data integrity
 * of the 320 documents for the RAG Landkarte.
 */

const VALID_INDUSTRY_IDS = ragData.industries.map((i: any) => i.id)
const VALID_DOC_TYPE_IDS = ragData.doc_types.map((dt: any) => dt.id)

describe('rag-documents.json — Struktur', () => {
  it('sollte doc_types, industries und documents enthalten', () => {
    expect(ragData).toHaveProperty('doc_types')
    expect(ragData).toHaveProperty('industries')
    expect(ragData).toHaveProperty('documents')
    expect(Array.isArray(ragData.doc_types)).toBe(true)
    expect(Array.isArray(ragData.industries)).toBe(true)
    expect(Array.isArray(ragData.documents)).toBe(true)
  })

  it('sollte genau 10 Branchen haben (VDMA/VDA/BDI)', () => {
    expect(ragData.industries).toHaveLength(10)
    const ids = ragData.industries.map((i: any) => i.id)
    expect(ids).toContain('automotive')
    expect(ids).toContain('maschinenbau')
    expect(ids).toContain('elektrotechnik')
    expect(ids).toContain('chemie')
    expect(ids).toContain('metall')
    expect(ids).toContain('energie')
    expect(ids).toContain('transport')
    expect(ids).toContain('handel')
    expect(ids).toContain('konsumgueter')
    expect(ids).toContain('bau')
  })

  it('sollte keine Pseudo-Branchen enthalten (IoT, KI, HR, KRITIS, etc.)', () => {
    const ids = ragData.industries.map((i: any) => i.id)
    expect(ids).not.toContain('iot')
    expect(ids).not.toContain('ai')
    expect(ids).not.toContain('hr')
    expect(ids).not.toContain('kritis')
    expect(ids).not.toContain('ecommerce')
    expect(ids).not.toContain('tech')
    expect(ids).not.toContain('media')
    expect(ids).not.toContain('public')
  })

  it('sollte 17 Dokumenttypen haben', () => {
    expect(ragData.doc_types.length).toBe(17)
  })

  it('sollte mindestens 300 Dokumente haben', () => {
    expect(ragData.documents.length).toBeGreaterThanOrEqual(300)
  })

  it('sollte jede Branche name und icon haben', () => {
    ragData.industries.forEach((ind: any) => {
      expect(ind).toHaveProperty('id')
      expect(ind).toHaveProperty('name')
      expect(ind).toHaveProperty('icon')
      expect(ind.name.length).toBeGreaterThan(0)
    })
  })

  it('sollte jeden doc_type mit id, label, icon und sort haben', () => {
    ragData.doc_types.forEach((dt: any) => {
      expect(dt).toHaveProperty('id')
      expect(dt).toHaveProperty('label')
      expect(dt).toHaveProperty('icon')
      expect(dt).toHaveProperty('sort')
    })
  })
})

describe('rag-documents.json — Dokument-Validierung', () => {
  it('sollte keine doppelten Codes haben', () => {
    const codes = ragData.documents.map((d: any) => d.code)
    const unique = new Set(codes)
    expect(unique.size).toBe(codes.length)
  })

  it('sollte Pflichtfelder bei jedem Dokument haben', () => {
    ragData.documents.forEach((doc: any) => {
      expect(doc).toHaveProperty('code')
      expect(doc).toHaveProperty('name')
      expect(doc).toHaveProperty('doc_type')
      expect(doc).toHaveProperty('industries')
      expect(doc).toHaveProperty('in_rag')
      expect(doc).toHaveProperty('rag_collection')
      expect(doc.code.length).toBeGreaterThan(0)
      expect(doc.name.length).toBeGreaterThan(0)
      expect(Array.isArray(doc.industries)).toBe(true)
    })
  })

  it('sollte nur gueltige doc_type IDs verwenden', () => {
    ragData.documents.forEach((doc: any) => {
      expect(VALID_DOC_TYPE_IDS).toContain(doc.doc_type)
    })
  })

  it('sollte nur gueltige industry IDs verwenden (oder "all")', () => {
    ragData.documents.forEach((doc: any) => {
      doc.industries.forEach((ind: string) => {
        if (ind !== 'all') {
          expect(VALID_INDUSTRY_IDS).toContain(ind)
        }
      })
    })
  })

  it('sollte gueltige rag_collection Namen verwenden', () => {
    const validCollections = [
      'bp_compliance_ce',
      'bp_compliance_gesetze',
      'bp_compliance_datenschutz',
      'bp_dsfa_corpus',
      'bp_legal_templates',
      'bp_compliance_recht',
      'bp_nibis_eh',
    ]
    ragData.documents.forEach((doc: any) => {
      expect(validCollections).toContain(doc.rag_collection)
    })
  })
})

describe('rag-documents.json — Branchen-Zuordnungslogik', () => {
  const findDoc = (code: string) => ragData.documents.find((d: any) => d.code === code)

  describe('Horizontale Regulierungen (alle Branchen)', () => {
    const horizontalCodes = [
      'GDPR', 'BDSG_FULL', 'EPRIVACY', 'TDDDG', 'AIACT', 'CRA',
      'NIS2', 'GPSR', 'PLD', 'EUCSA', 'DATAACT',
    ]

    horizontalCodes.forEach((code) => {
      it(`${code} sollte fuer alle Branchen gelten`, () => {
        const doc = findDoc(code)
        if (doc) {
          expect(doc.industries).toContain('all')
        }
      })
    })
  })

  describe('Sektorspezifische Regulierungen', () => {
    it('Maschinenverordnung sollte Maschinenbau, Automotive, Elektrotechnik enthalten', () => {
      const doc = findDoc('MACHINERY_REG')
      if (doc) {
        expect(doc.industries).toContain('maschinenbau')
        expect(doc.industries).toContain('automotive')
        expect(doc.industries).toContain('elektrotechnik')
        expect(doc.industries).not.toContain('all')
      }
    })

    it('ElektroG sollte Elektrotechnik und Automotive enthalten', () => {
      const doc = findDoc('DE_ELEKTROG')
      if (doc) {
        expect(doc.industries).toContain('elektrotechnik')
        expect(doc.industries).toContain('automotive')
      }
    })

    it('BattDG sollte Automotive und Elektrotechnik enthalten', () => {
      const doc = findDoc('DE_BATTDG')
      if (doc) {
        expect(doc.industries).toContain('automotive')
        expect(doc.industries).toContain('elektrotechnik')
      }
    })

    it('ENISA ICS/SCADA sollte Energie, Maschinenbau, Chemie enthalten', () => {
      const doc = findDoc('ENISA_ICS_SCADA')
      if (doc) {
        expect(doc.industries).toContain('energie')
        expect(doc.industries).toContain('maschinenbau')
        expect(doc.industries).toContain('chemie')
      }
    })
  })

  describe('Nicht zutreffende Regulierungen (Finanz/Medizin/Plattformen)', () => {
    const emptyIndustryCodes = ['DORA', 'PSD2', 'MiCA', 'AMLR', 'EHDS', 'DSA', 'DMA', 'MDR']

    emptyIndustryCodes.forEach((code) => {
      it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
        const doc = findDoc(code)
        if (doc) {
          expect(doc.industries).toHaveLength(0)
        }
      })
    })
  })

  describe('BSI-TR-03161 (DiGA) sollte nicht zutreffend sein', () => {
    ['BSI-TR-03161-1', 'BSI-TR-03161-2', 'BSI-TR-03161-3'].forEach((code) => {
      it(`${code} sollte keine Branchen-Zuordnung haben`, () => {
        const doc = findDoc(code)
        if (doc) {
          expect(doc.industries).toHaveLength(0)
        }
      })
    })
  })
})

describe('rag-documents.json — Applicability Notes', () => {
  it('sollte applicability_note bei Dokumenten mit description haben', () => {
    const withDescription = ragData.documents.filter((d: any) => d.description)
    const withNote = withDescription.filter((d: any) => d.applicability_note)
    // at least 90% of documents with a description should have a note
    expect(withNote.length / withDescription.length).toBeGreaterThan(0.9)
  })

  it('horizontale Regulierungen sollten "alle Branchen" in der Note erwaehnen', () => {
    const gdpr = ragData.documents.find((d: any) => d.code === 'GDPR')
    if (gdpr?.applicability_note) {
      expect(gdpr.applicability_note.toLowerCase()).toContain('alle branchen')
    }
  })
it('nicht zutreffende sollten "nicht zutreffend" in der Note erwaehnen', () => {
|
||||
const dora = ragData.documents.find((d: any) => d.code === 'DORA')
|
||||
if (dora?.applicability_note) {
|
||||
expect(dora.applicability_note.toLowerCase()).toContain('nicht zutreffend')
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
describe('rag-documents.json — Dokumenttyp-Verteilung', () => {
|
||||
it('sollte Dokumente in jedem doc_type haben', () => {
|
||||
ragData.doc_types.forEach((dt: any) => {
|
||||
const count = ragData.documents.filter((d: any) => d.doc_type === dt.id).length
|
||||
expect(count).toBeGreaterThan(0)
|
||||
})
|
||||
})
|
||||
|
||||
it('sollte EU-Verordnungen als groesste Kategorie haben (mind. 15)', () => {
|
||||
const euRegs = ragData.documents.filter((d: any) => d.doc_type === 'eu_regulation')
|
||||
expect(euRegs.length).toBeGreaterThanOrEqual(15)
|
||||
})
|
||||
|
||||
it('sollte EDPB Leitlinien als umfangreichste Kategorie haben (mind. 40)', () => {
|
||||
const edpb = ragData.documents.filter((d: any) => d.doc_type === 'edpb_guideline')
|
||||
expect(edpb.length).toBeGreaterThanOrEqual(40)
|
||||
})
|
||||
})
|
||||
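Note that the `if (doc) { … }` guards above make each assertion conditional: when a code is absent from `rag-documents.json`, the test body is skipped and the test passes vacuously. A stricter lookup helper (a sketch, not part of the diff — `mustFindDoc` and its error message are illustrative) would fail loudly instead:

```typescript
// Hypothetical stricter variant of the findDoc helper used in the tests above.
type RagDoc = { code: string; industries: string[] }

function mustFindDoc(documents: RagDoc[], code: string): RagDoc {
  const doc = documents.find((d) => d.code === code)
  // Throwing here turns a silently-skipped assertion into a hard test failure.
  if (!doc) throw new Error(`Document ${code} missing from rag-documents.json`)
  return doc
}

// Usage inside a test would then read:
// expect(mustFindDoc(ragData.documents, 'GDPR').industries).toContain('all')
```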
4332	admin-lehrer/app/(admin)/ai/rag/rag-documents.json	Normal file
File diff suppressed because it is too large
@@ -107,12 +107,18 @@ export function GridTable({
    const row = zone.rows.find((r) => r.index === rowIndex)
    if (!row) return Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)

    // Multi-line cells (containing \n): expand height based on line count
    const rowCells = zone.cells.filter((c) => c.row_index === rowIndex)
    const maxLines = Math.max(1, ...rowCells.map((c) => (c.text ?? '').split('\n').length))
    if (maxLines > 1) {
      const lineH = Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
      return lineH * maxLines
    }

    if (isHeader) {
      // Headers keep their measured height
      const measuredH = row.y_max_px - row.y_min_px
      return Math.max(MIN_ROW_HEIGHT, measuredH * scale)
    }
    // Content rows use average for uniformity
    return Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
  }

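The hunk above changes the row-height calculation so multi-line cells scale with their line count. Extracted as a plain function (a sketch for illustration — `MIN_ROW_HEIGHT = 24` is an assumed value; in the component it is a module-level constant):

```typescript
const MIN_ROW_HEIGHT = 24 // assumed value, for illustration only

// Mirrors the hunk: line height is the scaled average row height, floored at
// MIN_ROW_HEIGHT; a multi-line cell multiplies it by its line count.
function rowHeight(cellTexts: string[], avgRowHeightPx: number, scale: number): number {
  const lineH = Math.max(MIN_ROW_HEIGHT, avgRowHeightPx * scale)
  const maxLines = Math.max(1, ...cellTexts.map((t) => t.split('\n').length))
  return lineH * maxLines
}
```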
@@ -410,46 +416,43 @@ export function GridTable({

          {/* Cells — spanning header or normal columns */}
          {isSpanning ? (
            <div
              className="border-b border-r border-gray-200 dark:border-gray-700 bg-blue-50/50 dark:bg-blue-900/10 flex items-center"
              style={{
                gridColumn: `2 / ${numCols + 2}`,
                height: `${rowH}px`,
              }}
            >
              {(() => {
                const spanCell = zone.cells.find(
                  (c) => c.row_index === row.index && c.col_type === 'spanning_header',
                )
                if (!spanCell) return null
                const cellId = spanCell.cell_id
                const isSelected = selectedCell === cellId
                const cellColor = getCellColor(spanCell)
                return (
                  <div className="flex items-center w-full">
                    {cellColor && (
                      <span
                        className="flex-shrink-0 w-1.5 self-stretch rounded-l-sm"
                        style={{ backgroundColor: cellColor }}
                      />
                    )}
                    <input
                      id={`cell-${cellId}`}
                      type="text"
                      value={spanCell.text}
                      onChange={(e) => onCellTextChange(cellId, e.target.value)}
                      onFocus={() => onSelectCell(cellId)}
                      onKeyDown={(e) => handleKeyDown(e, cellId)}
                      className={`w-full px-3 py-1 bg-transparent border-0 outline-none text-center ${
                        isSelected ? 'ring-2 ring-teal-500 ring-inset rounded' : ''
            <>
              {zone.cells
                .filter((c) => c.row_index === row.index && c.col_type === 'spanning_header')
                .sort((a, b) => a.col_index - b.col_index)
                .map((spanCell) => {
                  const colspan = spanCell.colspan || numCols
                  const cellId = spanCell.cell_id
                  const isSelected = selectedCell === cellId
                  const cellColor = getCellColor(spanCell)
                  const gridColStart = spanCell.col_index + 2
                  const gridColEnd = gridColStart + colspan
                  return (
                    <div
                      key={cellId}
                      className={`border-b border-r border-gray-200 dark:border-gray-700 bg-blue-50/50 dark:bg-blue-900/10 flex items-center ${
                        isSelected ? 'ring-2 ring-teal-500 ring-inset z-10' : ''
                      }`}
                      style={{ color: cellColor || undefined }}
                      spellCheck={false}
                    />
                  </div>
                )
              })()}
            </div>
                      style={{ gridColumn: `${gridColStart} / ${gridColEnd}`, height: `${rowH}px` }}
                    >
                      {cellColor && (
                        <span className="flex-shrink-0 w-1.5 self-stretch rounded-l-sm" style={{ backgroundColor: cellColor }} />
                      )}
                      <input
                        id={`cell-${cellId}`}
                        type="text"
                        value={spanCell.text}
                        onChange={(e) => onCellTextChange(cellId, e.target.value)}
                        onFocus={() => onSelectCell(cellId)}
                        onKeyDown={(e) => handleKeyDown(e, cellId)}
                        className="w-full px-3 py-1 bg-transparent border-0 outline-none text-center"
                        style={{ color: cellColor || undefined }}
                        spellCheck={false}
                      />
                    </div>
                  )
                })}
            </>
          ) : (
            zone.columns.map((col) => {
              const cell = cellMap.get(`${row.index}_${col.index}`)
@@ -485,7 +488,13 @@ export function GridTable({
                  } ${isMultiSelected ? 'bg-teal-50/60 dark:bg-teal-900/20' : ''} ${
                    isLowConf && !isMultiSelected ? 'bg-amber-50/50 dark:bg-amber-900/10' : ''
                  } ${row.is_header && !isMultiSelected ? 'bg-blue-50/50 dark:bg-blue-900/10' : ''}`}
                  style={{ height: `${rowH}px` }}
                  style={{
                    height: `${rowH}px`,
                    ...(cell?.box_region?.bg_hex ? {
                      backgroundColor: `${cell.box_region.bg_hex}12`,
                      borderLeft: cell.box_region.border ? `3px solid ${cell.box_region.bg_hex}60` : undefined,
                    } : {}),
                  }}
                  onContextMenu={(e) => {
                    if (onSetCellColor) {
                      e.preventDefault()
@@ -501,53 +510,88 @@ export function GridTable({
                    />
                  )}
                  {/* Per-word colored display when not editing */}
                  {hasColoredWords && !isSelected ? (
                    <div
                      className={`w-full px-2 cursor-text truncate ${isBold ? 'font-bold' : 'font-normal'}`}
                      onClick={(e) => {
                        if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
                          onToggleCellSelection(cellId)
                        } else {
                          onSelectCell(cellId)
                          setTimeout(() => document.getElementById(`cell-${cellId}`)?.focus(), 0)
                        }
                      }}
                    >
                    {cell!.word_boxes!.map((wb, i) => (
                      <span
                        key={i}
                        style={
                          wb.color_name && wb.color_name !== 'black'
                            ? { color: wb.color }
                            : undefined
                        }
                  {(() => {
                    const cellText = cell?.text ?? ''
                    const isMultiLine = cellText.includes('\n')
                    if (hasColoredWords && !isSelected) {
                      return (
                        <div
                          className={`w-full px-2 cursor-text truncate ${isBold ? 'font-bold' : 'font-normal'}`}
                          onClick={(e) => {
                            if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
                              onToggleCellSelection(cellId)
                            } else {
                              onSelectCell(cellId)
                              setTimeout(() => document.getElementById(`cell-${cellId}`)?.focus(), 0)
                            }
                          }}
                        >
                        {wb.text}
                        {i < cell!.word_boxes!.length - 1 ? ' ' : ''}
                      </span>
                    ))}
                    </div>
                  ) : (
                    <input
                      id={`cell-${cellId}`}
                      type="text"
                      value={cell?.text ?? ''}
                      onChange={(e) => onCellTextChange(cellId, e.target.value)}
                      onFocus={() => onSelectCell(cellId)}
                      onClick={(e) => {
                        if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
                          e.preventDefault()
                          onToggleCellSelection(cellId)
                        }
                      }}
                      onKeyDown={(e) => handleKeyDown(e, cellId)}
                      className={`w-full px-2 bg-transparent border-0 outline-none ${
                        isBold ? 'font-bold' : 'font-normal'
                      }`}
                      style={{ color: cellColor || undefined }}
                      spellCheck={false}
                    />
                  )}
                          {cell!.word_boxes!.map((wb, i) => (
                            <span
                              key={i}
                              style={
                                wb.color_name && wb.color_name !== 'black'
                                  ? { color: wb.color }
                                  : undefined
                              }
                            >
                              {wb.text}
                              {i < cell!.word_boxes!.length - 1 ? ' ' : ''}
                            </span>
                          ))}
                        </div>
                      )
                    }
                    if (isMultiLine) {
                      return (
                        <textarea
                          id={`cell-${cellId}`}
                          value={cellText}
                          onChange={(e) => onCellTextChange(cellId, e.target.value)}
                          onFocus={() => onSelectCell(cellId)}
                          onClick={(e) => {
                            if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
                              e.preventDefault()
                              onToggleCellSelection(cellId)
                            }
                          }}
                          onKeyDown={(e) => {
                            if (e.key === 'Tab') {
                              e.preventDefault()
                              onNavigate(cellId, e.shiftKey ? 'left' : 'right')
                            }
                          }}
                          rows={cellText.split('\n').length}
                          className={`w-full px-2 bg-transparent border-0 outline-none resize-none ${
                            isBold ? 'font-bold' : 'font-normal'
                          }`}
                          style={{ color: cellColor || undefined }}
                          spellCheck={false}
                        />
                      )
                    }
                    return (
                      <input
                        id={`cell-${cellId}`}
                        type="text"
                        value={cellText}
                        onChange={(e) => onCellTextChange(cellId, e.target.value)}
                        onFocus={() => onSelectCell(cellId)}
                        onClick={(e) => {
                          if ((e.metaKey || e.ctrlKey) && onToggleCellSelection) {
                            e.preventDefault()
                            onToggleCellSelection(cellId)
                          }
                        }}
                        onKeyDown={(e) => handleKeyDown(e, cellId)}
                        className={`w-full px-2 bg-transparent border-0 outline-none ${
                          isBold ? 'font-bold' : 'font-normal'
                        }`}
                        style={{ color: cellColor || undefined }}
                        spellCheck={false}
                      />
                    )
                  })()}
                </div>
              )
            })

@@ -73,6 +73,10 @@ export interface GridZone {
  header_rows: number[]
  layout_hint?: 'left_of_vsplit' | 'right_of_vsplit' | 'middle_of_vsplit'
  vsplit_group?: number
  box_layout_type?: 'flowing' | 'columnar' | 'bullet_list' | 'header_only'
  box_grid_reviewed?: boolean
  box_bg_color?: string
  box_bg_hex?: string
}

export interface BBox {
@@ -122,6 +126,16 @@ export interface GridEditorCell {
  is_bold: boolean
  /** Manual color override: hex string or null to clear. */
  color_override?: string | null
  /** Number of columns this cell spans (merged cell). Default 1. */
  colspan?: number
  /** Source zone type when in unified grid. */
  source_zone_type?: 'content' | 'box'
  /** Box visual metadata for cells from box zones. */
  box_region?: {
    bg_hex?: string
    bg_color?: string
    border?: boolean
  }
}

/** Layout dividers for the visual column/margin editor on the original image. */

@@ -81,8 +81,19 @@ export function useGridEditor(sessionId: string | null) {
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`,
      )
      if (res.status === 404) {
        // No grid yet — build it
        await buildGrid()
        // No grid yet — build it with current modes
        const params = new URLSearchParams()
        params.set('ipa_mode', ipaMode)
        params.set('syllable_mode', syllableMode)
        const buildRes = await fetch(
          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
          { method: 'POST' },
        )
        if (buildRes.ok) {
          const data: StructuredGrid = await buildRes.json()
          setGrid(data)
          setDirty(false)
        }
        return
      }
      if (!res.ok) {
@@ -99,18 +110,48 @@ export function useGridEditor(sessionId: string | null) {
    } finally {
      setLoading(false)
    }
  }, [sessionId, buildGrid])
  // Only depends on sessionId — mode changes are handled by the
  // separate useEffect below, not by re-triggering loadGrid.
  // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [sessionId])

  // Auto-rebuild when IPA or syllable mode changes (skip initial mount)
  const initialLoadDone = useRef(false)
  // Auto-rebuild when IPA or syllable mode changes (skip initial mount).
  // We call the API directly with the new values instead of going through
  // the buildGrid callback, which may still close over stale state due to
  // React's asynchronous state batching.
  const mountedRef = useRef(false)
  useEffect(() => {
    if (!initialLoadDone.current) {
      // Mark as initialized once the first grid is loaded
      if (grid) initialLoadDone.current = true
    if (!mountedRef.current) {
      // Skip the first trigger (component mount) — don't rebuild yet
      mountedRef.current = true
      return
    }
    // Mode changed after initial load — rebuild
    buildGrid()
    if (!sessionId) return
    const rebuild = async () => {
      setLoading(true)
      setError(null)
      try {
        const params = new URLSearchParams()
        params.set('ipa_mode', ipaMode)
        params.set('syllable_mode', syllableMode)
        const res = await fetch(
          `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-grid?${params}`,
          { method: 'POST' },
        )
        if (!res.ok) {
          const data = await res.json().catch(() => ({}))
          throw new Error(data.detail || `HTTP ${res.status}`)
        }
        const data: StructuredGrid = await res.json()
        setGrid(data)
        setDirty(false)
      } catch (e) {
        setError(e instanceof Error ? e.message : String(e))
      } finally {
        setLoading(false)
      }
    }
    rebuild()
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [ipaMode, syllableMode])

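The comment in the hunk above about `buildGrid` closing over stale state describes a general closure pitfall: a callback keeps the values from the scope in which it was created, so a memoized `buildGrid` can fire with an outdated mode. A minimal non-React sketch of the same behavior (names are illustrative, not from the diff):

```typescript
// A builder created while mode is 'off' keeps seeing 'off', even after the
// variable that fed it changes — the reason the effect above reads
// ipaMode/syllableMode directly instead of reusing the old callback.
function makeBuilder(mode: string) {
  return () => `build-grid?ipa_mode=${mode}`
}

let mode = 'off'
const stale = makeBuilder(mode) // captures 'off'
mode = 'full'
const fresh = makeBuilder(mode) // captures 'full'
// stale() still produces the 'off' URL; only fresh() sees the new mode.
```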
@@ -7,6 +7,7 @@ interface SessionHeaderProps {
  sessionName: string
  activeCategory?: DocumentCategory
  isGroundTruth: boolean
  pageNumber?: number | null
  onUpdateCategory: (category: DocumentCategory) => void
}

@@ -14,6 +15,7 @@ export function SessionHeader({
  sessionName,
  activeCategory,
  isGroundTruth,
  pageNumber,
  onUpdateCategory,
}: SessionHeaderProps) {
  const [showCategoryPicker, setShowCategoryPicker] = useState(false)
@@ -36,6 +38,11 @@ export function SessionHeader({
      >
        {catInfo ? `${catInfo.icon} ${catInfo.label}` : 'Kategorie setzen'}
      </button>
      {pageNumber != null && (
        <span className="text-xs px-2 py-0.5 rounded-full bg-gray-100 dark:bg-gray-700 border border-gray-200 dark:border-gray-600 text-gray-600 dark:text-gray-300">
          S. {pageNumber}
        </span>
      )}
      {isGroundTruth && (
        <span className="text-xs px-2 py-0.5 rounded-full bg-amber-50 dark:bg-amber-900/20 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
          GT
@@ -150,9 +150,16 @@ function GroupRow({
            {group.page_count} Seiten
          </div>
        </div>
        <span className="text-xs px-2 py-0.5 rounded-full bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 text-blue-600 dark:text-blue-400">
          Dokument
        </span>
        <div className="flex items-center gap-1.5">
          {group.sessions.some(s => s.is_ground_truth) && (
            <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">
              GT {group.sessions.filter(s => s.is_ground_truth).length}/{group.sessions.length}
            </span>
          )}
          <span className="text-xs px-2 py-0.5 rounded-full bg-blue-50 dark:bg-blue-900/20 border border-blue-200 dark:border-blue-800 text-blue-600 dark:text-blue-400">
            Dokument
          </span>
        </div>
      </div>

      {expanded && (
@@ -179,6 +186,9 @@ function GroupRow({
            />
          </div>
          <span className="truncate flex-1">S. {s.page_number || '?'}</span>
          {s.is_ground_truth && (
            <span className="text-[9px] px-1 py-0.5 rounded bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300">GT</span>
          )}
          <span className="text-[10px] text-gray-400">Step {s.current_step}</span>
          <button
            onClick={(e) => {
@@ -298,7 +308,7 @@ function SessionRow({
        </div>
      </div>

      {/* Category badge */}
      {/* Category + GT badge */}
      <div className="flex flex-col gap-1 items-end flex-shrink-0" onClick={(e) => e.stopPropagation()}>
        <button
          onClick={onToggleCategory}
@@ -311,6 +321,11 @@ function SessionRow({
        >
          {catInfo ? `${catInfo.icon} ${catInfo.label}` : '+ Kategorie'}
        </button>
        {session.is_ground_truth && (
          <span className="text-[10px] px-1.5 py-0.5 rounded-full bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 text-amber-700 dark:text-amber-300" title="Ground Truth markiert">
            GT
          </span>
        )}
      </div>

      {/* Actions */}

241	admin-lehrer/components/ocr-kombi/SpreadsheetView.tsx	Normal file
@@ -0,0 +1,241 @@
'use client'

/**
 * SpreadsheetView — Fortune Sheet with multi-sheet support.
 *
 * Each zone (content + boxes) becomes its own Excel sheet tab,
 * so each can have independent column widths optimized for its content.
 */

import { useMemo } from 'react'
import dynamic from 'next/dynamic'

const Workbook = dynamic(
  () => import('@fortune-sheet/react').then((m) => m.Workbook),
  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
)

import '@fortune-sheet/react/dist/index.css'

import type { GridZone } from '@/components/grid-editor/types'

interface SpreadsheetViewProps {
  gridData: any
  height?: number
}

/** No expansion — keep multi-line cells as single cells with \n and text-wrap. */

/** Convert a single zone to a Fortune Sheet sheet object. */
function zoneToSheet(zone: GridZone, sheetIndex: number, isFirst: boolean): any {
  const isBox = zone.zone_type === 'box'
  const boxColor = (zone as any).box_bg_hex || ''

  // Sheet name
  let name: string
  if (!isBox) {
    name = 'Vokabeln'
  } else {
    const firstText = zone.cells?.[0]?.text ?? `Box ${sheetIndex}`
    const cleaned = firstText.replace(/[^\w\s\u00C0-\u024F„"]/g, '').trim()
    name = cleaned.length > 25 ? cleaned.slice(0, 25) + '…' : cleaned || `Box ${sheetIndex}`
  }

  const numCols = zone.columns?.length || 1
  const numRows = zone.rows?.length || 0
  const expandedCells = zone.cells || []

  // Compute zone-wide median word height for font-size detection
  const allWordHeights = zone.cells
    .flatMap((c: any) => (c.word_boxes || []).map((wb: any) => wb.height || 0))
    .filter((h: number) => h > 0)
  const medianWordH = allWordHeights.length
    ? [...allWordHeights].sort((a, b) => a - b)[Math.floor(allWordHeights.length / 2)]
    : 0

  // Build celldata
  const celldata: any[] = []
  const merges: Record<string, any> = {}

  for (const cell of expandedCells) {
    const r = cell.row_index
    const c = cell.col_index
    // `let`, not `const`: the text may be rewritten with a bullet marker below.
    let text = cell.text ?? ''

    // Row metadata
    const row = zone.rows?.find((rr) => rr.index === r)
    const isHeader = row?.is_header ?? false

    // Font size detection from word_boxes
    const avgWbH = cell.word_boxes?.length
      ? cell.word_boxes.reduce((s: number, wb: any) => s + (wb.height || 0), 0) / cell.word_boxes.length
      : 0
    const isLargerFont = avgWbH > 0 && medianWordH > 0 && avgWbH > medianWordH * 1.3

    const v: any = { v: text, m: text }

    // Bold: headers, is_bold, larger font
    if (cell.is_bold || isHeader || isLargerFont) {
      v.bl = 1
    }

    // Larger font for box titles
    if (isLargerFont && isBox) {
      v.fs = 12
    }

    // Multi-line text (bullets with \n): enable text wrap + vertical top align
    // Add bullet marker (•) if multi-line and no bullet present
    if (text.includes('\n') && !isHeader) {
      if (!text.startsWith('•') && !text.startsWith('-') && !text.startsWith('–') && r > 0) {
        text = '• ' + text
        v.v = text
        v.m = text
      }
      v.tb = '2' // text wrap
      v.vt = 0 // vertical align: top
    }

    // Header row background
    if (isHeader) {
      v.bg = isBox ? `${boxColor || '#2563eb'}18` : '#f0f4ff'
    }

    // Box cells: light tinted background
    if (isBox && !isHeader && boxColor) {
      v.bg = `${boxColor}08`
    }

    // Text color from OCR
    const color = cell.color_override
      ?? cell.word_boxes?.find((wb: any) => wb.color_name && wb.color_name !== 'black')?.color
    if (color) v.fc = color

    celldata.push({ r, c, v })

    // Colspan → merge
    const colspan = cell.colspan || 0
    if (colspan > 1 || cell.col_type === 'spanning_header') {
      const cs = colspan || numCols
      merges[`${r}_${c}`] = { r, c, rs: 1, cs }
    }
  }

  // Column widths — auto-fit based on longest text
  const columnlen: Record<string, number> = {}
  for (const col of (zone.columns || [])) {
    const colCells = expandedCells.filter(
      (c: any) => c.col_index === col.index && c.col_type !== 'spanning_header'
    )
    let maxTextLen = 0
    for (const c of colCells) {
      const len = (c.text ?? '').length
      if (len > maxTextLen) maxTextLen = len
    }
    const autoWidth = Math.max(60, maxTextLen * 7.5 + 16)
    const pxW = (col.x_max_px ?? 0) - (col.x_min_px ?? 0)
    const scaledPxW = Math.max(60, Math.round(pxW * (numCols <= 2 ? 0.6 : 0.4)))
    columnlen[String(col.index)] = Math.round(Math.max(autoWidth, scaledPxW))
  }

  // Row heights — taller for multi-line cells
  const rowlen: Record<string, number> = {}
  for (const row of (zone.rows || [])) {
    const rowCells = expandedCells.filter((c: any) => c.row_index === row.index)
    const maxLines = Math.max(1, ...rowCells.map((c: any) => (c.text ?? '').split('\n').length))
    const baseH = 24
    rowlen[String(row.index)] = Math.max(baseH, baseH * maxLines)
  }

  // Border info
  const borderInfo: any[] = []

  // Box: colored outside border
  if (isBox && boxColor && numRows > 0 && numCols > 0) {
    borderInfo.push({
      rangeType: 'range',
      borderType: 'border-outside',
      color: boxColor,
      style: 5,
      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
    })
    borderInfo.push({
      rangeType: 'range',
      borderType: 'border-inside',
      color: `${boxColor}40`,
      style: 1,
      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
    })
  }

  // Content zone: light grid lines
  if (!isBox && numRows > 0 && numCols > 0) {
    borderInfo.push({
      rangeType: 'range',
      borderType: 'border-all',
      color: '#e5e7eb',
      style: 1,
      range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
    })
  }

  return {
    name,
    id: `zone_${zone.zone_index}`,
    celldata,
    row: numRows,
    column: Math.max(numCols, 1),
    status: isFirst ? 1 : 0,
    color: isBox ? boxColor : undefined,
    config: {
      merge: Object.keys(merges).length > 0 ? merges : undefined,
      columnlen,
      rowlen,
      borderInfo: borderInfo.length > 0 ? borderInfo : undefined,
    },
  }
}

export function SpreadsheetView({ gridData, height = 600 }: SpreadsheetViewProps) {
  const sheets = useMemo(() => {
    if (!gridData?.zones) return []

    const sorted = [...gridData.zones].sort((a: GridZone, b: GridZone) => {
      if (a.zone_type === 'content' && b.zone_type !== 'content') return -1
      if (a.zone_type !== 'content' && b.zone_type === 'content') return 1
      return (a.bbox_px?.y ?? 0) - (b.bbox_px?.y ?? 0)
    })

    return sorted
      .filter((z: GridZone) => z.cells && z.cells.length > 0)
      .map((z: GridZone, i: number) => zoneToSheet(z, i, i === 0))
  }, [gridData])

  const maxRows = Math.max(0, ...sheets.map((s: any) => s.row || 0))
  const estimatedHeight = Math.max(height, maxRows * 26 + 80)

  if (sheets.length === 0) {
    return <div className="p-4 text-center text-gray-400">Keine Daten für Spreadsheet.</div>
  }

  return (
    <div style={{ width: '100%', height: `${estimatedHeight}px` }}>
      <Workbook
        data={sheets}
        lang="en"
        showToolbar
        showFormulaBar={false}
        showSheetTabs
        toolbarItems={[
          'undo', 'redo', '|',
          'font-bold', 'font-italic', 'font-strikethrough', '|',
          'font-color', 'background', '|',
          'font-size', '|',
          'horizontal-align', 'vertical-align', '|',
          'text-wrap', 'merge-cell', '|',
          'border',
        ]}
      />
    </div>
  )
}
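The column sizing in `zoneToSheet` combines a text-length heuristic with the measured pixel width of the column. Extracted as a standalone function (a sketch — the constants mirror the listing above; `columnWidth` itself is not part of the file):

```typescript
// Auto-fit width: roughly 7.5 px per character plus padding, never narrower
// than 60 px, and never narrower than the (scaled) measured column width.
function columnWidth(maxTextLen: number, measuredPxW: number, numCols: number): number {
  const autoWidth = Math.max(60, maxTextLen * 7.5 + 16)
  // Wide layouts (more than 2 columns) scale the measured width down harder.
  const scaledPxW = Math.max(60, Math.round(measuredPxW * (numCols <= 2 ? 0.6 : 0.4)))
  return Math.round(Math.max(autoWidth, scaledPxW))
}
```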
110	admin-lehrer/components/ocr-kombi/StepAnsicht.tsx	Normal file
@@ -0,0 +1,110 @@
'use client'

/**
 * StepAnsicht — Excel-like Spreadsheet View.
 *
 * Left: Original scan with OCR word overlay
 * Right: Fortune Sheet spreadsheet with multi-sheet tabs per zone
 */

import { useEffect, useRef, useState } from 'react'
import dynamic from 'next/dynamic'

const SpreadsheetView = dynamic(
  () => import('./SpreadsheetView').then((m) => m.SpreadsheetView),
  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
)

const KLAUSUR_API = '/klausur-api'

interface StepAnsichtProps {
  sessionId: string | null
  onNext: () => void
}

export function StepAnsicht({ sessionId, onNext }: StepAnsichtProps) {
  const [gridData, setGridData] = useState<any>(null)
  const [loading, setLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const leftRef = useRef<HTMLDivElement>(null)
  const [leftHeight, setLeftHeight] = useState(600)

  // Load grid data on mount
  useEffect(() => {
    if (!sessionId) return
    ;(async () => {
      try {
        const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`)
        if (!res.ok) throw new Error(`HTTP ${res.status}`)
        setGridData(await res.json())
      } catch (e) {
        setError(e instanceof Error ? e.message : 'Fehler beim Laden')
      } finally {
        setLoading(false)
      }
    })()
  }, [sessionId])

  // Track left panel height
  useEffect(() => {
    if (!leftRef.current) return
    const ro = new ResizeObserver(([e]) => setLeftHeight(e.contentRect.height))
    ro.observe(leftRef.current)
    return () => ro.disconnect()
  }, [])

  if (loading) {
    return (
      <div className="flex items-center justify-center py-16">
        <div className="w-8 h-8 border-4 border-teal-500 border-t-transparent rounded-full animate-spin" />
        <span className="ml-3 text-gray-500">Lade Spreadsheet...</span>
      </div>
    )
  }

  if (error || !gridData) {
    return (
      <div className="p-8 text-center">
        <p className="text-red-500 mb-4">{error || 'Keine Grid-Daten.'}</p>
        <button onClick={onNext} className="px-5 py-2 bg-teal-600 text-white rounded-lg">Weiter →</button>
      </div>
    )
  }

  return (
    <div className="space-y-3">
      {/* Header */}
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">Ansicht — Spreadsheet</h3>
          <p className="text-sm text-gray-500 dark:text-gray-400">
            Jede Zone als eigenes Sheet-Tab. Spaltenbreiten pro Sheet optimiert.
          </p>
        </div>
        <button onClick={onNext} className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 text-sm font-medium">
          Weiter →
        </button>
      </div>

      {/* Split view */}
      <div className="flex gap-2">
        {/* LEFT: Original + OCR overlay */}
        <div ref={leftRef} className="w-1/3 border border-gray-300 dark:border-gray-600 rounded-lg overflow-hidden bg-white dark:bg-gray-900 flex-shrink-0">
          <div className="px-2 py-1 bg-black/60 text-white text-[10px] font-medium">Original + OCR</div>
          {sessionId && (
            <img
              src={`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/words-overlay`}
              alt="Original + OCR"
              className="w-full h-auto"
            />
          )}
        </div>

        {/* RIGHT: Fortune Sheet — height adapts to content */}
        <div className="flex-1 border border-gray-300 dark:border-gray-600 rounded-lg overflow-hidden bg-white dark:bg-gray-900">
          <SpreadsheetView gridData={gridData} height={Math.max(700, leftHeight)} />
        </div>
      </div>
    </div>
  )
}
283
admin-lehrer/components/ocr-kombi/StepBoxGridReview.tsx
Normal file
@@ -0,0 +1,283 @@
'use client'

import { useCallback, useEffect, useRef, useState } from 'react'
import { useGridEditor } from '@/components/grid-editor/useGridEditor'
import type { GridZone } from '@/components/grid-editor/types'
import { GridTable } from '@/components/grid-editor/GridTable'

const KLAUSUR_API = '/klausur-api'

type BoxLayoutType = 'flowing' | 'columnar' | 'bullet_list' | 'header_only'

const LAYOUT_LABELS: Record<BoxLayoutType, string> = {
  flowing: 'Fließtext',
  columnar: 'Tabelle/Spalten',
  bullet_list: 'Aufzählung',
  header_only: 'Überschrift',
}

interface StepBoxGridReviewProps {
  sessionId: string | null
  onNext: () => void
}

export function StepBoxGridReview({ sessionId, onNext }: StepBoxGridReviewProps) {
  const {
    grid,
    loading,
    saving,
    error,
    dirty,
    selectedCell,
    setSelectedCell,
    loadGrid,
    saveGrid,
    updateCellText,
    toggleColumnBold,
    toggleRowHeader,
    undo,
    redo,
    canUndo,
    canRedo,
    getAdjacentCell,
    commitUndoPoint,
    selectedCells,
    toggleCellSelection,
    clearCellSelection,
    toggleSelectedBold,
    setCellColor,
    deleteColumn,
    addColumn,
    deleteRow,
    addRow,
  } = useGridEditor(sessionId)

  const [building, setBuilding] = useState(false)
  const [buildError, setBuildError] = useState<string | null>(null)

  // Load grid on mount
  useEffect(() => {
    if (sessionId) loadGrid()
  }, [sessionId]) // eslint-disable-line react-hooks/exhaustive-deps

  // Get box zones
  const boxZones: GridZone[] = (grid?.zones || []).filter(
    (z: GridZone) => z.zone_type === 'box'
  )

  // Build box grids via backend
  const buildBoxGrids = useCallback(async (overrides?: Record<string, string>) => {
    if (!sessionId) return
    setBuilding(true)
    setBuildError(null)
    try {
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/build-box-grids`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ overrides: overrides || {} }),
        },
      )
      if (!res.ok) {
        const data = await res.json().catch(() => ({}))
        throw new Error(data.detail || `HTTP ${res.status}`)
      }
      await loadGrid()
    } catch (e) {
      setBuildError(e instanceof Error ? e.message : String(e))
    } finally {
      setBuilding(false)
    }
  }, [sessionId, loadGrid])

  // Handle layout type change for a specific box zone
  const changeLayoutType = useCallback(async (boxIdx: number, layoutType: string) => {
    await buildBoxGrids({ [String(boxIdx)]: layoutType })
  }, [buildBoxGrids])

  // Auto-build once on first load if box zones have no cells
  const autoBuildDone = useRef(false)
  useEffect(() => {
    if (!grid || loading || building || autoBuildDone.current) return
    const needsBuild = boxZones.some(z => !z.cells || z.cells.length === 0)
    if (needsBuild && sessionId) {
      autoBuildDone.current = true
      buildBoxGrids()
    }
  }, [grid, loading]) // eslint-disable-line react-hooks/exhaustive-deps

  if (loading) {
    return (
      <div className="flex items-center justify-center py-16">
        <div className="w-8 h-8 border-4 border-teal-500 border-t-transparent rounded-full animate-spin" />
        <span className="ml-3 text-gray-500">Lade Grid...</span>
      </div>
    )
  }

  // No boxes after build attempt — skip step
  if (!building && boxZones.length === 0) {
    return (
      <div className="bg-white dark:bg-gray-800 rounded-xl border border-gray-200 dark:border-gray-700 p-8 text-center">
        <div className="text-4xl mb-3">📦</div>
        <h3 className="text-lg font-semibold text-gray-900 dark:text-white mb-2">
          Keine Boxen erkannt
        </h3>
        <p className="text-gray-500 dark:text-gray-400 mb-6">
          Auf dieser Seite wurden keine eingebetteten Boxen (Grammatik-Tipps, Übungen etc.) erkannt.
        </p>
        <button
          onClick={onNext}
          className="px-6 py-2.5 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors font-medium"
        >
          Weiter →
        </button>
      </div>
    )
  }

  return (
    <div className="space-y-4">
      {/* Header */}
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">
            Box-Review ({boxZones.length} {boxZones.length === 1 ? 'Box' : 'Boxen'})
          </h3>
          <p className="text-sm text-gray-500 dark:text-gray-400">
            Eingebettete Boxen prüfen und korrigieren. Layout-Typ kann pro Box angepasst werden.
          </p>
        </div>
        <div className="flex items-center gap-2">
          {dirty && (
            <button
              onClick={saveGrid}
              disabled={saving}
              className="px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition-colors text-sm font-medium disabled:opacity-50"
            >
              {saving ? 'Speichere...' : 'Speichern'}
            </button>
          )}
          <button
            onClick={() => buildBoxGrids()}
            disabled={building}
            className="px-4 py-2 bg-amber-600 text-white rounded-lg hover:bg-amber-700 transition-colors text-sm font-medium disabled:opacity-50"
          >
            {building ? 'Verarbeite...' : 'Alle Boxen neu aufbauen'}
          </button>
          <button
            onClick={async () => {
              if (dirty) await saveGrid()
              onNext()
            }}
            className="px-5 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors text-sm font-medium"
          >
            Weiter →
          </button>
        </div>
      </div>

      {/* Errors */}
      {(error || buildError) && (
        <div className="p-3 bg-red-50 dark:bg-red-900/30 border border-red-200 dark:border-red-800 rounded-lg text-red-700 dark:text-red-300 text-sm">
          {error || buildError}
        </div>
      )}

      {building && (
        <div className="flex items-center gap-3 p-4 bg-amber-50 dark:bg-amber-900/20 border border-amber-200 dark:border-amber-800 rounded-lg">
          <div className="w-5 h-5 border-2 border-amber-500 border-t-transparent rounded-full animate-spin" />
          <span className="text-amber-700 dark:text-amber-300 text-sm">Box-Grids werden aufgebaut...</span>
        </div>
      )}

      {/* Box zones */}
      {boxZones.map((zone, boxIdx) => {
        const boxColor = zone.box_bg_hex || '#d97706' // amber fallback
        const boxColorName = zone.box_bg_color || 'box'
        return (
          <div
            key={zone.zone_index}
            className="bg-white dark:bg-gray-800 rounded-xl overflow-hidden"
            style={{ border: `3px solid ${boxColor}` }}
          >
            {/* Box header */}
            <div
              className="flex items-center justify-between px-4 py-3 border-b"
              style={{ backgroundColor: `${boxColor}15`, borderColor: `${boxColor}30` }}
            >
              <div className="flex items-center gap-3">
                <div
                  className="w-8 h-8 rounded-lg flex items-center justify-center text-white text-sm font-bold"
                  style={{ backgroundColor: boxColor }}
                >
                  {boxIdx + 1}
                </div>
                <div>
                  <span className="font-medium text-gray-900 dark:text-white">
                    Box {boxIdx + 1}
                  </span>
                  <span className="text-xs text-gray-500 dark:text-gray-400 ml-2">
                    {zone.bbox_px?.w}x{zone.bbox_px?.h}px
                    {zone.cells?.length ? ` | ${zone.cells.length} Zellen` : ''}
                    {zone.box_layout_type ? ` | ${LAYOUT_LABELS[zone.box_layout_type as BoxLayoutType] || zone.box_layout_type}` : ''}
                    {boxColorName !== 'box' ? ` | ${boxColorName}` : ''}
                  </span>
                </div>
              </div>
              <div className="flex items-center gap-2">
                <label className="text-xs text-gray-500 dark:text-gray-400">Layout:</label>
                <select
                  value={zone.box_layout_type || 'flowing'}
                  onChange={(e) => changeLayoutType(boxIdx, e.target.value)}
                  disabled={building}
                  className="text-xs px-2 py-1 rounded border border-gray-300 dark:border-gray-600 bg-white dark:bg-gray-700 text-gray-700 dark:text-gray-200"
                >
                  {Object.entries(LAYOUT_LABELS).map(([key, label]) => (
                    <option key={key} value={key}>{label}</option>
                  ))}
                </select>
              </div>
            </div>

            {/* Box grid table */}
            <div className="p-3">
              {zone.cells && zone.cells.length > 0 ? (
                <GridTable
                  zone={zone}
                  selectedCell={selectedCell}
                  selectedCells={selectedCells}
                  onSelectCell={setSelectedCell}
                  onCellTextChange={updateCellText}
                  onToggleColumnBold={toggleColumnBold}
                  onToggleRowHeader={toggleRowHeader}
                  onNavigate={(cellId, dir) => {
                    const next = getAdjacentCell(cellId, dir)
                    if (next) setSelectedCell(next)
                  }}
                  onDeleteColumn={deleteColumn}
                  onAddColumn={addColumn}
                  onDeleteRow={deleteRow}
                  onAddRow={addRow}
                  onToggleCellSelection={toggleCellSelection}
                  onSetCellColor={setCellColor}
                />
              ) : (
                <div className="text-center py-8 text-gray-400">
                  <p className="text-sm">Keine Zellen erkannt.</p>
                  <button
                    onClick={() => buildBoxGrids({ [String(boxIdx)]: 'flowing' })}
                    className="mt-2 text-xs text-amber-600 hover:text-amber-700"
                  >
                    Als Fließtext verarbeiten
                  </button>
                </div>
              )}
            </div>
          </div>
        )
      })}
    </div>
  )
}
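The auto-build effect above fires at most once (guarded by `autoBuildDone`) and only when at least one box zone has no cells yet. A minimal sketch of that predicate as a pure, testable function — `needsBoxBuild` and `ZoneLike` are hypothetical names, not part of the component:

```typescript
// Minimal shape of a zone for this check (assumption for the sketch).
interface ZoneLike {
  cells?: unknown[] | null
}

// A build is needed when any zone has no cells at all or an empty cell list,
// mirroring `boxZones.some(z => !z.cells || z.cells.length === 0)` above.
export function needsBoxBuild(zones: ZoneLike[]): boolean {
  return zones.some(z => !z.cells || z.cells.length === 0)
}
```

Keeping the guard in a `useRef` rather than state avoids an extra render and prevents the effect from re-triggering after `loadGrid()` refreshes `grid`.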
@@ -32,8 +32,10 @@ export function StepGridBuild({ sessionId, onNext }: StepGridBuildProps) {
      const res = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/grid-editor`)
      if (res.ok) {
        const data = await res.json()
        if (data.grid_shape) {
          setResult({ rows: data.grid_shape.rows, cols: data.grid_shape.cols, cells: data.grid_shape.total_cells })
        // Use grid-editor summary (accurate zone-based counts)
        const summary = data.summary
        if (summary) {
          setResult({ rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 })
          return
        }
      }
@@ -57,8 +59,14 @@ export function StepGridBuild({ sessionId, onNext }: StepGridBuildProps) {
        throw new Error(data.detail || `Grid-Build fehlgeschlagen (${res.status})`)
      }
      const data = await res.json()
      const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
      setResult({ rows: shape.rows, cols: shape.cols, cells: shape.total_cells })
      // Use grid-editor summary (zone-based, more accurate than word_result.grid_shape)
      const summary = data.summary
      if (summary) {
        setResult({ rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 })
      } else {
        const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
        setResult({ rows: shape.rows, cols: shape.cols, cells: shape.total_cells })
      }
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e))
    } finally {
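Both hunks above change the same thing: the zone-based `data.summary` is now preferred over the word-level `data.grid_shape`, with the shape kept only as a fallback. A sketch of that fallback order as a pure function (`resolveGridCounts` and the response types are assumed names for illustration):

```typescript
// Assumed shape of the relevant response fields, inferred from the diff.
interface GridBuildResponse {
  summary?: { total_rows?: number; total_columns?: number; total_cells?: number }
  grid_shape?: { rows: number; cols: number; total_cells: number }
}

// Prefer the zone-based summary; fall back to word-level grid_shape; else zeros.
export function resolveGridCounts(data: GridBuildResponse): { rows: number; cols: number; cells: number } {
  const summary = data.summary
  if (summary) {
    return { rows: summary.total_rows || 0, cols: summary.total_columns || 0, cells: summary.total_cells || 0 }
  }
  const shape = data.grid_shape || { rows: 0, cols: 0, total_cells: 0 }
  return { rows: shape.rows, cols: shape.cols, cells: shape.total_cells }
}
```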
@@ -1,6 +1,10 @@
'use client'

import { useState } from 'react'
import { useCallback, useEffect, useRef, useState } from 'react'
import { useGridEditor } from '@/components/grid-editor/useGridEditor'
import { GridTable } from '@/components/grid-editor/GridTable'
import { ImageLayoutEditor } from '@/components/grid-editor/ImageLayoutEditor'
import type { GridZone } from '@/components/grid-editor/types'

const KLAUSUR_API = '/klausur-api'

@@ -12,22 +16,104 @@ interface StepGroundTruthProps {
}

/**
 * Step 11: Ground Truth marking.
 * Saves the current grid as reference data for regression tests.
 * Step 12: Ground Truth marking.
 *
 * Shows the full Grid-Review view (original image + table) so the user
 * can verify the final result before marking as Ground Truth reference.
 */
export function StepGroundTruth({ sessionId, isGroundTruth, onMarked, gridSaveRef }: StepGroundTruthProps) {
  const [saving, setSaving] = useState(false)
  const {
    grid,
    loading,
    saving,
    error,
    dirty,
    selectedCell,
    selectedCells,
    setSelectedCell,
    loadGrid,
    saveGrid,
    updateCellText,
    toggleColumnBold,
    toggleRowHeader,
    undo,
    redo,
    canUndo,
    canRedo,
    getAdjacentCell,
    deleteColumn,
    addColumn,
    deleteRow,
    addRow,
    toggleCellSelection,
    clearCellSelection,
    toggleSelectedBold,
    setCellColor,
  } = useGridEditor(sessionId)

  const [showImage, setShowImage] = useState(true)
  const [zoom, setZoom] = useState(100)
  const [markSaving, setMarkSaving] = useState(false)
  const [message, setMessage] = useState('')

  // Expose save function via ref
  useEffect(() => {
    if (gridSaveRef) {
      gridSaveRef.current = async () => {
        if (dirty) await saveGrid()
      }
      return () => { gridSaveRef.current = null }
    }
  }, [gridSaveRef, dirty, saveGrid])

  // Load grid on mount
  useEffect(() => {
    if (sessionId) loadGrid()
  }, [sessionId, loadGrid])

  // Keyboard shortcuts
  useEffect(() => {
    const handler = (e: KeyboardEvent) => {
      if ((e.metaKey || e.ctrlKey) && e.key === 'z' && !e.shiftKey) {
        e.preventDefault(); undo()
      } else if ((e.metaKey || e.ctrlKey) && e.key === 'z' && e.shiftKey) {
        e.preventDefault(); redo()
      } else if ((e.metaKey || e.ctrlKey) && e.key === 's') {
        e.preventDefault(); saveGrid()
      } else if ((e.metaKey || e.ctrlKey) && e.key === 'b') {
        e.preventDefault()
        if (selectedCells.size > 0) toggleSelectedBold()
      } else if (e.key === 'Escape') {
        clearCellSelection()
      }
    }
    window.addEventListener('keydown', handler)
    return () => window.removeEventListener('keydown', handler)
  }, [undo, redo, saveGrid, selectedCells, toggleSelectedBold, clearCellSelection])

  const handleNavigate = useCallback(
    (cellId: string, direction: 'up' | 'down' | 'left' | 'right') => {
      const target = getAdjacentCell(cellId, direction)
      if (target) {
        setSelectedCell(target)
        setTimeout(() => {
          const el = document.getElementById(`cell-${target}`)
          if (el) {
            el.focus()
            if (el instanceof HTMLInputElement) el.select()
          }
        }, 0)
      }
    },
    [getAdjacentCell, setSelectedCell],
  )

  const handleMark = async () => {
    if (!sessionId) return
    setSaving(true)
    setMarkSaving(true)
    setMessage('')
    try {
      // Auto-save grid editor before marking
      if (gridSaveRef.current) {
        await gridSaveRef.current()
      }
      if (dirty) await saveGrid()
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/mark-ground-truth?pipeline=kombi`,
        { method: 'POST' },
@@ -42,33 +128,168 @@ export function StepGroundTruth({ sessionId, isGroundTruth, onMarked, gridSaveRe
    } catch (e) {
      setMessage(e instanceof Error ? e.message : String(e))
    } finally {
      setSaving(false)
      setMarkSaving(false)
    }
  }

  return (
    <div className="space-y-4 p-6 bg-amber-50 dark:bg-amber-900/10 rounded-xl border border-amber-200 dark:border-amber-800">
      <h3 className="text-sm font-medium text-amber-700 dark:text-amber-300">
        Ground Truth
      </h3>
      <p className="text-sm text-amber-600 dark:text-amber-400">
        Markiert die aktuelle Grid-Ausgabe als Referenz fuer Regressionstests.
        {isGroundTruth && ' Diese Session ist bereits als Ground Truth markiert.'}
      </p>
  if (!sessionId) {
    return <div className="text-center py-12 text-gray-400">Keine Session ausgewaehlt.</div>
  }

      <button
        onClick={handleMark}
        disabled={saving}
        className="px-4 py-2 text-sm bg-amber-600 text-white rounded-lg hover:bg-amber-700 disabled:opacity-50"
      >
        {saving ? 'Speichere...' : isGroundTruth ? 'Ground Truth aktualisieren' : 'Als Ground Truth markieren'}
      </button>
  if (loading) {
    return (
      <div className="flex items-center justify-center py-16">
        <div className="flex items-center gap-3 text-gray-500 dark:text-gray-400">
          <svg className="w-5 h-5 animate-spin" fill="none" viewBox="0 0 24 24">
            <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
            <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
          </svg>
          Grid wird geladen...
        </div>
      </div>
    )
  }

  if (error) {
    return (
      <div className="bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg p-4">
        <p className="text-sm text-red-700 dark:text-red-300">Fehler: {error}</p>
      </div>
    )
  }

  if (!grid || !grid.zones.length) {
    return <div className="text-center py-12 text-gray-400">Kein Grid vorhanden.</div>
  }

  const imageUrl = `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/image/cropped`

  return (
    <div className="space-y-3">
      {/* GT Header Bar */}
      <div className="flex items-center justify-between p-3 bg-amber-50 dark:bg-amber-900/10 rounded-xl border border-amber-200 dark:border-amber-800">
        <div>
          <h3 className="text-sm font-medium text-amber-700 dark:text-amber-300">
            Ground Truth
            {isGroundTruth && <span className="ml-2 text-xs font-normal text-amber-500">(bereits markiert)</span>}
          </h3>
          <p className="text-xs text-amber-600 dark:text-amber-400 mt-0.5">
            Pruefen Sie das Ergebnis und markieren Sie es als Referenz fuer Regressionstests.
          </p>
        </div>
        <div className="flex items-center gap-2">
          {dirty && (
            <button
              onClick={saveGrid}
              disabled={saving}
              className="px-3 py-1.5 text-xs bg-teal-600 text-white rounded-lg hover:bg-teal-700 disabled:opacity-50"
            >
              {saving ? 'Speichere...' : 'Speichern'}
            </button>
          )}
          <button
            onClick={handleMark}
            disabled={markSaving}
            className="px-4 py-1.5 text-xs bg-amber-600 text-white rounded-lg hover:bg-amber-700 disabled:opacity-50"
          >
            {markSaving ? 'Speichere...' : isGroundTruth ? 'GT aktualisieren' : 'Als Ground Truth markieren'}
          </button>
        </div>
      </div>

      {message && (
        <div className={`text-sm ${message.includes('fehlgeschlagen') ? 'text-red-500' : 'text-amber-600 dark:text-amber-400'}`}>
        <div className={`text-sm p-2 rounded ${message.includes('fehlgeschlagen') ? 'text-red-500 bg-red-50 dark:bg-red-900/20' : 'text-amber-600 dark:text-amber-400 bg-amber-50 dark:bg-amber-900/10'}`}>
          {message}
        </div>
      )}

      {/* Stats */}
      <div className="flex items-center gap-4 text-xs flex-wrap">
        <span className="text-gray-500 dark:text-gray-400">
          {grid.summary.total_zones} Zone(n), {grid.summary.total_columns} Spalten,{' '}
          {grid.summary.total_rows} Zeilen, {grid.summary.total_cells} Zellen
        </span>
        <button
          onClick={() => setShowImage(!showImage)}
          className={`px-2.5 py-1 rounded text-xs border transition-colors ${
            showImage
              ? 'bg-teal-50 dark:bg-teal-900/30 border-teal-200 dark:border-teal-700 text-teal-700 dark:text-teal-300'
              : 'bg-gray-50 dark:bg-gray-800 border-gray-200 dark:border-gray-700 text-gray-500 dark:text-gray-400'
          }`}
        >
          {showImage ? 'Bild ausblenden' : 'Bild einblenden'}
        </button>
      </div>

      {/* Split View: Image left + Grid right */}
      <div className={showImage ? 'grid grid-cols-2 gap-3' : ''} style={{ minHeight: '55vh' }}>
        {showImage && (
          <ImageLayoutEditor
            imageUrl={imageUrl}
            zones={grid.zones}
            imageWidth={grid.image_width}
            layoutDividers={grid.layout_dividers}
            zoom={zoom}
            onZoomChange={setZoom}
            onColumnDividerMove={() => {}}
            onHorizontalsChange={() => {}}
            onCommitUndo={() => {}}
            onSplitColumnAt={() => {}}
            onDeleteColumn={() => {}}
          />
        )}

        <div className="space-y-3">
          {(() => {
            const groups: GridZone[][] = []
            for (const zone of grid.zones) {
              const prev = groups[groups.length - 1]
              if (prev && zone.vsplit_group != null && prev[0].vsplit_group === zone.vsplit_group) {
                prev.push(zone)
              } else {
                groups.push([zone])
              }
            }
            return groups.map((group) => (
              <div key={group[0].vsplit_group ?? group[0].zone_index}>
                <div className={`${group.length > 1 ? 'flex gap-2' : ''}`}>
                  {group.map((zone) => (
                    <div
                      key={zone.zone_index}
                      className={`${group.length > 1 ? 'flex-1 min-w-0' : ''} bg-white dark:bg-gray-800 rounded-lg border border-gray-200 dark:border-gray-700`}
                    >
                      <GridTable
                        zone={zone}
                        layoutMetrics={grid.layout_metrics}
                        selectedCell={selectedCell}
                        selectedCells={selectedCells}
                        onSelectCell={setSelectedCell}
                        onToggleCellSelection={toggleCellSelection}
                        onCellTextChange={updateCellText}
                        onToggleColumnBold={toggleColumnBold}
                        onToggleRowHeader={toggleRowHeader}
                        onNavigate={handleNavigate}
                        onDeleteColumn={deleteColumn}
                        onAddColumn={addColumn}
                        onDeleteRow={deleteRow}
                        onAddRow={addRow}
                        onSetCellColor={setCellColor}
                      />
                    </div>
                  ))}
                </div>
              </div>
            ))
          })()}
        </div>
      </div>

      {/* Keyboard tips */}
      <div className="text-[11px] text-gray-400 dark:text-gray-500 flex items-center gap-4">
        <span>Tab: naechste Zelle</span>
        <span>Ctrl+Z/Y: Undo/Redo</span>
        <span>Ctrl+S: Speichern</span>
      </div>
    </div>
  )
}
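The inline IIFE above groups consecutive zones that share a non-null `vsplit_group` so they render side by side; every other zone gets its own row. The same logic as a standalone, testable function (`groupZones` and `ZoneLite` are names chosen for this sketch):

```typescript
// Minimal zone shape needed for grouping (sketch assumption).
interface ZoneLite {
  zone_index: number
  vsplit_group?: number | null
}

// Consecutive zones with the same non-null vsplit_group join the previous
// group; zones with a null/undefined group always start a new one.
export function groupZones(zones: ZoneLite[]): ZoneLite[][] {
  const groups: ZoneLite[][] = []
  for (const zone of zones) {
    const prev = groups[groups.length - 1]
    if (prev && zone.vsplit_group != null && prev[0].vsplit_group === zone.vsplit_group) {
      prev.push(zone)
    } else {
      groups.push([zone])
    }
  }
  return groups
}
```

Note that only the group's first zone is compared, so an interleaved zone from a different group correctly starts a fresh group rather than merging across the gap.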
422
admin-lehrer/components/ocr-kombi/StepGutterRepair.tsx
Normal file
@@ -0,0 +1,422 @@
'use client'

import { useState, useEffect, useCallback } from 'react'

const KLAUSUR_API = '/klausur-api'

interface GutterSuggestion {
  id: string
  type: 'hyphen_join' | 'spell_fix'
  zone_index: number
  row_index: number
  col_index: number
  col_type: string
  cell_id: string
  original_text: string
  suggested_text: string
  next_row_index: number
  next_row_cell_id: string
  next_row_text: string
  missing_chars: string
  display_parts: string[]
  alternatives: string[]
  confidence: number
  reason: string
}

interface GutterRepairResult {
  suggestions: GutterSuggestion[]
  stats: {
    words_checked: number
    gutter_candidates: number
    suggestions_found: number
    error?: string
  }
  duration_seconds: number
}

interface StepGutterRepairProps {
  sessionId: string | null
  onNext: () => void
}

/**
 * Step 11: Gutter Repair (Wortkorrektur).
 * Detects words truncated at the book gutter and proposes corrections.
 * User can accept/reject each suggestion individually or in batch.
 */
export function StepGutterRepair({ sessionId, onNext }: StepGutterRepairProps) {
  const [loading, setLoading] = useState(false)
  const [applying, setApplying] = useState(false)
  const [result, setResult] = useState<GutterRepairResult | null>(null)
  const [accepted, setAccepted] = useState<Set<string>>(new Set())
  const [rejected, setRejected] = useState<Set<string>>(new Set())
  const [selectedText, setSelectedText] = useState<Record<string, string>>({})
  const [applied, setApplied] = useState(false)
  const [error, setError] = useState('')
  const [applyMessage, setApplyMessage] = useState('')

  const analyse = useCallback(async () => {
    if (!sessionId) return
    setLoading(true)
    setError('')
    setApplied(false)
    setApplyMessage('')
    try {
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/gutter-repair`,
        { method: 'POST' },
      )
      if (!res.ok) {
        const body = await res.json().catch(() => ({}))
        throw new Error(body.detail || `Analyse fehlgeschlagen (${res.status})`)
      }
      const data: GutterRepairResult = await res.json()
      setResult(data)
      // Auto-accept all suggestions with high confidence
      const autoAccept = new Set<string>()
      for (const s of data.suggestions) {
        if (s.confidence >= 0.85) {
          autoAccept.add(s.id)
        }
      }
      setAccepted(autoAccept)
      setRejected(new Set())
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e))
    } finally {
      setLoading(false)
    }
  }, [sessionId])

  // Auto-trigger analysis on mount
  useEffect(() => {
    if (sessionId) analyse()
  }, [sessionId, analyse])

  const toggleSuggestion = (id: string) => {
    setAccepted(prev => {
      const next = new Set(prev)
      if (next.has(id)) {
        next.delete(id)
        setRejected(r => new Set(r).add(id))
      } else {
        next.add(id)
        setRejected(r => { const n = new Set(r); n.delete(id); return n })
      }
      return next
    })
  }

  const acceptAll = () => {
    if (!result) return
    setAccepted(new Set(result.suggestions.map(s => s.id)))
    setRejected(new Set())
  }

  const rejectAll = () => {
    if (!result) return
    setRejected(new Set(result.suggestions.map(s => s.id)))
    setAccepted(new Set())
  }
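`toggleSuggestion` keeps the two sets mutually exclusive: an id is moved between `accepted` and `rejected` on every click. A pure sketch of that invariant, decoupled from React state (the `Review` type and `toggleReview` name are assumptions for the sketch, not the component's API):

```typescript
// Hypothetical immutable version of the accept/reject toggle above.
export interface Review {
  accepted: Set<string>
  rejected: Set<string>
}

// Toggling an accepted id rejects it; toggling anything else accepts it.
// An id can never end up in both sets.
export function toggleReview(review: Review, id: string): Review {
  const accepted = new Set(review.accepted)
  const rejected = new Set(review.rejected)
  if (accepted.has(id)) {
    accepted.delete(id)
    rejected.add(id)
  } else {
    accepted.add(id)
    rejected.delete(id)
  }
  return { accepted, rejected }
}
```

Returning fresh `Set` instances matches how the component calls `setAccepted`/`setRejected` with copies, which is what makes React re-render on each toggle.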
  const applyAccepted = async () => {
    if (!sessionId || accepted.size === 0) return
    setApplying(true)
    setApplyMessage('')
    try {
      const res = await fetch(
        `${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}/gutter-repair/apply`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            accepted: Array.from(accepted),
            text_overrides: selectedText,
          }),
        },
      )
      if (!res.ok) {
        const body = await res.json().catch(() => ({}))
        throw new Error(body.detail || `Anwenden fehlgeschlagen (${res.status})`)
      }
      const data = await res.json()
      setApplied(true)
      setApplyMessage(`${data.applied_count} Korrektur(en) angewendet.`)
    } catch (e) {
      setApplyMessage(e instanceof Error ? e.message : String(e))
    } finally {
      setApplying(false)
    }
  }

  const suggestions = result?.suggestions || []
  const hasSuggestions = suggestions.length > 0

  return (
    <div className="space-y-4">
      {/* Header */}
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-sm font-medium text-gray-700 dark:text-gray-300">
            Wortkorrektur (Buchfalz)
          </h3>
          <p className="text-xs text-gray-500 dark:text-gray-400 mt-1">
            Erkennt abgeschnittene oder unscharfe Woerter am Buchfalz und Bindestrich-Trennungen ueber Zeilen hinweg.
          </p>
        </div>
        {result && !loading && (
          <button
            onClick={analyse}
            className="px-3 py-1.5 text-xs bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 rounded-lg hover:bg-gray-200 dark:hover:bg-gray-600"
          >
            Erneut analysieren
          </button>
        )}
      </div>

      {/* Loading */}
      {loading && (
        <div className="flex items-center gap-3 p-6 bg-blue-50 dark:bg-blue-900/20 rounded-xl border border-blue-200 dark:border-blue-800">
          <div className="animate-spin w-5 h-5 border-2 border-blue-400 border-t-transparent rounded-full" />
          <span className="text-sm text-blue-600 dark:text-blue-400">Analysiere Woerter am Buchfalz...</span>
        </div>
      )}

      {/* Error */}
      {error && (
        <div className="space-y-3">
          <div className="text-sm text-red-500 bg-red-50 dark:bg-red-900/20 p-3 rounded-lg">
            {error}
          </div>
          <button
            onClick={analyse}
            className="px-4 py-2 bg-orange-600 text-white text-sm rounded-lg hover:bg-orange-700"
          >
            Erneut versuchen
          </button>
        </div>
      )}

      {/* No suggestions */}
      {result && !hasSuggestions && !loading && (
        <div className="p-4 bg-green-50 dark:bg-green-900/20 rounded-xl border border-green-200 dark:border-green-800">
          <div className="text-sm font-medium text-green-700 dark:text-green-300">
            Keine Buchfalz-Fehler erkannt.
          </div>
          <div className="text-xs text-green-600 dark:text-green-400 mt-1">
            {result.stats.words_checked} Woerter geprueft, {result.stats.gutter_candidates} Kandidaten am Rand analysiert.
          </div>
        </div>
      )}

      {/* Suggestions list */}
      {hasSuggestions && !loading && (
        <>
          {/* Stats bar */}
          <div className="flex items-center justify-between p-3 bg-gray-50 dark:bg-gray-800 rounded-lg">
            <div className="text-xs text-gray-500 dark:text-gray-400">
              {suggestions.length} Vorschlag/Vorschlaege ·{' '}
              {result!.stats.words_checked} Woerter geprueft ·{' '}
              {result!.duration_seconds}s
            </div>
            <div className="flex gap-2">
              <button
                onClick={acceptAll}
                disabled={applied}
                className="px-2 py-1 text-xs bg-green-100 dark:bg-green-900/30 text-green-700 dark:text-green-300 rounded hover:bg-green-200 dark:hover:bg-green-900/50 disabled:opacity-50"
              >
                Alle akzeptieren
              </button>
              <button
                onClick={rejectAll}
                disabled={applied}
                className="px-2 py-1 text-xs bg-red-100 dark:bg-red-900/30 text-red-700 dark:text-red-300 rounded hover:bg-red-200 dark:hover:bg-red-900/50 disabled:opacity-50"
              >
                Alle ablehnen
              </button>
            </div>
          </div>

          {/* Suggestion cards */}
          <div className="space-y-2">
            {suggestions.map((s) => {
              const isAccepted = accepted.has(s.id)
              const isRejected = rejected.has(s.id)

              return (
                <div
                  key={s.id}
                  className={`p-3 rounded-lg border transition-colors ${
                    applied
                      ? isAccepted
                        ? 'bg-green-50 dark:bg-green-900/10 border-green-200 dark:border-green-800'
                        : 'bg-gray-50 dark:bg-gray-800/50 border-gray-200 dark:border-gray-700 opacity-60'
                      : isAccepted
                        ? 'bg-green-50 dark:bg-green-900/10 border-green-300 dark:border-green-700'
                        : isRejected
                          ? 'bg-red-50 dark:bg-red-900/10 border-red-200 dark:border-red-800 opacity-60'
                          : 'bg-white dark:bg-gray-800 border-gray-200 dark:border-gray-700'
                  }`}
                >
                  <div className="flex items-start justify-between gap-3">
                    {/* Left: suggestion details */}
                    <div className="flex-1 min-w-0">
                      {/* Type badge */}
                      <div className="flex items-center gap-2 mb-1.5">
                        <span className={`inline-flex px-1.5 py-0.5 text-[10px] font-medium rounded ${
                          s.type === 'hyphen_join'
                            ? 'bg-purple-100 dark:bg-purple-900/30 text-purple-700 dark:text-purple-300'
                            : 'bg-orange-100 dark:bg-orange-900/30 text-orange-700 dark:text-orange-300'
                        }`}>
                          {s.type === 'hyphen_join' ? 'Zeilenumbruch' : 'Buchfalz-Korrektur'}
                        </span>
                        <span className="text-[10px] text-gray-400">
                          Zeile {s.row_index + 1}, Spalte {s.col_index + 1}
                          {s.col_type && ` (${s.col_type.replace('column_', '')})`}
                        </span>
                        <span className={`text-[10px] ${
                          s.confidence >= 0.9 ? 'text-green-500' :
                          s.confidence >= 0.7 ? 'text-yellow-500' : 'text-red-500'
                        }`}>
                          {Math.round(s.confidence * 100)}%
                        </span>
                      </div>

                      {/* Correction display */}
                      {s.type === 'hyphen_join' ? (
                        <div className="space-y-1">
                          <div className="flex items-center gap-2 text-sm">
                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
                              {s.original_text}
                            </span>
                            <span className="text-gray-400 text-xs">Z.{s.row_index + 1}</span>
                            <span className="text-gray-300 dark:text-gray-600">+</span>
                            <span className="font-mono text-red-600 dark:text-red-400 line-through">
                              {s.next_row_text.split(' ')[0]}
                            </span>
                            <span className="text-gray-400 text-xs">Z.{s.next_row_index + 1}</span>
                            <span className="text-gray-400">→</span>
|
||||
<span className="font-mono text-green-600 dark:text-green-400 font-semibold">
|
||||
{s.suggested_text}
|
||||
</span>
|
||||
</div>
|
||||
{s.missing_chars && (
|
||||
<div className="text-[10px] text-gray-400">
|
||||
Fehlende Zeichen: <span className="font-mono font-semibold">{s.missing_chars}</span>
|
||||
{' '}· Darstellung: <span className="font-mono">{s.display_parts.join(' | ')}</span>
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
) : (
|
||||
<div className="space-y-1">
|
||||
<div className="flex items-center gap-2 text-sm">
|
||||
<span className="font-mono text-red-600 dark:text-red-400 line-through">
|
||||
{s.original_text}
|
||||
</span>
|
||||
<span className="text-gray-400">→</span>
|
||||
<span className="font-mono text-green-600 dark:text-green-400 font-semibold">
|
||||
{selectedText[s.id] || s.suggested_text}
|
||||
</span>
|
||||
</div>
|
||||
{/* Alternatives: show other candidates the user can pick */}
|
||||
{s.alternatives && s.alternatives.length > 0 && !applied && (
|
||||
<div className="flex items-center gap-1.5 flex-wrap">
|
||||
<span className="text-[10px] text-gray-400">Alternativen:</span>
|
||||
{[s.suggested_text, ...s.alternatives].map((alt) => {
|
||||
const isSelected = (selectedText[s.id] || s.suggested_text) === alt
|
||||
return (
|
||||
<button
|
||||
key={alt}
|
||||
onClick={() => setSelectedText(prev => ({ ...prev, [s.id]: alt }))}
|
||||
className={`px-1.5 py-0.5 text-[11px] font-mono rounded transition-colors ${
|
||||
isSelected
|
||||
? 'bg-green-200 dark:bg-green-800 text-green-800 dark:text-green-200 font-semibold'
|
||||
: 'bg-gray-100 dark:bg-gray-700 text-gray-600 dark:text-gray-300 hover:bg-gray-200 dark:hover:bg-gray-600'
|
||||
}`}
|
||||
>
|
||||
{alt}
|
||||
</button>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
|
||||
{/* Right: accept/reject toggle */}
|
||||
{!applied && (
|
||||
<button
|
||||
onClick={() => toggleSuggestion(s.id)}
|
||||
className={`flex-shrink-0 w-8 h-8 rounded-full flex items-center justify-center text-sm transition-colors ${
|
||||
isAccepted
|
||||
? 'bg-green-500 text-white hover:bg-green-600'
|
||||
: isRejected
|
||||
? 'bg-red-400 text-white hover:bg-red-500'
|
||||
: 'bg-gray-200 dark:bg-gray-600 text-gray-500 dark:text-gray-300 hover:bg-gray-300 dark:hover:bg-gray-500'
|
||||
}`}
|
||||
title={isAccepted ? 'Akzeptiert (klicken zum Ablehnen)' : isRejected ? 'Abgelehnt (klicken zum Akzeptieren)' : 'Klicken zum Akzeptieren'}
|
||||
>
|
||||
{isAccepted ? '\u2713' : isRejected ? '\u2717' : '?'}
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
|
||||
{/* Apply / Next buttons */}
|
||||
<div className="flex items-center gap-3 pt-2">
|
||||
{!applied ? (
|
||||
<button
|
||||
onClick={applyAccepted}
|
||||
disabled={applying || accepted.size === 0}
|
||||
className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700 disabled:opacity-50"
|
||||
>
|
||||
{applying ? 'Wird angewendet...' : `${accepted.size} Korrektur(en) anwenden`}
|
||||
</button>
|
||||
) : (
|
||||
<button
|
||||
onClick={onNext}
|
||||
className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
|
||||
>
|
||||
Weiter zu Ground Truth
|
||||
</button>
|
||||
)}
|
||||
{!applied && (
|
||||
<button
|
||||
onClick={onNext}
|
||||
className="px-4 py-2 text-sm text-gray-500 dark:text-gray-400 hover:text-gray-700 dark:hover:text-gray-200"
|
||||
>
|
||||
Ueberspringen
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
|
||||
{/* Apply result message */}
|
||||
{applyMessage && (
|
||||
<div className={`text-sm p-2 rounded ${
|
||||
applyMessage.includes('fehlgeschlagen')
|
||||
? 'text-red-500 bg-red-50 dark:bg-red-900/20'
|
||||
: 'text-green-600 dark:text-green-400 bg-green-50 dark:bg-green-900/20'
|
||||
}`}>
|
||||
{applyMessage}
|
||||
</div>
|
||||
)}
|
||||
</>
|
||||
)}
|
||||
|
||||
{/* Skip button when no suggestions */}
|
||||
{result && !hasSuggestions && !loading && (
|
||||
<button
|
||||
onClick={onNext}
|
||||
className="px-4 py-2 bg-teal-600 text-white text-sm rounded-lg hover:bg-teal-700"
|
||||
>
|
||||
Weiter zu Ground Truth
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
@@ -1,8 +1,6 @@
'use client'

import { useState, useEffect, useRef } from 'react'
import type { SubSession } from '@/app/(admin)/ai/ocr-pipeline/types'

const KLAUSUR_API = '/klausur-api'

interface PageSplitResult {
@@ -18,10 +16,10 @@ interface StepPageSplitProps {
  sessionId: string | null
  sessionName: string
  onNext: () => void
  onSubSessionsCreated: (subs: SubSession[]) => void
  onSplitComplete: (firstChildId: string, firstChildName: string) => void
}

export function StepPageSplit({ sessionId, sessionName, onNext, onSubSessionsCreated }: StepPageSplitProps) {
export function StepPageSplit({ sessionId, sessionName, onNext, onSplitComplete }: StepPageSplitProps) {
  const [detecting, setDetecting] = useState(false)
  const [splitResult, setSplitResult] = useState<PageSplitResult | null>(null)
  const [error, setError] = useState('')
@@ -40,30 +38,33 @@ export function StepPageSplit({ sessionId, sessionName, onNext, onSubSessionsCre
    setDetecting(true)
    setError('')
    try {
      // First check if sub-sessions already exist
      // First check if this session was already split (status='split')
      const sessionRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions/${sessionId}`)
      if (sessionRes.ok) {
        const sessionData = await sessionRes.json()
        if (sessionData.sub_sessions?.length > 0) {
          // Already split — show existing sub-sessions
          const subs = sessionData.sub_sessions as { id: string; name: string; page_index?: number; box_index?: number; current_step?: number }[]
          setSplitResult({
            multi_page: true,
            page_count: subs.length,
            sub_sessions: subs.map((s: { id: string; name: string; page_index?: number; box_index?: number }) => ({
              id: s.id,
              name: s.name,
              page_index: s.page_index ?? s.box_index ?? 0,
            })),
          })
          onSubSessionsCreated(subs.map((s: { id: string; name: string; page_index?: number; box_index?: number; current_step?: number }) => ({
            id: s.id,
            name: s.name,
            box_index: s.page_index ?? s.box_index ?? 0,
            current_step: s.current_step ?? 2,
          })))
          setDetecting(false)
          return
        if (sessionData.status === 'split' && sessionData.crop_result?.multi_page) {
          // Already split — find the child sessions in the session list
          const listRes = await fetch(`${KLAUSUR_API}/api/v1/ocr-pipeline/sessions`)
          if (listRes.ok) {
            const listData = await listRes.json()
            // Child sessions have names like "ParentName — Seite N"
            const baseName = sessionName || sessionData.name || ''
            const children = (listData.sessions || [])
              .filter((s: { name?: string }) => s.name?.startsWith(baseName + ' — '))
              .sort((a: { name: string }, b: { name: string }) => a.name.localeCompare(b.name))
            if (children.length > 0) {
              setSplitResult({
                multi_page: true,
                page_count: children.length,
                sub_sessions: children.map((s: { id: string; name: string }, i: number) => ({
                  id: s.id, name: s.name, page_index: i,
                })),
              })
              onSplitComplete(children[0].id, children[0].name)
              setDetecting(false)
              return
            }
          }
        }
      }

@@ -92,12 +93,8 @@ export function StepPageSplit({ sessionId, sessionName, onNext, onSubSessionsCre
        sub.name = newName
      }

      onSubSessionsCreated(data.sub_sessions.map(s => ({
        id: s.id,
        name: s.name,
        box_index: s.page_index,
        current_step: 2,
      })))
      // Signal parent to switch to the first child session
      onSplitComplete(data.sub_sessions[0].id, data.sub_sessions[0].name)
    }
  } catch (e) {
    setError(e instanceof Error ? e.message : String(e))

admin-lehrer/package-lock.json (generated, 1697 lines): file diff suppressed because it is too large.
@@ -18,6 +18,8 @@
    "test:all": "vitest run && playwright test --project=chromium"
  },
  "dependencies": {
    "@fortune-sheet/react": "^1.0.4",
    "fabric": "^6.0.0",
    "jspdf": "^4.1.0",
    "jszip": "^3.10.1",
    "lucide-react": "^0.468.0",
@@ -26,7 +28,6 @@
    "react-dom": "^18.3.1",
    "reactflow": "^11.11.4",
    "recharts": "^2.15.0",
    "fabric": "^6.0.0",
    "uuid": "^13.0.0"
  },
  "devDependencies": {

@@ -1,5 +1,9 @@
from typing import List, Dict, Any, Optional
from datetime import datetime
from pathlib import Path
import json
import os
import logging

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
@@ -15,6 +19,8 @@ from learning_units import (
    delete_learning_unit,
)

logger = logging.getLogger(__name__)


router = APIRouter(
    prefix="/learning-units",
@@ -49,6 +55,11 @@ class RemoveWorksheetPayload(BaseModel):
    worksheet_file: str


class GenerateFromAnalysisPayload(BaseModel):
    analysis_data: Dict[str, Any]
    num_questions: int = 8


# ---------- Hilfsfunktion: Backend-Modell -> Frontend-Objekt ----------


@@ -195,3 +206,171 @@ def api_delete_learning_unit(unit_id: str):
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")
    return {"status": "deleted", "id": unit_id}


# ---------- Generator-Endpunkte ----------

LERNEINHEITEN_DIR = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten")


def _save_analysis_and_get_path(unit_id: str, analysis_data: Dict[str, Any]) -> Path:
    """Save analysis_data to disk and return the path."""
    os.makedirs(LERNEINHEITEN_DIR, exist_ok=True)
    path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_analyse.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(analysis_data, f, ensure_ascii=False, indent=2)
    return path


@router.post("/{unit_id}/generate-qa")
def api_generate_qa(unit_id: str, payload: GenerateFromAnalysisPayload):
    """Generate Q&A items with Leitner fields from analysis data."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")

    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)

    try:
        from ai_processing.qa_generator import generate_qa_from_analysis
        qa_path = generate_qa_from_analysis(analysis_path, num_questions=payload.num_questions)
        with open(qa_path, "r", encoding="utf-8") as f:
            qa_data = json.load(f)

        # Update unit status
        update_learning_unit(unit_id, LearningUnitUpdate(status="qa_generated"))
        logger.info(f"Generated QA for unit {unit_id}: {len(qa_data.get('qa_items', []))} items")
        return qa_data
    except Exception as e:
        logger.error(f"QA generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"QA-Generierung fehlgeschlagen: {e}")


@router.post("/{unit_id}/generate-mc")
def api_generate_mc(unit_id: str, payload: GenerateFromAnalysisPayload):
    """Generate multiple choice questions from analysis data."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")

    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)

    try:
        from ai_processing.mc_generator import generate_mc_from_analysis
        mc_path = generate_mc_from_analysis(analysis_path, num_questions=payload.num_questions)
        with open(mc_path, "r", encoding="utf-8") as f:
            mc_data = json.load(f)

        update_learning_unit(unit_id, LearningUnitUpdate(status="mc_generated"))
        logger.info(f"Generated MC for unit {unit_id}: {len(mc_data.get('questions', []))} questions")
        return mc_data
    except Exception as e:
        logger.error(f"MC generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"MC-Generierung fehlgeschlagen: {e}")


@router.post("/{unit_id}/generate-cloze")
def api_generate_cloze(unit_id: str, payload: GenerateFromAnalysisPayload):
    """Generate cloze (fill-in-the-blank) items from analysis data."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")

    analysis_path = _save_analysis_and_get_path(unit_id, payload.analysis_data)

    try:
        from ai_processing.cloze_generator import generate_cloze_from_analysis
        cloze_path = generate_cloze_from_analysis(analysis_path)
        with open(cloze_path, "r", encoding="utf-8") as f:
            cloze_data = json.load(f)

        update_learning_unit(unit_id, LearningUnitUpdate(status="cloze_generated"))
        logger.info(f"Generated Cloze for unit {unit_id}: {len(cloze_data.get('cloze_items', []))} items")
        return cloze_data
    except Exception as e:
        logger.error(f"Cloze generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"Cloze-Generierung fehlgeschlagen: {e}")


@router.get("/{unit_id}/qa")
def api_get_qa(unit_id: str):
    """Get generated QA items for a unit."""
    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
    if not qa_path.exists():
        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
    with open(qa_path, "r", encoding="utf-8") as f:
        return json.load(f)


@router.get("/{unit_id}/mc")
def api_get_mc(unit_id: str):
    """Get generated MC questions for a unit."""
    mc_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_mc.json"
    if not mc_path.exists():
        raise HTTPException(status_code=404, detail="Keine MC-Daten gefunden.")
    with open(mc_path, "r", encoding="utf-8") as f:
        return json.load(f)


@router.get("/{unit_id}/cloze")
def api_get_cloze(unit_id: str):
    """Get generated cloze items for a unit."""
    cloze_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_cloze.json"
    if not cloze_path.exists():
        raise HTTPException(status_code=404, detail="Keine Cloze-Daten gefunden.")
    with open(cloze_path, "r", encoding="utf-8") as f:
        return json.load(f)


@router.post("/{unit_id}/leitner/update")
def api_update_leitner(unit_id: str, item_id: str, correct: bool):
    """Update Leitner progress for a QA item."""
    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
    if not qa_path.exists():
        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
    try:
        from ai_processing.qa_generator import update_leitner_progress
        result = update_leitner_progress(qa_path, item_id, correct)
        return result
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@router.get("/{unit_id}/leitner/next")
def api_get_next_review(unit_id: str, limit: int = 5):
    """Get next Leitner review items."""
    qa_path = Path(LERNEINHEITEN_DIR) / f"{unit_id}_qa.json"
    if not qa_path.exists():
        raise HTTPException(status_code=404, detail="Keine QA-Daten gefunden.")
    try:
        from ai_processing.qa_generator import get_next_review_items
        items = get_next_review_items(qa_path, limit=limit)
        return {"items": items, "count": len(items)}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


class StoryGeneratePayload(BaseModel):
    vocabulary: List[Dict[str, Any]]
    language: str = "en"
    grade_level: str = "5-8"


@router.post("/{unit_id}/generate-story")
def api_generate_story(unit_id: str, payload: StoryGeneratePayload):
    """Generate a short story using vocabulary words."""
    lu = get_learning_unit(unit_id)
    if not lu:
        raise HTTPException(status_code=404, detail="Lerneinheit nicht gefunden.")

    try:
        from story_generator import generate_story
        result = generate_story(
            vocabulary=payload.vocabulary,
            language=payload.language,
            grade_level=payload.grade_level,
        )
        return result
    except Exception as e:
        logger.error(f"Story generation failed for {unit_id}: {e}")
        raise HTTPException(status_code=500, detail=f"Story-Generierung fehlgeschlagen: {e}")

@@ -106,6 +106,10 @@ app.include_router(correction_router, prefix="/api")
from learning_units_api import router as learning_units_router
app.include_router(learning_units_router, prefix="/api")

# --- 4b. Learning Progress ---
from progress_api import router as progress_router
app.include_router(progress_router, prefix="/api")

from unit_api import router as unit_router
app.include_router(unit_router)  # Already has /api/units prefix

backend-lehrer/progress_api.py (new file, 131 lines)
@@ -0,0 +1,131 @@
"""
Progress API — Tracks student learning progress per unit.

Stores coins, crowns, streak data, and exercise completion stats.
Uses JSON file storage (same pattern as learning_units.py).
"""

import os
import json
import logging
from datetime import datetime, date, timedelta
from typing import Dict, Any, Optional, List
from pathlib import Path

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

logger = logging.getLogger(__name__)

router = APIRouter(
    prefix="/progress",
    tags=["progress"],
)

PROGRESS_DIR = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten/progress")


def _ensure_dir():
    os.makedirs(PROGRESS_DIR, exist_ok=True)


def _progress_path(unit_id: str) -> Path:
    return Path(PROGRESS_DIR) / f"{unit_id}.json"


def _load_progress(unit_id: str) -> Dict[str, Any]:
    path = _progress_path(unit_id)
    if path.exists():
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {
        "unit_id": unit_id,
        "coins": 0,
        "crowns": 0,
        "streak_days": 0,
        "last_activity": None,
        "exercises": {
            "flashcards": {"completed": 0, "correct": 0, "incorrect": 0},
            "quiz": {"completed": 0, "correct": 0, "incorrect": 0},
            "type": {"completed": 0, "correct": 0, "incorrect": 0},
            "story": {"generated": 0},
        },
        "created_at": datetime.now().isoformat(),
    }


def _save_progress(unit_id: str, data: Dict[str, Any]):
    _ensure_dir()
    path = _progress_path(unit_id)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


class RewardPayload(BaseModel):
    exercise_type: str  # flashcards, quiz, type, story
    correct: bool = True
    first_try: bool = True


@router.get("/{unit_id}")
def get_progress(unit_id: str):
    """Get learning progress for a unit."""
    return _load_progress(unit_id)


@router.post("/{unit_id}/reward")
def add_reward(unit_id: str, payload: RewardPayload):
    """Record an exercise result and award coins."""
    progress = _load_progress(unit_id)

    # Update exercise stats
    ex = progress["exercises"].get(payload.exercise_type, {"completed": 0, "correct": 0, "incorrect": 0})
    ex["completed"] = ex.get("completed", 0) + 1
    if payload.correct:
        ex["correct"] = ex.get("correct", 0) + 1
    else:
        ex["incorrect"] = ex.get("incorrect", 0) + 1
    progress["exercises"][payload.exercise_type] = ex

    # Award coins
    if payload.correct:
        coins = 3 if payload.first_try else 1
    else:
        coins = 0
    progress["coins"] = progress.get("coins", 0) + coins

    # Update streak: extend if the last activity was yesterday, otherwise restart.
    # Using timedelta instead of date.replace(day=...) so month/year boundaries work.
    today = date.today().isoformat()
    last = progress.get("last_activity")
    if last != today:
        yesterday = (date.today() - timedelta(days=1)).isoformat()
        if last == yesterday:
            progress["streak_days"] = progress.get("streak_days", 0) + 1
        else:
            progress["streak_days"] = 1
    progress["last_activity"] = today

    # Award crowns for milestones
    total_correct = sum(
        e.get("correct", 0) for e in progress["exercises"].values() if isinstance(e, dict)
    )
    progress["crowns"] = total_correct // 20  # 1 crown per 20 correct answers

    _save_progress(unit_id, progress)

    return {
        "coins_awarded": coins,
        "total_coins": progress["coins"],
        "crowns": progress["crowns"],
        "streak_days": progress["streak_days"],
    }


@router.get("/")
def list_all_progress():
    """List progress for all units."""
    _ensure_dir()
    results = []
    for f in Path(PROGRESS_DIR).glob("*.json"):
        with open(f, "r", encoding="utf-8") as fh:
            results.append(json.load(fh))
    return results
backend-lehrer/story_generator.py (new file, 108 lines)
@@ -0,0 +1,108 @@
"""
Story Generator — Creates short stories using vocabulary words.

Generates age-appropriate mini-stories (3-5 sentences) that incorporate
the given vocabulary words, marked with <mark> tags for highlighting.

Uses Ollama (local LLM) for generation.
"""

import os
import json
import logging
import re
import requests
from typing import List, Dict, Any, Optional

logger = logging.getLogger(__name__)

OLLAMA_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
STORY_MODEL = os.getenv("STORY_MODEL", "llama3.1:8b")


def generate_story(
    vocabulary: List[Dict[str, str]],
    language: str = "en",
    grade_level: str = "5-8",
    max_words: int = 5,
) -> Dict[str, Any]:
    """
    Generate a short story incorporating vocabulary words.

    Args:
        vocabulary: List of dicts with 'english' and 'german' keys
        language: 'en' for English story, 'de' for German story
        grade_level: Target grade level
        max_words: Maximum vocab words to include (to keep story short)

    Returns:
        Dict with 'story_html', 'story_text', 'vocab_used', 'language'
    """
    # Select subset of vocabulary
    words = vocabulary[:max_words]
    word_list = [w.get("english", "") if language == "en" else w.get("german", "") for w in words]
    word_list = [w for w in word_list if w.strip()]

    if not word_list:
        return {"story_html": "", "story_text": "", "vocab_used": [], "language": language}

    lang_name = "English" if language == "en" else "German"
    words_str = ", ".join(word_list)

    prompt = f"""Write a short story (3-5 sentences) in {lang_name} for a grade {grade_level} student.
The story MUST use these vocabulary words: {words_str}

Rules:
1. The story should be fun and age-appropriate
2. Each vocabulary word must appear at least once
3. Keep sentences simple and clear
4. The story should make sense and be engaging

Write ONLY the story, nothing else. No title, no introduction."""

    try:
        resp = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={
                "model": STORY_MODEL,
                "prompt": prompt,
                "stream": False,
                "options": {"temperature": 0.8, "num_predict": 300},
            },
            timeout=30,
        )
        resp.raise_for_status()
        story_text = resp.json().get("response", "").strip()
    except Exception as e:
        logger.error(f"Story generation failed: {e}")
        # Fallback: simple template story
        story_text = _fallback_story(word_list, language)

    # Mark vocabulary words in the story
    story_html = story_text
    vocab_found = []
    for word in word_list:
        if word.lower() in story_html.lower():
            # Case-insensitive replacement preserving original case
            pattern = re.compile(re.escape(word), re.IGNORECASE)
            story_html = pattern.sub(
                lambda m: f'<mark class="vocab-highlight">{m.group()}</mark>',
                story_html,
                count=1,
            )
            vocab_found.append(word)

    return {
        "story_html": story_html,
        "story_text": story_text,
        "vocab_used": vocab_found,
        "vocab_total": len(word_list),
        "language": language,
    }


def _fallback_story(words: List[str], language: str) -> str:
    """Simple fallback when LLM is unavailable."""
    if language == "de":
        return f"Heute habe ich neue Woerter gelernt: {', '.join(words)}. Es war ein guter Tag zum Lernen."
    return f"Today I learned new words: {', '.join(words)}. It was a great day for learning."
docs-src/services/klausur-service/RAG-Landkarte.md (new file, 204 lines)
@@ -0,0 +1,204 @@
# RAG Landkarte — Branchen-Regulierungs-Matrix

## Uebersicht

Die RAG Landkarte zeigt eine interaktive Matrix aller 320 Compliance-Dokumente im RAG-System, gruppiert nach Dokumenttyp und zugeordnet zu 10 Industriebranchen.

**URL**: `https://macmini:3002/ai/rag` → Tab "Landkarte"

**Letzte Aktualisierung**: 2026-04-15

## Architektur

```
rag-documents.json            ← Zentrale Datendatei (320 Dokumente)
├── doc_types[]               ← 17 Dokumenttypen (EU-VO, DE-Gesetz, etc.)
├── industries[]              ← 10 Branchen (VDMA/VDA/BDI)
└── documents[]               ← Alle Dokumente mit Branchen-Mapping
    ├── code                  ← Eindeutiger Identifier
    ├── name                  ← Anzeigename
    ├── doc_type              ← Verweis auf doc_types.id
    ├── industries[]          ← ["all"] oder ["automotive", "chemie", ...]
    ├── in_rag                ← true (alle im RAG)
    ├── rag_collection        ← Qdrant Collection Name
    ├── description?          ← Beschreibung (fuer ~100 Hauptregulierungen)
    ├── applicability_note?   ← Begruendung der Branchenzuordnung
    └── effective_date?       ← Gueltigkeitsdatum

rag-constants.ts              ← RAG-Metadaten (Chunks, Qdrant-IDs)
page.tsx                      ← Frontend (importiert aus JSON)
```
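Zur Veranschaulichung des Schemas eine Skizze eines einzelnen Eintrags aus `documents[]`. Das Beispiel ist hypothetisch: Feldnamen folgen dem Baum oben, aber `code`, `doc_type`-ID, Collection-Name und alle Werte sind frei gewaehlte Annahmen und nicht aus `rag-documents.json` uebernommen.

```typescript
// Hypothetischer Beispiel-Eintrag nach dem Schema oben.
// Alle Werte sind Annahmen, nicht aus rag-documents.json uebernommen.
const beispielDokument = {
  code: "MVO-2023-1230",
  name: "Maschinenverordnung (EU) 2023/1230",
  doc_type: "eu-vo",
  industries: ["maschinenbau", "automotive", "elektrotechnik", "metall", "bau"],
  in_rag: true,
  rag_collection: "compliance_docs",
  applicability_note: "Hersteller von Maschinen und zugehoerigen Produkten",
  effective_date: "2027-01-20",
};

console.log(beispielDokument.industries.length); // → 5
```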
|
||||
## Dateien
|
||||
|
||||
| Pfad | Beschreibung |
|
||||
|------|--------------|
|
||||
| `admin-lehrer/app/(admin)/ai/rag/rag-documents.json` | Alle 320 Dokumente mit Branchen-Mapping |
|
||||
| `admin-lehrer/app/(admin)/ai/rag/rag-constants.ts` | REGULATIONS_IN_RAG (Chunk-Counts, Qdrant-IDs) |
|
||||
| `admin-lehrer/app/(admin)/ai/rag/page.tsx` | Frontend-Rendering |
|
||||
| `admin-lehrer/app/(admin)/ai/rag/__tests__/rag-documents.test.ts` | 44 Tests fuer JSON-Validierung |
|
||||
|
||||
## Branchen (10 Industriesektoren)
|
||||
|
||||
Die Branchen orientieren sich an den Mitgliedsverbaenden von VDMA, VDA und BDI:
|
||||
|
||||
| ID | Branche | Icon | Typische Kunden |
|
||||
|----|---------|------|-----------------|
|
||||
| `automotive` | Automobilindustrie | 🚗 | OEMs, Tier-1/2 Zulieferer |
|
||||
| `maschinenbau` | Maschinen- & Anlagenbau | ⚙️ | Werkzeugmaschinen, Automatisierung |
|
||||
| `elektrotechnik` | Elektro- & Digitalindustrie | ⚡ | Embedded Systems, Steuerungstechnik |
|
||||
| `chemie` | Chemie- & Prozessindustrie | 🧪 | Grundstoffchemie, Spezialchemie |
|
||||
| `metall` | Metallindustrie | 🔩 | Stahl, Aluminium, Metallverarbeitung |
|
||||
| `energie` | Energie & Versorgung | 🔋 | Energieerzeugung, Netzbetreiber |
|
||||
| `transport` | Transport & Logistik | 🚚 | Gueterverkehr, Schiene, Luftfahrt |
|
||||
| `handel` | Handel | 🏪 | Einzel-/Grosshandel, E-Commerce |
|
||||
| `konsumgueter` | Konsumgueter & Lebensmittel | 📦 | FMCG, Lebensmittel, Verpackung |
|
||||
| `bau` | Bauwirtschaft | 🏗️ | Hoch-/Tiefbau, Gebaeudeautomation |
|
||||
|
||||
!!! warning "Keine Pseudo-Branchen"
|
||||
Es werden bewusst **keine** Querschnittsthemen wie IoT, KI, HR, KRITIS oder E-Commerce als "Branchen" gefuehrt. Diese sind Technologien, Abteilungen oder Klassifizierungen — keine Wirtschaftssektoren.
|
||||
|
||||
## Zuordnungslogik
|
||||
|
||||
### Drei Ebenen
|
||||
|
||||
| Ebene | `industries` Wert | Anzahl | Beispiele |
|
||||
|-------|-------------------|--------|-----------|
|
||||
| **Horizontal** | `["all"]` | 264 | DSGVO, AI Act, CRA, NIS2, BetrVG |
|
||||
| **Sektorspezifisch** | `["automotive", "chemie", ...]` | 42 | Maschinenverordnung, ElektroG, BattDG |
|
||||
| **Nicht zutreffend** | `[]` | 14 | DORA, MiCA, EHDS, DSA |
|
||||
|
||||
### Horizontal (alle Branchen)
|
||||
|
||||
Regulierungen die **branchenuebergreifend** gelten:
|
||||
|
||||
- **Datenschutz**: DSGVO, BDSG, ePrivacy, TDDDG, SCC, DPF
- **KI**: AI Act (jedes Unternehmen, das KI einsetzt)
- **Cybersecurity**: CRA (jedes Produkt mit digitalen Elementen), NIS2, EUCSA
- **Produktsicherheit**: GPSR, Produkthaftungs-RL
- **Arbeitsrecht**: BetrVG, AGG, KSchG, ArbSchG, LkSG
- **Handels-/Steuerrecht**: HGB, AO, UStG
- **Software-Security**: OWASP Top 10, NIST SSDF, CISA Secure by Design
- **Supply Chain**: CycloneDX, SPDX, SLSA (CRA verlangt SBOM)
- **Alle Leitlinien**: EDPB, DSK, DSFA-Listen, Gerichtsurteile

### Sektorspezifisch

| Regulierung | Branchen | Begruendung |
|-------------|----------|-------------|
| Maschinenverordnung | Maschinenbau, Automotive, Elektrotechnik, Metall, Bau | Hersteller von Maschinen und zugehoerigen Produkten |
| ElektroG | Elektrotechnik, Automotive, Konsumgueter | Elektro-/Elektronikgeraete |
| BattDG/BattVO | Automotive, Elektrotechnik, Energie | Batterien und Akkumulatoren |
| VerpackG | Konsumgueter, Handel, Chemie | Verpackungspflichtige Produkte |
| PAngV, UWG, VSBG | Handel, Konsumgueter | Verbraucherschutz im Verkauf |
| BSI-KritisV, KRITIS-Dachgesetz | Energie, Transport, Chemie | KRITIS-Sektoren |
| ENISA ICS/SCADA | Maschinenbau, Elektrotechnik, Automotive, Chemie, Energie, Transport | Industrielle Steuerungstechnik |
| NIST SP 800-82 (OT) | Maschinenbau, Automotive, Elektrotechnik, Chemie, Energie, Metall | Operational Technology |

### Nicht zutreffend

Dokumente, die **im RAG bleiben**, aber fuer keine der 10 Zielbranchen relevant sind:

| Code | Name | Grund |
|------|------|-------|
| DORA | Digital Operational Resilience Act | Finanzsektor |
| PSD2 | Zahlungsdiensterichtlinie | Zahlungsdienstleister |
| MiCA | Markets in Crypto-Assets | Krypto-Maerkte |
| AMLR | AML-Verordnung | Geldwaesche-Bekaempfung |
| EHDS | Europaeischer Gesundheitsdatenraum | Gesundheitswesen |
| DSA | Digital Services Act | Online-Plattformen |
| DMA | Digital Markets Act | Gatekeeper-Plattformen |
| MDR | Medizinprodukteverordnung | Medizintechnik |
| BSI-TR-03161 | DiGA-Sicherheit (3 Teile) | Digitale Gesundheitsanwendungen |

## Dokumenttypen (17)

| doc_type | Label | Anzahl | Beispiele |
|----------|-------|--------|-----------|
| `eu_regulation` | EU-Verordnungen | 22 | DSGVO, AI Act, CRA, DORA |
| `eu_directive` | EU-Richtlinien | 14 | ePrivacy, NIS2, PSD2 |
| `eu_guidance` | EU-Leitfaeden | 9 | Blue Guide, GPAI CoP |
| `de_law` | Deutsche Gesetze | 41 | BDSG, BGB, HGB, BetrVG |
| `at_law` | Oesterreichische Gesetze | 11 | DSG AT, ECG, KSchG |
| `ch_law` | Schweizer Gesetze | 8 | revDSG, DSV, OR |
| `national_law` | Nationale Datenschutzgesetze | 17 | UK DPA, LOPDGDD, UAVG |
| `bsi_standard` | BSI Standards & TR | 4 | BSI 200-4, BSI-TR-03161 |
| `edpb_guideline` | EDPB/WP29 Leitlinien | 50 | Consent, Controller/Processor |
| `dsk_guidance` | DSK Orientierungshilfen | 57 | Kurzpapiere, OH Telemedien |
| `court_decision` | Gerichtsurteile | 20 | BAG M365, BGH Planet49 |
| `dsfa_list` | DSFA Muss-Listen | 20 | Pro Bundesland + DSK |
| `nist_standard` | NIST Standards | 11 | CSF 2.0, SSDF, AI RMF |
| `owasp_standard` | OWASP Standards | 6 | Top 10, ASVS, API Security |
| `enisa_guidance` | ENISA Guidance | 6 | Supply Chain, ICS/SCADA |
| `international` | Internationale Standards | 7 | CVSS, CycloneDX, SPDX |
| `legal_template` | Vorlagen & Muster | 17 | GitHub Policies, VVT-Muster |

## Integration in andere Projekte

### JSON importieren

```typescript
import ragData from './rag-documents.json'

const documents = ragData.documents // 320 Dokumente
const docTypes = ragData.doc_types // 17 Kategorien
const industries = ragData.industries // 10 Branchen
```

### Matrix-Logik

```typescript
// Pruefen ob Dokument fuer Branche gilt
const applies = (doc, industryId) =>
  doc.industries.includes(industryId) || doc.industries.includes('all')

// Dokumente nach Typ gruppieren
const grouped = Object.groupBy(documents, d => d.doc_type)

// Nur sektorspezifische Dokumente fuer eine Branche
const forAutomotive = documents.filter(d =>
  d.industries.includes('automotive') && !d.industries.includes('all')
)
```
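
Hinweis: `Object.groupBy` setzt eine neuere Runtime voraus (ES2024, z. B. Node 21+). Fuer aeltere Umgebungen laesst sich die Gruppierung mit `reduce` nachbilden; die folgende Skizze nutzt Beispieldaten und nimmt nur das Feld `doc_type` aus dem JSON an:

```typescript
// Fallback fuer Runtimes ohne Object.groupBy (Skizze mit Beispieldaten)
const groupByDocType = <T extends { doc_type: string }>(items: T[]): Record<string, T[]> =>
  items.reduce<Record<string, T[]>>((acc, item) => {
    (acc[item.doc_type] ??= []).push(item)
    return acc
  }, {})

const sample = [
  { code: 'DSGVO', doc_type: 'eu_regulation' },
  { code: 'NIS2', doc_type: 'eu_directive' },
  { code: 'CRA', doc_type: 'eu_regulation' },
]
const grouped = groupByDocType(sample)
console.log(Object.keys(grouped)) // ['eu_regulation', 'eu_directive']
```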

### RAG-Status pruefen

```typescript
import { REGULATIONS_IN_RAG } from './rag-constants'

const isInRag = (code: string) => code in REGULATIONS_IN_RAG
const chunks = REGULATIONS_IN_RAG['GDPR']?.chunks // 423
```
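
Die dahinter angenommene Struktur von `REGULATIONS_IN_RAG` laesst sich so skizzieren (Beispielwerte zur Illustration; massgeblich ist `rag-constants.ts`):

```typescript
// Angenommene Form zur Illustration; die echte Konstante liegt in rag-constants.ts
const REGULATIONS_IN_RAG_SAMPLE: Record<string, { chunks: number }> = {
  GDPR: { chunks: 423 },
}

const isInRagSample = (code: string) => code in REGULATIONS_IN_RAG_SAMPLE

console.log(isInRagSample('GDPR')) // true
console.log(isInRagSample('DORA')) // false
console.log(REGULATIONS_IN_RAG_SAMPLE['GDPR']?.chunks) // 423
```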

## Datenquellen

| Quelle | Pfad | Beschreibung |
|--------|------|--------------|
| RAG-Inventar | `~/Desktop/RAG-Dokumenten-Inventar.md` | 386 Quelldateien |
| rag-documents.json | `admin-lehrer/.../rag/rag-documents.json` | 320 konsolidierte Dokumente |
| rag-constants.ts | `admin-lehrer/.../rag/rag-constants.ts` | Qdrant-Metadaten |

## Tests

```bash
cd admin-lehrer
npx vitest run app/\(admin\)/ai/rag/__tests__/rag-documents.test.ts
```

44 Tests validieren:

- JSON-Struktur (doc_types, industries, documents)
- 10 echte Branchen (keine Pseudo-Branchen)
- Pflichtfelder und gueltige Referenzen
- Horizontale Regulierungen (DSGVO, AI Act, CRA → "all")
- Sektorspezifische Zuordnungen (Maschinenverordnung, ElektroG)
- Nicht zutreffende Regulierungen (DORA, MiCA → leer)
- Applicability Notes vorhanden und korrekt

## Aenderungshistorie

| Datum | Aenderung |
|-------|-----------|
| 2026-04-15 | Initiale Implementierung: 320 Dokumente, 10 Branchen, 17 Typen |
| 2026-04-15 | Branchen-Review: OWASP/SBOM → alle, BSI-TR-03161 → leer |
| 2026-04-15 | Applicability Notes UI: Aufklappbare Erklaerungen pro Dokument |

339
klausur-service/backend/cv_box_layout.py
Normal file
@@ -0,0 +1,339 @@
"""
Box layout classifier — detects internal layout type of embedded boxes.

Classifies each box as: flowing | columnar | bullet_list | header_only
and provides layout-appropriate grid building.

Used by the Box-Grid-Review step to rebuild box zones with correct structure.
"""

import logging
import re
import statistics
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Bullet / list-item patterns at the start of a line
_BULLET_RE = re.compile(
    r'^[\-\u2022\u2013\u2014\u25CF\u25CB\u25AA\u25A0•·]\s'  # dash, bullet chars
    r'|^\d{1,2}[.)]\s'                                      # numbered: "1) " or "1. "
    r'|^[a-z][.)]\s'                                        # lettered: "a) " or "a. "
)


def classify_box_layout(
    words: List[Dict],
    box_w: int,
    box_h: int,
) -> str:
    """Classify the internal layout of a detected box.

    Args:
        words: OCR word dicts within the box (with top, left, width, height, text)
        box_w: Box width in pixels
        box_h: Box height in pixels

    Returns:
        'header_only' | 'bullet_list' | 'columnar' | 'flowing'
    """
    if not words:
        return "header_only"

    # Group words into lines by y-proximity
    lines = _group_into_lines(words)

    # Header only: very few words or single line
    total_words = sum(len(line) for line in lines)
    if total_words <= 5 or len(lines) <= 1:
        return "header_only"

    # Bullet list: check if majority of lines start with bullet patterns
    bullet_count = 0
    for line in lines:
        first_text = line[0].get("text", "") if line else ""
        if _BULLET_RE.match(first_text):
            bullet_count += 1
        # Also check if first word IS a bullet char
        elif first_text.strip() in ("-", "–", "—", "•", "·", "▪", "▸"):
            bullet_count += 1
    if bullet_count >= len(lines) * 0.4 and bullet_count >= 2:
        return "bullet_list"

    # Columnar: check for multiple distinct x-clusters
    if len(lines) >= 3 and _has_column_structure(words, box_w):
        return "columnar"

    # Default: flowing text
    return "flowing"


def _group_into_lines(words: List[Dict]) -> List[List[Dict]]:
    """Group words into lines by y-proximity."""
    if not words:
        return []

    sorted_words = sorted(words, key=lambda w: (w["top"], w["left"]))
    heights = [w["height"] for w in sorted_words if w.get("height", 0) > 0]
    median_h = statistics.median(heights) if heights else 20
    y_tolerance = max(median_h * 0.5, 5)

    lines: List[List[Dict]] = []
    current_line: List[Dict] = [sorted_words[0]]
    current_y = sorted_words[0]["top"]

    for w in sorted_words[1:]:
        if abs(w["top"] - current_y) <= y_tolerance:
            current_line.append(w)
        else:
            lines.append(sorted(current_line, key=lambda ww: ww["left"]))
            current_line = [w]
            current_y = w["top"]

    if current_line:
        lines.append(sorted(current_line, key=lambda ww: ww["left"]))

    return lines


def _has_column_structure(words: List[Dict], box_w: int) -> bool:
    """Check if words have multiple distinct left-edge clusters (columns)."""
    if box_w <= 0:
        return False

    lines = _group_into_lines(words)
    if len(lines) < 3:
        return False

    # Collect left-edges of non-first words in each line
    # (first word of each line often aligns regardless of columns)
    left_edges = []
    for line in lines:
        for w in line[1:]:  # skip first word
            left_edges.append(w["left"])

    if len(left_edges) < 4:
        return False

    # Check if left edges cluster into 2+ distinct groups
    left_edges.sort()
    gaps = [left_edges[i + 1] - left_edges[i] for i in range(len(left_edges) - 1)]
    if not gaps:
        return False

    median_gap = statistics.median(gaps)
    # A column gap is typically > 15% of box width
    column_gap_threshold = box_w * 0.15
    large_gaps = [g for g in gaps if g > column_gap_threshold]

    return len(large_gaps) >= 1


def build_box_zone_grid(
    zone_words: List[Dict],
    box_x: int,
    box_y: int,
    box_w: int,
    box_h: int,
    zone_index: int,
    img_w: int,
    img_h: int,
    layout_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a grid for a box zone with layout-aware processing.

    If layout_type is None, auto-detects it.
    For 'flowing' and 'bullet_list', forces single-column layout.
    For 'columnar', uses the standard multi-column detection.
    For 'header_only', creates a single cell.

    Returns the same format as _build_zone_grid (columns, rows, cells, header_rows).
    """
    from grid_editor_helpers import _build_zone_grid, _cluster_rows

    if not zone_words:
        return {
            "columns": [],
            "rows": [],
            "cells": [],
            "header_rows": [],
            "box_layout_type": layout_type or "header_only",
            "box_grid_reviewed": False,
        }

    # Auto-detect layout if not specified
    if not layout_type:
        layout_type = classify_box_layout(zone_words, box_w, box_h)

    logger.info(
        "Box zone %d: layout_type=%s, %d words, %dx%d",
        zone_index, layout_type, len(zone_words), box_w, box_h,
    )

    if layout_type == "header_only":
        # Single cell with all text concatenated
        all_text = " ".join(
            w.get("text", "") for w in sorted(zone_words, key=lambda ww: (ww["top"], ww["left"]))
        ).strip()
        return {
            "columns": [{"col_index": 0, "index": 0, "label": "column_text", "col_type": "column_1",
                         "x_min_px": box_x, "x_max_px": box_x + box_w,
                         "x_min_pct": round(box_x / img_w * 100, 2) if img_w else 0,
                         "x_max_pct": round((box_x + box_w) / img_w * 100, 2) if img_w else 0,
                         "bold": False}],
            "rows": [{"index": 0, "row_index": 0,
                      "y_min": box_y, "y_max": box_y + box_h, "y_center": box_y + box_h / 2,
                      "y_min_px": box_y, "y_max_px": box_y + box_h,
                      "y_min_pct": round(box_y / img_h * 100, 2) if img_h else 0,
                      "y_max_pct": round((box_y + box_h) / img_h * 100, 2) if img_h else 0,
                      "is_header": True}],
            "cells": [{
                "cell_id": f"Z{zone_index}_R0C0",
                "row_index": 0,
                "col_index": 0,
                "col_type": "column_1",
                "text": all_text,
                "word_boxes": zone_words,
            }],
            "header_rows": [0],
            "box_layout_type": layout_type,
            "box_grid_reviewed": False,
        }

    if layout_type in ("flowing", "bullet_list"):
        # Force single column — each line becomes one row with one cell.
        # Detect bullet structure from indentation and merge continuation
        # lines into the bullet they belong to.
        lines = _group_into_lines(zone_words)
        column = {
            "col_index": 0, "index": 0, "label": "column_text", "col_type": "column_1",
            "x_min_px": box_x, "x_max_px": box_x + box_w,
            "x_min_pct": round(box_x / img_w * 100, 2) if img_w else 0,
            "x_max_pct": round((box_x + box_w) / img_w * 100, 2) if img_w else 0,
            "bold": False,
        }

        # --- Detect indentation levels ---
        line_indents = []
        for line_words in lines:
            if not line_words:
                line_indents.append(0)
                continue
            min_left = min(w["left"] for w in line_words)
            line_indents.append(min_left - box_x)

        # Find the minimum indent (= bullet/main level)
        valid_indents = [ind for ind in line_indents if ind >= 0]
        min_indent = min(valid_indents) if valid_indents else 0

        # Indentation threshold: lines indented > 15px more than minimum
        # are continuation lines belonging to the previous bullet
        INDENT_THRESHOLD = 15

        # --- Group lines into logical items (bullet + continuations) ---
        # Each item is a list of line indices
        items: List[List[int]] = []
        for li, indent in enumerate(line_indents):
            is_continuation = (indent > min_indent + INDENT_THRESHOLD) and len(items) > 0
            if is_continuation:
                items[-1].append(li)
            else:
                items.append([li])

        logger.info(
            "Box zone %d flowing: %d lines → %d items (indents=%s, min=%d, threshold=%d)",
            zone_index, len(lines), len(items),
            [int(i) for i in line_indents], int(min_indent), INDENT_THRESHOLD,
        )

        # --- Build rows and cells from grouped items ---
        rows = []
        cells = []
        header_rows = []

        for row_idx, item_line_indices in enumerate(items):
            # Collect all words from all lines in this item
            item_words = []
            item_texts = []
            for li in item_line_indices:
                if li < len(lines):
                    item_words.extend(lines[li])
                    line_text = " ".join(w.get("text", "") for w in lines[li]).strip()
                    if line_text:
                        item_texts.append(line_text)

            if not item_words:
                continue

            y_min = min(w["top"] for w in item_words)
            y_max = max(w["top"] + w["height"] for w in item_words)
            y_center = (y_min + y_max) / 2

            row = {
                "index": row_idx,
                "row_index": row_idx,
                "y_min": y_min,
                "y_max": y_max,
                "y_center": y_center,
                "y_min_px": y_min,
                "y_max_px": y_max,
                "y_min_pct": round(y_min / img_h * 100, 2) if img_h else 0,
                "y_max_pct": round(y_max / img_h * 100, 2) if img_h else 0,
                "is_header": False,
            }
            rows.append(row)

            # Join multi-line text with newline for display
            merged_text = "\n".join(item_texts)

            # Add bullet marker if this is a bullet item without one
            first_text = item_texts[0] if item_texts else ""
            is_bullet = len(item_line_indices) > 1 or _BULLET_RE.match(first_text)
            if is_bullet and not _BULLET_RE.match(first_text) and row_idx > 0:
                # Continuation item without bullet — add one
                merged_text = "• " + merged_text

            cell = {
                "cell_id": f"Z{zone_index}_R{row_idx}C0",
                "row_index": row_idx,
                "col_index": 0,
                "col_type": "column_1",
                "text": merged_text,
                "word_boxes": item_words,
            }
            cells.append(cell)

        # Detect header: first item if it has no continuation lines and is short
        if len(items) >= 2:
            first_item_texts = []
            for li in items[0]:
                if li < len(lines):
                    first_item_texts.append(" ".join(w.get("text", "") for w in lines[li]).strip())
            first_text = " ".join(first_item_texts)
            if (len(first_text) < 40
                    or first_text.isupper()
                    or first_text.rstrip().endswith(':')):
                header_rows = [0]

        return {
            "columns": [column],
            "rows": rows,
            "cells": cells,
            "header_rows": header_rows,
            "box_layout_type": layout_type,
            "box_grid_reviewed": False,
        }

    # Columnar: use standard grid builder with independent column detection
    result = _build_zone_grid(
        zone_words, box_x, box_y, box_w, box_h,
        zone_index, img_w, img_h,
        global_columns=None,  # detect columns independently
    )

    # Colspan detection is now handled generically by _detect_colspan_cells
    # in grid_editor_helpers.py (called inside _build_zone_grid).

    result["box_layout_type"] = layout_type
    result["box_grid_reviewed"] = False
    return result

@@ -1447,6 +1447,90 @@ def _merge_phonetic_continuation_rows(
    return merged


def _merge_wrapped_rows(
    entries: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Merge rows where the primary column (EN) is empty — cell wrap continuation.

    In textbook vocabulary tables, columns are often narrow, so the author
    wraps text within a cell. OCR treats each physical line as a separate row.
    The key indicator: if the EN column is empty but DE/example have text,
    this row is a continuation of the previous row's cells.

    Example (original textbook has ONE row):
        Row 2: EN="take part (in)"  DE="teilnehmen (an), mitmachen"  EX="More than 200 singers took"
        Row 3: EN=""                DE="(bei)"                       EX="part in the concert."
        → Merged: EN="take part (in)"  DE="teilnehmen (an), mitmachen (bei)"  EX="More than 200 singers took part in the concert."

    Also handles the reverse case: DE empty but EN has text (wrap in EN column).
    """
    if len(entries) < 2:
        return entries

    merged: List[Dict[str, Any]] = []
    for entry in entries:
        en = (entry.get('english') or '').strip()
        de = (entry.get('german') or '').strip()
        ex = (entry.get('example') or '').strip()

        if not merged:
            merged.append(entry)
            continue

        prev = merged[-1]
        prev_en = (prev.get('english') or '').strip()
        prev_de = (prev.get('german') or '').strip()
        prev_ex = (prev.get('example') or '').strip()

        # Case 1: EN is empty → continuation of previous row
        # (DE or EX have text that should be appended to previous row)
        if not en and (de or ex) and prev_en:
            if de:
                if prev_de.endswith(','):
                    sep = ' '  # "Wort," + " " + "Ausdruck"
                elif prev_de.endswith(('-', '(')):
                    sep = ''   # "teil-" + "nehmen" or "(" + "bei)"
                else:
                    sep = ' '
                prev['german'] = (prev_de + sep + de).strip()
            if ex:
                sep = ' ' if prev_ex else ''
                prev['example'] = (prev_ex + sep + ex).strip()
            logger.debug(
                f"Merged wrapped row {entry.get('row_index')} into previous "
                f"(empty EN): DE={prev['german']!r}, EX={prev.get('example', '')!r}"
            )
            continue

        # Case 2: DE is empty, EN has text that looks like continuation
        # (starts with lowercase or is a parenthetical like "(bei)")
        if en and not de and prev_de:
            is_paren = en.startswith('(')
            first_alpha = next((c for c in en if c.isalpha()), '')
            starts_lower = first_alpha and first_alpha.islower()

            if (is_paren or starts_lower) and len(en.split()) < 5:
                sep = ' ' if prev_en and not prev_en.endswith((',', '-', '(')) else ''
                prev['english'] = (prev_en + sep + en).strip()
                if ex:
                    sep2 = ' ' if prev_ex else ''
                    prev['example'] = (prev_ex + sep2 + ex).strip()
                logger.debug(
                    f"Merged wrapped row {entry.get('row_index')} into previous "
                    f"(empty DE): EN={prev['english']!r}"
                )
                continue

        merged.append(entry)

    if len(merged) < len(entries):
        logger.info(
            f"_merge_wrapped_rows: merged {len(entries) - len(merged)} "
            f"continuation rows ({len(entries)} → {len(merged)})"
        )
    return merged


def _merge_continuation_rows(
    entries: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
@@ -1561,6 +1645,9 @@ def build_word_grid(
    # --- Post-processing pipeline (deterministic, no LLM) ---
    n_raw = len(entries)

    # 0. Merge cell-wrap continuation rows (empty primary column = text wrap)
    entries = _merge_wrapped_rows(entries)

    # 0a. Merge phonetic-only continuation rows into previous entry
    entries = _merge_phonetic_continuation_rows(entries)

610
klausur-service/backend/cv_gutter_repair.py
Normal file
@@ -0,0 +1,610 @@
"""
Gutter Repair — detects and fixes words truncated or blurred at the book gutter.

When scanning double-page spreads, the binding area (gutter) causes:
1. Blurry/garbled trailing characters ("stammeli" → "stammeln")
2. Words split across lines with a hyphen lost in the gutter
   ("ve" + "künden" → "verkünden")

This module analyses grid cells, identifies gutter-edge candidates, and
proposes corrections using pyspellchecker (DE + EN).

Lizenz: Apache 2.0 (kommerziell nutzbar)
DATENSCHUTZ: Alle Verarbeitung erfolgt lokal.
"""

import itertools
import logging
import re
import time
import uuid
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Spellchecker setup (lazy, cached)
# ---------------------------------------------------------------------------

_spell_de = None
_spell_en = None
_SPELL_AVAILABLE = False


def _init_spellcheckers():
    """Lazy-load DE + EN spellcheckers (cached across calls)."""
    global _spell_de, _spell_en, _SPELL_AVAILABLE
    if _spell_de is not None:
        return
    try:
        from spellchecker import SpellChecker
        _spell_de = SpellChecker(language='de', distance=1)
        _spell_en = SpellChecker(language='en', distance=1)
        _SPELL_AVAILABLE = True
        logger.info("Gutter repair: spellcheckers loaded (DE + EN)")
    except ImportError:
        logger.warning("pyspellchecker not installed — gutter repair unavailable")


def _is_known(word: str) -> bool:
    """Check if a word is known in DE or EN dictionary."""
    _init_spellcheckers()
    if not _SPELL_AVAILABLE:
        return False
    w = word.lower()
    return bool(_spell_de.known([w])) or bool(_spell_en.known([w]))


def _spell_candidates(word: str, lang: str = "both") -> List[str]:
    """Get all plausible spellchecker candidates for a word (deduplicated)."""
    _init_spellcheckers()
    if not _SPELL_AVAILABLE:
        return []
    w = word.lower()
    seen: set = set()
    results: List[str] = []

    for checker in ([_spell_de, _spell_en] if lang == "both"
                    else [_spell_de] if lang == "de"
                    else [_spell_en]):
        if checker is None:
            continue
        cands = checker.candidates(w)
        if cands:
            for c in cands:
                if c and c != w and c not in seen:
                    seen.add(c)
                    results.append(c)

    return results


# ---------------------------------------------------------------------------
# Gutter position detection
# ---------------------------------------------------------------------------

# Minimum word length for spell-fix (very short words are often legitimate)
_MIN_WORD_LEN_SPELL = 3

# Minimum word length for hyphen-join candidates (fragments at the gutter
# can be as short as 1-2 chars, e.g. "ve" from "ver-künden")
_MIN_WORD_LEN_HYPHEN = 2

# How close to the right column edge a word must be to count as "gutter-adjacent".
# Expressed as fraction of column width (e.g. 0.75 = rightmost 25%).
_GUTTER_EDGE_THRESHOLD = 0.70

# Small common words / abbreviations that should NOT be repaired
_STOPWORDS = frozenset([
    # German
    "ab", "an", "am", "da", "er", "es", "im", "in", "ja", "ob", "so", "um",
    "zu", "wo", "du", "eh", "ei", "je", "na", "nu", "oh",
    # English
    "a", "am", "an", "as", "at", "be", "by", "do", "go", "he", "if", "in",
    "is", "it", "me", "my", "no", "of", "on", "or", "so", "to", "up", "us",
    "we",
])

# IPA / phonetic patterns — skip these cells
_IPA_RE = re.compile(r'[\[\]/ˈˌːʃʒθðŋɑɒæɔəɛɪʊʌ]')


def _is_ipa_text(text: str) -> bool:
    """True if text looks like IPA transcription."""
    return bool(_IPA_RE.search(text))


def _word_is_at_gutter_edge(word_bbox: Dict, col_x: float, col_width: float) -> bool:
    """Check if a word's right edge is near the right boundary of its column."""
    if col_width <= 0:
        return False
    word_right = word_bbox.get("left", 0) + word_bbox.get("width", 0)
    col_right = col_x + col_width
    # Word's right edge within the rightmost portion of the column
    relative_pos = (word_right - col_x) / col_width
    return relative_pos >= _GUTTER_EDGE_THRESHOLD


# ---------------------------------------------------------------------------
# Suggestion types
# ---------------------------------------------------------------------------

@dataclass
class GutterSuggestion:
    """A single correction suggestion."""
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    type: str = ""  # "hyphen_join" | "spell_fix"
    zone_index: int = 0
    row_index: int = 0
    col_index: int = 0
    col_type: str = ""
    cell_id: str = ""
    original_text: str = ""
    suggested_text: str = ""
    # For hyphen_join:
    next_row_index: int = -1
    next_row_cell_id: str = ""
    next_row_text: str = ""
    missing_chars: str = ""
    display_parts: List[str] = field(default_factory=list)
    # Alternatives (other plausible corrections the user can pick from)
    alternatives: List[str] = field(default_factory=list)
    # Meta:
    confidence: float = 0.0
    reason: str = ""  # "gutter_truncation" | "gutter_blur" | "hyphen_continuation"

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


# ---------------------------------------------------------------------------
# Core repair logic
# ---------------------------------------------------------------------------

_TRAILING_PUNCT_RE = re.compile(r'[.,;:!?\)\]]+$')


def _try_hyphen_join(
    word_text: str,
    next_word_text: str,
    max_missing: int = 3,
) -> Optional[Tuple[str, str, float]]:
    """Try joining two fragments with 0..max_missing interpolated chars.

    Strips trailing punctuation from the continuation word before testing
    (e.g. "künden," → "künden") so dictionary lookup succeeds.

    Returns (joined_word, missing_chars, confidence) or None.
    """
    base = word_text.rstrip("-").rstrip()
    # Strip trailing punctuation from continuation (commas, periods, etc.)
    raw_continuation = next_word_text.lstrip()
    continuation = _TRAILING_PUNCT_RE.sub('', raw_continuation)

    if not base or not continuation:
        return None

    # 1. Direct join (no missing chars)
    direct = base + continuation
    if _is_known(direct):
        return (direct, "", 0.95)

    # 2. Try with 1..max_missing missing characters
    # Use common letters, weighted by frequency in German/English
    _COMMON_CHARS = "enristaldhgcmobwfkzpvjyxqu"

    for n_missing in range(1, max_missing + 1):
        for chars in itertools.product(_COMMON_CHARS[:15], repeat=n_missing):
            candidate = base + "".join(chars) + continuation
            if _is_known(candidate):
                missing = "".join(chars)
                # Confidence decreases with more missing chars
                conf = 0.90 - (n_missing - 1) * 0.10
                return (candidate, missing, conf)

    return None


def _try_spell_fix(
    word_text: str, col_type: str = "",
) -> Optional[Tuple[str, float, List[str]]]:
    """Try to fix a single garbled gutter word via spellchecker.

    Returns (best_correction, confidence, alternatives_list) or None.
    The alternatives list contains other plausible corrections the user
    can choose from (e.g. "stammelt" vs "stammeln").
    """
    if len(word_text) < _MIN_WORD_LEN_SPELL:
        return None

    # Strip trailing/leading parentheses and check if the bare word is valid.
    # Words like "probieren)" or "(Englisch" are valid words with punctuation,
    # not OCR errors. Don't suggest corrections for them.
    stripped = word_text.strip("()")
    if stripped and _is_known(stripped):
        return None

    # Determine language priority from column type
    if "en" in col_type:
        lang = "en"
    elif "de" in col_type:
        lang = "de"
    else:
        lang = "both"

    candidates = _spell_candidates(word_text, lang=lang)
    if not candidates and lang != "both":
        candidates = _spell_candidates(word_text, lang="both")

    if not candidates:
        return None

    # Preserve original casing
    is_upper = word_text[0].isupper()

    def _preserve_case(w: str) -> str:
        if is_upper and w:
            return w[0].upper() + w[1:]
        return w

    # Sort candidates by edit distance (closest first)
    scored = []
    for c in candidates:
        dist = _edit_distance(word_text.lower(), c.lower())
        scored.append((dist, c))
    scored.sort(key=lambda x: x[0])

    best_dist, best = scored[0]
    best = _preserve_case(best)
    conf = max(0.5, 1.0 - best_dist * 0.15)

    # Build alternatives (all other candidates, also case-preserved)
    alts = [_preserve_case(c) for _, c in scored[1:] if c.lower() != best.lower()]
    # Limit to top 5 alternatives
    alts = alts[:5]

    return (best, conf, alts)
|
||||
|
||||
|
||||
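The confidence formula decays linearly with edit distance and floors at 0.5. A quick standalone check (the helper name `spell_confidence` is illustrative, not from the repo):

```python
def spell_confidence(edit_dist: int) -> float:
    # mirrors `conf = max(0.5, 1.0 - best_dist * 0.15)` above
    return max(0.5, 1.0 - edit_dist * 0.15)

# distance 0 is a certain match (1.0); distance 4+ clamps to the 0.5 floor
print(spell_confidence(0), spell_confidence(4))  # 1.0 0.5
```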
def _edit_distance(a: str, b: str) -> int:
    """Simple Levenshtein distance."""
    if len(a) < len(b):
        return _edit_distance(b, a)
    if len(b) == 0:
        return len(a)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a):
        curr = [i + 1]
        for j, cb in enumerate(b):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[j] + 1, prev[j + 1] + 1, prev[j] + cost))
        prev = curr
    return prev[len(b)]


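`_edit_distance` is the classic two-row Levenshtein recurrence (insert, delete, substitute all cost 1). A self-contained copy for experimenting outside the repo:

```python
def levenshtein(a: str, b: str) -> int:
    # same two-row dynamic programming scheme as _edit_distance above
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a):
        curr = [i + 1]
        for j, cb in enumerate(b):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[j] + 1, prev[j + 1] + 1, prev[j] + cost))
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows bounds memory by the shorter string instead of the full matrix.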
# ---------------------------------------------------------------------------
# Grid analysis
# ---------------------------------------------------------------------------

def analyse_grid_for_gutter_repair(
    grid_data: Dict[str, Any],
    image_width: int = 0,
) -> Dict[str, Any]:
    """Analyse a structured grid and return gutter repair suggestions.

    Args:
        grid_data: The grid_editor_result from the session (zones→cells structure).
        image_width: Image width in pixels (for determining gutter side).

    Returns:
        Dict with "suggestions" list and "stats".
    """
    t0 = time.time()
    _init_spellcheckers()

    if not _SPELL_AVAILABLE:
        return {
            "suggestions": [],
            "stats": {"error": "pyspellchecker not installed"},
            "duration_seconds": 0,
        }

    zones = grid_data.get("zones", [])
    suggestions: List[GutterSuggestion] = []
    words_checked = 0
    gutter_candidates = 0

    for zi, zone in enumerate(zones):
        columns = zone.get("columns", [])
        cells = zone.get("cells", [])
        if not columns or not cells:
            continue

        # Build column lookup: col_index → {x, width, type}
        col_info: Dict[int, Dict] = {}
        for col in columns:
            ci = col.get("index", col.get("col_index", -1))
            col_info[ci] = {
                "x": col.get("x_min_px", col.get("x", 0)),
                "width": col.get("x_max_px", col.get("width", 0)) - col.get("x_min_px", col.get("x", 0)),
                "type": col.get("type", col.get("col_type", "")),
            }

        # Build row→col→cell lookup
        cell_map: Dict[Tuple[int, int], Dict] = {}
        max_row = 0
        for cell in cells:
            ri = cell.get("row_index", 0)
            ci = cell.get("col_index", 0)
            cell_map[(ri, ci)] = cell
            if ri > max_row:
                max_row = ri

        # Determine which columns are at the gutter edge.
        # For a left page: rightmost content columns.
        # For now, check ALL columns — a word is a candidate if it's at the
        # right edge of its column AND not a known word.
        for (ri, ci), cell in cell_map.items():
            text = (cell.get("text") or "").strip()
            if not text:
                continue
            if _is_ipa_text(text):
                continue

            words_checked += 1
            col = col_info.get(ci, {})
            col_type = col.get("type", "")

            # Get word boxes to check position
            word_boxes = cell.get("word_boxes", [])

            # Check the LAST word in the cell (rightmost, closest to gutter)
            cell_words = text.split()
            if not cell_words:
                continue

            last_word = cell_words[-1]

            # Skip stopwords
            if last_word.lower().rstrip(".,;:!?-") in _STOPWORDS:
                continue

            last_word_clean = last_word.rstrip(".,;:!?)(")
            if len(last_word_clean) < _MIN_WORD_LEN_HYPHEN:
                continue

            # Check if the last word is at the gutter edge
            is_at_edge = False
            if word_boxes:
                last_wb = word_boxes[-1]
                is_at_edge = _word_is_at_gutter_edge(
                    last_wb, col.get("x", 0), col.get("width", 1)
                )
            else:
                # No word boxes — use cell bbox
                bbox = cell.get("bbox_px", {})
                is_at_edge = _word_is_at_gutter_edge(
                    {"left": bbox.get("x", 0), "width": bbox.get("w", 0)},
                    col.get("x", 0), col.get("width", 1)
                )

            if not is_at_edge:
                continue

            # Word is at gutter edge — check if it's a known word
            if _is_known(last_word_clean):
                continue

            # Check if the word ends with "-" (explicit hyphen break)
            ends_with_hyphen = last_word.endswith("-")

            # If the word already ends with "-" and the stem (without
            # the hyphen) is a known word, this is a VALID line-break
            # hyphenation — not a gutter error. Gutter problems cause
            # the hyphen to be LOST ("ve" instead of "ver-"), so a
            # visible hyphen + known stem = intentional word-wrap.
            # Example: "wunder-" → "wunder" is known → skip.
            if ends_with_hyphen:
                stem = last_word_clean.rstrip("-")
                if stem and _is_known(stem):
                    continue

            gutter_candidates += 1

            # --- Strategy 1: Hyphen join with next row ---
            next_cell = cell_map.get((ri + 1, ci))
            if next_cell:
                next_text = (next_cell.get("text") or "").strip()
                next_words = next_text.split()
                if next_words:
                    first_next = next_words[0]
                    first_next_clean = _TRAILING_PUNCT_RE.sub('', first_next)
                    first_alpha = next((c for c in first_next if c.isalpha()), "")

                    # Also skip if the joined word is known (covers compound
                    # words where the stem alone might not be in the dictionary)
                    if ends_with_hyphen and first_next_clean:
                        direct = last_word_clean.rstrip("-") + first_next_clean
                        if _is_known(direct):
                            continue

                    # Continuation likely if:
                    # - explicit hyphen, OR
                    # - next row starts lowercase (= not a new entry)
                    if ends_with_hyphen or (first_alpha and first_alpha.islower()):
                        result = _try_hyphen_join(last_word_clean, first_next)
                        if result:
                            joined, missing, conf = result
                            # Build display parts: show hyphenation for original layout
                            if ends_with_hyphen:
                                display_p1 = last_word_clean.rstrip("-")
                                if missing:
                                    display_p1 += missing
                                display_p1 += "-"
                            else:
                                display_p1 = last_word_clean
                                if missing:
                                    display_p1 += missing + "-"
                                else:
                                    display_p1 += "-"

                            suggestion = GutterSuggestion(
                                type="hyphen_join",
                                zone_index=zi,
                                row_index=ri,
                                col_index=ci,
                                col_type=col_type,
                                cell_id=cell.get("cell_id", f"R{ri:02d}_C{ci}"),
                                original_text=last_word,
                                suggested_text=joined,
                                next_row_index=ri + 1,
                                next_row_cell_id=next_cell.get("cell_id", f"R{ri+1:02d}_C{ci}"),
                                next_row_text=next_text,
                                missing_chars=missing,
                                display_parts=[display_p1, first_next],
                                confidence=conf,
                                reason="gutter_truncation" if missing else "hyphen_continuation",
                            )
                            suggestions.append(suggestion)
                            continue  # skip spell_fix if hyphen_join found

            # --- Strategy 2: Single-word spell fix (only for longer words) ---
            fix_result = _try_spell_fix(last_word_clean, col_type)
            if fix_result:
                corrected, conf, alts = fix_result
                suggestion = GutterSuggestion(
                    type="spell_fix",
                    zone_index=zi,
                    row_index=ri,
                    col_index=ci,
                    col_type=col_type,
                    cell_id=cell.get("cell_id", f"R{ri:02d}_C{ci}"),
                    original_text=last_word,
                    suggested_text=corrected,
                    alternatives=alts,
                    confidence=conf,
                    reason="gutter_blur",
                )
                suggestions.append(suggestion)

    duration = round(time.time() - t0, 3)

    logger.info(
        "Gutter repair: checked %d words, %d gutter candidates, %d suggestions (%.2fs)",
        words_checked, gutter_candidates, len(suggestions), duration,
    )

    return {
        "suggestions": [s.to_dict() for s in suggestions],
        "stats": {
            "words_checked": words_checked,
            "gutter_candidates": gutter_candidates,
            "suggestions_found": len(suggestions),
        },
        "duration_seconds": duration,
    }


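The display-part construction in Strategy 1 can be checked in isolation. A toy re-implementation of just that branch (the function name is hypothetical, not from the repo):

```python
def build_display_part(last_word_clean: str, missing: str, ends_with_hyphen: bool) -> str:
    # mirrors the display_p1 logic above: restore lost chars, keep the layout hyphen
    if ends_with_hyphen:
        p1 = last_word_clean.rstrip("-")
        if missing:
            p1 += missing
        return p1 + "-"
    p1 = last_word_clean
    if missing:
        return p1 + missing + "-"
    return p1 + "-"

print(build_display_part("ve", "r", False))     # "ver-"  (gutter ate "r-")
print(build_display_part("wunder-", "", True))  # "wunder-" (valid line break)
```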
def apply_gutter_suggestions(
    grid_data: Dict[str, Any],
    accepted_ids: List[str],
    suggestions: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Apply accepted gutter repair suggestions to the grid data.

    Modifies cells in-place and returns a summary of changes.

    Args:
        grid_data: The grid_editor_result (zones→cells).
        accepted_ids: List of suggestion IDs the user accepted.
        suggestions: The full suggestions list (from analyse_grid_for_gutter_repair).

    Returns:
        Dict with "applied_count" and "changes" list.
    """
    accepted_set = set(accepted_ids)
    accepted_suggestions = [s for s in suggestions if s.get("id") in accepted_set]

    zones = grid_data.get("zones", [])
    changes: List[Dict[str, Any]] = []

    for s in accepted_suggestions:
        zi = s.get("zone_index", 0)
        ri = s.get("row_index", 0)
        ci = s.get("col_index", 0)
        stype = s.get("type", "")

        if zi >= len(zones):
            continue
        zone_cells = zones[zi].get("cells", [])

        # Find the target cell
        target_cell = None
        for cell in zone_cells:
            if cell.get("row_index") == ri and cell.get("col_index") == ci:
                target_cell = cell
                break

        if not target_cell:
            continue

        old_text = target_cell.get("text", "")

        if stype == "spell_fix":
            # Replace the last word in the cell text
            original_word = s.get("original_text", "")
            corrected = s.get("suggested_text", "")
            if original_word and corrected:
                # Replace from the right (last occurrence)
                idx = old_text.rfind(original_word)
                if idx >= 0:
                    new_text = old_text[:idx] + corrected + old_text[idx + len(original_word):]
                    target_cell["text"] = new_text
                    changes.append({
                        "type": "spell_fix",
                        "zone_index": zi,
                        "row_index": ri,
                        "col_index": ci,
                        "cell_id": target_cell.get("cell_id", ""),
                        "old_text": old_text,
                        "new_text": new_text,
                    })

        elif stype == "hyphen_join":
            # Current cell: replace last word with the hyphenated first part
            original_word = s.get("original_text", "")
            joined = s.get("suggested_text", "")
            display_parts = s.get("display_parts", [])
            next_ri = s.get("next_row_index", -1)

            if not original_word or not joined or not display_parts:
                continue

            # The first display part is what goes in the current row
            first_part = display_parts[0] if display_parts else ""

            # Replace the last word in current cell with the restored form.
            # The next row is NOT modified — "künden" stays in its row
            # because the original book layout has it there. We only fix
            # the truncated word in the current row (e.g. "ve" → "ver-").
            idx = old_text.rfind(original_word)
            if idx >= 0:
                new_text = old_text[:idx] + first_part + old_text[idx + len(original_word):]
                target_cell["text"] = new_text
                changes.append({
                    "type": "hyphen_join",
                    "zone_index": zi,
                    "row_index": ri,
                    "col_index": ci,
                    "cell_id": target_cell.get("cell_id", ""),
                    "old_text": old_text,
                    "new_text": new_text,
                    "joined_word": joined,
                })

    logger.info("Gutter repair applied: %d/%d suggestions", len(changes), len(accepted_suggestions))

    return {
        # count the changes actually applied (matches the log line above),
        # not merely the suggestions that were accepted
        "applied_count": len(changes),
        "changes": changes,
    }
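Both apply branches replace only the last occurrence of the original word via `str.rfind`, which leaves earlier duplicates of the same substring intact:

```python
old_text = "ver ve"            # "ve" also occurs inside "ver"
original_word, corrected = "ve", "ver-"
idx = old_text.rfind(original_word)  # matches the final "ve", index 4
new_text = old_text[:idx] + corrected + old_text[idx + len(original_word):]
print(new_text)  # "ver ver-"
```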
@@ -1182,6 +1182,10 @@ def _insert_missing_ipa(text: str, pronunciation: str = 'british') -> str:
        if wj in ('–', '—', '-', '/', '|', ',', ';'):
            kept.extend(words[j:])
            break
        # Pure digits or numbering (e.g. "1", "2.", "3)") — keep
        if re.match(r'^[\d.)\-]+$', wj):
            kept.extend(words[j:])
            break
        # Starts with uppercase — likely German or proper noun
        clean_j = re.sub(r'[^a-zA-Z]', '', wj)
        if clean_j and clean_j[0].isupper():
@@ -1243,6 +1247,9 @@ def _has_non_dict_trailing(text: str, pronunciation: str = 'british') -> bool:
        wj = words[j]
        if wj in ('–', '—', '-', '/', '|', ',', ';'):
            return False
        # Pure digits or numbering (e.g. "1", "2.", "3)") — not garbled IPA
        if re.match(r'^[\d.)\-]+$', wj):
            return False
        clean_j = re.sub(r'[^a-zA-Z]', '', wj)
        if clean_j and clean_j[0].isupper():
            return False
@@ -1874,6 +1881,11 @@ def _is_noise_tail_token(token: str) -> bool:
    if t.endswith(']'):
        return False

    # Keep meaningful punctuation tokens used in textbooks
    # = (definition marker), (= (definition opener), ; (separator)
    if t in ('=', '(=', '=)', ';', ':', '-', '–', '—', '/', '+', '&'):
        return False

    # Pure non-alpha → noise ("3", ")", "|")
    alpha_chars = _RE_ALPHA.findall(t)
    if not alpha_chars:

@@ -720,6 +720,62 @@ def _spell_dict_knows(word: str) -> bool:
    return bool(_en_spell.known([w])) or bool(_de_spell.known([w]))


def _try_split_merged_word(token: str) -> Optional[str]:
    """Try to split a merged word like 'atmyschool' into 'at my school'.

    Uses dynamic programming to find the shortest sequence of dictionary
    words that covers the entire token. Only returns a result when the
    split produces at least 2 words and ALL parts are known dictionary words.

    Preserves original capitalisation by mapping back to the input string.
    """
    if not _SPELL_AVAILABLE or len(token) < 4:
        return None

    lower = token.lower()
    n = len(lower)

    # dp[i] = (word_lengths_list, score) for best split of lower[:i], or None
    # Score: (-word_count, sum_of_squared_lengths) — fewer words first,
    # then prefer longer words (e.g. "come on" over "com eon")
    dp: list = [None] * (n + 1)
    dp[0] = ([], 0)

    for i in range(1, n + 1):
        for j in range(max(0, i - 20), i):
            if dp[j] is None:
                continue
            candidate = lower[j:i]
            word_len = i - j
            if word_len == 1 and candidate not in ('a', 'i'):
                continue
            if _spell_dict_knows(candidate):
                prev_words, prev_sq = dp[j]
                new_words = prev_words + [word_len]
                new_sq = prev_sq + word_len * word_len
                new_key = (-len(new_words), new_sq)
                if dp[i] is None:
                    dp[i] = (new_words, new_sq)
                else:
                    old_key = (-len(dp[i][0]), dp[i][1])
                    if new_key >= old_key:
                        # >= so that later splits (longer first word) win ties
                        dp[i] = (new_words, new_sq)

    if dp[n] is None or len(dp[n][0]) < 2:
        return None

    # Reconstruct with original casing
    result = []
    pos = 0
    for wlen in dp[n][0]:
        result.append(token[pos:pos + wlen])
        pos += wlen

    logger.debug("Split merged word: %r → %r", token, " ".join(result))
    return " ".join(result)


def _spell_fix_token(token: str, field: str = "") -> Optional[str]:
    """Return corrected form of token, or None if no fix needed/possible.

@@ -777,6 +833,14 @@ def _spell_fix_token(token: str, field: str = "") -> Optional[str]:
            correction = correction[0].upper() + correction[1:]
            if _spell_dict_knows(correction):
                return correction

    # 5. Merged-word split: OCR often merges adjacent words when spacing
    #    is too tight, e.g. "atmyschool" → "at my school"
    if len(token) >= 4 and token.isalpha():
        split = _try_split_merged_word(token)
        if split:
            return split

    return None


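The DP in `_try_split_merged_word` can be tried with a toy dictionary standing in for pyspellchecker (assumption: a plain `set` replaces `_spell_dict_knows`, and ties are broken only by word count here):

```python
def split_merged(token, known):
    # best[i] = list of word lengths covering token[:i], preferring fewer words
    n = len(token)
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            piece = token[j:i]
            if best[j] is not None and piece in known:
                cand = best[j] + [i - j]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    if best[n] is None or len(best[n]) < 2:
        return None  # no full cover, or a single word: nothing to split
    out, pos = [], 0
    for ln in best[n]:
        out.append(token[pos:pos + ln])
        pos += ln
    return " ".join(out)

known = {"at", "my", "school", "atm"}
print(split_merged("atmyschool", known))  # at my school
```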
@@ -817,10 +881,25 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
    """Rule-based OCR correction: spell-checker + structural heuristics.

    Deterministic — never translates, never touches IPA, never hallucinates.
    Uses SmartSpellChecker for language-aware corrections with context-based
    disambiguation (a/I), multi-digit substitution, and cross-language guard.
    """
    t0 = time.time()
    changes: List[Dict] = []
    all_corrected: List[Dict] = []

    # Use SmartSpellChecker if available, fall back to legacy _spell_fix_field
    _smart = None
    try:
        from smart_spell import SmartSpellChecker
        _smart = SmartSpellChecker()
        logger.debug("spell_review: using SmartSpellChecker")
    except Exception:
        logger.debug("spell_review: SmartSpellChecker not available, using legacy")

    # Map field names → language codes for SmartSpellChecker
    _LANG_MAP = {"english": "en", "german": "de", "example": "auto"}

    for i, entry in enumerate(entries):
        e = dict(entry)
        # Page-ref normalization (always, regardless of review status)
@@ -843,9 +922,18 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
            old_val = (e.get(field_name) or "").strip()
            if not old_val:
                continue
-           # example field is mixed-language — try German first (for umlauts)
-           lang = "german" if field_name in ("german", "example") else "english"
-           new_val, was_changed = _spell_fix_field(old_val, field=lang)

+           if _smart:
+               # SmartSpellChecker path — language-aware, context-based
+               lang_code = _LANG_MAP.get(field_name, "en")
+               result = _smart.correct_text(old_val, lang=lang_code)
+               new_val = result.corrected
+               was_changed = result.changed
+           else:
+               # Legacy path
+               lang = "german" if field_name in ("german", "example") else "english"
+               new_val, was_changed = _spell_fix_field(old_val, field=lang)

            if was_changed and new_val != old_val:
                changes.append({
                    "row_index": e.get("row_index", i),
@@ -857,12 +945,13 @@ def spell_review_entries_sync(entries: List[Dict]) -> Dict:
            e["llm_corrected"] = True
        all_corrected.append(e)
    duration_ms = int((time.time() - t0) * 1000)
+   model_name = "smart-spell-checker" if _smart else "spell-checker"
    return {
        "entries_original": entries,
        "entries_corrected": all_corrected,
        "changes": changes,
        "skipped_count": 0,
-       "model_used": "spell-checker",
+       "model_used": model_name,
        "duration_ms": duration_ms,
    }


@@ -55,6 +55,9 @@ _STOP_WORDS = frozenset([
_hyph_de = None
_hyph_en = None

# Cached spellchecker (for autocorrect_pipe_artifacts)
_spell_de = None


def _get_hyphenators():
    """Lazy-load pyphen hyphenators (cached across calls)."""
@@ -70,6 +73,35 @@ def _get_hyphenators():
    return _hyph_de, _hyph_en


def _get_spellchecker():
    """Lazy-load German spellchecker (cached across calls)."""
    global _spell_de
    if _spell_de is not None:
        return _spell_de
    try:
        from spellchecker import SpellChecker
    except ImportError:
        return None
    _spell_de = SpellChecker(language='de')
    return _spell_de


def _is_known_word(word: str, hyph_de, hyph_en) -> bool:
    """Check whether pyphen recognises a word (DE or EN)."""
    if len(word) < 2:
        return False
    return ('|' in hyph_de.inserted(word, hyphen='|')
            or '|' in hyph_en.inserted(word, hyphen='|'))


def _is_real_word(word: str) -> bool:
    """Check whether spellchecker knows this word (case-insensitive)."""
    spell = _get_spellchecker()
    if spell is None:
        return False
    return word.lower() in spell


def _hyphenate_word(word: str, hyph_de, hyph_en) -> Optional[str]:
    """Try to hyphenate a word using DE then EN dictionary.

@@ -84,6 +116,139 @@ def _hyphenate_word(word: str, hyph_de, hyph_en) -> Optional[str]:
    return None


def _autocorrect_piped_word(word_with_pipes: str) -> Optional[str]:
    """Try to correct a word that has OCR pipe artifacts.

    Printed syllable divider lines on dictionary pages confuse OCR:
    the vertical stroke is often read as an extra character (commonly
    ``l``, ``I``, ``1``, ``i``) adjacent to where the pipe appears.
    Sometimes OCR reads one divider as ``|`` and another as a letter,
    so the garbled character may be far from any detected pipe.

    Uses ``spellchecker`` (frequency-based word list) for validation —
    unlike pyphen which is a pattern-based hyphenator and accepts
    nonsense strings like "Zeplpelin".

    Strategy:
    1. Strip ``|`` — if spellchecker knows the result, done.
    2. Try deleting each pipe-like character (l, I, 1, i, t).
       OCR inserts extra chars that resemble vertical strokes.
    3. Fall back to spellchecker's own ``correction()`` method.
    4. Preserve the original casing of the first letter.
    """
    stripped = word_with_pipes.replace('|', '')
    if not stripped or len(stripped) < 3:
        return stripped  # too short to validate

    # Step 1: if the stripped word is already a real word, done
    if _is_real_word(stripped):
        return stripped

    # Step 2: try deleting pipe-like characters (most likely artifacts)
    _PIPE_LIKE = frozenset('lI1it')
    for idx in range(len(stripped)):
        if stripped[idx] not in _PIPE_LIKE:
            continue
        candidate = stripped[:idx] + stripped[idx + 1:]
        if len(candidate) >= 3 and _is_real_word(candidate):
            return candidate

    # Step 3: use spellchecker's built-in correction
    spell = _get_spellchecker()
    if spell is not None:
        suggestion = spell.correction(stripped.lower())
        if suggestion and suggestion != stripped.lower():
            # Preserve original first-letter case
            if stripped[0].isupper():
                suggestion = suggestion[0].upper() + suggestion[1:]
            return suggestion

    return None  # could not fix


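Steps 1 and 2 of the strategy can be sketched with a toy validator in place of the spellchecker (the `fix_piped` helper and the two-word dictionary are illustrative assumptions):

```python
PIPE_LIKE = set("lI1it")

def fix_piped(word, is_real):
    stripped = word.replace("|", "")
    if len(stripped) < 3:
        return stripped  # too short to validate
    if is_real(stripped):
        return stripped
    # try deleting one pipe-like char (divider strokes read as l/I/1/i/t)
    for k, ch in enumerate(stripped):
        if ch in PIPE_LIKE:
            cand = stripped[:k] + stripped[k + 1:]
            if len(cand) >= 3 and is_real(cand):
                return cand
    return None

words = {"zelle", "zeppelin"}
real = lambda w: w.lower() in words
print(fix_piped("Zel|le", real))       # Zelle
print(fix_piped("Ze|plpe|lin", real))  # Zeppelin
```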
def autocorrect_pipe_artifacts(
    zones_data: List[Dict], session_id: str,
) -> int:
    """Strip OCR pipe artifacts and correct garbled words in-place.

    Printed syllable divider lines on dictionary scans are read by OCR
    as ``|`` characters embedded in words (e.g. ``Zel|le``, ``Ze|plpe|lin``).
    This function:

    1. Strips ``|`` from every word in content cells.
    2. Validates with spellchecker (real dictionary lookup).
    3. If not recognised, tries deleting pipe-like characters or uses
       spellchecker's correction (e.g. ``Zeplpelin`` → ``Zeppelin``).
    4. Updates both word-box texts and cell text.

    Returns the number of cells modified.
    """
    spell = _get_spellchecker()
    if spell is None:
        logger.warning("spellchecker not available — pipe autocorrect limited")
        # Fall back: still strip pipes even without spellchecker
        pass

    modified = 0
    for z in zones_data:
        for cell in z.get("cells", []):
            ct = cell.get("col_type", "")
            if not ct.startswith("column_"):
                continue

            cell_changed = False

            # --- Fix word boxes ---
            for wb in cell.get("word_boxes", []):
                wb_text = wb.get("text", "")
                if "|" not in wb_text:
                    continue

                # Separate trailing punctuation
                m = re.match(
                    r'^([^a-zA-ZäöüÄÖÜßẞ]*)'
                    r'(.*?)'
                    r'([^a-zA-ZäöüÄÖÜßẞ]*)$',
                    wb_text,
                )
                if not m:
                    continue
                lead, core, trail = m.group(1), m.group(2), m.group(3)
                if "|" not in core:
                    continue

                corrected = _autocorrect_piped_word(core)
                if corrected is not None and corrected != core:
                    wb["text"] = lead + corrected + trail
                    cell_changed = True

            # --- Rebuild cell text from word boxes ---
            if cell_changed:
                wbs = cell.get("word_boxes", [])
                if wbs:
                    cell["text"] = " ".join(
                        (wb.get("text") or "") for wb in wbs
                    )
                modified += 1

            # --- Fallback: strip residual | from cell text ---
            # (covers cases where word_boxes don't exist or weren't fixed)
            text = cell.get("text", "")
            if "|" in text:
                clean = text.replace("|", "")
                if clean != text:
                    cell["text"] = clean
                    if not cell_changed:
                        modified += 1

    if modified:
        logger.info(
            "build-grid session %s: autocorrected pipe artifacts in %d cells",
            session_id, modified,
        )
    return modified


def _try_merge_pipe_gaps(text: str, hyph_de) -> str:
    """Merge fragments separated by single spaces where OCR split at a pipe.

@@ -185,7 +350,7 @@ def merge_word_gaps_in_zones(zones_data: List[Dict], session_id: str) -> int:


def _try_merge_word_gaps(text: str, hyph_de) -> str:
-   """Merge OCR word fragments with relaxed threshold (max_short=6).
+   """Merge OCR word fragments with relaxed threshold (max_short=5).

    Similar to ``_try_merge_pipe_gaps`` but allows slightly longer fragments
    (max_short=5 instead of 3). Still requires pyphen to recognize the

1958  klausur-service/backend/grid_build_core.py  Normal file
File diff suppressed because it is too large
@@ -22,6 +22,148 @@ from cv_ocr_engines import _text_has_garbled_ipa
logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Cross-column word splitting
# ---------------------------------------------------------------------------

_spell_cache: Optional[Any] = None
_spell_loaded = False


def _is_recognized_word(text: str) -> bool:
    """Check if *text* is a recognized German or English word.

    Uses the spellchecker library (same as cv_syllable_detect.py).
    Returns True for real words like "oder", "Kabel", "Zeitung".
    Returns False for OCR merge artifacts like "sichzie", "dasZimmer".
    """
    global _spell_cache, _spell_loaded
    if not text or len(text) < 2:
        return False

    if not _spell_loaded:
        _spell_loaded = True
        try:
            from spellchecker import SpellChecker
            _spell_cache = SpellChecker(language="de")
        except Exception:
            pass

    if _spell_cache is None:
        return False

    return text.lower() in _spell_cache


def _split_cross_column_words(
    words: List[Dict],
    columns: List[Dict],
) -> List[Dict]:
    """Split word boxes that span across column boundaries.

    When OCR merges adjacent words from different columns (e.g. "sichzie"
    spanning Col 1 and Col 2, or "dasZimmer" crossing the boundary),
    split the word box at the column boundary so each piece is assigned
    to the correct column.

    Only splits when:
    - The word has significant overlap (>15% of its width) on both sides
    - AND the word is not a recognized real word (OCR merge artifact), OR
      the word contains a case transition (lowercase→uppercase) near the
      boundary indicating two merged words like "dasZimmer".
    """
    if len(columns) < 2:
        return words

    # Column boundaries = midpoints between adjacent column edges
    boundaries = []
    for i in range(len(columns) - 1):
        boundary = (columns[i]["x_max"] + columns[i + 1]["x_min"]) / 2
        boundaries.append(boundary)

    new_words: List[Dict] = []
    split_count = 0
    for w in words:
        w_left = w["left"]
        w_width = w["width"]
        w_right = w_left + w_width
        text = (w.get("text") or "").strip()

        if not text or len(text) < 4 or w_width < 10:
            new_words.append(w)
            continue

        # Find the first boundary this word straddles significantly
        split_boundary = None
        for b in boundaries:
            if w_left < b < w_right:
                left_part = b - w_left
                right_part = w_right - b
                # Both sides must have at least 15% of the word width
                if left_part > w_width * 0.15 and right_part > w_width * 0.15:
                    split_boundary = b
                    break

        if split_boundary is None:
            new_words.append(w)
            continue

        # Compute approximate split position in the text.
        left_width = split_boundary - w_left
        split_ratio = left_width / w_width
        approx_pos = len(text) * split_ratio

        # Strategy 1: look for a case transition (lowercase→uppercase) near
        # the approximate split point — e.g. "dasZimmer" splits at 'Z'.
        split_char = None
        search_lo = max(1, int(approx_pos) - 3)
        search_hi = min(len(text), int(approx_pos) + 2)
        for i in range(search_lo, search_hi):
            if text[i - 1].islower() and text[i].isupper():
                split_char = i
                break

        # Strategy 2: if no case transition, only split if the whole word
        # is NOT a real word (i.e. it's an OCR merge artifact like "sichzie").
        # Real words like "oder", "Kabel", "Zeitung" must not be split.
        if split_char is None:
            clean = re.sub(r"[,;:.!?]+$", "", text)  # strip trailing punct
            if _is_recognized_word(clean):
                new_words.append(w)
                continue
            # Not a real word — use floor of proportional position
            split_char = max(1, min(len(text) - 1, int(approx_pos)))

        left_text = text[:split_char].rstrip()
        right_text = text[split_char:].lstrip()

        if len(left_text) < 2 or len(right_text) < 2:
            new_words.append(w)
            continue

        right_width = w_width - round(left_width)
        new_words.append({
            **w,
            "text": left_text,
            "width": round(left_width),
        })
        new_words.append({
            **w,
            "text": right_text,
            "left": round(split_boundary),
            "width": right_width,
        })
        split_count += 1
        logger.info(
            "split cross-column word %r → %r + %r at boundary %.0f",
            text, left_text, right_text, split_boundary,
        )

    if split_count:
        logger.info("split %d cross-column word(s)", split_count)
    return new_words


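The geometry-to-text mapping above can be exercised in isolation. A toy sketch (the helper name `split_at_boundary` is hypothetical; it covers only the proportional position plus the case-transition search, not the real-word guard):

```python
def split_at_boundary(word, boundary):
    # word: dict with left/width/text in px; boundary: column midpoint in px
    left, width, text = word["left"], word["width"], word["text"]
    ratio = (boundary - left) / width
    pos = max(1, min(len(text) - 1, int(len(text) * ratio)))
    # prefer a lowercase→uppercase transition near pos ("dasZimmer" → 'Z')
    for i in range(max(1, pos - 3), min(len(text), pos + 2)):
        if text[i - 1].islower() and text[i].isupper():
            pos = i
            break
    return text[:pos], text[pos:]

w = {"left": 100, "width": 90, "text": "dasZimmer"}
print(split_at_boundary(w, 131))  # ('das', 'Zimmer')
```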
def _filter_border_strip_words(words: List[Dict]) -> Tuple[List[Dict], int]:
    """Remove page-border decoration strip words BEFORE column detection.

@@ -138,8 +280,27 @@ def _cluster_columns_by_alignment(
|
||||
median_gap = sorted_gaps[len(sorted_gaps) // 2]
|
||||
heights = [w["height"] for w in words if w.get("height", 0) > 0]
|
||||
median_h = sorted(heights)[len(heights) // 2] if heights else 25
|
||||
# Column boundary: gap > 3× median gap or > 1.5× median word height
|
||||
gap_threshold = max(median_gap * 3, median_h * 1.5, 30)
|
||||
|
||||
# For small word counts (boxes, sub-zones): PaddleOCR returns
|
||||
# multi-word blocks, so ALL inter-word gaps are potential column
|
||||
# boundaries. Use a low threshold based on word height — any gap
|
||||
# wider than ~1x median word height is a column separator.
|
||||
if len(words) <= 60:
|
||||
gap_threshold = max(median_h * 1.0, 25)
|
||||
logger.info(
|
||||
"alignment columns (small zone): gap_threshold=%.0f "
|
||||
"(median_h=%.0f, %d words, %d gaps: %s)",
|
||||
gap_threshold, median_h, len(words), len(sorted_gaps),
|
||||
[int(g) for g in sorted_gaps[:10]],
|
||||
)
|
||||
else:
|
||||
# Standard approach for large zones (full pages)
|
||||
gap_threshold = max(median_gap * 3, median_h * 1.5, 30)
|
||||
# Cap at 25% of zone width
|
||||
max_gap = zone_w * 0.25
|
||||
if gap_threshold > max_gap > 30:
|
||||
logger.info("alignment columns: capping gap_threshold %.0f → %.0f (25%% of zone_w=%d)", gap_threshold, max_gap, zone_w)
|
||||
gap_threshold = max_gap
|
||||
else:
|
||||
gap_threshold = 50
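The adaptive threshold logic can be condensed into a standalone sketch (names are illustrative; the real function works on word dicts rather than bare lists):

```python
def gap_threshold(gaps, heights, n_words, zone_w):
    # Small zones (<= 60 words): any gap wider than ~1x the median
    # word height separates columns. Large zones: 3x the median gap,
    # capped at 25% of the zone width.
    if not gaps:
        return 50.0
    median_gap = sorted(gaps)[len(gaps) // 2]
    median_h = sorted(heights)[len(heights) // 2] if heights else 25
    if n_words <= 60:
        return max(median_h * 1.0, 25)
    t = max(median_gap * 3, median_h * 1.5, 30)
    max_gap = zone_w * 0.25
    if t > max_gap > 30:
        t = max_gap
    return t
```

With gaps `[10, 12, 400]` and heights `[20, 22, 24]`, a 30-word zone gets the low threshold of 25, while a 100-word zone gets `max(36, 33, 30) = 36`.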

@@ -233,13 +394,17 @@ def _cluster_columns_by_alignment(
    used_ids = {id(c) for c in primary} | {id(c) for c in secondary}
    sig_xs = [c["mean_x"] for c in primary + secondary]

    MIN_DISTINCT_ROWS_TERTIARY = max(MIN_DISTINCT_ROWS + 1, 4)
    MIN_COVERAGE_TERTIARY = 0.05  # at least 5% of rows
    # Tertiary: clusters that are clearly to the LEFT of the first
    # significant column (or RIGHT of the last). If words consistently
    # start at a position left of the established first column boundary,
    # they MUST be a separate column — regardless of how few rows they
    # cover. The only requirement is a clear spatial gap.
    MIN_COVERAGE_TERTIARY = 0.02  # at least 1 row effectively
    tertiary = []
    for c in clusters:
        if id(c) in used_ids:
            continue
        if c["distinct_rows"] < MIN_DISTINCT_ROWS_TERTIARY:
        if c["distinct_rows"] < 1:
            continue
        if c["row_coverage"] < MIN_COVERAGE_TERTIARY:
            continue
@@ -907,6 +1072,16 @@ def _detect_heading_rows_by_single_cell(
        text = (cell.get("text") or "").strip()
        if not text or text.startswith("["):
            continue
        # Continuation lines start with "(" — e.g. "(usw.)", "(TV-Serie)"
        if text.startswith("("):
            continue
        # Single cell NOT in the first content column is likely a
        # continuation/overflow line, not a heading. Real headings
        # ("Theme 1", "Unit 3: ...") appear in the first or second
        # content column.
        first_content_col = col_indices[0] if col_indices else 0
        if cell.get("col_index", 0) > first_content_col + 1:
            continue
        # Skip garbled IPA without brackets (e.g. "ska:f – ska:vz")
        # but NOT text with real IPA symbols (e.g. "Theme [θˈiːm]")
        _REAL_IPA_CHARS = set("ˈˌəɪɛɒʊʌæɑɔʃʒθðŋ")
@@ -1043,6 +1218,130 @@ def _detect_header_rows(
    return headers


def _detect_colspan_cells(
    zone_words: List[Dict],
    columns: List[Dict],
    rows: List[Dict],
    cells: List[Dict],
    img_w: int,
    img_h: int,
) -> List[Dict]:
    """Detect and merge cells that span multiple columns (colspan).

    A word-block (PaddleOCR phrase) that extends significantly past a column
    boundary into the next column indicates a merged cell. This replaces
    the incorrectly split cells with a single cell spanning multiple columns.

    Works for both full-page scans and box zones.
    """
    if len(columns) < 2 or not zone_words or not rows:
        return cells

    from cv_words_first import _assign_word_to_row

    # Column boundaries (midpoints between adjacent columns)
    col_boundaries = []
    for ci in range(len(columns) - 1):
        col_boundaries.append((columns[ci]["x_max"] + columns[ci + 1]["x_min"]) / 2)

    def _cols_covered(w_left: float, w_right: float) -> List[int]:
        """Return list of column indices that a word-block covers."""
        covered = []
        for col in columns:
            col_mid = (col["x_min"] + col["x_max"]) / 2
            # Word covers a column if it extends past the column's midpoint
            if w_left < col_mid < w_right:
                covered.append(col["index"])
            # Also include column if word starts within it
            elif col["x_min"] <= w_left < col["x_max"]:
                covered.append(col["index"])
        return sorted(set(covered))

    # Group original word-blocks by row
    row_word_blocks: Dict[int, List[Dict]] = {}
    for w in zone_words:
        ri = _assign_word_to_row(w, rows)
        row_word_blocks.setdefault(ri, []).append(w)

    # For each row, check if any word-block spans multiple columns
    rows_to_merge: Dict[int, List[Dict]] = {}  # row_index → list of spanning word-blocks

    for ri, wblocks in row_word_blocks.items():
        spanning = []
        for w in wblocks:
            w_left = w["left"]
            w_right = w_left + w["width"]
            covered = _cols_covered(w_left, w_right)
            if len(covered) >= 2:
                spanning.append({"word": w, "cols": covered})
        if spanning:
            rows_to_merge[ri] = spanning

    if not rows_to_merge:
        return cells

    # Merge cells for spanning rows
    new_cells = []
    for cell in cells:
        ri = cell.get("row_index", -1)
        if ri not in rows_to_merge:
            new_cells.append(cell)
            continue

        # Check if this cell's column is part of a spanning block
        ci = cell.get("col_index", -1)
        is_part_of_span = False
        for span in rows_to_merge[ri]:
            if ci in span["cols"]:
                is_part_of_span = True
                # Only emit the merged cell for the FIRST column in the span
                if ci == span["cols"][0]:
                    # Use the ORIGINAL word-block text (not the split cell texts
                    # which may have broken words like "euros a" + "nd cents")
                    orig_word = span["word"]
                    merged_text = orig_word.get("text", "").strip()
                    all_wb = [orig_word]

                    # Compute merged bbox
                    if all_wb:
                        x_min = min(wb["left"] for wb in all_wb)
                        y_min = min(wb["top"] for wb in all_wb)
                        x_max = max(wb["left"] + wb["width"] for wb in all_wb)
                        y_max = max(wb["top"] + wb["height"] for wb in all_wb)
                    else:
                        x_min = y_min = x_max = y_max = 0

                    new_cells.append({
                        "cell_id": cell["cell_id"],
                        "row_index": ri,
                        "col_index": span["cols"][0],
                        "col_type": "spanning_header",
                        "colspan": len(span["cols"]),
                        "text": merged_text,
                        "confidence": cell.get("confidence", 0),
                        "bbox_px": {"x": x_min, "y": y_min,
                                    "w": x_max - x_min, "h": y_max - y_min},
                        "bbox_pct": {
                            "x": round(x_min / img_w * 100, 2) if img_w else 0,
                            "y": round(y_min / img_h * 100, 2) if img_h else 0,
                            "w": round((x_max - x_min) / img_w * 100, 2) if img_w else 0,
                            "h": round((y_max - y_min) / img_h * 100, 2) if img_h else 0,
                        },
                        "word_boxes": all_wb,
                        "ocr_engine": cell.get("ocr_engine", ""),
                        "is_bold": cell.get("is_bold", False),
                    })
                    logger.info(
                        "colspan detected: row %d, cols %s → merged %d cells (%r)",
                        ri, span["cols"], len(span["cols"]), merged_text[:50],
                    )
                break
        if not is_part_of_span:
            new_cells.append(cell)

    return new_cells
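The core of the colspan test is `_cols_covered`. Pulled out with plain arguments (a minimal sketch; the column dicts mirror the shape the function reads, but are otherwise made up):

```python
def cols_covered(w_left, w_right, columns):
    # A word block covers a column if it extends past the column's
    # midpoint, or if it merely starts inside the column.
    covered = set()
    for col in columns:
        mid = (col["x_min"] + col["x_max"]) / 2
        if w_left < mid < w_right:
            covered.add(col["index"])
        elif col["x_min"] <= w_left < col["x_max"]:
            covered.add(col["index"])
    return sorted(covered)

cols = [
    {"index": 0, "x_min": 0, "x_max": 100},
    {"index": 1, "x_min": 120, "x_max": 220},
]
```

A block from x=10 to x=180 crosses both midpoints (50 and 170) and therefore reports `[0, 1]`, which is what triggers the merge above.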


def _build_zone_grid(
    zone_words: List[Dict],
    zone_x: int,
@@ -1111,9 +1410,24 @@ def _build_zone_grid(
        "header_rows": [],
    }

    # Split word boxes that straddle column boundaries (e.g. "sichzie"
    # spanning Col 1 + Col 2). Must happen after column detection and
    # before cell assignment.
    # Keep original words for colspan detection (split destroys span info).
    original_zone_words = zone_words
    if len(columns) >= 2:
        zone_words = _split_cross_column_words(zone_words, columns)

    # Build cells
    cells = _build_cells(zone_words, columns, rows, img_w, img_h)

    # --- Detect colspan (merged cells spanning multiple columns) ---
    # Uses the ORIGINAL (pre-split) words to detect word-blocks that span
    # multiple columns. _split_cross_column_words would have destroyed
    # this information by cutting words at column boundaries.
    if len(columns) >= 2:
        cells = _detect_colspan_cells(original_zone_words, columns, rows, cells, img_w, img_h)

    # Prefix cell IDs with zone index
    for cell in cells:
        cell["cell_id"] = f"Z{zone_index}_{cell['cell_id']}"

@@ -262,14 +262,22 @@ async def list_sessions_db(
            document_category, doc_type,
            parent_session_id, box_index,
            document_group_id, page_number,
            created_at, updated_at
            created_at, updated_at,
            ground_truth
        FROM ocr_pipeline_sessions
        {where}
        ORDER BY created_at DESC
        LIMIT $1
    """, limit)

    return [_row_to_dict(row) for row in rows]
    results = []
    for row in rows:
        d = _row_to_dict(row)
        # Derive is_ground_truth flag from JSONB, then drop the heavy field
        gt = d.pop("ground_truth", None) or {}
        d["is_ground_truth"] = bool(gt.get("build_grid_reference"))
        results.append(d)
    return results
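The per-row shaping above is small enough to sketch on its own (the helper name is illustrative; the dict mimics what `_row_to_dict` would return):

```python
def shape_row(d):
    # Derive a cheap boolean from the JSONB payload, then drop the
    # heavy ground_truth field before returning the row to clients.
    gt = d.pop("ground_truth", None) or {}
    d["is_ground_truth"] = bool(gt.get("build_grid_reference"))
    return d
```

The `or {}` guard matters: a SQL NULL arrives as `None`, and `None.get(...)` would raise.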


async def get_sub_sessions(parent_session_id: str) -> List[Dict[str, Any]]:

@@ -71,13 +71,36 @@ async def create_session(
    file: UploadFile = File(...),
    name: Optional[str] = Form(None),
):
    """Upload a PDF or image file and create a pipeline session."""
    """Upload a PDF or image file and create a pipeline session.

    For multi-page PDFs (> 1 page), each page becomes its own session
    grouped under a ``document_group_id``. The response includes a
    ``pages`` array with one entry per page/session.
    """
    file_data = await file.read()
    filename = file.filename or "upload"
    content_type = file.content_type or ""

    session_id = str(uuid.uuid4())
    is_pdf = content_type == "application/pdf" or filename.lower().endswith(".pdf")
    session_name = name or filename

    # --- Multi-page PDF handling ---
    if is_pdf:
        try:
            import fitz  # PyMuPDF
            pdf_doc = fitz.open(stream=file_data, filetype="pdf")
            page_count = pdf_doc.page_count
            pdf_doc.close()
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Could not read PDF: {e}")

        if page_count > 1:
            return await _create_multi_page_sessions(
                file_data, filename, session_name, page_count,
            )

    # --- Single page (image or 1-page PDF) ---
    session_id = str(uuid.uuid4())

    try:
        if is_pdf:
@@ -93,7 +116,6 @@ async def create_session(
        raise HTTPException(status_code=500, detail="Failed to encode image")

    original_png = png_buf.tobytes()
    session_name = name or filename

    # Persist to DB
    await create_session_db(
@@ -134,6 +156,86 @@ async def create_session(
    }


async def _create_multi_page_sessions(
    pdf_data: bytes,
    filename: str,
    base_name: str,
    page_count: int,
) -> dict:
    """Create one session per PDF page, grouped by document_group_id."""
    document_group_id = str(uuid.uuid4())
    pages = []

    for page_idx in range(page_count):
        session_id = str(uuid.uuid4())
        page_name = f"{base_name} — Seite {page_idx + 1}"

        try:
            img_bgr = render_pdf_high_res(pdf_data, page_number=page_idx, zoom=3.0)
        except Exception as e:
            logger.warning(f"Failed to render PDF page {page_idx + 1}: {e}")
            continue

        ok, png_buf = cv2.imencode(".png", img_bgr)
        if not ok:
            continue
        page_png = png_buf.tobytes()

        await create_session_db(
            session_id=session_id,
            name=page_name,
            filename=filename,
            original_png=page_png,
            document_group_id=document_group_id,
            page_number=page_idx + 1,
        )

        _cache[session_id] = {
            "id": session_id,
            "filename": filename,
            "name": page_name,
            "original_bgr": img_bgr,
            "oriented_bgr": None,
            "cropped_bgr": None,
            "deskewed_bgr": None,
            "dewarped_bgr": None,
            "orientation_result": None,
            "crop_result": None,
            "deskew_result": None,
            "dewarp_result": None,
            "ground_truth": {},
            "current_step": 1,
        }

        h, w = img_bgr.shape[:2]
        pages.append({
            "session_id": session_id,
            "name": page_name,
            "page_number": page_idx + 1,
            "image_width": w,
            "image_height": h,
            "original_image_url": f"/api/v1/ocr-pipeline/sessions/{session_id}/image/original",
        })

        logger.info(
            f"OCR Pipeline: created page session {session_id} "
            f"(page {page_idx + 1}/{page_count}) from (unknown) ({w}x{h})"
        )

    # Include session_id pointing to first page for backwards compatibility
    # (frontends that expect a single session_id will navigate to page 1)
    first_session_id = pages[0]["session_id"] if pages else None

    return {
        "session_id": first_session_id,
        "document_group_id": document_group_id,
        "filename": filename,
        "name": base_name,
        "page_count": page_count,
        "pages": pages,
    }
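The grouping contract of the endpoint — one session per page, all under a shared `document_group_id`, with `session_id` aliasing page 1 — can be sketched without the rendering and DB machinery (this helper is illustrative, not part of the module):

```python
import uuid

def page_sessions(base_name, page_count):
    # One session per page; all pages share a document_group_id.
    group_id = str(uuid.uuid4())
    pages = [
        {"session_id": str(uuid.uuid4()),
         "name": f"{base_name} — Seite {i + 1}",
         "page_number": i + 1}
        for i in range(page_count)
    ]
    return {
        "document_group_id": group_id,
        # Backwards compatibility: session_id points at page 1
        "session_id": pages[0]["session_id"] if pages else None,
        "page_count": page_count,
        "pages": pages,
    }

demo = page_sessions("Klassenarbeit", 3)
```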


@router.get("/sessions/{session_id}")
async def get_session_info(session_id: str):
    """Get session info including deskew/dewarp/column results for step navigation."""

@@ -457,6 +457,164 @@ def _detect_spine_shadow(
    return spine_x


def _detect_gutter_continuity(
    gray: np.ndarray,
    search_region: np.ndarray,
    offset_x: int,
    w: int,
    side: str,
) -> Optional[int]:
    """Detect gutter shadow via vertical continuity analysis.

    Camera book scans produce a subtle brightness gradient at the gutter
    that is too faint for scanner-shadow detection (range < 40). However,
    the gutter shadow has a unique property: it runs **continuously from
    top to bottom** without interruption. Text and images always have
    vertical gaps between lines, paragraphs, or sections.

    Algorithm:
      1. Divide image into N horizontal strips (~60px each)
      2. For each column, compute what fraction of strips are darker than
         the page median (from the center 50% of the full image)
      3. A "gutter column" has ≥ 70% of strips darker than page_median − δ
      4. Smooth the dark-fraction profile and find the transition point
         from the edge inward where the fraction drops below 0.50
      5. Validate: gutter band must be 0.5%-10% of image width

    Args:
        gray: Full grayscale image.
        search_region: Edge slice of the grayscale image.
        offset_x: X offset of search_region relative to full image.
        w: Full image width.
        side: 'left' or 'right'.

    Returns:
        X coordinate (in full image) of the gutter inner edge, or None.
    """
    region_h, region_w = search_region.shape[:2]
    if region_w < 20 or region_h < 100:
        return None

    # --- 1. Divide into horizontal strips ---
    strip_target_h = 60  # ~60px per strip
    n_strips = max(10, region_h // strip_target_h)
    strip_h = region_h // n_strips

    strip_means = np.zeros((n_strips, region_w), dtype=np.float64)
    for s in range(n_strips):
        y0 = s * strip_h
        y1 = min((s + 1) * strip_h, region_h)
        strip_means[s] = np.mean(search_region[y0:y1, :], axis=0)

    # --- 2. Page median from center 50% of full image ---
    center_lo = w // 4
    center_hi = 3 * w // 4
    page_median = float(np.median(gray[:, center_lo:center_hi]))

    # Camera shadows are subtle — threshold just 5 levels below page median
    dark_thresh = page_median - 5.0

    # If page is very dark overall (e.g. photo, not a book page), bail out
    if page_median < 180:
        return None

    # --- 3. Per-column dark fraction ---
    dark_count = np.sum(strip_means < dark_thresh, axis=0).astype(np.float64)
    dark_frac = dark_count / n_strips  # shape: (region_w,)

    # --- 4. Smooth and find transition ---
    # Rolling mean (window = 1% of image width, min 5)
    smooth_w = max(5, w // 100)
    if smooth_w % 2 == 0:
        smooth_w += 1
    kernel = np.ones(smooth_w) / smooth_w
    frac_smooth = np.convolve(dark_frac, kernel, mode="same")

    # Trim convolution edges
    margin = smooth_w // 2
    if region_w <= 2 * margin + 10:
        return None

    # Find the peak of dark fraction (gutter center).
    # For right gutters the peak is near the edge; for left gutters
    # (V-shaped spine shadow) the peak may be well inside the region.
    transition_thresh = 0.50
    peak_frac = float(np.max(frac_smooth[margin:region_w - margin]))

    if peak_frac < 0.70:
        logger.debug(
            "%s gutter: peak dark fraction %.2f < 0.70", side.capitalize(), peak_frac,
        )
        return None

    peak_x = int(np.argmax(frac_smooth[margin:region_w - margin])) + margin
    gutter_inner = None  # local x in search_region

    if side == "right":
        # Scan from peak toward the page center (leftward)
        for x in range(peak_x, margin, -1):
            if frac_smooth[x] < transition_thresh:
                gutter_inner = x + 1
                break
    else:
        # Scan from peak toward the page center (rightward)
        for x in range(peak_x, region_w - margin):
            if frac_smooth[x] < transition_thresh:
                gutter_inner = x - 1
                break

    if gutter_inner is None:
        return None

    # --- 5. Validate gutter width ---
    if side == "right":
        gutter_width = region_w - gutter_inner
    else:
        gutter_width = gutter_inner

    min_gutter = max(3, int(w * 0.005))  # at least 0.5% of image
    max_gutter = int(w * 0.10)  # at most 10% of image

    if gutter_width < min_gutter:
        logger.debug(
            "%s gutter: too narrow (%dpx < %dpx)", side.capitalize(),
            gutter_width, min_gutter,
        )
        return None

    if gutter_width > max_gutter:
        logger.debug(
            "%s gutter: too wide (%dpx > %dpx)", side.capitalize(),
            gutter_width, max_gutter,
        )
        return None

    # Check that the gutter band is meaningfully darker than the page
    if side == "right":
        gutter_brightness = float(np.mean(strip_means[:, gutter_inner:]))
    else:
        gutter_brightness = float(np.mean(strip_means[:, :gutter_inner]))

    brightness_drop = page_median - gutter_brightness
    if brightness_drop < 3:
        logger.debug(
            "%s gutter: insufficient brightness drop (%.1f levels)",
            side.capitalize(), brightness_drop,
        )
        return None

    gutter_x = offset_x + gutter_inner

    logger.info(
        "%s gutter (continuity): x=%d, width=%dpx (%.1f%%), "
        "brightness=%.0f vs page=%.0f (drop=%.0f), frac@edge=%.2f",
        side.capitalize(), gutter_x, gutter_width,
        100.0 * gutter_width / w, gutter_brightness, page_median,
        brightness_drop, float(frac_smooth[gutter_inner]),
    )
    return gutter_x
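Steps 1-3 of the algorithm (strip means and per-column dark fraction) are worth seeing on a toy array. A minimal sketch with an illustrative helper name; only the stripe/blob geometry is made up:

```python
import numpy as np

def dark_fraction(region, page_median, strip_h=60):
    # Fraction of horizontal strips, per column, darker than
    # (page_median - 5). A gutter column stays dark in nearly
    # every strip; a single text line darkens only one strip.
    h, w = region.shape
    n = max(10, h // strip_h)
    sh = h // n
    means = np.stack([region[s * sh:min((s + 1) * sh, h)].mean(axis=0)
                      for s in range(n)])
    return (means < page_median - 5.0).sum(axis=0) / n

region = np.full((600, 8), 230.0)
region[:, 0] = 200.0        # continuous dark stripe: candidate gutter
region[100:110, 3] = 150.0  # short dark blob: a text line
frac = dark_fraction(region, page_median=230.0)
```

Column 0 scores 1.0 (dark in all 10 strips) and would pass the 0.70 peak check; the text blob in column 3 scores only 0.1.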


def _detect_left_edge_shadow(
    gray: np.ndarray,
    binary: np.ndarray,
@@ -465,15 +623,22 @@ def _detect_left_edge_shadow(
) -> int:
    """Detect left content edge, accounting for book-spine shadow.

    Looks at the left 25% for a scanner gray strip. Cuts at the
    darkest column (= spine center). Fallback: binary projection.
    Tries three methods in order:
      1. Scanner spine-shadow (dark gradient, range > 40)
      2. Camera gutter continuity (subtle shadow running top-to-bottom)
      3. Binary projection fallback (first ink column)
    """
    search_w = max(1, w // 4)
    spine_x = _detect_spine_shadow(gray, gray[:, :search_w], 0, w, "left")
    if spine_x is not None:
        return spine_x

    # Fallback: binary vertical projection
    # Fallback 1: vertical continuity (camera gutter shadow)
    gutter_x = _detect_gutter_continuity(gray, gray[:, :search_w], 0, w, "left")
    if gutter_x is not None:
        return gutter_x

    # Fallback 2: binary vertical projection
    return _detect_edge_projection(binary, axis=0, from_start=True, dim=w)
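The three-stage fallback chain reduces to a first-non-None scan. A sketch with illustrative names (the real detectors take image arguments, of course):

```python
def detect_edge(detectors, default):
    # Try each detector in order; the first one that returns a
    # coordinate wins, otherwise fall back to the projection default.
    for detect in detectors:
        x = detect()
        if x is not None:
            return x
    return default
```

Ordering matters: the strong scanner-shadow test runs before the subtle continuity test, so a clear spine always takes precedence.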

@@ -485,8 +650,10 @@ def _detect_right_edge_shadow(
) -> int:
    """Detect right content edge, accounting for book-spine shadow.

    Looks at the right 25% for a scanner gray strip. Cuts at the
    darkest column (= spine center). Fallback: binary projection.
    Tries three methods in order:
      1. Scanner spine-shadow (dark gradient, range > 40)
      2. Camera gutter continuity (subtle shadow running top-to-bottom)
      3. Binary projection fallback (last ink column)
    """
    search_w = max(1, w // 4)
    right_start = w - search_w
@@ -494,7 +661,12 @@ def _detect_right_edge_shadow(
    if spine_x is not None:
        return spine_x

    # Fallback: binary vertical projection
    # Fallback 1: vertical continuity (camera gutter shadow)
    gutter_x = _detect_gutter_continuity(gray, gray[:, right_start:], right_start, w, "right")
    if gutter_x is not None:
        return gutter_x

    # Fallback 2: binary vertical projection
    return _detect_edge_projection(binary, axis=0, from_start=False, dim=w)


klausur-service/backend/smart_spell.py  (new file, 594 lines)
@@ -0,0 +1,594 @@
"""
SmartSpellChecker — Language-aware OCR post-correction without LLMs.

Uses pyspellchecker (MIT) with dual EN+DE dictionaries for:
- Automatic language detection per word (dual-dictionary heuristic)
- OCR error correction (digit↔letter, umlauts, transpositions)
- Context-based disambiguation (a/I, l/I) via bigram lookup
- Mixed-language support for example sentences

License: Apache 2.0 (commercially usable)
"""

import logging
import re
from dataclasses import dataclass, field
from typing import Dict, List, Literal, Optional, Set, Tuple

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Init
# ---------------------------------------------------------------------------

try:
    from spellchecker import SpellChecker as _SpellChecker
    _en_spell = _SpellChecker(language='en', distance=1)
    _de_spell = _SpellChecker(language='de', distance=1)
    _AVAILABLE = True
except ImportError:
    _AVAILABLE = False
    logger.warning("pyspellchecker not installed — SmartSpellChecker disabled")

Lang = Literal["en", "de", "both", "unknown"]

# ---------------------------------------------------------------------------
# Bigram context for a/I disambiguation
# ---------------------------------------------------------------------------

# Words that commonly follow "I" (subject pronoun → verb/modal)
_I_FOLLOWERS: frozenset = frozenset({
    "am", "was", "have", "had", "do", "did", "will", "would", "can",
    "could", "should", "shall", "may", "might", "must",
    "think", "know", "see", "want", "need", "like", "love", "hate",
    "go", "went", "come", "came", "say", "said", "get", "got",
    "make", "made", "take", "took", "give", "gave", "tell", "told",
    "feel", "felt", "find", "found", "believe", "hope", "wish",
    "remember", "forget", "understand", "mean", "meant",
    "don't", "didn't", "can't", "won't", "couldn't", "wouldn't",
    "shouldn't", "haven't", "hadn't", "isn't", "wasn't",
    "really", "just", "also", "always", "never", "often", "sometimes",
})

# Words that commonly follow "a" (article → noun/adjective)
_A_FOLLOWERS: frozenset = frozenset({
    "lot", "few", "little", "bit", "good", "bad", "great", "new", "old",
    "long", "short", "big", "small", "large", "huge", "tiny",
    "nice", "beautiful", "wonderful", "terrible", "horrible",
    "man", "woman", "boy", "girl", "child", "dog", "cat", "bird",
    "book", "car", "house", "room", "school", "teacher", "student",
    "day", "week", "month", "year", "time", "place", "way",
    "friend", "family", "person", "problem", "question", "story",
    "very", "really", "quite", "rather", "pretty", "single",
})
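The module's `_disambiguate_a_I` is not shown in this diff hunk, but the intended use of the follower sets can be sketched hypothetically: a lone "l" or "|" token is read as "I" when the next word is a typical I-follower, and as "a" when it is a typical a-follower.

```python
# Hypothetical sketch; tiny stand-in sets instead of the full
# _I_FOLLOWERS / _A_FOLLOWERS tables defined above.
I_FOLLOWERS = {"am", "was", "have", "think", "really"}
A_FOLLOWERS = {"lot", "few", "good", "book", "very"}

def disambiguate(token, next_word):
    nw = next_word.lower().strip(".,;:!?")
    if nw in I_FOLLOWERS:
        return "I"
    if nw in A_FOLLOWERS:
        return "a"
    return token  # no context signal: leave the token untouched
```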

# Digit→letter substitutions (OCR confusion)
_DIGIT_SUBS: Dict[str, List[str]] = {
    '0': ['o', 'O'],
    '1': ['l', 'I'],
    '5': ['s', 'S'],
    '6': ['g', 'G'],
    '8': ['b', 'B'],
    '|': ['I', 'l'],
    '/': ['l'],  # italic 'l' misread as slash (e.g. "p/" → "pl")
}
_SUSPICIOUS_CHARS = frozenset(_DIGIT_SUBS.keys())

# Umlaut confusion: OCR drops dots (ü→u, ä→a, ö→o)
_UMLAUT_MAP = {
    'a': 'ä', 'o': 'ö', 'u': 'ü', 'i': 'ü',
    'A': 'Ä', 'O': 'Ö', 'U': 'Ü', 'I': 'Ü',
}

# Tokenizer — includes | and / so OCR artifacts like "p/" are treated as words
_TOKEN_RE = re.compile(r"([A-Za-zÄÖÜäöüß'|/]+)([^A-Za-zÄÖÜäöüß'|/]*)")


# ---------------------------------------------------------------------------
# Data types
# ---------------------------------------------------------------------------

@dataclass
class CorrectionResult:
    original: str
    corrected: str
    lang_detected: Lang
    changed: bool
    changes: List[str] = field(default_factory=list)


# ---------------------------------------------------------------------------
# Core class
# ---------------------------------------------------------------------------

class SmartSpellChecker:
    """Language-aware OCR spell checker using pyspellchecker (no LLM)."""

    def __init__(self):
        if not _AVAILABLE:
            raise RuntimeError("pyspellchecker not installed")
        self.en = _en_spell
        self.de = _de_spell

    # --- Language detection ---

    def detect_word_lang(self, word: str) -> Lang:
        """Detect language of a single word using dual-dict heuristic."""
        w = word.lower().strip(".,;:!?\"'()")
        if not w:
            return "unknown"
        in_en = bool(self.en.known([w]))
        in_de = bool(self.de.known([w]))
        if in_en and in_de:
            return "both"
        if in_en:
            return "en"
        if in_de:
            return "de"
        return "unknown"

    def detect_text_lang(self, text: str) -> Lang:
        """Detect dominant language of a text string (sentence/phrase)."""
        words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
        if not words:
            return "unknown"

        en_count = 0
        de_count = 0
        for w in words:
            lang = self.detect_word_lang(w)
            if lang == "en":
                en_count += 1
            elif lang == "de":
                de_count += 1
            # "both" doesn't count for either

        if en_count > de_count:
            return "en"
        if de_count > en_count:
            return "de"
        if en_count == de_count and en_count > 0:
            return "both"
        return "unknown"
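The dual-dictionary heuristic can be demonstrated without pyspellchecker by swapping in tiny word sets (a sketch; the real class queries `SpellChecker.known`):

```python
# Stand-in dictionaries; "hand" is deliberately in both.
EN = {"the", "house", "and", "hand"}
DE = {"das", "haus", "und", "hand"}

def word_lang(w):
    w = w.lower().strip(".,;:!?\"'()")
    in_en, in_de = w in EN, w in DE
    if in_en and in_de:
        return "both"
    if in_en:
        return "en"
    if in_de:
        return "de"
    return "unknown"

def text_lang(text):
    # Majority vote over per-word results; "both"/"unknown" abstain.
    langs = [word_lang(w) for w in text.split()]
    en, de = langs.count("en"), langs.count("de")
    if en > de:
        return "en"
    if de > en:
        return "de"
    return "both" if en else "unknown"
```

Words valid in both languages (like "Hand") abstain from the vote, which is what keeps mixed EN/DE vocabulary rows from flipping the detected language.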

    # --- Single-word correction ---

    def _known(self, word: str) -> bool:
        """True if word is known in EN or DE dictionary, or is a known abbreviation."""
        w = word.lower()
        if bool(self.en.known([w])) or bool(self.de.known([w])):
            return True
        # Also accept known abbreviations (sth, sb, adj, etc.)
        try:
            from cv_ocr_engines import _KNOWN_ABBREVIATIONS
            if w in _KNOWN_ABBREVIATIONS:
                return True
        except ImportError:
            pass
        return False

    def _word_freq(self, word: str) -> float:
        """Get word frequency (max of EN and DE)."""
        w = word.lower()
        return max(self.en.word_usage_frequency(w), self.de.word_usage_frequency(w))

    def _known_in(self, word: str, lang: str) -> bool:
        """True if word is known in a specific language dictionary."""
        w = word.lower()
        spell = self.en if lang == "en" else self.de
        return bool(spell.known([w]))

    def correct_word(self, word: str, lang: str = "en",
                     prev_word: str = "", next_word: str = "") -> Optional[str]:
        """Correct a single word for the given language.

        Returns None if no correction needed, or the corrected string.

        Args:
            word: The word to check/correct
            lang: Expected language ("en" or "de")
            prev_word: Previous word (for context)
            next_word: Next word (for context)
        """
        if not word or not word.strip():
            return None

        # Skip numbers, abbreviations with dots, very short tokens
        if word.isdigit() or '.' in word:
            return None

        # Skip IPA/phonetic content in brackets
        if '[' in word or ']' in word:
            return None

        has_suspicious = any(ch in _SUSPICIOUS_CHARS for ch in word)

        # 1. Already known → no fix
        if self._known(word):
            # But check a/I disambiguation for single-char words
            if word.lower() in ('l', '|') and next_word:
                return self._disambiguate_a_I(word, next_word)
            return None

        # 2. Digit/pipe substitution
        if has_suspicious:
            if word == '|':
                return 'I'
            # Try single-char substitutions
            for i, ch in enumerate(word):
                if ch not in _DIGIT_SUBS:
                    continue
                for replacement in _DIGIT_SUBS[ch]:
                    candidate = word[:i] + replacement + word[i + 1:]
                    if self._known(candidate):
                        return candidate
            # Try multi-char substitution (e.g., "sch00l" → "school")
            multi = self._try_multi_digit_sub(word)
            if multi:
                return multi

        # 3. Umlaut correction (German)
        if lang == "de" and len(word) >= 3 and word.isalpha():
            umlaut_fix = self._try_umlaut_fix(word)
            if umlaut_fix:
                return umlaut_fix

        # 4. General spell correction
        if not has_suspicious and len(word) >= 3 and word.isalpha():
            # Safety: don't correct if the word is valid in the OTHER language
            # (either directly or via umlaut fix)
            other_lang = "de" if lang == "en" else "en"
            if self._known_in(word, other_lang):
                return None
            if other_lang == "de" and self._try_umlaut_fix(word):
                return None  # has a valid DE umlaut variant → don't touch

            spell = self.en if lang == "en" else self.de
            correction = spell.correction(word.lower())
            if correction and correction != word.lower():
                if word[0].isupper():
                    correction = correction[0].upper() + correction[1:]
                if self._known(correction):
                    return correction

        return None
|
||||
|
||||
# --- Multi-digit substitution ---
|
||||
|
||||
def _try_multi_digit_sub(self, word: str) -> Optional[str]:
|
||||
"""Try replacing multiple digits simultaneously."""
|
||||
positions = [(i, ch) for i, ch in enumerate(word) if ch in _DIGIT_SUBS]
|
||||
if len(positions) < 1 or len(positions) > 4:
|
||||
return None
|
||||
|
||||
# Try all combinations (max 2^4 = 16 for 4 positions)
|
||||
chars = list(word)
|
||||
best = None
|
||||
self._multi_sub_recurse(chars, positions, 0, best_result=[None])
|
||||
return self._multi_sub_recurse_result
|
||||
|
||||
_multi_sub_recurse_result: Optional[str] = None
|
||||
|
||||
    def _try_multi_digit_sub(self, word: str) -> Optional[str]:
        """Try replacing multiple digits simultaneously using BFS."""
        positions = [(i, ch) for i, ch in enumerate(word) if ch in _DIGIT_SUBS]
        if not positions or len(positions) > 4:
            return None

        # BFS over substitution combinations
        queue = [list(word)]
        for pos, ch in positions:
            next_queue = []
            for current in queue:
                # Keep original
                next_queue.append(current[:])
                # Try each substitution
                for repl in _DIGIT_SUBS[ch]:
                    variant = current[:]
                    variant[pos] = repl
                    next_queue.append(variant)
            queue = next_queue

        # Check which combinations produce known words
        for combo in queue:
            candidate = "".join(combo)
            if candidate != word and self._known(candidate):
                return candidate

        return None
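For reference, the candidate set the BFS above builds level by level can also be enumerated directly with `itertools.product` over per-position options. A standalone sketch with a toy substitution map and word list (hypothetical stand-ins for `_DIGIT_SUBS` and the `self._known` spellchecker lookup):

```python
from itertools import product
from typing import Optional

# Toy stand-ins for _DIGIT_SUBS and the spellchecker lookup
DIGIT_SUBS = {"0": ["o"], "1": ["l", "i"], "5": ["s"]}
KNOWN = {"school", "still"}

def multi_digit_sub(word: str) -> Optional[str]:
    positions = [i for i, ch in enumerate(word) if ch in DIGIT_SUBS]
    if not positions or len(positions) > 4:
        return None
    # Each digit position may keep its digit or take any replacement,
    # so the candidate space is the cross product of per-position options
    options = [[word[i]] + DIGIT_SUBS[word[i]] for i in positions]
    for combo in product(*options):
        chars = list(word)
        for pos, repl in zip(positions, combo):
            chars[pos] = repl
        candidate = "".join(chars)
        if candidate != word and candidate in KNOWN:
            return candidate
    return None

print(multi_digit_sub("sch00l"))  # school
print(multi_digit_sub("5t1ll"))   # still
```

Either way the candidate count is bounded by the same cap: at most four digit positions, so at most a few dozen combinations per word.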
    # --- Umlaut fix ---

    def _try_umlaut_fix(self, word: str) -> Optional[str]:
        """Try single-char umlaut substitutions for German words."""
        for i, ch in enumerate(word):
            if ch in _UMLAUT_MAP:
                candidate = word[:i] + _UMLAUT_MAP[ch] + word[i + 1:]
                if self._known(candidate):
                    return candidate
        return None
    # --- Boundary repair (shifted word boundaries) ---

    def _try_boundary_repair(self, word1: str, word2: str) -> Optional[Tuple[str, str]]:
        """Fix shifted word boundaries between adjacent tokens.

        OCR sometimes shifts the boundary: "at sth." → "ats th."
        Try moving 1-2 chars from the end of word1 to the start of word2
        and vice versa. Returns (fixed_word1, fixed_word2) or None.
        """
        # Import known abbreviations for vocabulary context
        try:
            from cv_ocr_engines import _KNOWN_ABBREVIATIONS
        except ImportError:
            _KNOWN_ABBREVIATIONS = set()

        # Strip trailing punctuation for checking, preserve it for the result
        w2_stripped = word2.rstrip(".,;:!?")
        w2_punct = word2[len(w2_stripped):]

        # Try shifting 1-2 chars from word1 → word2
        for shift in (1, 2):
            if len(word1) <= shift:
                continue
            new_w1 = word1[:-shift]
            new_w2_base = word1[-shift:] + w2_stripped

            w1_ok = self._known(new_w1) or new_w1.lower() in _KNOWN_ABBREVIATIONS
            w2_ok = self._known(new_w2_base) or new_w2_base.lower() in _KNOWN_ABBREVIATIONS

            if w1_ok and w2_ok:
                return (new_w1, new_w2_base + w2_punct)

        # Try shifting 1-2 chars from word2 → word1
        for shift in (1, 2):
            if len(w2_stripped) <= shift:
                continue
            new_w1 = word1 + w2_stripped[:shift]
            new_w2_base = w2_stripped[shift:]

            w1_ok = self._known(new_w1) or new_w1.lower() in _KNOWN_ABBREVIATIONS
            w2_ok = self._known(new_w2_base) or new_w2_base.lower() in _KNOWN_ABBREVIATIONS

            if w1_ok and w2_ok:
                return (new_w1, new_w2_base + w2_punct)

        return None
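The shift logic can be exercised in isolation. A minimal sketch with a toy dictionary and abbreviation set (hypothetical stand-ins for `self._known` and `_KNOWN_ABBREVIATIONS`), reproducing the "ats th." → "at sth." repair from the docstring:

```python
from typing import Optional, Tuple

# Toy stand-ins for the spellchecker and the abbreviation set
KNOWN = {"at", "can", "i"}
ABBREVS = {"sth.", "sb."}

def boundary_repair(w1: str, w2: str) -> Optional[Tuple[str, str]]:
    def ok(w: str) -> bool:
        return w.lower() in KNOWN or w.lower() in ABBREVS
    # Move 1-2 trailing chars of w1 to the front of w2, then the reverse
    for shift in (1, 2):
        if len(w1) > shift and ok(w1[:-shift]) and ok(w1[-shift:] + w2):
            return (w1[:-shift], w1[-shift:] + w2)
        if len(w2) > shift and ok(w1 + w2[:shift]) and ok(w2[shift:]):
            return (w1 + w2[:shift], w2[shift:])
    return None

print(boundary_repair("ats", "th."))  # ('at', 'sth.')
```

Note that a plain dictionary check accepts any valid repair; the full method additionally relies on the frequency scoring in `correct_text` to reject repairs of already-correct pairs.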
    # --- Context-based word split for ambiguous merges ---

    # Patterns where a valid word is actually "a" + adjective/noun
    _ARTICLE_SPLIT_CANDIDATES = {
        # word → (article, remainder) — only when followed by a compatible word
        "anew": ("a", "new"),
        "areal": ("a", "real"),
        "alive": None,  # genuinely one word, never split
        "alone": None,
        "aware": None,
        "alike": None,
        "apart": None,
        "aside": None,
        "above": None,
        "about": None,
        "among": None,
        "along": None,
    }

    def _try_context_split(self, word: str, next_word: str,
                           prev_word: str) -> Optional[str]:
        """Split words like 'anew' → 'a new' when context indicates a merge.

        Only splits when:
        - the word is in the split-candidates list and the following word
          makes sense as a noun (for the "a + adj + noun" pattern), or
        - the word is unknown and can be split into article + known word.
        """
        w_lower = word.lower()

        # Check explicit candidates
        if w_lower in self._ARTICLE_SPLIT_CANDIDATES:
            split = self._ARTICLE_SPLIT_CANDIDATES[w_lower]
            if split is None:
                return None  # explicitly marked as "don't split"
            article, remainder = split
            # Only split if followed by a lowercase word (noun pattern)
            if next_word and next_word[0].islower():
                return f"{article} {remainder}"
            # Also split if the next word is itself a known word
            if next_word and self._known(next_word):
                return f"{article} {remainder}"

        # Generic: if the word starts with 'a' and the rest is a known word
        if (len(word) >= 4 and word[0].lower() == 'a'
                and not self._known(word)  # only for UNKNOWN words
                and self._known(word[1:])):
            return f"a {word[1:]}"

        return None
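The generic article-split branch can likewise be sketched standalone, with a toy dictionary standing in for `self._known`:

```python
from typing import Optional

# Toy dictionary, a hypothetical stand-in for self._known
KNOWN = {"new", "real", "nice", "alive"}

def article_split(word: str) -> Optional[str]:
    """Split unknown words like 'anice' into 'a nice'; known words stay."""
    if (len(word) >= 4 and word[0].lower() == "a"
            and word.lower() not in KNOWN      # only for UNKNOWN words
            and word[1:].lower() in KNOWN):    # rest must be a known word
        return f"a {word[1:]}"
    return None

print(article_split("anice"))  # a nice
print(article_split("alive"))  # None (a genuine word, never split)
```

The "unknown word" guard is what keeps the generic branch safe: words like "alive" pass the dictionary check and are never considered for splitting, so the explicit `None` entries above only matter for the context-dependent cases.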
    # --- a/I disambiguation ---

    def _disambiguate_a_I(self, token: str, next_word: str) -> Optional[str]:
        """Disambiguate 'a' vs 'I' (and OCR variants like 'l', '|')."""
        nw = next_word.lower().strip(".,;:!?")
        if nw in _I_FOLLOWERS:
            return "I"
        if nw in _A_FOLLOWERS:
            return "a"
        # A further heuristic would check whether the next word is more
        # commonly a verb (→ "I") or a noun/adjective (→ "a"), but without
        # a POS lexicon that cannot be decided reliably.
        return None  # uncertain, don't change
    # --- Full text correction ---

    def correct_text(self, text: str, lang: str = "en") -> CorrectionResult:
        """Correct a full text string (field value).

        Three passes:
        1. Boundary repair — fix shifted word boundaries between adjacent tokens
        2. Context split — split ambiguous merges (anew → a new)
        3. Per-word correction — spell-check individual words

        Args:
            text: The text to correct
            lang: Expected language ("en", "de", or "auto")
        """
        if not text or not text.strip():
            return CorrectionResult(text, text, "unknown", False)

        detected = self.detect_text_lang(text) if lang == "auto" else lang
        effective_lang = detected if detected in ("en", "de") else "en"

        changes: List[str] = []
        tokens = list(_TOKEN_RE.finditer(text))

        # Extract token list: [[word, separator], ...]
        token_list: List[List[str]] = []
        for m in tokens:
            token_list.append([m.group(1), m.group(2)])

        # --- Pass 1: Boundary repair between adjacent words ---
        # Import abbreviations for the heuristic below
        try:
            from cv_ocr_engines import _KNOWN_ABBREVIATIONS as _ABBREVS
        except ImportError:
            _ABBREVS = set()

        for i in range(len(token_list) - 1):
            w1 = token_list[i][0]
            w2_raw = token_list[i + 1][0]

            # Skip boundary repair for IPA/bracket content.
            # Brackets may be in the token OR in the adjacent separators.
            sep_before_w1 = token_list[i - 1][1] if i > 0 else ""
            sep_after_w1 = token_list[i][1]
            sep_after_w2 = token_list[i + 1][1]
            has_bracket = (
                '[' in w1 or ']' in w1 or '[' in w2_raw or ']' in w2_raw
                or ']' in sep_after_w1   # w1 text was inside [brackets]
                or '[' in sep_after_w1   # w2 starts a bracket
                or ']' in sep_after_w2   # w2 text was inside [brackets]
                or '[' in sep_before_w1  # w1 starts a bracket
            )
            if has_bracket:
                continue

            # Include trailing punct from the separator in w2 so that
            # abbreviations like "sth." can match
            w2_with_punct = w2_raw + token_list[i + 1][1].rstrip(" ")

            # Try boundary repair — always, even if both words are valid.
            # Word-frequency scoring decides whether the repair is better.
            repair = self._try_boundary_repair(w1, w2_with_punct)
            if not repair and w2_with_punct != w2_raw:
                repair = self._try_boundary_repair(w1, w2_raw)
            if repair:
                new_w1, new_w2_full = repair
                new_w2_base = new_w2_full.rstrip(".,;:!?")

                # Frequency-based scoring: product of word frequencies.
                # Higher product = more common word pair = better.
                old_freq = self._word_freq(w1) * self._word_freq(w2_raw)
                new_freq = self._word_freq(new_w1) * self._word_freq(new_w2_base)

                # Abbreviation bonus: the repair produced a known abbreviation
                has_abbrev = new_w1.lower() in _ABBREVS or new_w2_base.lower() in _ABBREVS
                if has_abbrev:
                    # Accept the abbreviation repair ONLY if at least one of
                    # the original words is rare/unknown. This prevents e.g.
                    # "Can I" → "Ca nI", where both originals are common and
                    # correct. "Rare" = frequency < 1e-6 (covers "ats", "th"
                    # but not "Can", "I").
                    RARE_THRESHOLD = 1e-6
                    orig_both_common = (
                        self._word_freq(w1) > RARE_THRESHOLD
                        and self._word_freq(w2_raw) > RARE_THRESHOLD
                    )
                    if not orig_both_common:
                        new_freq = max(new_freq, old_freq * 10)
                    else:
                        has_abbrev = False  # both originals common → don't trust

                # Accept if the repair produces a clearly more frequent word
                # pair (at least 5x, to avoid false positives)
                if new_freq > old_freq * 5:
                    new_w2_punct = new_w2_full[len(new_w2_base):]
                    changes.append(f"{w1} {w2_raw}→{new_w1} {new_w2_base}")
                    token_list[i][0] = new_w1
                    token_list[i + 1][0] = new_w2_base
                    if new_w2_punct:
                        token_list[i + 1][1] = new_w2_punct + token_list[i + 1][1].lstrip(".,;:!?")

        # --- Pass 2: Context split (anew → a new) ---
        expanded: List[List[str]] = []
        for i, (word, sep) in enumerate(token_list):
            next_word = token_list[i + 1][0] if i + 1 < len(token_list) else ""
            prev_word = token_list[i - 1][0] if i > 0 else ""
            split = self._try_context_split(word, next_word, prev_word)
            if split and split != word:
                changes.append(f"{word}→{split}")
                expanded.append([split, sep])
            else:
                expanded.append([word, sep])
        token_list = expanded

        # --- Pass 3: Per-word correction ---
        parts: List[str] = []

        # Preserve any leading text before the first token match
        # (e.g., "(= " before "I won and he lost.")
        first_start = tokens[0].start() if tokens else 0
        if first_start > 0:
            parts.append(text[:first_start])

        for i, (word, sep) in enumerate(token_list):
            # Skip words inside IPA brackets (brackets land in separators)
            prev_sep = token_list[i - 1][1] if i > 0 else ""
            if '[' in prev_sep or ']' in sep:
                parts.append(word)
                parts.append(sep)
                continue

            next_word = token_list[i + 1][0] if i + 1 < len(token_list) else ""
            prev_word = token_list[i - 1][0] if i > 0 else ""

            correction = self.correct_word(
                word, lang=effective_lang,
                prev_word=prev_word, next_word=next_word,
            )
            if correction and correction != word:
                changes.append(f"{word}→{correction}")
                parts.append(correction)
            else:
                parts.append(word)
            parts.append(sep)

        # Append any trailing text
        last_end = tokens[-1].end() if tokens else 0
        if last_end < len(text):
            parts.append(text[last_end:])

        corrected = "".join(parts)
        return CorrectionResult(
            original=text,
            corrected=corrected,
            lang_detected=detected,
            changed=corrected != text,
            changes=changes,
        )
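`correct_text` depends on an invariant of the tokenizer: the leading slice, each (word, separator) pair, and the trailing slice must reassemble to the original string, so an all-no-op pass returns the input verbatim. A sketch of that invariant with a simple word/separator regex (a hypothetical stand-in for `_TOKEN_RE`):

```python
import re

# Hypothetical stand-in for _TOKEN_RE: a word run, then any non-word run
TOKEN_RE = re.compile(r"([\w']+)([^\w']*)")

def roundtrip(text: str) -> str:
    """Reassemble text from (word, separator) tokens plus edge slices."""
    tokens = list(TOKEN_RE.finditer(text))
    parts = []
    first_start = tokens[0].start() if tokens else 0
    parts.append(text[:first_start])   # leading text, e.g. "(= "
    for m in tokens:
        parts.append(m.group(1))       # the word (correction slot)
        parts.append(m.group(2))       # the separator, preserved verbatim
    last_end = tokens[-1].end() if tokens else 0
    parts.append(text[last_end:])      # trailing text
    return "".join(parts)

print(roundtrip("(= I won, he lost.)") == "(= I won, he lost.)")  # True
```

Because only the word slots are ever replaced, all punctuation, brackets, and spacing survive correction untouched.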
    # --- Vocabulary entry correction ---

    def correct_vocab_entry(self, english: str, german: str,
                            example: str = "") -> Dict[str, CorrectionResult]:
        """Correct a full vocabulary entry (EN + DE + example).

        Uses column position to determine language — the most reliable signal.
        """
        results = {}
        results["english"] = self.correct_text(english, lang="en")
        results["german"] = self.correct_text(german, lang="de")
        if example:
            # For examples, auto-detect the language
            results["example"] = self.correct_text(example, lang="auto")
        return results
124  klausur-service/backend/tests/test_box_layout.py  (new file)
@@ -0,0 +1,124 @@
"""Tests for cv_box_layout.py — box layout classification and grid building."""

import os
import sys

import pytest

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from cv_box_layout import classify_box_layout, build_box_zone_grid, _group_into_lines


def _make_words(lines_data):
    """Create word dicts from [(y, x, text), ...] tuples."""
    words = []
    for y, x, text in lines_data:
        words.append({"top": y, "left": x, "width": len(text) * 10, "height": 25, "text": text})
    return words


class TestClassifyBoxLayout:

    def test_header_only(self):
        words = _make_words([(100, 50, "Unit 3")])
        assert classify_box_layout(words, 500, 50) == "header_only"

    def test_empty(self):
        assert classify_box_layout([], 500, 200) == "header_only"

    def test_flowing(self):
        """Multiple lines without bullet patterns → flowing."""
        words = _make_words([
            (100, 50, "German leihen title"),
            (130, 50, "etwas ausleihen von jm"),
            (160, 70, "borrow sth from sb"),
            (190, 70, "Can I borrow your CD"),
            (220, 50, "etwas verleihen an jn"),
            (250, 70, "OK I can lend you my"),
        ])
        assert classify_box_layout(words, 500, 200) == "flowing"

    def test_bullet_list(self):
        """Lines starting with bullet markers → bullet_list."""
        words = _make_words([
            (100, 50, "Title of the box"),
            (130, 50, "• First item text here"),
            (160, 50, "• Second item text here"),
            (190, 50, "• Third item text here"),
            (220, 50, "• Fourth item text here"),
            (250, 50, "• Fifth item text here"),
        ])
        assert classify_box_layout(words, 500, 150) == "bullet_list"


class TestGroupIntoLines:

    def test_single_line(self):
        words = _make_words([(100, 50, "hello"), (100, 120, "world")])
        lines = _group_into_lines(words)
        assert len(lines) == 1
        assert len(lines[0]) == 2

    def test_two_lines(self):
        words = _make_words([(100, 50, "line1"), (150, 50, "line2")])
        lines = _group_into_lines(words)
        assert len(lines) == 2

    def test_y_proximity(self):
        """Words within the y-tolerance land on the same line."""
        words = _make_words([(100, 50, "a"), (103, 120, "b")])  # 3px apart
        lines = _group_into_lines(words)
        assert len(lines) == 1


class TestBuildBoxZoneGrid:

    def test_flowing_groups_by_indent(self):
        """Flowing layout groups continuation lines by indentation."""
        words = _make_words([
            (100, 50, "Header Title"),
            (130, 50, "Bullet start text"),
            (160, 80, "continuation line 1"),
            (190, 80, "continuation line 2"),
        ])
        result = build_box_zone_grid(words, 40, 90, 500, 120, 0, 1600, 2200, layout_type="flowing")
        # Header + 1 grouped bullet = 2 rows
        assert len(result["rows"]) == 2
        assert len(result["cells"]) == 2
        # Second cell should contain "\n" (multi-line)
        bullet_cell = result["cells"][1]
        assert "\n" in bullet_cell["text"]

    def test_header_only_single_cell(self):
        words = _make_words([(100, 50, "Just a title")])
        result = build_box_zone_grid(words, 40, 90, 500, 50, 0, 1600, 2200, layout_type="header_only")
        assert len(result["cells"]) == 1
        assert result["box_layout_type"] == "header_only"

    def test_columnar_delegates_to_zone_grid(self):
        """Columnar layout uses the standard grid builder."""
        words = _make_words([
            (100, 50, "Col A header"),
            (100, 300, "Col B header"),
            (130, 50, "A data"),
            (130, 300, "B data"),
        ])
        result = build_box_zone_grid(words, 40, 90, 500, 80, 0, 1600, 2200, layout_type="columnar")
        assert result["box_layout_type"] == "columnar"
        # Should have detected columns
        assert len(result.get("columns", [])) >= 1

    def test_row_fields_for_gridtable(self):
        """Rows must carry y_min_px, y_max_px, is_header for GridTable."""
        words = _make_words([(100, 50, "Title"), (130, 50, "Body")])
        result = build_box_zone_grid(words, 40, 90, 500, 80, 0, 1600, 2200, layout_type="flowing")
        for row in result["rows"]:
            assert "y_min_px" in row
            assert "y_max_px" in row
            assert "is_header" in row

    def test_column_fields_for_gridtable(self):
        """Columns must carry x_min_px, x_max_px for GridTable width calculation."""
        words = _make_words([(100, 50, "Text")])
        result = build_box_zone_grid(words, 40, 90, 500, 50, 0, 1600, 2200, layout_type="flowing")
        for col in result["columns"]:
            assert "x_min_px" in col
            assert "x_max_px" in col
339  klausur-service/backend/tests/test_gutter_repair.py  (new file)
@@ -0,0 +1,339 @@
"""Tests for cv_gutter_repair: gutter-edge word detection and repair."""

import os
import sys

import pytest

# Add the parent directory to the path so we can import the module
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))

from cv_gutter_repair import (
    _is_known,
    _try_hyphen_join,
    _try_spell_fix,
    _edit_distance,
    _word_is_at_gutter_edge,
    _MIN_WORD_LEN_SPELL,
    _MIN_WORD_LEN_HYPHEN,
    analyse_grid_for_gutter_repair,
    apply_gutter_suggestions,
)


# ---------------------------------------------------------------------------
# Helper function tests
# ---------------------------------------------------------------------------

class TestEditDistance:
    def test_identical(self):
        assert _edit_distance("hello", "hello") == 0

    def test_one_substitution(self):
        assert _edit_distance("stammeli", "stammeln") == 1

    def test_one_deletion(self):
        assert _edit_distance("cat", "ca") == 1

    def test_one_insertion(self):
        assert _edit_distance("ca", "cat") == 1

    def test_empty(self):
        assert _edit_distance("", "abc") == 3
        assert _edit_distance("abc", "") == 3

    def test_both_empty(self):
        assert _edit_distance("", "") == 0


class TestWordIsAtGutterEdge:
    def test_word_at_right_edge(self):
        # Word right edge at 95% of the column = within the gutter zone
        word_bbox = {"left": 80, "width": 15}  # right edge = 95
        assert _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)

    def test_word_in_middle(self):
        # Word right edge at 50% of the column = NOT at the gutter
        word_bbox = {"left": 30, "width": 20}  # right edge = 50
        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)

    def test_word_at_left(self):
        word_bbox = {"left": 5, "width": 20}  # right edge = 25
        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=100)

    def test_zero_width_column(self):
        word_bbox = {"left": 0, "width": 10}
        assert not _word_is_at_gutter_edge(word_bbox, col_x=0, col_width=0)


# ---------------------------------------------------------------------------
# Spellchecker-dependent tests (skip if not installed)
# ---------------------------------------------------------------------------

try:
    from spellchecker import SpellChecker
    _HAS_SPELLCHECKER = True
except ImportError:
    _HAS_SPELLCHECKER = False

needs_spellchecker = pytest.mark.skipif(
    not _HAS_SPELLCHECKER, reason="pyspellchecker not installed"
)


@needs_spellchecker
class TestIsKnown:
    def test_known_english(self):
        assert _is_known("hello") is True
        assert _is_known("world") is True

    def test_known_german(self):
        assert _is_known("verkünden") is True
        assert _is_known("stammeln") is True

    def test_unknown_garbled(self):
        assert _is_known("stammeli") is False
        assert _is_known("xyzqwp") is False

    def test_short_word(self):
        # Words shorter than 3 chars are not checked
        assert _is_known("a") is False


@needs_spellchecker
class TestTryHyphenJoin:
    def test_direct_join(self):
        # "ver-" + "künden" = "verkünden"
        result = _try_hyphen_join("ver-", "künden")
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
        assert missing == ""
        assert conf >= 0.9

    def test_join_with_missing_chars(self):
        # "ve" + "künden" → needs "r" in between → "verkünden"
        result = _try_hyphen_join("ve", "künden", max_missing=2)
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
        assert "r" in missing

    def test_no_valid_join(self):
        result = _try_hyphen_join("xyz", "qwpgh")
        assert result is None

    def test_empty_inputs(self):
        assert _try_hyphen_join("", "word") is None
        assert _try_hyphen_join("word", "") is None

    def test_join_strips_trailing_punctuation(self):
        # "ver-" + "künden," → should still find "verkünden" despite the comma
        result = _try_hyphen_join("ver-", "künden,")
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"

    def test_join_with_missing_chars_and_punctuation(self):
        # "ve" + "künden," → needs "r" in between; the comma must be stripped
        result = _try_hyphen_join("ve", "künden,", max_missing=2)
        assert result is not None
        joined, missing, conf = result
        assert joined == "verkünden"
        assert "r" in missing


@needs_spellchecker
class TestTrySpellFix:
    def test_fix_garbled_ending_returns_alternatives(self):
        # "stammeli" should return a correction with alternatives
        result = _try_spell_fix("stammeli", col_type="column_de")
        assert result is not None
        corrected, conf, alts = result
        # The best correction is one of the valid forms
        all_options = [corrected] + alts
        all_lower = [w.lower() for w in all_options]
        # "stammeln" must be among the candidates
        assert "stammeln" in all_lower, f"Expected 'stammeln' in {all_options}"

    def test_known_word_not_fixed(self):
        # "Haus" is correct — no fix needed
        result = _try_spell_fix("Haus", col_type="column_de")
        # Should be None since the word is correct
        if result is not None:
            corrected, _, _ = result
            assert corrected.lower() == "haus"

    def test_short_word_skipped(self):
        result = _try_spell_fix("ab")
        assert result is None

    def test_min_word_len_thresholds(self):
        assert _MIN_WORD_LEN_HYPHEN == 2
        assert _MIN_WORD_LEN_SPELL == 3


# ---------------------------------------------------------------------------
# Grid analysis tests
# ---------------------------------------------------------------------------

def _make_grid(cells, columns=None):
    """Helper to create a minimal grid_data structure."""
    if columns is None:
        columns = [
            {"index": 0, "type": "column_en", "x_min_px": 0, "x_max_px": 200},
            {"index": 1, "type": "column_de", "x_min_px": 200, "x_max_px": 400},
            {"index": 2, "type": "column_text", "x_min_px": 400, "x_max_px": 600},
        ]
    return {
        "image_width": 600,
        "image_height": 800,
        "zones": [{
            "columns": columns,
            "cells": cells,
        }],
    }


def _make_cell(row, col, text, left=0, width=50, col_width=200, col_x=0):
    """Helper to create a cell dict with word_boxes at a specific position."""
    return {
        "cell_id": f"R{row:02d}_C{col}",
        "row_index": row,
        "col_index": col,
        "col_type": "column_text",
        "text": text,
        "confidence": 90.0,
        "bbox_px": {"x": left, "y": row * 25, "w": width, "h": 20},
        "word_boxes": [
            {"text": text, "left": left, "top": row * 25, "width": width, "height": 20, "conf": 90},
        ],
    }


@needs_spellchecker
class TestAnalyseGrid:
    def test_empty_grid(self):
        result = analyse_grid_for_gutter_repair({"zones": []})
        assert result["suggestions"] == []
        assert result["stats"]["words_checked"] == 0

    def test_detects_spell_fix_at_edge(self):
        # "stammeli" with right edge at 595 in a column spanning 400-600
        # = 97.5% of the column width = at the gutter
        cells = [
            _make_cell(29, 2, "stammeli", left=540, width=55, col_width=200, col_x=400),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        suggestions = result["suggestions"]
        assert len(suggestions) >= 1
        assert suggestions[0]["type"] == "spell_fix"
        assert suggestions[0]["suggested_text"] == "stammeln"

    def test_detects_hyphen_join(self):
        # Row 30: "ve" at the gutter edge; row 31: "künden"
        cells = [
            _make_cell(30, 2, "ve", left=570, width=25, col_width=200, col_x=400),
            _make_cell(31, 2, "künden", left=410, width=80, col_width=200, col_x=400),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        suggestions = result["suggestions"]
        # Should find a hyphen_join or a spell_fix
        assert len(suggestions) >= 1

    def test_ignores_known_words(self):
        # "hello" is a known word — should not be suggested
        cells = [
            _make_cell(0, 0, "hello", left=160, width=35),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        # Should not suggest anything for known words
        spell_fixes = [s for s in result["suggestions"] if s["original_text"] == "hello"]
        assert len(spell_fixes) == 0

    def test_ignores_words_not_at_edge(self):
        # "stammeli" with left=10 = NOT at the gutter edge
        cells = [
            _make_cell(0, 0, "stammeli", left=10, width=50),
        ]
        grid = _make_grid(cells)
        result = analyse_grid_for_gutter_repair(grid)
        assert len(result["suggestions"]) == 0


# ---------------------------------------------------------------------------
# Apply suggestions tests
# ---------------------------------------------------------------------------

class TestApplySuggestions:
    def test_apply_spell_fix(self):
        cells = [
            {"cell_id": "R29_C2", "row_index": 29, "col_index": 2,
             "text": "er stammeli", "word_boxes": []},
        ]
        grid = _make_grid(cells)
        suggestions = [{
            "id": "abc",
            "type": "spell_fix",
            "zone_index": 0,
            "row_index": 29,
            "col_index": 2,
            "original_text": "stammeli",
            "suggested_text": "stammeln",
        }]
        result = apply_gutter_suggestions(grid, ["abc"], suggestions)
        assert result["applied_count"] == 1
        assert grid["zones"][0]["cells"][0]["text"] == "er stammeln"

    def test_apply_hyphen_join(self):
        cells = [
            {"cell_id": "R30_C2", "row_index": 30, "col_index": 2,
             "text": "ve", "word_boxes": []},
            {"cell_id": "R31_C2", "row_index": 31, "col_index": 2,
             "text": "künden und", "word_boxes": []},
        ]
        grid = _make_grid(cells)
        suggestions = [{
            "id": "def",
            "type": "hyphen_join",
            "zone_index": 0,
            "row_index": 30,
            "col_index": 2,
            "original_text": "ve",
            "suggested_text": "verkünden",
            "next_row_index": 31,
            "display_parts": ["ver-", "künden"],
            "missing_chars": "r",
        }]
        result = apply_gutter_suggestions(grid, ["def"], suggestions)
        assert result["applied_count"] == 1
        # Current row: "ve" replaced with "ver-"
        assert grid["zones"][0]["cells"][0]["text"] == "ver-"
        # Next row: UNCHANGED — "künden" stays in its original row
        assert grid["zones"][0]["cells"][1]["text"] == "künden und"

    def test_apply_nothing_when_no_accepted(self):
        grid = _make_grid([])
        result = apply_gutter_suggestions(grid, [], [])
        assert result["applied_count"] == 0

    def test_skip_unknown_suggestion_id(self):
        cells = [
            {"cell_id": "R0_C0", "row_index": 0, "col_index": 0,
             "text": "test", "word_boxes": []},
        ]
        grid = _make_grid(cells)
        suggestions = [{
            "id": "abc",
            "type": "spell_fix",
            "zone_index": 0,
            "row_index": 0,
            "col_index": 0,
            "original_text": "test",
            "suggested_text": "test2",
        }]
        # Accept a non-existent ID
        result = apply_gutter_suggestions(grid, ["nonexistent"], suggestions)
        assert result["applied_count"] == 0
        assert grid["zones"][0]["cells"][0]["text"] == "test"
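The `_edit_distance` helper exercised by `TestEditDistance` above is not itself part of this diff; the classic Levenshtein dynamic program it presumably implements looks like this (a sketch, not the module's actual code):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a two-row DP table, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))          # distance from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free on a match)
            ))
        prev = curr
    return prev[len(b)]

print(edit_distance("stammeli", "stammeln"))  # 1
```

The two-row formulation keeps memory linear in `len(b)` while matching every case the tests cover: identical strings, single edits, and empty inputs.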
135  klausur-service/backend/tests/test_merge_wrapped_rows.py  (new file)
@@ -0,0 +1,135 @@
"""Tests for _merge_wrapped_rows — cell-wrap continuation row merging."""

import pytest
import sys
import os

sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
from cv_cell_grid import _merge_wrapped_rows


def _entry(row_index, english='', german='', example=''):
    return {
        'row_index': row_index,
        'english': english,
        'german': german,
        'example': example,
    }


class TestMergeWrappedRows:
    """Test cell-wrap continuation row merging."""

    def test_basic_en_empty_merge(self):
        """EN empty, DE has text → merge DE into previous row."""
        entries = [
            _entry(0, english='take part (in)', german='teilnehmen (an), mitmachen', example='More than 200 singers took'),
            _entry(1, english='', german='(bei)', example='part in the concert.'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'teilnehmen (an), mitmachen (bei)'
        assert result[0]['example'] == 'More than 200 singers took part in the concert.'

    def test_en_empty_de_only(self):
        """EN empty, only DE continuation (no example)."""
        entries = [
            _entry(0, english='competition', german='der Wettbewerb,'),
            _entry(1, english='', german='das Turnier'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'der Wettbewerb, das Turnier'

    def test_en_empty_example_only(self):
        """EN empty, only example continuation."""
        entries = [
            _entry(0, english='to arrive', german='ankommen', example='We arrived at the'),
            _entry(1, english='', german='', example='hotel at midnight.'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['example'] == 'We arrived at the hotel at midnight.'

    def test_de_empty_paren_continuation(self):
        """DE empty, EN starts with parenthetical → merge into previous EN."""
        entries = [
            _entry(0, english='to take part', german='teilnehmen'),
            _entry(1, english='(in)', german=''),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['english'] == 'to take part (in)'

    def test_de_empty_lowercase_continuation(self):
        """DE empty, EN starts lowercase → merge into previous EN."""
        entries = [
            _entry(0, english='to put up', german='aufstellen'),
            _entry(1, english='with sth.', german=''),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['english'] == 'to put up with sth.'

    def test_no_merge_both_have_content(self):
        """Both EN and DE have text → normal row, don't merge."""
        entries = [
            _entry(0, english='house', german='Haus'),
            _entry(1, english='garden', german='Garten'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 2

    def test_no_merge_new_word_uppercase(self):
        """EN has uppercase text, DE is empty → could be a new word, not merged."""
        entries = [
            _entry(0, english='house', german='Haus'),
            _entry(1, english='Garden', german=''),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 2

    def test_triple_wrap(self):
        """Three consecutive wrapped rows → all merge into first."""
        entries = [
            _entry(0, english='competition', german='der Wettbewerb,'),
            _entry(1, english='', german='das Turnier,'),
            _entry(2, english='', german='der Wettkampf'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'der Wettbewerb, das Turnier, der Wettkampf'

    def test_empty_entries(self):
        """Empty list."""
        assert _merge_wrapped_rows([]) == []

    def test_single_entry(self):
        """Single entry unchanged."""
        entries = [_entry(0, english='house', german='Haus')]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1

    def test_mixed_normal_and_wrapped(self):
        """Mix of normal rows and wrapped rows."""
        entries = [
            _entry(0, english='house', german='Haus'),
            _entry(1, english='take part (in)', german='teilnehmen (an),'),
            _entry(2, english='', german='mitmachen (bei)'),
            _entry(3, english='garden', german='Garten'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 3
        assert result[0]['english'] == 'house'
        assert result[1]['german'] == 'teilnehmen (an), mitmachen (bei)'
        assert result[2]['english'] == 'garden'

    def test_comma_separator_handling(self):
        """Previous DE ends with comma → no extra space needed."""
        entries = [
            _entry(0, english='word', german='Wort,'),
            _entry(1, english='', german='Ausdruck'),
        ]
        result = _merge_wrapped_rows(entries)
        assert len(result) == 1
        assert result[0]['german'] == 'Wort, Ausdruck'
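Reviewer note: the merging rule these tests pin down can be sketched roughly as follows. This is a hypothetical reimplementation inferred only from the assertions above — the real `_merge_wrapped_rows` lives in `cv_cell_grid.py` and may differ in details:

```python
def merge_wrapped_rows(entries):
    """Sketch of the cell-wrap merge rule implied by the tests (not the real code).

    A row is treated as a continuation of the previous row when either:
      - its EN cell is empty but DE or example has text, or
      - its DE cell is empty and EN starts lowercase or with '('.
    Continuation fragments are appended to the previous row, space-separated.
    """
    merged = []
    for entry in entries:
        prev = merged[-1] if merged else None
        en = entry['english'].strip()
        de = entry['german'].strip()
        ex = entry['example'].strip()
        is_wrap = prev is not None and (
            (not en and (de or ex)) or
            (not de and bool(en) and (en[0].islower() or en[0] == '('))
        )
        if is_wrap:
            # Append each non-empty fragment to the matching previous cell.
            for field in ('english', 'german', 'example'):
                frag = entry[field].strip()
                if frag:
                    prev[field] = (prev[field].strip() + ' ' + frag).strip()
        else:
            merged.append(dict(entry))
    return merged
```

Note the deliberate asymmetry: an empty EN cell is always a wrap, but an empty DE cell is only a wrap when the EN fragment cannot start a new headword (lowercase or parenthetical), which is what keeps 'Garden' from being merged in `test_no_merge_new_word_uppercase`.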
@@ -18,6 +18,7 @@ from page_crop import (
    detect_page_splits,
    _detect_format,
    _detect_edge_projection,
    _detect_gutter_continuity,
    _detect_left_edge_shadow,
    _detect_right_edge_shadow,
    _detect_spine_shadow,
@@ -564,3 +565,110 @@ class TestDetectPageSplits:
        assert pages[0]["x"] == 0
        total_w = sum(p["width"] for p in pages)
        assert total_w == w, f"Total page width {total_w} != image width {w}"


# ---------------------------------------------------------------------------
# Tests: _detect_gutter_continuity (camera book scans)
# ---------------------------------------------------------------------------

def _make_camera_book_scan(h: int = 2400, w: int = 1700, gutter_side: str = "right") -> np.ndarray:
    """Create a synthetic camera book scan with a subtle gutter shadow.

    Camera gutter shadows are much subtler than scanner shadows:
    - Page brightness ~250 (well-lit)
    - Gutter brightness ~210-230 (slight shadow)
    - Shadow runs continuously from top to bottom
    - Gradient spans ~7% of the width (brightness drop of ~40 levels)
    """
    img = np.full((h, w, 3), 250, dtype=np.uint8)

    # RNG for scattering text pixels into the content area below
    rng = np.random.RandomState(99)

    # Subtle gutter gradient at the specified side
    gutter_w = int(w * 0.04)  # ~4% of width
    gradient_w = int(w * 0.03)  # transition zone

    if gutter_side == "right":
        gutter_start = w - gutter_w - gradient_w
        for x in range(gutter_start, w):
            dist_from_start = x - gutter_start
            # Linear gradient from 250 down to 210
            brightness = int(250 - 40 * min(dist_from_start / (gutter_w + gradient_w), 1.0))
            img[:, x] = brightness
    else:
        gutter_end = gutter_w + gradient_w
        for x in range(gutter_end):
            dist_from_edge = gutter_end - x
            brightness = int(250 - 40 * min(dist_from_edge / (gutter_w + gradient_w), 1.0))
            img[:, x] = brightness

    # Scatter some text (dark pixels) in the content area
    content_left = gutter_end + 20 if gutter_side == "left" else 50
    content_right = gutter_start - 20 if gutter_side == "right" else w - 50
    for _ in range(800):
        y = rng.randint(h // 10, h - h // 10)
        x = rng.randint(content_left, content_right)
        y2 = min(y + 3, h)
        x2 = min(x + 15, w)
        img[y:y2, x:x2] = 20

    return img


class TestDetectGutterContinuity:
    """Tests for camera gutter shadow detection via vertical continuity."""

    def test_detects_right_gutter(self):
        """Should detect a subtle gutter shadow on the right side."""
        img = _make_camera_book_scan(gutter_side="right")
        h, w = img.shape[:2]
        gray = np.mean(img, axis=2).astype(np.uint8)
        search_w = w // 4
        right_start = w - search_w
        result = _detect_gutter_continuity(
            gray, gray[:, right_start:], right_start, w, "right",
        )
        assert result is not None
        # Gutter starts roughly at 93% of width (w - 4% - 3%)
        assert result > w * 0.85, f"Gutter x={result} too far left"
        assert result < w * 0.98, f"Gutter x={result} too close to edge"

    def test_detects_left_gutter(self):
        """Should detect a subtle gutter shadow on the left side."""
        img = _make_camera_book_scan(gutter_side="left")
        h, w = img.shape[:2]
        gray = np.mean(img, axis=2).astype(np.uint8)
        search_w = w // 4
        result = _detect_gutter_continuity(
            gray, gray[:, :search_w], 0, w, "left",
        )
        assert result is not None
        assert result > w * 0.02, f"Gutter x={result} too close to edge"
        assert result < w * 0.15, f"Gutter x={result} too far right"

    def test_no_gutter_on_clean_page(self):
        """Should NOT detect a gutter on a uniformly bright page."""
        img = np.full((2000, 1600, 3), 250, dtype=np.uint8)
        # Add some text but no gutter
        rng = np.random.RandomState(42)
        for _ in range(500):
            y = rng.randint(100, 1900)
            x = rng.randint(100, 1500)
            img[y:min(y + 3, 2000), x:min(x + 15, 1600)] = 20
        gray = np.mean(img, axis=2).astype(np.uint8)
        w = 1600
        search_w = w // 4
        right_start = w - search_w
        result_r = _detect_gutter_continuity(gray, gray[:, right_start:], right_start, w, "right")
        result_l = _detect_gutter_continuity(gray, gray[:, :search_w], 0, w, "left")
        assert result_r is None, f"False positive on right: x={result_r}"
        assert result_l is None, f"False positive on left: x={result_l}"

    def test_integrated_with_crop(self):
        """End-to-end: detect_and_crop_page should crop at the gutter."""
        img = _make_camera_book_scan(gutter_side="right")
        cropped, result = detect_and_crop_page(img)
        # The right border should be > 0 (gutter cropped)
        right_border = result["border_fractions"]["right"]
        assert right_border > 0.01, f"Right border {right_border} — gutter not cropped"
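Reviewer note: the "vertical continuity" idea these tests exercise — a gutter shadow is a column band that stays darker than the page across nearly all rows, whereas text darkens any single column only sporadically — can be sketched as below. This is an illustrative toy, not the project's code: `_detect_gutter_continuity` in `page_crop.py` takes more parameters and is the authoritative version.

```python
import numpy as np


def find_dark_continuous_column(gray, min_drop=10, min_row_frac=0.9):
    """Return the first column that is darker than the page in >= min_row_frac
    of its rows, or None. Toy sketch of continuity-based gutter detection."""
    # Page background estimate: median brightness (text pixels are sparse).
    page = np.median(gray)
    dark = gray < (page - min_drop)   # per-pixel "noticeably darker than page"
    col_frac = dark.mean(axis=0)      # fraction of dark rows, per column
    cols = np.flatnonzero(col_frac >= min_row_frac)
    return int(cols[0]) if cols.size else None
```

The threshold pair (`min_drop`, `min_row_frac`) is the whole trade-off: a camera gutter is only ~40 brightness levels darker but spans the full image height, so a low drop threshold combined with a high row-fraction requirement separates it from text.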
286
klausur-service/backend/tests/test_smart_spell.py
Normal file
@@ -0,0 +1,286 @@
"""Tests for SmartSpellChecker — language-aware OCR post-correction."""

import pytest
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from smart_spell import SmartSpellChecker, CorrectionResult


@pytest.fixture
def sc():
    return SmartSpellChecker()


# ─── Language Detection ──────────────────────────────────────────────────────


class TestLanguageDetection:

    def test_clear_english_words(self, sc):
        for word in ("school", "beautiful", "homework", "yesterday", "because"):
            assert sc.detect_word_lang(word) in ("en", "both"), f"{word} should be EN"

    def test_clear_german_words(self, sc):
        for word in ("Schule", "Hausaufgaben", "Freundschaft", "Straße", "Entschuldigung"):
            assert sc.detect_word_lang(word) in ("de", "both"), f"{word} should be DE"

    def test_ambiguous_words(self, sc):
        """Words that exist in both languages."""
        for word in ("Hand", "Finger", "Arm", "Name", "Ball"):
            assert sc.detect_word_lang(word) == "both", f"{word} should be 'both'"

    def test_unknown_words(self, sc):
        assert sc.detect_word_lang("xyzqwk") == "unknown"
        assert sc.detect_word_lang("") == "unknown"

    def test_english_sentence(self, sc):
        assert sc.detect_text_lang("I go to school every day") == "en"

    def test_german_sentence(self, sc):
        assert sc.detect_text_lang("Ich gehe jeden Tag zur Schule") == "de"

    def test_mixed_sentence(self, sc):
        # Dominant language should win
        lang = sc.detect_text_lang("I like to play Fußball with my Freunde")
        assert lang in ("en", "both")


# ─── Single Word Correction ────────────────────────────────────────────────


class TestSingleWordCorrection:

    def test_known_word_not_changed(self, sc):
        assert sc.correct_word("school", "en") is None
        assert sc.correct_word("Freund", "de") is None

    def test_digit_letter_single(self, sc):
        assert sc.correct_word("g0od", "en") == "good"
        assert sc.correct_word("he1lo", "en") == "hello"

    def test_digit_letter_multi(self, sc):
        """Multiple digit substitutions (e.g., sch00l)."""
        result = sc.correct_word("sch00l", "en")
        assert result == "school", f"Expected 'school', got '{result}'"

    def test_pipe_to_I(self, sc):
        assert sc.correct_word("|", "en") == "I"

    def test_umlaut_schuler(self, sc):
        assert sc.correct_word("Schuler", "de") == "Schüler"

    def test_umlaut_uber(self, sc):
        assert sc.correct_word("uber", "de") == "über"

    def test_umlaut_bucher(self, sc):
        assert sc.correct_word("Bucher", "de") == "Bücher"

    def test_umlaut_turkei(self, sc):
        assert sc.correct_word("Turkei", "de") == "Türkei"

    def test_missing_char(self, sc):
        assert sc.correct_word("beautful", "en") == "beautiful"

    def test_transposition(self, sc):
        assert sc.correct_word("teh", "en") == "the"

    def test_swap(self, sc):
        assert sc.correct_word("freind", "en") == "friend"

    def test_no_false_correction_cross_lang(self, sc):
        """Don't correct a word that's valid in the other language.

        'Schuler' in the EN column should NOT be corrected to 'Schuyler'
        because 'Schüler' is valid German — it's likely a German word
        that ended up in the wrong column (or is a surname).
        """
        # Schuler is valid DE (after umlaut fix → Schüler), so
        # in the EN column it should be left alone
        result = sc.correct_word("Schuler", "en")
        # Should either be None (no change) or not "Schuyler"
        assert result != "Schuyler", "Should not false-correct German word in EN column"


# ─── a/I Disambiguation ──────────────────────────────────────────────────────


class TestAIDisambiguation:

    def test_I_before_verb(self, sc):
        assert sc._disambiguate_a_I("l", "am") == "I"
        assert sc._disambiguate_a_I("l", "was") == "I"
        assert sc._disambiguate_a_I("l", "think") == "I"
        assert sc._disambiguate_a_I("l", "have") == "I"
        assert sc._disambiguate_a_I("l", "don't") == "I"

    def test_a_before_noun_adj(self, sc):
        assert sc._disambiguate_a_I("a", "book") == "a"
        assert sc._disambiguate_a_I("a", "cat") == "a"
        assert sc._disambiguate_a_I("a", "big") == "a"
        assert sc._disambiguate_a_I("a", "lot") == "a"

    def test_uncertain_returns_none(self, sc):
        """When context is ambiguous, return None (don't change)."""
        assert sc._disambiguate_a_I("l", "xyzqwk") is None


# ─── Full Text Correction ───────────────────────────────────────────────────


class TestFullTextCorrection:

    def test_english_sentence(self, sc):
        result = sc.correct_text("teh cat is beautful", "en")
        assert result.changed
        assert "the" in result.corrected
        assert "beautiful" in result.corrected

    def test_german_sentence_no_change(self, sc):
        result = sc.correct_text("Ich gehe zur Schule", "de")
        assert not result.changed

    def test_german_umlaut_fix(self, sc):
        result = sc.correct_text("Der Schuler liest Bucher", "de")
        assert "Schüler" in result.corrected
        assert "Bücher" in result.corrected

    def test_preserves_punctuation(self, sc):
        result = sc.correct_text("teh cat, beautful!", "en")
        assert "," in result.corrected
        assert "!" in result.corrected

    def test_empty_text(self, sc):
        result = sc.correct_text("", "en")
        assert not result.changed
        assert result.corrected == ""


# ─── Boundary Repair ───────────────────────────────────────────────────────


class TestBoundaryRepair:

    def test_ats_th_to_at_sth(self, sc):
        """'ats th.' → 'at sth.' — shifted boundary with abbreviation."""
        result = sc.correct_text("be good ats th.", "en")
        assert "at sth." in result.corrected, f"Expected 'at sth.' in '{result.corrected}'"

    def test_no_repair_common_pair(self, sc):
        """Don't repair if both words form a common pair."""
        result = sc.correct_text("at the", "en")
        assert result.corrected == "at the"
        assert not result.changed

    def test_boundary_shift_right(self, sc):
        """Shift chars from word1 to word2."""
        repair = sc._try_boundary_repair("ats", "th")
        assert repair == ("at", "sth"), f"Got {repair}"

    def test_boundary_shift_with_punct(self, sc):
        """Preserve punctuation during boundary repair."""
        repair = sc._try_boundary_repair("ats", "th.")
        assert repair is not None
        assert repair[0] == "at"
        assert repair[1] == "sth."

    def test_pound_sand_to_pounds_and(self, sc):
        """'Pound sand' → 'Pounds and' — both readings are valid, but the repair is far more frequent."""
        result = sc.correct_text("Pound sand euros", "en")
        assert "Pounds and" in result.corrected, f"Expected 'Pounds and' in '{result.corrected}'"

    def test_wit_hit_to_with_it(self, sc):
        """'wit hit' → 'with it' — frequency-based repair."""
        result = sc.correct_text("be careful wit hit", "en")
        assert "with it" in result.corrected, f"Expected 'with it' in '{result.corrected}'"

    def test_done_euro_to_one_euro(self, sc):
        """'done euro' → 'one euro' in context."""
        result = sc.correct_text("done euro", "en")
        assert "one euro" in result.corrected, f"Expected 'one euro' in '{result.corrected}'"


# ─── Context Split ──────────────────────────────────────────────────────────


class TestContextSplit:

    def test_anew_to_a_new(self, sc):
        """'anew' → 'a new' when followed by a noun."""
        result = sc.correct_text("anew book", "en")
        assert result.corrected == "a new book", f"Got '{result.corrected}'"

    def test_anew_standalone_no_split(self, sc):
        """'anew' at the end of a phrase might genuinely be 'anew'."""
        # "start anew" — no next word to indicate a split.
        # This is ambiguous, so we accept either behavior.
        pass

    def test_alive_not_split(self, sc):
        """'alive' should never be split to 'a live'."""
        result = sc.correct_text("alive and well", "en")
        assert "alive" in result.corrected

    def test_alone_not_split(self, sc):
        """'alone' should never be split."""
        result = sc.correct_text("alone in the dark", "en")
        assert "alone" in result.corrected

    def test_about_not_split(self, sc):
        """'about' should never be split to 'a bout'."""
        result = sc.correct_text("about time", "en")
        assert "about" in result.corrected


# ─── Vocab Entry Correction ─────────────────────────────────────────────────


class TestVocabEntryCorrection:

    def test_basic_entry(self, sc):
        results = sc.correct_vocab_entry(
            english="beautful",
            german="schön",
        )
        assert results["english"].corrected == "beautiful"
        assert results["german"].changed is False

    def test_umlaut_in_german(self, sc):
        results = sc.correct_vocab_entry(
            english="school",
            german="Schuler",
        )
        assert results["english"].changed is False
        assert results["german"].corrected == "Schüler"

    def test_example_auto_detect(self, sc):
        results = sc.correct_vocab_entry(
            english="friend",
            german="Freund",
            example="My best freind lives in Berlin",
        )
        assert "friend" in results["example"].corrected


# ─── Speed ─────────────────────────────────────────────────────────────────


class TestSpeed:

    def test_100_corrections_under_500ms(self, sc):
        """100 word corrections should complete in under 500ms."""
        import time
        words = [
            ("beautful", "en"), ("teh", "en"), ("freind", "en"),
            ("homwork", "en"), ("yesturday", "en"),
            ("Schuler", "de"), ("Bucher", "de"), ("Turkei", "de"),
            ("uber", "de"), ("Ubung", "de"),
        ] * 10

        t0 = time.time()
        for word, lang in words:
            sc.correct_word(word, lang)
        dt = time.time() - t0

        print(f"\n  100 corrections in {dt*1000:.0f}ms")
        assert dt < 0.5, f"Too slow: {dt*1000:.0f}ms"
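Reviewer note: the umlaut tests (`Schuler` → `Schüler`, `uber` → `über`) suggest candidate generation rather than plain edit-distance search, since `ü` is edit-distance 1 from `u` but generic suggestions rank it poorly. A minimal sketch of such a generator — hypothetical, inferred from the tests; `SmartSpellChecker` presumably validates each candidate against its German dictionary before accepting it:

```python
def umlaut_candidates(word):
    """Generate spellings that restore one dropped umlaut (or ss -> ß).

    OCR and ASCII-only typing commonly flatten ä/ö/ü to a/o/u and ß to ss;
    each plain occurrence is a candidate restoration site.
    """
    subs = [('a', 'ä'), ('o', 'ö'), ('u', 'ü'),
            ('A', 'Ä'), ('O', 'Ö'), ('U', 'Ü'),
            ('ss', 'ß')]
    out = set()
    for plain, uml in subs:
        start = 0
        while True:
            i = word.find(plain, start)
            if i == -1:
                break
            # Restore the umlaut at this one position only.
            out.add(word[:i] + uml + word[i + len(plain):])
            start = i + 1
    return out
```

Restoring one site at a time keeps the candidate set linear in word length; words with several dropped umlauts would need repeated passes, each gated by a dictionary lookup.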
494
klausur-service/backend/tests/test_spell_benchmark.py
Normal file
@@ -0,0 +1,494 @@
|
||||
"""
|
||||
Benchmark: Spell-checking & language detection approaches for OCR post-correction.
|
||||
|
||||
Tests pyspellchecker (already used), symspellpy (candidate), and
|
||||
dual-dictionary language detection heuristic on real vocabulary OCR data.
|
||||
|
||||
Run: pytest tests/test_spell_benchmark.py -v -s
|
||||
"""
|
||||
|
||||
import time
|
||||
import pytest
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _load_pyspellchecker():
|
||||
from spellchecker import SpellChecker
|
||||
en = SpellChecker(language='en', distance=1)
|
||||
de = SpellChecker(language='de', distance=1)
|
||||
return en, de
|
||||
|
||||
|
||||
def _load_symspellpy():
|
||||
"""Load symspellpy with English frequency dict (bundled)."""
|
||||
from symspellpy import SymSpell, Verbosity
|
||||
sym = SymSpell(max_dictionary_edit_distance=2)
|
||||
# Use bundled English frequency dict
|
||||
import pkg_resources
|
||||
dict_path = pkg_resources.resource_filename("symspellpy", "frequency_dictionary_en_82_765.txt")
|
||||
sym.load_dictionary(dict_path, term_index=0, count_index=1)
|
||||
return sym, Verbosity
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test data: (ocr_output, expected_correction, language, category)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
OCR_TEST_CASES = [
|
||||
# --- Single-char ambiguity ---
|
||||
("l am a student", "I am a student", "en", "a_vs_I"),
|
||||
("a book", "a book", "en", "a_vs_I"), # should NOT change
|
||||
("I like cats", "I like cats", "en", "a_vs_I"), # should NOT change
|
||||
("lt is raining", "It is raining", "en", "a_vs_I"), # l→I at start
|
||||
|
||||
# --- Digit-letter confusion ---
|
||||
("g0od", "good", "en", "digit_letter"),
|
||||
("sch00l", "school", "en", "digit_letter"),
|
||||
("he1lo", "hello", "en", "digit_letter"),
|
||||
("Sch0n", "Schon", "de", "digit_letter"), # German
|
||||
|
||||
# --- Umlaut drops ---
|
||||
("schon", "schön", "de", "umlaut"), # context: "schon" is also valid DE!
|
||||
("Schuler", "Schüler", "de", "umlaut"),
|
||||
("uber", "über", "de", "umlaut"),
|
||||
("Bucher", "Bücher", "de", "umlaut"),
|
||||
("Turkei", "Türkei", "de", "umlaut"),
|
||||
|
||||
# --- Common OCR errors ---
|
||||
("beautful", "beautiful", "en", "missing_char"),
|
||||
("teh", "the", "en", "transposition"),
|
||||
("becasue", "because", "en", "transposition"),
|
||||
("freind", "friend", "en", "swap"),
|
||||
("Freund", "Freund", "de", "correct"), # already correct
|
||||
|
||||
# --- Merged words ---
|
||||
("atmyschool", "at my school", "en", "merged"),
|
||||
("goodidea", "good idea", "en", "merged"),
|
||||
|
||||
# --- Mixed language example sentences ---
|
||||
("I go to teh school", "I go to the school", "en", "sentence"),
|
||||
("Ich gehe zur Schule", "Ich gehe zur Schule", "de", "sentence_correct"),
|
||||
]
|
||||
|
||||
# Language detection test: (word, expected_language)
|
||||
LANG_DETECT_CASES = [
|
||||
# Clear English
|
||||
("school", "en"),
|
||||
("beautiful", "en"),
|
||||
("homework", "en"),
|
||||
("yesterday", "en"),
|
||||
("children", "en"),
|
||||
("because", "en"),
|
||||
("environment", "en"),
|
||||
("although", "en"),
|
||||
|
||||
# Clear German
|
||||
("Schule", "de"),
|
||||
("Hausaufgaben", "de"),
|
||||
("Freundschaft", "de"),
|
||||
("Umwelt", "de"),
|
||||
("Kindergarten", "de"), # also used in English!
|
||||
("Bücher", "de"),
|
||||
("Straße", "de"),
|
||||
("Entschuldigung", "de"),
|
||||
|
||||
# Ambiguous (exist in both)
|
||||
("Hand", "both"),
|
||||
("Finger", "both"),
|
||||
("Arm", "both"),
|
||||
("Name", "both"),
|
||||
("Ball", "both"),
|
||||
|
||||
# Short/tricky
|
||||
("a", "en"),
|
||||
("I", "en"),
|
||||
("in", "both"),
|
||||
("an", "both"),
|
||||
("the", "en"),
|
||||
("die", "de"),
|
||||
("der", "de"),
|
||||
("to", "en"),
|
||||
("zu", "de"),
|
||||
]
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Tests
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestPyspellchecker:
|
||||
"""Test pyspellchecker capabilities for OCR correction."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def setup(self):
|
||||
self.en, self.de = _load_pyspellchecker()
|
||||
|
||||
def test_known_words(self):
|
||||
"""Verify basic dictionary lookup."""
|
||||
assert self.en.known(["school"])
|
||||
assert self.en.known(["beautiful"])
|
||||
assert self.de.known(["schule"]) # lowercase
|
||||
assert self.de.known(["freund"])
|
||||
# Not known
|
||||
assert not self.en.known(["xyzqwk"])
|
||||
assert not self.de.known(["xyzqwk"])
|
||||
|
||||
def test_correction_quality(self):
|
||||
"""Test correction suggestions for OCR errors."""
|
||||
results = []
|
||||
for ocr, expected, lang, category in OCR_TEST_CASES:
|
||||
if category in ("sentence", "sentence_correct", "merged", "a_vs_I"):
|
||||
continue # skip multi-word cases
|
||||
|
||||
spell = self.en if lang == "en" else self.de
|
||||
words = ocr.split()
|
||||
corrected = []
|
||||
for w in words:
|
||||
if spell.known([w.lower()]):
|
||||
corrected.append(w)
|
||||
else:
|
||||
fix = spell.correction(w.lower())
|
||||
if fix and fix != w.lower():
|
||||
# Preserve case
|
||||
if w[0].isupper():
|
||||
fix = fix[0].upper() + fix[1:]
|
||||
corrected.append(fix)
|
||||
else:
|
||||
corrected.append(w)
|
||||
result = " ".join(corrected)
|
||||
ok = result == expected
|
||||
results.append((ocr, expected, result, ok, category))
|
||||
if not ok:
|
||||
print(f" MISS: '{ocr}' → '{result}' (expected '{expected}') [{category}]")
|
||||
else:
|
||||
print(f" OK: '{ocr}' → '{result}' [{category}]")
|
||||
|
||||
correct = sum(1 for *_, ok, _ in results if ok)
|
||||
total = len(results)
|
||||
print(f"\npyspellchecker: {correct}/{total} correct ({100*correct/total:.0f}%)")
|
||||
|
||||
def test_language_detection_heuristic(self):
|
||||
"""Test dual-dictionary language detection."""
|
||||
results = []
|
||||
for word, expected_lang in LANG_DETECT_CASES:
|
||||
w = word.lower()
|
||||
in_en = bool(self.en.known([w]))
|
||||
in_de = bool(self.de.known([w]))
|
||||
|
||||
if in_en and in_de:
|
||||
detected = "both"
|
||||
elif in_en:
|
||||
detected = "en"
|
||||
elif in_de:
|
||||
detected = "de"
|
||||
else:
|
||||
detected = "unknown"
|
||||
|
||||
ok = detected == expected_lang
|
||||
results.append((word, expected_lang, detected, ok))
|
||||
if not ok:
|
||||
print(f" MISS: '{word}' → {detected} (expected {expected_lang})")
|
||||
else:
|
||||
print(f" OK: '{word}' → {detected}")
|
||||
|
||||
correct = sum(1 for *_, ok in results if ok)
|
||||
total = len(results)
|
||||
print(f"\nLang detection heuristic: {correct}/{total} correct ({100*correct/total:.0f}%)")
|
||||
|
||||
def test_umlaut_awareness(self):
|
||||
"""Test if pyspellchecker suggests umlaut corrections."""
|
||||
# "Schuler" should suggest "Schüler"
|
||||
candidates = self.de.candidates("schuler")
|
||||
print(f" 'schuler' candidates: {candidates}")
|
||||
# "uber" should suggest "über"
|
||||
candidates_uber = self.de.candidates("uber")
|
||||
print(f" 'uber' candidates: {candidates_uber}")
|
||||
# "Turkei" should suggest "Türkei"
|
||||
candidates_turkei = self.de.candidates("turkei")
|
||||
print(f" 'turkei' candidates: {candidates_turkei}")
|
||||
|
||||
def test_speed_100_words(self):
|
||||
"""Measure correction speed for 100 words."""
|
||||
words_en = ["beautful", "teh", "becasue", "freind", "shcool",
|
||||
"homwork", "yesturday", "chilren", "becuse", "enviroment"] * 10
|
||||
t0 = time.time()
|
||||
for w in words_en:
|
||||
self.en.correction(w)
|
||||
dt = time.time() - t0
|
||||
print(f"\n pyspellchecker: 100 EN corrections in {dt*1000:.0f}ms")
|
||||
|
||||
words_de = ["schuler", "bucher", "turkei", "strasze", "entschuldigung",
|
||||
"kindergaten", "freumd", "hauaufgaben", "umwlt", "ubung"] * 10
|
||||
t0 = time.time()
|
||||
for w in words_de:
|
||||
self.de.correction(w)
|
||||
dt = time.time() - t0
|
||||
print(f" pyspellchecker: 100 DE corrections in {dt*1000:.0f}ms")
|
||||
|
||||
|
||||
class TestSymspellpy:
|
||||
"""Test symspellpy as a faster alternative."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def setup(self):
|
||||
try:
|
||||
self.sym, self.Verbosity = _load_symspellpy()
|
||||
self.available = True
|
||||
except (ImportError, FileNotFoundError) as e:
|
||||
self.available = False
|
||||
pytest.skip(f"symspellpy not installed: {e}")
|
||||
|
||||
def test_correction_quality(self):
|
||||
"""Test symspellpy corrections (EN only — no DE dict bundled)."""
|
||||
en_cases = [(o, e, c) for o, e, _, c in OCR_TEST_CASES
|
||||
if _ == "en" and c not in ("sentence", "sentence_correct", "merged", "a_vs_I")]
|
||||
|
||||
results = []
|
||||
for ocr, expected, category in en_cases:
|
||||
suggestions = self.sym.lookup(ocr.lower(), self.Verbosity.CLOSEST, max_edit_distance=2)
|
||||
if suggestions:
|
||||
fix = suggestions[0].term
|
||||
if ocr[0].isupper():
|
||||
fix = fix[0].upper() + fix[1:]
|
||||
result = fix
|
||||
else:
|
||||
result = ocr
|
||||
|
||||
ok = result == expected
|
||||
results.append((ocr, expected, result, ok, category))
|
||||
status = "OK" if ok else "MISS"
|
||||
print(f" {status}: '{ocr}' → '{result}' (expected '{expected}') [{category}]")
|
||||
|
||||
correct = sum(1 for *_, ok, _ in results if ok)
|
||||
total = len(results)
|
||||
print(f"\nsymspellpy EN: {correct}/{total} correct ({100*correct/total:.0f}%)")
|
||||
|
||||
def test_speed_100_words(self):
|
||||
"""Measure symspellpy correction speed for 100 words."""
|
||||
words = ["beautful", "teh", "becasue", "freind", "shcool",
|
||||
"homwork", "yesturday", "chilren", "becuse", "enviroment"] * 10
|
||||
t0 = time.time()
|
||||
for w in words:
|
||||
self.sym.lookup(w, self.Verbosity.CLOSEST, max_edit_distance=2)
|
||||
dt = time.time() - t0
|
||||
print(f"\n symspellpy: 100 EN corrections in {dt*1000:.0f}ms")
|
||||
|
||||
def test_compound_segmentation(self):
|
||||
"""Test symspellpy's word segmentation for merged words."""
|
||||
cases = [
|
||||
("atmyschool", "at my school"),
|
||||
("goodidea", "good idea"),
|
||||
("makeadecision", "make a decision"),
|
||||
]
|
||||
for merged, expected in cases:
|
||||
result = self.sym.word_segmentation(merged)
|
||||
ok = result.corrected_string == expected
|
||||
status = "OK" if ok else "MISS"
|
||||
print(f" {status}: '{merged}' → '{result.corrected_string}' (expected '{expected}')")
|
||||
|
||||
|
||||
class TestContextDisambiguation:
    """Test context-based disambiguation for a/I and similar cases."""

    @pytest.fixture(autouse=True)
    def setup(self):
        self.en, self.de = _load_pyspellchecker()

    def test_bigram_context(self):
        """Use simple bigram heuristic for a/I disambiguation.

        Approach: check if 'a <next_word>' or 'I <next_word>' is more
        common by checking if <next_word> is a noun (follows 'a') or
        verb (follows 'I').
        """
        # Common words that follow "I" (verbs)
        i_followers = {"am", "was", "have", "had", "do", "did", "will",
                       "would", "can", "could", "should", "shall", "may",
                       "might", "think", "know", "see", "want", "need",
                       "like", "love", "hate", "go", "went", "come",
                       "came", "say", "said", "get", "got", "make", "made",
                       "take", "took", "give", "gave", "tell", "told",
                       "feel", "felt", "find", "found", "believe", "hope",
                       "remember", "forget", "understand", "mean", "meant",
                       "don't", "didn't", "can't", "won't", "couldn't",
                       "shouldn't", "wouldn't", "haven't", "hadn't"}

        # Common words that follow "a" (nouns/adjectives)
        a_followers = {"lot", "few", "little", "bit", "good", "bad",
                       "big", "small", "great", "new", "old", "long",
                       "short", "man", "woman", "boy", "girl", "dog",
                       "cat", "book", "car", "house", "day", "year",
                       "nice", "beautiful", "large", "huge", "tiny"}

        def disambiguate_a_I(token: str, next_word: str) -> str:
            """Given an ambiguous 'a' or 'I' (or 'l'), pick the right one."""
            nw = next_word.lower()
            if nw in i_followers:
                return "I"
            if nw in a_followers:
                return "a"
            # Fallback: if next word is known verb → I, known adj/noun → a
            # For now, use a simple heuristic: lowercase → "a", uppercase first letter → "I"
            return token  # no change if uncertain

        cases = [
            ("l", "am", "I"),
            ("l", "was", "I"),
            ("l", "think", "I"),
            ("a", "book", "a"),
            ("a", "cat", "a"),
            ("a", "lot", "a"),
            ("l", "big", "a"),  # "a big ..."
            ("a", "have", "I"),  # "I have ..."
        ]

        results = []
        for token, next_word, expected in cases:
            result = disambiguate_a_I(token, next_word)
            ok = result == expected
            results.append((token, next_word, expected, result, ok))
            status = "OK" if ok else "MISS"
            print(f" {status}: '{token} {next_word}...' → '{result}' (expected '{expected}')")

        correct = sum(1 for *_, ok in results if ok)
        total = len(results)
        print(f"\na/I disambiguation: {correct}/{total} correct ({100*correct/total:.0f}%)")


class TestLangDetectLibrary:
    """Test py3langid or langdetect if available."""

    def test_py3langid(self):
        try:
            import langid
        except ImportError:
            pytest.skip("langid not installed")

        sentences = [
            ("I go to school every day", "en"),
            ("Ich gehe jeden Tag zur Schule", "de"),
            ("The weather is nice today", "en"),
            ("Das Wetter ist heute schön", "de"),
            ("She likes to play football", "en"),
            ("Er spielt gerne Fußball", "de"),
        ]

        results = []
        for text, expected in sentences:
            lang, confidence = langid.classify(text)
            ok = lang == expected
            results.append(ok)
            status = "OK" if ok else "MISS"
            print(f" {status}: '{text[:40]}...' → {lang} ({confidence:.2f}) (expected {expected})")

        correct = sum(results)
        print(f"\nlangid sentence detection: {correct}/{len(results)} correct")

    def test_langid_single_words(self):
        """langid on single words — expected to be unreliable."""
        try:
            import langid
        except ImportError:
            pytest.skip("langid not installed")

        words = [("school", "en"), ("Schule", "de"), ("book", "en"),
                 ("Buch", "de"), ("car", "en"), ("Auto", "de"),
                 ("a", "en"), ("I", "en"), ("der", "de"), ("the", "en")]

        results = []
        for word, expected in words:
            lang, conf = langid.classify(word)
            ok = lang == expected
            results.append(ok)
            status = "OK" if ok else "MISS"
            print(f" {status}: '{word}' → {lang} ({conf:.2f}) (expected {expected})")

        correct = sum(results)
        print(f"\nlangid single-word: {correct}/{len(results)} correct")


class TestIntegratedApproach:
    """Test the combined approach: dict-heuristic for lang + spell correction."""

    @pytest.fixture(autouse=True)
    def setup(self):
        self.en, self.de = _load_pyspellchecker()

    def detect_language(self, word: str) -> str:
        """Dual-dict heuristic language detection."""
        w = word.lower()
        # Skip very short words — too ambiguous
        if len(w) <= 2:
            return "ambiguous"
        in_en = bool(self.en.known([w]))
        in_de = bool(self.de.known([w]))
        if in_en and in_de:
            return "both"
        if in_en:
            return "en"
        if in_de:
            return "de"
        return "unknown"

    def correct_word(self, word: str, expected_lang: str) -> str:
        """Correct a single word given the expected language."""
        w_lower = word.lower()
        spell = self.en if expected_lang == "en" else self.de

        # Already known
        if spell.known([w_lower]):
            return word

        # Also check the other language — might be fine
        other = self.de if expected_lang == "en" else self.en
        if other.known([w_lower]):
            return word  # valid in the other language

        # Try correction
        fix = spell.correction(w_lower)
        if fix and fix != w_lower:
            if word[0].isupper():
                fix = fix[0].upper() + fix[1:]
            return fix

        return word

    def test_full_pipeline(self):
        """Test: detect language → correct with appropriate dict."""
        vocab_entries = [
            # (english_col, german_col, expected_en, expected_de)
            ("beautful", "schön", "beautiful", "schön"),
            ("school", "Schule", "school", "Schule"),
            ("teh cat", "die Katze", "the cat", "die Katze"),
            ("freind", "Freund", "friend", "Freund"),
            ("homwork", "Hausaufgaben", "homework", "Hausaufgaben"),
            ("Schuler", "Schuler", "Schuler", "Schüler"),  # DE umlaut: Schüler
        ]

        en_correct = 0
        de_correct = 0
        total = len(vocab_entries)

        for en_ocr, de_ocr, exp_en, exp_de in vocab_entries:
            # Correct each word in the column
            en_words = en_ocr.split()
            de_words = de_ocr.split()
            en_fixed = " ".join(self.correct_word(w, "en") for w in en_words)
            de_fixed = " ".join(self.correct_word(w, "de") for w in de_words)

            en_ok = en_fixed == exp_en
            de_ok = de_fixed == exp_de
            en_correct += en_ok
            de_correct += de_ok

            en_status = "OK" if en_ok else "MISS"
            de_status = "OK" if de_ok else "MISS"
            print(f" EN {en_status}: '{en_ocr}' → '{en_fixed}' (expected '{exp_en}')")
            print(f" DE {de_status}: '{de_ocr}' → '{de_fixed}' (expected '{exp_de}')")

        print(f"\nEN corrections: {en_correct}/{total} correct")
        print(f"DE corrections: {de_correct}/{total} correct")

141
klausur-service/backend/tests/test_unified_grid.py
Normal file
@@ -0,0 +1,141 @@
"""Tests for unified_grid.py — merging multi-zone grids into single zone."""

import pytest
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from unified_grid import (
    _compute_dominant_row_height,
    _classify_boxes,
    build_unified_grid,
)


def _make_content_zone(rows_y, num_cols=4, bbox_w=1400):
    """Helper: create a content zone with rows at given y positions."""
    rows = [{"index": i, "y_min_px": y, "y_max_px": y + 30, "y_min": y, "y_max": y + 30,
             "is_header": False} for i, y in enumerate(rows_y)]
    cols = [{"index": i, "x_min_px": i * (bbox_w // num_cols), "x_max_px": (i + 1) * (bbox_w // num_cols)}
            for i in range(num_cols)]
    cells = [{"row_index": r["index"], "col_index": c["index"], "col_type": f"column_{c['index']+1}",
              "text": f"R{r['index']}C{c['index']}", "cell_id": f"R{r['index']}C{c['index']}",
              "word_boxes": [], "confidence": 90, "is_bold": False, "ocr_engine": "test",
              "bbox_px": {"x": 0, "y": 0, "w": 100, "h": 30}, "bbox_pct": {"x": 0, "y": 0, "w": 10, "h": 2}}
             for r in rows for c in cols]
    return {
        "zone_index": 1, "zone_type": "content",
        "bbox_px": {"x": 50, "y": rows_y[0] - 10, "w": bbox_w, "h": rows_y[-1] - rows_y[0] + 50},
        "bbox_pct": {"x": 3, "y": 10, "w": 85, "h": 80},
        "columns": cols, "rows": rows, "cells": cells,
        "header_rows": [], "border": None, "word_count": len(cells),
    }


def _make_box_zone(zone_index, bbox, cells_data, bg_hex="#2563eb", layout_type="flowing"):
    """Helper: create a box zone."""
    rows = [{"index": i, "y_min_px": bbox["y"] + i * 30, "y_max_px": bbox["y"] + (i + 1) * 30,
             "is_header": i == 0} for i in range(len(cells_data))]
    cols = [{"index": 0, "x_min_px": bbox["x"], "x_max_px": bbox["x"] + bbox["w"]}]
    cells = [{"row_index": i, "col_index": 0, "col_type": "column_1",
              "text": text, "cell_id": f"Z{zone_index}_R{i}C0",
              "word_boxes": [], "confidence": 90, "is_bold": False, "ocr_engine": "test",
              "bbox_px": {"x": bbox["x"], "y": bbox["y"] + i * 30, "w": bbox["w"], "h": 30},
              "bbox_pct": {"x": 50, "y": 50, "w": 30, "h": 10}}
             for i, text in enumerate(cells_data)]
    return {
        "zone_index": zone_index, "zone_type": "box",
        "bbox_px": bbox, "bbox_pct": {"x": 50, "y": 50, "w": 30, "h": 10},
        "columns": cols, "rows": rows, "cells": cells,
        "header_rows": [0], "border": None, "word_count": len(cells),
        "box_bg_hex": bg_hex, "box_bg_color": "blue", "box_layout_type": layout_type,
    }


class TestDominantRowHeight:

    def test_regular_spacing(self):
        """Rows with uniform spacing → median = that spacing."""
        zone = _make_content_zone([100, 147, 194, 241, 288])
        h = _compute_dominant_row_height(zone)
        assert h == 47

    def test_filters_large_gaps(self):
        """Large gaps (box interruptions) are filtered out."""
        zone = _make_content_zone([100, 147, 194, 600, 647, 694])
        # spacings: 47, 47, 406(!), 47, 47 → filter >100 → median of [47, 47, 47, 47] = 47
        h = _compute_dominant_row_height(zone)
        assert h == 47

    def test_single_row(self):
        """Single row → default 47."""
        zone = _make_content_zone([100])
        h = _compute_dominant_row_height(zone)
        assert h == 47.0


class TestClassifyBoxes:

    def test_full_width(self):
        """Box wider than 85% of content → full_width."""
        boxes = [_make_box_zone(2, {"x": 50, "y": 500, "w": 1300, "h": 200}, ["Header", "Text"])]
        result = _classify_boxes(boxes, content_width=1400)
        assert result[0]["classification"] == "full_width"

    def test_partial_width_right(self):
        """Narrow box on right side → partial_width, side=right."""
        boxes = [_make_box_zone(2, {"x": 800, "y": 500, "w": 500, "h": 200}, ["Header", "Text"])]
        result = _classify_boxes(boxes, content_width=1400)
        assert result[0]["classification"] == "partial_width"
        assert result[0]["side"] == "right"

    def test_partial_width_left(self):
        """Narrow box on left side → partial_width, side=left."""
        boxes = [_make_box_zone(2, {"x": 50, "y": 500, "w": 500, "h": 200}, ["Header", "Text"])]
        result = _classify_boxes(boxes, content_width=1400)
        assert result[0]["classification"] == "partial_width"
        assert result[0]["side"] == "left"

    def test_text_line_count(self):
        """Total text lines counted including \\n."""
        boxes = [_make_box_zone(2, {"x": 50, "y": 500, "w": 500, "h": 200},
                                ["Header", "Line1\nLine2\nLine3"])]
        result = _classify_boxes(boxes, content_width=1400)
        assert result[0]["total_lines"] == 4  # "Header" (1) + "Line1\nLine2\nLine3" (3)


class TestBuildUnifiedGrid:

    def test_content_only(self):
        """Content zone without boxes → single unified zone."""
        content = _make_content_zone([100, 147, 194, 241])
        result = build_unified_grid([content], 1600, 2200, {})
        assert result["is_unified"] is True
        assert len(result["zones"]) == 1
        assert result["zones"][0]["zone_type"] == "unified"
        assert result["summary"]["total_rows"] == 4

    def test_full_width_box_integration(self):
        """Full-width box rows are integrated into unified grid."""
        content = _make_content_zone([100, 147, 194, 600, 647])
        box = _make_box_zone(2, {"x": 50, "y": 300, "w": 1300, "h": 200},
                             ["Box Header", "Box Row 1", "Box Row 2"])
        result = build_unified_grid([content, box], 1600, 2200, {})
        assert result["is_unified"] is True
        total_rows = result["summary"]["total_rows"]
        # 5 content rows + 3 box rows = 8
        assert total_rows == 8

    def test_box_cells_tagged(self):
        """Box-origin cells have source_zone_type and box_region."""
        content = _make_content_zone([100, 147, 600, 647])
        box = _make_box_zone(2, {"x": 50, "y": 300, "w": 1300, "h": 200}, ["Box Text"])
        result = build_unified_grid([content, box], 1600, 2200, {})
        box_cells = [c for c in result["zones"][0]["cells"] if c.get("source_zone_type") == "box"]
        assert len(box_cells) > 0
        assert box_cells[0]["box_region"]["bg_hex"] == "#2563eb"

    def test_no_content_zone(self):
        """No content zone → returns zones as-is."""
        box = _make_box_zone(2, {"x": 50, "y": 300, "w": 500, "h": 200}, ["Text"])
        result = build_unified_grid([box], 1600, 2200, {})
        assert "zones" in result

104
klausur-service/backend/tests/test_word_split.py
Normal file
@@ -0,0 +1,104 @@
"""Tests for merged-word splitting in cv_review.py.

The OCR sometimes merges adjacent words when character spacing is tight,
e.g. "atmyschool" → "at my school". The _try_split_merged_word() function
uses dynamic programming + dictionary lookup to find valid splits.
"""

import pytest

from cv_review import _try_split_merged_word, _spell_dict_knows, _SPELL_AVAILABLE

pytestmark = pytest.mark.skipif(
    not _SPELL_AVAILABLE,
    reason="pyspellchecker not installed",
)
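The `_try_split_merged_word` implementation lives in `cv_review.py` and is not part of this diff. For orientation, here is a minimal sketch of the dynamic-programming + dictionary-lookup idea the docstring describes; the helper name `split_merged_word`, the `min_len` cutoff, and the toy vocabulary are assumptions for illustration, not the real code:

```python
def split_merged_word(word, known_words, min_len=4):
    """Return a space-joined split of `word` into dictionary words,
    preferring the fewest parts, or None if no full cover exists."""
    if len(word) < min_len or not word.isalpha():
        return None
    w = word.lower()
    if w in known_words:
        return None  # already a valid word: leave it unchanged

    # best[i] = minimal-part split of w[:i] as a list of words (None = unreachable)
    best = [None] * (len(w) + 1)
    best[0] = []
    for i in range(1, len(w) + 1):
        for j in range(max(0, i - 20), i):  # cap part length at 20 chars
            if best[j] is not None and w[j:i] in known_words:
                cand = best[j] + [w[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand

    if best[-1] is None or len(best[-1]) < 2:
        return None  # no cover, or trivial single-word "split"
    parts = best[-1]
    if word[0].isupper():
        parts[0] = parts[0].capitalize()  # preserve leading capitalisation
    return " ".join(parts)
```

The fewest-parts preference keeps the splitter from shredding a word into many short fragments when a coarser split covers it.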


class TestTrySplitMergedWord:
    """Tests for _try_split_merged_word()."""

    # --- Should split ---

    def test_atmyschool(self):
        result = _try_split_merged_word("atmyschool")
        assert result is not None
        words = result.lower().split()
        assert "at" in words
        assert "my" in words
        assert "school" in words

    def test_goodidea(self):
        result = _try_split_merged_word("goodidea")
        assert result is not None
        assert "good" in result.lower()
        assert "idea" in result.lower()

    def test_comeon(self):
        result = _try_split_merged_word("Comeon")
        assert result is not None
        assert result.startswith("Come")  # preserves casing
        assert "on" in result.lower().split()

    def test_youknowthe(self):
        result = _try_split_merged_word("youknowthe")
        assert result is not None
        words = result.lower().split()
        assert "you" in words
        assert "know" in words
        assert "the" in words

    # --- Should NOT split ---

    def test_known_word_unchanged(self):
        """A known dictionary word should not be split."""
        assert _try_split_merged_word("school") is None
        assert _try_split_merged_word("beautiful") is None
        assert _try_split_merged_word("together") is None

    def test_anew(self):
        result = _try_split_merged_word("anew")
        # "anew" is itself a known word, so it should NOT be split — but
        # "a new" is also valid. The dictionary decides: if "anew" is known
        # → None; if not → "a new". Both outcomes are acceptable.
        assert result is None or result.lower() == "a new"

    def test_imadea(self):
        result = _try_split_merged_word("Imadea")
        assert result is not None
        assert "made" in result.lower() or "I" in result

    def test_makeadecision(self):
        result = _try_split_merged_word("makeadecision")
        assert result is not None
        assert "make" in result.lower()
        assert "decision" in result.lower()

    def test_short_word(self):
        """Words < 4 chars should not be attempted."""
        assert _try_split_merged_word("the") is None
        assert _try_split_merged_word("at") is None

    def test_nonsense(self):
        """Random letter sequences should not produce a split."""
        result = _try_split_merged_word("xyzqwk")
        assert result is None

    # --- Casing preservation ---

    def test_preserves_capitalization(self):
        result = _try_split_merged_word("Goodidea")
        assert result is not None
        assert result.startswith("Good")

    # --- Edge cases ---

    def test_empty_string(self):
        assert _try_split_merged_word("") is None

    def test_none_safe(self):
        """Non-alpha input should be handled gracefully."""
        # _try_split_merged_word is only called for .isalpha() tokens,
        # but test robustness anyway
        assert _try_split_merged_word("123") is None

425
klausur-service/backend/unified_grid.py
Normal file
@@ -0,0 +1,425 @@
"""
Unified Grid Builder — merges multi-zone grid into a single Excel-like grid.

Takes content zone + box zones and produces one unified zone where:
- All content rows use the dominant row height
- Full-width boxes are integrated directly (box rows replace standard rows)
- Partial-width boxes: extra rows inserted if box has more lines than standard
- Box-origin cells carry metadata (bg_color, border) for visual distinction

The result is a single-zone StructuredGrid that can be:
- Rendered in an Excel-like editor
- Exported to Excel/CSV
- Edited with unified row/column numbering
"""

import logging
import math
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


def _compute_dominant_row_height(content_zone: Dict) -> float:
    """Median of content row-to-row spacings, excluding box-gap jumps."""
    rows = content_zone.get("rows", [])
    if len(rows) < 2:
        return 47.0

    spacings = []
    for i in range(len(rows) - 1):
        y1 = rows[i].get("y_min_px", rows[i].get("y_min", 0))
        y2 = rows[i + 1].get("y_min_px", rows[i + 1].get("y_min", 0))
        d = y2 - y1
        if 0 < d < 100:  # exclude box-gap jumps
            spacings.append(d)

    if not spacings:
        return 47.0
    spacings.sort()
    return spacings[len(spacings) // 2]


def _classify_boxes(
    box_zones: List[Dict],
    content_width: float,
) -> List[Dict]:
    """Classify each box as full_width or partial_width."""
    result = []
    for bz in box_zones:
        bb = bz.get("bbox_px", {})
        bw = bb.get("w", 0)
        bx = bb.get("x", 0)

        if bw >= content_width * 0.85:
            classification = "full_width"
            side = "center"
        else:
            classification = "partial_width"
            # Determine which side of the page the box is on
            page_center = content_width / 2
            box_center = bx + bw / 2
            side = "right" if box_center > page_center else "left"

        # Count total text lines in box (including \n within cells)
        total_lines = sum(
            (c.get("text", "").count("\n") + 1)
            for c in bz.get("cells", [])
        )

        result.append({
            "zone": bz,
            "classification": classification,
            "side": side,
            "y_start": bb.get("y", 0),
            "y_end": bb.get("y", 0) + bb.get("h", 0),
            "total_lines": total_lines,
            "bg_hex": bz.get("box_bg_hex", ""),
            "bg_color": bz.get("box_bg_color", ""),
        })
    return result


def build_unified_grid(
    zones: List[Dict],
    image_width: int,
    image_height: int,
    layout_metrics: Dict,
) -> Dict[str, Any]:
    """Build a single-zone unified grid from multi-zone grid data.

    Returns a StructuredGrid with one zone containing all rows and cells.
    """
    content_zone = None
    box_zones = []
    for z in zones:
        if z.get("zone_type") == "content":
            content_zone = z
        elif z.get("zone_type") == "box":
            box_zones.append(z)

    if not content_zone:
        logger.warning("build_unified_grid: no content zone found")
        return {"zones": zones}  # fallback: return as-is

    box_zones.sort(key=lambda b: b.get("bbox_px", {}).get("y", 0))

    dominant_h = _compute_dominant_row_height(content_zone)
    content_bbox = content_zone.get("bbox_px", {})
    content_width = content_bbox.get("w", image_width)
    content_x = content_bbox.get("x", 0)
    content_cols = content_zone.get("columns", [])
    num_cols = len(content_cols)

    box_infos = _classify_boxes(box_zones, content_width)

    logger.info(
        "build_unified_grid: dominant_h=%.1f, %d content rows, %d boxes (%s)",
        dominant_h, len(content_zone.get("rows", [])), len(box_infos),
        [b["classification"] for b in box_infos],
    )

    # --- Build unified row list + cell list ---
    unified_rows: List[Dict] = []
    unified_cells: List[Dict] = []
    unified_row_idx = 0

    # Content rows and cells indexed by row_index
    content_rows = content_zone.get("rows", [])
    content_cells = content_zone.get("cells", [])
    content_cells_by_row: Dict[int, List[Dict]] = {}
    for c in content_cells:
        content_cells_by_row.setdefault(c.get("row_index", -1), []).append(c)

    # Track which content rows we've processed
    content_row_ptr = 0

    for bi, box_info in enumerate(box_infos):
        bz = box_info["zone"]
        by_start = box_info["y_start"]
        by_end = box_info["y_end"]

        # --- Add content rows ABOVE this box ---
        while content_row_ptr < len(content_rows):
            cr = content_rows[content_row_ptr]
            cry = cr.get("y_min_px", cr.get("y_min", 0))
            if cry >= by_start:
                break
            # Add this content row
            _add_content_row(
                unified_rows, unified_cells, unified_row_idx,
                cr, content_cells_by_row, dominant_h, image_height,
            )
            unified_row_idx += 1
            content_row_ptr += 1

        # --- Add box rows ---
        if box_info["classification"] == "full_width":
            # Full-width box: integrate box rows directly
            _add_full_width_box(
                unified_rows, unified_cells, unified_row_idx,
                bz, box_info, dominant_h, num_cols, image_height,
            )
            unified_row_idx += len(bz.get("rows", []))
            # Skip content rows that overlap with this box
            while content_row_ptr < len(content_rows):
                cr = content_rows[content_row_ptr]
                cry = cr.get("y_min_px", cr.get("y_min", 0))
                if cry > by_end:
                    break
                content_row_ptr += 1

        else:
            # Partial-width box: merge with adjacent content rows
            unified_row_idx = _add_partial_width_box(
                unified_rows, unified_cells, unified_row_idx,
                bz, box_info, content_rows, content_cells_by_row,
                content_row_ptr, dominant_h, num_cols, image_height,
                content_x, content_width,
            )
            # Advance content pointer past box region
            while content_row_ptr < len(content_rows):
                cr = content_rows[content_row_ptr]
                cry = cr.get("y_min_px", cr.get("y_min", 0))
                if cry > by_end:
                    break
                content_row_ptr += 1

    # --- Add remaining content rows BELOW all boxes ---
    while content_row_ptr < len(content_rows):
        cr = content_rows[content_row_ptr]
        _add_content_row(
            unified_rows, unified_cells, unified_row_idx,
            cr, content_cells_by_row, dominant_h, image_height,
        )
        unified_row_idx += 1
        content_row_ptr += 1

    # --- Build unified zone ---
    unified_zone = {
        "zone_index": 0,
        "zone_type": "unified",
        "bbox_px": content_bbox,
        "bbox_pct": content_zone.get("bbox_pct", {}),
        "border": None,
        "word_count": sum(len(c.get("word_boxes", [])) for c in unified_cells),
        "columns": content_cols,
        "rows": unified_rows,
        "cells": unified_cells,
        "header_rows": [],
    }

    logger.info(
        "build_unified_grid: %d unified rows, %d cells (from %d content + %d box zones)",
        len(unified_rows), len(unified_cells),
        len(content_rows), len(box_zones),
    )

    return {
        "zones": [unified_zone],
        "image_width": image_width,
        "image_height": image_height,
        "layout_metrics": layout_metrics,
        "summary": {
            "total_zones": 1,
            "total_columns": num_cols,
            "total_rows": len(unified_rows),
            "total_cells": len(unified_cells),
        },
        "is_unified": True,
        "dominant_row_h": dominant_h,
    }


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _make_row(idx: int, y: float, h: float, img_h: int, is_header: bool = False) -> Dict:
    return {
        "index": idx,
        "row_index": idx,
        "y_min_px": round(y),
        "y_max_px": round(y + h),
        "y_min_pct": round(y / img_h * 100, 2) if img_h else 0,
        "y_max_pct": round((y + h) / img_h * 100, 2) if img_h else 0,
        "is_header": is_header,
    }


def _remap_cell(cell: Dict, new_row: int, new_col: Optional[int] = None,
                source_type: str = "content", box_region: Optional[Dict] = None) -> Dict:
    """Create a new cell dict with remapped indices."""
    c = dict(cell)
    c["row_index"] = new_row
    if new_col is not None:
        c["col_index"] = new_col
    c["cell_id"] = f"U_R{new_row:02d}_C{c.get('col_index', 0)}"
    c["source_zone_type"] = source_type
    if box_region:
        c["box_region"] = box_region
    return c


def _add_content_row(
    unified_rows, unified_cells, row_idx,
    content_row, cells_by_row, dominant_h, img_h,
):
    """Add a single content row to the unified grid."""
    y = content_row.get("y_min_px", content_row.get("y_min", 0))
    is_hdr = content_row.get("is_header", False)
    unified_rows.append(_make_row(row_idx, y, dominant_h, img_h, is_hdr))

    for cell in cells_by_row.get(content_row.get("index", -1), []):
        unified_cells.append(_remap_cell(cell, row_idx, source_type="content"))


def _add_full_width_box(
    unified_rows, unified_cells, start_row_idx,
    box_zone, box_info, dominant_h, num_cols, img_h,
):
    """Add a full-width box's rows to the unified grid."""
    box_rows = box_zone.get("rows", [])
    box_cells = box_zone.get("cells", [])
    box_region = {"bg_hex": box_info["bg_hex"], "bg_color": box_info["bg_color"], "border": True}

    # Distribute box height evenly among its rows
    box_h = box_info["y_end"] - box_info["y_start"]
    row_h = box_h / len(box_rows) if box_rows else dominant_h

    for i, br in enumerate(box_rows):
        y = box_info["y_start"] + i * row_h
        new_idx = start_row_idx + i
        is_hdr = br.get("is_header", False)
        unified_rows.append(_make_row(new_idx, y, row_h, img_h, is_hdr))

        for cell in box_cells:
            if cell.get("row_index") == br.get("index", i):
                unified_cells.append(
                    _remap_cell(cell, new_idx, source_type="box", box_region=box_region)
                )


def _add_partial_width_box(
    unified_rows, unified_cells, start_row_idx,
    box_zone, box_info, content_rows, content_cells_by_row,
    content_row_ptr, dominant_h, num_cols, img_h,
    content_x, content_width,
) -> int:
    """Add a partial-width box merged with content rows.

    Returns the next unified_row_idx after processing.
    """
    by_start = box_info["y_start"]
    by_end = box_info["y_end"]
    box_h = by_end - by_start
    box_region = {"bg_hex": box_info["bg_hex"], "bg_color": box_info["bg_color"], "border": True}

    # Content rows in the box's Y range
    overlap_content_rows = []
    ptr = content_row_ptr
    while ptr < len(content_rows):
        cr = content_rows[ptr]
        cry = cr.get("y_min_px", cr.get("y_min", 0))
        if cry > by_end:
            break
        if cry >= by_start:
            overlap_content_rows.append(cr)
        ptr += 1

    # How many standard rows fit in the box height
    standard_rows = max(1, math.floor(box_h / dominant_h))
    # How many text lines the box actually has
    box_text_lines = box_info["total_lines"]
    # Extra rows needed
    extra_rows = max(0, box_text_lines - standard_rows)
    total_rows_for_region = standard_rows + extra_rows

    logger.info(
        "partial box: standard=%d, box_lines=%d, extra=%d, content_overlap=%d",
        standard_rows, box_text_lines, extra_rows, len(overlap_content_rows),
    )

    # Map the box onto content columns. Simple heuristic: a box on the
    # right side occupies the right half of the columns, a box on the
    # left side the left half.
    if box_info["side"] == "right":
        box_col_start = num_cols // 2
        box_col_end = num_cols
    else:
        box_col_start = 0
        box_col_end = num_cols // 2

    # Build rows for this region
    box_cells = box_zone.get("cells", [])
    row_idx = start_row_idx

    # Expand box cell texts with \n into individual lines for row mapping
    box_lines: List[Tuple[str, Dict]] = []  # (text_line, parent_cell)
    for bc in sorted(box_cells, key=lambda c: c.get("row_index", 0)):
        text = bc.get("text", "")
        for line in text.split("\n"):
            box_lines.append((line.strip(), bc))

    for i in range(total_rows_for_region):
        y = by_start + i * dominant_h
        unified_rows.append(_make_row(row_idx, y, dominant_h, img_h))

        # Content cells for this row (from overlapping content rows)
        if i < len(overlap_content_rows):
            cr = overlap_content_rows[i]
            for cell in content_cells_by_row.get(cr.get("index", -1), []):
                # Only include cells from columns NOT covered by the box
                ci = cell.get("col_index", 0)
                if ci < box_col_start or ci >= box_col_end:
                    unified_cells.append(_remap_cell(cell, row_idx, source_type="content"))

        # Box cell for this row
        if i < len(box_lines):
            line_text, parent_cell = box_lines[i]
            box_cell = {
                "cell_id": f"U_R{row_idx:02d}_C{box_col_start}",
                "row_index": row_idx,
                "col_index": box_col_start,
                "col_type": "spanning_header" if (box_col_end - box_col_start) > 1 else parent_cell.get("col_type", "column_1"),
                "colspan": box_col_end - box_col_start,
                "text": line_text,
                "confidence": parent_cell.get("confidence", 0),
                "bbox_px": parent_cell.get("bbox_px", {}),
                "bbox_pct": parent_cell.get("bbox_pct", {}),
                "word_boxes": [],
                "ocr_engine": parent_cell.get("ocr_engine", ""),
                "is_bold": parent_cell.get("is_bold", False),
                "source_zone_type": "box",
                "box_region": box_region,
            }
            unified_cells.append(box_cell)

        row_idx += 1

    return row_idx

klausur-service/backend/vocab_learn_bridge.py (new file, 196 lines)
@@ -0,0 +1,196 @@
"""
Vocab Learn Bridge — Converts vocabulary session data into Learning Units.

Bridges klausur-service (vocab extraction) with backend-lehrer (learning units + generators).
Creates a Learning Unit in backend-lehrer, then triggers MC/Cloze/QA generation.

PRIVACY (DATENSCHUTZ): All communication stays within the Docker network (breakpilot-network).
"""

import os
import json
import logging
import httpx
from typing import List, Dict, Any, Optional

logger = logging.getLogger(__name__)

BACKEND_LEHRER_URL = os.getenv("BACKEND_LEHRER_URL", "http://backend-lehrer:8001")


def vocab_to_analysis_data(session_name: str, vocabulary: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Convert vocabulary entries from a vocab session into the analysis_data format
    expected by backend-lehrer generators (MC, Cloze, QA).

    The generators consume:
    - title: Display name
    - subject: Subject area
    - grade_level: Target grade
    - canonical_text: Full text representation
    - printed_blocks: Individual text blocks
    - vocabulary: Original vocab data (for vocab-specific modules)
    """
    canonical_lines = []
    printed_blocks = []

    for v in vocabulary:
        en = v.get("english", "").strip()
        de = v.get("german", "").strip()
        example = v.get("example_sentence", "").strip()

        if not en and not de:
            continue

        line = f"{en} = {de}"
        if example:
            line += f" ({example})"
        canonical_lines.append(line)

        block_text = f"{en} — {de}"
        if example:
            block_text += f" | {example}"
        printed_blocks.append({"text": block_text})

    return {
        "title": session_name,
        "subject": "English Vocabulary",
        "grade_level": "5-8",
        "canonical_text": "\n".join(canonical_lines),
        "printed_blocks": printed_blocks,
        "vocabulary": vocabulary,
    }
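For a quick sanity check of this mapping, a minimal inline copy of the conversion logic (`to_analysis` is a stand-in for `vocab_to_analysis_data`, with the dash in the block text simplified to ASCII) shows how one entry becomes a canonical line and a printed block:

```python
def to_analysis(session_name, vocabulary):
    # Mirrors vocab_to_analysis_data: skip empty pairs, build "en = de (example)" lines.
    canonical, blocks = [], []
    for v in vocabulary:
        en = v.get("english", "").strip()
        de = v.get("german", "").strip()
        ex = v.get("example_sentence", "").strip()
        if not en and not de:
            continue
        canonical.append(f"{en} = {de}" + (f" ({ex})" if ex else ""))
        blocks.append({"text": f"{en} - {de}" + (f" | {ex}" if ex else "")})
    return {"title": session_name, "canonical_text": "\n".join(canonical), "printed_blocks": blocks}


data = to_analysis("Unit 3", [
    {"english": "dog", "german": "Hund", "example_sentence": "The dog barks."},
    {"english": "", "german": "", "example_sentence": "stray row"},  # skipped: no en/de
])
```

Rows with neither an English nor a German side are dropped, so OCR noise rows never reach the generators.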


async def create_learning_unit(
    session_name: str,
    vocabulary: List[Dict[str, Any]],
    grade: Optional[str] = None,
) -> Dict[str, Any]:
    """
    Create a Learning Unit in backend-lehrer from vocabulary data.

    Steps:
    1. Create unit via POST /api/learning-units/
    2. Return the created unit info

    Returns dict with unit_id, status, vocabulary_count.
    """
    if not vocabulary:
        raise ValueError("No vocabulary entries provided")

    analysis_data = vocab_to_analysis_data(session_name, vocabulary)

    async with httpx.AsyncClient(timeout=30.0) as client:
        # 1. Create Learning Unit
        create_payload = {
            "title": session_name,
            "subject": "Englisch",
            "grade": grade or "5-8",
        }

        try:
            resp = await client.post(
                f"{BACKEND_LEHRER_URL}/api/learning-units/",
                json=create_payload,
            )
            resp.raise_for_status()
            unit = resp.json()
        except httpx.HTTPError as e:
            logger.error(f"Failed to create learning unit: {e}")
            raise RuntimeError(f"Backend-Lehrer nicht erreichbar: {e}")

        unit_id = unit.get("id")
        if not unit_id:
            raise RuntimeError("Learning Unit created but no ID returned")

        logger.info(f"Created learning unit {unit_id} with {len(vocabulary)} vocabulary entries")

    # 2. Save analysis_data as JSON file for generators
    analysis_dir = os.path.expanduser("~/Arbeitsblaetter/Lerneinheiten")
    os.makedirs(analysis_dir, exist_ok=True)
    analysis_path = os.path.join(analysis_dir, f"{unit_id}_analyse.json")

    with open(analysis_path, "w", encoding="utf-8") as f:
        json.dump(analysis_data, f, ensure_ascii=False, indent=2)

    logger.info(f"Saved analysis data to {analysis_path}")

    return {
        "unit_id": unit_id,
        "unit": unit,
        "analysis_path": analysis_path,
        "vocabulary_count": len(vocabulary),
        "status": "created",
    }
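The save step (expanduser, makedirs, UTF-8 JSON dump with `ensure_ascii=False`) can be exercised in isolation. This sketch substitutes a temp directory for `~/Arbeitsblaetter/Lerneinheiten` and a made-up `unit_id`, but keeps the bridge's `<unit_id>_analyse.json` naming scheme:

```python
import json
import os
import tempfile

unit_id = "demo-unit"  # hypothetical; the real ID comes from backend-lehrer
analysis_data = {"title": "Unit 5", "vocabulary": [{"english": "cat", "german": "Katze"}]}

with tempfile.TemporaryDirectory() as analysis_dir:
    # Same naming scheme as the bridge: <unit_id>_analyse.json
    path = os.path.join(analysis_dir, f"{unit_id}_analyse.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(analysis_data, f, ensure_ascii=False, indent=2)
    # Round-trip to confirm the generators would read back the same structure
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```

`ensure_ascii=False` matters here: German umlauts survive verbatim instead of being escaped.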


async def generate_learning_modules(
    unit_id: str,
    analysis_path: str,
) -> Dict[str, Any]:
    """
    Trigger MC, Cloze, and QA generation from analysis data.

    Imports generators directly (they run in-process for klausur-service)
    or calls backend-lehrer API if generators aren't available locally.

    Returns dict with generation results.
    """
    results = {
        "unit_id": unit_id,
        "mc": {"status": "pending"},
        "cloze": {"status": "pending"},
        "qa": {"status": "pending"},
    }

    # Load analysis data
    with open(analysis_path, "r", encoding="utf-8") as f:
        analysis_data = json.load(f)

    # Try to generate via backend-lehrer API
    async with httpx.AsyncClient(timeout=120.0) as client:
        # Generate QA (includes Leitner fields)
        try:
            resp = await client.post(
                f"{BACKEND_LEHRER_URL}/api/learning-units/{unit_id}/generate-qa",
                json={"analysis_data": analysis_data, "num_questions": min(len(analysis_data.get("vocabulary", [])), 20)},
            )
            if resp.status_code == 200:
                results["qa"] = {"status": "generated", "data": resp.json()}
            else:
                logger.warning(f"QA generation returned {resp.status_code}")
                results["qa"] = {"status": "skipped", "reason": f"HTTP {resp.status_code}"}
        except Exception as e:
            logger.warning(f"QA generation failed: {e}")
            results["qa"] = {"status": "error", "reason": str(e)}

        # Generate MC
        try:
            resp = await client.post(
                f"{BACKEND_LEHRER_URL}/api/learning-units/{unit_id}/generate-mc",
                json={"analysis_data": analysis_data, "num_questions": min(len(analysis_data.get("vocabulary", [])), 10)},
            )
            if resp.status_code == 200:
                results["mc"] = {"status": "generated", "data": resp.json()}
            else:
                results["mc"] = {"status": "skipped", "reason": f"HTTP {resp.status_code}"}
        except Exception as e:
            logger.warning(f"MC generation failed: {e}")
            results["mc"] = {"status": "error", "reason": str(e)}

        # Generate Cloze
        try:
            resp = await client.post(
                f"{BACKEND_LEHRER_URL}/api/learning-units/{unit_id}/generate-cloze",
                json={"analysis_data": analysis_data},
            )
            if resp.status_code == 200:
                results["cloze"] = {"status": "generated", "data": resp.json()}
            else:
                results["cloze"] = {"status": "skipped", "reason": f"HTTP {resp.status_code}"}
        except Exception as e:
            logger.warning(f"Cloze generation failed: {e}")
            results["cloze"] = {"status": "error", "reason": str(e)}

    return results
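The `results` dict keeps one independent status per generator, so a partial failure never loses the successful modules. A small helper (hypothetical, not part of the bridge) shows how a caller might report which modules actually landed:

```python
def summarize_generation(results):
    # Collect the status of each generator sub-result (mc/cloze/qa),
    # ignoring scalar fields like unit_id.
    statuses = {
        name: sub["status"]
        for name, sub in results.items()
        if isinstance(sub, dict) and "status" in sub
    }
    generated = sorted(n for n, s in statuses.items() if s == "generated")
    failed = sorted(n for n, s in statuses.items() if s in ("error", "skipped"))
    return {"generated": generated, "failed": failed}


summary = summarize_generation({
    "unit_id": "u1",
    "mc": {"status": "generated", "data": {}},
    "cloze": {"status": "skipped", "reason": "HTTP 404"},
    "qa": {"status": "error", "reason": "timeout"},
})
```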
@@ -77,6 +77,11 @@ try:
     render_pdf_high_res,
     PageRegion, RowGeometry,
 )
+from cv_cell_grid import (
+    _merge_wrapped_rows,
+    _merge_phonetic_continuation_rows,
+    _merge_continuation_rows,
+)
 from ocr_pipeline_session_store import (
     create_session_db as create_pipeline_session_db,
     update_session_db as update_pipeline_session_db,
@@ -1283,12 +1288,18 @@ async def get_pdf_page_image(session_id: str, page_number: int, zoom: float = Qu
 async def process_single_page(
     session_id: str,
     page_number: int,
+    ipa_mode: str = Query("none", pattern="^(auto|all|de|en|none)$"),
+    syllable_mode: str = Query("none", pattern="^(auto|all|de|en|none)$"),
 ):
     """
-    Process a SINGLE page of an uploaded PDF using the OCR pipeline.
+    Process a SINGLE page of an uploaded PDF using the Kombi OCR pipeline.
 
-    Uses the multi-step CV pipeline (deskew → dewarp → columns → rows → words)
-    instead of LLM vision for much better extraction quality.
+    Uses the full Kombi pipeline (orientation → deskew → dewarp → crop →
+    dual-engine OCR → grid-build with autocorrect/merge) for best quality.
+
+    Query params:
+      ipa_mode: "none" (default), "auto", "all", "en", "de"
+      syllable_mode: "none" (default), "auto", "all", "en", "de"
 
     The frontend should call this sequentially for each page.
     Returns the vocabulary for just this one page.
@@ -1296,7 +1307,10 @@ async def process_single_page(
     logger.info(f"Processing SINGLE page {page_number + 1} for session {session_id}")
 
     if session_id not in _sessions:
-        raise HTTPException(status_code=404, detail="Session not found")
+        raise HTTPException(
+            status_code=404,
+            detail="Session nicht im Speicher. Bitte erstellen Sie eine neue Session und laden Sie das PDF erneut hoch.",
+        )
 
     session = _sessions[session_id]
     pdf_data = session.get("pdf_data")
@@ -1316,6 +1330,7 @@ async def process_single_page(
         img_bgr = render_pdf_high_res(pdf_data, page_number, zoom=3.0)
         page_vocabulary, rotation_deg = await _run_ocr_pipeline_for_page(
             img_bgr, page_number, session_id,
+            ipa_mode=ipa_mode, syllable_mode=syllable_mode,
         )
     except Exception as e:
         logger.error(f"OCR pipeline failed for page {page_number + 1}: {e}", exc_info=True)
@@ -1384,28 +1399,33 @@ async def _run_ocr_pipeline_for_page(
     img_bgr: np.ndarray,
     page_number: int,
     vocab_session_id: str,
+    *,
+    ipa_mode: str = "none",
+    syllable_mode: str = "none",
 ) -> tuple:
-    """Run the full OCR pipeline on a single page image and return vocab entries.
+    """Run the full Kombi OCR pipeline on a single page and return vocab entries.
 
-    Uses the same pipeline as the admin OCR pipeline (ocr_pipeline_api.py).
+    Uses the same pipeline as the admin OCR Kombi pipeline:
+    orientation → deskew → dewarp → crop → dual-engine OCR → grid-build
+    (with pipe-autocorrect, word-gap merge, dictionary detection, etc.)
 
     Args:
-        img_bgr: BGR numpy array (from render_pdf_high_res, same as admin pipeline).
+        img_bgr: BGR numpy array.
         page_number: 0-indexed page number.
         vocab_session_id: Vocab session ID for logging.
+        ipa_mode: "none" (default for worksheets), "auto", "all", "en", "de".
+        syllable_mode: "none" (default for worksheets), "auto", "all", "en", "de".
 
-    Steps: deskew → dewarp → columns → rows → words → (LLM review)
+    Returns (entries, rotation_deg) where entries is a list of dicts and
+    rotation_deg is the orientation correction applied (0, 90, 180, 270).
     """
     import time as _time
 
     t_total = _time.time()
 
     img_h, img_w = img_bgr.shape[:2]
-    logger.info(f"OCR Pipeline page {page_number + 1}: image {img_w}x{img_h}")
+    logger.info(f"Kombi Pipeline page {page_number + 1}: image {img_w}x{img_h}")
 
-    # 1b. Orientation detection (fix upside-down scans)
+    # 1. Orientation detection (fix upside-down scans)
     t0 = _time.time()
     img_bgr, rotation = detect_and_fix_orientation(img_bgr)
     if rotation:
@@ -1414,7 +1434,7 @@ async def _run_ocr_pipeline_for_page(
     else:
         logger.info(f" orientation: OK ({_time.time() - t0:.1f}s)")
 
-    # 2. Create pipeline session in DB (for debugging in admin UI)
+    # 2. Create pipeline session in DB (visible in admin Kombi UI)
     pipeline_session_id = str(uuid.uuid4())
     try:
         _, png_buf = cv2.imencode(".png", img_bgr)
@@ -1428,155 +1448,299 @@ async def _run_ocr_pipeline_for_page(
     except Exception as e:
         logger.warning(f"Could not create pipeline session in DB: {e}")
 
-    # 3. Three-pass deskew: iterative + word-alignment + text-line regression
+    # 3. Three-pass deskew
     t0 = _time.time()
     deskewed_bgr, angle_applied, deskew_debug = deskew_two_pass(img_bgr.copy())
-    angle_pass1 = deskew_debug.get("pass1_angle", 0.0)
-    angle_pass2 = deskew_debug.get("pass2_angle", 0.0)
-    angle_pass3 = deskew_debug.get("pass3_angle", 0.0)
 
-    logger.info(f" deskew: p1={angle_pass1:.2f} p2={angle_pass2:.2f} "
-                f"p3={angle_pass3:.2f} total={angle_applied:.2f} "
-                f"({_time.time() - t0:.1f}s)")
+    logger.info(f" deskew: angle={angle_applied:.2f} ({_time.time() - t0:.1f}s)")
 
     # 4. Dewarp
     t0 = _time.time()
     dewarped_bgr, dewarp_info = dewarp_image(deskewed_bgr)
     logger.info(f" dewarp: shear={dewarp_info['shear_degrees']:.3f} ({_time.time() - t0:.1f}s)")
 
-    # 5. Column detection
+    # 5. Content crop (removes scanner borders, gutter shadows)
     t0 = _time.time()
-    ocr_img = create_ocr_image(dewarped_bgr)
-    h, w = ocr_img.shape[:2]
-
-    geo_result = detect_column_geometry(ocr_img, dewarped_bgr)
-    if geo_result is None:
-        layout_img = create_layout_image(dewarped_bgr)
-        regions = analyze_layout(layout_img, ocr_img)
-        word_dicts = None
-        inv = None
-        content_bounds = None
-    else:
-        geometries, left_x, right_x, top_y, bottom_y, word_dicts, inv = geo_result
-        content_w = right_x - left_x
-        header_y, footer_y = _detect_header_footer_gaps(inv, w, h) if inv is not None else (None, None)
-        geometries = _detect_sub_columns(geometries, content_w, left_x=left_x,
-                                         top_y=top_y, header_y=header_y, footer_y=footer_y)
-        geometries = _split_broad_columns(geometries, content_w, left_x=left_x)
-        geometries = expand_narrow_columns(geometries, content_w, left_x, word_dicts)
-        content_h = bottom_y - top_y
-        regions = positional_column_regions(geometries, content_w, content_h, left_x)
-        content_bounds = (left_x, right_x, top_y, bottom_y)
-
-    logger.info(f" columns: {len(regions)} detected ({_time.time() - t0:.1f}s)")
-
-    # 6. Row detection
-    t0 = _time.time()
-    if word_dicts is None or inv is None or content_bounds is None:
-        # Re-run geometry detection to get intermediates
-        geo_result2 = detect_column_geometry(ocr_img, dewarped_bgr)
-        if geo_result2 is None:
-            raise ValueError("Column geometry detection failed — cannot detect rows")
-        _, left_x, right_x, top_y, bottom_y, word_dicts, inv = geo_result2
-        content_bounds = (left_x, right_x, top_y, bottom_y)
-
-    left_x, right_x, top_y, bottom_y = content_bounds
-    rows = detect_row_geometry(inv, word_dicts, left_x, right_x, top_y, bottom_y)
-    logger.info(f" rows: {len(rows)} detected ({_time.time() - t0:.1f}s)")
-
-    # 7. Word recognition (cell-first OCR v2)
-    t0 = _time.time()
-    col_regions = regions  # already PageRegion objects
-
-    # Populate row.words for word_count filtering
-    for row in rows:
-        row_y_rel = row.y - top_y
-        row_bottom_rel = row_y_rel + row.height
-        row.words = [
-            wd for wd in word_dicts
-            if row_y_rel <= wd['top'] + wd['height'] / 2 < row_bottom_rel
-        ]
-        row.word_count = len(row.words)
-
-    cells, columns_meta = build_cell_grid_v2(
-        ocr_img, col_regions, rows, img_w, img_h,
-        ocr_engine="auto", img_bgr=dewarped_bgr,
-    )
-
-    col_types = {c['type'] for c in columns_meta}
-    is_vocab = bool(col_types & {'column_en', 'column_de'})
-    logger.info(f" words: {len(cells)} cells, vocab={is_vocab} ({_time.time() - t0:.1f}s)")
-
-    if not is_vocab:
-        logger.warning(f" Page {page_number + 1}: layout is not vocab table "
-                       f"(types: {col_types}), returning empty")
-        return [], rotation
-
-    # 8. Map cells → vocab entries
-    entries = _cells_to_vocab_entries(cells, columns_meta)
-    entries = _fix_phonetic_brackets(entries, pronunciation="british")
-
-    # 9. Optional LLM review
     try:
-        review_result = await llm_review_entries(entries)
-        if review_result and review_result.get("changes"):
-            # Apply corrections
-            changes_map = {}
-            for ch in review_result["changes"]:
-                idx = ch.get("index")
-                if idx is not None:
-                    changes_map[idx] = ch
-            for idx, ch in changes_map.items():
-                if 0 <= idx < len(entries):
-                    for field in ("english", "german", "example"):
-                        if ch.get(field) and ch[field] != entries[idx].get(field):
-                            entries[idx][field] = ch[field]
-            logger.info(f" llm review: {len(review_result['changes'])} corrections applied")
+        from page_crop import detect_and_crop_page
+        cropped_bgr, crop_result = detect_and_crop_page(dewarped_bgr)
+        if crop_result.get("crop_applied"):
+            dewarped_bgr = cropped_bgr
+            logger.info(f" crop: applied ({_time.time() - t0:.1f}s)")
+        else:
+            logger.info(f" crop: skipped ({_time.time() - t0:.1f}s)")
     except Exception as e:
-        logger.warning(f" llm review skipped: {e}")
+        logger.warning(f" crop: failed ({e}), continuing with uncropped image")
 
-    # 10. Map to frontend format
-    page_vocabulary = []
-    for entry in entries:
-        if not entry.get("english") and not entry.get("german"):
-            continue  # skip empty rows
-        page_vocabulary.append({
-            "id": str(uuid.uuid4()),
-            "english": entry.get("english", ""),
-            "german": entry.get("german", ""),
-            "example_sentence": entry.get("example", ""),
-            "source_page": page_number + 1,
+    # 6. Dual-engine OCR (RapidOCR + Tesseract → merge)
+    t0 = _time.time()
+    img_h, img_w = dewarped_bgr.shape[:2]
+
+    # RapidOCR (local ONNX)
+    try:
+        from cv_ocr_engines import ocr_region_rapid
+        from cv_vocab_types import PageRegion
+        full_region = PageRegion(type="full_page", x=0, y=0, width=img_w, height=img_h)
+        rapid_words = ocr_region_rapid(dewarped_bgr, full_region) or []
+    except Exception as e:
+        logger.warning(f" RapidOCR failed: {e}")
+        rapid_words = []
+
+    # Tesseract
+    from PIL import Image
+    import pytesseract
+    pil_img = Image.fromarray(cv2.cvtColor(dewarped_bgr, cv2.COLOR_BGR2RGB))
+    data = pytesseract.image_to_data(
+        pil_img, lang="eng+deu", config="--psm 6 --oem 3",
+        output_type=pytesseract.Output.DICT,
+    )
+    tess_words = []
+    for i in range(len(data["text"])):
+        text = str(data["text"][i]).strip()
+        conf_raw = str(data["conf"][i])
+        conf = int(conf_raw) if conf_raw.lstrip("-").isdigit() else -1
+        if not text or conf < 20:
+            continue
+        tess_words.append({
+            "text": text,
+            "left": data["left"][i], "top": data["top"][i],
+            "width": data["width"][i], "height": data["height"][i],
+            "conf": conf,
+        })
 
-    # 11. Update pipeline session in DB (for admin debugging)
-    try:
-        success_dsk, dsk_buf = cv2.imencode(".png", deskewed_bgr)
-        deskewed_png = dsk_buf.tobytes() if success_dsk else None
-        success_dwp, dwp_buf = cv2.imencode(".png", dewarped_bgr)
-        dewarped_png = dwp_buf.tobytes() if success_dwp else None
+    # Merge dual-engine results
+    from ocr_pipeline_ocr_merge import _split_paddle_multi_words, _merge_paddle_tesseract, _deduplicate_words
+    from cv_words_first import build_grid_from_words
+
+    rapid_split = _split_paddle_multi_words(rapid_words) if rapid_words else []
+    if rapid_split or tess_words:
+        merged_words = _merge_paddle_tesseract(rapid_split, tess_words)
+        merged_words = _deduplicate_words(merged_words)
+    else:
+        merged_words = tess_words  # fallback to Tesseract only
+
+    # Build initial grid from merged words
+    cells, columns_meta = build_grid_from_words(merged_words, img_w, img_h)
+    for cell in cells:
+        cell["ocr_engine"] = "rapid_kombi"
+
+    n_rows = len(set(c["row_index"] for c in cells)) if cells else 0
+    n_cols = len(columns_meta)
+    logger.info(f" ocr: rapid={len(rapid_words)}, tess={len(tess_words)}, "
+                f"merged={len(merged_words)}, cells={len(cells)} ({_time.time() - t0:.1f}s)")
+
+    # 7. Save word_result to pipeline session (needed by _build_grid_core)
+    word_result = {
+        "cells": cells,
+        "grid_shape": {"rows": n_rows, "cols": n_cols, "total_cells": len(cells)},
+        "columns_used": columns_meta,
+        "layout": "vocab" if {c.get("type") for c in columns_meta} & {"column_en", "column_de"} else "generic",
+        "image_width": img_w,
+        "image_height": img_h,
+        "duration_seconds": 0,
+        "ocr_engine": "rapid_kombi",
+        "raw_tesseract_words": tess_words,
+        "summary": {
+            "total_cells": len(cells),
+            "non_empty_cells": sum(1 for c in cells if c.get("text")),
+        },
+    }
+
+    # Save images + word_result to pipeline session for admin visibility
+    try:
+        _, dsk_buf = cv2.imencode(".png", deskewed_bgr)
+        _, dwp_buf = cv2.imencode(".png", dewarped_bgr)
         await update_pipeline_session_db(
             pipeline_session_id,
-            deskewed_png=deskewed_png,
-            dewarped_png=dewarped_png,
+            deskewed_png=dsk_buf.tobytes(),
+            dewarped_png=dwp_buf.tobytes(),
+            cropped_png=cv2.imencode(".png", dewarped_bgr)[1].tobytes(),
+            word_result=word_result,
             deskew_result={"angle_applied": round(angle_applied, 3)},
             dewarp_result={"shear_degrees": dewarp_info.get("shear_degrees", 0)},
-            column_result={"columns": [{"type": r.type, "x": r.x, "y": r.y,
-                                        "width": r.width, "height": r.height}
-                                       for r in col_regions]},
-            row_result={"total_rows": len(rows)},
-            word_result={
-                "entry_count": len(page_vocabulary),
-                "layout": "vocab",
-                "vocab_entries": entries,
-            },
-            current_step=6,
+            current_step=8,
         )
     except Exception as e:
         logger.warning(f"Could not update pipeline session: {e}")
 
+    # 8. Run full grid-build (with pipe-autocorrect, word-gap merge, etc.)
+    t0 = _time.time()
+    try:
+        from grid_editor_api import _build_grid_core
+        session_data = {
+            "word_result": word_result,
+        }
+        grid_result = await _build_grid_core(
+            pipeline_session_id, session_data,
+            ipa_mode=ipa_mode, syllable_mode=syllable_mode,
+        )
+        logger.info(f" grid-build: {grid_result.get('summary', {}).get('total_cells', 0)} cells "
+                    f"({_time.time() - t0:.1f}s)")
+
+        # Save grid result to pipeline session
+        try:
+            await update_pipeline_session_db(
+                pipeline_session_id,
+                grid_editor_result=grid_result,
+                current_step=11,
+            )
+        except Exception:
+            pass
+
+    except Exception as e:
+        logger.warning(f" grid-build failed: {e}, falling back to basic grid")
+        grid_result = None
+
+    # 9. Extract vocab entries
+    # Prefer grid-build result (better column detection, more cells) over
+    # the initial build_grid_from_words() which often under-clusters.
+    page_vocabulary = []
+    extraction_source = "none"
+
+    # A) Try grid-build zones first (best quality: 4-column detection, autocorrect)
+    if grid_result and grid_result.get("zones"):
+        for zone in grid_result["zones"]:
+            zone_cols = zone.get("columns", [])
+            zone_cells = zone.get("cells", [])
+            if not zone_cols or not zone_cells:
+                continue
+
+            # Sort columns by x position to determine roles
+            sorted_cols = sorted(zone_cols, key=lambda c: c.get("x_min_px", 0))
+            col_idx_to_pos = {}
+            for pos, col in enumerate(sorted_cols):
+                ci = col.get("col_index", col.get("index", -1))
+                col_idx_to_pos[ci] = pos
+
+            # Skip zones with only 1 column (likely headers/boxes)
+            if len(sorted_cols) < 2:
+                continue
+
+            # Group cells by row
+            rows_map: dict = {}
+            for cell in zone_cells:
+                ri = cell.get("row_index", 0)
+                if ri not in rows_map:
+                    rows_map[ri] = {}
+                ci = cell.get("col_index", 0)
+                rows_map[ri][ci] = (cell.get("text") or "").strip()
+
+            n_cols = len(sorted_cols)
+            for ri in sorted(rows_map.keys()):
+                row = rows_map[ri]
+                # Collect texts in column-position order
+                texts = []
+                for col in sorted_cols:
+                    ci = col.get("col_index", col.get("index", -1))
+                    texts.append(row.get(ci, ""))
+
+                if not any(texts):
+                    continue
+
+                # Map by position, skipping narrow first column (page refs/markers)
+                # Heuristic: if first column is very narrow (<15% of zone width),
+                # it's likely a marker/ref column — skip it for vocab
+                first_col_width = sorted_cols[0].get("x_max_px", 0) - sorted_cols[0].get("x_min_px", 0)
+                zone_width = max(1, (sorted_cols[-1].get("x_max_px", 0) - sorted_cols[0].get("x_min_px", 0)))
+                skip_first = first_col_width / zone_width < 0.15 and n_cols >= 3
+
+                data_texts = texts[1:] if skip_first else texts
+
+                entry = {
+                    "id": str(uuid.uuid4()),
+                    "english": data_texts[0] if len(data_texts) > 0 else "",
+                    "german": data_texts[1] if len(data_texts) > 1 else "",
+                    "example_sentence": " ".join(t for t in data_texts[2:] if t) if len(data_texts) > 2 else "",
+                    "source_page": page_number + 1,
+                }
+                if entry["english"] or entry["german"]:
+                    page_vocabulary.append(entry)
+
+        if page_vocabulary:
+            extraction_source = f"grid-zones ({len(grid_result['zones'])} zones)"
+
+    # B) Fallback: original cells with column classification
+    if not page_vocabulary:
+        col_types = {c.get("type") for c in columns_meta}
+        is_vocab = bool(col_types & {"column_en", "column_de"})
+
+        if is_vocab:
+            entries = _cells_to_vocab_entries(cells, columns_meta)
+            entries = _fix_phonetic_brackets(entries, pronunciation="british")
+            for entry in entries:
+                if not entry.get("english") and not entry.get("german"):
+                    continue
+                page_vocabulary.append({
+                    "id": str(uuid.uuid4()),
+                    "english": entry.get("english", ""),
+                    "german": entry.get("german", ""),
+                    "example_sentence": entry.get("example", ""),
+                    "source_page": page_number + 1,
+                })
+            extraction_source = f"classified ({len(columns_meta)} cols)"
+        else:
+            # Last resort: all cells by position
+            rows_map2: dict = {}
+            for cell in cells:
+                ri = cell.get("row_index", 0)
+                if ri not in rows_map2:
+                    rows_map2[ri] = {}
+                ci = cell.get("col_index", 0)
+                rows_map2[ri][ci] = (cell.get("text") or "").strip()
+            all_ci = sorted({ci for r in rows_map2.values() for ci in r.keys()})
+            for ri in sorted(rows_map2.keys()):
+                row = rows_map2[ri]
+                texts = [row.get(ci, "") for ci in all_ci]
+                if not any(texts):
+                    continue
+                page_vocabulary.append({
+                    "id": str(uuid.uuid4()),
+                    "english": texts[0] if len(texts) > 0 else "",
+                    "german": texts[1] if len(texts) > 1 else "",
+                    "example_sentence": " ".join(texts[2:]) if len(texts) > 2 else "",
+                    "source_page": page_number + 1,
+                })
+            extraction_source = f"generic ({len(all_ci)} cols)"
+
+    # --- Post-processing: merge cell-wrap continuation rows ---
+    if len(page_vocabulary) >= 2:
+        try:
+            # Convert to internal format (example_sentence → example)
+            internal = []
+            for v in page_vocabulary:
+                internal.append({
+                    'row_index': len(internal),
+                    'english': v.get('english', ''),
+                    'german': v.get('german', ''),
+                    'example': v.get('example_sentence', ''),
+                })
+
+            n_before = len(internal)
+            internal = _merge_wrapped_rows(internal)
+            internal = _merge_phonetic_continuation_rows(internal)
+            internal = _merge_continuation_rows(internal)
+
+            if len(internal) < n_before:
+                # Rebuild page_vocabulary from merged entries
+                merged_vocab = []
+                for entry in internal:
+                    if not entry.get('english') and not entry.get('german'):
+                        continue
+                    merged_vocab.append({
+                        'id': str(uuid.uuid4()),
+                        'english': entry.get('english', ''),
+                        'german': entry.get('german', ''),
+                        'example_sentence': entry.get('example', ''),
+                        'source_page': page_number + 1,
+                    })
+                logger.info(f" row merging: {n_before} → {len(merged_vocab)} entries")
+                page_vocabulary = merged_vocab
+        except Exception as e:
+            logger.warning(f" row merging failed (non-critical): {e}")
+
+    logger.info(f" vocab extraction: {len(page_vocabulary)} entries via {extraction_source}")
 
     total_duration = _time.time() - t_total
-    logger.info(f"OCR Pipeline page {page_number + 1}: "
+    logger.info(f"Kombi Pipeline page {page_number + 1}: "
                 f"{len(page_vocabulary)} vocab entries in {total_duration:.1f}s")
 
     return page_vocabulary, rotation
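The narrow-first-column heuristic in step A is pure arithmetic and can be pulled out for testing. This standalone version (the helper name is hypothetical; the pipeline computes it inline) mirrors the `< 0.15` width ratio and the `>= 3` column requirement:

```python
def should_skip_first_column(sorted_cols):
    # sorted_cols: column dicts ordered by x, each with x_min_px / x_max_px.
    # Skip the first column when it is a narrow marker/page-ref column:
    # less than 15% of the zone width, and at least 3 columns remain plausible.
    first_w = sorted_cols[0]["x_max_px"] - sorted_cols[0]["x_min_px"]
    zone_w = max(1, sorted_cols[-1]["x_max_px"] - sorted_cols[0]["x_min_px"])
    return first_w / zone_w < 0.15 and len(sorted_cols) >= 3


# A 50px marker column in a 900px zone, followed by two wide data columns:
cols = [
    {"x_min_px": 0, "x_max_px": 50},
    {"x_min_px": 60, "x_max_px": 400},
    {"x_min_px": 410, "x_max_px": 900},
]
```

With only the two wide columns (`cols[1:]`), the heuristic keeps all columns, since a two-column zone is already the minimal en/de layout.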
|
||||
@@ -2554,3 +2718,66 @@ async def load_ground_truth(session_id: str, page_number: int):
|
||||
gt_data = json.load(f)
|
||||
|
||||
return {"success": True, "entries": gt_data.get("entries", []), "source": "disk"}
|
||||
|
||||
|
||||
# ─── Learning Module Generation ─────────────────────────────────────────────
|
||||
|
||||
|
||||
class GenerateLearningUnitRequest(BaseModel):
    grade: Optional[str] = None
    generate_modules: bool = True


@router.post("/sessions/{session_id}/generate-learning-unit")
async def generate_learning_unit_endpoint(session_id: str, request: Optional[GenerateLearningUnitRequest] = None):
    """
    Create a Learning Unit from the vocabulary in this session.

    1. Takes vocabulary from the session
    2. Creates a Learning Unit in backend-lehrer
    3. Optionally triggers MC/Cloze/QA generation

    Returns the created unit info and generation status.
    """
    if request is None:
        request = GenerateLearningUnitRequest()

    if session_id not in _sessions:
        raise HTTPException(status_code=404, detail="Session not found")

    session = _sessions[session_id]
    vocabulary = session.get("vocabulary", [])

    if not vocabulary:
        raise HTTPException(status_code=400, detail="No vocabulary in this session")

    try:
        from vocab_learn_bridge import create_learning_unit, generate_learning_modules

        # Step 1: Create Learning Unit
        result = await create_learning_unit(
            session_name=session["name"],
            vocabulary=vocabulary,
            grade=request.grade,
        )

        # Step 2: Generate modules if requested
        if request.generate_modules:
            try:
                gen_result = await generate_learning_modules(
                    unit_id=result["unit_id"],
                    analysis_path=result["analysis_path"],
                )
                result["generation"] = gen_result
            except Exception as e:
                logger.warning(f"Module generation failed (unit created): {e}")
                result["generation"] = {"status": "error", "reason": str(e)}

        return result

    except ImportError:
        raise HTTPException(status_code=501, detail="vocab_learn_bridge module not available")
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except RuntimeError as e:
        raise HTTPException(status_code=502, detail=str(e))
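The endpoint above can be called from the studio frontend like any other session route. A minimal client sketch follows; only the route and the two request fields (`grade`, `generate_modules`) come from the endpoint code, while the function names and the `GenerateUnitRequest` interface are illustrative assumptions:

```typescript
// Hypothetical client for POST /sessions/{session_id}/generate-learning-unit.
// Only the route and request fields are taken from the endpoint above.
interface GenerateUnitRequest {
  grade?: string
  generate_modules?: boolean
}

function buildGenerateUnitUrl(base: string, sessionId: string): string {
  // Session IDs may contain characters that need escaping in a path segment.
  return `${base}/sessions/${encodeURIComponent(sessionId)}/generate-learning-unit`
}

async function generateLearningUnit(
  base: string,
  sessionId: string,
  req: GenerateUnitRequest = {}
): Promise<unknown> {
  const resp = await fetch(buildGenerateUnitUrl(base, sessionId), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  })
  // Per the endpoint: 404 = unknown session, 400 = empty vocabulary,
  // 501 = bridge module missing, 502 = bridge runtime failure.
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
  return resp.json()
}
```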
@@ -77,6 +77,7 @@ nav:
    - RAG Admin: services/klausur-service/RAG-Admin-Spec.md
    - Worksheet Editor: services/klausur-service/Worksheet-Editor-Architecture.md
    - Chunk-Browser: services/klausur-service/Chunk-Browser.md
    - RAG Landkarte: services/klausur-service/RAG-Landkarte.md
  - Voice-Service:
    - Uebersicht: services/voice-service/index.md
  - Agent-Core:

studio-v2/app/learn/[unitId]/flashcards/page.tsx (new file, 189 lines)
@@ -0,0 +1,189 @@
'use client'

import React, { useState, useEffect, useCallback } from 'react'
import { useParams, useRouter } from 'next/navigation'
import { useTheme } from '@/lib/ThemeContext'
import { FlashCard } from '@/components/learn/FlashCard'
import { AudioButton } from '@/components/learn/AudioButton'

interface QAItem {
  id: string
  question: string
  answer: string
  leitner_box: number
  correct_count: number
  incorrect_count: number
}

function getBackendUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8001'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8001'
  return `${protocol}//${hostname}:8001`
}

export default function FlashcardsPage() {
  const { unitId } = useParams<{ unitId: string }>()
  const router = useRouter()
  const { isDark } = useTheme()

  const [items, setItems] = useState<QAItem[]>([])
  const [currentIndex, setCurrentIndex] = useState(0)
  const [isLoading, setIsLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const [stats, setStats] = useState({ correct: 0, incorrect: 0 })
  const [isComplete, setIsComplete] = useState(false)

  const glassCard = isDark
    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
    : 'bg-white/80 backdrop-blur-xl border border-black/5'

  useEffect(() => {
    loadQA()
  }, [unitId])

  const loadQA = async () => {
    setIsLoading(true)
    try {
      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/qa`)
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
      const data = await resp.json()
      setItems(data.qa_items || [])
    } catch (err: any) {
      setError(err.message)
    } finally {
      setIsLoading(false)
    }
  }

  const handleAnswer = useCallback(async (correct: boolean) => {
    const item = items[currentIndex]
    if (!item) return

    // Update Leitner progress
    try {
      await fetch(
        `${getBackendUrl()}/api/learning-units/${unitId}/leitner/update?item_id=${item.id}&correct=${correct}`,
        { method: 'POST' }
      )
    } catch (err) {
      console.error('Leitner update failed:', err)
    }

    setStats((prev) => ({
      correct: prev.correct + (correct ? 1 : 0),
      incorrect: prev.incorrect + (correct ? 0 : 1),
    }))

    if (currentIndex + 1 >= items.length) {
      setIsComplete(true)
    } else {
      setCurrentIndex((i) => i + 1)
    }
  }, [items, currentIndex, unitId])

  if (isLoading) {
    return (
      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
        <div className={`w-8 h-8 border-4 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
      </div>
    )
  }

  if (error) {
    return (
      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
        <div className={`${glassCard} rounded-2xl p-8 text-center max-w-md`}>
          <p className={isDark ? 'text-red-300' : 'text-red-600'}>Fehler: {error}</p>
          <button onClick={() => router.push('/learn')} className="mt-4 px-4 py-2 rounded-xl bg-blue-500 text-white text-sm">
            Zurueck
          </button>
        </div>
      </div>
    )
  }

  return (
    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
      {/* Header */}
      <div className={`${glassCard} border-0 border-b`}>
        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
          <button
            onClick={() => router.push('/learn')}
            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
          >
            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
            </svg>
            Zurueck
          </button>
          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            Karteikarten
          </h1>
          <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
            {items.length} Karten
          </span>
        </div>
      </div>

      {/* Content */}
      <div className="flex-1 flex items-center justify-center px-6 py-8">
        {isComplete ? (
          <div className={`${glassCard} rounded-3xl p-10 text-center max-w-md w-full`}>
            <div className="text-5xl mb-4">
              {stats.correct > stats.incorrect ? '🎉' : '💪'}
            </div>
            <h2 className={`text-2xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
              Geschafft!
            </h2>
            <div className={`flex justify-center gap-8 mb-6 ${isDark ? 'text-white/80' : 'text-slate-700'}`}>
              <div>
                <span className="text-3xl font-bold text-green-500">{stats.correct}</span>
                <p className="text-sm mt-1">Richtig</p>
              </div>
              <div>
                <span className="text-3xl font-bold text-red-500">{stats.incorrect}</span>
                <p className="text-sm mt-1">Falsch</p>
              </div>
            </div>
            <div className="flex gap-3">
              <button
                onClick={() => { setCurrentIndex(0); setStats({ correct: 0, incorrect: 0 }); setIsComplete(false); loadQA() }}
                className="flex-1 py-3 rounded-xl bg-gradient-to-r from-blue-500 to-cyan-500 text-white font-medium"
              >
                Nochmal
              </button>
              <button
                onClick={() => router.push('/learn')}
                className={`flex-1 py-3 rounded-xl border font-medium ${isDark ? 'border-white/20 text-white/80' : 'border-slate-300 text-slate-700'}`}
              >
                Zurueck
              </button>
            </div>
          </div>
        ) : items.length > 0 ? (
          <div className="w-full max-w-lg">
            <FlashCard
              front={items[currentIndex].question}
              back={items[currentIndex].answer}
              cardNumber={currentIndex + 1}
              totalCards={items.length}
              leitnerBox={items[currentIndex].leitner_box}
              onCorrect={() => handleAnswer(true)}
              onIncorrect={() => handleAnswer(false)}
              isDark={isDark}
            />
            {/* Audio Button */}
            <div className="flex justify-center mt-4">
              <AudioButton text={items[currentIndex].question} lang="en" isDark={isDark} />
            </div>
          </div>
        ) : (
          <div className={`${glassCard} rounded-2xl p-8 text-center`}>
            <p className={isDark ? 'text-white/60' : 'text-slate-500'}>Keine Karteikarten verfuegbar.</p>
          </div>
        )}
      </div>
    </div>
  )
}
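Every learn page repeats the same `getBackendUrl` heuristic: target `http://localhost:8001` during SSR and on localhost, otherwise reuse the page's protocol and hostname with port 8001. That logic can be isolated into a pure, testable function; this is a refactoring sketch, not code from the repo:

```typescript
// Pure version of the getBackendUrl heuristic used by the learn pages.
// Pass window.location in the browser, or nothing during SSR.
function resolveBackendUrl(loc?: { protocol: string; hostname: string }): string {
  if (!loc) return 'http://localhost:8001'              // SSR: no window.location
  if (loc.hostname === 'localhost') return 'http://localhost:8001'
  // Note: window.location.protocol already includes the trailing colon.
  return `${loc.protocol}//${loc.hostname}:8001`
}
```

Extracting the helper into a shared module (e.g. `@/lib/backendUrl`) would also remove the five copy-pasted definitions.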
studio-v2/app/learn/[unitId]/quiz/page.tsx (new file, 160 lines)
@@ -0,0 +1,160 @@
'use client'

import React, { useState, useEffect, useCallback } from 'react'
import { useParams, useRouter } from 'next/navigation'
import { useTheme } from '@/lib/ThemeContext'
import { QuizQuestion } from '@/components/learn/QuizQuestion'

interface MCQuestion {
  id: string
  question: string
  options: { id: string; text: string }[]
  correct_answer: string
  explanation?: string
}

function getBackendUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8001'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8001'
  return `${protocol}//${hostname}:8001`
}

export default function QuizPage() {
  const { unitId } = useParams<{ unitId: string }>()
  const router = useRouter()
  const { isDark } = useTheme()

  const [questions, setQuestions] = useState<MCQuestion[]>([])
  const [currentIndex, setCurrentIndex] = useState(0)
  const [isLoading, setIsLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const [stats, setStats] = useState({ correct: 0, incorrect: 0 })
  const [isComplete, setIsComplete] = useState(false)

  const glassCard = isDark
    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
    : 'bg-white/80 backdrop-blur-xl border border-black/5'

  useEffect(() => {
    loadMC()
  }, [unitId])

  const loadMC = async () => {
    setIsLoading(true)
    try {
      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/mc`)
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
      const data = await resp.json()
      setQuestions(data.questions || [])
    } catch (err: any) {
      setError(err.message)
    } finally {
      setIsLoading(false)
    }
  }

  const handleAnswer = useCallback((correct: boolean) => {
    setStats((prev) => ({
      correct: prev.correct + (correct ? 1 : 0),
      incorrect: prev.incorrect + (correct ? 0 : 1),
    }))

    if (currentIndex + 1 >= questions.length) {
      setIsComplete(true)
    } else {
      setCurrentIndex((i) => i + 1)
    }
  }, [currentIndex, questions.length])

  if (isLoading) {
    return (
      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
        <div className={`w-8 h-8 border-4 ${isDark ? 'border-purple-400' : 'border-purple-600'} border-t-transparent rounded-full animate-spin`} />
      </div>
    )
  }

  return (
    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
      {/* Header */}
      <div className={`${glassCard} border-0 border-b`}>
        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
          <button
            onClick={() => router.push('/learn')}
            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
          >
            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
            </svg>
            Zurueck
          </button>
          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            Quiz
          </h1>
          <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
            {questions.length} Fragen
          </span>
        </div>
      </div>

      {/* Content */}
      <div className="flex-1 flex items-center justify-center px-6 py-8">
        {error ? (
          <div className={`${glassCard} rounded-2xl p-8 text-center max-w-md`}>
            <p className={isDark ? 'text-red-300' : 'text-red-600'}>{error}</p>
            <button onClick={() => router.push('/learn')} className="mt-4 px-4 py-2 rounded-xl bg-purple-500 text-white text-sm">
              Zurueck
            </button>
          </div>
        ) : isComplete ? (
          <div className={`${glassCard} rounded-3xl p-10 text-center max-w-md w-full`}>
            <div className="text-5xl mb-4">
              {stats.correct === questions.length ? '🏆' : stats.correct > stats.incorrect ? '🎉' : '💪'}
            </div>
            <h2 className={`text-2xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
              {stats.correct === questions.length ? 'Perfekt!' : 'Geschafft!'}
            </h2>
            <p className={`text-lg mb-4 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
              {stats.correct} von {questions.length} richtig
              ({Math.round((stats.correct / questions.length) * 100)}%)
            </p>
            <div className="w-full h-3 rounded-full bg-white/10 overflow-hidden mb-6">
              <div
                className="h-full rounded-full bg-gradient-to-r from-purple-500 to-pink-500"
                style={{ width: `${(stats.correct / questions.length) * 100}%` }}
              />
            </div>
            <div className="flex gap-3">
              <button
                onClick={() => { setCurrentIndex(0); setStats({ correct: 0, incorrect: 0 }); setIsComplete(false); loadMC() }}
                className="flex-1 py-3 rounded-xl bg-gradient-to-r from-purple-500 to-pink-500 text-white font-medium"
              >
                Nochmal
              </button>
              <button
                onClick={() => router.push('/learn')}
                className={`flex-1 py-3 rounded-xl border font-medium ${isDark ? 'border-white/20 text-white/80' : 'border-slate-300 text-slate-700'}`}
              >
                Zurueck
              </button>
            </div>
          </div>
        ) : questions[currentIndex] ? (
          <QuizQuestion
            question={questions[currentIndex].question}
            options={questions[currentIndex].options}
            correctAnswer={questions[currentIndex].correct_answer}
            explanation={questions[currentIndex].explanation}
            questionNumber={currentIndex + 1}
            totalQuestions={questions.length}
            onAnswer={handleAnswer}
            isDark={isDark}
          />
        ) : (
          <p className={isDark ? 'text-white/60' : 'text-slate-500'}>Keine Quiz-Fragen verfuegbar.</p>
        )}
      </div>
    </div>
  )
}
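The flashcards and quiz pages update their score with the same functional `setStats` updater: exactly one counter is incremented per answer. As a standalone reducer (an extraction sketch, not code from the repo):

```typescript
type Stats = { correct: number; incorrect: number }

// Mirrors the setStats updater in handleAnswer: one answer increments
// exactly one of the two counters.
function applyAnswer(prev: Stats, correct: boolean): Stats {
  return {
    correct: prev.correct + (correct ? 1 : 0),
    incorrect: prev.incorrect + (correct ? 0 : 1),
  }
}
```

Because the updater is pure, the session total `correct + incorrect` always equals the number of answers given, which is what the completion screens rely on.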
studio-v2/app/learn/[unitId]/story/page.tsx (new file, 184 lines)
@@ -0,0 +1,184 @@
'use client'

import React, { useState, useEffect } from 'react'
import { useParams, useRouter } from 'next/navigation'
import { useTheme } from '@/lib/ThemeContext'
import { AudioButton } from '@/components/learn/AudioButton'

function getBackendUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8001'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8001'
  return `${protocol}//${hostname}:8001`
}

function getKlausurApiUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8086'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8086'
  return `${protocol}//${hostname}/klausur-api`
}

export default function StoryPage() {
  const { unitId } = useParams<{ unitId: string }>()
  const router = useRouter()
  const { isDark } = useTheme()

  const [story, setStory] = useState<{ story_html: string; story_text: string; vocab_used: string[]; language: string } | null>(null)
  const [isLoading, setIsLoading] = useState(false)
  const [error, setError] = useState<string | null>(null)
  const [language, setLanguage] = useState<'en' | 'de'>('en')

  const glassCard = isDark
    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
    : 'bg-white/80 backdrop-blur-xl border border-black/5'

  const generateStory = async () => {
    setIsLoading(true)
    setError(null)

    try {
      // First get the QA data to extract vocabulary
      const qaResp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/qa`)
      let vocabulary: { english: string; german: string }[] = []

      if (qaResp.ok) {
        const qaData = await qaResp.json()
        // Convert QA items to vocabulary format
        vocabulary = (qaData.qa_items || []).map((item: any) => ({
          english: item.question,
          german: item.answer,
        }))
      }

      if (vocabulary.length === 0) {
        setError('Keine Vokabeln gefunden.')
        setIsLoading(false)
        return
      }

      // Generate story
      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/generate-story`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ vocabulary, language, grade_level: '5-8' }),
      })

      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
      const data = await resp.json()
      setStory(data)
    } catch (err: any) {
      setError(err.message)
    } finally {
      setIsLoading(false)
    }
  }

  return (
    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-amber-50 to-orange-100'}`}>
      {/* Header */}
      <div className={`${glassCard} border-0 border-b`}>
        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
          <button
            onClick={() => router.push('/learn')}
            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
          >
            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
            </svg>
            Zurueck
          </button>
          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            Minigeschichte
          </h1>
          <button
            onClick={() => setLanguage((l) => l === 'en' ? 'de' : 'en')}
            className={`text-xs px-3 py-1.5 rounded-lg ${isDark ? 'bg-white/10 text-white/70' : 'bg-slate-100 text-slate-600'}`}
          >
            {language === 'en' ? 'Englisch' : 'Deutsch'}
          </button>
        </div>
      </div>

      {/* Content */}
      <div className="flex-1 flex items-center justify-center px-6 py-8">
        <div className="w-full max-w-lg space-y-6">
          {story ? (
            <>
              {/* Story Card */}
              <div className={`${glassCard} rounded-3xl p-8`}>
                <div className="flex items-center justify-between mb-4">
                  <span className={`text-xs font-medium uppercase ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
                    {story.language === 'en' ? 'English Story' : 'Deutsche Geschichte'}
                  </span>
                  <AudioButton text={story.story_text} lang={story.language as 'en' | 'de'} isDark={isDark} size="md" />
                </div>
                <div
                  className={`text-lg leading-relaxed ${isDark ? 'text-white/90' : 'text-slate-800'}`}
                  dangerouslySetInnerHTML={{ __html: story.story_html }}
                />
                <style>{`
                  .vocab-highlight {
                    background: ${isDark ? 'rgba(96, 165, 250, 0.3)' : 'rgba(59, 130, 246, 0.15)'};
                    color: ${isDark ? '#93c5fd' : '#1d4ed8'};
                    padding: 1px 4px;
                    border-radius: 4px;
                    font-weight: 600;
                  }
                `}</style>
              </div>

              {/* Vocab used */}
              <div className={`text-center text-sm ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
                Vokabeln verwendet: {story.vocab_used.length} / {story.vocab_used.length > 0 ? story.vocab_used.join(', ') : '-'}
              </div>

              {/* New Story Button */}
              <button
                onClick={generateStory}
                disabled={isLoading}
                className="w-full py-3 rounded-xl bg-gradient-to-r from-amber-500 to-orange-500 text-white font-medium hover:shadow-lg transition-all"
              >
                Neue Geschichte generieren
              </button>
            </>
          ) : (
            <div className={`${glassCard} rounded-3xl p-10 text-center`}>
              <div className="text-5xl mb-4">📖</div>
              <h2 className={`text-xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
                Minigeschichte
              </h2>
              <p className={`text-sm mb-6 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
                Die KI schreibt eine kurze Geschichte mit deinen Vokabeln.
                Die Vokabelwoerter werden farbig hervorgehoben.
              </p>

              {error && (
                <p className={`text-sm mb-4 ${isDark ? 'text-red-300' : 'text-red-600'}`}>{error}</p>
              )}

              <button
                onClick={generateStory}
                disabled={isLoading}
                className={`w-full py-4 rounded-xl font-medium transition-all ${
                  isLoading
                    ? (isDark ? 'bg-white/5 text-white/30' : 'bg-slate-100 text-slate-400')
                    : 'bg-gradient-to-r from-amber-500 to-orange-500 text-white hover:shadow-lg hover:shadow-orange-500/25'
                }`}
              >
                {isLoading ? (
                  <span className="flex items-center justify-center gap-3">
                    <div className="w-5 h-5 border-2 border-white border-t-transparent rounded-full animate-spin" />
                    Geschichte wird geschrieben...
                  </span>
                ) : (
                  'Geschichte generieren'
                )}
              </button>
            </div>
          )}
        </div>
      </div>
    </div>
  )
}
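The story page derives its vocabulary list from the unit's QA items, treating `question` as the English side and `answer` as the German side. That conversion can be lifted into a small helper; the function name `qaToVocabulary` is an assumption made for this sketch:

```typescript
interface QAPair { question: string; answer: string }
interface VocabEntry { english: string; german: string }

// Same mapping as in generateStory: each QA pair becomes one vocab entry,
// and a missing/empty qa_items array yields an empty vocabulary.
function qaToVocabulary(qaItems: QAPair[] | undefined): VocabEntry[] {
  return (qaItems ?? []).map((item) => ({
    english: item.question,
    german: item.answer,
  }))
}
```

An empty result is exactly the case the page guards against with the "Keine Vokabeln gefunden." error before requesting a story.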
studio-v2/app/learn/[unitId]/type/page.tsx (new file, 194 lines)
@@ -0,0 +1,194 @@
'use client'

import React, { useState, useEffect, useCallback } from 'react'
import { useParams, useRouter } from 'next/navigation'
import { useTheme } from '@/lib/ThemeContext'
import { TypeInput } from '@/components/learn/TypeInput'
import { AudioButton } from '@/components/learn/AudioButton'

interface QAItem {
  id: string
  question: string
  answer: string
  leitner_box: number
}

function getBackendUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8001'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8001'
  return `${protocol}//${hostname}:8001`
}

export default function TypePage() {
  const { unitId } = useParams<{ unitId: string }>()
  const router = useRouter()
  const { isDark } = useTheme()

  const [items, setItems] = useState<QAItem[]>([])
  const [currentIndex, setCurrentIndex] = useState(0)
  const [isLoading, setIsLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const [stats, setStats] = useState({ correct: 0, incorrect: 0 })
  const [isComplete, setIsComplete] = useState(false)
  const [direction, setDirection] = useState<'en_to_de' | 'de_to_en'>('en_to_de')

  const glassCard = isDark
    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
    : 'bg-white/80 backdrop-blur-xl border border-black/5'

  useEffect(() => {
    loadQA()
  }, [unitId])

  const loadQA = async () => {
    setIsLoading(true)
    try {
      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}/qa`)
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
      const data = await resp.json()
      setItems(data.qa_items || [])
    } catch (err: any) {
      setError(err.message)
    } finally {
      setIsLoading(false)
    }
  }

  const handleResult = useCallback(async (correct: boolean) => {
    const item = items[currentIndex]
    if (!item) return

    try {
      await fetch(
        `${getBackendUrl()}/api/learning-units/${unitId}/leitner/update?item_id=${item.id}&correct=${correct}`,
        { method: 'POST' }
      )
    } catch (err) {
      console.error('Leitner update failed:', err)
    }

    setStats((prev) => ({
      correct: prev.correct + (correct ? 1 : 0),
      incorrect: prev.incorrect + (correct ? 0 : 1),
    }))

    if (currentIndex + 1 >= items.length) {
      setIsComplete(true)
    } else {
      setCurrentIndex((i) => i + 1)
    }
  }, [items, currentIndex, unitId])

  const currentItem = items[currentIndex]
  const prompt = currentItem
    ? (direction === 'en_to_de' ? currentItem.question : currentItem.answer)
    : ''
  const answer = currentItem
    ? (direction === 'en_to_de' ? currentItem.answer : currentItem.question)
    : ''

  if (isLoading) {
    return (
      <div className={`min-h-screen flex items-center justify-center ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
        <div className={`w-8 h-8 border-4 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
      </div>
    )
  }

  return (
    <div className={`min-h-screen flex flex-col ${isDark ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800' : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'}`}>
      {/* Header */}
      <div className={`${glassCard} border-0 border-b`}>
        <div className="max-w-2xl mx-auto px-6 py-4 flex items-center justify-between">
          <button
            onClick={() => router.push('/learn')}
            className={`flex items-center gap-2 text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-900'}`}
          >
            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 19l-7-7 7-7" />
            </svg>
            Zurueck
          </button>
          <h1 className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            Eintippen
          </h1>
          {/* Direction toggle */}
          <button
            onClick={() => setDirection((d) => d === 'en_to_de' ? 'de_to_en' : 'en_to_de')}
            className={`text-xs px-3 py-1.5 rounded-lg ${isDark ? 'bg-white/10 text-white/70' : 'bg-slate-100 text-slate-600'}`}
          >
            {direction === 'en_to_de' ? 'EN → DE' : 'DE → EN'}
          </button>
        </div>
      </div>

      {/* Progress */}
      <div className="w-full h-1 bg-white/10">
        <div
          className="h-full bg-gradient-to-r from-blue-500 to-cyan-500 transition-all"
          style={{ width: `${((currentIndex) / Math.max(items.length, 1)) * 100}%` }}
        />
      </div>

      {/* Content */}
      <div className="flex-1 flex items-center justify-center px-6 py-8">
        {error ? (
          <div className={`${glassCard} rounded-2xl p-8 text-center max-w-md`}>
            <p className={isDark ? 'text-red-300' : 'text-red-600'}>{error}</p>
          </div>
        ) : isComplete ? (
          <div className={`${glassCard} rounded-3xl p-10 text-center max-w-md w-full`}>
            <div className="text-5xl mb-4">
              {stats.correct > stats.incorrect ? '🎉' : '💪'}
            </div>
            <h2 className={`text-2xl font-bold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
              Geschafft!
            </h2>
            <div className={`flex justify-center gap-8 mb-6 ${isDark ? 'text-white/80' : 'text-slate-700'}`}>
              <div>
                <span className="text-3xl font-bold text-green-500">{stats.correct}</span>
                <p className="text-sm mt-1">Richtig</p>
              </div>
              <div>
                <span className="text-3xl font-bold text-red-500">{stats.incorrect}</span>
                <p className="text-sm mt-1">Falsch</p>
              </div>
            </div>
            <div className="flex gap-3">
              <button
                onClick={() => { setCurrentIndex(0); setStats({ correct: 0, incorrect: 0 }); setIsComplete(false); loadQA() }}
                className="flex-1 py-3 rounded-xl bg-gradient-to-r from-blue-500 to-cyan-500 text-white font-medium"
              >
                Nochmal
              </button>
              <button
                onClick={() => router.push('/learn')}
                className={`flex-1 py-3 rounded-xl border font-medium ${isDark ? 'border-white/20 text-white/80' : 'border-slate-300 text-slate-700'}`}
              >
                Zurueck
              </button>
            </div>
          </div>
        ) : currentItem ? (
          <div className="w-full max-w-lg space-y-4">
            <div className="flex justify-center">
              <AudioButton text={prompt} lang={direction === 'en_to_de' ? 'en' : 'de'} isDark={isDark} />
            </div>
            <TypeInput
              prompt={prompt}
              answer={answer}
              onResult={handleResult}
              isDark={isDark}
            />
            <p className={`text-center text-sm ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
              {currentIndex + 1} / {items.length}
            </p>
          </div>
        ) : (
          <p className={isDark ? 'text-white/60' : 'text-slate-500'}>Keine Vokabeln verfuegbar.</p>
        )}
      </div>
    </div>
  )
}
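The type page swaps prompt and expected answer depending on the practice direction: `question` holds the English side and `answer` the German side. The selection logic can be extracted into a pure helper (a sketch; the helper name `promptAndAnswer` is an assumption):

```typescript
interface QAPair { question: string; answer: string }
type Direction = 'en_to_de' | 'de_to_en'

// EN→DE shows the English word and expects the German translation;
// DE→EN is the mirror image, exactly as derived on the type page.
function promptAndAnswer(item: QAPair, direction: Direction): { prompt: string; answer: string } {
  return direction === 'en_to_de'
    ? { prompt: item.question, answer: item.answer }
    : { prompt: item.answer, answer: item.question }
}
```

Keeping the swap in one place also guarantees the AudioButton language (`en` vs `de`) always matches the displayed prompt.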
studio-v2/app/learn/page.tsx (new file, 164 lines)
@@ -0,0 +1,164 @@
'use client'

import React, { useState, useEffect } from 'react'
import { useTheme } from '@/lib/ThemeContext'
import { Sidebar } from '@/components/Sidebar'
import { UnitCard } from '@/components/learn/UnitCard'

interface LearningUnit {
  id: string
  label: string
  meta: string
  title: string
  topic: string | null
  grade_level: string | null
  status: string
  vocabulary_count?: number
  created_at: string
}

function getBackendUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8001'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8001'
  return `${protocol}//${hostname}:8001`
}

export default function LearnPage() {
  const { isDark } = useTheme()
  const [units, setUnits] = useState<LearningUnit[]>([])
  const [isLoading, setIsLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)

  const glassCard = isDark
    ? 'bg-white/10 backdrop-blur-xl border border-white/10'
    : 'bg-white/80 backdrop-blur-xl border border-black/5'

  useEffect(() => {
    loadUnits()
  }, [])

  const loadUnits = async () => {
    setIsLoading(true)
    try {
      const resp = await fetch(`${getBackendUrl()}/api/learning-units/`)
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
      const data = await resp.json()
      setUnits(data)
    } catch (err: any) {
      setError(err.message)
    } finally {
      setIsLoading(false)
    }
  }

  const handleDelete = async (unitId: string) => {
    try {
      const resp = await fetch(`${getBackendUrl()}/api/learning-units/${unitId}`, { method: 'DELETE' })
      if (resp.ok) {
        setUnits((prev) => prev.filter((u) => u.id !== unitId))
      }
    } catch (err) {
      console.error('Delete failed:', err)
    }
  }

  return (
    <div className={`min-h-screen flex relative overflow-hidden ${
      isDark
        ? 'bg-gradient-to-br from-indigo-900 via-purple-900 to-pink-800'
        : 'bg-gradient-to-br from-slate-100 via-blue-50 to-cyan-100'
    }`}>
      {/* Background Blobs */}
      <div className="absolute inset-0 overflow-hidden pointer-events-none">
        <div className={`absolute -top-40 -right-40 w-80 h-80 rounded-full mix-blend-multiply filter blur-3xl animate-pulse ${
          isDark ? 'bg-blue-500 opacity-50' : 'bg-blue-300 opacity-30'
        }`} />
        <div className={`absolute -bottom-40 -left-40 w-80 h-80 rounded-full mix-blend-multiply filter blur-3xl animate-pulse ${
          isDark ? 'bg-cyan-500 opacity-50' : 'bg-cyan-300 opacity-30'
        }`} style={{ animationDelay: '2s' }} />
      </div>

      {/* Sidebar */}
      <div className="relative z-10 p-4">
        <Sidebar />
      </div>

      {/* Main Content */}
      <div className="flex-1 flex flex-col relative z-10 overflow-y-auto">
        {/* Header */}
        <div className={`${glassCard} border-0 border-b`}>
          <div className="max-w-5xl mx-auto px-6 py-4">
            <div className="flex items-center gap-4">
              <div className={`w-12 h-12 rounded-xl flex items-center justify-center ${
                isDark ? 'bg-blue-500/30' : 'bg-blue-200'
              }`}>
                <svg className={`w-6 h-6 ${isDark ? 'text-blue-300' : 'text-blue-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253" />
                </svg>
              </div>
              <div>
                <h1 className={`text-xl font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
                  Meine Lernmodule
                </h1>
                <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
                  Karteikarten, Quiz und Lueckentexte aus deinen Vokabeln
                </p>
              </div>
            </div>
          </div>
        </div>

        {/* Content */}
        <div className="max-w-5xl mx-auto w-full px-6 py-6">
|
||||
{isLoading && (
|
||||
<div className="flex items-center justify-center py-20">
|
||||
<div className={`w-8 h-8 border-4 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
|
||||
</div>
|
||||
)}
|
||||
|
||||
{error && (
|
||||
<div className={`${glassCard} rounded-2xl p-6 text-center`}>
|
||||
<p className={`${isDark ? 'text-red-300' : 'text-red-600'}`}>Fehler: {error}</p>
|
||||
<button onClick={loadUnits} className="mt-3 px-4 py-2 rounded-xl bg-blue-500 text-white text-sm">
|
||||
Erneut versuchen
|
||||
</button>
|
||||
</div>
|
||||
)}
|
||||
|
||||
{!isLoading && !error && units.length === 0 && (
|
||||
<div className={`${glassCard} rounded-2xl p-12 text-center`}>
|
||||
<div className={`w-16 h-16 mx-auto mb-4 rounded-2xl flex items-center justify-center ${
|
||||
isDark ? 'bg-blue-500/20' : 'bg-blue-100'
|
||||
}`}>
|
||||
<svg className={`w-8 h-8 ${isDark ? 'text-blue-300' : 'text-blue-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3 className={`text-lg font-semibold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
|
||||
Noch keine Lernmodule
|
||||
</h3>
|
||||
<p className={`text-sm mb-4 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
|
||||
Scanne eine Schulbuchseite im Vokabel-Arbeitsblatt Generator und klicke "Lernmodule generieren".
|
||||
</p>
|
||||
<a
|
||||
href="/vocab-worksheet"
|
||||
className="inline-block px-6 py-3 rounded-xl bg-gradient-to-r from-blue-500 to-cyan-500 text-white font-medium hover:shadow-lg transition-all"
|
||||
>
|
||||
Zum Vokabel-Scanner
|
||||
</a>
|
||||
</div>
|
||||
)}
|
||||
|
||||
{!isLoading && units.length > 0 && (
|
||||
<div className="grid gap-4">
|
||||
{units.map((unit) => (
|
||||
<UnitCard key={unit.id} unit={unit} isDark={isDark} glassCard={glassCard} onDelete={handleDelete} />
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
153 studio-v2/app/vocab-worksheet/components/ExportTab.tsx Normal file
@@ -0,0 +1,153 @@
'use client'

import React, { useState } from 'react'
import { useRouter } from 'next/navigation'
import type { VocabWorksheetHook } from '../types'
import { getApiBase } from '../constants'

export function ExportTab({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard } = h
  const router = useRouter()

  const [isGeneratingLearning, setIsGeneratingLearning] = useState(false)
  const [learningUnitId, setLearningUnitId] = useState<string | null>(null)
  const [learningError, setLearningError] = useState<string | null>(null)

  const handleGenerateLearningUnit = async () => {
    if (!h.session) return
    setIsGeneratingLearning(true)
    setLearningError(null)

    try {
      const apiBase = getApiBase()
      const resp = await fetch(`${apiBase}/api/v1/vocab/sessions/${h.session.id}/generate-learning-unit`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ generate_modules: true }),
      })

      if (!resp.ok) {
        const err = await resp.json().catch(() => ({}))
        throw new Error(err.detail || `HTTP ${resp.status}`)
      }

      const result = await resp.json()
      setLearningUnitId(result.unit_id)
    } catch (err: any) {
      setLearningError(err.message || 'Fehler bei der Generierung')
    } finally {
      setIsGeneratingLearning(false)
    }
  }

  return (
    <div className="space-y-6">
      {/* PDF Download Section */}
      <div className={`${glassCard} rounded-2xl p-6`}>
        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>PDF herunterladen</h2>

        {h.worksheetId ? (
          <div className="space-y-4">
            <div className={`p-4 rounded-xl ${isDark ? 'bg-green-500/20 border border-green-500/30' : 'bg-green-100 border border-green-200'}`}>
              <div className="flex items-center gap-3">
                <svg className="w-6 h-6 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                </svg>
                <span className={`font-medium ${isDark ? 'text-green-200' : 'text-green-700'}`}>Arbeitsblatt erfolgreich generiert!</span>
              </div>
            </div>

            <div className="grid grid-cols-2 gap-4">
              <button onClick={() => h.downloadPDF('worksheet')} className={`${glassCard} p-6 rounded-xl text-left transition-all hover:shadow-lg ${isDark ? 'hover:border-purple-400/50' : 'hover:border-purple-500'}`}>
                <div className={`w-12 h-12 mb-3 rounded-xl flex items-center justify-center ${isDark ? 'bg-purple-500/30' : 'bg-purple-100'}`}>
                  <svg className={`w-6 h-6 ${isDark ? 'text-purple-300' : 'text-purple-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M12 10v6m0 0l-3-3m3 3l3-3m2 8H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
                  </svg>
                </div>
                <h3 className={`font-semibold mb-1 ${isDark ? 'text-white' : 'text-slate-900'}`}>Arbeitsblatt</h3>
                <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>PDF zum Ausdrucken</p>
              </button>

              {h.includeSolutions && (
                <button onClick={() => h.downloadPDF('solution')} className={`${glassCard} p-6 rounded-xl text-left transition-all hover:shadow-lg ${isDark ? 'hover:border-green-400/50' : 'hover:border-green-500'}`}>
                  <div className={`w-12 h-12 mb-3 rounded-xl flex items-center justify-center ${isDark ? 'bg-green-500/30' : 'bg-green-100'}`}>
                    <svg className={`w-6 h-6 ${isDark ? 'text-green-300' : 'text-green-600'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
                    </svg>
                  </div>
                  <h3 className={`font-semibold mb-1 ${isDark ? 'text-white' : 'text-slate-900'}`}>Loesungsblatt</h3>
                  <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>PDF mit Loesungen</p>
                </button>
              )}
            </div>
          </div>
        ) : (
          <p className={`text-center py-8 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Noch kein Arbeitsblatt generiert.</p>
        )}
      </div>

      {/* Learning Module Generation Section */}
      <div className={`${glassCard} rounded-2xl p-6`}>
        <h2 className={`text-lg font-semibold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>Interaktive Lernmodule</h2>
        <p className={`text-sm mb-4 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
          Aus den Vokabeln automatisch Karteikarten, Quiz und Lueckentexte erstellen.
        </p>

        {learningError && (
          <div className={`p-3 rounded-xl mb-4 ${isDark ? 'bg-red-500/20 border border-red-500/30' : 'bg-red-100 border border-red-200'}`}>
            <p className={`text-sm ${isDark ? 'text-red-200' : 'text-red-700'}`}>{learningError}</p>
          </div>
        )}

        {learningUnitId ? (
          <div className="space-y-4">
            <div className={`p-4 rounded-xl ${isDark ? 'bg-blue-500/20 border border-blue-500/30' : 'bg-blue-100 border border-blue-200'}`}>
              <div className="flex items-center gap-3">
                <svg className="w-6 h-6 text-blue-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                </svg>
                <span className={`font-medium ${isDark ? 'text-blue-200' : 'text-blue-700'}`}>Lernmodule wurden generiert!</span>
              </div>
            </div>

            <button
              onClick={() => router.push(`/learn/${learningUnitId}`)}
              className="w-full py-3 rounded-xl font-medium bg-gradient-to-r from-blue-500 to-cyan-500 text-white hover:shadow-lg transition-all"
            >
              Lernmodule oeffnen
            </button>
          </div>
        ) : (
          <button
            onClick={handleGenerateLearningUnit}
            disabled={isGeneratingLearning || h.vocabulary.length === 0}
            className={`w-full py-4 rounded-xl font-medium transition-all ${
              isGeneratingLearning || h.vocabulary.length === 0
                ? (isDark ? 'bg-white/5 text-white/30 cursor-not-allowed' : 'bg-slate-100 text-slate-400 cursor-not-allowed')
                : 'bg-gradient-to-r from-blue-500 to-cyan-500 text-white hover:shadow-lg hover:shadow-blue-500/25'
            }`}
          >
            {isGeneratingLearning ? (
              <span className="flex items-center justify-center gap-3">
                <div className="w-5 h-5 border-2 border-white border-t-transparent rounded-full animate-spin" />
                Lernmodule werden generiert...
              </span>
            ) : (
              <span className="flex items-center justify-center gap-2">
                <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z" />
                </svg>
                Lernmodule generieren ({h.vocabulary.length} Vokabeln)
              </span>
            )}
          </button>
        )}
      </div>

      {/* Reset Button */}
      <button onClick={h.resetSession} className={`w-full py-3 rounded-xl border font-medium transition-colors ${isDark ? 'border-white/20 text-white/80 hover:bg-white/10' : 'border-slate-300 text-slate-700 hover:bg-slate-50'}`}>
        Neues Arbeitsblatt erstellen
      </button>
    </div>
  )
}
@@ -0,0 +1,39 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook } from '../types'

export function FullscreenPreview({ h }: { h: VocabWorksheetHook }) {
  return (
    <div className="fixed inset-0 z-50 bg-black/80 backdrop-blur-sm flex items-center justify-center" onClick={() => h.setShowFullPreview(false)}>
      <button
        onClick={() => h.setShowFullPreview(false)}
        className="absolute top-4 right-4 p-2 rounded-full bg-white/10 hover:bg-white/20 text-white z-10 transition-colors"
      >
        <svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
        </svg>
      </button>
      <div className="max-w-[95vw] max-h-[95vh] overflow-auto" onClick={(e) => e.stopPropagation()}>
        {h.directFile?.type.startsWith('image/') && h.directFilePreview && (
          <img src={h.directFilePreview} alt="Original" className="max-w-none" />
        )}
        {h.directFile?.type === 'application/pdf' && h.directFilePreview && (
          <iframe src={h.directFilePreview} className="border-0 rounded-xl bg-white" style={{ width: '90vw', height: '90vh' }} />
        )}
        {h.selectedMobileFile && !h.directFile && (
          h.selectedMobileFile.type.startsWith('image/')
            ? <img src={h.selectedMobileFile.dataUrl} alt="Original" className="max-w-none" />
            : <iframe src={h.selectedMobileFile.dataUrl} className="border-0 rounded-xl bg-white" style={{ width: '90vw', height: '90vh' }} />
        )}
        {h.selectedDocumentId && !h.directFile && !h.selectedMobileFile && (() => {
          const doc = h.storedDocuments.find(d => d.id === h.selectedDocumentId)
          if (!doc?.url) return null
          return doc.type.startsWith('image/')
            ? <img src={doc.url} alt="Original" className="max-w-none" />
            : <iframe src={doc.url} className="border-0 rounded-xl bg-white" style={{ width: '90vw', height: '90vh' }} />
        })()}
      </div>
    </div>
  )
}
135 studio-v2/app/vocab-worksheet/components/OcrComparisonModal.tsx Normal file
@@ -0,0 +1,135 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook } from '../types'

export function OcrComparisonModal({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard } = h

  return (
    <div className="fixed inset-0 z-50 flex items-center justify-center p-4 bg-black/50 backdrop-blur-sm">
      <div className={`relative w-full max-w-6xl max-h-[90vh] overflow-auto rounded-3xl ${glassCard} p-6`}>
        {/* Header */}
        <div className="flex items-center justify-between mb-6">
          <div>
            <h2 className={`text-xl font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
              OCR-Methoden Vergleich
            </h2>
            <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
              Seite {h.ocrComparePageIndex !== null ? h.ocrComparePageIndex + 1 : '-'}
            </p>
          </div>
          <button
            onClick={() => h.setShowOcrComparison(false)}
            className={`p-2 rounded-xl ${isDark ? 'hover:bg-white/10 text-white' : 'hover:bg-black/5 text-slate-500'}`}
          >
            <svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
            </svg>
          </button>
        </div>

        {/* Loading State */}
        {h.isComparingOcr && (
          <div className="flex flex-col items-center justify-center py-12">
            <div className="w-12 h-12 border-4 border-purple-500 border-t-transparent rounded-full animate-spin mb-4" />
            <p className={isDark ? 'text-white/60' : 'text-slate-500'}>
              Vergleiche OCR-Methoden... (kann 1-2 Minuten dauern)
            </p>
          </div>
        )}

        {/* Error State */}
        {h.ocrCompareError && (
          <div className={`p-4 rounded-xl ${isDark ? 'bg-red-500/20 text-red-300' : 'bg-red-100 text-red-700'}`}>
            Fehler: {h.ocrCompareError}
          </div>
        )}

        {/* Results */}
        {h.ocrCompareResult && !h.isComparingOcr && (
          <div className="space-y-6">
            {/* Method Results Grid */}
            <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
              {Object.entries(h.ocrCompareResult.methods || {}).map(([key, method]: [string, any]) => (
                <div
                  key={key}
                  className={`p-4 rounded-2xl ${
                    h.ocrCompareResult.recommendation?.best_method === key
                      ? (isDark ? 'bg-green-500/20 border border-green-500/50' : 'bg-green-100 border border-green-300')
                      : (isDark ? 'bg-white/5 border border-white/10' : 'bg-white/50 border border-black/10')
                  }`}
                >
                  <div className="flex items-center justify-between mb-3">
                    <h3 className={`font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
                      {method.name}
                    </h3>
                    {h.ocrCompareResult.recommendation?.best_method === key && (
                      <span className="px-2 py-1 text-xs font-medium bg-green-500 text-white rounded-full">
                        Beste
                      </span>
                    )}
                  </div>

                  {method.success ? (
                    <>
                      <div className={`text-sm mb-2 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
                        <span className="font-medium">{method.vocabulary_count}</span> Vokabeln in <span className="font-medium">{method.duration_seconds}s</span>
                      </div>

                      {method.vocabulary && method.vocabulary.length > 0 && (
                        <div className={`max-h-48 overflow-y-auto rounded-xl p-2 ${isDark ? 'bg-black/20' : 'bg-white/50'}`}>
                          {method.vocabulary.slice(0, 10).map((v: any, idx: number) => (
                            <div key={idx} className={`text-sm py-1 border-b last:border-0 ${isDark ? 'border-white/10 text-white/80' : 'border-black/5 text-slate-700'}`}>
                              <span className="font-medium">{v.english}</span> = {v.german}
                            </div>
                          ))}
                          {method.vocabulary.length > 10 && (
                            <div className={`text-xs mt-2 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
                              + {method.vocabulary.length - 10} weitere...
                            </div>
                          )}
                        </div>
                      )}
                    </>
                  ) : (
                    <div className={`text-sm ${isDark ? 'text-red-300' : 'text-red-600'}`}>
                      {method.error || 'Fehler'}
                    </div>
                  )}
                </div>
              ))}
            </div>

            {/* Comparison Summary */}
            {h.ocrCompareResult.comparison && (
              <div className={`p-4 rounded-2xl ${isDark ? 'bg-blue-500/20 border border-blue-500/30' : 'bg-blue-100 border border-blue-200'}`}>
                <h3 className={`font-semibold mb-3 ${isDark ? 'text-blue-300' : 'text-blue-900'}`}>
                  Uebereinstimmung
                </h3>
                <div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
                  <div>
                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Von allen erkannt:</span>
                    <span className="ml-2 font-bold">{h.ocrCompareResult.comparison.found_by_all_methods?.length || 0}</span>
                  </div>
                  <div>
                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Nur teilweise:</span>
                    <span className="ml-2 font-bold">{h.ocrCompareResult.comparison.found_by_some_methods?.length || 0}</span>
                  </div>
                  <div>
                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Gesamt einzigartig:</span>
                    <span className="ml-2 font-bold">{h.ocrCompareResult.comparison.total_unique_vocabulary || 0}</span>
                  </div>
                  <div>
                    <span className={isDark ? 'text-blue-200' : 'text-blue-700'}>Uebereinstimmung:</span>
                    <span className="ml-2 font-bold">{Math.round((h.ocrCompareResult.comparison.agreement_rate || 0) * 100)}%</span>
                  </div>
                </div>
              </div>
            )}
          </div>
        )}
      </div>
    </div>
  )
}
125 studio-v2/app/vocab-worksheet/components/OcrSettingsPanel.tsx Normal file
@@ -0,0 +1,125 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook } from '../types'
import { defaultOcrPrompts } from '../constants'

export function OcrSettingsPanel({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard, glassInput } = h

  return (
    <div className={`${glassCard} rounded-2xl p-6 mb-6`}>
      <div className="flex items-center justify-between mb-4">
        <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
          OCR-Filter Einstellungen
        </h2>
        <button
          onClick={() => h.setShowSettings(false)}
          className={`p-1 rounded-lg ${isDark ? 'hover:bg-white/10 text-white/60' : 'hover:bg-black/5 text-slate-500'}`}
        >
          <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
          </svg>
        </button>
      </div>

      <div className={`p-4 rounded-xl mb-4 ${isDark ? 'bg-blue-500/20 text-blue-200' : 'bg-blue-100 text-blue-800'}`}>
        <p className="text-sm">
          Diese Einstellungen helfen, unerwuenschte Elemente wie Seitenzahlen, Kapitelnamen oder Kopfzeilen aus dem OCR-Ergebnis zu filtern.
        </p>
      </div>

      <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
        {/* Checkboxes */}
        <div className="space-y-3">
          <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
            <input
              type="checkbox"
              checked={h.ocrPrompts.filterHeaders}
              onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, filterHeaders: e.target.checked })}
              className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500"
            />
            <span>Kopfzeilen filtern (z.B. Kapitelnamen)</span>
          </label>

          <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
            <input
              type="checkbox"
              checked={h.ocrPrompts.filterFooters}
              onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, filterFooters: e.target.checked })}
              className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500"
            />
            <span>Fusszeilen filtern</span>
          </label>

          <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
            <input
              type="checkbox"
              checked={h.ocrPrompts.filterPageNumbers}
              onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, filterPageNumbers: e.target.checked })}
              className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500"
            />
            <span>Seitenzahlen filtern (auch ausgeschrieben: "zweihundertzwoelf")</span>
          </label>
        </div>

        {/* Patterns */}
        <div className="space-y-4">
          <div>
            <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
              Kopfzeilen-Muster (kommagetrennt)
            </label>
            <input
              type="text"
              value={h.ocrPrompts.headerPatterns.join(', ')}
              onChange={(e) => h.saveOcrPrompts({
                ...h.ocrPrompts,
                headerPatterns: e.target.value.split(',').map(s => s.trim()).filter(Boolean)
              })}
              placeholder="Unit, Chapter, Lesson..."
              className={`w-full px-4 py-2 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}
            />
          </div>

          <div>
            <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
              Fusszeilen-Muster (kommagetrennt)
            </label>
            <input
              type="text"
              value={h.ocrPrompts.footerPatterns.join(', ')}
              onChange={(e) => h.saveOcrPrompts({
                ...h.ocrPrompts,
                footerPatterns: e.target.value.split(',').map(s => s.trim()).filter(Boolean)
              })}
              placeholder="zweihundert, Page, Seite..."
              className={`w-full px-4 py-2 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}
            />
          </div>
        </div>
      </div>

      <div className="mt-4">
        <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
          Zusaetzlicher Filter-Prompt (optional)
        </label>
        <textarea
          value={h.ocrPrompts.customFilter}
          onChange={(e) => h.saveOcrPrompts({ ...h.ocrPrompts, customFilter: e.target.value })}
          placeholder="z.B.: Ignoriere alle Zeilen, die nur Zahlen oder Buchstaben enthalten..."
          rows={2}
          className={`w-full px-4 py-2 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500 resize-none`}
        />
      </div>

      <div className="mt-4 flex justify-end">
        <button
          onClick={() => h.saveOcrPrompts(defaultOcrPrompts)}
          className={`px-4 py-2 rounded-xl text-sm ${isDark ? 'text-white/60 hover:text-white' : 'text-slate-500 hover:text-slate-700'}`}
        >
          Auf Standard zuruecksetzen
        </button>
      </div>
    </div>
  )
}
108 studio-v2/app/vocab-worksheet/components/PageSelection.tsx Normal file
@@ -0,0 +1,108 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook } from '../types'

export function PageSelection({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard } = h

  return (
    <div className={`${glassCard} rounded-2xl p-6`}>
      <div className="flex items-center justify-between mb-4">
        <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
          PDF-Seiten auswaehlen ({h.selectedPages.length} von {h.pdfPageCount - h.excludedPages.length} ausgewaehlt)
        </h2>
        <div className="flex gap-2">
          {h.excludedPages.length > 0 && (
            <button onClick={h.restoreExcludedPages} className={`px-3 py-1 rounded-lg text-sm ${isDark ? 'bg-orange-500/20 text-orange-300 hover:bg-orange-500/30' : 'bg-orange-100 text-orange-700 hover:bg-orange-200'}`}>
              {h.excludedPages.length} ausgeblendet - wiederherstellen
            </button>
          )}
          <button onClick={h.selectAllPages} className={`px-3 py-1 rounded-lg text-sm transition-colors ${isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-900'}`}>
            Alle
          </button>
          <button onClick={h.selectNoPages} className={`px-3 py-1 rounded-lg text-sm transition-colors ${isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-900'}`}>
            Keine
          </button>
        </div>
      </div>

      <p className={`text-sm mb-4 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
        Klicken Sie auf eine Seite um sie auszuwaehlen. Klicken Sie auf das X um leere Seiten auszublenden.
      </p>

      {h.isLoadingThumbnails ? (
        <div className="flex items-center justify-center py-12">
          <div className="w-8 h-8 border-4 border-purple-500 border-t-transparent rounded-full animate-spin" />
          <span className={`ml-3 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Lade Seitenvorschau...</span>
        </div>
      ) : (
        <div className="grid grid-cols-2 sm:grid-cols-3 md:grid-cols-4 lg:grid-cols-6 gap-4 mb-6">
          {h.pagesThumbnails.map((thumb, idx) => {
            if (h.excludedPages.includes(idx)) return null
            return (
              <div key={idx} className="relative group">
                {/* Exclude/Delete Button */}
                <button
                  onClick={(e) => h.excludePage(idx, e)}
                  className="absolute top-1 left-1 z-10 p-1 rounded-full opacity-0 group-hover:opacity-100 transition-opacity bg-red-500/80 hover:bg-red-600 text-white"
                  title="Seite ausblenden"
                >
                  <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
                  </svg>
                </button>

                {/* OCR Compare Button */}
                <button
                  onClick={(e) => { e.stopPropagation(); h.runOcrComparison(idx); }}
                  className="absolute top-1 right-1 z-10 p-1 rounded-full opacity-0 group-hover:opacity-100 transition-opacity bg-blue-500/80 hover:bg-blue-600 text-white"
                  title="OCR-Methoden vergleichen"
                >
                  <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z" />
                  </svg>
                </button>

                <button
                  onClick={() => h.togglePageSelection(idx)}
                  className={`relative rounded-xl overflow-hidden border-2 transition-all w-full ${
                    h.selectedPages.includes(idx)
                      ? 'border-purple-500 ring-2 ring-purple-500/50'
                      : (isDark ? 'border-white/20 hover:border-white/40' : 'border-slate-200 hover:border-slate-300')
                  }`}
                >
                  <img src={thumb} alt={`Seite ${idx + 1}`} className="w-full h-auto" />
                  <div className={`absolute bottom-0 left-0 right-0 py-1 text-center text-xs font-medium ${
                    h.selectedPages.includes(idx)
                      ? 'bg-purple-500 text-white'
                      : (isDark ? 'bg-black/60 text-white/80' : 'bg-white/90 text-slate-700')
                  }`}>
                    Seite {idx + 1}
                  </div>
                  {h.selectedPages.includes(idx) && (
                    <div className="absolute top-2 right-2 w-6 h-6 bg-purple-500 rounded-full flex items-center justify-center">
                      <svg className="w-4 h-4 text-white" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                      </svg>
                    </div>
                  )}
                </button>
              </div>
            )
          })}
        </div>
      )}

      <div className="flex justify-center">
        <button
          onClick={h.processSelectedPages}
          disabled={h.selectedPages.length === 0 || h.isExtracting}
          className="px-8 py-4 bg-gradient-to-r from-purple-500 to-pink-500 text-white rounded-2xl font-semibold disabled:opacity-50 hover:shadow-xl hover:shadow-purple-500/30 transition-all transform hover:scale-105"
        >
          {h.isExtracting ? 'Extrahiere Vokabeln...' : `${h.selectedPages.length} Seiten verarbeiten`}
        </button>
      </div>
    </div>
  )
}
31 studio-v2/app/vocab-worksheet/components/QRCodeModal.tsx Normal file
@@ -0,0 +1,31 @@
'use client'

import React from 'react'
import { QRCodeUpload } from '@/components/QRCodeUpload'
import type { VocabWorksheetHook } from '../types'

export function QRCodeModal({ h }: { h: VocabWorksheetHook }) {
  const { isDark } = h

  return (
    <div className="fixed inset-0 z-50 flex items-center justify-center p-4">
      <div className="absolute inset-0 bg-black/50 backdrop-blur-sm" onClick={() => h.setShowQRModal(false)} />
      <div className={`relative w-full max-w-md rounded-3xl ${
        isDark ? 'bg-slate-900' : 'bg-white'
      }`}>
        <QRCodeUpload
          sessionId={h.uploadSessionId}
          onClose={() => h.setShowQRModal(false)}
          onFilesChanged={(files) => {
            h.setMobileUploadedFiles(files)
            if (files.length > 0) {
              h.setSelectedMobileFile(files[files.length - 1])
              h.setDirectFile(null)
              h.setSelectedDocumentId(null)
            }
          }}
        />
      </div>
    </div>
  )
}
|
||||
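The `onFilesChanged` handler above always promotes the most recent phone upload to the active file and clears the other two input paths. That rule can be sketched in isolation as a plain function (a hypothetical helper for illustration, not part of the codebase; `MobileFile` is a reduced stand-in for the hook's upload type):

```typescript
// Reduced stand-in for the mobile-upload file type (assumption).
type MobileFile = { id: string; name: string }

// The newest upload wins; with no uploads there is no active file.
function pickActiveFile(files: MobileFile[]): MobileFile | null {
  return files.length > 0 ? files[files.length - 1] : null
}
```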
157  studio-v2/app/vocab-worksheet/components/SpreadsheetTab.tsx  Normal file
@@ -0,0 +1,157 @@
'use client'

/**
 * SpreadsheetTab — Fortune Sheet editor for vocabulary data.
 *
 * Converts VocabularyEntry[] into a Fortune Sheet workbook
 * where users can edit vocabulary in a familiar Excel-like UI.
 */

import React, { useMemo, useCallback } from 'react'
import dynamic from 'next/dynamic'
import type { VocabWorksheetHook } from '../types'

const Workbook = dynamic(
  () => import('@fortune-sheet/react').then((m) => m.Workbook),
  { ssr: false, loading: () => <div className="py-8 text-center text-sm text-gray-400">Spreadsheet wird geladen...</div> },
)

import '@fortune-sheet/react/dist/index.css'

/** Convert VocabularyEntry[] to Fortune Sheet sheet data */
function vocabToSheet(vocabulary: VocabWorksheetHook['vocabulary']) {
  const headers = ['Englisch', 'Deutsch', 'Beispielsatz', 'Wortart', 'Seite']
  const numCols = headers.length
  const numRows = vocabulary.length + 1 // +1 for header

  const celldata: any[] = []

  // Header row
  headers.forEach((label, c) => {
    celldata.push({
      r: 0,
      c,
      v: { v: label, m: label, bl: 1, bg: '#f0f4ff', fc: '#1e293b' },
    })
  })

  // Data rows
  vocabulary.forEach((entry, idx) => {
    const r = idx + 1
    celldata.push({ r, c: 0, v: { v: entry.english, m: entry.english } })
    celldata.push({ r, c: 1, v: { v: entry.german, m: entry.german } })
    celldata.push({ r, c: 2, v: { v: entry.example_sentence || '', m: entry.example_sentence || '' } })
    celldata.push({ r, c: 3, v: { v: entry.word_type || '', m: entry.word_type || '' } })
    celldata.push({ r, c: 4, v: { v: entry.source_page != null ? String(entry.source_page) : '', m: entry.source_page != null ? String(entry.source_page) : '' } })
  })

  // Column widths
  const columnlen: Record<string, number> = {
    '0': 180, // Englisch
    '1': 180, // Deutsch
    '2': 280, // Beispielsatz
    '3': 100, // Wortart
    '4': 60,  // Seite
  }

  // Row heights
  const rowlen: Record<string, number> = {}
  rowlen['0'] = 28 // header

  // Borders: light grid
  const borderInfo = numRows > 0 && numCols > 0 ? [{
    rangeType: 'range',
    borderType: 'border-all',
    color: '#e5e7eb',
    style: 1,
    range: [{ row: [0, numRows - 1], column: [0, numCols - 1] }],
  }] : []

  return {
    name: 'Vokabeln',
    id: 'vocab_sheet',
    celldata,
    row: numRows,
    column: numCols,
    status: 1,
    config: {
      columnlen,
      rowlen,
      borderInfo,
    },
  }
}

export function SpreadsheetTab({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard, vocabulary } = h

  const sheets = useMemo(() => {
    if (!vocabulary || vocabulary.length === 0) return []
    return [vocabToSheet(vocabulary)]
  }, [vocabulary])

  const estimatedHeight = Math.max(500, (vocabulary.length + 2) * 26 + 80)

  const handleSaveFromSheet = useCallback(async () => {
    await h.saveVocabulary()
  }, [h])

  if (vocabulary.length === 0) {
    return (
      <div className={`${glassCard} rounded-2xl p-6`}>
        <p className={`text-center py-12 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
          Keine Vokabeln vorhanden. Bitte zuerst Seiten verarbeiten.
        </p>
      </div>
    )
  }

  return (
    <div className={`${glassCard} rounded-2xl p-4`}>
      <div className="flex items-center justify-between mb-4">
        <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
          Spreadsheet-Editor
        </h2>
        <div className="flex items-center gap-3">
          <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
            {vocabulary.length} Vokabeln
          </span>
          <button
            onClick={handleSaveFromSheet}
            className="px-4 py-2 rounded-xl text-sm font-medium bg-gradient-to-r from-purple-500 to-pink-500 text-white hover:shadow-lg transition-all"
          >
            Speichern
          </button>
        </div>
      </div>

      <div
        className="rounded-xl overflow-hidden border"
        style={{
          borderColor: isDark ? 'rgba(255,255,255,0.1)' : 'rgba(0,0,0,0.1)',
        }}
      >
        {sheets.length > 0 && (
          <div style={{ width: '100%', height: `${estimatedHeight}px` }}>
            <Workbook
              data={sheets}
              lang="en"
              showToolbar
              showFormulaBar={false}
              showSheetTabs={false}
              toolbarItems={[
                'undo', 'redo', '|',
                'font-bold', 'font-italic', 'font-strikethrough', '|',
                'font-color', 'background', '|',
                'font-size', '|',
                'horizontal-align', 'vertical-align', '|',
                'text-wrap', '|',
                'border',
              ]}
            />
          </div>
        )}
      </div>
    </div>
  )
}
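The core of `vocabToSheet` is the mapping from entries to Fortune Sheet's `celldata` cells, where each cell is `{ r, c, v }` with `v.v` as the raw value, `v.m` as the display value, and `bl` marking the bold header row. A minimal sketch of that mapping, reduced to two columns (hypothetical helper for illustration; `toCelldata` and its reduced entry type are not part of the codebase):

```typescript
// Reduced Fortune Sheet cell shape (only the fields the sketch uses).
type Cell = { r: number; c: number; v: { v: string; m: string; bl?: number } }

function toCelldata(rows: { english: string; german: string }[]): Cell[] {
  const headers = ['Englisch', 'Deutsch']
  // Header row at r = 0, bold via bl: 1.
  const celldata: Cell[] = headers.map((label, c) => ({ r: 0, c, v: { v: label, m: label, bl: 1 } }))
  // One data row per entry, offset by the header row.
  rows.forEach((entry, idx) => {
    celldata.push({ r: idx + 1, c: 0, v: { v: entry.english, m: entry.english } })
    celldata.push({ r: idx + 1, c: 1, v: { v: entry.german, m: entry.german } })
  })
  return celldata
}
```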
315  studio-v2/app/vocab-worksheet/components/UploadScreen.tsx  Normal file
@@ -0,0 +1,315 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook } from '../types'
import { formatFileSize } from '../constants'

export function UploadScreen({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard, glassInput } = h

  return (
    <div className="space-y-6">
      {/* Existing Sessions */}
      {h.existingSessions.length > 0 && (
        <div className={`${glassCard} rounded-2xl p-6`}>
          <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
            Vorhandene Sessions fortsetzen
          </h2>
          {h.isLoadingSessions ? (
            <div className="flex items-center gap-3 py-4">
              <div className="w-5 h-5 border-2 border-purple-500 border-t-transparent rounded-full animate-spin" />
              <span className={isDark ? 'text-white/60' : 'text-slate-500'}>Lade Sessions...</span>
            </div>
          ) : (
            <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
              {h.existingSessions.map((s) => (
                <div
                  key={s.id}
                  className={`${glassCard} p-4 rounded-xl text-left transition-all hover:shadow-lg relative group cursor-pointer ${
                    isDark ? 'hover:border-purple-400/50' : 'hover:border-purple-400'
                  }`}
                  onClick={() => h.resumeSession(s)}
                >
                  {/* Delete Button */}
                  <button
                    onClick={(e) => h.deleteSession(s.id, e)}
                    className={`absolute top-2 right-2 p-1.5 rounded-lg opacity-0 group-hover:opacity-100 transition-opacity ${
                      isDark ? 'hover:bg-red-500/20 text-red-400' : 'hover:bg-red-100 text-red-500'
                    }`}
                    title="Session loeschen"
                  >
                    <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
                    </svg>
                  </button>

                  <div className="flex items-start gap-3">
                    <div className={`w-10 h-10 rounded-lg flex items-center justify-center flex-shrink-0 ${
                      s.status === 'extracted' || s.status === 'completed'
                        ? (isDark ? 'bg-green-500/30' : 'bg-green-100')
                        : (isDark ? 'bg-white/10' : 'bg-slate-100')
                    }`}>
                      {s.status === 'extracted' || s.status === 'completed' ? (
                        <svg className="w-5 h-5 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                        </svg>
                      ) : (
                        <svg className={`w-5 h-5 ${isDark ? 'text-white/40' : 'text-slate-400'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 6v6m0 0v6m0-6h6m-6 0H6" />
                        </svg>
                      )}
                    </div>
                    <div className="flex-1 min-w-0">
                      <h3 className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{s.name}</h3>
                      <p className={`text-sm ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
                        {s.vocabulary_count} Vokabeln
                        {s.status === 'pending' && ' • Nicht gestartet'}
                        {s.status === 'extracted' && ' • Bereit'}
                        {s.status === 'completed' && ' • Abgeschlossen'}
                      </p>
                      {s.created_at && (
                        <p className={`text-xs mt-1 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
                          {new Date(s.created_at).toLocaleDateString('de-DE', {
                            day: '2-digit',
                            month: '2-digit',
                            year: 'numeric',
                            hour: '2-digit',
                            minute: '2-digit'
                          })}
                        </p>
                      )}
                    </div>
                    <svg className={`w-5 h-5 flex-shrink-0 ${isDark ? 'text-white/30' : 'text-slate-300'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
                    </svg>
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      )}

      {/* Explanation */}
      <div className={`${glassCard} rounded-2xl p-6 ${isDark ? 'bg-gradient-to-br from-purple-500/20 to-pink-500/20' : 'bg-gradient-to-br from-purple-100/50 to-pink-100/50'}`}>
        <h2 className={`text-lg font-semibold mb-3 ${isDark ? 'text-white' : 'text-slate-900'}`}>
          {h.existingSessions.length > 0 ? 'Oder neue Session starten:' : 'So funktioniert es:'}
        </h2>
        <ol className={`space-y-2 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
          {['Dokument (Bild oder PDF) auswaehlen', 'Vorschau pruefen und Session benennen', 'Bei PDFs: Seiten auswaehlen die verarbeitet werden sollen', 'KI extrahiert Vokabeln — pruefen, korrigieren, Arbeitsblatt-Typ waehlen', 'PDF herunterladen und ausdrucken'].map((text, i) => (
            <li key={i} className="flex items-start gap-2">
              <span className={`w-6 h-6 rounded-full flex items-center justify-center text-xs font-bold flex-shrink-0 ${isDark ? 'bg-purple-500/30 text-purple-300' : 'bg-purple-200 text-purple-700'}`}>{i + 1}</span>
              <span>{text}</span>
            </li>
          ))}
        </ol>
      </div>

      {/* Step 1: Document Selection */}
      <div className={`${glassCard} rounded-2xl p-6`}>
        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
          1. Dokument auswaehlen
        </h2>

        <input ref={h.directFileInputRef} type="file" accept="image/png,image/jpeg,image/jpg,application/pdf" onChange={h.handleDirectFileSelect} className="hidden" />

        <div className="grid grid-cols-2 gap-3 mb-4">
          {/* File Upload Button */}
          <button
            onClick={() => h.directFileInputRef.current?.click()}
            className={`p-4 rounded-xl border-2 border-dashed transition-all ${
              h.directFile
                ? (isDark ? 'border-green-400/50 bg-green-500/20' : 'border-green-500 bg-green-50')
                : (isDark ? 'border-white/20 hover:border-purple-400/50' : 'border-slate-300 hover:border-purple-500')
            }`}
          >
            {h.directFile ? (
              <div className="flex items-center gap-3">
                <span className="text-2xl">{h.directFile.type === 'application/pdf' ? '📄' : '🖼️'}</span>
                <div className="text-left flex-1 min-w-0">
                  <p className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{h.directFile.name}</p>
                  <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{formatFileSize(h.directFile.size)}</p>
                </div>
                <svg className="w-5 h-5 text-green-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                </svg>
              </div>
            ) : (
              <div className={`text-center ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
                <span className="text-2xl block mb-1">📁</span>
                <span className="text-sm">Datei auswaehlen</span>
              </div>
            )}
          </button>

          {/* QR Code Upload Button */}
          <button
            onClick={() => h.setShowQRModal(true)}
            className={`p-4 rounded-xl border-2 border-dashed transition-all ${
              h.selectedMobileFile
                ? (isDark ? 'border-green-400/50 bg-green-500/20' : 'border-green-500 bg-green-50')
                : (isDark ? 'border-white/20 hover:border-purple-400/50' : 'border-slate-300 hover:border-purple-500')
            }`}
          >
            {h.selectedMobileFile ? (
              <div className="flex items-center gap-3">
                <span className="text-2xl">{h.selectedMobileFile.type.startsWith('image/') ? '🖼️' : '📄'}</span>
                <div className="text-left flex-1 min-w-0">
                  <p className={`font-medium truncate text-sm ${isDark ? 'text-white' : 'text-slate-900'}`}>{h.selectedMobileFile.name}</p>
                  <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>vom Handy</p>
                </div>
                <svg className="w-5 h-5 text-green-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                </svg>
              </div>
            ) : (
              <div className={`text-center ${isDark ? 'text-white/60' : 'text-slate-500'}`}>
                <span className="text-2xl block mb-1">📱</span>
                <span className="text-sm">Mit Handy scannen</span>
              </div>
            )}
          </button>
        </div>

        {/* Mobile Uploaded Files */}
        {h.mobileUploadedFiles.length > 0 && !h.directFile && (
          <>
            <div className={`text-center text-sm mb-3 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>— Vom Handy hochgeladen —</div>
            <div className="space-y-2 max-h-32 overflow-y-auto mb-4">
              {h.mobileUploadedFiles.map((file) => (
                <button
                  key={file.id}
                  onClick={() => { h.setSelectedMobileFile(file); h.setDirectFile(null); h.setSelectedDocumentId(null); h.setError(null) }}
                  className={`w-full flex items-center gap-3 p-3 rounded-xl text-left transition-all ${
                    h.selectedMobileFile?.id === file.id
                      ? (isDark ? 'bg-green-500/30 border-2 border-green-400/50' : 'bg-green-100 border-2 border-green-500')
                      : (isDark ? 'bg-white/5 border-2 border-transparent hover:border-white/20' : 'bg-slate-50 border-2 border-transparent hover:border-slate-200')
                  }`}
                >
                  <span className="text-xl">{file.type.startsWith('image/') ? '🖼️' : '📄'}</span>
                  <div className="flex-1 min-w-0">
                    <p className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{file.name}</p>
                    <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{formatFileSize(file.size)}</p>
                  </div>
                  {h.selectedMobileFile?.id === file.id && (
                    <svg className="w-5 h-5 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                    </svg>
                  )}
                </button>
              ))}
            </div>
          </>
        )}

        {/* Stored Documents */}
        {h.storedDocuments.length > 0 && !h.directFile && !h.selectedMobileFile && (
          <>
            <div className={`text-center text-sm mb-3 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>— oder aus Ihren Dokumenten —</div>
            <div className="space-y-2 max-h-32 overflow-y-auto">
              {h.storedDocuments.map((doc) => (
                <button
                  key={doc.id}
                  onClick={() => { h.setSelectedDocumentId(doc.id); h.setDirectFile(null); h.setSelectedMobileFile(null); h.setError(null) }}
                  className={`w-full flex items-center gap-3 p-3 rounded-xl text-left transition-all ${
                    h.selectedDocumentId === doc.id
                      ? (isDark ? 'bg-purple-500/30 border-2 border-purple-400/50' : 'bg-purple-100 border-2 border-purple-500')
                      : (isDark ? 'bg-white/5 border-2 border-transparent hover:border-white/20' : 'bg-slate-50 border-2 border-transparent hover:border-slate-200')
                  }`}
                >
                  <span className="text-xl">{doc.type === 'application/pdf' ? '📄' : '🖼️'}</span>
                  <div className="flex-1 min-w-0">
                    <p className={`font-medium truncate ${isDark ? 'text-white' : 'text-slate-900'}`}>{doc.name}</p>
                    <p className={`text-xs ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{formatFileSize(doc.size)}</p>
                  </div>
                  {h.selectedDocumentId === doc.id && (
                    <svg className="w-5 h-5 text-purple-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                    </svg>
                  )}
                </button>
              ))}
            </div>
          </>
        )}
      </div>

      {/* Step 2: Preview + Session Name */}
      {(h.directFile || h.selectedMobileFile || h.selectedDocumentId) && (
        <div className="grid grid-cols-1 lg:grid-cols-5 gap-6">
          {/* Document Preview */}
          <div className={`${glassCard} rounded-2xl p-6 lg:col-span-3`}>
            <div className="flex items-center justify-between mb-4">
              <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
                Vorschau
              </h2>
              <button
                onClick={() => h.setShowFullPreview(true)}
                className={`px-3 py-1.5 rounded-lg text-sm font-medium transition-all flex items-center gap-2 ${
                  isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-700'
                }`}
              >
                <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 21l-6-6m2-5a7 7 0 11-14 0 7 7 0 0114 0zM10 7v3m0 0v3m0-3h3m-3 0H7" />
                </svg>
                Originalgroesse
              </button>
            </div>
            <div className={`max-h-[60vh] overflow-auto rounded-xl border ${isDark ? 'border-white/10' : 'border-black/10'}`}>
              {h.directFile?.type.startsWith('image/') && h.directFilePreview && (
                <img src={h.directFilePreview} alt="Vorschau" className="w-full h-auto" />
              )}
              {h.directFile?.type === 'application/pdf' && h.directFilePreview && (
                <iframe src={h.directFilePreview} className="w-full border-0 rounded-xl" style={{ height: '60vh' }} />
              )}
              {h.selectedMobileFile && !h.directFile && (
                h.selectedMobileFile.type.startsWith('image/')
                  ? <img src={h.selectedMobileFile.dataUrl} alt="Vorschau" className="w-full h-auto" />
                  : <iframe src={h.selectedMobileFile.dataUrl} className="w-full border-0 rounded-xl" style={{ height: '60vh' }} />
              )}
              {h.selectedDocumentId && !h.directFile && !h.selectedMobileFile && (() => {
                const doc = h.storedDocuments.find(d => d.id === h.selectedDocumentId)
                if (!doc?.url) return <p className={`p-8 text-center ${isDark ? 'text-white/40' : 'text-slate-400'}`}>Keine Vorschau verfuegbar</p>
                return doc.type.startsWith('image/')
                  ? <img src={doc.url} alt="Vorschau" className="w-full h-auto" />
                  : <iframe src={doc.url} className="w-full border-0 rounded-xl" style={{ height: '60vh' }} />
              })()}
            </div>
          </div>

          {/* Session Name + Start */}
          <div className={`${glassCard} rounded-2xl p-6 lg:col-span-2 flex flex-col`}>
            <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
              2. Session benennen
            </h2>
            <input
              type="text"
              value={h.sessionName}
              onChange={(e) => { h.setSessionName(e.target.value); h.setError(null) }}
              placeholder="z.B. Englisch Klasse 7 - Unit 3"
              className={`w-full px-4 py-3 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500 mb-4`}
              autoFocus
            />
            <p className={`text-sm mb-6 ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
              Benennen Sie die Session z.B. nach dem Schulbuch-Kapitel, damit Sie sie spaeter wiederfinden.
            </p>
            <div className="flex-1" />
            <button
              onClick={() => {
                if (!h.sessionName.trim()) {
                  h.setError('Bitte geben Sie einen Session-Namen ein (z.B. "Englisch Klasse 7 - Unit 3")')
                  return
                }
                h.startSession()
              }}
              disabled={h.isCreatingSession || !h.sessionName.trim()}
              className="w-full px-6 py-4 bg-gradient-to-r from-purple-500 to-pink-500 text-white rounded-2xl font-semibold text-lg disabled:opacity-50 hover:shadow-xl hover:shadow-purple-500/30 transition-all transform hover:scale-105"
            >
              {h.isCreatingSession ? 'Verarbeite...' : 'Weiter →'}
            </button>
          </div>
        </div>
      )}
    </div>
  )
}
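The preview pane in UploadScreen repeats one decision three times: images render as an `<img>`, everything else (PDFs) falls back to an `<iframe>`. A sketch of that branch as a standalone function (hypothetical helper, mirroring the conditions above rather than copying app code):

```typescript
// Images get <img>; any other MIME type (here: PDF) gets an <iframe>.
function previewElement(mimeType: string): 'img' | 'iframe' {
  return mimeType.startsWith('image/') ? 'img' : 'iframe'
}
```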
312  studio-v2/app/vocab-worksheet/components/VocabularyTab.tsx  Normal file
@@ -0,0 +1,312 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook, IpaMode, SyllableMode } from '../types'
import { getApiBase } from '../constants'

export function VocabularyTab({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard, glassInput } = h
  const extras = h.getAllExtraColumns()
  const baseCols = 3 + extras.length
  const gridCols = `14px 32px 36px repeat(${baseCols}, 1fr) 32px`

  return (
    <div className="flex flex-col lg:flex-row gap-4" style={{ height: 'calc(100vh - 240px)', minHeight: '500px' }}>
      {/* Left: Original pages */}
      <div className={`${glassCard} rounded-2xl p-4 lg:w-1/3 flex flex-col overflow-hidden`}>
        <h2 className={`text-sm font-semibold mb-3 flex-shrink-0 ${isDark ? 'text-white/70' : 'text-slate-600'}`}>
          Original ({(() => { const pp = h.selectedPages.length > 0 ? h.selectedPages : [...new Set(h.vocabulary.map(v => (v.source_page || 1) - 1))]; return pp.length; })()} Seiten)
        </h2>
        <div className="flex-1 overflow-y-auto space-y-3">
          {(() => {
            const processedPageIndices = h.selectedPages.length > 0
              ? h.selectedPages
              : [...new Set(h.vocabulary.map(v => (v.source_page || 1) - 1))].sort((a, b) => a - b)

            const apiBase = getApiBase()
            const pagesToShow = processedPageIndices
              .filter(idx => idx >= 0)
              .map(idx => ({
                idx,
                src: h.session ? `${apiBase}/api/v1/vocab/sessions/${h.session.id}/pdf-page-image/${idx}` : null,
              }))
              .filter(t => t.src !== null) as { idx: number; src: string }[]

            if (pagesToShow.length > 0) {
              return pagesToShow.map(({ idx, src }) => (
                <div key={idx} className={`relative rounded-xl overflow-hidden border ${isDark ? 'border-white/10' : 'border-black/10'}`}>
                  <div className={`absolute top-2 left-2 px-2 py-0.5 rounded-lg text-xs font-medium z-10 ${isDark ? 'bg-black/60 text-white' : 'bg-white/90 text-slate-700'}`}>
                    S. {idx + 1}
                  </div>
                  <img src={src} alt={`Seite ${idx + 1}`} className="w-full h-auto" />
                </div>
              ))
            }
            if (h.uploadedImage) {
              return (
                <div className={`relative rounded-xl overflow-hidden border ${isDark ? 'border-white/10' : 'border-black/10'}`}>
                  <img src={h.uploadedImage} alt="Arbeitsblatt" className="w-full h-auto" />
                </div>
              )
            }
            return (
              <div className={`flex-1 flex items-center justify-center py-12 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
                <div className="text-center">
                  <svg className="w-12 h-12 mx-auto mb-2 opacity-50" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                    <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M4 16l4.586-4.586a2 2 0 012.828 0L16 16m-2-2l1.586-1.586a2 2 0 012.828 0L20 14m-6-6h.01M6 20h12a2 2 0 002-2V6a2 2 0 00-2-2H6a2 2 0 00-2 2v12a2 2 0 002 2z" />
                  </svg>
                  <p className="text-xs">Kein Bild verfuegbar</p>
                </div>
              </div>
            )
          })()}
        </div>
      </div>

      {/* Right: Vocabulary table */}
      <div className={`${glassCard} rounded-2xl p-4 lg:w-2/3 flex flex-col overflow-hidden`}>
        <div className="flex items-center justify-between mb-3 flex-shrink-0">
          <h2 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            Vokabeln ({h.vocabulary.length})
          </h2>
          <div className="flex items-center gap-2">
            {/* IPA mode */}
            <select
              value={h.ipaMode}
              onChange={(e) => {
                const newIpa = e.target.value as IpaMode
                h.setIpaMode(newIpa)
                h.reprocessPages(newIpa, h.syllableMode)
              }}
              className={`px-2 py-1.5 text-xs rounded-md border ${isDark ? 'border-white/20 bg-white/10 text-white' : 'border-gray-200 bg-white text-gray-600'}`}
              title="Lautschrift (IPA)"
            >
              <option value="none">IPA: Aus</option>
              <option value="auto">IPA: Auto</option>
              <option value="en">IPA: nur EN</option>
              <option value="de">IPA: nur DE</option>
              <option value="all">IPA: Alle</option>
            </select>
            {/* Syllable mode */}
            <select
              value={h.syllableMode}
              onChange={(e) => {
                const newSyl = e.target.value as SyllableMode
                h.setSyllableMode(newSyl)
                h.reprocessPages(h.ipaMode, newSyl)
              }}
              className={`px-2 py-1.5 text-xs rounded-md border ${isDark ? 'border-white/20 bg-white/10 text-white' : 'border-gray-200 bg-white text-gray-600'}`}
              title="Silbentrennung"
            >
              <option value="none">Silben: Aus</option>
              <option value="auto">Silben: Original</option>
              <option value="en">Silben: nur EN</option>
              <option value="de">Silben: nur DE</option>
              <option value="all">Silben: Alle</option>
            </select>
            <button
              onClick={() => h.reprocessPages(h.ipaMode, h.syllableMode)}
              className={`px-3 py-2 rounded-xl text-sm font-medium transition-colors ${isDark ? 'bg-orange-500/20 hover:bg-orange-500/30 text-orange-200 border border-orange-500/30' : 'bg-orange-50 hover:bg-orange-100 text-orange-700 border border-orange-200'}`}
              title="Seiten erneut verarbeiten (OCR + Zeilenmerge)"
            >
              Neu verarbeiten
            </button>
            <button onClick={h.saveVocabulary} className={`px-4 py-2 rounded-xl text-sm font-medium transition-colors ${isDark ? 'bg-white/10 hover:bg-white/20 text-white' : 'bg-slate-100 hover:bg-slate-200 text-slate-900'}`}>
              Speichern
            </button>
            <button onClick={() => h.setActiveTab('worksheet')} className="px-4 py-2 rounded-xl text-sm font-medium bg-gradient-to-r from-purple-500 to-pink-500 text-white hover:shadow-lg transition-all">
              Weiter →
            </button>
          </div>
        </div>

        {/* Error messages for failed pages */}
        {h.processingErrors.length > 0 && (
          <div className={`rounded-xl p-3 mb-3 flex-shrink-0 ${isDark ? 'bg-orange-500/20 text-orange-200 border border-orange-500/30' : 'bg-orange-100 text-orange-700 border border-orange-200'}`}>
            <div className="font-medium mb-1 text-sm">Einige Seiten konnten nicht verarbeitet werden:</div>
            <ul className="text-xs space-y-0.5">
              {h.processingErrors.map((err, idx) => (
                <li key={idx}>• {err}</li>
              ))}
            </ul>
          </div>
        )}

        {/* Processing Progress */}
        {h.currentlyProcessingPage && (
          <div className={`rounded-xl p-3 mb-3 flex-shrink-0 ${isDark ? 'bg-purple-500/20 border border-purple-500/30' : 'bg-purple-100 border border-purple-200'}`}>
            <div className="flex items-center gap-3">
              <div className={`w-4 h-4 border-2 ${isDark ? 'border-purple-300' : 'border-purple-600'} border-t-transparent rounded-full animate-spin`} />
              <div>
                <div className={`text-sm font-medium ${isDark ? 'text-purple-200' : 'text-purple-700'}`}>Verarbeite Seite {h.currentlyProcessingPage}...</div>
                <div className={`text-xs ${isDark ? 'text-purple-300/70' : 'text-purple-600'}`}>
                  {h.successfulPages.length > 0 && `${h.successfulPages.length} Seite(n) fertig • `}
                  {h.vocabulary.length} Vokabeln bisher
                </div>
              </div>
            </div>
          </div>
        )}

        {/* Success info */}
        {!h.currentlyProcessingPage && h.successfulPages.length > 0 && h.failedPages.length === 0 && (
          <div className={`rounded-xl p-2 mb-3 text-xs flex-shrink-0 ${isDark ? 'bg-green-500/20 text-green-200 border border-green-500/30' : 'bg-green-100 text-green-700 border border-green-200'}`}>
            Alle {h.successfulPages.length} Seite(n) erfolgreich verarbeitet - {h.vocabulary.length} Vokabeln insgesamt
          </div>
        )}

        {/* Partial success info */}
        {!h.currentlyProcessingPage && h.successfulPages.length > 0 && h.failedPages.length > 0 && (
          <div className={`rounded-xl p-2 mb-3 text-xs flex-shrink-0 ${isDark ? 'bg-yellow-500/20 text-yellow-200 border border-yellow-500/30' : 'bg-yellow-100 text-yellow-700 border border-yellow-200'}`}>
            {h.successfulPages.length} Seite(n) erfolgreich, {h.failedPages.length} fehlgeschlagen - {h.vocabulary.length} Vokabeln extrahiert
          </div>
        )}

        {h.vocabulary.length === 0 ? (
          <p className={`text-center py-8 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Keine Vokabeln gefunden.</p>
        ) : (
          <div className="flex flex-col flex-1 overflow-hidden">
            {/* Fixed Header */}
            <div className={`flex-shrink-0 grid gap-1 px-2 py-2 text-sm font-medium border-b items-center ${isDark ? 'border-white/10 text-white/60' : 'border-black/10 text-slate-500'}`} style={{ gridTemplateColumns: gridCols }}>
              <div>{/* insert-triangle spacer */}</div>
              <div className="flex items-center justify-center">
                <input
                  type="checkbox"
                  checked={h.vocabulary.length > 0 && h.vocabulary.every(v => v.selected)}
                  onChange={h.toggleAllSelection}
                  className="w-4 h-4 rounded border-gray-300 text-purple-600 focus:ring-purple-500 cursor-pointer"
                  title="Alle auswaehlen"
                />
              </div>
              <div>S.</div>
              <div>Englisch</div>
              <div>Deutsch</div>
              <div>Beispiel</div>
              {extras.map(col => (
                <div key={col.key} className="flex items-center gap-1 group">
                  <span className="truncate">{col.label}</span>
                  <button
                    onClick={() => {
                      const page = Object.entries(h.pageExtraColumns).find(([, cols]) => cols.some(c => c.key === col.key))
                      if (page) h.removeExtraColumn(Number(page[0]), col.key)
                    }}
                    className={`opacity-0 group-hover:opacity-100 transition-opacity ${isDark ? 'text-red-400 hover:text-red-300' : 'text-red-500 hover:text-red-600'}`}
                    title="Spalte entfernen"
                  >
                    <svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" /></svg>
                  </button>
                </div>
              ))}
              <div className="flex items-center justify-center">
                <button
                  onClick={() => h.addExtraColumn(0)}
                  className={`p-0.5 rounded transition-colors ${isDark ? 'hover:bg-white/10 text-white/40 hover:text-white/70' : 'hover:bg-slate-200 text-slate-400 hover:text-slate-600'}`}
                  title="Spalte hinzufuegen"
                >
                  <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 4v16m8-8H4" /></svg>
                </button>
              </div>
            </div>

            {/* Scrollable Content */}
            <div className="flex-1 overflow-y-auto">
              {h.vocabulary.map((entry, index) => (
                <React.Fragment key={entry.id}>
                  <div className={`grid gap-1 px-2 py-1 items-center ${isDark ? 'hover:bg-white/5' : 'hover:bg-black/5'}`} style={{ gridTemplateColumns: gridCols }}>
                    <button
                      onClick={() => h.addVocabularyEntry(index)}
                      className={`w-3.5 h-3.5 flex items-center justify-center opacity-0 hover:opacity-100 transition-opacity ${isDark ? 'text-purple-400' : 'text-purple-500'}`}
                      title="Zeile einfuegen"
                    >
                      <svg className="w-2.5 h-2.5" viewBox="0 0 10 10" fill="currentColor"><polygon points="0,0 10,5 0,10" /></svg>
                    </button>
                    <div className="flex items-center justify-center">
                      <input
                        type="checkbox"
                        checked={entry.selected || false}
                        onChange={() => h.toggleVocabularySelection(entry.id)}
                        className="w-4 h-4 rounded border-gray-300 text-purple-600 focus:ring-purple-500 cursor-pointer"
                      />
                    </div>
                    <div className={`flex items-center justify-center text-xs font-medium rounded ${isDark ? 'bg-white/10 text-white/60' : 'bg-black/10 text-slate-600'}`}>
                      {entry.source_page || '-'}
                    </div>
                    <input
                      type="text"
                      value={entry.english}
                      onChange={(e) => h.updateVocabularyEntry(entry.id, 'english', e.target.value)}
|
||||
className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
|
||||
/>
|
||||
<input
|
||||
type="text"
|
||||
value={entry.german}
|
||||
onChange={(e) => h.updateVocabularyEntry(entry.id, 'german', e.target.value)}
|
||||
className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
|
||||
/>
|
||||
<input
|
||||
type="text"
|
||||
value={entry.example_sentence || ''}
|
||||
onChange={(e) => h.updateVocabularyEntry(entry.id, 'example_sentence', e.target.value)}
|
||||
placeholder="Beispiel"
|
||||
className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
|
||||
/>
|
||||
{extras.map(col => (
|
||||
<input
|
||||
key={col.key}
|
||||
type="text"
|
||||
value={(entry.extras && entry.extras[col.key]) || ''}
|
||||
onChange={(e) => h.updateVocabularyEntry(entry.id, col.key, e.target.value)}
|
||||
placeholder={col.label}
|
||||
className={`px-2 py-1 rounded-lg border text-sm min-w-0 ${glassInput} focus:outline-none focus:ring-1 focus:ring-purple-500`}
|
||||
/>
|
||||
))}
|
||||
<button onClick={() => h.deleteVocabularyEntry(entry.id)} className={`p-1 rounded-lg ${isDark ? 'hover:bg-red-500/20 text-red-400' : 'hover:bg-red-100 text-red-500'}`}>
|
||||
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
|
||||
</svg>
|
||||
</button>
|
||||
</div>
|
||||
</React.Fragment>
|
||||
))}
|
||||
{/* Final insert triangle */}
|
||||
<div className="px-2 py-1">
|
||||
<button
|
||||
onClick={() => h.addVocabularyEntry()}
|
||||
className={`w-3.5 h-3.5 flex items-center justify-center opacity-30 hover:opacity-100 transition-opacity ${isDark ? 'text-purple-400' : 'text-purple-500'}`}
|
||||
title="Zeile am Ende einfuegen"
|
||||
>
|
||||
<svg className="w-2.5 h-2.5" viewBox="0 0 10 10" fill="currentColor"><polygon points="0,0 10,5 0,10" /></svg>
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Footer */}
|
||||
<div className={`flex-shrink-0 pt-2 border-t flex items-center justify-between text-xs ${isDark ? 'border-white/10 text-white/50' : 'border-black/10 text-slate-400'}`}>
|
||||
<span>
|
||||
{h.vocabulary.length} Vokabeln
|
||||
{h.vocabulary.filter(v => v.selected).length > 0 && ` (${h.vocabulary.filter(v => v.selected).length} ausgewaehlt)`}
|
||||
{(() => {
|
||||
const pages = [...new Set(h.vocabulary.map(v => v.source_page).filter(Boolean))].sort((a, b) => (a || 0) - (b || 0))
|
||||
return pages.length > 1 ? ` • Seiten: ${pages.join(', ')}` : ''
|
||||
})()}
|
||||
</span>
|
||||
<button
|
||||
onClick={() => h.addVocabularyEntry()}
|
||||
className={`px-3 py-1 rounded-lg text-xs flex items-center gap-1 transition-colors ${
|
||||
isDark
|
||||
? 'bg-white/10 hover:bg-white/20 text-white/70'
|
||||
: 'bg-slate-100 hover:bg-slate-200 text-slate-600'
|
||||
}`}
|
||||
>
|
||||
<svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 4v16m8-8H4" />
|
||||
</svg>
|
||||
Zeile
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
155 studio-v2/app/vocab-worksheet/components/WorksheetTab.tsx (Normal file)
@@ -0,0 +1,155 @@
'use client'

import React from 'react'
import type { VocabWorksheetHook } from '../types'
import { worksheetFormats, worksheetTypes } from '../constants'

export function WorksheetTab({ h }: { h: VocabWorksheetHook }) {
  const { isDark, glassCard, glassInput } = h

  return (
    <div className={`${glassCard} rounded-2xl p-6`}>
      {/* Step 1: Format Selection */}
      <div className="mb-8">
        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
          1. Vorlage waehlen
        </h2>
        <div className="grid grid-cols-2 gap-4">
          {worksheetFormats.map((format) => (
            <button
              key={format.id}
              onClick={() => h.setSelectedFormat(format.id)}
              className={`p-5 rounded-xl border text-left transition-all ${
                h.selectedFormat === format.id
                  ? (isDark ? 'border-purple-400/50 bg-purple-500/20 ring-2 ring-purple-500/50' : 'border-purple-500 bg-purple-50 ring-2 ring-purple-500/30')
                  : (isDark ? 'border-white/20 hover:border-white/40' : 'border-slate-200 hover:border-slate-300')
              }`}
            >
              <div className="flex items-start gap-3">
                <div className={`w-10 h-10 rounded-lg flex items-center justify-center shrink-0 ${
                  h.selectedFormat === format.id
                    ? (isDark ? 'bg-purple-500/30' : 'bg-purple-200')
                    : (isDark ? 'bg-white/10' : 'bg-slate-100')
                }`}>
                  {format.id === 'standard' ? (
                    <svg className={`w-5 h-5 ${h.selectedFormat === format.id ? 'text-purple-400' : (isDark ? 'text-white/60' : 'text-slate-500')}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
                    </svg>
                  ) : (
                    <svg className={`w-5 h-5 ${h.selectedFormat === format.id ? 'text-purple-400' : (isDark ? 'text-white/60' : 'text-slate-500')}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M4 5a1 1 0 011-1h14a1 1 0 011 1v2a1 1 0 01-1 1H5a1 1 0 01-1-1V5zM4 13a1 1 0 011-1h6a1 1 0 011 1v6a1 1 0 01-1 1H5a1 1 0 01-1-1v-6zM16 13a1 1 0 011-1h2a1 1 0 011 1v6a1 1 0 01-1 1h-2a1 1 0 01-1-1v-6z" />
                    </svg>
                  )}
                </div>
                <div className="flex-1">
                  <div className="flex items-center justify-between">
                    <span className={`font-medium ${isDark ? 'text-white' : 'text-slate-900'}`}>{format.label}</span>
                    {h.selectedFormat === format.id && (
                      <svg className="w-5 h-5 text-purple-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                      </svg>
                    )}
                  </div>
                  <p className={`text-sm mt-1 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{format.description}</p>
                </div>
              </div>
            </button>
          ))}
        </div>
      </div>

      {/* Step 2: Configuration */}
      <div className="mb-6">
        <h2 className={`text-lg font-semibold mb-4 ${isDark ? 'text-white' : 'text-slate-900'}`}>
          2. Arbeitsblatt konfigurieren
        </h2>

        {/* Title */}
        <div className="mb-6">
          <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Titel</label>
          <input
            type="text"
            value={h.worksheetTitle}
            onChange={(e) => h.setWorksheetTitle(e.target.value)}
            placeholder="z.B. Vokabeln Unit 3"
            className={`w-full px-4 py-3 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}
          />
        </div>

        {/* Standard format options */}
        {h.selectedFormat === 'standard' && (
          <>
            <div className="mb-6">
              <label className={`block text-sm font-medium mb-3 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Arbeitsblatt-Typen</label>
              <div className="grid grid-cols-2 gap-3">
                {worksheetTypes.map((type) => (
                  <button
                    key={type.id}
                    onClick={() => h.toggleWorksheetType(type.id)}
                    className={`p-4 rounded-xl border text-left transition-all ${
                      h.selectedTypes.includes(type.id)
                        ? (isDark ? 'border-purple-400/50 bg-purple-500/20' : 'border-purple-500 bg-purple-50')
                        : (isDark ? 'border-white/20 hover:border-white/40' : 'border-slate-200 hover:border-slate-300')
                    }`}
                  >
                    <div className="flex items-center justify-between">
                      <span className={`font-medium ${isDark ? 'text-white' : 'text-slate-900'}`}>{type.label}</span>
                      {h.selectedTypes.includes(type.id) && <svg className="w-5 h-5 text-purple-500" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" /></svg>}
                    </div>
                    <p className={`text-sm mt-1 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>{type.description}</p>
                  </button>
                ))}
              </div>
            </div>

            <div className="grid grid-cols-2 gap-6 mb-6">
              <div>
                <label className={`block text-sm font-medium mb-2 ${isDark ? 'text-white/60' : 'text-slate-500'}`}>Zeilenhoehe</label>
                <select value={h.lineHeight} onChange={(e) => h.setLineHeight(e.target.value)} className={`w-full px-4 py-3 rounded-xl border ${glassInput} focus:outline-none focus:ring-2 focus:ring-purple-500`}>
                  <option value="normal">Normal</option>
                  <option value="large">Gross</option>
                  <option value="extra-large">Extra gross</option>
                </select>
              </div>
              <div className="flex items-center">
                <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
                  <input type="checkbox" checked={h.includeSolutions} onChange={(e) => h.setIncludeSolutions(e.target.checked)} className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500" />
                  <span>Loesungsblatt erstellen</span>
                </label>
              </div>
            </div>
          </>
        )}

        {/* NRU format options */}
        {h.selectedFormat === 'nru' && (
          <div className="space-y-4">
            <div className={`p-4 rounded-xl ${isDark ? 'bg-indigo-500/20 border border-indigo-500/30' : 'bg-indigo-50 border border-indigo-200'}`}>
              <h4 className={`font-medium mb-2 ${isDark ? 'text-indigo-200' : 'text-indigo-700'}`}>NRU-Format Uebersicht:</h4>
              <ul className={`text-sm space-y-1 ${isDark ? 'text-indigo-200/80' : 'text-indigo-600'}`}>
                <li>• <strong>Vokabeln:</strong> 3-Spalten-Tabelle (Englisch | Deutsch leer | Korrektur leer)</li>
                <li>• <strong>Lernsaetze:</strong> Deutscher Satz + 2 leere Zeilen fuer englische Uebersetzung</li>
                <li>• Pro gescannter Seite werden 2 Arbeitsblatt-Seiten erzeugt</li>
              </ul>
            </div>

            <div className="flex items-center">
              <label className={`flex items-center gap-3 cursor-pointer ${isDark ? 'text-white' : 'text-slate-900'}`}>
                <input type="checkbox" checked={h.includeSolutions} onChange={(e) => h.setIncludeSolutions(e.target.checked)} className="w-5 h-5 rounded border-2 border-purple-500 text-purple-500 focus:ring-purple-500" />
                <span>Loesungsblatt erstellen (mit deutschen Uebersetzungen)</span>
              </label>
            </div>
          </div>
        )}
      </div>

      <button
        onClick={h.generateWorksheet}
        disabled={(h.selectedFormat === 'standard' && h.selectedTypes.length === 0) || h.isGenerating}
        className="w-full py-4 bg-gradient-to-r from-purple-500 to-pink-500 text-white rounded-xl font-semibold disabled:opacity-50 hover:shadow-xl hover:shadow-purple-500/30 transition-all"
      >
        {h.isGenerating ? 'Generiere PDF...' : `${h.selectedFormat === 'nru' ? 'NRU-Arbeitsblatt' : 'Arbeitsblatt'} generieren`}
      </button>
    </div>
  )
}
56 studio-v2/app/vocab-worksheet/constants.ts (Normal file)
@@ -0,0 +1,56 @@
import type { OcrPrompts, WorksheetFormat, WorksheetType } from './types'

// API base URL - derived dynamically from the browser host.
// Uses the /klausur-api/ proxy to avoid certificate problems.
export const getApiBase = () => {
  if (typeof window === 'undefined') return 'http://localhost:8086'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8086'
  return `${protocol}//${hostname}/klausur-api`
}

// LocalStorage keys
export const DOCUMENTS_KEY = 'bp_documents'
export const OCR_PROMPTS_KEY = 'bp_ocr_prompts'
export const SESSION_ID_KEY = 'bp_upload_session'

// Worksheet format templates
export const worksheetFormats: { id: WorksheetFormat; label: string; description: string; icon: string }[] = [
  {
    id: 'standard',
    label: 'Standard-Format',
    description: 'Klassisches Arbeitsblatt mit waehlbarer Uebersetzungsrichtung',
    icon: 'document'
  },
  {
    id: 'nru',
    label: 'NRU-Vorlage',
    description: '3-Spalten-Tabelle (EN|DE|Korrektur) + Lernsaetze mit Uebersetzungszeilen',
    icon: 'template'
  },
]

// Default OCR filtering prompts
export const defaultOcrPrompts: OcrPrompts = {
  filterHeaders: true,
  filterFooters: true,
  filterPageNumbers: true,
  customFilter: '',
  headerPatterns: ['Unit', 'Chapter', 'Lesson', 'Kapitel', 'Lektion'],
  footerPatterns: ['zweihundert', 'dreihundert', 'vierhundert', 'Page', 'Seite']
}

export const worksheetTypes: { id: WorksheetType; label: string; description: string }[] = [
  { id: 'en_to_de', label: 'Englisch → Deutsch', description: 'Englische Woerter uebersetzen' },
  { id: 'de_to_en', label: 'Deutsch → Englisch', description: 'Deutsche Woerter uebersetzen' },
  { id: 'copy', label: 'Abschreibuebung', description: 'Woerter mehrfach schreiben' },
  { id: 'gap_fill', label: 'Lueckensaetze', description: 'Saetze mit Luecken ausfuellen' },
]

export const formatFileSize = (bytes: number): string => {
  if (bytes === 0) return '0 B'
  const k = 1024
  const sizes = ['B', 'KB', 'MB', 'GB']
  const i = Math.floor(Math.log(bytes) / Math.log(k))
  return parseFloat((bytes / Math.pow(k, i)).toFixed(1)) + ' ' + sizes[i]
}
File diff suppressed because it is too large
189 studio-v2/app/vocab-worksheet/types.ts (Normal file)
@@ -0,0 +1,189 @@
import { UploadedFile } from '@/components/QRCodeUpload'

export interface VocabularyEntry {
  id: string
  english: string
  german: string
  example_sentence?: string
  example_sentence_gap?: string
  word_type?: string
  source_page?: number
  selected?: boolean
  extras?: Record<string, string>
}

export interface ExtraColumn {
  key: string
  label: string
}

export interface Session {
  id: string
  name: string
  status: string
  vocabulary_count: number
  image_path?: string
  description?: string
  source_language?: string
  target_language?: string
  created_at?: string
}

export interface StoredDocument {
  id: string
  name: string
  type: string
  size: number
  uploadedAt: Date
  url?: string
}

export interface OcrPrompts {
  filterHeaders: boolean
  filterFooters: boolean
  filterPageNumbers: boolean
  customFilter: string
  headerPatterns: string[]
  footerPatterns: string[]
}

export type TabId = 'upload' | 'pages' | 'vocabulary' | 'spreadsheet' | 'worksheet' | 'export' | 'settings'
export type WorksheetType = 'en_to_de' | 'de_to_en' | 'copy' | 'gap_fill'
export type WorksheetFormat = 'standard' | 'nru'
export type IpaMode = 'auto' | 'en' | 'de' | 'all' | 'none'
export type SyllableMode = 'auto' | 'en' | 'de' | 'all' | 'none'

/** Return type of useVocabWorksheet — used as props by all child components */
export interface VocabWorksheetHook {
  // Mounted (SSR guard)
  mounted: boolean

  // Theme
  isDark: boolean
  glassCard: string
  glassInput: string

  // Tab
  activeTab: TabId
  setActiveTab: (tab: TabId) => void

  // Session
  session: Session | null
  sessionName: string
  setSessionName: (name: string) => void
  isCreatingSession: boolean
  error: string | null
  setError: (err: string | null) => void
  extractionStatus: string

  // Existing sessions
  existingSessions: Session[]
  isLoadingSessions: boolean

  // Documents
  storedDocuments: StoredDocument[]
  selectedDocumentId: string | null
  setSelectedDocumentId: (id: string | null) => void

  // Direct file
  directFile: File | null
  setDirectFile: (f: File | null) => void
  directFilePreview: string | null
  showFullPreview: boolean
  setShowFullPreview: (show: boolean) => void
  directFileInputRef: React.RefObject<HTMLInputElement | null>

  // PDF pages
  pdfPageCount: number
  selectedPages: number[]
  pagesThumbnails: string[]
  isLoadingThumbnails: boolean
  excludedPages: number[]

  // Extra columns
  pageExtraColumns: Record<number, ExtraColumn[]>

  // Upload
  uploadedImage: string | null
  isExtracting: boolean

  // Vocabulary
  vocabulary: VocabularyEntry[]

  // Worksheet
  selectedTypes: WorksheetType[]
  worksheetTitle: string
  setWorksheetTitle: (title: string) => void
  includeSolutions: boolean
  setIncludeSolutions: (inc: boolean) => void
  lineHeight: string
  setLineHeight: (lh: string) => void
  selectedFormat: WorksheetFormat
  setSelectedFormat: (f: WorksheetFormat) => void
  ipaMode: IpaMode
  setIpaMode: (m: IpaMode) => void
  syllableMode: SyllableMode
  setSyllableMode: (m: SyllableMode) => void

  // Export
  worksheetId: string | null
  isGenerating: boolean

  // Processing
  processingErrors: string[]
  successfulPages: number[]
  failedPages: number[]
  currentlyProcessingPage: number | null

  // OCR settings
  ocrPrompts: OcrPrompts
  showSettings: boolean
  setShowSettings: (show: boolean) => void

  // QR
  showQRModal: boolean
  setShowQRModal: (show: boolean) => void
  uploadSessionId: string
  mobileUploadedFiles: UploadedFile[]
  selectedMobileFile: UploadedFile | null
  setSelectedMobileFile: (f: UploadedFile | null) => void
  setMobileUploadedFiles: (files: UploadedFile[]) => void

  // OCR Comparison
  showOcrComparison: boolean
  setShowOcrComparison: (show: boolean) => void
  ocrComparePageIndex: number | null
  ocrCompareResult: any
  isComparingOcr: boolean
  ocrCompareError: string | null

  // Handlers
  handleDirectFileSelect: (e: React.ChangeEvent<HTMLInputElement>) => void
  startSession: () => Promise<void>
  processSelectedPages: () => Promise<void>
  togglePageSelection: (idx: number) => void
  selectAllPages: () => void
  selectNoPages: () => void
  excludePage: (idx: number, e: React.MouseEvent) => void
  restoreExcludedPages: () => void
  runOcrComparison: (pageIdx: number) => Promise<void>
  updateVocabularyEntry: (id: string, field: string, value: string) => void
  addExtraColumn: (page: number) => void
  removeExtraColumn: (page: number, key: string) => void
  getExtraColumnsForPage: (page: number) => ExtraColumn[]
  getAllExtraColumns: () => ExtraColumn[]
  deleteVocabularyEntry: (id: string) => void
  toggleVocabularySelection: (id: string) => void
  toggleAllSelection: () => void
  addVocabularyEntry: (atIndex?: number) => void
  saveVocabulary: () => Promise<void>
  generateWorksheet: () => Promise<void>
  downloadPDF: (type: 'worksheet' | 'solution') => void
  toggleWorksheetType: (type: WorksheetType) => void
  resumeSession: (session: Session) => Promise<void>
  resetSession: () => Promise<void>
  deleteSession: (id: string, e: React.MouseEvent) => Promise<void>
  saveOcrPrompts: (prompts: OcrPrompts) => void
  formatFileSize: (bytes: number) => string
  reprocessPages: (ipa: IpaMode, syllable: SyllableMode) => void
}
860 studio-v2/app/vocab-worksheet/useVocabWorksheet.ts (Normal file)
@@ -0,0 +1,860 @@
'use client'
|
||||
|
||||
import { useState, useRef, useEffect } from 'react'
|
||||
import { useTheme } from '@/lib/ThemeContext'
|
||||
import { useLanguage } from '@/lib/LanguageContext'
|
||||
import { useRouter } from 'next/navigation'
|
||||
import { useActivity } from '@/lib/ActivityContext'
|
||||
import type { UploadedFile } from '@/components/QRCodeUpload'
|
||||
|
||||
import type {
|
||||
VocabularyEntry, ExtraColumn, Session, StoredDocument, OcrPrompts,
|
||||
TabId, WorksheetType, WorksheetFormat, IpaMode, SyllableMode,
|
||||
VocabWorksheetHook,
|
||||
} from './types'
|
||||
import {
|
||||
getApiBase, DOCUMENTS_KEY, OCR_PROMPTS_KEY, SESSION_ID_KEY,
|
||||
defaultOcrPrompts, formatFileSize,
|
||||
} from './constants'
|
||||
|
||||
export function useVocabWorksheet(): VocabWorksheetHook {
|
||||
const { isDark } = useTheme()
|
||||
const { t } = useLanguage()
|
||||
const router = useRouter()
|
||||
const { startActivity, completeActivity } = useActivity()
|
||||
const [mounted, setMounted] = useState(false)
|
||||
|
||||
// Tab state
|
||||
const [activeTab, setActiveTab] = useState<TabId>('upload')
|
||||
|
||||
// Session state
|
||||
const [session, setSession] = useState<Session | null>(null)
|
||||
const [sessionName, setSessionName] = useState('')
|
||||
const [isCreatingSession, setIsCreatingSession] = useState(false)
|
||||
const [error, setError] = useState<string | null>(null)
|
||||
const [extractionStatus, setExtractionStatus] = useState<string>('')
|
||||
|
||||
// Existing sessions list
|
||||
const [existingSessions, setExistingSessions] = useState<Session[]>([])
|
||||
const [isLoadingSessions, setIsLoadingSessions] = useState(true)
|
||||
|
||||
// Documents from storage
|
||||
const [storedDocuments, setStoredDocuments] = useState<StoredDocument[]>([])
|
||||
const [selectedDocumentId, setSelectedDocumentId] = useState<string | null>(null)
|
||||
|
||||
// Direct file upload
|
||||
const [directFile, setDirectFile] = useState<File | null>(null)
|
||||
const [directFilePreview, setDirectFilePreview] = useState<string | null>(null)
|
||||
const [showFullPreview, setShowFullPreview] = useState(false)
|
||||
const directFileInputRef = useRef<HTMLInputElement>(null)
|
||||
|
||||
// PDF page selection state
|
||||
const [pdfPageCount, setPdfPageCount] = useState<number>(0)
|
||||
const [selectedPages, setSelectedPages] = useState<number[]>([])
|
||||
const [pagesThumbnails, setPagesThumbnails] = useState<string[]>([])
|
||||
const [isLoadingThumbnails, setIsLoadingThumbnails] = useState(false)
|
||||
const [excludedPages, setExcludedPages] = useState<number[]>([])
|
||||
|
||||
// Dynamic extra columns per source page
|
||||
const [pageExtraColumns, setPageExtraColumns] = useState<Record<number, ExtraColumn[]>>({})
|
||||
|
||||
// Upload state
|
||||
const [uploadedImage, setUploadedImage] = useState<string | null>(null)
|
||||
const [isExtracting, setIsExtracting] = useState(false)
|
||||
const fileInputRef = useRef<HTMLInputElement>(null)
|
||||
|
||||
// Vocabulary state
|
||||
const [vocabulary, setVocabulary] = useState<VocabularyEntry[]>([])
|
||||
|
||||
// Worksheet state
|
||||
const [selectedTypes, setSelectedTypes] = useState<WorksheetType[]>(['en_to_de'])
|
||||
const [worksheetTitle, setWorksheetTitle] = useState('')
|
||||
const [includeSolutions, setIncludeSolutions] = useState(true)
|
||||
const [lineHeight, setLineHeight] = useState('normal')
|
||||
const [selectedFormat, setSelectedFormat] = useState<WorksheetFormat>('standard')
|
||||
const [ipaMode, setIpaMode] = useState<IpaMode>('none')
|
||||
const [syllableMode, setSyllableMode] = useState<SyllableMode>('none')
|
||||
|
||||
// Export state
|
||||
const [worksheetId, setWorksheetId] = useState<string | null>(null)
|
||||
const [isGenerating, setIsGenerating] = useState(false)
|
||||
|
||||
// Processing results
|
||||
const [processingErrors, setProcessingErrors] = useState<string[]>([])
|
||||
const [successfulPages, setSuccessfulPages] = useState<number[]>([])
|
||||
const [failedPages, setFailedPages] = useState<number[]>([])
|
||||
const [currentlyProcessingPage, setCurrentlyProcessingPage] = useState<number | null>(null)
|
||||
const [processingQueue, setProcessingQueue] = useState<number[]>([])
|
||||
|
||||
// OCR Prompts/Settings
|
||||
const [ocrPrompts, setOcrPrompts] = useState<OcrPrompts>(defaultOcrPrompts)
|
||||
const [showSettings, setShowSettings] = useState(false)
|
||||
|
||||
// QR Code Upload
|
||||
const [showQRModal, setShowQRModal] = useState(false)
|
||||
const [uploadSessionId, setUploadSessionId] = useState('')
|
||||
const [mobileUploadedFiles, setMobileUploadedFiles] = useState<UploadedFile[]>([])
|
||||
const [selectedMobileFile, setSelectedMobileFile] = useState<UploadedFile | null>(null)
|
||||
|
||||
// OCR Comparison
|
||||
const [showOcrComparison, setShowOcrComparison] = useState(false)
|
||||
const [ocrComparePageIndex, setOcrComparePageIndex] = useState<number | null>(null)
|
||||
const [ocrCompareResult, setOcrCompareResult] = useState<any>(null)
|
||||
const [isComparingOcr, setIsComparingOcr] = useState(false)
|
||||
const [ocrCompareError, setOcrCompareError] = useState<string | null>(null)
|
||||
|
||||
// --- Effects ---
|
||||
|
||||
// SSR Safety
|
||||
useEffect(() => {
|
||||
setMounted(true)
|
||||
let storedSessionId = localStorage.getItem(SESSION_ID_KEY)
|
||||
if (!storedSessionId) {
|
||||
storedSessionId = `vocab-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`
|
||||
localStorage.setItem(SESSION_ID_KEY, storedSessionId)
|
||||
}
|
||||
setUploadSessionId(storedSessionId)
|
||||
}, [])
|
||||
|
||||
// Load OCR prompts from localStorage
|
||||
useEffect(() => {
|
||||
if (!mounted) return
|
||||
const stored = localStorage.getItem(OCR_PROMPTS_KEY)
|
||||
if (stored) {
|
||||
try {
|
||||
setOcrPrompts({ ...defaultOcrPrompts, ...JSON.parse(stored) })
|
||||
} catch (e) {
|
||||
console.error('Failed to parse OCR prompts:', e)
|
||||
}
|
||||
}
|
||||
}, [mounted])
|
||||
|
||||
// Load documents from localStorage
|
||||
useEffect(() => {
|
||||
if (!mounted) return
|
||||
const stored = localStorage.getItem(DOCUMENTS_KEY)
|
||||
if (stored) {
|
||||
try {
|
||||
const docs = JSON.parse(stored)
|
||||
const imagesDocs = docs.filter((d: StoredDocument) =>
|
||||
d.type?.startsWith('image/') || d.type === 'application/pdf'
|
||||
)
|
||||
setStoredDocuments(imagesDocs)
|
||||
} catch (e) {
|
||||
console.error('Failed to parse stored documents:', e)
|
||||
}
|
||||
}
|
||||
}, [mounted])
|
||||
|
||||
// Load existing sessions from API
|
||||
useEffect(() => {
|
||||
if (!mounted) return
|
||||
const loadSessions = async () => {
|
||||
const API_BASE = getApiBase()
|
||||
try {
|
||||
const res = await fetch(`${API_BASE}/api/v1/vocab/sessions`)
|
||||
if (res.ok) {
|
||||
const sessions = await res.json()
|
||||
setExistingSessions(sessions)
|
||||
}
|
||||
} catch (e) {
|
||||
console.error('Failed to load sessions:', e)
|
||||
} finally {
|
||||
setIsLoadingSessions(false)
|
||||
}
|
||||
}
|
||||
loadSessions()
|
||||
}, [mounted])
|
||||
|
||||
// --- Glassmorphism styles ---
|
||||
|
||||
const glassCard = isDark
|
||||
? 'backdrop-blur-xl bg-white/10 border border-white/20'
|
||||
: 'backdrop-blur-xl bg-white/70 border border-black/10'
|
||||
|
||||
const glassInput = isDark
|
||||
? 'bg-white/10 border-white/20 text-white placeholder-white/40 focus:border-purple-400'
|
||||
: 'bg-white/50 border-black/10 text-slate-900 placeholder-slate-400 focus:border-purple-500'
|
||||
|
||||
// --- Handlers ---
|
||||
|
||||
const saveOcrPrompts = (prompts: OcrPrompts) => {
|
||||
setOcrPrompts(prompts)
|
||||
localStorage.setItem(OCR_PROMPTS_KEY, JSON.stringify(prompts))
|
||||
}
|
||||
|
||||
const handleDirectFileSelect = (e: React.ChangeEvent<HTMLInputElement>) => {
|
||||
const file = e.target.files?.[0]
|
||||
if (!file) return
|
||||
|
||||
setDirectFile(file)
|
||||
setSelectedDocumentId(null)
|
||||
setSelectedMobileFile(null)
|
||||
|
||||
if (file.type.startsWith('image/')) {
|
||||
const reader = new FileReader()
|
||||
reader.onload = (ev) => {
|
||||
setDirectFilePreview(ev.target?.result as string)
|
||||
}
|
||||
reader.readAsDataURL(file)
|
||||
} else if (file.type === 'application/pdf') {
|
||||
setDirectFilePreview(URL.createObjectURL(file))
|
||||
} else {
|
||||
setDirectFilePreview(null)
|
||||
}
|
||||
}
|
||||
|
||||
  const startSession = async () => {
    if (!sessionName.trim()) {
      setError('Bitte geben Sie einen Namen fuer die Session ein.')
      return
    }
    if (!selectedDocumentId && !directFile && !selectedMobileFile) {
      setError('Bitte waehlen Sie ein Dokument aus oder laden Sie eine Datei hoch.')
      return
    }

    setError(null)
    setIsCreatingSession(true)
    setExtractionStatus('Session wird erstellt...')

    const API_BASE = getApiBase()

    try {
      const sessionRes = await fetch(`${API_BASE}/api/v1/vocab/sessions`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          name: sessionName,
          ocr_prompts: ocrPrompts
        }),
      })

      if (!sessionRes.ok) {
        throw new Error('Session konnte nicht erstellt werden')
      }

      const sessionData = await sessionRes.json()
      setSession(sessionData)
      setWorksheetTitle(sessionName)

      startActivity('vocab_extraction', { description: sessionName })

      let file: File
      let isPdf = false

      if (directFile) {
        file = directFile
        isPdf = directFile.type === 'application/pdf'
      } else if (selectedMobileFile) {
        isPdf = selectedMobileFile.type === 'application/pdf'
        const base64Data = selectedMobileFile.dataUrl.split(',')[1]
        const byteCharacters = atob(base64Data)
        const byteNumbers = new Array(byteCharacters.length)
        for (let i = 0; i < byteCharacters.length; i++) {
          byteNumbers[i] = byteCharacters.charCodeAt(i)
        }
        const byteArray = new Uint8Array(byteNumbers)
        const blob = new Blob([byteArray], { type: selectedMobileFile.type })
        file = new File([blob], selectedMobileFile.name, { type: selectedMobileFile.type })
      } else {
        const selectedDoc = storedDocuments.find(d => d.id === selectedDocumentId)
        if (!selectedDoc || !selectedDoc.url) {
          throw new Error('Das ausgewaehlte Dokument ist nicht verfuegbar.')
        }

        isPdf = selectedDoc.type === 'application/pdf'

        const base64Data = selectedDoc.url.split(',')[1]
        const byteCharacters = atob(base64Data)
        const byteNumbers = new Array(byteCharacters.length)
        for (let i = 0; i < byteCharacters.length; i++) {
          byteNumbers[i] = byteCharacters.charCodeAt(i)
        }
        const byteArray = new Uint8Array(byteNumbers)
        const blob = new Blob([byteArray], { type: selectedDoc.type })
        file = new File([blob], selectedDoc.name, { type: selectedDoc.type })
      }

      if (isPdf) {
        setExtractionStatus('PDF wird hochgeladen...')

        const formData = new FormData()
        formData.append('file', file)

        const pdfInfoRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${sessionData.id}/upload-pdf-info`, {
          method: 'POST',
          body: formData,
        })

        if (!pdfInfoRes.ok) {
          throw new Error('PDF konnte nicht verarbeitet werden')
        }

        const pdfInfo = await pdfInfoRes.json()
        setPdfPageCount(pdfInfo.page_count)
        setSelectedPages(Array.from({ length: pdfInfo.page_count }, (_, i) => i))

        setActiveTab('pages')
        setExtractionStatus(`${pdfInfo.page_count} Seiten erkannt. Vorschau wird geladen...`)
        setIsLoadingThumbnails(true)

        const thumbnails: string[] = []
        for (let i = 0; i < pdfInfo.page_count; i++) {
          try {
            const thumbRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${sessionData.id}/pdf-thumbnail/${i}?hires=true`)
            if (thumbRes.ok) {
              const blob = await thumbRes.blob()
              thumbnails.push(URL.createObjectURL(blob))
            }
          } catch (e) {
            console.error(`Failed to load thumbnail for page ${i}`, e)
          }
        }

        setPagesThumbnails(thumbnails)
        setIsLoadingThumbnails(false)
        setExtractionStatus(`${pdfInfo.page_count} Seiten bereit. Waehlen Sie die zu verarbeitenden Seiten.`)

      } else {
        setExtractionStatus('KI analysiert das Bild... (kann 30-60 Sekunden dauern)')

        const formData = new FormData()
        formData.append('file', file)

        const uploadRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${sessionData.id}/upload`, {
          method: 'POST',
          body: formData,
        })

        if (!uploadRes.ok) {
          throw new Error('Bild konnte nicht verarbeitet werden')
        }

        const uploadData = await uploadRes.json()
        setSession(prev => prev ? { ...prev, status: 'extracted', vocabulary_count: uploadData.vocabulary_count } : null)

        const vocabRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${sessionData.id}/vocabulary`)
        if (vocabRes.ok) {
          const vocabData = await vocabRes.json()
          setVocabulary(vocabData.vocabulary || [])
          setExtractionStatus(`${vocabData.vocabulary?.length || 0} Vokabeln gefunden!`)
        }

        await new Promise(r => setTimeout(r, 1000))
        setActiveTab('vocabulary')
      }

    } catch (error) {
      console.error('Session start failed:', error)
      setError(error instanceof Error ? error.message : 'Ein Fehler ist aufgetreten')
      setExtractionStatus('')
      setSession(null)
    } finally {
      setIsCreatingSession(false)
    }
  }

  const processSinglePage = async (pageIndex: number, ipa: IpaMode, syllable: SyllableMode): Promise<{ success: boolean; vocabulary: VocabularyEntry[]; error?: string }> => {
    const API_BASE = getApiBase()

    try {
      const res = await fetch(`${API_BASE}/api/v1/vocab/sessions/${session!.id}/process-single-page/${pageIndex}?ipa_mode=${ipa}&syllable_mode=${syllable}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ ocr_prompts: ocrPrompts }),
      })

      if (!res.ok) {
        const errBody = await res.json().catch(() => ({}))
        const detail = errBody.detail || `HTTP ${res.status}`
        return { success: false, vocabulary: [], error: `Seite ${pageIndex + 1}: ${detail}` }
      }

      const data = await res.json()

      if (!data.success) {
        return { success: false, vocabulary: [], error: data.error || `Seite ${pageIndex + 1}: Unbekannter Fehler` }
      }

      return { success: true, vocabulary: data.vocabulary || [] }
    } catch (e) {
      return { success: false, vocabulary: [], error: `Seite ${pageIndex + 1}: ${e instanceof Error ? e.message : 'Netzwerkfehler'}` }
    }
  }

  const processSelectedPages = async () => {
    if (!session || selectedPages.length === 0) return

    const pagesToProcess = [...selectedPages].sort((a, b) => a - b)

    setIsExtracting(true)
    setProcessingErrors([])
    setSuccessfulPages([])
    setFailedPages([])
    setProcessingQueue(pagesToProcess)
    setVocabulary([])

    setActiveTab('vocabulary')

    const API_BASE = getApiBase()
    const errors: string[] = []
    const successful: number[] = []
    const failed: number[] = []

    for (let i = 0; i < pagesToProcess.length; i++) {
      const pageIndex = pagesToProcess[i]
      setCurrentlyProcessingPage(pageIndex + 1)
      setExtractionStatus(`Verarbeite Seite ${pageIndex + 1} von ${pagesToProcess.length}... (kann 30-60 Sekunden dauern)`)

      const result = await processSinglePage(pageIndex, ipaMode, syllableMode)

      if (result.success) {
        successful.push(pageIndex + 1)
        setSuccessfulPages([...successful])
        setVocabulary(prev => [...prev, ...result.vocabulary])
        setExtractionStatus(`Seite ${pageIndex + 1} fertig: ${result.vocabulary.length} Vokabeln gefunden`)
      } else {
        failed.push(pageIndex + 1)
        setFailedPages([...failed])
        if (result.error) {
          errors.push(result.error)
          setProcessingErrors([...errors])
        }
        setExtractionStatus(`Seite ${pageIndex + 1} fehlgeschlagen`)
      }

      await new Promise(r => setTimeout(r, 500))
    }

    setCurrentlyProcessingPage(null)
    setProcessingQueue([])
    setIsExtracting(false)

    if (successful.length === pagesToProcess.length) {
      setExtractionStatus(`Fertig! Alle ${successful.length} Seiten verarbeitet.`)
    } else if (successful.length > 0) {
      setExtractionStatus(`${successful.length} von ${pagesToProcess.length} Seiten verarbeitet. ${failed.length} fehlgeschlagen.`)
    } else {
      setExtractionStatus('Alle Seiten fehlgeschlagen.')
    }

    // Reload thumbnails for processed pages (server may have rotated them)
    if (successful.length > 0 && session) {
      const updatedThumbs = [...pagesThumbnails]
      for (const pageNum of successful) {
        const idx = pageNum - 1
        try {
          const thumbRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${session.id}/pdf-thumbnail/${idx}?hires=true&t=${Date.now()}`)
          if (thumbRes.ok) {
            const blob = await thumbRes.blob()
            if (updatedThumbs[idx]) URL.revokeObjectURL(updatedThumbs[idx])
            updatedThumbs[idx] = URL.createObjectURL(blob)
          }
        } catch (e) {
          console.error(`Failed to refresh thumbnail for page ${pageNum}`, e)
        }
      }
      setPagesThumbnails(updatedThumbs)
    }

    setSession(prev => prev ? { ...prev, status: 'extracted' } : null)
  }

  const togglePageSelection = (pageIndex: number) => {
    setSelectedPages(prev =>
      prev.includes(pageIndex)
        ? prev.filter(p => p !== pageIndex)
        : [...prev, pageIndex].sort((a, b) => a - b)
    )
  }

  const selectAllPages = () => setSelectedPages(
    Array.from({ length: pdfPageCount }, (_, i) => i).filter(p => !excludedPages.includes(p))
  )
  const selectNoPages = () => setSelectedPages([])

  const excludePage = (pageIndex: number, e: React.MouseEvent) => {
    e.stopPropagation()
    setExcludedPages(prev => [...prev, pageIndex])
    setSelectedPages(prev => prev.filter(p => p !== pageIndex))
  }

  const restoreExcludedPages = () => {
    setExcludedPages([])
  }

  const runOcrComparison = async (pageIndex: number) => {
    if (!session) return

    setOcrComparePageIndex(pageIndex)
    setShowOcrComparison(true)
    setIsComparingOcr(true)
    setOcrCompareError(null)
    setOcrCompareResult(null)

    const API_BASE = getApiBase()

    try {
      const res = await fetch(`${API_BASE}/api/v1/vocab/sessions/${session.id}/compare-ocr/${pageIndex}`, {
        method: 'POST',
      })

      if (!res.ok) {
        throw new Error(`HTTP ${res.status}`)
      }

      const data = await res.json()
      setOcrCompareResult(data)
    } catch (e) {
      setOcrCompareError(e instanceof Error ? e.message : 'Vergleich fehlgeschlagen')
    } finally {
      setIsComparingOcr(false)
    }
  }

  const updateVocabularyEntry = (id: string, field: string, value: string) => {
    setVocabulary(prev => prev.map(v => {
      if (v.id !== id) return v
      if (field === 'english' || field === 'german' || field === 'example_sentence' || field === 'word_type') {
        return { ...v, [field]: value }
      }
      return { ...v, extras: { ...(v.extras || {}), [field]: value } }
    }))
  }

  const addExtraColumn = (sourcePage: number) => {
    const label = prompt('Spaltenname:')
    if (!label || !label.trim()) return
    const key = `extra_${Date.now()}`
    setPageExtraColumns(prev => ({
      ...prev,
      [sourcePage]: [...(prev[sourcePage] || []), { key, label: label.trim() }],
    }))
  }

  const removeExtraColumn = (sourcePage: number, key: string) => {
    setPageExtraColumns(prev => ({
      ...prev,
      [sourcePage]: (prev[sourcePage] || []).filter(c => c.key !== key),
    }))
    setVocabulary(prev => prev.map(v => {
      if (!v.extras || !(key in v.extras)) return v
      const { [key]: _, ...rest } = v.extras
      return { ...v, extras: rest }
    }))
  }

  const getExtraColumnsForPage = (sourcePage: number): ExtraColumn[] => {
    const global = pageExtraColumns[0] || []
    const pageSpecific = pageExtraColumns[sourcePage] || []
    return [...global, ...pageSpecific]
  }

  const getAllExtraColumns = (): ExtraColumn[] => {
    const seen = new Set<string>()
    const result: ExtraColumn[] = []
    for (const cols of Object.values(pageExtraColumns)) {
      for (const col of cols) {
        if (!seen.has(col.key)) {
          seen.add(col.key)
          result.push(col)
        }
      }
    }
    return result
  }

  const deleteVocabularyEntry = (id: string) => {
    setVocabulary(prev => prev.filter(v => v.id !== id))
  }

  const toggleVocabularySelection = (id: string) => {
    setVocabulary(prev => prev.map(v =>
      v.id === id ? { ...v, selected: !v.selected } : v
    ))
  }

  const toggleAllSelection = () => {
    const allSelected = vocabulary.every(v => v.selected)
    setVocabulary(prev => prev.map(v => ({ ...v, selected: !allSelected })))
  }

  const addVocabularyEntry = (atIndex?: number) => {
    const newEntry: VocabularyEntry = {
      id: `new-${Date.now()}`,
      english: '',
      german: '',
      example_sentence: '',
      selected: true
    }
    setVocabulary(prev => {
      if (atIndex === undefined) {
        return [...prev, newEntry]
      }
      const newList = [...prev]
      newList.splice(atIndex, 0, newEntry)
      return newList
    })
  }

  const saveVocabulary = async () => {
    if (!session) return
    const API_BASE = getApiBase()

    try {
      await fetch(`${API_BASE}/api/v1/vocab/sessions/${session.id}/vocabulary`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ vocabulary }),
      })
    } catch (error) {
      console.error('Failed to save vocabulary:', error)
    }
  }

  const generateWorksheet = async () => {
    if (!session) return
    if (selectedFormat === 'standard' && selectedTypes.length === 0) return

    setIsGenerating(true)
    const API_BASE = getApiBase()

    try {
      await saveVocabulary()

      let res: Response

      if (selectedFormat === 'nru') {
        res = await fetch(`${API_BASE}/api/v1/vocab/sessions/${session.id}/generate-nru`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            title: worksheetTitle || session.name,
            include_solutions: includeSolutions,
          }),
        })
      } else {
        res = await fetch(`${API_BASE}/api/v1/vocab/sessions/${session.id}/generate`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            worksheet_types: selectedTypes,
            title: worksheetTitle || session.name,
            include_solutions: includeSolutions,
            line_height: lineHeight,
          }),
        })
      }

      if (res.ok) {
        const data = await res.json()
        setWorksheetId(data.worksheet_id || data.id)
        setActiveTab('export')
        completeActivity({ vocabCount: vocabulary.length })
      }
    } catch (error) {
      console.error('Failed to generate worksheet:', error)
    } finally {
      setIsGenerating(false)
    }
  }

  const downloadPDF = (type: 'worksheet' | 'solution') => {
    if (!worksheetId) return
    const API_BASE = getApiBase()
    const endpoint = type === 'worksheet' ? 'pdf' : 'solution'
    window.open(`${API_BASE}/api/v1/vocab/worksheets/${worksheetId}/${endpoint}`, '_blank')
  }

  const toggleWorksheetType = (type: WorksheetType) => {
    setSelectedTypes(prev =>
      prev.includes(type) ? prev.filter(t => t !== type) : [...prev, type]
    )
  }

  const resumeSession = async (existingSession: Session) => {
    setError(null)
    setExtractionStatus('Session wird geladen...')

    const API_BASE = getApiBase()

    try {
      const sessionRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${existingSession.id}`)
      if (!sessionRes.ok) throw new Error('Session nicht gefunden')
      const sessionData = await sessionRes.json()
      setSession(sessionData)
      setWorksheetTitle(sessionData.name)

      if (sessionData.status === 'extracted' || sessionData.status === 'completed') {
        const vocabRes = await fetch(`${API_BASE}/api/v1/vocab/sessions/${existingSession.id}/vocabulary`)
        if (vocabRes.ok) {
          const vocabData = await vocabRes.json()
          setVocabulary(vocabData.vocabulary || [])
        }
        setActiveTab('vocabulary')
        setExtractionStatus('')
      } else if (sessionData.status === 'pending') {
        setActiveTab('upload')
        setExtractionStatus('Diese Session hat noch keine Vokabeln. Bitte laden Sie ein Dokument hoch.')
      } else {
        setActiveTab('vocabulary')
        setExtractionStatus('')
      }

    } catch (error) {
      console.error('Failed to resume session:', error)
      setError(error instanceof Error ? error.message : 'Fehler beim Laden der Session')
      setExtractionStatus('')
    }
  }

  const resetSession = async () => {
    setSession(null)
    setSessionName('')
    setVocabulary([])
    setUploadedImage(null)
    setWorksheetId(null)
    setSelectedDocumentId(null)
    setDirectFile(null)
    setDirectFilePreview(null)
    setShowFullPreview(false)
    setPdfPageCount(0)
    setSelectedPages([])
    setPagesThumbnails([])
    setExcludedPages([])
    setActiveTab('upload')
    setError(null)
    setExtractionStatus('')

    const API_BASE = getApiBase()
    try {
      const res = await fetch(`${API_BASE}/api/v1/vocab/sessions`)
      if (res.ok) {
        const sessions = await res.json()
        setExistingSessions(sessions)
      }
    } catch (e) {
      console.error('Failed to reload sessions:', e)
    }
  }

  const deleteSession = async (sessionId: string, e: React.MouseEvent) => {
    e.stopPropagation()
    if (!confirm('Session wirklich loeschen? Diese Aktion kann nicht rueckgaengig gemacht werden.')) {
      return
    }

    const API_BASE = getApiBase()
    try {
      const res = await fetch(`${API_BASE}/api/v1/vocab/sessions/${sessionId}`, {
        method: 'DELETE',
      })
      if (res.ok) {
        setExistingSessions(prev => prev.filter(s => s.id !== sessionId))
      }
    } catch (e) {
      console.error('Failed to delete session:', e)
    }
  }

  // Reprocess all successful pages with new IPA/syllable modes
  const reprocessPages = (ipa: IpaMode, syllable: SyllableMode) => {
    if (!session) return

    // Determine pages to reprocess: use successfulPages if available,
    // otherwise derive from vocabulary source_page or selectedPages
    let pagesToReprocess: number[]
    if (successfulPages.length > 0) {
      pagesToReprocess = successfulPages.map(p => p - 1)
    } else if (vocabulary.length > 0) {
      // Derive from vocabulary entries' source_page (1-indexed → 0-indexed)
      const pageSet = new Set(vocabulary.map(v => (v.source_page || 1) - 1))
      pagesToReprocess = [...pageSet].sort((a, b) => a - b)
    } else if (selectedPages.length > 0) {
      pagesToReprocess = [...selectedPages]
    } else {
      // Fallback: try page 0
      pagesToReprocess = [0]
    }

    if (pagesToReprocess.length === 0) return

    setIsExtracting(true)
    setExtractionStatus('Verarbeite mit neuen Einstellungen...')
    const API_BASE = getApiBase()

    ;(async () => {
      const allVocab: VocabularyEntry[] = []
      for (const pageIndex of pagesToReprocess) {
        try {
          const res = await fetch(`${API_BASE}/api/v1/vocab/sessions/${session.id}/process-single-page/${pageIndex}?ipa_mode=${ipa}&syllable_mode=${syllable}`, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ ocr_prompts: ocrPrompts }),
          })
          if (res.ok) {
            const data = await res.json()
            if (data.vocabulary) allVocab.push(...data.vocabulary)
          }
        } catch {}
      }
      setVocabulary(allVocab)
      setIsExtracting(false)
      setExtractionStatus(`${allVocab.length} Vokabeln mit neuen Einstellungen`)
    })()
  }

  return {
    // Mounted
    mounted,
    // Theme
    isDark, glassCard, glassInput,
    // Tab
    activeTab, setActiveTab,
    // Session
    session, sessionName, setSessionName, isCreatingSession, error, setError, extractionStatus,
    // Existing sessions
    existingSessions, isLoadingSessions,
    // Documents
    storedDocuments, selectedDocumentId, setSelectedDocumentId,
    // Direct file
    directFile, setDirectFile, directFilePreview, showFullPreview, setShowFullPreview, directFileInputRef,
    // PDF pages
    pdfPageCount, selectedPages, pagesThumbnails, isLoadingThumbnails, excludedPages,
    // Extra columns
    pageExtraColumns,
    // Upload
    uploadedImage, isExtracting,
    // Vocabulary
    vocabulary,
    // Worksheet
    selectedTypes, worksheetTitle, setWorksheetTitle,
    includeSolutions, setIncludeSolutions,
    lineHeight, setLineHeight,
    selectedFormat, setSelectedFormat,
    ipaMode, setIpaMode, syllableMode, setSyllableMode,
    // Export
    worksheetId, isGenerating,
    // Processing
    processingErrors, successfulPages, failedPages, currentlyProcessingPage,
    // OCR settings
    ocrPrompts, showSettings, setShowSettings,
    // QR
    showQRModal, setShowQRModal, uploadSessionId,
    mobileUploadedFiles, selectedMobileFile, setSelectedMobileFile, setMobileUploadedFiles,
    // OCR Comparison
    showOcrComparison, setShowOcrComparison,
    ocrComparePageIndex, ocrCompareResult, isComparingOcr, ocrCompareError,
    // Handlers
    handleDirectFileSelect, startSession, processSelectedPages,
    togglePageSelection, selectAllPages, selectNoPages, excludePage, restoreExcludedPages,
    runOcrComparison,
    updateVocabularyEntry, addExtraColumn, removeExtraColumn,
    getExtraColumnsForPage, getAllExtraColumns,
    deleteVocabularyEntry, toggleVocabularySelection, toggleAllSelection, addVocabularyEntry,
    saveVocabulary, generateWorksheet, downloadPDF, toggleWorksheetType,
    resumeSession, resetSession, deleteSession,
    saveOcrPrompts, formatFileSize, reprocessPages,
  }
}
200
studio-v2/components/dashboard/LearningProgress.tsx
Normal file
@@ -0,0 +1,200 @@
'use client'

import React, { useState, useEffect } from 'react'
import Link from 'next/link'
import { ProgressRing } from '@/components/gamification/ProgressRing'
import { CrownBadge } from '@/components/gamification/CrownBadge'

interface UnitProgress {
  unit_id: string
  coins: number
  crowns: number
  streak_days: number
  last_activity: string | null
  exercises: {
    flashcards?: { completed: number; correct: number; incorrect: number }
    quiz?: { completed: number; correct: number; incorrect: number }
    type?: { completed: number; correct: number; incorrect: number }
    story?: { generated: number }
  }
}

interface LearningUnit {
  id: string
  label: string
  meta: string
  status: string
}

interface LearningProgressProps {
  isDark: boolean
  glassCard: string
}

function getBackendUrl() {
  if (typeof window === 'undefined') return 'http://localhost:8001'
  const { hostname, protocol } = window.location
  if (hostname === 'localhost') return 'http://localhost:8001'
  return `${protocol}//${hostname}:8001`
}

export function LearningProgress({ isDark, glassCard }: LearningProgressProps) {
  const [units, setUnits] = useState<LearningUnit[]>([])
  const [progress, setProgress] = useState<UnitProgress[]>([])
  const [isLoading, setIsLoading] = useState(true)

  useEffect(() => {
    loadData()
  }, [])

  const loadData = async () => {
    setIsLoading(true)
    try {
      const [unitsResp, progressResp] = await Promise.all([
        fetch(`${getBackendUrl()}/api/learning-units/`),
        fetch(`${getBackendUrl()}/api/progress/`),
      ])

      if (unitsResp.ok) setUnits(await unitsResp.json())
      if (progressResp.ok) setProgress(await progressResp.json())
    } catch (err) {
      console.error('Failed to load learning data:', err)
    } finally {
      setIsLoading(false)
    }
  }

  const totalCoins = progress.reduce((sum, p) => sum + (p.coins || 0), 0)
  const totalCrowns = progress.reduce((sum, p) => sum + (p.crowns || 0), 0)
  const maxStreak = Math.max(0, ...progress.map((p) => p.streak_days || 0))

  const totalCorrect = progress.reduce((sum, p) => {
    const ex = p.exercises || {}
    return sum +
      (ex.flashcards?.correct || 0) +
      (ex.quiz?.correct || 0) +
      (ex.type?.correct || 0)
  }, 0)

  const totalAnswered = progress.reduce((sum, p) => {
    const ex = p.exercises || {}
    return sum +
      (ex.flashcards?.completed || 0) +
      (ex.quiz?.completed || 0) +
      (ex.type?.completed || 0)
  }, 0)

  const accuracy = totalAnswered > 0 ? Math.round((totalCorrect / totalAnswered) * 100) : 0

  if (isLoading) {
    return (
      <div className={`${glassCard} rounded-2xl p-6`}>
        <div className="flex items-center justify-center py-8">
          <div className={`w-6 h-6 border-2 ${isDark ? 'border-blue-400' : 'border-blue-600'} border-t-transparent rounded-full animate-spin`} />
        </div>
      </div>
    )
  }

  if (units.length === 0) {
    return (
      <div className={`${glassCard} rounded-2xl p-6`}>
        <h3 className={`text-lg font-semibold mb-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
          Lernfortschritt
        </h3>
        <p className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
          Noch keine Lernmodule vorhanden.
        </p>
        <Link
          href="/vocab-worksheet"
          className={`inline-block mt-3 text-sm font-medium ${isDark ? 'text-blue-300 hover:text-blue-200' : 'text-blue-600 hover:text-blue-700'}`}
        >
          Vokabeln scannen →
        </Link>
      </div>
    )
  }

  return (
    <div className={`${glassCard} rounded-2xl p-6`}>
      <div className="flex items-center justify-between mb-4">
        <h3 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
          Lernfortschritt
        </h3>
        <Link
          href="/learn"
          className={`text-sm font-medium ${isDark ? 'text-blue-300 hover:text-blue-200' : 'text-blue-600 hover:text-blue-700'}`}
        >
          Alle Module →
        </Link>
      </div>

      {/* Stats Row */}
      <div className="grid grid-cols-4 gap-4 mb-6">
        <ProgressRing
          progress={accuracy}
          size={64}
          strokeWidth={5}
          label="Genauigkeit"
          value={`${accuracy}%`}
          color="#22c55e"
          isDark={isDark}
        />
        <div className="flex flex-col items-center gap-1">
          <span className="text-2xl">🪙</span>
          <span className={`text-lg font-bold ${isDark ? 'text-yellow-300' : 'text-yellow-600'}`}>{totalCoins}</span>
          <span className={`text-xs ${isDark ? 'text-white/40' : 'text-slate-400'}`}>Muenzen</span>
        </div>
        <div className="flex flex-col items-center gap-1">
          <CrownBadge crowns={totalCrowns} size="md" showLabel={false} />
          <span className={`text-lg font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>{totalCrowns}</span>
          <span className={`text-xs ${isDark ? 'text-white/40' : 'text-slate-400'}`}>Kronen</span>
        </div>
        <div className="flex flex-col items-center gap-1">
          <span className="text-2xl">🔥</span>
          <span className={`text-lg font-bold ${isDark ? 'text-orange-300' : 'text-orange-600'}`}>{maxStreak}</span>
          <span className={`text-xs ${isDark ? 'text-white/40' : 'text-slate-400'}`}>Tage-Streak</span>
        </div>
      </div>

      {/* Unit List */}
      <div className="space-y-2">
        {units.slice(0, 3).map((unit) => {
          const unitProgress = progress.find((p) => p.unit_id === unit.id)
          const unitCorrect = unitProgress
            ? (unitProgress.exercises?.flashcards?.correct || 0) +
              (unitProgress.exercises?.quiz?.correct || 0) +
              (unitProgress.exercises?.type?.correct || 0)
            : 0
          const unitTotal = unitProgress
            ? (unitProgress.exercises?.flashcards?.completed || 0) +
              (unitProgress.exercises?.quiz?.completed || 0) +
              (unitProgress.exercises?.type?.completed || 0)
            : 0

          return (
            <Link
              key={unit.id}
              href={`/learn/${unit.id}/flashcards`}
              className={`flex items-center justify-between p-3 rounded-xl transition-colors ${
                isDark ? 'hover:bg-white/5' : 'hover:bg-slate-50'
              }`}
            >
              <div>
                <span className={`text-sm font-medium ${isDark ? 'text-white' : 'text-slate-900'}`}>
                  {unit.label}
                </span>
                <span className={`text-xs ml-2 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
                  {unitTotal > 0 ? `${unitCorrect}/${unitTotal} richtig` : 'Noch nicht geuebt'}
                </span>
              </div>
              <svg className={`w-4 h-4 ${isDark ? 'text-white/30' : 'text-slate-300'}`} fill="none" stroke="currentColor" viewBox="0 0 24 24">
                <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
              </svg>
            </Link>
          )
        })}
      </div>
    </div>
  )
}
91
studio-v2/components/gamification/CoinAnimation.tsx
Normal file
@@ -0,0 +1,91 @@
'use client'

import React, { useState, useEffect, useCallback } from 'react'

interface CoinAnimationProps {
  amount: number
  trigger: number // increment to trigger animation
}

interface FloatingCoin {
  id: number
  x: number
  delay: number
}

export function CoinAnimation({ amount, trigger }: CoinAnimationProps) {
  const [coins, setCoins] = useState<FloatingCoin[]>([])
  const [total, setTotal] = useState(0)
  const [showBounce, setShowBounce] = useState(false)

  useEffect(() => {
    if (trigger === 0) return

    // Create floating coins
    const newCoins: FloatingCoin[] = Array.from({ length: Math.min(amount, 5) }, (_, i) => ({
      id: Date.now() + i,
      x: Math.random() * 60 - 30,
      delay: i * 100,
    }))
    setCoins(newCoins)
    setShowBounce(true)

    // Update total after animation
    const bounceTimer = setTimeout(() => {
      setTotal((prev) => prev + amount)
      setShowBounce(false)
    }, 800)

    // Clean up coins
    const coinTimer = setTimeout(() => setCoins([]), 1500)

    // Clear pending timers on unmount or re-trigger so stale callbacks never fire
    return () => {
      clearTimeout(bounceTimer)
      clearTimeout(coinTimer)
    }
  }, [trigger, amount])

  return (
    <div className="relative inline-flex items-center gap-1.5">
      {/* Coin icon + total */}
      <div className={`flex items-center gap-1 px-3 py-1 rounded-full bg-yellow-500/20 border border-yellow-500/30 transition-transform ${showBounce ? 'scale-110' : 'scale-100'}`}>
        <span className="text-yellow-400 text-sm">🪙</span>
        <span className="text-yellow-300 text-sm font-bold tabular-nums">{total}</span>
      </div>

      {/* Floating coins animation */}
      {coins.map((coin) => (
        <span
          key={coin.id}
          className="absolute text-lg animate-coin-float pointer-events-none"
          style={{
            left: `calc(50% + ${coin.x}px)`,
            animationDelay: `${coin.delay}ms`,
          }}
        >
          🪙
        </span>
      ))}

      <style>{`
        @keyframes coin-float {
          0% { transform: translateY(0) scale(1); opacity: 1; }
          100% { transform: translateY(-60px) scale(0.5); opacity: 0; }
        }
        .animate-coin-float {
          animation: coin-float 1s ease-out forwards;
        }
      `}</style>
    </div>
  )
}

/** Hook to manage coin rewards */
export function useCoinRewards() {
  const [totalCoins, setTotalCoins] = useState(0)
  const [triggerCount, setTriggerCount] = useState(0)
  const [lastReward, setLastReward] = useState(0)

  const awardCoins = useCallback((amount: number) => {
    setLastReward(amount)
    setTriggerCount((c) => c + 1)
    setTotalCoins((t) => t + amount)
  }, [])

  return { totalCoins, triggerCount, lastReward, awardCoins }
}
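The effect in `CoinAnimation` caps the number of floating coins at five and staggers their animation starts by 100 ms. A minimal pure-TS sketch of that spawning step, extracted from the React effect for illustration (the `makeCoins` helper name and the injectable `now` parameter are mine, not part of the diff):

```typescript
interface FloatingCoin {
  id: number
  x: number
  delay: number
}

// Hypothetical helper mirroring the coin-spawning logic above:
// at most 5 coins, each with horizontal jitter and a 100 ms stagger.
function makeCoins(amount: number, now: number = Date.now()): FloatingCoin[] {
  return Array.from({ length: Math.min(amount, 5) }, (_, i) => ({
    id: now + i,                // unique-enough key for this render burst
    x: Math.random() * 60 - 30, // horizontal jitter in [-30, 30) px
    delay: i * 100,             // stagger each coin's animation start
  }))
}
```

Passing `now` explicitly keeps the sketch deterministic enough to test; the component itself calls `Date.now()` inline.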
35 studio-v2/components/gamification/CrownBadge.tsx (Normal file)
@@ -0,0 +1,35 @@
'use client'

import React from 'react'

interface CrownBadgeProps {
  crowns: number
  size?: 'sm' | 'md' | 'lg'
  showLabel?: boolean
}

export function CrownBadge({ crowns, size = 'md', showLabel = true }: CrownBadgeProps) {
  const sizeClasses = {
    sm: 'text-base',
    md: 'text-xl',
    lg: 'text-3xl',
  }

  const isGold = crowns >= 3
  const isSilver = crowns >= 1

  return (
    <div className="inline-flex items-center gap-1.5">
      <span className={`${sizeClasses[size]} ${isGold ? 'animate-pulse' : ''}`}>
        {isGold ? '👑' : isSilver ? '🥈' : '⭐'}
      </span>
      {showLabel && (
        <span className={`font-bold tabular-nums ${
          isGold ? 'text-yellow-400' : isSilver ? 'text-slate-300' : 'text-white/50'
        } ${size === 'sm' ? 'text-xs' : size === 'md' ? 'text-sm' : 'text-base'}`}>
          {crowns}
        </span>
      )}
    </div>
  )
}
67 studio-v2/components/gamification/ProgressRing.tsx (Normal file)
@@ -0,0 +1,67 @@
'use client'

import React from 'react'

interface ProgressRingProps {
  progress: number // 0-100
  size?: number
  strokeWidth?: number
  label: string
  value: string
  color?: string
  isDark?: boolean
}

export function ProgressRing({
  progress,
  size = 80,
  strokeWidth = 6,
  label,
  value,
  color = '#60a5fa',
  isDark = true,
}: ProgressRingProps) {
  const radius = (size - strokeWidth) / 2
  const circumference = radius * 2 * Math.PI
  const offset = circumference - (Math.min(progress, 100) / 100) * circumference

  return (
    <div className="flex flex-col items-center gap-1">
      <div className="relative" style={{ width: size, height: size }}>
        <svg width={size} height={size} className="-rotate-90">
          {/* Background circle */}
          <circle
            cx={size / 2}
            cy={size / 2}
            r={radius}
            fill="none"
            stroke={isDark ? 'rgba(255,255,255,0.1)' : 'rgba(0,0,0,0.08)'}
            strokeWidth={strokeWidth}
          />
          {/* Progress circle */}
          <circle
            cx={size / 2}
            cy={size / 2}
            r={radius}
            fill="none"
            stroke={color}
            strokeWidth={strokeWidth}
            strokeDasharray={circumference}
            strokeDashoffset={offset}
            strokeLinecap="round"
            className="transition-all duration-700 ease-out"
          />
        </svg>
        {/* Center text */}
        <div className="absolute inset-0 flex items-center justify-center">
          <span className={`text-sm font-bold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            {value}
          </span>
        </div>
      </div>
      <span className={`text-xs ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
        {label}
      </span>
    </div>
  )
}
75 studio-v2/components/learn/AudioButton.tsx (Normal file)
@@ -0,0 +1,75 @@
'use client'

import React, { useCallback, useState } from 'react'

interface AudioButtonProps {
  text: string
  lang: 'en' | 'de'
  isDark: boolean
  size?: 'sm' | 'md' | 'lg'
}

export function AudioButton({ text, lang, isDark, size = 'md' }: AudioButtonProps) {
  const [isSpeaking, setIsSpeaking] = useState(false)

  const speak = useCallback(() => {
    if (!('speechSynthesis' in window)) return
    if (isSpeaking) {
      window.speechSynthesis.cancel()
      setIsSpeaking(false)
      return
    }

    const utterance = new SpeechSynthesisUtterance(text)
    utterance.lang = lang === 'de' ? 'de-DE' : 'en-GB'
    utterance.rate = 0.9
    utterance.pitch = 1.0

    // Try to find a good voice. Note: getVoices() may return an empty list
    // until the browser fires 'voiceschanged'; the default voice is used then.
    const voices = window.speechSynthesis.getVoices()
    const preferred = voices.find((v) =>
      v.lang.startsWith(lang === 'de' ? 'de' : 'en') && v.localService
    ) || voices.find((v) => v.lang.startsWith(lang === 'de' ? 'de' : 'en'))
    if (preferred) utterance.voice = preferred

    utterance.onend = () => setIsSpeaking(false)
    utterance.onerror = () => setIsSpeaking(false)

    setIsSpeaking(true)
    window.speechSynthesis.speak(utterance)
  }, [text, lang, isSpeaking])

  const sizeClasses = {
    sm: 'w-7 h-7',
    md: 'w-9 h-9',
    lg: 'w-11 h-11',
  }

  const iconSizes = {
    sm: 'w-3.5 h-3.5',
    md: 'w-4 h-4',
    lg: 'w-5 h-5',
  }

  return (
    <button
      onClick={speak}
      className={`${sizeClasses[size]} rounded-full flex items-center justify-center transition-all ${
        isSpeaking
          ? 'bg-blue-500 text-white animate-pulse'
          : isDark
            ? 'bg-white/10 text-white/60 hover:bg-white/20 hover:text-white'
            : 'bg-slate-100 text-slate-500 hover:bg-slate-200 hover:text-slate-700'
      }`}
      title={isSpeaking ? 'Stop' : `${lang === 'de' ? 'Deutsch' : 'Englisch'} vorlesen`}
    >
      <svg className={iconSizes[size]} fill="none" stroke="currentColor" viewBox="0 0 24 24">
        {isSpeaking ? (
          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0zM10 9v6m4-6v6" />
        ) : (
          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15.536 8.464a5 5 0 010 7.072m2.828-9.9a9 9 0 010 12.728M5.586 15H4a1 1 0 01-1-1v-4a1 1 0 011-1h1.586l4.707-4.707C10.923 3.663 12 4.109 12 5v14c0 .891-1.077 1.337-1.707.707L5.586 15z" />
        )}
      </svg>
    </button>
  )
}
136 studio-v2/components/learn/FlashCard.tsx (Normal file)
@@ -0,0 +1,136 @@
'use client'

import React, { useState, useCallback } from 'react'

interface FlashCardProps {
  front: string
  back: string
  cardNumber: number
  totalCards: number
  leitnerBox: number
  onCorrect: () => void
  onIncorrect: () => void
  isDark: boolean
}

const boxLabels = ['Neu', 'Gelernt', 'Gefestigt']
const boxColors = ['text-yellow-400', 'text-blue-400', 'text-green-400']

export function FlashCard({
  front,
  back,
  cardNumber,
  totalCards,
  leitnerBox,
  onCorrect,
  onIncorrect,
  isDark,
}: FlashCardProps) {
  const [isFlipped, setIsFlipped] = useState(false)

  const handleFlip = useCallback(() => {
    setIsFlipped((f) => !f)
  }, [])

  const handleCorrect = useCallback(() => {
    setIsFlipped(false)
    onCorrect()
  }, [onCorrect])

  const handleIncorrect = useCallback(() => {
    setIsFlipped(false)
    onIncorrect()
  }, [onIncorrect])

  return (
    <div className="flex flex-col items-center gap-6 w-full max-w-lg mx-auto">
      {/* Card */}
      <div
        onClick={handleFlip}
        className="w-full cursor-pointer select-none"
        style={{ perspective: '1000px' }}
      >
        <div
          className="relative w-full transition-transform duration-500"
          style={{
            transformStyle: 'preserve-3d',
            transform: isFlipped ? 'rotateY(180deg)' : 'rotateY(0deg)',
          }}
        >
          {/* Front */}
          <div
            className={`w-full min-h-[280px] rounded-3xl p-8 flex flex-col items-center justify-center ${
              isDark
                ? 'bg-white/10 backdrop-blur-xl border border-white/20'
                : 'bg-white shadow-xl border border-slate-200'
            }`}
            style={{ backfaceVisibility: 'hidden' }}
          >
            <span className={`text-xs font-medium mb-4 ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
              ENGLISCH
            </span>
            <span className={`text-3xl font-bold text-center ${isDark ? 'text-white' : 'text-slate-900'}`}>
              {front}
            </span>
            <span className={`text-sm mt-6 ${isDark ? 'text-white/30' : 'text-slate-400'}`}>
              Klick zum Umdrehen
            </span>
          </div>

          {/* Back */}
          <div
            className={`w-full min-h-[280px] rounded-3xl p-8 flex flex-col items-center justify-center absolute inset-0 ${
              isDark
                ? 'bg-gradient-to-br from-blue-500/20 to-cyan-500/20 backdrop-blur-xl border border-blue-400/30'
                : 'bg-gradient-to-br from-blue-50 to-cyan-50 shadow-xl border border-blue-200'
            }`}
            style={{ backfaceVisibility: 'hidden', transform: 'rotateY(180deg)' }}
          >
            <span className={`text-xs font-medium mb-4 ${isDark ? 'text-blue-300/60' : 'text-blue-500'}`}>
              DEUTSCH
            </span>
            <span className={`text-3xl font-bold text-center ${isDark ? 'text-white' : 'text-slate-900'}`}>
              {back}
            </span>
          </div>
        </div>
      </div>

      {/* Status Bar */}
      <div className={`flex items-center justify-between w-full px-2 ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
        <span className="text-sm">
          Karte {cardNumber} von {totalCards}
        </span>
        <span className={`text-sm font-medium ${boxColors[leitnerBox] || boxColors[0]}`}>
          Box: {boxLabels[leitnerBox] || boxLabels[0]}
        </span>
      </div>

      {/* Progress */}
      <div className="w-full h-1.5 rounded-full bg-white/10 overflow-hidden">
        <div
          className="h-full rounded-full bg-gradient-to-r from-blue-500 to-cyan-500 transition-all"
          style={{ width: `${(cardNumber / totalCards) * 100}%` }}
        />
      </div>

      {/* Answer Buttons */}
      {isFlipped && (
        <div className="flex gap-4 w-full">
          <button
            onClick={handleIncorrect}
            className="flex-1 py-4 rounded-2xl font-semibold text-lg transition-all bg-gradient-to-r from-red-500 to-rose-500 text-white hover:shadow-lg hover:shadow-red-500/25 hover:scale-[1.02]"
          >
            Falsch
          </button>
          <button
            onClick={handleCorrect}
            className="flex-1 py-4 rounded-2xl font-semibold text-lg transition-all bg-gradient-to-r from-green-500 to-emerald-500 text-white hover:shadow-lg hover:shadow-green-500/25 hover:scale-[1.02]"
          >
            Richtig
          </button>
        </div>
      )}
    </div>
  )
}
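FlashCard only displays the Leitner box (`Neu` / `Gelernt` / `Gefestigt`) and reports answers via `onCorrect` / `onIncorrect`; the box transition itself happens in the caller. A hedged sketch of how such a transition could look under a classic three-box Leitner scheme (the `nextLeitnerBox` helper and its promote/demote policy are assumptions, not taken from the diff):

```typescript
// Hypothetical Leitner transition for the 3-box scheme shown in FlashCard:
// box 0 = "Neu", box 1 = "Gelernt", box 2 = "Gefestigt".
// A correct answer promotes one box (capped at the top); any mistake
// sends the card back to box 0, per the standard Leitner rule.
function nextLeitnerBox(box: number, correct: boolean): number {
  if (correct) return Math.min(box + 1, 2)
  return 0
}
```

The caller would invoke this from its `onCorrect` / `onIncorrect` handlers and pass the result back in as the `leitnerBox` prop.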
140 studio-v2/components/learn/MicrophoneInput.tsx (Normal file)
@@ -0,0 +1,140 @@
'use client'

import React, { useState, useRef, useCallback } from 'react'

interface MicrophoneInputProps {
  expectedText: string
  lang: 'en' | 'de'
  onResult: (transcript: string, correct: boolean) => void
  isDark: boolean
}

export function MicrophoneInput({ expectedText, lang, onResult, isDark }: MicrophoneInputProps) {
  const [isListening, setIsListening] = useState(false)
  const [transcript, setTranscript] = useState('')
  const [feedback, setFeedback] = useState<'correct' | 'wrong' | null>(null)
  const recognitionRef = useRef<any>(null)

  const startListening = useCallback(() => {
    const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition
    if (!SpeechRecognition) {
      setTranscript('Spracherkennung nicht verfuegbar')
      return
    }

    const recognition = new SpeechRecognition()
    recognition.lang = lang === 'de' ? 'de-DE' : 'en-GB'
    recognition.interimResults = false
    recognition.maxAlternatives = 3
    recognition.continuous = false

    recognition.onresult = (event: any) => {
      const results = event.results[0]
      let bestMatch = ''
      let isCorrect = false

      // Check all alternatives for a match
      for (let i = 0; i < results.length; i++) {
        const alt = results[i].transcript.trim().toLowerCase()
        if (alt === expectedText.trim().toLowerCase()) {
          bestMatch = results[i].transcript
          isCorrect = true
          break
        }
        if (!bestMatch) bestMatch = results[i].transcript
      }

      setTranscript(bestMatch)
      setFeedback(isCorrect ? 'correct' : 'wrong')
      setIsListening(false)

      setTimeout(() => {
        onResult(bestMatch, isCorrect)
        setFeedback(null)
        setTranscript('')
      }, isCorrect ? 1000 : 2500)
    }

    recognition.onerror = (event: any) => {
      console.error('Speech recognition error:', event.error)
      setIsListening(false)
      if (event.error === 'no-speech') {
        setTranscript('Kein Ton erkannt. Nochmal versuchen.')
      } else if (event.error === 'not-allowed') {
        setTranscript('Mikrofon-Zugriff nicht erlaubt.')
      }
    }

    recognition.onend = () => {
      setIsListening(false)
    }

    recognitionRef.current = recognition
    recognition.start()
    setIsListening(true)
    setTranscript('')
    setFeedback(null)
  }, [lang, expectedText, onResult])

  const stopListening = useCallback(() => {
    recognitionRef.current?.stop()
    setIsListening(false)
  }, [])

  return (
    <div className="flex flex-col items-center gap-4">
      {/* Microphone Button */}
      <button
        onClick={isListening ? stopListening : startListening}
        className={`w-20 h-20 rounded-full flex items-center justify-center transition-all ${
          isListening
            ? 'bg-red-500 text-white animate-pulse shadow-lg shadow-red-500/30'
            : feedback === 'correct'
              ? 'bg-green-500 text-white shadow-lg shadow-green-500/30'
              : feedback === 'wrong'
                ? 'bg-red-500/60 text-white'
                : isDark
                  ? 'bg-white/10 text-white/70 hover:bg-white/20 hover:text-white'
                  : 'bg-slate-100 text-slate-500 hover:bg-slate-200 hover:text-slate-700'
        }`}
      >
        <svg className="w-8 h-8" fill="none" stroke="currentColor" viewBox="0 0 24 24">
          {isListening ? (
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0zM10 9v6m4-6v6" />
          ) : (
            <>
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 11a7 7 0 01-7 7m0 0a7 7 0 01-7-7m7 7v4m0 0H8m4 0h4" />
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 1a3 3 0 00-3 3v8a3 3 0 006 0V4a3 3 0 00-3-3z" />
            </>
          )}
        </svg>
      </button>

      {/* Status Text */}
      <p className={`text-sm text-center ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
        {isListening
          ? 'Sprich jetzt...'
          : transcript
            ? transcript
            : 'Tippe auf das Mikrofon'}
      </p>

      {/* Feedback */}
      {feedback === 'correct' && (
        <p className={`text-lg font-semibold ${isDark ? 'text-green-300' : 'text-green-600'}`}>
          Richtig ausgesprochen!
        </p>
      )}
      {feedback === 'wrong' && (
        <div className="text-center">
          <p className={`text-sm ${isDark ? 'text-red-300' : 'text-red-600'}`}>
            Erkannt: "{transcript}"
          </p>
          <p className={`text-sm mt-1 ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
            Erwartet: "{expectedText}"
          </p>
        </div>
      )}
    </div>
  )
}
126 studio-v2/components/learn/QuizQuestion.tsx (Normal file)
@@ -0,0 +1,126 @@
'use client'

import React, { useState, useCallback } from 'react'

interface Option {
  id: string
  text: string
}

interface QuizQuestionProps {
  question: string
  options: Option[]
  correctAnswer: string
  explanation?: string
  questionNumber: number
  totalQuestions: number
  onAnswer: (correct: boolean) => void
  isDark: boolean
}

export function QuizQuestion({
  question,
  options,
  correctAnswer,
  explanation,
  questionNumber,
  totalQuestions,
  onAnswer,
  isDark,
}: QuizQuestionProps) {
  const [selected, setSelected] = useState<string | null>(null)
  const [revealed, setRevealed] = useState(false)

  const handleSelect = useCallback((optionId: string) => {
    if (revealed) return
    setSelected(optionId)
    setRevealed(true)

    const isCorrect = optionId === correctAnswer
    setTimeout(() => {
      onAnswer(isCorrect)
      setSelected(null)
      setRevealed(false)
    }, isCorrect ? 1000 : 2500)
  }, [revealed, correctAnswer, onAnswer])

  const getOptionStyle = (optionId: string) => {
    if (!revealed) {
      return isDark
        ? 'bg-white/10 border-white/20 hover:bg-white/20 hover:border-white/30 text-white'
        : 'bg-white border-slate-200 hover:bg-slate-50 hover:border-slate-300 text-slate-900'
    }

    if (optionId === correctAnswer) {
      return isDark
        ? 'bg-green-500/20 border-green-400 text-green-200'
        : 'bg-green-50 border-green-500 text-green-800'
    }

    if (optionId === selected && optionId !== correctAnswer) {
      return isDark
        ? 'bg-red-500/20 border-red-400 text-red-200'
        : 'bg-red-50 border-red-500 text-red-800'
    }

    return isDark
      ? 'bg-white/5 border-white/10 text-white/40'
      : 'bg-slate-50 border-slate-200 text-slate-400'
  }

  return (
    <div className="w-full max-w-lg mx-auto space-y-6">
      {/* Progress */}
      <div className="flex items-center justify-between">
        <span className={`text-sm ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
          Frage {questionNumber} / {totalQuestions}
        </span>
        <div className="w-32 h-1.5 rounded-full bg-white/10 overflow-hidden">
          <div
            className="h-full rounded-full bg-gradient-to-r from-purple-500 to-pink-500 transition-all"
            style={{ width: `${(questionNumber / totalQuestions) * 100}%` }}
          />
        </div>
      </div>

      {/* Question */}
      <div className={`p-6 rounded-2xl ${isDark ? 'bg-white/10 backdrop-blur-xl border border-white/20' : 'bg-white shadow-lg border border-slate-200'}`}>
        <p className={`text-xl font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
          {question}
        </p>
      </div>

      {/* Options */}
      <div className="space-y-3">
        {options.map((opt, idx) => (
          <button
            key={opt.id}
            onClick={() => handleSelect(opt.id)}
            disabled={revealed}
            className={`w-full p-4 rounded-xl border-2 text-left transition-all flex items-center gap-3 ${getOptionStyle(opt.id)}`}
          >
            <span className={`w-8 h-8 rounded-full flex items-center justify-center text-sm font-bold flex-shrink-0 ${
              revealed && opt.id === correctAnswer
                ? 'bg-green-500 text-white'
                : revealed && opt.id === selected
                  ? 'bg-red-500 text-white'
                  : isDark ? 'bg-white/10' : 'bg-slate-100'
            }`}>
              {String.fromCharCode(65 + idx)}
            </span>
            <span className="text-base">{opt.text}</span>
          </button>
        ))}
      </div>

      {/* Explanation */}
      {revealed && explanation && (
        <div className={`p-4 rounded-xl ${isDark ? 'bg-blue-500/10 border border-blue-500/20' : 'bg-blue-50 border border-blue-200'}`}>
          <p className={`text-sm ${isDark ? 'text-blue-200' : 'text-blue-700'}`}>
            {explanation}
          </p>
        </div>
      )}
    </div>
  )
}
117 studio-v2/components/learn/SyllableBow.tsx (Normal file)
@@ -0,0 +1,117 @@
'use client'

import React, { useMemo } from 'react'

interface SyllableBowProps {
  word: string
  syllables: string[]
  onSyllableClick?: (syllable: string, index: number) => void
  isDark: boolean
  size?: 'sm' | 'md' | 'lg'
}

/**
 * SyllableBow — Renders a word with SVG arcs under each syllable.
 *
 * Uses pyphen syllable data from the backend.
 * Each syllable is clickable (triggers TTS for that syllable).
 */
export function SyllableBow({ word, syllables, onSyllableClick, isDark, size = 'md' }: SyllableBowProps) {
  const fontSize = size === 'sm' ? 20 : size === 'md' ? 32 : 44
  const charWidth = fontSize * 0.6
  const bowHeight = size === 'sm' ? 12 : size === 'md' ? 18 : 24
  const gap = 4

  const layout = useMemo(() => {
    let x = 0
    return syllables.map((syl) => {
      const width = syl.length * charWidth
      const entry = { syllable: syl, x, width }
      x += width + gap
      return entry
    })
  }, [syllables, charWidth])

  const totalWidth = layout.length > 0
    ? layout[layout.length - 1].x + layout[layout.length - 1].width
    : 0

  const svgHeight = bowHeight + 6

  return (
    <div className="inline-flex flex-col items-center">
      {/* Letters */}
      <div className="flex" style={{ gap: `${gap}px` }}>
        {layout.map((item, idx) => (
          <span
            key={idx}
            onClick={() => onSyllableClick?.(item.syllable, idx)}
            className={`font-bold cursor-pointer select-none transition-colors ${
              onSyllableClick
                ? (isDark ? 'hover:text-blue-300' : 'hover:text-blue-600')
                : ''
            } ${isDark ? 'text-white' : 'text-slate-900'}`}
            style={{ fontSize: `${fontSize}px`, letterSpacing: '0.02em' }}
          >
            {item.syllable}
          </span>
        ))}
      </div>

      {/* SVG Bows */}
      <svg
        width={totalWidth}
        height={svgHeight}
        viewBox={`0 0 ${totalWidth} ${svgHeight}`}
        className="mt-0.5"
      >
        {layout.map((item, idx) => {
          const cx = item.x + item.width / 2
          const startX = item.x + 2
          const endX = item.x + item.width - 2
          const controlY = svgHeight - 2

          return (
            <path
              key={idx}
              d={`M ${startX} 2 Q ${cx} ${controlY} ${endX} 2`}
              fill="none"
              stroke={isDark ? 'rgba(96, 165, 250, 0.6)' : 'rgba(37, 99, 235, 0.5)'}
              strokeWidth={size === 'sm' ? 1.5 : 2}
              strokeLinecap="round"
            />
          )
        })}
      </svg>
    </div>
  )
}

/**
 * Simple client-side syllable splitting fallback.
 * For accurate results, use the backend pyphen endpoint.
 */
export function simpleSyllableSplit(word: string): string[] {
  // Very basic vowel-based heuristic for display purposes
  const vowels = /[aeiouyäöü]/i
  const chars = word.split('')
  const syllables: string[] = []
  let current = ''

  for (let i = 0; i < chars.length; i++) {
    current += chars[i]
    if (
      vowels.test(chars[i]) &&
      i < chars.length - 1 &&
      current.length >= 2
    ) {
      // Check if next char starts a new consonant cluster
      if (!vowels.test(chars[i + 1]) && i + 2 < chars.length && vowels.test(chars[i + 2])) {
        syllables.push(current)
        current = ''
      }
    }
  }
  if (current) syllables.push(current)
  return syllables.length > 0 ? syllables : [word]
}
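The `useMemo` layout pass in SyllableBow assigns each syllable a left edge based on a fixed per-character width plus a gap; the arcs are then drawn between those edges. A standalone sketch of that pass, pulled out of the component for illustration (the `layoutSyllables` name is mine):

```typescript
interface SyllableLayout {
  syllable: string
  x: number
  width: number
}

// Standalone version of the layout computed in SyllableBow's useMemo:
// each syllable gets width = length * charWidth, and successive syllables
// are offset by the previous width plus a fixed gap.
function layoutSyllables(syllables: string[], charWidth: number, gap = 4): SyllableLayout[] {
  let x = 0
  return syllables.map((syl) => {
    const width = syl.length * charWidth
    const entry = { syllable: syl, x, width }
    x += width + gap
    return entry
  })
}
```

With `charWidth = 10` and the default gap of 4, `["ba", "na", "na"]` lays out at x = 0, 24, 48 with width 20 each, so the total SVG width is 68.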
149 studio-v2/components/learn/TypeInput.tsx (Normal file)
@@ -0,0 +1,149 @@
'use client'

import React, { useState, useRef, useEffect } from 'react'

interface TypeInputProps {
  prompt: string
  answer: string
  onResult: (correct: boolean) => void
  isDark: boolean
}

function levenshtein(a: string, b: string): number {
  const m = a.length
  const n = b.length
  const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0))
  for (let i = 0; i <= m; i++) dp[i][0] = i
  for (let j = 0; j <= n; j++) dp[0][j] = j
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      )
    }
  }
  return dp[m][n]
}

export function TypeInput({ prompt, answer, onResult, isDark }: TypeInputProps) {
  const [input, setInput] = useState('')
  const [feedback, setFeedback] = useState<'correct' | 'almost' | 'wrong' | null>(null)
  const inputRef = useRef<HTMLInputElement>(null)

  useEffect(() => {
    setInput('')
    setFeedback(null)
    inputRef.current?.focus()
  }, [prompt, answer])

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault()
    const userAnswer = input.trim().toLowerCase()
    const correctAnswer = answer.trim().toLowerCase()

    if (userAnswer === correctAnswer) {
      setFeedback('correct')
      setTimeout(() => onResult(true), 800)
    } else if (levenshtein(userAnswer, correctAnswer) <= 2) {
      setFeedback('almost')
      setTimeout(() => {
        setFeedback('wrong')
        setTimeout(() => onResult(false), 2000)
      }, 1500)
    } else {
      setFeedback('wrong')
      setTimeout(() => onResult(false), 2500)
    }
  }

  const feedbackColors = {
    correct: isDark ? 'border-green-400 bg-green-500/20' : 'border-green-500 bg-green-50',
    almost: isDark ? 'border-yellow-400 bg-yellow-500/20' : 'border-yellow-500 bg-yellow-50',
    wrong: isDark ? 'border-red-400 bg-red-500/20' : 'border-red-500 bg-red-50',
  }

  return (
    <div className="w-full max-w-lg mx-auto space-y-6">
      {/* Prompt */}
      <div className={`text-center p-8 rounded-3xl ${isDark ? 'bg-white/10 backdrop-blur-xl border border-white/20' : 'bg-white shadow-xl border border-slate-200'}`}>
        <span className={`text-xs font-medium ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
          UEBERSETZE
        </span>
        <p className={`text-3xl font-bold mt-2 ${isDark ? 'text-white' : 'text-slate-900'}`}>
          {prompt}
        </p>
      </div>

      {/* Input */}
      <form onSubmit={handleSubmit}>
        <div className={`rounded-2xl overflow-hidden border-2 transition-colors ${
          feedback ? feedbackColors[feedback] : (isDark ? 'border-white/20' : 'border-slate-200')
        }`}>
          <input
            ref={inputRef}
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            disabled={feedback !== null}
            placeholder="Antwort eintippen..."
            autoComplete="off"
            autoCorrect="off"
            spellCheck={false}
            className={`w-full px-6 py-4 text-xl text-center outline-none ${
              isDark
                ? 'bg-transparent text-white placeholder-white/30'
                : 'bg-transparent text-slate-900 placeholder-slate-400'
            }`}
          />
        </div>

        {!feedback && (
          <button
            type="submit"
            disabled={!input.trim()}
            className={`w-full mt-4 py-3 rounded-xl font-medium transition-all ${
              input.trim()
                ? 'bg-gradient-to-r from-blue-500 to-cyan-500 text-white hover:shadow-lg'
                : isDark ? 'bg-white/5 text-white/30' : 'bg-slate-100 text-slate-400'
            }`}
          >
            Pruefen
          </button>
        )}
      </form>

      {/* Feedback Message */}
      {feedback === 'correct' && (
        <div className="text-center">
          <span className="text-2xl">✅</span>
          <p className={`text-lg font-semibold mt-1 ${isDark ? 'text-green-300' : 'text-green-600'}`}>
            Richtig!
          </p>
        </div>
      )}

      {feedback === 'almost' && (
        <div className="text-center">
          <span className="text-2xl">🤏</span>
          <p className={`text-lg font-semibold mt-1 ${isDark ? 'text-yellow-300' : 'text-yellow-600'}`}>
            Fast richtig! Meintest du: <span className="underline">{answer}</span>
          </p>
        </div>
      )}

      {feedback === 'wrong' && (
        <div className="text-center">
          <span className="text-2xl">❌</span>
          <p className={`text-lg font-semibold mt-1 ${isDark ? 'text-red-300' : 'text-red-600'}`}>
            Falsch. Richtige Antwort:
          </p>
          <p className={`text-xl font-bold mt-1 ${isDark ? 'text-white' : 'text-slate-900'}`}>
            {answer}
          </p>
        </div>
      )}
    </div>
  )
}
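TypeInput grades normalized input in three tiers: an exact match is "correct", anything within a Levenshtein distance of 2 is "almost", and everything else is "wrong". A standalone copy of that grading rule (the `grade` wrapper name is mine; the distance function mirrors the one in the file):

```typescript
// Levenshtein distance, as implemented in TypeInput above:
// dp[i][j] = edits to turn the first i chars of a into the first j chars of b.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    Array(b.length + 1).fill(0))
  for (let i = 0; i <= a.length; i++) dp[i][0] = i
  for (let j = 0; j <= b.length; j++) dp[0][j] = j
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      )
    }
  }
  return dp[a.length][b.length]
}

// Three-tier grading as in handleSubmit: trim + lowercase, exact match wins,
// a distance of at most 2 counts as "almost" (typo tolerance).
function grade(input: string, answer: string): 'correct' | 'almost' | 'wrong' {
  const u = input.trim().toLowerCase()
  const c = answer.trim().toLowerCase()
  if (u === c) return 'correct'
  return levenshtein(u, c) <= 2 ? 'almost' : 'wrong'
}
```

The distance-2 cutoff forgives a transposed pair of letters or a missing character, which is the kind of slip a learner typing a translation commonly makes.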
91 studio-v2/components/learn/UnitCard.tsx (Normal file)
@@ -0,0 +1,91 @@
'use client'

import React from 'react'
import Link from 'next/link'

interface LearningUnit {
  id: string
  label: string
  meta: string
  title: string
  topic: string | null
  grade_level: string | null
  status: string
  created_at: string
}

interface UnitCardProps {
  unit: LearningUnit
  isDark: boolean
  glassCard: string
  onDelete: (id: string) => void
}

const exerciseTypes = [
  { key: 'flashcards', label: 'Karteikarten', icon: 'M19 11H5m14 0a2 2 0 012 2v6a2 2 0 01-2 2H5a2 2 0 01-2-2v-6a2 2 0 012-2m14 0V9a2 2 0 00-2-2M5 11V9a2 2 0 012-2m0 0V5a2 2 0 012-2h6a2 2 0 012 2v2M7 7h10', color: 'from-amber-500 to-orange-500' },
  { key: 'quiz', label: 'Quiz', icon: 'M8.228 9c.549-1.165 2.03-2 3.772-2 2.21 0 4 1.343 4 3 0 1.4-1.278 2.575-3.006 2.907-.542.104-.994.54-.994 1.093m0 3h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z', color: 'from-purple-500 to-pink-500' },
  { key: 'type', label: 'Eintippen', icon: 'M9.75 17L9 20l-1 1h8l-1-1-.75-3M3 13h18M5 17h14a2 2 0 002-2V5a2 2 0 00-2-2H5a2 2 0 00-2 2v10a2 2 0 002 2z', color: 'from-blue-500 to-cyan-500' },
  { key: 'story', label: 'Geschichte', icon: 'M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253', color: 'from-amber-500 to-yellow-500' },
]

export function UnitCard({ unit, isDark, glassCard, onDelete }: UnitCardProps) {
  const createdDate = new Date(unit.created_at).toLocaleDateString('de-DE', {
    day: '2-digit',
    month: '2-digit',
    year: 'numeric',
  })

  return (
    <div className={`${glassCard} rounded-2xl p-6 transition-all hover:shadow-lg`}>
      <div className="flex items-start justify-between mb-4">
        <div>
          <h3 className={`text-lg font-semibold ${isDark ? 'text-white' : 'text-slate-900'}`}>
            {unit.label}
          </h3>
          <p className={`text-sm mt-1 ${isDark ? 'text-white/50' : 'text-slate-500'}`}>
            {unit.meta}
          </p>
        </div>
        <button
          onClick={() => onDelete(unit.id)}
          className={`p-2 rounded-lg transition-colors ${isDark ? 'hover:bg-white/10 text-white/40 hover:text-red-400' : 'hover:bg-slate-100 text-slate-400 hover:text-red-500'}`}
          title="Loeschen"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
          </svg>
        </button>
      </div>

      {/* Exercise Type Buttons */}
      <div className="flex flex-wrap gap-2">
        {exerciseTypes.map((ex) => (
          <Link
            key={ex.key}
            href={`/learn/${unit.id}/${ex.key}`}
            className={`flex items-center gap-2 px-4 py-2.5 rounded-xl text-sm font-medium text-white bg-gradient-to-r ${ex.color} hover:shadow-lg hover:scale-[1.02] transition-all`}
          >
            <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5} d={ex.icon} />
            </svg>
            {ex.label}
          </Link>
        ))}
      </div>

      {/* Status */}
      <div className={`flex items-center gap-3 mt-4 pt-3 border-t ${isDark ? 'border-white/10' : 'border-black/5'}`}>
|
||||
<span className={`text-xs ${isDark ? 'text-white/40' : 'text-slate-400'}`}>
|
||||
Erstellt: {createdDate}
|
||||
</span>
|
||||
<span className={`text-xs px-2 py-0.5 rounded-full ${
|
||||
unit.status === 'qa_generated' || unit.status === 'mc_generated' || unit.status === 'cloze_generated'
|
||||
? (isDark ? 'bg-green-500/20 text-green-300' : 'bg-green-100 text-green-700')
|
||||
: (isDark ? 'bg-yellow-500/20 text-yellow-300' : 'bg-yellow-100 text-yellow-700')
|
||||
}`}>
|
||||
{unit.status === 'raw' ? 'Neu' : 'Module generiert'}
|
||||
</span>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
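Side note on the status badge above: the card treats a unit as "generated" when its status is any of three values, and the same condition appears again when choosing the badge label. A minimal sketch of factoring that check into one predicate, assuming these three are the only generated states; `GENERATED_STATUSES` and `isGenerated` are hypothetical names, not part of this diff:

```typescript
// Sketch only: the badge condition from UnitCard, pulled out as a helper.
// Assumption: these are the only statuses that count as "generated".
const GENERATED_STATUSES: readonly string[] = [
  'qa_generated',
  'mc_generated',
  'cloze_generated',
]

// A unit counts as generated once any exercise module has been produced.
function isGenerated(status: string): boolean {
  return GENERATED_STATUSES.includes(status)
}
```

At the component's current size the inline `||` chain is fine; a helper like this mainly pays off if more generated states are added later.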
@@ -22,6 +22,7 @@
    "pdf-lib": "^1.17.1",
    "react": "^19.0.0",
    "react-dom": "^19.0.0",
    "@fortune-sheet/react": "^1.0.4",
    "react-leaflet": "^5.0.0"
  },
  "devDependencies": {