# Vocabulary Worksheet Generator - Developer Guide

**Version:** 1.0.0
**Date:** 2026-01-23

---
## 1. Quick Start

### 1.1 Local Development

```bash
# Start the backend (klausur-service)
cd /Users/benjaminadmin/Projekte/breakpilot-pwa/klausur-service/backend
source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8086 --reload

# Start the frontend (studio-v2)
cd /Users/benjaminadmin/Projekte/breakpilot-pwa/studio-v2
npm run dev
```

### 1.2 URLs

| Environment | Frontend | Backend API |
|-------------|----------|-------------|
| Local | http://localhost:3001/vocab-worksheet | http://localhost:8086/api/v1/vocab/ |
| Mac Mini | http://macmini:3001/vocab-worksheet | http://macmini:8086/api/v1/vocab/ |

---
## 2. Project Structure

```
breakpilot-pwa/
├── klausur-service/
│   ├── backend/
│   │   ├── main.py                     # FastAPI app (incl. vocab router)
│   │   ├── vocab_worksheet_api.py      # Vocab worksheet endpoints
│   │   ├── hybrid_vocab_extractor.py   # PaddleOCR + LLM pipeline
│   │   └── tests/
│   │       └── test_vocab_worksheet.py # Unit tests
│   └── docs/
│       ├── Vocab-Worksheet-Architecture.md
│       └── Vocab-Worksheet-Developer-Guide.md
│
└── studio-v2/
    └── app/
        └── vocab-worksheet/
            └── page.tsx                # Frontend (React/Next.js)
```

---
## 3. Backend API

### 3.1 Endpoint Overview

```
POST   /api/v1/vocab/sessions                               # Create a session
GET    /api/v1/vocab/sessions                               # List sessions
GET    /api/v1/vocab/sessions/{id}                          # Get a session
DELETE /api/v1/vocab/sessions/{id}                          # Delete a session

POST   /api/v1/vocab/sessions/{id}/upload                   # Upload image/PDF
POST   /api/v1/vocab/sessions/{id}/upload-pdf-info          # Get PDF info
GET    /api/v1/vocab/sessions/{id}/pdf-thumbnail/{p}        # Page thumbnail
POST   /api/v1/vocab/sessions/{id}/process-single-page/{p}  # Process a page

GET    /api/v1/vocab/sessions/{id}/vocabulary               # Get vocabulary
PUT    /api/v1/vocab/sessions/{id}/vocabulary               # Update vocabulary

POST   /api/v1/vocab/sessions/{id}/generate                 # Generate worksheet
GET    /api/v1/vocab/worksheets/{id}/pdf                    # Download PDF
GET    /api/v1/vocab/worksheets/{id}/solution               # Solution PDF
```
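
All session routes share one prefix, so a client can assemble them from two small helpers. The following is a minimal sketch and not part of the service: `API_PREFIX`, `session_url`, and `page_url` are hypothetical names, and the base URL is the local one from section 1.2.

```python
from typing import Optional
from urllib.parse import quote

API_PREFIX = "/api/v1/vocab"  # prefix shared by the endpoints in the overview


def session_url(base: str, session_id: Optional[str] = None, action: str = "") -> str:
    """Build a /sessions URL; action is e.g. 'upload', 'vocabulary' or 'generate'."""
    url = f"{base.rstrip('/')}{API_PREFIX}/sessions"
    if session_id is not None:
        url += f"/{quote(session_id, safe='')}"
        if action:
            url += f"/{action}"
    return url


def page_url(base: str, session_id: str, action: str, page: int) -> str:
    """Build a per-page URL for 'pdf-thumbnail' or 'process-single-page'."""
    return f"{session_url(base, session_id, action)}/{page}"


base = "http://localhost:8086"
print(session_url(base))
print(session_url(base, "abc", "vocabulary"))
print(page_url(base, "abc", "process-single-page", 0))
```
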

### 3.2 Creating a Session

```bash
curl -X POST http://localhost:8086/api/v1/vocab/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "English Grade 7 - Unit 3",
    "description": "Vocabulary from Green Line",
    "source_language": "en",
    "target_language": "de"
  }'
```

**Response:**
```json
{
  "id": "15dce1f4-f587-4b80-8c3d-62b20e7b845c",
  "name": "English Grade 7 - Unit 3",
  "status": "pending",
  "vocabulary_count": 0,
  "created_at": "2026-01-23T10:00:00Z"
}
```
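
On the client side, the response can be turned into a typed object. This is a sketch under assumptions: `SessionInfo` and `parse_session` are hypothetical names, and only the five fields shown in the sample response are handled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SessionInfo:
    id: str
    name: str
    status: str
    vocabulary_count: int
    created_at: datetime


def parse_session(data: dict) -> SessionInfo:
    """Convert the JSON body of a session response into a SessionInfo."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    created = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return SessionInfo(
        id=data["id"],
        name=data["name"],
        status=data["status"],
        vocabulary_count=int(data["vocabulary_count"]),
        created_at=created,
    )


session = parse_session({
    "id": "15dce1f4-f587-4b80-8c3d-62b20e7b845c",
    "name": "English Grade 7 - Unit 3",
    "status": "pending",
    "vocabulary_count": 0,
    "created_at": "2026-01-23T10:00:00Z",
})
print(session.status, session.created_at.isoformat())
```
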

### 3.3 Uploading an Image

```bash
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{session_id}/upload \
  -F "file=@vokabeln.png"
```

### 3.4 Processing a PDF

```bash
# 1. Upload the PDF and fetch its info
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/upload-pdf-info \
  -F "file=@schulbuch.pdf"

# Response: {"session_id": "...", "page_count": 5}

# 2. Process pages one at a time (recommended)
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/process-single-page/0
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/process-single-page/1
```
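
The page-by-page flow can be scripted as a loop over `page_count`. The sketch below makes two assumptions: page indices are zero-based (the curl calls use pages 0 and 1), and the HTTP call is injected as a `post` function so the example runs offline. `process_all_pages` and `fake_post` are hypothetical names, and the response body is treated as an opaque dict.

```python
from typing import Callable, Dict


def process_all_pages(
    base_url: str,
    session_id: str,
    page_count: int,
    post: Callable[[str], dict],
) -> Dict[int, dict]:
    """Call process-single-page for every page and collect results by page index."""
    results: Dict[int, dict] = {}
    for page in range(page_count):  # assumption: zero-based page indices
        url = f"{base_url}/api/v1/vocab/sessions/{session_id}/process-single-page/{page}"
        try:
            results[page] = post(url)
        except Exception as exc:
            # Keep going so that one failing page does not abort the whole run.
            results[page] = {"error": str(exc)}
    return results


def fake_post(url: str) -> dict:
    """Offline stand-in for a real HTTP POST that returns the parsed JSON body."""
    if url.endswith("/1"):
        raise TimeoutError("simulated timeout")
    return {"url": url}


results = process_all_pages("http://localhost:8086", "abc", 3, fake_post)
for page, result in results.items():
    print(page, result)
```

Processing pages individually keeps each request short and lets a failed page be retried on its own.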

### 3.5 Updating Vocabulary

```bash
curl -X PUT http://localhost:8086/api/v1/vocab/sessions/{id}/vocabulary \
  -H "Content-Type: application/json" \
  -d '{
    "vocabulary": [
      {
        "id": "uuid-1",
        "english": "achieve",
        "german": "erreichen",
        "example_sentence": "She achieved her goals."
      }
    ]
  }'
```
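
When the update body is built in code rather than by hand, a small dataclass keeps the field names consistent. This is a sketch: `VocabularyEntry` and `build_update_payload` are hypothetical names, the fields are the four from the curl example, and trimming whitespace is an illustrative choice, not documented server behaviour.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class VocabularyEntry:
    id: str
    english: str
    german: str
    example_sentence: Optional[str] = None


def build_update_payload(entries: List[VocabularyEntry]) -> str:
    """Serialise entries into a JSON body for PUT .../vocabulary."""
    cleaned = []
    for entry in entries:
        item = asdict(entry)
        item["english"] = item["english"].strip()
        item["german"] = item["german"].strip()
        cleaned.append(item)
    # ensure_ascii=False keeps umlauts readable in the request body.
    return json.dumps({"vocabulary": cleaned}, ensure_ascii=False)


body = build_update_payload([
    VocabularyEntry("uuid-1", " achieve ", "erreichen", "She achieved her goals."),
    VocabularyEntry("uuid-2", "to improve", "verbessern"),
])
print(body)
```
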

### 3.6 Generating a Worksheet

```bash
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/generate \
  -H "Content-Type: application/json" \
  -d '{
    "worksheet_types": ["en_to_de", "de_to_en"],
    "include_solutions": true,
    "line_height": "large"
  }'
```

---
## 4. Frontend Development

### 4.1 Component Structure

The entire UI lives in a single file (`page.tsx`):

```typescript
// Main component
export default function VocabWorksheetPage() {
  const [activeTab, setActiveTab] = useState<TabType>('upload')
  const [sessions, setSessions] = useState<Session[]>([])
  const [currentSession, setCurrentSession] = useState<Session | null>(null)
  const [vocabulary, setVocabulary] = useState<VocabularyEntry[]>([])

  // ...
}

// Tabs
type TabType = 'upload' | 'pages' | 'vocabulary' | 'worksheet' | 'export'
```

### 4.2 API Calls

```typescript
// Determine the API base URL automatically
const getApiBase = () => {
  // During server-side rendering there is no window, so fall back to localhost
  if (typeof window === 'undefined') return 'http://localhost:8086'
  const host = window.location.hostname
  return `http://${host}:8086`
}

// Create a session
const createSession = async (name: string) => {
  const response = await fetch(`${getApiBase()}/api/v1/vocab/sessions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name })
  })
  return response.json()
}
```

### 4.3 Styling

Tailwind CSS with a dark/light theme:

```typescript
// Theme-aware classes
className="bg-white dark:bg-gray-800 text-gray-900 dark:text-white"

// Gradient buttons
className="bg-gradient-to-r from-purple-600 to-blue-600 hover:from-purple-700 hover:to-blue-700"
```

---
## 5. Vocabulary Extraction

### 5.1 Vision LLM Mode (Default)

```python
# vocab_worksheet_api.py
import base64
import os

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
VISION_MODEL = os.getenv("OLLAMA_VISION_MODEL", "qwen2.5vl:32b")

async def extract_vocabulary_from_image(image_data: bytes, filename: str):
    # Send the Base64-encoded image to Ollama
    image_base64 = base64.b64encode(image_data).decode("utf-8")

    payload = {
        "model": VISION_MODEL,
        "messages": [{
            "role": "user",
            "content": VOCAB_EXTRACTION_PROMPT,
            "images": [image_base64]
        }],
        "stream": False
    }

    # `client` is an async HTTP client created elsewhere (not shown in this excerpt)
    response = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
    # Parse JSON response...
```
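
The parsing step elided above has to cope with models that wrap their JSON in a Markdown fence or add surrounding prose. The sketch below shows one defensive approach and is not the service's actual parser: `parse_vocab_response` is a hypothetical name. It relies on Ollama's non-streaming `/api/chat` reply carrying the text in `message.content`.

```python
import json
import re
from typing import List


def parse_vocab_response(ollama_response: dict) -> List[dict]:
    """Extract the vocabulary list from a non-streaming /api/chat response."""
    content = ollama_response["message"]["content"]

    # Strip a Markdown code fence (three backticks, optional "json" tag) if present.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", content, re.DOTALL)
    if fenced:
        content = fenced.group(1)

    # Fall back to the outermost {...} span in case prose surrounds the JSON.
    start, end = content.find("{"), content.rfind("}")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(content[start : end + 1])
    except json.JSONDecodeError:
        return []
    vocabulary = data.get("vocabulary", [])
    return vocabulary if isinstance(vocabulary, list) else []


reply = {
    "message": {
        "role": "assistant",
        "content": 'Sure: {"vocabulary": [{"english": "achieve", "german": "erreichen", "example": null}]}',
    }
}
print(parse_vocab_response(reply))
```

Returning an empty list on malformed output lets the caller treat "nothing extracted" and "unparseable reply" the same way; a stricter variant could raise instead.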

### 5.2 Hybrid OCR + LLM Mode (Optional)

```python
# hybrid_vocab_extractor.py

async def extract_vocabulary_hybrid(image_bytes: bytes, page_number: int):
    # 1. PaddleOCR for text recognition
    regions, raw_text = run_paddle_ocr(image_bytes)

    # 2. Format the text for the LLM
    formatted_text = format_ocr_for_llm(regions)

    # 3. The LLM structures the data
    vocabulary = await structure_vocabulary_with_llm(formatted_text)

    # confidence and error are set in steps omitted from this excerpt
    return vocabulary, confidence, error
```
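
Step 2 is where OCR boxes become text the LLM can structure. How `format_ocr_for_llm` does this is not shown here, so the following is only an illustration of the general idea: group regions into rows by vertical position and order cells left to right. The region shape (`text`, `x`, `y`) and the name `format_rows` are assumptions.

```python
from typing import Dict, List


def format_rows(regions: List[Dict], y_tolerance: float = 10.0) -> str:
    """Group OCR regions into rows and join each row's cells with ' | '."""
    rows: List[List[Dict]] = []
    for region in sorted(regions, key=lambda r: r["y"]):
        # A region joins the current row if it is vertically close to the row's first cell.
        if rows and abs(region["y"] - rows[-1][0]["y"]) <= y_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])

    lines = []
    for row in rows:
        cells = [r["text"] for r in sorted(row, key=lambda r: r["x"])]
        lines.append(" | ".join(cells))
    return "\n".join(lines)


regions = [
    {"text": "erreichen", "x": 200, "y": 52},
    {"text": "achieve", "x": 20, "y": 50},
    {"text": "to improve", "x": 20, "y": 90},
    {"text": "verbessern", "x": 200, "y": 91},
]
print(format_rows(regions))
```

Keeping the column order explicit gives the LLM the same English | German layout that the extraction prompt describes.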

### 5.3 Prompt Engineering

The extraction prompt is written in German for better results:

```python
VOCAB_EXTRACTION_PROMPT = """Analysiere dieses Bild einer Vokabelliste aus einem Schulbuch.

AUFGABE: Extrahiere alle Vokabeleintraege in folgendem JSON-Format:

{
  "vocabulary": [
    {
      "english": "to improve",
      "german": "verbessern",
      "example": "I want to improve my English."
    }
  ]
}

REGELN:
1. Erkenne das typische 3-Spalten-Layout: Englisch | Deutsch | Beispielsatz
2. Behalte die exakte Schreibweise bei
3. Bei fehlenden Beispielsaetzen: "example": null
4. Ignoriere Seitenzahlen, Ueberschriften
5. Gib NUR valides JSON zurueck
"""
```
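
The prompt asks for an `example` key, while the vocabulary payload in section 3.5 uses `example_sentence`, so entries need a mapping step somewhere between extraction and storage. Where the service does this is not documented here; the sketch below shows one way, with `normalize_entries` as a hypothetical name. It also drops rows in which neither language was recognised, which is an illustrative choice.

```python
from typing import Dict, List, Optional


def normalize_entries(raw_entries: List[Dict]) -> List[Dict[str, Optional[str]]]:
    """Clean LLM output and rename 'example' to 'example_sentence'."""
    normalized = []
    for raw in raw_entries:
        english = (raw.get("english") or "").strip()
        german = (raw.get("german") or "").strip()
        if not english and not german:
            continue  # nothing usable was recognised in this row
        example = (raw.get("example") or "").strip()
        normalized.append({
            "english": english,
            "german": german,
            # Rule 3 of the prompt: a missing example sentence stays null.
            "example_sentence": example or None,
        })
    return normalized


entries = normalize_entries([
    {"english": " to improve ", "german": "verbessern", "example": None},
    {"english": "", "german": "", "example": None},
    {"english": "achieve", "german": "erreichen", "example": "She achieved her goals."},
])
print(entries)
```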

---

## 6. Tests

### 6.1 Running Tests

```bash
cd /Users/benjaminadmin/Projekte/breakpilot-pwa/klausur-service/backend
source venv/bin/activate

# All tests
pytest tests/test_vocab_worksheet.py -v

# With coverage
pytest tests/test_vocab_worksheet.py --cov=vocab_worksheet_api --cov-report=html

# A single test class
pytest tests/test_vocab_worksheet.py::TestSessionCRUD -v
```
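
The `::TestSessionCRUD` selector works because pytest addresses tests as `file::Class::method`. To make that mapping concrete, here is a self-contained sketch of such a class. It does not reproduce the real tests: `create_session` is a hypothetical stand-in backed by a plain dict, so the example runs without the service.

```python
import uuid


def create_session(store: dict, name: str) -> dict:
    """Hypothetical stand-in for session creation, backed by a plain dict."""
    session = {
        "id": str(uuid.uuid4()),
        "name": name,
        "status": "pending",
        "vocabulary_count": 0,
    }
    store[session["id"]] = session
    return session


class TestSessionCRUD:
    def test_create_session_starts_pending(self):
        store = {}
        session = create_session(store, "Unit 3")
        assert session["status"] == "pending"
        assert store[session["id"]]["name"] == "Unit 3"

    def test_delete_session_removes_it(self):
        store = {}
        session = create_session(store, "Unit 3")
        del store[session["id"]]
        assert session["id"] not in store
```

With this layout, appending `::test_create_session_starts_pending` to the class selector runs just that one method.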

### 6.2 Test Categories

| Class | Description |
|-------|-------------|
| `TestSessionCRUD` | Create, read, and delete sessions |
| `TestVocabulary` | Get and update vocabulary |
| `TestWorksheetGeneration` | Worksheet generation |
| `TestJSONParsing` | LLM response parsing |
| `TestFileUpload` | Image/PDF upload |
| `TestSessionStatus` | Status workflow |
| `TestEdgeCases` | Edge cases |

---
## 7. Deployment

### 7.1 Docker Build

```bash
# Rebuild klausur-service
docker compose build klausur-service

# Rebuild studio-v2
docker compose build studio-v2
```

### 7.2 Sync to the Mac Mini

```bash
# Synchronize source files
rsync -avz --exclude 'node_modules' --exclude '.next' --exclude '__pycache__' \
  /Users/benjaminadmin/Projekte/breakpilot-pwa/ \
  macmini:/Users/benjaminadmin/Projekte/breakpilot-pwa/

# Restart the containers
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-pwa && docker compose up -d"
```

---
## 8. Troubleshooting

### 8.1 Common Problems

| Problem | Solution |
|---------|----------|
| Ollama unreachable | Check with `docker exec -it ollama ollama list` |
| PDF conversion fails | Is PyMuPDF installed? `pip install PyMuPDF` |
| Vision LLM timeout | Increase the timeout to 300s |
| Empty vocabulary list | Check the image quality, try a different LLM mode |

### 8.2 Checking Logs

```bash
# Backend logs
docker logs klausur-service -f --tail 100

# Ollama logs
docker logs ollama -f --tail 100
```
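
Besides reading logs, a short script can confirm whether Ollama answers at all and whether the vision model is installed. The sketch below uses only the standard library and queries Ollama's `GET /api/tags` endpoint, which lists local models. `list_ollama_models` is a hypothetical name, and `http://localhost:11434` assumes the check runs on the host rather than inside a container.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def list_ollama_models(base_url: str, timeout: float = 5.0) -> Optional[List[str]]:
    """Return the installed model names, or None if Ollama cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a non-JSON reply.
        return None
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models("http://localhost:11434")
    if models is None:
        print("Ollama is not reachable")
    elif "qwen2.5vl:32b" not in models:
        print("Ollama is up, but the vision model is not installed")
    else:
        print("Ollama is up and the vision model is available")
```

A `None` result points at networking or a stopped container, while a missing model name points at a model that still has to be pulled.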

### 8.3 Debug Mode

```python
# In vocab_worksheet_api.py
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# Shows detailed OCR/LLM output
```

---
## 9. Extending

### 9.1 Adding a New Language

1. Extend `source_language` and `target_language` in the session model
2. Adapt the prompt to the new language pair
3. Extend the frontend dropdown

### 9.2 Adding a New Worksheet Type

1. Extend the `WorksheetType` enum:

   ```python
   from enum import Enum

   class WorksheetType(str, Enum):
       # ...
       CROSSWORD = "crossword"  # New
   ```

2. Extend `generate_worksheet_html()`

3. Add a frontend checkbox

### 9.3 Database Persistence

Currently: in-memory (`_sessions` dict)

For production, add PostgreSQL:

```python
# models.py
from sqlalchemy import JSON, Column, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class VocabSession(Base):
    __tablename__ = "vocab_sessions"
    id = Column(UUID, primary_key=True)
    name = Column(String)
    status = Column(String)
    vocabulary = Column(JSON)
    # ...
```
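
Before wiring in a database, it can help to hide session storage behind a small interface, so the endpoints do not depend on whether sessions live in a dict or in a table. The following sketch illustrates that idea; `SessionStore` and `InMemorySessionStore` are hypothetical names that do not exist in the codebase. A PostgreSQL-backed class offering the same four methods could later replace the in-memory one without touching the callers.

```python
from typing import Dict, List, Optional, Protocol


class SessionStore(Protocol):
    """Minimal storage interface the API layer would depend on."""

    def save(self, session: dict) -> None: ...
    def get(self, session_id: str) -> Optional[dict]: ...
    def delete(self, session_id: str) -> bool: ...
    def list_all(self) -> List[dict]: ...


class InMemorySessionStore:
    """Dict-backed store that mirrors the current in-memory behaviour."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def save(self, session: dict) -> None:
        self._sessions[session["id"]] = session

    def get(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)

    def delete(self, session_id: str) -> bool:
        # True if a session was actually removed.
        return self._sessions.pop(session_id, None) is not None

    def list_all(self) -> List[dict]:
        return list(self._sessions.values())


store: SessionStore = InMemorySessionStore()
store.save({"id": "s1", "name": "Unit 3", "status": "pending"})
print(store.get("s1"))
print(store.delete("s1"), store.delete("s1"))
```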

---

## 10. API Reference

The full OpenAPI documentation is available at:

- **Swagger UI:** http://macmini:8086/docs
- **ReDoc:** http://macmini:8086/redoc

Filter by the tag `Vocabulary Worksheets` to see all vocab endpoints.