
Vocab Worksheet Generator - Developer Guide

Version: 1.0.0  Date: 2026-01-23


1. Quick Start

1.1 Local Development

# Start the backend (klausur-service)
cd /Users/benjaminadmin/Projekte/breakpilot-pwa/klausur-service/backend
source venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8086 --reload

# Start the frontend (studio-v2)
cd /Users/benjaminadmin/Projekte/breakpilot-pwa/studio-v2
npm run dev

1.2 URLs

Environment  Frontend                               Backend API
Local        http://localhost:3001/vocab-worksheet  http://localhost:8086/api/v1/vocab/
Mac Mini     http://macmini:3001/vocab-worksheet    http://macmini:8086/api/v1/vocab/

2. Project Structure

breakpilot-pwa/
├── klausur-service/
│   ├── backend/
│   │   ├── main.py                      # FastAPI app (incl. vocab router)
│   │   ├── vocab_worksheet_api.py       # Vocab worksheet endpoints
│   │   ├── hybrid_vocab_extractor.py    # PaddleOCR + LLM pipeline
│   │   └── tests/
│   │       └── test_vocab_worksheet.py  # Unit tests
│   └── docs/
│       ├── Vocab-Worksheet-Architecture.md
│       └── Vocab-Worksheet-Developer-Guide.md
│
└── studio-v2/
    └── app/
        └── vocab-worksheet/
            └── page.tsx                 # Frontend (React/Next.js)

3. Backend API

3.1 Endpoint Overview

POST   /api/v1/vocab/sessions                        # Create session
GET    /api/v1/vocab/sessions                        # List sessions
GET    /api/v1/vocab/sessions/{id}                   # Get session
DELETE /api/v1/vocab/sessions/{id}                   # Delete session

POST   /api/v1/vocab/sessions/{id}/upload            # Upload image/PDF
POST   /api/v1/vocab/sessions/{id}/upload-pdf-info   # Upload PDF, get page info
GET    /api/v1/vocab/sessions/{id}/pdf-thumbnail/{p} # Page thumbnail
POST   /api/v1/vocab/sessions/{id}/process-single-page/{p}  # Process one page

GET    /api/v1/vocab/sessions/{id}/vocabulary        # Get vocabulary
PUT    /api/v1/vocab/sessions/{id}/vocabulary        # Update vocabulary

POST   /api/v1/vocab/sessions/{id}/generate          # Generate worksheet
GET    /api/v1/vocab/worksheets/{id}/pdf             # Download PDF
GET    /api/v1/vocab/worksheets/{id}/solution        # Solution PDF

3.2 Creating a Session

curl -X POST http://localhost:8086/api/v1/vocab/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Englisch Klasse 7 - Unit 3",
    "description": "Vokabeln aus Green Line",
    "source_language": "en",
    "target_language": "de"
  }'

Response:

{
  "id": "15dce1f4-f587-4b80-8c3d-62b20e7b845c",
  "name": "Englisch Klasse 7 - Unit 3",
  "status": "pending",
  "vocabulary_count": 0,
  "created_at": "2026-01-23T10:00:00Z"
}
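
The same call can be scripted. A minimal sketch using only the standard library (the base URL and field names follow the curl request above; adjust the host for the Mac Mini):

```python
import json
import urllib.request

API_BASE = "http://localhost:8086"

def session_payload(name: str, description: str = "",
                    source_language: str = "en",
                    target_language: str = "de") -> dict:
    """Build the request body for POST /api/v1/vocab/sessions."""
    return {
        "name": name,
        "description": description,
        "source_language": source_language,
        "target_language": target_language,
    }

def create_session(name: str, **kwargs) -> dict:
    """Create a session and return the parsed JSON response."""
    data = json.dumps(session_payload(name, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/vocab/sessions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```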

3.3 Uploading an Image

curl -X POST http://localhost:8086/api/v1/vocab/sessions/{session_id}/upload \
  -F "file=@vokabeln.png"

3.4 Processing a PDF

# 1. Upload the PDF and get its page info
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/upload-pdf-info \
  -F "file=@schulbuch.pdf"

# Response: {"session_id": "...", "page_count": 5}

# 2. Process individual pages (recommended)
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/process-single-page/0
curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/process-single-page/1
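
The recommended per-page flow can be automated. A sketch under the same assumptions as above (backend reachable at the base URL, `page_count` taken from the upload-pdf-info response):

```python
import json
import urllib.request

API_BASE = "http://localhost:8086"

def page_endpoints(session_id: str, page_count: int) -> list[str]:
    """URLs for processing each page of an uploaded PDF, in order."""
    return [
        f"{API_BASE}/api/v1/vocab/sessions/{session_id}/process-single-page/{page}"
        for page in range(page_count)
    ]

def process_all_pages(session_id: str, page_count: int) -> list[dict]:
    """POST each process-single-page endpoint and collect the responses."""
    results = []
    for url in page_endpoints(session_id, page_count):
        req = urllib.request.Request(url, method="POST")
        with urllib.request.urlopen(req) as resp:
            results.append(json.load(resp))
    return results
```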

3.5 Updating Vocabulary

curl -X PUT http://localhost:8086/api/v1/vocab/sessions/{id}/vocabulary \
  -H "Content-Type: application/json" \
  -d '{
    "vocabulary": [
      {
        "id": "uuid-1",
        "english": "achieve",
        "german": "erreichen",
        "example_sentence": "She achieved her goals."
      }
    ]
  }'

3.6 Generating a Worksheet

curl -X POST http://localhost:8086/api/v1/vocab/sessions/{id}/generate \
  -H "Content-Type: application/json" \
  -d '{
    "worksheet_types": ["en_to_de", "de_to_en"],
    "include_solutions": true,
    "line_height": "large"
  }'
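
Generation and download can be combined in one script. A sketch, not the service's canonical client: field names follow the request above, but the `worksheet_id` response key is an assumption and should be checked against the actual /generate response:

```python
import json
import urllib.request

API_BASE = "http://localhost:8086"

def generate_payload(worksheet_types: list[str],
                     include_solutions: bool = True,
                     line_height: str = "large") -> dict:
    """Build the request body for POST /sessions/{id}/generate."""
    return {
        "worksheet_types": worksheet_types,
        "include_solutions": include_solutions,
        "line_height": line_height,
    }

def generate_and_download(session_id: str, out_path: str = "worksheet.pdf") -> None:
    """Generate a worksheet, then fetch its PDF."""
    data = json.dumps(generate_payload(["en_to_de", "de_to_en"])).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/vocab/sessions/{session_id}/generate",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # NOTE: 'worksheet_id' is an assumed key -- verify it
        worksheet_id = json.load(resp)["worksheet_id"]
    pdf_url = f"{API_BASE}/api/v1/vocab/worksheets/{worksheet_id}/pdf"
    with urllib.request.urlopen(pdf_url) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```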

4. Frontend Development

4.1 Component Structure

The entire UI lives in a single file (page.tsx):

// Main component
export default function VocabWorksheetPage() {
  const [activeTab, setActiveTab] = useState<TabType>('upload')
  const [sessions, setSessions] = useState<Session[]>([])
  const [currentSession, setCurrentSession] = useState<Session | null>(null)
  const [vocabulary, setVocabulary] = useState<VocabularyEntry[]>([])

  // ...
}

// Tabs
type TabType = 'upload' | 'pages' | 'vocabulary' | 'worksheet' | 'export'

4.2 API Calls

// Determine the API base URL automatically
const getApiBase = () => {
  if (typeof window === 'undefined') return 'http://localhost:8086'
  const host = window.location.hostname
  return `http://${host}:8086`
}

// Create a session
const createSession = async (name: string) => {
  const response = await fetch(`${getApiBase()}/api/v1/vocab/sessions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name })
  })
  return response.json()
}

4.3 Styling

Tailwind CSS with dark/light theme support:

// Theme-aware classes
className="bg-white dark:bg-gray-800 text-gray-900 dark:text-white"

// Gradient buttons
className="bg-gradient-to-r from-purple-600 to-blue-600 hover:from-purple-700 hover:to-blue-700"

5. Vocabulary Extraction

5.1 Vision LLM Mode (Default)

# vocab_worksheet_api.py

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
VISION_MODEL = os.getenv("OLLAMA_VISION_MODEL", "qwen2.5vl:32b")

async def extract_vocabulary_from_image(image_data: bytes, filename: str):
    # Send the base64-encoded image to Ollama
    image_base64 = base64.b64encode(image_data).decode("utf-8")

    payload = {
        "model": VISION_MODEL,
        "messages": [{
            "role": "user",
            "content": VOCAB_EXTRACTION_PROMPT,
            "images": [image_base64]
        }],
        "stream": False
    }

    response = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
    # Parse JSON response...
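
The elided parsing step is where vision models commonly misbehave: they may wrap the JSON in a Markdown code fence or pad it with prose. A defensive helper (a sketch, not the service's actual implementation):

```python
import json
import re

def parse_vocab_response(reply: str) -> list[dict]:
    """Extract the vocabulary list from a raw LLM reply.

    Handles replies wrapped in ```json fences or padded with prose
    by grabbing the outermost {...} span and parsing it.
    """
    # Strip a Markdown code fence if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", reply, re.DOTALL)
    if fenced:
        reply = fenced.group(1)
    # Fall back to the outermost JSON object
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in LLM reply")
    data = json.loads(reply[start:end + 1])
    return data.get("vocabulary", [])
```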

5.2 Hybrid OCR + LLM Mode (Optional)

# hybrid_vocab_extractor.py

async def extract_vocabulary_hybrid(image_bytes: bytes, page_number: int):
    # 1. PaddleOCR for text recognition
    regions, raw_text = run_paddle_ocr(image_bytes)

    # 2. Format the text for the LLM
    formatted_text = format_ocr_for_llm(regions)

    # 3. The LLM structures the data
    vocabulary = await structure_vocabulary_with_llm(formatted_text)

    return vocabulary, confidence, error

5.3 Prompt Engineering

The extraction prompt is written in German, which gives better results on German textbook pages:

VOCAB_EXTRACTION_PROMPT = """Analysiere dieses Bild einer Vokabelliste aus einem Schulbuch.

AUFGABE: Extrahiere alle Vokabeleintraege in folgendem JSON-Format:

{
  "vocabulary": [
    {
      "english": "to improve",
      "german": "verbessern",
      "example": "I want to improve my English."
    }
  ]
}

REGELN:
1. Erkenne das typische 3-Spalten-Layout: Englisch | Deutsch | Beispielsatz
2. Behalte die exakte Schreibweise bei
3. Bei fehlenden Beispielsaetzen: "example": null
4. Ignoriere Seitenzahlen, Ueberschriften
5. Gib NUR valides JSON zurueck
"""

6. Tests

6.1 Running the Tests

cd /Users/benjaminadmin/Projekte/breakpilot-pwa/klausur-service/backend
source venv/bin/activate

# All tests
pytest tests/test_vocab_worksheet.py -v

# With coverage
pytest tests/test_vocab_worksheet.py --cov=vocab_worksheet_api --cov-report=html

# A single test class
pytest tests/test_vocab_worksheet.py::TestSessionCRUD -v

6.2 Test Categories

Class                    Description
TestSessionCRUD          Create, read, and delete sessions
TestVocabulary           Fetch and update vocabulary
TestWorksheetGeneration  Worksheet generation
TestJSONParsing          LLM response parsing
TestFileUpload           Image/PDF upload
TestSessionStatus        Status workflow
TestEdgeCases            Edge cases

7. Deployment

7.1 Docker Build

# Rebuild the klausur-service
docker compose build klausur-service

# Rebuild studio-v2
docker compose build studio-v2

7.2 Syncing to the Mac Mini

# Sync source files
rsync -avz --exclude 'node_modules' --exclude '.next' --exclude '__pycache__' \
  /Users/benjaminadmin/Projekte/breakpilot-pwa/ \
  macmini:/Users/benjaminadmin/Projekte/breakpilot-pwa/

# Restart containers
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-pwa && docker compose up -d"

8. Troubleshooting

8.1 Common Problems

Problem                Solution
Ollama unreachable     Check with docker exec -it ollama ollama list
PDF conversion fails   Is PyMuPDF installed? pip install PyMuPDF
Vision LLM timeout     Raise the timeout to 300s
Empty vocabulary list  Check image quality; try the other LLM mode

8.2 Checking the Logs

# Backend logs
docker logs klausur-service -f --tail 100

# Ollama logs
docker logs ollama -f --tail 100

8.3 Debug Mode

# In vocab_worksheet_api.py
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# Prints detailed OCR/LLM output

9. Extensions

9.1 Adding a New Language

  1. Extend source_language and target_language in the session model
  2. Adapt the prompt for the new language pair
  3. Extend the frontend dropdown

9.2 Adding a New Worksheet Type

  1. Extend the WorksheetType enum:

    class WorksheetType(str, Enum):
        # ...
        CROSSWORD = "crossword"  # New

  2. Extend generate_worksheet_html()

  3. Add a frontend checkbox
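
Steps 1 and 2 might look like this. A sketch only: the existing enum members are taken from the worksheet_types seen earlier, and the dispatch structure of generate_worksheet_html() is a hypothetical stand-in for the real implementation:

```python
from enum import Enum

class WorksheetType(str, Enum):
    EN_TO_DE = "en_to_de"
    DE_TO_EN = "de_to_en"
    CROSSWORD = "crossword"  # new type

def generate_worksheet_html(ws_type: WorksheetType, vocabulary: list[dict]) -> str:
    """Dispatch on worksheet type (hypothetical structure)."""
    if ws_type is WorksheetType.CROSSWORD:
        # New branch: placeholder crossword rendering
        rows = "".join(f"<li>{v['english']}</li>" for v in vocabulary)
        return f"<ol>{rows}</ol>"
    raise NotImplementedError(ws_type)
```

Because WorksheetType subclasses str, request payloads like "crossword" validate against the enum without extra conversion code.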

9.3 Database Persistence

Currently sessions live in memory (the _sessions dict).

For production, add PostgreSQL:

# models.py
class VocabSession(Base):
    __tablename__ = "vocab_sessions"
    id = Column(UUID, primary_key=True)
    name = Column(String)
    status = Column(String)
    vocabulary = Column(JSON)
    # ...

10. API Reference

The full OpenAPI documentation is available at:

Filter by the Vocabulary Worksheets tag to see all vocab endpoints.