
Vocabulary Worksheet Generator - Architecture

Version: 1.0.0 Date: 2026-01-23 Status: Production

1. Overview

The Vocabulary Worksheet Generator is a GDPR-compliant tool for teachers that extracts vocabulary from textbook pages and generates print-ready worksheets.

┌─────────────────────────────────────────────────────────────────────────┐
│                        Studio v2 (Next.js)                              │
│                        Port 3001                                        │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  /vocab-worksheet                                                │   │
│  │  - Session management (create, resume, delete)                   │   │
│  │  - PDF upload with page selection                                │   │
│  │  - Vocabulary editing (grid editor)                              │   │
│  │  - Worksheet configuration                                       │   │
│  │  - PDF export                                                    │   │
│  └──────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼ HTTP/REST
┌─────────────────────────────────────────────────────────────────────────┐
│                     Klausur-Service (FastAPI)                           │
│                     Port 8086                                           │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  /api/v1/vocab/*                                                 │   │
│  │  - Session CRUD                                                  │   │
│  │  - PDF processing (PyMuPDF)                                      │   │
│  │  - Vocabulary extraction (Vision LLM / Hybrid OCR)               │   │
│  │  - Worksheet generation (WeasyPrint)                             │   │
│  └──────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    ▼                               ▼
┌───────────────────────────────┐   ┌─────────────────────────────────────┐
│     Ollama Vision LLM         │   │        LLM Gateway                  │
│     Port 11434                │   │        Port 8002                    │
│  ┌─────────────────────────┐  │   │  ┌───────────────────────────────┐  │
│  │ qwen2.5vl:32b           │  │   │  │ qwen2.5:14b                   │  │
│  │ (image → vocabulary)    │  │   │  │ (OCR text → structured)       │  │
│  └─────────────────────────┘  │   │  └───────────────────────────────┘  │
└───────────────────────────────┘   └─────────────────────────────────────┘

2. Components

2.1 Frontend (studio-v2)

File: /studio-v2/app/vocab-worksheet/page.tsx

Aspect	Details
Framework	Next.js 16.1.4 with React 19.0.0
Styling	Tailwind CSS 3.4.17
Language	TypeScript 5.7.0
State	React Hooks (useState, useRef, useEffect)

Tab-based workflow:

Upload - Name the session, select a file (image/PDF)
Pages - For PDFs: select pages via thumbnails
Vocabulary - Review and edit the extracted vocabulary
Worksheet - Choose the worksheet type and format
Export - Download the PDF

Data structures:

interface VocabularyEntry {
  id: string
  english: string
  german: string
  example_sentence?: string
  word_type?: string
  source_page?: number
}

interface Session {
  id: string
  name: string
  status: 'pending' | 'processing' | 'extracted' | 'completed'
  vocabulary_count: number
}

type WorksheetType = 'en_to_de' | 'de_to_en' | 'copy' | 'gap_fill'
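
On the backend side, the same shapes could be mirrored with plain dataclasses. The following is a minimal sketch under that assumption, not the actual contents of vocab_worksheet_api.py; the field names are copied from the TypeScript interfaces above, and the sample values are made up for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Literal, Optional

# Same string unions as in the TypeScript definitions above.
SessionStatus = Literal["pending", "processing", "extracted", "completed"]
WorksheetType = Literal["en_to_de", "de_to_en", "copy", "gap_fill"]


@dataclass
class VocabularyEntry:
    id: str
    english: str
    german: str
    example_sentence: Optional[str] = None
    word_type: Optional[str] = None
    source_page: Optional[int] = None


@dataclass
class Session:
    id: str
    name: str
    status: SessionStatus = "pending"  # assumed default for new sessions
    vocabulary_count: int = 0


# Illustrative values only.
entry = VocabularyEntry(id="1", english="house", german="Haus", source_page=3)
session = Session(id="s1", name="Unit 3")
payload = asdict(entry)  # plain dict, ready for JSON serialization
```

Optional fields default to None, which matches the `?` markers in the TypeScript interface.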

2.2 Backend API

File: /klausur-service/backend/vocab_worksheet_api.py

Aspect	Details
Framework	FastAPI (async)
Router prefix	`/api/v1/vocab`
Storage	In-memory (dict) + filesystem

Endpoints:

Method	Path	Description
POST	`/sessions`	Create a session
GET	`/sessions`	List sessions
GET	`/sessions/{id}`	Session details
DELETE	`/sessions/{id}`	Delete a session
POST	`/sessions/{id}/upload`	Upload an image/PDF
POST	`/sessions/{id}/upload-pdf-info`	Retrieve PDF info
GET	`/sessions/{id}/pdf-thumbnail/{page}`	Page thumbnail
POST	`/sessions/{id}/process-single-page/{page}`	Process a single page
GET	`/sessions/{id}/vocabulary`	Retrieve vocabulary
PUT	`/sessions/{id}/vocabulary`	Update vocabulary
POST	`/sessions/{id}/generate`	Generate a worksheet
GET	`/worksheets/{id}/pdf`	Worksheet PDF
GET	`/worksheets/{id}/solution`	Solution PDF
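
To illustrate how a client might address these routes, here is a small sketch that only builds requests and does not send them. The base URL is taken from the deployment section; the JSON body for `POST /sessions`, the session id `abc123`, and the page index `0` are assumptions for illustration, since the document does not specify request bodies or whether pages are zero-based.

```python
import json
import urllib.request
from typing import Optional

# Base URL as listed in the deployment section of this document.
BASE_URL = "http://macmini:8086/api/v1/vocab"


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the vocab API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# Hypothetical body and identifiers, for illustration only.
create = build_request("POST", "/sessions", {"name": "Unit 3"})
process = build_request("POST", "/sessions/abc123/process-single-page/0")
```

Passing either object to `urllib.request.urlopen` would perform the call against a running service.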

2.3 Vocabulary Extraction

Two modes are available:

A. Vision LLM (Default)

OLLAMA_URL = "http://host.docker.internal:11434"
VISION_MODEL = "qwen2.5vl:32b"

The image is Base64-encoded and sent to Ollama
The prompt is written in German for better recognition
Timeout: 5 minutes per page
Confidence: ~85%
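
A request body for this mode might be assembled as sketched below. The use of Ollama's `/api/generate` endpoint and the prompt text are assumptions, since the document names neither; the `images` field as a list of Base64 strings and the `stream` flag follow the Ollama REST API.

```python
import base64
import json

OLLAMA_URL = "http://host.docker.internal:11434"
VISION_MODEL = "qwen2.5vl:32b"
TIMEOUT_SECONDS = 5 * 60  # 5 minutes per page, as stated above


def build_vision_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a JSON body for Ollama's /api/generate endpoint (assumed endpoint)."""
    return {
        "model": VISION_MODEL,
        "prompt": prompt,
        # Ollama expects images as a list of Base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete response instead of chunks
    }


# Hypothetical prompt and placeholder image bytes; the real prompt is not
# part of this document.
payload = build_vision_payload(b"\x89PNG...", "Extrahiere alle Vokabeln als JSON.")
body = json.dumps(payload).encode("utf-8")
# Sending could look like:
#   httpx.post(f"{OLLAMA_URL}/api/generate", content=body, timeout=TIMEOUT_SECONDS)
```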

B. Hybrid OCR + LLM (Optional)

File: /klausur-service/backend/hybrid_vocab_extractor.py

Image → PaddleOCR → text regions → LLM Gateway → structured JSON

PaddleOCR 3.x for text recognition
Automatic column detection (2 or 3 columns)
qwen2.5:14b for structuring
~4x faster than the Vision LLM
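
The document does not describe how column detection works. One simple way to do it, shown here purely as an illustrative sketch, is to sort the text regions by their left edge and start a new column wherever the x-position jumps by more than a threshold. The `Region` tuple shape, the `min_gap` value, and both function names are hypothetical and not taken from hybrid_vocab_extractor.py or from PaddleOCR's output format.

```python
from typing import List, Tuple

# Each region: (x_left, y_top, text). Hypothetical shape for illustration.
Region = Tuple[float, float, str]


def detect_columns(regions: List[Region], min_gap: float = 80.0) -> List[List[Region]]:
    """Start a new column wherever consecutive x_left values jump by more than min_gap."""
    ordered = sorted(regions, key=lambda r: r[0])
    columns: List[List[Region]] = [[ordered[0]]] if ordered else []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] - prev[0] > min_gap:
            columns.append([])
        columns[-1].append(cur)
    # Within each column, read top to bottom.
    return [sorted(col, key=lambda r: r[1]) for col in columns]


def to_rows(columns: List[List[Region]]) -> List[Tuple[str, ...]]:
    """Pair up the n-th region of every column into one row."""
    return list(zip(*[[r[2] for r in col] for col in columns]))


regions = [(50, 100, "house"), (300, 100, "Haus"),
           (52, 140, "dog"), (298, 141, "Hund")]
rows = to_rows(detect_columns(regions))
```

For the sample input, the two detected columns are paired row by row into (English, German) tuples. A real page with multi-line cells or skewed scans would need a more robust approach.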

2.4 PDF Processing

Task	Library
PDF → PNG	PyMuPDF (fitz)
Thumbnails	PyMuPDF at zoom 0.5
OCR images	PyMuPDF at zoom 2.0
PDF generation	WeasyPrint
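
As a rough sketch of what the zoom factors mean in practice: PyMuPDF renders at 72 DPI when no scaling is applied, so a zoom of 0.5 gives about 36 DPI for thumbnails and 2.0 gives about 144 DPI for OCR input. The function names below are illustrative and not taken from the service code; `render_page_png` assumes PyMuPDF is installed and imports it lazily.

```python
THUMBNAIL_ZOOM = 0.5  # from the table above
OCR_ZOOM = 2.0


def effective_dpi(zoom: float) -> float:
    """PDF user space is 72 points per inch, so zoom scales the render DPI."""
    return 72.0 * zoom


def render_page_png(pdf_path: str, page_index: int, zoom: float) -> bytes:
    """Render one page to PNG bytes at the given zoom factor."""
    import fitz  # PyMuPDF; imported lazily so effective_dpi works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_index]
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        return pix.tobytes("png")
```

Calling `render_page_png(path, 0, OCR_ZOOM)` would return the first page as PNG bytes suitable for Base64 encoding or OCR.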

3. Data Flow

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Upload  │───►│  OCR/    │───►│  Edit    │───►│  Export  │
│  PDF     │    │  Extract │    │  Vocab   │    │  PDF     │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
     │               │               │               │
     ▼               ▼               ▼               ▼
 /upload         /process-      /vocabulary     /generate
                 single-page

Session status workflow:

PENDING → PROCESSING → EXTRACTED → COMPLETED
   │           │           │           │
 Upload    Extraction   Ready for   Worksheet
 done      running      editing     generated
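
The status workflow can be read as a small state machine. The sketch below encodes only the forward transitions shown in the diagram; whether the real service also allows going back (for example, re-processing another page after EXTRACTED) is not stated, so the strict forward-only rule and the names `ALLOWED` and `advance` are assumptions.

```python
from enum import Enum


class SessionStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    EXTRACTED = "extracted"
    COMPLETED = "completed"


# Forward-only transitions taken from the diagram above (assumed to be strict).
ALLOWED = {
    SessionStatus.PENDING: {SessionStatus.PROCESSING},
    SessionStatus.PROCESSING: {SessionStatus.EXTRACTED},
    SessionStatus.EXTRACTED: {SessionStatus.COMPLETED},
    SessionStatus.COMPLETED: set(),
}


def advance(current: SessionStatus, target: SessionStatus) -> SessionStatus:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```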

4. Worksheet Types

Type	Description
`en_to_de`	Translate English → German
`de_to_en`	Translate German → English
`copy`	Copy words out multiple times
`gap_fill`	Gap-fill text with example sentences
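
How each type turns a vocabulary entry into a task might look roughly like the sketch below. The function `make_task` is hypothetical; in particular, which word is copied in `copy` mode and how the gap is rendered in `gap_fill` mode are assumptions, since the document does not specify them.

```python
from typing import Optional, Tuple


def make_task(worksheet_type: str, english: str, german: str,
              example_sentence: Optional[str] = None) -> Tuple[str, str]:
    """Return (prompt shown to the student, expected answer) for one entry."""
    if worksheet_type == "en_to_de":
        return english, german
    if worksheet_type == "de_to_en":
        return german, english
    if worksheet_type == "copy":
        # Assumption: the English word is the one to be copied out.
        return english, english
    if worksheet_type == "gap_fill":
        if not example_sentence or english not in example_sentence:
            raise ValueError("gap_fill needs an example sentence containing the word")
        # Replace the first occurrence of the word with a fixed-width blank.
        return example_sentence.replace(english, "_" * 8, 1), english
    raise ValueError(f"unknown worksheet type: {worksheet_type}")
```

Entries without an `example_sentence` cannot produce a gap-fill task in this sketch, which is why that case raises an error rather than silently skipping.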

Options:

Line height: normal / large / extra-large
Solutions: yes / no
Repetitions (for copy): 1-5
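
These options lend themselves to validation at construction time. The following sketch assumes a dataclass named `WorksheetOptions`; the class name, field names, and default values are illustrative, while the allowed values come from the list above.

```python
from dataclasses import dataclass

LINE_HEIGHTS = ("normal", "large", "extra-large")


@dataclass(frozen=True)
class WorksheetOptions:
    line_height: str = "normal"      # assumed default
    include_solutions: bool = True   # assumed default
    repetitions: int = 3             # only meaningful for the copy worksheet

    def __post_init__(self) -> None:
        if self.line_height not in LINE_HEIGHTS:
            raise ValueError(f"line_height must be one of {LINE_HEIGHTS}")
        if not 1 <= self.repetitions <= 5:
            raise ValueError("repetitions must be between 1 and 5")
```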

5. Data Protection (GDPR)

Aspect	Implementation
Processing	100% local (Mac Mini)
External APIs	None
LLM	Ollama (local)
Storage	Local filesystem
Data transfer	Within the LAN only

No data is sent to external servers.

6. Configuration

Environment variables:

# Ollama Vision LLM
OLLAMA_URL=http://host.docker.internal:11434
OLLAMA_VISION_MODEL=qwen2.5vl:32b

# LLM Gateway (Hybrid Mode)
LLM_GATEWAY_URL=http://host.docker.internal:8002
LLM_MODEL=qwen2.5:14b

# Storage
VOCAB_STORAGE_PATH=/app/vocab-worksheets
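
One way the service could read these variables is sketched below, with the values listed above as fallbacks. The `VocabConfig` class and `load_config` function are hypothetical names; the document does not show how the backend actually loads its settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VocabConfig:
    ollama_url: str
    vision_model: str
    llm_gateway_url: str
    llm_model: str
    storage_path: str


def load_config(env: Mapping[str, str] = os.environ) -> VocabConfig:
    """Read settings from the environment, falling back to the values listed above."""
    return VocabConfig(
        ollama_url=env.get("OLLAMA_URL", "http://host.docker.internal:11434"),
        vision_model=env.get("OLLAMA_VISION_MODEL", "qwen2.5vl:32b"),
        llm_gateway_url=env.get("LLM_GATEWAY_URL", "http://host.docker.internal:8002"),
        llm_model=env.get("LLM_MODEL", "qwen2.5:14b"),
        storage_path=env.get("VOCAB_STORAGE_PATH", "/app/vocab-worksheets"),
    )
```

Accepting the environment as a parameter keeps the loader easy to test: `load_config({"LLM_MODEL": "other"})` overrides a single value without touching the process environment.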

7. Dependencies

Backend (Python)

Package	Version	Purpose
FastAPI	0.123.9	Web framework
PyMuPDF	1.25.4	PDF processing
WeasyPrint	66.0	PDF generation
Pillow	11.3.0	Image processing
httpx	0.28.1	Async HTTP client
PaddleOCR	3.x	OCR (optional)

Frontend (Node.js)

Package	Version	Purpose
Next.js	16.1.4	Framework
React	19.0.0	UI Library
Tailwind CSS	3.4.17	Styling
TypeScript	5.7.0	Type Safety

8. Deployment

Docker containers:

klausur-service (Port 8086) - Backend API
studio-v2 (Port 3001) - Frontend

URLs:

Frontend: http://macmini:3001/vocab-worksheet
API: http://macmini:8086/api/v1/vocab/

9. Possible Extensions

Feature	Status
Additional languages (FR, ES)	Planned
Database persistence	Planned
Batch processing	Planned
Dictionary integration	Idea
Audio pronunciation exercises	Idea
