Initial commit: breakpilot-core - Shared Infrastructure
Docker Compose with 24+ services:
- PostgreSQL (PostGIS), Valkey, MinIO, Qdrant
- Vault (PKI/TLS), Nginx (Reverse Proxy)
- Backend Core API, Consent Service, Billing Service
- RAG Service, Embedding Service
- Gitea, Woodpecker CI/CD
- Night Scheduler, Health Aggregator
- Jitsi (Web/XMPP/JVB/Jicofo), Mailpit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs-src/services/klausur-service/BYOEH-Architecture.md

# BYOEH (Bring-Your-Own-Expectation-Horizon) - Architecture Documentation

## Overview

The BYOEH module enables teachers to upload their own Erwartungshorizonte (expectation horizons, i.e. grading rubrics) and use them for RAG-assisted grading suggestions. Key design principles:

- **Tenant Isolation**: Each teacher/school has an isolated namespace
- **No Training Guarantee**: EH content is only used for RAG, never for model training
- **Operator Blindness**: Client-side encryption ensures Breakpilot cannot view plaintext
- **Rights Confirmation**: Required legal acknowledgment at upload time

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                       klausur-service (Port 8086)                       │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌────────────────────┐    ┌─────────────────────────────────────────┐  │
│  │ BYOEH REST API     │    │ BYOEH Service Layer                     │  │
│  │                    │    │                                         │  │
│  │ POST /api/v1/eh    │───▶│ - Upload Wizard Logic                   │  │
│  │ GET /api/v1/eh     │    │ - Rights Confirmation                   │  │
│  │ DELETE /api/v1/eh  │    │ - Chunking Pipeline                     │  │
│  │ POST /rag-query    │    │ - Encryption Service                    │  │
│  └────────────────────┘    └────────────────────┬────────────────────┘  │
└─────────────────────────────────────────────────┼───────────────────────┘
                                                  │
          ┌───────────────────────────────────────┼───────────────────────┐
          │                                       │                       │
          ▼                                       ▼                       ▼
┌──────────────────────┐ ┌──────────────────────────┐ ┌──────────────────────┐
│ PostgreSQL           │ │ Qdrant                   │ │ Encrypted Storage    │
│ (Metadata + Audit)   │ │ (Vector Search)          │ │ /app/eh-uploads/     │
│                      │ │                          │ │                      │
│ In-Memory Storage:   │ │ Collection: bp_eh        │ │ {tenant}/{eh_id}/    │
│ - erwartungshorizonte│ │ - tenant_id (filter)     │ │   encrypted.bin      │
│ - eh_chunks          │ │ - eh_id                  │ │   salt.txt           │
│ - eh_key_shares      │ │ - embedding[1536]        │ │                      │
│ - eh_klausur_links   │ │ - encrypted_content      │ └──────────────────────┘
│ - eh_audit_log       │ │                          │
└──────────────────────┘ └──────────────────────────┘
```

## Data Flow

### 1. Upload Flow

```
Browser                     Backend                        Storage
│                              │                              │
│ 1. User selects PDF          │                              │
│ 2. User enters passphrase    │                              │
│ 3. PBKDF2 key derivation     │                              │
│ 4. AES-256-GCM encryption    │                              │
│ 5. SHA-256 key hash          │                              │
│                              │                              │
│─────────────────────────────▶│                              │
│ POST /api/v1/eh/upload       │                              │
│ (encrypted blob + key_hash)  │                              │
│                              │─────────────────────────────▶│
│                              │ Store encrypted.bin + salt   │
│                              │◀─────────────────────────────│
│                              │                              │
│                              │ Save metadata to DB          │
│◀─────────────────────────────│                              │
│ Return EH record             │                              │
```

### 2. Indexing Flow (RAG Preparation)

```
Browser                     Backend                        Qdrant
│                              │                              │
│─────────────────────────────▶│                              │
│ POST /api/v1/eh/{id}/index   │                              │
│ (passphrase for decryption)  │                              │
│                              │                              │
│                              │ 1. Verify key hash           │
│                              │ 2. Decrypt content           │
│                              │ 3. Extract text (PDF)        │
│                              │ 4. Chunk text                │
│                              │ 5. Generate embeddings       │
│                              │ 6. Re-encrypt each chunk     │
│                              │─────────────────────────────▶│
│                              │ Index vectors + encrypted    │
│                              │ chunks with tenant filter    │
│◀─────────────────────────────│                              │
│ Return chunk count           │                              │
```

### 3. RAG Query Flow

```
Browser                     Backend                        Qdrant
│                              │                              │
│─────────────────────────────▶│                              │
│ POST /api/v1/eh/rag-query    │                              │
│ (query + passphrase)         │                              │
│                              │                              │
│                              │ 1. Generate query embedding  │
│                              │─────────────────────────────▶│
│                              │ 2. Semantic search           │
│                              │    (tenant-filtered)         │
│                              │◀─────────────────────────────│
│                              │ 3. Decrypt matched chunks    │
│◀─────────────────────────────│                              │
│ Return decrypted context     │                              │
```

## Security Architecture

### Client-Side Encryption

```
┌─────────────────────────────────────────────────────────────────┐
│                      Browser (Client-Side)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. User enters passphrase (NEVER sent to server)               │
│         │                                                       │
│         ▼                                                       │
│  2. Key Derivation: PBKDF2-SHA256(passphrase, salt, 100k iter)  │
│         │                                                       │
│         ▼                                                       │
│  3. Encryption: AES-256-GCM(key, iv, file_content)              │
│         │                                                       │
│         ▼                                                       │
│  4. Key-Hash: SHA-256(derived_key) → server verification only   │
│         │                                                       │
│         ▼                                                       │
│  5. Upload: encrypted_blob + key_hash + salt (NOT key!)         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
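
Steps 2 and 4 of the diagram can be mirrored in Python, for example to check a passphrase against the stored key hash. This is a minimal sketch using only the standard library; the function names, the 16-byte salt, and the hex encoding of the hash are assumptions, not the service's actual code.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 100_000  # "100k iter" in the diagram


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 256-bit key with PBKDF2-HMAC-SHA256 (step 2)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


def key_hash(derived_key: bytes) -> str:
    """Hex SHA-256 digest of the derived key (step 4); hex encoding is assumed."""
    return hashlib.sha256(derived_key).hexdigest()


def verify_passphrase(passphrase: str, salt: bytes, expected_key_hash: str) -> bool:
    """Re-derive the key and compare hashes in constant time."""
    candidate = key_hash(derive_key(passphrase, salt))
    return hmac.compare_digest(candidate, expected_key_hash)


# Hypothetical usage: the salt length is an assumption.
salt = os.urandom(16)
stored_hash = key_hash(derive_key("correct horse", salt))
```

Only `stored_hash` and `salt` would need to be persisted for later verification; the derived key itself is never stored.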

### Security Guarantees

| Guarantee | Implementation |
|-----------|----------------|
| **No Training** | `training_allowed: false` on all Qdrant points |
| **Operator Blindness** | The server stores only ciphertext and the key hash; the passphrase is not part of the upload and is supplied per request for indexing and RAG queries |
| **Tenant Isolation** | Every query filtered by `tenant_id` |
| **Audit Trail** | All actions logged with timestamps |

## Key Sharing System

The key sharing system lets first examiners grant second examiners and supervisors access to their EH.

### Share Flow

```
First Examiner              Backend                    Second Examiner
│                              │                              │
│ 1. Encrypt passphrase for    │                              │
│    recipient (client-side)   │                              │
│                              │                              │
│─────────────────────────────▶│                              │
│ POST /eh/{id}/share          │                              │
│ (encrypted_passphrase, role) │                              │
│                              │                              │
│                              │ Store EHKeyShare             │
│◀─────────────────────────────│                              │
│                              │                              │
│                              │◀─────────────────────────────│
│                              │ GET /eh/shared-with-me       │
│                              │                              │
│                              │─────────────────────────────▶│
│                              │ Return shared EH list        │
│                              │                              │
│                              │◀─────────────────────────────│
│                              │ RAG query with decrypted     │
│                              │ passphrase                   │
```

### Data Structures

```python
@dataclass
class EHKeyShare:
    id: str
    eh_id: str
    user_id: str                 # Recipient
    encrypted_passphrase: str    # Client-encrypted for recipient
    passphrase_hint: str         # Optional hint
    granted_by: str              # Grantor user ID
    granted_at: datetime
    role: str                    # second_examiner, third_examiner, supervisor
    klausur_id: Optional[str]    # Link to specific Klausur
    active: bool

@dataclass
class EHKlausurLink:
    id: str
    eh_id: str
    klausur_id: str
    linked_by: str
    linked_at: datetime
```
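
To illustrate how the `active` flag could drive access checks and revocation, here is a small self-contained sketch. The helper names `find_active_share` and `revoke` are hypothetical, and the in-memory list stands in for whatever storage the service actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class EHKeyShare:
    id: str
    eh_id: str
    user_id: str
    encrypted_passphrase: str
    passphrase_hint: str
    granted_by: str
    granted_at: datetime
    role: str
    klausur_id: Optional[str]
    active: bool


def find_active_share(
    shares: List[EHKeyShare], eh_id: str, user_id: str
) -> Optional[EHKeyShare]:
    """Return the recipient's active share for this EH, or None."""
    for share in shares:
        if share.eh_id == eh_id and share.user_id == user_id and share.active:
            return share
    return None


def revoke(share: EHKeyShare) -> None:
    """Revocation as a soft flag: the record stays available for auditing."""
    share.active = False
```

Keeping revoked records instead of deleting them is an assumption here; it matches the `revoked_shares` field returned by the access-chain call in the Developer Guide.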

## API Endpoints

### Core EH Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/upload` | Upload encrypted EH |
| GET | `/api/v1/eh` | List user's EH |
| GET | `/api/v1/eh/{id}` | Get single EH |
| DELETE | `/api/v1/eh/{id}` | Soft delete EH |
| POST | `/api/v1/eh/{id}/index` | Index EH for RAG |
| POST | `/api/v1/eh/rag-query` | Query EH content |

### Key Sharing Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/{id}/share` | Share EH with examiner |
| GET | `/api/v1/eh/{id}/shares` | List shares (owner) |
| DELETE | `/api/v1/eh/{id}/shares/{shareId}` | Revoke share |
| GET | `/api/v1/eh/shared-with-me` | List EH shared with user |

### Klausur Integration Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/{id}/link-klausur` | Link EH to Klausur |
| DELETE | `/api/v1/eh/{id}/link-klausur/{klausurId}` | Unlink EH |
| GET | `/api/v1/klausuren/{id}/linked-eh` | Get linked EH for Klausur |

### Audit & Admin Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/eh/audit-log` | Get audit log |
| GET | `/api/v1/eh/rights-text` | Get rights confirmation text |
| GET | `/api/v1/eh/qdrant-status` | Get Qdrant status (admin) |

## Frontend Components

### EHUploadWizard

5-step wizard for uploading Erwartungshorizonte:

1. **File Selection** - Choose PDF file
2. **Metadata** - Title, Subject, Niveau, Year
3. **Rights Confirmation** - Legal acknowledgment
4. **Encryption** - Set passphrase (2x confirmation)
5. **Summary** - Review and upload

### Integration Points

- **KorrekturPage**: Shows EH prompt after first student upload
- **GutachtenGeneration**: Uses RAG context from linked EH
- **Sidebar Badge**: Shows linked EH count

## File Structure

```
klausur-service/
├── backend/
│   ├── main.py              # API endpoints + data structures
│   ├── qdrant_service.py    # Vector database operations
│   ├── eh_pipeline.py       # Chunking, embedding, encryption
│   └── requirements.txt     # Python dependencies
├── frontend/
│   └── src/
│       ├── components/
│       │   └── EHUploadWizard.tsx
│       ├── services/
│       │   ├── api.ts           # API client
│       │   └── encryption.ts    # Client-side crypto
│       ├── pages/
│       │   └── KorrekturPage.tsx    # EH integration
│       └── styles/
│           └── eh-wizard.css
└── docs/
    ├── BYOEH-Architecture.md
    └── BYOEH-Developer-Guide.md
```

## Configuration

### Environment Variables

```env
QDRANT_URL=http://qdrant:6333
OPENAI_API_KEY=sk-...          # For embeddings
BYOEH_ENCRYPTION_ENABLED=true
EH_UPLOAD_DIR=/app/eh-uploads
```

### Docker Services

```yaml
# docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
```

## Audit Events

| Action | Description |
|--------|-------------|
| `upload` | EH uploaded |
| `index` | EH indexed for RAG |
| `rag_query` | RAG query executed |
| `delete` | EH soft deleted |
| `share` | EH shared with examiner |
| `revoke_share` | Share revoked |
| `link_klausur` | EH linked to Klausur |
| `unlink_klausur` | EH unlinked from Klausur |
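
One way to keep audit actions limited to the values in this table is an enum plus an immutable log entry. The sketch below is illustrative only: `AuditAction`, `AuditEntry`, `make_entry`, and the entry fields are assumptions, and only the action strings come from the table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class AuditAction(str, Enum):
    UPLOAD = "upload"
    INDEX = "index"
    RAG_QUERY = "rag_query"
    DELETE = "delete"
    SHARE = "share"
    REVOKE_SHARE = "revoke_share"
    LINK_KLAUSUR = "link_klausur"
    UNLINK_KLAUSUR = "unlink_klausur"


@dataclass(frozen=True)
class AuditEntry:
    action: AuditAction
    eh_id: str
    user_id: str
    # Timestamps are recorded in UTC at creation time.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def make_entry(action: str, eh_id: str, user_id: str) -> AuditEntry:
    """Build an entry; AuditAction(action) raises ValueError for unknown actions."""
    return AuditEntry(AuditAction(action), eh_id, user_id)
```

Rejecting unknown action strings at creation time keeps the log consistent with the documented event list.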

## See Also

- [Zeugnis-System Architektur](../../architecture/zeugnis-system.md)
- [Klausur-Service Index](./index.md)

docs-src/services/klausur-service/BYOEH-Developer-Guide.md

# BYOEH Developer Guide

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- Docker & Docker Compose
- OpenAI API Key (for embeddings)

### Setup

1. **Start services:**
   ```bash
   docker-compose up -d qdrant
   ```

2. **Configure environment:**
   ```env
   QDRANT_URL=http://localhost:6333
   OPENAI_API_KEY=sk-your-key
   BYOEH_ENCRYPTION_ENABLED=true
   ```

3. **Run klausur-service:**
   ```bash
   cd klausur-service/backend
   pip install -r requirements.txt
   uvicorn main:app --reload --port 8086
   ```

4. **Run frontend:**
   ```bash
   cd klausur-service/frontend
   npm install
   npm run dev
   ```

## Client-Side Encryption

The encryption service (`encryption.ts`) handles all cryptographic operations in the browser.

### Encrypting a File

```typescript
import { encryptFile } from '../services/encryption'

// Read the selected PDF from the file input (assumes an <input id="fileInput" type="file">)
const input = document.getElementById('fileInput') as HTMLInputElement | null
const file = input?.files?.[0]
if (!file) throw new Error('No file selected')

const passphrase = 'user-secret-password'

const encrypted = await encryptFile(file, passphrase)
// Result:
// {
//   encryptedData: ArrayBuffer,
//   keyHash: string,   // SHA-256 hash for verification
//   salt: string,      // Hex-encoded salt
//   iv: string         // Hex-encoded initialization vector
// }
```

### Decrypting Content

```typescript
import { decryptText, verifyPassphrase } from '../services/encryption'

// First verify the passphrase
const isValid = await verifyPassphrase(passphrase, salt, expectedKeyHash)

if (isValid) {
  const decrypted = await decryptText(encryptedBase64, passphrase, salt)
}
```

## Backend API Usage

### Upload an Erwartungshorizont

The upload endpoint accepts `multipart/form-data` with two parts: `file` (the encrypted binary blob) and `metadata_json` (a JSON string with the metadata):

```http
POST /api/v1/eh/upload
Content-Type: multipart/form-data

{
  "file": <encrypted_blob>,
  "metadata_json": {
    "metadata": {
      "title": "Deutsch LK 2025",
      "subject": "deutsch",
      "niveau": "eA",
      "year": 2025,
      "aufgaben_nummer": "Aufgabe 1"
    },
    "encryption_key_hash": "abc123...",
    "salt": "def456...",
    "rights_confirmed": true,
    "original_filename": "erwartungshorizont.pdf"
  }
}
```
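
Because `metadata_json` travels as a JSON string inside the form data, a client has to serialize it before sending. The sketch below builds that string with the field names from the example above; the helper name `build_metadata_json` and the client-side rejection of unconfirmed rights are assumptions for illustration.

```python
import json
from typing import Optional


def build_metadata_json(
    title: str,
    subject: str,
    niveau: str,
    year: int,
    encryption_key_hash: str,
    salt: str,
    original_filename: str,
    rights_confirmed: bool,
    aufgaben_nummer: Optional[str] = None,
) -> str:
    """Serialize the upload metadata into the JSON string for `metadata_json`."""
    if not rights_confirmed:
        # Assumption: refuse early on the client rather than wait for a server error.
        raise ValueError("rights must be confirmed before upload")
    metadata = {"title": title, "subject": subject, "niveau": niveau, "year": year}
    if aufgaben_nummer is not None:
        metadata["aufgaben_nummer"] = aufgaben_nummer
    payload = {
        "metadata": metadata,
        "encryption_key_hash": encryption_key_hash,
        "salt": salt,
        "rights_confirmed": True,
        "original_filename": original_filename,
    }
    return json.dumps(payload, ensure_ascii=False)
```

The resulting string would be sent as the `metadata_json` form field next to the encrypted `file` part.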

### Index for RAG

```http
POST /api/v1/eh/{eh_id}/index
Content-Type: application/json

{
  "passphrase": "user-secret-password"
}
```

The backend will:

1. Verify the passphrase against the stored key hash
2. Decrypt the file
3. Extract text from the PDF
4. Chunk the text (1000 chars, 200 overlap)
5. Generate OpenAI embeddings
6. Re-encrypt each chunk
7. Index in Qdrant with tenant filter

### RAG Query

```http
POST /api/v1/eh/rag-query
Content-Type: application/json

{
  "query_text": "Wie sollte die Einleitung strukturiert sein?",
  "passphrase": "user-secret-password",
  "subject": "deutsch",
  "limit": 5
}
```

`subject` is an optional filter; `limit` sets the maximum number of results.

Response:

```json
{
  "context": "Die Einleitung sollte...",
  "sources": [
    {
      "text": "Die Einleitung sollte...",
      "eh_id": "uuid",
      "eh_title": "Deutsch LK 2025",
      "chunk_index": 2,
      "score": 0.89
    }
  ],
  "query": "Wie sollte die Einleitung strukturiert sein?"
}
```
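
The response pairs a combined `context` string with the individual `sources`. How the backend assembles `context` is not specified here; the sketch below shows one plausible approach (rank by score, keep the top `limit`, join the texts) and should be read as an assumption, including the helper name `build_rag_response` and the blank-line separator.

```python
from typing import Any, Dict, List


def build_rag_response(
    query: str, sources: List[Dict[str, Any]], limit: int = 5
) -> Dict[str, Any]:
    """Rank decrypted chunks by score and join their text into one context string."""
    ranked = sorted(sources, key=lambda s: s["score"], reverse=True)[:limit]
    return {
        "context": "\n\n".join(s["text"] for s in ranked),
        "sources": ranked,
        "query": query,
    }
```

The returned dictionary has the same top-level keys as the response example above.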

## Key Sharing Implementation

### Invitation Flow (Recommended)

The invitation flow provides a two-phase sharing process: Invite -> Accept

```typescript
import { ehApi } from '../services/api'

// 1. First examiner sends invitation to second examiner
const invitation = await ehApi.inviteToEH(ehId, {
  invitee_email: 'zweitkorrektor@school.de',
  role: 'second_examiner',
  klausur_id: 'klausur-uuid',  // Optional: link to specific Klausur
  message: 'Bitte fuer Zweitkorrektur nutzen',
  expires_in_days: 14          // Default: 14 days
})
// Returns: { invitation_id, eh_id, invitee_email, role, expires_at, eh_title }

// 2. Second examiner sees pending invitation
const pending = await ehApi.getPendingInvitations()
// [{ invitation: {...}, eh: { id, title, subject, niveau, year } }]

// 3. Second examiner accepts invitation
const accepted = await ehApi.acceptInvitation(
  invitationId,
  encryptedPassphrase  // Passphrase encrypted for recipient
)
// Returns: { status: 'accepted', share_id, eh_id, role, klausur_id }
```

### Invitation Management

```typescript
// Get invitations sent by current user
const sent = await ehApi.getSentInvitations()

// Decline an invitation (as invitee)
await ehApi.declineInvitation(invitationId)

// Revoke a pending invitation (as inviter)
await ehApi.revokeInvitation(invitationId)

// Get complete access chain for an EH
const chain = await ehApi.getAccessChain(ehId)
// Returns: { eh_id, eh_title, owner, active_shares, pending_invitations, revoked_shares }
```

### Direct Sharing (Legacy)

For immediate sharing without invitation:

```typescript
// First examiner shares directly with second examiner
await ehApi.shareEH(ehId, {
  user_id: 'second-examiner-uuid',
  role: 'second_examiner',
  encrypted_passphrase: encryptedPassphrase,  // Encrypted for recipient
  passphrase_hint: 'Das uebliche Passwort',
  klausur_id: 'klausur-uuid'                  // Optional
})
```

### Accessing Shared EH

```typescript
// Second examiner gets shared EH
const shared = await ehApi.getSharedWithMe()
// [{ eh: {...}, share: {...} }]

// Query using provided passphrase
const result = await ehApi.ragQuery({
  query_text: 'search query',
  passphrase: decryptedPassphrase,
  subject: 'deutsch'
})
```

### Revoking Access

```typescript
// List all shares for an EH
const shares = await ehApi.listShares(ehId)

// Revoke a share
await ehApi.revokeShare(ehId, shareId)
```

## Klausur Integration

### Automatic EH Prompt

The `KorrekturPage` shows an EH upload prompt after the first student work is uploaded:

```typescript
// In KorrekturPage.tsx
useEffect(() => {
  if (
    currentKlausur?.students.length === 1 &&
    linkedEHs.length === 0 &&
    !ehPromptDismissed
  ) {
    setShowEHPrompt(true)
  }
  // List every value the condition reads so the effect re-runs when any of them changes
}, [currentKlausur?.students.length, linkedEHs.length, ehPromptDismissed])
```

### Linking EH to Klausur

```typescript
// After EH upload, auto-link to Klausur
await ehApi.linkToKlausur(ehId, klausurId)

// Get linked EH for a Klausur
const linked = await klausurEHApi.getLinkedEH(klausurId)
```

## Frontend Components

### EHUploadWizard Props

```typescript
interface EHUploadWizardProps {
  onClose: () => void
  onComplete?: (ehId: string) => void
  defaultSubject?: string  // Pre-fill subject
  defaultYear?: number     // Pre-fill year
  klausurId?: string       // Auto-link after upload
}

// Usage
<EHUploadWizard
  onClose={() => setShowWizard(false)}
  onComplete={(ehId) => console.log('Uploaded:', ehId)}
  defaultSubject={klausur.subject}
  defaultYear={klausur.year}
  klausurId={klausur.id}
/>
```

### Wizard Steps

1. **file** - PDF file selection with drag & drop
2. **metadata** - Form for title, subject, niveau, year
3. **rights** - Rights confirmation checkbox
4. **encryption** - Passphrase input with strength meter
5. **summary** - Review and confirm upload

## Qdrant Operations

### Collection Schema

```python
# Collection: bp_eh
{
    "vectors": {
        "size": 1536,  # OpenAI text-embedding-3-small
        "distance": "Cosine"
    }
}

# Point payload
{
    "tenant_id": "school-uuid",
    "eh_id": "eh-uuid",
    "chunk_index": 0,
    "encrypted_content": "base64...",
    "training_allowed": False  # ALWAYS False
}
```

### Tenant-Isolated Search

```python
from qdrant_service import search_eh

results = await search_eh(
    query_embedding=embedding,
    tenant_id="school-uuid",
    subject="deutsch",
    limit=5
)
```
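
Tenant isolation depends on every search carrying a `tenant_id` condition. The sketch below builds such a filter in the JSON shape of Qdrant's REST filter (`must` conditions with `key` and `match.value`). The helper name `build_eh_filter` is hypothetical, and a `subject` payload field is an assumption, since the point payload above does not list one.

```python
from typing import Any, Dict, List, Optional


def build_eh_filter(tenant_id: str, subject: Optional[str] = None) -> Dict[str, Any]:
    """Build a Qdrant-style filter that always restricts results to one tenant."""
    if not tenant_id:
        # Refuse to build an unscoped filter; a missing tenant must never mean "all tenants".
        raise ValueError("tenant_id is required")
    must: List[Dict[str, Any]] = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    if subject:
        # Assumption: points carry a `subject` payload field.
        must.append({"key": "subject", "match": {"value": subject}})
    return {"must": must}
```

Making `tenant_id` mandatory at the point where the filter is built means a caller cannot accidentally issue a cross-tenant query.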

## Testing

### Unit Tests

```bash
cd klausur-service/backend
pytest tests/test_byoeh.py -v
```

### Test Structure

```python
# tests/test_byoeh.py
class TestBYOEH:
    def test_upload_eh(self, client, auth_headers):
        """Test EH upload with encryption"""
        pass

    def test_index_eh(self, client, auth_headers, uploaded_eh):
        """Test EH indexing for RAG"""
        pass

    def test_rag_query(self, client, auth_headers, indexed_eh):
        """Test RAG query returns relevant chunks"""
        pass

    def test_share_eh(self, client, auth_headers, uploaded_eh):
        """Test sharing EH with another user"""
        pass
```

### Frontend Tests

```typescript
// EHUploadWizard.test.tsx
describe('EHUploadWizard', () => {
  it('completes all steps successfully', async () => {
    // ...
  })

  it('validates passphrase strength', async () => {
    // ...
  })

  it('auto-links to klausur when klausurId provided', async () => {
    // ...
  })
})
```

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Passphrase verification failed` | Wrong passphrase | Ask user to re-enter |
| `EH not found` | Invalid ID or deleted | Check ID, reload list |
| `Access denied` | User not owner/shared | Check permissions |
| `Qdrant connection failed` | Service unavailable | Check Qdrant container |

### Error Response Format

```json
{
  "detail": "Passphrase verification failed"
}
```

## Security Considerations

### Do's

- Store key hash, never the key itself
- Always filter by tenant_id
- Log all access in audit trail
- Use HTTPS in production

### Don'ts

- Never log passphrase or decrypted content
- Never store passphrase in localStorage
- Never send passphrase as URL parameter
- Never return decrypted content without auth

## Performance Tips

### Chunking Configuration

```python
CHUNK_SIZE = 1000     # Characters per chunk
CHUNK_OVERLAP = 200   # Overlap for context continuity
```
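
With these two values, each chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` characters after the previous one. The following sketch shows a simple character-based sliding window; it is an illustration under that assumption, not the code in `eh_pipeline.py`, and the function name `chunk_text` is hypothetical.

```python
from typing import List

CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200


def chunk_text(
    text: str, chunk_size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP
) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For a 2500-character text this yields windows starting at offsets 0, 800, and 1600, so the last 200 characters of one chunk reappear at the start of the next.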

### Embedding Batching

```python
# Generate embeddings in batches of 20
EMBEDDING_BATCH_SIZE = 20
```
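
Batching means grouping chunks so that each embedding request carries up to `EMBEDDING_BATCH_SIZE` texts. A minimal sketch of the grouping step follows; the helper name `batched` is an assumption and the actual API call is intentionally left out.

```python
from typing import Iterator, List, Sequence

EMBEDDING_BATCH_SIZE = 20


def batched(
    items: Sequence[str], batch_size: int = EMBEDDING_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Each yielded list would then be passed to a single embedding request, which reduces the number of round trips for long documents.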

### Qdrant Optimization

```python
# Use HNSW index for fast approximate search
# Collection is automatically optimized on creation
```

## Debugging

### Enable Debug Logging

```python
import logging
logging.getLogger('byoeh').setLevel(logging.DEBUG)
```

### Check Qdrant Status

```bash
curl http://localhost:6333/collections/bp_eh
```

### Verify Encryption

```typescript
import { isEncryptionSupported } from '../services/encryption'

if (!isEncryptionSupported()) {
  console.error('Web Crypto API not available')
}
```

## Migration Notes

### From v1.0 to v1.1

1. Added key sharing system
2. Added Klausur linking
3. Added EH prompt after student upload

No database migrations required - all data structures are additive.

docs-src/services/klausur-service/NiBiS-Ingestion-Pipeline.md

# NiBiS Ingestion Pipeline

## Overview

The NiBiS ingestion pipeline processes Abitur Erwartungshorizonte (expectation horizons) from Lower Saxony (Niedersachsen) and indexes them in Qdrant for RAG-based exam grading.

## Supported Data

### Directories

In the naming conventions, `Jahr` is the year, `Fach` the subject, `Niveau` the level, and `Nr` the task number.

| Directory | Years | Naming convention |
|-----------|-------|-------------------|
| `docs/za-download` | 2024, 2025 | `{Jahr}_{Fach}_{niveau}_{Nr}_EWH.pdf` |
| `docs/za-download-2` | 2016 | `{Jahr}{Fach}{Niveau}Lehrer/{Jahr}{Fach}{Niveau}A{Nr}L.pdf` |
| `docs/za-download-3` | 2017 | `{Jahr}{Fach}{Niveau}Lehrer/{Jahr}{Fach}{Niveau}A{Nr}L.pdf` |
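
The newer convention (2024/2025) is regular enough to parse with a single expression. The sketch below is a hedged illustration, not the parser in `nibis_ingestion.py`: the function name `parse_new_style` and the returned keys are assumptions, and it assumes that subject and level contain no underscores and that the task number is numeric.

```python
import re
from typing import Any, Dict, Optional

# {Jahr}_{Fach}_{niveau}_{Nr}_EWH.pdf, e.g. 2024_Deutsch_eA_1_EWH.pdf
NEW_STYLE = re.compile(
    r"^(?P<year>\d{4})_(?P<subject>[^_]+)_(?P<niveau>[^_]+)_(?P<number>\d+)_EWH\.pdf$"
)


def parse_new_style(filename: str) -> Optional[Dict[str, Any]]:
    """Parse a 2024/2025-style EWH filename; return None if it does not match."""
    match = NEW_STYLE.match(filename)
    if match is None:
        return None
    return {
        "year": int(match.group("year")),
        "subject": match.group("subject"),
        "niveau": match.group("niveau"),
        "task_number": int(match.group("number")),
    }
```

Returning `None` for non-matching names lets a caller fall through to a parser for the older 2016/2017 convention.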

### Document Types

- **EWH** - Erwartungshorizont (expectation horizon; primary target)
- **Aufgabe** - Exam tasks
- **Material** - Supplementary materials
- **GBU** - Gefährdungsbeurteilung (hazard assessment; chemistry/biology)
- **Bewertungsbogen** - Standardized assessment sheets

### Subjects

Deutsch, Englisch, Mathematik, Informatik, Biologie, Chemie, Physik, Geschichte, Erdkunde, Kunst, Musik, Sport, Latein, Griechisch, Französisch, Spanisch, Katholische Religion, Evangelische Religion, Werte und Normen, BRC, BVW, Gesundheit-Pflege

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    NiBiS Ingestion Pipeline                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. ZIP Extraction                                              │
│     └── Extracts 2024.zip, 2025.zip, etc.                       │
│                                                                 │
│  2. Document Discovery                                          │
│     ├── Parses old naming convention (2016/2017)                │
│     └── Parses new naming convention (2024/2025)                │
│                                                                 │
│  3. PDF Processing                                              │
│     ├── Text extraction (PyPDF2)                                │
│     └── Chunking (1000 chars, 200 overlap)                      │
│                                                                 │
│  4. Embedding Generation                                        │
│     └── OpenAI text-embedding-3-small (1536 dim)                │
│                                                                 │
│  5. Qdrant Indexing                                             │
│     └── Collection: bp_nibis_eh                                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Usage

### Via API (recommended)

```bash
# 1. Preview the available documents
curl http://localhost:8086/api/v1/admin/nibis/discover

# 2. Extract the ZIP files
curl -X POST http://localhost:8086/api/v1/admin/nibis/extract-zips

# 3. Start ingestion
curl -X POST http://localhost:8086/api/v1/admin/nibis/ingest \
  -H "Content-Type: application/json" \
  -d '{"ewh_only": true}'

# 4. Check status
curl http://localhost:8086/api/v1/admin/nibis/status

# 5. Test semantic search
curl -X POST http://localhost:8086/api/v1/admin/nibis/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Analyse literarischer Texte", "subject": "Deutsch", "limit": 5}'
```

### Via CLI

```bash
# Dry run (analyze only)
cd klausur-service/backend
python nibis_ingestion.py --dry-run

# Full ingestion
python nibis_ingestion.py

# Specific year only
python nibis_ingestion.py --year 2024

# Specific subject only
python nibis_ingestion.py --subject Deutsch

# Create a manifest
python nibis_ingestion.py --manifest /tmp/nibis_manifest.json
```

### Via Shell Script

```bash
./klausur-service/scripts/run_nibis_ingestion.sh --dry-run
./klausur-service/scripts/run_nibis_ingestion.sh --year 2024 --subject Deutsch
```

## Qdrant Schema

### Collection: `bp_nibis_eh`

```json
{
  "id": "nibis_2024_deutsch_ea_1_abc123_chunk_0",
  "vector": [1536 dimensions],
  "payload": {
    "doc_id": "nibis_2024_deutsch_ea_1_abc123",
    "chunk_index": 0,
    "text": "Der Erwartungshorizont...",
    "year": 2024,
    "subject": "Deutsch",
    "niveau": "eA",
    "task_number": 1,
    "doc_type": "EWH",
    "bundesland": "NI",
    "variant": null,
    "source": "nibis",
    "training_allowed": true
  }
}
```

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/admin/nibis/status` | Ingestion status |
| POST | `/api/v1/admin/nibis/extract-zips` | Extract ZIP files |
| GET | `/api/v1/admin/nibis/discover` | Discover documents |
| POST | `/api/v1/admin/nibis/ingest` | Start ingestion |
| POST | `/api/v1/admin/nibis/search` | Semantic search |
| GET | `/api/v1/admin/nibis/stats` | Statistics |
| GET | `/api/v1/admin/nibis/collections` | Qdrant collections |
| DELETE | `/api/v1/admin/nibis/collection` | Delete collection |

## Extending to Other Federal States

The pipeline is designed to be easy to extend:

### 1. Add a new federal state

```python
# In nibis_ingestion.py
from pathlib import Path
from typing import Dict, Optional

# Federal state code (ISO 3166-2:DE)
BUNDESLAND_CODES = {
    "NI": "Niedersachsen",
    "BE": "Berlin",
    "BY": "Bayern",
    # ...
}

# Parsing function for the new format
def parse_filename_berlin(filename: str, file_path: Path) -> Optional[Dict]:
    # Berlin-specific naming convention
    pass
```

### 2. Register a new directory

```python
# Add docs/za-download-berlin/
ZA_DOWNLOAD_DIRS = [
    "za-download",
    "za-download-2",
    "za-download-3",
    "za-download-berlin",  # NEW
]
```

### 3. Add document types

For report card (Zeugnis) generation or other document types:

```python
DOC_TYPES = {
    "EWH": "Erwartungshorizont",
    "ZEUGNIS_VORLAGE": "Zeugnisvorlage",
    "NOTENSPIEGEL": "Notenspiegel",
    "BEMERKUNG": "Bemerkungstexte",
}
```

## Legal Notes

- NiBiS data may be used freely under the [NiBiS terms of use](https://nibis.de)
- `training_allowed: true` - structural knowledge may be used for AI training
- For teachers' own Erwartungshorizonte (BYOEH): `training_allowed: false`

## Troubleshooting

### Qdrant not reachable

```bash
# Check whether Qdrant is running
curl http://localhost:6333/healthz

# Start the container
docker-compose up -d qdrant
```

### OpenAI API errors

```bash
# Set the API key
export OPENAI_API_KEY=sk-...
```

### PDF extraction failed

Some PDFs can be problematic (scanned documents without OCR). These are skipped and recorded in the error log.

## Performance
|
||||
|
||||
- ~500-1000 Chunks pro Minute (abhängig von OpenAI API)
|
||||
- ~2-3 GB Qdrant Storage für alle NiBiS-Daten (2016-2025)
|
||||
- Embeddings werden nur einmal generiert (idempotent via Hash)
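
The "idempotent via hash" behaviour can be pictured roughly as follows. This is a minimal sketch, not the pipeline's actual code: the function names and the in-memory `seen_hashes` set are assumptions, and a real run would presumably keep the hashes in Qdrant payloads or a database.

```python
import hashlib
from typing import Iterable, List, Set


def chunk_hash(text: str) -> str:
    """Stable SHA-256 fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_embed(chunks: Iterable[str], seen_hashes: Set[str]) -> List[str]:
    """Return only chunks whose hash is not yet known, and remember them."""
    fresh = []
    for text in chunks:
        digest = chunk_hash(text)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(text)
    return fresh
```

Running the ingestion twice over the same chunks then sends nothing to the embedding API on the second pass.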
235
docs-src/services/klausur-service/OCR-Compare.md
Normal file
@@ -0,0 +1,235 @@
# OCR Compare - Block Review Feature

**Status:** In production
**Last updated:** 2026-02-08
**URL:** https://macmini:3002/ai/ocr-compare

---

## Overview

The OCR Compare tool compares different OCR methods for recognizing text in scanned documents. The Block Review feature lets users check and correct the OCR results cell by cell.

### Main features

| Feature | Description |
|---------|-------------|
| **Multi-Method OCR** | Compares Vision LLM, Tesseract, PaddleOCR and Claude Vision |
| **Grid Detection** | Automatic detection of table structures |
| **Block Review** | Cell-by-cell review and correction |
| **Session Persistence** | Sessions are kept when navigating between pages |
| **High-Resolution Display** | High-resolution image display (zoom=2.0) |

---

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│ admin-v2 (Next.js)                                           │
│ /app/(admin)/ai/ocr-compare/page.tsx                         │
│ - PDF upload & session management                            │
│ - Grid visualization with SVG overlay                        │
│ - Block Review panel                                         │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│ klausur-service (FastAPI)                                    │
│ Port 8086                                                    │
│ - /api/v1/vocab/sessions (session CRUD)                      │
│ - /api/v1/vocab/sessions/{id}/pdf-thumbnail (image export)   │
│ - /api/v1/vocab/sessions/{id}/detect-grid (grid detection)   │
│ - /api/v1/vocab/sessions/{id}/run-ocr (OCR execution)        │
└──────────────────────────────────────────────────────────────┘
```

---

## Components

### GridOverlay

SVG overlay that visualizes the detected grid structure.

**File:** `/admin-v2/components/ocr/GridOverlay.tsx`

```typescript
interface GridOverlayProps {
  grid: GridData
  imageUrl?: string
  onCellClick?: (cell: GridCell) => void
  selectedCell?: GridCell | null
  showEmpty?: boolean                     // Show empty cells
  showLabels?: boolean                    // Column labels (EN, DE, Ex)
  showNumbers?: boolean                   // Show block numbers
  highlightedBlockNumber?: number | null  // Highlighted block
  className?: string
}
```

**Cell status colors:**

| Status | Color | Meaning |
|--------|-------|---------|
| `recognized` | Green | Text recognized successfully |
| `problematic` | Orange | Low confidence value |
| `manual` | Blue | Corrected manually |
| `empty` | Transparent | Nothing recognized |

### BlockReviewPanel

Panel for reviewing the OCR results block by block.

**File:** `/admin-v2/components/ocr/BlockReviewPanel.tsx`

```typescript
interface BlockReviewPanelProps {
  grid: GridData
  methodResults: Record<string, { vocabulary: Array<...> }>
  currentBlockNumber: number
  onBlockChange: (blockNumber: number) => void
  onApprove: (blockNumber: number, methodId: string, text: string) => void
  onCorrect: (blockNumber: number, correctedText: string) => void
  onSkip: (blockNumber: number) => void
  reviewData: Record<number, BlockReviewData>
  className?: string
}
```

**Review status:**

| Status | Description |
|--------|-------------|
| `pending` | Not yet reviewed |
| `approved` | OCR result accepted |
| `corrected` | Corrected manually |
| `skipped` | Skipped |

### BlockReviewSummary

Summary of all reviewed blocks.

```typescript
interface BlockReviewSummaryProps {
  reviewData: Record<number, BlockReviewData>
  totalBlocks: number
  onBlockClick: (blockNumber: number) => void
  className?: string
}
```

---

## OCR Methods

| ID | Name | Description |
|----|------|-------------|
| `vision_llm` | Vision LLM | Qwen VL 32B via Ollama |
| `tesseract` | Tesseract | Classic OCR (local) |
| `paddleocr` | PaddleOCR | PaddleOCR engine |
| `claude_vision` | Claude Vision | Anthropic Claude Vision API |

---

## API Endpoints

### Session Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/vocab/upload-pdf-info` | Upload a PDF |
| GET | `/api/v1/vocab/sessions/{id}` | Session details |
| DELETE | `/api/v1/vocab/sessions/{id}` | Delete a session |

### Image Export

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/vocab/sessions/{id}/pdf-thumbnail/{page}` | Thumbnail (zoom=0.5) |
| GET | `/api/v1/vocab/sessions/{id}/pdf-thumbnail/{page}?hires=true` | High-res (zoom=2.0) |

### Grid Detection

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/vocab/sessions/{id}/detect-grid` | Detect the grid structure |
| POST | `/api/v1/vocab/sessions/{id}/run-ocr` | Run OCR on the grid |

---

## Session Persistence

The active session is stored in localStorage:

```javascript
// Save
localStorage.setItem('ocr-compare-active-session', sessionId)

// Restore on page load
const lastSessionId = localStorage.getItem('ocr-compare-active-session')
if (lastSessionId) {
  // Load the session data
}
```

---

## Block Review Workflow

1. **Upload PDF** - load the document into the system
2. **Detect grid** - automatic table detection
3. **Run OCR** - run all methods in parallel
4. **Start Block Review** - click the "Block Review" button
5. **Review blocks** - for each block:
   - Compare the results of all methods
   - Pick the best result or correct it manually
6. **Summary** - overview of the corrections

---

## High-Resolution Images

High-resolution images are used for display:

```typescript
// Thumbnail URL with the high-resolution parameter
const imageUrl = `${KLAUSUR_API}/api/v1/vocab/sessions/${sessionId}/pdf-thumbnail/${pageNumber}?hires=true`
```

| Parameter | Zoom | Use |
|-----------|------|-----|
| Without `hires` | 0.5 | Preview/thumbnails |
| With `hires=true` | 2.0 | Display/OCR |

---

## Files

### Frontend (admin-v2)

| File | Description |
|------|-------------|
| `app/(admin)/ai/ocr-compare/page.tsx` | Main UI |
| `components/ocr/GridOverlay.tsx` | SVG grid overlay |
| `components/ocr/BlockReviewPanel.tsx` | Review panel |
| `components/ocr/CellCorrectionDialog.tsx` | Correction dialog |
| `components/ocr/index.ts` | Exports |

### Backend (klausur-service)

| File | Description |
|------|-------------|
| `vocab_worksheet_api.py` | API router |
| `hybrid_vocab_extractor.py` | OCR extraction |

---

## Change History

| Date | Change |
|------|--------|
| 2026-02-08 | Block Review feature added |
| 2026-02-08 | High-resolution images enabled |
| 2026-02-08 | Session persistence implemented |
| 2026-02-07 | Grid Detection and Multi-Method OCR |
445
docs-src/services/klausur-service/OCR-Labeling-Spec.md
Normal file
@@ -0,0 +1,445 @@
# OCR-Labeling System Specification

**Version:** 1.1.0
**Status:** In production (Mac Mini)

## Overview

The OCR-Labeling system is used to create training data for handwriting OCR models from scanned exams (Klausuren). It supports the following models:

| Model | Description | Speed | Recommended for |
|-------|-------------|-------|-----------------|
| **llama3.2-vision:11b** | Vision LLM (default) | Slow | Handwriting, best quality |
| **TrOCR** | Microsoft Transformer | Fast | Printed text |
| **PaddleOCR + LLM** | Hybrid approach (NEW) | Very fast (4x) | Mixed documents |
| **Donut** | Document Understanding (NEW) | Medium | Tables, forms |
| **qwen2.5:14b** | Correction LLM | - | Exam grading |

### New OCR Options (v1.1.0)

#### PaddleOCR + LLM (recommended for speed)

PaddleOCR + LLM is a two-stage approach:
1. **PaddleOCR** - fast, precise text recognition with bounding boxes
2. **qwen2.5:14b** - semantic structuring of the recognized text

**Advantages:**
- 4x faster than the Vision LLM (~7-15 s vs. 30-60 s per page)
- Higher accuracy on printed text (95-99%)
- Fewer hallucinations (the LLM only corrects, it does not invent text)
- Position-based column detection is possible
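
Position-based column detection can be illustrated with a small sketch that groups OCR boxes by the x-coordinate of their left edge. The `(x_left, text)` tuple format, the `gap` threshold and the function name `assign_columns` are assumptions for this example and are not taken from `hybrid_vocab_extractor.py`.

```python
from typing import List, Tuple


def assign_columns(boxes: List[Tuple[float, str]], gap: float = 50.0) -> List[int]:
    """Assign a column index to each box.

    Boxes are visited in order of their left edge; a new column starts
    whenever the distance to the previous left edge exceeds `gap` pixels.
    The returned list is aligned with the input order.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    columns = [0] * len(boxes)
    current = 0
    previous_x = None
    for i in order:
        x_left = boxes[i][0]
        if previous_x is not None and x_left - previous_x > gap:
            current += 1
        columns[i] = current
        previous_x = x_left
    return columns
```

For a vocabulary sheet with an English, a German and an example column, the left edges typically fall into three clusters, so each box receives index 0, 1 or 2.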

**Files:**
- `/klausur-service/backend/hybrid_vocab_extractor.py` - PaddleOCR integration

#### Donut (Document Understanding Transformer)

Donut is optimized specifically for structured documents:
- Tables and forms
- Invoices and receipts
- Multi-column layouts

**Files:**
- `/klausur-service/backend/services/donut_ocr_service.py` - Donut service

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│                           OCR-Labeling System                            │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐    ┌─────────────────┐    ┌────────────────────────┐    │
│  │  Frontend   │◄──►│ Klausur-Service │◄──►│ PostgreSQL             │    │
│  │  (Next.js)  │    │    (FastAPI)    │    │ - ocr_labeling_sessions│    │
│  │  Port 3000  │    │    Port 8086    │    │ - ocr_labeling_items   │    │
│  └─────────────┘    └────────┬────────┘    │ - ocr_training_samples │    │
│                              │             └────────────────────────┘    │
│                              │                                           │
│             ┌────────────────┼────────────────┐                          │
│             ▼                ▼                ▼                          │
│       ┌───────────┐     ┌─────────┐   ┌────────────────┐                 │
│       │   MinIO   │     │ Ollama  │   │ Export Service │                 │
│       │ (Images)  │     │  (OCR)  │   │   (Training)   │                 │
│       │ Port 9000 │     │ :11434  │   │                │                 │
│       └───────────┘     └─────────┘   └────────────────┘                 │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```

## Data Model

### PostgreSQL Tables

```sql
-- Labeling sessions (group images that belong together)
CREATE TABLE ocr_labeling_sessions (
    id VARCHAR(36) PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    source_type VARCHAR(50) NOT NULL,  -- 'klausur', 'handwriting_sample', 'scan'
    description TEXT,
    ocr_model VARCHAR(100),            -- e.g. 'llama3.2-vision:11b'
    total_items INTEGER DEFAULT 0,
    labeled_items INTEGER DEFAULT 0,
    confirmed_items INTEGER DEFAULT 0,
    corrected_items INTEGER DEFAULT 0,
    skipped_items INTEGER DEFAULT 0,
    teacher_id VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Individual labeling items (image + OCR + ground truth)
CREATE TABLE ocr_labeling_items (
    id VARCHAR(36) PRIMARY KEY,
    session_id VARCHAR(36) REFERENCES ocr_labeling_sessions(id),
    image_path TEXT NOT NULL,              -- MinIO path or local path
    image_hash VARCHAR(64),                -- SHA256 for deduplication
    ocr_text TEXT,                         -- Text recognized by the LLM
    ocr_confidence FLOAT,                  -- Confidence (0-1)
    ocr_model VARCHAR(100),
    ground_truth TEXT,                     -- Corrected/confirmed text
    status VARCHAR(20) DEFAULT 'pending',  -- pending/confirmed/corrected/skipped
    labeled_by VARCHAR(100),
    labeled_at TIMESTAMP,
    label_time_seconds INTEGER,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Exported training samples
CREATE TABLE ocr_training_samples (
    id VARCHAR(36) PRIMARY KEY,
    item_id VARCHAR(36) REFERENCES ocr_labeling_items(id),
    image_path TEXT NOT NULL,
    ground_truth TEXT NOT NULL,
    export_format VARCHAR(50) NOT NULL,  -- 'generic', 'trocr', 'llama_vision'
    exported_at TIMESTAMP DEFAULT NOW(),
    training_batch VARCHAR(100),
    used_in_training BOOLEAN DEFAULT FALSE
);
```

## API Reference

Base URL: `http://macmini:8086/api/v1/ocr-label`

### Sessions

#### POST /sessions
Create a new labeling session.

**Request:**
```json
{
  "name": "Klausur Deutsch 12a Q1",
  "source_type": "klausur",
  "description": "Gedichtanalyse Expressionismus",
  "ocr_model": "llama3.2-vision:11b"
}
```

**Response:**
```json
{
  "id": "abc-123-def",
  "name": "Klausur Deutsch 12a Q1",
  "source_type": "klausur",
  "total_items": 0,
  "labeled_items": 0,
  "created_at": "2026-01-21T10:30:00Z"
}
```

#### GET /sessions
List sessions.

**Query parameters:**
- `limit` (int, default: 50) - maximum number of sessions

#### GET /sessions/{session_id}
Fetch a single session.

### Upload

#### POST /sessions/{session_id}/upload
Upload images to a session.

**Request:** Multipart form data
- `files` (File[]) - PNG/JPG/PDF files
- `run_ocr` (bool, default: true) - run OCR immediately
- `metadata` (JSON string) - optional metadata

**Response:**
```json
{
  "session_id": "abc-123-def",
  "uploaded_count": 5,
  "items": [
    {
      "id": "item-1",
      "filename": "scan_001.png",
      "image_path": "ocr-labeling/abc-123/item-1.png",
      "ocr_text": "Die Lösung der Aufgabe...",
      "ocr_confidence": 0.87,
      "status": "pending"
    }
  ]
}
```

### Labeling Queue

#### GET /queue
Fetch the next items to be labeled.

**Query parameters:**
- `session_id` (str, optional) - filter by session
- `status` (str, default: "pending") - status filter
- `limit` (int, default: 10) - maximum number of items

**Response:**
```json
[
  {
    "id": "item-456",
    "session_id": "abc-123",
    "session_name": "Klausur Deutsch",
    "image_path": "/app/ocr-labeling/abc-123/item-456.png",
    "image_url": "/api/v1/ocr-label/images/abc-123/item-456.png",
    "ocr_text": "Erkannter Text...",
    "ocr_confidence": 0.87,
    "ground_truth": null,
    "status": "pending",
    "metadata": {"page": 1}
  }
]
```

### Labeling Actions

#### POST /confirm
Confirm the OCR text as correct.

**Request:**
```json
{
  "item_id": "item-456",
  "label_time_seconds": 5
}
```

**Effect:** `ground_truth = ocr_text`, `status = 'confirmed'`

#### POST /correct
Correct the ground truth.

**Request:**
```json
{
  "item_id": "item-456",
  "ground_truth": "Korrigierter Text hier",
  "label_time_seconds": 15
}
```

**Effect:** `ground_truth = <input>`, `status = 'corrected'`

#### POST /skip
Skip an item (unusable).

**Request:**
```json
{
  "item_id": "item-456"
}
```

**Effect:** `status = 'skipped'` (the item is not exported)
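
The three effects above amount to a small state machine over a labeling item. The following sketch models them in memory; the `LabelingItem` dataclass and the helper names are hypothetical, and the service itself presumably applies these updates in PostgreSQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelingItem:
    id: str
    ocr_text: str
    ground_truth: Optional[str] = None
    status: str = "pending"


def confirm(item: LabelingItem) -> LabelingItem:
    """POST /confirm: accept the OCR text as ground truth."""
    item.ground_truth = item.ocr_text
    item.status = "confirmed"
    return item


def correct(item: LabelingItem, ground_truth: str) -> LabelingItem:
    """POST /correct: store the text supplied by the reviewer."""
    item.ground_truth = ground_truth
    item.status = "corrected"
    return item


def skip(item: LabelingItem) -> LabelingItem:
    """POST /skip: mark the item as unusable."""
    item.status = "skipped"
    return item


def is_exportable(item: LabelingItem) -> bool:
    """Only confirmed or corrected items carry a usable ground truth."""
    return item.status in ("confirmed", "corrected")
```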

### Statistics

#### GET /stats
Fetch labeling statistics.

**Query parameters:**
- `session_id` (str, optional) - for session-specific statistics

**Response:**
```json
{
  "total_items": 100,
  "labeled_items": 75,
  "confirmed_items": 60,
  "corrected_items": 15,
  "pending_items": 25,
  "accuracy_rate": 0.80,
  "avg_label_time_seconds": 8.5
}
```
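
The numbers in the sample response are consistent with `accuracy_rate = confirmed_items / labeled_items` (60 / 75 = 0.80) and `labeled_items = confirmed_items + corrected_items`. The sketch below derives the counters from a list of item statuses under that assumption; the formula is inferred from the example, not confirmed by the service code, and `labeling_stats` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def labeling_stats(statuses: List[str]) -> Dict[str, float]:
    """Aggregate item statuses into the counters shown by GET /stats."""
    counts = Counter(statuses)
    confirmed = counts["confirmed"]
    corrected = counts["corrected"]
    labeled = confirmed + corrected
    return {
        "total_items": len(statuses),
        "labeled_items": labeled,
        "confirmed_items": confirmed,
        "corrected_items": corrected,
        "pending_items": counts["pending"],
        # Share of labeled items where the OCR text was accepted unchanged.
        "accuracy_rate": confirmed / labeled if labeled else 0.0,
    }
```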

### Training Export

#### POST /export
Export training data.

**Request:**
```json
{
  "export_format": "trocr",
  "session_id": "abc-123",
  "batch_id": "batch_20260121"
}
```

**Export formats:**

| Format | Description | Output |
|--------|-------------|--------|
| `generic` | General-purpose JSONL | `{"id", "image_path", "ground_truth", ...}` |
| `trocr` | Microsoft TrOCR | `{"file_name", "text", "id"}` |
| `llama_vision` | Llama 3.2 Vision | OpenAI-style messages with image_url |

**Response:**
```json
{
  "export_format": "trocr",
  "batch_id": "batch_20260121",
  "exported_count": 75,
  "export_path": "/app/ocr-exports/trocr/batch_20260121",
  "manifest_path": "/app/ocr-exports/trocr/batch_20260121/manifest.json",
  "samples": [...]
}
```

#### GET /exports
List available exports.

**Query parameters:**
- `export_format` (str, optional) - filter by format

## Export Formats in Detail

### TrOCR Format

```
batch_20260121/
├── manifest.json
├── train.jsonl
└── images/
    ├── item-1.png
    └── item-2.png
```

**train.jsonl:**
```jsonl
{"file_name": "images/item-1.png", "text": "Ground truth text", "id": "item-1"}
{"file_name": "images/item-2.png", "text": "Another text", "id": "item-2"}
```
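
A minimal sketch of how labeled items could be turned into `train.jsonl` lines in this shape. The input dictionaries and the function name `to_trocr_lines` are assumptions for illustration; the actual logic lives in `training_export_service.py` and may differ.

```python
import json
from typing import Dict, List


def to_trocr_lines(items: List[Dict]) -> List[str]:
    """Build one JSON line per confirmed or corrected item."""
    lines = []
    for item in items:
        if item["status"] not in ("confirmed", "corrected"):
            continue  # pending and skipped items have no usable ground truth
        record = {
            "file_name": f"images/{item['id']}.png",
            "text": item["ground_truth"],
            "id": item["id"],
        }
        # ensure_ascii=False keeps umlauts readable in the output file
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines
```

Joining the returned lines with `"\n"` and writing them with UTF-8 encoding yields a file in the layout shown above.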

### Llama Vision Format

One record, pretty-printed here for readability (in a JSONL file each record occupies a single line):

```jsonl
{
  "id": "item-1",
  "messages": [
    {"role": "system", "content": "Du bist ein OCR-Experte für deutsche Handschrift..."},
    {"role": "user", "content": [
      {"type": "image_url", "image_url": {"url": "images/item-1.png"}},
      {"type": "text", "text": "Lies den handgeschriebenen Text in diesem Bild."}
    ]},
    {"role": "assistant", "content": "Ground truth text"}
  ]
}
```

### Generic Format

```jsonl
{
  "id": "item-1",
  "image_path": "images/item-1.png",
  "ground_truth": "Ground truth text",
  "ocr_text": "OCR recognized text",
  "ocr_confidence": 0.87,
  "metadata": {"page": 1, "session": "Deutsch 12a"}
}
```

## Frontend Integration

The OCR-Labeling UI is available at `/admin/ocr-labeling`.

### Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Enter` | Confirm (OCR correct) |
| `Tab` | Jump to the correction field |
| `Escape` | Skip |
| `←` / `→` | Navigation (Prev/Next) |

### Workflow

1. **Create a session** - choose name, type and OCR model
2. **Upload images** - drag & drop or file browser
3. **Label** - compare image and OCR text
   - Correct → confirm (Enter)
   - Wrong → correct and save
   - Unusable → skip
4. **Export** - choose a format (TrOCR, Llama Vision, Generic)
5. **Start training** - use the export folder for fine-tuning

## Environment Variables

```bash
# PostgreSQL
DATABASE_URL=postgres://user:pass@postgres:5432/breakpilot_db

# MinIO (S3-compatible)
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=breakpilot
MINIO_SECRET_KEY=breakpilot123
MINIO_BUCKET=breakpilot-rag
MINIO_SECURE=false

# Ollama (Vision LLM)
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_VISION_MODEL=llama3.2-vision:11b
OLLAMA_CORRECTION_MODEL=qwen2.5:14b

# Export
OCR_EXPORT_PATH=/app/ocr-exports
OCR_STORAGE_PATH=/app/ocr-labeling
```

## Security & Data Protection

- **100% local processing** - all data stays on the Mac Mini
- **No cloud uploads** - Ollama runs fully offline
- **GDPR-compliant** - no student data leaves the school network
- **Deduplication** - a SHA256 hash prevents duplicate images

## Files

| File | Description |
|------|-------------|
| `klausur-service/backend/ocr_labeling_api.py` | FastAPI router with OCR model dispatcher |
| `klausur-service/backend/training_export_service.py` | Export service for TrOCR/Llama |
| `klausur-service/backend/metrics_db.py` | PostgreSQL CRUD functions |
| `klausur-service/backend/minio_storage.py` | MinIO OCR image storage |
| `klausur-service/backend/hybrid_vocab_extractor.py` | PaddleOCR integration |
| `klausur-service/backend/services/donut_ocr_service.py` | Donut OCR service (NEW) |
| `klausur-service/backend/services/trocr_service.py` | TrOCR service (NEW) |
| `website/app/admin/ocr-labeling/page.tsx` | Frontend UI with model selection |
| `website/app/admin/ocr-labeling/types.ts` | TypeScript interfaces incl. the OCRModel type |

## Tests

```bash
# Run the backend tests
cd klausur-service/backend
pytest tests/test_ocr_labeling.py -v

# With coverage
pytest tests/test_ocr_labeling.py --cov=. --cov-report=html
```
472
docs-src/services/klausur-service/RAG-Admin-Spec.md
Normal file
@@ -0,0 +1,472 @@
# RAG & Data Management Specification

## Overview

Admin frontend for managing training data and RAG systems in BreakPilot.

**Location**: `/admin/docs` → tab "Daten & RAG"
**Backend**: `klausur-service` (Port 8086)
**Storage**: MinIO (persistent Docker volume `minio_data`)
**Vector DB**: Qdrant (Port 6333)

## Data Model

### Two data types with different rules

| Type | Source | Training allowed | Isolation | Collection |
|------|--------|------------------|-----------|------------|
| **State data (Landes-Daten)** | NiBiS, other federal states | ✅ Yes | Per federal state | `bp_{bundesland}_{usecase}` |
| **Teacher data (Lehrer-Daten)** | Teacher upload (BYOEH) | ❌ No | Per tenant (school/teacher) | `bp_eh` (encrypted) |

### Federal state codes (ISO 3166-2:DE)

```
NI = Niedersachsen          BY = Bayern                   BW = Baden-Württemberg
NW = Nordrhein-Westfalen    HE = Hessen                   SN = Sachsen
BE = Berlin                 HH = Hamburg                  SH = Schleswig-Holstein
BB = Brandenburg            MV = Mecklenburg-Vorpommern   ST = Sachsen-Anhalt
TH = Thüringen              RP = Rheinland-Pfalz          SL = Saarland
HB = Bremen
```

### Use cases (RAG collections)

| Use case | Collection pattern | Description |
|----------|--------------------|-------------|
| Exam grading (Klausurkorrektur) | `bp_{bl}_klausur` | Erwartungshorizonte for the Abitur |
| Report card generator (Zeugnisgenerator) | `bp_{bl}_zeugnis` | Text modules for report cards |
| Curriculum (Lehrplan) | `bp_{bl}_lehrplan` | Core curricula, framework guidelines |

Example: `bp_ni_klausur` = Niedersachsen, exam grading
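
The naming pattern can be captured in a small helper. This is a sketch only: `collection_name` and its validation sets are hypothetical, built from the state codes and use cases listed in this section.

```python
BUNDESLAND_CODES = {
    "NI", "BY", "BW", "NW", "HE", "SN", "BE", "HH",
    "SH", "BB", "MV", "ST", "TH", "RP", "SL", "HB",
}
USE_CASES = {"klausur", "zeugnis", "lehrplan"}


def collection_name(bundesland: str, use_case: str) -> str:
    """Build a Qdrant collection name following bp_{bundesland}_{usecase}."""
    code = bundesland.upper()
    if code not in BUNDESLAND_CODES:
        raise ValueError(f"unknown federal state code: {bundesland!r}")
    if use_case not in USE_CASES:
        raise ValueError(f"unknown use case: {use_case!r}")
    return f"bp_{code.lower()}_{use_case}"
```

Validating both parts up front prevents typos from silently creating a new, empty collection.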

## MinIO Bucket Structure

```
breakpilot-rag/
├── landes-daten/
│   ├── ni/                    # Niedersachsen
│   │   ├── klausur/
│   │   │   ├── 2016/
│   │   │   │   ├── manifest.json
│   │   │   │   └── *.pdf
│   │   │   ├── 2017/
│   │   │   ├── ...
│   │   │   └── 2025/
│   │   └── zeugnis/
│   ├── by/                    # Bayern
│   └── .../
│
└── lehrer-daten/              # BYOEH - encrypted
    └── {tenant_id}/
        └── {lehrer_id}/
            └── *.pdf.enc
```

## Qdrant Schema

### State data collection (e.g. `bp_ni_klausur`)

```json
{
  "id": "uuid-v5-from-string",
  "vector": [384 dimensions],
  "payload": {
    "original_id": "nibis_2024_deutsch_ea_1_abc123_chunk_0",
    "doc_id": "nibis_2024_deutsch_ea_1_abc123",
    "chunk_index": 0,
    "text": "Der Erwartungshorizont...",
    "year": 2024,
    "subject": "Deutsch",
    "niveau": "eA",
    "task_number": 1,
    "doc_type": "EWH",
    "bundesland": "NI",
    "source": "nibis",
    "training_allowed": true,
    "minio_path": "landes-daten/ni/klausur/2024/2024_Deutsch_eA_I_EWH.pdf"
  }
}
```
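
The placeholder `"uuid-v5-from-string"` suggests that the point ID is derived deterministically from `original_id`. A minimal sketch of that idea with Python's standard `uuid` module follows; the choice of `uuid.NAMESPACE_URL` and the name `point_id` are assumptions, since the namespace used by the service is not documented here.

```python
import uuid


def point_id(original_id: str) -> str:
    """Derive a stable UUIDv5 point ID from the chunk's original string ID."""
    # NAMESPACE_URL is an arbitrary but fixed namespace chosen for this sketch.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, original_id))
```

Because the same `original_id` always maps to the same UUID, re-ingesting a document overwrites its existing points instead of creating duplicates.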

### Teacher data collection (`bp_eh`)

```json
{
  "id": "uuid",
  "vector": [384 dimensions],
  "payload": {
    "tenant_id": "schule_123",
    "eh_id": "eh_abc",
    "chunk_index": 0,
    "subject": "deutsch",
    "encrypted_content": "base64...",
    "training_allowed": false
  }
}
```

## Frontend Components

### 1. Collections overview (`/admin/rag/collections`)

```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Daten & RAG │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Sammlungen [+ Neu] │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ 📚 Niedersachsen - Klausurkorrektur │ │
|
||||
│ │ bp_ni_klausur | 630 Docs | 4.521 Chunks | 2016-2025 │ │
|
||||
│ │ [Suchen] [Indexieren] [Details] │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ 📚 Niedersachsen - Zeugnisgenerator │ │
|
||||
│ │ bp_ni_zeugnis | 0 Docs | Leer │ │
|
||||
│ │ [Suchen] [Indexieren] [Details] │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```

### 2. Upload area (`/admin/rag/upload`)

```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Dokumente hochladen │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Ziel-Sammlung: [Niedersachsen - Klausurkorrektur ▼] │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ │ │
|
||||
│ │ 📁 ZIP-Datei oder Ordner hierher ziehen │ │
|
||||
│ │ │ │
|
||||
│ │ oder [Dateien auswählen] │ │
|
||||
│ │ │ │
|
||||
│ │ Unterstützt: .zip, .pdf, Ordner │ │
|
||||
│ │ │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Upload-Queue: │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ ✅ 2018.zip - 45 PDFs erkannt │ │
|
||||
│ │ ⏳ 2019.zip - Wird analysiert... │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ [Hochladen & Indexieren] │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```

### 3. Ingestion status (`/admin/rag/ingestion`)

```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Ingestion Status │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Aktueller Job: Niedersachsen Klausur 2024 │
|
||||
│ ████████████████████░░░░░░░░░░ 65% (412/630 Docs) │
|
||||
│ Chunks: 2.891 | Fehler: 3 | ETA: 4:32 │
|
||||
│ [Pausieren] [Abbrechen] │
|
||||
│ │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Letzte Jobs: │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ ✅ 09.01.2025 15:30 - NI Klausur 2024 - 128 Chunks │ │
|
||||
│ │ ✅ 09.01.2025 14:00 - NI Klausur 2017 - 890 Chunks │ │
|
||||
│ │ ❌ 08.01.2025 10:15 - BY Klausur - Fehler: Timeout │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```

### 4. Search & quality test (`/admin/rag/search`)

```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ RAG Suche & Qualitätstest │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Sammlung: [Niedersachsen - Klausurkorrektur ▼] │
|
||||
│ │
|
||||
│ Query: [Analyse eines Gedichts von Rilke ] │
|
||||
│ │
|
||||
│ Filter: │
|
||||
│ Jahr: [Alle ▼] Fach: [Deutsch ▼] Niveau: [eA ▼] │
|
||||
│ │
|
||||
│ [🔍 Suchen] │
|
||||
│ │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Ergebnisse (3): Latenz: 45ms │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ #1 | Score: 0.847 | 2024 Deutsch eA Aufgabe 2 │ │
|
||||
│ │ │ │
|
||||
│ │ "...Die Analyse des Rilke-Gedichts soll folgende │ │
|
||||
│ │ Aspekte berücksichtigen: Aufbau, Bildsprache..." │ │
|
||||
│ │ │ │
|
||||
│ │ Relevanz: [⭐⭐⭐⭐⭐] [⭐⭐⭐⭐] [⭐⭐⭐] [⭐⭐] [⭐] │ │
|
||||
│ │ Notizen: [Optional: Warum relevant/nicht relevant? ] │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```

### 5. Metrics dashboard (`/admin/rag/metrics`)

```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ RAG Qualitätsmetriken │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Zeitraum: [Letzte 7 Tage ▼] Sammlung: [Alle ▼] │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Precision@5 │ │ Recall@10 │ │ MRR │ │
|
||||
│ │ 0.78 │ │ 0.85 │ │ 0.72 │ │
|
||||
│ │ ↑ +5% │ │ ↑ +3% │ │ ↓ -2% │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Avg Latency │ │ Bewertungen │ │ Fehlerrate │ │
|
||||
│ │ 52ms │ │ 127 │ │ 0.3% │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
│ │
|
||||
│ ───────────────────────────────────────────────────────────── │
|
||||
│ │
|
||||
│ Score-Verteilung: │
|
||||
│ 0.9+ ████████████████ 23% │
|
||||
│ 0.7+ ████████████████████████████ 41% │
|
||||
│ 0.5+ ████████████████████ 28% │
|
||||
│ <0.5 ██████ 8% │
|
||||
│ │
|
||||
│ [Export CSV] [Detailbericht] │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
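
The dashboard tiles name three standard retrieval metrics: Precision@5, Recall@10, and MRR. The following is a minimal sketch of how they are conventionally computed. It assumes each query yields a ranked list of result IDs and a set of IDs judged relevant (for example, results whose star rating reaches some threshold). How the service actually derives relevance from the ratings is not stated here, and the function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none)."""
    if not queries:
        return 0.0
    total = 0.0
    for ranked_ids, relevant in queries:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Precision divides by `k` even when fewer than `k` results come back, which penalizes short result lists. Dividing by the number of returned results is a common alternative.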
|
||||
|
||||
## API Endpoints

### Collections API

```
GET    /api/v1/admin/rag/collections
POST   /api/v1/admin/rag/collections
GET    /api/v1/admin/rag/collections/{id}
DELETE /api/v1/admin/rag/collections/{id}
GET    /api/v1/admin/rag/collections/{id}/stats
```

### Upload API

```
POST /api/v1/admin/rag/upload
Content-Type: multipart/form-data
- file: ZIP or PDF
- collection_id: string
- metadata: JSON (optional)

POST /api/v1/admin/rag/upload/folder
- For folder uploads (WebKitDirectory)
```

### Ingestion API

```
POST /api/v1/admin/rag/ingest
- collection_id: string
- filters: {year?, subject?, doc_type?}

GET  /api/v1/admin/rag/ingest/status
GET  /api/v1/admin/rag/ingest/history
POST /api/v1/admin/rag/ingest/cancel
```

### Search API

```
POST /api/v1/admin/rag/search
- query: string
- collection_id: string
- filters: {year?, subject?, niveau?}
- limit: int

POST /api/v1/admin/rag/search/feedback
- result_id: string
- rating: 1-5
- notes: string (optional)
```
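
To make the Search API parameters concrete, here is a small sketch that builds and validates the two request bodies on the client side. It assumes the parameters are sent as a JSON object, which the listing does not state explicitly. The helper names and the default `limit` of 5 are hypothetical. The field names and the 1-5 rating range come from the listing above.

```python
from typing import Any, Dict, Optional

# Filter keys listed for POST /api/v1/admin/rag/search
ALLOWED_FILTERS = {"year", "subject", "niveau"}


def build_search_request(
    query: str,
    collection_id: str,
    filters: Optional[Dict[str, Any]] = None,
    limit: int = 5,  # illustrative default, not taken from the spec
) -> Dict[str, Any]:
    """Return a body for the search endpoint, rejecting obviously bad input."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    filters = dict(filters or {})
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter keys: {sorted(unknown)}")
    return {
        "query": query,
        "collection_id": collection_id,
        "filters": filters,
        "limit": limit,
    }


def build_feedback_request(
    result_id: str, rating: int, notes: Optional[str] = None
) -> Dict[str, Any]:
    """Return a body for the feedback endpoint; rating must be 1-5."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    body: Dict[str, Any] = {"result_id": result_id, "rating": rating}
    if notes:
        body["notes"] = notes  # optional field, omitted when empty
    return body
```

Validating on the client keeps malformed feedback out of the metrics. The server would still need its own checks.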
|
||||
|
||||
### Metrics API

```
GET /api/v1/admin/rag/metrics
- collection_id?: string
- from_date?: date
- to_date?: date

GET /api/v1/admin/rag/metrics/export
- format: csv|json
```
|
||||
|
||||
## Embedding Configuration

```python
# Default: local embeddings (no API key required)
EMBEDDING_BACKEND = "local"
LOCAL_EMBEDDING_MODEL = "all-MiniLM-L6-v2"
VECTOR_DIMENSIONS = 384

# Optional: OpenAI (for production); uncomment to replace the defaults above
# EMBEDDING_BACKEND = "openai"
# EMBEDDING_MODEL = "text-embedding-3-small"
# VECTOR_DIMENSIONS = 1536
```
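
The two configurations are alternatives, so only one can be active at a time. A minimal sketch of selecting between them from environment variables could look like this. The `EmbeddingConfig` class and `resolve_embedding_config` function are hypothetical names. The variable names, model names, and dimensions are the ones shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Vector sizes for the two models named in the configuration
KNOWN_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
}


@dataclass(frozen=True)
class EmbeddingConfig:
    backend: str
    model: str
    dimensions: int


def resolve_embedding_config(env: Optional[Mapping[str, str]] = None) -> EmbeddingConfig:
    """Pick backend, model, and vector size from environment-style settings."""
    env = os.environ if env is None else env
    backend = env.get("EMBEDDING_BACKEND", "local")
    if backend == "local":
        model = env.get("LOCAL_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
    elif backend == "openai":
        model = env.get("EMBEDDING_MODEL", "text-embedding-3-small")
    else:
        raise ValueError(f"unsupported EMBEDDING_BACKEND: {backend!r}")
    if model not in KNOWN_DIMENSIONS:
        # Refuse to guess the vector size of an unknown model
        raise ValueError(f"unknown vector dimensions for model {model!r}")
    return EmbeddingConfig(backend, model, KNOWN_DIMENSIONS[model])
```

Deriving the dimension from the model name avoids a mismatch between the two settings. This matters because a Qdrant collection created for 384-dimensional vectors cannot store 1536-dimensional ones, so switching backends implies re-indexing into a new collection.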
|
||||
|
||||
## Data Persistence

### Docker Volumes (IMPORTANT: do not delete!)

```yaml
volumes:
  minio_data:      # All uploaded documents
  qdrant_data:     # All vectors and embeddings
  postgres_data:   # Metadata, ratings, history
```

### Backup Strategy

```bash
# MinIO backup
docker exec breakpilot-pwa-minio mc mirror /data /backup

# Qdrant backup
curl -X POST http://localhost:6333/collections/bp_ni_klausur/snapshots

# Postgres backup (already implemented)
# Runs automatically every day at 02:00
```
|
||||
|
||||
## Implementation Order

1. ✅ Backend: basic ingestion (nibis_ingestion.py)
2. ✅ Backend: local embeddings (sentence-transformers)
3. ✅ Backend: MinIO integration (minio_storage.py)
4. ✅ Backend: Collections API (admin_api.py)
5. ✅ Backend: Upload API with ZIP support
6. ✅ Backend: Metrics API with PostgreSQL (metrics_db.py)
7. ✅ Frontend: collections overview
8. ✅ Frontend: upload area (drag & drop)
9. ✅ Frontend: ingestion status
10. ✅ Frontend: search & quality test (with star ratings)
11. ✅ Frontend: metrics dashboard

## Technology Stack

- **Frontend**: Next.js 15 (`/website/app/admin/rag/page.tsx`)
- **Backend**: FastAPI (`klausur-service/backend/`)
- **Vector DB**: Qdrant v1.7.4 (384-dimensional vectors)
- **Object Storage**: MinIO (S3-compatible)
- **Embeddings**: sentence-transformers `all-MiniLM-L6-v2`
- **Metrics DB**: PostgreSQL 16
|
||||
|
||||
## Entwickler-Dokumentation
|
||||
|
||||
### Projektstruktur
|
||||
|
||||
```
|
||||
klausur-service/
|
||||
├── backend/
|
||||
│ ├── main.py # FastAPI App + BYOEH Endpoints
|
||||
│ ├── admin_api.py # RAG Admin API (Upload, Search, Metrics)
|
||||
│ ├── nibis_ingestion.py # NiBiS Dokument-Ingestion Pipeline
|
||||
│ ├── eh_pipeline.py # Chunking, Embeddings, Encryption
|
||||
│ ├── qdrant_service.py # Qdrant Client + Search
|
||||
│ ├── minio_storage.py # MinIO S3 Storage
|
||||
│ ├── metrics_db.py # PostgreSQL Metrics
|
||||
│ ├── requirements.txt # Python Dependencies
|
||||
│ └── tests/
|
||||
│ └── test_rag_admin.py
|
||||
└── docs/
|
||||
└── RAG-Admin-Spec.md # Diese Datei
|
||||
```
|
||||
|
||||
### Schnellstart für Entwickler
|
||||
|
||||
```bash
|
||||
# 1. Services starten
|
||||
cd /path/to/breakpilot-pwa
|
||||
docker-compose up -d qdrant minio postgres
|
||||
|
||||
# 2. Dependencies installieren
|
||||
cd klausur-service/backend
|
||||
pip install -r requirements.txt
|
||||
|
||||
# 3. Service starten
|
||||
python -m uvicorn main:app --port 8086 --reload
|
||||
|
||||
# 4. RAG-Services initialisieren (erstellt Bucket + Tabellen)
|
||||
curl -X POST http://localhost:8086/api/v1/admin/rag/init
|
||||
```
|
||||
|
||||
### API-Referenz (Implementiert)
|
||||
|
||||
#### NiBiS Ingestion
|
||||
```
|
||||
GET /api/v1/admin/nibis/discover # Dokumente finden
|
||||
POST /api/v1/admin/nibis/ingest # Indexierung starten
|
||||
GET /api/v1/admin/nibis/status # Status abfragen
|
||||
GET /api/v1/admin/nibis/stats # Statistiken
|
||||
POST /api/v1/admin/nibis/search # Semantische Suche
|
||||
GET /api/v1/admin/nibis/collections # Qdrant Collections
|
||||
```
|
||||
|
||||
#### RAG Upload & Storage
|
||||
```
|
||||
POST /api/v1/admin/rag/upload # ZIP/PDF hochladen
|
||||
GET /api/v1/admin/rag/upload/history # Upload-Verlauf
|
||||
GET /api/v1/admin/rag/storage/stats # MinIO Statistiken
|
||||
```
|
||||
|
||||
#### Metrics & Feedback
|
||||
```
|
||||
GET /api/v1/admin/rag/metrics # Qualitätsmetriken
|
||||
POST /api/v1/admin/rag/search/feedback # Bewertung abgeben
|
||||
POST /api/v1/admin/rag/init # Services initialisieren
|
||||
```
|
||||
|
||||
### Umgebungsvariablen
|
||||
|
||||
```bash
|
||||
# Qdrant
|
||||
QDRANT_URL=http://localhost:6333
|
||||
|
||||
# MinIO
|
||||
MINIO_ENDPOINT=localhost:9000
|
||||
MINIO_ACCESS_KEY=breakpilot
|
||||
MINIO_SECRET_KEY=breakpilot123
|
||||
MINIO_BUCKET=breakpilot-rag
|
||||
|
||||
# PostgreSQL
|
||||
DATABASE_URL=postgres://breakpilot:breakpilot123@localhost:5432/breakpilot_db
|
||||
|
||||
# Embeddings
|
||||
EMBEDDING_BACKEND=local
|
||||
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
|
||||
```
|
||||
|
||||
### Current Indexing Statistics

- **Documents**: 579 Erwartungshorizonte (expectation horizons) from NiBiS
- **Chunks**: 7,352
- **Years**: 2016, 2017, 2024, 2025
- **Subjects**: German, English, Mathematics, Physics, Chemistry, Biology, History, Politics and Economics, Geography, Physical Education, Art, Music, Latin, Computer Science, Protestant Religion, Catholic Religion, Values and Norms, etc.
- **Collection**: `bp_nibis_eh`
- **Vector dimensions**: 384
|
||||
@@ -0,0 +1,409 @@
|
||||
# Visual Worksheet Editor - Architecture Documentation
|
||||
|
||||
**Version:** 1.0
|
||||
**Status:** Implementiert
|
||||
|
||||
## 1. Overview

The Visual Worksheet Editor is a canvas-based editor for creating and editing worksheets. It lets teachers faithfully reconstruct scanned worksheets or design new worksheets visually.

### 1.1 Main Features

- **Canvas-based editing** with Fabric.js
- **Free positioning** of text, images, and shapes
- **Typography controls** (fonts, sizes, styles)
- **Images & graphics**: upload and insert
- **AI-generated images** via Ollama/Stable Diffusion
- **PDF/image export** for print and digital use
- **Multi-page documents** with page navigation
|
||||
|
||||
### 1.2 Technologie-Stack
|
||||
|
||||
| Komponente | Technologie | Lizenz |
|
||||
|------------|-------------|--------|
|
||||
| Canvas-Bibliothek | Fabric.js 6.x | MIT |
|
||||
| PDF-Export | pdf-lib 1.17.x | MIT |
|
||||
| Frontend | Next.js / React | MIT |
|
||||
| Backend API | FastAPI | MIT |
|
||||
| KI-Bilder | Ollama + Stable Diffusion | Apache 2.0 / MIT |
|
||||
|
||||
## 2. Architektur
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Frontend (studio-v2 / Next.js) │
|
||||
│ /studio-v2/app/worksheet-editor/page.tsx │
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌────────────────────────────┐ ┌────────────────┐ │
|
||||
│ │ Toolbar │ │ Fabric.js Canvas │ │ Properties │ │
|
||||
│ │ (Links) │ │ (Mitte - 60%) │ │ Panel │ │
|
||||
│ │ │ │ │ │ (Rechts) │ │
|
||||
│ │ - Select │ │ ┌──────────────────────┐ │ │ │ │
|
||||
│ │ - Text │ │ │ │ │ │ - Schriftart │ │
|
||||
│ │ - Formen │ │ │ A4 Arbeitsfläche │ │ │ - Größe │ │
|
||||
│ │ - Bilder │ │ │ mit Grid │ │ │ - Farbe │ │
|
||||
│ │ - KI-Bild │ │ │ │ │ │ - Position │ │
|
||||
│ │ - Tabelle │ │ └──────────────────────┘ │ │ - Ebene │ │
|
||||
│ └─────────────┘ └────────────────────────────┘ └────────────────┘ │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Seiten-Navigation | Zoom | Grid | Export PDF │ │
|
||||
│ └────────────────────────────────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ klausur-service (FastAPI - Port 8086) │
|
||||
│ POST /api/v1/worksheet/ai-image → Bild via Ollama generieren │
|
||||
│ POST /api/v1/worksheet/save → Worksheet speichern │
|
||||
│ GET /api/v1/worksheet/{id} → Worksheet laden │
|
||||
│ POST /api/v1/worksheet/export-pdf → PDF generieren │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Ollama (Port 11434) │
|
||||
│ Model: stable-diffusion oder kompatibles Text-to-Image Modell │
|
||||
│ Text-to-Image für KI-generierte Grafiken │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 3. Dateistruktur
|
||||
|
||||
### 3.1 Frontend (studio-v2)
|
||||
|
||||
```
|
||||
/studio-v2/
|
||||
├── app/
|
||||
│ └── worksheet-editor/
|
||||
│ ├── page.tsx # Haupt-Editor-Seite
|
||||
│ └── types.ts # TypeScript Interfaces
|
||||
│
|
||||
├── components/
|
||||
│ └── worksheet-editor/
|
||||
│ ├── index.ts # Exports
|
||||
│ ├── FabricCanvas.tsx # Fabric.js Canvas Wrapper
|
||||
│ ├── EditorToolbar.tsx # Werkzeugleiste (links)
|
||||
│ ├── PropertiesPanel.tsx # Eigenschaften-Panel (rechts)
|
||||
│ ├── AIImageGenerator.tsx # KI-Bild Generator Modal
|
||||
│ ├── CanvasControls.tsx # Zoom, Grid, Seiten
|
||||
│ ├── ExportPanel.tsx # PDF/Bild Export
|
||||
│ └── PageNavigator.tsx # Mehrseitige Dokumente
|
||||
│
|
||||
├── lib/
|
||||
│ └── worksheet-editor/
|
||||
│ ├── index.ts # Exports
|
||||
│ └── WorksheetContext.tsx # State Management
|
||||
```
|
||||
|
||||
### 3.2 Backend (klausur-service)
|
||||
|
||||
```
|
||||
/klausur-service/backend/
|
||||
├── worksheet_editor_api.py # API Endpoints
|
||||
└── main.py # Router-Registrierung
|
||||
```
|
||||
|
||||
## 4. API Endpoints
|
||||
|
||||
### 4.1 KI-Bild generieren
|
||||
|
||||
```http
|
||||
POST /api/v1/worksheet/ai-image
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"prompt": "Ein freundlicher Cartoon-Hund der ein Buch liest",
|
||||
"style": "cartoon",
|
||||
"width": 512,
|
||||
"height": 512
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"image_base64": "data:image/png;base64,...",
|
||||
"prompt_used": "...",
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
**Styles:**
|
||||
- `realistic` - Fotorealistisch
|
||||
- `cartoon` - Cartoon/Comic
|
||||
- `sketch` - Handgezeichnete Skizze
|
||||
- `clipart` - Einfache Clipart-Grafiken
|
||||
- `educational` - Bildungs-Illustrationen
|
||||
|
||||
### 4.2 Worksheet speichern
|
||||
|
||||
```http
|
||||
POST /api/v1/worksheet/save
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"id": "optional-existing-id",
|
||||
"title": "Englisch Vokabeln Unit 3",
|
||||
"pages": [
|
||||
{ "id": "page_1", "index": 0, "canvasJSON": "{...}" }
|
||||
],
|
||||
"pageFormat": {
|
||||
"width": 210,
|
||||
"height": 297,
|
||||
"orientation": "portrait"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 Worksheet laden
|
||||
|
||||
```http
|
||||
GET /api/v1/worksheet/{id}
|
||||
```
|
||||
|
||||
### 4.4 PDF exportieren
|
||||
|
||||
```http
|
||||
POST /api/v1/worksheet/{id}/export-pdf
|
||||
```
|
||||
|
||||
**Response:** PDF-Datei als Download
|
||||
|
||||
### 4.5 Worksheets auflisten
|
||||
|
||||
```http
|
||||
GET /api/v1/worksheet/list/all
|
||||
```
|
||||
|
||||
## 5. Components

### 5.1 FabricCanvas

The core component for the canvas area:

- **A4 format**: 794 x 1123 pixels (96 DPI)
- **Grid overlay**: optional grid with snap function
- **Zoom/pan**: mouse wheel and controls
- **Selection**: single and multiple selection
- **Keyboard shortcuts**: Del, Ctrl+C/V/Z/D
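
The A4 canvas size of 794 x 1123 pixels follows from converting 210 x 297 mm at 96 DPI. The short sketch below reproduces that arithmetic. The function names are illustrative, and rounding to the nearest pixel is an assumption that happens to match the documented values.

```python
MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: int = 96) -> int:
    """Convert millimetres to whole pixels at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)


def page_size_px(
    width_mm: float = 210,
    height_mm: float = 297,
    orientation: str = "portrait",
    dpi: int = 96,
) -> tuple:
    """Return (width, height) in pixels, swapping sides for landscape."""
    if orientation not in ("portrait", "landscape"):
        raise ValueError(f"unknown orientation: {orientation!r}")
    w, h = mm_to_px(width_mm, dpi), mm_to_px(height_mm, dpi)
    return (w, h) if orientation == "portrait" else (h, w)
```

With the defaults, `page_size_px()` gives `(794, 1123)`, matching the A4 size listed above.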
|
||||
|
||||
### 5.2 EditorToolbar

Editing tools:

| Icon | Tool | Description |
|------|------|-------------|
| 🖱️ | Select | Select/move elements |
| T | Text | Add text (IText) |
| ▭ | Rectangle | Draw a rectangle |
| ○ | Circle | Draw a circle/ellipse |
| ― | Line | Draw a line |
| → | Arrow | Draw an arrow |
| 🖼️ | Image | Upload an image |
| ✨ | AI image | Generate an image with AI |
| ⊞ | Table | Insert a table |

### 5.3 PropertiesPanel

Property editor for selected objects:

**Text properties:**
- Font (Arial, Times, Georgia, OpenDyslexic, Schulschrift)
- Font size (8-120pt)
- Font style (normal, bold, italic)
- Line height, letter spacing
- Text alignment
- Text color

**Shape properties:**
- Fill color
- Border color and width
- Corner radius

**General:**
- Opacity
- Delete button

### 5.4 WorksheetContext

React context for global state:
|
||||
|
||||
```typescript
|
||||
interface WorksheetContextType {
|
||||
canvas: Canvas | null
|
||||
document: WorksheetDocument | null
|
||||
activeTool: EditorTool
|
||||
selectedObjects: FabricObject[]
|
||||
zoom: number
|
||||
showGrid: boolean
|
||||
snapToGrid: boolean
|
||||
currentPageIndex: number
|
||||
canUndo: boolean
|
||||
canRedo: boolean
|
||||
isDirty: boolean
|
||||
// ... Methoden
|
||||
}
|
||||
```
|
||||
|
||||
## 6. Datenmodelle
|
||||
|
||||
### 6.1 WorksheetDocument
|
||||
|
||||
```typescript
|
||||
interface WorksheetDocument {
|
||||
id: string
|
||||
title: string
|
||||
description?: string
|
||||
pages: WorksheetPage[]
|
||||
pageFormat: PageFormat
|
||||
createdAt: string
|
||||
updatedAt: string
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 WorksheetPage
|
||||
|
||||
```typescript
|
||||
interface WorksheetPage {
|
||||
id: string
|
||||
index: number
|
||||
canvasJSON: string // Serialisierter Fabric.js Canvas
|
||||
thumbnail?: string
|
||||
}
|
||||
```
|
||||
|
||||
### 6.3 PageFormat
|
||||
|
||||
```typescript
interface PageFormat {
  width: number   // in mm (default: 210)
  height: number  // in mm (default: 297)
  orientation: 'portrait' | 'landscape'
  margins: { top: number; right: number; bottom: number; left: number }
}
```
|
||||
|
||||
## 7. Features

### 7.1 Undo/Redo

- History stack with a maximum of 50 entries
- Automatically saved on every change
- Keyboard: Ctrl+Z (Undo), Ctrl+Y (Redo)
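
A bounded history like this is often implemented as a fixed-size undo stack plus a redo stack that is cleared on every new change. The sketch below illustrates that idea in Python rather than the editor's TypeScript. The `History` class and its methods are hypothetical. Only the limit of 50 entries is taken from the list above, and whether the current state counts toward that limit is an assumption.

```python
from collections import deque
from typing import Deque, List, Optional


class History:
    """Undo/redo over serialized canvas snapshots with a bounded undo stack."""

    def __init__(self, initial: str, max_entries: int = 50) -> None:
        # The last element of _undo is always the current state
        self._undo: Deque[str] = deque([initial], maxlen=max_entries)
        self._redo: List[str] = []

    def record(self, snapshot: str) -> None:
        """Store a new state; the deque silently drops the oldest when full."""
        self._undo.append(snapshot)
        self._redo.clear()  # a new change invalidates the redo branch

    @property
    def can_undo(self) -> bool:
        return len(self._undo) > 1

    @property
    def can_redo(self) -> bool:
        return bool(self._redo)

    def undo(self) -> Optional[str]:
        """Step back one state and return it, or None if nothing to undo."""
        if not self.can_undo:
            return None
        self._redo.append(self._undo.pop())
        return self._undo[-1]

    def redo(self) -> Optional[str]:
        """Re-apply the most recently undone state, or None if there is none."""
        if not self.can_redo:
            return None
        snapshot = self._redo.pop()
        self._undo.append(snapshot)
        return snapshot
```

The `can_undo` and `can_redo` properties correspond to the `canUndo` and `canRedo` flags in `WorksheetContextType`.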
|
||||
|
||||
### 7.2 Grid & Snap

- Configurable grid (5 mm, 10 mm, 15 mm, 20 mm)
- Snap-to-grid when moving objects
- Can be shown or hidden
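
Snapping usually means rounding a coordinate to the nearest multiple of the grid step. The sketch below shows this for a grid defined in millimetres on a 96 DPI canvas. The function name is illustrative, and the conversion from millimetres to pixels is an assumption about how the grid sizes map onto the canvas.

```python
MM_PER_INCH = 25.4
ALLOWED_GRID_MM = (5, 10, 15, 20)  # grid sizes listed above


def snap_to_grid(value_px: float, grid_mm: int, dpi: int = 96) -> float:
    """Round a pixel coordinate to the nearest grid line."""
    if grid_mm not in ALLOWED_GRID_MM:
        raise ValueError(f"unsupported grid size: {grid_mm} mm")
    step_px = grid_mm / MM_PER_INCH * dpi  # grid spacing in pixels
    return round(value_px / step_px) * step_px
```

Applying the function to both the `left` and `top` coordinates of a moved object would align it to the grid. The step is kept as a float so that accumulated rounding does not drift across the page.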
|
||||
|
||||
### 7.3 Export

- **PDF**: multi-page, A4 format
- **PNG**: high resolution (2x multiplier)
- **JPG**: with quality setting

### 7.4 Storage

- **Backend**: REST API with JSON persistence
- **Fallback**: localStorage when offline
|
||||
|
||||
## 8. AI Image Generation

### 8.1 Ollama Integration

The editor uses Ollama for AI image generation:

```python
OLLAMA_URL = "http://host.docker.internal:11434"
```

### 8.2 Placeholder System

If Ollama is unavailable, a placeholder image is generated:
- Color-coded by style
- Prompt text as description
- "KI-Bild (Platzhalter)" badge ("AI image (placeholder)")

### 8.3 Style Prompts

Each style automatically adds modifiers to the prompt:
|
||||
|
||||
```python
|
||||
STYLE_PROMPTS = {
|
||||
"realistic": "photorealistic, high detail",
|
||||
"cartoon": "cartoon style, colorful, child-friendly",
|
||||
"sketch": "pencil sketch, hand-drawn",
|
||||
"clipart": "clipart style, flat design",
|
||||
"educational": "educational illustration, textbook style"
|
||||
}
|
||||
```
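
As an illustration of how such modifiers might be applied, the sketch below appends the style text to the user prompt. The `build_prompt` function and the comma-separated joining are assumptions. The dictionary repeats the values shown above so that the example is self-contained.

```python
STYLE_PROMPTS = {
    "realistic": "photorealistic, high detail",
    "cartoon": "cartoon style, colorful, child-friendly",
    "sketch": "pencil sketch, hand-drawn",
    "clipart": "clipart style, flat design",
    "educational": "educational illustration, textbook style",
}


def build_prompt(prompt: str, style: str) -> str:
    """Combine the user prompt with the modifiers of the chosen style."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    modifiers = STYLE_PROMPTS.get(style)
    if modifiers is None:
        raise ValueError(f"unknown style: {style!r}")
    return f"{prompt}, {modifiers}"
```

The combined string would plausibly be what the API returns as `prompt_used`, although the source does not confirm this.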
|
||||
|
||||
## 9. Glassmorphism Design
|
||||
|
||||
Der Editor folgt dem Glassmorphism-Design des Studio v2:
|
||||
|
||||
```typescript
|
||||
// Dark Theme
|
||||
'backdrop-blur-xl bg-white/10 border border-white/20'
|
||||
|
||||
// Light Theme
|
||||
'backdrop-blur-xl bg-white/70 border border-black/10 shadow-xl'
|
||||
```
|
||||
|
||||
## 10. Internationalization

Supported languages:
- 🇩🇪 Deutsch
- 🇬🇧 English
- 🇹🇷 Türkçe
- 🇸🇦 العربية (RTL)
- 🇷🇺 Русский
- 🇺🇦 Українська
- 🇵🇱 Polski

Translation Key: `nav_worksheet_editor`

## 11. Security

### 11.1 Image Upload

- Image formats only (image/*)
- Client-side validation
- Base64 conversion

### 11.2 CORS

Enabled for local development and the Docker environment.
|
||||
|
||||
## 12. Deployment

### 12.1 Frontend

```bash
cd studio-v2
npm install
npm run dev  # Port 3001
```

### 12.2 Backend

The klausur-service runs on port 8086:

```bash
cd klausur-service/backend
python main.py
```

### 12.3 Docker

The service is part of docker-compose.yml.

## 13. Future Enhancements

- [ ] Table tool with cell editing
- [ ] Template library
- [ ] Collaborative editing
- [ ] Drag & drop from the document library
- [ ] Integration with Vocab-Worksheet
|
||||
173
docs-src/services/klausur-service/index.md
Normal file
@@ -0,0 +1,173 @@
|
||||
# Klausur-Service

The Klausur-Service is a FastAPI-based microservice for AI-assisted grading of Abitur exams.

## Overview

| Property | Value |
|----------|-------|
| **Port** | 8086 |
| **Framework** | FastAPI (Python) |
| **Database** | PostgreSQL + Qdrant (vector DB) |
| **Storage** | MinIO (file storage) |

## Features

- **OCR**: automatic text recognition from scanned exams
- **AI grading**: automatic grading suggestions based on the Erwartungshorizont (expectation horizon)
- **BYOEH**: Bring-Your-Own-Expectation-Horizon with client-side encryption
- **Fairness analysis**: statistical analysis of grading consistency
- **PDF export**: assessment reports (Gutachten) and grade overviews as PDF
- **Second marking**: complete workflow for first, second, and third marking
|
||||
|
||||
## Architektur
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Frontend (Next.js) │
|
||||
│ /website/app/admin/klausur-korrektur/ │
|
||||
│ - Klausur-Liste │
|
||||
│ - Studenten-Liste │
|
||||
│ - Korrektur-Workspace (2/3-1/3 Layout) │
|
||||
│ - Fairness-Dashboard │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ klausur-service (FastAPI) │
|
||||
│ Port 8086 - /klausur-service/backend/main.py │
|
||||
│ - Klausur CRUD (/api/v1/klausuren) │
|
||||
│ - Student Work (/api/v1/students) │
|
||||
│ - Annotations (/api/v1/annotations) │
|
||||
│ - BYOEH (/api/v1/eh) │
|
||||
│ - PDF Export │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Infrastruktur │
|
||||
│ - Qdrant (Vektor-DB fuer RAG) │
|
||||
│ - MinIO (Datei-Storage) │
|
||||
│ - PostgreSQL (Metadaten) │
|
||||
│ - Embedding-Service (Port 8087) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Klausur-Verwaltung
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| GET | `/api/v1/klausuren` | Liste aller Klausuren |
|
||||
| POST | `/api/v1/klausuren` | Neue Klausur erstellen |
|
||||
| GET | `/api/v1/klausuren/{id}` | Klausur-Details |
|
||||
| DELETE | `/api/v1/klausuren/{id}` | Klausur loeschen |
|
||||
|
||||
### Studenten-Arbeiten
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| POST | `/api/v1/klausuren/{id}/students` | Arbeit hochladen |
|
||||
| GET | `/api/v1/klausuren/{id}/students` | Studenten-Liste |
|
||||
| GET | `/api/v1/students/{id}` | Einzelne Arbeit |
|
||||
| PUT | `/api/v1/students/{id}/criteria` | Kriterien bewerten |
|
||||
| PUT | `/api/v1/students/{id}/gutachten` | Gutachten speichern |
|
||||
|
||||
### KI-Funktionen
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| POST | `/api/v1/students/{id}/gutachten/generate` | Gutachten generieren |
|
||||
| GET | `/api/v1/klausuren/{id}/fairness` | Fairness-Analyse |
|
||||
| POST | `/api/v1/students/{id}/eh-suggestions` | EH-Vorschlaege via RAG |
|
||||
|
||||
### PDF-Export
|
||||
|
||||
| Method | Endpoint | Beschreibung |
|
||||
|--------|----------|--------------|
|
||||
| GET | `/api/v1/students/{id}/export/gutachten` | Einzelgutachten PDF |
|
||||
| GET | `/api/v1/students/{id}/export/annotations` | Anmerkungen PDF |
|
||||
| GET | `/api/v1/klausuren/{id}/export/overview` | Notenuebersicht PDF |
|
||||
| GET | `/api/v1/klausuren/{id}/export/all-gutachten` | Alle Gutachten PDF |
|
||||
|
||||
## Grading System

The system uses the German 15-point scale for Abitur exams:

| Points | Percent | Grade |
|--------|---------|-------|
| 15 | >= 95% | 1+ |
| 14 | >= 90% | 1 |
| 13 | >= 85% | 1- |
| 12 | >= 80% | 2+ |
| 11 | >= 75% | 2 |
| 10 | >= 70% | 2- |
| 9 | >= 65% | 3+ |
| 8 | >= 60% | 3 |
| 7 | >= 55% | 3- |
| 6 | >= 50% | 4+ |
| 5 | >= 45% | 4 |
| 4 | >= 40% | 4- |
| 3 | >= 33% | 5+ |
| 2 | >= 27% | 5 |
| 1 | >= 20% | 5- |
| 0 | < 20% | 6 |
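
The table can be read as a threshold lookup: the first row whose minimum percentage is reached determines the points. The sketch below encodes exactly the thresholds and grade labels from the table. The function names are illustrative and not taken from the service code.

```python
# (minimum percent, points), highest threshold first, as in the table
THRESHOLDS = [
    (95, 15), (90, 14), (85, 13), (80, 12), (75, 11), (70, 10),
    (65, 9), (60, 8), (55, 7), (50, 6), (45, 5), (40, 4),
    (33, 3), (27, 2), (20, 1),
]

# Grade label indexed by points (0..15)
GRADE_LABELS = [
    "6", "5-", "5", "5+", "4-", "4", "4+", "3-",
    "3", "3+", "2-", "2", "2+", "1-", "1", "1+",
]


def percent_to_points(percent: float) -> int:
    """Map a percentage (0-100) to points on the 15-point scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for minimum, points in THRESHOLDS:
        if percent >= minimum:
            return points
    return 0  # below 20%


def points_to_grade(points: int) -> str:
    """Return the grade label for a point value between 0 and 15."""
    if not 0 <= points <= 15:
        raise ValueError("points must be between 0 and 15")
    return GRADE_LABELS[points]
```

Note that the lower steps are not evenly spaced: the thresholds drop from 40% to 33%, 27%, and 20%, which is why a lookup table is clearer than a formula.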
|
||||
|
||||
## Grading Criteria

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Spelling | 15% | Orthography |
| Grammar | 15% | Grammar & syntax |
| Content | 40% | Quality of content |
| Structure | 15% | Organization & outline |
| Style | 15% | Expression & style |
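
The five weights sum to 100%, so a weighted average of per-criterion scores yields an overall percentage. The sketch below shows that calculation. It assumes each criterion is scored from 0 to 100, and the dictionary keys are illustrative English names for the five criteria. How the service combines the criteria internally is not documented here.

```python
from typing import Mapping

# Weights in percent, taken from the criteria table
WEIGHTS_PERCENT = {
    "spelling": 15,
    "grammar": 15,
    "content": 40,
    "structure": 15,
    "style": 15,
}


def weighted_percent(scores: Mapping[str, float]) -> float:
    """Combine per-criterion scores (0-100 each) into one overall percentage."""
    missing = set(WEIGHTS_PERCENT) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in WEIGHTS_PERCENT:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"score for {name!r} must be between 0 and 100")
    total = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return total / 100
```

Because content carries 40% of the weight, a strong content score can offset weaker formal criteria. The resulting percentage could then be mapped onto the 15-point scale from the grading table.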
|
||||
|
||||
## Verzeichnisstruktur
|
||||
|
||||
```
|
||||
klausur-service/
|
||||
├── backend/
|
||||
│ ├── main.py # API Endpoints + Datenmodelle
|
||||
│ ├── qdrant_service.py # Vektor-Datenbank Operationen
|
||||
│ ├── eh_pipeline.py # BYOEH Verarbeitung
|
||||
│ ├── hybrid_search.py # Hybrid Search (BM25 + Semantic)
|
||||
│ └── requirements.txt # Python Dependencies
|
||||
├── frontend/
|
||||
│ └── src/
|
||||
│ ├── components/ # React Komponenten
|
||||
│ ├── pages/ # Seiten
|
||||
│ └── services/ # API Client
|
||||
└── docs/
|
||||
├── BYOEH-Architecture.md
|
||||
└── BYOEH-Developer-Guide.md
|
||||
```
|
||||
|
||||
## Konfiguration
|
||||
|
||||
### Umgebungsvariablen
|
||||
|
||||
```env
|
||||
# Klausur-Service
|
||||
KLAUSUR_SERVICE_PORT=8086
|
||||
QDRANT_URL=http://qdrant:6333
|
||||
MINIO_ENDPOINT=minio:9000
|
||||
MINIO_ACCESS_KEY=...
|
||||
MINIO_SECRET_KEY=...
|
||||
|
||||
# Embedding-Service
|
||||
EMBEDDING_SERVICE_URL=http://embedding:8087
|
||||
OPENAI_API_KEY=sk-...
|
||||
|
||||
# BYOEH
|
||||
BYOEH_ENCRYPTION_ENABLED=true
|
||||
EH_UPLOAD_DIR=/app/eh-uploads
|
||||
```
|
||||
|
||||
## Further Documentation

- [BYOEH Architecture](./BYOEH-Architecture.md) - client-side encryption
- [OCR Compare](./OCR-Compare.md) - block review feature for OCR comparison
- [Zeugnis-System](../../architecture/zeugnis-system.md) - report card generation
- [Backend API](../../api/backend-api.md) - general API documentation
|
||||