# BYOEH (Bring-Your-Own-Expectation-Horizon) - Architecture Documentation
## Overview
The BYOEH module enables teachers to upload their own Erwartungshorizonte (expectation horizons/grading rubrics) and use them for RAG-assisted grading suggestions. Key design principles:
- **Tenant Isolation**: Each teacher/school has an isolated namespace
- **No Training Guarantee**: EH content is only used for RAG, never for model training
- **Operator Blindness**: EH files are encrypted client-side, so Breakpilot stores only ciphertext and cannot read EH content at rest
- **Rights Confirmation**: Required legal acknowledgment at upload time
## Architecture Diagram
```
┌────────────────────────────────────────────────────────────────┐
│ klausur-service (Port 8086)                                    │
├────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────┐    ┌─────────────────────────┐ │
│ │ BYOEH REST API              │    │ BYOEH Service Layer     │ │
│ │                             │    │                         │ │
│ │ POST   /api/v1/eh/upload    │───▶│ - Upload Wizard Logic   │ │
│ │ GET    /api/v1/eh           │    │ - Rights Confirmation   │ │
│ │ DELETE /api/v1/eh/{id}      │    │ - Chunking Pipeline     │ │
│ │ POST   /api/v1/eh/rag-query │    │ - Encryption Service    │ │
│ └─────────────────────────────┘    └──┬──────────────────────┘ │
└───────────────────────────────────────┼────────────────────────┘
            ┌───────────────────────────┼─────────────────────────┐
            │                           │                         │
            ▼                           ▼                         ▼
┌───────────────────────┐  ┌────────────────────────┐  ┌─────────────────────┐
│ PostgreSQL            │  │ Qdrant                 │  │ Encrypted Storage   │
│ (Metadata + Audit)    │  │ (Vector Search)        │  │ /app/eh-uploads/    │
│                       │  │                        │  │                     │
│ In-Memory Storage:    │  │ Collection: bp_eh      │  │ {tenant}/{eh_id}/   │
│ - erwartungshorizonte │  │ - tenant_id (filter)   │  │   encrypted.bin     │
│ - eh_chunks           │  │ - eh_id                │  │   salt.txt          │
│ - eh_key_shares       │  │ - embedding[1536]      │  │                     │
│ - eh_klausur_links    │  │ - encrypted_content    │  └─────────────────────┘
│ - eh_audit_log        │  │                        │
└───────────────────────┘  └────────────────────────┘
```
## Data Flow
### 1. Upload Flow
```
Browser                         Backend                         Storage
│                               │                               │
│ 1. User selects PDF           │                               │
│ 2. User enters passphrase     │                               │
│ 3. PBKDF2 key derivation      │                               │
│ 4. AES-256-GCM encryption     │                               │
│ 5. SHA-256 key hash           │                               │
│                               │                               │
│──────────────────────────────▶│                               │
│ POST /api/v1/eh/upload        │                               │
│ (encrypted blob + key_hash)   │                               │
│                               │──────────────────────────────▶│
│                               │ Store encrypted.bin + salt    │
│                               │◀──────────────────────────────│
│                               │                               │
│                               │ Save metadata to DB           │
│◀──────────────────────────────│                               │
│ Return EH record              │                               │
```
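The storage step above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function name `store_encrypted_eh`, the hex encoding of the salt, and the path-component check are assumptions; only the `{tenant}/{eh_id}/encrypted.bin` and `salt.txt` layout comes from the architecture diagram.

```python
import tempfile
from pathlib import Path


def store_encrypted_eh(base_dir: Path, tenant_id: str, eh_id: str,
                       encrypted_blob: bytes, salt_hex: str) -> Path:
    """Write ciphertext and salt under {base_dir}/{tenant_id}/{eh_id}/."""
    # Reject separators and dot segments so an ID cannot escape its tenant directory.
    for part in (tenant_id, eh_id):
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    target = base_dir / tenant_id / eh_id
    target.mkdir(parents=True, exist_ok=True)
    # The server only ever handles the encrypted blob, never the plaintext PDF.
    (target / "encrypted.bin").write_bytes(encrypted_blob)
    (target / "salt.txt").write_text(salt_hex, encoding="utf-8")
    return target


# Usage: a temporary directory stands in for /app/eh-uploads.
with tempfile.TemporaryDirectory() as tmp:
    path = store_encrypted_eh(Path(tmp), "tenant-a", "eh-123", b"\x00\x01", "ab12")
    stored_files = sorted(p.name for p in path.iterdir())
```

Keeping the tenant ID as the first path component mirrors the tenant isolation principle at the filesystem level.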
### 2. Indexing Flow (RAG Preparation)
```
Browser                         Backend                         Qdrant
│                               │                               │
│──────────────────────────────▶│                               │
│ POST /api/v1/eh/{id}/index    │                               │
│ (passphrase for decryption)   │                               │
│                               │                               │
│                               │ 1. Verify key hash            │
│                               │ 2. Decrypt content            │
│                               │ 3. Extract text (PDF)         │
│                               │ 4. Chunk text                 │
│                               │ 5. Generate embeddings        │
│                               │ 6. Re-encrypt each chunk      │
│                               │──────────────────────────────▶│
│                               │ Index vectors + encrypted     │
│                               │ chunks with tenant filter     │
│◀──────────────────────────────│                               │
│ Return chunk count            │                               │
```
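Step 4 (chunking) could look roughly like the sketch below. The document does not specify the chunking strategy, so the fixed-size character window, the `chunk_size` and `overlap` defaults, and the function name `chunk_text` are all assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


# Usage with tiny values so the overlap is visible.
example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; the actual pipeline in `eh_pipeline.py` may split on sentences or tokens instead.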
### 3. RAG Query Flow
```
Browser                         Backend                         Qdrant
│                               │                               │
│──────────────────────────────▶│                               │
│ POST /api/v1/eh/rag-query     │                               │
│ (query + passphrase)          │                               │
│                               │                               │
│                               │ 1. Generate query embedding   │
│                               │──────────────────────────────▶│
│                               │ 2. Semantic search            │
│                               │    (tenant-filtered)          │
│                               │◀──────────────────────────────│
│                               │ 3. Decrypt matched chunks     │
│◀──────────────────────────────│                               │
│ Return decrypted context      │                               │
```
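To make the tenant-filtered search step concrete, here is a small in-memory sketch. It swaps Qdrant for a plain Python list and uses cosine similarity; `StoredChunk` and `search_chunks` are hypothetical names, and the two-dimensional vectors stand in for the 1536-dimensional embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredChunk:
    tenant_id: str
    eh_id: str
    embedding: list[float]
    encrypted_content: str  # stays encrypted until the passphrase is applied


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_chunks(chunks: list[StoredChunk], tenant_id: str,
                  query_embedding: list[float], limit: int = 3) -> list[StoredChunk]:
    # Filter by tenant BEFORE ranking, so other tenants' chunks can never be returned.
    candidates = [c for c in chunks if c.tenant_id == tenant_id]
    ranked = sorted(candidates,
                    key=lambda c: cosine(c.embedding, query_embedding),
                    reverse=True)
    return ranked[:limit]


chunks = [
    StoredChunk("tenant-a", "eh-1", [1.0, 0.0], "enc-a1"),
    StoredChunk("tenant-a", "eh-1", [0.0, 1.0], "enc-a2"),
    StoredChunk("tenant-b", "eh-9", [1.0, 0.0], "enc-b1"),
]
hits = search_chunks(chunks, "tenant-a", [0.9, 0.1], limit=1)
```

Only after this step would the backend decrypt the matched `encrypted_content` values with the key derived from the supplied passphrase.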
## Security Architecture
### Client-Side Encryption
```
┌─────────────────────────────────────────────────────────────────┐
│                      Browser (Client-Side)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. User enters passphrase (not sent during upload)             │
│      │                                                          │
│      ▼                                                          │
│  2. Key Derivation: PBKDF2-SHA256(passphrase, salt, 100k iter)  │
│      │                                                          │
│      ▼                                                          │
│  3. Encryption: AES-256-GCM(key, iv, file_content)              │
│      │                                                          │
│      ▼                                                          │
│  4. Key-Hash: SHA-256(derived_key) → server verification only   │
│      │                                                          │
│      ▼                                                          │
│  5. Upload: encrypted_blob + key_hash + salt (NOT key!)         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
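The key derivation and key-hash steps can be mirrored with the Python standard library, for example to check a key hash on the backend. This is a sketch under assumptions: the 16-byte salt, the 32-byte key length, the hex encoding of the hash, and the function names are not specified in this document. The AES-256-GCM step is omitted because the standard library has no AES-GCM primitive.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 100_000  # "100k iter" from the diagram
KEY_LENGTH = 32              # 256-bit key for AES-256 (assumed)


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 over the UTF-8 passphrase."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, PBKDF2_ITERATIONS, dklen=KEY_LENGTH)


def key_hash(derived_key: bytes) -> str:
    """SHA-256 of the derived key; this is the only key material the server stores."""
    return hashlib.sha256(derived_key).hexdigest()


def verify_key_hash(passphrase: str, salt: bytes, stored_hash: str) -> bool:
    candidate = key_hash(derive_key(passphrase, salt))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored_hash)


salt = os.urandom(16)  # salt length is an assumption
stored = key_hash(derive_key("correct horse", salt))
```

Because the stored value is a hash of the derived key rather than the key itself, it lets the server reject a wrong passphrase without being usable for decryption.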
### Security Guarantees
| Guarantee | Implementation |
|-----------|----------------|
| **No Training** | `training_allowed: false` on all Qdrant points |
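| **Operator Blindness** | Passphrase is not part of the upload; the server stores only ciphertext, salt, and key hash. Indexing and RAG-query requests do transmit the passphrase for server-side decryption (see Data Flow) |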
| **Tenant Isolation** | Every query filtered by `tenant_id` |
| **Audit Trail** | All actions logged with timestamps |
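
The first and third guarantees map onto the Qdrant point payload and the search filter. The sketch below only builds the JSON bodies as plain dictionaries; the helper names are hypothetical, and the filter follows Qdrant's documented `must` / `match` condition format as far as this document's payload fields allow.

```python
import json


def build_point_payload(tenant_id: str, eh_id: str, encrypted_content: str) -> dict:
    """Payload stored next to each vector in the bp_eh collection."""
    return {
        "tenant_id": tenant_id,
        "eh_id": eh_id,
        "encrypted_content": encrypted_content,
        "training_allowed": False,  # No Training guarantee, set on every point
    }


def build_tenant_filter(tenant_id: str) -> dict:
    """Filter clause restricting a search to one tenant."""
    return {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]}


# Body of a search request; the vector is shortened for readability.
search_body = {
    "vector": [0.1, 0.2, 0.3],
    "filter": build_tenant_filter("tenant-a"),
    "limit": 5,
    "with_payload": True,
}
search_json = json.dumps(search_body)
```

Building the filter in one helper makes it harder for a new query path to forget the `tenant_id` condition.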
## Key Sharing System
The key sharing system lets a first examiner grant second examiners, third examiners, and supervisors access to their EH.
### Share Flow
```
First Examiner                  Backend                         Second Examiner
│                               │                               │
│ 1. Encrypt passphrase for     │                               │
│    recipient (client-side)    │                               │
│                               │                               │
│──────────────────────────────▶│                               │
│ POST /api/v1/eh/{id}/share    │                               │
│ (encrypted_passphrase, role)  │                               │
│                               │                               │
│                               │ Store EHKeyShare              │
│◀──────────────────────────────│                               │
│                               │                               │
│                               │◀──────────────────────────────│
│                               │ GET /api/v1/eh/shared-with-me │
│                               │                               │
│                               │──────────────────────────────▶│
│                               │ Return shared EH list         │
│                               │                               │
│                               │◀──────────────────────────────│
│                               │ RAG query with decrypted      │
│                               │ passphrase                    │
```
### Data Structures
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EHKeyShare:
    id: str
    eh_id: str
    user_id: str                 # Recipient
    encrypted_passphrase: str    # Client-encrypted for recipient
    passphrase_hint: str         # Optional hint
    granted_by: str              # Grantor user ID
    granted_at: datetime
    role: str                    # second_examiner, third_examiner, supervisor
    klausur_id: Optional[str]    # Link to specific Klausur
    active: bool


@dataclass
class EHKlausurLink:
    id: str
    eh_id: str
    klausur_id: str
    linked_by: str
    linked_at: datetime
```
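As a usage illustration, the sketch below shows how a "shared with me" lookup and a revocation might operate on `EHKeyShare` records. The helper names `shares_for_user` and `revoke_share` are hypothetical, and modelling revocation as `active = False` is an assumption based on the `active` field.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EHKeyShare:
    id: str
    eh_id: str
    user_id: str
    encrypted_passphrase: str
    passphrase_hint: str
    granted_by: str
    granted_at: datetime
    role: str
    klausur_id: Optional[str]
    active: bool


def shares_for_user(shares: list[EHKeyShare], user_id: str) -> list[EHKeyShare]:
    """Active shares where the given user is the recipient."""
    return [s for s in shares if s.user_id == user_id and s.active]


def revoke_share(shares: list[EHKeyShare], share_id: str) -> list[EHKeyShare]:
    """Return a new list with the matching share deactivated (assumed revocation model)."""
    return [replace(s, active=False) if s.id == share_id else s for s in shares]


now = datetime.now(timezone.utc)
shares = [
    EHKeyShare("s1", "eh-1", "examiner-2", "<ciphertext>", "", "examiner-1",
               now, "second_examiner", None, True),
    EHKeyShare("s2", "eh-1", "supervisor-1", "<ciphertext>", "", "examiner-1",
               now, "supervisor", "klausur-7", True),
]
after_revoke = revoke_share(shares, "s1")
```

Keeping revoked shares as inactive records rather than deleting them preserves the history that the audit trail refers to.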
## API Endpoints
### Core EH Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/upload` | Upload encrypted EH |
| GET | `/api/v1/eh` | List user's EH |
| GET | `/api/v1/eh/{id}` | Get single EH |
| DELETE | `/api/v1/eh/{id}` | Soft delete EH |
| POST | `/api/v1/eh/{id}/index` | Index EH for RAG |
| POST | `/api/v1/eh/rag-query` | Query EH content |
### Key Sharing Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/{id}/share` | Share EH with examiner |
| GET | `/api/v1/eh/{id}/shares` | List shares (owner) |
| DELETE | `/api/v1/eh/{id}/shares/{shareId}` | Revoke share |
| GET | `/api/v1/eh/shared-with-me` | List EH shared with user |
### Klausur Integration Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/eh/{id}/link-klausur` | Link EH to Klausur |
| DELETE | `/api/v1/eh/{id}/link-klausur/{klausurId}` | Unlink EH |
| GET | `/api/v1/klausuren/{id}/linked-eh` | Get linked EH for Klausur |
### Audit & Admin Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/eh/audit-log` | Get audit log |
| GET | `/api/v1/eh/rights-text` | Get rights confirmation text |
| GET | `/api/v1/eh/qdrant-status` | Get Qdrant status (admin) |
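
A client call against these endpoints could be assembled as in the sketch below, which builds (but does not send) a RAG query request. The JSON field names `query_text`, `passphrase`, and `limit`, as well as the bearer-token header, are assumptions; this document lists only the paths and methods.

```python
import json
import urllib.request


def build_rag_query_request(base_url: str, token: str, query: str,
                            passphrase: str, limit: int = 5) -> urllib.request.Request:
    """Build a POST request for /api/v1/eh/rag-query without sending it."""
    body = json.dumps({
        "query_text": query,       # assumed field name
        "passphrase": passphrase,  # sent for server-side decryption, per the RAG query flow
        "limit": limit,            # assumed field name
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/eh/rag-query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
    )


request = build_rag_query_request("http://localhost:8086", "<token>",
                                  "Bewertungskriterien Aufgabe 2", "<passphrase>")
# urllib.request.urlopen(request) would send it to a running klausur-service.
```

Since the passphrase travels in the request body, such calls should only go over TLS in any real deployment.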
## Frontend Components
### EHUploadWizard
5-step wizard for uploading Erwartungshorizonte:
1. **File Selection** - Choose PDF file
2. **Metadata** - Title, Subject, Niveau, Year
3. **Rights Confirmation** - Legal acknowledgment
4. **Encryption** - Set passphrase (2x confirmation)
5. **Summary** - Review and upload
### Integration Points
- **KorrekturPage**: Shows EH prompt after first student upload
- **GutachtenGeneration**: Uses RAG context from linked EH
- **Sidebar Badge**: Shows linked EH count
## File Structure
```
klausur-service/
├── backend/
│   ├── main.py              # API endpoints + data structures
│   ├── qdrant_service.py    # Vector database operations
│   ├── eh_pipeline.py       # Chunking, embedding, encryption
│   └── requirements.txt     # Python dependencies
├── frontend/
│   └── src/
│       ├── components/
│       │   └── EHUploadWizard.tsx
│       ├── services/
│       │   ├── api.ts             # API client
│       │   └── encryption.ts      # Client-side crypto
│       ├── pages/
│       │   └── KorrekturPage.tsx  # EH integration
│       └── styles/
│           └── eh-wizard.css
└── docs/
    ├── BYOEH-Architecture.md
    └── BYOEH-Developer-Guide.md
```
## Configuration
### Environment Variables
```env
QDRANT_URL=http://qdrant:6333
OPENAI_API_KEY=sk-... # For embeddings
BYOEH_ENCRYPTION_ENABLED=true
EH_UPLOAD_DIR=/app/eh-uploads
```
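A backend might read these variables as in the following sketch. The `ByoehConfig` class, the `load_config` function, the fallback defaults, and the set of accepted truthy strings are assumptions for illustration; only the variable names and example values come from the block above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ByoehConfig:
    qdrant_url: str
    openai_api_key: str
    encryption_enabled: bool
    upload_dir: str


def load_config(env: Optional[Mapping[str, str]] = None) -> ByoehConfig:
    """Read BYOEH settings from the environment (or from a mapping, for tests)."""
    env = os.environ if env is None else env
    enabled_raw = env.get("BYOEH_ENCRYPTION_ENABLED", "true")
    return ByoehConfig(
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        # Accept common truthy spellings; anything else disables the flag.
        encryption_enabled=enabled_raw.strip().lower() in {"1", "true", "yes"},
        upload_dir=env.get("EH_UPLOAD_DIR", "/app/eh-uploads"),
    )


config = load_config({"QDRANT_URL": "http://localhost:6333",
                      "BYOEH_ENCRYPTION_ENABLED": "false"})
```

Passing a mapping instead of reading `os.environ` directly keeps the loader testable without touching process state.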
### Docker Services
```yaml
# docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
```
## Audit Events
| Action | Description |
|--------|-------------|
| `upload` | EH uploaded |
| `index` | EH indexed for RAG |
| `rag_query` | RAG query executed |
| `delete` | EH soft deleted |
| `share` | EH shared with examiner |
| `revoke_share` | Share revoked |
| `link_klausur` | EH linked to Klausur |
| `unlink_klausur` | EH unlinked from Klausur |
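
One way to keep audit entries limited to the actions in this table is an enum, as in the sketch below. `AuditAction`, `AuditEntry`, and `log_action` are hypothetical names, and the entry fields other than the action and timestamp are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class AuditAction(str, Enum):
    UPLOAD = "upload"
    INDEX = "index"
    RAG_QUERY = "rag_query"
    DELETE = "delete"
    SHARE = "share"
    REVOKE_SHARE = "revoke_share"
    LINK_KLAUSUR = "link_klausur"
    UNLINK_KLAUSUR = "unlink_klausur"


@dataclass
class AuditEntry:
    action: AuditAction
    user_id: str   # assumed field
    eh_id: str     # assumed field
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def log_action(log: list[AuditEntry], action: str, user_id: str, eh_id: str) -> AuditEntry:
    """Append an entry; AuditAction(action) raises ValueError for unknown actions."""
    entry = AuditEntry(AuditAction(action), user_id, eh_id)
    log.append(entry)
    return entry


audit_log: list[AuditEntry] = []
first = log_action(audit_log, "upload", "teacher-1", "eh-123")
```

Rejecting unknown action strings at write time keeps the log aligned with the documented event list.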
## See Also
- [Zeugnis-System Architektur](../../architecture/zeugnis-system.md)
- [Klausur-Service Index](./index.md)