# BYOEH Developer Guide

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+
- Docker & Docker Compose
- OpenAI API key (for embeddings)

### Setup

1. **Start services:**

   ```bash
   docker-compose up -d qdrant
   ```

2. **Configure environment:**

   ```env
   QDRANT_URL=http://localhost:6333
   OPENAI_API_KEY=sk-your-key
   BYOEH_ENCRYPTION_ENABLED=true
   ```

3. **Run klausur-service:**

   ```bash
   cd klausur-service/backend
   pip install -r requirements.txt
   uvicorn main:app --reload --port 8086
   ```

4. **Run frontend:**

   ```bash
   cd klausur-service/frontend
   npm install
   npm run dev
   ```

## Client-Side Encryption

The encryption service (`encryption.ts`) handles all cryptographic operations in the browser.

### Encrypting a File

```typescript
import { encryptFile, generateSalt } from '../services/encryption'

// Read the file the user selected in the file input element
const input = document.getElementById('fileInput') as HTMLInputElement
const file = input.files?.[0]
if (!file) throw new Error('No file selected')

// Placeholder value: use the passphrase the user actually entered
const passphrase = 'user-secret-password'

const encrypted = await encryptFile(file, passphrase)
// Result:
// {
//   encryptedData: ArrayBuffer,
//   keyHash: string,  // SHA-256 hash for verification
//   salt: string,     // Hex-encoded salt
//   iv: string        // Hex-encoded initialization vector
// }
```

### Decrypting Content

```typescript
import { decryptText, verifyPassphrase } from '../services/encryption'

// passphrase, salt, expectedKeyHash and encryptedBase64 come from the
// user input and the stored EH record.

// First verify the passphrase
const isValid = await verifyPassphrase(passphrase, salt, expectedKeyHash)

if (isValid) {
  const decrypted = await decryptText(encryptedBase64, passphrase, salt)
}
```

## Backend API Usage

### Upload an Erwartungshorizont

```python
# The upload endpoint accepts FormData with:
# - file: encrypted binary blob
# - metadata_json: JSON string with metadata

POST /api/v1/eh/upload
Content-Type: multipart/form-data

{
  "file": <encrypted binary blob>,
  "metadata_json": {
    "metadata": {
      "title": "Deutsch LK 2025",
      "subject": "deutsch",
      "niveau": "eA",
      "year": 2025,
      "aufgaben_nummer": "Aufgabe 1"
    },
    "encryption_key_hash": "abc123...",
    "salt": "def456...",
    "rights_confirmed": true,
    "original_filename": "erwartungshorizont.pdf"
  }
}
```

### Index for RAG

```python
POST /api/v1/eh/{eh_id}/index
Content-Type: application/json

{
  "passphrase": "user-secret-password"
}
```

The backend will:

1. Verify the passphrase against the stored key hash
2. Decrypt the file
3. Extract text from the PDF
4. Chunk the text (1000 characters per chunk, 200 characters overlap)
5. Generate OpenAI embeddings
6. Re-encrypt each chunk
7. Index the chunks in Qdrant with a tenant filter

### RAG Query

```python
POST /api/v1/eh/rag-query
Content-Type: application/json

{
  "query_text": "Wie sollte die Einleitung strukturiert sein?",  # "How should the introduction be structured?"
  "passphrase": "user-secret-password",
  "subject": "deutsch",  # Optional filter
  "limit": 5             # Max results
}
```

Response:

```json
{
  "context": "Die Einleitung sollte...",
  "sources": [
    {
      "text": "Die Einleitung sollte...",
      "eh_id": "uuid",
      "eh_title": "Deutsch LK 2025",
      "chunk_index": 2,
      "score": 0.89
    }
  ],
  "query": "Wie sollte die Einleitung strukturiert sein?"
}
```

## Key Sharing Implementation

### Invitation Flow (Recommended)

The invitation flow is a two-phase sharing process: invite, then accept.

```typescript
import { ehApi } from '../services/api'

// 1. First examiner sends invitation to second examiner
const invitation = await ehApi.inviteToEH(ehId, {
  invitee_email: 'zweitkorrektor@school.de',
  role: 'second_examiner',
  klausur_id: 'klausur-uuid',                 // Optional: link to specific Klausur
  message: 'Bitte fuer Zweitkorrektur nutzen', // "Please use for the second correction"
  expires_in_days: 14                         // Default: 14 days
})
// Returns: { invitation_id, eh_id, invitee_email, role, expires_at, eh_title }

// 2. Second examiner sees pending invitation
const pending = await ehApi.getPendingInvitations()
// [{ invitation: {...}, eh: { id, title, subject, niveau, year } }]

// 3. Second examiner accepts invitation
const accepted = await ehApi.acceptInvitation(
  invitationId,
  encryptedPassphrase // Passphrase encrypted for recipient
)
// Returns: { status: 'accepted', share_id, eh_id, role, klausur_id }
```

### Invitation Management

```typescript
// Get invitations sent by current user
const sent = await ehApi.getSentInvitations()

// Decline an invitation (as invitee)
await ehApi.declineInvitation(invitationId)

// Revoke a pending invitation (as inviter)
await ehApi.revokeInvitation(invitationId)

// Get complete access chain for an EH
const chain = await ehApi.getAccessChain(ehId)
// Returns: { eh_id, eh_title, owner, active_shares, pending_invitations, revoked_shares }
```

### Direct Sharing (Legacy)

For immediate sharing without an invitation:

```typescript
// First examiner shares directly with second examiner
await ehApi.shareEH(ehId, {
  user_id: 'second-examiner-uuid',
  role: 'second_examiner',
  encrypted_passphrase: encryptedPassphrase, // Encrypted for recipient
  passphrase_hint: 'Das uebliche Passwort',  // "The usual password"
  klausur_id: 'klausur-uuid'                 // Optional
})
```

### Accessing Shared EH

```typescript
// Second examiner gets shared EH
const shared = await ehApi.getSharedWithMe()
// [{ eh: {...}, share: {...} }]

// Query using provided passphrase
const result = await ehApi.ragQuery({
  query_text: 'search query',
  passphrase: decryptedPassphrase,
  subject: 'deutsch'
})
```

### Revoking Access

```typescript
// List all shares for an EH
const shares = await ehApi.listShares(ehId)

// Revoke a share
await ehApi.revokeShare(ehId, shareId)
```

## Klausur Integration

### Automatic EH Prompt

The `KorrekturPage` shows an EH upload prompt after the first student work is uploaded, provided no EH is linked yet and the user has not dismissed the prompt:

```typescript
// In KorrekturPage.tsx
useEffect(() => {
  if (
    currentKlausur?.students.length === 1 &&
    linkedEHs.length === 0 &&
    !ehPromptDismissed
  ) {
    setShowEHPrompt(true)
  }
  // List every value the effect reads so it never runs with stale state
}, [currentKlausur?.students.length, linkedEHs.length, ehPromptDismissed])
```

### Linking EH to Klausur

```typescript
// After EH upload, auto-link to Klausur
await ehApi.linkToKlausur(ehId, klausurId)

// Get linked EH for a Klausur
const linked = await klausurEHApi.getLinkedEH(klausurId)
```

## Frontend Components

### EHUploadWizard Props

```typescript
interface EHUploadWizardProps {
  onClose: () => void
  onComplete?: (ehId: string) => void
  defaultSubject?: string // Pre-fill subject
  defaultYear?: number    // Pre-fill year
  klausurId?: string      // Auto-link after upload
}

// Usage
<EHUploadWizard
  onClose={() => setShowWizard(false)}
  onComplete={(ehId) => console.log('Uploaded:', ehId)}
  defaultSubject={klausur.subject}
  defaultYear={klausur.year}
  klausurId={klausur.id}
/>
```

### Wizard Steps

1. **file** - PDF file selection with drag & drop
2. **metadata** - Form for title, subject, niveau, year
3. **rights** - Rights confirmation checkbox
4. **encryption** - Passphrase input with strength meter
5. **summary** - Review and confirm upload

## Qdrant Operations

### Collection Schema

```python
# Collection: bp_eh
{
  "vectors": {
    "size": 1536,  # OpenAI text-embedding-3-small
    "distance": "Cosine"
  }
}

# Point payload
{
  "tenant_id": "school-uuid",
  "eh_id": "eh-uuid",
  "chunk_index": 0,
  "encrypted_content": "base64...",
  "training_allowed": false  # ALWAYS false
}
```

### Tenant-Isolated Search

```python
from qdrant_service import search_eh

results = await search_eh(
    query_embedding=embedding,
    tenant_id="school-uuid",
    subject="deutsch",
    limit=5
)
```

## Testing

### Unit Tests

```bash
cd klausur-service/backend
pytest tests/test_byoeh.py -v
```

### Test Structure

```python
# tests/test_byoeh.py
class TestBYOEH:
    def test_upload_eh(self, client, auth_headers):
        """Test EH upload with encryption"""
        pass

    def test_index_eh(self, client, auth_headers, uploaded_eh):
        """Test EH indexing for RAG"""
        pass

    def test_rag_query(self, client, auth_headers, indexed_eh):
        """Test RAG query returns relevant chunks"""
        pass

    def test_share_eh(self, client, auth_headers, uploaded_eh):
        """Test sharing EH with another user"""
        pass
```

### Frontend Tests

```typescript
// EHUploadWizard.test.tsx
describe('EHUploadWizard', () => {
  it('completes all steps successfully', async () => {
    // ...
  })

  it('validates passphrase strength', async () => {
    // ...
  })

  it('auto-links to klausur when klausurId provided', async () => {
    // ...
  })
})
```

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Passphrase verification failed` | Wrong passphrase | Ask user to re-enter |
| `EH not found` | Invalid ID or deleted | Check ID, reload list |
| `Access denied` | User not owner/shared | Check permissions |
| `Qdrant connection failed` | Service unavailable | Check Qdrant container |

### Error Response Format

```json
{
  "detail": "Passphrase verification failed"
}
```

## Security Considerations

### Do's

- Store the key hash, never the key itself
- Always filter by `tenant_id`
- Log all access in the audit trail
- Use HTTPS in production

### Don'ts

- Never log the passphrase or decrypted content
- Never store the passphrase in localStorage
- Never send the passphrase as a URL parameter
- Never return decrypted content without auth

## Performance Tips

### Chunking Configuration

```python
CHUNK_SIZE = 1000     # Characters per chunk
CHUNK_OVERLAP = 200   # Overlap for context continuity
```

### Embedding Batching

```python
# Generate embeddings in batches of 20
EMBEDDING_BATCH_SIZE = 20
```

### Qdrant Optimization

```python
# Use HNSW index for fast approximate search
# Collection is automatically optimized on creation
```

## Debugging

### Enable Debug Logging

```python
import logging

logging.getLogger('byoeh').setLevel(logging.DEBUG)
```

### Check Qdrant Status

```bash
curl http://localhost:6333/collections/bp_eh
```

### Verify Encryption

```typescript
import { isEncryptionSupported } from '../services/encryption'

if (!isEncryptionSupported()) {
  console.error('Web Crypto API not available')
}
```

## Migration Notes

### From v1.0 to v1.1

1. Added key sharing system
2. Added Klausur linking
3. Added EH prompt after student upload

No database migrations are required; all data-structure changes are additive.
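
## Appendix: Chunking Sketch

To make the chunking parameters from Performance Tips concrete (1000 characters per chunk, 200 characters overlap), here is a minimal sketch of a sliding-window chunker. The function name `chunk_text` and its signature are hypothetical and not taken from the klausur-service code; the actual backend may split text differently, for example on sentence or paragraph boundaries.

```python
CHUNK_SIZE = 1000     # Characters per chunk (value from this guide)
CHUNK_OVERLAP = 200   # Characters shared between neighbouring chunks


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`.

    Hypothetical helper for illustration only.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, each window advances by 800 characters, so a 2,500-character document yields three chunks of 1000, 1000 and 900 characters, and the last 200 characters of one chunk reappear at the start of the next. The overlap trades some extra embedding cost for a lower chance that a sentence relevant to a query is cut in half at a chunk boundary.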