feat(claude): Add testing, documentation rules and project hooks

- rules/testing.md: TDD workflow for Go and Python
- rules/documentation.md: Auto-documentation guidelines
- plans/embedding-service-separation.md: Migration plan
- settings.json: Post-edit hooks for docs/tests validation
- .gitignore: Exclude settings.local.json (contains API keys)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

.claude/plans/embedding-service-separation.md (new file, 263 lines)

# Plan: Embedding-Service Separation

## Goal

Split the ML/embedding components out of klausur-service into a standalone `embedding-service`, reducing the klausur-service build time from ~20 minutes to ~30 seconds.

## Current Situation

| Service | Build Time | Image Size | Problem |
|---------|------------|------------|---------|
| klausur-service | ~20 min | ~2.5 GB | PyTorch + sentence-transformers are installed on every build |

## Target Architecture

```
┌─────────────────┐     HTTP     ┌──────────────────┐
│ klausur-service │ ───────────→ │ embedding-service│
│    (FastAPI)    │              │    (FastAPI)     │
│    Port 8086    │              │    Port 8087     │
│    ~200 MB      │              │    ~2.5 GB       │
│   Build: 30s    │              │  Build: 15 min   │
└─────────────────┘              └──────────────────┘
         │                                │
         └───────────────┬────────────────┘
                         ▼
                 ┌───────────────┐
                 │    Qdrant     │
                 │   Port 6333   │
                 └───────────────┘
```

## Phase 1: Create the New Embedding Service

### 1.1 Create the directory structure
```
klausur-service/
├── embedding-service/        # NEW
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py               # FastAPI app
│   ├── eh_pipeline.py        # copy
│   ├── reranker.py           # copy
│   ├── hyde.py               # copy
│   ├── hybrid_search.py      # copy
│   ├── pdf_extraction.py     # copy
│   └── config.py             # embedding configuration
├── backend/                  # existing (will be adapted)
└── frontend/                 # existing
```

### 1.2 Create the files in embedding-service

**requirements.txt** (ML-specific):
```
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
torch>=2.0.0
sentence-transformers>=2.2.0
qdrant-client>=1.7.0
unstructured>=0.12.0
pypdf>=4.0.0
httpx>=0.26.0
pydantic>=2.0.0
python-dotenv>=1.0.0
```

**main.py** - API endpoints:
- `POST /embed` - generate embeddings for a text or a list of texts
- `POST /embed-single` - single embedding
- `POST /rerank` - re-rank search results
- `POST /extract-pdf` - PDF text extraction
- `GET /health` - health check
- `GET /models` - available models

### 1.3 Dockerfile (embedding-service)
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# PyTorch CPU-only for smaller images
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Pre-download models (layer cache)
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-m3')"
RUN python -c "from sentence_transformers import CrossEncoder; CrossEncoder('BAAI/bge-reranker-v2-m3')"

COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8087"]
```

## Phase 2: Adapt klausur-service

### 2.1 Remove ML dependencies from requirements.txt
Remove:
- `torch`
- `sentence-transformers`
- `unstructured`
- `pypdf`

Keep:
- `fastapi`, `uvicorn`, `httpx`
- `qdrant-client` (for search)
- `cryptography` (BYOEH)
- all business-logic dependencies
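
After this step, `backend/requirements.txt` might shrink to roughly the following. This is a sketch: the `cryptography` pin and the exact business-logic entries are assumptions, not taken from the repository.

```
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
httpx>=0.26.0
qdrant-client>=1.7.0
cryptography>=42.0.0
pydantic>=2.0.0
python-dotenv>=1.0.0
# ...plus the remaining business-logic dependencies
```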

### 2.2 Create the embedding client
New file `backend/embedding_client.py`:
```python
import httpx


class EmbeddingClient:
    def __init__(self, base_url: str = "http://embedding-service:8087"):
        self.base_url = base_url

    async def generate_embeddings(self, texts: list[str]) -> list[list[float]]:
        async with httpx.AsyncClient() as client:
            response = await client.post(f"{self.base_url}/embed", json={"texts": texts})
            response.raise_for_status()  # surface HTTP errors instead of parsing an error body
            return response.json()["embeddings"]

    async def rerank(self, query: str, documents: list[str]) -> list[dict]:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.base_url}/rerank",
                json={"query": query, "documents": documents},
            )
            response.raise_for_status()
            return response.json()["results"]
```

### 2.3 Redirect existing calls
Files to adapt:
- `backend/main.py`: `generate_single_embedding()` → `embedding_client.generate_embeddings()`
- `backend/admin_api.py`: route embedding calls through the client
- `backend/qdrant_service.py`: stays in place for search; indexing uses the client

## Phase 3: Docker Compose Integration

### 3.1 Extend docker-compose.dev.yml
```yaml
services:
  klausur-service:
    build:
      context: ./klausur-service
      dockerfile: Dockerfile
    ports:
      - "8086:8086"
    environment:
      - EMBEDDING_SERVICE_URL=http://embedding-service:8087
    depends_on:
      - embedding-service
      - qdrant

  embedding-service:
    build:
      context: ./klausur-service/embedding-service
      dockerfile: Dockerfile
    ports:
      - "8087:8087"
    environment:
      - EMBEDDING_BACKEND=local
      - LOCAL_EMBEDDING_MODEL=BAAI/bge-m3
      - LOCAL_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
    volumes:
      - embedding-models:/root/.cache/huggingface  # model cache
    deploy:
      resources:
        limits:
          memory: 4G

  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  embedding-models:
  qdrant-data:
```

## Phase 4: Tests and Validation

### 4.1 Unit tests for the embedding service
- Test embedding generation
- Test re-ranking
- Test PDF extraction
- Test the health endpoint

### 4.2 Integration tests
- Test klausur-service → embedding-service communication
- Test a RAG query end-to-end
- Test EH upload with embedding

### 4.3 Performance validation
- Measure klausur-service build time (target: <1 min)
- Measure embedding latency (target: <500 ms for a single embedding)
- Measure re-ranking latency (target: <1 s for 10 documents)
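
The latency targets above can be probed with a small stdlib helper. This is a sketch: `embed_one` is a hypothetical stand-in for a real `POST /embed-single` round trip.

```python
import time

def measure_ms(fn, *args) -> float:
    """Run fn once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000.0

def embed_one(text: str) -> list[float]:
    # Hypothetical stand-in for an HTTP call to POST /embed-single.
    time.sleep(0.01)
    return [0.0] * 1024

latency_ms = measure_ms(embed_one, "Beispieltext")  # compare against the 500 ms target
```

In practice the probe would run many iterations and report a percentile, since a single measurement is dominated by cold-start effects.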

## Implementation Order

1. **embedding-service/main.py** - FastAPI app with endpoints
2. **embedding-service/config.py** - configuration
3. **embedding-service/requirements.txt** - dependencies
4. **embedding-service/Dockerfile** - container build
5. **backend/embedding_client.py** - HTTP client
6. **backend/requirements.txt** - remove ML deps
7. **backend/main.py** - redirect calls
8. **backend/admin_api.py** - redirect calls
9. **docker-compose.dev.yml** - add the service
10. **Tests** - validation

## Files to Move (Reference)

| File | Lines | Action |
|------|-------|--------|
| eh_pipeline.py | 777 | Copy → embedding-service |
| reranker.py | 253 | Copy → embedding-service |
| hyde.py | 209 | Copy → embedding-service |
| hybrid_search.py | 285 | Copy → embedding-service |
| pdf_extraction.py | 479 | Copy → embedding-service |

## Environment Variables

### embedding-service
```
EMBEDDING_BACKEND=local
LOCAL_EMBEDDING_MODEL=BAAI/bge-m3
LOCAL_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
OPENAI_API_KEY=          # optional, for the OpenAI backend
PDF_EXTRACTION_BACKEND=auto
LOG_LEVEL=INFO
```
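
These variables could be read in the planned `embedding-service/config.py` roughly like this (a sketch; the defaults are assumptions mirrored from the list above):

```python
import os

# Central configuration, read once at import time (defaults assumed).
EMBEDDING_BACKEND = os.getenv("EMBEDDING_BACKEND", "local")
LOCAL_EMBEDDING_MODEL = os.getenv("LOCAL_EMBEDDING_MODEL", "BAAI/bge-m3")
LOCAL_RERANKER_MODEL = os.getenv("LOCAL_RERANKER_MODEL", "BAAI/bge-reranker-v2-m3")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")  # optional, OpenAI backend only
PDF_EXTRACTION_BACKEND = os.getenv("PDF_EXTRACTION_BACKEND", "auto")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```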

### klausur-service (new)
```
EMBEDDING_SERVICE_URL=http://embedding-service:8087
```

## Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Network latency between services | Model caching, connection pooling |
| embedding-service unreachable | Health checks, retry logic, graceful degradation |
| Inconsistent embedding models | Versioning, model-hash verification |
| Higher RAM usage (2 containers) | Memory limits in Docker, model offloading |
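
The retry logic mentioned in the table could be sketched as a small asyncio helper; the attempt count and backoff delays below are assumptions, not decided values.

```python
import asyncio

async def with_retries(make_call, attempts: int = 3, base_delay: float = 0.2):
    """Await make_call(); on failure, retry with exponential backoff.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Wrapped around the client's POST calls, a short embedding-service outage then degrades to a delayed response instead of an immediate failure.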

## Expected Results

| Metric | Before | After |
|--------|--------|-------|
| Build time klausur-service | ~20 min | ~30 s |
| Build time embedding-service | - | ~15 min |
| Image size klausur-service | ~2.5 GB | ~200 MB |
| Image size embedding-service | - | ~2.5 GB |
| Developer iteration | slow | fast |

## Don't Forget (Task A)

After the service separation is complete:
- [ ] Test the EH upload wizard with a sample exam
- [ ] Verify the security infobox in the wizard
- [ ] Test an end-to-end RAG query