feat(claude): Add comprehensive project context and development rules
- CLAUDE.md: Complete project context with SSH connection, 49 services, all URLs (including SDK modules), tech stack, and core principles
- open-source-policy.md: License whitelist, SBOM workflow, dependency checks
- compliance-checklist.md: DSGVO/AI Act checklists, 5-question quick check
- debug-framework.md: 6-phase systematic debugging with Breakpilot-specific commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
237
.claude/skills/debug-framework.md
Normal file
@@ -0,0 +1,237 @@
# Systematic Debug Framework

## Trigger

Activate this skill for:
- Error messages / exceptions
- Unexpected behavior
- Performance problems
- "It doesn't work"

---

## Phase 1: Reproduce (5 min max)

### Goal: Turn the bug into a test

```bash
# 1. Document the exact steps
# 2. Note the error message/symptom
# 3. Identify the environment
```

**Questions:**
- [ ] Is the bug reproducible?
- [ ] Does it occur only in certain environments?
- [ ] Since when has it occurred? (last deploy?)

### Breakpilot-specific: Which service?

```bash
# Check container status
ssh macmini "docker compose ps | grep -E '(Exit|unhealthy)'"

# Logs from the last 5 minutes
ssh macmini "docker compose logs --since 5m <service-name>"
```

| Symptom | Likely service |
|---------|----------------|
| Login failed | consent-service, backend |
| 502 Bad Gateway | nginx, upstream-service |
| Slow search | qdrant, embedding-service |
| Upload error | minio, backend |
| OCR error | paddleocr-service, klausur-service |
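
The table can also be kept as a small lookup so that the first guess is repeatable. The following is a minimal sketch, not part of the commit: the function name `likely_services` and the case-insensitive substring matching are assumptions made for illustration; only the symptom/service pairs come from the table.

```python
# Symptom -> likely services, copied from the table above (keys lower-cased).
SYMPTOM_TO_SERVICES: dict[str, list[str]] = {
    "login failed": ["consent-service", "backend"],
    "502 bad gateway": ["nginx", "upstream-service"],
    "slow search": ["qdrant", "embedding-service"],
    "upload error": ["minio", "backend"],
    "ocr error": ["paddleocr-service", "klausur-service"],
}


def likely_services(description: str) -> list[str]:
    """Return the services to inspect first for a free-text symptom.

    Hypothetical helper: table keys are matched as case-insensitive substrings.
    """
    text = description.lower()
    hits: list[str] = []
    for symptom, services in SYMPTOM_TO_SERVICES.items():
        if symptom in text:
            # Keep table order and avoid listing a service twice.
            hits.extend(s for s in services if s not in hits)
    return hits


print(likely_services("Users report 502 Bad Gateway after deploy"))
```

A description that matches no row returns an empty list, which is a signal to go back to Phase 1 and describe the symptom more precisely.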

---

## Phase 2: Form hypotheses (5 min max)

### List 3-5 possible causes

| # | Hypothesis | Likelihood | Test |
|---|------------|------------|------|
| 1 | | High/Medium/Low | |
| 2 | | | |
| 3 | | | |
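
To illustrate how the table might be filled in and ordered, here is a small sketch. The `Hypothesis` dataclass, the `rank` function, and the three example rows are assumptions for illustration; the columns mirror the table above.

```python
from dataclasses import dataclass

# Sort order for the likelihood column; lower values sort first.
LIKELIHOOD_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Hypothesis:
    description: str
    likelihood: str  # "high", "medium" or "low"
    test: str        # how the hypothesis will be checked


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses ordered from most to least likely (stable sort)."""
    return sorted(hypotheses, key=lambda h: LIKELIHOOD_RANK[h.likelihood])


# Example rows (made up for illustration).
candidates = [
    Hypothesis("Rate limit reached", "low", "inspect response headers"),
    Hypothesis("Connection pool exhausted", "high", "count open DB connections"),
    Hypothesis("JWT expired", "medium", "decode the token and check exp"),
]

for number, h in enumerate(rank(candidates), start=1):
    print(f"| {number} | {h.description} | {h.likelihood} | {h.test} |")
```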

### Common causes in Breakpilot

**Container/Docker:**
- Container not started
- Volume mount problem
- Network between containers interrupted
- Resource limits reached

**Database:**
- Connection pool exhausted
- Migration not run
- Deadlock

**API/Backend:**
- JWT expired
- CORS error
- Rate limit reached
- Wrong Content-Type

**Frontend:**
- Cache problem (Safari!)
- Build not updated
- Environment variable missing

---

## Phase 3: Eliminate systematically (10 min max)

### Order: Fastest tests first

```bash
# 1. Is the service reachable at all?
curl -s https://macmini:8000/health | jq .

# 2. Check container logs for errors
ssh macmini "docker compose logs --tail 50 <service> 2>&1 | grep -iE '(error|exception|failed|traceback)'"

# 3. Test the database connection
ssh macmini "docker exec breakpilot-pwa-postgres pg_isready"

# 4. Is Redis/Valkey reachable?
ssh macmini "docker exec breakpilot-pwa-valkey valkey-cli ping"
```
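
When the logs have already been saved to a file or captured in a script, the same filter as the `grep -iE` call can be applied in Python. This is a sketch under assumptions: `error_lines` is a hypothetical helper, and the sample lines are made up for illustration and are not real Breakpilot output.

```python
import re

# Same alternatives as the grep -iE pattern, matched case-insensitively.
ERROR_PATTERN = re.compile(r"(error|exception|failed|traceback)", re.IGNORECASE)


def error_lines(log_text: str) -> list[str]:
    """Return the log lines that match the error pattern, in original order."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


# Made-up sample lines, only to show the filter at work.
sample = """\
backend  | INFO  request finished in 12 ms
backend  | ERROR could not connect to postgres
backend  | Traceback (most recent call last):
backend  | INFO  health check ok
"""

for line in error_lines(sample):
    print(line)
```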

### Test each hypothesis

For each hypothesis:
1. **Define the test** (how do we check this?)
2. **Run the test**
3. **Result:** Confirmed ✅ / Refuted ❌
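
The define/run/record loop, combined with the "fastest tests first" ordering, can be sketched as follows. Everything here is an assumption for illustration: the `Check` tuple, the `cost_seconds` estimates, and the lambdas that stand in for real tests.

```python
from typing import Callable, NamedTuple, Optional


class Check(NamedTuple):
    hypothesis: str
    cost_seconds: float      # rough estimate of how long the test takes
    run: Callable[[], bool]  # returns True if the hypothesis is confirmed


def eliminate(checks: list[Check]) -> tuple[Optional[str], list[str]]:
    """Run the cheapest checks first and stop at the first confirmed hypothesis.

    Returns (confirmed hypothesis or None, hypotheses refuted along the way).
    """
    refuted: list[str] = []
    for check in sorted(checks, key=lambda c: c.cost_seconds):
        if check.run():
            return check.hypothesis, refuted
        refuted.append(check.hypothesis)
    return None, refuted


# The lambdas are placeholders for real tests such as the commands above.
checks = [
    Check("Migration not run", 60, lambda: True),
    Check("Container not started", 5, lambda: False),
    Check("Connection pool exhausted", 30, lambda: False),
]
confirmed, refuted = eliminate(checks)
print("Confirmed:", confirmed)
print("Refuted:", refuted)
```

If nothing is confirmed, the function returns `None`, which corresponds to going back to Phase 2 and forming new hypotheses.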

---

## Phase 4: Identify the root cause

### Don't treat the symptom!

**Bad:** "I'll restart the container"
**Good:** "The container crashes because of OOM → raise the memory limit"

### 5 Whys method

```
Problem: API returns 500
↓ Why?
Database query failed
↓ Why?
Connection pool exhausted
↓ Why?
Connections are not released
↓ Why?
Exception before connection.close()
↓ Why?
Missing try/finally
→ ROOT CAUSE: Missing resource cleanup
```

---

## Phase 5: Implement the fix

### Checklist before the fix

- [ ] Root cause understood (not just the symptom)
- [ ] Fix addresses the root cause
- [ ] No side effects expected
- [ ] Test written that prevents regression

### Fix template

```python
# BEFORE: bug
def fetch_data():
    conn = get_connection()
    result = conn.query(...)  # Exception here → leak!
    conn.close()
    return result

# AFTER: fix with explanation
def fetch_data():
    """Fetch data with proper connection handling.

    Fixed: Connection leak when query raises exception.
    See: https://macmini:3003/.../issues/123
    """
    conn = get_connection()
    try:
        result = conn.query(...)
        return result
    finally:
        conn.close()  # Always executed
```
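
An alternative to a hand-written `try/finally` is the standard library's `contextlib.closing`, which calls `close()` when the `with` block exits. The sketch below is not the project's code: `FakeConnection` is a hypothetical stand-in so the example runs without a database, and `fetch_data` takes the connection as a parameter for the same reason.

```python
from contextlib import closing


class FakeConnection:
    """Stand-in for a real connection; hypothetical, for illustration only."""

    def __init__(self) -> None:
        self.closed = False

    def query(self, sql: str) -> list[str]:
        if "bad" in sql:
            raise RuntimeError("query failed")
        return ["row"]

    def close(self) -> None:
        self.closed = True


def fetch_data(conn: FakeConnection, sql: str) -> list[str]:
    # closing() calls conn.close() on exit, whether or not query() raises.
    with closing(conn):
        return conn.query(sql)


conn = FakeConnection()
try:
    fetch_data(conn, "bad query")
except RuntimeError:
    pass
print("closed after error:", conn.closed)
```

Both forms give the same guarantee. `closing` is mainly useful when the object has a `close()` method but does not itself implement the context manager protocol.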

---

## Phase 6: Prevent regression

### Write a test

```python
import pytest

# get_pool_size, fetch_data_with_bad_query and DatabaseError are project
# helpers that must be imported from the code under test.


def test_connection_released_on_error():
    """Regression test for connection leak bug.

    Issue: #123
    Root cause: Missing finally block in fetch_data()
    """
    initial_connections = get_pool_size()

    with pytest.raises(DatabaseError):
        fetch_data_with_bad_query()

    # Connection should be returned to pool
    assert get_pool_size() == initial_connections
```
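
The test above relies on helpers that are not shown in this file. To make the idea concrete, here is a self-contained sketch with stand-ins: the `Pool` class and the bodies of `get_pool_size`, `fetch_data_with_bad_query`, and `DatabaseError` are assumptions for illustration and say nothing about how Breakpilot implements them. It uses plain `try/except` instead of pytest so that it runs on its own.

```python
class DatabaseError(Exception):
    """Stand-in for the database driver's error type (assumption)."""


class Pool:
    """Tiny in-memory pool that only counts free connections."""

    def __init__(self, size: int) -> None:
        self.free = size

    def acquire(self) -> None:
        if self.free == 0:
            raise DatabaseError("pool exhausted")
        self.free -= 1

    def release(self) -> None:
        self.free += 1


POOL = Pool(size=2)


def get_pool_size() -> int:
    """Number of connections currently available in the pool."""
    return POOL.free


def fetch_data_with_bad_query() -> None:
    POOL.acquire()
    try:
        raise DatabaseError("syntax error in query")
    finally:
        POOL.release()  # without this line the check below returns False


def connection_released_on_error() -> bool:
    before = get_pool_size()
    try:
        fetch_data_with_bad_query()
    except DatabaseError:
        pass
    return get_pool_size() == before


print("connection released:", connection_released_on_error())
```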

### Document

```bash
# What did we learn?
# → add to docs-src/development/debugging-notes.md
```

---

## Quick Reference: Breakpilot Debug Commands

```bash
# Status of all containers
ssh macmini "docker compose ps"

# Logs of a service (live)
ssh macmini "docker compose logs -f <service>"

# Open a shell in a container (-t allocates the TTY that docker exec -it needs)
ssh -t macmini "docker exec -it breakpilot-pwa-<service> sh"

# PostgreSQL query
ssh macmini "docker exec breakpilot-pwa-postgres psql -U breakpilot -d breakpilot_db -c 'SELECT 1'"

# Network debugging
ssh macmini "docker exec breakpilot-pwa-backend curl -s http://consent-service:8081/health"

# Resource usage
ssh macmini "docker stats --no-stream"

# Vault status
ssh macmini "docker exec breakpilot-pwa-vault vault status"
```
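
If these commands are called from a script rather than typed by hand, a thin wrapper keeps the `ssh macmini "<command>"` pattern in one place. This is a sketch under assumptions: `remote_argv` and `run_remote` are hypothetical helper names, and only the host alias `macmini` and the command strings come from this document.

```python
import subprocess

HOST = "macmini"  # SSH host alias used throughout this document


def remote_argv(command: str, tty: bool = False) -> list[str]:
    """Build the argv for running `command` on HOST via ssh.

    tty=True adds -t, which interactive commands such as `docker exec -it` need.
    """
    argv = ["ssh"]
    if tty:
        argv.append("-t")
    argv += [HOST, command]
    return argv


def run_remote(command: str) -> str:
    """Run a non-interactive command on HOST and return its stdout.

    Raises subprocess.CalledProcessError if the remote command exits non-zero.
    """
    result = subprocess.run(
        remote_argv(command), capture_output=True, text=True, check=True
    )
    return result.stdout


# Building the argv has no side effects, so it is safe to inspect or log.
print(remote_argv("docker compose ps"))
```

Passing the remote command as a single argv element avoids a second round of local shell quoting. The remote shell still interprets it, so pipes and quotes inside the string behave as in the commands above.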

---

## Avoid anti-patterns

| ❌ Don't | ✅ Instead |
|----------|-----------|
| Change code at random | Form a hypothesis, then test it |
| console.log everywhere | Log selectively at suspicious places |
| Restart the container and hope | Find the root cause |
| Debug alone for hours | Get help after 30 min |
| Fix without a test | Always write a regression test |