fix: Restore all files lost during destructive rebase

A previous `git pull --rebase origin main` dropped 177 local commits,
losing 3400+ files across admin-v2, backend, studio-v2, website,
klausur-service, and many other services. The partial restore attempt
(660295e2) only recovered some files.

This commit restores all missing files from pre-rebase ref 98933f5e
while preserving post-rebase additions (night-scheduler, night-mode UI,
NightModeWidget dashboard integration).

Restored features include:
- AI Module Sidebar (FAB), OCR Labeling, OCR Compare
- GPU Dashboard, RAG Pipeline, Magic Help
- Klausur-Korrektur (8 files), Abitur-Archiv (5+ files)
- Companion, Zeugnisse-Crawler, Screen Flow
- Full backend, studio-v2, website, klausur-service
- All compliance SDKs, agent-core, voice-service
- CI/CD configs, documentation, scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
385
backend/docs/llm-platform/api/vast-ai-api.md
Normal file
@@ -0,0 +1,385 @@
# vast.ai GPU Infrastructure API Documentation

**Version:** 0.1.0
**Base URL:** `/infra/vast`

---

## Overview

The vast.ai Infrastructure API lets you control GPU instances directly from the Admin Panel. Features:

- **Start/Stop**: Switch the GPU instance on and off
- **Auto-Shutdown**: Automatically stop the instance when it is idle (cost control)
- **Cost tracking**: Runtime and cost per session
- **Audit log**: Records all actions

---

## Authentication

All endpoints require the `CONTROL_API_KEY` in the request header:

```
X-API-Key: <CONTROL_API_KEY>
```
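
As a minimal sketch of how a client might attach this header, the following builds a request with Python's standard library without sending it. The host and port in `BASE_URL` and the helper name `build_request` are assumptions for illustration, not part of the documented API.

```python
import urllib.request

# Assumption: the backend is reachable at this host and port.
BASE_URL = "http://localhost:8000/infra/vast"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request that carries the X-API-Key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method=method,
        headers={"X-API-Key": api_key},
    )


req = build_request("/status", api_key="your-control-key")
# urllib stores the header name as "X-api-key"; HTTP header names are case-insensitive.
print(req.get_method(), req.full_url, req.header_items())
# To actually send it: urllib.request.urlopen(req, timeout=10)
```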

---

## Endpoints

### GET /infra/vast/status

Returns the current status of the vast.ai instance.

**Response (200):**

```json
{
  "instance_id": 12345,
  "status": "running",
  "gpu_name": "RTX 3090",
  "dph_total": 0.45,
  "endpoint_base_url": "http://10.0.0.1:8001",
  "last_activity": "2024-01-15T10:30:00Z",
  "auto_shutdown_in_minutes": 25,
  "total_runtime_hours": 2.5,
  "total_cost_usd": 1.12,
  "message": null
}
```

**Status values:**

| Status | Description |
|--------|-------------|
| `running` | Instance is running |
| `stopped` | Instance stopped (disk is retained) |
| `exited` | Instance has exited |
| `loading` | Instance is starting |
| `scheduling` | Waiting for GPU allocation |
| `creating` | Instance is being created |
| `unconfigured` | `VAST_API_KEY` not set |
| `not_found` | Instance ID not found |
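
A client usually wants to collapse these values into a few coarse states. The sketch below shows one way to do that; only the status strings come from the table, while the grouping and the helper names `classify_status` and `can_accept_requests` are assumptions made for illustration.

```python
# Assumption: this grouping is an interpretation of the status table, not backend logic.
STATUS_GROUPS = {
    "ready": {"running"},
    "starting": {"loading", "scheduling", "creating"},
    "off": {"stopped", "exited"},
    "misconfigured": {"unconfigured", "not_found"},
}


def classify_status(status: str) -> str:
    """Map a raw status string to a coarse group, or 'unknown' if unrecognized."""
    for group, members in STATUS_GROUPS.items():
        if status in members:
            return group
    return "unknown"


def can_accept_requests(status_payload: dict) -> bool:
    """True only when the status response reports a running instance."""
    return classify_status(status_payload.get("status", "")) == "ready"


print(classify_status("scheduling"))
print(can_accept_requests({"status": "running", "instance_id": 12345}))
```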

---

### POST /infra/vast/power/on

Starts the vast.ai instance.

**Request Body:**

```json
{
  "wait_for_health": true,
  "health_path": "/health",
  "health_port": 8001
}
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `wait_for_health` | boolean | true | Wait until the LLM server is reachable |
| `health_path` | string | "/health" | Health check endpoint |
| `health_port` | integer | 8001 | Port for the health check |

**Response (200):**

```json
{
  "status": "running",
  "instance_id": 12345,
  "endpoint_base_url": "http://10.0.0.1:8001",
  "health_url": "http://10.0.0.1:8001/health",
  "message": "Instance running and healthy"
}
```

**Errors:**

| Code | Description |
|------|-------------|
| 401 | Unauthorized (wrong API key) |
| 500 | `VAST_API_KEY` or `VAST_INSTANCE_ID` not configured |
| 502 | vast.ai API error |
| 504 | Health check timeout |
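
The `wait_for_health` behavior can be pictured as a polling loop with a deadline. The sketch below is a guess at that general shape, not the backend's actual implementation: `wait_for_health`, its parameters, and the 5-second interval are hypothetical, and the 600-second default simply mirrors the `VAST_WAIT_TIMEOUT_S` example value. A `False` result would correspond to the 504 case in the table above.

```python
import time
from typing import Callable


def wait_for_health(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True  # health endpoint answered
        if clock() >= deadline:
            return False  # caller would report a health check timeout
        sleep(interval_s)


# Demo with a fake probe that succeeds on the third attempt (no network needed).
attempts = iter([False, False, True])
print(wait_for_health(lambda: next(attempts), timeout_s=60, interval_s=0.01))
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; a real probe would issue an HTTP GET against `health_url`.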

---

### POST /infra/vast/power/off

Stops the vast.ai instance (the disk is retained).

**Request Body:**

```json
{}
```

**Response (200):**

```json
{
  "status": "stopped",
  "session_runtime_minutes": 45.5,
  "session_cost_usd": 0.34,
  "message": "Instance stopped. Session: 45.5 min, $0.340"
}
```
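
The example figures are consistent with a session cost of runtime hours multiplied by the hourly rate (`dph_total` from the status response). The sketch below reproduces that arithmetic; the formula is an assumption inferred from the sample values, and `session_cost_usd` here is a hypothetical helper, not the backend function.

```python
def session_cost_usd(runtime_minutes: float, dph_total: float) -> float:
    """Session cost, assuming billing is proportional to runtime at dph_total USD/hour."""
    return round(runtime_minutes / 60.0 * dph_total, 2)


runtime = 45.5
cost = session_cost_usd(runtime, dph_total=0.45)
print(f"Instance stopped. Session: {runtime:.1f} min, ${cost:.3f}")
```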

---

### POST /infra/vast/activity

Records activity and postpones the auto-shutdown timer.

**Usage:** Should be called by the LLM Gateway on every request.

**Response (200):**

```json
{
  "status": "recorded",
  "last_activity": "2024-01-15T10:30:00Z"
}
```

---

### GET /infra/vast/costs

Returns cost statistics.

**Response (200):**

```json
{
  "total_runtime_hours": 12.5,
  "total_cost_usd": 5.62,
  "sessions_count": 5,
  "avg_session_minutes": 150.0
}
```
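
To make the relationship between these fields concrete, here is a sketch that derives them from a list of per-session records. The `Session` record and `summarize` helper are hypothetical; how the backend actually stores sessions is not described here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    runtime_minutes: float
    cost_usd: float


def summarize(sessions: list[Session]) -> dict:
    """Aggregate per-session records into the fields of the costs response."""
    total_minutes = sum(s.runtime_minutes for s in sessions)
    count = len(sessions)
    return {
        "total_runtime_hours": round(total_minutes / 60.0, 2),
        "total_cost_usd": round(sum(s.cost_usd for s in sessions), 2),
        "sessions_count": count,
        "avg_session_minutes": round(total_minutes / count, 1) if count else 0.0,
    }


# Five sessions totalling 750 minutes, each billed at an assumed 0.45 USD/hour.
demo = [Session(m, m / 60.0 * 0.45) for m in (120, 180, 150, 90, 210)]
print(summarize(demo))
```

With these inputs the runtime fields match the example response (12.5 hours, 5 sessions, 150.0 minutes on average).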

---

### GET /infra/vast/audit

Returns the most recent audit log entries.

**Query parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `limit` | integer | 50 | Max. number of entries |

**Response (200):**

```json
[
  {
    "ts": "2024-01-15T10:30:00Z",
    "event": "power_on_complete",
    "actor": "system",
    "meta": {
      "instance_id": 12345,
      "endpoint": "http://10.0.0.1:8001"
    }
  },
  {
    "ts": "2024-01-15T09:00:00Z",
    "event": "auto_shutdown",
    "actor": "system",
    "meta": {
      "inactive_minutes": 30.5
    }
  }
]
```

**Event types:**

| Event | Description |
|-------|-------------|
| `power_on_requested` | Start requested |
| `power_on_complete` | Start completed |
| `power_on_health_timeout` | Health check failed |
| `power_off_requested` | Stop requested |
| `power_off_complete` | Stop completed |
| `auto_shutdown` | Automatic stop due to inactivity |
| `auto_shutdown_complete` | Auto-stop completed |
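
A consumer of the audit log will typically parse the timestamps and filter by event type. The sketch below does both on the example response; `parse_ts` and `events_of_type` are hypothetical helper names.

```python
import json
from datetime import datetime

SAMPLE = """
[
  {"ts": "2024-01-15T10:30:00Z", "event": "power_on_complete", "actor": "system",
   "meta": {"instance_id": 12345, "endpoint": "http://10.0.0.1:8001"}},
  {"ts": "2024-01-15T09:00:00Z", "event": "auto_shutdown", "actor": "system",
   "meta": {"inactive_minutes": 30.5}}
]
"""


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts "Z" from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def events_of_type(entries: list, event: str) -> list:
    """Return only the entries whose 'event' field equals the given type."""
    return [entry for entry in entries if entry["event"] == event]


entries = json.loads(SAMPLE)
latest = max(entries, key=lambda entry: parse_ts(entry["ts"]))
shutdowns = events_of_type(entries, "auto_shutdown")
print(latest["event"], len(shutdowns))
```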

---

## Auto-Shutdown

The auto-shutdown mechanism stops the instance automatically when it is idle:

1. `/activity` is called on every LLM request
2. A background task checks the last activity every 60 s
3. After `VAST_AUTO_SHUTDOWN_MINUTES` without activity, the instance is stopped
4. Session costs are calculated and logged

**Configuration:**

```bash
VAST_AUTO_SHUTDOWN=true           # enable the feature
VAST_AUTO_SHUTDOWN_MINUTES=30     # timeout in minutes
```
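
The decision made in steps 2 and 3 can be sketched as a pure function over the last activity timestamp. This is an illustrative sketch under the assumption that the timer is simply "timeout minus idle time"; the function names are hypothetical and the 60-second background loop itself is omitted.

```python
from datetime import datetime, timedelta, timezone


def minutes_until_shutdown(last_activity: datetime, now: datetime,
                           timeout_minutes: float = 30) -> float:
    """Minutes left before the idle timeout expires (never negative)."""
    idle_minutes = (now - last_activity).total_seconds() / 60.0
    return max(0.0, timeout_minutes - idle_minutes)


def should_shutdown(last_activity: datetime, now: datetime,
                    timeout_minutes: float = 30, enabled: bool = True) -> bool:
    """True when auto-shutdown is enabled and the idle timeout has expired."""
    return enabled and minutes_until_shutdown(last_activity, now, timeout_minutes) == 0.0


last = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(minutes_until_shutdown(last, last + timedelta(minutes=5)))
print(should_shutdown(last, last + timedelta(minutes=30.5)))
```

A background task would call `should_shutdown` once per check interval and trigger the stop when it returns `True`.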

---

## Configuration

Environment variables in `.env`:

```bash
# vast.ai credentials
VAST_API_KEY=your-vast-api-key    # from https://cloud.vast.ai/cli/
VAST_INSTANCE_ID=12345            # numeric instance ID

# Admin protection
CONTROL_API_KEY=your-control-key  # generate with: openssl rand -hex 32

# Health check
VAST_HEALTH_PORT=8001             # port on the instance
VAST_HEALTH_PATH=/health          # health endpoint
VAST_WAIT_TIMEOUT_S=600           # startup timeout (10 min)

# Auto-shutdown
VAST_AUTO_SHUTDOWN=true
VAST_AUTO_SHUTDOWN_MINUTES=30

# State persistence (optional)
VAST_STATE_PATH=./vast_state.json
VAST_AUDIT_PATH=./vast_audit.log
```
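
One way a service could read these variables is sketched below. `VastSettings` and `load_settings` are hypothetical names, and the fallback defaults simply mirror the example values above; the backend's real defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VastSettings:
    api_key: Optional[str]
    instance_id: Optional[int]
    control_api_key: Optional[str]
    health_port: int
    health_path: str
    wait_timeout_s: int
    auto_shutdown: bool
    auto_shutdown_minutes: int


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> VastSettings:
    """Read the documented variables, falling back to the example values (assumed defaults)."""
    raw_id = env.get("VAST_INSTANCE_ID", "")
    return VastSettings(
        api_key=env.get("VAST_API_KEY") or None,
        instance_id=int(raw_id) if raw_id else None,
        control_api_key=env.get("CONTROL_API_KEY") or None,
        health_port=int(env.get("VAST_HEALTH_PORT", "8001")),
        health_path=env.get("VAST_HEALTH_PATH", "/health"),
        wait_timeout_s=int(env.get("VAST_WAIT_TIMEOUT_S", "600")),
        auto_shutdown=_as_bool(env.get("VAST_AUTO_SHUTDOWN", "true")),
        auto_shutdown_minutes=int(env.get("VAST_AUTO_SHUTDOWN_MINUTES", "30")),
    )


settings = load_settings({"VAST_API_KEY": "vast_example", "VAST_INSTANCE_ID": "12345"})
print(settings)
```

A missing `api_key` or `instance_id` (both `None`) would correspond to the `unconfigured` status and the 500 error described earlier.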

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      Admin Panel (Browser)                      │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  GPU Infra Tab                                          │    │
│  │  [Start] [Stop] [Refresh]     Status: Running           │    │
│  │  GPU: RTX 3090    Cost: $0.45/h    Session: 25 min      │    │
│  │  Auto-Shutdown in: 5 min                                │    │
│  └─────────────────────────────────────────────────────────┘    │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Breakpilot Backend                        │
│  ┌───────────────────┐    ┌───────────────────┐                 │
│  │ /infra/vast/*     │    │ Auto-Shutdown     │                 │
│  │ (FastAPI Router)  │    │ Background Task   │                 │
│  └─────────┬─────────┘    └─────────┬─────────┘                 │
│            │                        │                           │
│            ▼                        ▼                           │
│  ┌─────────────────────────────────────────────┐                │
│  │                VastAIClient                 │                │
│  │        (REST API to vast.ai Console)        │                │
│  └─────────────────────────────────────────────┘                │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                        vast.ai Cloud API                        │
│                 https://console.vast.ai/api/v0/                 │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                      vast.ai GPU Instance                       │
│  ┌───────────────────────────────────────────────────────┐      │
│  │  Docker Container: vLLM                               │      │
│  │  - Model: Mistral-7B-Instruct                         │      │
│  │  - Port 8000: /v1/chat/completions                    │      │
│  │  - Port 8001: /health (nginx proxy)                   │      │
│  └───────────────────────────────────────────────────────┘      │
│  GPU: RTX 3090 (24GB VRAM)                                      │
└─────────────────────────────────────────────────────────────────┘
```

---

## vast.ai Instance Setup

### 1. Book an instance

- Type: **On-Demand** (not Contract)
- GPU: **RTX 3090** (24GB) or **RTX 4090**
- RAM: >= 32 GB
- Disk: >= 150 GB
- Interruptible: **No** (non-interruptible)

### 2. vLLM with systemd autostart

On the vast.ai instance:

```bash
# Create the Docker Compose project directory
mkdir -p ~/llm-stack
cd ~/llm-stack

# Create docker-compose.yml and health-nginx.conf
# (see vast.ai Implementierung.docx)

# Create the systemd service
sudo tee /etc/systemd/system/llm-stack.service > /dev/null <<'EOF'
[Unit]
Description=LLM Stack via Docker Compose
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Adjust this path if ~/llm-stack is not /home/ubuntu/llm-stack (e.g. when running as root)
WorkingDirectory=/home/ubuntu/llm-stack
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable llm-stack.service
sudo systemctl start llm-stack.service
```

### 3. Configure the backend

```bash
# .env
VAST_API_KEY=vast_...
VAST_INSTANCE_ID=12345
# Most .env loaders do not evaluate $(...), so generate the key once with
# `openssl rand -hex 32` and paste the resulting value here.
CONTROL_API_KEY=<output of openssl rand -hex 32>
```

---

## Error handling

| Error | Cause | Solution |
|-------|-------|----------|
| 500: VAST_API_KEY not configured | Environment variable not set | Check `.env` |
| 502: vast CLI failed | vast.ai API error | Check the instance ID |
| 504: Health check timeout | vLLM does not start | SSH into the instance and check the logs |
| Instance stuck in scheduling | GPU not available | Choose a different GPU |

---

## Cost examples

| GPU | $/hour | 1 h test | 8 h day |
|-----|--------|----------|---------|
| RTX 3090 | ~$0.45 | $0.45 | $3.60 |
| RTX 4090 | ~$0.75 | $0.75 | $6.00 |

**With auto-shutdown (30 min):**
- Forgot to switch it off: at most $0.23 (3090) or $0.38 (4090) extra
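
The worst-case figures follow from billing one full idle timeout at the hourly rate. The sketch below reproduces the arithmetic with `Decimal` so that half-cent values round up as in the figures above; `max_idle_cost` is a hypothetical helper and the rates are the approximate values from the table.

```python
from decimal import Decimal, ROUND_HALF_UP


def max_idle_cost(rate_per_hour: str, timeout_minutes: int = 30) -> Decimal:
    """Cost of one full idle timeout at the given hourly rate, rounded to cents."""
    cost = Decimal(rate_per_hour) * Decimal(timeout_minutes) / Decimal(60)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for gpu, rate in (("RTX 3090", "0.45"), ("RTX 4090", "0.75")):
    print(f"{gpu}: at most ${max_idle_cost(rate)} extra")
```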