
# vast.ai GPU Infrastructure API Documentation

**Version:** 0.1.0
**Base URL:** `/infra/vast`

---

## Overview

The vast.ai Infrastructure API controls GPU instances directly from the Admin Panel. Features:

- **Start/Stop**: Power the GPU instance on and off
- **Auto-Shutdown**: Automatically stops the instance after a period of inactivity (cost control)
- **Cost tracking**: Runtime and cost per session
- **Audit log**: Records all actions

---

## Authentication

All endpoints require the `CONTROL_API_KEY` in the `X-API-Key` header:

```
X-API-Key: <CONTROL_API_KEY>
```
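
As a minimal sketch of an authenticated call, the following Python snippet builds a request that carries the `X-API-Key` header, using only the standard library. The helper names `build_request` and `fetch_json` and any host you pass in are assumptions for illustration; only the header name and the `/infra/vast` base path come from this document.

```python
import json
import urllib.request

BASE_PATH = "/infra/vast"  # base URL from this document


def build_request(host: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for an /infra/vast endpoint (hypothetical helper)."""
    url = f"{host.rstrip('/')}{BASE_PATH}{path}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


def fetch_json(req: urllib.request.Request, timeout_s: float = 10.0):
    """Send the request and decode the JSON body; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

A call such as `fetch_json(build_request("http://localhost:8000", "/status", key))` would then return the decoded status object, assuming the backend is reachable at that host.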

---

## Endpoints

### GET /infra/vast/status

Returns the current status of the vast.ai instance.

**Response (200):**

```json
{
  "instance_id": 12345,
  "status": "running",
  "gpu_name": "RTX 3090",
  "dph_total": 0.45,
  "endpoint_base_url": "http://10.0.0.1:8001",
  "last_activity": "2024-01-15T10:30:00Z",
  "auto_shutdown_in_minutes": 25,
  "total_runtime_hours": 2.5,
  "total_cost_usd": 1.12,
  "message": null
}
```

**Status values:**

| Status | Description |
|--------|-------------|
| `running` | Instance is running |
| `stopped` | Instance stopped (disk is preserved) |
| `exited` | Instance has exited |
| `loading` | Instance is starting |
| `scheduling` | Waiting for GPU allocation |
| `creating` | Instance is being created |
| `unconfigured` | `VAST_API_KEY` is not set |
| `not_found` | Instance ID not found |

---

### POST /infra/vast/power/on

Starts the vast.ai instance.

**Request Body:**

```json
{
  "wait_for_health": true,
  "health_path": "/health",
  "health_port": 8001
}
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `wait_for_health` | boolean | true | Wait until the LLM server is reachable |
| `health_path` | string | "/health" | Health-check endpoint |
| `health_port` | integer | 8001 | Port for the health check |

**Response (200):**

```json
{
  "status": "running",
  "instance_id": 12345,
  "endpoint_base_url": "http://10.0.0.1:8001",
  "health_url": "http://10.0.0.1:8001/health",
  "message": "Instance running and healthy"
}
```

**Errors:**

| Code | Description |
|------|-------------|
| 401 | Unauthorized (wrong API key) |
| 500 | `VAST_API_KEY` or `VAST_INSTANCE_ID` not configured |
| 502 | vast.ai API error |
| 504 | Health-check timeout |

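The `wait_for_health` behaviour can be pictured as a polling loop that gives up after a timeout, which is the situation the 504 response describes. The sketch below is an assumption about how such a loop might look, not the backend's actual code; `wait_until_healthy`, `probe`, and the 5-second interval are hypothetical, while the 600-second default mirrors the `VAST_WAIT_TIMEOUT_S` example in this document.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,   # mirrors the VAST_WAIT_TIMEOUT_S example value
    interval_s: float = 5.0,    # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False  # a caller could map this to a 504 health-check timeout
        sleep(interval_s)
```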
---

### POST /infra/vast/power/off

Stops the vast.ai instance (the disk is preserved).

**Request Body:**

```json
{}
```

**Response (200):**

```json
{
  "status": "stopped",
  "session_runtime_minutes": 45.5,
  "session_cost_usd": 0.34,
  "message": "Instance stopped. Session: 45.5 min, $0.340"
}
```
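
The session figures in this response are consistent with multiplying the runtime by the hourly rate (`dph_total` from the status endpoint). The following is a rough sketch under that assumption, using the $0.45/h rate from the status example; the function names and the rounding to cents are hypothetical, not confirmed backend behaviour.

```python
def session_cost_usd(runtime_minutes: float, dph_total: float) -> float:
    """Cost of one session: hours run times dollars per hour, rounded to cents."""
    return round(runtime_minutes / 60.0 * dph_total, 2)


def power_off_message(runtime_minutes: float, cost_usd: float) -> str:
    """Format a message in the style of the example response."""
    return f"Instance stopped. Session: {runtime_minutes:.1f} min, ${cost_usd:.3f}"
```

With these assumptions, a 45.5-minute session at $0.45/h comes to $0.34, matching the example values.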

---

### POST /infra/vast/activity

Records activity and postpones the auto-shutdown timer.

**Usage:** The LLM Gateway should call this endpoint on every request.

**Response (200):**

```json
{
  "status": "recorded",
  "last_activity": "2024-01-15T10:30:00Z"
}
```

---

### GET /infra/vast/costs

Returns cost statistics.

**Response (200):**

```json
{
  "total_runtime_hours": 12.5,
  "total_cost_usd": 5.62,
  "sessions_count": 5,
  "avg_session_minutes": 150.0
}
```
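
The fields in this response relate to each other arithmetically: in the example, 12.5 hours over 5 sessions gives an average of 150 minutes. A sketch of how such statistics could be aggregated from per-session records follows; the `(runtime_minutes, cost_usd)` tuple layout and the name `summarize_sessions` are assumptions, since the document does not describe how sessions are stored.

```python
from typing import Iterable, Tuple


def summarize_sessions(sessions: Iterable[Tuple[float, float]]) -> dict:
    """Aggregate (runtime_minutes, cost_usd) pairs into the /costs response shape."""
    sessions = list(sessions)
    total_minutes = sum(minutes for minutes, _ in sessions)
    total_cost = sum(cost for _, cost in sessions)
    count = len(sessions)
    return {
        "total_runtime_hours": round(total_minutes / 60.0, 2),
        "total_cost_usd": round(total_cost, 2),
        "sessions_count": count,
        # Guard against division by zero when no session has been recorded yet.
        "avg_session_minutes": round(total_minutes / count, 1) if count else 0.0,
    }
```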

---

### GET /infra/vast/audit

Returns the most recent audit log entries.

**Query parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `limit` | integer | 50 | Maximum number of entries |

**Response (200):**

```json
[
  {
    "ts": "2024-01-15T10:30:00Z",
    "event": "power_on_complete",
    "actor": "system",
    "meta": {
      "instance_id": 12345,
      "endpoint": "http://10.0.0.1:8001"
    }
  },
  {
    "ts": "2024-01-15T09:00:00Z",
    "event": "auto_shutdown",
    "actor": "system",
    "meta": {
      "inactive_minutes": 30.5
    }
  }
]
```

**Event types:**

| Event | Description |
|-------|-------------|
| `power_on_requested` | Start requested |
| `power_on_complete` | Start completed |
| `power_on_health_timeout` | Health check failed |
| `power_off_requested` | Stop requested |
| `power_off_complete` | Stop completed |
| `auto_shutdown` | Automatic stop due to inactivity |
| `auto_shutdown_complete` | Automatic stop completed |

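A client that consumes the audit response may want to narrow it down to one family of events. The sketch below filters entries by event-name prefix and orders them newest first; `parse_ts` and `filter_events` are hypothetical helpers, and the only assumption about the data is the entry shape shown in the example response.

```python
from datetime import datetime
from typing import Iterable, List


def parse_ts(ts: str) -> datetime:
    """Parse the ISO 8601 timestamps used in the audit log ('Z' means UTC)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def filter_events(entries: Iterable[dict], prefix: str) -> List[dict]:
    """Keep entries whose event name starts with `prefix`, newest first."""
    matching = [e for e in entries if e["event"].startswith(prefix)]
    return sorted(matching, key=lambda e: parse_ts(e["ts"]), reverse=True)
```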
---

## Auto-Shutdown

The auto-shutdown mechanism stops the instance automatically when it is idle:

1. Every LLM request triggers a call to `/activity`
2. A background task checks the last activity every 60 s
3. After `VAST_AUTO_SHUTDOWN_MINUTES` without activity, the instance is stopped
4. Session costs are calculated and logged

**Configuration:**

```bash
VAST_AUTO_SHUTDOWN=true           # enable the feature
VAST_AUTO_SHUTDOWN_MINUTES=30     # timeout in minutes
```
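
The idle check at the heart of this mechanism can be sketched as two small functions: one for the remaining time (as reported in `auto_shutdown_in_minutes`) and one for the stop decision. This is an illustration under the assumption that the backend compares the current time with the last recorded activity; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def minutes_until_shutdown(last_activity: datetime, now: datetime, timeout_minutes: float) -> float:
    """Remaining minutes before the idle timeout is reached (never negative)."""
    idle_minutes = (now - last_activity).total_seconds() / 60.0
    return max(0.0, timeout_minutes - idle_minutes)


def should_shut_down(last_activity: datetime, now: datetime, timeout_minutes: float) -> bool:
    """True once the instance has been idle for at least `timeout_minutes`."""
    return now - last_activity >= timedelta(minutes=timeout_minutes)
```

Because the background task only runs every 60 s, the actual stop can lag the timeout by up to about a minute, which would be consistent with the `inactive_minutes: 30.5` value in the audit example.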

---

## Configuration

Environment variables in `.env`:

```bash
# vast.ai credentials
VAST_API_KEY=your-vast-api-key          # from https://cloud.vast.ai/cli/
VAST_INSTANCE_ID=12345                   # numeric instance ID

# Admin protection
CONTROL_API_KEY=your-control-key         # generate with: openssl rand -hex 32

# Health check
VAST_HEALTH_PORT=8001                    # port on the instance
VAST_HEALTH_PATH=/health                 # health endpoint
VAST_WAIT_TIMEOUT_S=600                  # startup timeout (10 min)

# Auto-shutdown
VAST_AUTO_SHUTDOWN=true
VAST_AUTO_SHUTDOWN_MINUTES=30

# State persistence (optional)
VAST_STATE_PATH=./vast_state.json
VAST_AUDIT_PATH=./vast_audit.log
```
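
To show how these variables might be consumed, here is a sketch that reads them into a typed settings object. It is not the backend's actual loader: the `VastSettings` class, the `load_settings` function, the use of the example values as defaults, and the set of strings treated as "true" are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VastSettings:
    api_key: Optional[str]
    instance_id: Optional[int]
    health_port: int
    health_path: str
    wait_timeout_s: int
    auto_shutdown: bool
    auto_shutdown_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> VastSettings:
    """Read the documented variables, falling back to the example values (assumed defaults)."""
    instance_id = env.get("VAST_INSTANCE_ID")
    return VastSettings(
        api_key=env.get("VAST_API_KEY") or None,
        instance_id=int(instance_id) if instance_id else None,
        health_port=int(env.get("VAST_HEALTH_PORT", "8001")),
        health_path=env.get("VAST_HEALTH_PATH", "/health"),
        wait_timeout_s=int(env.get("VAST_WAIT_TIMEOUT_S", "600")),
        # Assumption: the strings "true", "1" and "yes" enable the feature.
        auto_shutdown=env.get("VAST_AUTO_SHUTDOWN", "true").strip().lower() in {"true", "1", "yes"},
        auto_shutdown_minutes=int(env.get("VAST_AUTO_SHUTDOWN_MINUTES", "30")),
    )
```

A missing `VAST_API_KEY` yields `api_key=None`, which a status handler could report as `unconfigured`.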

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      Admin Panel (Browser)                       │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  GPU Infra Tab                                          │   │
│   │  [Start] [Stop] [Refresh]   Status: Running             │   │
│   │  GPU: RTX 3090   Cost: $0.45/h   Session: 25 min        │   │
│   │  Auto-Shutdown in: 5 min                                │   │
│   └─────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Breakpilot Backend                            │
│   ┌───────────────────┐   ┌───────────────────┐                 │
│   │  /infra/vast/*    │   │  Auto-Shutdown    │                 │
│   │  (FastAPI Router) │   │  Background Task  │                 │
│   └─────────┬─────────┘   └─────────┬─────────┘                 │
│             │                       │                            │
│             ▼                       ▼                            │
│   ┌─────────────────────────────────────────────┐               │
│   │           VastAIClient                       │               │
│   │   (REST API to vast.ai Console)             │               │
│   └─────────────────────────────────────────────┘               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   vast.ai Cloud API                              │
│               https://console.vast.ai/api/v0/                    │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    vast.ai GPU Instance                          │
│   ┌───────────────────────────────────────────────────────┐     │
│   │  Docker Container: vLLM                               │     │
│   │  - Model: Mistral-7B-Instruct                        │     │
│   │  - Port 8000: /v1/chat/completions                   │     │
│   │  - Port 8001: /health (nginx proxy)                  │     │
│   └───────────────────────────────────────────────────────┘     │
│   GPU: RTX 3090 (24GB VRAM)                                     │
└─────────────────────────────────────────────────────────────────┘
```

---

## vast.ai Instance Setup

### 1. Rent an instance

- Type: **On-Demand** (not Contract)
- GPU: **RTX 3090** (24GB) or **RTX 4090**
- RAM: >= 32 GB
- Disk: >= 150 GB
- Interruptible: **No** (non-interruptible)

### 2. vLLM with systemd autostart

On the vast.ai instance:

```bash
# Create the Docker Compose directory
mkdir -p ~/llm-stack
cd ~/llm-stack

# Create docker-compose.yml and health-nginx.conf
# (see vast.ai Implementierung.docx)

# Create the systemd service.
# Note: WorkingDirectory assumes the home directory is /home/ubuntu;
# adjust it if ~ resolves elsewhere on your instance.
sudo tee /etc/systemd/system/llm-stack.service > /dev/null <<'EOF'
[Unit]
Description=LLM Stack via Docker Compose
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/home/ubuntu/llm-stack
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
EOF

# Reload unit files so systemd picks up the new service
sudo systemctl daemon-reload
sudo systemctl enable llm-stack.service
sudo systemctl start llm-stack.service
```

### 3. Configure the backend

```bash
# .env
VAST_API_KEY=vast_...
VAST_INSTANCE_ID=12345
# Paste the output of `openssl rand -hex 32`; most .env loaders do not evaluate $(...)
CONTROL_API_KEY=<output of: openssl rand -hex 32>
```

---

## Error handling

| Error | Cause | Fix |
|-------|-------|-----|
| 500: VAST_API_KEY not configured | Environment variable not set | Check `.env` |
| 502: vast CLI failed | vast.ai API error | Check the instance ID |
| 504: Health check timeout | vLLM does not start | SSH into the instance and check the logs |
| Instance stuck in scheduling | GPU not available | Choose a different GPU |

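A client such as the Admin Panel could turn the documented status codes into troubleshooting hints. The mapping below is only a sketch that restates the table; `ERROR_HINTS` and `explain_error` are hypothetical names, and the fallback message is an assumption.

```python
ERROR_HINTS = {
    401: "Unauthorized: check the X-API-Key header against CONTROL_API_KEY.",
    500: "Backend not configured: check VAST_API_KEY and VAST_INSTANCE_ID in .env.",
    502: "vast.ai API error: check the instance ID.",
    504: "Health-check timeout: SSH into the instance and check the logs.",
}


def explain_error(status_code: int) -> str:
    """Return a troubleshooting hint for a documented status code."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}.")
```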
---

## Cost examples

| GPU | $/hour | 1 h test | 8 h day |
|-----|--------|----------|---------|
| RTX 3090 | ~$0.45 | $0.45 | $3.60 |
| RTX 4090 | ~$0.75 | $0.75 | $6.00 |

**With Auto-Shutdown (30 min):**
- If you forget to power off: at most about $0.23 (3090) or $0.38 (4090) extra
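
The extra-cost figures follow from charging one full idle timeout at the hourly rate: 30 min at $0.45/h is $0.225, which rounds to $0.23. A small sketch of that arithmetic, using `Decimal` so that the half-cent cases round up predictably; the function name `max_idle_cost` is hypothetical, and the up-to-60-second check interval is ignored here.

```python
from decimal import Decimal, ROUND_HALF_UP


def max_idle_cost(dph: str, timeout_minutes: int = 30) -> Decimal:
    """Cost of one full idle timeout at `dph` dollars per hour, rounded to cents."""
    cost = Decimal(dph) * Decimal(timeout_minutes) / Decimal(60)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```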