Benjamin Admin 21a844cb8a fix: Restore all files lost during destructive rebase

A previous `git pull --rebase origin main` dropped 177 local commits,
losing 3400+ files across admin-v2, backend, studio-v2, website,
klausur-service, and many other services. The partial restore attempt
(660295e2) only recovered some files.

This commit restores all missing files from pre-rebase ref 98933f5e
while preserving post-rebase additions (night-scheduler, night-mode UI,
NightModeWidget dashboard integration).

Restored features include:
- AI Module Sidebar (FAB), OCR Labeling, OCR Compare
- GPU Dashboard, RAG Pipeline, Magic Help
- Klausur-Korrektur (8 files), Abitur-Archiv (5+ files)
- Companion, Zeugnisse-Crawler, Screen Flow
- Full backend, studio-v2, website, klausur-service
- All compliance SDKs, agent-core, voice-service
- CI/CD configs, documentation, scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-09 09:51:32 +01:00

vast.ai GPU Infrastructure API Documentation

Version: 0.1.0
Base URL: /infra/vast

Overview

The vast.ai Infrastructure API lets you control GPU instances directly from the Admin Panel. Features:

Start/Stop: Switch the GPU instance on and off
Auto-Shutdown: Automatically stop the instance when it is inactive (cost control)
Cost tracking: Runtime and cost per session
Audit log: Record of all actions

Authentication

All endpoints require the CONTROL_API_KEY in the X-API-Key request header:

X-API-Key: <CONTROL_API_KEY>
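
As an illustration of the header above, here is a minimal Python sketch that builds an authenticated request with the standard library. The helper name `build_request`, the host `http://localhost:8000`, and the key value are assumptions made for this example; only the `X-API-Key` header and the `/infra/vast` paths come from this document.

```python
import json
import urllib.request


def build_request(base_url, path, api_key, method="GET", body=None):
    """Build an authenticated request for the /infra/vast API (hypothetical helper)."""
    headers = {"X-API-Key": api_key}
    data = None
    if body is not None:
        # JSON-encode the request body, e.g. {} for /power/off.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Assumed host and key; nothing is sent until urlopen() is called.
req = build_request("http://localhost:8000", "/infra/vast/status", "your-control-key")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```

A request with a wrong or missing key would be expected to fail with 401, as listed in the error table for `/power/on`.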

Endpoints

GET /infra/vast/status

Returns the current status of the vast.ai instance.

Response (200):

{
  "instance_id": 12345,
  "status": "running",
  "gpu_name": "RTX 3090",
  "dph_total": 0.45,
  "endpoint_base_url": "http://10.0.0.1:8001",
  "last_activity": "2024-01-15T10:30:00Z",
  "auto_shutdown_in_minutes": 25,
  "total_runtime_hours": 2.5,
  "total_cost_usd": 1.12,
  "message": null
}

Status values:

| Status | Description |
|---|---|
| `running` | Instance is running |
| `stopped` | Instance is stopped (disk is kept) |
| `exited` | Instance has exited |
| `loading` | Instance is starting |
| `scheduling` | Waiting for a GPU to be assigned |
| `creating` | Instance is being created |
| `unconfigured` | VAST_API_KEY is not set |
| `not_found` | Instance ID not found |
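
A client will often want to collapse these values into a few coarse phases, for example to decide which buttons to enable. The sketch below shows one way to do that. The status strings are taken from the table; the grouping into `ready`, `starting`, `off`, and `error` and the names `InstanceStatus` and `phase` are assumptions for illustration, not part of the API.

```python
from enum import Enum


class InstanceStatus(str, Enum):
    """Status strings as listed in the table above."""
    RUNNING = "running"
    STOPPED = "stopped"
    EXITED = "exited"
    LOADING = "loading"
    SCHEDULING = "scheduling"
    CREATING = "creating"
    UNCONFIGURED = "unconfigured"
    NOT_FOUND = "not_found"


# Assumed grouping for a UI; the backend does not define these phases.
_PHASES = {
    InstanceStatus.RUNNING: "ready",
    InstanceStatus.LOADING: "starting",
    InstanceStatus.SCHEDULING: "starting",
    InstanceStatus.CREATING: "starting",
    InstanceStatus.STOPPED: "off",
    InstanceStatus.EXITED: "off",
    InstanceStatus.UNCONFIGURED: "error",
    InstanceStatus.NOT_FOUND: "error",
}


def phase(status: str) -> str:
    """Map a raw status string to a coarse phase; raises ValueError if unknown."""
    return _PHASES[InstanceStatus(status)]
```

With this grouping, a "Start" button would only be offered in the `off` phase and a "Stop" button in the `ready` or `starting` phases.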

POST /infra/vast/power/on

Starts the vast.ai instance.

Request Body:

{
  "wait_for_health": true,
  "health_path": "/health",
  "health_port": 8001
}

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `wait_for_health` | boolean | true | Wait until the LLM server is reachable |
| `health_path` | string | "/health" | Health check endpoint |
| `health_port` | integer | 8001 | Port for the health check |

Response (200):

{
  "status": "running",
  "instance_id": 12345,
  "endpoint_base_url": "http://10.0.0.1:8001",
  "health_url": "http://10.0.0.1:8001/health",
  "message": "Instance running and healthy"
}

Errors:

| Code | Description |
|---|---|
| 401 | Unauthorized (wrong API key) |
| 500 | VAST_API_KEY or VAST_INSTANCE_ID not configured |
| 502 | vast.ai API error |
| 504 | Health check timeout |
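
The `wait_for_health` option and the 504 error suggest a poll-until-deadline loop. The following is a minimal sketch of such a loop, not the backend's actual implementation. The function name, the 5-second interval, and the injectable `probe`, `sleep`, and `clock` arguments are assumptions; the 600-second default mirrors `VAST_WAIT_TIMEOUT_S` from the configuration section.

```python
import time
from typing import Callable


def wait_for_health(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns True or timeout_s has elapsed.

    Returns True when the service became healthy, False on timeout.
    A caller could translate False into a 504 response.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in production code `probe` would issue an HTTP GET against the health URL and return whether it answered with a success status.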

POST /infra/vast/power/off

Stops the vast.ai instance (the disk is preserved).

Request Body:

{}

Response (200):

{
  "status": "stopped",
  "session_runtime_minutes": 45.5,
  "session_cost_usd": 0.34,
  "message": "Instance stopped. Session: 45.5 min, $0.340"
}
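
The session cost in this response is consistent with runtime multiplied by the hourly rate (`dph_total` from the status endpoint): 45.5 minutes at $0.45 per hour is about $0.34. A small sketch of that arithmetic follows; the function name is hypothetical, and that the backend computes the value exactly this way is an assumption.

```python
def session_cost_usd(runtime_minutes: float, dph_total: float) -> float:
    """Cost of one session: runtime in hours times dollars per hour."""
    if runtime_minutes < 0 or dph_total < 0:
        raise ValueError("runtime and rate must be non-negative")
    return runtime_minutes / 60.0 * dph_total


cost = session_cost_usd(45.5, 0.45)
print(f"Session: 45.5 min, ${cost:.2f}")
```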

POST /infra/vast/activity

Records activity and postpones the auto-shutdown timer.

Usage: The LLM gateway should call this endpoint on every request.

Response (200):

{
  "status": "recorded",
  "last_activity": "2024-01-15T10:30:00Z"
}

GET /infra/vast/costs

Returns cost statistics.

Response (200):

{
  "total_runtime_hours": 12.5,
  "total_cost_usd": 5.62,
  "sessions_count": 5,
  "avg_session_minutes": 150.0
}
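
The four fields relate to each other in a simple way: 12.5 hours over 5 sessions gives an average of 150 minutes. The sketch below derives such a summary from a list of finished sessions. The `Session` record, the `summarize` function, and the rounding are assumptions for illustration; only the field names come from the response above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One finished session (hypothetical record type)."""
    runtime_minutes: float
    cost_usd: float


def summarize(sessions: List[Session]) -> dict:
    """Aggregate sessions into the shape of the /costs response."""
    count = len(sessions)
    total_minutes = sum(s.runtime_minutes for s in sessions)
    return {
        "total_runtime_hours": round(total_minutes / 60.0, 2),
        "total_cost_usd": round(sum(s.cost_usd for s in sessions), 2),
        "sessions_count": count,
        # Avoid dividing by zero when no session has been recorded yet.
        "avg_session_minutes": round(total_minutes / count, 1) if count else 0.0,
    }
```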

GET /infra/vast/audit

Returns the most recent audit log entries.

Query parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Maximum number of entries |

Response (200):

[
  {
    "ts": "2024-01-15T10:30:00Z",
    "event": "power_on_complete",
    "actor": "system",
    "meta": {
      "instance_id": 12345,
      "endpoint": "http://10.0.0.1:8001"
    }
  },
  {
    "ts": "2024-01-15T09:00:00Z",
    "event": "auto_shutdown",
    "actor": "system",
    "meta": {
      "inactive_minutes": 30.5
    }
  }
]

Event types:

| Event | Description |
|---|---|
| `power_on_requested` | Start requested |
| `power_on_complete` | Start completed |
| `power_on_health_timeout` | Health check failed |
| `power_off_requested` | Stop requested |
| `power_off_complete` | Stop completed |
| `auto_shutdown` | Automatic stop due to inactivity |
| `auto_shutdown_complete` | Automatic stop completed |
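
On the client side, audit entries can be filtered by event type and ordered by timestamp. The sketch below assumes the entry shape shown in the response example (`ts`, `event`, `actor`, `meta`) and newest-first ordering; the function names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:30:00Z."""
    # Replace the trailing "Z" because datetime.fromisoformat() only
    # accepts it from Python 3.11 onwards.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_events(entries: List[dict], event: Optional[str] = None, limit: int = 50) -> List[dict]:
    """Return up to `limit` entries, newest first, optionally filtered by event type."""
    selected = [e for e in entries if event is None or e["event"] == event]
    selected.sort(key=lambda e: parse_ts(e["ts"]), reverse=True)
    return selected[:limit]
```

For example, `latest_events(entries, event="auto_shutdown", limit=10)` would list the ten most recent automatic stops.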

Auto-Shutdown

The auto-shutdown mechanism stops the instance automatically when it is inactive:

/activity is called on every LLM request
A background task checks the last activity every 60 s
After VAST_AUTO_SHUTDOWN_MINUTES without activity, the instance is stopped
Session costs are calculated and logged

Configuration:

VAST_AUTO_SHUTDOWN=true           # enable the feature
VAST_AUTO_SHUTDOWN_MINUTES=30     # timeout in minutes
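
The decision the background task has to make on each 60-second tick can be sketched as a pure function of the last activity time and the current time. This is an illustrative sketch, not the backend's code; the function names are assumptions, and the default of 30 minutes mirrors `VAST_AUTO_SHUTDOWN_MINUTES` above.

```python
from datetime import datetime, timedelta, timezone


def minutes_until_shutdown(last_activity: datetime, now: datetime, timeout_minutes: float = 30) -> float:
    """Remaining minutes before auto-shutdown; 0.0 once the timeout has passed."""
    idle_minutes = (now - last_activity).total_seconds() / 60.0
    return max(0.0, timeout_minutes - idle_minutes)


def should_shut_down(last_activity: datetime, now: datetime, timeout_minutes: float = 30) -> bool:
    """True when the instance has been idle for at least timeout_minutes."""
    return minutes_until_shutdown(last_activity, now, timeout_minutes) == 0.0


last = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
# Five minutes after the last request, 25 minutes remain.
remaining = minutes_until_shutdown(last, last + timedelta(minutes=5))
```

A value computed this way would correspond to the `auto_shutdown_in_minutes` field of the status response, and each call to `/activity` resets `last_activity` to the current time.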

Configuration

Environment variables in .env:

# vast.ai credentials
VAST_API_KEY=your-vast-api-key          # from https://cloud.vast.ai/cli/
VAST_INSTANCE_ID=12345                   # numeric instance ID

# Admin protection
CONTROL_API_KEY=your-control-key         # generate with: openssl rand -hex 32

# Health check
VAST_HEALTH_PORT=8001                    # port on the instance
VAST_HEALTH_PATH=/health                 # health endpoint
VAST_WAIT_TIMEOUT_S=600                  # startup timeout (10 min)

# Auto-shutdown
VAST_AUTO_SHUTDOWN=true
VAST_AUTO_SHUTDOWN_MINUTES=30

# State persistence (optional)
VAST_STATE_PATH=./vast_state.json
VAST_AUDIT_PATH=./vast_audit.log
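
To show how these variables might be consumed, here is a minimal sketch that reads them into a typed configuration object. The `VastConfig` class and `load_config` function are hypothetical, the fallback values simply mirror the example values above, and the set of strings accepted as "true" is an assumption; the real backend may parse them differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VastConfig:
    api_key: Optional[str]
    instance_id: Optional[int]
    control_api_key: Optional[str]
    health_port: int
    health_path: str
    wait_timeout_s: int
    auto_shutdown: bool
    auto_shutdown_minutes: int
    state_path: str
    audit_path: str


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; adjust to whatever the backend accepts.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Optional[Mapping[str, str]] = None) -> VastConfig:
    """Read the VAST_* variables, falling back to the example values above."""
    if env is None:
        env = os.environ
    raw_id = env.get("VAST_INSTANCE_ID")
    return VastConfig(
        api_key=env.get("VAST_API_KEY"),
        instance_id=int(raw_id) if raw_id else None,
        control_api_key=env.get("CONTROL_API_KEY"),
        health_port=int(env.get("VAST_HEALTH_PORT", "8001")),
        health_path=env.get("VAST_HEALTH_PATH", "/health"),
        wait_timeout_s=int(env.get("VAST_WAIT_TIMEOUT_S", "600")),
        auto_shutdown=_as_bool(env.get("VAST_AUTO_SHUTDOWN", "true")),
        auto_shutdown_minutes=int(env.get("VAST_AUTO_SHUTDOWN_MINUTES", "30")),
        state_path=env.get("VAST_STATE_PATH", "./vast_state.json"),
        audit_path=env.get("VAST_AUDIT_PATH", "./vast_audit.log"),
    )
```

A missing `api_key` or `instance_id` in the resulting object corresponds to the `unconfigured` status and the 500 error described for the endpoints.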

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      Admin Panel (Browser)                       │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  GPU Infra Tab                                          │   │
│   │  [Start] [Stop] [Refresh]   Status: Running             │   │
│   │  GPU: RTX 3090   Cost: $0.45/h   Session: 25 min        │   │
│   │  Auto-Shutdown in: 5 min                                │   │
│   └─────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Breakpilot Backend                            │
│   ┌───────────────────┐   ┌───────────────────┐                 │
│   │  /infra/vast/*    │   │  Auto-Shutdown    │                 │
│   │  (FastAPI Router) │   │  Background Task  │                 │
│   └─────────┬─────────┘   └─────────┬─────────┘                 │
│             │                       │                            │
│             ▼                       ▼                            │
│   ┌─────────────────────────────────────────────┐               │
│   │           VastAIClient                       │               │
│   │   (REST API to vast.ai Console)             │               │
│   └─────────────────────────────────────────────┘               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   vast.ai Cloud API                              │
│               https://console.vast.ai/api/v0/                    │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    vast.ai GPU Instance                          │
│   ┌───────────────────────────────────────────────────────┐     │
│   │  Docker Container: vLLM                               │     │
│   │  - Model: Mistral-7B-Instruct                        │     │
│   │  - Port 8000: /v1/chat/completions                   │     │
│   │  - Port 8001: /health (nginx proxy)                  │     │
│   └───────────────────────────────────────────────────────┘     │
│   GPU: RTX 3090 (24GB VRAM)                                     │
└─────────────────────────────────────────────────────────────────┘

vast.ai Instance Setup

1. Rent an instance

Type: On-demand (not Contract)
GPU: RTX 3090 (24 GB) or RTX 4090
RAM: >= 32 GB
Disk: >= 150 GB
Interruptible: No (non-interruptible)

2. vLLM with systemd autostart

On the vast.ai instance:

# Create the directory for the Docker Compose stack
mkdir -p ~/llm-stack
cd ~/llm-stack

# Create docker-compose.yml and health-nginx.conf
# (see vast.ai Implementierung.docx)

# Create the systemd service
sudo tee /etc/systemd/system/llm-stack.service > /dev/null <<'EOF'
[Unit]
Description=LLM Stack via Docker Compose
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/home/ubuntu/llm-stack
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable llm-stack.service
sudo systemctl start llm-stack.service

3. Configure the backend

# .env
VAST_API_KEY=vast_...
VAST_INSTANCE_ID=12345
CONTROL_API_KEY=your-control-key         # paste the output of: openssl rand -hex 32

Error Handling

| Error | Cause | Solution |
|---|---|---|
| 500: VAST_API_KEY not configured | Environment variable not set | Check `.env` |
| 502: vast CLI failed | vast.ai API error | Check the instance ID |
| 504: Health check timeout | vLLM does not start | SSH into the instance and check the logs |
| Instance stuck in scheduling | GPU not available | Choose a different GPU |
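
A client or admin UI could surface these hints next to a failed request. The sketch below maps HTTP status codes to the remedies from the table, plus 401 from the `/power/on` error list. The mapping name, the function name, and the fallback text are assumptions for illustration.

```python
# Hints taken from the error tables in this document.
TROUBLESHOOTING_HINTS = {
    401: "Wrong API key: check the X-API-Key header against CONTROL_API_KEY.",
    500: "VAST_API_KEY or VAST_INSTANCE_ID not configured: check .env.",
    502: "vast.ai API error: check the instance ID.",
    504: "Health check timeout: SSH into the instance and check the logs.",
}


def troubleshooting_hint(status_code: int) -> str:
    """Return a short remedy for a failed /infra/vast request."""
    return TROUBLESHOOTING_HINTS.get(status_code, "Unexpected error: check the backend logs.")
```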

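Cost Examples

| GPU | $/hour | 1 h test | 8 h day |
|---|---|---|---|
| RTX 3090 | ~$0.45 | $0.45 | $3.60 |
| RTX 4090 | ~$0.75 | $0.75 | $6.00 |

With auto-shutdown (30 min):

If you forget to switch the instance off, the idle time costs at most 30 minutes of runtime: $0.23 (3090) or $0.38 (4090) extra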
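
These figures follow directly from the hourly rate: a full day is the rate times 8, and the worst-case idle cost is the rate times the auto-shutdown timeout in hours (30 minutes is half the hourly rate, shown rounded in the text). A short sketch of the arithmetic, with hypothetical function names and the approximate rates from the table:

```python
def runtime_cost(dph: float, hours: float) -> float:
    """Cost of running for `hours` at `dph` dollars per hour."""
    return dph * hours


def max_idle_cost(dph: float, auto_shutdown_minutes: float = 30) -> float:
    """Upper bound on the cost of a forgotten instance before auto-shutdown stops it."""
    return dph * auto_shutdown_minutes / 60.0


# Approximate hourly rates from the table above.
RATES = {"RTX 3090": 0.45, "RTX 4090": 0.75}
for gpu, dph in RATES.items():
    print(f"{gpu}: 8 h day ${runtime_cost(dph, 8):.2f}, max idle ${max_idle_cost(dph):.3f}")
```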
