Migrate deployment from Hetzner to Coolify #1

Merged
sharang merged 12 commits from coolify into main 2026-03-13 10:45:19 +00:00
Owner

Summary

  • Add Coolify deployment configuration (docker-compose, healthchecks, network setup)
  • Replace deploy-hetzner CI job with Coolify webhook deploy
  • Externalize postgres, qdrant, S3 for Coolify environment
  • Remove services not needed for SDK deployment (voice, jitsi, synapse)

All changes since branch creation

  • Coolify docker-compose with healthchecks for all services
  • CI pipeline: deploy-hetzner → deploy-coolify (simple webhook curl)
  • QDRANT_API_KEY support in rag-service
  • Alpine-compatible Dockerfile fixes
## Summary - Add Coolify deployment configuration (docker-compose, healthchecks, network setup) - Replace deploy-hetzner CI job with Coolify webhook deploy - Externalize postgres, qdrant, S3 for Coolify environment - Remove services not needed for SDK deployment (voice, jitsi, synapse) ## All changes since branch creation - Coolify docker-compose with healthchecks for all services - CI pipeline: deploy-hetzner → deploy-coolify (simple webhook curl) - QDRANT_API_KEY support in rag-service - Alpine-compatible Dockerfile fixes
sharang added 10 commits 2026-03-13 09:50:52 +00:00
Add docker-compose.coolify.yml (17 services), .env.coolify.example,
and Gitea Action workflow for Coolify API deployment. Removes nginx,
vault, gitea, woodpecker, mailpit, and dev-only services. Adds Traefik
labels for *.breakpilot.ai domain routing with Let's Encrypt SSL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove PostgreSQL, Qdrant, MinIO services (managed separately in Coolify)
- Remove Jitsi stack (web, xmpp, jicofo, jvb) and Synapse/synapse-db
- Add POSTGRES_HOST, QDRANT_URL, S3_ENDPOINT/S3_ACCESS_KEY/S3_SECRET_KEY env vars
- Remove Traefik labels from internal-only services
- Health aggregator no longer checks external services
- Core now has 10 services: valkey + 9 application services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace --system/--gid/--uid (Debian syntax) with -S/-g/-u (BusyBox/Alpine).
Coolify ARG injection causes exit code 255 with Debian-style flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove voice-service (removed in main branch)
- Remove voice_session_data volume
- Add OLLAMA_URL and OLLAMA_EMBED_MODEL to rag-service
- Update embedding-service default model to BAAI/bge-m3, memory 4G→8G
- Update health-aggregator CHECK_SERVICES (remove voice-service)
- Update .env.coolify.example accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add QDRANT_API_KEY to config.py (empty string = no auth)
- Pass api_key to QdrantClient constructor (None when empty)
- Add QDRANT_API_KEY to coolify compose and env example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Dockerfile hardcoded TARGETARCH=arm64 for Mac Mini. Coolify server
is x86_64, causing exit code 126 (wrong binary arch). Now uses Docker
BuildKit's auto-detected TARGETARCH with dpkg fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coolify/Traefik requires healthchecks to route traffic to containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove services not needed by SDK from Coolify deployment
Some checks failed
CI / go-lint (pull_request) Failing after 15s
CI / nodejs-lint (pull_request) Failing after 2s
CI / test-python-voice (pull_request) Failing after 11s
CI / deploy-hetzner (pull_request) Has been skipped
CI / python-lint (pull_request) Failing after 10s
CI / test-go-consent (pull_request) Failing after 2s
CI / test-bqas (pull_request) Failing after 10s
Deploy to Coolify / deploy (push) Has been cancelled
cf2cabd098
Remove backend-core, billing-service, night-scheduler, and admin-core
as they are not used by any compliance/SDK service. Update
health-aggregator CHECK_SERVICES to reference consent-service instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin_Boenisch added 27 commits 2026-03-13 09:59:34 +00:00
feat: add Coolify deployment configuration
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
d15de16c47
Add docker-compose.coolify.yml (17 services), .env.coolify.example,
and Gitea Action workflow for Coolify API deployment. Removes nginx,
vault, gitea, woodpecker, mailpit, and dev-only services. Adds Traefik
labels for *.breakpilot.ai domain routing with Let's Encrypt SSL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
refactor(coolify): externalize postgres, qdrant, S3; remove jitsi/synapse
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
e890b1490a
- Remove PostgreSQL, Qdrant, MinIO services (managed separately in Coolify)
- Remove Jitsi stack (web, xmpp, jicofo, jvb) and Synapse/synapse-db
- Add POSTGRES_HOST, QDRANT_URL, S3_ENDPOINT/S3_ACCESS_KEY/S3_SECRET_KEY env vars
- Remove Traefik labels from internal-only services
- Health aggregator no longer checks external services
- Core now has 10 services: valkey + 9 application services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: use Alpine-compatible addgroup/adduser flags in Dockerfiles
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
755570d474
Replace --system/--gid/--uid (Debian syntax) with -S/-g/-u (BusyBox/Alpine).
Coolify ARG injection causes exit code 255 with Debian-style flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sync coolify compose with main: remove voice-service, update rag/embedding
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
4626edb232
- Remove voice-service (removed in main branch)
- Remove voice_session_data volume
- Add OLLAMA_URL and OLLAMA_EMBED_MODEL to rag-service
- Update embedding-service default model to BAAI/bge-m3, memory 4G→8G
- Update health-aggregator CHECK_SERVICES (remove voice-service)
- Update .env.coolify.example accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add QDRANT_API_KEY support to rag-service
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
1dd9662037
- Add QDRANT_API_KEY to config.py (empty string = no auth)
- Pass api_key to QdrantClient constructor (None when empty)
- Add QDRANT_API_KEY to coolify compose and env example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix backend-core TARGETARCH: auto-detect instead of hardcoded arm64
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
5d1c837f49
The Dockerfile hardcoded TARGETARCH=arm64 for Mac Mini. Coolify server
is x86_64, causing exit code 126 (wrong binary arch). Now uses Docker
BuildKit's auto-detected TARGETARCH with dpkg fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix admin-core build: ensure public directory exists before build
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
13ff930b5e
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove Traefik labels from coolify compose — Coolify handles routing
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
9e1660f954
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add healthchecks to backend-core, consent-service, billing-service, admin-core
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
395011d0f4
Coolify/Traefik requires healthchecks to route traffic to containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove services not needed by SDK from Coolify deployment
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
d834753a98
Remove backend-core, billing-service, night-scheduler, and admin-core
as they are not used by any compliance/SDK service. Update
health-aggregator CHECK_SERVICES to reference consent-service instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: PaddleOCR Service (PP-OCRv5 Latin auf x86_64)
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
7cdb53051f
Microservice fuer PaddleOCR auf Hetzner. FastAPI mit /ocr und /health
Endpoints, API-Key Auth, 4GB Memory Limit, Modell-Cache Volume.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: libgl1-mesa-glx → libgl1 (Debian bookworm)
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
8f5f9641c7
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: PaddleOCR v3 API — explicit model name + predict() statt ocr()
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
992d4f2a6b
lang="latin" braucht text_recognition_model_name in PP-OCRv5.
Neue API nutzt predict() statt ocr(), Ergebnis-Format angepasst.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: remove show_log (unknown in PaddleOCR v3 API)
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
4ee38d6f0b
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: PaddleOCR model pre-load at startup + 5min healthcheck grace
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
2bc0f87325
Model wird beim Container-Start geladen (nicht erst beim ersten Request).
Health-Check start_period auf 300s erhoeht fuer initialen Download.
/health gibt "loading" zurueck bis Modell bereit ist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: add libgomp1 (OpenMP) + remove unused lang parameter
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
3133615044
PaddlePaddle braucht libgomp.so.1 fuer Inferenz.
lang wird ignoriert bei explizitem model_name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: disable oneDNN (FLAGS_use_mkldnn=0) for PaddlePaddle compat
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
2c9b0dc448
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: pin PaddlePaddle 2.6.2 + PaddleOCR 2.8.1 (stable, no PIR bug)
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
79891063dd
PaddlePaddle 3.x hat oneDNN/PIR Executor Bug. Zurueck auf 2.6.2
mit bewaeherter ocr() API statt predict().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: expose port 8095 directly (bypass Traefik 60s timeout)
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
778c44226e
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: PaddleOCR 3.4.0 compatibility — use lang=en with PP-OCRv5
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
8003dcac39
PaddleOCR 3.4.0 removed 'latin' language support, causing ValueError
at startup. Now uses lang='en' with ocr_version='PP-OCRv5' and falls
back to lang='latin' for older PaddleOCR versions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: catch all exceptions in PaddleOCR version fallback
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
86b11c7e5f
PaddleOCR 2.8.1 throws a generic Exception (not ValueError) when
ocr_version='PP-OCRv5' is used. Broadened except clause to catch
any error and fall back to lang='latin' for older versions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: add detailed logging for PaddleOCR model loading debug
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
b36712247b
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: load PaddleOCR model in background thread
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
5ee3cc0104
The import and model loading can take minutes and was blocking
the startup event, causing health checks to timeout. Now loads
in a background thread — health endpoint returns 200 immediately
with status 'loading' until model is ready.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Old paddlepaddle==2.6.2 + paddleocr==2.8.1 caused hangs on first OCR
request. Upgrading to paddlepaddle>=3.0.0 + paddleocr>=2.9.0 enables
native PP-OCRv5 support and fixes stability issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: resolve stash conflict
Some checks failed
Deploy to Coolify / deploy (push) Has been cancelled
CI / go-lint (pull_request) Failing after 2s
CI / python-lint (pull_request) Failing after 14s
CI / nodejs-lint (pull_request) Failing after 3s
CI / test-go-consent (pull_request) Failing after 3s
CI / test-python-voice (pull_request) Failing after 11s
CI / test-bqas (pull_request) Failing after 10s
CI / deploy-hetzner (pull_request) Has been skipped
559d6a351c
Benjamin_Boenisch added 1 commit 2026-03-13 10:09:36 +00:00
fix: robust PaddleOCR init with multiple fallback strategies
Some checks failed
CI / go-lint (pull_request) Failing after 2s
CI / python-lint (pull_request) Failing after 11s
CI / nodejs-lint (pull_request) Failing after 2s
CI / test-go-consent (pull_request) Failing after 2s
CI / test-python-voice (pull_request) Failing after 14s
CI / test-bqas (pull_request) Failing after 11s
CI / deploy-hetzner (pull_request) Has been skipped
Deploy to Coolify / deploy (push) Has been cancelled
65177d3ff7
PaddleOCR 3.x removed show_log param and lang='latin'. Try multiple
init strategies in order until one succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sharang force-pushed coolify from 65177d3ff7 to 0fb4a7e359 2026-03-13 10:30:02 +00:00 Compare
sharang added 1 commit 2026-03-13 10:42:44 +00:00
Replace deploy-hetzner with Coolify webhook deploy in ci.yaml
Some checks failed
CI / go-lint (pull_request) Failing after 2s
CI / python-lint (pull_request) Failing after 10s
CI / nodejs-lint (pull_request) Failing after 2s
CI / test-go-consent (pull_request) Failing after 2s
CI / test-python-voice (pull_request) Failing after 9s
CI / Deploy (pull_request) Has been skipped
CI / test-bqas (pull_request) Failing after 10s
e9487a31c6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sharang merged commit fcf8aa8652 into main 2026-03-13 10:45:19 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Benjamin_Boenisch/breakpilot-core#1