Improve health check endpoints to verify real service connectivity #17

Open
opened 2026-04-20 09:36:31 +00:00 by sharang (Owner) · 0 comments

Problem

The health endpoints in backend-compliance and ai-compliance-sdk return {"status": "healthy"} unconditionally, without checking whether their dependencies are actually reachable. As a result, Orca marks the deployment healthy even when the database is down.

Required Actions

  1. Update the /health endpoint in backend-compliance to:
    • Execute SELECT 1 against the DB; return 503 if it fails
    • Check Qdrant reachability (HTTP ping); if Qdrant is unavailable, return 503 but flag it as degraded (not critical)
  2. Distinguish liveness (/health/live) from readiness (/health/ready):
    • Live: is the process running? Always 200.
    • Ready: are all dependencies reachable? 200 or 503.
  3. Apply the same split to the ai-compliance-sdk Go service
  4. Update the docker-compose healthcheck to use /health/ready
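Items 1 to 3 describe a liveness/readiness split. The Go sketch below shows one way to structure it for the ai-compliance-sdk service (item 3); the issue does not state backend-compliance's language, but the same shape applies there. It assumes a database/sql handle for the DB and a caller-supplied Qdrant URL, gives each check its own timeout, and labels a failed critical dependency "down", a label the issue itself does not define. All type and function names are hypothetical. Following item 1 as written, any failing check, Qdrant included, turns /health/ready into a 503; if "degraded (not critical)" is meant to keep the service ready, evaluate is the single place to change that.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// dependency (hypothetical) is one readiness check; Critical picks "down" vs "degraded".
type dependency struct {
	Name     string
	Critical bool
	Check    func(ctx context.Context) error
}

// statusError reports a non-2xx answer from an HTTP dependency.
type statusError struct{ code int }

func (e statusError) Error() string { return fmt.Sprintf("unexpected status %d", e.code) }

// dbCheck runs SELECT 1 against the database (item 1).
func dbCheck(db *sql.DB) func(context.Context) error {
	return func(ctx context.Context) error {
		var one int
		return db.QueryRowContext(ctx, "SELECT 1").Scan(&one)
	}
}

// httpCheck GETs url (e.g. a Qdrant health URL); errors and non-2xx codes fail.
func httpCheck(client *http.Client, url string) func(context.Context) error {
	return func(ctx context.Context) error {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode < 200 || resp.StatusCode > 299 {
			return statusError{resp.StatusCode}
		}
		return nil
	}
}

// evaluate maps errs[i] (the result for deps[i]) to a code and body; any failure is 503.
func evaluate(deps []dependency, errs []error) (int, map[string]string) {
	code, body := http.StatusOK, make(map[string]string, len(deps))
	for i, d := range deps {
		status := "ok"
		if errs[i] != nil {
			code, status = http.StatusServiceUnavailable, "degraded"
			if d.Critical {
				status = "down"
			}
		}
		body[d.Name] = status
	}
	return code, body
}

// newMux serves /health/live (always 200) and /health/ready (200 or 503).
func newMux(deps []dependency, timeout time.Duration) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // process is up; no dependency checks
	})
	mux.HandleFunc("/health/ready", func(w http.ResponseWriter, r *http.Request) {
		errs := make([]error, len(deps))
		for i, d := range deps {
			ctx, cancel := context.WithTimeout(r.Context(), timeout)
			errs[i] = d.Check(ctx)
			cancel()
		}
		code, body := evaluate(deps, errs)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(body)
	})
	return mux
}

func main() {
	// Stubs replace dbCheck/httpCheck so this demo runs without a DB or Qdrant.
	deps := []dependency{
		{Name: "db", Critical: true, Check: func(context.Context) error { return nil }},
		{Name: "qdrant", Check: func(context.Context) error { return statusError{503} }},
	}
	rec := httptest.NewRecorder()
	newMux(deps, 2*time.Second).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health/ready", nil))
	fmt.Print(rec.Code, " ", rec.Body.String())
}
```

Run as-is, it serves one /health/ready request in-process with a healthy DB stub and a failing Qdrant stub and prints `503 {"db":"ok","qdrant":"degraded"}`, the body shape named in the acceptance criteria. In the service, the stubs would be replaced by dbCheck(db) and httpCheck(client, qdrantURL) (Qdrant's REST API listens on port 6333 by default), and the mux would be passed to http.ListenAndServe.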

Acceptance Criteria

  • /health/ready returns 503 when the DB is unreachable
  • Orca (the deploy system) marks the service healthy only after /health/ready returns 200
  • Response body includes per-dependency status, e.g. {"db": "ok", "qdrant": "degraded"}
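For item 4 and the second criterion, the docker-compose healthcheck only needs a command that exits 0 when /health/ready answers 200 and 1 otherwise; Docker treats exit code 0 as healthy and 1 as unhealthy. If the image ships curl, `["CMD-SHELL", "curl -f http://localhost:8080/health/ready || exit 1"]` is enough. The Go sketch below assumes it does not (as with distroless images) and builds a tiny probe binary instead. The probe name, the default URL and port 8080 are hypothetical, not taken from the services.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// probe (hypothetical) returns nil only if url answers exactly 200 within timeout.
// A zero timeout disables the client-side timeout, as in net/http's default client.
func probe(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s returned %d", url, resp.StatusCode)
	}
	return nil
}

func main() {
	// Assumed default; pass the real readiness URL as the first argument.
	url := "http://localhost:8080/health/ready"
	if len(os.Args) > 1 {
		url = os.Args[1]
	}
	if err := probe(url, 3*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "unhealthy:", err)
		os.Exit(1) // Docker treats exit code 1 as unhealthy
	}
	fmt.Println("healthy")
}
```

In docker-compose this would be referenced as something like `test: ["CMD", "/healthprobe", "http://localhost:8080/health/ready"]`, where /healthprobe is wherever the binary is copied in the image. Because the probe demands exactly 200, a 503 from /health/ready keeps the container unhealthy.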
sharang added this to the M3: Observability & Audit Logging milestone 2026-04-20 09:36:31 +00:00
sharang added the reliability, severity: medium, and observability labels 2026-04-20 09:36:31 +00:00