tenant-registry/README.md
sharang bb2c638fb4
feat(keycloak): M4.3 — Admin API adapter + claim resolver
internal/keycloak/ — Adapter interface with two implementations:
  HTTPAdapter  pgxpool-style real Admin API client with cached client-
               credentials token (auto-refresh, 401 retry).
  Mock         in-process map for unit tests + dev convenience when
               KEYCLOAK_ADMIN_URL is empty. Used by the eachStore harness.

Adapter contract (adapter.go):
  CreateOrgAndInvite(ctx, InviteInput) (*InviteResult, error)
    Creates a KC organization, an IT_ADMIN user, adds the user as a
    member, triggers VERIFY_EMAIL + UPDATE_PASSWORD execute-actions
    email. Atomic from the caller's PoV; partial failures surface as
    typed errors (ErrOrgConflict, ErrUserConflict, ErrUnauthorized,
    ErrUnavailable).
  SyncClaims(ctx, userID, Claims) error
    Pushes tenant_id / tenant_slug / org_roles / products / plan /
    tenant_status into the user's KC attributes — the same shape the
    realm's protocol mappers project into JWTs.
  Health(ctx) error
    Pings /admin/serverinfo; wired into readyz.

Wiring:
  POST /v1/tenants now accepts admin_email + admin_name. When set, the
  adapter creates the org and invites the user. Response wraps the
  tenant with the new TenantCreated{tenant, invite_url} shape so dev
  testers can use the action-token URL without waiting for the email.
  KC failures DO NOT roll the tenant back — they emit a
  keycloak.provision_failed audit event so the operator can resend.
  Successful invites emit keycloak.invite_sent.

  POST /v1/internal/keycloak/claims resolves a tenant's current claim
  bundle. Lookup chain: body.tenant_id → body.tenant_slug →
  body.user_attrs.tenant_id → body.user_attrs.tenant_slug. The realm's
  protocol mapper calls this at token issuance, or operators on demand.

Config: KEYCLOAK_ADMIN_URL / REALM / CLIENT_ID / CLIENT_SECRET; empty
URL falls back to Mock for dev.

OpenAPI: TenantCreated + Claims schemas added; /v1/internal/keycloak/claims
documented. Contract test extended to cover the new endpoint.

Tests:
  internal/keycloak/mock_test.go    Mock semantics: conflict surfacing,
                                    FailNext hook, SyncClaims persistence.
  internal/server/keycloak_test.go  KC provisioning end-to-end via
                                    eachStore: invite_url returned,
                                    mock records, invite_sent audit;
                                    failure path emits provision_failed
                                    but tenant still lands; claims
                                    endpoint resolves via tenant_id /
                                    tenant_slug / user_attrs / 404 / 400.

The real-KC integration test (against a testcontainers-spun KC 26)
lands in a follow-up — gating it behind KEYCLOAK_INTEGRATION=1 + a
slower nightly CI is cleaner than baking 30s+ of KC boot into every PR.

Refs: M4.3
2026-05-19 13:27:16 +02:00


# tenant-registry
Multi-tenant glue: orgs, entitlements, API keys, audit.
> Part of the **Breakpilot Platform**. For the big picture see [`platform/docs`](https://gitea.meghsakha.com/platform/docs):
> [Architecture](https://gitea.meghsakha.com/platform/docs/src/branch/main/PLATFORM_ARCHITECTURE.md) ·
> [Infrastructure](https://gitea.meghsakha.com/platform/docs/src/branch/main/INFRASTRUCTURE.md) ·
> [Product Integration Spec](https://gitea.meghsakha.com/platform/docs/src/branch/main/PRODUCT_INTEGRATION_SPEC.md) ·
> [Implementation Plan](https://gitea.meghsakha.com/platform/docs/src/branch/main/IMPLEMENTATION_PLAN.md)
## What this is
tenant-registry is the control-plane service that tracks orgs, entitlements, API keys, and audit events for the platform. It was scaffolded under milestone M4.1. See [`platform/docs`](https://gitea.meghsakha.com/platform/docs) for the full architecture context.
**Plane:** Control
**Owner:** @sharang
**Status:** pre-alpha
**Linked milestone:** [M4.1](https://gitea.meghsakha.com/platform/docs/src/branch/main/IMPLEMENTATION_PLAN.md)
## Run locally
```bash
# Prerequisites: Go 1.25+
# Dependencies (Keycloak, pg-app) come from the dev stack — see platform/orca-platform/dev.
# In one terminal — bring up dev dependencies (in the orca-platform clone):
cd /path/to/platform/orca-platform && make dev-up
# In another — run the service:
make dev # APP_ENV=dev, listens on :8090 (Keycloak owns :8080 in the dev stack)
make test # unit tests
make build # compile to ./bin/tenant-registry
```
Env vars (override at the shell):
| Var | Default | Purpose |
|---|---|---|
| `APP_ENV` | `dev` | one of `dev`, `stage`, `prod` |
| `ADDR` | `:8090` | listen address (avoids Keycloak's :8080) |
| `KEYCLOAK_ISSUER` | `http://localhost:8080/realms/breakpilot-dev` | OIDC issuer URL (the JWT signer) |
| `DATABASE_URL` | empty (in-memory store fallback) | Postgres DSN; service uses Memory when empty |
| `KEYCLOAK_ADMIN_URL` | empty (Mock adapter used in dev) | KC base URL for the Admin API |
| `KEYCLOAK_REALM` | `breakpilot-dev` | Realm name for Admin API calls |
| `KEYCLOAK_CLIENT_ID` | empty | Service-account client id (Admin) |
| `KEYCLOAK_CLIENT_SECRET` | empty | Service-account client secret |
## Endpoints
Authoritative spec: [`openapi.yaml`](./openapi.yaml). Summary:
| Method | Path | Purpose |
|---|---|---|
| GET | `/healthz` | Liveness |
| GET | `/readyz` | Readiness: pings the store and, since M4.3, the Keycloak adapter's `Health` check |
| POST | `/v1/tenants` | Create a tenant |
| GET | `/v1/tenants/{id}` | Read by id |
| GET | `/v1/tenants/by-slug/{slug}` | Read by slug (portal middleware uses this) |
| POST | `/v1/tenants/{id}/activate` | trial → active |
| POST | `/v1/tenants/{id}/cancel` | active → frozen |
| GET | `/v1/entitlements?tenant_id={id}` | List product entitlements |
| GET | `/v1/catalog` | List requestable products |
| POST | `/v1/catalog/request` | Customer requests a product (sales follow-up) |
| POST | `/v1/catalog/trial-request` | Self-serve 14-day trial |
| GET | `/v1/api-keys?tenant_id={id}` | List keys |
| POST | `/v1/api-keys` | Create key (plaintext shown once) |
| DELETE | `/v1/api-keys/{id}` | Revoke |
| POST | `/v1/internal/api-keys/verify` | Used by headless products to validate inbound keys |
| POST | `/v1/internal/keycloak/claims` | Resolve a tenant's claim bundle (called by the realm's protocol mapper) |
| POST | `/v1/audit` | Append an audit event |
| GET | `/v1/audit` | Query (cursor-paginated) |
State-changing endpoints emit audit events automatically. The OpenAPI contract test (`openapi_test.go`) asserts every listed path resolves against the committed spec.
## Storage
The service picks its store based on `DATABASE_URL`:
- **empty** → in-memory store, pre-seeded with the `acme` tenant (`id: 00000000-0000-0000-0000-000000000001`). Useful for portal dev without spinning Postgres.
- **set** → pgx-backed Postgres. Run `make migrate-up` against the same DSN first.
Both implementations pass the same test harness (`eachStore` in `internal/server/server_test.go`).
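The selection rule can be sketched as a single constructor. This is an illustration only: the `Store` interface, the two stub types, and `NewStore` are hypothetical stand-ins, and the Postgres branch does not open a real connection.

```go
package main

import "fmt"

// Store is a stand-in for the service's storage interface (hypothetical shape).
type Store interface {
	Name() string
}

// SeedTenantID is the id of the pre-seeded acme tenant from the list above.
const SeedTenantID = "00000000-0000-0000-0000-000000000001"

// memoryStore models the in-memory fallback with its seeded tenant ids.
type memoryStore struct{ seeded []string }

func (memoryStore) Name() string { return "memory" }

// postgresStore models the pgx-backed store; it only records the DSN here.
type postgresStore struct{ dsn string }

func (postgresStore) Name() string { return "postgres" }

// NewStore picks the implementation from the DATABASE_URL value:
// empty selects the seeded in-memory store, anything else selects Postgres.
func NewStore(databaseURL string) Store {
	if databaseURL == "" {
		return memoryStore{seeded: []string{SeedTenantID}}
	}
	// A real implementation would open a pgx pool against the DSN here.
	return postgresStore{dsn: databaseURL}
}

func main() {
	fmt.Println(NewStore("").Name())
	fmt.Println(NewStore("postgres://localhost:5432/tenants").Name())
}
```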
## Keycloak adapter (M4.3)
`internal/keycloak` is the seam between tenant-registry and Keycloak. The
`Adapter` interface has two implementations:
| Implementation | When used |
|---|---|
| `Mock` | Default in dev when `KEYCLOAK_ADMIN_URL` is empty |
| `HTTPAdapter` | Real KC Admin API client; activated when KC env vars are populated |
`POST /v1/tenants` now accepts `admin_email` and `admin_name`. When set, the
adapter creates a Keycloak organization (alias = the tenant slug), invites
the user as the IT_ADMIN, and triggers the verify-email + set-password
flow. The response body includes `invite_url` so dev testers can use it
without waiting for the email — production discards it.
**KC failures are non-fatal.** The tenant row still lands; a
`keycloak.provision_failed` audit event captures the error so the operator
can resend the invite from the KC UI.
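The ordering matters here: the tenant is persisted first, and the Keycloak call only decides which audit event is written. A minimal sketch of that control flow, with simplified hypothetical types (`AuditEvent`, `TenantCreated`, `inviteFunc`) and an in-memory map standing in for the store:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable stands in for one of the adapter's typed errors.
var ErrUnavailable = errors.New("keycloak: unavailable")

// AuditEvent is a simplified audit record (assumed shape).
type AuditEvent struct {
	Action string
	Detail string
}

// TenantCreated loosely mirrors the {tenant, invite_url} response shape.
type TenantCreated struct {
	TenantSlug string
	InviteURL  string
}

// inviteFunc abstracts the adapter call and returns the invite URL.
type inviteFunc func(slug, adminEmail string) (string, error)

// CreateTenant persists the tenant first, then attempts Keycloak provisioning.
// A provisioning failure is recorded in the audit log and never rolls the
// tenant back.
func CreateTenant(tenants map[string]bool, audit *[]AuditEvent, invite inviteFunc, slug, adminEmail string) TenantCreated {
	tenants[slug] = true // the tenant row lands regardless of Keycloak
	out := TenantCreated{TenantSlug: slug}
	if adminEmail == "" {
		return out // no admin requested, so Keycloak is not contacted
	}
	url, err := invite(slug, adminEmail)
	if err != nil {
		*audit = append(*audit, AuditEvent{Action: "keycloak.provision_failed", Detail: err.Error()})
		return out
	}
	*audit = append(*audit, AuditEvent{Action: "keycloak.invite_sent"})
	out.InviteURL = url
	return out
}

func main() {
	tenants := map[string]bool{}
	var audit []AuditEvent
	failing := func(string, string) (string, error) { return "", ErrUnavailable }

	got := CreateTenant(tenants, &audit, failing, "acme", "it@acme.example")
	fmt.Println(tenants["acme"], got.InviteURL == "", audit[0].Action)
}
```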
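`POST /v1/internal/keycloak/claims` resolves a tenant's current claim
bundle (`tenant_id`, slug, products, plan, status). The tenant is taken from
the first populated field among `tenant_id`, `tenant_slug`,
`user_attrs.tenant_id`, and `user_attrs.tenant_slug` in the request body.
The realm's protocol mapper calls this at token-issuance time; operators can
also call it on demand when user attributes need a refresh.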
For production, provision a service-account client in the realm with the
`realm-management:manage-users` + `manage-organizations` roles. Drop its
credentials in Infisical at `/{env}/tenant-registry/KEYCLOAK_CLIENT_*`.
## Schema migrations (M4.1)
```bash
# Apply all pending migrations against the dev Postgres (assumes
# `make dev-up` in platform/orca-platform is running):
make migrate-up
# Inspect current version:
make migrate-version
# Roll back the most recent migration:
make migrate-down
# Wipe everything (DESTRUCTIVE — only safe against a dev DB):
make migrate-down-all
# Create the next pair of empty migration files:
make migrate-create NAME=add_team_table
```
Migrations are embedded into both `cmd/server` and `cmd/migrate` via `migrations/embed.go`. In production, `cmd/migrate` ships as an Orca init container so the schema is applied before the API server starts (`IMPLEMENTATION_PLAN.md §1.7`: migrations are forward-only and run as an init container before the service).
The migrations package ships three integration tests (require Docker):
| Test | What it asserts |
|---|---|
| `TestMigrate_upDownRoundTrip` | up → all 6 tables + 4 enums exist; down → schema empty; up again succeeds |
| `TestSeed_canInsertAndQuery` | end-to-end insert across all 6 tables, FK cascade behaviour, `audit_log` SET-NULL on tenant delete |
| `TestSlugConstraint` | tenant slug regex enforced (rejects too-short / leading dash / uppercase / underscore) |
Run them with `make test`. Use `make test-short` in environments without Docker.
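To make the `TestSlugConstraint` row concrete, the sketch below applies a comparable rule in Go. The pattern is an assumption chosen to reject the four documented cases (too short, leading dash, uppercase, underscore); the minimum length of 3 and maximum of 63 are guesses, and the authoritative regex is the one in the migration files.

```go
package main

import (
	"fmt"
	"regexp"
)

// slugRe is an assumed pattern: lowercase letters, digits, and dashes, with
// a letter or digit first and a total length of 3 to 63 characters.
var slugRe = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{2,62}$`)

// ValidSlug reports whether s satisfies the assumed slug rule.
func ValidSlug(s string) bool {
	return slugRe.MatchString(s)
}

func main() {
	// One accepted slug followed by the four rejected shapes from the table.
	for _, s := range []string{"acme", "ab", "-acme", "Acme", "ac_me"} {
		fmt.Printf("%q valid=%v\n", s, ValidSlug(s))
	}
}
```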
## Deployment
| Env | URL | How |
|---|---|---|
| dev | `http://localhost:8090` | `make dev` |
| stage | `https://tenant-registry.stage.breakpilot.com` | auto on merge to `main` |
| prod | `https://tenant-registry.breakpilot.com` | manual: tag `vX.Y.Z` + sign-off |
Rollback: `orca rollout undo tenant-registry --env=<env>`, where `<env>` is `stage` or `prod`.
## Observability
- Traces, logs, metrics: [SigNoz](https://signoz.meghsakha.com) — service name `tenant-registry`
- Audit events: Tenant Registry `GET /v1/audit` (Retraced-shape schema)
- On-call: `oncall@breakpilot.com` · runbook at `platform/docs/runbooks/tenant-registry.md`
## Contributing
See [`CONTRIBUTING.md`](./CONTRIBUTING.md). TL;DR: branch from main, open a PR, 1 review + green CI, squash-merge.
## License
Proprietary — all rights reserved. Copyright (c) 2026 Sharang Parnerkar and Benjamin Boenisch. See [`LICENSE`](./LICENSE).