feat(keycloak): M4.3 — Admin API adapter + claim resolver #8

Merged
sharang merged 2 commits from feat/m4.3-keycloak into main 2026-05-19 11:51:09 +00:00

What

M4.3 in full: internal/keycloak adapter, KC provisioning hooked into POST /v1/tenants, claims-resolver endpoint, plus the test scaffolding.

  • internal/keycloak/Adapter interface with HTTPAdapter (real KC Admin API, cached client-credentials token, 401 retry) and Mock (in-process; used in tests + when KEYCLOAK_ADMIN_URL is empty).
  • POST /v1/tenants now accepts admin_email + admin_name. When present, the adapter creates a KC organization, invites the user as IT_ADMIN, and triggers VERIFY_EMAIL + UPDATE_PASSWORD. Response wraps the tenant with the new TenantCreated shape ({tenant, invite_url}) so dev testers can use the action-token URL without waiting for the email.
  • POST /v1/internal/keycloak/claims — resolves the up-to-date claim bundle (tenant_id, slug, products, plan, status) for a tenant. The realm's protocol mapper calls this at token issuance.

Why

Without the KC adapter, the portal's OIDC login lands in a realm with hand-edited user attributes: every new tenant requires manual clicks in the KC admin UI. M4.3 makes POST /v1/tenants the only way a tenant comes into existence, with KC kept in sync.

The non-fatal failure mode is deliberate. If KC is unreachable during tenant creation, the DB row still lands and a keycloak.provision_failed audit event captures the diagnostic. M14.x's reconciler will heal that drift; until it ships, a one-click operator fix in KC resolves it.

Linked milestone: M4.3

How

  • Token caching in client.go: cache the client-credentials token until 30s before expiry, force-refresh on 401, retry once. Token-refresh races are guarded by a sync.Mutex.
  • Atomicity boundary: the tenant DB insert and the KC org provision are separate operations. The handler emits keycloak.invite_sent when provisioning succeeds and keycloak.provision_failed when it fails. A future M14-style reconciler can compare tenants against KC organizations to detect drift.
  • Mock.FailNext in mock.go is the test-friendly way to assert error handling: set it once, the next call fails with that error, then the hook clears. Used in TestCreateTenant_kcFailure_doesNotRollback.
  • Claims resolver lookup chain: body tenant_id → body tenant_slug → user_attrs.tenant_id → user_attrs.tenant_slug. Returns a typed Claims struct identical to what SyncClaims pushes, so there is a single source of truth for the JWT claim shape.

Test plan

  • go test -short ./... — green
  • go test ./... (Postgres testcontainers harness) — green
  • Mock unit tests: org-conflict, user-conflict, FailNext, SyncClaims persistence, empty user_id rejection
  • eachStore integration tests: full provision happy path, KC-failure-doesn't-rollback, claims resolution via every lookup variant, 400 and 404 paths
  • OpenAPI contract test extended to cover /v1/internal/keycloak/claims
  • Real-KC integration test (testcontainers KC 26) deferred to a nightly job — too slow for every PR

Risk

Blast radius: dev only. No production tenant-registry instance to call yet.

What could break:

  • The Admin API client assumes Keycloak 26 endpoint shapes (/admin/realms/{realm}/organizations, /admin/realms/{realm}/users/{id}/execute-actions-email). If we ever downgrade KC, this breaks loudly.
  • The HTTPAdapter doesn't have circuit-breaker / retry beyond the single 401 re-auth. A flaky KC will surface as 500s on POST /v1/tenants. Fine for now; revisit when we have SLOs.

Rollback plan: revert the PR. KC keeps any orgs/users that were created; the M14.x reconciler will clean them up. No data loss.

Checklist

  • Unit + integration tests
  • Docs updated (README has a Keycloak adapter section; CHANGELOG entry)
  • Secrets via Infisical — KEYCLOAK_CLIENT_SECRET documented in .env.example with the Infisical path comment
  • OpenAPI spec updated (new endpoint + TenantCreated/Claims schemas)
  • Tenant scoping — still pre-auth; M5.2 portal pages will be the first JWT consumer
  • CHANGELOG entry under "Added"
CODEOWNERS rules requested review from Benjamin_Boenisch 2026-05-19 11:25:26 +00:00
sharang force-pushed feat/m4.3-keycloak from fd5f8ae36f to bb2c638fb4 2026-05-19 11:27:18 +00:00
sharang closed this pull request 2026-05-19 11:38:31 +00:00
sharang reopened this pull request 2026-05-19 11:38:33 +00:00
sharang force-pushed feat/m4.3-keycloak from 5b5c16aa90 to 4639915827 2026-05-19 11:40:46 +00:00
sharang closed this pull request 2026-05-19 11:43:57 +00:00
sharang reopened this pull request 2026-05-19 11:43:59 +00:00
sharang added 1 commit 2026-05-19 11:47:05 +00:00
feat(keycloak): M4.3 — Admin API adapter + claim resolver
ci / image (pull_request) Has been skipped
ci / shared (pull_request) Successful in 6s
ci / test (pull_request) Successful in 1m36s
d4e8042b94
internal/keycloak/ — Adapter interface with two implementations:
  HTTPAdapter  cached client-credentials token; CreateOrgAndInvite +
               SyncClaims + Health against the real KC Admin API.
  Mock         in-process map for unit tests + dev convenience when
               KEYCLOAK_ADMIN_URL is empty. Used by the eachStore harness.

POST /v1/tenants now accepts admin_email + admin_name. When set, the
adapter creates a KC organization, invites the user as IT_ADMIN, and
triggers VERIFY_EMAIL + UPDATE_PASSWORD. Response wraps the tenant
with TenantCreated{tenant, invite_url}. KC failures DO NOT roll the
tenant back — they emit a keycloak.provision_failed audit event.
Successful invites emit keycloak.invite_sent.

POST /v1/internal/keycloak/claims resolves a tenant's current claim
bundle (tenant_id, slug, products, plan, status). Lookup chain:
body.tenant_id → body.tenant_slug → user_attrs.tenant_id →
user_attrs.tenant_slug.

Config: KEYCLOAK_ADMIN_URL / REALM / CLIENT_ID / CLIENT_SECRET;
empty URL falls back to Mock.
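Adapter selection from config might look like the sketch below. The variable names expand the KEYCLOAK_ADMIN_URL / REALM / CLIENT_ID / CLIENT_SECRET shorthand above and should be checked against .env.example. The empty-URL fallback to Mock comes from the PR. Rejecting a half-filled configuration is an extra assumption of this sketch, and the reduced Adapter interface with its Name method exists only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

type Config struct{ AdminURL, Realm, ClientID, ClientSecret string }

// configFromEnv reads the (assumed) KEYCLOAK_* variables via an injectable getenv.
func configFromEnv(getenv func(string) string) Config {
	return Config{
		AdminURL:     getenv("KEYCLOAK_ADMIN_URL"),
		Realm:        getenv("KEYCLOAK_REALM"),
		ClientID:     getenv("KEYCLOAK_CLIENT_ID"),
		ClientSecret: getenv("KEYCLOAK_CLIENT_SECRET"),
	}
}

// Adapter is reduced to a name here; the real interface has the KC methods.
type Adapter interface{ Name() string }

type httpAdapter struct{ cfg Config }

func (httpAdapter) Name() string { return "http" }

type mockAdapter struct{}

func (mockAdapter) Name() string { return "mock" }

var errIncompleteConfig = errors.New("KEYCLOAK_ADMIN_URL is set but realm or client credentials are missing")

// newAdapter: empty URL selects the in-process mock (per the PR); a partial
// config is rejected (assumption of this sketch).
func newAdapter(cfg Config) (Adapter, error) {
	if cfg.AdminURL == "" {
		return mockAdapter{}, nil
	}
	if cfg.Realm == "" || cfg.ClientID == "" || cfg.ClientSecret == "" {
		return nil, errIncompleteConfig
	}
	return httpAdapter{cfg: cfg}, nil
}

func main() {
	a, err := newAdapter(configFromEnv(os.Getenv))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("keycloak adapter:", a.Name())
}
```

Injecting getenv keeps the selection logic testable without touching the process environment.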

Tests:
  internal/keycloak/mock_test.go     conflict surfacing, FailNext hook,
                                     SyncClaims persistence.
  internal/keycloak/client_test.go   HTTPAdapter against an in-process
                                     stub KC: health, full create-org-
                                     and-invite, conflict, token-cache,
                                     401 retry, ErrUnavailable.
  internal/server/keycloak_test.go   eachStore integration: provisions
                                     via mock; failure path emits
                                     provision_failed audit; claims
                                     endpoint via every lookup variant
                                     + 404 + 400.

OpenAPI extended with TenantCreated + Claims schemas and the new
claims endpoint. Contract test asserts the new path.

CI: include internal/keycloak/... in the test package list so
HTTPAdapter coverage counts. Total project line coverage: 71.6%.

Refs: M4.3
sharang force-pushed feat/m4.3-keycloak from 15bc3c40bd to d4e8042b94 2026-05-19 11:47:05 +00:00
sharang closed this pull request 2026-05-19 11:48:01 +00:00
sharang reopened this pull request 2026-05-19 11:48:07 +00:00
sharang added 1 commit 2026-05-19 11:49:07 +00:00
chore: kick ci
ci / shared (pull_request) Successful in 6s
ci / test (pull_request) Successful in 1m38s
ci / image (pull_request) Has been skipped
b6c4df6ed0
sharang merged commit 9138731eea into main 2026-05-19 11:51:09 +00:00
sharang deleted branch feat/m4.3-keycloak 2026-05-19 11:51:09 +00:00
No Reviewers
No Label
1 Participant
Reference: platform/tenant-registry#8