feat(api): M4.2 — full REST surface + pgx-backed Postgres #7

Merged
sharang merged 3 commits from feat/m4.2-api into main 2026-05-19 10:52:00 +00:00

What

M4.2 in full: every endpoint in IMPLEMENTATION_PLAN.md §3 M4.2, the OpenAPI 3.1 spec at openapi.yaml, a contract test, and a pgx-backed Postgres store that's a drop-in for the in-memory one.

15 endpoints (17 with the healthz/readyz probes):
POST /v1/tenants · GET /v1/tenants/{id} · GET /v1/tenants/by-slug/{slug} · POST /v1/tenants/{id}/activate · POST /v1/tenants/{id}/cancel · GET /v1/entitlements?tenant_id= · GET /v1/catalog · POST /v1/catalog/request · POST /v1/catalog/trial-request · POST /v1/api-keys · GET /v1/api-keys?tenant_id= · DELETE /v1/api-keys/{id} · POST /v1/internal/api-keys/verify · POST /v1/audit · GET /v1/audit (filterable, cursor-paginated).

Auth: still none in this PR. M4.3 wires Keycloak JWT validation on the public surface and a bearer-only check on /v1/internal/*.

Why

Without M4.2 the portal can do nothing useful past loading the seeded acme tenant. Once this lands, every M5.2 surface (settings, billing, audit, API keys page) and every product-uplift milestone (M6/M7) has a real backend to call.

Linked milestone: M4.2

How

  • Store interface (internal/store/store.go): two implementations (Memory for dev convenience, Postgres for stage/prod). Handlers depend on the interface — never on a concrete type. Same test harness runs both via eachStore so any divergence shows up in CI.
  • Postgres impl uses pgxpool directly (not database/sql) for native types + better performance. Errors mapped via pgerrcode: unique → ErrConflict, FK violation → ErrNotFound, check violation → ErrInvalidInput. Mapping happens in one place (helpers.go mapStoreError).
  • API keys: plaintext format bp_<22 base64url chars>. Hash is argon2id with format-tagged encoding (argon2id|<salt>|<hash>) so we can rotate parameters without re-keying. Verify is constant-time. Plaintext is returned ONCE on create + never echoed in the list endpoint.
  • Audit: every state-changing endpoint emits via s.emitAudit() (fire-and-forget — failures logged, not returned to the caller, since the user-facing operation already succeeded). audit_log uses ON DELETE SET NULL so forensic history outlives tenant deletes.
  • Routing: Go 1.22 ServeMux can't disambiguate /v1/tenants/{id}/products from /v1/tenants/by-slug/{slug}: both patterns match /v1/tenants/by-slug/products and neither is more specific, so the mux panics at registration. Per-tenant sub-resources moved to query-param top-level paths: /v1/entitlements?tenant_id=…, /v1/api-keys?tenant_id=…. Documented inline.
  • OpenAPI contract test loads openapi.yaml via kin-openapi v0.138, validates the spec, and confirms every listed path resolves against the gorillamux-shaped router. Mismatch between handlers and spec breaks CI.
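
The single-place error mapping described in the Postgres bullet could look roughly like the sketch below. It is not the code in helpers.go: the local sqlStater interface stands in for *pgconn.PgError (which exposes SQLState()), fakePgError and statusFor are hypothetical helpers, and the status codes are taken from the test plan (400, 409, 404). The SQLSTATE values are the standard Postgres codes for unique, foreign-key, and check violations.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel errors named in the PR description.
var (
	ErrConflict     = errors.New("conflict")
	ErrNotFound     = errors.New("not found")
	ErrInvalidInput = errors.New("invalid input")
)

// sqlStater is satisfied by *pgconn.PgError. Using a local interface keeps
// this sketch free of third-party imports (an assumption for illustration).
type sqlStater interface{ SQLState() string }

// mapStoreError translates a driver error into a store sentinel and passes
// anything it does not recognise through unchanged.
func mapStoreError(err error) error {
	var pgErr sqlStater
	if !errors.As(err, &pgErr) {
		return err
	}
	switch pgErr.SQLState() {
	case "23505": // unique_violation
		return ErrConflict
	case "23503": // foreign_key_violation
		return ErrNotFound
	case "23514": // check_violation
		return ErrInvalidInput
	}
	return err
}

// statusFor is a hypothetical helper mapping sentinels to HTTP statuses.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrConflict):
		return http.StatusConflict
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrInvalidInput):
		return http.StatusBadRequest
	}
	return http.StatusInternalServerError
}

// fakePgError imitates a Postgres error so the sketch runs without a database.
type fakePgError struct{ code string }

func (e fakePgError) Error() string    { return "pg error " + e.code }
func (e fakePgError) SQLState() string { return e.code }

func main() {
	wrapped := fmt.Errorf("insert tenant: %w", fakePgError{code: "23505"})
	fmt.Println(statusFor(mapStoreError(wrapped)))
}
```

Because errors.As walks the wrap chain, the mapping still applies when the store wraps the driver error with extra context.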

Test plan

  • go test -short ./... — 4 packages, all green
  • go test ./... — full suite with Postgres testcontainers (~2 min)
  • OpenAPI contract test passes
  • make build + make build-migrate — both static binaries
  • Manual smoke: make dev against the dev stack, curl http://localhost:8090/v1/catalog returns 2 entries; full CRUD via curl exercised
  • Validation errors return documented 400s with the right error enum
  • Conflict on duplicate slug returns 409
  • FK violation when creating an entitlement / api_key for a missing tenant returns 404

Risk

Blast radius: repo-local. No prod consumer yet (portal still calls only GET /v1/tenants/by-slug/…).

What could break:

  • No auth. Any caller can mutate any tenant. Acceptable in dev (loopback) but explicitly gated until M4.3. The internal /api-keys/verify endpoint is also unauthenticated — fine on the cluster-internal network, but stage/prod manifests should not expose it.
  • API key argon2 params are tuned for dev (time=1, mem=64MB). Adequate but not aggressive. Tighten in M6.1+ when we have a verify-rate budget.
  • The OpenAPI contract test only checks operation matching, not parameter / body validation — that would need a router-level fixture more complex than is worth right now. Documented inline.

Rollback plan: revert the PR. The schema (M4.1) is unchanged so the in-memory store keeps working downstream.

Checklist

  • Unit + integration tests (memory + Postgres parity via eachStore)
  • Docs updated (README endpoint table + storage section, openapi.yaml)
  • Secrets via Infisical — DATABASE_URL is the only secret-shaped env; .env.example documents it
  • Migration is forward-only — uses M4.1 schema unchanged
  • Tenant scoping — n/a until M4.3 adds JWTs; documented in PR body
  • OpenAPI spec updated
  • featureFlags.evaluate() — n/a
  • CHANGELOG entry under "Added"
sharang added 1 commit 2026-05-19 10:45:27 +00:00
feat(api): M4.2 — full REST surface + pgx-backed Postgres store
ci / shared (pull_request) Successful in 5s
ci / test (pull_request) Failing after 1m30s
ci / image (pull_request) Has been skipped
4c46d673fb
Replaces the M5.1-skeleton handler set with the M4.2 spec from
IMPLEMENTATION_PLAN.md:

Endpoints (authoritative shape in openapi.yaml):
  POST   /v1/tenants
  GET    /v1/tenants/{id}
  GET    /v1/tenants/by-slug/{slug}
  POST   /v1/tenants/{id}/activate
  POST   /v1/tenants/{id}/cancel
  GET    /v1/entitlements?tenant_id=...
  GET    /v1/catalog
  POST   /v1/catalog/request
  POST   /v1/catalog/trial-request
  POST   /v1/api-keys                       returns plaintext ONCE
  GET    /v1/api-keys?tenant_id=...
  DELETE /v1/api-keys/{id}
  POST   /v1/internal/api-keys/verify       always 200; valid: bool
  POST   /v1/audit
  GET    /v1/audit?{tenant_id,product,actor_id,action,since,until,limit,cursor}
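
As an illustration of the "always 200; valid: bool" contract on the verify endpoint, here is a minimal sketch. It assumes the intent is that unknown or revoked keys yield 200 with valid=false instead of an error status. The request field name "key", the isValid callback, the callVerify helper, and the 400 on a malformed body are all assumptions, not taken from openapi.yaml.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// verifyRequest and verifyResponse are hypothetical wire shapes.
type verifyRequest struct {
	Key string `json:"key"` // field name is an assumption
}

type verifyResponse struct {
	Valid bool `json:"valid"`
}

// verifyHandler answers 200 for every well-formed request and reports the
// outcome only in the body, so callers cannot tell "unknown" from "revoked".
func verifyHandler(isValid func(key string) bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req verifyRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			// Assumption: malformed bodies follow the general 400 rule.
			http.Error(w, "malformed body", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		// The first write sends an implicit 200.
		_ = json.NewEncoder(w).Encode(verifyResponse{Valid: isValid(req.Key)})
	}
}

// callVerify drives the handler in-process and returns status and verdict.
func callVerify(body string) (int, bool) {
	h := verifyHandler(func(key string) bool { return key == "bp_known" })
	req := httptest.NewRequest(http.MethodPost, "/v1/internal/api-keys/verify", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var resp verifyResponse
	_ = json.NewDecoder(rec.Body).Decode(&resp)
	return rec.Code, resp.Valid
}

func main() {
	fmt.Println(callVerify(`{"key":"bp_known"}`))
	fmt.Println(callVerify(`{"key":"bp_other"}`))
}
```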

Architecture:
  internal/store/store.go        Store interface (CRUD + audit + ping)
  internal/store/memory.go       in-process impl, used when DATABASE_URL
                                 is empty (seed acme tenant, no migrations)
  internal/store/postgres.go     pgxpool impl against the M4.1 schema
  internal/server/server.go      router + healthz/readyz
  internal/server/{tenants,catalog,apikeys,audit}.go
                                 per-concern handlers (≤250 LoC each)
  internal/server/helpers.go     writeJSON/writeError/error mapping/log mw
  openapi.yaml                   3.1 spec; openapi_test.go is the contract gate

API keys:
  Plaintext format 'bp_<22-char base64>'. Prefix bp_<8> stored for UI.
  Hash is argon2id(salt, time=1, mem=64MB, threads=4, len=32) encoded as
  'argon2id|<salt-b64>|<hash-b64>'. Format-tagged so we can rotate
  parameters without re-keying. Verify is constant-time.
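
A minimal sketch of the key format and the format-tagged hash encoding follows, under stated assumptions. To stay stdlib-only it swaps in SHA-256 for the key derivation, so standInDerive is NOT argon2id; the real store would call argon2id with the parameters listed above. The 16-byte salt, the unpadded standard base64 for salt and hash, and all function names are assumptions. Sixteen random bytes encode to exactly 22 unpadded base64url characters, which matches the plaintext format.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

// deriveFunc turns (plaintext, salt) into a fixed-length hash.
type deriveFunc func(plaintext, salt []byte) []byte

// standInDerive is a placeholder for argon2id so the sketch needs no
// third-party module. Do not use a bare SHA-256 for real key hashing.
func standInDerive(plaintext, salt []byte) []byte {
	h := sha256.New()
	h.Write(salt)
	h.Write(plaintext)
	return h.Sum(nil)
}

// newPlaintextKey returns "bp_" followed by 22 base64url characters.
func newPlaintextKey() (string, error) {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return "bp_" + base64.RawURLEncoding.EncodeToString(raw), nil
}

// displayPrefix is the "bp_" tag plus 8 characters, the part kept for the UI.
func displayPrefix(key string) string { return key[:11] }

// encodeHash produces "argon2id|<salt-b64>|<hash-b64>".
func encodeHash(plaintext string, derive deriveFunc) (string, error) {
	salt := make([]byte, 16) // salt length is an assumption
	if _, err := rand.Read(salt); err != nil {
		return "", err
	}
	enc := base64.RawStdEncoding
	hash := derive([]byte(plaintext), salt)
	return "argon2id|" + enc.EncodeToString(salt) + "|" + enc.EncodeToString(hash), nil
}

// verifyKey re-derives the hash with the stored salt and compares the two
// digests in constant time. Unknown tags are rejected, which is what lets a
// future format coexist with this one.
func verifyKey(plaintext, encoded string, derive deriveFunc) bool {
	parts := strings.Split(encoded, "|")
	if len(parts) != 3 || parts[0] != "argon2id" {
		return false
	}
	enc := base64.RawStdEncoding
	salt, err := enc.DecodeString(parts[1])
	if err != nil {
		return false
	}
	want, err := enc.DecodeString(parts[2])
	if err != nil {
		return false
	}
	got := derive([]byte(plaintext), salt)
	return subtle.ConstantTimeCompare(got, want) == 1
}

func main() {
	key, err := newPlaintextKey()
	if err != nil {
		panic(err)
	}
	encoded, err := encodeHash(key, standInDerive)
	if err != nil {
		panic(err)
	}
	fmt.Println(displayPrefix(key), verifyKey(key, encoded, standInDerive))
}
```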

Store selection:
  cmd/server picks Postgres when DATABASE_URL is set, otherwise Memory.
  Both implementations are exercised by the same eachStore test harness —
  parity is enforced.
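
The selection rule and the parity harness can be sketched as below. This is a deliberately tiny stand-in, not the real internal/store interface: the two-method Store, the Tenant fields, the ID scheme, and the openPostgres callback are assumptions, and the real eachStore presumably takes a *testing.T and runs subtests instead of returning an error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrConflict = errors.New("conflict")
	ErrNotFound = errors.New("not found")
)

type Tenant struct{ ID, Slug string }

// Store is a reduced, hypothetical version of the real interface.
type Store interface {
	CreateTenant(slug string) (Tenant, error)
	TenantBySlug(slug string) (Tenant, error)
}

// Memory is the in-process implementation, seeded with the acme tenant.
type Memory struct {
	mu     sync.Mutex
	bySlug map[string]Tenant
}

func NewMemory() *Memory {
	return &Memory{bySlug: map[string]Tenant{"acme": {ID: "t-1", Slug: "acme"}}}
}

func (m *Memory) CreateTenant(slug string) (Tenant, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, exists := m.bySlug[slug]; exists {
		return Tenant{}, ErrConflict
	}
	created := Tenant{ID: fmt.Sprintf("t-%d", len(m.bySlug)+1), Slug: slug}
	m.bySlug[slug] = created
	return created, nil
}

func (m *Memory) TenantBySlug(slug string) (Tenant, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	found, ok := m.bySlug[slug]
	if !ok {
		return Tenant{}, ErrNotFound
	}
	return found, nil
}

// selectStore mirrors the cmd/server rule: Postgres when DATABASE_URL is
// set, otherwise Memory. openPostgres is injected to avoid a pgx import.
func selectStore(databaseURL string, openPostgres func(url string) (Store, error)) (Store, error) {
	if databaseURL == "" {
		return NewMemory(), nil
	}
	return openPostgres(databaseURL)
}

// eachStore runs one check against a fresh instance of every implementation
// and reports the first divergence, tagged with the implementation name.
func eachStore(factories map[string]func() Store, check func(s Store) error) error {
	for name, newStore := range factories {
		if err := check(newStore()); err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
	}
	return nil
}

func main() {
	factories := map[string]func() Store{
		"memory": func() Store { return NewMemory() },
		// A "postgres" factory would be registered here when a test
		// database is available.
	}
	err := eachStore(factories, func(s Store) error {
		if _, err := s.CreateTenant("acme"); !errors.Is(err, ErrConflict) {
			return fmt.Errorf("duplicate slug: got %v, want ErrConflict", err)
		}
		return nil
	})
	fmt.Println("parity check error:", err)
}
```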

Audit:
  Every state-changing endpoint emits via s.emitAudit() (fire-and-forget).
  audit_log uses ON DELETE SET NULL on tenant_id so forensic history
  outlives tenant deletes (per M4.1 schema).
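
The fire-and-forget behaviour can be sketched as follows: the audit write is attempted, a failure is logged, and the caller still sees the result of the primary operation. The AuditEvent fields, the auditStore interface, the "tenant.activate" action name, and the synchronous call (as opposed to a goroutine) are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"log"
)

// AuditEvent is a hypothetical, reduced event shape.
type AuditEvent struct {
	TenantID string
	Action   string
}

type auditStore interface {
	AppendAudit(ctx context.Context, ev AuditEvent) error
}

type Server struct {
	store  auditStore
	logger *log.Logger
}

// emitAudit has no return value on purpose: a failed audit write is logged
// and never surfaces to the HTTP caller.
func (s *Server) emitAudit(ctx context.Context, ev AuditEvent) {
	if err := s.store.AppendAudit(ctx, ev); err != nil {
		s.logger.Printf("audit emit failed: action=%s tenant=%s err=%v", ev.Action, ev.TenantID, err)
	}
}

// activateTenant stands in for a state-changing handler. Its result depends
// only on the primary operation, which is elided here.
func (s *Server) activateTenant(ctx context.Context, tenantID string) error {
	// ... primary state change would happen here ...
	s.emitAudit(ctx, AuditEvent{TenantID: tenantID, Action: "tenant.activate"})
	return nil
}

// brokenStore simulates an unavailable audit_log table.
type brokenStore struct{}

func (brokenStore) AppendAudit(context.Context, AuditEvent) error {
	return errors.New("audit_log unavailable")
}

// runDemo returns what was logged and the error the caller would see.
func runDemo() (string, error) {
	var buf bytes.Buffer
	s := &Server{store: brokenStore{}, logger: log.New(&buf, "", 0)}
	err := s.activateTenant(context.Background(), "t-1")
	return buf.String(), err
}

func main() {
	logged, err := runDemo()
	fmt.Printf("caller error: %v\nlog: %s", err, logged)
}
```

The trade-off is that a lost audit row is only visible in logs, so log-based alerting on the failure message would be needed to notice gaps.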

Routing constraint:
  Go 1.22 ServeMux can't disambiguate /v1/tenants/{id}/products from
  /v1/tenants/by-slug/{slug=products}. Per-tenant subresources moved to
  query-param top-level paths: /v1/entitlements?tenant_id=… and
  /v1/api-keys?tenant_id=….

Tests:
  Every endpoint exercised against both Memory and Postgres via the
  eachStore harness. Includes happy paths, validation errors, conflicts,
  404s, auto-audit-emit assertion. testcontainers-go for the postgres
  harness; gated by -short.

  TestOpenAPISpec is the contract gate: every documented operation must
  resolve against the router. (kin-openapi v0.138.0.)

Refs: M4.2
CODEOWNERS rules requested review from Benjamin_Boenisch 2026-05-19 10:45:27 +00:00
sharang added 1 commit 2026-05-19 10:48:08 +00:00
ci(tenant-registry): use -coverpkg so server tests count store coverage
ci / shared (pull_request) Successful in 5s
ci / test (pull_request) Failing after 1m0s
ci / image (pull_request) Has been skipped
4852838347
The store package has no test files of its own (its API is exercised
end-to-end through the server's eachStore harness against both Memory
and Postgres). Without -coverpkg, store/* shows 0% and drags the
total below the 70% gate even though every store method is run.

-coverpkg=./internal/... routes the instrumentation from server tests
into store + config + server alike.

Refs: M4.2
sharang added 1 commit 2026-05-19 10:50:09 +00:00
ci(tenant-registry): drop store from go-test, keep it in -coverpkg
ci / shared (pull_request) Successful in 6s
ci / test (pull_request) Successful in 1m29s
ci / image (pull_request) Has been skipped
96c9586b4d
go: no such tool 'covdata' fires when go test tries to build a test
binary for a package with zero _test.go files under -coverpkg. The
store package has no tests of its own (exercised via the server
harness); excluding it from the test command sidesteps the error
while -coverpkg still counts its coverage from server-side exercise.

Refs: M4.2
sharang merged commit ffab866c87 into main 2026-05-19 10:52:00 +00:00
sharang deleted branch feat/m4.2-api 2026-05-19 10:52:00 +00:00

Reference: platform/tenant-registry#7