feat(m7.3): scheduler pulls tenants from registry, env as fallback #96

Open
sharang wants to merge 1 commit from feat/m7.3-scheduler-tenant-registry-v2 into feat/m7.3-admin-endpoints

Summary

Replaces the M7.2-C static SCHEDULER_TENANT_IDS env enumeration with a live query to the tenant-registry at every tick. New tenants get picked up without an agent restart; the env stays as a fallback so the scheduler is never silenced by a registry outage.

Stacked on #95 (admin endpoints) since that PR added tenant_registry_url to AgentConfig. Once #95 lands this auto-retargets to main.

Resolution order

  1. agent.config.tenant_registry_url → GET <url>/v1/tenants
    • 5s timeout (kept short — we'd rather fall back than block the tick)
    • Frozen and Archived tenants filtered out (the M7.1 status gate would 402/410 them anyway)
    • Accepts either {"id":"..."} or {"tenant_id":"..."} for forward compatibility with whatever shape the registry settles on
  2. SCHEDULER_TENANT_IDS env (comma-separated) — fallback when the registry URL is unset OR the fetch fails OR the parsed response is empty. Each failure mode logs a warn with the url so operators see the problem.
  3. DEFAULT_SCHEDULER_TENANT_ID ("dev") — last-ditch fallback so a bare cargo run against a clean Mongo still scans the dev tenant.
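The resolution order above can be sketched in plain Rust. This is a rough illustration, not the code in this PR: `resolve_tenants`, `TenantSource`, and the injected `fetch` closure are hypothetical names, `tenants_from_env` is inferred from the test name in the test plan, and `eprintln!` stands in for whatever logger the agent uses. The HTTP call, its 5s timeout, and the status filter are assumed to live inside `fetch`.

```rust
/// Label for where the tenant list came from (hypothetical type).
#[derive(Debug, PartialEq)]
enum TenantSource {
    Registry,
    Env,
}

const DEFAULT_SCHEDULER_TENANT_ID: &str = "dev";

/// Steps 2 and 3: parse the comma-separated env value, or fall back to "dev".
/// `raw` is the value of SCHEDULER_TENANT_IDS, `None` when unset.
fn tenants_from_env(raw: Option<&str>) -> Vec<String> {
    let ids: Vec<String> = raw
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    if ids.is_empty() {
        vec![DEFAULT_SCHEDULER_TENANT_ID.to_string()]
    } else {
        ids
    }
}

/// Step 1 first; any registry failure mode falls through to the env path.
/// `fetch` is an assumed stand-in for the GET <url>/v1/tenants call.
fn resolve_tenants<F>(
    registry_url: Option<&str>,
    fetch: F,
    env_raw: Option<&str>,
) -> (Vec<String>, TenantSource)
where
    F: Fn(&str) -> Result<Vec<String>, String>,
{
    if let Some(url) = registry_url {
        match fetch(url) {
            Ok(ids) if !ids.is_empty() => return (ids, TenantSource::Registry),
            Ok(_) => eprintln!("warn: registry returned no tenants (url={url})"),
            Err(e) => eprintln!("warn: registry fetch failed (url={url}): {e}"),
        }
    }
    (tenants_from_env(env_raw), TenantSource::Env)
}
```

Passing the fetch in as a closure keeps the fallback logic testable without a network or a running registry.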

Why fresh on every tick

Tick frequency is every few hours (default scan_schedule = "0 0 */6 * * *"). The registry call happens at most 4 times a day per agent — cheap. Caching introduces a staleness window for newly provisioned tenants, and the whole point of registry integration is to pick them up fast.
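The "4 times a day" figure follows from the hour field `*/6` in the six-field schedule (seconds, minutes, hours, day of month, month, day of week): it matches hours 0, 6, 12 and 18. A tiny sketch of that arithmetic, where `fires_per_day` is a hypothetical helper and not part of the agent:

```rust
/// Number of matching hours per day for an hour field of the form `*/step`,
/// assuming the seconds and minutes fields are fixed (as in "0 0 */6 * * *").
/// `step` must be non-zero; `step_by(0)` panics.
fn fires_per_day(step: usize) -> usize {
    (0..24).step_by(step).count()
}
```

With the default step of 6 this yields 4 registry calls per agent per day.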

Startup log

Includes tenant source=tenant-registry or env so operators can tell at a glance which mode the scheduler is in.

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy -p compliance-agent -- -D warnings clean
  • cargo test -p compliance-agent --lib (232 pass, +3 new):
    • filter_active_keeps_running_skips_frozen_archived
    • deserialize_registry_response_accepts_id_or_tenant_id
    • tenants_from_env_resolution (single test covering unset → default, csv → splits, "" → default — collapsed to one to avoid env-var test races)
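For reviewers who want the gist of the status filter without opening the diff, a minimal sketch of what `filter_active_keeps_running_skips_frozen_archived` exercises might look like the following. The `TenantStatus` and `RegistryTenant` types and the `filter_active` signature are assumptions made for illustration; the real types in the agent may differ.

```rust
/// Lifecycle states named in this PR; the exact variant set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TenantStatus {
    Running,
    Frozen,
    Archived,
}

/// Hypothetical shape of one parsed registry entry.
#[derive(Debug, Clone)]
struct RegistryTenant {
    id: String,
    status: TenantStatus,
}

/// Keep only Running tenants and return their ids in registry order.
fn filter_active(tenants: &[RegistryTenant]) -> Vec<String> {
    tenants
        .iter()
        .filter(|t| t.status == TenantStatus::Running)
        .map(|t| t.id.clone())
        .collect()
}
```

Filtering on the scheduler side avoids queueing scans that the M7.1 status gate would reject anyway.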

Production

  • Set TENANT_REGISTRY_URL in orca-infra alongside KEYCLOAK_URL when the registry is ready. Until then, the scheduler keeps using SCHEDULER_TENANT_IDS — no operator action needed.
  • Future M7.4 cleanup: once tenant-registry adoption is universal, delete SCHEDULER_TENANT_IDS env support entirely.

🤖 Generated with Claude Code

sharang added 1 commit 2026-06-18 11:07:39 +00:00
feat(m7.3): scheduler pulls tenants from registry, env as fallback
CI / Check (pull_request) Successful in 8m8s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
608611423b
Replaces the M7.2-C static `SCHEDULER_TENANT_IDS` env enumeration with
a live query to the tenant-registry at every tick. New tenants get
picked up without an agent restart; the env stays as the fallback
when the registry is unreachable so the scheduler is never silenced
by a registry outage.

Resolution order
1. agent.config.tenant_registry_url → GET <url>/v1/tenants
   - 5s timeout (kept short — we'd rather fall back than block the
     tick)
   - Frozen and Archived tenants filtered out (the M7.1 status gate
     would 402/410 them anyway, no point scanning their repos)
   - Accepts either {"id"} or {"tenant_id"} for forward compatibility
     with whatever shape the registry settles on
2. SCHEDULER_TENANT_IDS env (comma-separated) — fallback when the
   registry URL is unset OR the fetch fails OR the parsed response is
   empty. Each failure mode logs a warn with the url so operators see
   the problem.
3. DEFAULT_SCHEDULER_TENANT_ID ("dev") — last-ditch fallback so a
   bare `cargo run` against a clean Mongo still scans the dev tenant.

Why each tick instead of caching
- Tick frequency is every few hours (scan_schedule default
  "0 0 */6 * * *"). The registry call is at most 4 times a day per
  agent — cheap.
- Caching introduces a staleness window for newly provisioned
  tenants. The whole point of registry integration is to pick them
  up fast.

Startup log
- Includes "tenant source=tenant-registry" or "env" so operators can
  tell at a glance which mode the scheduler is in.

Test plan
- cargo fmt --all clean
- cargo clippy -p compliance-agent -- -D warnings clean
- cargo test -p compliance-agent --lib — 232 pass (+3 new):
    * filter_active_keeps_running_skips_frozen_archived
    * deserialize_registry_response_accepts_id_or_tenant_id (covers
      the {"id"|"tenant_id"} alias)
    * tenants_from_env_resolution (single test covering unset →
      default, csv → splits, "" → default — collapsed to one to
      avoid env-var test races)

Production
- Set TENANT_REGISTRY_URL in orca-infra alongside KEYCLOAK_URL when
  the registry is ready to serve. Until then, scheduler keeps using
  SCHEDULER_TENANT_IDS — no operator action needed.
- Future M7.4 cleanup: once tenant-registry adoption is universal,
  delete SCHEDULER_TENANT_IDS env support entirely.

Stacked on #95 (admin endpoints) since that PR added
tenant_registry_url to AgentConfig. Once #95 lands this auto-
retargets to main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reference: sharang/compliance-scanner-agent#96