feat(m7.3): scheduler pulls tenants from registry, env as fallback #96
## Summary

Replaces the M7.2-C static `SCHEDULER_TENANT_IDS` env enumeration with a live query to the tenant-registry at every tick. New tenants get picked up without an agent restart; the env stays as a fallback so the scheduler is never silenced by a registry outage.

Stacked on #95 (admin endpoints) since that PR added `tenant_registry_url` to `AgentConfig`. Once #95 lands this auto-retargets to main.

## Resolution order

1. `agent.config.tenant_registry_url` → `GET <url>/v1/tenants`
   - 5s timeout (kept short: we'd rather fall back than block the tick)
   - Frozen and Archived tenants are filtered out (the M7.1 status gate would 402/410 them anyway, so there is no point scanning their repos)
   - Accepts either `{"id":"..."}` or `{"tenant_id":"..."}` for forward compatibility with whatever shape the registry settles on
2. `SCHEDULER_TENANT_IDS` env (comma-separated): fallback when the registry URL is unset OR the fetch fails OR the parsed response is empty. Each failure mode logs a `warn` with the url so operators see the problem.
3. `DEFAULT_SCHEDULER_TENANT_ID` (`"dev"`): last-ditch fallback so a bare `cargo run` against a clean Mongo still scans the dev tenant.

## Why fresh on every tick

Tick frequency is every few hours (default `scan_schedule = "0 0 */6 * * *"`). The registry call happens at most 4 times a day per agent, which is cheap. Caching introduces a staleness window for newly provisioned tenants, and the whole point of registry integration is to pick them up fast.

## Startup log

Includes `tenant source=tenant-registry` or `env` so operators can tell at a glance which mode the scheduler is in.

## Test plan

- `cargo fmt --all -- --check` clean
- `cargo clippy -p compliance-agent -- -D warnings` clean
- `cargo test -p compliance-agent --lib`: 232 pass (+3 new):
  - `filter_active_keeps_running_skips_frozen_archived`
  - `deserialize_registry_response_accepts_id_or_tenant_id`
  - `tenants_from_env_resolution` (single test covering unset → default, csv → splits, `""` → default; collapsed to one to avoid env-var test races)

## Production

- Set `TENANT_REGISTRY_URL` in orca-infra alongside `KEYCLOAK_URL` when the registry is ready. Until then, the scheduler keeps using `SCHEDULER_TENANT_IDS`; no operator action needed.
- Future M7.4 cleanup: once tenant-registry adoption is universal, delete `SCHEDULER_TENANT_IDS` env support entirely.

🤖 Generated with Claude Code
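For reviewers who want the fallback chain in one place, here is a minimal std-only sketch of the resolution order described in this PR. It is not the code from the diff: the names `TenantSource`, `tenants_from_env_value`, and `resolve_tenants` are hypothetical, the registry fetch is modelled as an already-computed `Option<Result<..>>`, `eprintln!` stands in for the real `warn` logging, and trimming whitespace around CSV entries is an assumption.

```rust
/// Last-ditch default named in the PR description.
const DEFAULT_SCHEDULER_TENANT_ID: &str = "dev";

/// Which source produced the tenant list (hypothetical type; reporting the
/// default as `Env` is an assumption of this sketch).
#[derive(Debug, PartialEq)]
enum TenantSource {
    Registry,
    Env,
}

/// Parse a comma-separated SCHEDULER_TENANT_IDS value.
/// Unset or empty input falls back to the default tenant.
fn tenants_from_env_value(raw: Option<&str>) -> Vec<String> {
    let ids: Vec<String> = raw
        .unwrap_or("")
        .split(',')
        .map(str::trim) // assumption: surrounding whitespace is ignored
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    if ids.is_empty() {
        vec![DEFAULT_SCHEDULER_TENANT_ID.to_string()]
    } else {
        ids
    }
}

/// `registry` is the outcome of the registry call:
/// `None` when no URL is configured, `Some(Err(..))` when the fetch failed,
/// `Some(Ok(ids))` with the already-filtered tenant ids otherwise.
fn resolve_tenants(
    registry: Option<Result<Vec<String>, String>>,
    env_value: Option<&str>,
) -> (Vec<String>, TenantSource) {
    match registry {
        Some(Ok(ids)) if !ids.is_empty() => return (ids, TenantSource::Registry),
        // Stand-ins for the warn-level logs mentioned in the description.
        Some(Ok(_)) => eprintln!("warn: tenant registry returned no tenants"),
        Some(Err(e)) => eprintln!("warn: tenant registry fetch failed: {}", e),
        None => {}
    }
    (tenants_from_env_value(env_value), TenantSource::Env)
}
```

Keeping the fetch outcome as a plain value makes the fallback logic testable without a network call or process-wide env mutation, which is the same concern that led to collapsing the env tests into one.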
Replaces the M7.2-C static `SCHEDULER_TENANT_IDS` env enumeration with a
live query to the tenant-registry at every tick. New tenants get picked
up without an agent restart; the env stays as the fallback when the
registry is unreachable so the scheduler is never silenced by a registry
outage.

Resolution order
  1. agent.config.tenant_registry_url → GET <url>/v1/tenants
     - 5s timeout (kept short: we'd rather fall back than block the tick)
     - Frozen and Archived tenants filtered out (the M7.1 status gate
       would 402/410 them anyway, no point scanning their repos)
     - Accepts either {"id"} or {"tenant_id"} for forward compatibility
       with whatever shape the registry settles on
  2. SCHEDULER_TENANT_IDS env (comma-separated): fallback when the
     registry URL is unset OR the fetch fails OR the parsed response is
     empty. Each failure mode logs a warn with the url so operators see
     the problem.
  3. DEFAULT_SCHEDULER_TENANT_ID ("dev"): last-ditch fallback so a bare
     `cargo run` against a clean Mongo still scans the dev tenant.

Why each tick instead of caching
  - Tick frequency is every few hours (scan_schedule default
    "0 0 */6 * * *"). The registry call is at most 4 times a day per
    agent, which is cheap.
  - Caching introduces a staleness window for newly provisioned tenants.
    The whole point of registry integration is to pick them up fast.

Startup log
  - Includes "tenant source=tenant-registry" or "env" so operators can
    tell at a glance which mode the scheduler is in.

Test plan
  - cargo fmt --all clean
  - cargo clippy -p compliance-agent -- -D warnings clean
  - cargo test -p compliance-agent --lib: 232 pass (+3 new):
    * filter_active_keeps_running_skips_frozen_archived
    * deserialize_registry_response_accepts_id_or_tenant_id (covers the
      {"id"|"tenant_id"} alias)
    * tenants_from_env_resolution (single test covering unset → default,
      csv → splits, "" → default; collapsed to one to avoid env-var
      test races)

Production
  - Set TENANT_REGISTRY_URL in orca-infra alongside KEYCLOAK_URL when the
    registry is ready to serve. Until then, scheduler keeps using
    SCHEDULER_TENANT_IDS; no operator action needed.
  - Future M7.4 cleanup: once tenant-registry adoption is universal,
    delete SCHEDULER_TENANT_IDS env support entirely.

Stacked on #95 (admin endpoints) since that PR added tenant_registry_url
to AgentConfig. Once #95 lands this auto-retargets to main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
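The status filter in step 1 of the resolution order could look roughly like the sketch below. The type and field names (`TenantStatus`, `RegistryTenant`, `status`) are hypothetical, and the `Running` variant is inferred only from the test name `filter_active_keeps_running_skips_frozen_archived`; the actual registry schema may differ.

```rust
/// Hypothetical tenant lifecycle states; `Frozen` and `Archived` come from
/// the commit message, `Running` is inferred from the test name.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TenantStatus {
    Running,
    Frozen,
    Archived,
}

/// Hypothetical, already-deserialized registry entry.
#[derive(Debug, Clone, PartialEq)]
struct RegistryTenant {
    id: String,
    status: TenantStatus,
}

/// Keep only tenants the scheduler should scan. Frozen and Archived tenants
/// are dropped because the status gate would reject their requests anyway.
fn filter_active(tenants: Vec<RegistryTenant>) -> Vec<String> {
    tenants
        .into_iter()
        .filter(|t| t.status == TenantStatus::Running)
        .map(|t| t.id)
        .collect()
}
```

Filtering before the scan loop means an empty result after filtering is indistinguishable from an empty registry response, so both would take the env fallback in this sketch.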