fix(dashboard): attach Keycloak token on agent API calls #90

Merged
sharang merged 6 commits from fix/dashboard-bearer-token into main 2026-06-17 18:36:06 +00:00

Symptom

Dashboard shows "unable to load repositories." The agent returns 401 "Missing authorization header" on every protected endpoint.

Root cause

The Keycloak OIDC flow (infrastructure::auth::auth_login / auth_callback) was already wired up and storing the user's access_token + refresh_token in tower-sessions on login. But the dashboard's #[server] functions then called reqwest::get(...) / reqwest::Client::new().post(...) against the compliance-agent without attaching the access token they already had, so the agent's M7.1 require_jwt_auth middleware rejected every call with 401.

The "Authentication required" text we saw on probes earlier comes from the dashboard's own auth_middleware::require_auth, not Traefik. That confirms the dashboard server is the front door and the agent is the protected backend.

Fix

New infrastructure::agent_client module:

```rust
// Signatures only; bodies omitted.
pub async fn agent_request(method: Method, path: &str)
    -> Result<reqwest::RequestBuilder, ServerFnError>
pub async fn agent_get(path: &str)
    -> Result<reqwest::RequestBuilder, ServerFnError>
```

Both build a RequestBuilder for <agent_api_url><path>, pull the session's access_token via FullstackContext::extract, and attach Authorization: Bearer <token>. When Keycloak isn't configured, both short-circuit and attach no auth header. This matches the dashboard's own require_auth, which short-circuits in the same state.
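The header logic these helpers implement can be sketched without reqwest or Dioxus. Everything in the block is a hypothetical stand-in: AuthInputs, PlannedRequest, AgentClientError and plan_agent_request. The PR also doesn't say what happens when Keycloak is configured but the session holds no token, so returning an error in that case is an assumption.

```rust
/// Inputs the helper needs. In the real module these would come from the
/// dashboard's config and the session (via FullstackContext::extract).
struct AuthInputs<'a> {
    agent_api_url: &'a str,
    keycloak_configured: bool,
    session_access_token: Option<&'a str>,
}

/// The outgoing request, reduced to the parts this fix is about.
#[derive(Debug, PartialEq)]
struct PlannedRequest {
    url: String,
    authorization: Option<String>,
}

#[derive(Debug, PartialEq)]
enum AgentClientError {
    /// Keycloak is configured but the session holds no access token
    /// (treated as an error here by assumption; the PR does not cover it).
    NoSessionToken,
}

fn plan_agent_request(inputs: &AuthInputs<'_>, path: &str) -> Result<PlannedRequest, AgentClientError> {
    // Target is `<agent_api_url><path>`, plain concatenation.
    let url = format!("{}{}", inputs.agent_api_url, path);
    if !inputs.keycloak_configured {
        // Short-circuit: no Authorization header, mirroring require_auth,
        // which also skips auth when Keycloak is not configured.
        return Ok(PlannedRequest { url, authorization: None });
    }
    let token = inputs
        .session_access_token
        .ok_or(AgentClientError::NoSessionToken)?;
    Ok(PlannedRequest {
        url,
        authorization: Some(format!("Bearer {token}")),
    })
}
```

In the real helper, the header would presumably be applied to the builder returned by reqwest's Client::request(method, url). That could use RequestBuilder::bearer_auth(token) or an explicit .header(AUTHORIZATION, ...).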

Then every #[server] function in:

  • chat, dast, findings, graph, issues, notifications, pentest, repositories, sbom, scans, stats

was migrated. 57 call sites total, all replaced.

What's deliberately left alone

  • infrastructure::server::webhook_proxy: forwards to the agent's separate webhook server (port 3002), which authenticates via per-repo HMAC, not JWT. A bearer token would add nothing there.
  • infrastructure::auth::auth_callback: performs the Keycloak token exchange itself, so adding bearer auth would be circular.

Test plan

  • [x] cargo fmt --all -- --check clean
  • [x] cargo clippy -p compliance-dashboard --features server -- -D warnings clean
  • [x] cargo check -p compliance-dashboard --features server clean
  • [x] cargo check -p compliance-dashboard (web/wasm target) clean
  • [ ] Manual after deploy: dashboard's repositories page loads, network tab shows Authorization: Bearer <token> on outgoing API calls

Deploy

Standard ORCA flow once the image rebuilds:

```bash
cd ~/orca && git pull && orca deploy compliance-dashboard
```

🤖 Generated with Claude Code

sharang added 6 commits 2026-06-17 18:31:51 +00:00
feat(m7.2-A): introduce per-tenant DatabasePool
CI / Check (pull_request) Successful in 8m40s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
e3aabe7d18
First slice of the M7.2 tenant-isolation work. Adds a `DatabasePool`
that hands out per-tenant `Database` handles physically scoped to
`<prefix>_<tenant_id>` Mongo databases. Isolation is at the driver,
not at "we hope we filter" — a handle for tenant A literally cannot
see tenant B's documents because it's connected to a different db.

What's in this PR
- DatabasePool::connect — pings the cluster, prepares per-tenant lazy
  handles.
- DatabasePool::for_tenant(&TenantContext) — returns a Database scoped
  to that tenant. ensure_indexes runs once per tenant per process via
  a DashMap-backed marker; failure rolls the marker back so the next
  request retries.
- tenant_db_name — `<prefix>_<sanitized_tenant_id>` if it fits in
  Mongo's 63-byte db-name cap, else `<prefix>_<sha256-16hex>` fallback.
- Sanitizer rewrites the Mongo-disallowed chars (`/ \ . " $ <space>
  NUL`) so any future tenant_id shape works.
- ComplianceAgent gains a `db_pool: DatabasePool` field next to the
  existing `db: Database`. Handlers / pipelines / webhooks still use
  `db` — they migrate to `db_pool.for_tenant(&ctx)` in M7.2-B/C and
  `db` goes away in M7.2-D.
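The naming rule can be sketched in plain Rust. The PR specifies only the disallowed characters, the 63-byte cap and a SHA-256 hex fallback. The following are assumptions of this sketch: the sanitize helper, '_' as the replacement character, hashing the raw (unsanitized) tenant_id, and passing the digest in as a closure. The sketch uses the 32-hex-char suffix that the later fixup commit settles on.

```rust
/// Mongo database names must be shorter than 64 bytes.
const MAX_DB_NAME_LEN: usize = 63;
/// Hex chars of the digest kept in the fallback name (32 hex = 16 bytes,
/// as in the fixup commit; M7.2-A itself kept 16 hex).
const HASH_HEX_LEN: usize = 32;

/// Replace characters Mongo rejects in database names with '_' (assumed).
/// Note this mapping is not injective ("a.b" and "a_b" both become "a_b");
/// the sketch does not handle that.
fn sanitize(tenant_id: &str) -> String {
    tenant_id
        .chars()
        .map(|c| match c {
            '/' | '\\' | '.' | '"' | '$' | ' ' | '\0' => '_',
            other => other,
        })
        .collect()
}

/// `<prefix>_<sanitized_tenant_id>` if it fits, else `<prefix>_<hex digest prefix>`.
/// `digest_hex` should return the lowercase hex SHA-256 of its input; hashing
/// is left to the caller so the sketch needs no crates.
fn tenant_db_name(prefix: &str, tenant_id: &str, digest_hex: impl Fn(&str) -> String) -> String {
    let candidate = format!("{prefix}_{}", sanitize(tenant_id));
    // String::len is a byte count, which is what Mongo's limit measures.
    if candidate.len() <= MAX_DB_NAME_LEN {
        return candidate;
    }
    // Hash the raw tenant_id (a choice made in this sketch).
    let suffix: String = digest_hex(tenant_id).chars().take(HASH_HEX_LEN).collect();
    format!("{prefix}_{suffix}")
}
```

In real code the closure would wrap a SHA-256 implementation, for example the sha2 crate.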

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings
  clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 4 pass
  against live mongo on 27017:
    * pool_isolates_tenants_at_driver_level — writes for acme + globex,
      reads through each tenant's handle; each sees exactly its own
      data with no filter doc anywhere.
    * for_tenant_is_idempotent_index_creation — second + third call
      for the same tenant do not error.
    * tenant_db_name_sanitizes_unsafe_characters
    * tenant_db_name_falls_back_to_hash_when_too_long: a 100-byte
      tenant_id collapses to a stable 16-hex-char (8-byte) suffix.

Why per-tenant DB vs `tenant_id` field + filter
- Driver-level isolation; impossible to forget the filter on one of
  the 184 query call-sites in compliance-agent.
- Handlers don't change shape at migration — `agent.db.findings()`
  becomes `db.findings()` after pulling `db` from
  `agent.db_pool.for_tenant(&ctx)`.
- GDPR delete = `db.dropDatabase()`.
- On-prem deploy = the same code path, with one tenant.
- Trade-off accepted: index storage is duplicated per tenant; Mongo's
  practical ceiling of roughly a thousand databases is well above the
  10s-100s of tenants we're targeting.
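The isolation argument can be illustrated with a toy in-memory model. Each handle is bound to one database name, and findings() takes no tenant filter at all. ToyCluster, ToyDatabase and ToyPool are illustrative stand-ins, not the mongodb driver or the PR's DatabasePool.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Toy "cluster": database name -> stored finding documents.
type ToyCluster = Arc<Mutex<HashMap<String, Vec<String>>>>;

/// A handle bound to exactly one database name. Nothing on it accepts a
/// tenant id, so it cannot address another tenant's database.
#[derive(Clone)]
struct ToyDatabase {
    name: String,
    cluster: ToyCluster,
}

impl ToyDatabase {
    fn insert_finding(&self, doc: &str) {
        let mut dbs = self.cluster.lock().unwrap();
        dbs.entry(self.name.clone()).or_default().push(doc.to_string());
    }

    /// No tenant filter: scoping comes only from which database this
    /// handle was bound to.
    fn findings(&self) -> Vec<String> {
        let dbs = self.cluster.lock().unwrap();
        dbs.get(&self.name).cloned().unwrap_or_default()
    }
}

struct ToyPool {
    prefix: String,
    cluster: ToyCluster,
}

impl ToyPool {
    fn new(prefix: &str) -> Self {
        Self { prefix: prefix.to_string(), cluster: Arc::default() }
    }

    /// Analogue of for_tenant_id: the tenant picks the database, once.
    fn for_tenant_id(&self, tenant_id: &str) -> ToyDatabase {
        ToyDatabase {
            name: format!("{}_{}", self.prefix, tenant_id),
            cluster: Arc::clone(&self.cluster),
        }
    }
}
```

Handler code keeps the shape `db.findings()`. Only the line that obtains `db` changes.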

Caveats
- Existing `agent.db` continues to point at the single legacy db.
  Handlers / pipelines that use it are unscoped until M7.2-B/C
  migrate them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fixup(m7.2-A): validate db_prefix at connect, bump hash to 16 bytes
CI / Check (pull_request) Successful in 8m29s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
003835764e
Addresses review feedback on the hash-fallback path.

The original `debug_assert!(hashed.len() <= MAX_DB_NAME_LEN)` was a
runtime hack that vanished in release builds. With an 8-byte hash
truncation (~2^32 birthday-collision resistance), two tenant_ids
hashing to the same suffix would silently share a database — no
panic, no rollback, just cross-tenant data leak. Not acceptable for
a regulated-industry product.
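The birthday bounds quoted here can be made concrete. The figures below make two simplifying assumptions. First, the truncated SHA-256 output is treated as a uniform b-bit value. Second, n = 1000 tenant_ids on the hash-fallback path is used purely as an illustrative figure.

```latex
% b = bits kept from the digest, n = tenant_ids that take the hash fallback
\begin{aligned}
P_{\text{collision}}(n) &\approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n^2}{2^{b+1}},
  \qquad n_{50\%} \approx 1.18 \cdot 2^{b/2} \\
b = 64\ (\text{8 bytes}):&\quad n_{50\%} \approx 5.1 \times 10^{9},
  \qquad P(10^3) \approx \frac{10^6}{2^{65}} \approx 2.7 \times 10^{-14} \\
b = 128\ (\text{16 bytes}):&\quad n_{50\%} \approx 2.2 \times 10^{19},
  \qquad P(10^3) \approx \frac{10^6}{2^{129}} \approx 1.5 \times 10^{-33}
\end{aligned}
```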

Changes:
- Bump hash truncation 8 → 16 bytes (32 hex chars): ~2^64 birthday
  resistance, so collisions are negligible at our scale.
- Add MAX_PREFIX_LEN (= 30) and validate db_prefix.len() at
  `DatabasePool::connect`. The runtime hash-fallback arithmetic is
  now provably within Mongo's 63-byte cap; drop the debug_assert!.
- New test `connect_rejects_overlong_db_prefix` exercises the
  inclusive bound (30 passes, 31 fails).
- Existing hash-fallback test now asserts a 32-char hex suffix +
  basic distinctness for two different inputs.
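The connect-time bound can be sketched as below. The const assertion is one possible way to make the "provably within Mongo's 63-byte cap" arithmetic fail the build, rather than vanish in release like a debug_assert!. It is an assumption, not necessarily what the commit does; the commit describes only the runtime check at DatabasePool::connect. validate_db_prefix and PoolConfigError are hypothetical names.

```rust
/// Mongo database names must be shorter than 64 bytes.
const MAX_DB_NAME_LEN: usize = 63;
/// Longest db_prefix DatabasePool::connect accepts.
const MAX_PREFIX_LEN: usize = 30;
/// Hex chars in the hash-fallback suffix (16 bytes of SHA-256).
const HASH_HEX_LEN: usize = 32;

// Build-time check: the longest allowed prefix, the '_' separator, and the
// hash suffix together fit Mongo's cap (30 + 1 + 32 = 63). Unlike
// debug_assert!, this cannot be compiled out of release builds.
const _: () = assert!(MAX_PREFIX_LEN + 1 + HASH_HEX_LEN <= MAX_DB_NAME_LEN);

#[derive(Debug, PartialEq)]
enum PoolConfigError {
    PrefixTooLong { len: usize, max: usize },
}

/// Connect-time validation: reject an over-long prefix up front instead of
/// producing an invalid database name later.
fn validate_db_prefix(prefix: &str) -> Result<(), PoolConfigError> {
    if prefix.len() > MAX_PREFIX_LEN {
        return Err(PoolConfigError::PrefixTooLong { len: prefix.len(), max: MAX_PREFIX_LEN });
    }
    Ok(())
}
```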

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(m7.2-B): migrate API handlers to per-tenant database pool
CI / Check (pull_request) Successful in 8m9s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
cdfbb62f9d
Builds on PR M7.2-A. Every HTTP handler in compliance-agent/src/api/
now takes a TenantCtx extractor and pulls a tenant-scoped Database
from agent.db_pool.for_tenant(&ctx). The query bodies are unchanged:
`db.findings().find(doc! {...})` reads from the tenant's own physical
database, so no filter doc can match another tenant's data, which
lives in a different database behind a different handle.

Changes
- New `dto::tenant_db(&agent, &tenant) -> Result<Database, StatusCode>`
  helper. Every migrated handler calls it at the top of the body
  instead of `let db = &agent.db;`. 500 on the rare pool failure;
  4xx auth failures are already handled by the M7.1 status gate.
- New `api::server::inject_dev_tenant` middleware mounted only when
  Keycloak is NOT configured. Synthesizes a TenantContext with
  tenant_id = $DEV_TENANT_ID (default `dev`) so `cargo run` against
  a bare Mongo + no KC still serves the API. Logged loudly as
  "DO NOT use in any environment with real customer data".
- Test harness: TestServer mounts inject_dev_tenant so existing E2E
  tests reach handlers; cleanup() now drops every <db_name>_*
  per-tenant database, not just the legacy <db_name>.
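The shape of the tenant_db helper and the dev-tenant fallback can be sketched as follows. TenantPool, Db, PoolError and dev_tenant_context are hypothetical. StatusCode is a minimal stand-in for the HTTP status type so the sketch has no dependencies, and the real helper takes &agent rather than a pool.

```rust
/// Minimal stand-in for an HTTP status code type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct StatusCode(u16);

impl StatusCode {
    const INTERNAL_SERVER_ERROR: StatusCode = StatusCode(500);
}

/// Tenant identity established by the auth layer (or the dev injector).
#[derive(Debug, Clone)]
struct TenantContext {
    tenant_id: String,
}

/// Extractor newtype; handlers read `tenant.0.tenant_id`.
struct TenantCtx(TenantContext);

/// Hypothetical tenant-scoped handle and pool error.
#[derive(Debug, PartialEq)]
struct Db {
    name: String,
}

#[derive(Debug)]
struct PoolError(String);

/// Anything that can hand out a tenant-scoped database.
trait TenantPool {
    fn for_tenant(&self, ctx: &TenantContext) -> Result<Db, PoolError>;
}

/// Called at the top of each handler instead of `let db = &agent.db;`.
/// Pool failures become a 500; auth failures are rejected earlier.
fn tenant_db(pool: &impl TenantPool, tenant: &TenantCtx) -> Result<Db, StatusCode> {
    pool.for_tenant(&tenant.0).map_err(|e| {
        eprintln!("tenant pool error for {}: {}", tenant.0.tenant_id, e.0);
        StatusCode::INTERNAL_SERVER_ERROR
    })
}

/// Dev-only tenant used when Keycloak is not configured; `dev_tenant_env`
/// is the value of $DEV_TENANT_ID, defaulting to "dev".
fn dev_tenant_context(keycloak_configured: bool, dev_tenant_env: Option<String>) -> Option<TenantContext> {
    if keycloak_configured {
        return None; // the JWT middleware supplies the tenant instead
    }
    Some(TenantContext {
        tenant_id: dev_tenant_env.unwrap_or_else(|| "dev".to_string()),
    })
}
```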

Files migrated (handler count, all pass `cargo build`):
- chat.rs (3) — also rewires RagPipeline + EmbeddingStore to the
  tenant DB's inner() so vector search is per-tenant
- dast.rs (5)
- findings.rs (5)
- graph.rs (7) — also rewires GraphStore inside trigger_build's
  spawn to the tenant DB
- health.rs (1) — stats_overview migrated; public /health stays
  un-scoped
- issues.rs (1)
- notifications.rs (5)
- pentest_handlers/session.rs (12) — both wizard + legacy paths,
  plus pause/resume/stop/get_attack_chain/get_messages/
  get_session_findings/lookup_repo. PentestOrchestrator now gets
  the tenant DB clone in its spawn.
- pentest_handlers/export.rs (1) — fans out across sessions,
  attack_chain_nodes, dast_findings, findings, sbom_entries,
  graph_nodes from a single tenant_db acquisition
- pentest_handlers/stats.rs (1)
- pentest_handlers/stream.rs (1) — SSE handler verifies session
  via the tenant DB before subscribing
- repos.rs (6)
- sbom.rs (5)
- scans.rs (1)

help_chat.rs has no DB queries and was skipped.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard
  -- -D warnings clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 5 pass
  (driver-level isolation still holds post-handler migration)
- cargo test -p compliance-agent --test tenant_status_middleware
  — 6 pass

What's not yet migrated (PR-C / PR-D)
- scheduler.rs (6 sites), pipeline/orchestrator.rs (14),
  pentest/orchestrator.rs (13), webhooks (gitea/github/gitlab),
  trackers/jira.rs, pipeline/dedup.rs etc. — background paths
  without a JWT-derived tenant context.
- agent.db is still in the ComplianceAgent struct as a transitional
  handle for those paths. PR-D removes it once PR-C migrates the
  background paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(m7.2-C): migrate background paths to per-tenant pool
CI / Check (pull_request) Successful in 10m33s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
0f6dd1135e
Closes the loop on M7.2 isolation for paths that don't have a JWT
context: scheduler, webhooks, and the agent's `run_scan` / `run_pr_review`
helpers all now take a `tenant_id` at the boundary and resolve to a
tenant-scoped `Database` via `db_pool.for_tenant_id(...)`. Internal
orchestrators (PipelineOrchestrator, PentestOrchestrator) and pipeline
helpers were already DB-agnostic — they take `db: Database` at
construction and don't care which tenant it points to.

Changes
- DatabasePool::for_tenant_id(&str) — same as for_tenant but accepts
  a bare tenant_id. Background paths don't have a full TenantContext.
  for_tenant is now a thin wrapper that delegates.
- agent.run_scan(tenant_id, repo_id, trigger) — pulls the tenant
  database before constructing the PipelineOrchestrator. Was:
  run_scan(repo_id, trigger) reading agent.db.
- agent.run_pr_review(tenant_id, repo_id, ...) — same shape.
- Webhook routes change: /webhook/{tenant_id}/{platform}/{repo_id}.
  Tenant is part of the URL path because webhooks arrive without a
  JWT — they're authenticated via per-repo HMAC, not the tenant gate.
  The dashboard surfaces the full per-tenant URL when the repo is
  registered. All three handlers (gitea, github, gitlab) updated.
- scheduler.rs: iterates tenants from $SCHEDULER_TENANT_IDS
  (comma-separated env), falling back to $DEV_TENANT_ID (default
  `dev`). Both scan_all_repos and monitor_cves now run once per
  configured tenant. M7.2-D will replace this static config with a
  pull from the tenant-registry.
- api/handlers/repos.rs::trigger_scan now passes tenant.0.tenant_id.
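Two boundary changes can be sketched as plain parsing: the tenant-bearing webhook path and the scheduler's tenant list. The following are assumptions: parse_webhook_path, scheduler_tenant_ids, the lowercase platform segments, and trimming or skipping empty entries. The real routes are presumably matched by the HTTP router rather than parsed by hand. The fallback reads "DEV_TENANT_ID's `dev` default" as $DEV_TENANT_ID, defaulting to dev.

```rust
/// Git platforms with a webhook handler.
#[derive(Debug, PartialEq)]
enum Platform {
    Gitea,
    GitHub,
    GitLab,
}

/// What the tenant-bearing webhook URL resolves to.
#[derive(Debug, PartialEq)]
struct WebhookTarget {
    tenant_id: String,
    platform: Platform,
    repo_id: String,
}

/// Parse `/webhook/{tenant_id}/{platform}/{repo_id}`. The tenant comes from the
/// path because webhooks carry no JWT; HMAC verification happens separately.
fn parse_webhook_path(path: &str) -> Option<WebhookTarget> {
    let mut parts = path.strip_prefix("/webhook/")?.split('/');
    let tenant_id = parts.next().filter(|s| !s.is_empty())?;
    let platform = match parts.next()? {
        "gitea" => Platform::Gitea,
        "github" => Platform::GitHub,
        "gitlab" => Platform::GitLab,
        _ => return None,
    };
    let repo_id = parts.next().filter(|s| !s.is_empty())?;
    if parts.next().is_some() {
        return None; // trailing segments are not part of the route
    }
    Some(WebhookTarget {
        tenant_id: tenant_id.to_string(),
        platform,
        repo_id: repo_id.to_string(),
    })
}

/// Tenants the scheduler iterates: comma-separated $SCHEDULER_TENANT_IDS,
/// else $DEV_TENANT_ID, else "dev". Values are passed in rather than read
/// from the environment so the logic is testable.
fn scheduler_tenant_ids(scheduler_env: Option<&str>, dev_tenant_env: Option<&str>) -> Vec<String> {
    let configured: Vec<String> = scheduler_env
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    if !configured.is_empty() {
        return configured;
    }
    vec![dev_tenant_env.unwrap_or("dev").to_string()]
}
```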

What's unchanged because it didn't need to change
- PipelineOrchestrator, PentestOrchestrator: take `db: Database` at
  construction — they're tenant-DB-agnostic by design. The caller
  picks the tenant DB.
- pipeline/{dedup,graph_build,issue_creation,sbom/mod}.rs,
  pentest/{context,report/html/*}.rs, trackers/jira.rs, llm/triage.rs:
  take `&Database` or `&mongodb::Database` as args, transitively
  tenant-scoped via the caller.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard
  -- -D warnings clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 5 pass
- cargo test -p compliance-agent --test tenant_status_middleware
  — 6 pass

What's left (PR-D)
- Drop the transitional agent.db field — no remaining call sites
  (verified by `grep -rn "agent\.db\b" compliance-agent/src`).
- main.rs / TestServer stop building the legacy Database; only the
  pool remains.
- Add cross-tenant admin helpers (list tenants, drop tenant DB) on
  the pool for offboarding flows.
- Pull tenants from the tenant-registry instead of an env var.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(m7.2-D): drop transitional agent.db, add admin helpers
CI / Check (pull_request) Successful in 9m27s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
08c4ec4cff
Final slice of M7.2. Removes the transitional single-database handle
that M7.2-A introduced alongside the pool, so the compliance-agent
now has a single source of truth for storage: every code path obtains
a tenant-scoped Database from `agent.db_pool.for_tenant_id(...)` or
`for_tenant(&ctx)`. There is no shared "default" database anywhere.

Changes
- ComplianceAgent: `db: Database` field removed. ComplianceAgent::new
  now takes only `(config, db_pool)`. Verified by an earlier grep
  during M7.2-C that no remaining call site reads `agent.db`.
- main.rs: stops constructing the legacy Database. Only the pool is
  built at startup.
- TestServer: same — drops Database::connect/ensure_indexes, builds
  only the pool. cleanup() now drops every `<db_name>_*` per-tenant
  database (no longer touches a bare `<db_name>`).
- DatabasePool::list_tenant_db_names() — lists Mongo databases
  matching the pool's prefix. For admin endpoints + scheduler tenant
  enumeration in a future M7.3 (this PR keeps SCHEDULER_TENANT_IDS
  env config — registry integration is a separate concern).
- DatabasePool::drop_tenant(&str) — idempotent tenant offboarding.
  Drops the per-tenant database and evicts the in-memory `ensured`
  marker so a later re-provision re-runs ensure_indexes.
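Some of the pool's bookkeeping can be modeled in isolation:

- the ensured marker, which runs ensure_indexes once per tenant and rolls back on failure;
- the eviction that drop_tenant performs;
- the prefix filter behind list_tenant_db_names.

The PR uses a DashMap; a Mutex<HashSet> stands in here to stay dependency-free. PoolBookkeeping, ensure_once and forget_tenant are hypothetical names. The actual database drop and database listing, which would go through the Mongo driver, are elided.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Pool bookkeeping only: which tenants have had indexes ensured, and which
/// databases belong to this pool. Connections are elided.
struct PoolBookkeeping {
    prefix: String,
    ensured: Mutex<HashSet<String>>,
}

impl PoolBookkeeping {
    fn new(prefix: &str) -> Self {
        Self { prefix: prefix.to_string(), ensured: Mutex::new(HashSet::new()) }
    }

    /// Run `ensure_indexes` at most once per tenant per process. The marker is
    /// claimed first (so concurrent callers skip) and removed again if the
    /// index build fails, so the next request retries.
    fn ensure_once<E>(
        &self,
        tenant_id: &str,
        ensure_indexes: impl FnOnce() -> Result<(), E>,
    ) -> Result<(), E> {
        // The guard is a temporary released at the end of this statement,
        // so the lock is not held while indexes are built.
        let first_caller = self.ensured.lock().unwrap().insert(tenant_id.to_string());
        if !first_caller {
            return Ok(());
        }
        if let Err(e) = ensure_indexes() {
            self.ensured.lock().unwrap().remove(tenant_id);
            return Err(e);
        }
        Ok(())
    }

    /// Offboarding half of drop_tenant: forget the marker so a re-provisioned
    /// tenant re-runs ensure_indexes. (The database drop itself is elided.)
    fn forget_tenant(&self, tenant_id: &str) {
        self.ensured.lock().unwrap().remove(tenant_id);
    }

    /// Prefix filter behind list_tenant_db_names; `all_db_names` stands in
    /// for the cluster's database list.
    fn list_tenant_db_names(&self, all_db_names: &[&str]) -> Vec<String> {
        let wanted = format!("{}_", self.prefix);
        all_db_names
            .iter()
            .filter(|name| name.starts_with(&wanted))
            .map(|name| name.to_string())
            .collect()
    }
}
```

Because the marker is claimed before ensure_indexes finishes, a concurrent request for the same tenant can proceed before the indexes exist. Whether that matters depends on the indexes involved, for example any that enforce uniqueness.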

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard
  -- -D warnings clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 6 pass
  including new `admin_helpers_list_and_drop_tenant_dbs`
- cargo test -p compliance-agent --test tenant_status_middleware
  — 6 pass

M7.2 closeout state after this lands
- M7.1 (auth + status) — done
- M7.2-A (pool) — done
- M7.2-B (handlers) — done
- M7.2-C (background paths) — done
- M7.2-D (legacy db removed, admin helpers) — done (this PR)
- Future M7.3: scheduler pulls tenants from tenant-registry instead
  of SCHEDULER_TENANT_IDS env; cross-tenant admin HTTP endpoints
  built on list_tenant_db_names / drop_tenant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(dashboard): attach Keycloak token on agent API calls
CI / Check (pull_request) Successful in 8m12s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
dcec519565
Symptom: "unable to load repositories" — agent returns
"Missing authorization header" 401 on every protected endpoint
because the dashboard's server functions were calling reqwest::get
without an Authorization header. The Keycloak OIDC flow
(auth_login / auth_callback) was already wired up and storing the
access_token in tower-sessions, but the access_token was never
threaded into outbound calls.

Fix
- New `infrastructure::agent_client` module exposes:
  - `agent_request(method, path) -> RequestBuilder`
  - `agent_get(path) -> RequestBuilder` (sugar for GET)
  Both pull the session's access_token (via FullstackContext extract)
  and attach `Authorization: Bearer <token>`. When Keycloak is not
  configured the helper short-circuits — matching the dashboard's
  require_auth middleware which short-circuits in the same state.
- Migrated every #[server] function in:
  - chat, dast, findings, graph, issues, notifications, pentest,
    repositories, sbom, scans, stats
  - 57 call sites total, all replaced.
- Left as-is:
  - `infrastructure::server::webhook_proxy` — forwards to the agent's
    separate webhook server (port 3002), which is HMAC-authenticated,
    not JWT-authenticated.
  - `infrastructure::auth::auth_callback` — performs the KC token
    exchange itself; bearer auth would be circular.

Test plan
- cargo fmt --all clean
- cargo clippy -p compliance-dashboard --features server -- -D warnings
  clean
- cargo check -p compliance-dashboard --features server clean
- cargo check -p compliance-dashboard (web target) implicit via build
- Manual: after deploy, dashboard's repositories page loads without
  401; calls now carry Authorization: Bearer header to the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sharang merged commit 56482911b8 into main 2026-06-17 18:36:05 +00:00

Reference: sharang/compliance-scanner-agent#90