fix(dashboard): attach Keycloak token on agent API calls

Symptom: "unable to load repositories". The agent returns "Missing authorization header" 401 on every protected endpoint because the dashboard's server functions were calling reqwest::get without an Authorization header. The Keycloak OIDC flow (auth_login / auth_callback) was already wired up and storing the access_token in tower-sessions, but the access_token was never threaded into outbound calls.

Fix
- New `infrastructure::agent_client` module exposes:
  - `agent_request(method, path) -> RequestBuilder`
  - `agent_get(path) -> RequestBuilder` (sugar for GET)
  Both pull the session's access_token (via FullstackContext extract) and attach `Authorization: Bearer <token>`. When Keycloak is not configured the helper short-circuits, matching the dashboard's require_auth middleware, which short-circuits in the same state.
- Migrated every #[server] function in:
  - chat, dast, findings, graph, issues, notifications, pentest, repositories, sbom, scans, stats
  - 57 call sites total, all replaced.
- Left as-is:
  - `infrastructure::server::webhook_proxy`: forwards to the agent's separate webhook server (port 3002), which is HMAC-authenticated, not JWT-authenticated.
  - `infrastructure::auth::auth_callback`: performs the KC token exchange itself; bearer auth would be circular.

Test plan
- cargo fmt --all clean
- cargo clippy -p compliance-dashboard --features server -- -D warnings clean
- cargo check -p compliance-dashboard --features server clean
- cargo check -p compliance-dashboard (web target) implicit via build
- Manual: after deploy, dashboard's repositories page loads without 401; calls now carry Authorization: Bearer header to the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
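The attach-or-skip decision described in this commit message can be sketched with std-only types. This is a minimal illustration, not the module's code: `SessionUser` and `bearer_header` are hypothetical names, and the real helper works on a `reqwest::RequestBuilder` (the diff shows it calling `req.bearer_auth(...)`) rather than returning a header string.

```rust
/// Hypothetical stand-in for the logged-in user stored in the session.
struct SessionUser {
    access_token: String,
}

/// Decide which `Authorization` header value, if any, an outbound agent
/// call should carry. Returns `None` when the request is left as-is.
fn bearer_header(keycloak_configured: bool, user: Option<&SessionUser>) -> Option<String> {
    // Keycloak disabled: short-circuit, mirroring the pass-through
    // behaviour the commit message describes for this state.
    if !keycloak_configured {
        return None;
    }
    // Keycloak enabled: attach a bearer token only if someone is logged in.
    user.map(|u| format!("Bearer {}", u.access_token))
}
```

Keeping the decision in one function is what lets all 57 call sites share the same behaviour: a call site only asks for a builder and never inspects the session itself.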
feat(m7.2-D): drop transitional agent.db, add admin helpers
2026-06-17 20:31:21 +02:00 · 2026-06-17 15:05:27 +02:00 · 2026-06-17 15:00:37 +02:00 · 2026-06-17 13:28:33 +02:00 · 2026-06-17 13:16:46 +02:00 · 2026-06-17 11:58:24 +02:00
9 changed files with 13 additions and 324 deletions
@@ -1,115 +0,0 @@
 //! Cross-tenant admin endpoints (`/api/v1/admin/*`).
 //!
 //! Operator-only. Auth is a **static bearer token** (`ADMIN_API_TOKEN`
 //! env on the agent) — explicitly NOT a Keycloak JWT, because the
 //! whole point of these endpoints is to operate ACROSS tenants. A
 //! customer JWT (which always carries a single tenant_id) has no
 //! business mounting them.
 //!
 //! Routes are only registered when `ADMIN_API_TOKEN` is set. With no
 //! token, the endpoints don't exist at all (404), which is a stronger
 //! guarantee than "401 if you guess the path".
 //!
 //! Operations:
 //! - `GET    /api/v1/admin/tenants`              — list tenant DBs
 //! - `DELETE /api/v1/admin/tenants/{tenant_id}`  — GDPR delete
 //!
 //! Tenant ids in URLs are passed as-is to `DatabasePool::drop_tenant`,
 //! which sanitises them the same way it does for creation. Listing
 //! returns the raw DB names from `list_tenant_db_names` — operators
 //! can reverse-derive the tenant_id from the prefix.
 use axum::extract::{Extension, Path, Request};
 use axum::http::{header, StatusCode};
 use axum::middleware::Next;
 use axum::response::{IntoResponse, Response};
 use axum::Json;
 use secrecy::ExposeSecret;
 use serde::Serialize;
 use super::dto::AgentExt;
 #[derive(Serialize)]
 pub struct ListTenantDbsResponse {
    pub tenant_db_names: Vec<String>,
 }
 #[tracing::instrument(skip_all)]
 pub async fn list_tenant_dbs(
    Extension(agent): AgentExt,
 ) -> Result<Json<ListTenantDbsResponse>, StatusCode> {
    let names = agent.db_pool.list_tenant_db_names().await.map_err(|e| {
        tracing::error!("admin: list_tenant_db_names failed: {e}");
        StatusCode::INTERNAL_SERVER_ERROR
    })?;
    Ok(Json(ListTenantDbsResponse {
        tenant_db_names: names,
    }))
 }
 #[tracing::instrument(skip_all, fields(tenant_id = %tenant_id))]
 pub async fn drop_tenant_db(
    Extension(agent): AgentExt,
    Path(tenant_id): Path<String>,
 ) -> Result<Json<serde_json::Value>, StatusCode> {
    agent.db_pool.drop_tenant(&tenant_id).await.map_err(|e| {
        tracing::error!("admin: drop_tenant failed: {e}");
        StatusCode::INTERNAL_SERVER_ERROR
    })?;
    Ok(Json(serde_json::json!({ "status": "dropped" })))
 }
 /// Constant-time-ish comparison of the configured admin token against
 /// the incoming bearer. Uses `subtle`-style byte equality so timing
 /// attacks can't probe the token character by character.
 fn tokens_eq(a: &str, b: &str) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.bytes().zip(b.bytes()) {
        diff |= x ^ y;
    }
    diff == 0
 }
 /// Middleware enforcing the static `ADMIN_API_TOKEN`. Mounted only on
 /// the admin sub-router, so this never runs on customer routes.
 pub async fn require_admin_token(
    Extension(agent): AgentExt,
    request: Request,
    next: Next,
 ) -> Response {
    let Some(expected) = agent.config.admin_api_token.as_ref() else {
        // Belt-and-braces — if the routes were somehow mounted without
        // a token configured, refuse rather than no-op-pass.
        return (StatusCode::NOT_FOUND, "admin disabled").into_response();
    };
    let presented = request
        .headers()
        .get(header::AUTHORIZATION)
        .and_then(|v| v.to_str().ok())
        .and_then(|s| s.strip_prefix("Bearer "))
        .map(|s| s.trim());
    let Some(presented) = presented.filter(|s| !s.is_empty()) else {
        return (StatusCode::UNAUTHORIZED, "Missing bearer token").into_response();
    };
    if !tokens_eq(presented, expected.expose_secret()) {
        return (StatusCode::UNAUTHORIZED, "Invalid admin token").into_response();
    }
    next.run(request).await
 }
 #[cfg(test)]
 mod tests {
    use super::*;
    #[test]
    fn tokens_eq_basic() {
        assert!(tokens_eq("abc", "abc"));
        assert!(!tokens_eq("abc", "abd"));
        assert!(!tokens_eq("abc", "abcd"));
        assert!(!tokens_eq("", "x"));
        assert!(tokens_eq("", ""));
    }
 }
@@ -1,4 +1,3 @@
-pub mod admin;
 pub mod chat;
 pub mod dast;
 pub mod dto;
@@ -4,8 +4,7 @@ use axum::extract::Request;
 use axum::http::HeaderValue;
 use axum::middleware::Next;
 use axum::response::Response;
-use axum::routing::{delete, get};
-use axum::{middleware, Extension, Router};
+use axum::{middleware, Extension};
 use tokio::sync::RwLock;
 use tower_http::cors::CorsLayer;
 use tower_http::set_header::SetResponseHeaderLayer;
@@ -15,7 +14,6 @@ use compliance_core::auth::{require_jwt_auth, require_tenant_status, JwksState};
 use compliance_core::{TenantContext, TenantStatus};
 use crate::agent::ComplianceAgent;
-use crate::api::handlers;
 use crate::api::routes;
 use crate::error::AgentError;
@@ -52,28 +50,7 @@ pub async fn inject_dev_tenant(mut request: Request, next: Next) -> Response {
 }
 pub async fn start_api_server(agent: ComplianceAgent, port: u16) -> Result<(), AgentError> {
    // Admin sub-router. Routes are only mounted when ADMIN_API_TOKEN is
    // configured — without it, the paths don't exist at all (404 rather
    // than 401), so an operator who hasn't opted in can't fingerprint
    // the surface area.
    let admin_router: Router = if agent.config.admin_api_token.is_some() {
        tracing::info!("Admin API enabled — /api/v1/admin/* mounted behind ADMIN_API_TOKEN bearer");
        Router::new()
            .route(
                "/api/v1/admin/tenants",
                get(handlers::admin::list_tenant_dbs),
            )
            .route(
                "/api/v1/admin/tenants/{tenant_id}",
                delete(handlers::admin::drop_tenant_db),
            )
            .layer(middleware::from_fn(handlers::admin::require_admin_token))
    } else {
        Router::new()
    };
    let mut app = routes::build_router()
        .merge(admin_router)
        .layer(Extension(Arc::new(agent.clone())))
        .layer(CorsLayer::permissive())
        .layer(TraceLayer::new_for_http())
@@ -59,7 +59,5 @@ pub fn load_config() -> Result<AgentConfig, AgentError> {
            .unwrap_or(true),
        pentest_imap_username: env_var_opt("PENTEST_IMAP_USERNAME"),
        pentest_imap_password: env_secret_opt("PENTEST_IMAP_PASSWORD"),
-        admin_api_token: env_secret_opt("ADMIN_API_TOKEN"),
-        tenant_registry_url: env_var_opt("TENANT_REGISTRY_URL"),
    })
 }
@@ -339,8 +339,6 @@ mod tests {
            pentest_imap_tls: true,
            pentest_imap_username: None,
            pentest_imap_password: None,
-            admin_api_token: None,
-            tenant_registry_url: None,
        }
    }
@@ -66,8 +66,6 @@ impl TestServer {
            pentest_imap_tls: false,
            pentest_imap_username: None,
            pentest_imap_password: None,
-            admin_api_token: None,
-            tenant_registry_url: None,
        };
        let agent = ComplianceAgent::new(config, db_pool);
@@ -63,24 +63,16 @@ struct Claims {
 const PUBLIC_ENDPOINTS: &[&str] = &["/api/v1/health"];
 /// Path prefixes that bypass JWT validation. The admin sub-router
 /// (`/api/v1/admin/*`) has its own static-bearer middleware and must
 /// not be routed through the customer-JWT path — a Keycloak token
 /// always carries a single tenant_id and would semantically conflict
 /// with cross-tenant admin operations.
 const PUBLIC_PREFIXES: &[&str] = &["/api/v1/admin/"];
 /// Middleware that validates Bearer JWT tokens against Keycloak's JWKS
 /// and attaches a `TenantContext` extension on success.
 ///
-/// Skips validation for the health endpoint and any path under one of
-/// the [`PUBLIC_PREFIXES`]. If `JwksState` is not present (Keycloak
-/// not configured), requests pass through and downstream code must
-/// handle the missing context.
+/// Skips validation for the health endpoint.
+/// If `JwksState` is not present (Keycloak not configured), requests
+/// pass through and downstream code must handle the missing context.
 pub async fn require_jwt_auth(mut request: Request, next: Next) -> Response {
    let path = request.uri().path();
-    if PUBLIC_ENDPOINTS.contains(&path) || PUBLIC_PREFIXES.iter().any(|p| path.starts_with(p)) {
+    if PUBLIC_ENDPOINTS.contains(&path) {
        return next.run(request).await;
    }
@@ -37,15 +37,6 @@ pub struct AgentConfig {
    pub pentest_imap_tls: bool,
    pub pentest_imap_username: Option<String>,
    pub pentest_imap_password: Option<SecretString>,
-    /// Static bearer for the cross-tenant admin endpoints under
-    /// `/api/v1/admin/*`. When `None`, those endpoints are not
-    /// mounted at all (defense-in-depth: ops endpoints never reach
-    /// any auth path if no operator has explicitly opted in).
-    pub admin_api_token: Option<SecretString>,
-    /// Live tenant-registry URL the scheduler consults for the list
-    /// of tenants to iterate. When `None` or unreachable, scheduler
-    /// falls back to `SCHEDULER_TENANT_IDS` env (M7.2-C).
-    pub tenant_registry_url: Option<String>,
 }
 #[derive(Clone, Debug, Serialize, Deserialize)]
@@ -9,16 +9,7 @@
 //! When Keycloak is not configured (dev convenience), the helper
 //! returns an unauthenticated builder — matching the agent's
 //! pass-through behavior in the same state.
 //!
 //! **Token refresh**: KC access tokens are short-lived (5 min default
 //! in the certifai realm). Before attaching, we decode the JWT's `exp`
 //! claim and proactively refresh via the stored refresh_token if the
 //! access token is expired or about to expire. The session is updated
 //! with the new pair. If refresh fails, we send the (stale) token
 //! anyway — the agent's 401 will surface to the UI, which can prompt
 //! re-login.
 use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine};
 use dioxus::prelude::ServerFnError;
 use dioxus_fullstack::FullstackContext;
 use reqwest::Method;
@@ -27,11 +18,6 @@ use super::auth::LOGGED_IN_USER_SESS_KEY;
 use super::server_state::ServerState;
 use super::user_state::UserStateInner;
-/// Seconds before the JWT's `exp` time at which we consider it stale
-/// enough to refresh. Covers clock skew + the round-trip to the agent
-/// so the token doesn't expire mid-flight.
-const REFRESH_SKEW_SECS: i64 = 30;
 /// Build a `RequestBuilder` for `<agent_api_url><path>` with the
 /// session's access token attached. `path` should include a leading
 /// `/`, e.g. `"/api/v1/repositories"`.
@@ -52,9 +38,10 @@ pub async fn agent_get(path: &str) -> Result<reqwest::RequestBuilder, ServerFnEr
 }
 /// Attach the session's bearer token if Keycloak is configured AND the
-/// session has a logged-in user. Refresh the token proactively if it's
-/// expired or about to expire. Persists refreshed tokens back into the
-/// session.
+/// session has a logged-in user. Otherwise leave the request as-is.
+///
+/// The Keycloak-disabled path mirrors the dashboard's `require_auth`
+/// middleware, which short-circuits when `state.keycloak.is_none()`.
 async fn attach_token(
    req: reqwest::RequestBuilder,
    state: &ServerState,
@@ -67,144 +54,8 @@ async fn attach_token(
        .get(LOGGED_IN_USER_SESS_KEY)
        .await
        .map_err(|e| ServerFnError::new(format!("session read failed: {e}")))?;
-    let Some(mut user) = user else {
-        return Ok(req);
-    };
-
+    Ok(match user {
+        Some(u) => req.bearer_auth(u.access_token),
+        None => req,
+    })
    if token_needs_refresh(&user.access_token) {
        tracing::debug!("Access token expired or near-expiring; refreshing");
        match refresh_tokens(state, &user.refresh_token).await {
            Ok((new_access, new_refresh)) => {
                user.access_token = new_access;
                if let Some(rt) = new_refresh {
                    user.refresh_token = rt;
                }
                if let Err(e) = session.insert(LOGGED_IN_USER_SESS_KEY, &user).await {
                    tracing::warn!("Failed to persist refreshed tokens: {e}");
                }
            }
            Err(e) => {
                tracing::warn!("Token refresh failed: {e}; sending current token anyway");
                // Fall through — the agent will 401 and the UI will
                // prompt re-login. Better than failing the request at
                // the dashboard layer with no helpful UX cue.
            }
        }
    }
    Ok(req.bearer_auth(user.access_token))
 }
 /// Decode the JWT's payload (no signature verification — the agent
 /// does that) and check the `exp` claim. Treats malformed tokens as
 /// expired so the refresh path runs.
 fn token_needs_refresh(jwt: &str) -> bool {
    let Some(payload_b64) = jwt.split('.').nth(1) else {
        return true;
    };
    let Ok(bytes) = URL_SAFE_NO_PAD.decode(payload_b64) else {
        return true;
    };
    #[derive(serde::Deserialize)]
    struct ExpClaim {
        exp: i64,
    }
    let Ok(claims) = serde_json::from_slice::<ExpClaim>(&bytes) else {
        return true;
    };
    let now = chrono::Utc::now().timestamp();
    claims.exp - REFRESH_SKEW_SECS <= now
 }
 /// Exchange a refresh_token for a new access_token. Returns the new
 /// access_token and (optionally) the new refresh_token KC issued.
 /// KC may rotate refresh_tokens on use; we honor whatever it sends.
 async fn refresh_tokens(
    state: &ServerState,
    refresh_token: &str,
 ) -> Result<(String, Option<String>), String> {
    let kc = state
        .keycloak
        .ok_or_else(|| "Keycloak not configured".to_string())?;
    if refresh_token.is_empty() {
        return Err("no refresh_token in session".to_string());
    }
    #[derive(serde::Deserialize)]
    struct TokenResp {
        access_token: String,
        refresh_token: Option<String>,
    }
    let resp = reqwest::Client::new()
        .post(kc.token_endpoint())
        .form(&[
            ("grant_type", "refresh_token"),
            ("client_id", kc.client_id.as_str()),
            ("refresh_token", refresh_token),
        ])
        .send()
        .await
        .map_err(|e| format!("refresh request failed: {e}"))?;
    if !resp.status().is_success() {
        let status = resp.status();
        let body = resp.text().await.unwrap_or_default();
        return Err(format!("refresh rejected ({status}): {body}"));
    }
    let r: TokenResp = resp
        .json()
        .await
        .map_err(|e| format!("refresh response parse failed: {e}"))?;
    Ok((r.access_token, r.refresh_token))
 }
 #[cfg(test)]
 mod tests {
    use super::*;
    use base64::Engine;
    /// Build a JWT-shaped string (header.payload.sig) with the given
    /// payload. Signature is bogus — we never verify it locally.
    fn make_jwt(payload: &serde_json::Value) -> String {
        let payload_b64 = URL_SAFE_NO_PAD.encode(serde_json::to_vec(payload).unwrap());
        format!("hdr.{payload_b64}.sig")
    }
    #[test]
    fn token_needs_refresh_true_when_expired() {
        let exp = chrono::Utc::now().timestamp() - 60;
        let jwt = make_jwt(&serde_json::json!({ "exp": exp }));
        assert!(token_needs_refresh(&jwt));
    }
    #[test]
    fn token_needs_refresh_true_within_skew_window() {
        // 10 seconds left; less than the 30s skew → must refresh.
        let exp = chrono::Utc::now().timestamp() + 10;
        let jwt = make_jwt(&serde_json::json!({ "exp": exp }));
        assert!(token_needs_refresh(&jwt));
    }
    #[test]
    fn token_needs_refresh_false_with_plenty_of_life() {
        let exp = chrono::Utc::now().timestamp() + 600;
        let jwt = make_jwt(&serde_json::json!({ "exp": exp }));
        assert!(!token_needs_refresh(&jwt));
    }
    #[test]
    fn token_needs_refresh_true_on_malformed_jwt() {
        assert!(token_needs_refresh(""));
        assert!(token_needs_refresh("not.a.jwt"));
        assert!(token_needs_refresh("only-one-segment"));
        assert!(token_needs_refresh("hdr.not-base64!.sig"));
    }
    #[test]
    fn token_needs_refresh_true_when_exp_missing() {
        let jwt = make_jwt(&serde_json::json!({ "sub": "abc" }));
        assert!(token_needs_refresh(&jwt));
    }
 }
Author	SHA1	Message	Date
Sharang Parnerkar	dcec519565	fix(dashboard): attach Keycloak token on agent API calls	2026-06-17 20:31:21 +02:00
Sharang Parnerkar	08c4ec4cff	feat(m7.2-D): drop transitional agent.db, add admin helpers	2026-06-17 15:05:27 +02:00

Final slice of M7.2. Removes the transitional single-database handle that M7.2-A introduced alongside the pool, so the compliance-agent now has a single source of truth for storage: every code path obtains a tenant-scoped Database from `agent.db_pool.for_tenant_id(...)` or `for_tenant(&ctx)`. There is no shared "default" database anywhere.

Changes
- ComplianceAgent: `db: Database` field removed. ComplianceAgent::new now takes only `(config, db_pool)`. Verified by an earlier grep during M7.2-C that no remaining call site reads `agent.db`.
- main.rs: stops constructing the legacy Database. Only the pool is built at startup.
- TestServer: likewise drops Database::connect/ensure_indexes and builds only the pool. cleanup() now drops every `<db_name>_*` per-tenant database (no longer touches a bare `<db_name>`).
- DatabasePool::list_tenant_db_names(): lists Mongo databases matching the pool's prefix. For admin endpoints + scheduler tenant enumeration in a future M7.3 (this PR keeps SCHEDULER_TENANT_IDS env config; registry integration is a separate concern).
- DatabasePool::drop_tenant(&str): idempotent tenant offboarding. Drops the per-tenant database and evicts the in-memory `ensured` marker so a later re-provision re-runs ensure_indexes.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings clean
- cargo test -p compliance-core --lib: 7 pass
- cargo test -p compliance-agent --lib: 228 pass
- cargo test -p compliance-agent --test tenant_isolation: 6 pass, including new `admin_helpers_list_and_drop_tenant_dbs`
- cargo test -p compliance-agent --test tenant_status_middleware: 6 pass

M7.2 closeout state after this lands
- M7.1 (auth + status): done
- M7.2-A (pool): done
- M7.2-B (handlers): done
- M7.2-C (background paths): done
- M7.2-D (legacy db removed, admin helpers): done (this PR)
- Future M7.3: scheduler pulls tenants from tenant-registry instead of SCHEDULER_TENANT_IDS env; cross-tenant admin HTTP endpoints built on list_tenant_db_names / drop_tenant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sharang Parnerkar	0f6dd1135e	feat(m7.2-C): migrate background paths to per-tenant pool	2026-06-17 15:00:37 +02:00

Closes the loop on M7.2 isolation for paths that don't have a JWT context: scheduler, webhooks, and the agent's `run_scan` / `run_pr_review` helpers all now take a `tenant_id` at the boundary and resolve to a tenant-scoped `Database` via `db_pool.for_tenant_id(...)`. Internal orchestrators (PipelineOrchestrator, PentestOrchestrator) and pipeline helpers were already DB-agnostic: they take `db: Database` at construction and don't care which tenant it points to.

Changes
- DatabasePool::for_tenant_id(&str): same as for_tenant but accepts a bare tenant_id. Background paths don't have a full TenantContext. for_tenant is now a thin wrapper that delegates.
- agent.run_scan(tenant_id, repo_id, trigger): pulls the tenant database before constructing the PipelineOrchestrator. Was: run_scan(repo_id, trigger) reading agent.db.
- agent.run_pr_review(tenant_id, repo_id, ...): same shape.
- Webhook routes change: /webhook/{tenant_id}/{platform}/{repo_id}. Tenant is part of the URL path because webhooks arrive without a JWT; they're authenticated via per-repo HMAC, not the tenant gate. The dashboard surfaces the full per-tenant URL when the repo is registered. All three handlers (gitea, github, gitlab) updated.
- scheduler.rs: iterates tenants from $SCHEDULER_TENANT_IDS (comma-separated env), or DEV_TENANT_ID's `dev` default. Both scan_all_repos and monitor_cves now run once per configured tenant. M7.2-D will replace this static config with a pull from the tenant-registry.
- api/handlers/repos.rs::trigger_scan now passes tenant.0.tenant_id.

What's unchanged because it didn't need to change
- PipelineOrchestrator, PentestOrchestrator: take `db: Database` at construction; they're tenant-DB-agnostic by design. The caller picks the tenant DB.
- pipeline/{dedup,graph_build,issue_creation,sbom/mod}.rs, pentest/{context,report/html/*}.rs, trackers/jira.rs, llm/triage.rs: take `&Database` or `&mongodb::Database` as args, transitively tenant-scoped via the caller.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings clean
- cargo test -p compliance-core --lib: 7 pass
- cargo test -p compliance-agent --lib: 228 pass
- cargo test -p compliance-agent --test tenant_isolation: 5 pass
- cargo test -p compliance-agent --test tenant_status_middleware: 6 pass

What's left (PR-D)
- Drop the transitional agent.db field; no remaining call sites (verified by `grep -rn "agent\.db\b" compliance-agent/src`).
- main.rs / TestServer stop building the legacy Database; only the pool remains.
- Add cross-tenant admin helpers (list tenants, drop tenant DB) on the pool for offboarding flows.
- Pull tenants from the tenant-registry instead of an env var.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sharang Parnerkar	cdfbb62f9d	feat(m7.2-B): migrate API handlers to per-tenant database pool	2026-06-17 13:28:33 +02:00

Builds on PR M7.2-A. Every HTTP handler in compliance-agent/src/api/ now takes a TenantCtx extractor and pulls a tenant-scoped Database from agent.db_pool.for_tenant(&ctx). The query bodies are unchanged: `db.findings().find(doc! {...})` reads from the tenant's own physical database, so the filter doc cannot leak data across tenants because the wrong tenant's data is literally on a different db handle.

Changes
- New `dto::tenant_db(&agent, &tenant) -> Result<Database, StatusCode>` helper. Every migrated handler calls it at the top of the body instead of `let db = &agent.db;`. 500 on the rare pool failure; 4xx auth failures are already handled by the M7.1 status gate.
- New `api::server::inject_dev_tenant` middleware mounted only when Keycloak is NOT configured. Synthesizes a TenantContext with tenant_id = $DEV_TENANT_ID (default `dev`) so `cargo run` against a bare Mongo + no KC still serves the API. Logged loudly as "DO NOT use in any environment with real customer data".
- Test harness: TestServer mounts inject_dev_tenant so existing E2E tests reach handlers; cleanup() now drops every <db_name>_* per-tenant database, not just the legacy <db_name>.

Files migrated (handler count, all pass `cargo build`):
- chat.rs (3): also rewires RagPipeline + EmbeddingStore to the tenant DB's inner() so vector search is per-tenant
- dast.rs (5)
- findings.rs (5)
- graph.rs (7): also rewires GraphStore inside trigger_build's spawn to the tenant DB
- health.rs (1): stats_overview migrated; public /health stays un-scoped
- issues.rs (1)
- notifications.rs (5)
- pentest_handlers/session.rs (12): both wizard + legacy paths, plus pause/resume/stop/get_attack_chain/get_messages/get_session_findings/lookup_repo. PentestOrchestrator now gets the tenant DB clone in its spawn.
- pentest_handlers/export.rs (1): fans out across sessions, attack_chain_nodes, dast_findings, findings, sbom_entries, graph_nodes from a single tenant_db acquisition
- pentest_handlers/stats.rs (1)
- pentest_handlers/stream.rs (1): SSE handler verifies session via the tenant DB before subscribing
- repos.rs (6)
- sbom.rs (5)
- scans.rs (1)

help_chat.rs has no DB queries and was skipped.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings clean
- cargo test -p compliance-core --lib: 7 pass
- cargo test -p compliance-agent --lib: 228 pass
- cargo test -p compliance-agent --test tenant_isolation: 5 pass (driver-level isolation still holds post-handler migration)
- cargo test -p compliance-agent --test tenant_status_middleware: 6 pass

What's not yet migrated (PR-C / PR-D)
- scheduler.rs (6 sites), pipeline/orchestrator.rs (14), pentest/orchestrator.rs (13), webhooks (gitea/github/gitlab), trackers/jira.rs, pipeline/dedup.rs etc.: background paths without a JWT-derived tenant context.
- agent.db is still in the ComplianceAgent struct as a transitional handle for those paths. PR-D removes it once PR-C migrates the background paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sharang Parnerkar	003835764e	fixup(m7.2-A): validate db_prefix at connect, bump hash to 16 bytes	2026-06-17 13:16:46 +02:00

Addresses review feedback on the hash-fallback path. The original `debug_assert!(hashed.len() <= MAX_DB_NAME_LEN)` was a runtime hack that vanished in release builds. With an 8-byte hash truncation (~2^32 birthday-collision resistance), two tenant_ids hashing to the same suffix would silently share a database: no panic, no rollback, just cross-tenant data leak. Not acceptable for a regulated-industry product.

Changes:
- Bump hash truncation 8 → 16 bytes (32 hex chars). 2^64 birthday resistance; a collision is not a realistic risk at our scale.
- Add MAX_PREFIX_LEN (= 30) and validate db_prefix.len() at `DatabasePool::connect`. The runtime hash-fallback arithmetic is now provably within Mongo's 63-byte cap (30-byte prefix + 1 underscore + 32 hex chars = 63); drop the debug_assert!.
- New test `connect_rejects_overlong_db_prefix` exercises the inclusive bound (30 passes, 31 fails).
- Existing hash-fallback test now asserts a 32-char hex suffix + basic distinctness for two different inputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sharang Parnerkar	e3aabe7d18	feat(m7.2-A): introduce per-tenant DatabasePool	2026-06-17 11:58:24 +02:00

First slice of the M7.2 tenant-isolation work. Adds a `DatabasePool` that hands out per-tenant `Database` handles physically scoped to `<prefix>_<tenant_id>` Mongo databases. Isolation is at the driver, not at "we hope we filter": a handle for tenant A literally cannot see tenant B's documents because it's connected to a different db.

What's in this PR
- DatabasePool::connect: pings the cluster, prepares per-tenant lazy handles.
- DatabasePool::for_tenant(&TenantContext): returns a Database scoped to that tenant. ensure_indexes runs once per tenant per process via a DashMap-backed marker; failure rolls the marker back so the next request retries.
- tenant_db_name: `<prefix>_<sanitized_tenant_id>` if it fits in Mongo's 63-byte db-name cap, else `<prefix>_<sha256-16hex>` fallback.
- Sanitizer rewrites the Mongo-disallowed chars (`/ \ . " $ <space> NUL`) so any future tenant_id shape works.
- ComplianceAgent gains a `db_pool: DatabasePool` field next to the existing `db: Database`. Handlers / pipelines / webhooks still use `db`; they migrate to `db_pool.for_tenant(&ctx)` in M7.2-B/C and `db` goes away in M7.2-D.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings clean
- cargo test -p compliance-core --lib: 7 pass
- cargo test -p compliance-agent --lib: 228 pass
- cargo test -p compliance-agent --test tenant_isolation: 4 pass against live mongo on 27017:
  * pool_isolates_tenants_at_driver_level: writes for acme + globex, reads through each tenant's handle; each sees exactly its own data with no filter doc anywhere.
  * for_tenant_is_idempotent_index_creation: second + third call for the same tenant do not error.
  * tenant_db_name_sanitizes_unsafe_characters
  * tenant_db_name_falls_back_to_hash_when_too_long: 100-byte tenant_id collapses to a stable 8-byte hex suffix.

Why per-tenant DB vs `tenant_id` field + filter
- Driver-level isolation; impossible to forget the filter on one of the 184 query call-sites in compliance-agent.
- Handlers don't change shape at migration: `agent.db.findings()` becomes `db.findings()` after pulling `db` from `agent.db_pool.for_tenant(&ctx)`.
- GDPR delete = `db.dropDatabase()`.
- On-prem deploy = the same code path, with one tenant.
- Trade-off accepted: index storage duplicated per tenant; Mongo's ~thousand-db ceiling is way above the 10s-100s tenants we're targeting.

Caveats
- Existing `agent.db` continues to point at the single legacy db. Handlers / pipelines that use it are unscoped until M7.2-B/C migrate them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
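The database-naming rule described in the M7.2-A commits (sanitize the tenant_id, then fall back to a truncated digest when the name would exceed Mongo's 63-byte cap) can be sketched roughly as below. This is a sketch under stated assumptions, not the repository's implementation: replacing disallowed characters with `_`, hashing the raw tenant_id, and checking the prefix length inside the function (the fixup commit says it is validated once at `DatabasePool::connect`) are all guesses. The `digest_hex` closure is a stand-in for a SHA-256 hex digest so the example needs only std. The constants follow the fixup commit (prefix at most 30 bytes, 32 hex chars kept).

```rust
/// Mongo's database-name cap in bytes, as stated in the commit message.
const MAX_DB_NAME_LEN: usize = 63;
/// Prefix cap from the fixup commit: 30 + 1 (underscore) + 32 (hex) = 63.
const MAX_PREFIX_LEN: usize = 30;
/// Hex characters kept from the digest (16 bytes).
const HASH_HEX_LEN: usize = 32;

/// Rewrite the characters the commit lists as Mongo-disallowed.
/// Using '_' as the replacement is an assumption of this sketch.
fn sanitize_tenant_id(tenant_id: &str) -> String {
    tenant_id
        .chars()
        .map(|c| match c {
            '/' | '\\' | '.' | '"' | '$' | ' ' | '\0' => '_',
            other => other,
        })
        .collect()
}

/// Build `<prefix>_<sanitized_tenant_id>`, or `<prefix>_<digest prefix>`
/// when the direct form would exceed the cap. `digest_hex` is a
/// hypothetical injected hasher returning a lowercase hex digest.
fn tenant_db_name(
    prefix: &str,
    tenant_id: &str,
    digest_hex: impl Fn(&str) -> String,
) -> Result<String, String> {
    if prefix.len() > MAX_PREFIX_LEN {
        return Err(format!("db prefix longer than {MAX_PREFIX_LEN} bytes"));
    }
    let direct = format!("{prefix}_{}", sanitize_tenant_id(tenant_id));
    if direct.len() <= MAX_DB_NAME_LEN {
        return Ok(direct);
    }
    // Fallback: keep only the first 32 hex chars of the digest.
    let digest = digest_hex(tenant_id);
    let suffix = digest
        .get(..HASH_HEX_LEN)
        .ok_or_else(|| "digest shorter than 32 hex chars".to_string())?;
    Ok(format!("{prefix}_{suffix}"))
}
```

With the prefix capped at 30 bytes, the fallback name is at most 30 + 1 + 32 = 63 bytes, which is why a bound check at connect time can replace the `debug_assert!`.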