feat(m7.2-C): migrate background paths to per-tenant pool
CI / Check (pull_request) Successful in 10m33s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped

Closes the loop on M7.2 isolation for paths that don't have a JWT
context: scheduler, webhooks, and the agent's `run_scan` / `run_pr_review`
helpers all now take a `tenant_id` at the boundary and resolve to a
tenant-scoped `Database` via `db_pool.for_tenant_id(...)`. Internal
orchestrators (PipelineOrchestrator, PentestOrchestrator) and pipeline
helpers were already DB-agnostic — they take `db: Database` at
construction and don't care which tenant it points to.

Changes
- DatabasePool::for_tenant_id(&str) — same as for_tenant but accepts
  a bare tenant_id. Background paths don't have a full TenantContext.
  for_tenant is now a thin wrapper that delegates.
- agent.run_scan(tenant_id, repo_id, trigger) — pulls the tenant
  database before constructing the PipelineOrchestrator. Was:
  run_scan(repo_id, trigger) reading agent.db.
- agent.run_pr_review(tenant_id, repo_id, ...) — same shape.
- Webhook routes change: /webhook/{tenant_id}/{platform}/{repo_id}.
  Tenant is part of the URL path because webhooks arrive without a
  JWT — they're authenticated via per-repo HMAC, not the tenant gate.
  The dashboard surfaces the full per-tenant URL when the repo is
  registered. All three handlers (gitea, github, gitlab) updated.
- scheduler.rs — iterates tenants from $SCHEDULER_TENANT_IDS
  (comma-separated env), or DEV_TENANT_ID's `dev` default. Both
  scan_all_repos and monitor_cves now run once per configured
  tenant. M7.2-D will replace this static config with a pull from
  the tenant-registry.
- api/handlers/repos.rs::trigger_scan now passes tenant.0.tenant_id.

What's unchanged because it didn't need to change
- PipelineOrchestrator, PentestOrchestrator: take `db: Database` at
  construction — they're tenant-DB-agnostic by design. The caller
  picks the tenant DB.
- pipeline/{dedup,graph_build,issue_creation,sbom/mod}.rs,
  pentest/{context,report/html/*}.rs, trackers/jira.rs, llm/triage.rs:
  take `&Database` or `&mongodb::Database` as args, transitively
  tenant-scoped via the caller.

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard
  -- -D warnings clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 5 pass
- cargo test -p compliance-agent --test tenant_status_middleware
  — 6 pass

What's left (PR-D)
- Drop the transitional agent.db field — no remaining call sites
  (verified by `grep -rn "agent\.db\b" compliance-agent/src`).
- main.rs / TestServer stop building the legacy Database; only the
  pool remains.
- Add cross-tenant admin helpers (list tenants, drop tenant DB) on
  the pool for offboarding flows.
- Pull tenants from the tenant-registry instead of an env var.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Sharang Parnerkar
2026-06-17 15:00:37 +02:00
parent cdfbb62f9d
commit 0f6dd1135e
8 changed files with 182 additions and 72 deletions
+23 -9
View File
@@ -10,24 +10,30 @@ use crate::agent::ComplianceAgent;
pub async fn handle_gitlab_webhook(
Extension(agent): Extension<Arc<ComplianceAgent>>,
Path(repo_id): Path<String>,
Path((tenant_id, repo_id)): Path<(String, String)>,
headers: HeaderMap,
body: Bytes,
) -> StatusCode {
// Look up the repo to get its webhook secret
// Look up the repo in the tenant's database to get its webhook secret
let oid = match mongodb::bson::oid::ObjectId::parse_str(&repo_id) {
Ok(oid) => oid,
Err(_) => return StatusCode::NOT_FOUND,
};
let repo = match agent
.db
let db = match agent.db_pool.for_tenant_id(&tenant_id).await {
Ok(db) => db,
Err(e) => {
tracing::warn!("GitLab webhook: cannot open tenant database '{tenant_id}': {e}");
return StatusCode::NOT_FOUND;
}
};
let repo = match db
.repositories()
.find_one(mongodb::bson::doc! { "_id": oid })
.await
{
Ok(Some(repo)) => repo,
_ => {
tracing::warn!("GitLab webhook: repo {repo_id} not found");
tracing::warn!("GitLab webhook: repo {repo_id} not found in tenant '{tenant_id}'");
return StatusCode::NOT_FOUND;
}
};
@@ -59,15 +65,21 @@ pub async fn handle_gitlab_webhook(
"push" => {
let agent_clone = (*agent).clone();
let repo_id = repo_id.clone();
let tenant_id = tenant_id.clone();
tokio::spawn(async move {
tracing::info!("GitLab push webhook: triggering scan for {repo_id}");
if let Err(e) = agent_clone.run_scan(&repo_id, ScanTrigger::Webhook).await {
tracing::info!(
"GitLab push webhook: triggering scan for {repo_id} in tenant {tenant_id}"
);
if let Err(e) = agent_clone
.run_scan(&tenant_id, &repo_id, ScanTrigger::Webhook)
.await
{
tracing::error!("Webhook-triggered scan failed: {e}");
}
});
StatusCode::OK
}
"merge_request" => handle_merge_request(agent, &repo_id, &payload).await,
"merge_request" => handle_merge_request(agent, &tenant_id, &repo_id, &payload).await,
_ => {
tracing::debug!("GitLab webhook: ignoring event '{event_type}'");
StatusCode::OK
@@ -77,6 +89,7 @@ pub async fn handle_gitlab_webhook(
async fn handle_merge_request(
agent: Arc<ComplianceAgent>,
tenant_id: &str,
repo_id: &str,
payload: &serde_json::Value,
) -> StatusCode {
@@ -101,13 +114,14 @@ async fn handle_merge_request(
}
let repo_id = repo_id.to_string();
let tenant_id = tenant_id.to_string();
let head_sha = head_sha.to_string();
let base_sha = base_sha.to_string();
let agent_clone = (*agent).clone();
tokio::spawn(async move {
tracing::info!("GitLab MR webhook: reviewing MR !{mr_iid} on {repo_id}");
if let Err(e) = agent_clone
.run_pr_review(&repo_id, mr_iid, &base_sha, &head_sha)
.run_pr_review(&tenant_id, &repo_id, mr_iid, &base_sha, &head_sha)
.await
{
tracing::error!("MR review failed for !{mr_iid}: {e}");