e3aabe7d18
First slice of the M7.2 tenant-isolation work. Adds a `DatabasePool`
that hands out per-tenant `Database` handles physically scoped to
`<prefix>_<tenant_id>` Mongo databases. Isolation is at the driver,
not at "we hope we filter" — a handle for tenant A literally cannot
see tenant B's documents because it's connected to a different db.
What's in this PR
- DatabasePool::connect — pings the cluster, prepares per-tenant lazy
handles.
- DatabasePool::for_tenant(&TenantContext) — returns a Database scoped
to that tenant. ensure_indexes runs once per tenant per process via
a DashMap-backed marker; failure rolls the marker back so the next
request retries.
- tenant_db_name — `<prefix>_<sanitized_tenant_id>` if it fits in
Mongo's 63-byte db-name cap, else `<prefix>_<sha256-16hex>` fallback.
- Sanitizer rewrites the Mongo-disallowed chars (`/ \ . " $ <space>
NUL`) so any future tenant_id shape works.
- ComplianceAgent gains a `db_pool: DatabasePool` field next to the
existing `db: Database`. Handlers / pipelines / webhooks still use
`db` — they migrate to `db_pool.for_tenant(&ctx)` in M7.2-B/C and
`db` goes away in M7.2-D.
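The once-per-tenant marker with rollback described for `for_tenant` could look roughly like the sketch below. It is a minimal, synchronous illustration and not the code in this PR: `IndexMarker` and `ensure_once` are hypothetical names, and a `Mutex<HashSet<String>>` stands in for the DashMap so the snippet needs no external crates.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the DashMap-backed marker; names are illustrative.
struct IndexMarker {
    ready: Mutex<HashSet<String>>,
}

impl IndexMarker {
    fn new() -> Self {
        IndexMarker {
            ready: Mutex::new(HashSet::new()),
        }
    }

    /// Runs `ensure` at most once per tenant for the lifetime of this marker.
    /// If `ensure` fails, the marker is removed so a later call retries.
    fn ensure_once<E>(
        &self,
        tenant_id: &str,
        ensure: impl FnOnce() -> Result<(), E>,
    ) -> Result<(), E> {
        // Claim the marker first; the lock guard is dropped at the end of
        // this statement, so `ensure` runs without holding the lock.
        let newly_claimed = self.ready.lock().unwrap().insert(tenant_id.to_string());
        if !newly_claimed {
            return Ok(());
        }
        let result = ensure();
        if result.is_err() {
            // Roll the marker back so the next request tries again.
            self.ready.lock().unwrap().remove(tenant_id);
        }
        result
    }
}
```

One consequence of claiming the marker before the work finishes is that a concurrent caller for the same tenant can return before the indexes exist. Whether the real pool accepts that window is not stated in this PR.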
Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings
clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 4 pass
against live mongo on 27017:
* pool_isolates_tenants_at_driver_level — writes for acme + globex,
reads through each tenant's handle; each sees exactly its own
data with no filter doc anywhere.
* for_tenant_is_idempotent_index_creation — second + third call
for the same tenant do not error.
* tenant_db_name_sanitizes_unsafe_characters
* tenant_db_name_falls_back_to_hash_when_too_long: a 100-byte
tenant_id collapses to a stable 16-hex-character (8-byte) hash suffix.
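The two naming tests above exercise behavior along the lines of the sketch below. It rests on stated assumptions: `tenant_db_name_sketch`, `sanitize`, and `fnv1a64` are hypothetical names, replacing disallowed characters with `_` is a guess (the PR only says they are rewritten), and a 64-bit FNV-1a hash is swapped in for the SHA-256 prefix so the snippet has no dependencies. FNV-1a happens to yield 16 hex characters, the same suffix length as the real fallback.

```rust
/// Upper bound on a MongoDB database name in bytes, as quoted in this PR.
const MAX_DB_NAME_BYTES: usize = 63;

/// Replace the characters MongoDB rejects in database names.
/// The `_` replacement is an assumption, not taken from the PR.
fn sanitize(tenant_id: &str) -> String {
    tenant_id
        .chars()
        .map(|c| match c {
            '/' | '\\' | '.' | '"' | '$' | ' ' | '\0' => '_',
            other => other,
        })
        .collect()
}

/// 64-bit FNV-1a: a dependency-free stand-in for the SHA-256 prefix.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// `<prefix>_<sanitized id>` when it fits, else `<prefix>_<16 hex chars>`.
fn tenant_db_name_sketch(prefix: &str, tenant_id: &str) -> String {
    let direct = format!("{prefix}_{}", sanitize(tenant_id));
    if direct.len() <= MAX_DB_NAME_BYTES {
        direct
    } else {
        // Hash the raw id so two ids that sanitize to the same string
        // still map to different fallback names.
        format!("{prefix}_{:016x}", fnv1a64(tenant_id.as_bytes()))
    }
}
```

The sketch does not guard against a prefix that is itself close to the cap; the tests in this PR avoid that case by keeping prefixes short.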
Why per-tenant DB vs `tenant_id` field + filter
- Driver-level isolation; impossible to forget the filter on one of
the 184 query call-sites in compliance-agent.
- Handlers don't change shape at migration — `agent.db.findings()`
becomes `db.findings()` after pulling `db` from
`agent.db_pool.for_tenant(&ctx)`.
- GDPR delete = `db.dropDatabase()`.
- On-prem deploy = the same code path, with one tenant.
- Trade-off accepted: index storage is duplicated per tenant. Mongo's
practical ceiling of roughly a thousand databases is well above the
tens to hundreds of tenants we're targeting.
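To make the handle-scoping argument concrete, here is a toy in-memory model of the idea. It is only an illustration under stated assumptions: `Cluster` and `TenantDb` are hypothetical types that mimic "one database per tenant" with a `HashMap`, and nothing here touches the MongoDB driver or the real `DatabasePool`.

```rust
use std::collections::HashMap;

/// Toy cluster: one list of documents per database name.
#[derive(Default)]
struct Cluster {
    databases: HashMap<String, Vec<String>>,
}

/// A handle that can only address the single database it was created for.
struct TenantDb<'a> {
    cluster: &'a mut Cluster,
    name: String,
}

impl Cluster {
    /// Mirrors the `<prefix>_<tenant_id>` naming described in this PR.
    fn for_tenant(&mut self, prefix: &str, tenant_id: &str) -> TenantDb<'_> {
        TenantDb {
            name: format!("{prefix}_{tenant_id}"),
            cluster: self,
        }
    }
}

impl TenantDb<'_> {
    fn insert(&mut self, doc: &str) {
        self.cluster
            .databases
            .entry(self.name.clone())
            .or_default()
            .push(doc.to_string());
    }

    /// Unfiltered read: there is no tenant filter to forget, because the
    /// handle has no way to name another tenant's database.
    fn find_all(&self) -> Vec<String> {
        self.cluster
            .databases
            .get(&self.name)
            .cloned()
            .unwrap_or_default()
    }

    /// Tenant erasure is a single drop of the tenant's database.
    fn drop_database(self) {
        self.cluster.databases.remove(&self.name);
    }
}
```

In this model, dropping one tenant's database leaves every other tenant's data untouched, which is the property the GDPR bullet relies on.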
Caveats
- Existing `agent.db` continues to point at the single legacy db.
Handlers / pipelines that use it are unscoped until M7.2-B/C
migrate them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
199 lines
6.9 KiB
Rust
//! M7.2-A — `DatabasePool` isolation proof.
//!
//! Two `TenantContext`s, two databases, one client. Insert on A, query
//! on B → empty. Insert on B, query on A → only A's docs. Proves that
//! the per-tenant database split actually isolates at the driver level
//! and not at "we hope we filter."
//!
//! Requires MongoDB. Set `TEST_MONGODB_URI` to override the default
//! `mongodb://root:example@localhost:27017/?authSource=admin`.

#![allow(clippy::expect_used, clippy::unwrap_used)]

use compliance_agent::database::DatabasePool;
use compliance_core::models::TrackedRepository;
use compliance_core::{OrgRole, TenantContext, TenantStatus};
use mongodb::bson::doc;

fn ctx(tenant_id: &str, slug: &str) -> TenantContext {
    TenantContext {
        tenant_id: tenant_id.to_string(),
        tenant_slug: slug.to_string(),
        org_roles: vec![OrgRole::ItAdmin],
        products: vec!["compliance-scanner".to_string()],
        plan: "starter".to_string(),
        status: TenantStatus::Active,
        user_id: "u-1".to_string(),
        user_name: None,
    }
}

fn fixture_repo(name: &str, git_url: &str) -> TrackedRepository {
    TrackedRepository {
        id: None,
        name: name.to_string(),
        git_url: git_url.to_string(),
        default_branch: "main".to_string(),
        local_path: None,
        scan_schedule: None,
        webhook_enabled: false,
        webhook_secret: None,
        tracker_type: None,
        tracker_owner: None,
        tracker_repo: None,
        tracker_token: None,
        auth_token: None,
        auth_username: None,
        last_scanned_commit: None,
        findings_count: 0,
        created_at: chrono::Utc::now(),
        updated_at: chrono::Utc::now(),
    }
}

#[tokio::test]
async fn pool_isolates_tenants_at_driver_level() {
    let uri = std::env::var("TEST_MONGODB_URI")
        .unwrap_or_else(|_| "mongodb://root:example@localhost:27017/?authSource=admin".into());
    // Unique per run so parallel test invocations don't collide. Kept
    // short because Mongo caps db names at 63 bytes (prefix + tenant_id).
    let prefix = format!("m72a_{}", short_id());

    let pool = DatabasePool::connect(&uri, &prefix)
        .await
        .expect("Failed to connect to MongoDB — is it running?");

    let acme = ctx("00000000-0000-0000-0000-00000000acme", "acme");
    let globex = ctx("00000000-0000-0000-0000-0000globex000", "globex");

    let acme_db = pool.for_tenant(&acme).await.expect("acme db");
    let globex_db = pool.for_tenant(&globex).await.expect("globex db");

    // Write distinct repos into each tenant's database.
    acme_db
        .repositories()
        .insert_one(fixture_repo("acme-app", "git@example.com:acme/app.git"))
        .await
        .expect("insert acme");
    globex_db
        .repositories()
        .insert_one(fixture_repo(
            "globex-platform",
            "git@example.com:globex/platform.git",
        ))
        .await
        .expect("insert globex");

    // The point of the whole exercise: acme can ONLY see acme's repo
    // and globex can ONLY see globex's, with no filter doc anywhere
    // because the isolation is at the database handle, not in the query.
    let acme_seen = collect(&acme_db).await;
    let globex_seen = collect(&globex_db).await;

    assert_eq!(acme_seen.len(), 1, "acme should see exactly its own repo");
    assert_eq!(acme_seen[0].name, "acme-app");
    assert_eq!(
        globex_seen.len(),
        1,
        "globex should see exactly its own repo"
    );
    assert_eq!(globex_seen[0].name, "globex-platform");

    // Sanity: the two databases really are different by name.
    let acme_db_name = pool.tenant_db_name(&acme.tenant_id);
    let globex_db_name = pool.tenant_db_name(&globex.tenant_id);
    assert_ne!(acme_db_name, globex_db_name);
    assert!(acme_db_name.starts_with(&prefix));

    // Cleanup — drop both per-tenant databases.
    pool.client()
        .database(&acme_db_name)
        .drop()
        .await
        .expect("drop acme");
    pool.client()
        .database(&globex_db_name)
        .drop()
        .await
        .expect("drop globex");
}

#[tokio::test]
async fn for_tenant_is_idempotent_index_creation() {
    let uri = std::env::var("TEST_MONGODB_URI")
        .unwrap_or_else(|_| "mongodb://root:example@localhost:27017/?authSource=admin".into());
    let prefix = format!("m72a_{}", short_id());
    let pool = DatabasePool::connect(&uri, &prefix).await.expect("connect");

    let acme = ctx("00000000-0000-0000-0000-00000000acme", "acme");

    // Second call must not fail (ensure_indexes already ran, in-memory
    // marker is set, Mongo's createIndex is idempotent by name anyway).
    let _ = pool.for_tenant(&acme).await.expect("first call");
    let _ = pool.for_tenant(&acme).await.expect("second call");
    let _ = pool.for_tenant(&acme).await.expect("third call");

    // Cleanup
    let db_name = pool.tenant_db_name(&acme.tenant_id);
    pool.client().database(&db_name).drop().await.expect("drop");
}

#[tokio::test]
async fn tenant_db_name_sanitizes_unsafe_characters() {
    let uri = std::env::var("TEST_MONGODB_URI")
        .unwrap_or_else(|_| "mongodb://root:example@localhost:27017/?authSource=admin".into());
    let pool = DatabasePool::connect(&uri, "m72a_sanitize")
        .await
        .expect("connect");

    // Mongo db names cannot contain `/ \ . " $ <space> NUL`. The pool
    // must rewrite these without exploding on connect.
    let funky = "te/n.a\\nt$id\" with spaces";
    let name = pool.tenant_db_name(funky);
    for c in ['/', '\\', '.', '"', '$', ' '] {
        assert!(
            !name.contains(c),
            "sanitized db name still contains {c:?}: {name}"
        );
    }
}

#[tokio::test]
async fn tenant_db_name_falls_back_to_hash_when_too_long() {
    let uri = std::env::var("TEST_MONGODB_URI")
        .unwrap_or_else(|_| "mongodb://root:example@localhost:27017/?authSource=admin".into());
    let pool = DatabasePool::connect(&uri, "m72a_long")
        .await
        .expect("connect");

    // 100-byte tenant_id would overflow the 63-byte db-name cap with
    // any reasonable prefix. The pool must hash it down.
    let huge = "x".repeat(100);
    let name = pool.tenant_db_name(&huge);
    assert!(name.len() <= 63, "hashed name should fit: {name}");
    assert!(name.starts_with("m72a_long_"));

    // Stable: same input → same output.
    assert_eq!(name, pool.tenant_db_name(&huge));
}

/// Short UUID slug for keeping test prefixes well under Mongo's 63-byte
/// db-name cap.
fn short_id() -> String {
    uuid::Uuid::new_v4().simple().to_string()[..8].to_string()
}

/// Drain a `repositories` find cursor on the given tenant database.
async fn collect(db: &compliance_agent::database::Database) -> Vec<TrackedRepository> {
    let mut cursor = db
        .repositories()
        .find(doc! {})
        .await
        .expect("find repositories");
    let mut out = Vec::new();
    while cursor.advance().await.expect("advance") {
        out.push(cursor.deserialize_current().expect("deserialize"));
    }
    out
}