feat(m7.2-A): introduce per-tenant DatabasePool #86

Closed
sharang wants to merge 2 commits from feat/m7.2a-tenant-db-pool into main
Owner

Summary

First slice of M7.2 tenant isolation. Introduces a DatabasePool that hands out per-tenant Database handles physically scoped to <prefix>_<tenant_id> Mongo databases.

Isolation is at the driver, not at "we hope we filter." A handle for tenant A literally cannot see tenant B's documents because it is connected to a different db. There is no {tenant_id: ctx.tenant_id} filter to forget on one of the 184 query call-sites in compliance-agent.

What's in this PR

  • DatabasePool::connect(uri, db_prefix) — pings the cluster, prepares per-tenant lazy handles.
  • DatabasePool::for_tenant(&TenantContext) — returns a Database scoped to that tenant. ensure_indexes runs once per tenant per process via a DashMap-backed marker; failure rolls the marker back so the next request retries.
  • DatabasePool::tenant_db_name returns <prefix>_<sanitized_tenant_id> if it fits Mongo's 63-byte db-name cap, else the <prefix>_<sha256-16hex> fallback.
  • Sanitizer rewrites Mongo-disallowed chars (/ \ . " $ <space> NUL) so any future tenant_id shape works.
  • ComplianceAgent gains a db_pool: DatabasePool field next to the existing db: Database. Handlers / pipelines / webhooks still use db — they migrate to db_pool.for_tenant(&ctx) in M7.2-B/C and db goes away in M7.2-D.
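
A minimal sketch of the naming rule described in the bullets above. It assumes the sanitizer replaces each disallowed character with an underscore (the PR only says they are rewritten), and it takes the truncated SHA-256 hex digest as a precomputed argument so the example needs no hashing crate. The function names mirror the PR, but the bodies are illustrative, not the merged code.

```rust
/// Mongo's cap on database-name length in bytes, as stated in the PR.
const MAX_DB_NAME_LEN: usize = 63;

/// Replace characters Mongo rejects in database names.
/// Assumption: '_' is the replacement character; the PR does not say.
fn sanitize_tenant_id(tenant_id: &str) -> String {
    tenant_id
        .chars()
        .map(|c| match c {
            '/' | '\\' | '.' | '"' | '$' | ' ' | '\0' => '_',
            other => other,
        })
        .collect()
}

/// Build `<prefix>_<sanitized_tenant_id>`, or `<prefix>_<digest_hex>` when
/// the direct form exceeds the cap. `digest_hex` stands in for the truncated
/// SHA-256 hex string of the tenant id (hypothetical parameter).
fn tenant_db_name(prefix: &str, tenant_id: &str, digest_hex: &str) -> String {
    let direct = format!("{prefix}_{}", sanitize_tenant_id(tenant_id));
    // `len()` counts bytes, which is the unit the cap is expressed in.
    if direct.len() <= MAX_DB_NAME_LEN {
        direct
    } else {
        format!("{prefix}_{digest_hex}")
    }
}
```

With a prefix of `compliance`, a short id such as `acme` keeps its readable name, while a 100-byte id takes the digest branch.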

Why per-tenant database (vs tenant_id field + app filter)

  • Forgetting the filter on one of 184 query call-sites is no longer possible — the type system + driver enforce the boundary.
  • Handler bodies don't change shape at migration. agent.db.findings() becomes db.findings() after pulling db from agent.db_pool.for_tenant(&ctx).
  • GDPR delete = db.dropDatabase().
  • On-prem deploy uses the same code path with one tenant.
  • Trade-off accepted: index storage duplicated per tenant; Mongo's ~thousand-db ceiling is well above the 10s-100s tenants we target.
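
To make the "no filter to forget" argument concrete, here is a toy in-memory model of the idea. `Database` and `DatabasePool` below are stand-ins written for this example (a `HashMap` of vectors), not the Mongo driver types or the PR's implementation; the point is only that a handle obtained for one tenant has no path to another tenant's data.

```rust
use std::collections::HashMap;

/// Toy stand-in for a tenant database: just a list of finding titles.
#[derive(Default)]
struct Database {
    findings: Vec<String>,
}

/// Toy stand-in for the pool: one `Database` per tenant id.
#[derive(Default)]
struct DatabasePool {
    dbs: HashMap<String, Database>,
}

impl DatabasePool {
    /// Return the tenant's own database, creating it lazily on first use.
    fn for_tenant(&mut self, tenant_id: &str) -> &mut Database {
        self.dbs.entry(tenant_id.to_string()).or_default()
    }
}
```

Reads through the `acme` handle return only what was written through the `acme` handle, with no tenant predicate anywhere in the query path.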

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy --workspace --exclude compliance-dashboard -- -D warnings clean
  • cargo test -p compliance-core --lib — 7 pass
  • cargo test -p compliance-agent --lib — 228 pass
  • cargo test -p compliance-agent --test tenant_isolation: 4/4 pass against live Mongo on :27017:
    • pool_isolates_tenants_at_driver_level — writes for acme + globex, reads through each tenant's handle; each sees exactly its own data with no filter doc anywhere.
    • for_tenant_is_idempotent_index_creation — second + third call for the same tenant do not error.
    • tenant_db_name_sanitizes_unsafe_characters
    • tenant_db_name_falls_back_to_hash_when_too_long — 100-byte tenant_id collapses to a stable 8-byte hex suffix. (Caught a real bug during dev — the original test prefix overran the 63-byte cap.)

Caveats

  • Existing agent.db continues to point at the single legacy db. Handlers / pipelines that use it are unscoped until M7.2-B/C migrate them. The agent is bootable and functional throughout.
  • Dev Mongo will be wiped between M7.2-D and the M7.2 rollout per offline conversation — no migration needed.

What's next

  • M7.2-B — migrate api/handlers/* to take TenantCtx and use agent.db_pool.for_tenant(&ctx).
  • M7.2-C — thread TenantContext through scheduler / pipeline / pentest orchestrators / webhooks / trackers.
  • M7.2-D — drop the transitional agent.db field; add cross-tenant admin helper.

🤖 Generated with Claude Code

sharang added 1 commit 2026-06-17 09:59:05 +00:00
feat(m7.2-A): introduce per-tenant DatabasePool
CI / Check (pull_request) Successful in 8m40s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
e3aabe7d18
First slice of the M7.2 tenant-isolation work. Adds a `DatabasePool`
that hands out per-tenant `Database` handles physically scoped to
`<prefix>_<tenant_id>` Mongo databases. Isolation is at the driver,
not at "we hope we filter" — a handle for tenant A literally cannot
see tenant B's documents because it's connected to a different db.

What's in this PR
- DatabasePool::connect — pings the cluster, prepares per-tenant lazy
  handles.
- DatabasePool::for_tenant(&TenantContext) — returns a Database scoped
  to that tenant. ensure_indexes runs once per tenant per process via
  a DashMap-backed marker; failure rolls the marker back so the next
  request retries.
- tenant_db_name — `<prefix>_<sanitized_tenant_id>` if it fits in
  Mongo's 63-byte db-name cap, else `<prefix>_<sha256-16hex>` fallback.
- Sanitizer rewrites the Mongo-disallowed chars (`/ \ . " $ <space>
  NUL`) so any future tenant_id shape works.
- ComplianceAgent gains a `db_pool: DatabasePool` field next to the
  existing `db: Database`. Handlers / pipelines / webhooks still use
  `db` — they migrate to `db_pool.for_tenant(&ctx)` in M7.2-B/C and
  `db` goes away in M7.2-D.
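
The once-per-tenant marker with rollback on failure could look roughly like the sketch below. It uses a `Mutex<HashSet<String>>` from the standard library in place of the `DashMap` the PR mentions, and `IndexMarkers` / `ensure_once` are hypothetical names chosen for this example. One simplification to be aware of: a concurrent caller that finds the marker already claimed returns immediately, even if the first caller's index build is still running.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Tracks which tenant databases have already had their indexes ensured.
/// Assumption: a Mutex<HashSet> stands in for the DashMap-backed marker.
struct IndexMarkers {
    done: Mutex<HashSet<String>>,
}

impl IndexMarkers {
    fn new() -> Self {
        Self { done: Mutex::new(HashSet::new()) }
    }

    /// Run `ensure` at most once per `db_name`. If it fails, the marker is
    /// removed again so that a later call retries.
    fn ensure_once<E>(
        &self,
        db_name: &str,
        ensure: impl FnOnce() -> Result<(), E>,
    ) -> Result<(), E> {
        // Claim the marker; `insert` returns false if it was already present.
        // The lock guard is a temporary and is released at the end of this statement.
        let claimed = self.done.lock().unwrap().insert(db_name.to_string());
        if !claimed {
            return Ok(());
        }
        let result = ensure();
        if result.is_err() {
            // Roll back so the next request gets another attempt.
            self.done.lock().unwrap().remove(db_name);
        }
        result
    }
}
```

A failed first attempt followed by two successful calls runs the closure twice in total: once for the failure, once for the retry, and not at all on the third call.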

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard -- -D warnings
  clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 228 pass
- cargo test -p compliance-agent --test tenant_isolation — 4 pass
  against live mongo on 27017:
    * pool_isolates_tenants_at_driver_level — writes for acme + globex,
      reads through each tenant's handle; each sees exactly its own
      data with no filter doc anywhere.
    * for_tenant_is_idempotent_index_creation — second + third call
      for the same tenant do not error.
    * tenant_db_name_sanitizes_unsafe_characters
    * tenant_db_name_falls_back_to_hash_when_too_long — 100-byte
      tenant_id collapses to a stable 8-byte hex suffix.

Why per-tenant DB vs `tenant_id` field + filter
- Driver-level isolation; impossible to forget the filter on one of
  the 184 query call-sites in compliance-agent.
- Handlers don't change shape at migration — `agent.db.findings()`
  becomes `db.findings()` after pulling `db` from
  `agent.db_pool.for_tenant(&ctx)`.
- GDPR delete = `db.dropDatabase()`.
- On-prem deploy = the same code path, with one tenant.
- Trade-off accepted: index storage duplicated per tenant; Mongo's
  ~thousand-db ceiling is way above the 10s-100s tenants we're
  targeting.

Caveats
- Existing `agent.db` continues to point at the single legacy db.
  Handlers / pipelines that use it are unscoped until M7.2-B/C
  migrate them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sharang added 1 commit 2026-06-17 11:16:59 +00:00
fixup(m7.2-A): validate db_prefix at connect, bump hash to 16 bytes
CI / Check (pull_request) Successful in 8m29s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
003835764e
Addresses review feedback on the hash-fallback path.

The original `debug_assert!(hashed.len() <= MAX_DB_NAME_LEN)` was a
runtime hack that vanished in release builds. With an 8-byte hash
truncation (~2^32 birthday-collision resistance), two tenant_ids
hashing to the same suffix would silently share a database — no
panic, no rollback, just cross-tenant data leak. Not acceptable for
a regulated-industry product.
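
For reference, the birthday-bound arithmetic behind the 2^32 figure can be written out as follows. The tenant count n = 100 is an illustrative assumption taken from the top of the stated 10s-100s target range, only tenant_ids that reach the hash-fallback path count towards n, and the estimate treats the truncated SHA-256 output as uniformly random.

```latex
% b = number of hash bits kept, n = number of tenant_ids on the fallback path
P(\text{collision}) \approx \frac{n(n-1)}{2^{\,b+1}}

% A collision becomes likely once n approaches 2^{b/2}:
% b = 64 (8 bytes) gives 2^{32}; b = 128 (16 bytes) gives 2^{64}.

% 8-byte truncation, b = 64, n = 100:
\frac{100 \cdot 99}{2^{65}} \approx 2.7 \times 10^{-16}

% 16-byte truncation, b = 128, n = 100:
\frac{100 \cdot 99}{2^{129}} \approx 1.5 \times 10^{-35}
```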

Changes:
- Bump hash truncation 8 → 16 bytes (32 hex chars). 2^64 birthday
  resistance, so an accidental collision is negligibly unlikely at our scale.
- Add MAX_PREFIX_LEN (= 30) and validate db_prefix.len() at
  `DatabasePool::connect`. The runtime hash-fallback arithmetic is
  now provably within Mongo's 63-byte cap; drop the debug_assert!.
- New test `connect_rejects_overlong_db_prefix` exercises the
  inclusive bound (30 passes, 31 fails).
- Existing hash-fallback test now asserts a 32-char hex suffix +
  basic distinctness for two different inputs.
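
The bound arithmetic from the list above works out as 30 (prefix) + 1 (underscore) + 32 (hex suffix) = 63 bytes. A small sketch of how that could be checked, assuming a hypothetical `validate_db_prefix` helper and a `String` error type (the PR does not show the real signature or error type):

```rust
/// Mongo's database-name cap in bytes, as stated in the PR.
const MAX_DB_NAME_LEN: usize = 63;
/// 16 bytes of SHA-256, hex-encoded.
const HASH_HEX_LEN: usize = 32;
/// Longest prefix accepted at connect time.
const MAX_PREFIX_LEN: usize = 30;

// Compile-time check that `<prefix>_<32 hex chars>` always fits the cap.
const _: () = assert!(MAX_PREFIX_LEN + 1 + HASH_HEX_LEN <= MAX_DB_NAME_LEN);

/// Reject prefixes that could push the hash-fallback name past the cap.
/// Hypothetical helper; the bound is inclusive (30 passes, 31 fails).
fn validate_db_prefix(prefix: &str) -> Result<(), String> {
    if prefix.len() > MAX_PREFIX_LEN {
        return Err(format!(
            "db_prefix is {} bytes, maximum is {}",
            prefix.len(),
            MAX_PREFIX_LEN
        ));
    }
    Ok(())
}
```

Moving the check to a constant assertion plus a connect-time validation means the guarantee holds in release builds, which a `debug_assert!` does not provide.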

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
Owner

Superseded — the M7.2 stack was inadvertently included in PR #90 squash-merge (5648291) on main. The dashboard PR was branched off this PR's descendant and its full diff swept into main as one squash commit. M7.2-A through M7.2-D are all live on main and in production. Closing without merging.

sharang closed this pull request 2026-06-18 09:31:58 +00:00

Reference: sharang/compliance-scanner-agent#86