Sharang Parnerkar 628f346529
feat(m7.3): MCP tenant-scoped bearer tokens
LLM clients (Claude Desktop, Cursor, ChatGPT) can't run a Keycloak
OIDC flow, so the MCP server can't use JWTs for auth. This PR
introduces opaque static bearer tokens minted per-tenant via new
agent endpoints, validated by the MCP server, and used to route
incoming MCP requests to the caller's per-tenant database.

Until now, the MCP server connected to a single shared MongoDB DB
with no auth and no tenant awareness — every tool (list_findings,
list_sbom_packages, etc.) returned data across all tenants. After
M7.2 made the agent per-tenant, MCP was the lone cross-tenant data
leak. This closes it.

Design summary
- Token format: `mcpt_<43 url-safe random chars>` (48 chars total).
  Opaque, never embeds tenant_id, never stored in plaintext.
- Storage: cross-tenant `<prefix>__admin.mcp_tokens` collection,
  keyed by SHA-256 hash. Each row carries the tenant_id, name,
  created_by, created_at, last_used_at, revoked flag.
- Agent endpoints (tenant-scoped via TenantCtx):
    POST   /api/v1/mcp-tokens     → mint (returns raw token ONCE)
    GET    /api/v1/mcp-tokens     → list (metadata + 12-char prefix,
                                    never the hash)
    DELETE /api/v1/mcp-tokens/:id → soft revoke
- MCP middleware: extract `Authorization: Bearer mcpt_...`, sniff
  the prefix, SHA-256 → lookup in admin DB → reject if missing or
  revoked. Updates last_used_at fire-and-forget so it never blocks.
  Sets `tokio::task_local!` TENANT_ID for the inner service call;
  the rmcp tool handlers read it and resolve the per-tenant DB.
- task_local is scoped via TENANT_ID.scope(...) around next.run(req)
  so the rmcp tool handlers downstream see the tenant_id without
  modifying their (macro-generated) signatures.
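
The middleware steps above (extract the header, sniff the prefix, hash, look up, reject if missing or revoked) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in compliance-mcp/src/auth.rs: `TokenRecord`, `AuthError`, `extract_mcp_token`, and `resolve_tenant` are hypothetical names, the `hash` closure stands in for SHA-256 hex, and the `HashMap` stands in for the admin `mcp_tokens` collection.

```rust
use std::collections::HashMap;

/// Prefix carried by every MCP token, per the design summary.
const TOKEN_PREFIX: &str = "mcpt_";
/// 5-char prefix + 43 URL-safe base64 chars.
const TOKEN_LEN: usize = 48;

/// Hypothetical projection of a row in the admin `mcp_tokens` collection.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenRecord {
    pub tenant_id: String,
    pub revoked: bool,
}

/// Hypothetical error type; the real middleware may map these differently.
#[derive(Debug, PartialEq)]
pub enum AuthError {
    MissingOrMalformed,
    UnknownToken,
    Revoked,
}

/// Pull the raw token out of an `Authorization` header value and check its shape.
pub fn extract_mcp_token(header: &str) -> Result<&str, AuthError> {
    let token = header
        .strip_prefix("Bearer ")
        .ok_or(AuthError::MissingOrMalformed)?
        .trim();
    let body = token
        .strip_prefix(TOKEN_PREFIX)
        .ok_or(AuthError::MissingOrMalformed)?;
    // URL-safe base64 without padding uses only these characters.
    let url_safe = body
        .bytes()
        .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
    if token.len() == TOKEN_LEN && url_safe {
        Ok(token)
    } else {
        Err(AuthError::MissingOrMalformed)
    }
}

/// Resolve a header to a tenant id. ASSUMPTION: `hash` stands in for
/// SHA-256 hex and `store` for the admin collection keyed by that hash.
pub fn resolve_tenant<'a>(
    header: &str,
    hash: impl Fn(&str) -> String,
    store: &'a HashMap<String, TokenRecord>,
) -> Result<&'a str, AuthError> {
    let token = extract_mcp_token(header)?;
    let record = store.get(&hash(token)).ok_or(AuthError::UnknownToken)?;
    if record.revoked {
        return Err(AuthError::Revoked);
    }
    Ok(record.tenant_id.as_str())
}
```

In the real server the resolved tenant id would then be placed in the task-local for the duration of the inner call; that step needs tokio and is left out here to keep the sketch dependency-free.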

Files
- compliance-core/src/models/mcp_token.rs (new) — McpToken +
  McpTokenView (public projection without the hash).
- compliance-agent/src/database.rs — DatabasePool::admin_db() +
  admin_db_name(): cross-tenant access for token storage.
- compliance-agent/src/api/handlers/mcp_tokens.rs (new) — three
  endpoints. Token generation: 32 random bytes → URL-safe base64,
  no padding. SHA-256 hex stored.
- compliance-mcp/src/database.rs — replaced single Database with
  DatabasePool. Tenant-scoped Database constructed per request.
  Same sanitization + 63-byte cap + hash fallback as the agent.
- compliance-mcp/src/auth.rs (new) — bearer middleware + task_local.
  Includes a SHA-256 round-trip test against a known vector.
- compliance-mcp/src/main.rs — HTTP transport: bearer middleware
  layered on /mcp (not /health, so orca's container probe still
  works). stdio transport: falls back to STDIO_TENANT_ID env (defaults
  to "dev") so local development still works; logged loudly as
  not-for-production.
- compliance-mcp/src/server.rs — each of the 12 tool handlers
  resolves the per-tenant DB via task_local before calling its tool
  fn. Tool fns themselves are unchanged.
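
To make the length arithmetic of the token format concrete (32 random bytes become 43 URL-safe base64 characters, 48 with the `mcpt_` prefix), here is a dependency-free sketch. The hand-rolled encoder is only there to keep the example self-contained; the actual handler presumably uses a base64 crate and a CSPRNG. `base64url_no_pad`, `format_token`, and `display_prefix` are hypothetical names, and SHA-256 hashing is deliberately omitted because it is not in the standard library.

```rust
/// URL-safe base64 alphabet (RFC 4648, section 5).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as URL-safe base64 without `=` padding.
pub fn base64url_no_pad(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() * 4 + 2) / 3);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32, first byte highest.
        let mut buf = 0u32;
        for (i, &b) in chunk.iter().enumerate() {
            buf |= u32::from(b) << (16 - 8 * i);
        }
        // n input bytes yield n + 1 output characters (no padding emitted).
        for i in 0..=chunk.len() {
            let idx = (buf >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[idx as usize] as char);
        }
    }
    out
}

/// Build the raw token from 32 bytes the caller obtained from a CSPRNG.
pub fn format_token(random: &[u8; 32]) -> String {
    format!("mcpt_{}", base64url_no_pad(random))
}

/// First 12 characters of a token, as exposed by the list endpoint.
pub fn display_prefix(token: &str) -> &str {
    &token[..token.len().min(12)]
}
```

Taking the random bytes as a parameter keeps the formatting logic deterministic and easy to test; only the hash of the returned string would be persisted.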

Token UX
- Generated by the dashboard (or curl + KC JWT) — user sees raw
  token exactly once, copies it into their LLM client config.
- Dashboard UI for management is a follow-up; can use curl in the
  meantime:
    curl -X POST https://comp-dev.../api/v1/mcp-tokens \
      -H "Authorization: Bearer $KC_JWT" \
      -H "Content-Type: application/json" \
      -d '{"name":"Claude Desktop"}'

Test plan
- cargo fmt --all clean
- cargo clippy --workspace --exclude compliance-dashboard
  -- -D warnings clean
- cargo test -p compliance-core --lib — 7 pass
- cargo test -p compliance-agent --lib — 230 pass (+2 new for
  token generation + sha256 stability)
- cargo test -p compliance-agent --test tenant_isolation — 6 pass
- cargo test -p compliance-mcp — 34 pass (+1 new sha256 vector)

What's deferred
- Dashboard UI for managing tokens (page + create modal + list/
  revoke). Trivial once the API is live.
- Token expiry + per-tool scope (today every token grants access
  to all 12 tools for its tenant).
- Lifting DatabasePool into compliance-core (duplicated for now
  in compliance-mcp to keep this PR focused; lift if a third
  consumer appears).

Production
- The `<prefix>__admin` DB needs to NOT collide with a tenant
  DB. Sanitized tenant_id never starts with `_admin` for any
  current tenant_id shape (UUIDs); flagged in the database.rs
  docstring so tenant provisioning can reject `_admin*` ids
  proactively.
- orca-infra MCP service block already has MONGODB_URI /
  MONGODB_DATABASE — no new env needed. No KC creds since MCP
  doesn't use Keycloak for its own auth.
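
The collision concern above can be illustrated with a small sketch. It assumes tenant databases are named `<prefix>_<sanitized tenant_id>`, which the notes imply but do not state outright; `tenant_db_name` and `is_reserved_tenant_id` are hypothetical helpers, and the real sanitization (63-byte cap, hash fallback) is not reproduced here.

```rust
/// Name of the cross-tenant admin database, as described in the notes.
pub fn admin_db_name(prefix: &str) -> String {
    format!("{prefix}__admin")
}

/// ASSUMPTION: tenant databases are named `<prefix>_<sanitized tenant_id>`.
pub fn tenant_db_name(prefix: &str, sanitized_tenant_id: &str) -> String {
    format!("{prefix}_{sanitized_tenant_id}")
}

/// Hypothetical provisioning guard: refuse ids that could shadow the admin DB.
pub fn is_reserved_tenant_id(sanitized_tenant_id: &str) -> bool {
    sanitized_tenant_id.starts_with("_admin")
}
```

Under that naming assumption, a tenant id of `_admin` would map to exactly the admin database name, which is why rejecting `_admin*` at provisioning time is cheap insurance even though UUID-shaped ids can never trigger it.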

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-18 11:54:01 +02:00
2026-03-07 23:51:20 +00:00


Compliance Scanner

Autonomous security and compliance scanning agent for git repositories

Rust Dioxus MongoDB Axum Tailwind CSS

GDPR OAuth SAST CVE Platform


About

Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira/Gitea) with evidence and remediation suggestions, reviews pull requests with multi-pass LLM analysis, runs autonomous penetration tests, and exposes a Dioxus-based dashboard for visualization.

How it works: The agent runs as a lazy daemon -- it only scans when new commits are detected, triggered by cron schedules or webhooks. LLM-powered triage filters out false positives and generates actionable remediation with multi-language awareness.

Features

| Area | Capabilities |
|------|--------------|
| SAST Scanning | Semgrep-based static analysis with auto-config rules |
| SBOM Generation | Syft + cargo-audit for complete dependency inventory |
| CVE Monitoring | OSV.dev batch queries, NVD CVSS enrichment, SearXNG context |
| GDPR Patterns | Detect PII logging, missing consent, hardcoded retention, missing deletion |
| OAuth Patterns | Detect implicit grant, missing PKCE, token in localStorage, token in URLs |
| LLM Triage | Multi-language-aware confidence scoring (Rust, Python, Go, Java, Ruby, PHP, C++) |
| Issue Creation | Auto-create issues in GitHub, GitLab, Jira, or Gitea with dedup via fingerprints |
| PR Reviews | Multi-pass security review (logic, security, convention, complexity) with dedup |
| DAST Scanning | Black-box security testing with endpoint discovery and parameter fuzzing |
| AI Pentesting | Autonomous LLM-orchestrated penetration testing with encrypted reports |
| Code Graph | Interactive code knowledge graph with impact analysis |
| AI Chat (RAG) | Natural language Q&A grounded in repository source code |
| Help Assistant | Documentation-grounded help chat accessible from every dashboard page |
| MCP Server | Expose live security data to Claude, Cursor, and other AI tools |
| Dashboard | Fullstack Dioxus UI with findings, SBOM, issues, DAST, pentest, and graph |
| Webhooks | GitHub, GitLab, and Gitea webhook receivers for push/PR events |
| Finding Dedup | SHA-256 fingerprint dedup for SAST, CWE-based dedup for DAST findings |

Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                          Cargo Workspace                                 │
├──────────────┬──────────────────┬──────────────┬──────────┬─────────────┤
│ compliance-  │ compliance-      │ compliance-  │ complian-│ compliance- │
│ core (lib)   │ agent (bin)      │ dashboard    │ ce-graph │ mcp (bin)   │
│              │                  │ (bin)        │ (lib)    │             │
│ Models       │ Scan Pipeline    │ Dioxus 0.7   │ Tree-    │ MCP Server  │
│ Traits       │ LLM Client       │ Fullstack UI │ sitter   │ Live data   │
│ Config       │ Issue Trackers   │ Help Chat    │ Graph    │ for AI      │
│ Errors       │ Pentest Engine   │ Server Fns   │ Embeds   │ tools       │
│              │ DAST Tools       │              │ RAG      │             │
│              │ REST API         │              │          │             │
│              │ Webhooks         │              │          │             │
└──────────────┴──────────────────┴──────────────┴──────────┴─────────────┘
                                 │
                            MongoDB (shared)

Scan Pipeline (7 Stages)

  1. Change Detection -- git2 fetch, compare HEAD SHA with last scanned commit
  2. Semgrep SAST -- CLI wrapper with JSON output parsing
  3. SBOM Generation -- Syft (CycloneDX) + cargo-audit vulnerability merge
  4. CVE Scanning -- OSV.dev batch + NVD CVSS enrichment + SearXNG context
  5. Pattern Scanning -- Regex-based GDPR and OAuth compliance checks
  6. LLM Triage -- LiteLLM confidence scoring, filter findings < 3/10
  7. Issue Creation -- Dedup via SHA-256 fingerprint, create tracker issues
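
Stages 6 and 7 can be read as a filter followed by a dedup. The sketch below is illustrative only and assumes a 0 to 10 integer confidence score where findings scoring below 3 are dropped; `TriagedFinding`, `MIN_CONFIDENCE`, and `select_for_issue_creation` are hypothetical names, and the fingerprint is treated as an opaque string rather than computed here.

```rust
use std::collections::HashSet;

/// Hypothetical shape of a finding after LLM triage.
#[derive(Debug, Clone, PartialEq)]
pub struct TriagedFinding {
    /// SHA-256 fingerprint, treated here as an opaque string.
    pub fingerprint: String,
    /// ASSUMPTION: integer confidence on a 0..=10 scale.
    pub confidence: u8,
}

/// Findings scoring below this are dropped (the "< 3/10" rule).
pub const MIN_CONFIDENCE: u8 = 3;

/// Keep findings that pass triage, have no tracker issue yet,
/// and are not duplicated within the current batch.
pub fn select_for_issue_creation(
    findings: Vec<TriagedFinding>,
    already_filed: &HashSet<String>,
) -> Vec<TriagedFinding> {
    let mut seen: HashSet<String> = HashSet::new();
    findings
        .into_iter()
        .filter(|f| f.confidence >= MIN_CONFIDENCE)
        .filter(|f| !already_filed.contains(&f.fingerprint))
        // `insert` returns false when the fingerprint was already seen.
        .filter(|f| seen.insert(f.fingerprint.clone()))
        .collect()
}
```

Running the confidence filter first means low-confidence findings never consume a fingerprint slot, so a later high-confidence duplicate of the same fingerprint is still kept.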

Tech Stack

| Layer | Technology |
|-------|------------|
| Shared Library | compliance-core -- models, traits, config |
| Agent | Axum REST API, git2, tokio-cron-scheduler, Semgrep, Syft |
| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS 4 |
| Code Graph | compliance-graph -- tree-sitter parsing, embeddings, RAG |
| MCP Server | compliance-mcp -- Model Context Protocol for AI tools |
| DAST | compliance-dast -- dynamic application security testing |
| Database | MongoDB with typed collections |
| LLM | LiteLLM (OpenAI-compatible API for chat, triage, embeddings) |
| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3), Gitea |
| CVE Sources | OSV.dev, NVD, SearXNG |
| Auth | Keycloak (OAuth2/PKCE, SSO) |
| Browser Automation | Chromium (headless, for pentesting and PDF generation) |

Getting Started

Prerequisites

  • Rust 1.94+
  • Dioxus CLI (dx)
  • MongoDB
  • Docker & Docker Compose (optional)

Optional External Tools

Setup

# Clone the repository
git clone <repo-url>
cd compliance-scanner

# Start MongoDB + SearXNG
docker compose up -d mongo searxng

# Configure environment
cp .env.example .env
# Edit .env with your LiteLLM, tracker tokens, and MongoDB settings

# Run the agent
cargo run -p compliance-agent

# Run the dashboard (separate terminal)
dx serve --features server --platform web

Docker Compose (Full Stack)

docker compose up -d

This starts MongoDB, SearXNG, the agent (port 3001), and the dashboard (port 8080).

REST API

The agent exposes a REST API on port 3001:

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/health | Health check |
| GET | /api/v1/stats/overview | Summary statistics and trends |
| GET | /api/v1/repositories | List tracked repositories |
| POST | /api/v1/repositories | Add a repository to track |
| POST | /api/v1/repositories/:id/scan | Trigger a manual scan |
| GET | /api/v1/findings | List findings (filterable) |
| GET | /api/v1/findings/:id | Get finding with code evidence |
| PATCH | /api/v1/findings/:id/status | Update finding status |
| GET | /api/v1/sbom | List dependencies |
| GET | /api/v1/issues | List cross-tracker issues |
| GET | /api/v1/scan-runs | Scan execution history |
| GET | /api/v1/graph/:repo_id | Code knowledge graph |
| POST | /api/v1/graph/:repo_id/build | Trigger graph build |
| GET | /api/v1/dast/targets | List DAST targets |
| POST | /api/v1/dast/targets | Add DAST target |
| GET | /api/v1/dast/findings | List DAST findings |
| POST | /api/v1/chat/:repo_id | RAG-powered code chat |
| POST | /api/v1/help/chat | Documentation-grounded help chat |
| POST | /api/v1/pentest/sessions | Create pentest session |
| POST | /api/v1/pentest/sessions/:id/export | Export encrypted pentest report |
| POST | /webhook/github | GitHub webhook (HMAC-SHA256) |
| POST | /webhook/gitlab | GitLab webhook (token verify) |
| POST | /webhook/gitea | Gitea webhook |

Dashboard Pages

| Page | Description |
|------|-------------|
| Overview | Stat cards, severity distribution, AI chat cards, MCP status |
| Repositories | Add/manage tracked repos, trigger scans, webhook config |
| Findings | Filterable table by severity, type, status, scanner |
| Finding Detail | Code evidence, remediation, suggested fix, linked issue |
| SBOM | Dependency inventory with vulnerability badges, license summary |
| Issues | Cross-tracker view (GitHub + GitLab + Jira + Gitea) |
| Code Graph | Interactive architecture visualization, impact analysis |
| AI Chat | RAG-powered Q&A about repository code |
| DAST | Dynamic scanning targets, findings, and scan history |
| Pentest | AI-driven pentest sessions, attack chain visualization |
| MCP Servers | Model Context Protocol server management |
| Help Chat | Floating assistant (available on every page) for product Q&A |

Project Structure

compliance-scanner/
├── compliance-core/        Shared library (models, traits, config, errors)
├── compliance-agent/       Agent daemon (pipeline, LLM, trackers, API, webhooks)
│   └── src/
│       ├── pipeline/       7-stage scan pipeline, dedup, PR reviews, code review
│       ├── llm/            LiteLLM client, triage, descriptions, fixes, review prompts
│       ├── trackers/       GitHub, GitLab, Jira, Gitea integrations
│       ├── pentest/        AI-driven pentest orchestrator, tools, reports
│       ├── rag/            RAG pipeline, chunking, embedding
│       ├── api/            REST API (Axum), help chat
│       └── webhooks/       GitHub, GitLab, Gitea webhook receivers
├── compliance-dashboard/   Dioxus fullstack dashboard
│   └── src/
│       ├── components/     Reusable UI (sidebar, help chat, attack chain, etc.)
│       ├── infrastructure/ Server functions, DB, config, auth
│       └── pages/          Full page views (overview, DAST, pentest, graph, etc.)
├── compliance-graph/       Code knowledge graph (tree-sitter, embeddings, RAG)
├── compliance-dast/        Dynamic application security testing
├── compliance-mcp/         Model Context Protocol server
├── docs/                   VitePress documentation site
├── assets/                 Static assets (CSS, icons)
└── styles/                 Tailwind input stylesheet

External Services

| Service | Purpose | Default URL |
|---------|---------|-------------|
| MongoDB | Persistence | mongodb://localhost:27017 |
| LiteLLM | LLM proxy (chat, triage, embeddings) | http://localhost:4000 |
| SearXNG | CVE context search | http://localhost:8888 |
| Keycloak | Authentication (OAuth2/PKCE, SSO) | http://localhost:8080 |
| Semgrep | SAST scanning | CLI tool |
| Syft | SBOM generation | CLI tool |
| Chromium | Headless browser (pentesting, PDF) | Managed via Docker |

Built with Rust, Dioxus, and a commitment to automated security compliance.
