# Compliance Scanner

Autonomous security and compliance scanning agent for git repositories

---

## About

Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira/Gitea) with evidence and remediation suggestions, reviews pull requests with multi-pass LLM analysis, runs autonomous penetration tests, and exposes a Dioxus-based dashboard for visualization.

> **How it works:** The agent runs as a lazy daemon -- it only scans when new commits are detected, triggered by cron schedules or webhooks. LLM-powered triage filters out false positives and generates actionable remediation with multi-language awareness.

## Features

| Area | Capabilities |
|------|-------------|
| **SAST Scanning** | Semgrep-based static analysis with auto-config rules |
| **SBOM Generation** | Syft + cargo-audit for complete dependency inventory |
| **CVE Monitoring** | OSV.dev batch queries, NVD CVSS enrichment, SearXNG context |
| **GDPR Patterns** | Detect PII logging, missing consent, hardcoded retention, missing deletion |
| **OAuth Patterns** | Detect implicit grant, missing PKCE, token in localStorage, token in URLs |
| **LLM Triage** | Multi-language-aware confidence scoring (Rust, Python, Go, Java, Ruby, PHP, C++) |
| **Issue Creation** | Auto-create issues in GitHub, GitLab, Jira, or Gitea with dedup via fingerprints |
| **PR Reviews** | Multi-pass security review (logic, security, convention, complexity) with dedup |
| **DAST Scanning** | Black-box security testing with endpoint discovery and parameter fuzzing |
| **AI Pentesting** | Autonomous LLM-orchestrated penetration testing with encrypted reports |
| **Code Graph** | Interactive code knowledge graph with impact analysis |
| **AI Chat (RAG)** | Natural language Q&A grounded in repository source code |
| **Help Assistant** | Documentation-grounded help chat accessible from every dashboard page |
| **MCP Server** | Expose live security data to Claude, Cursor, and other AI tools |
| **Dashboard** | Fullstack Dioxus UI with findings, SBOM, issues, DAST, pentest, and graph |
| **Webhooks** | GitHub, GitLab, and Gitea webhook receivers for push/PR events |
| **Finding Dedup** | SHA-256 fingerprint dedup for SAST, CWE-based dedup for DAST findings |

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                             Cargo Workspace                             │
├──────────────┬──────────────────┬──────────────┬──────────┬─────────────┤
│ compliance-  │ compliance-      │ compliance-  │ complian-│ compliance- │
│ core (lib)   │ agent (bin)      │ dashboard    │ ce-graph │ mcp (bin)   │
│              │                  │ (bin)        │ (lib)    │             │
│ Models       │ Scan Pipeline    │ Dioxus 0.7   │ Tree-    │ MCP Server  │
│ Traits       │ LLM Client       │ Fullstack UI │ sitter   │ Live data   │
│ Config       │ Issue Trackers   │ Help Chat    │ Graph    │ for AI      │
│ Errors       │ Pentest Engine   │ Server Fns   │ Embeds   │ tools       │
│              │ DAST Tools       │              │ RAG      │             │
│              │ REST API         │              │          │             │
│              │ Webhooks         │              │          │             │
└──────────────┴──────────────────┴──────────────┴──────────┴─────────────┘
                                     │
                              MongoDB (shared)
```

## Scan Pipeline (7 Stages)

1. **Change Detection** -- `git2` fetch, compare HEAD SHA with last scanned commit
2. **Semgrep SAST** -- CLI wrapper with JSON output parsing
3. **SBOM Generation** -- Syft (CycloneDX) + cargo-audit vulnerability merge
4. **CVE Scanning** -- OSV.dev batch + NVD CVSS enrichment + SearXNG context
5. **Pattern Scanning** -- Regex-based GDPR and OAuth compliance checks
6. **LLM Triage** -- LiteLLM confidence scoring, filter findings < 3/10
7. **Issue Creation** -- Dedup via SHA-256 fingerprint, create tracker issues

## Tech Stack

| Layer | Technology |
|-------|-----------|
| Shared Library | `compliance-core` -- models, traits, config |
| Agent | Axum REST API, git2, tokio-cron-scheduler, Semgrep, Syft |
| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS 4 |
| Code Graph | `compliance-graph` -- tree-sitter parsing, embeddings, RAG |
| MCP Server | `compliance-mcp` -- Model Context Protocol for AI tools |
| DAST | `compliance-dast` -- dynamic application security testing |
| Database | MongoDB with typed collections |
| LLM | LiteLLM (OpenAI-compatible API for chat, triage, embeddings) |
| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3), Gitea |
| CVE Sources | OSV.dev, NVD, SearXNG |
| Auth | Keycloak (OAuth2/PKCE, SSO) |
| Browser Automation | Chromium (headless, for pentesting and PDF generation) |

## Getting Started

### Prerequisites

- Rust 1.94+
- [Dioxus CLI](https://dioxuslabs.com/learn/0.7/getting_started) (`dx`)
- MongoDB
- Docker & Docker Compose (optional)

### Optional External Tools

- [Semgrep](https://semgrep.dev/) -- for SAST scanning
- [Syft](https://github.com/anchore/syft) -- for SBOM generation
- [cargo-audit](https://github.com/rustsec/rustsec) -- for Rust dependency auditing

### Setup

```bash
# Clone the repository (replace <repo-url> with the repository's URL)
git clone <repo-url>
cd compliance-scanner

# Start MongoDB + SearXNG
docker compose up -d mongo searxng

# Configure environment
cp .env.example .env
# Edit .env with your LiteLLM, tracker tokens, and MongoDB settings

# Run the agent
cargo run -p compliance-agent

# Run the dashboard (separate terminal)
dx serve --features server --platform web
```

### Docker Compose (Full Stack)

```bash
docker compose up -d
```

This starts MongoDB, SearXNG, the agent (port 3001), and the dashboard (port 8080).

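
Once the agent is running, two of the pipeline stages listed earlier act as simple gates: stage 1 skips a scan when HEAD matches the last scanned commit, and stage 6 drops findings that score below 3/10. The sketch below illustrates that logic using only the standard library. The names `TriagedFinding`, `needs_scan`, `filter_triaged`, and `MIN_CONFIDENCE` are hypothetical and not taken from the project's source, and the integer 0-10 score is an assumption.

```rust
/// A finding after LLM triage.
/// Hypothetical shape for illustration; not the project's actual model.
#[derive(Debug, Clone, PartialEq)]
pub struct TriagedFinding {
    pub title: String,
    /// Confidence score from triage, assumed to be an integer from 0 to 10.
    pub confidence: u8,
}

/// Findings scoring below this value (out of 10) are filtered out.
pub const MIN_CONFIDENCE: u8 = 3;

/// Stage 1 gate: scan only when HEAD differs from the last scanned commit.
/// `None` means the repository has never been scanned, so a scan is needed.
pub fn needs_scan(head_sha: &str, last_scanned_sha: Option<&str>) -> bool {
    last_scanned_sha != Some(head_sha)
}

/// Stage 6 gate: keep only findings at or above the confidence threshold.
pub fn filter_triaged(findings: Vec<TriagedFinding>) -> Vec<TriagedFinding> {
    findings
        .into_iter()
        .filter(|f| f.confidence >= MIN_CONFIDENCE)
        .collect()
}
```

With these definitions, `needs_scan("abc", Some("abc"))` is `false`, so an unchanged repository costs no scanner or LLM calls, while a finding with confidence 2 is removed by `filter_triaged` before issue creation.
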
## REST API

The agent exposes a REST API on port 3001:

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/v1/health` | Health check |
| `GET` | `/api/v1/stats/overview` | Summary statistics and trends |
| `GET` | `/api/v1/repositories` | List tracked repositories |
| `POST` | `/api/v1/repositories` | Add a repository to track |
| `POST` | `/api/v1/repositories/:id/scan` | Trigger a manual scan |
| `GET` | `/api/v1/findings` | List findings (filterable) |
| `GET` | `/api/v1/findings/:id` | Get finding with code evidence |
| `PATCH` | `/api/v1/findings/:id/status` | Update finding status |
| `GET` | `/api/v1/sbom` | List dependencies |
| `GET` | `/api/v1/issues` | List cross-tracker issues |
| `GET` | `/api/v1/scan-runs` | Scan execution history |
| `GET` | `/api/v1/graph/:repo_id` | Code knowledge graph |
| `POST` | `/api/v1/graph/:repo_id/build` | Trigger graph build |
| `GET` | `/api/v1/dast/targets` | List DAST targets |
| `POST` | `/api/v1/dast/targets` | Add DAST target |
| `GET` | `/api/v1/dast/findings` | List DAST findings |
| `POST` | `/api/v1/chat/:repo_id` | RAG-powered code chat |
| `POST` | `/api/v1/help/chat` | Documentation-grounded help chat |
| `POST` | `/api/v1/pentest/sessions` | Create pentest session |
| `POST` | `/api/v1/pentest/sessions/:id/export` | Export encrypted pentest report |
| `POST` | `/webhook/github` | GitHub webhook (HMAC-SHA256) |
| `POST` | `/webhook/gitlab` | GitLab webhook (token verify) |
| `POST` | `/webhook/gitea` | Gitea webhook |

## Dashboard Pages

| Page | Description |
|------|-------------|
| **Overview** | Stat cards, severity distribution, AI chat cards, MCP status |
| **Repositories** | Add/manage tracked repos, trigger scans, webhook config |
| **Findings** | Filterable table by severity, type, status, scanner |
| **Finding Detail** | Code evidence, remediation, suggested fix, linked issue |
| **SBOM** | Dependency inventory with vulnerability badges, license summary |
| **Issues** | Cross-tracker view (GitHub + GitLab + Jira + Gitea) |
| **Code Graph** | Interactive architecture visualization, impact analysis |
| **AI Chat** | RAG-powered Q&A about repository code |
| **DAST** | Dynamic scanning targets, findings, and scan history |
| **Pentest** | AI-driven pentest sessions, attack chain visualization |
| **MCP Servers** | Model Context Protocol server management |
| **Help Chat** | Floating assistant (available on every page) for product Q&A |

## Project Structure

```
compliance-scanner/
├── compliance-core/          Shared library (models, traits, config, errors)
├── compliance-agent/         Agent daemon (pipeline, LLM, trackers, API, webhooks)
│   └── src/
│       ├── pipeline/         7-stage scan pipeline, dedup, PR reviews, code review
│       ├── llm/              LiteLLM client, triage, descriptions, fixes, review prompts
│       ├── trackers/         GitHub, GitLab, Jira, Gitea integrations
│       ├── pentest/          AI-driven pentest orchestrator, tools, reports
│       ├── rag/              RAG pipeline, chunking, embedding
│       ├── api/              REST API (Axum), help chat
│       └── webhooks/         GitHub, GitLab, Gitea webhook receivers
├── compliance-dashboard/     Dioxus fullstack dashboard
│   └── src/
│       ├── components/       Reusable UI (sidebar, help chat, attack chain, etc.)
│       ├── infrastructure/   Server functions, DB, config, auth
│       └── pages/            Full page views (overview, DAST, pentest, graph, etc.)
├── compliance-graph/         Code knowledge graph (tree-sitter, embeddings, RAG)
├── compliance-dast/          Dynamic application security testing
├── compliance-mcp/           Model Context Protocol server
├── docs/                     VitePress documentation site
├── assets/                   Static assets (CSS, icons)
└── styles/                   Tailwind input stylesheet
```

## External Services

| Service | Purpose | Default URL |
|---------|---------|-------------|
| MongoDB | Persistence | `mongodb://localhost:27017` |
| LiteLLM | LLM proxy (chat, triage, embeddings) | `http://localhost:4000` |
| SearXNG | CVE context search | `http://localhost:8888` |
| Keycloak | Authentication (OAuth2/PKCE, SSO) | `http://localhost:8080` |
| Semgrep | SAST scanning | CLI tool |
| Syft | SBOM generation | CLI tool |
| Chromium | Headless browser (pentesting, PDF) | Managed via Docker |

---

_Built with Rust, Dioxus, and a commitment to automated security compliance._