
<p align="center">
  <img src="assets/favicon.svg" width="96" height="96" alt="Compliance Scanner Logo" />
</p>

<h1 align="center">Compliance Scanner</h1>

<p align="center">
  <strong>Autonomous security and compliance scanning agent for git repositories</strong>
</p>

<p align="center">
  <a href="https://www.rust-lang.org/"><img src="https://img.shields.io/badge/Rust-1.94-orange?logo=rust&logoColor=white" alt="Rust" /></a>
  <a href="https://dioxuslabs.com/"><img src="https://img.shields.io/badge/Dioxus-0.7-blue?logo=webassembly&logoColor=white" alt="Dioxus" /></a>
  <a href="https://www.mongodb.com/"><img src="https://img.shields.io/badge/MongoDB-8.0-47A248?logo=mongodb&logoColor=white" alt="MongoDB" /></a>
  <a href="https://axum.rs/"><img src="https://img.shields.io/badge/Axum-0.8-4A4A55?logo=rust&logoColor=white" alt="Axum" /></a>
  <a href="https://tailwindcss.com/"><img src="https://img.shields.io/badge/Tailwind_CSS-4-06B6D4?logo=tailwindcss&logoColor=white" alt="Tailwind CSS" /></a>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/GDPR-Scanning-green" alt="GDPR" />
  <img src="https://img.shields.io/badge/OAuth-Scanning-green" alt="OAuth" />
  <img src="https://img.shields.io/badge/SAST-Semgrep-blue" alt="SAST" />
  <img src="https://img.shields.io/badge/CVE-OSV.dev%20%2B%20NVD-orange" alt="CVE" />
  <img src="https://img.shields.io/badge/Platform-Linux%20%7C%20Docker-lightgrey?logo=linux&logoColor=white" alt="Platform" />
</p>

---

## About

Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira/Gitea) with evidence and remediation suggestions, reviews pull requests with multi-pass LLM analysis, runs autonomous penetration tests, and exposes a Dioxus-based dashboard for visualization.

> **How it works:** The agent runs as a lazy daemon: cron schedules or webhooks wake it, and it scans only when it detects new commits. LLM-powered triage filters out false positives and generates actionable remediation guidance with multi-language awareness.
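
The "scan only on new commits" gate can be sketched in a few lines of shell. This is an illustration, not the agent's code: the agent uses `git2` and its own storage, while the sketch below shells out to the `git` CLI and keeps the last scanned SHA in a plain file. The function names and the state file are hypothetical.

```bash
# Sketch only: function names and the state file are hypothetical.

# Print the commit SHA that HEAD points to on the remote.
remote_head() {
  git ls-remote "$1" HEAD | cut -f1
}

# Succeed (exit 0) when no scan is recorded yet or the SHA has changed.
needs_scan() {
  local head_sha="$1"
  local state_file="$2"
  [ ! -f "$state_file" ] || [ "$(cat "$state_file")" != "$head_sha" ]
}

# Remember the SHA that was just scanned.
record_scan() {
  printf '%s\n' "$1" > "$2"
}

# Typical use from a cron job or webhook handler:
#   sha=$(remote_head "$REPO_URL")
#   if needs_scan "$sha" .last_scanned; then
#     ...run the pipeline...
#     record_scan "$sha" .last_scanned
#   fi
```

Recording the SHA only after the pipeline finishes means an interrupted scan is retried on the next trigger.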

## Features

| Area | Capabilities |
|------|-------------|
| **SAST Scanning** | Semgrep-based static analysis with auto-config rules |
| **SBOM Generation** | Syft + cargo-audit for complete dependency inventory |
| **CVE Monitoring** | OSV.dev batch queries, NVD CVSS enrichment, SearXNG context |
| **GDPR Patterns** | Detect PII logging, missing consent, hardcoded retention, missing deletion |
| **OAuth Patterns** | Detect implicit grant, missing PKCE, token in localStorage, token in URLs |
| **LLM Triage** | Multi-language-aware confidence scoring (Rust, Python, Go, Java, Ruby, PHP, C++) |
| **Issue Creation** | Auto-create issues in GitHub, GitLab, Jira, or Gitea with dedup via fingerprints |
| **PR Reviews** | Multi-pass security review (logic, security, convention, complexity) with dedup |
| **DAST Scanning** | Black-box security testing with endpoint discovery and parameter fuzzing |
| **AI Pentesting** | Autonomous LLM-orchestrated penetration testing with encrypted reports |
| **Code Graph** | Interactive code knowledge graph with impact analysis |
| **AI Chat (RAG)** | Natural language Q&A grounded in repository source code |
| **Help Assistant** | Documentation-grounded help chat accessible from every dashboard page |
| **MCP Server** | Expose live security data to Claude, Cursor, and other AI tools |
| **Dashboard** | Fullstack Dioxus UI with findings, SBOM, issues, DAST, pentest, and graph |
| **Webhooks** | GitHub, GitLab, and Gitea webhook receivers for push/PR events |
| **Finding Dedup** | SHA-256 fingerprint dedup for SAST, CWE-based dedup for DAST findings |

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│                          Cargo Workspace                                 │
├──────────────┬──────────────────┬──────────────┬──────────┬─────────────┤
│ compliance-  │ compliance-      │ compliance-  │ complian-│ compliance- │
│ core (lib)   │ agent (bin)      │ dashboard    │ ce-graph │ mcp (bin)   │
│              │                  │ (bin)        │ (lib)    │             │
│ Models       │ Scan Pipeline    │ Dioxus 0.7   │ Tree-    │ MCP Server  │
│ Traits       │ LLM Client       │ Fullstack UI │ sitter   │ Live data   │
│ Config       │ Issue Trackers   │ Help Chat    │ Graph    │ for AI      │
│ Errors       │ Pentest Engine   │ Server Fns   │ Embedds  │ tools       │
│              │ DAST Tools       │              │ RAG      │             │
│              │ REST API         │              │          │             │
│              │ Webhooks         │              │          │             │
└──────────────┴──────────────────┴──────────────┴──────────┴─────────────┘
                                 │
                            MongoDB (shared)
```

## Scan Pipeline (7 Stages)

1. **Change Detection** -- `git2` fetch, compare HEAD SHA with last scanned commit
2. **Semgrep SAST** -- CLI wrapper with JSON output parsing
3. **SBOM Generation** -- Syft (CycloneDX) + cargo-audit vulnerability merge
4. **CVE Scanning** -- OSV.dev batch + NVD CVSS enrichment + SearXNG context
5. **Pattern Scanning** -- Regex-based GDPR and OAuth compliance checks
6. **LLM Triage** -- LiteLLM confidence scoring; findings scoring below 3/10 are dropped
7. **Issue Creation** -- Dedup via SHA-256 fingerprint, create tracker issues
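
As a rough illustration of the fingerprint dedup in stage 7, the sketch below hashes a finding's identifying fields with SHA-256 and skips any fingerprint it has already seen. The fields hashed (repository, rule ID, file path, matched snippet) and the seen-file are assumptions for the example; the agent's actual inputs and storage are not described in this README.

```bash
# Sketch only: the hashed fields and the seen-file are hypothetical.

# Hash all arguments into one hex SHA-256 digest. Fields are separated by
# NUL bytes so that ("ab", "c") and ("a", "bc") produce different digests.
fingerprint() {
  printf '%s\0' "$@" | sha256sum | cut -d' ' -f1
}

# Succeed the first time a fingerprint is seen and record it;
# fail on every later call with the same fingerprint.
first_seen() {
  local fp="$1"
  local seen_file="$2"
  if grep -qxF -- "$fp" "$seen_file" 2>/dev/null; then
    return 1
  fi
  printf '%s\n' "$fp" >> "$seen_file"
}

# Typical use:
#   fp=$(fingerprint "$repo" "$rule_id" "$path" "$snippet")
#   if first_seen "$fp" seen.txt; then ...create the tracker issue...; fi
```

Leaving line numbers out of the hashed fields is one possible design choice: the same finding then keeps its fingerprint when unrelated edits shift it up or down in the file.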

## Tech Stack

| Layer | Technology |
|-------|-----------|
| Shared Library | `compliance-core` -- models, traits, config |
| Agent | Axum REST API, git2, tokio-cron-scheduler, Semgrep, Syft |
| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS 4 |
| Code Graph | `compliance-graph` -- tree-sitter parsing, embeddings, RAG |
| MCP Server | `compliance-mcp` -- Model Context Protocol for AI tools |
| DAST | `compliance-dast` -- dynamic application security testing |
| Database | MongoDB with typed collections |
| LLM | LiteLLM (OpenAI-compatible API for chat, triage, embeddings) |
| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3), Gitea |
| CVE Sources | OSV.dev, NVD, SearXNG |
| Auth | Keycloak (OAuth2/PKCE, SSO) |
| Browser Automation | Chromium (headless, for pentesting and PDF generation) |

## Getting Started

### Prerequisites

- Rust 1.94+
- [Dioxus CLI](https://dioxuslabs.com/learn/0.7/getting_started) (`dx`)
- MongoDB
- Docker & Docker Compose (optional)

### Optional External Tools

- [Semgrep](https://semgrep.dev/) -- for SAST scanning
- [Syft](https://github.com/anchore/syft) -- for SBOM generation
- [cargo-audit](https://github.com/rustsec/rustsec) -- for Rust dependency auditing

### Setup

```bash
# Clone the repository
git clone <repo-url>
cd compliance-scanner

# Start MongoDB + SearXNG
docker compose up -d mongo searxng

# Configure environment
cp .env.example .env
# Edit .env with your LiteLLM settings, tracker tokens, and MongoDB settings

# Run the agent
cargo run -p compliance-agent

# Run the dashboard (separate terminal)
dx serve --features server --platform web
```

### Docker Compose (Full Stack)

```bash
docker compose up -d
```

This starts MongoDB, SearXNG, the agent (port 3001), and the dashboard (port 8080).
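
Containers can take a moment to become ready, so a script that follows `docker compose up -d` may want to wait for the agent first. The helper below is a minimal sketch that polls the health endpoint listed in the REST API section. The function name, retry count, and one-second delay are arbitrary choices, and it assumes the endpoint answers with a 2xx status once the agent is up.

```bash
# Sketch only: wait_for_agent is a hypothetical helper, not part of the repo.

# Poll GET /api/v1/health until it succeeds or the attempts run out.
wait_for_agent() {
  local base_url="${1:-http://localhost:3001}"
  local attempts="${2:-30}"
  while [ "$attempts" -gt 0 ]; do
    # -f turns HTTP error statuses into a non-zero exit code; -s keeps curl quiet.
    if curl -fs -o /dev/null "$base_url/api/v1/health"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Typical use:
#   docker compose up -d && wait_for_agent && echo "agent is up"
```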

## REST API

The agent exposes a REST API on port 3001:

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/v1/health` | Health check |
| `GET` | `/api/v1/stats/overview` | Summary statistics and trends |
| `GET` | `/api/v1/repositories` | List tracked repositories |
| `POST` | `/api/v1/repositories` | Add a repository to track |
| `POST` | `/api/v1/repositories/:id/scan` | Trigger a manual scan |
| `GET` | `/api/v1/findings` | List findings (filterable) |
| `GET` | `/api/v1/findings/:id` | Get finding with code evidence |
| `PATCH` | `/api/v1/findings/:id/status` | Update finding status |
| `GET` | `/api/v1/sbom` | List dependencies |
| `GET` | `/api/v1/issues` | List cross-tracker issues |
| `GET` | `/api/v1/scan-runs` | Scan execution history |
| `GET` | `/api/v1/graph/:repo_id` | Code knowledge graph |
| `POST` | `/api/v1/graph/:repo_id/build` | Trigger graph build |
| `GET` | `/api/v1/dast/targets` | List DAST targets |
| `POST` | `/api/v1/dast/targets` | Add DAST target |
| `GET` | `/api/v1/dast/findings` | List DAST findings |
| `POST` | `/api/v1/chat/:repo_id` | RAG-powered code chat |
| `POST` | `/api/v1/help/chat` | Documentation-grounded help chat |
| `POST` | `/api/v1/pentest/sessions` | Create pentest session |
| `POST` | `/api/v1/pentest/sessions/:id/export` | Export encrypted pentest report |
| `POST` | `/webhook/github` | GitHub webhook (HMAC-SHA256) |
| `POST` | `/webhook/gitlab` | GitLab webhook (token verify) |
| `POST` | `/webhook/gitea` | Gitea webhook |
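
The endpoints above can be exercised with `curl`. The sketch below triggers a manual scan and sends a signed test delivery to the GitHub webhook. It rests on two assumptions: the agent accepts GitHub's standard `X-Hub-Signature-256: sha256=<hex>` header (the table only says HMAC-SHA256), and no extra authentication header is required, which may not hold behind Keycloak. The function names are hypothetical.

```bash
# Sketch only: helper names are hypothetical, and auth headers are omitted.

# Trigger a manual scan for one tracked repository.
trigger_scan() {
  local base_url="$1"
  local repo_id="$2"
  curl -fsS -X POST "$base_url/api/v1/repositories/$repo_id/scan"
}

# Hex HMAC-SHA256 of the raw request body, keyed with the webhook secret.
# The digest is the last field of openssl's output line.
github_signature() {
  local secret="$1"
  local body="$2"
  printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}

# Send a push-event delivery signed the way GitHub signs its webhooks.
send_github_push() {
  local base_url="$1"
  local secret="$2"
  local body="$3"
  curl -fsS -X POST "$base_url/webhook/github" \
    -H 'Content-Type: application/json' \
    -H 'X-GitHub-Event: push' \
    -H "X-Hub-Signature-256: sha256=$(github_signature "$secret" "$body")" \
    --data-binary "$body"
}

# Typical use:
#   trigger_scan http://localhost:3001 "$REPO_ID"
#   send_github_push http://localhost:3001 "$WEBHOOK_SECRET" "$(cat payload.json)"
```

The signature must be computed over the exact bytes that are sent, which is why the body is passed with `--data-binary` rather than re-serialized.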

## Dashboard Pages

| Page | Description |
|------|-------------|
| **Overview** | Stat cards, severity distribution, AI chat cards, MCP status |
| **Repositories** | Add/manage tracked repos, trigger scans, webhook config |
| **Findings** | Filterable table by severity, type, status, scanner |
| **Finding Detail** | Code evidence, remediation, suggested fix, linked issue |
| **SBOM** | Dependency inventory with vulnerability badges, license summary |
| **Issues** | Cross-tracker view (GitHub + GitLab + Jira + Gitea) |
| **Code Graph** | Interactive architecture visualization, impact analysis |
| **AI Chat** | RAG-powered Q&A about repository code |
| **DAST** | Dynamic scanning targets, findings, and scan history |
| **Pentest** | AI-driven pentest sessions, attack chain visualization |
| **MCP Servers** | Model Context Protocol server management |
| **Help Chat** | Floating assistant (available on every page) for product Q&A |

## Project Structure

```
compliance-scanner/
├── compliance-core/        Shared library (models, traits, config, errors)
├── compliance-agent/       Agent daemon (pipeline, LLM, trackers, API, webhooks)
│   └── src/
│       ├── pipeline/       7-stage scan pipeline, dedup, PR reviews, code review
│       ├── llm/            LiteLLM client, triage, descriptions, fixes, review prompts
│       ├── trackers/       GitHub, GitLab, Jira, Gitea integrations
│       ├── pentest/        AI-driven pentest orchestrator, tools, reports
│       ├── rag/            RAG pipeline, chunking, embedding
│       ├── api/            REST API (Axum), help chat
│       └── webhooks/       GitHub, GitLab, Gitea webhook receivers
├── compliance-dashboard/   Dioxus fullstack dashboard
│   └── src/
│       ├── components/     Reusable UI (sidebar, help chat, attack chain, etc.)
│       ├── infrastructure/ Server functions, DB, config, auth
│       └── pages/          Full page views (overview, DAST, pentest, graph, etc.)
├── compliance-graph/       Code knowledge graph (tree-sitter, embeddings, RAG)
├── compliance-dast/        Dynamic application security testing
├── compliance-mcp/         Model Context Protocol server
├── docs/                   VitePress documentation site
├── assets/                 Static assets (CSS, icons)
└── styles/                 Tailwind input stylesheet
```

## External Services

| Service | Purpose | Default URL |
|---------|---------|-------------|
| MongoDB | Persistence | `mongodb://localhost:27017` |
| LiteLLM | LLM proxy (chat, triage, embeddings) | `http://localhost:4000` |
| SearXNG | CVE context search | `http://localhost:8888` |
| Keycloak | Authentication (OAuth2/PKCE, SSO) | `http://localhost:8080` |
| Semgrep | SAST scanning | CLI tool |
| Syft | SBOM generation | CLI tool |
| Chromium | Headless browser (pentesting, PDF) | Managed via Docker |

---

<p align="center">
  <sub>Built with Rust, Dioxus, and a commitment to automated security compliance.</sub>
</p>