Add VitePress documentation site with complete user guides
All checks were successful
CI / Format (push) Successful in 3s
CI / Clippy (push) Successful in 3m13s
CI / Security Audit (push) Has been skipped
CI / Tests (push) Has been skipped

Covers getting started, repositories, scanning, findings, configuration,
SBOM, code graph, impact analysis, DAST, AI chat, issue tracker integration,
Docker deployment, environment variables, Keycloak auth, and OpenTelemetry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sharang Parnerkar
2026-03-08 01:18:58 +01:00
parent 65abc55915
commit 94552d1626
21 changed files with 4019 additions and 0 deletions

docs/guide/configuration.md Normal file

@@ -0,0 +1,141 @@
# Configuration
Compliance Scanner is configured through environment variables. Copy `.env.example` to `.env` and edit the values.
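## Required Settings
Only `MONGODB_URI` must be set; the other settings in this section fall back to the defaults listed in the table at the end of this page.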
### MongoDB
```bash
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
MONGODB_DATABASE=compliance_scanner
```
### Agent
```bash
AGENT_PORT=3001
```
### Dashboard
```bash
DASHBOARD_PORT=8080
AGENT_API_URL=http://localhost:3001
```
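Below is a minimal Rust sketch (the platform itself is written in Rust) of how a service could read the settings above. It applies the defaults from the table at the end of this page and fails fast when `MONGODB_URI` is missing. `CoreConfig` and `from_lookup` are hypothetical names, not the agent's actual configuration code. The lookup closure lets the logic be tested without touching the process environment.

```rust
/// Core settings; defaults mirror the "All Environment Variables" table (hypothetical type).
#[derive(Debug)]
pub struct CoreConfig {
    pub mongodb_uri: String,
    pub mongodb_database: String,
    pub agent_port: u16,
    pub dashboard_port: u16,
    pub agent_api_url: String,
}

impl CoreConfig {
    /// Builds the config from any key lookup (real environment, test map, ...).
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // MONGODB_URI is the only variable without a default.
        let mongodb_uri = get("MONGODB_URI").ok_or("MONGODB_URI is required")?;
        // Parses a port number, falling back to the documented default when unset.
        let port = |key: &str, default: u16| -> Result<u16, String> {
            match get(key) {
                Some(v) => v.parse().map_err(|_| format!("{key} must be a port number, got {v:?}")),
                None => Ok(default),
            }
        };
        Ok(Self {
            mongodb_uri,
            mongodb_database: get("MONGODB_DATABASE").unwrap_or_else(|| "compliance_scanner".into()),
            agent_port: port("AGENT_PORT", 3001)?,
            dashboard_port: port("DASHBOARD_PORT", 8080)?,
            agent_api_url: get("AGENT_API_URL").unwrap_or_else(|| "http://localhost:3001".into()),
        })
    }

    /// Reads the settings from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|k| std::env::var(k).ok())
    }
}
```

Calling `CoreConfig::from_env()` at startup surfaces a missing or malformed value before any scan runs.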
## LLM Configuration
The AI features (chat, remediation suggestions) use LiteLLM as a proxy to various LLM providers:
```bash
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=your-key
LITELLM_MODEL=gpt-4o
LITELLM_EMBED_MODEL=text-embedding-3-small
```
The embedding model generates the code embeddings used by the RAG-based AI Chat feature.
## Git Provider Tokens
### GitHub
```bash
GITHUB_TOKEN=ghp_xxxx
GITHUB_WEBHOOK_SECRET=your-webhook-secret
```
### GitLab
```bash
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=glpat-xxxx
GITLAB_WEBHOOK_SECRET=your-webhook-secret
```
## Issue Tracker Integration
### Jira
```bash
JIRA_URL=https://your-org.atlassian.net
JIRA_EMAIL=user@example.com
JIRA_API_TOKEN=your-api-token
JIRA_PROJECT_KEY=SEC
```
When configured, new findings automatically create Jira issues in the specified project.
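The sketch below shows the request such an integration could send. It assumes the Jira Cloud REST API v2 create-issue endpoint (`POST /rest/api/2/issue`), which Jira Cloud authenticates with HTTP Basic auth: `JIRA_EMAIL` is the username and `JIRA_API_TOKEN` the password. The HTTP client is left out. The `Bug` issue type and the field mapping are assumptions, not Compliance Scanner's documented behavior, and both helpers are hypothetical.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use \u escapes in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the URL and JSON body for one new issue (hypothetical helper, API v2 assumed).
fn jira_create_issue_request(jira_url: &str, project_key: &str, summary: &str, description: &str) -> (String, String) {
    let url = format!("{}/rest/api/2/issue", jira_url.trim_end_matches('/'));
    // "Bug" is an assumed issue type; the target project must offer it.
    let body = format!(
        r#"{{"fields":{{"project":{{"key":"{}"}},"summary":"{}","description":"{}","issuetype":{{"name":"Bug"}}}}}}"#,
        json_escape(project_key),
        json_escape(summary),
        json_escape(description),
    );
    (url, body)
}
```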
## Scan Schedules
Automated scans are scheduled with six-field cron expressions (seconds, minutes, hours, day of month, month, day of week):
```bash
# Scan every 6 hours
SCAN_SCHEDULE=0 0 */6 * * *
# Check for new CVEs daily at midnight
CVE_MONITOR_SCHEDULE=0 0 0 * * *
```
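This sketch assumes the scheduler expects the seconds-first, six-field form these examples use. It names each field and rejects classic five-field crontab lines, which would otherwise be misread. `cron_fields` is a hypothetical helper.

```rust
/// Field names of a six-field cron expression, seconds first.
const FIELDS: [&str; 6] = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"];

/// Splits a schedule into (field name, value) pairs; five-field lines are an error.
fn cron_fields(expr: &str) -> Result<Vec<(&'static str, &str)>, String> {
    let parts: Vec<&str> = expr.split_whitespace().collect();
    if parts.len() != FIELDS.len() {
        return Err(format!("expected 6 fields (seconds first), got {}: {expr:?}", parts.len()));
    }
    Ok(FIELDS.into_iter().zip(parts).collect())
}
```

For the default `SCAN_SCHEDULE`, the hour field is `*/6`, so scans fire at 00:00, 06:00, 12:00, and 18:00.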
## Search Engine
SearXNG is used for CVE enrichment and vulnerability research:
```bash
SEARXNG_URL=http://localhost:8888
```
## NVD API
An NVD API key increases rate limits for CVE lookups:
```bash
NVD_API_KEY=your-nvd-api-key
```
Get a free key at [https://nvd.nist.gov/developers/request-an-api-key](https://nvd.nist.gov/developers/request-an-api-key).
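NVD's published limits are 5 requests per rolling 30-second window without a key and 50 with one, a tenfold increase. The sketch below paces CVE lookups to stay inside those windows, assuming requests are sent one at a time. The limits and the CVE API 2.0 URL come from NVD's documentation; `nvd_request_spacing` and `nvd_cve_url` are hypothetical helpers. The key itself goes in the `apiKey` request header, not the query string.

```rust
use std::time::Duration;

/// NVD CVE API 2.0 endpoint.
const NVD_CVES_URL: &str = "https://services.nvd.nist.gov/rest/json/cves/2.0";

/// Minimum gap between requests that keeps a single client inside
/// NVD's rolling 30-second window (5 requests without a key, 50 with one).
fn nvd_request_spacing(has_api_key: bool) -> Duration {
    let requests_per_window: u64 = if has_api_key { 50 } else { 5 };
    Duration::from_millis(30_000 / requests_per_window)
}

/// URL for looking up one CVE by ID.
fn nvd_cve_url(cve_id: &str) -> String {
    format!("{}?cveId={}", NVD_CVES_URL, cve_id)
}
```

Without a key, lookups are spaced 6 seconds apart; with one, 600 ms apart.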
## Clone Path
Where the agent stores cloned repository files:
```bash
GIT_CLONE_BASE_PATH=/tmp/compliance-scanner/repos
```
## All Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `MONGODB_URI` | Yes | — | MongoDB connection string |
| `MONGODB_DATABASE` | No | `compliance_scanner` | Database name |
| `AGENT_PORT` | No | `3001` | Agent REST API port |
| `DASHBOARD_PORT` | No | `8080` | Dashboard web UI port |
| `AGENT_API_URL` | No | `http://localhost:3001` | Agent URL for dashboard |
| `LITELLM_URL` | No | `http://localhost:4000` | LiteLLM proxy URL |
| `LITELLM_API_KEY` | No | — | LiteLLM API key |
| `LITELLM_MODEL` | No | `gpt-4o` | LLM model for analysis |
| `LITELLM_EMBED_MODEL` | No | `text-embedding-3-small` | Embedding model for RAG |
| `GITHUB_TOKEN` | No | — | GitHub personal access token |
| `GITHUB_WEBHOOK_SECRET` | No | — | GitHub webhook signing secret |
| `GITLAB_URL` | No | `https://gitlab.com` | GitLab instance URL |
| `GITLAB_TOKEN` | No | — | GitLab access token |
| `GITLAB_WEBHOOK_SECRET` | No | — | GitLab webhook signing secret |
| `JIRA_URL` | No | — | Jira instance URL |
| `JIRA_EMAIL` | No | — | Jira account email |
| `JIRA_API_TOKEN` | No | — | Jira API token |
| `JIRA_PROJECT_KEY` | No | — | Jira project key for issues |
| `SEARXNG_URL` | No | `http://localhost:8888` | SearXNG instance URL |
| `NVD_API_KEY` | No | — | NVD API key for CVE lookups |
| `SCAN_SCHEDULE` | No | `0 0 */6 * * *` | Cron schedule for scans |
| `CVE_MONITOR_SCHEDULE` | No | `0 0 0 * * *` | Cron schedule for CVE checks |
| `GIT_CLONE_BASE_PATH` | No | `/tmp/compliance-scanner/repos` | Local clone directory |
| `KEYCLOAK_URL` | No | — | Keycloak server URL |
| `KEYCLOAK_REALM` | No | — | Keycloak realm name |
| `KEYCLOAK_CLIENT_ID` | No | — | Keycloak client ID |
| `REDIRECT_URI` | No | — | OAuth callback URL |
| `APP_URL` | No | — | Application root URL |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | OTLP collector endpoint |
| `OTEL_SERVICE_NAME` | No | — | OpenTelemetry service name |

docs/guide/findings.md Normal file

@@ -0,0 +1,75 @@
# Managing Findings
Findings are security issues discovered during scans. The findings workflow lets you triage, track, and resolve vulnerabilities across all your repositories.
## Findings List
Navigate to **Findings** in the sidebar to see all findings. The table shows:
| Column | Description |
|--------|-------------|
| Severity | Color-coded badge: Critical (red), High (orange), Medium (yellow), Low (green) |
| Title | Short description of the vulnerability (clickable) |
| Type | SAST, SBOM, CVE, GDPR, or OAuth |
| Scanner | Tool that found the issue (e.g. semgrep, syft) |
| File | Source file path where the issue was found |
| Status | Current triage status |
## Filtering
Use the filter bar at the top to narrow results:
- **Repository** — Filter to a specific repository or view all
- **Severity** — Critical, High, Medium, Low, or Info
- **Type** — SAST, SBOM, CVE, GDPR, OAuth
- **Status** — Open, Triaged, Resolved, False Positive, Ignored
Filters can be combined. Results are paginated with 20 findings per page.
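This sketch models the filtering and paging described above. It assumes filters combine with AND and pages are numbered from zero. The types and field names are hypothetical, not the dashboard's actual model.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity { Critical, High, Medium, Low, Info }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status { Open, Triaged, Resolved, FalsePositive, Ignored }

struct Finding { repo: String, severity: Severity, kind: String, status: Status }

/// Each `None` means "no filter on this column".
#[derive(Default)]
struct FindingFilter { repo: Option<String>, severity: Option<Severity>, kind: Option<String>, status: Option<Status> }

const PAGE_SIZE: usize = 20;

impl FindingFilter {
    /// A finding matches only if it passes every filter that is set (AND).
    fn matches(&self, f: &Finding) -> bool {
        self.repo.as_ref().map_or(true, |r| *r == f.repo)
            && self.severity.map_or(true, |s| s == f.severity)
            && self.kind.as_ref().map_or(true, |k| *k == f.kind)
            && self.status.map_or(true, |s| s == f.status)
    }
}

/// Returns one page of matches (zero-based) and the total page count.
fn paginate<'a>(all: &'a [Finding], filter: &FindingFilter, page_index: usize) -> (Vec<&'a Finding>, usize) {
    let hits: Vec<&'a Finding> = all.iter().filter(|f| filter.matches(f)).collect();
    let page_count = hits.len().div_ceil(PAGE_SIZE);
    let items = hits.into_iter().skip(page_index * PAGE_SIZE).take(PAGE_SIZE).collect();
    (items, page_count)
}
```

With 45 matching findings, `paginate` reports 3 pages, and the last page holds 5.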
## Finding Detail
Click any finding title to view its full detail page, which includes:
### Metadata
- Severity level with CWE identifier and CVSS score (when available)
- Scanner tool and scan type
- File path and line number
### Description
Full explanation of the vulnerability, why it's a risk, and what conditions trigger it.
### Code Evidence
The source code snippet where the issue was found, with syntax highlighting and the file path.
### Remediation
Step-by-step guidance on how to fix the vulnerability.
### Suggested Fix
A code example showing the corrected implementation.
### Linked Issue
If the finding was pushed to an issue tracker (GitHub, GitLab, or Jira), this section links directly to the external issue.
## Updating Status
On the finding detail page, change the finding's status using the status buttons:
| Status | When to Use |
|--------|-------------|
| **Open** | New finding, not yet reviewed |
| **Triaged** | Reviewed and confirmed as a real issue, pending fix |
| **Resolved** | Fix has been applied |
| **False Positive** | Finding is not a real vulnerability in this context |
| **Ignored** | Known issue that won't be fixed (accepted risk) |
Status changes are persisted immediately.
## Severity Levels
| Severity | Description | Typical Examples |
|----------|-------------|-----------------|
| **Critical** | Immediate exploitation risk, data breach likely | SQL injection, RCE, hardcoded secrets |
| **High** | Serious vulnerability, exploitation probable | XSS, authentication bypass, SSRF |
| **Medium** | Moderate risk, exploitation requires specific conditions | Insecure deserialization, weak crypto |
| **Low** | Minor risk, limited impact | Information disclosure, verbose errors |
| **Info** | Informational, no direct security impact | Best practice recommendations |
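The levels form a strict order, so a Rust enum declared from most to least severe gets the right sort order from `derive(Ord)`. The sketch below uses hypothetical names. It also maps each level to the badge colors listed in the Findings List table, which gives no color for Info.

```rust
/// Declared most severe first, so the derived `Ord` places Critical before Info.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity { Critical, High, Medium, Low, Info }

impl Severity {
    /// Badge colors from the Findings List table; none is documented for Info.
    fn badge_color(self) -> Option<&'static str> {
        match self {
            Severity::Critical => Some("red"),
            Severity::High => Some("orange"),
            Severity::Medium => Some("yellow"),
            Severity::Low => Some("green"),
            Severity::Info => None,
        }
    }
}

/// Sorts (severity, title) pairs most severe first, then alphabetically by title.
fn triage_order(findings: &mut [(Severity, String)]) {
    findings.sort_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
}
```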


@@ -0,0 +1,55 @@
# Getting Started
Compliance Scanner is a security compliance platform that scans your Git repositories for vulnerabilities, builds software bills of materials, performs dynamic application testing, and provides AI-powered code intelligence.
## Architecture
The platform consists of three main components:
- **Agent** — Background service that clones repositories, runs scans, builds graphs, and exposes a REST API
- **Dashboard** — Web UI built with Dioxus (Rust full-stack framework) for viewing results and managing repositories
- **MongoDB** — Database for storing all scan results, findings, SBOM data, and graph structures
## Quick Start with Docker Compose
The fastest way to get running:
```bash
# Clone the repository
git clone <repo-url> compliance-scanner
cd compliance-scanner
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your settings (see Configuration)
# Start all services
docker-compose up -d
```
This starts:
- MongoDB on port `27017`
- Agent API on port `3001`
- Dashboard on port `8080`
- Chromium (for DAST crawling) on port `3003`
Open the dashboard at [http://localhost:8080](http://localhost:8080).
## What Happens During a Scan
When you add a repository and trigger a scan, the agent runs through these phases:
1. **Clone** — Clones or pulls the latest code from the Git remote
2. **SAST** — Runs static analysis using Semgrep with rules for OWASP, GDPR, OAuth, and general security
3. **SBOM** — Extracts all dependencies using Syft, identifying packages, versions, licenses, and known vulnerabilities
4. **CVE Check** — Cross-references dependencies against the NVD database for known CVEs
5. **Graph Build** — Parses the codebase to construct a code knowledge graph of functions, classes, and their relationships
6. **Issue Sync** — Creates or updates issues in connected trackers (GitHub, GitLab, Jira) for new findings
Each phase produces results visible in the dashboard immediately.
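The phases above run in a fixed sequence. This sketch of that ordering assumes a failed phase stops the run; the Running Scans page says a failing phase marks the scan Failed. `Phase` and `run_scan` are hypothetical names, and the per-phase work is passed in as a closure.

```rust
/// The six phases, in the order the agent runs them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase { CloneRepo, Sast, Sbom, CveCheck, GraphBuild, IssueSync }

const PIPELINE: [Phase; 6] = [
    Phase::CloneRepo, Phase::Sast, Phase::Sbom,
    Phase::CveCheck, Phase::GraphBuild, Phase::IssueSync,
];

/// Runs each phase in order and stops at the first error. Returns the
/// completed phases and, on failure, the failing phase with its message.
fn run_scan(
    mut run_phase: impl FnMut(Phase) -> Result<(), String>,
) -> (Vec<Phase>, Option<(Phase, String)>) {
    let mut done = Vec::new();
    for phase in PIPELINE {
        match run_phase(phase) {
            Ok(()) => done.push(phase),
            Err(e) => return (done, Some((phase, e))),
        }
    }
    (done, None)
}
```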
## Next Steps
- [Add your first repository](/guide/repositories)
- [Understand scan results](/guide/findings)
- [Configure integrations](/guide/configuration)


@@ -0,0 +1,62 @@
# Adding Repositories
Repositories are the core resource in Compliance Scanner. Each tracked repository is scanned on a schedule and its results are available across all features.
## Adding a Repository
1. Navigate to **Repositories** in the sidebar
2. Click **Add Repository** at the top of the page
3. Fill in the form:
- **Name** — A display name for the repository
- **Git URL** — The clone URL (HTTPS or SSH), e.g. `https://github.com/org/repo.git`
- **Default Branch** — The branch to scan, e.g. `main` or `master`
4. Click **Add**
The repository appears in the list immediately. It will not be scanned until you trigger a scan manually or the next scheduled scan runs.
::: tip
For private repositories, configure a GitHub token (`GITHUB_TOKEN`) or GitLab token (`GITLAB_TOKEN`) in your environment. The agent uses these tokens when cloning.
:::
## Repository List
The repositories page shows all tracked repositories with:
| Column | Description |
|--------|-------------|
| Name | Repository display name |
| Git URL | Clone URL |
| Branch | Default branch being scanned |
| Findings | Total number of security findings |
| Last Scanned | Relative timestamp of the most recent scan |
## Triggering a Scan
Click the **Scan** button on any repository row to trigger an immediate scan. The scan runs in the background through all phases (clone, SAST, SBOM, CVE check, graph build, issue sync). You can monitor progress on the Overview page under recent scan runs.
## Deleting a Repository
Click the **Delete** button on a repository row. A confirmation dialog appears warning that this action permanently removes:
- All security findings
- SBOM entries and vulnerability data
- Scan run history
- Code graph data
- Embedding vectors (for AI chat)
- CVE alerts
This action cannot be undone.
## Automatic Scanning
Repositories are scanned automatically on a schedule configured by the `SCAN_SCHEDULE` environment variable (cron format). The default is every 6 hours:
```
SCAN_SCHEDULE=0 0 */6 * * *
```
CVE monitoring runs on a separate schedule (default: daily at midnight):
```
CVE_MONITOR_SCHEDULE=0 0 0 * * *
```

docs/guide/scanning.md Normal file

@@ -0,0 +1,83 @@
# Running Scans
Scans are the primary workflow in Compliance Scanner. Each scan analyzes a repository for security vulnerabilities, dependency risks, and code structure.
## Scan Types
A full scan consists of multiple phases, each producing different types of findings:
| Scan Type | What It Detects | Scanner |
|-----------|----------------|---------|
| **SAST** | Code-level vulnerabilities (injection, XSS, insecure crypto, etc.) | Semgrep |
| **SBOM** | Dependency inventory, outdated packages, known vulnerabilities | Syft |
| **CVE** | Known CVEs in dependencies cross-referenced against NVD | NVD API |
| **GDPR** | Personal data handling issues, consent violations | Custom rules |
| **OAuth** | OAuth/OIDC misconfigurations, insecure token handling | Custom rules |
## Triggering a Scan
### Manual Scan
1. Go to **Repositories**
2. Click **Scan** on the repository you want to scan
3. The scan starts immediately in the background
### Scheduled Scans
Scans run automatically based on the `SCAN_SCHEDULE` cron expression. The default scans every 6 hours:
```
SCAN_SCHEDULE=0 0 */6 * * *
```
### Webhook-Triggered Scans
Configure GitHub or GitLab webhooks to trigger scans on push events. Point each provider's webhook at its endpoint on the agent:
```
http://<agent-host>:3002/webhook/github
http://<agent-host>:3002/webhook/gitlab
```
Then set the matching webhook secret in the agent's environment:
```
GITHUB_WEBHOOK_SECRET=your-secret
GITLAB_WEBHOOK_SECRET=your-secret
```
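GitHub signs each delivery with an HMAC-SHA256 of the raw request body, keyed with the webhook secret, and sends it as `X-Hub-Signature-256: sha256=<hex>`. GitLab sends the secret token itself in the `X-Gitlab-Token` header. The sketch below covers the checking side, assuming the agent verifies these headers before queuing a scan. Computing GitHub's expected MAC needs an HMAC library (for example the `hmac` and `sha2` crates), so the caller passes it in. The function names are hypothetical, and the hand-rolled comparison stands in for a vetted constant-time crate such as `subtle`.

```rust
/// Compares byte strings without stopping at the first mismatch, so
/// response timing does not reveal how much of a secret matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// GitLab: the header must equal the configured secret exactly.
fn verify_gitlab(x_gitlab_token: &str, secret: &str) -> bool {
    constant_time_eq(x_gitlab_token.as_bytes(), secret.as_bytes())
}

/// GitHub: decodes `sha256=<64 hex chars>` into the 32 raw MAC bytes.
fn parse_github_signature(header: &str) -> Option<Vec<u8>> {
    let hex = header.strip_prefix("sha256=")?;
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// `expected_mac` is HMAC-SHA256(secret, raw body), computed by the caller.
fn verify_github(header: &str, expected_mac: &[u8]) -> bool {
    parse_github_signature(header).is_some_and(|sig| constant_time_eq(&sig, expected_mac))
}
```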
## Scan Phases
Each scan progresses through these phases in order:
1. **Queued** — Scan is waiting to start
2. **Cloning** — Repository is being cloned or updated
3. **Scanning** — Static analysis and SBOM extraction are running
4. **Analyzing** — CVE cross-referencing and graph construction
5. **Reporting** — Creating tracker issues for new findings
6. **Completed** — All phases finished successfully
If any phase fails, the scan status is set to **Failed** with an error message.
## Viewing Scan History
The Overview page shows the 10 most recent scan runs across all repositories, including:
- Repository name
- Scan status
- Current phase
- Number of findings discovered
- Start time and duration
## Scan Run Statuses
| Status | Meaning |
|--------|---------|
| `queued` | Waiting to start |
| `running` | Currently executing |
| `completed` | Finished successfully |
| `failed` | Stopped due to an error |
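Read together, the Scan Phases list and this status table suggest that the four working phases (Cloning through Reporting) all report as `running`. The sketch below encodes that assumed mapping along with the phase order for a successful run. `ScanPhase` and its methods are hypothetical names.

```rust
/// Phases from "Scan Phases", plus the terminal Failed state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanPhase { Queued, Cloning, Scanning, Analyzing, Reporting, Completed, Failed }

impl ScanPhase {
    /// The phase that follows a successful one; terminal states have none.
    fn next(self) -> Option<ScanPhase> {
        use ScanPhase::*;
        match self {
            Queued => Some(Cloning),
            Cloning => Some(Scanning),
            Scanning => Some(Analyzing),
            Analyzing => Some(Reporting),
            Reporting => Some(Completed),
            Completed | Failed => None,
        }
    }

    /// Assumed mapping from the detailed phase to the coarse run status.
    fn run_status(self) -> &'static str {
        match self {
            ScanPhase::Queued => "queued",
            ScanPhase::Completed => "completed",
            ScanPhase::Failed => "failed",
            _ => "running",
        }
    }
}
```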
## Deduplication
Findings are deduplicated using a fingerprint hash based on the scanner, file path, line number, and vulnerability type. Repeated scans will not create duplicate findings for the same issue.
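The sketch below computes that fingerprint with FNV-1a, chosen only because it is short and deterministic across runs. The scanner's actual hash function is not documented here, and the names are hypothetical. A zero byte between fields keeps `("ab", "c")` and `("a", "bc")` from hashing to the same value.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a, continuing from `hash` so several fields can be chained.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Fingerprint over scanner, file path, line number, and vulnerability type.
fn fingerprint(scanner: &str, file_path: &str, line: u32, vuln_type: &str) -> u64 {
    let line = line.to_string();
    [scanner, file_path, line.as_str(), vuln_type]
        .into_iter()
        // Hash each field, then a 0x00 separator, starting from the FNV offset basis.
        .fold(0xcbf2_9ce4_8422_2325, |h, field| fnv1a(&[0], fnv1a(field.as_bytes(), h)))
}

/// Keeps the first finding per fingerprint and drops later repeats.
fn dedup<T>(findings: Vec<T>, key: impl Fn(&T) -> u64) -> Vec<T> {
    let mut seen = HashSet::new();
    findings.into_iter().filter(|f| seen.insert(key(f))).collect()
}
```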