Add VitePress documentation site with complete user guides

Covers getting started, repositories, scanning, findings, configuration, SBOM, code graph, impact analysis, DAST, AI chat, issue tracker integration, Docker deployment, environment variables, Keycloak auth, and OpenTelemetry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:18:58 +01:00
parent 65abc55915
commit 94552d1626
21 changed files with 4019 additions and 0 deletions
@@ -0,0 +1,141 @@
+# Configuration
+
+Compliance Scanner is configured through environment variables. Copy `.env.example` to `.env` and edit the values.
+
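+## Required Settings
+
+Only `MONGODB_URI` has no default and must be set. The other settings in this section fall back to the defaults listed under All Environment Variables.
+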
+### MongoDB
+
+```bash
+MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
+MONGODB_DATABASE=compliance_scanner
+```
+
+### Agent
+
+```bash
+AGENT_PORT=3001
+```
+
+### Dashboard
+
+```bash
+DASHBOARD_PORT=8080
+AGENT_API_URL=http://localhost:3001
+```
+
+## LLM Configuration
+
+The AI features (chat, remediation suggestions) use LiteLLM as a proxy to various LLM providers:
+
+```bash
+LITELLM_URL=http://localhost:4000
+LITELLM_API_KEY=your-key
+LITELLM_MODEL=gpt-4o
+LITELLM_EMBED_MODEL=text-embedding-3-small
+```
+
+The embedding model generates the code embeddings used by the RAG-based AI chat feature.
+
+## Git Provider Tokens
+
+### GitHub
+
+```bash
+GITHUB_TOKEN=ghp_xxxx
+GITHUB_WEBHOOK_SECRET=your-webhook-secret
+```
+
+### GitLab
+
+```bash
+GITLAB_URL=https://gitlab.com
+GITLAB_TOKEN=glpat-xxxx
+GITLAB_WEBHOOK_SECRET=your-webhook-secret
+```
+
+## Issue Tracker Integration
+
+### Jira
+
+```bash
+JIRA_URL=https://your-org.atlassian.net
+JIRA_EMAIL=user@example.com
+JIRA_API_TOKEN=your-api-token
+JIRA_PROJECT_KEY=SEC
+```
+
+When these variables are set, each new finding automatically creates a Jira issue in the specified project.
+
+## Scan Schedules
+
+Automated scans are scheduled with six-field cron expressions, where the first field is seconds:
+
+```bash
+# Scan every 6 hours
+SCAN_SCHEDULE=0 0 */6 * * *
+
+# Check for new CVEs daily at midnight
+CVE_MONITOR_SCHEDULE=0 0 0 * * *
+```
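
The two defaults above only match their comments if the fields are read as second, minute, hour, day of month, month, day of week. A minimal sketch of that reading follows. It assumes the seconds-first layout inferred from the comments, and `split_schedule` is a hypothetical helper, not part of the agent.

```python
# Field order is an assumption inferred from the documented defaults:
# "0 0 */6 * * *" means "every 6 hours" only if the first field is seconds.
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def split_schedule(expression: str) -> dict:
    """Hypothetical helper: label the six fields of a schedule expression."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    for name, value in split_schedule("0 0 */6 * * *").items():
        print(f"{name:>12}: {value}")
```

A five-field expression such as `0 */6 * * *` is rejected by this sketch. Whether the agent accepts five-field expressions is not stated in this guide.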
+
+## Search Engine
+
+SearXNG is used for CVE enrichment and vulnerability research:
+
+```bash
+SEARXNG_URL=http://localhost:8888
+```
+
+## NVD API
+
+An NVD API key increases rate limits for CVE lookups:
+
+```bash
+NVD_API_KEY=your-nvd-api-key
+```
+
+Get a free key at [https://nvd.nist.gov/developers/request-an-api-key](https://nvd.nist.gov/developers/request-an-api-key).
+
+## Clone Path
+
+Where the agent stores cloned repository files:
+
+```bash
+GIT_CLONE_BASE_PATH=/tmp/compliance-scanner/repos
+```
+
+## All Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `MONGODB_URI` | Yes | — | MongoDB connection string |
+| `MONGODB_DATABASE` | No | `compliance_scanner` | Database name |
+| `AGENT_PORT` | No | `3001` | Agent REST API port |
+| `DASHBOARD_PORT` | No | `8080` | Dashboard web UI port |
+| `AGENT_API_URL` | No | `http://localhost:3001` | Agent URL for dashboard |
+| `LITELLM_URL` | No | `http://localhost:4000` | LiteLLM proxy URL |
+| `LITELLM_API_KEY` | No | — | LiteLLM API key |
+| `LITELLM_MODEL` | No | `gpt-4o` | LLM model for analysis |
+| `LITELLM_EMBED_MODEL` | No | `text-embedding-3-small` | Embedding model for RAG |
+| `GITHUB_TOKEN` | No | — | GitHub personal access token |
+| `GITHUB_WEBHOOK_SECRET` | No | — | GitHub webhook signing secret |
+| `GITLAB_URL` | No | `https://gitlab.com` | GitLab instance URL |
+| `GITLAB_TOKEN` | No | — | GitLab access token |
+| `GITLAB_WEBHOOK_SECRET` | No | — | GitLab webhook signing secret |
+| `JIRA_URL` | No | — | Jira instance URL |
+| `JIRA_EMAIL` | No | — | Jira account email |
+| `JIRA_API_TOKEN` | No | — | Jira API token |
+| `JIRA_PROJECT_KEY` | No | — | Jira project key for issues |
+| `SEARXNG_URL` | No | `http://localhost:8888` | SearXNG instance URL |
+| `NVD_API_KEY` | No | — | NVD API key for CVE lookups |
+| `SCAN_SCHEDULE` | No | `0 0 */6 * * *` | Cron schedule for scans |
+| `CVE_MONITOR_SCHEDULE` | No | `0 0 0 * * *` | Cron schedule for CVE checks |
+| `GIT_CLONE_BASE_PATH` | No | `/tmp/compliance-scanner/repos` | Local clone directory |
+| `KEYCLOAK_URL` | No | — | Keycloak server URL |
+| `KEYCLOAK_REALM` | No | — | Keycloak realm name |
+| `KEYCLOAK_CLIENT_ID` | No | — | Keycloak client ID |
+| `REDIRECT_URI` | No | — | OAuth callback URL |
+| `APP_URL` | No | — | Application root URL |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | OTLP collector endpoint |
+| `OTEL_SERVICE_NAME` | No | — | OpenTelemetry service name |
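
The table above can be read as "required variables must be present, everything else falls back to a default or stays unset". The sketch below illustrates that rule in Python. `load_config` is a hypothetical helper written for this page and is not how the agent or dashboard actually load their settings.

```python
import os
from typing import Mapping

# Defaults copied from the table above. Optional variables without a
# default (tokens, Jira, Keycloak, OTLP) are left out of this sketch.
DEFAULTS = {
    "MONGODB_DATABASE": "compliance_scanner",
    "AGENT_PORT": "3001",
    "DASHBOARD_PORT": "8080",
    "AGENT_API_URL": "http://localhost:3001",
    "LITELLM_URL": "http://localhost:4000",
    "LITELLM_MODEL": "gpt-4o",
    "LITELLM_EMBED_MODEL": "text-embedding-3-small",
    "GITLAB_URL": "https://gitlab.com",
    "SEARXNG_URL": "http://localhost:8888",
    "SCAN_SCHEDULE": "0 0 */6 * * *",
    "CVE_MONITOR_SCHEDULE": "0 0 0 * * *",
    "GIT_CLONE_BASE_PATH": "/tmp/compliance-scanner/repos",
}
REQUIRED = ("MONGODB_URI",)


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: overlay the environment on the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    config = dict(DEFAULTS)
    # Override a default (or add a required value) only when it is set and non-empty.
    for name in (*DEFAULTS, *REQUIRED):
        if env.get(name):
            config[name] = env[name]
    return config
```

Calling `load_config({"MONGODB_URI": "mongodb://localhost:27017"})` returns the documented defaults plus the connection string, while an empty mapping raises `ValueError`.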
@@ -0,0 +1,75 @@
+# Managing Findings
+
+Findings are security issues discovered during scans. The findings workflow lets you triage, track, and resolve vulnerabilities across all your repositories.
+
+## Findings List
+
+Navigate to **Findings** in the sidebar to see all findings. The table shows:
+
+| Column | Description |
+|--------|-------------|
+| Severity | Color-coded badge: Critical (red), High (orange), Medium (yellow), Low (green) |
+| Title | Short description of the vulnerability (clickable) |
+| Type | SAST, SBOM, CVE, GDPR, or OAuth |
+| Scanner | Tool that found the issue (e.g. semgrep, syft) |
+| File | Source file path where the issue was found |
+| Status | Current triage status |
+
+## Filtering
+
+Use the filter bar at the top to narrow results:
+
+- **Repository** — Filter to a specific repository or view all
+- **Severity** — Critical, High, Medium, Low, or Info
+- **Type** — SAST, SBOM, CVE, GDPR, OAuth
+- **Status** — Open, Triaged, Resolved, False Positive, Ignored
+
+Filters can be combined. Results are paginated with 20 findings per page.
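
To make the combination and paging behaviour concrete, here is a minimal sketch. It assumes that combined filters are ANDed together and that findings are plain dictionaries. The field names (`severity`, `status`) and both helper functions are hypothetical and are not the dashboard's actual query code.

```python
PAGE_SIZE = 20  # page size stated above


def filter_findings(findings, **criteria):
    """Keep findings that match every given criterion (assumed AND semantics)."""
    return [
        finding
        for finding in findings
        if all(finding.get(key) == value for key, value in criteria.items())
    ]


def page(findings, number, size=PAGE_SIZE):
    """Return page `number` (1-based) of a list of findings."""
    start = (number - 1) * size
    return findings[start:start + size]


if __name__ == "__main__":
    sample = [
        {"id": i, "severity": "High" if i % 2 == 0 else "Low", "status": "Open"}
        for i in range(45)
    ]
    open_high = filter_findings(sample, severity="High", status="Open")
    print(len(open_high), len(page(open_high, 1)), len(page(open_high, 2)))
```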
+
+## Finding Detail
+
+Click any finding title to view its full detail page, which includes:
+
+### Metadata
+- Severity level with CWE identifier and CVSS score (when available)
+- Scanner tool and scan type
+- File path and line number
+
+### Description
+Full explanation of the vulnerability, why it's a risk, and what conditions trigger it.
+
+### Code Evidence
+The source code snippet where the issue was found, with syntax highlighting and the file path.
+
+### Remediation
+Step-by-step guidance on how to fix the vulnerability.
+
+### Suggested Fix
+A code example showing the corrected implementation.
+
+### Linked Issue
+If the finding was pushed to an issue tracker (GitHub, GitLab, Jira), a direct link to the external issue.
+
+## Updating Status
+
+On the finding detail page, change the finding's status using the status buttons:
+
+| Status | When to Use |
+|--------|-------------|
+| **Open** | New finding, not yet reviewed |
+| **Triaged** | Reviewed and confirmed as a real issue, pending fix |
+| **Resolved** | Fix has been applied |
+| **False Positive** | Finding is not a real vulnerability in this context |
+| **Ignored** | Known issue that won't be fixed (accepted risk) |
+
+Status changes are persisted immediately.
+
+## Severity Levels
+
+| Severity | Description | Typical Examples |
+|----------|-------------|-----------------|
+| **Critical** | Immediate exploitation risk, data breach likely | SQL injection, RCE, hardcoded secrets |
+| **High** | Serious vulnerability, exploitation probable | XSS, authentication bypass, SSRF |
+| **Medium** | Moderate risk, exploitation requires specific conditions | Insecure deserialization, weak crypto |
+| **Low** | Minor risk, limited impact | Information disclosure, verbose errors |
+| **Info** | Informational, no direct security impact | Best practice recommendations |
@@ -0,0 +1,55 @@
+# Getting Started
+
+Compliance Scanner is a security compliance platform that scans your Git repositories for vulnerabilities, builds software bills of materials, performs dynamic application testing, and provides AI-powered code intelligence.
+
+## Architecture
+
+The platform consists of three main components:
+
+- **Agent** — Background service that clones repositories, runs scans, builds graphs, and exposes a REST API
+- **Dashboard** — Web UI built with Dioxus (Rust full-stack framework) for viewing results and managing repositories
+- **MongoDB** — Database for storing all scan results, findings, SBOM data, and graph structures
+
+## Quick Start with Docker Compose
+
+The fastest way to get running:
+
+```bash
+# Clone the repository
+git clone <repo-url> compliance-scanner
+cd compliance-scanner
+
+# Copy and configure environment variables
+cp .env.example .env
+# Edit .env with your settings (see Configuration)
+
+# Start all services
+docker-compose up -d
+```
+
+This starts:
+- MongoDB on port `27017`
+- Agent API on port `3001`
+- Dashboard on port `8080`
+- Chromium (for DAST crawling) on port `3003`
+
+Open the dashboard at [http://localhost:8080](http://localhost:8080).
+
+## What Happens During a Scan
+
+When you add a repository and trigger a scan, the agent runs through these phases:
+
+1. **Clone** — Clones or pulls the latest code from the Git remote
+2. **SAST** — Runs static analysis using Semgrep with rules for OWASP, GDPR, OAuth, and general security
+3. **SBOM** — Extracts all dependencies using Syft, identifying packages, versions, licenses, and known vulnerabilities
+4. **CVE Check** — Cross-references dependencies against the NVD database for known CVEs
+5. **Graph Build** — Parses the codebase to construct a code knowledge graph of functions, classes, and their relationships
+6. **Issue Sync** — Creates or updates issues in connected trackers (GitHub, GitLab, Jira) for new findings
+
+Each phase produces results visible in the dashboard immediately.
+
+## Next Steps
+
+- [Add your first repository](/guide/repositories)
+- [Understand scan results](/guide/findings)
+- [Configure integrations](/guide/configuration)
@@ -0,0 +1,62 @@
+# Adding Repositories
+
+Repositories are the core resource in Compliance Scanner. Each tracked repository is scanned on a schedule and its results are available across all features.
+
+## Adding a Repository
+
+1. Navigate to **Repositories** in the sidebar
+2. Click **Add Repository** at the top of the page
+3. Fill in the form:
+   - **Name** — A display name for the repository
+   - **Git URL** — The clone URL (HTTPS or SSH), e.g. `https://github.com/org/repo.git`
+   - **Default Branch** — The branch to scan, e.g. `main` or `master`
+4. Click **Add**
+
+The repository appears in the list immediately. It will not be scanned until you trigger a scan manually or the next scheduled scan runs.
+
+::: tip
+For private repositories, configure a GitHub token (`GITHUB_TOKEN`) or GitLab token (`GITLAB_TOKEN`) in your environment. The agent uses these tokens when cloning.
+:::
+
+## Repository List
+
+The repositories page shows all tracked repositories with:
+
+| Column | Description |
+|--------|-------------|
+| Name | Repository display name |
+| Git URL | Clone URL |
+| Branch | Default branch being scanned |
+| Findings | Total number of security findings |
+| Last Scanned | Relative timestamp of the most recent scan |
+
+## Triggering a Scan
+
+Click the **Scan** button on any repository row to trigger an immediate scan. The scan runs in the background through all phases (clone, SAST, SBOM, CVE, graph, issue sync). You can monitor progress on the Overview page under recent scan runs.
+
+## Deleting a Repository
+
+Click the **Delete** button on a repository row. A confirmation dialog appears warning that this action permanently removes:
+
+- All security findings
+- SBOM entries and vulnerability data
+- Scan run history
+- Code graph data
+- Embedding vectors (for AI chat)
+- CVE alerts
+
+This action cannot be undone.
+
+## Automatic Scanning
+
+Repositories are scanned automatically on a schedule configured by the `SCAN_SCHEDULE` environment variable (cron format). The default is every 6 hours:
+
+```bash
+SCAN_SCHEDULE=0 0 */6 * * *
+```
+
+CVE monitoring runs on a separate schedule (default: daily at midnight):
+
+```bash
+CVE_MONITOR_SCHEDULE=0 0 0 * * *
+```
@@ -0,0 +1,83 @@
+# Running Scans
+
+Scans are the primary workflow in Compliance Scanner. Each scan analyzes a repository for security vulnerabilities, dependency risks, and code structure.
+
+## Scan Types
+
+A full scan consists of multiple phases, each producing different types of findings:
+
+| Scan Type | What It Detects | Scanner |
+|-----------|----------------|---------|
+| **SAST** | Code-level vulnerabilities (injection, XSS, insecure crypto, etc.) | Semgrep |
+| **SBOM** | Dependency inventory, outdated packages, known vulnerabilities | Syft |
+| **CVE** | Known CVEs in dependencies cross-referenced against NVD | NVD API |
+| **GDPR** | Personal data handling issues, consent violations | Custom rules |
+| **OAuth** | OAuth/OIDC misconfigurations, insecure token handling | Custom rules |
+
+## Triggering a Scan
+
+### Manual Scan
+
+1. Go to **Repositories**
+2. Click **Scan** on the repository you want to scan
+3. The scan starts immediately in the background
+
+### Scheduled Scans
+
+Scans run automatically based on the `SCAN_SCHEDULE` cron expression. The default scans every 6 hours:
+
+```bash
+SCAN_SCHEDULE=0 0 */6 * * *
+```
+
+### Webhook-Triggered Scans
+
+Configure GitHub or GitLab webhooks to trigger scans on push events. Set the webhook URL to:
+
+```
+http://<agent-host>:3002/webhook/github
+http://<agent-host>:3002/webhook/gitlab
+```
+
+And configure the corresponding webhook secret:
+
+```bash
+GITHUB_WEBHOOK_SECRET=your-secret
+GITLAB_WEBHOOK_SECRET=your-secret
+```
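
For context on what these secrets protect: GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex>`, while GitLab sends the configured secret unchanged in the `X-Gitlab-Token` header. The sketch below shows how a receiver could check both. It illustrates the providers' schemes only; the function names are hypothetical and the agent's own verification code may differ.

```python
import hashlib
import hmac


def github_signature_is_valid(secret: str, body: bytes, header: str) -> bool:
    """Compare an `X-Hub-Signature-256` value with the HMAC of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))


def gitlab_token_is_valid(secret: str, header: str) -> bool:
    """Compare an `X-Gitlab-Token` value with the configured secret."""
    return hmac.compare_digest(secret.encode("utf-8"), header.encode("utf-8"))
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization, otherwise a valid delivery can fail the check.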
+
+## Scan Phases
+
+Each scan progresses through these phases in order:
+
+1. **Queued** — Scan is waiting to start
+2. **Cloning** — Repository is being cloned or updated
+3. **Scanning** — Static analysis and SBOM extraction are running
+4. **Analyzing** — CVE cross-referencing and graph construction
+5. **Reporting** — Creating tracker issues for new findings
+6. **Completed** — All phases finished successfully
+
+If any phase fails, the scan status is set to **Failed** with an error message.
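
The phase progression above behaves like a small linear state machine with one failure exit. A minimal sketch, assuming the phases always run in the listed order: the `Phase` enum values, `next_phase`, and `run_scan` are hypothetical names chosen for illustration and do not come from the agent's code.

```python
from enum import Enum


class Phase(Enum):
    QUEUED = "queued"
    CLONING = "cloning"
    SCANNING = "scanning"
    ANALYZING = "analyzing"
    REPORTING = "reporting"
    COMPLETED = "completed"


ORDER = list(Phase)  # Enum iteration preserves definition order


def next_phase(current: Phase) -> Phase:
    """Advance to the following phase; COMPLETED is terminal."""
    index = ORDER.index(current)
    return ORDER[min(index + 1, len(ORDER) - 1)]


def run_scan(handlers: dict) -> tuple:
    """Walk the phases in order and return (status, last_phase, error)."""
    phase = Phase.QUEUED
    while phase is not Phase.COMPLETED:
        phase = next_phase(phase)
        handler = handlers.get(phase)
        try:
            if handler is not None:
                handler()
        except Exception as exc:
            # Any failing phase ends the run with a failed status and a message.
            return ("failed", phase, str(exc))
    return ("completed", phase, None)
```

Here `handlers` maps a phase to a zero-argument callable, so a failure in `Phase.SCANNING` stops the run before the analyzing and reporting phases are reached.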
+
+## Viewing Scan History
+
+The Overview page shows the 10 most recent scan runs across all repositories, including:
+
+- Repository name
+- Scan status
+- Current phase
+- Number of findings discovered
+- Start time and duration
+
+## Scan Run Statuses
+
+| Status | Meaning |
+|--------|---------|
+| `queued` | Waiting to start |
+| `running` | Currently executing |
+| `completed` | Finished successfully |
+| `failed` | Stopped due to an error |
+
+## Deduplication
+
+Findings are deduplicated using a fingerprint hash based on the scanner, file path, line number, and vulnerability type. Repeated scans will not create duplicate findings for the same issue.
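
As an illustration of fingerprint-based deduplication, the sketch below hashes the four inputs named above and keeps the first finding per fingerprint. The choice of SHA-256, the NUL separator, and the dictionary keys (`scanner`, `file`, `line`, `type`) are assumptions made for the example; the agent's actual fingerprint format is not documented here.

```python
import hashlib


def fingerprint(scanner: str, file_path: str, line: int, vuln_type: str) -> str:
    """Hash the four identifying fields into a stable hex string."""
    # A NUL separator keeps adjacent fields from running together,
    # e.g. ("a", "bc") and ("ab", "c") hash differently.
    material = "\0".join((scanner, file_path, str(line), vuln_type))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def deduplicate(findings: list) -> list:
    """Keep the first finding seen for each fingerprint."""
    seen = set()
    unique = []
    for finding in findings:
        key = fingerprint(
            finding["scanner"], finding["file"], finding["line"], finding["type"]
        )
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique
```

One consequence of including the line number: if unrelated edits shift a vulnerable line up or down, a fingerprint built this way changes, and the same issue would be reported as a new finding.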