docs: added VitePress docs (#4)
All checks were successful
CI / Clippy (push) Successful in 3m17s
CI / Security Audit (push) Successful in 1m36s
CI / Format (push) Successful in 2s
CI / Tests (push) Successful in 4m38s

Co-authored-by: Sharang Parnerkar <parnerkarsharang@gmail.com>
Reviewed-on: #4
This commit was merged in pull request #4.
2026-03-08 13:59:50 +00:00
parent 65abc55915
commit 7e12d1433a
30 changed files with 4101 additions and 24 deletions

docs/.gitignore

@@ -0,0 +1,3 @@
node_modules
.vitepress/dist
.vitepress/cache


@@ -0,0 +1,55 @@
import { defineConfig } from 'vitepress'
export default defineConfig({
title: 'Compliance Scanner',
description: 'AI-powered security compliance scanning platform',
ignoreDeadLinks: [
/localhost/,
],
themeConfig: {
nav: [
{ text: 'Guide', link: '/guide/getting-started' },
{ text: 'Features', link: '/features/overview' },
{ text: 'Deployment', link: '/deployment/docker' },
],
sidebar: [
{
text: 'Guide',
items: [
{ text: 'Getting Started', link: '/guide/getting-started' },
{ text: 'Adding Repositories', link: '/guide/repositories' },
{ text: 'Running Scans', link: '/guide/scanning' },
{ text: 'Managing Findings', link: '/guide/findings' },
{ text: 'Configuration', link: '/guide/configuration' },
],
},
{
text: 'Features',
items: [
{ text: 'Dashboard Overview', link: '/features/overview' },
{ text: 'SBOM & License Compliance', link: '/features/sbom' },
{ text: 'Code Knowledge Graph', link: '/features/graph' },
{ text: 'Impact Analysis', link: '/features/impact-analysis' },
{ text: 'DAST Scanning', link: '/features/dast' },
{ text: 'AI Chat (RAG)', link: '/features/ai-chat' },
{ text: 'Issue Tracker Integration', link: '/features/issues' },
],
},
{
text: 'Deployment',
items: [
{ text: 'Docker Compose', link: '/deployment/docker' },
{ text: 'Environment Variables', link: '/deployment/environment' },
{ text: 'Keycloak Authentication', link: '/deployment/keycloak' },
{ text: 'OpenTelemetry', link: '/deployment/opentelemetry' },
],
},
],
socialLinks: [
{ icon: 'github', link: 'https://gitea.meghsakha.com/sharang/compliance-scanner-agent' },
],
footer: {
message: 'Compliance Scanner Documentation',
},
},
})

docs/deployment/docker.md

@@ -0,0 +1,125 @@
# Docker Compose Deployment
The recommended way to deploy Compliance Scanner is with Docker Compose.
## Prerequisites
- Docker and Docker Compose installed
- At least 4 GB of available RAM
- Git repository access (tokens configured in `.env`)
## Quick Start
```bash
# Clone the repository
git clone <repo-url> compliance-scanner
cd compliance-scanner
# Configure environment
cp .env.example .env
# Edit .env with your MongoDB credentials, tokens, etc.
# Start all services
docker-compose up -d
```
## Services
The `docker-compose.yml` includes these services:
| Service | Port | Description |
|---------|------|-------------|
| `mongo` | 27017 | MongoDB database |
| `agent` | 3001, 3002 | Compliance agent (REST API + webhooks) |
| `dashboard` | 8080 | Web dashboard |
| `chromium` | 3003 | Headless browser for DAST crawling |
| `otel-collector` | 4317, 4318 | OpenTelemetry collector (optional) |
## Volumes
| Volume | Purpose |
|--------|---------|
| `mongo_data` | Persistent MongoDB data |
| `repos_data` | Cloned repository files |
## Checking Status
```bash
# View running services
docker-compose ps
# View logs
docker-compose logs -f agent
docker-compose logs -f dashboard
# Restart a service
docker-compose restart agent
```
## Accessing the Dashboard
Once running, open [http://localhost:8080](http://localhost:8080) in your browser.
If Keycloak authentication is configured, you'll be redirected to sign in. Otherwise, the dashboard is accessible directly.
## Updating
```bash
# Pull latest changes
git pull
# Rebuild and restart
docker-compose up -d --build
```
## Production Considerations
### MongoDB
For production, use a managed MongoDB instance or configure replication:
```bash
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/compliance_scanner
```
### Reverse Proxy
Place the dashboard behind a reverse proxy (nginx, Caddy, Traefik) with TLS:
```nginx
server {
listen 443 ssl;
server_name compliance.example.com;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
location / {
proxy_pass http://localhost:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
```
### Resource Limits
Add resource limits to Docker Compose for production:
```yaml
services:
agent:
deploy:
resources:
limits:
memory: 2G
cpus: '2.0'
dashboard:
deploy:
resources:
limits:
memory: 512M
cpus: '1.0'
```

docs/deployment/environment.md

@@ -0,0 +1,84 @@
# Environment Variables
Complete reference for all environment variables. See [Configuration](/guide/configuration) for detailed descriptions of each variable.
## Required
```bash
# MongoDB connection
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
```
## Agent
```bash
AGENT_PORT=3001
SCAN_SCHEDULE=0 0 */6 * * *
CVE_MONITOR_SCHEDULE=0 0 0 * * *
GIT_CLONE_BASE_PATH=/tmp/compliance-scanner/repos
MONGODB_DATABASE=compliance_scanner
```
## Dashboard
```bash
DASHBOARD_PORT=8080
AGENT_API_URL=http://localhost:3001
```
## LLM / AI
```bash
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=
LITELLM_MODEL=gpt-4o
LITELLM_EMBED_MODEL=text-embedding-3-small
```
## Git Providers
```bash
# GitHub
GITHUB_TOKEN=
GITHUB_WEBHOOK_SECRET=
# GitLab
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=
GITLAB_WEBHOOK_SECRET=
```
## Issue Trackers
```bash
# Jira
JIRA_URL=
JIRA_EMAIL=
JIRA_API_TOKEN=
JIRA_PROJECT_KEY=
```
## External Services
```bash
SEARXNG_URL=http://localhost:8888
NVD_API_KEY=
```
## Authentication
```bash
KEYCLOAK_URL=http://localhost:8080
KEYCLOAK_REALM=compliance
KEYCLOAK_CLIENT_ID=compliance-dashboard
REDIRECT_URI=http://localhost:8080/auth/callback
APP_URL=http://localhost:8080
```
## Observability
```bash
# Set to enable OpenTelemetry export (omit to disable)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=compliance-agent
```

docs/deployment/keycloak.md

@@ -0,0 +1,104 @@
# Keycloak Authentication
Compliance Scanner supports Keycloak for SSO authentication. When configured, all dashboard access requires signing in through Keycloak, and all API endpoints are protected.
## How It Works
### Dashboard (OAuth2/OIDC)
The dashboard implements a standard OAuth2 Authorization Code flow with PKCE:
1. User visits the dashboard
2. If not authenticated, a login page appears with a "Sign in with Keycloak" button
3. User is redirected to Keycloak's login page
4. After authentication, Keycloak redirects back with an authorization code
5. The dashboard exchanges the code for tokens and creates a session
6. All subsequent `/api/` server function calls require a valid session
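The PKCE part of steps 3 and 5 can be sketched as follows. This is an illustrative Python snippet showing the standard S256 verifier/challenge mechanics, not the dashboard's actual server code:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

# The challenge goes out with the authorization request (step 3);
# the verifier is sent only during the token exchange (step 5), so
# Keycloak can confirm that SHA-256(verifier) matches the challenge.
```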
### Agent API (JWT)
The agent API validates JWT Bearer tokens from Keycloak:
1. Dashboard (or other clients) include the access token in requests: `Authorization: Bearer <token>`
2. The agent fetches Keycloak's JWKS (JSON Web Key Set) to validate the token signature
3. Token expiry and claims are verified
4. The health endpoint (`/api/v1/health`) is always public
If `KEYCLOAK_URL` and `KEYCLOAK_REALM` are not set on the agent, JWT validation is disabled and all endpoints are open.
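As a simplified illustration of steps 2 and 3, the snippet below parses a JWT's claims and checks expiry. It is a Python sketch using only the standard library, and it deliberately omits the JWKS signature verification that the agent performs; never skip signature checks in real code:

```python
import base64
import json
import time

def decode_claims(token):
    """Decode the payload segment of a JWT (no signature check here)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def is_expired(claims, now=None):
    """True if the token's exp claim is in the past."""
    return claims.get("exp", 0) <= (time.time() if now is None else now)
```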
## Keycloak Setup
### 1. Create a Realm
In the Keycloak admin console:
1. Create a new realm (e.g. `compliance`)
2. Note the realm name — you'll need it for `KEYCLOAK_REALM`
### 2. Create a Client
1. Go to **Clients** > **Create client**
2. Set:
- **Client ID**: `compliance-dashboard`
- **Client type**: OpenID Connect
- **Client authentication**: Off (public client)
3. Under **Settings**:
- **Valid redirect URIs**: `http://localhost:8080/auth/callback` (adjust for your domain)
- **Valid post logout redirect URIs**: `http://localhost:8080`
- **Web origins**: `http://localhost:8080`
### 3. Create Users
1. Go to **Users** > **Create user**
2. Set username, email, first name, last name
3. Under **Credentials**, set a password
## Environment Variables
```bash
# Keycloak server URL (no trailing slash)
KEYCLOAK_URL=http://localhost:8080
# Realm name
KEYCLOAK_REALM=compliance
# Client ID (must match the client created above)
KEYCLOAK_CLIENT_ID=compliance-dashboard
# OAuth callback URL (must match valid redirect URI in Keycloak)
REDIRECT_URI=http://localhost:8080/auth/callback
# Application root URL (used for post-logout redirect)
APP_URL=http://localhost:8080
```
## Dashboard Features
When authenticated, the dashboard shows:
- **User avatar** in the sidebar (from Keycloak profile picture, or initials)
- **User name** from Keycloak profile
- **Logout** link that clears the session and redirects through Keycloak's logout flow
## Session Configuration
Sessions use signed cookies with these defaults:
- **Expiry**: 24 hours of inactivity
- **SameSite**: Lax (required for Keycloak redirect flow)
- **Secure**: Disabled by default (enable behind HTTPS)
- **Storage**: In-memory (resets on server restart)
::: tip
For production, consider persisting sessions to Redis or a database so they survive server restarts.
:::
## Running Without Keycloak
If no Keycloak variables are set:
- The **dashboard** serves without authentication (all pages accessible)
- The **agent API** accepts all requests without token validation
- A warning is logged: `Keycloak not configured - API endpoints are unprotected`
This is suitable for local development and testing.

docs/deployment/opentelemetry.md

@@ -0,0 +1,139 @@
# OpenTelemetry Observability
Compliance Scanner exports traces and logs via OpenTelemetry Protocol (OTLP) for integration with observability platforms like SigNoz, Grafana (Tempo + Loki), Jaeger, and others.
## Enabling
Set the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable to enable OTLP export:
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```
When this variable is not set, telemetry export is disabled and only console logging is active.
## What Is Exported
### Traces
Distributed traces for:
- HTTP request handling (via `tower-http` `TraceLayer`)
- Database operations
- Scan pipeline phases
- External API calls (LiteLLM, Keycloak, Git providers)
### Logs
All `tracing::info!`, `tracing::warn!`, `tracing::error!` log events are exported as OTel log records, including structured fields.
## Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Collector gRPC endpoint | *(disabled)* |
| `OTEL_SERVICE_NAME` | Service name in traces | `compliance-agent` or `compliance-dashboard` |
| `RUST_LOG` | Log level filter | `info` |
## Docker Compose Setup
The included `docker-compose.yml` provides an OTel Collector service:
```yaml
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
ports:
- "4317:4317" # gRPC
- "4318:4318" # HTTP
volumes:
- ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
```
The agent and dashboard are pre-configured to send telemetry to the collector:
```yaml
agent:
environment:
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_SERVICE_NAME: compliance-agent
dashboard:
environment:
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_SERVICE_NAME: compliance-dashboard
```
## Collector Configuration
Edit `otel-collector-config.yaml` to configure your backend. The default configuration only exports to the `debug` (stdout) exporter.
### SigNoz
```yaml
exporters:
otlp/signoz:
endpoint: "signoz-otel-collector:4317"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/signoz]
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp/signoz]
```
### Grafana Tempo (Traces) + Loki (Logs)
```yaml
exporters:
otlp/tempo:
endpoint: "tempo:4317"
tls:
insecure: true
loki:
endpoint: "http://loki:3100/loki/api/v1/push"
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/tempo]
logs:
receivers: [otlp]
processors: [batch]
exporters: [loki]
```
### Jaeger
```yaml
exporters:
otlp/jaeger:
endpoint: "jaeger:4317"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/jaeger]
```
## Verifying
After starting with telemetry enabled, look for this log on startup:
```
OpenTelemetry OTLP export enabled endpoint=http://otel-collector:4317 service=compliance-agent
```
If the endpoint is unreachable, the application still starts normally — telemetry export fails silently without affecting functionality.

docs/features/ai-chat.md

@@ -0,0 +1,79 @@
# AI Chat (RAG)
The AI Chat feature lets you ask natural language questions about your codebase. It uses Retrieval-Augmented Generation (RAG) to find relevant code and provide accurate, source-referenced answers.
## How It Works
1. **Code graph** is built for the repository (functions, classes, modules)
2. **Embeddings** are generated for each code symbol using an LLM embedding model
3. When you ask a question, your query is **embedded** and compared against code embeddings
4. The **top 8 most relevant** code snippets are retrieved
5. These snippets are sent as context to the LLM along with your question
6. The LLM generates a response **grounded in your actual code**
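Steps 3 and 4 amount to a cosine-similarity nearest-neighbour search over the stored code embeddings. A minimal Python sketch of that retrieval step (illustrative only; the real retrieval runs inside the agent):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunks, k=8):
    """chunks: list of (symbol_name, embedding) pairs.
    Returns the k highest-scoring (score, symbol_name) pairs."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunks]
    scored.sort(reverse=True)
    return scored[:k]
```

The relevance score shown with each source reference (0.0 to 1.0) is consistent with a similarity measure of this kind.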
## Getting Started
### 1. Select a Repository
Navigate to **AI Chat** in the sidebar. You'll see a grid of repository cards. Click one to open the chat interface.
### 2. Build Embeddings
Before chatting, you need to build embeddings for the repository:
1. Click **Build Embeddings**
2. Wait for the process to complete — a progress bar shows `X/Y chunks`
3. Once the status shows **Embeddings ready**, the chat input is enabled
::: info
Embedding builds require:
- A code graph already built for the repository (via the Graph feature)
- A configured embedding model (`LITELLM_EMBED_MODEL`)
The default model is `text-embedding-3-small`.
:::
### 3. Ask Questions
Type your question in the input area and press Enter (or click Send). Examples:
- "How does authentication work in this codebase?"
- "What functions handle database connections?"
- "Explain the error handling pattern used in this project"
- "Where are the API routes defined?"
- "What does the `process_scan` function do?"
## Understanding Responses
### Answer
The AI response is a natural language answer to your question, grounded in the actual source code of your repository.
### Source References
Below each response, you'll see source references showing exactly which code was used to generate the answer:
- **Symbol name** — The qualified name of the function/class/module
- **File path** — Where the code is located, with line range
- **Code snippet** — The first ~10 lines of the relevant code
- **Relevance score** — How closely the code matched your question (0.0 to 1.0)
## Conversation Context
The chat maintains conversation history within a session. You can ask follow-up questions that reference previous answers. The system sends the last 10 messages as context to maintain coherence.
## Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `LITELLM_URL` | LiteLLM proxy URL | `http://localhost:4000` |
| `LITELLM_API_KEY` | API key for the LLM provider | — |
| `LITELLM_MODEL` | Model for chat responses | `gpt-4o` |
| `LITELLM_EMBED_MODEL` | Model for code embeddings | `text-embedding-3-small` |
## Tips
- **Be specific** — "How does the JWT validation middleware work?" is better than "Tell me about auth"
- **Reference filenames** — "What does `server.rs` do?" helps the retrieval find relevant code
- **Ask about patterns** — "What error handling pattern does this project use?" works well with RAG
- **Rebuild after changes** — If the repository has been updated significantly, rebuild embeddings to include new code

docs/features/dast.md

@@ -0,0 +1,112 @@
# DAST Scanning
DAST (Dynamic Application Security Testing) performs black-box security testing against live web applications and APIs. Unlike SAST which analyzes source code, DAST tests running applications by sending crafted requests and analyzing responses.
## DAST Overview
Navigate to **DAST** in the sidebar to see the overview page with:
- Total DAST scans performed
- Total DAST findings discovered
- Number of active targets
- Recent scan run history with status, phase, and finding counts
## Managing Targets
Navigate to **DAST > Targets** to configure applications to test.
### Adding a Target
1. Enter a **target name** (descriptive label)
2. Enter the **base URL** (e.g. `https://staging.example.com`)
3. Click **Add Target**
### Target Configuration
Each target supports these settings:
| Setting | Description | Default |
|---------|-------------|---------|
| **Target Type** | WebApp, REST API, or GraphQL | WebApp |
| **Max Crawl Depth** | How many link levels to follow | 5 |
| **Rate Limit** | Maximum requests per second | 10 |
| **Destructive Tests** | Allow DELETE/PUT requests | No |
| **Excluded Paths** | URL paths to skip during testing | — |
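The **Rate Limit** setting caps outgoing requests per second. One simple way to enforce such a cap is to space requests at a fixed minimum interval. The sketch below is illustrative Python, not the scanner's actual implementation (which may instead use a token bucket that allows short bursts):

```python
import time

class RateLimiter:
    """Cap outgoing requests at `per_second` by spacing them out."""
    def __init__(self, per_second):
        self.interval = 1.0 / per_second
        self.next_ok = 0.0  # earliest monotonic time the next request may go out

    def wait(self, now=None):
        """Return the number of seconds to sleep before sending."""
        t = time.monotonic() if now is None else now
        delay = max(0.0, self.next_ok - t)
        self.next_ok = max(t, self.next_ok) + self.interval
        return delay
```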
### Authentication
DAST supports authenticated scanning with multiple methods:
| Method | Configuration |
|--------|--------------|
| **None** | No authentication |
| **Basic** | Username and password (HTTP Basic Auth) |
| **Bearer** | Bearer token (Authorization header) |
| **Cookie** | Session cookie value |
| **Form** | Login URL, username field, password field, and credentials |
::: warning
Authenticated scans access more of the application surface. Only test applications you own or have explicit authorization to test.
:::
## Running a DAST Scan
Click the **Scan** button on any target row. The scan runs through these phases:
1. **Crawl** — Discovers pages, forms, and API endpoints by following links and analyzing JavaScript
2. **Test** — Sends attack payloads to discovered parameters
3. **Report** — Collects results and generates findings
The scan uses a headless Chromium browser (the `chromium` service in Docker Compose) for JavaScript rendering during crawling.
## DAST Scan Agents
The scanner includes specialized testing agents:
### API Fuzzer
Tests API endpoints with malformed inputs, boundary values, and injection payloads.
### XSS Scanner
Detects Cross-Site Scripting vulnerabilities by injecting script payloads into form fields, URL parameters, and headers.
### SSRF Scanner
Tests for Server-Side Request Forgery by injecting internal URLs and cloud metadata endpoints into parameters.
### Auth Bypass Scanner
Tests for authentication and authorization bypass by manipulating tokens, sessions, and access control headers.
## DAST Findings
Navigate to **DAST > Findings** to see all discovered vulnerabilities.
### Finding List
Each finding shows:
| Column | Description |
|--------|-------------|
| Severity | Critical, High, Medium, or Low |
| Type | Vulnerability category (SQL Injection, XSS, SSRF, etc.) |
| Title | Description of the vulnerability |
| Endpoint | The HTTP path that is vulnerable |
| Method | HTTP method (GET, POST, PUT, DELETE) |
| Exploitable | Whether the vulnerability was confirmed exploitable |
### Finding Detail
Click a finding to see full details:
- **Vulnerability type** and CWE identifier
- **Endpoint URL** and HTTP method
- **Parameter** that is vulnerable
- **Exploitability** — Confirmed or Unconfirmed
- **Description** — What the vulnerability is and why it matters
- **Remediation** — How to fix the issue
- **Evidence** — One or more request/response pairs showing:
- The crafted HTTP request (method, URL, headers)
- The payload that triggered the vulnerability
- The HTTP response status and relevant snippet
::: tip
Findings marked as **Confirmed** exploitable were verified by the scanner with a successful attack. **Unconfirmed** findings show suspicious behavior that may indicate a vulnerability but could not be fully exploited.
:::

docs/features/graph.md

@@ -0,0 +1,92 @@
# Code Knowledge Graph
The Code Knowledge Graph feature parses your repository source code and builds an interactive graph of symbols (functions, classes, modules) and their relationships (calls, imports, inheritance).
## Graph Index
Navigate to **Code Graph** in the sidebar to see all repositories. Click a repository card to open its graph explorer.
## Building a Graph
Before exploring, you need to build the graph:
1. Open the graph explorer for a repository
2. Click **Build Graph**
3. The agent parses all source files and constructs the graph
4. A spinner shows build progress
The graph builder supports these languages:
- Rust
- TypeScript
- JavaScript
- Python
## Graph Explorer
The graph explorer provides an interactive network visualization.
### Canvas
The main area renders an interactive network diagram using vis-network:
- **Nodes** represent code symbols (functions, classes, structs, enums, traits, modules, files)
- **Edges** represent relationships between symbols
- Nodes are **color-coded by community** — clusters of highly connected symbols detected using Louvain community detection
- Pan by dragging the background, zoom with scroll wheel
### Node Types
| Type | Description |
|------|-------------|
| Function | Standalone functions |
| Method | Methods on classes/structs |
| Class | Classes (TypeScript, Python) |
| Struct | Structs (Rust) |
| Enum | Enumerations |
| Interface | Interfaces (TypeScript) |
| Trait | Traits (Rust) |
| Module | Modules and namespaces |
| File | Source files |
### Edge Types
| Type | Description |
|------|-------------|
| Calls | Function/method invocation |
| Imports | Module or symbol import |
| Inherits | Class inheritance |
| Implements | Interface/trait implementation |
| Contains | Parent-child containment (module contains function) |
| TypeRef | Type reference or usage |
### Statistics
The statistics panel shows:
- Total node and edge count
- Number of detected communities
- Languages found in the repository
- File tree of the codebase
### Search
Search for symbols by name:
1. Type at least 2 characters in the search box
2. Matching symbols appear in a dropdown
3. Click a result to highlight it on the canvas and open the inspector
### Code Inspector
When you click a node (on the canvas or from search), the inspector panel shows:
- **Symbol name** and kind (function, class, etc.)
- **File path** with line range
- **Source code** excerpt from the file
- **Connected nodes** — what this symbol calls, what calls it, etc.
## Use Cases
- **Onboarding** — Understand unfamiliar codebase structure at a glance
- **Architecture review** — Identify tightly coupled modules and circular dependencies
- **Security** — Trace data flow from entry points to sensitive operations
- **Refactoring** — See what depends on code you plan to change

docs/features/impact-analysis.md

@@ -0,0 +1,42 @@
# Impact Analysis
Impact Analysis uses the Code Knowledge Graph to determine the blast radius of a security finding. When a vulnerability is found in a specific function or file, impact analysis traces the call graph to show everything that could be affected.
## Accessing Impact Analysis
Impact analysis is linked from the Graph Explorer. When viewing a repository's graph with findings, you can navigate to:
```
/graph/{repo_id}/impact/{finding_id}
```
## What You See
### Blast Radius
A count of the total number of code symbols (functions, methods, classes) affected by the vulnerability, both directly and transitively.
### Entry Points Affected
A list of **public entry points** — main functions, HTTP handlers, API endpoints — that could be impacted by the vulnerable code. These represent the ways an attacker could potentially reach the vulnerability.
### Call Chains
Complete call chain paths showing how execution flows from entry points through intermediate functions to the vulnerable code. Each chain shows the sequence of function calls.
### Direct Callers
The immediate functions that call the vulnerable function. These are the first layer of impact.
## How It Works
1. The finding's file path and line number are matched to a node in the code graph
2. The graph is traversed **backwards** along call edges to find all callers
3. Entry points (functions with no callers, or known patterns like `main`, HTTP handlers) are identified
4. All paths from entry points to the vulnerable node are computed
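The backward traversal in steps 2 and 3 is a breadth-first search over reversed call edges. A minimal Python sketch (illustrative; the actual analysis runs on the agent's code graph):

```python
from collections import deque

def blast_radius(call_edges, vulnerable):
    """call_edges: list of (caller, callee) pairs from the code graph."""
    callers = {}    # callee -> list of callers (reversed edges)
    called = set()  # every symbol that something else calls
    for src, dst in call_edges:
        callers.setdefault(dst, []).append(src)
        called.add(dst)
    # Breadth-first search backwards from the vulnerable symbol
    seen, queue = {vulnerable}, deque([vulnerable])
    while queue:
        for caller in callers.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    affected = seen - {vulnerable}
    # Entry points: affected symbols that nothing in the graph calls
    entry_points = {n for n in affected if n not in called}
    return affected, entry_points
```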
## Use Cases
- **Prioritization** — A critical vulnerability in a function called by 50 entry points is more urgent than one in dead code
- **Remediation scoping** — Understand what tests need to run after a fix
- **Risk assessment** — Quantify the actual exposure of a vulnerability

docs/features/issues.md

@@ -0,0 +1,72 @@
# Issue Tracker Integration
Compliance Scanner automatically creates issues in your existing issue trackers when new security findings are discovered. This integrates security into your development workflow without requiring teams to check a separate tool.
## Supported Trackers
| Tracker | Configuration Variables |
|---------|----------------------|
| **GitHub Issues** | `GITHUB_TOKEN` |
| **GitLab Issues** | `GITLAB_URL`, `GITLAB_TOKEN` |
| **Jira** | `JIRA_URL`, `JIRA_EMAIL`, `JIRA_API_TOKEN`, `JIRA_PROJECT_KEY` |
## How It Works
1. A scan discovers new findings
2. For each new finding, the agent checks if an issue already exists (by fingerprint)
3. If not, it creates an issue in the configured tracker with:
- Title matching the finding title
- Description with vulnerability details, severity, and file location
- Link back to the finding in the dashboard
4. The finding is updated with the external issue URL
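The fingerprint check in step 2 can be sketched as a stable hash over identifying fields. The fields hashed below are an assumption for illustration; the agent may derive its fingerprint from different attributes:

```python
import hashlib

def finding_fingerprint(rule_id, file_path, title):
    """Stable fingerprint for deduplication (illustrative fields)."""
    raw = f"{rule_id}|{file_path}|{title}".encode()
    return hashlib.sha256(raw).hexdigest()

def maybe_create_issue(finding, existing_fingerprints, create_issue):
    """Create a tracker issue only if no issue with this fingerprint exists."""
    fp = finding_fingerprint(finding["rule_id"], finding["file"], finding["title"])
    if fp in existing_fingerprints:
        return None  # issue already exists for this finding
    existing_fingerprints.add(fp)
    return create_issue(finding)
```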
## Viewing Issues
Navigate to **Issues** in the sidebar to see all tracker issues across your repositories.
The issues table shows:
| Column | Description |
|--------|-------------|
| Tracker | Badge showing GitHub, GitLab, or Jira |
| External ID | Issue number in the external system |
| Title | Issue title |
| Status | Open, Closed, or tracker-specific status |
| Created | When the issue was created |
| Link | Direct link to the issue in the external tracker |
Click the **Open** link to go directly to the issue in GitHub, GitLab, or Jira.
## Configuration
### GitHub
```bash
GITHUB_TOKEN=ghp_xxxx
```
Issues are created in the same repository that was scanned.
### GitLab
```bash
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=glpat-xxxx
```
Issues are created in the same project that was scanned.
### Jira
```bash
JIRA_URL=https://your-org.atlassian.net
JIRA_EMAIL=security-bot@example.com
JIRA_API_TOKEN=your-api-token
JIRA_PROJECT_KEY=SEC
```
All issues are created in the specified Jira project (`JIRA_PROJECT_KEY`).
::: tip
Use a dedicated service account for issue creation so that security findings are clearly attributed to automated scanning rather than individual team members.
:::

docs/features/overview.md

@@ -0,0 +1,35 @@
# Dashboard Overview
The Overview page is the landing page of the Compliance Scanner dashboard. It gives you a high-level view of your security posture across all tracked repositories.
## Statistics
The top section displays key metrics:
| Metric | Description |
|--------|-------------|
| **Repositories** | Total number of tracked repositories |
| **Total Findings** | Combined count of all security findings |
| **Critical** | Findings with critical severity |
| **High** | Findings with high severity |
| **Medium** | Findings with medium severity |
| **Low** | Findings with low severity |
| **Dependencies** | Total SBOM entries across all repositories |
| **CVE Alerts** | Active CVE alerts from dependency monitoring |
| **Tracker Issues** | Issues created in external trackers (GitHub, GitLab, Jira) |
## Severity Distribution
A visual bar chart shows the distribution of findings by severity level, giving you an immediate sense of your risk profile.
## Recent Scan Runs
The bottom section lists the 10 most recent scan runs across all repositories, showing:
- Repository name
- Scan status (queued, running, completed, failed)
- Current phase
- Number of findings discovered
- Timestamp
This helps you monitor scanning activity and quickly spot failures.

docs/features/sbom.md

@@ -0,0 +1,106 @@
# SBOM & License Compliance
The SBOM (Software Bill of Materials) feature provides a complete inventory of all dependencies across your repositories, with vulnerability tracking and license compliance analysis.
The SBOM page has three tabs: **Packages**, **License Compliance**, and **Compare**.
## Packages Tab
The packages tab lists all dependencies discovered during scans.
### Filtering
Use the filter bar to narrow results:
- **Repository** — Select a specific repository or view all
- **Package Manager** — npm, cargo, pip, go, maven, nuget, composer, gem
- **Search** — Filter by package name
- **Vulnerabilities** — Show all packages, only those with vulnerabilities, or only clean packages
- **License** — Filter by specific license (MIT, Apache-2.0, BSD-3-Clause, GPL-3.0, etc.)
### Package Details
Each package row shows:
| Column | Description |
|--------|-------------|
| Package | Package name |
| Version | Installed version |
| Manager | Package manager (npm, cargo, pip, etc.) |
| License | License identifier with color-coded badge |
| Vulnerabilities | Count of known vulnerabilities (click to expand) |
### Vulnerability Details
Click the vulnerability count to expand inline details showing:
- Vulnerability ID (e.g. CVE-2024-1234)
- Source database
- Severity level
- Link to the advisory
### Export
Export your SBOM in industry-standard formats:
1. Select a format:
- **CycloneDX 1.5** — JSON format widely supported by security tools
- **SPDX 2.3** — Linux Foundation standard for license compliance
2. Click **Export**
3. The SBOM downloads as a JSON file
::: tip
SBOM exports are useful for compliance audits, customer security questionnaires, and supply chain transparency requirements.
:::
## License Compliance Tab
The license compliance tab helps you understand your licensing obligations.
### Copyleft Warning
If any dependencies use copyleft licenses (GPL, AGPL, LGPL, MPL), a warning banner appears listing the affected packages and noting that they may impose distribution requirements.
### License Distribution
A horizontal bar chart visualizes the percentage breakdown of licenses across your dependencies.
### License Table
A detailed table lists every license found, with:
| Column | Description |
|--------|-------------|
| License | License identifier |
| Type | **Copyleft** or **Permissive** badge |
| Packages | List of packages using this license |
| Count | Number of packages |
**Copyleft licenses** (flagged as potentially restrictive):
- GPL-2.0, GPL-3.0
- AGPL-3.0
- LGPL-2.1, LGPL-3.0
- MPL-2.0
**Permissive licenses** (generally safe for commercial use):
- MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, etc.
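The classification above reduces to a set lookup over SPDX identifiers. The Python sketch below (illustrative, not the dashboard's actual code) groups packages into the rows shown in the license table:

```python
from collections import defaultdict

COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-2.1", "LGPL-3.0", "MPL-2.0"}

def license_table(packages):
    """packages: list of (name, spdx_license) pairs.
    Returns one row per license, sorted by license identifier."""
    groups = defaultdict(list)
    for name, lic in packages:
        groups[lic].append(name)
    return [
        {"license": lic,
         "type": "Copyleft" if lic in COPYLEFT else "Permissive",
         "packages": sorted(pkgs),
         "count": len(pkgs)}
        for lic, pkgs in sorted(groups.items())
    ]
```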
## Compare Tab
Compare the dependency profiles of two repositories side by side.
1. Select **Repository A** from the first dropdown
2. Select **Repository B** from the second dropdown
3. View the diff results:
| Section | Description |
|---------|-------------|
| **Only in A** | Packages present in repo A but not in repo B |
| **Only in B** | Packages present in repo B but not in repo A |
| **Version Diffs** | Same package, different versions between repos |
| **Common** | Count of packages that match exactly |
This is useful for:
- Auditing consistency across microservices
- Identifying dependency drift between environments
- Planning dependency upgrades across projects
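The comparison itself reduces to set operations over name-to-version maps. A minimal Python sketch (illustrative):

```python
def compare_sboms(a, b):
    """a, b: dicts mapping package name -> version for two repositories."""
    shared = set(a) & set(b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    version_diffs = {p: (a[p], b[p]) for p in shared if a[p] != b[p]}
    common = sum(1 for p in shared if a[p] == b[p])  # exact name+version matches
    return only_a, only_b, version_diffs, common
```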

docs/guide/configuration.md

@@ -0,0 +1,141 @@
# Configuration
Compliance Scanner is configured through environment variables. Copy `.env.example` to `.env` and edit the values.
## Required Settings
### MongoDB
```bash
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
MONGODB_DATABASE=compliance_scanner
```
### Agent
```bash
AGENT_PORT=3001
```
### Dashboard
```bash
DASHBOARD_PORT=8080
AGENT_API_URL=http://localhost:3001
```
## LLM Configuration
The AI features (chat, remediation suggestions) use LiteLLM as a proxy to various LLM providers:
```bash
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=your-key
LITELLM_MODEL=gpt-4o
LITELLM_EMBED_MODEL=text-embedding-3-small
```
The embedding model is used by the RAG/AI Chat feature to generate code embeddings.
## Git Provider Tokens
### GitHub
```bash
GITHUB_TOKEN=ghp_xxxx
GITHUB_WEBHOOK_SECRET=your-webhook-secret
```
### GitLab
```bash
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=glpat-xxxx
GITLAB_WEBHOOK_SECRET=your-webhook-secret
```
## Issue Tracker Integration
### Jira
```bash
JIRA_URL=https://your-org.atlassian.net
JIRA_EMAIL=user@example.com
JIRA_API_TOKEN=your-api-token
JIRA_PROJECT_KEY=SEC
```
When configured, new findings automatically create Jira issues in the specified project.
## Scan Schedules
Automated scanning is driven by six-field cron expressions (seconds, minutes, hours, day of month, month, day of week):
```bash
# Scan every 6 hours
SCAN_SCHEDULE=0 0 */6 * * *
# Check for new CVEs daily at midnight
CVE_MONITOR_SCHEDULE=0 0 0 * * *
```
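The leading field is seconds, which is why these expressions have six fields rather than the classic five (an inference from the field count shown; check your deployment if unsure). Splitting the default scan schedule makes each field explicit:

```python
# Label the six fields of a schedule expression (seconds-first format assumed).
FIELDS = ["second", "minute", "hour", "day-of-month", "month", "day-of-week"]

def explain_cron(expr: str) -> dict[str, str]:
    parts = expr.split()
    assert len(parts) == len(FIELDS), "expected a six-field expression"
    return dict(zip(FIELDS, parts))

schedule = explain_cron("0 0 */6 * * *")
# schedule["hour"] is "*/6": run at second 0, minute 0, of every 6th hour
```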
## Search Engine
SearXNG is used for CVE enrichment and vulnerability research:
```bash
SEARXNG_URL=http://localhost:8888
```
## NVD API
An NVD API key increases rate limits for CVE lookups:
```bash
NVD_API_KEY=your-nvd-api-key
```
Get a free key at [https://nvd.nist.gov/developers/request-an-api-key](https://nvd.nist.gov/developers/request-an-api-key).
## Clone Path
Where the agent stores cloned repository files:
```bash
GIT_CLONE_BASE_PATH=/tmp/compliance-scanner/repos
```
## All Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `MONGODB_URI` | Yes | — | MongoDB connection string |
| `MONGODB_DATABASE` | No | `compliance_scanner` | Database name |
| `AGENT_PORT` | No | `3001` | Agent REST API port |
| `DASHBOARD_PORT` | No | `8080` | Dashboard web UI port |
| `AGENT_API_URL` | No | `http://localhost:3001` | Agent URL for dashboard |
| `LITELLM_URL` | No | `http://localhost:4000` | LiteLLM proxy URL |
| `LITELLM_API_KEY` | No | — | LiteLLM API key |
| `LITELLM_MODEL` | No | `gpt-4o` | LLM model for analysis |
| `LITELLM_EMBED_MODEL` | No | `text-embedding-3-small` | Embedding model for RAG |
| `GITHUB_TOKEN` | No | — | GitHub personal access token |
| `GITHUB_WEBHOOK_SECRET` | No | — | GitHub webhook signing secret |
| `GITLAB_URL` | No | `https://gitlab.com` | GitLab instance URL |
| `GITLAB_TOKEN` | No | — | GitLab access token |
| `GITLAB_WEBHOOK_SECRET` | No | — | GitLab webhook signing secret |
| `JIRA_URL` | No | — | Jira instance URL |
| `JIRA_EMAIL` | No | — | Jira account email |
| `JIRA_API_TOKEN` | No | — | Jira API token |
| `JIRA_PROJECT_KEY` | No | — | Jira project key for issues |
| `SEARXNG_URL` | No | `http://localhost:8888` | SearXNG instance URL |
| `NVD_API_KEY` | No | — | NVD API key for CVE lookups |
| `SCAN_SCHEDULE` | No | `0 0 */6 * * *` | Cron schedule for scans |
| `CVE_MONITOR_SCHEDULE` | No | `0 0 0 * * *` | Cron schedule for CVE checks |
| `GIT_CLONE_BASE_PATH` | No | `/tmp/compliance-scanner/repos` | Local clone directory |
| `KEYCLOAK_URL` | No | — | Keycloak server URL |
| `KEYCLOAK_REALM` | No | — | Keycloak realm name |
| `KEYCLOAK_CLIENT_ID` | No | — | Keycloak client ID |
| `REDIRECT_URI` | No | — | OAuth callback URL |
| `APP_URL` | No | — | Application root URL |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | OTLP collector endpoint |
| `OTEL_SERVICE_NAME` | No | — | OpenTelemetry service name |
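Putting the required settings and the most common optional ones together, a minimal `.env` for local development might look like this (all values are placeholders from the tables above; adapt them before use):

```shell
# Minimal local .env sketch — placeholder values, adapt before use
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
MONGODB_DATABASE=compliance_scanner
AGENT_PORT=3001
DASHBOARD_PORT=8080
AGENT_API_URL=http://localhost:3001

# Optional: enable AI features via LiteLLM
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=your-key
LITELLM_MODEL=gpt-4o
```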

75
docs/guide/findings.md Normal file
View File

@@ -0,0 +1,75 @@
# Managing Findings
Findings are security issues discovered during scans. The findings workflow lets you triage, track, and resolve vulnerabilities across all your repositories.
## Findings List
Navigate to **Findings** in the sidebar to see all findings. The table shows:
| Column | Description |
|--------|-------------|
| Severity | Color-coded badge: Critical (red), High (orange), Medium (yellow), Low (green) |
| Title | Short description of the vulnerability (clickable) |
| Type | SAST, SBOM, CVE, GDPR, or OAuth |
| Scanner | Tool that found the issue (e.g. semgrep, syft) |
| File | Source file path where the issue was found |
| Status | Current triage status |
## Filtering
Use the filter bar at the top to narrow results:
- **Repository** — Filter to a specific repository or view all
- **Severity** — Critical, High, Medium, Low, or Info
- **Type** — SAST, SBOM, CVE, GDPR, OAuth
- **Status** — Open, Triaged, Resolved, False Positive, Ignored
Filters can be combined. Results are paginated with 20 findings per page.
## Finding Detail
Click any finding title to view its full detail page, which includes:
### Metadata
- Severity level with CWE identifier and CVSS score (when available)
- Scanner tool and scan type
- File path and line number
### Description
Full explanation of the vulnerability, why it's a risk, and what conditions trigger it.
### Code Evidence
The source code snippet where the issue was found, with syntax highlighting and the file path.
### Remediation
Step-by-step guidance on how to fix the vulnerability.
### Suggested Fix
A code example showing the corrected implementation.
### Linked Issue
If the finding was pushed to an issue tracker (GitHub, GitLab, Jira), a direct link to the external issue.
## Updating Status
On the finding detail page, change the finding's status using the status buttons:
| Status | When to Use |
|--------|-------------|
| **Open** | New finding, not yet reviewed |
| **Triaged** | Reviewed and confirmed as a real issue, pending fix |
| **Resolved** | Fix has been applied |
| **False Positive** | Finding is not a real vulnerability in this context |
| **Ignored** | Known issue that won't be fixed (accepted risk) |
Status changes are persisted immediately.
## Severity Levels
| Severity | Description | Typical Examples |
|----------|-------------|-----------------|
| **Critical** | Immediate exploitation risk, data breach likely | SQL injection, RCE, hardcoded secrets |
| **High** | Serious vulnerability, exploitation probable | XSS, authentication bypass, SSRF |
| **Medium** | Moderate risk, exploitation requires specific conditions | Insecure deserialization, weak crypto |
| **Low** | Minor risk, limited impact | Information disclosure, verbose errors |
| **Info** | Informational, no direct security impact | Best practice recommendations |

View File

@@ -0,0 +1,55 @@
# Getting Started
Compliance Scanner is a security compliance platform that scans your Git repositories for vulnerabilities, builds software bills of materials, performs dynamic application testing, and provides AI-powered code intelligence.
## Architecture
The platform consists of three main components:
- **Agent** — Background service that clones repositories, runs scans, builds graphs, and exposes a REST API
- **Dashboard** — Web UI built with Dioxus (Rust full-stack framework) for viewing results and managing repositories
- **MongoDB** — Database for storing all scan results, findings, SBOM data, and graph structures
## Quick Start with Docker Compose
The fastest way to get running:
```bash
# Clone the repository
git clone <repo-url> compliance-scanner
cd compliance-scanner
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your settings (see Configuration)
# Start all services
docker-compose up -d
```
This starts:
- MongoDB on port `27017`
- Agent API on port `3001`
- Dashboard on port `8080`
- Chromium (for DAST crawling) on port `3003`
Open the dashboard at [http://localhost:8080](http://localhost:8080).
## What Happens During a Scan
When you add a repository and trigger a scan, the agent runs through these phases:
1. **Clone** — Clones or pulls the latest code from the Git remote
2. **SAST** — Runs static analysis using Semgrep with rules for OWASP, GDPR, OAuth, and general security
3. **SBOM** — Extracts all dependencies using Syft, identifying packages, versions, licenses, and known vulnerabilities
4. **CVE Check** — Cross-references dependencies against the NVD database for known CVEs
5. **Graph Build** — Parses the codebase to construct a code knowledge graph of functions, classes, and their relationships
6. **Issue Sync** — Creates or updates issues in connected trackers (GitHub, GitLab, Jira) for new findings
Each phase produces results visible in the dashboard immediately.
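The phase ordering can be pictured as a short sequential pipeline that stops at the first failure. This is an illustrative sketch of the sequencing described above, not the agent's actual orchestration code:

```python
# Run scan phases in order, stopping at the first failure — a sketch of
# the sequencing described above, not the agent's real implementation.
PHASES = ["clone", "sast", "sbom", "cve_check", "graph_build", "issue_sync"]

def run_scan(run_phase) -> dict:
    """run_phase(name) -> bool; returns the final scan-run status."""
    for phase in PHASES:
        if not run_phase(phase):
            return {"status": "failed", "phase": phase}
    return {"status": "completed", "phase": PHASES[-1]}
```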
## Next Steps
- [Add your first repository](/guide/repositories)
- [Understand scan results](/guide/findings)
- [Configure integrations](/guide/configuration)

View File

@@ -0,0 +1,62 @@
# Adding Repositories
Repositories are the core resource in Compliance Scanner. Each tracked repository is scanned on a schedule and its results are available across all features.
## Adding a Repository
1. Navigate to **Repositories** in the sidebar
2. Click **Add Repository** at the top of the page
3. Fill in the form:
- **Name** — A display name for the repository
- **Git URL** — The clone URL (HTTPS or SSH), e.g. `https://github.com/org/repo.git`
- **Default Branch** — The branch to scan, e.g. `main` or `master`
4. Click **Add**
The repository appears in the list immediately. It will not be scanned until you trigger a scan manually or the next scheduled scan runs.
::: tip
For private repositories, configure a GitHub token (`GITHUB_TOKEN`) or GitLab token (`GITLAB_TOKEN`) in your environment. The agent uses these tokens when cloning.
:::
## Repository List
The repositories page shows all tracked repositories with:
| Column | Description |
|--------|-------------|
| Name | Repository display name |
| Git URL | Clone URL |
| Branch | Default branch being scanned |
| Findings | Total number of security findings |
| Last Scanned | Relative timestamp of the most recent scan |
## Triggering a Scan
Click the **Scan** button on any repository row to trigger an immediate scan. The scan runs in the background through all phases (clone, SAST, SBOM, CVE, graph). You can monitor progress on the Overview page under recent scan runs.
## Deleting a Repository
Click the **Delete** button on a repository row. A confirmation dialog appears warning that this action permanently removes:
- All security findings
- SBOM entries and vulnerability data
- Scan run history
- Code graph data
- Embedding vectors (for AI chat)
- CVE alerts
This action cannot be undone.
## Automatic Scanning
Repositories are scanned automatically on a schedule configured by the `SCAN_SCHEDULE` environment variable (cron format). The default is every 6 hours:
```bash
SCAN_SCHEDULE=0 0 */6 * * *
```
CVE monitoring runs on a separate schedule (default: daily at midnight):
```bash
CVE_MONITOR_SCHEDULE=0 0 0 * * *
```

83
docs/guide/scanning.md Normal file
View File

@@ -0,0 +1,83 @@
# Running Scans
Scans are the primary workflow in Compliance Scanner. Each scan analyzes a repository for security vulnerabilities, dependency risks, and code structure.
## Scan Types
A full scan consists of multiple phases, each producing different types of findings:
| Scan Type | What It Detects | Scanner |
|-----------|----------------|---------|
| **SAST** | Code-level vulnerabilities (injection, XSS, insecure crypto, etc.) | Semgrep |
| **SBOM** | Dependency inventory, outdated packages, known vulnerabilities | Syft |
| **CVE** | Known CVEs in dependencies cross-referenced against NVD | NVD API |
| **GDPR** | Personal data handling issues, consent violations | Custom rules |
| **OAuth** | OAuth/OIDC misconfigurations, insecure token handling | Custom rules |
## Triggering a Scan
### Manual Scan
1. Go to **Repositories**
2. Click **Scan** on the repository you want to scan
3. The scan starts immediately in the background
### Scheduled Scans
Scans run automatically based on the `SCAN_SCHEDULE` cron expression. The default scans every 6 hours:
```bash
SCAN_SCHEDULE=0 0 */6 * * *
```
### Webhook-Triggered Scans
Configure GitHub or GitLab webhooks to trigger scans on push events. Set the webhook URL to:
```
http://<agent-host>:3002/webhook/github
http://<agent-host>:3002/webhook/gitlab
```
And configure the corresponding webhook secret:
```bash
GITHUB_WEBHOOK_SECRET=your-secret
GITLAB_WEBHOOK_SECRET=your-secret
```
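The secret lets the agent verify that an incoming payload was really signed by the provider. For GitHub, the `X-Hub-Signature-256` header carries an HMAC-SHA256 of the raw request body; a receiving-side check looks like this (a sketch using Python's standard library, shown for illustration — the agent's own verification may differ):

```python
import hashlib
import hmac

def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a GitHub webhook's X-Hub-Signature-256 header against the raw body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, signature_header)
```

GitLab uses a simpler scheme (the secret is sent verbatim in the `X-Gitlab-Token` header), so its check is a constant-time string comparison rather than an HMAC.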
## Scan Phases
Each scan progresses through these phases in order:
1. **Queued** — Scan is waiting to start
2. **Cloning** — Repository is being cloned or updated
3. **Scanning** — Static analysis and SBOM extraction are running
4. **Analyzing** — CVE cross-referencing and graph construction
5. **Reporting** — Creating tracker issues for new findings
6. **Completed** — All phases finished successfully
If any phase fails, the scan status is set to **Failed** with an error message.
## Viewing Scan History
The Overview page shows the 10 most recent scan runs across all repositories, including:
- Repository name
- Scan status
- Current phase
- Number of findings discovered
- Start time and duration
## Scan Run Statuses
| Status | Meaning |
|--------|---------|
| `queued` | Waiting to start |
| `running` | Currently executing |
| `completed` | Finished successfully |
| `failed` | Stopped due to an error |
## Deduplication
Findings are deduplicated using a fingerprint hash based on the scanner, file path, line number, and vulnerability type. Repeated scans will not create duplicate findings for the same issue.
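A fingerprint of this kind can be sketched as a stable hash over the identifying fields. The separator and hash function below are assumptions for illustration, not necessarily the scanner's exact scheme:

```python
import hashlib

def finding_fingerprint(scanner: str, path: str, line: int, vuln_type: str) -> str:
    """Stable hash over the fields used for deduplication (illustrative)."""
    key = "|".join([scanner, path, str(line), vuln_type])
    return hashlib.sha256(key.encode()).hexdigest()
```

Because the hash is deterministic, re-scanning the same code yields the same fingerprint, and the finding is updated rather than duplicated.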

29
docs/index.md Normal file
View File

@@ -0,0 +1,29 @@
---
layout: home
hero:
name: Compliance Scanner
text: AI-Powered Security Compliance
tagline: Automated SAST, SBOM, DAST, CVE monitoring, and code intelligence for your repositories
actions:
- theme: brand
text: Get Started
link: /guide/getting-started
- theme: alt
text: Features
link: /features/overview
features:
- title: Static Analysis (SAST)
details: Automated security scanning with Semgrep, detecting vulnerabilities across multiple languages including OWASP patterns, GDPR issues, and OAuth misconfigurations.
- title: SBOM & License Compliance
details: Full software bill of materials with dependency inventory, vulnerability tracking, license compliance analysis, and export to CycloneDX/SPDX formats.
- title: Dynamic Testing (DAST)
details: Black-box security testing of live web applications and APIs. Crawls endpoints, fuzzes parameters, and detects SQL injection, XSS, SSRF, and auth bypass vulnerabilities.
- title: Code Knowledge Graph
details: Interactive visualization of your codebase structure. Understand function calls, class hierarchies, and module dependencies with community detection.
- title: Impact Analysis
details: When a vulnerability is found, see exactly which entry points and call chains are affected. Understand blast radius before prioritizing fixes.
- title: AI-Powered Chat
details: Ask questions about your codebase using RAG-powered AI. Code is embedded as vectors and retrieved contextually to give accurate, source-referenced answers.
---

2514
docs/package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

`docs/package.json`
{
"name": "compliance-scanner-docs",
"private": true,
"scripts": {
"dev": "vitepress dev",
"build": "vitepress build",
"preview": "vitepress preview"
},
"devDependencies": {
"vitepress": "^1.6.4"
}
}