docs: rewrite user docs, fix modal scroll, webhook URL, and sccache

Rewrite all public documentation to be user-facing only:
- Remove deployment, configuration, and self-hosting sections
- Add guide pages for SBOM, issues, webhooks & PR reviews
- Add reference pages for glossary and tools/scanners
- Add 12 screenshots from live dashboard
- Explain MCP, LLM triage, false positives, human-in-the-loop

Fix edit repository modal not scrollable (max-height + overflow-y).
Show full webhook URL using window.location.origin instead of path.
Unset RUSTC_WRAPPER in agent cargo commands to avoid sccache errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sharang Parnerkar
2026-03-11 14:17:46 +01:00
parent 689daa0f49
commit c253e4ef5e
40 changed files with 872 additions and 1334 deletions


@@ -441,6 +441,8 @@ tr:hover {
padding: 24px;
max-width: 440px;
width: 90%;
max-height: 85vh;
overflow-y: auto;
}
.modal-dialog h3 {


@@ -108,6 +108,7 @@ async fn run_clippy(repo_path: &Path, repo_id: &str) -> Result<Vec<Finding>, Cor
"clippy::all",
])
.current_dir(repo_path)
.env("RUSTC_WRAPPER", "")
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.spawn()
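The empty-string override works because cargo only invokes a wrapper when `RUSTC_WRAPPER` is non-empty. A minimal sketch (not the project's code) of the two usual ways to neutralize an inherited wrapper on a `std::process::Command`:

```rust
use std::process::Command;

fn main() {
    // Override the inherited variable with an empty value, as the patch does.
    // Cargo treats an empty RUSTC_WRAPPER as "no wrapper".
    let mut with_override = Command::new("cargo");
    with_override.arg("clippy").env("RUSTC_WRAPPER", "");
    assert!(with_override
        .get_envs()
        .any(|(k, v)| k.to_str() == Some("RUSTC_WRAPPER") && v.map_or(false, |s| s.is_empty())));

    // Alternative: drop the variable from the child environment entirely.
    let mut with_removal = Command::new("cargo");
    with_removal.arg("clippy").env_remove("RUSTC_WRAPPER");
    assert!(with_removal
        .get_envs()
        .any(|(k, v)| k.to_str() == Some("RUSTC_WRAPPER") && v.is_none()));
}
```

Either form prevents a missing or misconfigured sccache binary from failing the scan, since the child process never sees the host's wrapper setting.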


@@ -54,6 +54,7 @@ async fn generate_lockfiles(repo_path: &Path) {
let result = tokio::process::Command::new("cargo")
.args(["generate-lockfile"])
.current_dir(repo_path)
.env("RUSTC_WRAPPER", "")
.output()
.await;
match result {
@@ -137,6 +138,7 @@ async fn enrich_cargo_licenses(repo_path: &Path, entries: &mut [SbomEntry]) {
let output = match tokio::process::Command::new("cargo")
.args(["metadata", "--format-version", "1"])
.current_dir(repo_path)
.env("RUSTC_WRAPPER", "")
.output()
.await
{
@@ -245,6 +247,7 @@ async fn run_cargo_audit(repo_path: &Path, _repo_id: &str) -> Result<Vec<AuditVu
let output = tokio::process::Command::new("cargo")
.args(["audit", "--json"])
.current_dir(repo_path)
.env("RUSTC_WRAPPER", "")
.output()
.await
.map_err(|e| CoreError::Scanner {


@@ -394,11 +394,12 @@ pub fn RepositoriesPage() -> Element {
r#type: "text",
readonly: true,
style: "font-family: monospace; font-size: 12px;",
value: format!("/webhook/{}/{eid}", edit_webhook_tracker()),
}
p {
style: "font-size: 11px; color: var(--text-secondary); margin-top: 4px;",
"Use the full dashboard URL as the base, e.g. https://your-domain.com/webhook/..."
value: {
let origin = web_sys::window()
.and_then(|w| w.location().origin().ok())
.unwrap_or_default();
format!("{origin}/webhook/{}/{eid}", edit_webhook_tracker())
},
}
}
div { class: "form-group",
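The effect of the change above can be sketched as a plain string-joining helper (`webhook_url` is a hypothetical name, not a function in the codebase): prepend the browser origin when it is available, and degrade to the old relative path when it is not.

```rust
// Hypothetical helper mirroring the patch: join the window origin (which
// `window.location.origin` reports without a trailing slash, though we trim
// defensively) with the webhook path. An empty origin yields the old
// relative form.
fn webhook_url(origin: &str, provider: &str, eid: &str) -> String {
    format!("{}/webhook/{}/{}", origin.trim_end_matches('/'), provider, eid)
}

fn main() {
    assert_eq!(
        webhook_url("https://scanner.example.com", "github", "abc123"),
        "https://scanner.example.com/webhook/github/abc123"
    );
    // No window available (e.g. server-side render): relative path survives.
    assert_eq!(webhook_url("", "github", "abc123"), "/webhook/github/abc123");
}
```

This is why the explanatory `p` hint ("Use the full dashboard URL as the base...") could be dropped: the field now shows a copy-pasteable absolute URL.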


@@ -1,7 +1,7 @@
import { defineConfig } from 'vitepress'
export default defineConfig({
title: 'Compliance Scanner',
title: 'Certifai',
description: 'AI-powered security compliance scanning platform',
ignoreDeadLinks: [
/localhost/,
@@ -10,7 +10,7 @@ export default defineConfig({
nav: [
{ text: 'Guide', link: '/guide/getting-started' },
{ text: 'Features', link: '/features/overview' },
{ text: 'Deployment', link: '/deployment/docker' },
{ text: 'Reference', link: '/reference/glossary' },
],
sidebar: [
{
@@ -19,30 +19,27 @@ export default defineConfig({
{ text: 'Getting Started', link: '/guide/getting-started' },
{ text: 'Adding Repositories', link: '/guide/repositories' },
{ text: 'Running Scans', link: '/guide/scanning' },
{ text: 'Managing Findings', link: '/guide/findings' },
{ text: 'Configuration', link: '/guide/configuration' },
{ text: 'Understanding Findings', link: '/guide/findings' },
{ text: 'SBOM & Licenses', link: '/guide/sbom' },
{ text: 'Issues & Tracking', link: '/guide/issues' },
{ text: 'Webhooks & PR Reviews', link: '/guide/webhooks' },
],
},
{
text: 'Features',
items: [
{ text: 'Dashboard Overview', link: '/features/overview' },
{ text: 'SBOM & License Compliance', link: '/features/sbom' },
{ text: 'Code Knowledge Graph', link: '/features/graph' },
{ text: 'Impact Analysis', link: '/features/impact-analysis' },
{ text: 'DAST Scanning', link: '/features/dast' },
{ text: 'AI Chat (RAG)', link: '/features/ai-chat' },
{ text: 'Issue Tracker Integration', link: '/features/issues' },
{ text: 'MCP Server', link: '/features/mcp-server' },
{ text: 'AI Chat', link: '/features/ai-chat' },
{ text: 'Code Knowledge Graph', link: '/features/graph' },
{ text: 'MCP Integration', link: '/features/mcp-server' },
],
},
{
text: 'Deployment',
text: 'Reference',
items: [
{ text: 'Docker Compose', link: '/deployment/docker' },
{ text: 'Environment Variables', link: '/deployment/environment' },
{ text: 'Keycloak Authentication', link: '/deployment/keycloak' },
{ text: 'OpenTelemetry', link: '/deployment/opentelemetry' },
{ text: 'Glossary', link: '/reference/glossary' },
{ text: 'Tools & Scanners', link: '/reference/tools' },
],
},
],
@@ -50,7 +47,7 @@ export default defineConfig({
{ icon: 'github', link: 'https://gitea.meghsakha.com/sharang/compliance-scanner-agent' },
],
footer: {
message: 'Compliance Scanner Documentation',
message: 'Certifai Documentation',
},
},
})


@@ -1,125 +0,0 @@
# Docker Compose Deployment
The recommended way to deploy Compliance Scanner is with Docker Compose.
## Prerequisites
- Docker and Docker Compose installed
- At least 4 GB of available RAM
- Git repository access (tokens configured in `.env`)
## Quick Start
```bash
# Clone the repository
git clone <repo-url> compliance-scanner
cd compliance-scanner
# Configure environment
cp .env.example .env
# Edit .env with your MongoDB credentials, tokens, etc.
# Start all services
docker-compose up -d
```
## Services
The `docker-compose.yml` includes these services:
| Service | Port | Description |
|---------|------|-------------|
| `mongo` | 27017 | MongoDB database |
| `agent` | 3001, 3002 | Compliance agent (REST API + webhooks) |
| `dashboard` | 8080 | Web dashboard |
| `chromium` | 3003 | Headless browser for DAST crawling |
| `otel-collector` | 4317, 4318 | OpenTelemetry collector (optional) |
## Volumes
| Volume | Purpose |
|--------|---------|
| `mongo_data` | Persistent MongoDB data |
| `repos_data` | Cloned repository files |
## Checking Status
```bash
# View running services
docker-compose ps
# View logs
docker-compose logs -f agent
docker-compose logs -f dashboard
# Restart a service
docker-compose restart agent
```
## Accessing the Dashboard
Once running, open [http://localhost:8080](http://localhost:8080) in your browser.
If Keycloak authentication is configured, you'll be redirected to sign in. Otherwise, the dashboard is accessible directly.
## Updating
```bash
# Pull latest changes
git pull
# Rebuild and restart
docker-compose up -d --build
```
## Production Considerations
### MongoDB
For production, use a managed MongoDB instance or configure replication:
```bash
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/compliance_scanner
```
### Reverse Proxy
Place the dashboard behind a reverse proxy (nginx, Caddy, Traefik) with TLS:
```nginx
server {
listen 443 ssl;
server_name compliance.example.com;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
location / {
proxy_pass http://localhost:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
```
### Resource Limits
Add resource limits to Docker Compose for production:
```yaml
services:
agent:
deploy:
resources:
limits:
memory: 2G
cpus: '2.0'
dashboard:
deploy:
resources:
limits:
memory: 512M
cpus: '1.0'
```


@@ -1,93 +0,0 @@
# Environment Variables
Complete reference for all environment variables. See [Configuration](/guide/configuration) for detailed descriptions of each variable.
## Required
```bash
# MongoDB connection
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
```
## Agent
```bash
AGENT_PORT=3001
SCAN_SCHEDULE=0 0 */6 * * *
CVE_MONITOR_SCHEDULE=0 0 0 * * *
GIT_CLONE_BASE_PATH=/tmp/compliance-scanner/repos
MONGODB_DATABASE=compliance_scanner
```
## Dashboard
```bash
DASHBOARD_PORT=8080
AGENT_API_URL=http://localhost:3001
```
## LLM / AI
```bash
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=
LITELLM_MODEL=gpt-4o
LITELLM_EMBED_MODEL=text-embedding-3-small
```
## Git Providers
```bash
# GitHub
GITHUB_TOKEN=
GITHUB_WEBHOOK_SECRET=
# GitLab
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=
GITLAB_WEBHOOK_SECRET=
```
## Issue Trackers
```bash
# Jira
JIRA_URL=
JIRA_EMAIL=
JIRA_API_TOKEN=
JIRA_PROJECT_KEY=
```
## External Services
```bash
SEARXNG_URL=http://localhost:8888
NVD_API_KEY=
```
## Authentication
```bash
KEYCLOAK_URL=http://localhost:8080
KEYCLOAK_REALM=compliance
KEYCLOAK_CLIENT_ID=compliance-dashboard
REDIRECT_URI=http://localhost:8080/auth/callback
APP_URL=http://localhost:8080
```
## MCP Server
```bash
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
MONGODB_DATABASE=compliance_scanner
# Set to enable HTTP transport (omit for stdio)
MCP_PORT=8090
```
## Observability
```bash
# Set to enable OpenTelemetry export (omit to disable)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=compliance-agent
```


@@ -1,104 +0,0 @@
# Keycloak Authentication
Compliance Scanner supports Keycloak for SSO authentication. When configured, all dashboard access requires signing in through Keycloak, and all API endpoints are protected.
## How It Works
### Dashboard (OAuth2/OIDC)
The dashboard implements a standard OAuth2 Authorization Code flow with PKCE:
1. User visits the dashboard
2. If they are not authenticated, a login page is shown with a "Sign in with Keycloak" button
3. User is redirected to Keycloak's login page
4. After authentication, Keycloak redirects back with an authorization code
5. The dashboard exchanges the code for tokens and creates a session
6. All subsequent `/api/` server function calls require a valid session
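The redirect in steps 2-3 can be sketched as URL construction against Keycloak's standard OIDC endpoint layout (a simplified illustration, not the dashboard's code — real client code must also URL-encode the redirect URI and append the PKCE and `state` parameters):

```rust
// Builds the Keycloak authorization-endpoint URL for the code flow.
// Path segments follow Keycloak's standard layout:
//   {base}/realms/{realm}/protocol/openid-connect/auth
fn auth_url(keycloak_url: &str, realm: &str, client_id: &str, redirect_uri: &str) -> String {
    format!(
        "{keycloak_url}/realms/{realm}/protocol/openid-connect/auth\
        ?client_id={client_id}&response_type=code&scope=openid&redirect_uri={redirect_uri}"
    )
}

fn main() {
    let url = auth_url(
        "http://localhost:8080",
        "compliance",
        "compliance-dashboard",
        "http://localhost:8080/auth/callback",
    );
    assert!(url.starts_with("http://localhost:8080/realms/compliance/protocol/openid-connect/auth"));
    assert!(url.contains("response_type=code"));
}
```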
### Agent API (JWT)
The agent API validates JWT Bearer tokens from Keycloak:
1. Dashboard (or other clients) include the access token in requests: `Authorization: Bearer <token>`
2. The agent fetches Keycloak's JWKS (JSON Web Key Set) to validate the token signature
3. Token expiry and claims are verified
4. The health endpoint (`/api/v1/health`) is always public
If `KEYCLOAK_URL` and `KEYCLOAK_REALM` are not set on the agent, JWT validation is disabled and all endpoints are open.
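The first step of that validation — pulling the token out of the `Authorization` header — is simple string handling; the signature check against the fetched JWKS requires a JWT library and is omitted here. An illustrative sketch, not the agent's actual code:

```rust
// Extracts the token from an "Authorization: Bearer <token>" header value.
// Returns None for other schemes or an empty token.
fn bearer_token(header: &str) -> Option<&str> {
    header.strip_prefix("Bearer ").filter(|t| !t.is_empty())
}

fn main() {
    assert_eq!(bearer_token("Bearer abc.def.ghi"), Some("abc.def.ghi"));
    assert_eq!(bearer_token("Basic dXNlcg=="), None);
    assert_eq!(bearer_token("Bearer "), None);
}
```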
## Keycloak Setup
### 1. Create a Realm
In the Keycloak admin console:
1. Create a new realm (e.g. `compliance`)
2. Note the realm name — you'll need it for `KEYCLOAK_REALM`
### 2. Create a Client
1. Go to **Clients** > **Create client**
2. Set:
- **Client ID**: `compliance-dashboard`
- **Client type**: OpenID Connect
- **Client authentication**: Off (public client)
3. Under **Settings**:
- **Valid redirect URIs**: `http://localhost:8080/auth/callback` (adjust for your domain)
- **Valid post logout redirect URIs**: `http://localhost:8080`
- **Web origins**: `http://localhost:8080`
### 3. Create Users
1. Go to **Users** > **Create user**
2. Set username, email, first name, last name
3. Under **Credentials**, set a password
## Environment Variables
```bash
# Keycloak server URL (no trailing slash)
KEYCLOAK_URL=http://localhost:8080
# Realm name
KEYCLOAK_REALM=compliance
# Client ID (must match the client created above)
KEYCLOAK_CLIENT_ID=compliance-dashboard
# OAuth callback URL (must match valid redirect URI in Keycloak)
REDIRECT_URI=http://localhost:8080/auth/callback
# Application root URL (used for post-logout redirect)
APP_URL=http://localhost:8080
```
## Dashboard Features
When authenticated, the dashboard shows:
- **User avatar** in the sidebar (from Keycloak profile picture, or initials)
- **User name** from Keycloak profile
- **Logout** link that clears the session and redirects through Keycloak's logout flow
## Session Configuration
Sessions use signed cookies with these defaults:
- **Expiry**: 24 hours of inactivity
- **SameSite**: Lax (required for Keycloak redirect flow)
- **Secure**: Disabled by default (enable behind HTTPS)
- **Storage**: In-memory (resets on server restart)
::: tip
For production, consider persisting sessions to Redis or a database so they survive server restarts.
:::
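The 24-hour sliding window can be sketched with unix-second timestamps (numbers illustrative; the real session store also signs cookies and handles the other settings above):

```rust
// A session expires after 24 hours without activity; any authenticated
// request refreshes last_seen, extending the window.
const MAX_IDLE_SECS: u64 = 24 * 60 * 60;

fn is_expired(last_seen: u64, now: u64) -> bool {
    now.saturating_sub(last_seen) > MAX_IDLE_SECS
}

fn main() {
    let login = 1_700_000_000;
    assert!(!is_expired(login, login + 60));               // active a minute later
    assert!(is_expired(login, login + MAX_IDLE_SECS + 1)); // idle past the window
}
```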
## Running Without Keycloak
If no Keycloak variables are set:
- The **dashboard** serves without authentication (all pages accessible)
- The **agent API** accepts all requests without token validation
- A warning is logged: `Keycloak not configured - API endpoints are unprotected`
This is suitable for local development and testing.


@@ -1,139 +0,0 @@
# OpenTelemetry Observability
Compliance Scanner exports traces and logs via OpenTelemetry Protocol (OTLP) for integration with observability platforms like SigNoz, Grafana (Tempo + Loki), Jaeger, and others.
## Enabling
Set the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable to enable OTLP export:
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```
When this variable is not set, telemetry export is disabled and only console logging is active.
## What Is Exported
### Traces
Distributed traces for:
- HTTP request handling (via `tower-http` `TraceLayer`)
- Database operations
- Scan pipeline phases
- External API calls (LiteLLM, Keycloak, Git providers)
### Logs
All `tracing::info!`, `tracing::warn!`, `tracing::error!` log events are exported as OTel log records, including structured fields.
## Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Collector gRPC endpoint | *(disabled)* |
| `OTEL_SERVICE_NAME` | Service name in traces | `compliance-agent` or `compliance-dashboard` |
| `RUST_LOG` | Log level filter | `info` |
## Docker Compose Setup
The included `docker-compose.yml` provides an OTel Collector service:
```yaml
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
ports:
- "4317:4317" # gRPC
- "4318:4318" # HTTP
volumes:
- ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
```
The agent and dashboard are pre-configured to send telemetry to the collector:
```yaml
agent:
environment:
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_SERVICE_NAME: compliance-agent
dashboard:
environment:
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
OTEL_SERVICE_NAME: compliance-dashboard
```
## Collector Configuration
Edit `otel-collector-config.yaml` to configure your backend. The default exports to debug (stdout) only.
### SigNoz
```yaml
exporters:
otlp/signoz:
endpoint: "signoz-otel-collector:4317"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/signoz]
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp/signoz]
```
### Grafana Tempo (Traces) + Loki (Logs)
```yaml
exporters:
otlp/tempo:
endpoint: "tempo:4317"
tls:
insecure: true
loki:
endpoint: "http://loki:3100/loki/api/v1/push"
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/tempo]
logs:
receivers: [otlp]
processors: [batch]
exporters: [loki]
```
### Jaeger
```yaml
exporters:
otlp/jaeger:
endpoint: "jaeger:4317"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/jaeger]
```
## Verifying
After starting with telemetry enabled, look for this log on startup:
```
OpenTelemetry OTLP export enabled endpoint=http://otel-collector:4317 service=compliance-agent
```
If the endpoint is unreachable, the application still starts normally — telemetry export fails silently without affecting functionality.


@@ -1,41 +1,12 @@
# AI Chat (RAG)
# AI Chat
The AI Chat feature lets you ask natural language questions about your codebase. It uses Retrieval-Augmented Generation (RAG) to find relevant code and provide accurate, source-referenced answers.
The AI Chat feature lets you ask natural language questions about your codebase and get accurate, source-referenced answers.
## How It Works
## What It Does
1. **Code graph** is built for the repository (functions, classes, modules)
2. **Embeddings** are generated for each code symbol using an LLM embedding model
3. When you ask a question, your query is **embedded** and compared against code embeddings
4. The **top 8 most relevant** code snippets are retrieved
5. These snippets are sent as context to the LLM along with your question
6. The LLM generates a response **grounded in your actual code**
AI Chat uses Retrieval-Augmented Generation (RAG) to answer questions about your code. Instead of relying solely on the LLM's training data, it retrieves relevant code from your actual repository and uses it as context for generating answers.
## Getting Started
### 1. Select a Repository
Navigate to **AI Chat** in the sidebar. You'll see a grid of repository cards. Click one to open the chat interface.
### 2. Build Embeddings
Before chatting, you need to build embeddings for the repository:
1. Click **Build Embeddings**
2. Wait for the process to complete — a progress bar shows `X/Y chunks`
3. Once the status shows **Embeddings ready**, the chat input is enabled
::: info
Embedding builds require:
- A code graph already built for the repository (via the Graph feature)
- A configured embedding model (`LITELLM_EMBED_MODEL`)
The default model is `text-embedding-3-small`.
:::
### 3. Ask Questions
Type your question in the input area and press Enter (or click Send). Examples:
This means you can ask questions like:
- "How does authentication work in this codebase?"
- "What functions handle database connections?"
@@ -43,37 +14,42 @@ Type your question in the input area and press Enter (or click Send). Examples:
- "Where are the API routes defined?"
- "What does the `process_scan` function do?"
## Understanding Responses
## How RAG Works
### Answer
In simple terms:
The AI response is a natural language answer to your question, grounded in the actual source code of your repository.
1. Your codebase is parsed into functions, classes, and modules during graph building
2. Each code symbol is converted into a numerical representation (an embedding) that captures its meaning
3. When you ask a question, your question is also converted into an embedding
4. The system finds the code snippets whose embeddings are most similar to your question
5. Those snippets are sent to the LLM along with your question as context
6. The LLM generates an answer grounded in your actual code, not generic knowledge
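The retrieval step (4) boils down to ranking chunks by embedding similarity. A minimal sketch with made-up vectors and names (real embeddings have hundreds of dimensions; the project's similarity metric is assumed here to be cosine):

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Returns the names of the k chunks most similar to the query embedding.
fn top_k<'a>(query: &[f64], chunks: &'a [(&'a str, Vec<f64>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<_> = chunks
        .iter()
        .map(|(name, emb)| (cosine(query, emb), *name))
        .collect();
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    scored.into_iter().take(k).map(|(_, name)| name).collect()
}

fn main() {
    let chunks = vec![
        ("auth::verify_jwt", vec![0.9, 0.1, 0.0]),
        ("db::connect",      vec![0.1, 0.9, 0.0]),
    ];
    let query = vec![1.0, 0.0, 0.0]; // embedding of "how does auth work?"
    assert_eq!(top_k(&query, &chunks, 1), vec!["auth::verify_jwt"]);
}
```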
### Source References
## Getting Started
Below each response, you'll see source references showing exactly which code was used to generate the answer:
1. Navigate to **AI Chat** in the sidebar
2. Select a repository from the grid of cards
3. If embeddings have not been built yet, click **Build Embeddings** and wait for the process to complete
4. Once the status shows **Embeddings ready**, type your question and press Enter
- **Symbol name** — The qualified name of the function/class/module
- **File path** — Where the code is located, with line range
- **Code snippet** — The first ~10 lines of the relevant code
- **Relevance score** — How closely the code matched your question (0.0 to 1.0)
::: tip
Rebuild embeddings after significant code changes to ensure the AI has access to the latest version of your codebase.
:::
## Conversation Context
## Source References
The chat maintains conversation history within a session. You can ask follow-up questions that reference previous answers. The system sends the last 10 messages as context to maintain coherence.
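That history truncation is a simple sliding window over the message list (a sketch of the idea, not the dashboard's code; the 10-message limit is the one stated above):

```rust
// Keeps at most `limit` of the most recent messages as LLM context.
fn context_window<T>(history: &[T], limit: usize) -> &[T] {
    &history[history.len().saturating_sub(limit)..]
}

fn main() {
    let msgs: Vec<u32> = (1..=14).collect();
    let window = context_window(&msgs, 10);
    assert_eq!(window.len(), 10);
    assert_eq!(window.first(), Some(&5)); // oldest retained message
}
```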
Below each AI response, you will see source references showing exactly which code was used to generate the answer:
## Configuration
- **Symbol name** -- the qualified name of the function, class, or module
- **File path** -- where the code is located, with line range
- **Code snippet** -- the first several lines of the relevant code
- **Relevance score** -- how closely the code matched your question (0.0 to 1.0)
| Variable | Description | Default |
|----------|-------------|---------|
| `LITELLM_URL` | LiteLLM proxy URL | `http://localhost:4000` |
| `LITELLM_API_KEY` | API key for the LLM provider | — |
| `LITELLM_MODEL` | Model for chat responses | `gpt-4o` |
| `LITELLM_EMBED_MODEL` | Model for code embeddings | `text-embedding-3-small` |
Source references let you verify the AI's answer against the actual code and navigate directly to the relevant files.
## Tips
## Tips for Better Results
- **Be specific** — "How does the JWT validation middleware work?" is better than "Tell me about auth"
- **Reference filenames** — "What does `server.rs` do?" helps the retrieval find relevant code
- **Ask about patterns** — "What error handling pattern does this project use?" works well with RAG
- **Rebuild after changes** — If the repository has been updated significantly, rebuild embeddings to include new code
- **Be specific** -- "How does the JWT validation middleware work?" is better than "Tell me about auth"
- **Reference filenames** -- "What does `server.rs` do?" helps the retrieval find relevant code
- **Ask about patterns** -- "What error handling pattern does this project use?" works well with RAG
- **Use follow-ups** -- the chat maintains conversation history within a session, so you can ask follow-up questions


@@ -1,10 +1,14 @@
# DAST Scanning
DAST (Dynamic Application Security Testing) performs black-box security testing against live web applications and APIs. Unlike SAST which analyzes source code, DAST tests running applications by sending crafted requests and analyzing responses.
DAST (Dynamic Application Security Testing) performs black-box security testing against live web applications and APIs. Unlike SAST which analyzes source code, DAST tests running applications by sending crafted requests and analyzing responses for vulnerabilities.
## DAST Overview
Navigate to **DAST** in the sidebar to see the overview page with:
Navigate to **DAST** in the sidebar to see the overview page.
![DAST overview with scan runs and finding counts](/screenshots/dast-overview.png)
The overview shows:
- Total DAST scans performed
- Total DAST findings discovered
@@ -21,29 +25,29 @@ Navigate to **DAST > Targets** to configure applications to test.
2. Enter the **base URL** (e.g. `https://staging.example.com`)
3. Click **Add Target**
### Target Configuration
### Target Settings
Each target supports these settings:
| Setting | Description | Default |
|---------|-------------|---------|
| **Target Type** | WebApp, REST API, or GraphQL | WebApp |
| **Max Crawl Depth** | How many link levels to follow | 5 |
| **Rate Limit** | Maximum requests per second | 10 |
| **Destructive Tests** | Allow DELETE/PUT requests | No |
| **Excluded Paths** | URL paths to skip during testing | — |
| Setting | Description |
|---------|-------------|
| **Target Type** | WebApp, REST API, or GraphQL |
| **Max Crawl Depth** | How many link levels to follow |
| **Rate Limit** | Maximum requests per second |
| **Destructive Tests** | Allow DELETE/PUT requests |
| **Excluded Paths** | URL paths to skip during testing |
### Authentication
DAST supports authenticated scanning with multiple methods:
DAST supports authenticated scanning so it can test pages behind login:
| Method | Configuration |
|--------|--------------|
| Method | Description |
|--------|------------|
| **None** | No authentication |
| **Basic** | Username and password (HTTP Basic Auth) |
| **Bearer** | Bearer token (Authorization header) |
| **Basic** | HTTP Basic Auth with username and password |
| **Bearer** | Bearer token in the Authorization header |
| **Cookie** | Session cookie value |
| **Form** | Login URL, username field, password field, and credentials |
| **Form** | Login form with URL, field names, and credentials |
::: warning
Authenticated scans access more of the application surface. Only test applications you own or have explicit authorization to test.
@@ -51,37 +55,15 @@ Authenticated scans access more of the application surface. Only test applicatio
## Running a DAST Scan
Click the **Scan** button on any target row. The scan runs through these phases:
Click the **Scan** button on any target row. The scan progresses through:
1. **Crawl** — Discovers pages, forms, and API endpoints by following links and analyzing JavaScript
2. **Test** — Sends attack payloads to discovered parameters
3. **Report** — Collects results and generates findings
1. **Crawl** -- discovers pages, forms, and API endpoints by following links and analyzing JavaScript
2. **Test** -- sends attack payloads to discovered parameters
3. **Report** -- collects results and generates findings
The scan uses a headless Chromium browser (the `chromium` service in Docker Compose) for JavaScript rendering during crawling.
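The per-target rate limit is typically enforced with a token bucket: requests spend tokens, and tokens refill continuously at the configured rate. An illustrative sketch under that assumption (not the scanner's actual implementation):

```rust
use std::time::Duration;

// Token bucket: at most `rate` requests per second, with burst capacity
// equal to one second's worth of tokens.
struct Bucket {
    tokens: f64,
    rate: f64,
    capacity: f64,
}

impl Bucket {
    fn new(rate: f64) -> Self {
        Bucket { tokens: rate, rate, capacity: rate }
    }
    fn refill(&mut self, elapsed: Duration) {
        self.tokens = (self.tokens + elapsed.as_secs_f64() * self.rate).min(self.capacity);
    }
    fn try_take(&mut self) -> bool {
        if self.tokens >= 1.0 { self.tokens -= 1.0; true } else { false }
    }
}

fn main() {
    let mut b = Bucket::new(10.0); // the 10 req/s default
    let allowed = (0..15).filter(|_| b.try_take()).count();
    assert_eq!(allowed, 10); // burst capped at capacity
    b.refill(Duration::from_millis(500)); // half a second passes
    assert_eq!((0..15).filter(|_| b.try_take()).count(), 5);
}
```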
## Viewing DAST Findings
## DAST Scan Agents
The scanner includes specialized testing agents:
### API Fuzzer
Tests API endpoints with malformed inputs, boundary values, and injection payloads.
### XSS Scanner
Detects Cross-Site Scripting vulnerabilities by injecting script payloads into form fields, URL parameters, and headers.
### SSRF Scanner
Tests for Server-Side Request Forgery by injecting internal URLs and cloud metadata endpoints into parameters.
### Auth Bypass Scanner
Tests for authentication and authorization bypass by manipulating tokens, sessions, and access control headers.
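The SSRF scanner's core idea — spotting parameter values that point at internal or cloud-metadata addresses — can be sketched as a host check (illustrative only: a real scanner also resolves DNS, follows redirects, and covers IPv6 and the full RFC 1918 ranges):

```rust
// Flags URLs whose literal host is a loopback, link-local metadata,
// or common private-range address.
fn looks_internal(url: &str) -> bool {
    let host = url
        .split("://").nth(1).unwrap_or(url)
        .split(|c| c == '/' || c == ':').next().unwrap_or("");
    host == "localhost"
        || host == "169.254.169.254" // cloud metadata endpoint
        || host.starts_with("127.")
        || host.starts_with("10.")
        || host.starts_with("192.168.")
}

fn main() {
    assert!(looks_internal("http://169.254.169.254/latest/meta-data/"));
    assert!(looks_internal("http://10.0.0.5:8080/admin"));
    assert!(!looks_internal("https://example.com/page"));
}
```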
## DAST Findings
Navigate to **DAST > Findings** to see all discovered vulnerabilities.
### Finding List
Each finding shows:
Navigate to **DAST > Findings** to see all discovered vulnerabilities. Each finding shows:
| Column | Description |
|--------|-------------|
@@ -92,21 +74,8 @@ Each finding shows:
| Method | HTTP method (GET, POST, PUT, DELETE) |
| Exploitable | Whether the vulnerability was confirmed exploitable |
### Finding Detail
Click a finding to see full details:
- **Vulnerability type** and CWE identifier
- **Endpoint URL** and HTTP method
- **Parameter** that is vulnerable
- **Exploitability** — Confirmed or Unconfirmed
- **Description** — What the vulnerability is and why it matters
- **Remediation** — How to fix the issue
- **Evidence** — One or more request/response pairs showing:
- The crafted HTTP request (method, URL, headers)
- The payload that triggered the vulnerability
- The HTTP response status and relevant snippet
Click a finding to see full details including the CWE identifier, vulnerable parameter, remediation guidance, and evidence showing the exact request/response pairs that triggered the finding.
::: tip
Findings marked as **Confirmed** exploitable were verified by the scanner with a successful attack. **Unconfirmed** findings show suspicious behavior that may indicate a vulnerability but could not be fully exploited.
Findings marked as **Confirmed** exploitable were verified with a successful attack payload. **Unconfirmed** findings show suspicious behavior that may indicate a vulnerability but could not be fully exploited.
:::


@@ -1,92 +1,52 @@
# Code Knowledge Graph
The Code Knowledge Graph feature parses your repository source code and builds an interactive graph of symbols (functions, classes, modules) and their relationships (calls, imports, inheritance).
The Code Knowledge Graph parses your repository and builds an interactive visualization of its structure -- functions, classes, modules, and how they connect through calls, imports, and inheritance.
## Graph Index
## What It Shows
Navigate to **Code Graph** in the sidebar to see all repositories. Click a repository card to open its graph explorer.
The graph maps your codebase as a network of nodes (code symbols) and edges (relationships). It supports Rust, TypeScript, JavaScript, and Python.
## Building a Graph
**Node types**: Functions, methods, classes, structs, enums, interfaces, traits, modules, and files.
Before exploring, you need to build the graph:
**Edge types**: Calls (function invocation), imports, inheritance, interface/trait implementation, containment (module contains function), and type references.
1. Open the graph explorer for a repository
2. Click **Build Graph**
3. The agent parses all source files and constructs the graph
4. A spinner shows build progress
Nodes are color-coded by community -- clusters of highly connected symbols detected automatically using community detection algorithms.
The graph builder supports these languages:
- Rust
- TypeScript
- JavaScript
- Python
## How to Navigate
## Graph Explorer
### Graph Explorer
The graph explorer provides an interactive network visualization.
1. Navigate to **Code Graph** in the sidebar
2. Select a repository from the list
3. If the graph has not been built yet, click **Build Graph** and wait for parsing to complete
4. The interactive canvas renders with all symbols and relationships
### Canvas
On the canvas:
The main area renders an interactive network diagram using vis-network:
- **Nodes** represent code symbols (functions, classes, structs, enums, traits, modules, files)
- **Edges** represent relationships between symbols
- Nodes are **color-coded by community** — clusters of highly connected symbols detected using Louvain community detection
- Pan by dragging the background, zoom with scroll wheel
### Node Types
| Type | Description |
|------|-------------|
| Function | Standalone functions |
| Method | Methods on classes/structs |
| Class | Classes (TypeScript, Python) |
| Struct | Structs (Rust) |
| Enum | Enumerations |
| Interface | Interfaces (TypeScript) |
| Trait | Traits (Rust) |
| Module | Modules and namespaces |
| File | Source files |
### Edge Types
| Type | Description |
|------|-------------|
| Calls | Function/method invocation |
| Imports | Module or symbol import |
| Inherits | Class inheritance |
| Implements | Interface/trait implementation |
| Contains | Parent-child containment (module contains function) |
| TypeRef | Type reference or usage |
### Statistics
The statistics panel shows:
- Total node and edge count
- Number of detected communities
- Languages found in the repository
- File tree of the codebase
- **Pan** by dragging the background
- **Zoom** with the scroll wheel
- **Click a node** to open the code inspector panel
### Search
Search for symbols by name:
1. Type at least 2 characters in the search box
2. Matching symbols appear in a dropdown
3. Click a result to highlight it on the canvas and open the inspector
Type at least 2 characters in the search box to find symbols by name. Click a result to highlight it on the canvas and open the inspector.
### Code Inspector
When you click a node (on the canvas or from search), the inspector panel shows:
When you click a node, the inspector panel shows:
- **Symbol name** and kind (function, class, etc.)
- **File path** with line range
- **Source code** excerpt from the file
- **Connected nodes** — what this symbol calls, what calls it, etc.
- **Source code** excerpt
- **Connected nodes** -- what this symbol calls, what calls it, what it imports, etc.
### Statistics
The statistics panel shows the total node and edge count, number of detected communities, languages found, and a file tree of the codebase.
## Use Cases
- **Onboarding** -- understand unfamiliar codebase structure at a glance
- **Architecture review** -- identify tightly coupled modules and circular dependencies
- **Security analysis** -- trace data flow from entry points to sensitive operations to understand blast radius
- **Impact analysis** -- see what depends on code you plan to change before refactoring

@@ -1,42 +0,0 @@
# Impact Analysis
Impact Analysis uses the Code Knowledge Graph to determine the blast radius of a security finding. When a vulnerability is found in a specific function or file, impact analysis traces the call graph to show everything that could be affected.
## Accessing Impact Analysis
Impact analysis is linked from the Graph Explorer. When viewing a repository's graph with findings, you can navigate to:
```
/graph/{repo_id}/impact/{finding_id}
```
## What You See
### Blast Radius
A count of the total number of code symbols (functions, methods, classes) affected by the vulnerability, both directly and transitively.
### Entry Points Affected
A list of **public entry points** — main functions, HTTP handlers, API endpoints — that could be impacted by the vulnerable code. These represent the ways an attacker could potentially reach the vulnerability.
### Call Chains
Complete call chain paths showing how execution flows from entry points through intermediate functions to the vulnerable code. Each chain shows the sequence of function calls.
### Direct Callers
The immediate functions that call the vulnerable function. These are the first layer of impact.
## How It Works
1. The finding's file path and line number are matched to a node in the code graph
2. The graph is traversed **backwards** along call edges to find all callers
3. Entry points (functions with no callers, or known patterns like `main`, HTTP handlers) are identified
4. All paths from entry points to the vulnerable node are computed
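The backward traversal in steps 2 and 3 can be sketched as a breadth-first search over reversed call edges. This is an illustrative sketch, not Certifai's actual implementation; the call graph and symbol names are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collect every transitive caller of `vulnerable`.
/// `callers` maps a symbol to the symbols that call it (reversed call edges).
fn blast_radius(callers: &HashMap<&str, Vec<&str>>, vulnerable: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::from([vulnerable]);
    while let Some(sym) = queue.pop_front() {
        if let Some(cs) = callers.get(sym) {
            for &caller in cs {
                // Only enqueue symbols we have not visited yet.
                if seen.insert(caller.to_string()) {
                    queue.push_back(caller);
                }
            }
        }
    }
    seen
}

fn main() {
    // Hypothetical chain: main_handler -> handle_request -> parse_input
    let callers = HashMap::from([
        ("parse_input", vec!["handle_request"]),
        ("handle_request", vec!["main_handler"]),
    ]);
    let affected = blast_radius(&callers, "parse_input");
    assert_eq!(affected.len(), 2);
    println!("blast radius: {} transitive callers", affected.len());
}
```

Symbols with no callers of their own (here, `main_handler`) are the candidate entry points.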
## Use Cases
- **Prioritization** — A critical vulnerability in a function called by 50 entry points is more urgent than one in dead code
- **Remediation scoping** — Understand what tests need to run after a fix
- **Risk assessment** — Quantify the actual exposure of a vulnerability

@@ -1,72 +0,0 @@
# Issue Tracker Integration
Compliance Scanner automatically creates issues in your existing issue trackers when new security findings are discovered. This integrates security into your development workflow without requiring teams to check a separate tool.
## Supported Trackers
| Tracker | Configuration Variables |
|---------|----------------------|
| **GitHub Issues** | `GITHUB_TOKEN` |
| **GitLab Issues** | `GITLAB_URL`, `GITLAB_TOKEN` |
| **Jira** | `JIRA_URL`, `JIRA_EMAIL`, `JIRA_API_TOKEN`, `JIRA_PROJECT_KEY` |
## How It Works
1. A scan discovers new findings
2. For each new finding, the agent checks if an issue already exists (by fingerprint)
3. If not, it creates an issue in the configured tracker with:
- Title matching the finding title
- Description with vulnerability details, severity, and file location
- Link back to the finding in the dashboard
4. The finding is updated with the external issue URL
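The existence check in step 2 works by hashing a finding's identifying fields into a stable fingerprint. The sketch below illustrates the idea; the exact fields Certifai hashes are an assumption here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a few identifying fields of a finding into a stable fingerprint.
/// The chosen fields (repo, scanner, rule, file) are illustrative only.
fn fingerprint(repo_id: &str, scanner: &str, rule: &str, file: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (repo_id, scanner, rule, file).hash(&mut h);
    h.finish()
}

fn main() {
    let mut known = HashSet::new();
    known.insert(fingerprint("repo-1", "semgrep", "sql-injection", "src/db.rs"));

    // Re-scanning the same finding yields the same fingerprint,
    // so no duplicate issue is created.
    let again = fingerprint("repo-1", "semgrep", "sql-injection", "src/db.rs");
    assert!(known.contains(&again));

    // The same rule firing in a different file is a new finding.
    let new_one = fingerprint("repo-1", "semgrep", "sql-injection", "src/api.rs");
    assert!(!known.contains(&new_one));
}
```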
## Viewing Issues
Navigate to **Issues** in the sidebar to see all tracker issues across your repositories.
The issues table shows:
| Column | Description |
|--------|-------------|
| Tracker | Badge showing GitHub, GitLab, or Jira |
| External ID | Issue number in the external system |
| Title | Issue title |
| Status | Open, Closed, or tracker-specific status |
| Created | When the issue was created |
| Link | Direct link to the issue in the external tracker |
Click the **Open** link to go directly to the issue in GitHub, GitLab, or Jira.
## Configuration
### GitHub
```bash
GITHUB_TOKEN=ghp_xxxx
```
Issues are created in the same repository that was scanned.
### GitLab
```bash
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=glpat-xxxx
```
Issues are created in the same project that was scanned.
### Jira
```bash
JIRA_URL=https://your-org.atlassian.net
JIRA_EMAIL=security-bot@example.com
JIRA_API_TOKEN=your-api-token
JIRA_PROJECT_KEY=SEC
```
All issues are created in the specified Jira project (`JIRA_PROJECT_KEY`).
::: tip
Use a dedicated service account for issue creation so that security findings are clearly attributed to automated scanning rather than individual team members.
:::

@@ -1,155 +1,86 @@
# MCP Integration
Certifai exposes your security data through the Model Context Protocol (MCP), allowing LLM-powered tools to query your findings, SBOM data, and DAST results directly.
## What is MCP?
The Model Context Protocol is an open standard that lets AI tools (like Claude, Cursor, or custom agents) connect to external data sources. Think of it as a way for your LLM to "see" your security data without you having to copy and paste it.
When an MCP client is connected to Certifai, you can ask questions like "Show me all critical findings" or "What vulnerable packages does this repo have?" and the LLM will query Certifai directly to get the answer.
## Why It Matters
Without MCP, getting security data into an LLM conversation requires manual effort -- exporting reports, copying findings, pasting context. With MCP:
- Your AI coding assistant can check for security issues as you write code
- You can ask natural language questions about your security posture
- Security data stays up to date because it is queried live, not exported statically
- Multiple team members can connect their own LLM tools to the same data
## Managing MCP Servers
Navigate to **MCP Servers** in the sidebar to manage your MCP server instances.
![MCP servers management page](/screenshots/mcp-servers.png)
From this page you can:
- **Register** new MCP server instances with their endpoint URL, transport type, and port
- **View** server configuration, enabled tools, and status
- **Manage access tokens** -- reveal, copy, or regenerate bearer tokens for authentication
- **Delete** servers that are no longer needed
Each registered server is assigned a random access token on creation. You use this token in your MCP client configuration for authenticated access.
## Available Tools
The MCP server exposes seven tools that LLM clients can discover and call:
### Findings Tools
| Tool | Description |
|------|-------------|
| `list_findings` | Query findings with optional filters for repository, severity, status, and scan type. Returns up to 200 results. |
| `get_finding` | Retrieve a single finding by its ID. |
| `findings_summary` | Get finding counts grouped by severity and status, optionally filtered by repository. |
### SBOM Tools
| Tool | Description |
|------|-------------|
| `list_sbom_packages` | List SBOM packages with filters for repository, vulnerabilities, package manager, and license. |
| `sbom_vuln_report` | Generate a vulnerability report for a repository showing all packages with known CVEs. |
### DAST Tools
| Tool | Description |
|------|-------------|
| `list_dast_findings` | Query DAST findings with filters for target, scan run, severity, exploitability, and vulnerability type. |
| `dast_scan_summary` | Get a summary of recent DAST scan runs and finding counts. |
## Connecting an MCP Client
To connect an MCP-compatible tool (like Claude Desktop or Cursor) to your Certifai MCP server:
1. Go to **MCP Servers** in Certifai and note the server endpoint URL and access token
2. In your MCP client, add a new server connection with:
- **URL** -- the MCP server endpoint (e.g. `https://your-certifai-instance/mcp`)
- **Transport** -- Streamable HTTP
- **Authentication** -- Bearer token using the access token from Certifai
Once connected, the LLM client automatically discovers the available tools and can call them in response to your questions.
::: tip
The MCP server is read-only -- it only queries data. It cannot modify findings, trigger scans, or change configuration. This makes it safe to expose to LLM clients.
:::
## Example Queries
Once your MCP client is connected, you can ask questions like:
- "Show me all critical findings across my repositories"
- "What vulnerable packages does the backend service have?"
- "Give me a summary of DAST findings for the staging target"
- "How many open findings do we have by severity?"
The LLM translates your natural language question into the appropriate tool call and presents the results in a readable format.
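Under the hood, a question such as "Show me all critical findings" becomes a structured tool call. The exact wire format is handled by your MCP client, but conceptually it looks like this:

```json
{
  "tool": "list_findings",
  "arguments": {
    "severity": "critical",
    "limit": 10
  }
}
```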

@@ -1,10 +1,12 @@
# Dashboard Overview
The Overview page is the landing page of Certifai. It gives you a high-level view of your security posture across all tracked repositories.
![Dashboard overview with stats cards, severity distribution, AI chat, and MCP servers](/screenshots/dashboard-overview.png)
## Stats Cards
The top section displays key metrics at a glance:
| Metric | Description |
|--------|-------------|
@@ -14,22 +16,32 @@ The top section displays key metrics:
| **High** | Findings with high severity |
| **Medium** | Findings with medium severity |
| **Low** | Findings with low severity |
| **Dependencies** | Total SBOM packages across all repositories |
| **CVE Alerts** | Active CVE alerts from dependency monitoring |
| **Tracker Issues** | Issues created in external trackers (GitHub, GitLab, Gitea, Jira) |
These cards update after each scan completes, so you always see the current state.
## Severity Distribution
A visual chart shows the distribution of findings by severity level across all your repositories. This gives you an immediate sense of your risk profile -- whether your findings are mostly informational or if there are critical issues that need attention.
## AI Chat Cards
The overview includes quick-access cards for the AI Chat feature. Each card represents a repository that has embeddings built, letting you jump directly into a conversation about that codebase. See [AI Chat](/features/ai-chat) for details.
## MCP Server Cards
If you have MCP servers registered, they appear on the overview page with their status and connection details. This lets you quickly check that your MCP integrations are running. See [MCP Integration](/features/mcp-server) for details.
## Recent Scan Runs
The bottom section lists the most recent scan runs across all repositories, showing:
- Repository name
- Scan status (queued, running, completed, failed)
- Current phase
- Number of findings discovered
- Timestamp and duration
This helps you monitor scanning activity and quickly spot failures or long-running scans.

@@ -1,106 +0,0 @@
# SBOM & License Compliance
The SBOM (Software Bill of Materials) feature provides a complete inventory of all dependencies across your repositories, with vulnerability tracking and license compliance analysis.
The SBOM page has three tabs: **Packages**, **License Compliance**, and **Compare**.
## Packages Tab
The packages tab lists all dependencies discovered during scans.
### Filtering
Use the filter bar to narrow results:
- **Repository** — Select a specific repository or view all
- **Package Manager** — npm, cargo, pip, go, maven, nuget, composer, gem
- **Search** — Filter by package name
- **Vulnerabilities** — Show all packages, only those with vulnerabilities, or only clean packages
- **License** — Filter by specific license (MIT, Apache-2.0, BSD-3-Clause, GPL-3.0, etc.)
### Package Details
Each package row shows:
| Column | Description |
|--------|-------------|
| Package | Package name |
| Version | Installed version |
| Manager | Package manager (npm, cargo, pip, etc.) |
| License | License identifier with color-coded badge |
| Vulnerabilities | Count of known vulnerabilities (click to expand) |
### Vulnerability Details
Click the vulnerability count to expand inline details showing:
- Vulnerability ID (e.g. CVE-2024-1234)
- Source database
- Severity level
- Link to the advisory
### Export
Export your SBOM in industry-standard formats:
1. Select a format:
- **CycloneDX 1.5** — JSON format widely supported by security tools
- **SPDX 2.3** — Linux Foundation standard for license compliance
2. Click **Export**
3. The SBOM downloads as a JSON file
::: tip
SBOM exports are useful for compliance audits, customer security questionnaires, and supply chain transparency requirements.
:::
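For orientation, a CycloneDX 1.5 export is a JSON document with a `bomFormat` marker, a `specVersion`, and a `components` array. A minimal example (the package shown is hypothetical):

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "components": [
    {
      "type": "library",
      "name": "lodash",
      "version": "4.17.21",
      "purl": "pkg:npm/lodash@4.17.21",
      "licenses": [{ "license": { "id": "MIT" } }]
    }
  ]
}
```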
## License Compliance Tab
The license compliance tab helps you understand your licensing obligations.
### Copyleft Warning
If any dependencies use copyleft licenses (GPL, AGPL, LGPL, MPL), a warning banner appears listing the affected packages and noting that they may impose distribution requirements.
### License Distribution
A horizontal bar chart visualizes the percentage breakdown of licenses across your dependencies.
### License Table
A detailed table lists every license found, with:
| Column | Description |
|--------|-------------|
| License | License identifier |
| Type | **Copyleft** or **Permissive** badge |
| Packages | List of packages using this license |
| Count | Number of packages |
**Copyleft licenses** (flagged as potentially restrictive):
- GPL-2.0, GPL-3.0
- AGPL-3.0
- LGPL-2.1, LGPL-3.0
- MPL-2.0
**Permissive licenses** (generally safe for commercial use):
- MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, etc.
## Compare Tab
Compare the dependency profiles of two repositories side by side.
1. Select **Repository A** from the first dropdown
2. Select **Repository B** from the second dropdown
3. View the diff results:
| Section | Description |
|---------|-------------|
| **Only in A** | Packages present in repo A but not in repo B |
| **Only in B** | Packages present in repo B but not in repo A |
| **Version Diffs** | Same package, different versions between repos |
| **Common** | Count of packages that match exactly |
This is useful for:
- Auditing consistency across microservices
- Identifying dependency drift between environments
- Planning dependency upgrades across projects
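The comparison itself is straightforward set logic. A sketch of the idea, assuming packages are keyed by name (this is illustrative, not the actual implementation):

```rust
use std::collections::HashMap;

/// Diff two package inventories (name -> version): packages only in A,
/// only in B, and shared names whose versions differ.
fn diff_packages<'a>(
    a: &HashMap<&'a str, &'a str>,
    b: &HashMap<&'a str, &'a str>,
) -> (Vec<&'a str>, Vec<&'a str>, Vec<(&'a str, &'a str, &'a str)>) {
    let mut only_a: Vec<&'a str> = a.keys().filter(|k| !b.contains_key(**k)).copied().collect();
    let mut only_b: Vec<&'a str> = b.keys().filter(|k| !a.contains_key(**k)).copied().collect();
    let mut diffs: Vec<(&'a str, &'a str, &'a str)> = a
        .iter()
        .filter_map(|(k, va)| b.get(*k).filter(|vb| **vb != *va).map(|vb| (*k, *va, *vb)))
        .collect();
    // Sort for deterministic output; HashMap iteration order is arbitrary.
    only_a.sort();
    only_b.sort();
    diffs.sort();
    (only_a, only_b, diffs)
}

fn main() {
    let a = HashMap::from([("serde", "1.0.200"), ("tokio", "1.37.0")]);
    let b = HashMap::from([("serde", "1.0.210"), ("axum", "0.7.5")]);
    let (only_a, only_b, diffs) = diff_packages(&a, &b);
    assert_eq!(only_a, vec!["tokio"]);
    assert_eq!(only_b, vec!["axum"]);
    assert_eq!(diffs, vec![("serde", "1.0.200", "1.0.210")]);
    println!("{} only in A, {} only in B, {} version diffs", only_a.len(), only_b.len(), diffs.len());
}
```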

@@ -1,153 +0,0 @@
# Configuration
Compliance Scanner is configured through environment variables. Copy `.env.example` to `.env` and edit the values.
## Required Settings
### MongoDB
```bash
MONGODB_URI=mongodb://root:example@localhost:27017/compliance_scanner?authSource=admin
MONGODB_DATABASE=compliance_scanner
```
### Agent
```bash
AGENT_PORT=3001
```
### Dashboard
```bash
DASHBOARD_PORT=8080
AGENT_API_URL=http://localhost:3001
```
## LLM Configuration
The AI features (chat, remediation suggestions) use LiteLLM as a proxy to various LLM providers:
```bash
LITELLM_URL=http://localhost:4000
LITELLM_API_KEY=your-key
LITELLM_MODEL=gpt-4o
LITELLM_EMBED_MODEL=text-embedding-3-small
```
The embed model is used for the RAG/AI Chat feature to generate code embeddings.
## Git Provider Tokens
### GitHub
```bash
GITHUB_TOKEN=ghp_xxxx
GITHUB_WEBHOOK_SECRET=your-webhook-secret
```
### GitLab
```bash
GITLAB_URL=https://gitlab.com
GITLAB_TOKEN=glpat-xxxx
GITLAB_WEBHOOK_SECRET=your-webhook-secret
```
## Issue Tracker Integration
### Jira
```bash
JIRA_URL=https://your-org.atlassian.net
JIRA_EMAIL=user@example.com
JIRA_API_TOKEN=your-api-token
JIRA_PROJECT_KEY=SEC
```
When configured, new findings automatically create Jira issues in the specified project.
## Scan Schedules
Cron expressions for automated scanning:
```bash
# Scan every 6 hours
SCAN_SCHEDULE=0 0 */6 * * *
# Check for new CVEs daily at midnight
CVE_MONITOR_SCHEDULE=0 0 0 * * *
```
## Search Engine
SearXNG is used for CVE enrichment and vulnerability research:
```bash
SEARXNG_URL=http://localhost:8888
```
## NVD API
An NVD API key increases rate limits for CVE lookups:
```bash
NVD_API_KEY=your-nvd-api-key
```
Get a free key at [https://nvd.nist.gov/developers/request-an-api-key](https://nvd.nist.gov/developers/request-an-api-key).
## MCP Server
The MCP server exposes compliance data to external LLMs via the Model Context Protocol. See [MCP Server](/features/mcp-server) for full details.
```bash
# Set MCP_PORT to enable HTTP transport (omit for stdio mode)
MCP_PORT=8090
```
The MCP server shares the `MONGODB_URI` and `MONGODB_DATABASE` variables with the rest of the platform.
## Clone Path
Where the agent stores cloned repository files:
```bash
GIT_CLONE_BASE_PATH=/tmp/compliance-scanner/repos
```
## All Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `MONGODB_URI` | Yes | — | MongoDB connection string |
| `MONGODB_DATABASE` | No | `compliance_scanner` | Database name |
| `AGENT_PORT` | No | `3001` | Agent REST API port |
| `DASHBOARD_PORT` | No | `8080` | Dashboard web UI port |
| `AGENT_API_URL` | No | `http://localhost:3001` | Agent URL for dashboard |
| `LITELLM_URL` | No | `http://localhost:4000` | LiteLLM proxy URL |
| `LITELLM_API_KEY` | No | — | LiteLLM API key |
| `LITELLM_MODEL` | No | `gpt-4o` | LLM model for analysis |
| `LITELLM_EMBED_MODEL` | No | `text-embedding-3-small` | Embedding model for RAG |
| `GITHUB_TOKEN` | No | — | GitHub personal access token |
| `GITHUB_WEBHOOK_SECRET` | No | — | GitHub webhook signing secret |
| `GITLAB_URL` | No | `https://gitlab.com` | GitLab instance URL |
| `GITLAB_TOKEN` | No | — | GitLab access token |
| `GITLAB_WEBHOOK_SECRET` | No | — | GitLab webhook signing secret |
| `JIRA_URL` | No | — | Jira instance URL |
| `JIRA_EMAIL` | No | — | Jira account email |
| `JIRA_API_TOKEN` | No | — | Jira API token |
| `JIRA_PROJECT_KEY` | No | — | Jira project key for issues |
| `SEARXNG_URL` | No | `http://localhost:8888` | SearXNG instance URL |
| `NVD_API_KEY` | No | — | NVD API key for CVE lookups |
| `SCAN_SCHEDULE` | No | `0 0 */6 * * *` | Cron schedule for scans |
| `CVE_MONITOR_SCHEDULE` | No | `0 0 0 * * *` | Cron schedule for CVE checks |
| `GIT_CLONE_BASE_PATH` | No | `/tmp/compliance-scanner/repos` | Local clone directory |
| `KEYCLOAK_URL` | No | — | Keycloak server URL |
| `KEYCLOAK_REALM` | No | — | Keycloak realm name |
| `KEYCLOAK_CLIENT_ID` | No | — | Keycloak client ID |
| `REDIRECT_URI` | No | — | OAuth callback URL |
| `APP_URL` | No | — | Application root URL |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | OTLP collector endpoint |
| `OTEL_SERVICE_NAME` | No | — | OpenTelemetry service name |
| `MCP_PORT` | No | — | MCP HTTP transport port (omit for stdio) |

@@ -1,68 +1,62 @@
# Understanding Findings
Findings are security issues discovered during scans. The findings workflow lets you triage, track, and resolve vulnerabilities across all your repositories.
## Findings List
Navigate to **Findings** in the sidebar to see all findings across your repositories.
![Findings list with severity badges, types, and filter controls](/screenshots/findings-list.png)
### Filtering
Use the filter bar to narrow results:
- **Repository** -- filter to a specific repository or view all
- **Severity** -- Critical, High, Medium, Low, or Info
- **Type** -- SAST, SBOM, CVE, GDPR, OAuth, Secrets, Code Review
- **Status** -- Open, Triaged, Resolved, False Positive, Ignored
Filters can be combined. Results are paginated with 20 findings per page.
### Columns
| Column | Description |
|--------|-------------|
| Severity | Color-coded badge: Critical (red), High (orange), Medium (yellow), Low (green), Info (blue) |
| Title | Short description of the vulnerability (clickable) |
| Type | SAST, SBOM, CVE, GDPR, OAuth, Secrets, or Code Review |
| Scanner | Tool that found the issue (e.g. Semgrep, Grype) |
| File | Source file path where the issue was found |
| Status | Current triage status |
## Finding Detail
Click any finding title to view its full detail page.
![Finding detail page showing description, triage rationale, code evidence, remediation, and status controls](/screenshots/finding-detail.png)
The detail page is organized into these sections:
### Description
A full explanation of the vulnerability: what it is, why it is a risk, and what conditions trigger it.
### AI Triage Rationale
The LLM's assessment of the finding, including why it assigned a particular severity and confidence score. This rationale considers the code context, the type of vulnerability, and the blast radius based on the code knowledge graph.
### Code Evidence
The source code snippet where the issue was found, with syntax highlighting and the file path with line number.
### Remediation
Step-by-step guidance on how to fix the vulnerability, often including a suggested code fix showing the corrected implementation.
### Linked Issue
If the finding has been pushed to an issue tracker (GitHub, GitLab, Gitea, Jira), a direct link to the external issue appears here.
## Severity Levels
@@ -73,3 +67,77 @@ Status changes are persisted immediately.
| **Medium** | Moderate risk, exploitation requires specific conditions | Insecure deserialization, weak crypto |
| **Low** | Minor risk, limited impact | Information disclosure, verbose errors |
| **Info** | Informational, no direct security impact | Best practice recommendations |
## Finding Types
| Type | Source | Description |
|------|--------|-------------|
| **SAST** | Semgrep | Code-level vulnerabilities found through static analysis |
| **SBOM** | Syft + Grype | Vulnerable dependencies identified in your software bill of materials |
| **CVE** | NVD | Known CVEs matching your dependency versions |
| **GDPR** | Custom rules | Personal data handling and consent issues |
| **OAuth** | Custom rules | OAuth/OIDC misconfigurations and insecure token handling |
| **Secrets** | Custom rules | Hardcoded credentials, API keys, and tokens |
| **Code Review** | LLM | Architecture and security patterns reviewed by the AI engine |
## Triage Workflow
Every finding follows a lifecycle from discovery to resolution. The status indicates where a finding is in this process:
| Status | Meaning |
|--------|---------|
| **Open** | Newly discovered, not yet reviewed |
| **Triaged** | Reviewed and confirmed as a real issue, pending fix |
| **Resolved** | A fix has been applied |
| **False Positive** | Not a real vulnerability in this context |
| **Ignored** | Known issue that will not be fixed (accepted risk) |
On the finding detail page, use the status buttons to move a finding through this workflow. Status changes take effect immediately.
### Recommended Flow
1. A scan discovers a new finding -- it starts as **Open**
2. You review the AI triage rationale and code evidence
3. If it is a real issue, mark it as **Triaged** to signal that it needs a fix
4. Once the fix is deployed and a new scan confirms it, mark it as **Resolved**
5. If the AI got it wrong, mark it as **False Positive** (see below)
## False Positives
Not every finding is a real vulnerability. Static analysis tools can flag code that looks suspicious but is actually safe in context. When this happens:
1. Open the finding detail page
2. Review the code evidence and the AI triage rationale
3. If you determine the finding is not a real issue, click **False Positive**
::: tip
When you mark a finding as a false positive, you are providing training signal to the AI. Over time, the LLM learns from your feedback and becomes better at distinguishing real vulnerabilities from false alarms in your codebase.
:::
## Human in the Loop
Certifai uses AI to triage findings, but humans make the final decisions. Here is how the process works:
1. **AI triages** -- the LLM reviews each finding, assigns a severity, generates a confidence score, and writes a rationale explaining its assessment
2. **You review** -- you read the AI's analysis alongside the code evidence and decide whether to act on it
3. **You decide** -- you set the final status (Triaged, Resolved, False Positive, or Ignored)
4. **AI learns** -- your feedback on false positives and status changes helps improve future triage accuracy
The AI provides the analysis; you provide the judgment. This approach gives you the speed of automated scanning with the accuracy of human review.
## Developer Feedback
On the finding detail page, you can provide feedback on the AI's triage. This feedback loop serves two purposes:
- **Accuracy** -- helps the platform understand which findings are actionable in your specific codebase and context
- **Context** -- lets you add notes explaining why a finding is or is not relevant, which benefits other team members reviewing the same finding
## Confidence Scores
Each AI-triaged finding includes a confidence score from 0.0 to 1.0, indicating how certain the LLM is about its assessment:
- **0.8 -- 1.0** -- High confidence. The AI is very certain this is (or is not) a real vulnerability.
- **0.5 -- 0.8** -- Moderate confidence. The finding likely warrants human review.
- **Below 0.5** -- Low confidence. The AI is uncertain and recommends manual inspection.
Use confidence scores to prioritize your review queue: start with high-severity, high-confidence findings for the greatest impact.
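That prioritization amounts to sorting by severity first and confidence second. A sketch (the severity ranks and findings here are hypothetical, not Certifai's data model):

```rust
/// Order findings for review: (title, severity_rank, confidence),
/// where a higher rank means more severe. Most urgent first.
fn prioritized(mut findings: Vec<(&str, u8, f64)>) -> Vec<(&str, u8, f64)> {
    findings.sort_by(|a, b| (b.1, b.2).partial_cmp(&(a.1, a.2)).unwrap());
    findings
}

fn main() {
    let queue = prioritized(vec![
        ("verbose error page", 1, 0.90),
        ("sql injection", 4, 0.95),
        ("weak hash", 2, 0.60),
        ("hardcoded secret", 4, 0.70),
    ]);
    // High-severity, high-confidence findings come first.
    assert_eq!(queue[0].0, "sql injection");
    assert_eq!(queue[1].0, "hardcoded secret");
    println!("{:?}", queue);
}
```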

@@ -1,55 +1,49 @@
# Getting Started
Certifai is an AI-powered security compliance platform that scans your Git repositories for vulnerabilities, builds software bills of materials, performs dynamic application testing, and provides code intelligence through an interactive knowledge graph and AI chat.
## Architecture
## What You Get
The platform consists of three main components:
When you connect a repository, Certifai runs a comprehensive scan pipeline that covers:
- **Agent** — Background service that clones repositories, runs scans, builds graphs, and exposes a REST API
- **Dashboard** — Web UI built with Dioxus (Rust full-stack framework) for viewing results and managing repositories
- **MongoDB** — Database for storing all scan results, findings, SBOM data, and graph structures
- **Static Analysis (SAST)** -- finds code-level vulnerabilities like injection flaws, insecure crypto, and misconfigurations
- **Software Bill of Materials (SBOM)** -- inventories every dependency, its version, and its license
- **CVE Monitoring** -- cross-references your dependencies against known vulnerabilities
- **Code Knowledge Graph** -- maps the structure of your codebase for impact analysis
- **AI Triage** -- every finding is reviewed by an LLM that provides severity assessment, confidence scores, and remediation guidance
- **Issue Tracking** -- automatically creates issues in your tracker for new findings
## Quick Start with Docker Compose
## Dashboard Overview
The fastest way to get running:
After logging in, you land on the Overview page, which gives you a snapshot of your security posture across all repositories.
```bash
# Clone the repository
git clone <repo-url> compliance-scanner
cd compliance-scanner
![Dashboard overview showing stats cards, severity distribution, and recent scan activity](/screenshots/dashboard-overview.png)
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your settings (see Configuration)
The overview shows key metrics at a glance: total repositories, findings broken down by severity, dependency counts, CVE alerts, and tracker issues. A severity distribution chart visualizes your risk profile, and recent scan runs let you monitor scanning activity.
# Start all services
docker-compose up -d
```
## Quick Walkthrough
This starts:
- MongoDB on port `27017`
- Agent API on port `3001`
- Dashboard on port `8080`
- Chromium (for DAST crawling) on port `3003`
Here is the fastest path from zero to your first scan results:
Open the dashboard at [http://localhost:8080](http://localhost:8080).
### 1. Add a repository
## What Happens During a Scan
Navigate to **Repositories** in the sidebar and click **Add Repository**. Enter a name, the Git clone URL, and the default branch to scan.
When you add a repository and trigger a scan, the agent runs through these phases:
![Add repository dialog](/screenshots/add-repository.png)
1. **Clone** — Clones or pulls the latest code from the Git remote
2. **SAST** — Runs static analysis using Semgrep with rules for OWASP, GDPR, OAuth, and general security
3. **SBOM** — Extracts all dependencies using Syft, identifying packages, versions, licenses, and known vulnerabilities
4. **CVE Check** — Cross-references dependencies against the NVD database for known CVEs
5. **Graph Build** — Parses the codebase to construct a code knowledge graph of functions, classes, and their relationships
6. **Issue Sync** — Creates or updates issues in connected trackers (GitHub, GitLab, Jira) for new findings
### 2. Trigger a scan
Each phase produces results visible in the dashboard immediately.
Click the **Scan** button on your repository row. The scan runs in the background through all phases: cloning, static analysis, SBOM extraction, CVE checking, graph building, and issue sync.
### 3. View findings
Once the scan completes, navigate to **Findings** to see everything that was discovered. Each finding includes a severity level, description, code evidence, and AI-generated remediation guidance.
![Findings list with filters](/screenshots/findings-list.png)
## Next Steps
- [Add your first repository](/guide/repositories)
- [Understand scan results](/guide/findings)
- [Configure integrations](/guide/configuration)
- [Add and configure repositories](/guide/repositories) -- including private repos and issue tracker setup
- [Understand how scans work](/guide/scanning) -- phases, triggers, and deduplication
- [Work with findings](/guide/findings) -- triage, false positives, and developer feedback
- [Explore your SBOM](/guide/sbom) -- dependencies, licenses, and exports
docs/guide/issues.md (new file)
@@ -0,0 +1,56 @@
# Issues & Tracking
Certifai automatically creates issues in your existing issue trackers when new security findings are discovered. This integrates security into your development workflow without requiring teams to check a separate tool.
## How Issues Are Created
When a scan discovers new findings, the following happens automatically:
1. Each new finding is checked against existing issues using its fingerprint
2. If no matching issue exists, a new issue is created in the configured tracker
3. The issue includes the finding title, severity, vulnerability details, file location, and a link back to the finding in Certifai
4. The finding is updated with a link to the external issue
This means every actionable finding gets tracked in the same system your developers already use.
## Issues List
Navigate to **Issues** in the sidebar to see all tracker issues across your repositories.
![Issues list showing tracker issues](/screenshots/issues-list.png)
The issues table shows:
| Column | Description |
|--------|-------------|
| Tracker | Badge showing GitHub, GitLab, Gitea, or Jira |
| External ID | Issue number in the external system |
| Title | Issue title |
| Status | Open, Closed, or tracker-specific status |
| Created | When the issue was created |
| Link | Direct link to the issue in the external tracker |
Click the link to go directly to the issue in your tracker.
## Supported Trackers
| Tracker | How to Configure |
|---------|-----------------|
| **GitHub Issues** | Set up in the repository's issue tracker settings with your GitHub API token |
| **GitLab Issues** | Set up with your GitLab project ID, instance URL, and API token |
| **Gitea Issues** | Set up with your Gitea repository details, instance URL, and API token |
| **Jira** | Set up with your Jira project key, instance URL, email, and API token |
Issue tracker configuration is per-repository. You set it up when [adding or editing a repository](/guide/repositories#configuring-an-issue-tracker).
## Deduplication
Issues are deduplicated using the same fingerprint hash that deduplicates findings. This means:
- If the same vulnerability appears in consecutive scans, only one issue is created
- If a finding is resolved and then reappears, the platform recognizes it and can reopen the existing issue rather than creating a duplicate
- Different findings (even if similar) get separate issues because their fingerprints differ based on file path, line number, and vulnerability type
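The dedup-before-create behavior can be sketched as follows. The data shapes here are illustrative assumptions, not the platform's internal model:

```python
# Sketch: create tracker issues only for findings whose fingerprint has no
# existing issue. The dict shapes are illustrative, not Certifai's API.

def sync_issues(findings, existing_issues):
    """Return the fingerprints that need a new tracker issue."""
    known = {issue["fingerprint"] for issue in existing_issues}
    to_create = []
    for f in findings:
        if f["fingerprint"] not in known:
            to_create.append(f["fingerprint"])
            known.add(f["fingerprint"])  # also dedupe within a single scan
    return to_create

existing = [{"fingerprint": "abc123"}]
findings = [{"fingerprint": "abc123"}, {"fingerprint": "def456"}]
print(sync_issues(findings, existing))  # → ['def456']
```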
## Linked Issues in Finding Detail
When viewing a [finding's detail page](/guide/findings#finding-detail), you will see a **Linked Issue** section if an issue was created for that finding. This provides a direct link to the external tracker issue, making it easy to jump between the security context in Certifai and the development workflow in your tracker.
@@ -1,26 +1,78 @@
# Adding Repositories
Repositories are the core resource in Compliance Scanner. Each tracked repository is scanned on a schedule and its results are available across all features.
Repositories are the core resource in Certifai. Each tracked repository is scanned on a schedule, and its results are available across all features -- findings, SBOM, code graph, AI chat, and issue tracking.
## Adding a Repository
1. Navigate to **Repositories** in the sidebar
2. Click **Add Repository** at the top of the page
2. Click **Add Repository**
3. Fill in the form:
- **Name** — A display name for the repository
- **Git URL** — The clone URL (HTTPS or SSH), e.g. `https://github.com/org/repo.git`
- **Default Branch** — The branch to scan, e.g. `main` or `master`
- **Name** -- a display name for the repository
- **Git URL** -- the clone URL (HTTPS or SSH), e.g. `https://github.com/org/repo.git` or `git@github.com:org/repo.git`
- **Default Branch** -- the branch to scan, e.g. `main` or `master`
4. Click **Add**
![Add repository dialog](/screenshots/add-repository.png)
The repository appears in the list immediately. It will not be scanned until you trigger a scan manually or the next scheduled scan runs.
## Public vs Private Repositories
**Public repositories** can be cloned using an HTTPS URL with no additional setup.
**Private repositories** require SSH access. When you add a repository with an SSH URL (e.g. `git@github.com:org/repo.git`), Certifai uses an SSH deploy key to authenticate.
### Getting the SSH Public Key
To grant Certifai access to a private repository:
1. Go to the **Repositories** page
2. Copy the platform's SSH public key displayed there
3. Add this key as a **deploy key** in your Git hosting provider:
- **GitHub**: Repository Settings > Deploy keys > Add deploy key
- **GitLab**: Repository Settings > Repository > Deploy keys
- **Gitea**: Repository Settings > Deploy Keys > Add Deploy Key
::: tip
For private repositories, configure a GitHub token (`GITHUB_TOKEN`) or GitLab token (`GITLAB_TOKEN`) in your environment. The agent uses these tokens when cloning.
Deploy keys are scoped to a single repository and are read-only by default. This is the recommended approach for granting Certifai access to private code.
:::
## Configuring an Issue Tracker
You can connect an issue tracker so that new findings are automatically created as issues in your existing workflow.
When adding or editing a repository, expand the **Issue Tracker** section to configure:
![Add repository dialog with issue tracker options](/screenshots/add-repository-tracker.png)
### Supported Trackers
| Tracker | Required Fields |
|---------|----------------|
| **GitHub Issues** | Repository owner, repository name, API token |
| **GitLab Issues** | Project ID, GitLab URL, API token |
| **Gitea Issues** | Repository owner, repository name, Gitea URL, API token |
| **Jira** | Project key, Jira URL, email, API token |
Each tracker is configured per-repository, so different repositories can use different trackers.
## Editing Repository Settings
Click the **Edit** button on any repository row to modify its settings, including the issue tracker configuration.
![Edit repository modal with tracker configuration](/screenshots/edit-repository.png)
From the edit modal you can:
- Change the repository name, Git URL, or default branch
- Add, modify, or remove issue tracker configuration
- View the webhook URL and secret for this repository (see [Webhooks & PR Reviews](/guide/webhooks))
## Repository List
The repositories page shows all tracked repositories with:
The repositories page shows all tracked repositories in a table.
![Repository list table](/screenshots/repositories-list.png)
| Column | Description |
|--------|-------------|
@@ -32,7 +84,7 @@ The repositories page shows all tracked repositories with:
## Triggering a Scan
Click the **Scan** button on any repository row to trigger an immediate scan. The scan runs in the background through all phases (clone, SAST, SBOM, CVE, graph). You can monitor progress on the Overview page under recent scan runs.
Click the **Scan** button on any repository row to trigger an immediate scan. The scan runs in the background through all phases (clone, SAST, SBOM, CVE, graph, issue sync). You can monitor progress on the Overview page under recent scan runs.
## Deleting a Repository
@@ -44,19 +96,6 @@ Click the **Delete** button on a repository row. A confirmation dialog appears w
- Code graph data
- Embedding vectors (for AI chat)
- CVE alerts
- Tracker issues
This action cannot be undone.
## Automatic Scanning
Repositories are scanned automatically on a schedule configured by the `SCAN_SCHEDULE` environment variable (cron format). The default is every 6 hours:
```
SCAN_SCHEDULE=0 0 */6 * * *
```
CVE monitoring runs on a separate schedule (default: daily at midnight):
```
CVE_MONITOR_SCHEDULE=0 0 0 * * *
```
docs/guide/sbom.md (new file)
@@ -0,0 +1,111 @@
# SBOM & Licenses
The SBOM (Software Bill of Materials) feature provides a complete inventory of all dependencies across your repositories, with vulnerability tracking and license compliance analysis.
## What is an SBOM?
A Software Bill of Materials is a list of every component (library, package, framework) that your software depends on, along with version numbers, licenses, and known vulnerabilities. SBOMs are increasingly required for compliance audits, customer security questionnaires, and supply chain transparency.
Certifai generates SBOMs automatically during each scan using Syft for dependency extraction and Grype for vulnerability matching.
## Packages Tab
Navigate to **SBOM** in the sidebar to see the packages tab, which lists all dependencies discovered during scans.
![SBOM packages tab with filters and export options](/screenshots/sbom-packages.png)
### Filtering
Use the filter bar to narrow results:
- **Repository** -- select a specific repository or view all
- **Package Manager** -- npm, cargo, pip, go, maven, nuget, composer, gem
- **Search** -- filter by package name
- **Vulnerabilities** -- show all packages, only those with vulnerabilities, or only clean packages
- **License** -- filter by specific license (MIT, Apache-2.0, BSD-3-Clause, GPL-3.0, etc.)
### Package Details
Each package row shows:
| Column | Description |
|--------|-------------|
| Package | Package name |
| Version | Installed version |
| Manager | Package manager (npm, cargo, pip, etc.) |
| License | License identifier with color-coded badge |
| Vulnerabilities | Count of known vulnerabilities (click to expand) |
### Vulnerability Details
Click the vulnerability count on any package to expand inline details showing:
- Vulnerability ID (e.g. CVE-2024-1234)
- Source database
- Severity level
- Link to the advisory
## License Compliance Tab
The license compliance tab helps you understand your licensing obligations across all dependencies.
![License compliance tab with copyleft warnings and distribution chart](/screenshots/sbom-licenses.png)
### Copyleft Warnings
If any dependencies use copyleft licenses (GPL, AGPL, LGPL, MPL), a warning banner appears listing the affected packages. Copyleft licenses may impose distribution requirements on your software.
::: warning
Copyleft-licensed dependencies can require you to release your source code under the same license. Review flagged packages carefully with your legal team if you distribute proprietary software.
:::
### License Distribution
A horizontal bar chart visualizes the percentage breakdown of licenses across your dependencies, giving you a quick overview of your licensing profile.
### License Table
A detailed table lists every license found:
| Column | Description |
|--------|-------------|
| License | License identifier |
| Type | **Copyleft** or **Permissive** badge |
| Packages | List of packages using this license |
| Count | Number of packages |
**Copyleft licenses** (flagged as potentially restrictive): GPL-2.0, GPL-3.0, AGPL-3.0, LGPL-2.1, LGPL-3.0, MPL-2.0
**Permissive licenses** (generally safe for commercial use): MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, and others
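The copyleft flagging above amounts to a set-membership check. A minimal sketch, using the copyleft license list from this page (package names are made up for illustration):

```python
# Sketch: group SBOM packages by copyleft license, mirroring the
# copyleft/permissive split described above.

COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-2.1", "LGPL-3.0", "MPL-2.0"}

def copyleft_packages(packages):
    """Map each copyleft license to the package names that use it."""
    flagged = {}
    for name, license_id in packages:
        if license_id in COPYLEFT:
            flagged.setdefault(license_id, []).append(name)
    return flagged

pkgs = [("serde", "MIT"), ("readline", "GPL-3.0"), ("cairo", "LGPL-2.1")]
print(copyleft_packages(pkgs))  # → {'GPL-3.0': ['readline'], 'LGPL-2.1': ['cairo']}
```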
## Export
You can export your SBOM in industry-standard formats:
1. Select a repository (or export across all repositories)
2. Choose a format:
- **CycloneDX 1.5** -- JSON format widely supported by security tools
- **SPDX 2.3** -- Linux Foundation standard for license compliance
3. Click **Export**
4. The SBOM downloads as a JSON file
::: tip
SBOM exports are useful for compliance audits, customer security questionnaires, government procurement requirements, and supply chain transparency.
:::
## Compare Tab
Compare the dependency profiles of two repositories side by side:
1. Select **Repository A** from the first dropdown
2. Select **Repository B** from the second dropdown
3. View the comparison results:
| Section | Description |
|---------|-------------|
| **Only in A** | Packages present in repo A but not in repo B |
| **Only in B** | Packages present in repo B but not in repo A |
| **Version Diffs** | Same package with different versions between repos |
| **Common** | Count of packages that match exactly |
This is useful for auditing consistency across microservices, identifying dependency drift, and planning coordinated upgrades.
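The comparison logic can be sketched with plain set operations. Representing each repository's packages as a `{name: version}` dict is an illustrative shape, not the platform's internal model:

```python
# Sketch: compute "only in A", "only in B", version diffs, and the count
# of exactly matching packages for two repositories.

def compare_sboms(a, b):
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shared = set(a) & set(b)
    diffs = {n: (a[n], b[n]) for n in shared if a[n] != b[n]}
    common = sum(1 for n in shared if a[n] == b[n])
    return only_a, only_b, diffs, common

repo_a = {"lodash": "4.17.21", "axios": "1.6.0", "react": "18.2.0"}
repo_b = {"axios": "1.7.2", "react": "18.2.0", "vue": "3.4.0"}
print(compare_sboms(repo_a, repo_b))
# → (['lodash'], ['vue'], {'axios': ('1.6.0', '1.7.2')}, 1)
```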
@@ -1,20 +1,22 @@
# Running Scans
Scans are the primary workflow in Compliance Scanner. Each scan analyzes a repository for security vulnerabilities, dependency risks, and code structure.
Scans are the primary workflow in Certifai. Each scan analyzes a repository for security vulnerabilities, dependency risks, and code structure.
## Scan Types
## What Happens During a Scan
A full scan consists of multiple phases, each producing different types of findings:
When a scan is triggered, Certifai runs through these phases in order:
| Scan Type | What It Detects | Scanner |
|-----------|----------------|---------|
| **SAST** | Code-level vulnerabilities (injection, XSS, insecure crypto, etc.) | Semgrep |
| **SBOM** | Dependency inventory, outdated packages, known vulnerabilities | Syft |
| **CVE** | Known CVEs in dependencies cross-referenced against NVD | NVD API |
| **GDPR** | Personal data handling issues, consent violations | Custom rules |
| **OAuth** | OAuth/OIDC misconfigurations, insecure token handling | Custom rules |
1. **Clone** -- pulls the latest code from the Git remote (or clones it for the first time)
2. **SAST** -- runs static analysis using Semgrep with rules covering OWASP, GDPR, OAuth, secrets, and general security patterns
3. **SBOM** -- extracts all dependencies using Syft, identifying packages, versions, licenses, and known vulnerabilities via Grype
4. **CVE Check** -- cross-references dependencies against the NVD database for known CVEs
5. **Graph Build** -- parses the codebase to construct a code knowledge graph of functions, classes, and their relationships
6. **AI Triage** -- new findings are reviewed by an LLM that assesses severity, considers blast radius using the code graph, and generates remediation guidance
7. **Issue Sync** -- creates or updates issues in connected trackers (GitHub, GitLab, Gitea, Jira) for new findings
## Triggering a Scan
Each phase produces results that are visible in the dashboard as soon as they complete.
## How Scans Are Triggered
### Manual Scan
@@ -24,60 +26,54 @@ A full scan consists of multiple phases, each producing different types of findi
### Scheduled Scans
Scans run automatically based on the `SCAN_SCHEDULE` cron expression. The default scans every 6 hours:
```
SCAN_SCHEDULE=0 0 */6 * * *
```
Repositories are scanned automatically on a recurring schedule. By default, scans run every 6 hours and CVE monitoring runs daily. Your administrator controls these schedules.
### Webhook-Triggered Scans
Configure GitHub or GitLab webhooks to trigger scans on push events. Set the webhook URL to:
When you configure a webhook in your Git hosting provider, scans are triggered automatically on push events. You can also get automated PR reviews. See [Webhooks & PR Reviews](/guide/webhooks) for setup instructions.
```
http://<agent-host>:3002/webhook/github
http://<agent-host>:3002/webhook/gitlab
```
## Scan Phases and Statuses
And configure the corresponding webhook secret:
```
GITHUB_WEBHOOK_SECRET=your-secret
GITLAB_WEBHOOK_SECRET=your-secret
```
## Scan Phases
Each scan progresses through these phases in order:
1. **Queued** — Scan is waiting to start
2. **Cloning** — Repository is being cloned or updated
3. **Scanning** — Static analysis and SBOM extraction are running
4. **Analyzing** — CVE cross-referencing and graph construction
5. **Reporting** — Creating tracker issues for new findings
6. **Completed** — All phases finished successfully
If any phase fails, the scan status is set to **Failed** with an error message.
## Viewing Scan History
The Overview page shows the 10 most recent scan runs across all repositories, including:
- Repository name
- Scan status
- Current phase
- Number of findings discovered
- Start time and duration
## Scan Run Statuses
Each scan progresses through these statuses:
| Status | Meaning |
|--------|---------|
| `queued` | Waiting to start |
| `running` | Currently executing |
| `completed` | Finished successfully |
| `failed` | Stopped due to an error |
| **Queued** | Scan is waiting to start |
| **Running** | Currently executing scan phases |
| **Completed** | All phases finished successfully |
| **Failed** | Stopped due to an error |
## Deduplication
You can monitor scan progress on the Overview page, which shows the most recent scan runs across all repositories, including the current phase, finding count, and duration.
Findings are deduplicated using a fingerprint hash based on the scanner, file path, line number, and vulnerability type. Repeated scans will not create duplicate findings for the same issue.
## Scan Types
A full scan runs multiple analysis engines, each producing different types of findings:
| Scan Type | What It Detects | Scanner |
|-----------|----------------|---------|
| **SAST** | Code-level vulnerabilities (injection, XSS, insecure crypto, etc.) | Semgrep |
| **SBOM** | Dependency inventory, outdated packages, known vulnerabilities | Syft + Grype |
| **CVE** | Known CVEs in dependencies cross-referenced against NVD | NVD API |
| **GDPR** | Personal data handling issues, consent violations | Custom rules |
| **OAuth** | OAuth/OIDC misconfigurations, insecure token handling | Custom rules |
| **Secrets** | Hardcoded credentials, API keys, tokens in source code | Custom rules |
| **Code Review** | Architecture and security patterns reviewed by AI | LLM-powered |
## Deduplication and Fingerprinting
Findings are deduplicated using a fingerprint hash based on the scanner, file path, line number, and vulnerability type. This means:
- **Repeated scans** will not create duplicate findings for the same issue
- **Tracker issues** are only created once per unique finding
- **Resolved findings** that reappear in a new scan are flagged for re-review
The fingerprint is also used to match findings to existing tracker issues, preventing duplicate issues from being created in GitHub, GitLab, Gitea, or Jira.
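A fingerprint over those fields can be sketched as a simple hash. The concatenation format and use of SHA-256 here are assumptions for illustration; the docs only specify which fields contribute:

```python
# Sketch: a stable fingerprint over scanner, file path, line number, and
# vulnerability type. The exact hashing scheme is an assumption.
import hashlib

def fingerprint(scanner, path, line, vuln_type):
    raw = f"{scanner}:{path}:{line}:{vuln_type}"
    return hashlib.sha256(raw.encode()).hexdigest()

fp1 = fingerprint("semgrep", "src/db.rs", 42, "sql-injection")
fp2 = fingerprint("semgrep", "src/db.rs", 42, "sql-injection")
fp3 = fingerprint("semgrep", "src/db.rs", 43, "sql-injection")
print(fp1 == fp2, fp1 == fp3)  # → True False
```

Identical inputs always produce the same fingerprint, so re-scans match existing findings; changing any field (here, the line number) yields a different fingerprint and therefore a distinct finding.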
## Interpreting Results
After a scan completes, you can explore results in several ways:
- **Findings** -- browse all discovered vulnerabilities with filters for severity, type, and status. See [Understanding Findings](/guide/findings).
- **SBOM** -- review your dependency inventory, check for vulnerable packages, and audit license compliance. See [SBOM & Licenses](/guide/sbom).
- **Overview** -- check the dashboard for a high-level summary of your security posture across all repositories.
- **Issues** -- see which findings have been pushed to your issue tracker. See [Issues & Tracking](/guide/issues).
docs/guide/webhooks.md (new file)
@@ -0,0 +1,87 @@
# Webhooks & PR Reviews
Webhooks let Certifai respond to events in your Git repositories automatically. When configured, pushes to your repository trigger scans, and pull requests receive automated security reviews.
## What Webhooks Enable
- **Automatic scans on push** -- every time code is pushed to your default branch, a scan is triggered automatically
- **PR security reviews** -- when a pull request is opened or updated, Certifai scans the changes and posts a review comment summarizing any security findings in the diff
## Finding the Webhook URL and Secret
Each repository in Certifai has its own webhook URL and secret:
1. Go to **Repositories**
2. Click **Edit** on the repository you want to configure
3. In the edit modal, you will find the **Webhook URL** and **Webhook Secret**
4. Copy both values -- you will need them when configuring your Git hosting provider
## Setting Up Webhooks
### Gitea
1. Go to your repository in Gitea
2. Navigate to **Settings > Webhooks > Add Webhook > Gitea**
3. Set the **Target URL** to the webhook URL from Certifai
4. Set the **Secret** to the webhook secret from Certifai
5. Under **Trigger On**, select:
- **Push Events** -- for automatic scans on push
- **Pull Request Events** -- for PR security reviews
6. Set the content type to `application/json`
7. Click **Add Webhook**
### GitHub
1. Go to your repository on GitHub
2. Navigate to **Settings > Webhooks > Add webhook**
3. Set the **Payload URL** to the webhook URL from Certifai
4. Set the **Content type** to `application/json`
5. Set the **Secret** to the webhook secret from Certifai
6. Under **Which events would you like to trigger this webhook?**, select **Let me select individual events**, then check:
- **Pushes** -- for automatic scans on push
- **Pull requests** -- for PR security reviews
7. Click **Add webhook**
### GitLab
1. Go to your project in GitLab
2. Navigate to **Settings > Webhooks**
3. Set the **URL** to the webhook URL from Certifai
4. Set the **Secret token** to the webhook secret from Certifai
5. Under **Trigger**, check:
- **Push events** -- for automatic scans on push
- **Merge request events** -- for PR security reviews
6. Click **Add webhook**
## PR Review Flow
When a pull request (or merge request) is opened or updated, the following happens:
1. Your Git provider sends a webhook event to Certifai
2. Certifai checks out the PR branch and runs a targeted scan on the changed files
3. Findings specific to the changes in the PR are identified
4. Certifai posts a review comment on the PR summarizing:
- Number of new findings introduced by the changes
- Severity breakdown
- Details for each finding including file, line, and remediation guidance
This gives developers immediate security feedback in their pull request workflow, before code is merged.
::: tip
PR reviews focus only on changes introduced in the pull request, not the entire codebase. This keeps reviews relevant and actionable.
:::
## Events to Select
Here is a summary of which events to enable for each feature:
| Feature | Gitea | GitHub | GitLab |
|---------|-------|--------|--------|
| Scan on push | Push Events | Pushes | Push events |
| PR reviews | Pull Request Events | Pull requests | Merge request events |
You can enable one or both depending on your workflow.
::: warning
Make sure the webhook secret matches exactly between your Git provider and Certifai. Requests with an invalid signature are rejected.
:::
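Signature checking typically works as follows. GitHub, for example, sends an `X-Hub-Signature-256` header of the form `sha256=<hex digest>`, an HMAC-SHA256 of the request body keyed with the webhook secret; how Certifai validates signatures internally is not shown here, so treat this as a general sketch:

```python
# Sketch: validate a GitHub-style webhook signature header against the
# shared secret using HMAC-SHA256 and a constant-time comparison.
import hashlib
import hmac

def signature_valid(secret: bytes, body: bytes, header: str) -> bool:
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header)

secret = b"my-webhook-secret"
body = b'{"ref": "refs/heads/main"}'
good = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(signature_valid(secret, body, good))        # → True
print(signature_valid(secret, body, "sha256=00")) # → False
```

If the secrets differ on either side, every request fails this check, which is why a mismatched secret silently breaks webhook delivery.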
@@ -2,7 +2,7 @@
layout: home
hero:
name: Compliance Scanner
name: Certifai
text: AI-Powered Security Compliance
tagline: Automated SAST, SBOM, DAST, CVE monitoring, and code intelligence for your repositories
actions:
@@ -14,16 +14,16 @@ hero:
link: /features/overview
features:
- title: Static Analysis (SAST)
details: Automated security scanning with Semgrep, detecting vulnerabilities across multiple languages including OWASP patterns, GDPR issues, and OAuth misconfigurations.
- title: Smart Findings with AI Triage
details: Every finding is triaged by an LLM that considers severity, blast radius, and codebase context. You get a confidence score, rationale, and remediation guidance -- not just raw scanner output.
- title: SBOM & License Compliance
details: Full software bill of materials with dependency inventory, vulnerability tracking, license compliance analysis, and export to CycloneDX/SPDX formats.
details: Full software bill of materials with dependency inventory, vulnerability tracking, license compliance analysis, and export to CycloneDX and SPDX formats.
- title: Dynamic Testing (DAST)
details: Black-box security testing of live web applications and APIs. Crawls endpoints, fuzzes parameters, and detects SQL injection, XSS, SSRF, and auth bypass vulnerabilities.
- title: Code Knowledge Graph
details: Interactive visualization of your codebase structure. Understand function calls, class hierarchies, and module dependencies with community detection.
- title: Impact Analysis
details: When a vulnerability is found, see exactly which entry points and call chains are affected. Understand blast radius before prioritizing fixes.
details: Interactive visualization of your codebase structure. Understand function calls, class hierarchies, and module dependencies at a glance.
- title: AI-Powered Chat
details: Ask questions about your codebase using RAG-powered AI. Code is embedded as vectors and retrieved contextually to give accurate, source-referenced answers.
details: Ask questions about your codebase using RAG-powered AI. Code is embedded and retrieved contextually to give accurate, source-referenced answers.
- title: MCP Integration
details: Expose your security data to LLM tools like Claude and Cursor through the Model Context Protocol. Query findings, SBOMs, and DAST results from any MCP-compatible client.
---
(12 binary screenshot files added, not shown)
@@ -0,0 +1,70 @@
# Glossary
A reference of key terms used throughout Certifai.
## Security Terms
**SAST (Static Application Security Testing)**
Analysis of source code to find vulnerabilities without running the application. Certifai uses Semgrep for SAST scanning.
**DAST (Dynamic Application Security Testing)**
Testing a running application by sending crafted requests and analyzing responses. Finds vulnerabilities that only appear at runtime.
**SBOM (Software Bill of Materials)**
A complete inventory of all software components (libraries, packages, frameworks) that your application depends on, including versions and licenses.
**CVE (Common Vulnerabilities and Exposures)**
A standardized identifier for publicly known security vulnerabilities. Each CVE has a unique ID (e.g. CVE-2024-1234) and is tracked in the National Vulnerability Database.
**False Positive**
A finding that is flagged as a vulnerability by a scanner but is not actually a security issue in context. For example, a SQL injection warning on a query that uses parameterized statements correctly.
**Triage**
The process of reviewing a security finding and deciding what to do with it: confirm it as real, mark it as a false positive, or accept the risk and ignore it.
**Fingerprint**
A unique hash generated for each finding based on the scanner, file path, line number, and vulnerability type. Used for deduplication so the same issue is not reported twice.
**Confidence Score**
A value from 0.0 to 1.0 assigned by the AI triage engine, indicating how certain the LLM is about its assessment of a finding.
**CWE (Common Weakness Enumeration)**
A community-developed list of software and hardware weakness types. Findings often reference a CWE ID to categorize the type of vulnerability.
**CVSS (Common Vulnerability Scoring System)**
A standardized framework for rating the severity of security vulnerabilities on a scale of 0.0 to 10.0.
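The CVSS v3.x qualitative bands defined by the specification (None 0.0, Low 0.1–3.9, Medium 4.0–6.9, High 7.0–8.9, Critical 9.0–10.0) can be expressed directly:

```python
# Sketch: map a CVSS v3.x base score to its qualitative severity rating,
# using the bands from the CVSS specification.

def cvss_severity(score: float) -> str:
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8), cvss_severity(5.3))  # → Critical Medium
```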
## License Terms
**Copyleft License**
A license that requires derivative works to be distributed under the same license terms. Examples: GPL-2.0, GPL-3.0, AGPL-3.0, LGPL-2.1, LGPL-3.0, MPL-2.0.
**Permissive License**
A license that allows broad freedom to use, modify, and distribute software with minimal restrictions. Examples: MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC.
## Standards and Formats
**CycloneDX**
An OWASP standard for SBOM formats. Certifai supports export in CycloneDX 1.5 JSON format.
**SPDX (Software Package Data Exchange)**
A Linux Foundation standard for communicating software bill of materials information. Certifai supports export in SPDX 2.3 format.
## Tools
**Semgrep**
An open-source static analysis tool that finds bugs and enforces code standards using pattern-matching rules. Used by Certifai for SAST scanning.
**Syft**
An open-source tool for generating SBOMs from container images and filesystems. Used by Certifai to extract dependency information.
**Grype**
An open-source vulnerability scanner for container images and filesystems. Used by Certifai to match dependencies against known vulnerabilities.
## Protocols
**MCP (Model Context Protocol)**
An open standard that allows LLM-powered tools to connect to external data sources and call tools. Certifai exposes security data through MCP so AI assistants can query findings, SBOMs, and DAST results.
**PKCE (Proof Key for Code Exchange)**
An extension to the OAuth 2.0 authorization code flow that prevents authorization code interception attacks. Used in Certifai's authentication flow.
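The S256 challenge method defined by RFC 7636 can be sketched in a few lines: the client keeps a random `code_verifier` secret and sends only its hashed, base64url-encoded `code_challenge` in the authorization request, then reveals the verifier when exchanging the code:

```python
import base64
import hashlib
import secrets

def make_challenge(code_verifier: str) -> str:
    # S256: BASE64URL(SHA256(ASCII(code_verifier))), padding stripped
    # (RFC 7636, section 4.2)
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# A fresh high-entropy verifier is generated per authorization request
# (RFC 7636, section 4.1)
verifier = secrets.token_urlsafe(32)
challenge = make_challenge(verifier)
```

Because only the challenge travels in the front-channel request, an attacker who intercepts the authorization code cannot redeem it without the original verifier.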

# Tools & Scanners
Certifai uses a combination of open-source scanners and AI-powered analysis to provide comprehensive security coverage. This page describes each tool and how it contributes to the scan pipeline.
## Semgrep -- Static Analysis (SAST)
[Semgrep](https://semgrep.dev/) is an open-source static analysis tool that finds vulnerabilities by matching patterns in source code. It supports many languages and has an extensive rule library.
Certifai runs Semgrep with rules covering:
- **OWASP Top 10** -- injection, broken authentication, XSS, insecure deserialization, and more
- **General security** -- insecure cryptography, hardcoded credentials, path traversal
- **Language-specific** -- patterns unique to Python, JavaScript, TypeScript, Rust, Go, Java, and others
Semgrep produces SAST-type findings with file paths, line numbers, and rule descriptions.
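To see what these findings look like at the source, you can read Semgrep's `--json` report directly. The field names below (`results[].check_id`, `path`, `start.line`, `extra.severity`) follow Semgrep's documented JSON output; the sample data itself is invented:

```python
# Minimal sample in the shape of `semgrep --json` output (invented data).
report = {
    "results": [
        {
            "check_id": "python.lang.security.audit.sql-injection",
            "path": "src/db.py",
            "start": {"line": 42},
            "extra": {"message": "Possible SQL injection", "severity": "ERROR"},
        }
    ]
}

# Flatten each result into (path, line, rule, severity) for display.
findings = [
    (r["path"], r["start"]["line"], r["check_id"], r["extra"]["severity"])
    for r in report["results"]
]
for path, line, rule, severity in findings:
    print(f"{path}:{line} {rule} ({severity})")
```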
## Syft -- SBOM Generation
[Syft](https://github.com/anchore/syft) is an open-source tool for generating Software Bills of Materials. It scans your repository and identifies every dependency, including:
- Package name and version
- Package manager (npm, cargo, pip, go, maven, nuget, composer, gem)
- License information
Syft output feeds into both the SBOM feature and the vulnerability scanning pipeline.
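The package manager for each component is encoded in its package URL (purl), e.g. `pkg:npm/lodash@4.17.21` identifies the `npm` ecosystem. A minimal parse, assuming the simple `pkg:<type>/<name>@<version>` shape with no namespace or qualifiers:

```python
def parse_purl(purl: str) -> tuple[str, str, str]:
    # Illustrative sketch for the simple purl shape only; real purls can
    # also carry a namespace, qualifiers, and a subpath.
    body = purl.removeprefix("pkg:")
    ecosystem, _, rest = body.partition("/")
    name, _, version = rest.rpartition("@")
    return ecosystem, name, version

print(parse_purl("pkg:npm/lodash@4.17.21"))
```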
## Grype -- Vulnerability Scanning
[Grype](https://github.com/anchore/grype) is an open-source vulnerability scanner that matches your dependencies against known vulnerability databases. It takes Syft's SBOM output and cross-references it against:
- National Vulnerability Database (NVD)
- GitHub Advisory Database
- OS-specific advisory databases
Grype produces SBOM-type findings with CVE identifiers, severity ratings, and links to advisories.
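A sketch of reading Grype's JSON report: the field names (`matches[].vulnerability.id`/`severity` and `matches[].artifact.name`/`version`) follow Grype's documented output format, but the sample data is invented:

```python
# Minimal sample in the shape of `grype -o json` output (invented data).
report = {
    "matches": [
        {
            "vulnerability": {"id": "CVE-2024-1234", "severity": "High"},
            "artifact": {"name": "lodash", "version": "4.17.20"},
        }
    ]
}

# Summarize each match as "package@version: CVE (severity)".
summaries = [
    f'{m["artifact"]["name"]}@{m["artifact"]["version"]}: '
    f'{m["vulnerability"]["id"]} ({m["vulnerability"]["severity"]})'
    for m in report["matches"]
]
for line in summaries:
    print(line)
```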
## Custom OAuth Scanner
A purpose-built scanner that detects OAuth and OIDC misconfigurations in your code, including:
- Missing state parameter validation
- Insecure token storage
- Incorrect redirect URI handling
- Missing PKCE implementation
- Token exposure in logs or URLs
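As a hypothetical example of the first bullet, the authorization request below builds its URL without a `state` parameter, which leaves the callback open to CSRF. All names here (`idp.example`, `my-client`, the redirect URI) are invented for illustration:

```python
from urllib.parse import urlencode

params = {
    "response_type": "code",
    "client_id": "my-client",
    "redirect_uri": "https://app.example/callback",
    # "state": ...  <- missing: nothing ties the callback to this request,
    # so an attacker can splice their own authorization response into
    # the victim's session (CSRF on the redirect)
}
auth_url = "https://idp.example/authorize?" + urlencode(params)
print(auth_url)
```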
## Custom GDPR Scanner
A scanner focused on data protection compliance, detecting:
- Personal data handling without consent checks
- Missing data retention policies
- Unencrypted PII storage
- Cross-border data transfer issues
## Custom Secrets Scanner
Detects hardcoded secrets and credentials in source code:
- API keys and tokens
- Database connection strings with embedded passwords
- Private keys and certificates
- Cloud provider credentials (AWS, GCP, Azure)
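Scanners like this typically work by pattern-matching known credential formats. The sketch below checks a single well-known pattern, AWS access key IDs (`AKIA` followed by 16 uppercase alphanumerics), using AWS's documented example key; a real secrets scanner applies many more patterns plus entropy checks:

```python
import re

# One illustrative pattern only: AWS access key IDs.
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

line = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'  # AWS's documented example key
leaked = bool(AWS_KEY.search(line))
print(leaked)
```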
## LLM-Powered Code Review
Beyond rule-based scanning, Certifai uses an LLM to perform architectural and security code review. The AI reviews code patterns that are too nuanced for static rules, such as:
- Business logic flaws
- Race conditions
- Improper error handling that leaks information
- Insecure design patterns
Code review findings are marked with the **Code Review** type.
## LLM-Powered Triage
Every finding -- regardless of which scanner produced it -- goes through AI triage. Here is how it works:
1. **Context gathering** -- the triage engine collects the finding details, the code snippet, and information from the code knowledge graph (what calls this code, what it calls, how it connects to entry points)
2. **Severity assessment** -- the LLM evaluates the finding considering:
- The vulnerability type and its typical impact
- The specific code context (is this in a test file? behind authentication? in dead code?)
- The blast radius -- how many entry points and call chains are affected, based on the code graph
3. **Confidence scoring** -- the LLM assigns a confidence score (0.0 to 1.0) indicating how certain it is about the assessment
4. **Rationale generation** -- the LLM writes a human-readable explanation of why it assigned the severity and confidence it did
5. **Remediation guidance** -- the LLM generates step-by-step fix instructions and, where possible, a suggested code fix
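The five steps above can be pictured as producing a record like the following. The field names and values are illustrative only, not Certifai's actual schema:

```python
# Hypothetical shape of one triage result (steps 2-5 above).
triage = {
    "finding_id": "f-123",                 # invented identifier
    "severity": "high",                    # step 2: severity assessment
    "confidence": 0.85,                    # step 3: 0.0-1.0 confidence score
    "rationale": (                         # step 4: human-readable explanation
        "Reachable from the /login entry point; user input flows into the "
        "query without sanitization."
    ),
    "remediation": (                       # step 5: fix guidance
        "Use a parameterized query instead of string formatting."
    ),
}

assert 0.0 <= triage["confidence"] <= 1.0
```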
### Learning from Feedback
When you mark findings as false positives or provide developer feedback, this information is used to improve future triage accuracy. Over time, the AI becomes better at understanding which findings are actionable in your specific codebase and which are noise.
::: tip
The AI triage is a starting point, not a final verdict. Always review the rationale and code evidence before acting on a finding. See [Understanding Findings](/guide/findings#human-in-the-loop) for more on the human-in-the-loop workflow.
:::