Add VitePress documentation site with complete user guides

Covers getting started, repositories, scanning, findings, configuration, SBOM, code graph, impact analysis, DAST, AI chat, issue tracker integration, Docker deployment, environment variables, Keycloak auth, and OpenTelemetry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:18:58 +01:00
parent 65abc55915
commit 94552d1626
21 changed files with 4019 additions and 0 deletions
@@ -0,0 +1,79 @@
+# AI Chat (RAG)
+
+The AI Chat feature lets you ask natural language questions about your codebase. It uses Retrieval-Augmented Generation (RAG) to find relevant code and provide accurate, source-referenced answers.
+
+## How It Works
+
+1. **Code graph** is built for the repository (functions, classes, modules)
+2. **Embeddings** are generated for each code symbol using the configured embedding model (`LITELLM_EMBED_MODEL`)
+3. When you ask a question, your query is **embedded** and compared against code embeddings
+4. The **top 8 most relevant** code snippets are retrieved
+5. These snippets are sent as context to the LLM along with your question
+6. The LLM generates a response **grounded in your actual code**
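
The retrieval in steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: it assumes cosine similarity as the comparison metric (the guide does not name one), and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_snippets(query_vec, symbol_vecs, k=8):
    """Return the names of the k symbols whose embeddings best match the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in symbol_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
symbol_vecs = {
    "auth::validate_jwt": [0.9, 0.1, 0.0],
    "db::connect": [0.0, 0.2, 0.9],
    "auth::login": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

print(top_k_snippets(query_vec, symbol_vecs, k=2))
```

The default `k=8` mirrors the "top 8" in step 4. A real deployment would typically delegate the search to a vector index rather than scoring every symbol in a loop.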
+
+## Getting Started
+
+### 1. Select a Repository
+
+Navigate to **AI Chat** in the sidebar. You'll see a grid of repository cards. Click one to open the chat interface.
+
+### 2. Build Embeddings
+
+Before chatting, you need to build embeddings for the repository:
+
+1. Click **Build Embeddings**
+2. Wait for the process to complete — a progress bar shows `X/Y chunks`
+3. Once the status shows **Embeddings ready**, the chat input is enabled
+
+::: info
+Embedding builds require:
+- A code graph already built for the repository (via the Graph feature)
+- A configured embedding model (`LITELLM_EMBED_MODEL`)
+
+The default model is `text-embedding-3-small`.
+:::
+
+### 3. Ask Questions
+
+Type your question in the input area and press Enter (or click Send). Examples:
+
+- "How does authentication work in this codebase?"
+- "What functions handle database connections?"
+- "Explain the error handling pattern used in this project"
+- "Where are the API routes defined?"
+- "What does the `process_scan` function do?"
+
+## Understanding Responses
+
+### Answer
+
+The AI response is a natural language answer to your question, grounded in the actual source code of your repository.
+
+### Source References
+
+Below each response, you'll see source references showing exactly which code was used to generate the answer:
+
+- **Symbol name** — The qualified name of the function/class/module
+- **File path** — Where the code is located, with line range
+- **Code snippet** — The first ~10 lines of the relevant code
+- **Relevance score** — How closely the code matched your question (0.0 to 1.0)
+
+## Conversation Context
+
+The chat maintains conversation history within a session. You can ask follow-up questions that reference previous answers. The system sends the last 10 messages as context to maintain coherence.
+
+## Configuration
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `LITELLM_URL` | LiteLLM proxy URL | `http://localhost:4000` |
+| `LITELLM_API_KEY` | API key for the LLM provider | — |
+| `LITELLM_MODEL` | Model for chat responses | `gpt-4o` |
+| `LITELLM_EMBED_MODEL` | Model for code embeddings | `text-embedding-3-small` |
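
As a rough illustration of how the defaults in this table apply, the sketch below resolves each variable from the environment and falls back to the documented default. The `load_chat_config` helper is hypothetical and is not part of the product.

```python
import os

# Defaults copied from the configuration table; LITELLM_API_KEY has none.
DEFAULTS = {
    "LITELLM_URL": "http://localhost:4000",
    "LITELLM_MODEL": "gpt-4o",
    "LITELLM_EMBED_MODEL": "text-embedding-3-small",
}


def load_chat_config(env=None):
    """Resolve chat settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # No default exists for the API key, so it stays None when unset.
    config["LITELLM_API_KEY"] = env.get("LITELLM_API_KEY")
    return config


print(load_chat_config({"LITELLM_MODEL": "my-local-model"}))
```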
+
+## Tips
+
+- **Be specific** — "How does the JWT validation middleware work?" is better than "Tell me about auth"
+- **Reference filenames** — "What does `server.rs` do?" helps the retrieval find relevant code
+- **Ask about patterns** — "What error handling pattern does this project use?" works well with RAG
+- **Rebuild after changes** — If the repository has been updated significantly, rebuild embeddings to include new code
@@ -0,0 +1,112 @@
+# DAST Scanning
+
+DAST (Dynamic Application Security Testing) performs black-box security testing against live web applications and APIs. Unlike SAST, which analyzes source code, DAST tests running applications by sending crafted requests and analyzing the responses.
+
+## DAST Overview
+
+Navigate to **DAST** in the sidebar to see the overview page with:
+
+- Total DAST scans performed
+- Total DAST findings discovered
+- Number of active targets
+- Recent scan run history with status, phase, and finding counts
+
+## Managing Targets
+
+Navigate to **DAST > Targets** to configure applications to test.
+
+### Adding a Target
+
+1. Enter a **target name** (descriptive label)
+2. Enter the **base URL** (e.g. `https://staging.example.com`)
+3. Click **Add Target**
+
+### Target Configuration
+
+Each target supports these settings:
+
+| Setting | Description | Default |
+|---------|-------------|---------|
+| **Target Type** | WebApp, REST API, or GraphQL | WebApp |
+| **Max Crawl Depth** | How many link levels to follow | 5 |
+| **Rate Limit** | Maximum requests per second | 10 |
+| **Destructive Tests** | Allow DELETE/PUT requests | No |
+| **Excluded Paths** | URL paths to skip during testing | — |
+
+### Authentication
+
+DAST supports authenticated scanning with multiple methods:
+
+| Method | Configuration |
+|--------|--------------|
+| **None** | No authentication |
+| **Basic** | Username and password (HTTP Basic Auth) |
+| **Bearer** | Bearer token (Authorization header) |
+| **Cookie** | Session cookie value |
+| **Form** | Login URL, username field, password field, and credentials |
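
For orientation, the header-based methods in this table correspond to standard HTTP request headers. The sketch below shows what each one looks like on the wire. The `auth_headers` helper and its argument names are hypothetical, and Form authentication is left out because it requires a login request rather than a static header.

```python
import base64


def auth_headers(method, **credentials):
    """Build the HTTP headers for a header-based authentication method."""
    if method == "none":
        return {}
    if method == "basic":
        # HTTP Basic Auth: base64("username:password") in the Authorization header.
        raw = f"{credentials['username']}:{credentials['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if method == "bearer":
        return {"Authorization": f"Bearer {credentials['token']}"}
    if method == "cookie":
        # The session cookie is sent as-is, e.g. "session=abc123".
        return {"Cookie": credentials["cookie"]}
    raise ValueError(f"unsupported header-based method: {method}")


print(auth_headers("basic", username="user", password="pass"))
print(auth_headers("bearer", token="example-token"))
```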
+
+::: warning
+Authenticated scans access more of the application surface. Only test applications you own or have explicit authorization to test.
+:::
+
+## Running a DAST Scan
+
+Click the **Scan** button on any target row. The scan runs through these phases:
+
+1. **Crawl** — Discovers pages, forms, and API endpoints by following links and analyzing JavaScript
+2. **Test** — Sends attack payloads to discovered parameters
+3. **Report** — Collects results and generates findings
+
+The scan uses a headless Chromium browser (the `chromium` service in Docker Compose) for JavaScript rendering during crawling.
+
+## DAST Scan Agents
+
+The scanner includes specialized testing agents:
+
+### API Fuzzer
+Tests API endpoints with malformed inputs, boundary values, and injection payloads.
+
+### XSS Scanner
+Detects Cross-Site Scripting vulnerabilities by injecting script payloads into form fields, URL parameters, and headers.
+
+### SSRF Scanner
+Tests for Server-Side Request Forgery by injecting internal URLs and cloud metadata endpoints into parameters.
+
+### Auth Bypass Scanner
+Tests for authentication and authorization bypass by manipulating tokens, sessions, and access control headers.
+
+## DAST Findings
+
+Navigate to **DAST > Findings** to see all discovered vulnerabilities.
+
+### Finding List
+
+Each finding shows:
+
+| Column | Description |
+|--------|-------------|
+| Severity | Critical, High, Medium, or Low |
+| Type | Vulnerability category (SQL Injection, XSS, SSRF, etc.) |
+| Title | Description of the vulnerability |
+| Endpoint | The HTTP path that is vulnerable |
+| Method | HTTP method (GET, POST, PUT, DELETE) |
+| Exploitable | Whether the vulnerability was confirmed exploitable |
+
+### Finding Detail
+
+Click a finding to see full details:
+
+- **Vulnerability type** and CWE identifier
+- **Endpoint URL** and HTTP method
+- **Parameter** that is vulnerable
+- **Exploitability** — Confirmed or Unconfirmed
+- **Description** — What the vulnerability is and why it matters
+- **Remediation** — How to fix the issue
+- **Evidence** — One or more request/response pairs showing:
+  - The crafted HTTP request (method, URL, headers)
+  - The payload that triggered the vulnerability
+  - The HTTP response status and relevant snippet
+
+::: tip
+Findings marked **Confirmed** were verified by the scanner with a successful attack. **Unconfirmed** findings show suspicious behavior that may indicate a vulnerability but that the scanner could not fully exploit.
+:::
@@ -0,0 +1,92 @@
+# Code Knowledge Graph
+
+The Code Knowledge Graph feature parses your repository's source code and builds an interactive graph of symbols (functions, classes, modules) and their relationships (calls, imports, inheritance).
+
+## Graph Index
+
+Navigate to **Code Graph** in the sidebar to see all repositories. Click a repository card to open its graph explorer.
+
+## Building a Graph
+
+Before exploring, you need to build the graph:
+
+1. Open the graph explorer for a repository
+2. Click **Build Graph**
+3. The agent parses all source files and constructs the graph
+4. A spinner shows build progress
+
+The graph builder supports these languages:
+- Rust
+- TypeScript
+- JavaScript
+- Python
+
+## Graph Explorer
+
+The graph explorer provides an interactive network visualization.
+
+### Canvas
+
+The main area renders an interactive network diagram using vis-network:
+
+- **Nodes** represent code symbols (functions, classes, structs, enums, traits, modules, files)
+- **Edges** represent relationships between symbols
+- Nodes are **color-coded by community** — clusters of highly connected symbols detected using Louvain community detection
+- Pan by dragging the background; zoom with the scroll wheel
+
+### Node Types
+
+| Type | Description |
+|------|-------------|
+| Function | Standalone functions |
+| Method | Methods on classes/structs |
+| Class | Classes (TypeScript, Python) |
+| Struct | Structs (Rust) |
+| Enum | Enumerations |
+| Interface | Interfaces (TypeScript) |
+| Trait | Traits (Rust) |
+| Module | Modules and namespaces |
+| File | Source files |
+
+### Edge Types
+
+| Type | Description |
+|------|-------------|
+| Calls | Function/method invocation |
+| Imports | Module or symbol import |
+| Inherits | Class inheritance |
+| Implements | Interface/trait implementation |
+| Contains | Parent-child containment (module contains function) |
+| TypeRef | Type reference or usage |
+
+### Statistics
+
+The statistics panel shows:
+- Total node and edge count
+- Number of detected communities
+- Languages found in the repository
+- File tree of the codebase
+
+### Search
+
+Search for symbols by name:
+
+1. Type at least 2 characters in the search box
+2. Matching symbols appear in a dropdown
+3. Click a result to highlight it on the canvas and open the inspector
+
+### Code Inspector
+
+When you click a node (on the canvas or from search), the inspector panel shows:
+
+- **Symbol name** and kind (function, class, etc.)
+- **File path** with line range
+- **Source code** excerpt from the file
+- **Connected nodes** — what this symbol calls, what calls it, etc.
+
+## Use Cases
+
+- **Onboarding** — Understand unfamiliar codebase structure at a glance
+- **Architecture review** — Identify tightly coupled modules and circular dependencies
+- **Security** — Trace data flow from entry points to sensitive operations
+- **Refactoring** — See what depends on code you plan to change
@@ -0,0 +1,42 @@
+# Impact Analysis
+
+Impact Analysis uses the Code Knowledge Graph to determine the blast radius of a security finding. When a vulnerability is found in a specific function or file, impact analysis traces the call graph to show everything that could be affected.
+
+## Accessing Impact Analysis
+
+Impact analysis is linked from the Graph Explorer. When viewing a repository's graph with findings, you can navigate to:
+
+```
+/graph/{repo_id}/impact/{finding_id}
+```
+
+## What You See
+
+### Blast Radius
+
+A count of the total number of code symbols (functions, methods, classes) affected by the vulnerability, both directly and transitively.
+
+### Entry Points Affected
+
+A list of **public entry points** — main functions, HTTP handlers, API endpoints — that could be impacted by the vulnerable code. These represent the ways an attacker could potentially reach the vulnerability.
+
+### Call Chains
+
+Complete call chain paths showing how execution flows from entry points through intermediate functions to the vulnerable code. Each chain shows the sequence of function calls.
+
+### Direct Callers
+
+The immediate functions that call the vulnerable function. These are the first layer of impact.
+
+## How It Works
+
+1. The finding's file path and line number are matched to a node in the code graph
+2. The graph is traversed **backwards** along call edges to find all callers
+3. Entry points (functions with no callers, or known patterns like `main`, HTTP handlers) are identified
+4. All paths from entry points to the vulnerable node are computed
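
The traversal described above can be sketched as a reverse walk over call edges. This is an illustration under assumptions, not the agent's code: the graph is a plain list of `(caller, callee)` pairs, entry points are simply affected symbols with no callers, the blast radius excludes the vulnerable symbol itself, and all names are hypothetical.

```python
def analyze_impact(call_edges, vulnerable):
    """Trace callers of `vulnerable` backwards through (caller, callee) edges."""
    # Invert the call graph: callee -> set of direct callers.
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(callee, set()).add(caller)

    # Step 2: walk backwards to collect every direct and transitive caller.
    affected = set()
    stack = [vulnerable]
    while stack:
        node = stack.pop()
        for caller in callers.get(node, set()):
            if caller != vulnerable and caller not in affected:
                affected.add(caller)
                stack.append(caller)

    # Step 3: treat affected symbols that nothing calls as entry points.
    entry_points = sorted(n for n in affected if not callers.get(n))

    # Step 4: enumerate cycle-free paths from entry points to the vulnerable node.
    chains = []

    def walk(node, path):
        direct = callers.get(node, set())
        if not direct:
            chains.append(list(reversed(path)))
            return
        for caller in sorted(direct):
            if caller not in path:  # skip recursion cycles
                walk(caller, path + [caller])

    walk(vulnerable, [vulnerable])

    return {
        "blast_radius": len(affected),
        "direct_callers": sorted(callers.get(vulnerable, set())),
        "entry_points": entry_points,
        "call_chains": chains,
    }


edges = [
    ("main", "handle_request"),
    ("handle_request", "parse_input"),
    ("cron_job", "parse_input"),
    ("parse_input", "run_query"),
]
result = analyze_impact(edges, "run_query")
print(result)
```

Path enumeration can grow quickly on large graphs, so a production implementation would likely cap the depth or the number of chains it reports.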
+
+## Use Cases
+
+- **Prioritization** — A critical vulnerability in a function called by 50 entry points is more urgent than one in dead code
+- **Remediation scoping** — Understand what tests need to run after a fix
+- **Risk assessment** — Quantify the actual exposure of a vulnerability
@@ -0,0 +1,72 @@
+# Issue Tracker Integration
+
+Compliance Scanner automatically creates issues in your existing issue trackers when new security findings are discovered. This integrates security into your development workflow without requiring teams to check a separate tool.
+
+## Supported Trackers
+
+| Tracker | Configuration Variables |
+|---------|----------------------|
+| **GitHub Issues** | `GITHUB_TOKEN` |
+| **GitLab Issues** | `GITLAB_URL`, `GITLAB_TOKEN` |
+| **Jira** | `JIRA_URL`, `JIRA_EMAIL`, `JIRA_API_TOKEN`, `JIRA_PROJECT_KEY` |
+
+## How It Works
+
+1. A scan discovers new findings
+2. For each new finding, the agent checks if an issue already exists (by fingerprint)
+3. If not, it creates an issue in the configured tracker with:
+   - Title matching the finding title
+   - Description with vulnerability details, severity, and file location
+   - Link back to the finding in the dashboard
+4. The finding is updated with the external issue URL
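
The deduplication in steps 2 to 4 might look like the sketch below. How Compliance Scanner actually computes a fingerprint is not described here, so the hash inputs (`rule_id`, `file_path`, `title`) are assumptions, as are the function names and the `create_issue` callback that stands in for a tracker API call.

```python
import hashlib


def fingerprint(finding):
    """Stable identifier for a finding (the field choice is an assumption)."""
    key = "|".join([finding["rule_id"], finding["file_path"], finding["title"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def sync_issues(findings, existing_issues, create_issue):
    """Create one tracker issue per fingerprint and link each finding to it.

    existing_issues maps fingerprint -> issue URL and is updated in place.
    create_issue is a callable that files the issue and returns its URL.
    """
    for finding in findings:
        fp = fingerprint(finding)
        if fp not in existing_issues:
            existing_issues[fp] = create_issue(finding)
        # Step 4: record the external issue URL on the finding.
        finding["issue_url"] = existing_issues[fp]
    return findings


created = []


def fake_create_issue(finding):
    # Stand-in for a real tracker call; records what would have been filed.
    created.append(finding["title"])
    return f"https://tracker.example/issues/{len(created)}"


first = {"rule_id": "sql-injection", "file_path": "src/db.py", "title": "SQL injection"}
duplicate = dict(first)
issues = {}
sync_issues([first, duplicate], issues, fake_create_issue)
print(created, first["issue_url"])
```

Keying on a content-derived fingerprint rather than a database ID means a finding rediscovered by a later scan maps to the same issue instead of opening a new one.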
+
+## Viewing Issues
+
+Navigate to **Issues** in the sidebar to see all tracker issues across your repositories.
+
+The issues table shows:
+
+| Column | Description |
+|--------|-------------|
+| Tracker | Badge showing GitHub, GitLab, or Jira |
+| External ID | Issue number in the external system |
+| Title | Issue title |
+| Status | Open, Closed, or tracker-specific status |
+| Created | When the issue was created |
+| Link | Direct link to the issue in the external tracker |
+
+Click the **Open** link to go directly to the issue in GitHub, GitLab, or Jira.
+
+## Configuration
+
+### GitHub
+
+```bash
+GITHUB_TOKEN=ghp_xxxx
+```
+
+Issues are created in the same repository that was scanned.
+
+### GitLab
+
+```bash
+GITLAB_URL=https://gitlab.com
+GITLAB_TOKEN=glpat-xxxx
+```
+
+Issues are created in the same project that was scanned.
+
+### Jira
+
+```bash
+JIRA_URL=https://your-org.atlassian.net
+JIRA_EMAIL=security-bot@example.com
+JIRA_API_TOKEN=your-api-token
+JIRA_PROJECT_KEY=SEC
+```
+
+All issues are created in the specified Jira project (`JIRA_PROJECT_KEY`).
+
+::: tip
+Use a dedicated service account for issue creation so that security findings are clearly attributed to automated scanning rather than individual team members.
+:::
@@ -0,0 +1,35 @@
+# Dashboard Overview
+
+The Overview page is the landing page of the Compliance Scanner dashboard. It gives you a high-level view of your security posture across all tracked repositories.
+
+## Statistics
+
+The top section displays key metrics:
+
+| Metric | Description |
+|--------|-------------|
+| **Repositories** | Total number of tracked repositories |
+| **Total Findings** | Combined count of all security findings |
+| **Critical** | Findings with critical severity |
+| **High** | Findings with high severity |
+| **Medium** | Findings with medium severity |
+| **Low** | Findings with low severity |
+| **Dependencies** | Total SBOM entries across all repositories |
+| **CVE Alerts** | Active CVE alerts from dependency monitoring |
+| **Tracker Issues** | Issues created in external trackers (GitHub, GitLab, Jira) |
+
+## Severity Distribution
+
+A visual bar chart shows the distribution of findings by severity level, giving you an immediate sense of your risk profile.
+
+## Recent Scan Runs
+
+The bottom section lists the 10 most recent scan runs across all repositories, showing:
+
+- Repository name
+- Scan status (queued, running, completed, failed)
+- Current phase
+- Number of findings discovered
+- Timestamp
+
+This helps you monitor scanning activity and quickly spot failures.
@@ -0,0 +1,106 @@
+# SBOM & License Compliance
+
+The SBOM (Software Bill of Materials) feature provides a complete inventory of all dependencies across your repositories, with vulnerability tracking and license compliance analysis.
+
+The SBOM page has three tabs: **Packages**, **License Compliance**, and **Compare**.
+
+## Packages Tab
+
+The packages tab lists all dependencies discovered during scans.
+
+### Filtering
+
+Use the filter bar to narrow results:
+
+- **Repository** — Select a specific repository or view all
+- **Package Manager** — npm, cargo, pip, go, maven, nuget, composer, gem
+- **Search** — Filter by package name
+- **Vulnerabilities** — Show all packages, only those with vulnerabilities, or only clean packages
+- **License** — Filter by specific license (MIT, Apache-2.0, BSD-3-Clause, GPL-3.0, etc.)
+
+### Package Details
+
+Each package row shows:
+
+| Column | Description |
+|--------|-------------|
+| Package | Package name |
+| Version | Installed version |
+| Manager | Package manager (npm, cargo, pip, etc.) |
+| License | License identifier with color-coded badge |
+| Vulnerabilities | Count of known vulnerabilities (click to expand) |
+
+### Vulnerability Details
+
+Click the vulnerability count to expand inline details showing:
+
+- Vulnerability ID (e.g. CVE-2024-1234)
+- Source database
+- Severity level
+- Link to the advisory
+
+### Export
+
+Export your SBOM in industry-standard formats:
+
+1. Select a format:
+   - **CycloneDX 1.5** — JSON format widely supported by security tools
+   - **SPDX 2.3** — Linux Foundation standard for license compliance
+2. Click **Export**
+3. The SBOM downloads as a JSON file
+
+::: tip
+SBOM exports are useful for compliance audits, customer security questionnaires, and supply chain transparency requirements.
+:::
+
+## License Compliance Tab
+
+The license compliance tab helps you understand your licensing obligations.
+
+### Copyleft Warning
+
+If any dependencies use copyleft licenses (GPL, AGPL, LGPL, MPL), a warning banner appears listing the affected packages and noting that they may impose distribution requirements.
+
+### License Distribution
+
+A horizontal bar chart visualizes the percentage breakdown of licenses across your dependencies.
+
+### License Table
+
+A detailed table lists every license found, with:
+
+| Column | Description |
+|--------|-------------|
+| License | License identifier |
+| Type | **Copyleft** or **Permissive** badge |
+| Packages | List of packages using this license |
+| Count | Number of packages |
+
+**Copyleft licenses** (flagged as potentially restrictive):
+- GPL-2.0, GPL-3.0
+- AGPL-3.0
+- LGPL-2.1, LGPL-3.0
+- MPL-2.0
+
+**Permissive licenses** (generally safe for commercial use):
+- MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, etc.
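
The classification above amounts to a lookup against two sets of license identifiers. The sketch below uses the identifiers listed in this section. The `Unknown` fallback for unlisted licenses is an assumption, since the guide only describes two badges, and the function names are hypothetical.

```python
# Identifiers taken from the lists above.
COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-2.1", "LGPL-3.0", "MPL-2.0"}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}


def classify_license(license_id):
    """Return the badge type for a license identifier."""
    if license_id in COPYLEFT:
        return "Copyleft"
    if license_id in PERMISSIVE:
        return "Permissive"
    return "Unknown"  # assumption: fallback for licenses in neither list


def copyleft_packages(packages):
    """List package names whose license would trigger the copyleft warning."""
    return sorted(
        name for name, license_id in packages.items()
        if classify_license(license_id) == "Copyleft"
    )


packages = {"left-pad": "MIT", "readline": "GPL-3.0", "serde": "Apache-2.0"}
print(copyleft_packages(packages))
```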
+
+## Compare Tab
+
+Compare the dependency profiles of two repositories side by side.
+
+1. Select **Repository A** from the first dropdown
+2. Select **Repository B** from the second dropdown
+3. View the diff results:
+
+| Section | Description |
+|---------|-------------|
+| **Only in A** | Packages present in repo A but not in repo B |
+| **Only in B** | Packages present in repo B but not in repo A |
+| **Version Diffs** | Same package, different versions between repos |
+| **Common** | Count of packages that match exactly |
+
+This is useful for:
+- Auditing consistency across microservices
+- Identifying dependency drift between environments
+- Planning dependency upgrades across projects
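
The four sections of the comparison map naturally onto set operations. The sketch below is a simplified illustration that assumes each repository's SBOM is reduced to a `{package: version}` mapping with one version per package. The function name and sample data are hypothetical.

```python
def compare_sboms(repo_a, repo_b):
    """Diff two {package: version} mappings into the Compare tab's sections."""
    names_a, names_b = set(repo_a), set(repo_b)
    shared = names_a & names_b
    return {
        "only_in_a": sorted(names_a - names_b),
        "only_in_b": sorted(names_b - names_a),
        # Same package in both repositories, but pinned to different versions.
        "version_diffs": {
            name: (repo_a[name], repo_b[name])
            for name in sorted(shared)
            if repo_a[name] != repo_b[name]
        },
        # Count of packages whose name and version match exactly.
        "common": sum(1 for name in shared if repo_a[name] == repo_b[name]),
    }


repo_a = {"serde": "1.0.200", "tokio": "1.37.0", "regex": "1.10.4"}
repo_b = {"serde": "1.0.200", "tokio": "1.36.0", "clap": "4.5.4"}
diff = compare_sboms(repo_a, repo_b)
print(diff)
```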