docs: update README and add help-chat, deduplication docs

README.md:
- Add DAST, pentesting, code graph, AI chat, MCP, help chat to features table
- Add Gitea to tracker list, multi-language LLM triage note
- Update architecture diagram with all 5 workspace crates
- Add new API endpoints (graph, DAST, chat, help, pentest)
- Update dashboard pages table (remove Settings, add 6 new pages)
- Update project structure with new directories
- Add Keycloak, Chromium to external services

New docs:
- docs/features/help-chat.md -- Help chat assistant usage, API, config
- docs/features/deduplication.md -- Finding dedup across SAST, DAST, PR, issues

Updated:
- docs/features/overview.md -- Add help chat section, update tracker list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 09:49:11 +02:00
parent 263a4e654a
commit 4d7efea683
4 changed files with 203 additions and 46 deletions
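
Note on the "Finding Dedup" feature row this commit adds to the README: the fingerprint idea for SAST findings might look roughly like the sketch below. This is only an illustration under stated assumptions, not the project's implementation. The `Finding` struct, its field names, and the choice of fields that make up the fingerprint are hypothetical, and `DefaultHasher` from the standard library stands in for SHA-256 so the example needs no external crates. The CWE-based dedup for DAST findings is not covered here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical finding shape; the real model is defined elsewhere in the project.
#[derive(Debug, Clone)]
struct Finding {
    repo_id: String,
    rule_id: String,
    file_path: String,
    line: u32,
    title: String,
}

/// Fingerprint over the fields assumed to identify a finding.
/// NOTE: DefaultHasher is a std-only stand-in; the README describes SHA-256.
fn fingerprint(f: &Finding) -> u64 {
    let mut hasher = DefaultHasher::new();
    f.repo_id.hash(&mut hasher);
    f.rule_id.hash(&mut hasher);
    f.file_path.hash(&mut hasher);
    f.line.hash(&mut hasher);
    hasher.finish()
}

/// Keep the first finding seen for each fingerprint, preserving input order.
fn dedup_findings(findings: Vec<Finding>) -> Vec<Finding> {
    let mut seen = HashSet::new();
    findings
        .into_iter()
        // `insert` returns false when the fingerprint was already present.
        .filter(|f| seen.insert(fingerprint(f)))
        .collect()
}
```

In this sketch, two findings that differ only in `title` collapse into one, while a finding on a different line is kept. A production version would likely persist fingerprints so that dedup also works across scan runs.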
--- a/README.md
+++ b/README.md
@@ -28,9 +28,9 @@

 ## About

-Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira) with evidence and remediation suggestions, reviews pull requests, and exposes a Dioxus-based dashboard for visualization.
+Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira/Gitea) with evidence and remediation suggestions, reviews pull requests with multi-pass LLM analysis, runs autonomous penetration tests, and exposes a Dioxus-based dashboard for visualization.

-> **How it works:** The agent runs as a lazy daemon -- it only scans when new commits are detected, triggered by cron schedules or webhooks. LLM-powered triage filters out false positives and generates actionable remediation.
+> **How it works:** The agent runs as a lazy daemon -- it only scans when new commits are detected, triggered by cron schedules or webhooks. LLM-powered triage filters out false positives and generates actionable remediation with multi-language awareness.

 ## Features

@@ -41,31 +41,38 @@ Compliance Scanner is an autonomous agent that continuously monitors git reposit
 | **CVE Monitoring** | OSV.dev batch queries, NVD CVSS enrichment, SearXNG context |
 | **GDPR Patterns** | Detect PII logging, missing consent, hardcoded retention, missing deletion |
 | **OAuth Patterns** | Detect implicit grant, missing PKCE, token in localStorage, token in URLs |
-| **LLM Triage** | Confidence scoring via LiteLLM to filter false positives |
-| **Issue Creation** | Auto-create issues in GitHub, GitLab, or Jira with code evidence |
-| **PR Reviews** | Post security review comments on pull requests |
-| **Dashboard** | Fullstack Dioxus UI with findings, SBOM, issues, and statistics |
-| **Webhooks** | GitHub (HMAC-SHA256) and GitLab webhook receivers for push/PR events |
+| **LLM Triage** | Multi-language-aware confidence scoring (Rust, Python, Go, Java, Ruby, PHP, C++) |
+| **Issue Creation** | Auto-create issues in GitHub, GitLab, Jira, or Gitea with dedup via fingerprints |
+| **PR Reviews** | Multi-pass security review (logic, security, convention, complexity) with dedup |
+| **DAST Scanning** | Black-box security testing with endpoint discovery and parameter fuzzing |
+| **AI Pentesting** | Autonomous LLM-orchestrated penetration testing with encrypted reports |
+| **Code Graph** | Interactive code knowledge graph with impact analysis |
+| **AI Chat (RAG)** | Natural language Q&A grounded in repository source code |
+| **Help Assistant** | Documentation-grounded help chat accessible from every dashboard page |
+| **MCP Server** | Expose live security data to Claude, Cursor, and other AI tools |
+| **Dashboard** | Fullstack Dioxus UI with findings, SBOM, issues, DAST, pentest, and graph |
+| **Webhooks** | GitHub, GitLab, and Gitea webhook receivers for push/PR events |
+| **Finding Dedup** | SHA-256 fingerprint dedup for SAST, CWE-based dedup for DAST findings |

 ## Architecture

 ```
-┌─────────────────────────────────────────────────────────────┐
-│                    Cargo Workspace                           │
-├──────────────┬──────────────────┬───────────────────────────┤
-│ compliance-  │ compliance-      │ compliance-               │
-│ core         │ agent            │ dashboard                 │
-│ (lib)        │ (bin)            │ (bin, Dioxus 0.7.3)       │
-│              │                  │                           │
-│ Models       │ Scan Pipeline    │ Fullstack Web UI          │
-│ Traits       │ LLM Client      │ Server Functions           │
-│ Config       │ Issue Trackers   │ Charts + Tables           │
-│ Errors       │ Scheduler        │ Settings Page             │
-│              │ REST API         │                           │
-│              │ Webhooks         │                           │
-└──────────────┴──────────────────┴───────────────────────────┘
-                        │
-                   MongoDB (shared)
+┌──────────────────────────────────────────────────────────────────────────┐
+│                          Cargo Workspace                                 │
+├──────────────┬──────────────────┬──────────────┬──────────┬─────────────┤
+│ compliance-  │ compliance-      │ compliance-  │ complian-│ compliance- │
+│ core (lib)   │ agent (bin)      │ dashboard    │ ce-graph │ mcp (bin)   │
+│              │                  │ (bin)        │ (lib)    │             │
+│ Models       │ Scan Pipeline    │ Dioxus 0.7   │ Tree-    │ MCP Server  │
+│ Traits       │ LLM Client       │ Fullstack UI │ sitter   │ Live data   │
+│ Config       │ Issue Trackers   │ Help Chat    │ Graph    │ for AI      │
+│ Errors       │ Pentest Engine   │ Server Fns   │ Embedds  │ tools       │
+│              │ DAST Tools       │              │ RAG      │             │
+│              │ REST API         │              │          │             │
+│              │ Webhooks         │              │          │             │
+└──────────────┴──────────────────┴──────────────┴──────────┴─────────────┘
+                                 │
+                            MongoDB (shared)
 ```

 ## Scan Pipeline (7 Stages)
@@ -84,11 +91,16 @@ Compliance Scanner is an autonomous agent that continuously monitors git reposit
 |-------|-----------|
 | Shared Library | `compliance-core` -- models, traits, config |
 | Agent | Axum REST API, git2, tokio-cron-scheduler, Semgrep, Syft |
-| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS |
+| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS 4 |
+| Code Graph | `compliance-graph` -- tree-sitter parsing, embeddings, RAG |
+| MCP Server | `compliance-mcp` -- Model Context Protocol for AI tools |
+| DAST | `compliance-dast` -- dynamic application security testing |
 | Database | MongoDB with typed collections |
-| LLM | LiteLLM (OpenAI-compatible API) |
-| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3) |
+| LLM | LiteLLM (OpenAI-compatible API for chat, triage, embeddings) |
+| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3), Gitea |
 | CVE Sources | OSV.dev, NVD, SearXNG |
+| Auth | Keycloak (OAuth2/PKCE, SSO) |
+| Browser Automation | Chromium (headless, for pentesting and PDF generation) |

 ## Getting Started

@@ -151,20 +163,35 @@ The agent exposes a REST API on port 3001:
 | `GET` | `/api/v1/sbom` | List dependencies |
 | `GET` | `/api/v1/issues` | List cross-tracker issues |
 | `GET` | `/api/v1/scan-runs` | Scan execution history |
+| `GET` | `/api/v1/graph/:repo_id` | Code knowledge graph |
+| `POST` | `/api/v1/graph/:repo_id/build` | Trigger graph build |
+| `GET` | `/api/v1/dast/targets` | List DAST targets |
+| `POST` | `/api/v1/dast/targets` | Add DAST target |
+| `GET` | `/api/v1/dast/findings` | List DAST findings |
+| `POST` | `/api/v1/chat/:repo_id` | RAG-powered code chat |
+| `POST` | `/api/v1/help/chat` | Documentation-grounded help chat |
+| `POST` | `/api/v1/pentest/sessions` | Create pentest session |
+| `POST` | `/api/v1/pentest/sessions/:id/export` | Export encrypted pentest report |
 | `POST` | `/webhook/github` | GitHub webhook (HMAC-SHA256) |
 | `POST` | `/webhook/gitlab` | GitLab webhook (token verify) |
+| `POST` | `/webhook/gitea` | Gitea webhook |

 ## Dashboard Pages

 | Page | Description |
 |------|-------------|
-| **Overview** | Stat cards, severity distribution chart |
-| **Repositories** | Add/manage tracked repos, trigger scans |
-| **Findings** | Filterable table by severity, type, status |
+| **Overview** | Stat cards, severity distribution, AI chat cards, MCP status |
+| **Repositories** | Add/manage tracked repos, trigger scans, webhook config |
+| **Findings** | Filterable table by severity, type, status, scanner |
 | **Finding Detail** | Code evidence, remediation, suggested fix, linked issue |
-| **SBOM** | Dependency inventory with vulnerability badges |
-| **Issues** | Cross-tracker view (GitHub + GitLab + Jira) |
-| **Settings** | Configure LiteLLM, tracker tokens, SearXNG URL |
+| **SBOM** | Dependency inventory with vulnerability badges, license summary |
+| **Issues** | Cross-tracker view (GitHub + GitLab + Jira + Gitea) |
+| **Code Graph** | Interactive architecture visualization, impact analysis |
+| **AI Chat** | RAG-powered Q&A about repository code |
+| **DAST** | Dynamic scanning targets, findings, and scan history |
+| **Pentest** | AI-driven pentest sessions, attack chain visualization |
+| **MCP Servers** | Model Context Protocol server management |
+| **Help Chat** | Floating assistant (available on every page) for product Q&A |

 ## Project Structure

@@ -173,19 +200,24 @@ compliance-scanner/
 ├── compliance-core/        Shared library (models, traits, config, errors)
 ├── compliance-agent/       Agent daemon (pipeline, LLM, trackers, API, webhooks)
 │   └── src/
-│       ├── pipeline/       7-stage scan pipeline
-│       ├── llm/            LiteLLM client, triage, descriptions, fixes, PR review
-│       ├── trackers/       GitHub, GitLab, Jira integrations
-│       ├── api/            REST API (Axum)
-│       └── webhooks/       GitHub + GitLab webhook receivers
+│       ├── pipeline/       7-stage scan pipeline, dedup, PR reviews, code review
+│       ├── llm/            LiteLLM client, triage, descriptions, fixes, review prompts
+│       ├── trackers/       GitHub, GitLab, Jira, Gitea integrations
+│       ├── pentest/        AI-driven pentest orchestrator, tools, reports
+│       ├── rag/            RAG pipeline, chunking, embedding
+│       ├── api/            REST API (Axum), help chat
+│       └── webhooks/       GitHub, GitLab, Gitea webhook receivers
 ├── compliance-dashboard/   Dioxus fullstack dashboard
 │   └── src/
-│       ├── components/     Reusable UI components
-│       ├── infrastructure/ Server functions, DB, config
-│       └── pages/          Full page views
+│       ├── components/     Reusable UI (sidebar, help chat, attack chain, etc.)
+│       ├── infrastructure/ Server functions, DB, config, auth
+│       └── pages/          Full page views (overview, DAST, pentest, graph, etc.)
+├── compliance-graph/       Code knowledge graph (tree-sitter, embeddings, RAG)
+├── compliance-dast/        Dynamic application security testing
+├── compliance-mcp/         Model Context Protocol server
+├── docs/                   VitePress documentation site
 ├── assets/                 Static assets (CSS, icons)
-├── styles/                 Tailwind input stylesheet
-└── bin/                    Dashboard binary entrypoint
+└── styles/                 Tailwind input stylesheet
 ```

 ## External Services
@@ -193,10 +225,12 @@ compliance-scanner/
 | Service | Purpose | Default URL |
 |---------|---------|-------------|
 | MongoDB | Persistence | `mongodb://localhost:27017` |
-| LiteLLM | LLM proxy for triage and generation | `http://localhost:4000` |
+| LiteLLM | LLM proxy (chat, triage, embeddings) | `http://localhost:4000` |
 | SearXNG | CVE context search | `http://localhost:8888` |
+| Keycloak | Authentication (OAuth2/PKCE, SSO) | `http://localhost:8080` |
 | Semgrep | SAST scanning | CLI tool |
 | Syft | SBOM generation | CLI tool |
+| Chromium | Headless browser (pentesting, PDF) | Managed via Docker |

 ---