Compare commits: fix/remove ... feat/help- (7 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 4d7efea683 | |
| | 263a4e654a | |
| | bae24f9cf8 | |
| | dd53132746 | |
| | ff088f9eb4 | |
| | 745ad8a441 | |
| | a9d039dad3 | |
Cargo.lock (generated, 6 lines changed)
@@ -4699,9 +4699,9 @@ dependencies = [

 [[package]]
 name = "rustls-webpki"
-version = "0.103.9"
+version = "0.103.10"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d7df23109aa6c1567d1c575b9952556388da57401e4ace1d15f79eedad0d8f53"
+checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef"
 dependencies = [
  "ring",
  "rustls-pki-types",

@@ -5171,7 +5171,7 @@ version = "0.8.9"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c1c97747dbf44bb1ca44a561ece23508e99cb592e862f22222dcf42f51d1e451"
 dependencies = [
- "heck 0.5.0",
+ "heck 0.4.1",
  "proc-macro2",
  "quote",
  "syn",
README.md (120 lines changed)
@@ -28,9 +28,9 @@

 ## About

-Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira) with evidence and remediation suggestions, reviews pull requests, and exposes a Dioxus-based dashboard for visualization.
+Compliance Scanner is an autonomous agent that continuously monitors git repositories for security vulnerabilities, GDPR/OAuth compliance patterns, and dependency risks. It creates issues in external trackers (GitHub/GitLab/Jira/Gitea) with evidence and remediation suggestions, reviews pull requests with multi-pass LLM analysis, runs autonomous penetration tests, and exposes a Dioxus-based dashboard for visualization.

-> **How it works:** The agent runs as a lazy daemon -- it only scans when new commits are detected, triggered by cron schedules or webhooks. LLM-powered triage filters out false positives and generates actionable remediation.
+> **How it works:** The agent runs as a lazy daemon -- it only scans when new commits are detected, triggered by cron schedules or webhooks. LLM-powered triage filters out false positives and generates actionable remediation with multi-language awareness.

 ## Features
@@ -41,31 +41,38 @@ Compliance Scanner is an autonomous agent that continuously monitors git reposit

 | **CVE Monitoring** | OSV.dev batch queries, NVD CVSS enrichment, SearXNG context |
 | **GDPR Patterns** | Detect PII logging, missing consent, hardcoded retention, missing deletion |
 | **OAuth Patterns** | Detect implicit grant, missing PKCE, token in localStorage, token in URLs |
-| **LLM Triage** | Confidence scoring via LiteLLM to filter false positives |
-| **Issue Creation** | Auto-create issues in GitHub, GitLab, or Jira with code evidence |
-| **PR Reviews** | Post security review comments on pull requests |
-| **Dashboard** | Fullstack Dioxus UI with findings, SBOM, issues, and statistics |
-| **Webhooks** | GitHub (HMAC-SHA256) and GitLab webhook receivers for push/PR events |
+| **LLM Triage** | Multi-language-aware confidence scoring (Rust, Python, Go, Java, Ruby, PHP, C++) |
+| **Issue Creation** | Auto-create issues in GitHub, GitLab, Jira, or Gitea with dedup via fingerprints |
+| **PR Reviews** | Multi-pass security review (logic, security, convention, complexity) with dedup |
+| **DAST Scanning** | Black-box security testing with endpoint discovery and parameter fuzzing |
+| **AI Pentesting** | Autonomous LLM-orchestrated penetration testing with encrypted reports |
+| **Code Graph** | Interactive code knowledge graph with impact analysis |
+| **AI Chat (RAG)** | Natural language Q&A grounded in repository source code |
+| **Help Assistant** | Documentation-grounded help chat accessible from every dashboard page |
+| **MCP Server** | Expose live security data to Claude, Cursor, and other AI tools |
+| **Dashboard** | Fullstack Dioxus UI with findings, SBOM, issues, DAST, pentest, and graph |
+| **Webhooks** | GitHub, GitLab, and Gitea webhook receivers for push/PR events |
+| **Finding Dedup** | SHA-256 fingerprint dedup for SAST, CWE-based dedup for DAST findings |
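The fingerprint dedup named in the "Finding Dedup" row can be sketched roughly as follows. This is an illustrative sketch, not the project's implementation: the `Finding` fields are assumptions, and the standard library's `DefaultHasher` stands in for the SHA-256 digest the README mentions, just to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical finding shape; the real model lives in compliance-core.
#[derive(Hash)]
struct Finding {
    rule_id: String,
    file: String,
    line: u32,
}

// Stable fingerprint over the identifying fields. The real scanner uses
// SHA-256; DefaultHasher is a stand-in for the sketch.
fn fingerprint(f: &Finding) -> u64 {
    let mut h = DefaultHasher::new();
    f.hash(&mut h);
    h.finish()
}

// Keep only the first finding per fingerprint, preserving input order.
fn dedup(findings: Vec<Finding>) -> Vec<Finding> {
    let mut seen = HashSet::new();
    findings
        .into_iter()
        .filter(|f| seen.insert(fingerprint(f)))
        .collect()
}

fn main() {
    let findings = vec![
        Finding { rule_id: "sql-injection".into(), file: "src/db.rs".into(), line: 42 },
        Finding { rule_id: "sql-injection".into(), file: "src/db.rs".into(), line: 42 },
        Finding { rule_id: "weak-rng".into(), file: "src/token.rs".into(), line: 7 },
    ];
    let unique = dedup(findings);
    assert_eq!(unique.len(), 2);
    println!("kept {} unique findings", unique.len());
}
```

Keying on a digest of the identifying fields (rather than the full finding) is what lets re-scans of the same commit avoid filing duplicate issues.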

 ## Architecture

 ```
-┌─────────────────────────────────────────────────────────────┐
-│                       Cargo Workspace                       │
-├──────────────┬──────────────────┬───────────────────────────┤
-│ compliance-  │ compliance-      │ compliance-               │
-│ core         │ agent            │ dashboard                 │
-│ (lib)        │ (bin)            │ (bin, Dioxus 0.7.3)       │
-│              │                  │                           │
-│ Models       │ Scan Pipeline    │ Fullstack Web UI          │
-│ Traits       │ LLM Client       │ Server Functions          │
-│ Config       │ Issue Trackers   │ Charts + Tables           │
-│ Errors       │ Scheduler        │ Settings Page             │
-│              │ REST API         │                           │
-│              │ Webhooks         │                           │
-└──────────────┴──────────────────┴───────────────────────────┘
-                              │
-                      MongoDB (shared)
+┌──────────────────────────────────────────────────────────────────────────┐
+│                             Cargo Workspace                              │
+├──────────────┬──────────────────┬──────────────┬──────────┬─────────────┤
+│ compliance-  │ compliance-      │ compliance-  │ complian-│ compliance- │
+│ core (lib)   │ agent (bin)      │ dashboard    │ ce-graph │ mcp (bin)   │
+│              │                  │ (bin)        │ (lib)    │             │
+│ Models       │ Scan Pipeline    │ Dioxus 0.7   │ Tree-    │ MCP Server  │
+│ Traits       │ LLM Client       │ Fullstack UI │ sitter   │ Live data   │
+│ Config       │ Issue Trackers   │ Help Chat    │ Graph    │ for AI      │
+│ Errors       │ Pentest Engine   │ Server Fns   │ Embedds  │ tools       │
+│              │ DAST Tools       │              │ RAG      │             │
+│              │ REST API         │              │          │             │
+│              │ Webhooks         │              │          │             │
+└──────────────┴──────────────────┴──────────────┴──────────┴─────────────┘
+                                    │
+                            MongoDB (shared)
 ```

 ## Scan Pipeline (7 Stages)
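The heading above names a 7-stage pipeline without listing the stages in this hunk. As a rough sketch of the staged-pipeline shape, one stage after another transforming a shared scan context; the stage names here (clone, sast, sbom, cve, patterns, triage, report) are guesses inferred from the feature table, not the project's actual stage list:

```rust
// Illustrative pipeline skeleton: each stage mutates a shared context.
#[derive(Default, Debug)]
struct ScanContext {
    log: Vec<String>,
}

trait Stage {
    fn run(&self, ctx: &mut ScanContext);
}

// Placeholder stage that only records its name.
struct Named(&'static str);

impl Stage for Named {
    fn run(&self, ctx: &mut ScanContext) {
        // A real stage would clone the repo, invoke Semgrep/Syft, query OSV, etc.
        ctx.log.push(format!("ran {}", self.0));
    }
}

fn run_pipeline(stages: &[Box<dyn Stage>]) -> ScanContext {
    let mut ctx = ScanContext::default();
    for stage in stages {
        stage.run(&mut ctx); // a real pipeline would short-circuit on stage errors
    }
    ctx
}

fn main() {
    let stages: Vec<Box<dyn Stage>> =
        ["clone", "sast", "sbom", "cve", "patterns", "triage", "report"]
            .into_iter()
            .map(|name| Box::new(Named(name)) as Box<dyn Stage>)
            .collect();
    let ctx = run_pipeline(&stages);
    assert_eq!(ctx.log.len(), 7);
    println!("{}", ctx.log.join(", "));
}
```

The trait-object list keeps stage ordering explicit and lets individual stages be tested in isolation.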
@@ -84,11 +91,16 @@ Compliance Scanner is an autonomous agent that continuously monitors git reposit

 |-------|-----------|
 | Shared Library | `compliance-core` -- models, traits, config |
 | Agent | Axum REST API, git2, tokio-cron-scheduler, Semgrep, Syft |
-| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS |
+| Dashboard | Dioxus 0.7.3 fullstack, Tailwind CSS 4 |
+| Code Graph | `compliance-graph` -- tree-sitter parsing, embeddings, RAG |
+| MCP Server | `compliance-mcp` -- Model Context Protocol for AI tools |
+| DAST | `compliance-dast` -- dynamic application security testing |
 | Database | MongoDB with typed collections |
-| LLM | LiteLLM (OpenAI-compatible API) |
-| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3) |
+| LLM | LiteLLM (OpenAI-compatible API for chat, triage, embeddings) |
+| Issue Trackers | GitHub (octocrab), GitLab (REST v4), Jira (REST v3), Gitea |
 | CVE Sources | OSV.dev, NVD, SearXNG |
+| Auth | Keycloak (OAuth2/PKCE, SSO) |
+| Browser Automation | Chromium (headless, for pentesting and PDF generation) |

 ## Getting Started
@@ -151,20 +163,35 @@ The agent exposes a REST API on port 3001:

 | `GET` | `/api/v1/sbom` | List dependencies |
 | `GET` | `/api/v1/issues` | List cross-tracker issues |
 | `GET` | `/api/v1/scan-runs` | Scan execution history |
+| `GET` | `/api/v1/graph/:repo_id` | Code knowledge graph |
+| `POST` | `/api/v1/graph/:repo_id/build` | Trigger graph build |
+| `GET` | `/api/v1/dast/targets` | List DAST targets |
+| `POST` | `/api/v1/dast/targets` | Add DAST target |
+| `GET` | `/api/v1/dast/findings` | List DAST findings |
+| `POST` | `/api/v1/chat/:repo_id` | RAG-powered code chat |
+| `POST` | `/api/v1/help/chat` | Documentation-grounded help chat |
+| `POST` | `/api/v1/pentest/sessions` | Create pentest session |
+| `POST` | `/api/v1/pentest/sessions/:id/export` | Export encrypted pentest report |
 | `POST` | `/webhook/github` | GitHub webhook (HMAC-SHA256) |
 | `POST` | `/webhook/gitlab` | GitLab webhook (token verify) |
+| `POST` | `/webhook/gitea` | Gitea webhook |

 ## Dashboard Pages

 | Page | Description |
 |------|-------------|
-| **Overview** | Stat cards, severity distribution chart |
-| **Repositories** | Add/manage tracked repos, trigger scans |
-| **Findings** | Filterable table by severity, type, status |
+| **Overview** | Stat cards, severity distribution, AI chat cards, MCP status |
+| **Repositories** | Add/manage tracked repos, trigger scans, webhook config |
+| **Findings** | Filterable table by severity, type, status, scanner |
+| **Finding Detail** | Code evidence, remediation, suggested fix, linked issue |
-| **SBOM** | Dependency inventory with vulnerability badges |
-| **Issues** | Cross-tracker view (GitHub + GitLab + Jira) |
 | **Settings** | Configure LiteLLM, tracker tokens, SearXNG URL |
+| **SBOM** | Dependency inventory with vulnerability badges, license summary |
+| **Issues** | Cross-tracker view (GitHub + GitLab + Jira + Gitea) |
+| **Code Graph** | Interactive architecture visualization, impact analysis |
+| **AI Chat** | RAG-powered Q&A about repository code |
+| **DAST** | Dynamic scanning targets, findings, and scan history |
+| **Pentest** | AI-driven pentest sessions, attack chain visualization |
+| **MCP Servers** | Model Context Protocol server management |
+| **Help Chat** | Floating assistant (available on every page) for product Q&A |

 ## Project Structure

@@ -173,19 +200,24 @@ compliance-scanner/
 ├── compliance-core/        Shared library (models, traits, config, errors)
 ├── compliance-agent/       Agent daemon (pipeline, LLM, trackers, API, webhooks)
 │   └── src/
-│       ├── pipeline/       7-stage scan pipeline
-│       ├── llm/            LiteLLM client, triage, descriptions, fixes, PR review
-│       ├── trackers/       GitHub, GitLab, Jira integrations
-│       ├── api/            REST API (Axum)
-│       └── webhooks/       GitHub + GitLab webhook receivers
+│       ├── pipeline/       7-stage scan pipeline, dedup, PR reviews, code review
+│       ├── llm/            LiteLLM client, triage, descriptions, fixes, review prompts
+│       ├── trackers/       GitHub, GitLab, Jira, Gitea integrations
+│       ├── pentest/        AI-driven pentest orchestrator, tools, reports
+│       ├── rag/            RAG pipeline, chunking, embedding
+│       ├── api/            REST API (Axum), help chat
+│       └── webhooks/       GitHub, GitLab, Gitea webhook receivers
 ├── compliance-dashboard/   Dioxus fullstack dashboard
 │   └── src/
-│       ├── components/     Reusable UI components
-│       ├── infrastructure/ Server functions, DB, config
-│       └── pages/          Full page views
+│       ├── components/     Reusable UI (sidebar, help chat, attack chain, etc.)
+│       ├── infrastructure/ Server functions, DB, config, auth
+│       └── pages/          Full page views (overview, DAST, pentest, graph, etc.)
+├── compliance-graph/       Code knowledge graph (tree-sitter, embeddings, RAG)
+├── compliance-dast/        Dynamic application security testing
+├── compliance-mcp/         Model Context Protocol server
+├── docs/                   VitePress documentation site
 ├── assets/                 Static assets (CSS, icons)
-├── styles/                 Tailwind input stylesheet
-└── bin/                    Dashboard binary entrypoint
+└── styles/                 Tailwind input stylesheet
 ```

 ## External Services

@@ -193,10 +225,12 @@ compliance-scanner/

 | Service | Purpose | Default URL |
 |---------|---------|-------------|
 | MongoDB | Persistence | `mongodb://localhost:27017` |
-| LiteLLM | LLM proxy for triage and generation | `http://localhost:4000` |
+| LiteLLM | LLM proxy (chat, triage, embeddings) | `http://localhost:4000` |
 | SearXNG | CVE context search | `http://localhost:8888` |
+| Keycloak | Authentication (OAuth2/PKCE, SSO) | `http://localhost:8080` |
 | Semgrep | SAST scanning | CLI tool |
 | Syft | SBOM generation | CLI tool |
+| Chromium | Headless browser (pentesting, PDF) | Managed via Docker |

 ---
@@ -90,10 +90,13 @@ pub async fn chat(
     };

     let system_prompt = format!(
-        "You are an expert code assistant for a software repository. \
-        Answer the user's question based on the code context below. \
-        Reference specific files and functions when relevant. \
-        If the context doesn't contain enough information, say so.\n\n\
+        "You are a code assistant for this repository. Answer questions using the code context below.\n\n\
+        Rules:\n\
+        - Reference specific files, functions, and line numbers\n\
+        - Show code snippets when they help explain the answer\n\
+        - If the context is insufficient, say what's missing rather than guessing\n\
+        - Be concise — lead with the answer, then explain if needed\n\
+        - For security questions, note relevant CWEs and link to the finding if one exists\n\n\
         ## Code Context\n\n{code_context}"
     );
compliance-agent/src/api/handlers/help_chat.rs (new file, 187 lines)
@@ -0,0 +1,187 @@
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

use axum::extract::Extension;
use axum::http::StatusCode;
use axum::Json;
use serde::{Deserialize, Serialize};
use walkdir::WalkDir;

use super::dto::{AgentExt, ApiResponse};

// ── DTOs ─────────────────────────────────────────────────────────────────────

#[derive(Debug, Deserialize)]
pub struct HelpChatMessage {
    pub role: String,
    pub content: String,
}

#[derive(Debug, Deserialize)]
pub struct HelpChatRequest {
    pub message: String,
    #[serde(default)]
    pub history: Vec<HelpChatMessage>,
}

#[derive(Debug, Serialize)]
pub struct HelpChatResponse {
    pub message: String,
}

// ── Doc cache ────────────────────────────────────────────────────────────────

static DOC_CONTEXT: OnceLock<String> = OnceLock::new();

/// Walk upward from `start` until we find a directory containing both
/// `README.md` and a `docs/` subdirectory.
fn find_project_root(start: &Path) -> Option<PathBuf> {
    let mut current = start.to_path_buf();
    loop {
        if current.join("README.md").is_file() && current.join("docs").is_dir() {
            return Some(current);
        }
        if !current.pop() {
            return None;
        }
    }
}

/// Read README.md + all docs/**/*.md (excluding node_modules).
fn load_docs(root: &Path) -> String {
    let mut parts: Vec<String> = Vec::new();

    // Root README first
    if let Ok(content) = std::fs::read_to_string(root.join("README.md")) {
        parts.push(format!("<!-- file: README.md -->\n{content}"));
    }

    // docs/**/*.md, skipping node_modules
    for entry in WalkDir::new(root.join("docs"))
        .follow_links(false)
        .into_iter()
        .filter_entry(|e| {
            !e.path()
                .components()
                .any(|c| c.as_os_str() == "node_modules")
        })
        .filter_map(|e| e.ok())
    {
        let path = entry.path();
        if !path.is_file() {
            continue;
        }
        if path
            .extension()
            .and_then(|s| s.to_str())
            .map(|s| !s.eq_ignore_ascii_case("md"))
            .unwrap_or(true)
        {
            continue;
        }

        let rel = path.strip_prefix(root).unwrap_or(path);
        if let Ok(content) = std::fs::read_to_string(path) {
            parts.push(format!("<!-- file: {} -->\n{content}", rel.display()));
        }
    }

    if parts.is_empty() {
        tracing::warn!(
            "help_chat: no documentation files found under {}",
            root.display()
        );
    } else {
        tracing::info!(
            "help_chat: loaded {} documentation file(s) from {}",
            parts.len(),
            root.display()
        );
    }

    parts.join("\n\n---\n\n")
}

/// Returns a reference to the cached doc context string, initialised on
/// first call via `OnceLock`.
fn doc_context() -> &'static str {
    DOC_CONTEXT.get_or_init(|| {
        let start = std::env::current_exe()
            .ok()
            .and_then(|p| p.parent().map(Path::to_path_buf))
            .unwrap_or_else(|| PathBuf::from("."));

        match find_project_root(&start) {
            Some(root) => load_docs(&root),
            None => {
                // Fallback: try current working directory
                let cwd = std::env::current_dir().unwrap_or_else(|_| PathBuf::from("."));
                if cwd.join("README.md").is_file() {
                    return load_docs(&cwd);
                }
                tracing::error!(
                    "help_chat: could not locate project root from {}; doc context will be empty",
                    start.display()
                );
                String::new()
            }
        }
    })
}

// ── Handler ──────────────────────────────────────────────────────────────────

/// POST /api/v1/help/chat — Answer questions about the compliance-scanner
/// using the project documentation as grounding context.
#[tracing::instrument(skip_all)]
pub async fn help_chat(
    Extension(agent): AgentExt,
    Json(req): Json<HelpChatRequest>,
) -> Result<Json<ApiResponse<HelpChatResponse>>, StatusCode> {
    let context = doc_context();

    let system_prompt = if context.is_empty() {
        "You are a helpful assistant for the Compliance Scanner project. \
         Answer questions about how to use and configure it. \
         No documentation was loaded at startup, so rely on your general knowledge."
            .to_string()
    } else {
        format!(
            "You are a helpful assistant for the Compliance Scanner project. \
             Answer questions about how to use, configure, and understand it \
             using the documentation below as your primary source of truth.\n\n\
             Rules:\n\
             - Prefer information from the provided docs over general knowledge\n\
             - Quote or reference the relevant doc section when it helps\n\
             - If the docs do not cover the topic, say so clearly\n\
             - Be concise — lead with the answer, then explain if needed\n\
             - Use markdown formatting for readability\n\n\
             ## Project Documentation\n\n{context}"
        )
    };

    let mut messages: Vec<(String, String)> = Vec::with_capacity(req.history.len() + 2);
    messages.push(("system".to_string(), system_prompt));

    for msg in &req.history {
        messages.push((msg.role.clone(), msg.content.clone()));
    }
    messages.push(("user".to_string(), req.message));

    let response_text = agent
        .llm
        .chat_with_messages(messages, Some(0.3))
        .await
        .map_err(|e| {
            tracing::error!("LLM help chat failed: {e}");
            StatusCode::INTERNAL_SERVER_ERROR
        })?;

    Ok(Json(ApiResponse {
        data: HelpChatResponse {
            message: response_text,
        },
        total: None,
        page: None,
    }))
}
@@ -4,6 +4,7 @@ pub mod dto;
 pub mod findings;
 pub mod graph;
 pub mod health;
+pub mod help_chat;
 pub mod issues;
 pub mod pentest_handlers;
 pub use pentest_handlers as pentest;
@@ -95,8 +95,8 @@ pub async fn export_session_report(
         Err(_) => Vec::new(),
     };

-    // Fetch DAST findings for this session
-    let findings: Vec<DastFinding> = match agent
+    // Fetch DAST findings for this session, then deduplicate
+    let raw_findings: Vec<DastFinding> = match agent
         .db
         .dast_findings()
         .find(doc! { "session_id": &id })
@@ -106,6 +106,14 @@ pub async fn export_session_report(
         Ok(cursor) => collect_cursor_async(cursor).await,
         Err(_) => Vec::new(),
     };
+    let raw_count = raw_findings.len();
+    let findings = crate::pipeline::dedup::dedup_dast_findings(raw_findings);
+    if findings.len() < raw_count {
+        tracing::info!(
+            "Deduped DAST findings for session {id}: {raw_count} → {}",
+            findings.len()
+        );
+    }

     // Fetch SAST findings, SBOM, and code context for the linked repository
     let repo_id = session
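The body of `dedup_dast_findings` is not part of this diff. Assuming the CWE-based dedup the README describes for DAST findings, a minimal sketch could key each finding on its (CWE, endpoint) pair; the `DastFinding` fields shown here are hypothetical, not the project's actual model.

```rust
use std::collections::HashSet;

// Hypothetical subset of the DastFinding model.
struct DastFinding {
    cwe: String,
    endpoint: String,
    #[allow(dead_code)]
    detail: String,
}

// Keep the first finding per (CWE, endpoint) pair, preserving input order.
fn dedup_dast_findings(findings: Vec<DastFinding>) -> Vec<DastFinding> {
    let mut seen: HashSet<(String, String)> = HashSet::new();
    findings
        .into_iter()
        .filter(|f| seen.insert((f.cwe.clone(), f.endpoint.clone())))
        .collect()
}

fn main() {
    let findings = vec![
        DastFinding { cwe: "CWE-79".into(), endpoint: "/search".into(), detail: "reflected q".into() },
        DastFinding { cwe: "CWE-79".into(), endpoint: "/search".into(), detail: "reflected page".into() },
        DastFinding { cwe: "CWE-89".into(), endpoint: "/items".into(), detail: "id param".into() },
    ];
    let unique = dedup_dast_findings(findings);
    assert_eq!(unique.len(), 2);
    println!("kept {} findings", unique.len());
}
```

Grouping by CWE rather than exact payload collapses the many probe variants a fuzzer produces for the same weakness into one reportable finding.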
@@ -237,5 +237,92 @@ pub async fn delete_repository(
         .delete_many(doc! { "repo_id": &id })
         .await;

+    // Cascade delete DAST targets linked to this repo, and all their downstream data
+    // (scan runs, findings, pentest sessions, attack chains, messages)
+    if let Ok(mut cursor) = db.dast_targets().find(doc! { "repo_id": &id }).await {
+        use futures_util::StreamExt;
+        while let Some(Ok(target)) = cursor.next().await {
+            let target_id = target.id.map(|oid| oid.to_hex()).unwrap_or_default();
+            if !target_id.is_empty() {
+                cascade_delete_dast_target(db, &target_id).await;
+            }
+        }
+    }
+
+    // Also delete pentest sessions linked directly to this repo (not via target)
+    if let Ok(mut cursor) = db.pentest_sessions().find(doc! { "repo_id": &id }).await {
+        use futures_util::StreamExt;
+        while let Some(Ok(session)) = cursor.next().await {
+            let session_id = session.id.map(|oid| oid.to_hex()).unwrap_or_default();
+            if !session_id.is_empty() {
+                let _ = db
+                    .attack_chain_nodes()
+                    .delete_many(doc! { "session_id": &session_id })
+                    .await;
+                let _ = db
+                    .pentest_messages()
+                    .delete_many(doc! { "session_id": &session_id })
+                    .await;
+                // Delete DAST findings produced by this session
+                let _ = db
+                    .dast_findings()
+                    .delete_many(doc! { "session_id": &session_id })
+                    .await;
+            }
+        }
+    }
+    let _ = db
+        .pentest_sessions()
+        .delete_many(doc! { "repo_id": &id })
+        .await;

     Ok(Json(serde_json::json!({ "status": "deleted" })))
 }

+/// Cascade-delete a DAST target and all its downstream data.
+async fn cascade_delete_dast_target(db: &crate::database::Database, target_id: &str) {
+    // Delete pentest sessions for this target (and their attack chains + messages)
+    if let Ok(mut cursor) = db
+        .pentest_sessions()
+        .find(doc! { "target_id": target_id })
+        .await
+    {
+        use futures_util::StreamExt;
+        while let Some(Ok(session)) = cursor.next().await {
+            let session_id = session.id.map(|oid| oid.to_hex()).unwrap_or_default();
+            if !session_id.is_empty() {
+                let _ = db
+                    .attack_chain_nodes()
+                    .delete_many(doc! { "session_id": &session_id })
+                    .await;
+                let _ = db
+                    .pentest_messages()
+                    .delete_many(doc! { "session_id": &session_id })
+                    .await;
+                let _ = db
+                    .dast_findings()
+                    .delete_many(doc! { "session_id": &session_id })
+                    .await;
+            }
+        }
+    }
+    let _ = db
+        .pentest_sessions()
+        .delete_many(doc! { "target_id": target_id })
+        .await;
+
+    // Delete DAST scan runs and their findings
+    let _ = db
+        .dast_findings()
+        .delete_many(doc! { "target_id": target_id })
+        .await;
+    let _ = db
+        .dast_scan_runs()
+        .delete_many(doc! { "target_id": target_id })
+        .await;
+
+    // Delete the target itself
+    if let Ok(oid) = mongodb::bson::oid::ObjectId::parse_str(target_id) {
+        let _ = db.dast_targets().delete_one(doc! { "_id": oid }).await;
+    }
+}
@@ -99,6 +99,8 @@ pub fn build_router() -> Router {
             "/api/v1/chat/{repo_id}/status",
             get(handlers::chat::embedding_status),
         )
+        // Help chat (documentation-grounded Q&A)
+        .route("/api/v1/help/chat", post(handlers::help_chat::help_chat))
         // Pentest API endpoints
         .route(
             "/api/v1/pentest/lookup-repo",
@@ -5,15 +5,20 @@ use compliance_core::models::Finding;
 use crate::error::AgentError;
 use crate::llm::LlmClient;

-const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing issue descriptions for a bug tracker. Generate a clear, actionable issue body in Markdown format that includes:
-
-1. **Summary**: 1-2 sentence overview
-2. **Evidence**: Code location, snippet, and what was detected
-3. **Impact**: What could happen if not fixed
-4. **Remediation**: Step-by-step fix instructions
-5. **References**: Relevant CWE/CVE links if applicable
-
-Keep it concise and professional. Use code blocks for code snippets."#;
+const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing a bug tracker issue for a developer to fix. Be direct and actionable — developers skim issue descriptions, so lead with what matters.
+
+Format in Markdown:
+
+1. **What**: 1 sentence — what's wrong and where (file:line)
+2. **Why it matters**: 1-2 sentences — concrete impact if not fixed. Avoid generic "could lead to" phrasing; describe the specific attack or failure scenario.
+3. **Fix**: The specific code change needed. Use a code block with the corrected code if possible. If the fix is configuration-based, show the exact config change.
+4. **References**: CWE/CVE link if applicable (one line, not a section)
+
+Rules:
+- No filler paragraphs or background explanations
+- No restating the finding title in the body
+- Code blocks should show the FIX, not the vulnerable code (the developer can see that in the diff)
+- If the remediation is a one-liner, just say it — don't wrap it in a section header"#;

 pub async fn generate_issue_description(
     llm: &Arc<LlmClient>,
@@ -5,7 +5,24 @@ use compliance_core::models::Finding;
 use crate::error::AgentError;
 use crate::llm::LlmClient;

-const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer. Given a security finding with code context, suggest a concrete code fix. Return ONLY the fixed code snippet that can directly replace the vulnerable code. Include brief inline comments explaining the fix."#;
+const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.
+
+Rules:
+- The fix must be a drop-in replacement for the vulnerable code
+- Preserve the original code's style, indentation, and naming conventions
+- Add at most one brief inline comment on the changed line explaining the security fix
+- If the fix requires importing a new module, include the import on a separate line prefixed with the language's comment syntax + "Add import: "
+- Do not refactor, rename variables, or "improve" unrelated code
+- If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment explaining why no fix is needed
+
+Language-specific fix guidance:
+- Rust: use `?` for error propagation, prefer `SecretString` for secrets, use parameterized queries with `sqlx`/`diesel`
+- Python: use parameterized queries (never f-strings in SQL), use `secrets` module not `random`, use `subprocess.run([...])` list form, use `markupsafe.escape()` for HTML
+- Go: use `sql.Query` with `$1`/`?` placeholders, use `crypto/rand` not `math/rand`, use `html/template` not `text/template`, return errors don't panic
+- Java/Kotlin: use `PreparedStatement` with `?` params, use `SecureRandom`, use `Jsoup.clean()` for HTML sanitization, use `@Valid` for input validation
+- Ruby: use ActiveRecord parameterized finders, use `SecureRandom`, use `ERB::Util.html_escape`, use `strong_parameters`
+- PHP: use PDO prepared statements with `:param` or `?`, use `random_bytes()`/`random_int()`, use `htmlspecialchars()` with `ENT_QUOTES`, use `password_hash(PASSWORD_BCRYPT)`
+- C/C++: use `snprintf` not `sprintf`, use bounds-checked APIs, free resources in reverse allocation order, use `memset_s` for secret cleanup"#;

 pub async fn suggest_fix(llm: &Arc<LlmClient>, finding: &Finding) -> Result<String, AgentError> {
     let user_prompt = format!(
@@ -1,69 +1,138 @@
|
||||
// System prompts for multi-pass LLM code review.
|
||||
// Each pass focuses on a different aspect to avoid overloading a single prompt.
|
||||
|
||||
pub const LOGIC_REVIEW_PROMPT: &str = r#"You are a senior software engineer reviewing code changes. Focus ONLY on logic and correctness issues.
|
||||
pub const LOGIC_REVIEW_PROMPT: &str = r#"You are a senior software engineer reviewing a code diff. Report ONLY genuine logic bugs that would cause incorrect behavior at runtime.
|
||||
|
||||
Look for:
|
||||
- Off-by-one errors, wrong comparisons, missing edge cases
|
||||
- Incorrect control flow (unreachable code, missing returns, wrong loop conditions)
|
||||
- Race conditions or concurrency bugs
|
||||
- Resource leaks (unclosed handles, missing cleanup)
|
||||
- Wrong variable used (copy-paste errors)
|
||||
- Incorrect error handling (swallowed errors, wrong error type)
|
||||
Report:
|
||||
- Off-by-one errors, wrong comparisons, missing edge cases that cause wrong results
|
||||
- Incorrect control flow that produces wrong output (not style preferences)
|
||||
- Actual race conditions with concrete shared-state mutation (not theoretical ones)
- Resource leaks where cleanup is truly missing (not just "could be improved")
- Wrong variable used (copy-paste errors) — must be provably wrong, not just suspicious
- Swallowed errors that silently hide failures in a way that matters

Ignore: style, naming, formatting, documentation, minor improvements.
Do NOT report:
- Style, naming, formatting, documentation, or code organization preferences
- Theoretical issues without a concrete triggering scenario
- "Potential" problems that require assumptions not supported by the visible code
- Complexity or function length — that's a separate review pass

For each issue found, respond with a JSON array:
Language-idiomatic patterns that are NOT bugs (do not flag these):
- Rust: `||`/`&&` short-circuit evaluation, variable shadowing, `let` rebinding, `clone()`, `impl` blocks, `match` arms with guards, `?` operator chaining, `unsafe` blocks with safety comments
- Python: duck typing, EAFP pattern (try/except vs check-first), `*args`/`**kwargs`, walrus operator `:=`, truthiness checks on containers, bare `except:` in top-level handlers
- Go: multiple return values for errors, `if err != nil` patterns, goroutine + channel patterns, blank identifier `_`, named returns, `defer` for cleanup, `init()` functions
- Java/Kotlin: checked exception patterns, method overloading, `Optional` vs null checks, Kotlin `?.` safe calls, `!!` non-null assertions in tests, `when` exhaustive matching, companion objects, `lateinit`
- Ruby: monkey patching in libraries, method_missing, blocks/procs/lambdas, `rescue => e` patterns, `send`/`respond_to?` metaprogramming, `nil` checks via `&.` safe navigation
- PHP: loose comparisons with `==` (only flag if `===` was clearly intended), `@` error suppression in legacy code, `isset()`/`empty()` patterns, magic methods (`__get`, `__call`), array functions as callbacks
- C/C++: RAII patterns, move semantics, `const_cast`/`static_cast` in appropriate contexts, macro usage for platform compat, pointer arithmetic in low-level code, `goto` for cleanup in C

Severity guide:
- high: Will cause incorrect behavior in normal usage
- medium: Will cause incorrect behavior in edge cases
- low: Minor correctness concern with limited blast radius

Prefer returning [] over reporting low-confidence guesses. A false positive wastes more developer time than a missed low-severity issue.

Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "high|medium|low", "file": "...", "line": N, "suggestion": "..."}]

If no issues found, respond with: []"#;

pub const SECURITY_REVIEW_PROMPT: &str = r#"You are a security engineer reviewing code changes. Focus ONLY on security vulnerabilities.
pub const SECURITY_REVIEW_PROMPT: &str = r#"You are a security engineer reviewing a code diff. Report ONLY exploitable security vulnerabilities with a realistic attack scenario.

Look for:
- Injection vulnerabilities (SQL, command, XSS, template injection)
- Authentication/authorization bypasses
- Sensitive data exposure (logging secrets, hardcoded credentials)
- Insecure cryptography (weak algorithms, predictable randomness)
- Path traversal, SSRF, open redirects
- Unsafe deserialization
- Missing input validation at trust boundaries
Report:
- Injection vulnerabilities (SQL, command, XSS, template) where untrusted input reaches a sink
- Authentication/authorization bypasses with a concrete exploit path
- Sensitive data exposure: secrets in code, credentials in logs, PII leaks
- Insecure cryptography: weak algorithms, predictable randomness, hardcoded keys
- Path traversal, SSRF, open redirects — only where user input reaches the vulnerable API
- Unsafe deserialization of untrusted data
- Missing input validation at EXTERNAL trust boundaries (user input, API responses)

Ignore: code style, performance, general quality.
Do NOT report:
- Internal code that only handles trusted/validated data
- Hash functions used for non-security purposes (dedup fingerprints, cache keys, content addressing)
- Logging of non-sensitive operational data (finding titles, counts, performance metrics)
- "Information disclosure" for data that is already public or user-facing
- Code style, performance, or general quality issues
- Missing validation on internal function parameters (trust the caller within the same module/crate/package)
- Theoretical attacks that require preconditions not present in the code

For each issue found, respond with a JSON array:
Language-specific patterns that are NOT vulnerabilities (do not flag these):
- Python: `pickle` used on trusted internal data, `eval()`/`exec()` on hardcoded strings, `subprocess` with hardcoded commands, Django `mark_safe()` on static content, `assert` in non-security contexts
- Go: `crypto/rand` is secure (don't confuse with `math/rand`), `sql.DB` with parameterized queries is safe, `http.ListenAndServe` without TLS in dev/internal, error strings in responses (Go convention)
- Java/Kotlin: Spring Security annotations are sufficient auth checks, `@Transactional` provides atomicity, JPA parameterized queries are safe, Kotlin `require()`/`check()` are assertion patterns not vulnerabilities
- Ruby: Rails `params.permit()` is input validation, `render html:` with `html_safe` on generated content, ActiveRecord parameterized finders are safe, Devise/Warden patterns for auth
- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars()` is XSS mitigation, Symfony security voters are auth checks, `password_hash()`/`password_verify()` are correct bcrypt usage
- C/C++: `strncpy`/`snprintf` are bounds-checked (vs `strcpy`/`sprintf`), smart pointers manage memory, RAII handles cleanup, `static_assert` is compile-time only, OpenSSL with proper context setup
- Rust: `sha2`/`blake3` for fingerprinting is not "weak crypto", `unsafe` with documented invariants, `secrecy::SecretString` properly handles secrets

Severity guide:
- critical: Remote code execution, auth bypass, or data breach with no preconditions
- high: Exploitable vulnerability requiring minimal preconditions
- medium: Vulnerability requiring specific conditions or limited impact

Prefer returning [] over reporting speculative vulnerabilities. Every false positive erodes trust in the scanner.

Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "critical|high|medium", "file": "...", "line": N, "cwe": "CWE-XXX", "suggestion": "..."}]

If no issues found, respond with: []"#;

pub const CONVENTION_REVIEW_PROMPT: &str = r#"You are a code reviewer checking adherence to project conventions. Focus ONLY on patterns that indicate likely bugs or maintenance problems.
pub const CONVENTION_REVIEW_PROMPT: &str = r#"You are a code reviewer checking for convention violations that indicate likely bugs. Report ONLY deviations from the project's visible patterns that could cause real problems.

Look for:
- Inconsistent error handling patterns within the same module
- Public API that doesn't follow the project's established patterns
- Missing or incorrect type annotations that could cause runtime issues
- Anti-patterns specific to the language (e.g. unwrap in Rust library code, any in TypeScript)
Report:
- Inconsistent error handling within the same module where the inconsistency could hide failures
- Public API that breaks the module's established contract (not just different style)
- Anti-patterns that are bugs in this language: e.g. `unwrap()` in Rust library code where the CI enforces `clippy::unwrap_used`, `any` defeating TypeScript's type system

Do NOT report: minor style preferences, documentation gaps, formatting.
Only report issues with HIGH confidence that they deviate from the visible codebase conventions.
Do NOT report:
- Style preferences, formatting, naming conventions, or documentation
- Code organization suggestions ("this function should be split")
- Patterns that are valid in the language even if you'd write them differently
- "Missing type annotations" unless the code literally won't compile or causes a type inference bug

For each issue found, respond with a JSON array:
Language-specific patterns that are conventional (do not flag these):
- Rust: variable shadowing, `||`/`&&` short-circuit, `let` rebinding, builder patterns, `clone()`, `From`/`Into` impl chains, `#[allow(...)]` attributes
- Python: `**kwargs` forwarding, `@property` setters, `__dunder__` methods, list comprehensions with conditions, `if TYPE_CHECKING` imports, `noqa` comments
- Go: stuttering names (`http.HTTPClient`) discouraged but not a bug, `context.Context` as first param, init() functions, `//nolint` directives, returning concrete types vs interfaces in internal code
- Java/Kotlin: builder pattern boilerplate, Lombok annotations (`@Data`, `@Builder`), Kotlin data classes, `companion object` factories, `@Suppress` annotations, checked exception wrapping
- Ruby: `attr_accessor` usage, `Enumerable` mixin patterns, `module_function`, `class << self` syntax, DSL blocks (Rake, RSpec, Sinatra routes)
- PHP: `__construct` with property promotion, Laravel facades, static factory methods, nullable types with `?`, attribute syntax `#[...]`
- C/C++: header guards vs `#pragma once`, forward declarations, `const` correctness patterns, template specialization, `auto` type deduction

Severity guide:
- medium: Convention violation that will likely cause a bug or maintenance problem
- low: Convention violation that is a minor concern

Return at most 3 findings. Prefer [] over marginal findings.

Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "medium|low", "file": "...", "line": N, "suggestion": "..."}]

If no issues found, respond with: []"#;

pub const COMPLEXITY_REVIEW_PROMPT: &str = r#"You are reviewing code changes for excessive complexity that could lead to bugs.
pub const COMPLEXITY_REVIEW_PROMPT: &str = r#"You are reviewing code changes for complexity that is likely to cause bugs. Report ONLY complexity that makes the code demonstrably harder to reason about.

Look for:
- Functions over 50 lines that should be decomposed
- Deeply nested control flow (4+ levels)
- Complex boolean expressions that are hard to reason about
- Functions with 5+ parameters
- Code duplication within the changed files
Report:
- Functions over 80 lines with multiple interleaved responsibilities (not just long)
- Deeply nested control flow (5+ levels) where flattening would prevent bugs
- Complex boolean expressions that a reader would likely misinterpret

Only report complexity issues that are HIGH risk for future bugs. Ignore acceptable complexity in configuration, CLI argument parsing, or generated code.
Do NOT report:
- Functions that are long but linear and easy to follow
- Acceptable complexity: configuration setup, CLI parsing, test helpers, builder patterns
- Code that is complex because the problem is complex — only report if restructuring would reduce bug risk
- "This function does multiple things" unless you can identify a specific bug risk from the coupling
- Suggestions that would just move complexity elsewhere without reducing it

For each issue found, respond with a JSON array:
Severity guide:
- medium: Complexity that has a concrete risk of causing bugs during future changes
- low: Complexity that makes review harder but is unlikely to cause bugs

Return at most 2 findings. Prefer [] over reporting complexity that is justified.

Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "medium|low", "file": "...", "line": N, "suggestion": "..."}]

If no issues found, respond with: []"#;

@@ -8,22 +8,46 @@ use crate::pipeline::orchestrator::GraphContext;
/// Maximum number of findings to include in a single LLM triage call.
const TRIAGE_CHUNK_SIZE: usize = 30;

const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a security finding triage expert. Analyze each of the following security findings with its code context and determine the appropriate action.
const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a pragmatic security triage expert. Your job is to filter out noise and keep only findings that a developer should actually fix. Be aggressive about dismissing false positives — a clean, high-signal list is more valuable than a comprehensive one.

Actions:
- "confirm": The finding is a true positive at the reported severity. Keep as-is.
- "downgrade": The finding is real but over-reported. Lower severity recommended.
- "upgrade": The finding is under-reported. Higher severity recommended.
- "dismiss": The finding is a false positive. Should be removed.
- "confirm": True positive with real impact. Keep severity as-is.
- "downgrade": Real issue but over-reported severity. Lower it.
- "upgrade": Under-reported — higher severity warranted.
- "dismiss": False positive, not exploitable, or not actionable. Remove it.

Consider:
- Is the code in a test, example, or generated file? (lower confidence for test code)
- Does the surrounding code context confirm or refute the finding?
- Is the finding actionable by a developer?
- Would a real attacker be able to exploit this?
Dismiss when:
- The scanner flagged a language idiom as a bug (see examples below)
- The finding is in test/example/generated/vendored code
- The "vulnerability" requires preconditions that don't exist in the code
- The finding is about code style, complexity, or theoretical concerns rather than actual bugs
- A hash function is used for non-security purposes (dedup, caching, content addressing)
- Internal logging of non-sensitive operational data is flagged as "information disclosure"
- The finding duplicates another finding already in the list
- Framework-provided security is already in place (e.g. ORM parameterized queries, CSRF middleware, auth decorators)

Respond with a JSON array, one entry per finding in the same order they were presented:
[{"id": "<fingerprint>", "action": "confirm|downgrade|upgrade|dismiss", "confidence": 0-10, "rationale": "brief explanation", "remediation": "optional fix suggestion"}, ...]"#;
Common false positive patterns by language (dismiss these):
- Rust: short-circuit `||`/`&&`, variable shadowing, `clone()`, `unsafe` with safety docs, `sha2` for fingerprinting
- Python: EAFP try/except, `subprocess` with hardcoded args, `pickle` on trusted data, Django `mark_safe` on static content
- Go: `if err != nil` is not "swallowed error", `crypto/rand` is secure, returning errors is not "information disclosure"
- Java/Kotlin: Spring Security annotations are valid auth, JPA parameterized queries are safe, Kotlin `!!` in tests is fine
- Ruby: Rails `params.permit` is validation, ActiveRecord finders are parameterized, `html_safe` on generated content
- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars` is XSS mitigation
- C/C++: `strncpy`/`snprintf` are bounds-checked, smart pointers manage memory, RAII handles cleanup

Confirm only when:
- You can describe a concrete scenario where the bug manifests or the vulnerability is exploitable
- The fix is actionable (developer can change specific code to resolve it)
- The finding is in production code that handles external input or sensitive data

Confidence scoring (0-10):
- 8-10: Certain true positive with clear exploit/bug scenario
- 5-7: Likely true positive, some assumptions required
- 3-4: Uncertain, needs manual review
- 0-2: Almost certainly a false positive

Respond with a JSON array, one entry per finding in the same order presented (no markdown fences):
[{"id": "<fingerprint>", "action": "confirm|downgrade|upgrade|dismiss", "confidence": 0-10, "rationale": "1-2 sentences", "remediation": "optional fix"}, ...]"#;

pub async fn triage_findings(
    llm: &Arc<LlmClient>,

@@ -321,9 +321,38 @@ impl PentestOrchestrator {
total_findings += findings_count;

let mut finding_ids: Vec<String> = Vec::new();
for mut finding in result.findings {
// Dedup findings within this tool result before inserting
let deduped_findings =
    crate::pipeline::dedup::dedup_dast_findings(
        result.findings,
    );
for mut finding in deduped_findings {
    finding.scan_run_id = session_id.clone();
    finding.session_id = Some(session_id.clone());

    // Check for existing duplicate in this session
    let fp = crate::pipeline::dedup::compute_dast_fingerprint(
        &finding,
    );
    let existing = self
        .db
        .dast_findings()
        .find_one(doc! {
            "session_id": &session_id,
            "title": &finding.title,
            "endpoint": &finding.endpoint,
            "method": &finding.method,
        })
        .await;
    if matches!(existing, Ok(Some(_))) {
        tracing::debug!(
            "Skipping duplicate DAST finding: {} (fp={:.12})",
            finding.title,
            fp,
        );
        continue;
    }

    let insert_result =
        self.db.dast_findings().insert_one(&finding).await;
    if let Ok(res) = &insert_result {

@@ -314,6 +314,21 @@ impl PentestOrchestrator {
- For SPA apps: a 200 HTTP status does NOT mean the page is accessible — check the actual
  page content with the browser tool to verify if it shows real data or a login redirect.

## Finding Quality Rules
- **Do not report the same issue twice.** If multiple tools detect the same missing header or
  vulnerability on the same endpoint, report it ONCE with the most specific tool's output.
  For example, if the recon tool and the header scanner both find missing HSTS, report it only
  from the header scanner (more specific).
- **Group related findings.** Missing security headers on the same endpoint are ONE finding
  ("Missing security headers") listing all missing headers, not separate findings per header.
- **Severity must match real impact:**
  - critical/high: Exploitable vulnerability (you can demonstrate the exploit)
  - medium: Real misconfiguration with security implications but not directly exploitable
  - low: Best-practice recommendation, defense-in-depth, or informational
- **Missing headers are medium at most** unless you can demonstrate a concrete exploit enabled
  by the missing header (e.g., missing CSP + confirmed XSS = high for CSP finding).
- Console.log in third-party/vendored JS (node_modules, minified libraries) is informational only.

## Important
- This is an authorized penetration test. All testing is permitted within the target scope.
- Respect the rate limit of {rate_limit} requests per second.

@@ -66,8 +66,10 @@ impl CodeReviewScanner {
        }
    }

    let deduped = dedup_cross_pass(all_findings);

    ScanOutput {
        findings: all_findings,
        findings: deduped,
        sbom_entries: Vec::new(),
    }
}
@@ -184,3 +186,51 @@ struct ReviewIssue {
    #[serde(default)]
    suggestion: Option<String>,
}

/// Deduplicate findings across review passes.
///
/// Multiple passes often flag the same issue (e.g. SQL injection reported by
/// logic, security, and convention passes). We group by file + nearby line +
/// normalized title keywords and keep the highest-severity finding.
fn dedup_cross_pass(findings: Vec<Finding>) -> Vec<Finding> {
    use std::collections::HashMap;

    // Build a dedup key: (file, line bucket, normalized title words)
    fn dedup_key(f: &Finding) -> String {
        let file = f.file_path.as_deref().unwrap_or("");
        // Group lines within 3 of each other
        let line_bucket = f.line_number.unwrap_or(0) / 4;
        // Normalize: lowercase, keep only alphanumeric, sort words for order-independence
        let title_lower = f.title.to_lowercase();
        let mut words: Vec<&str> = title_lower
            .split(|c: char| !c.is_alphanumeric())
            .filter(|w| w.len() > 2)
            .collect();
        words.sort();
        format!("{file}:{line_bucket}:{}", words.join(","))
    }

    let mut groups: HashMap<String, Finding> = HashMap::new();

    for finding in findings {
        let key = dedup_key(&finding);
        groups
            .entry(key)
            .and_modify(|existing| {
                // Keep the higher severity; on tie, keep the one with more detail
                if finding.severity > existing.severity
                    || (finding.severity == existing.severity
                        && finding.description.len() > existing.description.len())
                {
                    *existing = finding.clone();
                }
                // Merge CWE if the existing one is missing it
                if existing.cwe.is_none() {
                    existing.cwe = finding.cwe.clone();
                }
            })
            .or_insert(finding);
    }

    groups.into_values().collect()
}

@@ -1,5 +1,7 @@
use sha2::{Digest, Sha256};

use compliance_core::models::dast::DastFinding;

pub fn compute_fingerprint(parts: &[&str]) -> String {
    let mut hasher = Sha256::new();
    for part in parts {
@@ -9,9 +11,209 @@ pub fn compute_fingerprint(parts: &[&str]) -> String {
    hex::encode(hasher.finalize())
}

/// Compute a dedup fingerprint for a DAST finding.
///
/// The key is derived from the *canonicalized* title (lowercased, domain names
/// stripped, known synonyms resolved), endpoint, and HTTP method. This lets us
/// detect both exact duplicates (same tool reporting twice across passes) and
/// semantic duplicates (e.g., `security_header_missing` "Missing HSTS header"
/// vs `tls_misconfiguration` "Missing strict-transport-security header").
pub fn compute_dast_fingerprint(f: &DastFinding) -> String {
    let canon = canonicalize_dast_title(&f.title);
    let endpoint = f.endpoint.to_lowercase().trim_end_matches('/').to_string();
    let method = f.method.to_uppercase();
    let param = f.parameter.as_deref().unwrap_or("");
    compute_fingerprint(&[&canon, &endpoint, &method, param])
}

/// Canonicalize a DAST finding title for dedup purposes.
///
/// 1. Lowercase
/// 2. Strip domain names / URLs (e.g. "for comp-dev.meghsakha.com")
/// 3. Resolve known header synonyms (hsts ↔ strict-transport-security, etc.)
/// 4. Strip extra whitespace
fn canonicalize_dast_title(title: &str) -> String {
    let mut s = title.to_lowercase();

    // Strip "for <domain>" or "on <domain>" suffixes
    // Pattern: "for <word.word...>" or "on <method> <url>"
    if let Some(idx) = s.find(" for ") {
        // Check if what follows looks like a domain or URL
        let rest = &s[idx + 5..];
        if rest.contains('.') || rest.starts_with("http") {
            s.truncate(idx);
        }
    }
    if let Some(idx) = s.find(" on ") {
        let rest = &s[idx + 4..];
        if rest.contains("http") || rest.contains('/') {
            s.truncate(idx);
        }
    }

    // Resolve known header synonyms
    let synonyms: &[(&str, &str)] = &[
        ("hsts", "strict-transport-security"),
        ("csp", "content-security-policy"),
        ("cors", "cross-origin-resource-sharing"),
        ("xfo", "x-frame-options"),
    ];
    for &(short, canonical) in synonyms {
        // Only replace whole words — check boundaries
        if let Some(pos) = s.find(short) {
            let before_ok = pos == 0 || !s.as_bytes()[pos - 1].is_ascii_alphanumeric();
            let after_ok = pos + short.len() >= s.len()
                || !s.as_bytes()[pos + short.len()].is_ascii_alphanumeric();
            if before_ok && after_ok {
                s = format!("{}{}{}", &s[..pos], canonical, &s[pos + short.len()..]);
            }
        }
    }

    // Collapse whitespace
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Deduplicate a list of DAST findings, merging evidence from duplicates.
///
/// Two-phase approach:
/// 1. **Exact dedup** — group by canonicalized `(title, endpoint, method, parameter)`.
///    Merge evidence arrays, keep the highest severity, preserve exploitable flag.
/// 2. **CWE-based dedup** — within the same `(cwe, endpoint, method)` group, merge
///    findings whose canonicalized titles resolve to the same subject (e.g., HSTS
///    reported as both `security_header_missing` and `tls_misconfiguration`).
pub fn dedup_dast_findings(findings: Vec<DastFinding>) -> Vec<DastFinding> {
    use std::collections::HashMap;

    if findings.len() <= 1 {
        return findings;
    }

    // Phase 1: exact fingerprint dedup
    let mut seen: HashMap<String, usize> = HashMap::new();
    let mut deduped: Vec<DastFinding> = Vec::new();

    for finding in findings {
        let fp = compute_dast_fingerprint(&finding);

        if let Some(&idx) = seen.get(&fp) {
            // Merge into existing
            merge_dast_finding(&mut deduped[idx], &finding);
        } else {
            seen.insert(fp, deduped.len());
            deduped.push(finding);
        }
    }

    let before = deduped.len();

    // Phase 2: CWE-based related dedup
    // Group by (cwe, endpoint_normalized, method) — only when CWE is present
    let mut cwe_groups: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, f) in deduped.iter().enumerate() {
        if let Some(ref cwe) = f.cwe {
            let key = format!(
                "{}|{}|{}",
                cwe,
                f.endpoint.to_lowercase().trim_end_matches('/'),
                f.method.to_uppercase(),
            );
            cwe_groups.entry(key).or_default().push(i);
        }
    }

    // For each CWE group with multiple findings, keep the one with highest severity
    // and most evidence, merge the rest into it
    let mut merge_map: HashMap<usize, Vec<usize>> = HashMap::new();
    let mut remove_indices: Vec<usize> = Vec::new();

    for indices in cwe_groups.values() {
        if indices.len() <= 1 {
            continue;
        }
        // Find the "primary" finding: highest severity, then most evidence, then longest description
        let Some(&primary_idx) = indices.iter().max_by(|&&a, &&b| {
            deduped[a]
                .severity
                .cmp(&deduped[b].severity)
                .then_with(|| deduped[a].evidence.len().cmp(&deduped[b].evidence.len()))
                .then_with(|| {
                    deduped[a]
                        .description
                        .len()
                        .cmp(&deduped[b].description.len())
                })
        }) else {
            continue;
        };

        for &idx in indices {
            if idx != primary_idx {
                remove_indices.push(idx);
                merge_map.entry(primary_idx).or_default().push(idx);
            }
        }
    }

    if !remove_indices.is_empty() {
        remove_indices.sort_unstable();
        remove_indices.dedup();

        // Merge evidence
        for (&primary, secondaries) in &merge_map {
            let extra_evidence: Vec<_> = secondaries
                .iter()
                .flat_map(|&i| deduped[i].evidence.clone())
                .collect();
            let any_exploitable = secondaries.iter().any(|&i| deduped[i].exploitable);

            deduped[primary].evidence.extend(extra_evidence);
            if any_exploitable {
                deduped[primary].exploitable = true;
            }
        }

        // Remove merged findings (iterate in reverse to preserve indices)
        for &idx in remove_indices.iter().rev() {
            deduped.remove(idx);
        }
    }

    let after = deduped.len();
    if before != after {
        tracing::debug!(
            "DAST CWE-based dedup: {before} → {after} findings ({} merged)",
            before - after
        );
    }

    deduped
}

/// Merge a duplicate DAST finding into a primary one.
fn merge_dast_finding(primary: &mut DastFinding, duplicate: &DastFinding) {
    primary.evidence.extend(duplicate.evidence.clone());
    if duplicate.severity > primary.severity {
        primary.severity = duplicate.severity.clone();
    }
    if duplicate.exploitable {
        primary.exploitable = true;
    }
    // Keep the longer/better description
    if duplicate.description.len() > primary.description.len() {
        primary.description.clone_from(&duplicate.description);
    }
    // Keep remediation if primary doesn't have one
    if primary.remediation.is_none() && duplicate.remediation.is_some() {
        primary.remediation.clone_from(&duplicate.remediation);
    }
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use compliance_core::models::dast::DastVulnType;
|
||||
use compliance_core::models::finding::Severity;
|
||||
|
||||
#[test]
|
||||
fn fingerprint_is_deterministic() {
|
||||
@@ -55,4 +257,159 @@ mod tests {
|
||||
let b = compute_fingerprint(&["a", "bc"]);
|
||||
assert_ne!(a, b);
|
||||
}
|
||||
|
||||
fn make_dast(title: &str, endpoint: &str, vuln_type: DastVulnType) -> DastFinding {
|
||||
let mut f = DastFinding::new(
|
||||
"run1".into(),
|
||||
"target1".into(),
|
||||
vuln_type,
|
||||
title.into(),
|
||||
format!("Description for {title}"),
|
||||
Severity::Medium,
|
||||
endpoint.into(),
|
||||
"GET".into(),
|
||||
);
|
||||
f.cwe = Some("CWE-319".into());
|
||||
f
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn canonicalize_strips_domain_suffix() {
|
||||
let canon = canonicalize_dast_title("Missing HSTS header for comp-dev.meghsakha.com");
|
||||
assert!(!canon.contains("meghsakha"), "domain should be stripped");
|
||||
assert!(
|
||||
canon.contains("strict-transport-security"),
|
||||
"hsts should be resolved: {canon}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn canonicalize_resolves_synonyms() {
|
||||
let a = canonicalize_dast_title("Missing HSTS header");
|
||||
let b = canonicalize_dast_title("Missing strict-transport-security header");
|
||||
assert_eq!(a, b);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn exact_dedup_merges_identical_findings() {
|
||||
let f1 = make_dast(
|
||||
"Missing strict-transport-security header",
|
||||
"https://example.com",
|
||||
DastVulnType::SecurityHeaderMissing,
|
||||
);
|
||||
let f2 = make_dast(
|
||||
"Missing strict-transport-security header",
|
||||
"https://example.com",
|
||||
DastVulnType::SecurityHeaderMissing,
|
||||
);
|
||||
let result = dedup_dast_findings(vec![f1, f2]);
|
||||
        assert_eq!(result.len(), 1, "exact duplicates should be merged");
    }

    #[test]
    fn synonym_dedup_merges_hsts_variants() {
        let f1 = make_dast(
            "Missing strict-transport-security header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        let f2 = make_dast(
            "Missing HSTS header for example.com",
            "https://example.com",
            DastVulnType::TlsMisconfiguration,
        );
        let result = dedup_dast_findings(vec![f1, f2]);
        assert_eq!(
            result.len(),
            1,
            "HSTS synonym variants should merge to 1 finding"
        );
    }

    #[test]
    fn different_headers_not_merged() {
        let mut f1 = make_dast(
            "Missing x-content-type-options header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        f1.cwe = Some("CWE-16".into());
        let mut f2 = make_dast(
            "Missing permissions-policy header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        f2.cwe = Some("CWE-16".into());
        // These share CWE-16 but are different headers — phase 2 will merge them
        // since they share the same CWE+endpoint. This is acceptable because they
        // have the same root cause (missing security headers configuration).
        let result = dedup_dast_findings(vec![f1, f2]);
        // CWE-based dedup will merge these into 1
        assert!(
            result.len() <= 2,
            "same CWE+endpoint findings may be merged"
        );
    }

    #[test]
    fn different_endpoints_not_merged() {
        let f1 = make_dast(
            "Missing strict-transport-security header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        let f2 = make_dast(
            "Missing strict-transport-security header",
            "https://other.com",
            DastVulnType::SecurityHeaderMissing,
        );
        let result = dedup_dast_findings(vec![f1, f2]);
        assert_eq!(result.len(), 2, "different endpoints should not merge");
    }

    #[test]
    fn dedup_preserves_highest_severity() {
        let f1 = make_dast(
            "Missing strict-transport-security header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        let mut f2 = make_dast(
            "Missing strict-transport-security header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        f2.severity = Severity::High;
        let result = dedup_dast_findings(vec![f1, f2]);
        assert_eq!(result.len(), 1);
        assert_eq!(result[0].severity, Severity::High);
    }

    #[test]
    fn dedup_merges_evidence() {
        let mut f1 = make_dast(
            "Missing strict-transport-security header",
            "https://example.com",
            DastVulnType::SecurityHeaderMissing,
        );
        f1.evidence
            .push(compliance_core::models::dast::DastEvidence {
                request_method: "GET".into(),
                request_url: "https://example.com".into(),
                request_headers: None,
                request_body: None,
                response_status: 200,
                response_headers: None,
                response_snippet: Some("pass 1".into()),
                screenshot_path: None,
                payload: None,
                response_time_ms: None,
            });
        let mut f2 = f1.clone();
        f2.evidence[0].response_snippet = Some("pass 2".into());

        let result = dedup_dast_findings(vec![f1, f2]);
        assert_eq!(result.len(), 1);
        assert_eq!(result[0].evidence.len(), 2, "evidence should be merged");
    }
}

@@ -10,7 +10,6 @@ use compliance_core::AgentConfig;
use crate::database::Database;
use crate::error::AgentError;
use crate::llm::LlmClient;
use crate::pipeline::code_review::CodeReviewScanner;
use crate::pipeline::cve::CveScanner;
use crate::pipeline::git::GitOps;
use crate::pipeline::gitleaks::GitleaksScanner;
@@ -241,21 +240,6 @@ impl PipelineOrchestrator {
            Err(e) => tracing::warn!("[{repo_id}] Lint scanning failed: {e}"),
        }

        // Stage 4c: LLM Code Review (only on incremental scans)
        if let Some(old_sha) = &repo.last_scanned_commit {
            tracing::info!("[{repo_id}] Stage 4c: LLM Code Review");
            self.update_phase(scan_run_id, "code_review").await;
            let review_output = async {
                let reviewer = CodeReviewScanner::new(self.llm.clone());
                reviewer
                    .review_diff(&repo_path, &repo_id, old_sha, &current_sha)
                    .await
            }
            .instrument(tracing::info_span!("stage_code_review"))
            .await;
            all_findings.extend(review_output.findings);
        }

        // Stage 4.5: Graph Building
        tracing::info!("[{repo_id}] Stage 4.5: Graph Building");
        self.update_phase(scan_run_id, "graph_building").await;

@@ -1,5 +1,6 @@
use compliance_core::models::*;

use super::dedup::compute_fingerprint;
use super::orchestrator::PipelineOrchestrator;
use crate::error::AgentError;
use crate::pipeline::code_review::CodeReviewScanner;
@@ -89,12 +90,37 @@ impl PipelineOrchestrator {
            return Ok(());
        }

        // Dedup findings by fingerprint to avoid duplicate comments
        let mut seen_fps = std::collections::HashSet::new();
        let mut unique_findings: Vec<&Finding> = Vec::new();
        for finding in &pr_findings {
            let fp = compute_fingerprint(&[
                repo_id,
                &pr_number.to_string(),
                finding.file_path.as_deref().unwrap_or(""),
                &finding.line_number.unwrap_or(0).to_string(),
                &finding.title,
            ]);
            if seen_fps.insert(fp) {
                unique_findings.push(finding);
            }
        }

        let pr_findings = unique_findings;

        // Build review comments from findings
        let mut review_comments = Vec::new();
        for finding in &pr_findings {
            if let (Some(path), Some(line)) = (&finding.file_path, finding.line_number) {
                let fp = compute_fingerprint(&[
                    repo_id,
                    &pr_number.to_string(),
                    path,
                    &line.to_string(),
                    &finding.title,
                ]);
                let comment_body = format!(
                    "**[{}] {}**\n\n{}\n\n*Scanner: {} | {}*",
                    "**[{}] {}**\n\n{}\n\n*Scanner: {} | {}*\n\n<!-- compliance-fp:{fp} -->",
                    finding.severity,
                    finding.title,
                    finding.description,
@@ -123,6 +149,17 @@ impl PipelineOrchestrator {
                .join("\n"),
        );

        if review_comments.is_empty() {
            // All findings were on files/lines we can't comment on inline
            if let Err(e) = tracker
                .create_pr_review(owner, tracker_repo_name, pr_number, &summary, Vec::new())
                .await
            {
                tracing::warn!("[{repo_id}] Failed to post PR review summary: {e}");
            }
            return Ok(());
        }

        if let Err(e) = tracker
            .create_pr_review(
                owner,

@@ -98,7 +98,8 @@ impl IssueTracker for GiteaTracker {
            _ => "open",
        };

        self.http
        let resp = self
            .http
            .patch(&url)
            .header(
                "Authorization",
@@ -109,6 +110,14 @@ impl IssueTracker for GiteaTracker {
            .await
            .map_err(|e| CoreError::IssueTracker(format!("Gitea update issue failed: {e}")))?;

        if !resp.status().is_success() {
            let status = resp.status();
            let text = resp.text().await.unwrap_or_default();
            return Err(CoreError::IssueTracker(format!(
                "Gitea update issue returned {status}: {text}"
            )));
        }

        Ok(())
    }

@@ -123,7 +132,8 @@ impl IssueTracker for GiteaTracker {
            "/repos/{owner}/{repo}/issues/{external_id}/comments"
        ));

        self.http
        let resp = self
            .http
            .post(&url)
            .header(
                "Authorization",
@@ -134,6 +144,14 @@ impl IssueTracker for GiteaTracker {
            .await
            .map_err(|e| CoreError::IssueTracker(format!("Gitea add comment failed: {e}")))?;

        if !resp.status().is_success() {
            let status = resp.status();
            let text = resp.text().await.unwrap_or_default();
            return Err(CoreError::IssueTracker(format!(
                "Gitea add comment returned {status}: {text}"
            )));
        }

        Ok(())
    }

@@ -158,7 +176,8 @@ impl IssueTracker for GiteaTracker {
            })
            .collect();

        self.http
        let resp = self
            .http
            .post(&url)
            .header(
                "Authorization",
@@ -173,6 +192,48 @@ impl IssueTracker for GiteaTracker {
            .await
            .map_err(|e| CoreError::IssueTracker(format!("Gitea PR review failed: {e}")))?;

        if !resp.status().is_success() {
            let status = resp.status();
            let text = resp.text().await.unwrap_or_default();

            // If inline comments caused the failure, retry with just the summary body
            if !comments.is_empty() {
                tracing::warn!(
                    "Gitea PR review with inline comments failed ({status}): {text}, retrying as plain comment"
                );
                let fallback_url = self.api_url(&format!(
                    "/repos/{owner}/{repo}/issues/{pr_number}/comments"
                ));
                let fallback_resp = self
                    .http
                    .post(&fallback_url)
                    .header(
                        "Authorization",
                        format!("token {}", self.token.expose_secret()),
                    )
                    .json(&serde_json::json!({ "body": body }))
                    .send()
                    .await
                    .map_err(|e| {
                        CoreError::IssueTracker(format!("Gitea PR comment fallback failed: {e}"))
                    })?;

                if !fallback_resp.status().is_success() {
                    let fb_status = fallback_resp.status();
                    let fb_text = fallback_resp.text().await.unwrap_or_default();
                    return Err(CoreError::IssueTracker(format!(
                        "Gitea PR comment fallback returned {fb_status}: {fb_text}"
                    )));
                }

                return Ok(());
            }

            return Err(CoreError::IssueTracker(format!(
                "Gitea PR review returned {status}: {text}"
            )));
        }

        Ok(())
    }

@@ -3645,3 +3645,205 @@ tbody tr:last-child td {
.wizard-toggle.active .wizard-toggle-knob {
    transform: translateX(16px);
}

/* ═══════════════════════════════════════════════════════════════
   HELP CHAT WIDGET
   Floating assistant for documentation Q&A
   ═══════════════════════════════════════════════════════════════ */

.help-chat-toggle {
    position: fixed;
    bottom: 24px;
    right: 28px;
    z-index: 50;
    width: 48px;
    height: 48px;
    border-radius: 50%;
    background: var(--accent);
    color: var(--bg-primary);
    border: none;
    cursor: pointer;
    display: flex;
    align-items: center;
    justify-content: center;
    box-shadow: 0 4px 20px rgba(0, 200, 255, 0.3);
    transition: transform 0.15s, box-shadow 0.15s;
}
.help-chat-toggle:hover {
    transform: scale(1.08);
    box-shadow: 0 6px 28px rgba(0, 200, 255, 0.4);
}

.help-chat-panel {
    position: fixed;
    bottom: 24px;
    right: 28px;
    z-index: 51;
    width: 400px;
    height: 520px;
    background: var(--bg-secondary);
    border: 1px solid var(--border-bright);
    border-radius: 16px;
    display: flex;
    flex-direction: column;
    overflow: hidden;
    box-shadow: 0 12px 48px rgba(0, 0, 0, 0.5), var(--accent-glow);
}

.help-chat-header {
    display: flex;
    align-items: center;
    justify-content: space-between;
    padding: 14px 18px;
    border-bottom: 1px solid var(--border);
    background: var(--bg-primary);
}
.help-chat-title {
    display: flex;
    align-items: center;
    gap: 8px;
    font-family: 'Outfit', sans-serif;
    font-weight: 600;
    font-size: 14px;
    color: var(--text-primary);
}
.help-chat-close {
    background: none;
    border: none;
    color: var(--text-secondary);
    cursor: pointer;
    padding: 4px;
    border-radius: 6px;
    display: flex;
}
.help-chat-close:hover {
    color: var(--text-primary);
    background: var(--bg-elevated);
}

.help-chat-messages {
    flex: 1;
    overflow-y: auto;
    padding: 16px;
    display: flex;
    flex-direction: column;
    gap: 12px;
}

.help-chat-empty {
    display: flex;
    flex-direction: column;
    align-items: center;
    justify-content: center;
    height: 100%;
    text-align: center;
    color: var(--text-secondary);
    font-size: 13px;
    gap: 8px;
}
.help-chat-hint {
    font-size: 12px;
    color: var(--text-tertiary);
    font-style: italic;
}

.help-msg {
    max-width: 88%;
    animation: helpMsgIn 0.15s ease-out;
}
@keyframes helpMsgIn {
    from { opacity: 0; transform: translateY(6px); }
    to { opacity: 1; transform: translateY(0); }
}
.help-msg-user {
    align-self: flex-end;
}
.help-msg-assistant {
    align-self: flex-start;
}
.help-msg-content {
    padding: 10px 14px;
    border-radius: 12px;
    font-size: 13px;
    line-height: 1.55;
    word-wrap: break-word;
}
.help-msg-user .help-msg-content {
    background: var(--accent);
    color: var(--bg-primary);
    border-bottom-right-radius: 4px;
}
.help-msg-assistant .help-msg-content {
    background: var(--bg-elevated);
    color: var(--text-primary);
    border: 1px solid var(--border);
    border-bottom-left-radius: 4px;
}
.help-msg-assistant .help-msg-content code {
    background: rgba(0, 200, 255, 0.1);
    padding: 1px 5px;
    border-radius: 3px;
    font-family: 'JetBrains Mono', monospace;
    font-size: 12px;
}
.help-msg-loading {
    padding: 10px 14px;
    border-radius: 12px;
    background: var(--bg-elevated);
    border: 1px solid var(--border);
    border-bottom-left-radius: 4px;
    color: var(--text-secondary);
    font-size: 13px;
    animation: helpPulse 1.2s ease-in-out infinite;
}
@keyframes helpPulse {
    0%, 100% { opacity: 0.6; }
    50% { opacity: 1; }
}

.help-chat-input {
    display: flex;
    align-items: center;
    gap: 8px;
    padding: 12px 14px;
    border-top: 1px solid var(--border);
    background: var(--bg-primary);
}
.help-chat-input input {
    flex: 1;
    background: var(--bg-elevated);
    border: 1px solid var(--border);
    border-radius: 8px;
    padding: 10px 14px;
    color: var(--text-primary);
    font-size: 13px;
    font-family: 'DM Sans', sans-serif;
    outline: none;
    transition: border-color 0.15s;
}
.help-chat-input input:focus {
    border-color: var(--accent);
}
.help-chat-input input::placeholder {
    color: var(--text-tertiary);
}
.help-chat-send {
    width: 36px;
    height: 36px;
    border-radius: 8px;
    background: var(--accent);
    color: var(--bg-primary);
    border: none;
    cursor: pointer;
    display: flex;
    align-items: center;
    justify-content: center;
    transition: opacity 0.15s;
}
.help-chat-send:disabled {
    opacity: 0.4;
    cursor: not-allowed;
}
.help-chat-send:not(:disabled):hover {
    background: var(--accent-hover);
}

@@ -44,8 +44,6 @@ pub enum Route {
    PentestSessionPage { session_id: String },
    #[route("/mcp-servers")]
    McpServersPage {},
    #[route("/settings")]
    SettingsPage {},
}

const FAVICON: Asset = asset!("/assets/favicon.svg");

@@ -1,6 +1,7 @@
use dioxus::prelude::*;

use crate::app::Route;
use crate::components::help_chat::HelpChat;
use crate::components::sidebar::Sidebar;
use crate::components::toast::{ToastContainer, Toasts};
use crate::infrastructure::auth_check::check_auth;
@@ -21,6 +22,7 @@ pub fn AppShell() -> Element {
                Outlet::<Route> {}
            }
            ToastContainer {}
            HelpChat {}
        }
    }
}

compliance-dashboard/src/components/help_chat.rs (new file, 198 lines)
@@ -0,0 +1,198 @@
use dioxus::prelude::*;
use dioxus_free_icons::icons::bs_icons::*;
use dioxus_free_icons::Icon;

use crate::infrastructure::help_chat::{send_help_chat_message, HelpChatHistoryMessage};

// ── Message model ────────────────────────────────────────────────────────────

#[derive(Clone, Debug)]
struct ChatMsg {
    role: String,
    content: String,
}

// ── Component ────────────────────────────────────────────────────────────────

#[component]
pub fn HelpChat() -> Element {
    let mut is_open = use_signal(|| false);
    let mut messages = use_signal(Vec::<ChatMsg>::new);
    let mut input_text = use_signal(String::new);
    let mut is_loading = use_signal(|| false);

    // Send message handler
    let on_send = move |_| {
        let text = input_text().trim().to_string();
        if text.is_empty() || is_loading() {
            return;
        }

        // Push user message
        messages.write().push(ChatMsg {
            role: "user".into(),
            content: text.clone(),
        });
        input_text.set(String::new());
        is_loading.set(true);

        // Build history for API call (exclude last user message, it goes as `message`)
        let history: Vec<HelpChatHistoryMessage> = messages()
            .iter()
            .rev()
            .skip(1) // skip the user message we just added
            .rev()
            .map(|m| HelpChatHistoryMessage {
                role: m.role.clone(),
                content: m.content.clone(),
            })
            .collect();

        spawn(async move {
            match send_help_chat_message(text, history).await {
                Ok(resp) => {
                    messages.write().push(ChatMsg {
                        role: "assistant".into(),
                        content: resp.data.message,
                    });
                }
                Err(e) => {
                    messages.write().push(ChatMsg {
                        role: "assistant".into(),
                        content: format!("Error: {e}"),
                    });
                }
            }
            is_loading.set(false);
        });
    };

    // Key handler for Enter to send
    let on_keydown = move |e: KeyboardEvent| {
        if e.key() == Key::Enter && !e.modifiers().shift() {
            e.prevent_default();
            let text = input_text().trim().to_string();
            if text.is_empty() || is_loading() {
                return;
            }
            messages.write().push(ChatMsg {
                role: "user".into(),
                content: text.clone(),
            });
            input_text.set(String::new());
            is_loading.set(true);

            let history: Vec<HelpChatHistoryMessage> = messages()
                .iter()
                .rev()
                .skip(1)
                .rev()
                .map(|m| HelpChatHistoryMessage {
                    role: m.role.clone(),
                    content: m.content.clone(),
                })
                .collect();

            spawn(async move {
                match send_help_chat_message(text, history).await {
                    Ok(resp) => {
                        messages.write().push(ChatMsg {
                            role: "assistant".into(),
                            content: resp.data.message,
                        });
                    }
                    Err(e) => {
                        messages.write().push(ChatMsg {
                            role: "assistant".into(),
                            content: format!("Error: {e}"),
                        });
                    }
                }
                is_loading.set(false);
            });
        }
    };

    rsx! {
        // Floating toggle button
        if !is_open() {
            button {
                class: "help-chat-toggle",
                onclick: move |_| is_open.set(true),
                title: "Help",
                Icon { icon: BsQuestionCircle, width: 22, height: 22 }
            }
        }

        // Chat panel
        if is_open() {
            div { class: "help-chat-panel",
                // Header
                div { class: "help-chat-header",
                    span { class: "help-chat-title",
                        Icon { icon: BsRobot, width: 16, height: 16 }
                        "Help Assistant"
                    }
                    button {
                        class: "help-chat-close",
                        onclick: move |_| is_open.set(false),
                        Icon { icon: BsX, width: 18, height: 18 }
                    }
                }

                // Messages area
                div { class: "help-chat-messages",
                    if messages().is_empty() {
                        div { class: "help-chat-empty",
                            p { "Ask me anything about the Compliance Scanner." }
                            p { class: "help-chat-hint",
                                "e.g. \"How do I add a repository?\" or \"What is SBOM?\""
                            }
                        }
                    }
                    for (i, msg) in messages().iter().enumerate() {
                        div {
                            key: "{i}",
                            class: if msg.role == "user" { "help-msg help-msg-user" } else { "help-msg help-msg-assistant" },
                            div { class: "help-msg-content",
                                dangerous_inner_html: if msg.role == "assistant" {
                                    // Basic markdown rendering: bold, code, newlines
                                    msg.content
                                        .replace("**", "<strong>")
                                        .replace("\n\n", "<br><br>")
                                        .replace("\n- ", "<br>- ")
                                        .replace("`", "<code>")
                                } else {
                                    msg.content.clone()
                                }
                            }
                        }
                    }
                    if is_loading() {
                        div { class: "help-msg help-msg-assistant",
                            div { class: "help-msg-loading", "Thinking..." }
                        }
                    }
                }

                // Input area
                div { class: "help-chat-input",
                    input {
                        r#type: "text",
                        placeholder: "Ask a question...",
                        value: "{input_text}",
                        disabled: is_loading(),
                        oninput: move |e| input_text.set(e.value()),
                        onkeydown: on_keydown,
                    }
                    button {
                        class: "help-chat-send",
                        disabled: is_loading() || input_text().trim().is_empty(),
                        onclick: on_send,
                        Icon { icon: BsSend, width: 14, height: 14 }
                    }
                }
            }
        }
    }
}

@@ -3,6 +3,7 @@ pub mod attack_chain;
pub mod code_inspector;
pub mod code_snippet;
pub mod file_tree;
pub mod help_chat;
pub mod page_header;
pub mod pagination;
pub mod pentest_wizard;

@@ -52,11 +52,6 @@ pub fn Sidebar() -> Element {
            route: Route::PentestDashboardPage {},
            icon: rsx! { Icon { icon: BsLightningCharge, width: 18, height: 18 } },
        },
        NavItem {
            label: "Settings",
            route: Route::SettingsPage {},
            icon: rsx! { Icon { icon: BsGear, width: 18, height: 18 } },
        },
    ];

    let docs_url = option_env!("DOCS_URL").unwrap_or("/docs");

compliance-dashboard/src/infrastructure/help_chat.rs (new file, 59 lines)
@@ -0,0 +1,59 @@
use dioxus::prelude::*;
use serde::{Deserialize, Serialize};

// ── Response types ──

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct HelpChatApiResponse {
    pub data: HelpChatResponseData,
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct HelpChatResponseData {
    pub message: String,
}

// ── History message type ──

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HelpChatHistoryMessage {
    pub role: String,
    pub content: String,
}

// ── Server function ──

#[server]
pub async fn send_help_chat_message(
    message: String,
    history: Vec<HelpChatHistoryMessage>,
) -> Result<HelpChatApiResponse, ServerFnError> {
    let state: super::server_state::ServerState =
        dioxus_fullstack::FullstackContext::extract().await?;

    let url = format!("{}/api/v1/help/chat", state.agent_api_url);
    let client = reqwest::Client::builder()
        .timeout(std::time::Duration::from_secs(120))
        .build()
        .map_err(|e| ServerFnError::new(e.to_string()))?;

    let resp = client
        .post(&url)
        .json(&serde_json::json!({
            "message": message,
            "history": history,
        }))
        .send()
        .await
        .map_err(|e| ServerFnError::new(format!("Help chat request failed: {e}")))?;

    let text = resp
        .text()
        .await
        .map_err(|e| ServerFnError::new(format!("Failed to read response: {e}")))?;

    let body: HelpChatApiResponse = serde_json::from_str(&text)
        .map_err(|e| ServerFnError::new(format!("Failed to parse response: {e}")))?;

    Ok(body)
}

@@ -113,6 +113,72 @@ pub async fn add_mcp_server(
    Ok(())
}

/// Probe each MCP server's health endpoint and update status in MongoDB.
#[server]
pub async fn refresh_mcp_status() -> Result<(), ServerFnError> {
    use chrono::Utc;
    use compliance_core::models::McpServerStatus;
    use mongodb::bson::doc;

    let state: super::server_state::ServerState =
        dioxus_fullstack::FullstackContext::extract().await?;

    let mut cursor = state
        .db
        .mcp_servers()
        .find(doc! {})
        .await
        .map_err(|e| ServerFnError::new(e.to_string()))?;

    let client = reqwest::Client::builder()
        .timeout(std::time::Duration::from_secs(5))
        .build()
        .map_err(|e| ServerFnError::new(e.to_string()))?;

    while cursor
        .advance()
        .await
        .map_err(|e| ServerFnError::new(e.to_string()))?
    {
        let server: compliance_core::models::McpServerConfig = cursor
            .deserialize_current()
            .map_err(|e| ServerFnError::new(e.to_string()))?;

        let Some(oid) = server.id else { continue };

        // Derive health URL from the endpoint (replace trailing /mcp with /health)
        let health_url = if server.endpoint_url.ends_with("/mcp") {
            format!(
                "{}health",
                &server.endpoint_url[..server.endpoint_url.len() - 3]
            )
        } else {
            format!("{}/health", server.endpoint_url.trim_end_matches('/'))
        };

        let new_status = match client.get(&health_url).send().await {
            Ok(resp) if resp.status().is_success() => McpServerStatus::Running,
            _ => McpServerStatus::Stopped,
        };

        let status_bson = match bson::to_bson(&new_status) {
            Ok(b) => b,
            Err(_) => continue,
        };

        let _ = state
            .db
            .mcp_servers()
            .update_one(
                doc! { "_id": oid },
                doc! { "$set": { "status": status_bson, "updated_at": Utc::now().to_rfc3339() } },
            )
            .await;
    }

    Ok(())
}

#[server]
pub async fn delete_mcp_server(server_id: String) -> Result<(), ServerFnError> {
    use mongodb::bson::doc;

@@ -5,6 +5,7 @@ pub mod chat;
pub mod dast;
pub mod findings;
pub mod graph;
pub mod help_chat;
pub mod issues;
pub mod mcp;
pub mod pentest;

@@ -123,7 +123,6 @@ pub fn FindingsPage() -> Element {
                option { value: "oauth", "OAuth" }
                option { value: "secret_detection", "Secrets" }
                option { value: "lint", "Lint" }
                option { value: "code_review", "Code Review" }
            }
            select {
                onchange: move |e| { status_filter.set(e.value()); page.set(1); },

@@ -5,7 +5,7 @@ use dioxus_free_icons::Icon;
use crate::components::page_header::PageHeader;
use crate::components::toast::{ToastType, Toasts};
use crate::infrastructure::mcp::{
    add_mcp_server, delete_mcp_server, fetch_mcp_servers, regenerate_mcp_token,
    add_mcp_server, delete_mcp_server, fetch_mcp_servers, refresh_mcp_status, regenerate_mcp_token,
};

#[component]
@@ -22,6 +22,17 @@ pub fn McpServersPage() -> Element {
    let mut new_mongo_uri = use_signal(String::new);
    let mut new_mongo_db = use_signal(String::new);

    // Probe health of all MCP servers on page load, then refresh the list
    let mut refreshing = use_signal(|| true);
    use_effect(move || {
        spawn(async move {
            refreshing.set(true);
            let _ = refresh_mcp_status().await;
            servers.restart();
            refreshing.set(false);
        });
    });

    // Track which server's token is visible
    let mut visible_token: Signal<Option<String>> = use_signal(|| None);
    // Track which server is pending delete confirmation

@@ -16,7 +16,6 @@ pub mod pentest_dashboard;
pub mod pentest_session;
pub mod repositories;
pub mod sbom;
pub mod settings;

pub use chat::ChatPage;
pub use chat_index::ChatIndexPage;
@@ -36,4 +35,3 @@ pub use pentest_dashboard::PentestDashboardPage;
pub use pentest_session::PentestSessionPage;
pub use repositories::RepositoriesPage;
pub use sbom::SbomPage;
pub use settings::SettingsPage;

@@ -1,142 +0,0 @@
use dioxus::prelude::*;

use crate::components::page_header::PageHeader;

#[component]
pub fn SettingsPage() -> Element {
    let mut litellm_url = use_signal(|| "http://localhost:4000".to_string());
    let mut litellm_model = use_signal(|| "gpt-4o".to_string());
    let mut github_token = use_signal(String::new);
    let mut gitlab_url = use_signal(|| "https://gitlab.com".to_string());
    let mut gitlab_token = use_signal(String::new);
    let mut jira_url = use_signal(String::new);
    let mut jira_email = use_signal(String::new);
    let mut jira_token = use_signal(String::new);
    let mut jira_project = use_signal(String::new);
    let mut searxng_url = use_signal(|| "http://localhost:8888".to_string());

    rsx! {
        PageHeader {
            title: "Settings",
            description: "Configure integrations and scanning parameters",
        }

        div { class: "card",
            div { class: "card-header", "LiteLLM Configuration" }
            div { class: "form-group",
                label { "LiteLLM URL" }
                input {
                    r#type: "text",
                    value: "{litellm_url}",
                    oninput: move |e| litellm_url.set(e.value()),
                }
            }
            div { class: "form-group",
                label { "Model" }
                input {
                    r#type: "text",
                    value: "{litellm_model}",
                    oninput: move |e| litellm_model.set(e.value()),
                }
            }
        }

        div { class: "card",
            div { class: "card-header", "GitHub Integration" }
            div { class: "form-group",
                label { "Personal Access Token" }
                input {
                    r#type: "password",
                    placeholder: "ghp_...",
                    value: "{github_token}",
                    oninput: move |e| github_token.set(e.value()),
                }
            }
        }

        div { class: "card",
            div { class: "card-header", "GitLab Integration" }
            div { class: "form-group",
                label { "GitLab URL" }
                input {
                    r#type: "text",
                    value: "{gitlab_url}",
                    oninput: move |e| gitlab_url.set(e.value()),
                }
            }
            div { class: "form-group",
                label { "Access Token" }
                input {
                    r#type: "password",
                    placeholder: "glpat-...",
                    value: "{gitlab_token}",
                    oninput: move |e| gitlab_token.set(e.value()),
                }
            }
        }

        div { class: "card",
            div { class: "card-header", "Jira Integration" }
            div { class: "form-group",
                label { "Jira URL" }
                input {
                    r#type: "text",
                    placeholder: "https://your-org.atlassian.net",
                    value: "{jira_url}",
                    oninput: move |e| jira_url.set(e.value()),
                }
            }
            div { class: "form-group",
                label { "Email" }
                input {
                    r#type: "email",
                    value: "{jira_email}",
                    oninput: move |e| jira_email.set(e.value()),
                }
            }
            div { class: "form-group",
                label { "API Token" }
                input {
                    r#type: "password",
                    value: "{jira_token}",
                    oninput: move |e| jira_token.set(e.value()),
                }
            }
            div { class: "form-group",
                label { "Project Key" }
                input {
                    r#type: "text",
                    placeholder: "SEC",
                    value: "{jira_project}",
                    oninput: move |e| jira_project.set(e.value()),
                }
            }
        }

        div { class: "card",
            div { class: "card-header", "SearXNG" }
            div { class: "form-group",
                label { "SearXNG URL" }
                input {
                    r#type: "text",
                    value: "{searxng_url}",
                    oninput: move |e| searxng_url.set(e.value()),
                }
            }
        }

        div { style: "margin-top: 16px;",
            button {
                class: "btn btn-primary",
                onclick: move |_| {
                    tracing::info!("Settings save not yet implemented - settings are managed via .env");
                },
                "Save Settings"
            }
            p {
                style: "margin-top: 8px; font-size: 12px; color: var(--text-secondary);",
                "Note: Settings are currently configured via environment variables (.env file). Dashboard-based settings persistence coming soon."
            }
        }
    }
}

@@ -41,7 +41,9 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
        StreamableHttpServerConfig::default(),
    );

    let router = axum::Router::new().nest_service("/mcp", service);
    let router = axum::Router::new()
        .route("/health", axum::routing::get(|| async { "ok" }))
        .nest_service("/mcp", service);
    let listener = tokio::net::TcpListener::bind(("0.0.0.0", port)).await?;
    tracing::info!("MCP HTTP server listening on 0.0.0.0:{port}");
    axum::serve(listener, router).await?;

61 docs/features/deduplication.md Normal file
@@ -0,0 +1,61 @@

# Finding Deduplication

The Compliance Scanner automatically deduplicates findings across all scanning surfaces to prevent noise and duplicate issues.

## SAST Finding Dedup

Static analysis findings are deduplicated using SHA-256 fingerprints computed from:

- Repository ID
- Scanner rule ID (e.g., the Semgrep check ID)
- File path
- Line number

Before inserting a new finding, the pipeline checks whether a finding with the same fingerprint already exists. If it does, the finding is skipped.
## DAST / Pentest Finding Dedup

Dynamic testing findings go through two-phase deduplication:

### Phase 1: Exact Dedup

Findings with the same canonicalized title, endpoint, and HTTP method are merged. Evidence from duplicate findings is combined into a single finding, keeping the highest severity.

**Title canonicalization** handles common variations:

- Domain names and URLs are stripped from titles (e.g., "Missing HSTS header for example.com" becomes "Missing HSTS header")
- Known synonyms are resolved (e.g., "HSTS" maps to "strict-transport-security", "CSP" maps to "content-security-policy")
### Phase 2: CWE-Based Dedup

After exact dedup, findings with the same CWE and endpoint are merged. This catches cases where different tools report the same underlying issue with different titles or vulnerability types (e.g., a missing HSTS header reported as both `security_header_missing` and `tls_misconfiguration`).

The primary finding is selected by highest severity, then most evidence, then longest description. Evidence from merged findings is preserved.
### When Dedup Applies

- **At insertion time**: During a pentest session, before each finding is stored in MongoDB
- **At report export**: When generating a pentest report, all session findings are deduplicated before rendering

## PR Review Comment Dedup

PR review comments are deduplicated to prevent posting the same finding multiple times:

- Each comment includes a fingerprint computed from the repository, PR number, file path, line, and finding title
- Within a single review run, duplicate findings are skipped
- The fingerprint is embedded as an HTML comment in the review body for future cross-run dedup
## Issue Tracker Dedup

Before creating an issue in GitHub, GitLab, Jira, or Gitea, the scanner:

1. Searches for an existing issue matching the finding's fingerprint
2. Falls back to searching by issue title
3. Skips creation if a match is found
## Code Review Dedup

Multi-pass LLM code reviews (logic, security, convention, complexity) are deduplicated across passes using proximity-aware keys:

- Findings within 3 lines of each other in the same file with similar normalized titles are considered duplicates
- The finding with the highest severity is kept
- CWE information is merged from duplicates
60 docs/features/help-chat.md Normal file
@@ -0,0 +1,60 @@
# Help Chat Assistant

The Help Chat is a floating assistant available on every page of the dashboard. It answers questions about the Compliance Scanner using the project documentation as its knowledge base.

## How It Works

1. Click the **?** button in the bottom-right corner of any page
2. Type your question and press Enter
3. The assistant responds with answers grounded in the project documentation

The chat supports multi-turn conversations -- you can ask follow-up questions and the assistant will remember the context of your conversation.

## What You Can Ask

- **Getting started**: "How do I add a repository?" / "How do I trigger a scan?"
- **Features**: "What is SBOM?" / "How does the code knowledge graph work?"
- **Configuration**: "How do I set up webhooks?" / "What environment variables are needed?"
- **Scanning**: "What does the scan pipeline do?" / "How does LLM triage work?"
- **DAST & Pentesting**: "How do I run a pentest?" / "What DAST tools are available?"
- **Integrations**: "How do I connect to GitHub?" / "What is MCP?"

## Technical Details

The help chat loads all project documentation (README, guides, feature docs, reference) at startup and caches it in memory. When you ask a question, it sends your message along with the full documentation context to the LLM via LiteLLM, which generates a grounded response.
### API Endpoint

```
POST /api/v1/help/chat
Content-Type: application/json

{
  "message": "How do I add a repository?",
  "history": [
    { "role": "user", "content": "previous question" },
    { "role": "assistant", "content": "previous answer" }
  ]
}
```

### Configuration

The help chat uses the same LiteLLM configuration as other LLM features:

| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `LITELLM_URL` | LiteLLM API base URL | `http://localhost:4000` |
| `LITELLM_MODEL` | Model for chat responses | `gpt-4o` |
| `LITELLM_API_KEY` | API key (optional) | -- |

### Documentation Sources

The assistant indexes the following documentation at startup:

- `README.md` -- Project overview and quick start
- `docs/guide/` -- Getting started, repositories, findings, SBOM, scanning, issues, webhooks
- `docs/features/` -- AI Chat, DAST, Code Graph, MCP Server, Pentesting, Help Chat
- `docs/reference/` -- Glossary, tools reference

If documentation files are not found at startup (e.g., in a minimal Docker deployment), the assistant falls back to general knowledge about the project.
@@ -1,8 +1,6 @@
 # Dashboard Overview

-The Overview page is the landing page of Certifai. It gives you a high-level view of your security posture across all tracked repositories.
-
-
+The Overview page is the landing page of the Compliance Scanner. It gives you a high-level view of your security posture across all tracked repositories.

 ## Stats Cards

@@ -34,6 +32,10 @@ The overview includes quick-access cards for the AI Chat feature. Each card repr

 If you have MCP servers registered, they appear on the overview page with their status and connection details. This lets you quickly check that your MCP integrations are running. See [MCP Integration](/features/mcp-server) for details.

+## Help Chat Assistant
+
+A floating help chat button is available in the bottom-right corner of every page. Click it to ask questions about the Compliance Scanner -- how to configure repositories, understand findings, set up webhooks, or use any feature. The assistant is grounded in the project documentation and uses LiteLLM for responses.
+
 ## Recent Scan Runs

 The bottom section lists the most recent scan runs across all repositories, showing: