feat: pentest onboarding — streaming, browser automation, reports, user cleanup (#16)

Complete pentest feature overhaul: SSE streaming, session-persistent browser tool (CDP), AES-256 credential encryption, auto-screenshots in reports, code-level remediation correlation, SAST triage chunking, context window optimization, test user cleanup (Keycloak/Auth0/Okta), wizard dropdowns, attack chain improvements, architecture docs with Mermaid diagrams.

Co-authored-by: Sharang Parnerkar <parnerkarsharang@gmail.com>
Reviewed-on: #16
This commit was merged in pull request #16.
2026-03-17 20:32:20 +00:00
parent 11e1c5f438
commit c461faa2fb
57 changed files with 8844 additions and 2423 deletions


@@ -1,6 +1,7 @@
import { defineConfig } from 'vitepress'
import { withMermaid } from 'vitepress-plugin-mermaid'
export default defineConfig({
export default withMermaid(defineConfig({
title: 'Certifai',
description: 'AI-powered security compliance scanning platform',
ignoreDeadLinks: [
@@ -31,6 +32,7 @@ export default defineConfig({
{ text: 'Dashboard Overview', link: '/features/overview' },
{ text: 'DAST Scanning', link: '/features/dast' },
{ text: 'AI Pentest', link: '/features/pentest' },
{ text: 'Pentest Architecture', link: '/features/pentest-architecture' },
{ text: 'AI Chat', link: '/features/ai-chat' },
{ text: 'Code Knowledge Graph', link: '/features/graph' },
{ text: 'MCP Integration', link: '/features/mcp-server' },
@@ -51,4 +53,5 @@ export default defineConfig({
message: 'Certifai Documentation',
},
},
})
mermaid: {},
}))


@@ -0,0 +1,273 @@
# Pentest Orchestration Architecture
This document explains how the AI pentest orchestrator works under the hood — which steps use the LLM, what context is passed at each stage, and how findings are correlated back to source code.
## High-Level Flow
```mermaid
flowchart TD
subgraph Wizard["Onboarding Wizard (Dashboard)"]
W1[Step 1: Target & Scope] --> W2[Step 2: Authentication]
W2 --> W3[Step 3: Strategy & Instructions]
W3 --> W4[Step 4: Disclaimer & Confirm]
end
W4 -->|POST /sessions| API["Agent API"]
API -->|Encrypt credentials| CRYPTO["AES-256-GCM<br/>Credentials at Rest"]
API -->|Acquire semaphore| SEM["Concurrency Limiter<br/>(max 5 sessions)"]
SEM --> SPAWN["Spawn Orchestrator Task"]
SPAWN --> GATHER["Gather Repo Context"]
subgraph Context["Context Gathering (DB Queries)"]
GATHER --> SAST["SAST Findings<br/>(open/triaged, top 100)"]
GATHER --> SBOM["SBOM Entries<br/>(with known CVEs)"]
GATHER --> GRAPH["Code Knowledge Graph<br/>(entry points → source files)"]
end
SAST & SBOM & GRAPH --> PROMPT["Build System Prompt"]
subgraph LLMLoop["LLM Orchestration Loop (max 50 iterations)"]
PROMPT --> PAUSE{"Paused?"}
PAUSE -->|Yes| WAIT["Wait for resume signal"]
WAIT --> PAUSE
PAUSE -->|No| LLM["LLM Call<br/>(LiteLLM → Claude/GPT)"]
LLM -->|Content response| DONE["Session Complete"]
LLM -->|Tool calls| EXEC["Execute Tools"]
EXEC --> STORE["Store attack chain nodes<br/>+ findings in MongoDB"]
STORE --> SSE["Broadcast SSE events"]
SSE --> LLM
end
STORE --> REPORT["Report Generation"]
subgraph Report["Report Export"]
REPORT --> HTML["HTML Report Builder"]
HTML --> CORRELATE["Code-Level Correlation"]
CORRELATE --> PDF["Chrome CDP → PDF"]
PDF --> ZIP["AES-256 ZIP Archive"]
end
style LLMLoop fill:#1e293b,stroke:#3b82f6,color:#f8fafc
style Context fill:#0f172a,stroke:#16a34a,color:#f8fafc
style Report fill:#0f172a,stroke:#d97706,color:#f8fafc
```
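The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual code: the `LoopDeps` callbacks, the type names, and the synchronous call style are all assumptions made for brevity (a real implementation would be asynchronous).

```typescript
// Hypothetical types for illustration; the orchestrator's real structures are not shown here.
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmReply =
  | { kind: "content"; text: string }
  | { kind: "tool_calls"; calls: ToolCall[] };
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

const MAX_ITERATIONS = 50; // iteration cap taken from the diagram

interface LoopDeps {
  callLlm: (history: Message[]) => LlmReply; // synchronous here for brevity
  runTool: (call: ToolCall) => string;
  isPaused: () => boolean;
  waitForResume: () => void;
  broadcast: (event: string, data: string) => void;
}

function runSession(
  systemPrompt: string,
  instructions: string,
  deps: LoopDeps,
): "complete" | "max_iterations" {
  const history: Message[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: instructions },
  ];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // Pause gate: block until the session is resumed.
    while (deps.isPaused()) deps.waitForResume();

    const reply = deps.callLlm(history);
    if (reply.kind === "content") {
      // A plain content response ends the session.
      history.push({ role: "assistant", content: reply.text });
      deps.broadcast("complete", reply.text);
      return "complete";
    }

    // Tool calls: execute each one and append one tool message per call.
    history.push({ role: "assistant", content: JSON.stringify(reply.calls) });
    for (const call of reply.calls) {
      deps.broadcast("tool_start", call.name);
      history.push({ role: "tool", content: deps.runTool(call) });
      deps.broadcast("tool_complete", call.name);
    }
  }
  return "max_iterations";
}
```

Injecting the LLM and tool runners as callbacks keeps the control flow testable without a live model.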
## What the LLM Sees
The orchestrator constructs a system prompt containing all available context. Here is exactly what is passed to the LLM at the start of each session:
### System Prompt Structure
```
┌─────────────────────────────────────────────────────────┐
│ SYSTEM PROMPT │
├─────────────────────────────────────────────────────────┤
│ ## Target │
│ Name, URL, type, rate limit, destructive flag, repo ID │
│ │
│ ## Strategy │
│ Guidance text based on selected strategy │
│ │
│ ## SAST Findings (Static Analysis) │
│ Up to 20 findings with severity, file:line, CWE │
│ ← From linked repository's SAST scan │
│ │
│ ## Vulnerable Dependencies (SBOM) │
│ Up to 15 entries with package, version, CVE IDs │
│ ← From linked repository's SBOM scan │
│ │
│ ## Code Entry Points (Knowledge Graph) │
│ Up to 20 entry points with endpoint → file mapping │
│ Each linked to SAST findings in the same file │
│ ← From code knowledge graph build │
│ │
│ ## Authentication (if configured) │
│ Mode, credentials (decrypted), registration URL │
│ Verification email for plus-addressing │
│ │
│ ## Custom HTTP Headers │
│ Key-value pairs to include in all requests │
│ │
│ ## Scope Exclusions │
│ Paths the LLM must not test │
│ │
│ ## Available Tools │
│ List of all registered tool names │
│ │
│ ## Instructions │
│ Step-by-step testing methodology │
└─────────────────────────────────────────────────────────┘
```
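As a rough sketch of how the repository-context sections might be assembled with the caps listed above (20 SAST findings, 15 SBOM entries, 20 entry points): the field names, the line format, and the `buildContextSections` helper are assumptions for illustration only.

```typescript
// Hypothetical record shapes; only the section titles and caps come from the layout above.
interface SastFinding { severity: string; file: string; line: number; cwe: string }
interface SbomEntry { pkg: string; version: string; cves: string[] }
interface EntryPoint { endpoint: string; file: string }

const LIMITS = { sast: 20, sbom: 15, entryPoints: 20 };

function buildContextSections(
  sast: SastFinding[],
  sbom: SbomEntry[],
  entryPoints: EntryPoint[],
): string {
  const sections: string[] = [];
  if (sast.length > 0) {
    const lines = sast
      .slice(0, LIMITS.sast)
      .map((f) => `- [${f.severity}] ${f.file}:${f.line} (${f.cwe})`);
    sections.push(["## SAST Findings (Static Analysis)", ...lines].join("\n"));
  }
  if (sbom.length > 0) {
    const lines = sbom
      .slice(0, LIMITS.sbom)
      .map((e) => `- ${e.pkg}@${e.version}: ${e.cves.join(", ")}`);
    sections.push(["## Vulnerable Dependencies (SBOM)", ...lines].join("\n"));
  }
  if (entryPoints.length > 0) {
    const lines = entryPoints.slice(0, LIMITS.entryPoints).map((ep) => {
      // Link each entry point to SAST findings located in the same file.
      const related = sast.filter((f) => f.file === ep.file).length;
      return `- ${ep.endpoint} -> ${ep.file} (${related} SAST finding(s) in this file)`;
    });
    sections.push(["## Code Entry Points (Knowledge Graph)", ...lines].join("\n"));
  }
  return sections.join("\n\n");
}
```

Empty inputs produce no section at all, which keeps the prompt short when a target has no linked repository.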
### Per-Iteration Messages
After the system prompt, each LLM call includes the full conversation history:
| Role | Content |
|------|---------|
| `system` | System prompt (above) |
| `user` | Initial instructions or user message |
| `assistant` | LLM reasoning + tool call requests |
| `tool` | Tool execution results (one per tool call) |
| `assistant` | Next reasoning + tool calls |
| ... | Continues until LLM says "testing complete" or max 50 iterations |
## Tool Registry
The LLM can invoke any of these tools. Each tool is registered with a JSON Schema that the LLM uses for structured tool calling:
| Tool | Category | What it does |
|------|----------|-------------|
| `recon` | Recon | HTTP fingerprinting, technology detection |
| `openapi_parser` | API | Discover endpoints from OpenAPI/Swagger specs |
| `security_headers` | Headers | Check for missing security headers |
| `cookie_analyzer` | Cookies | Analyze cookie flags (Secure, HttpOnly, SameSite) |
| `csp_analyzer` | CSP | Evaluate Content-Security-Policy directives |
| `cors_checker` | CORS | Test CORS misconfiguration |
| `tls_analyzer` | TLS | Inspect TLS certificate and cipher suites |
| `dns_checker` | DNS | DNS record enumeration |
| `dmarc_checker` | Email | DMARC/SPF/DKIM verification |
| `rate_limit_tester` | Rate Limit | Test rate limiting on endpoints |
| `console_log_detector` | Logs | Find console.log leakage in JavaScript |
| `sql_injection` | SQLi | SQL injection testing with payloads |
| `xss` | XSS | Cross-site scripting testing |
| `ssrf` | SSRF | Server-side request forgery testing |
| `auth_bypass` | Auth | Authentication bypass testing |
| `api_fuzzer` | Fuzzer | API endpoint fuzzing |
| `browser` | Browser | Headless Chrome automation (navigate, click, fill, screenshot, evaluate JS) |
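A tool definition with a JSON Schema might look roughly like the sketch below. The `ToolDefinition` shape, the `url` parameter, and the `missingRequired` helper are hypothetical; only the tool name and its description come from the table above.

```typescript
// Simplified JSON Schema subset, enough to describe flat tool parameters.
interface JsonSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
}

interface ToolDefinition {
  name: string;
  description: string;
  parameters: JsonSchema;
}

// Hypothetical schema for the `security_headers` tool; the real parameters are not documented here.
const securityHeadersTool: ToolDefinition = {
  name: "security_headers",
  description: "Check for missing security headers",
  parameters: {
    type: "object",
    properties: {
      url: { type: "string", description: "Target URL (assumed parameter)" },
    },
    required: ["url"],
  },
};

// Returns the required parameters absent from an LLM-supplied argument object.
function missingRequired(tool: ToolDefinition, args: Record<string, unknown>): string[] {
  return tool.parameters.required.filter((key) => !(key in args));
}
```

Checking arguments against the schema before execution lets the orchestrator return a structured error to the LLM instead of running a tool with incomplete input.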
### Browser Tool
The `browser` tool gives the LLM full control of a headless Chrome instance via CDP (Chrome DevTools Protocol). It supports:
- **navigate** — Go to a URL, return title
- **screenshot** — Capture PNG screenshot (base64)
- **click** — Click a CSS-selected element
- **fill** — Fill a form field with a value
- **get_content** — Read full page HTML
- **evaluate** — Execute arbitrary JavaScript
The LLM uses these actions for registration page discovery, form filling, and visual inspection.
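The six actions could be modeled as a discriminated union, as in the sketch below. The argument names (`url`, `selector`, `value`, `script`) and the example registration steps are assumptions; the document only names the actions themselves.

```typescript
// One variant per action listed above; argument names are hypothetical.
type BrowserAction =
  | { action: "navigate"; url: string }
  | { action: "screenshot" }
  | { action: "click"; selector: string }
  | { action: "fill"; selector: string; value: string }
  | { action: "get_content" }
  | { action: "evaluate"; script: string };

// An illustrative sequence the LLM might request when registering a test account.
const registrationFlow: BrowserAction[] = [
  { action: "navigate", url: "https://app.example.com/register" },
  { action: "fill", selector: "#email", value: "tester+abc123@example.com" },
  { action: "click", selector: "#submit" },
  { action: "screenshot" },
];

// Produces a short log line for each step.
function describeStep(step: BrowserAction): string {
  switch (step.action) {
    case "navigate":
      return `navigate ${step.url}`;
    case "click":
      return `click ${step.selector}`;
    case "fill":
      return `fill ${step.selector}`;
    case "evaluate":
      return `evaluate (${step.script.length} chars)`;
    case "screenshot":
      return "screenshot";
    case "get_content":
      return "get_content";
  }
}
```

Because the browser is session-persistent, each step in such a sequence would operate on the page state left by the previous one.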
## Session Lifecycle
```mermaid
stateDiagram-v2
[*] --> Running : POST /sessions
Running --> Paused : POST /sessions/{id}/pause
Paused --> Running : POST /sessions/{id}/resume
Running --> Completed : LLM says "testing complete"
Running --> Failed : Error or timeout
Paused --> Failed : POST /sessions/{id}/stop
Running --> Failed : POST /sessions/{id}/stop
Completed --> [*]
Failed --> [*]
```
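The state diagram translates directly into a transition table. The sketch below is illustrative: the action names (`pause`, `resume`, `stop`, `llm_complete`, `error`) are labels chosen here, not identifiers from the codebase.

```typescript
type Status = "running" | "paused" | "completed" | "failed";
type Action = "pause" | "resume" | "stop" | "llm_complete" | "error";

// Allowed transitions, mirroring the state diagram above.
const TRANSITIONS: Record<Status, Partial<Record<Action, Status>>> = {
  running: { pause: "paused", stop: "failed", llm_complete: "completed", error: "failed" },
  paused: { resume: "running", stop: "failed" },
  completed: {}, // terminal
  failed: {}, // terminal
};

// Returns the next status, or null if the action is not valid in the current state.
function transition(status: Status, action: Action): Status | null {
  return TRANSITIONS[status][action] ?? null;
}
```

Note that a user-initiated stop lands in `failed` rather than `completed`, matching the diagram.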
## SSE Streaming
Each session has a dedicated broadcast channel. The `/sessions/{id}/stream` endpoint:
1. **Replays** stored messages and attack chain nodes as an initial burst
2. **Subscribes** to the live broadcast for real-time events
3. **Sends keepalive** comments every 15 seconds
Event types:
| Event | When |
|-------|------|
| `tool_start` | LLM requests a tool execution |
| `tool_complete` | Tool finishes with summary + finding count |
| `finding` | New vulnerability discovered |
| `message` | LLM sends a text message |
| `paused` | Session paused |
| `resumed` | Session resumed |
| `complete` | Session finished |
| `error` | Session failed |
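On the wire, each event is a standard server-sent events frame, and a keepalive is a comment line beginning with a colon. The sketch below shows the framing only; the payload fields and the `keepalive` comment text are assumptions, since the document does not specify them.

```typescript
type PentestEvent =
  | "tool_start"
  | "tool_complete"
  | "finding"
  | "message"
  | "paused"
  | "resumed"
  | "complete"
  | "error";

// Serializes one event as an SSE frame: an `event:` line, a `data:` line, then a blank line.
// JSON.stringify never emits raw newlines, so a single `data:` line is sufficient.
function formatSseEvent(event: PentestEvent, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// SSE comments start with ":" and are ignored by clients; they only keep the connection open.
function formatKeepalive(): string {
  return ": keepalive\n\n";
}
```

A browser client would typically consume such a stream with `EventSource` and register one listener per event name.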
## Code-Level Correlation in Reports
When a DAST finding is linked to source code, the report includes a **Code-Level Remediation** section showing exactly what to fix.
### Correlation Channels
```mermaid
flowchart LR
DAST["DAST Finding"]
DAST -->|linked_sast_finding_id| SAST["SAST Finding"]
SAST --> CODE["file:line + code snippet<br/>+ suggested fix"]
DAST -->|endpoint match| GRAPH["Code Knowledge Graph"]
GRAPH --> ENTRY["Handler function + file<br/>+ known vulns in file"]
DAST -->|linked CVE| SBOM["SBOM Entry"]
SBOM --> DEP["Package + version<br/>+ upgrade recommendation"]
style CODE fill:#dc2626,color:#fff
style ENTRY fill:#3b82f6,color:#fff
style DEP fill:#d97706,color:#fff
```
| Channel | Priority | What it shows |
|---------|----------|---------------|
| **SAST Correlation** | 1 (direct link) | Exact file:line, vulnerable code snippet (red), suggested fix (green), scanner rule, CWE |
| **Code Entry Point** | 2 (endpoint match) | Handler function, source file, all SAST issues in that file |
| **Vulnerable Dependency** | 3 (CVE match) | Package name + version, CVE IDs, PURL, upgrade guidance |
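The three channels can be sketched as a lookup that collects matches in priority order. All record shapes and the `correlate` function are hypothetical, and returning every match (rather than only the highest-priority one) is an assumption about how the report uses the priorities.

```typescript
// Hypothetical record shapes for illustration.
interface DastFinding { title: string; endpoint: string; linkedSastFindingId?: string; linkedCve?: string }
interface SastFinding { id: string; file: string; line: number }
interface EntryPoint { endpoint: string; handler: string; file: string }
interface SbomEntry { pkg: string; version: string; cves: string[] }

type Correlation =
  | { priority: 1; channel: "sast"; location: string }
  | { priority: 2; channel: "entry_point"; location: string }
  | { priority: 3; channel: "dependency"; location: string };

function correlate(
  finding: DastFinding,
  sast: SastFinding[],
  entryPoints: EntryPoint[],
  sbom: SbomEntry[],
): Correlation[] {
  const out: Correlation[] = [];

  // Priority 1: direct link to a SAST finding.
  const direct = sast.find((s) => s.id === finding.linkedSastFindingId);
  if (direct) out.push({ priority: 1, channel: "sast", location: `${direct.file}:${direct.line}` });

  // Priority 2: endpoint match in the code knowledge graph.
  const entry = entryPoints.find((e) => e.endpoint === finding.endpoint);
  if (entry) out.push({ priority: 2, channel: "entry_point", location: `${entry.handler} in ${entry.file}` });

  // Priority 3: CVE match against the SBOM.
  const cve = finding.linkedCve;
  const dep = cve === undefined ? undefined : sbom.find((d) => d.cves.includes(cve));
  if (dep) out.push({ priority: 3, channel: "dependency", location: `${dep.pkg}@${dep.version}` });

  return out;
}
```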
### Example Report Finding
A finding like "Reflected XSS in /api/search" would show:
1. The DAST evidence (request, response, payload)
2. **SAST Correlation**: `src/routes/search.rs:42` — semgrep found unescaped user input
3. **Code snippet**: The vulnerable line highlighted in red
4. **Suggested fix**: The patched code in green
5. **Recommendation**: Framework-specific guidance
## Screenshots
### Pentest Dashboard
![Pentest Dashboard](/screenshots/pentest-dashboard.png)
The dashboard shows aggregate statistics, severity distribution, and recent sessions with status badges. Running sessions can be paused, resumed, or stopped.
### Onboarding Wizard
**Step 1 — Target & Scope** (with dropdown showing existing DAST targets):
![Wizard Step 1 — Target dropdown](/screenshots/pentest-wizard-step1-dropdown.png)
**Step 2 — Authentication** (Auto-Register mode with optional registration URL, verification email, IMAP settings):
![Wizard Step 2 — Auth](/screenshots/pentest-wizard-step2-auth.png)
**Step 3 — Strategy & Instructions** (strategy selection, scope exclusions, duration, tester info):
![Wizard Step 3 — Strategy](/screenshots/pentest-wizard-step3-strategy.png)
**Step 4 — Disclaimer & Confirm** (summary + authorization disclaimer):
![Wizard Step 4 — Confirm](/screenshots/pentest-wizard-step4-confirm.png)
### Session — Findings
![Session Findings](/screenshots/pentest-session-findings.png)
Each finding shows severity, CWE, endpoint, description, and remediation. Exploitable findings are flagged. SAST correlations are shown when available.
### Session — Attack Chain
![Attack Chain](/screenshots/pentest-attack-chain.png)
The attack chain visualizes the DAG of tool executions grouped into phases (Reconnaissance, Analysis, Boundary Testing, Exploitation). Each node shows tool name, category, duration, findings count, and risk score. Running nodes pulse with an animation.
## Concurrency & Security
- **Max 5 concurrent sessions** via `tokio::Semaphore` — returns HTTP 429 when exhausted
- **Credentials encrypted at rest** with AES-256-GCM (key from `PENTEST_ENCRYPTION_KEY` env var)
- **Credentials redacted** in all API responses (replaced with `********`)
- **Credentials decrypted only** when building the LLM prompt (in-memory, never logged)
- **Report archives** are AES-256 encrypted ZIPs with SHA-256 integrity checksums
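The first and third points can be illustrated with a small sketch. The agent itself uses `tokio::Semaphore`; the TypeScript below only mirrors the try-acquire behaviour, and the class name, helper names, and the 201 success status are assumptions.

```typescript
// Non-blocking permit counter: acquisition fails immediately when all permits are taken.
class SessionLimiter {
  private active = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  tryAcquire(): boolean {
    if (this.active >= this.max) return false;
    this.active++;
    return true;
  }

  release(): void {
    if (this.active > 0) this.active--;
  }
}

// Maps the acquisition result to an HTTP status; 201 on success is an assumption.
function startSessionStatus(limiter: SessionLimiter): number {
  return limiter.tryAcquire() ? 201 : 429;
}

// Replaces every credential value with a fixed mask before it leaves the API.
function redactCredentials(creds: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const key of Object.keys(creds)) redacted[key] = "********";
  return redacted;
}
```

Failing fast with 429 instead of queueing keeps a burst of session requests from piling up behind long-running pentests.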


@@ -15,17 +15,47 @@ The dashboard shows:
## Starting a Pentest Session
1. Click **New Pentest** on the dashboard
2. Select a **DAST target** (must be configured under DAST > Targets first)
3. Choose a **strategy**:
Click **New Pentest** on the dashboard to open the 4-step onboarding wizard:
### Step 1 — Target & Scope
- **App URL** — enter manually or select from existing DAST targets (dropdown)
- **Git Repository URL** — enter manually or select from tracked repositories (dropdown). If an SSH URL is selected, the deploy key is displayed for easy copying
- **Branch / Commit** — auto-populated when you click **Lookup** for a tracked repo
- **App Type** and **Rate Limit**
### Step 2 — Authentication
- **None** — unauthenticated testing
- **Manual Credentials** — provide username/password (encrypted at rest with AES-256-GCM)
- **Auto-Register** — the orchestrator uses the browser tool (headless Chrome) to discover the registration page and create a test account:
- **Registration URL** (optional) — auto-discovered by the browser tool if omitted
- **Verification Email** (optional) — override the agent's default mailbox. Uses plus-addressing (`base+sessionid@domain`) and polls IMAP for verification links
- **IMAP Settings** — collapsible section to override host/port/credentials
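The plus-addressing scheme (`base+sessionid@domain`) can be sketched as below. The `plusAddress` helper and its validation are illustrative assumptions, not the agent's implementation.

```typescript
// Builds a per-session address by inserting "+<sessionId>" before the "@".
function plusAddress(baseEmail: string, sessionId: string): string {
  const at = baseEmail.lastIndexOf("@");
  if (at <= 0 || at === baseEmail.length - 1) {
    throw new Error(`not a valid email address: ${baseEmail}`);
  }
  const local = baseEmail.slice(0, at);
  const domain = baseEmail.slice(at + 1);
  return `${local}+${sessionId}@${domain}`;
}
```

Because many mail providers deliver plus-addressed mail to the base mailbox, one IMAP account can serve many sessions while the tag identifies which session a verification link belongs to.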
### Step 3 — Strategy & Instructions
| Strategy | Description |
|----------|-------------|
| **Comprehensive** | Full-spectrum test covering recon, API analysis, injection testing, auth checks, and more |
| **Focused** | Targets specific vulnerability categories based on initial reconnaissance |
| **Quick** | Focus on common/high-impact vulnerabilities with minimal tool invocations |
| **Targeted** | SAST-guided — prioritize areas where static analysis found issues |
| **Aggressive** | Maximum payloads, attempt full exploitation |
| **Stealth** | Minimal noise, passive analysis, targeted probes |
4. Optionally provide an initial **message** to guide the AI's focus
5. Click **Start** to begin the session
- **Initial Instructions** — free-text guidance for the AI
- **Scope Exclusions** — paths to skip
- **Max Duration**, **Tester Name/Email**, **Destructive Tests** toggle
### Step 4 — Disclaimer & Confirm
Review the configuration summary and accept the authorization disclaimer.
The wizard can be closed at any time via the **X** button (top-right corner) or by clicking outside the modal.
::: tip Architecture Deep-Dive
See [Pentest Orchestration Architecture](./pentest-architecture.md) for details on how the LLM loop works, what context is passed, and how findings are correlated to source code.
:::
The AI orchestrator will autonomously select and execute security tools in phases, using the output of each phase to inform the next.
@@ -65,9 +95,11 @@ A visual DAG (directed acyclic graph) showing the sequence of tools executed dur
- **Finding badges** — red badge showing the number of findings produced by each tool
- **Interactive** — hover for details, click to select, scroll to zoom, drag to pan
### Stopping a Session
### Pausing, Resuming & Stopping
Running sessions can be stopped from the dashboard by clicking the **Stop** button on the session card. This immediately halts all tool execution.
- **Pause** — click the **Pause** button on a running session to suspend the orchestrator loop. The session status changes to `paused` and the LLM stops iterating. SSE clients receive a `paused` event.
- **Resume** — click **Resume** on a paused session to continue from where it left off. The status returns to `running` and a `resumed` event is broadcast.
- **Stop** — click **Stop** to permanently halt the session. This marks it as `failed` with reason "Stopped by user".
## Exporting Reports

1471
docs/package-lock.json generated

File diff suppressed because it is too large


@@ -8,5 +8,9 @@
},
"devDependencies": {
"vitepress": "^1.6.4"
},
"dependencies": {
"mermaid": "^11.13.0",
"vitepress-plugin-mermaid": "^2.0.17"
}
}

8 binary files not shown (image previews). Sizes: 79 KiB, 99 KiB, 110 KiB, 90 KiB, 85 KiB, 101 KiB, 93 KiB, 105 KiB.