sharang/compliance-scanner-agent

Sharang Parnerkar 4d7efea683

docs: update README and add help-chat, deduplication docs

README.md:
- Add DAST, pentesting, code graph, AI chat, MCP, help chat to features table
- Add Gitea to tracker list, multi-language LLM triage note
- Update architecture diagram with all 5 workspace crates
- Add new API endpoints (graph, DAST, chat, help, pentest)
- Update dashboard pages table (remove Settings, add 6 new pages)
- Update project structure with new directories
- Add Keycloak, Chromium to external services

New docs:
- docs/features/help-chat.md — Help chat assistant usage, API, config
- docs/features/deduplication.md — Finding dedup across SAST, DAST, PR, issues

Updated:
- docs/features/overview.md — Add help chat section, update tracker list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-30 09:49:11 +02:00

Finding Deduplication

The Compliance Scanner automatically deduplicates findings across all scanning surfaces to prevent noise and duplicate issues.

SAST Finding Dedup

Static analysis findings are deduplicated using SHA-256 fingerprints computed from:

- Repository ID
- Scanner rule ID (e.g., Semgrep check ID)
- File path
- Line number

Before inserting a new finding, the pipeline checks if a finding with the same fingerprint already exists. If it does, the finding is skipped.
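The check-then-insert flow above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: the `|` separator, the field order, and the helper names `sast_fingerprint` and `insert_if_new` are assumptions, and a plain dictionary stands in for the findings store.

```python
import hashlib


def sast_fingerprint(repo_id: str, rule_id: str, file_path: str, line: int) -> str:
    """Return a SHA-256 hex digest over the four identifying fields."""
    # Assumption: fields are joined with "|" in this order before hashing.
    material = "|".join([repo_id, rule_id, file_path, str(line)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def insert_if_new(store: dict, finding: dict) -> bool:
    """Insert the finding unless one with the same fingerprint already exists."""
    fp = sast_fingerprint(
        finding["repo_id"], finding["rule_id"], finding["file_path"], finding["line"]
    )
    if fp in store:
        return False  # duplicate: skipped
    store[fp] = finding
    return True
```

Because the line number is part of the fingerprint, the same rule firing on a different line of the same file counts as a separate finding in this sketch.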

DAST / Pentest Finding Dedup

Dynamic testing findings go through two-phase deduplication:

Phase 1: Exact Dedup

Findings with the same canonicalized title, endpoint, and HTTP method are merged. Evidence from duplicate findings is combined into a single finding, keeping the highest severity.

Title canonicalization handles common variations:

- Domain names and URLs are stripped from titles (e.g., "Missing HSTS header for example.com" becomes "Missing HSTS header")
- Known synonyms are resolved (e.g., "HSTS" maps to "strict-transport-security", "CSP" maps to "content-security-policy")
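A rough sketch of Phase 1, assuming findings are dictionaries with `title`, `endpoint`, `method`, `severity`, and `evidence` keys. The regular expressions, the severity names, and the trailing-preposition cleanup are illustrative assumptions; only the two synonym pairs come from the description above.

```python
import re

# Only these two pairs are named in the docs; a real table would likely be longer.
SYNONYMS = {
    "hsts": "strict-transport-security",
    "csp": "content-security-policy",
}
# Assumed severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

URL_RE = re.compile(r"https?://\S+")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b")
TRAILING_PREP_RE = re.compile(r"\s+(?:for|on|at|in)$")


def canonicalize_title(title: str) -> str:
    """Lowercase, strip URLs and domain names, then resolve known synonyms."""
    t = title.lower()
    t = URL_RE.sub("", t)
    t = DOMAIN_RE.sub("", t)
    t = " ".join(t.split())          # collapse whitespace left by the removals
    t = TRAILING_PREP_RE.sub("", t)  # drop a dangling "for", "on", "at", "in"
    return " ".join(SYNONYMS.get(word, word) for word in t.split())


def exact_dedup(findings: list) -> list:
    """Merge findings that share canonical title, endpoint, and HTTP method."""
    merged = {}
    for f in findings:
        key = (canonicalize_title(f["title"]), f["endpoint"], f["method"].upper())
        if key not in merged:
            merged[key] = {**f, "evidence": list(f["evidence"])}
            continue
        kept = merged[key]
        kept["evidence"].extend(f["evidence"])  # combine evidence
        if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[kept["severity"]]:
            kept["severity"] = f["severity"]    # keep the highest severity
    return list(merged.values())
```

With this sketch, "Missing HSTS header for example.com" and "Missing Strict-Transport-Security header" canonicalize to the same string, so two findings with those titles on the same endpoint and method collapse into one.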

Phase 2: CWE-Based Dedup

After exact dedup, findings with the same CWE and endpoint are merged. This catches cases where different tools report the same underlying issue with different titles or vulnerability types (e.g., a missing HSTS header reported as both security_header_missing and tls_misconfiguration).

The primary finding is selected by highest severity, then most evidence, then longest description. Evidence from merged findings is preserved.
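The grouping and primary-selection rules can be expressed as a single sort key. The sketch below assumes dictionary findings with `cwe`, `endpoint`, `severity`, `evidence`, and `description` keys and an assumed severity scale; leaving findings without a CWE unmerged is also an assumption, since the text above does not say how they are handled.

```python
from collections import defaultdict

# Assumed severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def primary_key(finding: dict) -> tuple:
    """Order by severity, then evidence count, then description length."""
    return (
        SEVERITY_RANK[finding["severity"]],
        len(finding["evidence"]),
        len(finding["description"]),
    )


def cwe_dedup(findings: list) -> list:
    """Merge findings that share a CWE and an endpoint."""
    groups = defaultdict(list)
    passthrough = []
    for f in findings:
        if f.get("cwe") is None:
            passthrough.append(f)  # assumption: no CWE means no CWE-based merge
        else:
            groups[(f["cwe"], f["endpoint"])].append(f)

    result = []
    for group in groups.values():
        primary = max(group, key=primary_key)
        # Preserve evidence from every finding in the group, in arrival order.
        evidence = [item for f in group for item in f["evidence"]]
        result.append({**primary, "evidence": evidence})
    return result + passthrough
```

Comparing tuples gives the tie-breaking order for free: evidence count only matters when severities are equal, and description length only when both are equal.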

When Dedup Applies

- At insertion time: during a pentest session, before each finding is stored in MongoDB
- At report export: when generating a pentest report, all session findings are deduplicated before rendering

PR Review Comment Dedup

PR review comments are deduplicated to prevent posting the same finding multiple times:

- Each comment includes a fingerprint computed from the repository, PR number, file path, line, and finding title
- Within a single review run, duplicate findings are skipped
- The fingerprint is embedded as an HTML comment in the review body for future cross-run dedup
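One way to picture the embedded fingerprint is shown below. The hash algorithm (SHA-256), the `finding-fingerprint:` marker format, and all function names are assumptions made for illustration. The `existing_bodies` parameter shows how a later run could read the markers back; that cross-run step is hypothetical here, since the text above describes it as future work.

```python
import hashlib
import re

# Hypothetical marker format; the real HTML comment may look different.
MARKER_RE = re.compile(r"<!-- finding-fingerprint:([0-9a-f]{64}) -->")


def comment_fingerprint(repo: str, pr_number: int, file_path: str, line: int, title: str) -> str:
    """Hash the five identifying fields (SHA-256 is an assumption)."""
    material = "|".join([repo, str(pr_number), file_path, str(line), title])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def render_comment(body: str, fingerprint: str) -> str:
    """Append the fingerprint as an HTML comment, invisible once rendered."""
    return f"{body}\n\n<!-- finding-fingerprint:{fingerprint} -->"


def extract_fingerprints(existing_bodies) -> set:
    """Collect fingerprints embedded in previously posted comment bodies."""
    found = set()
    for body in existing_bodies:
        found.update(MARKER_RE.findall(body))
    return found


def comments_to_post(repo: str, pr_number: int, findings: list, existing_bodies=()) -> list:
    """Return comment bodies for findings not already seen."""
    seen = extract_fingerprints(existing_bodies)
    out = []
    for f in findings:
        fp = comment_fingerprint(repo, pr_number, f["file_path"], f["line"], f["title"])
        if fp in seen:
            continue  # duplicate within this run, or already posted earlier
        seen.add(fp)
        out.append(render_comment(f["title"], fp))
    return out
```

Including the PR number in the fingerprint means the same finding can still be reported on a different pull request.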

Issue Tracker Dedup

Before creating an issue in GitHub, GitLab, Jira, or Gitea, the scanner:

- Searches for an existing issue matching the finding's fingerprint
- Falls back to searching by issue title
- Skips creation if a match is found
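The lookup order above can be sketched against a stand-in tracker. `InMemoryTracker`, its `search` and `create` methods, and the `Fingerprint:` line in the issue body are all hypothetical; the real GitHub, GitLab, Jira, and Gitea integrations will each use their own client and search API.

```python
from typing import Optional


class InMemoryTracker:
    """Hypothetical stand-in for an issue tracker client."""

    def __init__(self) -> None:
        self.issues = []

    def search(self, query: str) -> list:
        """Return issues whose title or body contains the query text."""
        return [i for i in self.issues if query in i["title"] or query in i["body"]]

    def create(self, title: str, body: str) -> dict:
        issue = {"id": len(self.issues) + 1, "title": title, "body": body}
        self.issues.append(issue)
        return issue


def create_issue_if_new(tracker: InMemoryTracker, finding: dict) -> Optional[dict]:
    """Search by fingerprint, then by title; create only if nothing matches."""
    matches = tracker.search(finding["fingerprint"])
    if not matches:
        matches = tracker.search(finding["title"])  # fallback: title search
    if matches:
        return None  # an existing issue covers this finding
    # Assumption: the fingerprint is written into the body so later searches find it.
    body = f"{finding['description']}\n\nFingerprint: {finding['fingerprint']}"
    return tracker.create(finding["title"], body)
```

The title fallback covers issues that have no stored fingerprint, for example ones created by hand.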

Code Review Dedup

Multi-pass LLM code reviews (logic, security, convention, complexity) are deduplicated across passes using proximity-aware keys:

- Findings within 3 lines of each other in the same file with similar normalized titles are considered duplicates
- The finding with the highest severity is kept
- CWE information is merged from duplicates
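A simplified sketch of the cross-pass rule follows. It compares findings pairwise instead of building proximity-aware keys, and it treats titles as "similar" only when they are equal after normalization; both are simplifications of whatever the scanner does. The severity scale and field names are assumptions.

```python
import re

# Assumed severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}
PROXIMITY_LINES = 3


def normalize_title(title: str) -> str:
    """Lowercase and replace punctuation runs with single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def is_duplicate(a: dict, b: dict) -> bool:
    """Same file, within 3 lines, and equal normalized titles (simplified)."""
    return (
        a["file_path"] == b["file_path"]
        and abs(a["line"] - b["line"]) <= PROXIMITY_LINES
        and normalize_title(a["title"]) == normalize_title(b["title"])
    )


def dedup_review_findings(findings: list) -> list:
    """Collapse duplicates across review passes, keeping the highest severity."""
    kept = []
    for f in findings:
        match = next((k for k in kept if is_duplicate(k, f)), None)
        if match is None:
            kept.append(dict(f))
            continue
        # Merge CWE information: keep whichever side has one.
        cwe = match.get("cwe") or f.get("cwe")
        if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[match["severity"]]:
            match.update(f)  # the higher-severity finding replaces the kept one
        match["cwe"] = cwe
    return kept
```

For example, a low-severity "Unchecked unwrap()" from the logic pass on line 10 and a high-severity "unchecked unwrap" from the security pass on line 12 of the same file would collapse into the high-severity finding, carrying over any CWE that either pass supplied.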
