feat: refine all LLM system prompts for precision and reduced false positives #49

Merged
sharang merged 3 commits from feat/refine-llm-prompts into main 2026-03-30 07:11:17 +00:00

3 Commits

Author SHA1 Message Date
Sharang Parnerkar
5c376aff9e chore: remove one-pager files from branch
All checks were successful
CI / Check (pull_request) Successful in 10m11s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
2026-03-29 23:17:28 +02:00
Sharang Parnerkar
b58f7e47df feat: add multi-language idiom awareness to all LLM review prompts
Some checks failed
CI / Detect Changes (pull_request) Has been cancelled
CI / Deploy Agent (pull_request) Has been cancelled
CI / Deploy Dashboard (pull_request) Has been cancelled
CI / Deploy Docs (pull_request) Has been cancelled
CI / Deploy MCP (pull_request) Has been cancelled
CI / Check (pull_request) Has been cancelled
Add language-specific false positive suppression for Python, Go, Java,
Kotlin, Ruby, PHP, and C/C++ across all review passes (logic, security,
convention) and triage. Each prompt now lists common idiomatic patterns
per language that should not be flagged.

Also adds language-specific fix guidance so suggested code fixes use
each language's canonical secure coding patterns (e.g. parameterized
queries, secure random, HTML escaping).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:17:19 +02:00
Sharang Parnerkar
da4084ee78 feat: refine all LLM system prompts for precision and reduced false positives
Some checks failed
CI / Check (pull_request) Successful in 10m8s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been cancelled
CI / Deploy Dashboard (pull_request) Has been cancelled
CI / Deploy Docs (pull_request) Has been cancelled
CI / Deploy MCP (pull_request) Has been cancelled
Code review prompts (review_prompts.rs):
- Add explicit "Do NOT report" sections listing common false positive patterns
- Add language-specific guidance (Rust short-circuit, shadowing, clone patterns)
- Cap findings per pass (3 for conventions, 2 for complexity) to reduce noise
- Raise complexity thresholds (80 lines, 5+ nesting) to pragmatic levels
- Require concrete bug scenarios, not theoretical concerns
- Separate severity guides per pass with clear definitions
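
The caps and thresholds above are written as instructions to the model. As an illustration only, the same limits could also be enforced after the model responds. This is a minimal sketch, not code from review_prompts.rs: `ReviewPass`, `Finding`, the `confidence` field, and all function names are hypothetical. Leaving the logic and security passes uncapped is an assumption, as is reading "80 lines" as "more than 80".

```rust
// Hypothetical types for illustration; not taken from review_prompts.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReviewPass {
    Logic,
    Security,
    Convention,
    Complexity,
}

#[derive(Debug, Clone, PartialEq)]
struct Finding {
    title: String,
    // Assumed field: a 0.0..=1.0 score used only to rank findings here.
    confidence: f64,
}

/// Per-pass cap from the commit message: 3 for conventions, 2 for complexity.
/// Other passes are left uncapped (an assumption of this sketch).
fn max_findings(pass: ReviewPass) -> Option<usize> {
    match pass {
        ReviewPass::Convention => Some(3),
        ReviewPass::Complexity => Some(2),
        ReviewPass::Logic | ReviewPass::Security => None,
    }
}

/// Keep the highest-confidence findings, up to the cap for this pass.
fn cap_findings(pass: ReviewPass, mut findings: Vec<Finding>) -> Vec<Finding> {
    // Sort descending by confidence; total_cmp gives a total order on f64.
    findings.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    if let Some(limit) = max_findings(pass) {
        findings.truncate(limit);
    }
    findings
}

/// Thresholds from the commit message: 80 lines, 5+ nesting levels.
/// Treating 80 as an exclusive bound is an assumption.
fn exceeds_complexity_threshold(lines: usize, nesting: usize) -> bool {
    lines > 80 || nesting >= 5
}
```

Enforcing the cap in code as well as in the prompt would keep the output bounded even if the model ignores the instruction; whether the project does this is not stated in the commit.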

Triage prompt (triage.rs):
- Add explicit dismiss criteria for language idioms, non-security hash usage,
  operational logging, and duplicate findings
- Add confirm-only-when criteria requiring concrete exploit scenarios
- Refine confidence scoring guide with clear thresholds

Finding descriptions (descriptions.rs):
- Rewrite to be developer-facing: lead with what/where, skip filler
- Require fix suggestions to show the corrected code, not the vulnerable code
- Remove generic "could lead to" phrasing in favor of specific scenarios

Code fix suggestions (fixes.rs):
- Require drop-in replacement code preserving original style
- Handle false positives by returning original code with explanation
- Limit inline comments to the changed line only

Pentest orchestrator (prompt_builder.rs):
- Add "Finding Quality Rules" section preventing duplicate findings
- Instruct grouping related findings (e.g. missing headers = one finding)
- Cap missing header severity at medium unless exploit demonstrated
- Mark console.log in vendored/minified JS as informational only
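
The grouping and severity-cap rules for missing headers can be pictured as a small post-processing step. This is a sketch under stated assumptions, not the contents of prompt_builder.rs: `Severity`, `HeaderFinding`, `GroupedFinding`, and `group_missing_headers` are hypothetical names, and the five-level severity scale is assumed.

```rust
// Hypothetical severity scale; the derived ordering runs Info < ... < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

// Hypothetical shape of a single "missing header" finding.
#[derive(Debug, Clone, PartialEq)]
struct HeaderFinding {
    header: String,
    severity: Severity,
    exploit_demonstrated: bool,
}

// Hypothetical shape of the merged finding.
#[derive(Debug, Clone, PartialEq)]
struct GroupedFinding {
    title: String,
    headers: Vec<String>,
    severity: Severity,
}

/// Merge all missing-header findings into one finding.
/// Severity is capped at Medium unless some finding demonstrated an exploit.
fn group_missing_headers(findings: &[HeaderFinding]) -> Option<GroupedFinding> {
    // `max()` returns None for an empty slice, so no findings means no group.
    let worst = findings.iter().map(|f| f.severity).max()?;
    let exploit = findings.iter().any(|f| f.exploit_demonstrated);
    let severity = if exploit { worst } else { worst.min(Severity::Medium) };

    // Collect distinct header names in a stable order.
    let mut headers: Vec<String> = findings.iter().map(|f| f.header.clone()).collect();
    headers.sort();
    headers.dedup();

    Some(GroupedFinding {
        title: format!("Missing security headers ({})", headers.len()),
        headers,
        severity,
    })
}
```

With two header findings rated High and Low and no demonstrated exploit, this yields a single finding at Medium severity.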

RAG chat (chat.rs):
- Add concise rules for referencing files/lines and security context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 22:57:37 +02:00