feat: add multi-language idiom awareness to all LLM review prompts

Add language-specific false-positive suppression for Python, Go, Java, Kotlin, Ruby, PHP, and C/C++ across all review passes (logic, security, convention) and triage. Each prompt now lists common idiomatic patterns per language that should not be flagged.

Also add language-specific fix guidance so suggested code fixes use each language's canonical secure coding patterns (e.g. parameterized queries, secure random, HTML escaping).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:17:19 +02:00
parent da4084ee78
commit b58f7e47df
8 changed files with 890 additions and 6 deletions
@@ -11,9 +11,18 @@ Rules:
 - The fix must be a drop-in replacement for the vulnerable code
 - Preserve the original code's style, indentation, and naming conventions
 - Add at most one brief inline comment on the changed line explaining the security fix
-- If the fix requires importing a new module, include the import statement on a separate line prefixed with "// Add import: "
+- If the fix requires importing a new module, include the import on a separate line prefixed with the language's comment syntax + "Add import: "
 - Do not refactor, rename variables, or "improve" unrelated code
-- If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment "// No fix needed: <reason>""#;
+- If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment explaining why no fix is needed
+
+Language-specific fix guidance:
+- Rust: use `?` for error propagation, prefer `SecretString` for secrets, use parameterized queries with `sqlx`/`diesel`
+- Python: use parameterized queries (never f-strings in SQL), use `secrets` module not `random`, use `subprocess.run([...])` list form, use `markupsafe.escape()` for HTML
+- Go: use `db.Query`/`db.QueryContext` (`database/sql`) with `$1`/`?` placeholders, use `crypto/rand` not `math/rand`, use `html/template` not `text/template`, return errors instead of panicking
+- Java/Kotlin: use `PreparedStatement` with `?` params, use `SecureRandom`, use `Jsoup.clean()` for HTML sanitization, use `@Valid` for input validation
+- Ruby: use ActiveRecord parameterized finders, use `SecureRandom`, use `ERB::Util.html_escape`, use `strong_parameters`
+- PHP: use PDO prepared statements with `:param` or `?`, use `random_bytes()`/`random_int()`, use `htmlspecialchars()` with `ENT_QUOTES`, use `password_hash($password, PASSWORD_BCRYPT)`
+- C/C++: use `snprintf` not `sprintf`, use bounds-checked APIs, free resources in reverse allocation order, use `memset_s` for secret cleanup"#;

 pub async fn suggest_fix(llm: &Arc<LlmClient>, finding: &Finding) -> Result<String, AgentError> {
    let user_prompt = format!(
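A minimal sketch of the import-hint rule in the hunk above ("the language's comment syntax + `Add import: `"), written as ordinary Rust to show what the model is expected to emit. The helper names `comment_prefix` and `import_hint`, the lowercase language names, and the fallback to `//` for unlisted languages are assumptions for illustration and are not part of this commit.

```rust
/// Returns the line-comment prefix for a language name.
/// The language names and the fallback to `//` are illustrative assumptions.
fn comment_prefix(lang: &str) -> &'static str {
    match lang {
        "python" | "ruby" => "#",
        // Rust, Go, Java, Kotlin, PHP, C (since C99) and C++ accept `//` line comments.
        _ => "//",
    }
}

/// Formats the import hint line: comment prefix, then "Add import: ", then the import.
fn import_hint(lang: &str, import_stmt: &str) -> String {
    format!("{} Add import: {}", comment_prefix(lang), import_stmt)
}
```

Under these assumptions, a Python fix that needs the `secrets` module would carry the line `# Add import: import secrets`, while a Go fix would use the `//` form.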
@@ -15,9 +15,17 @@ Do NOT report:
 - Style, naming, formatting, documentation, or code organization preferences
 - Theoretical issues without a concrete triggering scenario
 - "Potential" problems that require assumptions not supported by the visible code
-- Language-idiomatic patterns (e.g. Rust's `||` short-circuit evaluation, variable shadowing, `impl` patterns)
 - Complexity or function length — that's a separate review pass

+Language-idiomatic patterns that are NOT bugs (do not flag these):
+- Rust: `||`/`&&` short-circuit evaluation, variable shadowing, `let` rebinding, `clone()`, `impl` blocks, `match` arms with guards, `?` operator chaining, `unsafe` blocks with safety comments
+- Python: duck typing, EAFP pattern (try/except vs check-first), `*args`/`**kwargs`, walrus operator `:=`, truthiness checks on containers, bare `except:` in top-level handlers
+- Go: multiple return values for errors, `if err != nil` patterns, goroutine + channel patterns, blank identifier `_`, named returns, `defer` for cleanup, `init()` functions
+- Java/Kotlin: checked exception patterns, method overloading, `Optional` vs null checks, Kotlin `?.` safe calls, `!!` non-null assertions in tests, `when` exhaustive matching, companion objects, `lateinit`
+- Ruby: monkey patching in libraries, `method_missing`, blocks/procs/lambdas, `rescue => e` patterns, `send`/`respond_to?` metaprogramming, `nil` checks via `&.` safe navigation
+- PHP: loose comparisons with `==` (only flag if `===` was clearly intended), `@` error suppression in legacy code, `isset()`/`empty()` patterns, magic methods (`__get`, `__call`), array functions as callbacks
+- C/C++: RAII patterns, move semantics, `const_cast`/`static_cast` in appropriate contexts, macro usage for platform compat, pointer arithmetic in low-level code, `goto` for cleanup in C
+
 Severity guide:
 - high: Will cause incorrect behavior in normal usage
 - medium: Will cause incorrect behavior in edge cases
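To make the Rust entry in the idiom list above concrete, here is a small sketch of three of the listed patterns (shadowing, `?` propagation, and `||` short-circuit evaluation) in code that is correct as written. The function names are hypothetical and chosen only for this example.

```rust
use std::num::ParseIntError;

/// Parses a port number from user input.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    // Shadowing: `raw` is rebound to the trimmed slice; the original is not needed again.
    let raw = raw.trim();
    // `?` returns the parse error to the caller instead of panicking.
    let port = raw.parse::<u16>()?;
    Ok(port)
}

/// Returns true for an empty slice or one whose first byte is zero.
fn empty_or_leading_zero(bytes: &[u8]) -> bool {
    // Short-circuit: `bytes[0]` is only evaluated when the slice is non-empty,
    // so the index cannot go out of bounds.
    bytes.is_empty() || bytes[0] == 0
}
```

A reviewer that reads `bytes[0]` in isolation might report a possible out-of-bounds index; the `||` guard is what makes it safe, which is the kind of finding the list asks the model not to raise.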
@@ -47,9 +55,18 @@ Do NOT report:
 - Logging of non-sensitive operational data (finding titles, counts, performance metrics)
 - "Information disclosure" for data that is already public or user-facing
 - Code style, performance, or general quality issues
-- Missing validation on internal function parameters (trust the caller within the same crate)
+- Missing validation on internal function parameters (trust the caller within the same module/crate/package)
 - Theoretical attacks that require preconditions not present in the code

+Language-specific patterns that are NOT vulnerabilities (do not flag these):
+- Python: `pickle` used on trusted internal data, `eval()`/`exec()` on hardcoded strings, `subprocess` with hardcoded commands, Django `mark_safe()` on static content, `assert` in non-security contexts
+- Go: `crypto/rand` is secure (don't confuse with `math/rand`), `sql.DB` with parameterized queries is safe, `http.ListenAndServe` without TLS in dev/internal, error strings in responses (Go convention)
+- Java/Kotlin: Spring Security annotations are sufficient auth checks, `@Transactional` provides atomicity, JPA parameterized queries are safe, Kotlin `require()`/`check()` are assertion patterns not vulnerabilities
+- Ruby: Rails `params.permit()` is input validation, `render html:` with `html_safe` on generated content, ActiveRecord parameterized finders are safe, Devise/Warden patterns for auth
+- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars()` is XSS mitigation, Symfony security voters are auth checks, `password_hash()`/`password_verify()` are correct bcrypt usage
+- C/C++: `strncpy`/`snprintf` are length-bounded (vs `strcpy`/`sprintf`; note that `strncpy` does not NUL-terminate on truncation), smart pointers manage memory, RAII handles cleanup, `static_assert` is compile-time only, OpenSSL with proper context setup
+- Rust: `sha2`/`blake3` for fingerprinting is not "weak crypto", `unsafe` with documented invariants, `secrecy::SecretString` properly handles secrets
+
 Severity guide:
 - critical: Remote code execution, auth bypass, or data breach with no preconditions
 - high: Exploitable vulnerability requiring minimal preconditions
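The Rust entry above says that hashing for fingerprinting is not "weak crypto". The sketch below illustrates that use case: a hash serves only to detect duplicate content, so no security property depends on it. It uses the standard library's `DefaultHasher` as a stand-in so the example has no dependencies; the prompt itself names `sha2`/`blake3`, and the function names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Non-cryptographic content fingerprint, used only to spot duplicates.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Keeps the first occurrence of each distinct string, preserving order.
/// A hash collision would drop a distinct item; acceptable for a sketch.
fn dedup_by_fingerprint<'a>(items: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    items
        .iter()
        .copied()
        // `insert` returns false when the fingerprint was already present.
        .filter(|item| seen.insert(fingerprint(item)))
        .collect()
}
```

Nothing here authenticates data or protects a secret, so a "weak hash" finding against code of this shape would be a false positive in the sense the prompt describes.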
@@ -73,9 +90,17 @@ Do NOT report:
 - Style preferences, formatting, naming conventions, or documentation
 - Code organization suggestions ("this function should be split")
 - Patterns that are valid in the language even if you'd write them differently
-- Rust-specific: variable shadowing, `||`/`&&` short-circuit, `let` rebinding, builder patterns, `clone()` usage
 - "Missing type annotations" unless the code literally won't compile or causes a type inference bug

+Language-specific patterns that are conventional (do not flag these):
+- Rust: variable shadowing, `||`/`&&` short-circuit, `let` rebinding, builder patterns, `clone()`, `From`/`Into` impl chains, `#[allow(...)]` attributes
+- Python: `**kwargs` forwarding, `@property` setters, `__dunder__` methods, list comprehensions with conditions, `if TYPE_CHECKING` imports, `noqa` comments
+- Go: stuttering names (e.g. `http.HTTPClient`) are discouraged but not bugs, `context.Context` as first param, `init()` functions, `//nolint` directives, returning concrete types vs interfaces in internal code
+- Java/Kotlin: builder pattern boilerplate, Lombok annotations (`@Data`, `@Builder`), Kotlin data classes, `companion object` factories, `@Suppress` annotations, checked exception wrapping
+- Ruby: `attr_accessor` usage, `Enumerable` mixin patterns, `module_function`, `class << self` syntax, DSL blocks (Rake, RSpec, Sinatra routes)
+- PHP: `__construct` with property promotion, Laravel facades, static factory methods, nullable types with `?`, attribute syntax `#[...]`
+- C/C++: header guards vs `#pragma once`, forward declarations, `const` correctness patterns, template specialization, `auto` type deduction
+
 Severity guide:
 - medium: Convention violation that will likely cause a bug or maintenance problem
 - low: Convention violation that is a minor concern
@@ -17,13 +17,23 @@ Actions:
 - "dismiss": False positive, not exploitable, or not actionable. Remove it.

 Dismiss when:
-- The scanner flagged a language idiom as a bug (e.g. Rust short-circuit `||`, variable shadowing, `clone()`)
+- The scanner flagged a language idiom as a bug (see examples below)
 - The finding is in test/example/generated/vendored code
 - The "vulnerability" requires preconditions that don't exist in the code
 - The finding is about code style, complexity, or theoretical concerns rather than actual bugs
 - A hash function is used for non-security purposes (dedup, caching, content addressing)
 - Internal logging of non-sensitive operational data is flagged as "information disclosure"
 - The finding duplicates another finding already in the list
+- Framework-provided security is already in place (e.g. ORM parameterized queries, CSRF middleware, auth decorators)
+
+Common false positive patterns by language (dismiss these):
+- Rust: short-circuit `||`/`&&`, variable shadowing, `clone()`, `unsafe` with safety docs, `sha2` for fingerprinting
+- Python: EAFP try/except, `subprocess` with hardcoded args, `pickle` on trusted data, Django `mark_safe` on static content
+- Go: `if err != nil` is not "swallowed error", `crypto/rand` is secure, returning errors is not "information disclosure"
+- Java/Kotlin: Spring Security annotations are valid auth, JPA parameterized queries are safe, Kotlin `!!` in tests is fine
+- Ruby: Rails `params.permit` is validation, ActiveRecord finders are parameterized, `html_safe` on generated content
+- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars` is XSS mitigation
+- C/C++: `strncpy`/`snprintf` are length-bounded (`strncpy` still needs explicit NUL termination), smart pointers manage memory, RAII handles cleanup

 Confirm only when:
 - You can describe a concrete scenario where the bug manifests or the vulnerability is exploitable