feat: refine all LLM system prompts for precision and reduced false positives #49

Merged
sharang merged 3 commits from feat/refine-llm-prompts into main 2026-03-30 07:11:17 +00:00
6 changed files with 196 additions and 63 deletions

View File

@@ -90,10 +90,13 @@ pub async fn chat(
};
let system_prompt = format!(
"You are an expert code assistant for a software repository. \
Answer the user's question based on the code context below. \
Reference specific files and functions when relevant. \
If the context doesn't contain enough information, say so.\n\n\
"You are a code assistant for this repository. Answer questions using the code context below.\n\n\
Review

[low] Inconsistent prompt formatting style

The system prompts in chat.rs, descriptions.rs, and fixes.rs use different formatting styles. chat.rs uses a mix of newlines and backslash continuation, while descriptions.rs and fixes.rs use more structured markdown-like formatting with explicit rules sections. This inconsistency may make the prompts harder to maintain and understand.

Suggested fix: Standardize the prompt formatting style across all modules to use consistent markdown-style formatting with clear rule sections

*Scanner: code-review/convention | *

**[low] Inconsistent prompt formatting style** The system prompts in chat.rs, descriptions.rs, and fixes.rs use different formatting styles. chat.rs uses a mix of newlines and backslash continuation, while descriptions.rs and fixes.rs use more structured markdown-like formatting with explicit rules sections. This inconsistency may make the prompts harder to maintain and understand. Suggested fix: Standardize the prompt formatting style across all modules to use consistent markdown-style formatting with clear rule sections *Scanner: code-review/convention | * <!-- compliance-fp:82c7ff37328f996e8e8b3694edf67ea5d9d4957c685ac8f132795d9b428b1bf2 -->
Review

[medium] Incorrect system prompt for chat handler

The system prompt in chat.rs has been updated but contains a formatting issue. The new prompt uses '\n\n' which creates an extra blank line after 'Answer questions using the code context below.' This may cause formatting inconsistencies in the LLM input. Additionally, the structure of the rules section appears to have been changed from a bulleted list to a paragraph-style list without proper markdown formatting.

Suggested fix: Ensure consistent formatting by using proper markdown bullet points for the rules section and verify that the newline spacing is intentional. Consider using a more structured approach like:

Rules:
- Reference specific files, functions, and line numbers
- Show code snippets when they help explain the answer
- If the context is insufficient, say what's missing rather than guessing
- Be concise — lead with the answer, then explain if needed
- For security questions, note relevant CWEs and link to the finding if one exists

## Code Context

{code_context}

*Scanner: code-review/logic | *

**[medium] Incorrect system prompt for chat handler** The system prompt in chat.rs has been updated but contains a formatting issue. The new prompt uses '\n\n' which creates an extra blank line after 'Answer questions using the code context below.' This may cause formatting inconsistencies in the LLM input. Additionally, the structure of the rules section appears to have been changed from a bulleted list to a paragraph-style list without proper markdown formatting. Suggested fix: Ensure consistent formatting by using proper markdown bullet points for the rules section and verify that the newline spacing is intentional. Consider using a more structured approach like: ``` Rules: - Reference specific files, functions, and line numbers - Show code snippets when they help explain the answer - If the context is insufficient, say what's missing rather than guessing - Be concise — lead with the answer, then explain if needed - For security questions, note relevant CWEs and link to the finding if one exists ## Code Context {code_context} ``` *Scanner: code-review/logic | * <!-- compliance-fp:00e5ba7e7bf0ce3023ad5d187231847a9bfd9ad020e68b64c039bfa2ca171b0b -->
Review

[medium] Complex System Prompt in Chat Handler

The system prompt in the chat handler has been significantly expanded with multiple rules and formatting requirements. While this improves guidance, it increases the complexity of the prompt and makes it harder to maintain and understand.

Suggested fix: Consider breaking down this large system prompt into smaller, more manageable components or using a configuration file to define the prompt structure. This would improve readability and maintainability.

*Scanner: code-review/complexity | *

**[medium] Complex System Prompt in Chat Handler** The system prompt in the chat handler has been significantly expanded with multiple rules and formatting requirements. While this improves guidance, it increases the complexity of the prompt and makes it harder to maintain and understand. Suggested fix: Consider breaking down this large system prompt into smaller, more manageable components or using a configuration file to define the prompt structure. This would improve readability and maintainability. *Scanner: code-review/complexity | * <!-- compliance-fp:d3e7d936cf1348efe931b5ef2e65945421e8a85beb6b79567de551fd63241325 -->
Review

[high] Potential Prompt Injection in System Prompts

The system prompts in chat.rs, descriptions.rs, and fixes.rs are constructed using string formatting with user-provided or context-dependent data. While these appear to be static strings in the current diff, if any part of the prompt construction involves user input or external context without proper sanitization, it could allow prompt injection attacks that manipulate the LLM's behavior.

Suggested fix: Ensure all dynamic content used in system prompts is properly sanitized and validated before inclusion. Consider using a templating engine with strict escaping or validating that no malicious prompt injection patterns are present.

Scanner: code-review/security | CWE: CWE-116

**[high] Potential Prompt Injection in System Prompts** The system prompts in chat.rs, descriptions.rs, and fixes.rs are constructed using string formatting with user-provided or context-dependent data. While these appear to be static strings in the current diff, if any part of the prompt construction involves user input or external context without proper sanitization, it could allow prompt injection attacks that manipulate the LLM's behavior. Suggested fix: Ensure all dynamic content used in system prompts is properly sanitized and validated before inclusion. Consider using a templating engine with strict escaping or validating that no malicious prompt injection patterns are present. *Scanner: code-review/security | CWE: CWE-116* <!-- compliance-fp:6ad414c91d0485224048cf7a3f0e1b0753e5e91a6e34f73ece922b54ef5f2e87 -->
Review

[high] Incorrect system prompt for chat handler

The system prompt in chat.rs has been updated but the new prompt structure may not be compatible with existing code that expects a specific format. The new prompt contains more complex formatting with rules and sections that might break parsing or expectation of the LLM response format.

Suggested fix: Verify that all downstream components properly handle the new multi-line system prompt format and that the LLM responses are parsed correctly according to the new structure.

*Scanner: code-review/logic | *

**[high] Incorrect system prompt for chat handler** The system prompt in chat.rs has been updated but the new prompt structure may not be compatible with existing code that expects a specific format. The new prompt contains more complex formatting with rules and sections that might break parsing or expectation of the LLM response format. Suggested fix: Verify that all downstream components properly handle the new multi-line system prompt format and that the LLM responses are parsed correctly according to the new structure. *Scanner: code-review/logic | * <!-- compliance-fp:00e5ba7e7bf0ce3023ad5d187231847a9bfd9ad020e68b64c039bfa2ca171b0b -->
Review

[medium] Inconsistent prompt formatting in system prompts

The system prompts in chat.rs, descriptions.rs, and fixes.rs use inconsistent formatting styles. chat.rs uses a mix of newlines and backslash continuation, while descriptions.rs and fixes.rs use more structured markdown formatting. This inconsistency makes the prompts harder to maintain and read.

Suggested fix: Standardize the prompt formatting across all files to use consistent multi-line string formatting (either all backslashes or all newlines) for better readability and maintainability.

*Scanner: code-review/convention | *

**[medium] Inconsistent prompt formatting in system prompts** The system prompts in chat.rs, descriptions.rs, and fixes.rs use inconsistent formatting styles. chat.rs uses a mix of newlines and backslash continuation, while descriptions.rs and fixes.rs use more structured markdown formatting. This inconsistency makes the prompts harder to maintain and read. Suggested fix: Standardize the prompt formatting across all files to use consistent multi-line string formatting (either all backslashes or all newlines) for better readability and maintainability. *Scanner: code-review/convention | * <!-- compliance-fp:51670ee74b911364c8748f6b0567402224781bbe1c2c80e583b579b25cea727d -->
Review

[medium] Inconsistent Error Handling Pattern

The chat function in compliance-agent/src/api/handlers/chat.rs uses ? operator for error propagation but doesn't handle the case where code_context might be empty or invalid, potentially leading to inconsistent error handling compared to other parts of the application.

Suggested fix: Add explicit error handling for empty or malformed code_context to ensure consistent error propagation throughout the application.

*Scanner: code-review/convention | *

**[medium] Inconsistent Error Handling Pattern** The `chat` function in `compliance-agent/src/api/handlers/chat.rs` uses `?` operator for error propagation but doesn't handle the case where `code_context` might be empty or invalid, potentially leading to inconsistent error handling compared to other parts of the application. Suggested fix: Add explicit error handling for empty or malformed code_context to ensure consistent error propagation throughout the application. *Scanner: code-review/convention | * <!-- compliance-fp:4fd1e28653fd6de4c5850a0036f381c3f99a1fa956ad1f64957fc13261b1b2f4 -->
Rules:\n\
- Reference specific files, functions, and line numbers\n\
- Show code snippets when they help explain the answer\n\
- If the context is insufficient, say what's missing rather than guessing\n\
- Be concise — lead with the answer, then explain if needed\n\
- For security questions, note relevant CWEs and link to the finding if one exists\n\n\
## Code Context\n\n{code_context}"
);

View File

@@ -5,15 +5,20 @@ use compliance_core::models::Finding;
use crate::error::AgentError;
use crate::llm::LlmClient;
Review

[medium] Insecure LLM System Prompt Configuration

The system prompts for LLM-based security analysis have been modified to include detailed security guidance and rules. While this may improve output quality, it introduces potential risks if the prompts themselves contain sensitive information or if they inadvertently expose internal security practices. The prompts now include specific language guidance for various programming languages which could potentially be exploited if the LLM generates code based on these prompts.

Suggested fix: Review the system prompt content to ensure no sensitive information is exposed. Consider implementing prompt validation to prevent leakage of internal security practices or methodologies.

Scanner: code-review/security | CWE: CWE-200

**[medium] Insecure LLM System Prompt Configuration** The system prompts for LLM-based security analysis have been modified to include detailed security guidance and rules. While this may improve output quality, it introduces potential risks if the prompts themselves contain sensitive information or if they inadvertently expose internal security practices. The prompts now include specific language guidance for various programming languages which could potentially be exploited if the LLM generates code based on these prompts. Suggested fix: Review the system prompt content to ensure no sensitive information is exposed. Consider implementing prompt validation to prevent leakage of internal security practices or methodologies. *Scanner: code-review/security | CWE: CWE-200* <!-- compliance-fp:c98dad63ad241b2938987d740f8bb37a25ef9d3145f32876ca0acbb4fb20c7af -->
const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing issue descriptions for a bug tracker. Generate a clear, actionable issue body in Markdown format that includes:
const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing a bug tracker issue for a developer to fix. Be direct and actionable — developers skim issue descriptions, so lead with what matters.
Review

[medium] Inconsistent prompt formatting in description generation

The DESCRIPTION_SYSTEM_PROMPT in descriptions.rs has been significantly restructured. The original format had numbered sections (1. Summary, 2. Evidence, etc.) but the new version uses a different structure with headers like 'What', 'Why it matters', 'Fix'. While this might be intentional, the transition from numbered sections to descriptive headers could affect how the LLM interprets the expected output format. Also, the rule about not restating the finding title in the body is now placed after the main format specification, which might be confusing for the LLM.

Suggested fix: Consider placing the rules section before the main format specification to ensure clarity for the LLM. Also, verify that the change from numbered sections to descriptive headers won't negatively impact the LLM's ability to parse and follow the required format.

*Scanner: code-review/logic | *

**[medium] Inconsistent prompt formatting in description generation** The DESCRIPTION_SYSTEM_PROMPT in descriptions.rs has been significantly restructured. The original format had numbered sections (1. Summary, 2. Evidence, etc.) but the new version uses a different structure with headers like 'What', 'Why it matters', 'Fix'. While this might be intentional, the transition from numbered sections to descriptive headers could affect how the LLM interprets the expected output format. Also, the rule about not restating the finding title in the body is now placed after the main format specification, which might be confusing for the LLM. Suggested fix: Consider placing the rules section before the main format specification to ensure clarity for the LLM. Also, verify that the change from numbered sections to descriptive headers won't negatively impact the LLM's ability to parse and follow the required format. *Scanner: code-review/logic | * <!-- compliance-fp:cc6a4c4df828a4d2df9ffd67ce0fec117ce12e5b00be4bc364530465d7e598c5 -->
Review

[medium] Enhanced Description System Prompt

The description system prompt has been substantially rewritten with new formatting requirements and rules. The increased complexity makes it harder to reason about and maintain, especially with the detailed formatting constraints.

Suggested fix: Consider extracting the Markdown format specification and rules into separate constants or documentation to reduce the complexity of this single prompt string.

*Scanner: code-review/complexity | *

**[medium] Enhanced Description System Prompt** The description system prompt has been substantially rewritten with new formatting requirements and rules. The increased complexity makes it harder to reason about and maintain, especially with the detailed formatting constraints. Suggested fix: Consider extracting the Markdown format specification and rules into separate constants or documentation to reduce the complexity of this single prompt string. *Scanner: code-review/complexity | * <!-- compliance-fp:16cce2d1218e19b5cd599ae07f0f6115939102d6fe3557afe3e380137aac9073 -->
Review

[high] Inconsistent prompt formatting in description generation

The DESCRIPTION_SYSTEM_PROMPT in descriptions.rs has been significantly restructured with new formatting requirements and rules. This change may cause compatibility issues with existing LLM models or response parsing logic that expected the previous format.

Suggested fix: Ensure that the LLM client properly handles the new Markdown format specification and that response parsing logic correctly extracts the required fields from the structured output.

*Scanner: code-review/logic | *

**[high] Inconsistent prompt formatting in description generation** The DESCRIPTION_SYSTEM_PROMPT in descriptions.rs has been significantly restructured with new formatting requirements and rules. This change may cause compatibility issues with existing LLM models or response parsing logic that expected the previous format. Suggested fix: Ensure that the LLM client properly handles the new Markdown format specification and that response parsing logic correctly extracts the required fields from the structured output. *Scanner: code-review/logic | * <!-- compliance-fp:cc6a4c4df828a4d2df9ffd67ce0fec117ce12e5b00be4bc364530465d7e598c5 -->
1. **Summary**: 1-2 sentence overview
2. **Evidence**: Code location, snippet, and what was detected
3. **Impact**: What could happen if not fixed
4. **Remediation**: Step-by-step fix instructions
5. **References**: Relevant CWE/CVE links if applicable
Format in Markdown:
Keep it concise and professional. Use code blocks for code snippets."#;
1. **What**: 1 sentence — what's wrong and where (file:line)
2. **Why it matters**: 1-2 sentences — concrete impact if not fixed. Avoid generic "could lead to" phrasing; describe the specific attack or failure scenario.
3. **Fix**: The specific code change needed. Use a code block with the corrected code if possible. If the fix is configuration-based, show the exact config change.
4. **References**: CWE/CVE link if applicable (one line, not a section)
Review

[low] Missing type annotations in function signatures

The function signatures in descriptions.rs and fixes.rs lack explicit return type annotations, which could make the code harder to understand and maintain. While Rust's type inference usually handles this, explicit annotations improve readability and prevent potential confusion.

Suggested fix: Add explicit return type annotations to function signatures for better clarity

*Scanner: code-review/convention | *

**[low] Missing type annotations in function signatures** The function signatures in descriptions.rs and fixes.rs lack explicit return type annotations, which could make the code harder to understand and maintain. While Rust's type inference usually handles this, explicit annotations improve readability and prevent potential confusion. Suggested fix: Add explicit return type annotations to function signatures for better clarity *Scanner: code-review/convention | * <!-- compliance-fp:01d9a4d9987fd525ce484ad4230522dbe66023f39b41c0afe79ab43fc36e0ab3 -->
Review

[low] Missing type annotations in function signatures

The function signatures in descriptions.rs and fixes.rs lack explicit type annotations for parameters and return types, which could make the code harder to understand and maintain. While Rust's type inference works, explicit annotations improve clarity.

Suggested fix: Add explicit type annotations to function parameters and return types for better clarity and maintainability.

*Scanner: code-review/convention | *

**[low] Missing type annotations in function signatures** The function signatures in descriptions.rs and fixes.rs lack explicit type annotations for parameters and return types, which could make the code harder to understand and maintain. While Rust's type inference works, explicit annotations improve clarity. Suggested fix: Add explicit type annotations to function parameters and return types for better clarity and maintainability. *Scanner: code-review/convention | * <!-- compliance-fp:01d9a4d9987fd525ce484ad4230522dbe66023f39b41c0afe79ab43fc36e0ab3 -->
Review

[low] Missing Type Annotations in Function Signature

The generate_issue_description function in compliance-agent/src/llm/descriptions.rs lacks explicit type annotations for its parameters, which could make the function signature less clear and harder to maintain.

Suggested fix: Add explicit type annotations to function parameters for better clarity and maintainability.

*Scanner: code-review/convention | *

**[low] Missing Type Annotations in Function Signature** The `generate_issue_description` function in `compliance-agent/src/llm/descriptions.rs` lacks explicit type annotations for its parameters, which could make the function signature less clear and harder to maintain. Suggested fix: Add explicit type annotations to function parameters for better clarity and maintainability. *Scanner: code-review/convention | * <!-- compliance-fp:e66bbf9eafe00210b7ac8c8838419cf14d542576ff6f4fb7fb9e538843f97550 -->
Rules:
- No filler paragraphs or background explanations
- No restating the finding title in the body
- Code blocks should show the FIX, not the vulnerable code (the developer can see that in the diff)
- If the remediation is a one-liner, just say it — don't wrap it in a section header"#;
pub async fn generate_issue_description(
llm: &Arc<LlmClient>,

View File

@@ -5,7 +5,24 @@ use compliance_core::models::Finding;
use crate::error::AgentError;
use crate::llm::LlmClient;
Review

[medium] Complex System Prompt in Fixes Module

The FIX_SYSTEM_PROMPT constant in fixes.rs contains a very long and complex prompt with extensive rules and language-specific guidance. This makes the prompt hard to maintain and understand, increasing the risk of errors when making updates.

Suggested fix: Break down the complex prompt into smaller, more manageable components or extract language-specific guidance into separate constants/functions to improve readability and maintainability.

*Scanner: code-review/complexity | *

**[medium] Complex System Prompt in Fixes Module** The FIX_SYSTEM_PROMPT constant in fixes.rs contains a very long and complex prompt with extensive rules and language-specific guidance. This makes the prompt hard to maintain and understand, increasing the risk of errors when making updates. Suggested fix: Break down the complex prompt into smaller, more manageable components or extract language-specific guidance into separate constants/functions to improve readability and maintainability. *Scanner: code-review/complexity | * <!-- compliance-fp:3ecfd787fa26a1b4f60f333e10acbcd9f516d12ca4cdf9ecd674bc4a7777b7b8 -->
Review

[medium] LLM Fix Generation Guidance Contains Security Implementation Details

The FIX_SYSTEM_PROMPT in fixes.rs contains detailed implementation guidance for fixing vulnerabilities across multiple programming languages. This includes specific recommendations like using parameterized queries, secure random generators, and proper error handling patterns. While helpful for generating fixes, this information could potentially be misused by attackers to understand how to exploit similar vulnerabilities in other systems.

Suggested fix: Consider removing or obfuscating the detailed implementation guidance from the system prompt to prevent potential exploitation of security best practices by malicious actors.

Scanner: code-review/security | CWE: CWE-200

**[medium] LLM Fix Generation Guidance Contains Security Implementation Details** The FIX_SYSTEM_PROMPT in fixes.rs contains detailed implementation guidance for fixing vulnerabilities across multiple programming languages. This includes specific recommendations like using parameterized queries, secure random generators, and proper error handling patterns. While helpful for generating fixes, this information could potentially be misused by attackers to understand how to exploit similar vulnerabilities in other systems. Suggested fix: Consider removing or obfuscating the detailed implementation guidance from the system prompt to prevent potential exploitation of security best practices by malicious actors. *Scanner: code-review/security | CWE: CWE-200* <!-- compliance-fp:ad3e0e26fa6de1eaa49739c9116ff4d997d2556c3b309a24fe1442c5dbae231b -->
const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer. Given a security finding with code context, suggest a concrete code fix. Return ONLY the fixed code snippet that can directly replace the vulnerable code. Include brief inline comments explaining the fix."#;
const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.
Review

[medium] Detailed Fix System Prompt

The fix system prompt has been expanded with multiple rules and constraints for generating code fixes. The complexity of these rules makes the prompt harder to understand and maintain, particularly with the detailed requirements about imports, comments, and code style preservation.

Suggested fix: Break down the complex rules into separate validation steps or extract them into a configuration structure that can be validated independently from the core prompt logic.

*Scanner: code-review/complexity | *

**[medium] Detailed Fix System Prompt** The fix system prompt has been expanded with multiple rules and constraints for generating code fixes. The complexity of these rules makes the prompt harder to understand and maintain, particularly with the detailed requirements about imports, comments, and code style preservation. Suggested fix: Break down the complex rules into separate validation steps or extract them into a configuration structure that can be validated independently from the core prompt logic. *Scanner: code-review/complexity | * <!-- compliance-fp:6471ac11f314793612f8522dff27088c5ee6a8c33dbd430fe80046c1711e70c3 -->
Review

[medium] Complex system prompt in fix generation may cause parsing issues

The FIX_SYSTEM_PROMPT in fixes.rs has been expanded with extensive language-specific guidance and strict rules. This complexity increases the risk of LLM responses not matching the expected format, particularly around import statements and code block formatting.

Suggested fix: Consider adding validation logic to verify that generated fixes match the expected format, especially for import statements and code block boundaries.

*Scanner: code-review/logic | *

**[medium] Complex system prompt in fix generation may cause parsing issues** The FIX_SYSTEM_PROMPT in fixes.rs has been expanded with extensive language-specific guidance and strict rules. This complexity increases the risk of LLM responses not matching the expected format, particularly around import statements and code block formatting. Suggested fix: Consider adding validation logic to verify that generated fixes match the expected format, especially for import statements and code block boundaries. *Scanner: code-review/logic | * <!-- compliance-fp:124a4c60eb5cca94888aae58252f773b97183ef33e03a8e10125077f0dd6502d -->
Rules:
Review

[high] Potential confusion in fix suggestion rules

The FIX_SYSTEM_PROMPT in fixes.rs introduces several complex rules that could confuse the LLM. Specifically, the rule 'If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment '// No fix needed: ' seems to contradict the instruction to return ONLY the fixed code snippet. This could lead to inconsistent outputs where the LLM might return both the original code and a comment instead of just the original code with a comment.

Suggested fix: Clarify the behavior for false positives. Either remove the rule about returning original code with comment, or modify the instruction to explicitly state that in case of false positives, only the original code should be returned with the comment appended as a single line comment.

*Scanner: code-review/logic | *

**[high] Potential confusion in fix suggestion rules** The FIX_SYSTEM_PROMPT in fixes.rs introduces several complex rules that could confuse the LLM. Specifically, the rule 'If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment '// No fix needed: <reason>' seems to contradict the instruction to return ONLY the fixed code snippet. This could lead to inconsistent outputs where the LLM might return both the original code and a comment instead of just the original code with a comment. Suggested fix: Clarify the behavior for false positives. Either remove the rule about returning original code with comment, or modify the instruction to explicitly state that in case of false positives, only the original code should be returned with the comment appended as a single line comment. *Scanner: code-review/logic | * <!-- compliance-fp:8e6b5cde7f474c449ccbee0ffdb3670222fa6d2322d3fd39f0a2b484a7324d28 -->
Review

[medium] Insecure Cryptography Guidance in Fix Prompts

The FIX_SYSTEM_PROMPT in fixes.rs contains guidance about using secure random number generators ('crypto/rand' in Go, 'SecureRandom' in Java/Kotlin). However, the guidance itself may inadvertently expose implementation details or create confusion if not carefully reviewed. More critically, if these prompts are used in contexts where the LLM might generate actual code, there's a risk of generating insecure cryptographic practices if the guidance isn't comprehensive or up-to-date.

Suggested fix: Review the cryptographic guidance provided in the system prompt to ensure it's accurate and complete. Consider adding explicit warnings against common cryptographic pitfalls like using weak algorithms or improper key management.

Scanner: code-review/security | CWE: CWE-327

**[medium] Insecure Cryptography Guidance in Fix Prompts** The FIX_SYSTEM_PROMPT in fixes.rs contains guidance about using secure random number generators ('crypto/rand' in Go, 'SecureRandom' in Java/Kotlin). However, the guidance itself may inadvertently expose implementation details or create confusion if not carefully reviewed. More critically, if these prompts are used in contexts where the LLM might generate actual code, there's a risk of generating insecure cryptographic practices if the guidance isn't comprehensive or up-to-date. Suggested fix: Review the cryptographic guidance provided in the system prompt to ensure it's accurate and complete. Consider adding explicit warnings against common cryptographic pitfalls like using weak algorithms or improper key management. *Scanner: code-review/security | CWE: CWE-327* <!-- compliance-fp:ecbddf6b6cf26651ada5dc544d7bb16f4bf46d17d3a40524abc0590305da7bdd -->
Review

[low] Missing type annotations in function signatures

The function signature in fixes.rs lacks explicit type annotations for parameters and return types, which could make the code harder to understand and maintain. While Rust's type inference works, explicit annotations improve clarity.

Suggested fix: Add explicit type annotations to function parameters and return types for better clarity and maintainability.

*Scanner: code-review/convention | *

**[low] Missing type annotations in function signatures** The function signature in fixes.rs lacks explicit type annotations for parameters and return types, which could make the code harder to understand and maintain. While Rust's type inference works, explicit annotations improve clarity. Suggested fix: Add explicit type annotations to function parameters and return types for better clarity and maintainability. *Scanner: code-review/convention | * <!-- compliance-fp:ef9b9cbd8839da7fcb9a95d89ef36c7c4f55670b54427730a4c3894ce8aeed17 -->
- The fix must be a drop-in replacement for the vulnerable code
Review

[low] Missing type annotations in function signatures

The function signatures in descriptions.rs and fixes.rs lack explicit return type annotations, which could make the code harder to understand and maintain. While Rust's type inference usually handles this, explicit annotations improve readability and prevent potential confusion.

Suggested fix: Add explicit return type annotations to function signatures for better clarity

*Scanner: code-review/convention | *

**[low] Missing type annotations in function signatures** The function signatures in descriptions.rs and fixes.rs lack explicit return type annotations, which could make the code harder to understand and maintain. While Rust's type inference usually handles this, explicit annotations improve readability and prevent potential confusion. Suggested fix: Add explicit return type annotations to function signatures for better clarity *Scanner: code-review/convention | * <!-- compliance-fp:64bd001d6114f35f07231dfbdd3b9e6bb8567d43eade19245dfbcf565a1c2d5f -->
- Preserve the original code's style, indentation, and naming conventions
- Add at most one brief inline comment on the changed line explaining the security fix
- If the fix requires importing a new module, include the import on a separate line prefixed with the language's comment syntax + "Add import: "
- Do not refactor, rename variables, or "improve" unrelated code
- If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment explaining why no fix is needed
Language-specific fix guidance:
- Rust: use `?` for error propagation, prefer `SecretString` for secrets, use parameterized queries with `sqlx`/`diesel`
- Python: use parameterized queries (never f-strings in SQL), use `secrets` module not `random`, use `subprocess.run([...])` list form, use `markupsafe.escape()` for HTML
Review

[medium] Inconsistent Error Handling in Fix Generation

The suggest_fix function in compliance-agent/src/llm/fixes.rs returns a Result<String, AgentError> but doesn't explicitly handle potential parsing or validation errors from the LLM response, which could lead to unhandled exceptions.

Suggested fix: Add explicit error handling for potential parsing/validation failures from the LLM response to ensure consistent error management.

*Scanner: code-review/convention | *

**[medium] Inconsistent Error Handling in Fix Generation** The `suggest_fix` function in `compliance-agent/src/llm/fixes.rs` returns a `Result<String, AgentError>` but doesn't explicitly handle potential parsing or validation errors from the LLM response, which could lead to unhandled exceptions. Suggested fix: Add explicit error handling for potential parsing/validation failures from the LLM response to ensure consistent error management. *Scanner: code-review/convention | * <!-- compliance-fp:a60d6afa0b89adef592638aa134966150689b332afeef9d6c034363f324d3886 -->
- Go: use `sql.Query` with `$1`/`?` placeholders, use `crypto/rand` not `math/rand`, use `html/template` not `text/template`, return errors don't panic
- Java/Kotlin: use `PreparedStatement` with `?` params, use `SecureRandom`, use `Jsoup.clean()` for HTML sanitization, use `@Valid` for input validation
- Ruby: use ActiveRecord parameterized finders, use `SecureRandom`, use `ERB::Util.html_escape`, use `strong_parameters`
- PHP: use PDO prepared statements with `:param` or `?`, use `random_bytes()`/`random_int()`, use `htmlspecialchars()` with `ENT_QUOTES`, use `password_hash(PASSWORD_BCRYPT)`
- C/C++: use `snprintf` not `sprintf`, use bounds-checked APIs, free resources in reverse allocation order, use `memset_s` for secret cleanup"#;
pub async fn suggest_fix(llm: &Arc<LlmClient>, finding: &Finding) -> Result<String, AgentError> {
let user_prompt = format!(

View File

@@ -1,69 +1,138 @@
// System prompts for multi-pass LLM code review.
Review

[medium] Missing CWE field in LOGIC_REVIEW_PROMPT response format

The LOGIC_REVIEW_PROMPT response format specifies JSON objects with 'cwe' field, but the actual example shows objects without 'cwe'. This inconsistency may cause parsing issues when processing responses.

Suggested fix: Either remove 'cwe' from the response format specification or add it to the example JSON object to maintain consistency.

*Scanner: code-review/convention | *

**[medium] Missing CWE field in LOGIC_REVIEW_PROMPT response format** The LOGIC_REVIEW_PROMPT response format specifies JSON objects with 'cwe' field, but the actual example shows objects without 'cwe'. This inconsistency may cause parsing issues when processing responses. Suggested fix: Either remove 'cwe' from the response format specification or add it to the example JSON object to maintain consistency. *Scanner: code-review/convention | * <!-- compliance-fp:85dca16ec8d5507004f3013b79242d85863cf224e8c6719737b278541b6f0b68 -->
Review

[medium] Overly complex boolean expressions in prompts

The updated prompts contain very long and complex boolean expressions that make the instructions hard to parse and maintain. These include lengthy 'Do NOT report:' sections with multiple conditions that are difficult to reason about.

Suggested fix: Break down complex boolean conditions into simpler logical components or use helper functions to clarify intent. Consider extracting the detailed exclusion rules into separate constants or documentation.

*Scanner: code-review/complexity | *

**[medium] Overly complex boolean expressions in prompts** The updated prompts contain very long and complex boolean expressions that make the instructions hard to parse and maintain. These include lengthy 'Do NOT report:' sections with multiple conditions that are difficult to reason about. Suggested fix: Break down complex boolean conditions into simpler logical components or use helper functions to clarify intent. Consider extracting the detailed exclusion rules into separate constants or documentation. *Scanner: code-review/complexity | * <!-- compliance-fp:6d4afa449464146b2b6814ba6072b8295456d1f77998b9e145f6863a519cc25c -->
Review

[medium] Insecure Prompt Engineering for LLM Review

The review prompts have been updated to be more restrictive in what they consider as issues. However, these prompts still rely on LLM interpretation which may introduce vulnerabilities through prompt injection or adversarial inputs. The prompts do not explicitly address how to handle malicious inputs or ensure robustness against prompt manipulation.

Suggested fix: Implement additional safeguards such as input sanitization, explicit validation of LLM outputs, and regular auditing of prompt effectiveness against adversarial examples.

Scanner: code-review/security | CWE: CWE-94

**[medium] Insecure Prompt Engineering for LLM Review** The review prompts have been updated to be more restrictive in what they consider as issues. However, these prompts still rely on LLM interpretation which may introduce vulnerabilities through prompt injection or adversarial inputs. The prompts do not explicitly address how to handle malicious inputs or ensure robustness against prompt manipulation. Suggested fix: Implement additional safeguards such as input sanitization, explicit validation of LLM outputs, and regular auditing of prompt effectiveness against adversarial examples. *Scanner: code-review/security | CWE: CWE-94* <!-- compliance-fp:ee9367bc5d57078658d2011e04ce3bb2e8ecfcb72865b1c37c60b2b01cac0048 -->
Review

[high] Inconsistent severity levels between prompts

The LOGIC_REVIEW_PROMPT uses severity levels 'high', 'medium', 'low' while SECURITY_REVIEW_PROMPT uses 'critical', 'high', 'medium'. This inconsistency could cause issues in downstream processing that expects uniform severity levels across all review types.

Suggested fix: Standardize severity levels across all prompts to use consistent labels ('critical', 'high', 'medium', 'low') or ensure downstream systems handle both formats appropriately.

*Scanner: code-review/logic | *

**[high] Inconsistent severity levels between prompts** The LOGIC_REVIEW_PROMPT uses severity levels 'high', 'medium', 'low' while SECURITY_REVIEW_PROMPT uses 'critical', 'high', 'medium'. This inconsistency could cause issues in downstream processing that expects uniform severity levels across all review types. Suggested fix: Standardize severity levels across all prompts to use consistent labels ('critical', 'high', 'medium', 'low') or ensure downstream systems handle both formats appropriately. *Scanner: code-review/logic | * <!-- compliance-fp:0b118fb7a69b8cd4fa191d0073f0cf74defa8bf62874a8ba6a317fb948b4cd92 -->
Review

[high] Potential Command Injection Vulnerability

The SECURITY_REVIEW_PROMPT includes a check for command injection vulnerabilities, but the prompt itself doesn't contain any actual code that could be vulnerable. However, if this file were to be used in a context where user input is passed directly to shell commands, it could introduce a command injection vulnerability.

Suggested fix: Ensure that any user input passed to shell commands is properly sanitized or escaped before being executed.

Scanner: code-review/security | CWE: CWE-78

**[high] Potential Command Injection Vulnerability** The SECURITY_REVIEW_PROMPT includes a check for command injection vulnerabilities, but the prompt itself doesn't contain any actual code that could be vulnerable. However, if this file were to be used in a context where user input is passed directly to shell commands, it could introduce a command injection vulnerability. Suggested fix: Ensure that any user input passed to shell commands is properly sanitized or escaped before being executed. *Scanner: code-review/security | CWE: CWE-78* <!-- compliance-fp:11678dd2e3d7146777e48204f1a6988b1f1ba1acbbfd0a7ad46f0997b8209c56 -->
Review

[medium] Hardcoded Credentials in Prompt

The SECURITY_REVIEW_PROMPT mentions 'hardcoded credentials' as a potential vulnerability, but there are no hardcoded credentials present in the prompt itself. This is a good practice to highlight, but the prompt should also include guidance on how to detect such issues in code.

Suggested fix: Add a note in the prompt to explicitly look for hardcoded credentials in the code being reviewed, especially in configuration files or environment variables.

Scanner: code-review/security | CWE: CWE-798

**[medium] Hardcoded Credentials in Prompt** The SECURITY_REVIEW_PROMPT mentions 'hardcoded credentials' as a potential vulnerability, but there are no hardcoded credentials present in the prompt itself. This is a good practice to highlight, but the prompt should also include guidance on how to detect such issues in code. Suggested fix: Add a note in the prompt to explicitly look for hardcoded credentials in the code being reviewed, especially in configuration files or environment variables. *Scanner: code-review/security | CWE: CWE-798* <!-- compliance-fp:c2b1053095618393dbe125552eb46d269253218ec89d04acbd2f5b9e3a5a9c1b -->
// Each pass focuses on a different aspect to avoid overloading a single prompt.
pub const LOGIC_REVIEW_PROMPT: &str = r#"You are a senior software engineer reviewing code changes. Focus ONLY on logic and correctness issues.
pub const LOGIC_REVIEW_PROMPT: &str = r#"You are a senior software engineer reviewing a code diff. Report ONLY genuine logic bugs that would cause incorrect behavior at runtime.
Look for:
- Off-by-one errors, wrong comparisons, missing edge cases
- Incorrect control flow (unreachable code, missing returns, wrong loop conditions)
- Race conditions or concurrency bugs
- Resource leaks (unclosed handles, missing cleanup)
- Wrong variable used (copy-paste errors)
- Incorrect error handling (swallowed errors, wrong error type)
Report:
- Off-by-one errors, wrong comparisons, missing edge cases that cause wrong results
- Incorrect control flow that produces wrong output (not style preferences)
- Actual race conditions with concrete shared-state mutation (not theoretical ones)
- Resource leaks where cleanup is truly missing (not just "could be improved")
- Wrong variable used (copy-paste errors) — must be provably wrong, not just suspicious
- Swallowed errors that silently hide failures in a way that matters
Ignore: style, naming, formatting, documentation, minor improvements.
Do NOT report:
- Style, naming, formatting, documentation, or code organization preferences
Review

[medium] Missing CWE field in logic review prompt response format

The LOGIC_REVIEW_PROMPT specifies a JSON response format that includes 'cwe' field, but the example response format doesn't include this field. This inconsistency may cause parsing issues.

Suggested fix: Either remove 'cwe' from the specification or update the example response format to include the 'cwe' field consistently

*Scanner: code-review/convention | *

**[medium] Missing CWE field in logic review prompt response format** The LOGIC_REVIEW_PROMPT specifies a JSON response format that includes 'cwe' field, but the example response format doesn't include this field. This inconsistency may cause parsing issues. Suggested fix: Either remove 'cwe' from the specification or update the example response format to include the 'cwe' field consistently *Scanner: code-review/convention | * <!-- compliance-fp:b37f6efba86e1be106cbe85fac06df3b1221cd52de446c1a0fe625923929c1e7 -->
Review

[medium] Missing CWE field in logic review prompt response format

The LOGIC_REVIEW_PROMPT specifies a JSON response format that includes 'cwe' field, but the example response format doesn't include this field. This inconsistency may cause parsing issues.

Suggested fix: Either remove 'cwe' from the specification or update the example response format to include the 'cwe' field consistently

*Scanner: code-review/convention | *

**[medium] Missing CWE field in logic review prompt response format** The LOGIC_REVIEW_PROMPT specifies a JSON response format that includes 'cwe' field, but the example response format doesn't include this field. This inconsistency may cause parsing issues. Suggested fix: Either remove 'cwe' from the specification or update the example response format to include the 'cwe' field consistently *Scanner: code-review/convention | * <!-- compliance-fp:b37f6efba86e1be106cbe85fac06df3b1221cd52de446c1a0fe625923929c1e7 -->
- Theoretical issues without a concrete triggering scenario
- "Potential" problems that require assumptions not supported by the visible code
Review

[medium] Overly complex boolean expressions in security review prompt

The SECURITY_REVIEW_PROMPT contains extremely long and complex boolean expressions that make it difficult to reason about the conditions for what constitutes a vulnerability. This increases the risk of misinterpreting the rules and potentially flagging non-issues or missing actual vulnerabilities.

Suggested fix: Break down the complex boolean expressions into simpler conditional statements or use helper functions to encapsulate the logic. Consider using a structured format like bullet points or tables to represent the complex conditions.

*Scanner: code-review/complexity | *

**[medium] Overly complex boolean expressions in security review prompt** The SECURITY_REVIEW_PROMPT contains extremely long and complex boolean expressions that make it difficult to reason about the conditions for what constitutes a vulnerability. This increases the risk of misinterpreting the rules and potentially flagging non-issues or missing actual vulnerabilities. Suggested fix: Break down the complex boolean expressions into simpler conditional statements or use helper functions to encapsulate the logic. Consider using a structured format like bullet points or tables to represent the complex conditions. *Scanner: code-review/complexity | * <!-- compliance-fp:759b111f257dd8377e94632d4f69aa221c87aa6b34b9897d4893b5b14d72e4d2 -->
Review

[medium] Overly complex boolean expressions in security review prompt

The SECURITY_REVIEW_PROMPT contains extremely long and complex boolean expressions that make it difficult to reason about the conditions for what constitutes a vulnerability. This increases the risk of misinterpreting the rules and potentially flagging non-issues or missing actual vulnerabilities.

Suggested fix: Break down the complex boolean expressions into simpler conditional statements or use helper functions to encapsulate the logic. Consider using a structured format like bullet points or tables to represent the complex conditions.

*Scanner: code-review/complexity | *

**[medium] Overly complex boolean expressions in security review prompt** The SECURITY_REVIEW_PROMPT contains extremely long and complex boolean expressions that make it difficult to reason about the conditions for what constitutes a vulnerability. This increases the risk of misinterpreting the rules and potentially flagging non-issues or missing actual vulnerabilities. Suggested fix: Break down the complex boolean expressions into simpler conditional statements or use helper functions to encapsulate the logic. Consider using a structured format like bullet points or tables to represent the complex conditions. *Scanner: code-review/complexity | * <!-- compliance-fp:759b111f257dd8377e94632d4f69aa221c87aa6b34b9897d4893b5b14d72e4d2 -->
- Complexity or function length — that's a separate review pass
For each issue found, respond with a JSON array:
Language-idiomatic patterns that are NOT bugs (do not flag these):
- Rust: `||`/`&&` short-circuit evaluation, variable shadowing, `let` rebinding, `clone()`, `impl` blocks, `match` arms with guards, `?` operator chaining, `unsafe` blocks with safety comments
- Python: duck typing, EAFP pattern (try/except vs check-first), `*args`/`**kwargs`, walrus operator `:=`, truthiness checks on containers, bare `except:` in top-level handlers
- Go: multiple return values for errors, `if err != nil` patterns, goroutine + channel patterns, blank identifier `_`, named returns, `defer` for cleanup, `init()` functions
- Java/Kotlin: checked exception patterns, method overloading, `Optional` vs null checks, Kotlin `?.` safe calls, `!!` non-null assertions in tests, `when` exhaustive matching, companion objects, `lateinit`
- Ruby: monkey patching in libraries, method_missing, blocks/procs/lambdas, `rescue => e` patterns, `send`/`respond_to?` metaprogramming, `nil` checks via `&.` safe navigation
- PHP: loose comparisons with `==` (only flag if `===` was clearly intended), `@` error suppression in legacy code, `isset()`/`empty()` patterns, magic methods (`__get`, `__call`), array functions as callbacks
- C/C++: RAII patterns, move semantics, `const_cast`/`static_cast` in appropriate contexts, macro usage for platform compat, pointer arithmetic in low-level code, `goto` for cleanup in C
Review

[medium] Incorrect severity level in SECURITY_REVIEW_PROMPT

The SECURITY_REVIEW_PROMPT uses 'critical' severity for vulnerabilities that could lead to remote code execution, auth bypass, or data breach with no preconditions, but the description says 'no preconditions' which contradicts the typical definition of 'critical' severity that requires minimal preconditions. This inconsistency may mislead reviewers about what constitutes a critical vulnerability.

Suggested fix: Clarify the criteria for 'critical' severity in the description to match the actual requirements. Either change the description to say 'minimal preconditions' or adjust the severity mapping to be consistent.

*Scanner: code-review/logic | *

**[medium] Incorrect severity level in SECURITY_REVIEW_PROMPT** The SECURITY_REVIEW_PROMPT uses 'critical' severity for vulnerabilities that could lead to remote code execution, auth bypass, or data breach with no preconditions, but the description says 'no preconditions' which contradicts the typical definition of 'critical' severity that requires minimal preconditions. This inconsistency may mislead reviewers about what constitutes a critical vulnerability. Suggested fix: Clarify the criteria for 'critical' severity in the description to match the actual requirements. Either change the description to say 'minimal preconditions' or adjust the severity mapping to be consistent. *Scanner: code-review/logic | * <!-- compliance-fp:94ef4ae9b361a8f1529f1b4acf5adfa57887f45d95f7cb3e718e7193a4a87d92 -->
Review

[medium] Incorrect severity level in SECURITY_REVIEW_PROMPT

The SECURITY_REVIEW_PROMPT uses 'critical' severity for vulnerabilities that could lead to remote code execution, auth bypass, or data breach with no preconditions, but the description says 'no preconditions' which contradicts the typical definition of 'critical' severity that requires minimal preconditions. This inconsistency may mislead reviewers about what constitutes a critical vulnerability.

Suggested fix: Clarify the criteria for 'critical' severity in the description to match the actual requirements. Either change the description to say 'minimal preconditions' or adjust the severity mapping to be consistent.

*Scanner: code-review/logic | *

**[medium] Incorrect severity level in SECURITY_REVIEW_PROMPT** The SECURITY_REVIEW_PROMPT uses 'critical' severity for vulnerabilities that could lead to remote code execution, auth bypass, or data breach with no preconditions, but the description says 'no preconditions' which contradicts the typical definition of 'critical' severity that requires minimal preconditions. This inconsistency may mislead reviewers about what constitutes a critical vulnerability. Suggested fix: Clarify the criteria for 'critical' severity in the description to match the actual requirements. Either change the description to say 'minimal preconditions' or adjust the severity mapping to be consistent. *Scanner: code-review/logic | * <!-- compliance-fp:94ef4ae9b361a8f1529f1b4acf5adfa57887f45d95f7cb3e718e7193a4a87d92 -->
Severity guide:
- high: Will cause incorrect behavior in normal usage
- medium: Will cause incorrect behavior in edge cases
- low: Minor correctness concern with limited blast radius
Prefer returning [] over reporting low-confidence guesses. A false positive wastes more developer time than a missed low-severity issue.
Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "high|medium|low", "file": "...", "line": N, "suggestion": "..."}]
Review

[medium] Hardcoded Credentials in Prompt

The security review prompt contains a reference to 'hardcoded keys' which could be interpreted as a directive to look for hardcoded credentials in code changes. While the prompt itself doesn't contain actual credentials, it could lead to false positives if developers misunderstand the intent.

Suggested fix: Clarify that hardcoded credentials should not be flagged in the prompt itself, and ensure the prompt explicitly excludes checking for hardcoded credentials in the code being reviewed.

Scanner: code-review/security | CWE: CWE-798

**[medium] Hardcoded Credentials in Prompt** The security review prompt contains a reference to 'hardcoded keys' which could be interpreted as a directive to look for hardcoded credentials in code changes. While the prompt itself doesn't contain actual credentials, it could lead to false positives if developers misunderstand the intent. Suggested fix: Clarify that hardcoded credentials should not be flagged in the prompt itself, and ensure the prompt explicitly excludes checking for hardcoded credentials in the code being reviewed. *Scanner: code-review/security | CWE: CWE-798* <!-- compliance-fp:5fd8b9c43ab51115f65611c82aa499cb7d4b4d83580af1d9157f1ae9b29efa33 -->
If no issues found, respond with: []"#;
pub const SECURITY_REVIEW_PROMPT: &str = r#"You are a security engineer reviewing code changes. Focus ONLY on security vulnerabilities.
pub const SECURITY_REVIEW_PROMPT: &str = r#"You are a security engineer reviewing a code diff. Report ONLY exploitable security vulnerabilities with a realistic attack scenario.
Look for:
- Injection vulnerabilities (SQL, command, XSS, template injection)
- Authentication/authorization bypasses
- Sensitive data exposure (logging secrets, hardcoded credentials)
- Insecure cryptography (weak algorithms, predictable randomness)
- Path traversal, SSRF, open redirects
- Unsafe deserialization
- Missing input validation at trust boundaries
Report:
- Injection vulnerabilities (SQL, command, XSS, template) where untrusted input reaches a sink
- Authentication/authorization bypasses with a concrete exploit path
- Sensitive data exposure: secrets in code, credentials in logs, PII leaks
- Insecure cryptography: weak algorithms, predictable randomness, hardcoded keys
- Path traversal, SSRF, open redirects — only where user input reaches the vulnerable API
- Unsafe deserialization of untrusted data
- Missing input validation at EXTERNAL trust boundaries (user input, API responses)
Ignore: code style, performance, general quality.
Do NOT report:
- Internal code that only handles trusted/validated data
- Hash functions used for non-security purposes (dedup fingerprints, cache keys, content addressing)
- Logging of non-sensitive operational data (finding titles, counts, performance metrics)
- "Information disclosure" for data that is already public or user-facing
- Code style, performance, or general quality issues
- Missing validation on internal function parameters (trust the caller within the same module/crate/package)
- Theoretical attacks that require preconditions not present in the code
For each issue found, respond with a JSON array:
Language-specific patterns that are NOT vulnerabilities (do not flag these):
- Python: `pickle` used on trusted internal data, `eval()`/`exec()` on hardcoded strings, `subprocess` with hardcoded commands, Django `mark_safe()` on static content, `assert` in non-security contexts
- Go: `crypto/rand` is secure (don't confuse with `math/rand`), `sql.DB` with parameterized queries is safe, `http.ListenAndServe` without TLS in dev/internal, error strings in responses (Go convention)
- Java/Kotlin: Spring Security annotations are sufficient auth checks, `@Transactional` provides atomicity, JPA parameterized queries are safe, Kotlin `require()`/`check()` are assertion patterns not vulnerabilities
- Ruby: Rails `params.permit()` is input validation, `render html:` with `html_safe` on generated content, ActiveRecord parameterized finders are safe, Devise/Warden patterns for auth
- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars()` is XSS mitigation, Symfony security voters are auth checks, `password_hash()`/`password_verify()` are correct bcrypt usage
- C/C++: `strncpy`/`snprintf` are bounds-checked (vs `strcpy`/`sprintf`), smart pointers manage memory, RAII handles cleanup, `static_assert` is compile-time only, OpenSSL with proper context setup
- Rust: `sha2`/`blake3` for fingerprinting is not "weak crypto", `unsafe` with documented invariants, `secrecy::SecretString` properly handles secrets
Severity guide:
- critical: Remote code execution, auth bypass, or data breach with no preconditions
- high: Exploitable vulnerability requiring minimal preconditions
- medium: Vulnerability requiring specific conditions or limited impact
Prefer returning [] over reporting speculative vulnerabilities. Every false positive erodes trust in the scanner.
Review

[medium] Inconsistent error handling patterns in convention review prompt

The CONVENTION_REVIEW_PROMPT mentions 'Inconsistent error handling within the same module where the inconsistency could hide failures' but doesn't specify what constitutes 'inconsistent' error handling patterns. This creates ambiguity in implementation.

Suggested fix: Clarify what specific patterns constitute inconsistent error handling (e.g., mixing Result and panic, inconsistent use of ? operator, etc.)

*Scanner: code-review/convention | *

**[medium] Inconsistent error handling patterns in convention review prompt** The CONVENTION_REVIEW_PROMPT mentions 'Inconsistent error handling within the same module where the inconsistency could hide failures' but doesn't specify what constitutes 'inconsistent' error handling patterns. This creates ambiguity in implementation. Suggested fix: Clarify what specific patterns constitute inconsistent error handling (e.g., mixing Result and panic, inconsistent use of ? operator, etc.) *Scanner: code-review/convention | * <!-- compliance-fp:1041567fe344247a6ee09a8c0bd2a5b32229254add8d5640c581ddd28b7751bf -->
Review

[medium] Deeply nested control flow in convention review prompt

The CONVENTION_REVIEW_PROMPT contains deeply nested conditional logic that makes it hard to follow the decision-making process. The nested 'Do NOT report' sections create a complex structure that could lead to confusion when implementing or maintaining the prompt.

Suggested fix: Refactor the nested conditions into separate logical blocks or use early returns to flatten the control flow. Consider breaking the prompt into smaller, focused sections with clear separation of concerns.

*Scanner: code-review/complexity | *

**[medium] Deeply nested control flow in convention review prompt** The CONVENTION_REVIEW_PROMPT contains deeply nested conditional logic that makes it hard to follow the decision-making process. The nested 'Do NOT report' sections create a complex structure that could lead to confusion when implementing or maintaining the prompt. Suggested fix: Refactor the nested conditions into separate logical blocks or use early returns to flatten the control flow. Consider breaking the prompt into smaller, focused sections with clear separation of concerns. *Scanner: code-review/complexity | * <!-- compliance-fp:57742a3004140d117c80d9271f614c3b80530260ba24e59a09e8f920daabc4f5 -->
Review

[medium] Deeply nested control flow in convention review prompt

The CONVENTION_REVIEW_PROMPT contains deeply nested conditional logic that makes it hard to follow the decision-making process. The nested 'Do NOT report' sections create a complex structure that could lead to confusion when implementing or maintaining the prompt.

Suggested fix: Refactor the nested conditions into separate logical blocks or use early returns to flatten the control flow. Consider breaking the prompt into smaller, focused sections with clear separation of concerns.

*Scanner: code-review/complexity | *

**[medium] Deeply nested control flow in convention review prompt** The CONVENTION_REVIEW_PROMPT contains deeply nested conditional logic that makes it hard to follow the decision-making process. The nested 'Do NOT report' sections create a complex structure that could lead to confusion when implementing or maintaining the prompt. Suggested fix: Refactor the nested conditions into separate logical blocks or use early returns to flatten the control flow. Consider breaking the prompt into smaller, focused sections with clear separation of concerns. *Scanner: code-review/complexity | * <!-- compliance-fp:57742a3004140d117c80d9271f614c3b80530260ba24e59a09e8f920daabc4f5 -->
Review

[medium] Inconsistent error handling patterns in convention review prompt

The CONVENTION_REVIEW_PROMPT mentions 'Inconsistent error handling within the same module where the inconsistency could hide failures' but doesn't specify what constitutes 'inconsistent' error handling patterns. This creates ambiguity in implementation.

Suggested fix: Clarify what specific patterns constitute inconsistent error handling (e.g., mixing Result and panic, inconsistent use of ? operator, etc.)

*Scanner: code-review/convention | *

**[medium] Inconsistent error handling patterns in convention review prompt** The CONVENTION_REVIEW_PROMPT mentions 'Inconsistent error handling within the same module where the inconsistency could hide failures' but doesn't specify what constitutes 'inconsistent' error handling patterns. This creates ambiguity in implementation. Suggested fix: Clarify what specific patterns constitute inconsistent error handling (e.g., mixing Result and panic, inconsistent use of ? operator, etc.) *Scanner: code-review/convention | * <!-- compliance-fp:1041567fe344247a6ee09a8c0bd2a5b32229254add8d5640c581ddd28b7751bf -->
Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "critical|high|medium", "file": "...", "line": N, "cwe": "CWE-XXX", "suggestion": "..."}]
If no issues found, respond with: []"#;
pub const CONVENTION_REVIEW_PROMPT: &str = r#"You are a code reviewer checking adherence to project conventions. Focus ONLY on patterns that indicate likely bugs or maintenance problems.
pub const CONVENTION_REVIEW_PROMPT: &str = r#"You are a code reviewer checking for convention violations that indicate likely bugs. Report ONLY deviations from the project's visible patterns that could cause real problems.
Look for:
- Inconsistent error handling patterns within the same module
- Public API that doesn't follow the project's established patterns
- Missing or incorrect type annotations that could cause runtime issues
- Anti-patterns specific to the language (e.g. unwrap in Rust library code, any in TypeScript)
Report:
- Inconsistent error handling within the same module where the inconsistency could hide failures
- Public API that breaks the module's established contract (not just different style)
- Anti-patterns that are bugs in this language: e.g. `unwrap()` in Rust library code where the CI enforces `clippy::unwrap_used`, `any` defeating TypeScript's type system
Do NOT report: minor style preferences, documentation gaps, formatting.
Only report issues with HIGH confidence that they deviate from the visible codebase conventions.
Do NOT report:
- Style preferences, formatting, naming conventions, or documentation
- Code organization suggestions ("this function should be split")
- Patterns that are valid in the language even if you'd write them differently
- "Missing type annotations" unless the code literally won't compile or causes a type inference bug
For each issue found, respond with a JSON array:
Language-specific patterns that are conventional (do not flag these):
- Rust: variable shadowing, `||`/`&&` short-circuit, `let` rebinding, builder patterns, `clone()`, `From`/`Into` impl chains, `#[allow(...)]` attributes
- Python: `**kwargs` forwarding, `@property` setters, `__dunder__` methods, list comprehensions with conditions, `if TYPE_CHECKING` imports, `noqa` comments
- Go: stuttering names (`http.HTTPClient`) discouraged but not a bug, `context.Context` as first param, init() functions, `//nolint` directives, returning concrete types vs interfaces in internal code
- Java/Kotlin: builder pattern boilerplate, Lombok annotations (`@Data`, `@Builder`), Kotlin data classes, `companion object` factories, `@Suppress` annotations, checked exception wrapping
- Ruby: `attr_accessor` usage, `Enumerable` mixin patterns, `module_function`, `class << self` syntax, DSL blocks (Rake, RSpec, Sinatra routes)
- PHP: `__construct` with property promotion, Laravel facades, static factory methods, nullable types with `?`, attribute syntax `#[...]`
- C/C++: header guards vs `#pragma once`, forward declarations, `const` correctness patterns, template specialization, `auto` type deduction
Severity guide:
- medium: Convention violation that will likely cause a bug or maintenance problem
- low: Convention violation that is a minor concern
Return at most 3 findings. Prefer [] over marginal findings.
Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "medium|low", "file": "...", "line": N, "suggestion": "..."}]
If no issues found, respond with: []"#;
pub const COMPLEXITY_REVIEW_PROMPT: &str = r#"You are reviewing code changes for excessive complexity that could lead to bugs.
pub const COMPLEXITY_REVIEW_PROMPT: &str = r#"You are reviewing code changes for complexity that is likely to cause bugs. Report ONLY complexity that makes the code demonstrably harder to reason about.
Look for:
- Functions over 50 lines that should be decomposed
- Deeply nested control flow (4+ levels)
- Complex boolean expressions that are hard to reason about
- Functions with 5+ parameters
- Code duplication within the changed files
Report:
- Functions over 80 lines with multiple interleaved responsibilities (not just long)
- Deeply nested control flow (5+ levels) where flattening would prevent bugs
- Complex boolean expressions that a reader would likely misinterpret
Only report complexity issues that are HIGH risk for future bugs. Ignore acceptable complexity in configuration, CLI argument parsing, or generated code.
Do NOT report:
- Functions that are long but linear and easy to follow
- Acceptable complexity: configuration setup, CLI parsing, test helpers, builder patterns
- Code that is complex because the problem is complex — only report if restructuring would reduce bug risk
- "This function does multiple things" unless you can identify a specific bug risk from the coupling
- Suggestions that would just move complexity elsewhere without reducing it
For each issue found, respond with a JSON array:
Severity guide:
- medium: Complexity that has a concrete risk of causing bugs during future changes
- low: Complexity that makes review harder but is unlikely to cause bugs
Return at most 2 findings. Prefer [] over reporting complexity that is justified.
Respond with a JSON array (no markdown fences):
[{"title": "...", "description": "...", "severity": "medium|low", "file": "...", "line": N, "suggestion": "..."}]
If no issues found, respond with: []"#;

View File

@@ -8,22 +8,46 @@ use crate::pipeline::orchestrator::GraphContext;
/// Maximum number of findings to include in a single LLM triage call.
const TRIAGE_CHUNK_SIZE: usize = 30;
const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a security finding triage expert. Analyze each of the following security findings with its code context and determine the appropriate action.
const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a pragmatic security triage expert. Your job is to filter out noise and keep only findings that a developer should actually fix. Be aggressive about dismissing false positives — a clean, high-signal list is more valuable than a comprehensive one.
Review

[high] Insecure Prompt Engineering for LLM Triage

The triage system prompt contains detailed instructions that could potentially be manipulated by adversarial inputs to influence the LLM's behavior. Specifically, the prompt includes explicit actions and confidence scoring rules that might be exploited through prompt injection techniques to manipulate the triage decisions.

Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence prompt construction. Consider using a more robust prompt engineering framework that separates instruction logic from user data to prevent prompt injection attacks.

Scanner: code-review/security | CWE: CWE-94

**[high] Insecure Prompt Engineering for LLM Triage** The triage system prompt contains detailed instructions that could potentially be manipulated by adversarial inputs to influence the LLM's behavior. Specifically, the prompt includes explicit actions and confidence scoring rules that might be exploited through prompt injection techniques to manipulate the triage decisions. Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence prompt construction. Consider using a more robust prompt engineering framework that separates instruction logic from user data to prevent prompt injection attacks. *Scanner: code-review/security | CWE: CWE-94* <!-- compliance-fp:b03c0c39c70dd8912b5bce4d4e1bb2fbe9174b725b51f384107c447103cf4bf1 -->
Review

[high] Insecure LLM System Prompt Configuration

The LLM system prompt contains extensive logic for determining when to dismiss findings, including specific language patterns and false positive rules. While this may reduce noise, it introduces potential for manipulation through prompt injection attacks if the LLM is not properly sandboxed or if the prompt is constructed in a way that allows adversarial inputs to influence the decision-making process.

Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence the LLM prompt. Consider using a more robust prompt engineering framework that separates business logic from the core prompt structure.

Scanner: code-review/security | CWE: CWE-94

**[high] Insecure LLM System Prompt Configuration** The LLM system prompt contains extensive logic for determining when to dismiss findings, including specific language patterns and false positive rules. While this may reduce noise, it introduces potential for manipulation through prompt injection attacks if the LLM is not properly sandboxed or if the prompt is constructed in a way that allows adversarial inputs to influence the decision-making process. Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence the LLM prompt. Consider using a more robust prompt engineering framework that separates business logic from the core prompt structure. *Scanner: code-review/security | CWE: CWE-94* <!-- compliance-fp:167bb9b3328b6df542da473001e469cd1b7c487629c47abafb815b965e763fcd -->
Review

[medium] Overly Complex System Prompt in Triage Module

The TRIAGE_SYSTEM_PROMPT constant has grown to 46 lines with extensive documentation, examples, and rules. This makes it difficult to maintain and understand the core triage logic. The prompt contains multiple sections with different types of information (actions, dismissal criteria, confirmation rules, confidence scoring, language-specific false positives) that could be better organized.

Suggested fix: Break down the system prompt into smaller, focused prompts or extract the detailed rules into separate constants or documentation files. Consider creating a structured format for the rules that can be programmatically validated.

*Scanner: code-review/complexity | *

**[medium] Overly Complex System Prompt in Triage Module** The TRIAGE_SYSTEM_PROMPT constant has grown to 46 lines with extensive documentation, examples, and rules. This makes it difficult to maintain and understand the core triage logic. The prompt contains multiple sections with different types of information (actions, dismissal criteria, confirmation rules, confidence scoring, language-specific false positives) that could be better organized. Suggested fix: Break down the system prompt into smaller, focused prompts or extract the detailed rules into separate constants or documentation files. Consider creating a structured format for the rules that can be programmatically validated. *Scanner: code-review/complexity | * <!-- compliance-fp:679a9fcaf7f55f74feec024de7298e3e13dc196769e332ebe352bb2349083553 -->
Review

[medium] Potential Information Disclosure in LLM Prompt

The LLM system prompt includes detailed information about how to identify and dismiss various types of false positives, including specific examples from different programming languages. This information could potentially be leveraged by an attacker to craft inputs that would cause the LLM to incorrectly classify security findings.

Suggested fix: Limit the amount of information provided in system prompts that could be used to manipulate LLM behavior. Consider removing or generalizing the specific examples of false positive patterns.

Scanner: code-review/security | CWE: CWE-200

**[medium] Potential Information Disclosure in LLM Prompt** The LLM system prompt includes detailed information about how to identify and dismiss various types of false positives, including specific examples from different programming languages. This information could potentially be leveraged by an attacker to craft inputs that would cause the LLM to incorrectly classify security findings. Suggested fix: Limit the amount of information provided in system prompts that could be used to manipulate LLM behavior. Consider removing or generalizing the specific examples of false positive patterns. *Scanner: code-review/security | CWE: CWE-200* <!-- compliance-fp:778d570a4066e6f3dabba2a28875aa44a98f3a0a1683306bbca2b5debf5555ad -->
Review

[medium] Insecure LLM System Prompt

The LLM system prompt contains detailed instructions for filtering security findings, including specific dismissal criteria and false positive patterns. While this may be intended to reduce noise, it could potentially be exploited to manipulate the LLM's behavior through prompt injection or by crafting inputs designed to trigger specific dismissal patterns.

Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence LLM prompts. Consider using a more robust prompt engineering framework that prevents prompt injection attacks.

Scanner: code-review/security | CWE: CWE-94

**[medium] Insecure LLM System Prompt** The LLM system prompt contains detailed instructions for filtering security findings, including specific dismissal criteria and false positive patterns. While this may be intended to reduce noise, it could potentially be exploited to manipulate the LLM's behavior through prompt injection or by crafting inputs designed to trigger specific dismissal patterns. Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence LLM prompts. Consider using a more robust prompt engineering framework that prevents prompt injection attacks. *Scanner: code-review/security | CWE: CWE-94* <!-- compliance-fp:49d3c44eaae58e248eb1e89b33677b1eb794eb23b417c8f516f25307e028e435 -->
Review

[medium] Overly Complex System Prompt in Triage Module

The TRIAGE_SYSTEM_PROMPT constant has grown to 46 lines with extensive documentation and rules, making it difficult to maintain and understand. The prompt contains multiple sections with complex nested logic that would benefit from being broken down into smaller, more manageable components.

Suggested fix: Break the system prompt into smaller, focused prompts or extract the detailed rules into a separate documentation file that can be referenced. Consider creating helper functions or constants for the different rule categories (dismiss conditions, confirm criteria, confidence scoring).

*Scanner: code-review/complexity | *

**[medium] Overly Complex System Prompt in Triage Module** The TRIAGE_SYSTEM_PROMPT constant has grown to 46 lines with extensive documentation and rules, making it difficult to maintain and understand. The prompt contains multiple sections with complex nested logic that would benefit from being broken down into smaller, more manageable components. Suggested fix: Break the system prompt into smaller, focused prompts or extract the detailed rules into a separate documentation file that can be referenced. Consider creating helper functions or constants for the different rule categories (dismiss conditions, confirm criteria, confidence scoring). *Scanner: code-review/complexity | * <!-- compliance-fp:679a9fcaf7f55f74feec024de7298e3e13dc196769e332ebe352bb2349083553 -->
Review

[medium] Complex boolean expression in triage logic

The triage system prompt contains complex conditional logic for determining when to dismiss findings. The prompt has multiple nested conditions that make it difficult to reason about all dismissal criteria at once.

Suggested fix: Break down the dismissal criteria into separate bullet points with clear headings, and consider extracting the logic into a structured decision tree or configuration-based approach for better maintainability.

*Scanner: code-review/complexity | *

**[medium] Complex boolean expression in triage logic** The triage system prompt contains complex conditional logic for determining when to dismiss findings. The prompt has multiple nested conditions that make it difficult to reason about all dismissal criteria at once. Suggested fix: Break down the dismissal criteria into separate bullet points with clear headings, and consider extracting the logic into a structured decision tree or configuration-based approach for better maintainability. *Scanner: code-review/complexity | * <!-- compliance-fp:44019fe3bb756da677eedce054d35074be3b8194480782043269ac41c007e684 -->
Actions:
- "confirm": The finding is a true positive at the reported severity. Keep as-is.
- "downgrade": The finding is real but over-reported. Lower severity recommended.
- "upgrade": The finding is under-reported. Higher severity recommended.
- "dismiss": The finding is a false positive. Should be removed.
- "confirm": True positive with real impact. Keep severity as-is.
- "downgrade": Real issue but over-reported severity. Lower it.
Review

[medium] Inconsistent JSON response format in triage prompt

The updated TRIAGE_SYSTEM_PROMPT removes markdown fences from the JSON response example, but the original prompt included them. This inconsistency could confuse developers implementing the LLM interface or lead to parsing errors if the system expects fenced JSON.

Suggested fix: Ensure consistent formatting in examples across all prompts. Either remove all markdown fences or keep them consistently.

*Scanner: code-review/convention | *

**[medium] Inconsistent JSON response format in triage prompt** The updated TRIAGE_SYSTEM_PROMPT removes markdown fences from the JSON response example, but the original prompt included them. This inconsistency could confuse developers implementing the LLM interface or lead to parsing errors if the system expects fenced JSON. Suggested fix: Ensure consistent formatting in examples across all prompts. Either remove all markdown fences or keep them consistently. *Scanner: code-review/convention | * <!-- compliance-fp:86a6e482fc084392737805b0bf23ce8ce470ae59bb493eeaac1bccbe2f175508 -->
- "upgrade": Under-reported — higher severity warranted.
- "dismiss": False positive, not exploitable, or not actionable. Remove it.
Review

[high] Incorrect JSON response format in triage system prompt

The updated TRIAGE_SYSTEM_PROMPT removes markdown fences from the JSON response example, but the actual implementation may still expect or produce fenced JSON. This could cause parsing failures when the LLM returns properly formatted JSON without the markdown fences.

Suggested fix: Ensure the LLM client parsing logic correctly handles both fenced and unfenced JSON responses, or update the prompt to explicitly mention that markdown fences should be omitted from the actual response.

*Scanner: code-review/logic | *

**[high] Incorrect JSON response format in triage system prompt** The updated TRIAGE_SYSTEM_PROMPT removes markdown fences from the JSON response example, but the actual implementation may still expect or produce fenced JSON. This could cause parsing failures when the LLM returns properly formatted JSON without the markdown fences. Suggested fix: Ensure the LLM client parsing logic correctly handles both fenced and unfenced JSON responses, or update the prompt to explicitly mention that markdown fences should be omitted from the actual response. *Scanner: code-review/logic | * <!-- compliance-fp:4a46aba3b15a83b61730e3f707a542850eaf202ff158e71004fcc9c024f82dff -->
Consider:
- Is the code in a test, example, or generated file? (lower confidence for test code)
- Does the surrounding code context confirm or refute the finding?
- Is the finding actionable by a developer?
- Would a real attacker be able to exploit this?
Dismiss when:
- The scanner flagged a language idiom as a bug (see examples below)
- The finding is in test/example/generated/vendored code
- The "vulnerability" requires preconditions that don't exist in the code
- The finding is about code style, complexity, or theoretical concerns rather than actual bugs
- A hash function is used for non-security purposes (dedup, caching, content addressing)
- Internal logging of non-sensitive operational data is flagged as "information disclosure"
Review

[medium] Incorrect JSON response format in triage system prompt

The updated TRIAGE_SYSTEM_PROMPT removes markdown fences (```json) from the expected JSON response format, but the prompt text still mentions 'no markdown fences' which could confuse LLM interpretation. More importantly, the instruction says to respond with a JSON array but doesn't specify that the response should be valid JSON without any additional text or formatting.

Suggested fix: Clarify that the response should be pure JSON without any markdown fences or extra text, and consider adding explicit examples of valid responses to avoid ambiguity.

*Scanner: code-review/logic | *

**[medium] Incorrect JSON response format in triage system prompt** The updated TRIAGE_SYSTEM_PROMPT removes markdown fences (```json) from the expected JSON response format, but the prompt text still mentions 'no markdown fences' which could confuse LLM interpretation. More importantly, the instruction says to respond with a JSON array but doesn't specify that the response should be valid JSON without any additional text or formatting. Suggested fix: Clarify that the response should be pure JSON without any markdown fences or extra text, and consider adding explicit examples of valid responses to avoid ambiguity. *Scanner: code-review/logic | * <!-- compliance-fp:306961418f1c49dd102048ec3283e2740ead0e804ad956e24abf9dd61de0a9c6 -->
Review

[medium] Incorrect JSON response format in triage system prompt

The updated TRIAGE_SYSTEM_PROMPT removes markdown fences (```json) from the expected JSON response format, but the prompt text still mentions 'no markdown fences' which could confuse LLM interpretation. More importantly, the instruction says to respond with a JSON array but doesn't specify that the response should be valid JSON without any additional text or formatting.

Suggested fix: Clarify that the response should be pure JSON without any markdown fences or extra text, and consider adding explicit examples of valid responses to avoid ambiguity.

*Scanner: code-review/logic | *

**[medium] Incorrect JSON response format in triage system prompt** The updated TRIAGE_SYSTEM_PROMPT removes markdown fences (```json) from the expected JSON response format, but the prompt text still mentions 'no markdown fences' which could confuse LLM interpretation. More importantly, the instruction says to respond with a JSON array but doesn't specify that the response should be valid JSON without any additional text or formatting. Suggested fix: Clarify that the response should be pure JSON without any markdown fences or extra text, and consider adding explicit examples of valid responses to avoid ambiguity. *Scanner: code-review/logic | * <!-- compliance-fp:306961418f1c49dd102048ec3283e2740ead0e804ad956e24abf9dd61de0a9c6 -->
- The finding duplicates another finding already in the list
- Framework-provided security is already in place (e.g. ORM parameterized queries, CSRF middleware, auth decorators)
Respond with a JSON array, one entry per finding in the same order they were presented:
[{"id": "<fingerprint>", "action": "confirm|downgrade|upgrade|dismiss", "confidence": 0-10, "rationale": "brief explanation", "remediation": "optional fix suggestion"}, ...]"#;
Common false positive patterns by language (dismiss these):
- Rust: short-circuit `||`/`&&`, variable shadowing, `clone()`, `unsafe` with safety docs, `sha2` for fingerprinting
- Python: EAFP try/except, `subprocess` with hardcoded args, `pickle` on trusted data, Django `mark_safe` on static content
- Go: `if err != nil` is not "swallowed error", `crypto/rand` is secure, returning errors is not "information disclosure"
- Java/Kotlin: Spring Security annotations are valid auth, JPA parameterized queries are safe, Kotlin `!!` in tests is fine
- Ruby: Rails `params.permit` is validation, ActiveRecord finders are parameterized, `html_safe` on generated content
- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars` is XSS mitigation
Review

[medium] Inconsistent error handling pattern in triage_findings

The triage_findings function uses ? operator for error propagation but doesn't handle the case where the LLM response parsing might fail. The function signature suggests it returns Result<(), Box>, but there's no explicit error handling for parsing failures from the LLM response.

Suggested fix: Add explicit error handling for LLM response parsing failures and ensure consistent error propagation throughout the function.

*Scanner: code-review/convention | *

**[medium] Inconsistent error handling pattern in triage_findings** The triage_findings function uses ? operator for error propagation but doesn't handle the case where the LLM response parsing might fail. The function signature suggests it returns Result<(), Box<dyn std::error::Error>>, but there's no explicit error handling for parsing failures from the LLM response. Suggested fix: Add explicit error handling for LLM response parsing failures and ensure consistent error propagation throughout the function. *Scanner: code-review/convention | * <!-- compliance-fp:31fde056984dd98fa386b3c998ea14fee8ffdb9fe8c682a72efa1135336d6b28 -->
Review

[medium] Inconsistent error handling pattern in triage_findings

The triage_findings function uses ? operator for error propagation but doesn't handle the case where the LLM response parsing might fail. The function signature suggests it returns Result<(), Box>, but there's no explicit error handling for parsing failures from the LLM response.

Suggested fix: Add explicit error handling for LLM response parsing failures and ensure consistent error propagation throughout the function.

*Scanner: code-review/convention | *

**[medium] Inconsistent error handling pattern in triage_findings** The triage_findings function uses ? operator for error propagation but doesn't handle the case where the LLM response parsing might fail. The function signature suggests it returns Result<(), Box<dyn std::error::Error>>, but there's no explicit error handling for parsing failures from the LLM response. Suggested fix: Add explicit error handling for LLM response parsing failures and ensure consistent error propagation throughout the function. *Scanner: code-review/convention | * <!-- compliance-fp:31fde056984dd98fa386b3c998ea14fee8ffdb9fe8c682a72efa1135336d6b28 -->
- C/C++: `strncpy`/`snprintf` are bounds-checked, smart pointers manage memory, RAII handles cleanup
Confirm only when:
- You can describe a concrete scenario where the bug manifests or the vulnerability is exploitable
- The fix is actionable (developer can change specific code to resolve it)
Review

[medium] Potential Command Injection Vulnerability via LLM Input

The triage function processes security findings which could potentially contain untrusted data. If the LLM receives input containing malicious payloads designed to manipulate its behavior or extract information, there's a risk of command injection or other vulnerabilities depending on how the LLM results are processed downstream.

Suggested fix: Ensure that all findings passed to the LLM are sanitized before processing. Implement strict validation of the JSON output from the LLM to prevent execution of unexpected commands or data exfiltration attempts.

Scanner: code-review/security | CWE: CWE-78

**[medium] Potential Command Injection Vulnerability via LLM Input** The triage function processes security findings which could potentially contain untrusted data. If the LLM receives input containing malicious payloads designed to manipulate its behavior or extract information, there's a risk of command injection or other vulnerabilities depending on how the LLM results are processed downstream. Suggested fix: Ensure that all findings passed to the LLM are sanitized before processing. Implement strict validation of the JSON output from the LLM to prevent execution of unexpected commands or data exfiltration attempts. *Scanner: code-review/security | CWE: CWE-78* <!-- compliance-fp:378a09f19aff71cd879a1ed06f8a6887f72b7010e6111d2e451f7f53612f92e8 -->
- The finding is in production code that handles external input or sensitive data
Confidence scoring (0-10):
- 8-10: Certain true positive with clear exploit/bug scenario
- 5-7: Likely true positive, some assumptions required
- 3-4: Uncertain, needs manual review
- 0-2: Almost certainly a false positive
Respond with a JSON array, one entry per finding in the same order presented (no markdown fences):
[{"id": "<fingerprint>", "action": "confirm|downgrade|upgrade|dismiss", "confidence": 0-10, "rationale": "1-2 sentences", "remediation": "optional fix"}, ...]"#;
pub async fn triage_findings(
llm: &Arc<LlmClient>,

View File

@@ -314,6 +314,21 @@ impl PentestOrchestrator {
- For SPA apps: a 200 HTTP status does NOT mean the page is accessible — check the actual
Review

[medium] Missing type annotations in prompt builder

The PentestOrchestrator implementation lacks explicit return type annotations for methods, which could lead to type inference issues and make the API less predictable. The method signatures should include explicit return types to maintain consistency with the project's established patterns.

Suggested fix: Add explicit return type annotations to all methods in the PentestOrchestrator implementation to ensure consistent API behavior.

*Scanner: code-review/convention | *

**[medium] Missing type annotations in prompt builder** The PentestOrchestrator implementation lacks explicit return type annotations for methods, which could lead to type inference issues and make the API less predictable. The method signatures should include explicit return types to maintain consistency with the project's established patterns. Suggested fix: Add explicit return type annotations to all methods in the PentestOrchestrator implementation to ensure consistent API behavior. *Scanner: code-review/convention | * <!-- compliance-fp:85d1a327fd16e90077263e3e1728b43bcf3c2b308b8aa02179e87b20df382b19 -->
Review

[medium] Missing type annotations in prompt builder

The PentestOrchestrator struct has methods without explicit return type annotations, which makes the API less clear and could lead to inconsistencies in error handling patterns. The method signatures should include explicit return types to maintain consistency with the rest of the codebase.

Suggested fix: Add explicit return type annotations to all methods in PentestOrchestrator to ensure consistent API patterns throughout the module.

*Scanner: code-review/convention | *

**[medium] Missing type annotations in prompt builder** The PentestOrchestrator struct has methods without explicit return type annotations, which makes the API less clear and could lead to inconsistencies in error handling patterns. The method signatures should include explicit return types to maintain consistency with the rest of the codebase. Suggested fix: Add explicit return type annotations to all methods in PentestOrchestrator to ensure consistent API patterns throughout the module. *Scanner: code-review/convention | * <!-- compliance-fp:85d1a327fd16e90077263e3e1728b43bcf3c2b308b8aa02179e87b20df382b19 -->
page content with the browser tool to verify if it shows real data or a login redirect.
## Finding Quality Rules
Review

[medium] Missing type annotations in pentest prompt builder

The pentest prompt builder contains new rules for finding quality but lacks explicit type annotations for the new constants or functions that might handle these rules. This could lead to runtime issues if types are inferred incorrectly.

Suggested fix: Add explicit type annotations to any new variables or functions introduced in the pentest prompt builder to ensure correct type inference and prevent runtime errors.

*Scanner: code-review/convention | *

**[medium] Missing type annotations in pentest prompt builder** The pentest prompt builder contains new rules for finding quality but lacks explicit type annotations for the new constants or functions that might handle these rules. This could lead to runtime issues if types are inferred incorrectly. Suggested fix: Add explicit type annotations to any new variables or functions introduced in the pentest prompt builder to ensure correct type inference and prevent runtime errors. *Scanner: code-review/convention | * <!-- compliance-fp:d9ff328138a66d683351cec5808967184f5068a9bcc519c45528cc9261b54372 -->
Review

[medium] Potential Information Disclosure in Pentest Prompt

The pentest prompt includes detailed rules about finding quality and severity classification that could inadvertently expose internal testing methodologies or security assessment criteria. While not directly exposing secrets, this information could be valuable to attackers trying to understand or circumvent security testing processes.

Suggested fix: Review the prompt content to ensure no sensitive information about internal processes, methodologies, or security assessment criteria is exposed. Consider removing or generalizing the specific examples and rules that might reveal internal practices.

Scanner: code-review/security | CWE: CWE-200

**[medium] Potential Information Disclosure in Pentest Prompt** The pentest prompt includes detailed rules about finding quality and severity classification that could inadvertently expose internal testing methodologies or security assessment criteria. While not directly exposing secrets, this information could be valuable to attackers trying to understand or circumvent security testing processes. Suggested fix: Review the prompt content to ensure no sensitive information about internal processes, methodologies, or security assessment criteria is exposed. Consider removing or generalizing the specific examples and rules that might reveal internal practices. *Scanner: code-review/security | CWE: CWE-200* <!-- compliance-fp:8a7e7f404052c1931d5959d9a242a6acbeb9b4c01f958521eda348168a7584aa -->
- **Do not report the same issue twice.** If multiple tools detect the same missing header or
vulnerability on the same endpoint, report it ONCE with the most specific tool's output.
For example, if the recon tool and the header scanner both find missing HSTS, report it only
Review

[high] Potential off-by-one error in finding grouping logic

In pentest/prompt_builder.rs, the rule states 'Do not report the same issue twice' and suggests reporting only once with the most specific tool's output. However, there's no implementation detail provided here to ensure this deduplication happens correctly in the code. If the deduplication logic isn't properly implemented elsewhere, this could lead to duplicate findings being reported, violating the rule.

Suggested fix: Ensure that the deduplication logic in the actual implementation correctly identifies and groups related findings based on their fingerprints or other identifying characteristics before reporting them.

*Scanner: code-review/logic | *

**[high] Potential off-by-one error in finding grouping logic** In pentest/prompt_builder.rs, the rule states 'Do not report the same issue twice' and suggests reporting only once with the most specific tool's output. However, there's no implementation detail provided here to ensure this deduplication happens correctly in the code. If the deduplication logic isn't properly implemented elsewhere, this could lead to duplicate findings being reported, violating the rule. Suggested fix: Ensure that the deduplication logic in the actual implementation correctly identifies and groups related findings based on their fingerprints or other identifying characteristics before reporting them. *Scanner: code-review/logic | * <!-- compliance-fp:da6820991c6d7b1608e94644ae82b886e90638e682b7e61f2bc8c1332afb0156 -->
Review

[high] Potential off-by-one error in finding grouping logic

In pentest/prompt_builder.rs, the rule states 'Do not report the same issue twice' and suggests reporting only once with the most specific tool's output. However, there's no implementation detail provided here to ensure this deduplication happens correctly in the code. If the deduplication logic isn't properly implemented elsewhere, this could lead to duplicate findings being reported, violating the rule.

Suggested fix: Ensure that the deduplication logic in the actual implementation correctly identifies and groups related findings based on their fingerprints or other identifying characteristics before reporting them.

*Scanner: code-review/logic | *

**[high] Potential off-by-one error in finding grouping logic** In pentest/prompt_builder.rs, the rule states 'Do not report the same issue twice' and suggests reporting only once with the most specific tool's output. However, there's no implementation detail provided here to ensure this deduplication happens correctly in the code. If the deduplication logic isn't properly implemented elsewhere, this could lead to duplicate findings being reported, violating the rule. Suggested fix: Ensure that the deduplication logic in the actual implementation correctly identifies and groups related findings based on their fingerprints or other identifying characteristics before reporting them. *Scanner: code-review/logic | * <!-- compliance-fp:da6820991c6d7b1608e94644ae82b886e90638e682b7e61f2bc8c1332afb0156 -->
from the header scanner (more specific).
- **Group related findings.** Missing security headers on the same endpoint are ONE finding
("Missing security headers") listing all missing headers, not separate findings per header.
- **Severity must match real impact:**
- critical/high: Exploitable vulnerability (you can demonstrate the exploit)
- medium: Real misconfiguration with security implications but not directly exploitable
- low: Best-practice recommendation, defense-in-depth, or informational
Review

[medium] Inconsistent severity guidelines in pentest prompt

The pentest prompt introduces conflicting guidance about missing headers. It states 'Missing headers are medium at most' but then contradicts itself by saying 'missing CSP + confirmed XSS = high for CSP finding'. This inconsistency could lead to inconsistent triage decisions.

Suggested fix: Clarify the severity rules for missing headers to ensure consistency. Either remove the 'medium at most' rule or make it conditional on additional factors like exploitability.

*Scanner: code-review/logic | *

**[medium] Inconsistent severity guidelines in pentest prompt** The pentest prompt introduces conflicting guidance about missing headers. It states 'Missing headers are medium at most' but then contradicts itself by saying 'missing CSP + confirmed XSS = high for CSP finding'. This inconsistency could lead to inconsistent triage decisions. Suggested fix: Clarify the severity rules for missing headers to ensure consistency. Either remove the 'medium at most' rule or make it conditional on additional factors like exploitability. *Scanner: code-review/logic | * <!-- compliance-fp:aa77184a0f1c8db288cc566eaf0a6817c090b1cc806ab0a1eb30163c711154a3 -->
- **Missing headers are medium at most** unless you can demonstrate a concrete exploit enabled
by the missing header (e.g., missing CSP + confirmed XSS = high for CSP finding).
- Console.log in third-party/vendored JS (node_modules, minified libraries) is informational only.
## Important
- This is an authorized penetration test. All testing is permitted within the target scope.
- Respect the rate limit of {rate_limit} requests per second.