feat: refine all LLM system prompts for precision and reduced false positives #49
Summary
Test plan
🤖 Generated with Claude Code
Compliance scan found 20 issue(s) in this PR:
@@ -94,3 +93,1 @@
-Answer the user's question based on the code context below. \
-Reference specific files and functions when relevant. \
-If the context doesn't contain enough information, say so.\n\n\
+"You are a code assistant for this repository. Answer questions using the code context below.\n\n\

[low] Inconsistent prompt formatting style
The system prompts in chat.rs, descriptions.rs, and fixes.rs use different formatting styles. chat.rs uses a mix of newlines and backslash continuation, while descriptions.rs and fixes.rs use more structured markdown-like formatting with explicit rules sections. This inconsistency may make the prompts harder to maintain and understand.
Suggested fix: Standardize the prompt formatting style across all modules to use consistent markdown-style formatting with clear rule sections
Scanner: code-review/convention
[medium] Incorrect system prompt for chat handler
The system prompt in chat.rs has been updated but contains a formatting issue. The new prompt uses '\n\n' which creates an extra blank line after 'Answer questions using the code context below.' This may cause formatting inconsistencies in the LLM input. Additionally, the structure of the rules section appears to have been changed from a bulleted list to a paragraph-style list without proper markdown formatting.
Suggested fix: Ensure consistent formatting by using proper markdown bullet points for the rules section, and verify that the newline spacing is intentional. Consider a more structured prompt layout.
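A minimal sketch of what that structured approach could look like — a single raw-string constant with markdown bullets and no backslash continuations. The wording is illustrative, not the repository's actual prompt:

```rust
// Hypothetical restructuring of the chat system prompt: one raw string,
// explicit Rules section, no backslash line continuations.
const CHAT_SYSTEM_PROMPT: &str = r#"You are a code assistant for this repository. Answer questions using the code context below.

Rules:
- Reference specific files and functions when relevant.
- If the context doesn't contain enough information, say so.
"#;

fn main() {
    // Sanity check: the raw string carries no stray continuation characters.
    assert!(!CHAT_SYSTEM_PROMPT.contains('\\'));
    println!("{CHAT_SYSTEM_PROMPT}");
}
```

Raw strings keep the on-screen layout identical to what the LLM receives, which removes the ambiguity the finding describes.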
Scanner: code-review/logic
[medium] Complex System Prompt in Chat Handler
The system prompt in the chat handler has been significantly expanded with multiple rules and formatting requirements. While this improves guidance, it increases the complexity of the prompt and makes it harder to maintain and understand.
Suggested fix: Consider breaking down this large system prompt into smaller, more manageable components or using a configuration file to define the prompt structure. This would improve readability and maintainability.
Scanner: code-review/complexity
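One way to sketch the suggested decomposition is to assemble the prompt from labelled pieces at compile time with `concat!` (which accepts only literals). Section contents are illustrative, not the repository's actual prompt:

```rust
// Hypothetical decomposition of a large system prompt into labelled
// compile-time pieces. `concat!` produces a single &'static str.
const CHAT_SYSTEM_PROMPT: &str = concat!(
    // Role section
    "You are a code assistant for this repository. ",
    "Answer questions using the code context below.\n\n",
    // Rules section
    "Rules:\n",
    "- Reference specific files and functions when relevant.\n",
    "- If the context doesn't contain enough information, say so.\n",
);

fn main() {
    assert!(CHAT_SYSTEM_PROMPT.starts_with("You are a code assistant"));
    println!("{}", CHAT_SYSTEM_PROMPT.len());
}
```

Each section can then be reviewed and edited in isolation without re-reading the whole prompt.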
@@ -6,3 +6,3 @@
 use crate::llm::LlmClient;
-const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing issue descriptions for a bug tracker. Generate a clear, actionable issue body in Markdown format that includes:
+const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing a bug tracker issue for a developer to fix. Be direct and actionable — developers skim issue descriptions, so lead with what matters.

[medium] Inconsistent prompt formatting in description generation
The DESCRIPTION_SYSTEM_PROMPT in descriptions.rs has been significantly restructured. The original format had numbered sections (1. Summary, 2. Evidence, etc.) but the new version uses a different structure with headers like 'What', 'Why it matters', 'Fix'. While this might be intentional, the transition from numbered sections to descriptive headers could affect how the LLM interprets the expected output format. Also, the rule about not restating the finding title in the body is now placed after the main format specification, which might be confusing for the LLM.
Suggested fix: Consider placing the rules section before the main format specification to ensure clarity for the LLM. Also, verify that the change from numbered sections to descriptive headers won't negatively impact the LLM's ability to parse and follow the required format.
Scanner: code-review/logic
[medium] Enhanced Description System Prompt
The description system prompt has been substantially rewritten with new formatting requirements and rules. The increased complexity makes it harder to reason about and maintain, especially with the detailed formatting constraints.
Suggested fix: Consider extracting the Markdown format specification and rules into separate constants or documentation to reduce the complexity of this single prompt string.
Scanner: code-review/complexity
@@ -17,0 +12,4 @@
+1. **What**: 1 sentence — what's wrong and where (file:line)
+2. **Why it matters**: 1-2 sentences — concrete impact if not fixed. Avoid generic "could lead to" phrasing; describe the specific attack or failure scenario.
+3. **Fix**: The specific code change needed. Use a code block with the corrected code if possible. If the fix is configuration-based, show the exact config change.
+4. **References**: CWE/CVE link if applicable (one line, not a section)

[low] Missing type annotations in function signatures
The function signatures in descriptions.rs and fixes.rs lack explicit return type annotations, which could make the code harder to understand and maintain. While Rust's type inference usually handles this, explicit annotations improve readability and prevent potential confusion.
Suggested fix: Add explicit return type annotations to function signatures for better clarity
Scanner: code-review/convention
@@ -6,3 +6,3 @@
 use crate::llm::LlmClient;
-const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer. Given a security finding with code context, suggest a concrete code fix. Return ONLY the fixed code snippet that can directly replace the vulnerable code. Include brief inline comments explaining the fix."#;
+const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.

[medium] Detailed Fix System Prompt
The fix system prompt has been expanded with multiple rules and constraints for generating code fixes. The complexity of these rules makes the prompt harder to understand and maintain, particularly with the detailed requirements about imports, comments, and code style preservation.
Suggested fix: Break down the complex rules into separate validation steps or extract them into a configuration structure that can be validated independently from the core prompt logic.
Scanner: code-review/complexity
@@ -8,1 +8,3 @@
-const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer. Given a security finding with code context, suggest a concrete code fix. Return ONLY the fixed code snippet that can directly replace the vulnerable code. Include brief inline comments explaining the fix."#;
+const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.
+
+Rules:

[high] Potential confusion in fix suggestion rules
The FIX_SYSTEM_PROMPT in fixes.rs introduces several complex rules that could confuse the LLM. Specifically, the rule "If the vulnerability is a false positive and the code is actually safe, return the original code unchanged with a comment `// No fix needed: `" seems to contradict the instruction to return ONLY the fixed code snippet. This could lead to inconsistent outputs where the LLM returns explanatory text alongside the code instead of just the original code with the single comment.
Suggested fix: Clarify the behavior for false positives. Either remove the rule about returning original code with comment, or modify the instruction to explicitly state that in case of false positives, only the original code should be returned with the comment appended as a single line comment.
Scanner: code-review/logic
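A hedged sketch of how the false-positive rule could be reworded so it cannot conflict with the "code only" instruction. The `<reason>` placeholder and the exact wording are suggestions, not the PR's actual text:

```rust
// Suggested (not actual) wording for the false-positive rule, phrased so
// the output contract stays "code only" in every branch.
const FALSE_POSITIVE_RULE: &str = "\
- If the finding is a false positive, return the original code verbatim, \
preceded by exactly one comment line: // No fix needed: <reason>. \
Do not add any other text.";

fn main() {
    println!("{FALSE_POSITIVE_RULE}");
}
```

Phrasing the exception as "code plus exactly one comment line" keeps both branches of the rule compatible with the top-level instruction.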
@@ -9,0 +8,4 @@
+const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.
+
+Rules:
+- The fix must be a drop-in replacement for the vulnerable code

[low] Missing type annotations in function signatures
The function signatures in descriptions.rs and fixes.rs lack explicit return type annotations, which could make the code harder to understand and maintain. While Rust's type inference usually handles this, explicit annotations improve readability and prevent potential confusion.
Suggested fix: Add explicit return type annotations to function signatures for better clarity
Scanner: code-review/convention
@@ -1,69 +1,113 @@
 // System prompts for multi-pass LLM code review.

[medium] Missing CWE field in LOGIC_REVIEW_PROMPT response format
The LOGIC_REVIEW_PROMPT response format specifies JSON objects with 'cwe' field, but the actual example shows objects without 'cwe'. This inconsistency may cause parsing issues when processing responses.
Suggested fix: Either remove 'cwe' from the response format specification or add it to the example JSON object to maintain consistency.
Scanner: code-review/convention
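A sketch of an example object that keeps the specification and the sample consistent by including the `cwe` field. Field names other than `cwe` and `severity` are guesses for illustration, not the prompt's actual schema:

```rust
// Hypothetical example JSON for the logic-review response format,
// including the `cwe` field so spec and example agree.
const LOGIC_EXAMPLE: &str = r#"[
  {
    "severity": "high",
    "file": "src/orders.rs",
    "line": 42,
    "title": "Off-by-one in pagination bounds",
    "cwe": "CWE-193"
  }
]"#;

fn main() {
    // The example must carry every field the specification names.
    assert!(LOGIC_EXAMPLE.contains("\"cwe\""));
    println!("{LOGIC_EXAMPLE}");
}
```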
[medium] Overly complex boolean expressions in prompts
The updated prompts contain very long and complex boolean expressions that make the instructions hard to parse and maintain. These include lengthy 'Do NOT report:' sections with multiple conditions that are difficult to reason about.
Suggested fix: Break down complex boolean conditions into simpler logical components or use helper functions to clarify intent. Consider extracting the detailed exclusion rules into separate constants or documentation.
Scanner: code-review/complexity
[medium] Insecure Prompt Engineering for LLM Review
The review prompts have been updated to be more restrictive in what they consider as issues. However, these prompts still rely on LLM interpretation which may introduce vulnerabilities through prompt injection or adversarial inputs. The prompts do not explicitly address how to handle malicious inputs or ensure robustness against prompt manipulation.
Suggested fix: Implement additional safeguards such as input sanitization, explicit validation of LLM outputs, and regular auditing of prompt effectiveness against adversarial examples.
Scanner: code-review/security | CWE: CWE-94
[high] Inconsistent severity levels between prompts
The LOGIC_REVIEW_PROMPT uses severity levels 'high', 'medium', 'low' while SECURITY_REVIEW_PROMPT uses 'critical', 'high', 'medium'. This inconsistency could cause issues in downstream processing that expects uniform severity levels across all review types.
Suggested fix: Standardize severity levels across all prompts to use consistent labels ('critical', 'high', 'medium', 'low') or ensure downstream systems handle both formats appropriately.
Scanner: code-review/logic
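The suggested standardization can also be enforced at the parsing boundary, so downstream code sees one scale regardless of which prompt produced the label. A sketch, with illustrative names:

```rust
// Normalize both severity vocabularies onto a single ordered enum.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

fn parse_severity(label: &str) -> Option<Severity> {
    match label.to_ascii_lowercase().as_str() {
        "low" => Some(Severity::Low),
        "medium" => Some(Severity::Medium),
        "high" => Some(Severity::High),
        "critical" => Some(Severity::Critical),
        _ => None, // unknown labels surface as None instead of being misfiled
    }
}

fn main() {
    assert_eq!(parse_severity("Critical"), Some(Severity::Critical));
    assert!(parse_severity("high") < parse_severity("critical"));
    assert_eq!(parse_severity("info"), None);
}
```

Returning `Option` (rather than defaulting unknown labels to some bucket) makes a prompt/parser mismatch loud instead of silent.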
@@ -9,3 +9,3 @@
 const TRIAGE_CHUNK_SIZE: usize = 30;
-const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a security finding triage expert. Analyze each of the following security findings with its code context and determine the appropriate action.
+const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a pragmatic security triage expert. Your job is to filter out noise and keep only findings that a developer should actually fix. Be aggressive about dismissing false positives — a clean, high-signal list is more valuable than a comprehensive one.

[high] Insecure Prompt Engineering for LLM Triage
The triage system prompt contains detailed instructions that could potentially be manipulated by adversarial inputs to influence the LLM's behavior. Specifically, the prompt includes explicit actions and confidence scoring rules that might be exploited through prompt injection techniques to manipulate the triage decisions.
Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence prompt construction. Consider using a more robust prompt engineering framework that separates instruction logic from user data to prevent prompt injection attacks.
Scanner: code-review/security | CWE: CWE-94
@@ -10,3 +10,3 @@
-const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a security finding triage expert. Analyze each of the following security findings with its code context and determine the appropriate action.
+const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a pragmatic security triage expert. Your job is to filter out noise and keep only findings that a developer should actually fix. Be aggressive about dismissing false positives — a clean, high-signal list is more valuable than a comprehensive one.

[medium] Complex boolean expression in triage logic
The triage system prompt contains complex conditional logic for determining when to dismiss findings. The prompt has multiple nested conditions that make it difficult to reason about all dismissal criteria at once.
Suggested fix: Break down the dismissal criteria into separate bullet points with clear headings, and consider extracting the logic into a structured decision tree or configuration-based approach for better maintainability.
Scanner: code-review/complexity
@@ -16,2 +14,2 @@
-- "upgrade": The finding is under-reported. Higher severity recommended.
-- "dismiss": The finding is a false positive. Should be removed.
+- "confirm": True positive with real impact. Keep severity as-is.
+- "downgrade": Real issue but over-reported severity. Lower it.

[medium] Inconsistent JSON response format in triage prompt
The updated TRIAGE_SYSTEM_PROMPT removes markdown fences from the JSON response example, but the original prompt included them. This inconsistency could confuse developers implementing the LLM interface or lead to parsing errors if the system expects fenced JSON.
Suggested fix: Ensure consistent formatting in examples across all prompts. Either remove all markdown fences or keep them consistently.
Scanner: code-review/convention
@@ -18,1 +15,4 @@
-- "downgrade": Real issue but over-reported severity. Lower it.
+- "upgrade": Under-reported — higher severity warranted.
+- "dismiss": False positive, not exploitable, or not actionable. Remove it.

[high] Incorrect JSON response format in triage system prompt
The updated TRIAGE_SYSTEM_PROMPT removes markdown fences from the JSON response example, but the actual implementation may still expect or produce fenced JSON. This could cause parsing failures when the LLM returns properly formatted JSON without the markdown fences.
Suggested fix: Ensure the LLM client parsing logic correctly handles both fenced and unfenced JSON responses, or update the prompt to explicitly mention that markdown fences should be omitted from the actual response.
Scanner: code-review/logic
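A sketch of parsing logic that tolerates both fenced and unfenced responses, as the suggested fix describes. This is not the repository's actual client code:

````rust
// Lenient extraction: accept the model's JSON payload whether or not it
// arrives wrapped in ```json fences.
fn strip_json_fences(response: &str) -> &str {
    let trimmed = response.trim();
    let without_open = trimmed
        .strip_prefix("```json")
        .or_else(|| trimmed.strip_prefix("```"))
        .unwrap_or(trimmed);
    without_open
        .strip_suffix("```")
        .unwrap_or(without_open)
        .trim()
}

fn main() {
    let fenced = "```json\n[{\"action\":\"dismiss\"}]\n```";
    let bare = "[{\"action\":\"dismiss\"}]";
    assert_eq!(strip_json_fences(fenced), bare);
    assert_eq!(strip_json_fences(bare), bare);
}
````

Stripping fences before JSON deserialization makes the parser robust to either prompt behaviour, so the prompt wording change cannot break parsing.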
@@ -314,6 +314,21 @@ impl PentestOrchestrator {
 - For SPA apps: a 200 HTTP status does NOT mean the page is accessible — check the actual
   page content with the browser tool to verify if it shows real data or a login redirect.

+## Finding Quality Rules

[medium] Missing type annotations in pentest prompt builder
The pentest prompt builder contains new rules for finding quality but lacks explicit type annotations for the new constants or functions that might handle these rules. This could lead to runtime issues if types are inferred incorrectly.
Suggested fix: Add explicit type annotations to any new variables or functions introduced in the pentest prompt builder to ensure correct type inference and prevent runtime errors.
Scanner: code-review/convention
[medium] Potential Information Disclosure in Pentest Prompt
The pentest prompt includes detailed rules about finding quality and severity classification that could inadvertently expose internal testing methodologies or security assessment criteria. While not directly exposing secrets, this information could be valuable to attackers trying to understand or circumvent security testing processes.
Suggested fix: Review the prompt content to ensure no sensitive information about internal processes, methodologies, or security assessment criteria is exposed. Consider removing or generalizing the specific examples and rules that might reveal internal practices.
Scanner: code-review/security | CWE: CWE-200
@@ -317,0 +324,4 @@
+- **Severity must match real impact:**
+- critical/high: Exploitable vulnerability (you can demonstrate the exploit)
+- medium: Real misconfiguration with security implications but not directly exploitable
+- low: Best-practice recommendation, defense-in-depth, or informational

[medium] Inconsistent severity guidelines in pentest prompt
The pentest prompt introduces conflicting guidance about missing headers. It states 'Missing headers are medium at most' but then contradicts itself by saying 'missing CSP + confirmed XSS = high for CSP finding'. This inconsistency could lead to inconsistent triage decisions.
Suggested fix: Clarify the severity rules for missing headers to ensure consistency. Either remove the 'medium at most' rule or make it conditional on additional factors like exploitability.
Scanner: code-review/logic
Compliance scan found 23 issue(s) in this PR:
@@ -94,3 +93,1 @@
-Answer the user's question based on the code context below. \
-Reference specific files and functions when relevant. \
-If the context doesn't contain enough information, say so.\n\n\
+"You are a code assistant for this repository. Answer questions using the code context below.\n\n\

[high] Potential Prompt Injection in System Prompts
The system prompts in chat.rs, descriptions.rs, and fixes.rs are constructed using string formatting with user-provided or context-dependent data. While these appear to be static strings in the current diff, if any part of the prompt construction involves user input or external context without proper sanitization, it could allow prompt injection attacks that manipulate the LLM's behavior.
Suggested fix: Ensure all dynamic content used in system prompts is properly sanitized and validated before inclusion. Consider using a templating engine with strict escaping or validating that no malicious prompt injection patterns are present.
Scanner: code-review/security | CWE: CWE-116
[high] Incorrect system prompt for chat handler
The system prompt in chat.rs has been updated but the new prompt structure may not be compatible with existing code that expects a specific format. The new prompt contains more complex formatting with rules and sections that might break parsing or expectation of the LLM response format.
Suggested fix: Verify that all downstream components properly handle the new multi-line system prompt format and that the LLM responses are parsed correctly according to the new structure.
Scanner: code-review/logic
[medium] Inconsistent prompt formatting in system prompts
The system prompts in chat.rs, descriptions.rs, and fixes.rs use inconsistent formatting styles. chat.rs uses a mix of newlines and backslash continuation, while descriptions.rs and fixes.rs use more structured markdown formatting. This inconsistency makes the prompts harder to maintain and read.
Suggested fix: Standardize the prompt formatting across all files to use consistent multi-line string formatting (either all backslashes or all newlines) for better readability and maintainability.
Scanner: code-review/convention
@@ -6,3 +6,3 @@
 use crate::llm::LlmClient;
-const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing issue descriptions for a bug tracker. Generate a clear, actionable issue body in Markdown format that includes:
+const DESCRIPTION_SYSTEM_PROMPT: &str = r#"You are a security engineer writing a bug tracker issue for a developer to fix. Be direct and actionable — developers skim issue descriptions, so lead with what matters.

[high] Inconsistent prompt formatting in description generation
The DESCRIPTION_SYSTEM_PROMPT in descriptions.rs has been significantly restructured with new formatting requirements and rules. This change may cause compatibility issues with existing LLM models or response parsing logic that expected the previous format.
Suggested fix: Ensure that the LLM client properly handles the new Markdown format specification and that response parsing logic correctly extracts the required fields from the structured output.
Scanner: code-review/logic
@@ -17,0 +12,4 @@
+1. **What**: 1 sentence — what's wrong and where (file:line)
+2. **Why it matters**: 1-2 sentences — concrete impact if not fixed. Avoid generic "could lead to" phrasing; describe the specific attack or failure scenario.
+3. **Fix**: The specific code change needed. Use a code block with the corrected code if possible. If the fix is configuration-based, show the exact config change.
+4. **References**: CWE/CVE link if applicable (one line, not a section)

[low] Missing type annotations in function signatures
The function signatures in descriptions.rs and fixes.rs lack explicit type annotations for parameters and return types, which could make the code harder to understand and maintain. While Rust's type inference works, explicit annotations improve clarity.
Suggested fix: Add explicit type annotations to function parameters and return types for better clarity and maintainability.
Scanner: code-review/convention
@@ -5,7 +5,24 @@ use compliance_core::models::Finding;
 use crate::error::AgentError;
 use crate::llm::LlmClient;

[medium] Complex System Prompt in Fixes Module
The FIX_SYSTEM_PROMPT constant in fixes.rs contains a very long and complex prompt with extensive rules and language-specific guidance. This makes the prompt hard to maintain and understand, increasing the risk of errors when making updates.
Suggested fix: Break down the complex prompt into smaller, more manageable components or extract language-specific guidance into separate constants/functions to improve readability and maintainability.
Scanner: code-review/complexity
@@ -6,3 +6,3 @@
 use crate::llm::LlmClient;
-const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer. Given a security finding with code context, suggest a concrete code fix. Return ONLY the fixed code snippet that can directly replace the vulnerable code. Include brief inline comments explaining the fix."#;
+const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.

[medium] Complex system prompt in fix generation may cause parsing issues
The FIX_SYSTEM_PROMPT in fixes.rs has been expanded with extensive language-specific guidance and strict rules. This complexity increases the risk of LLM responses not matching the expected format, particularly around import statements and code block formatting.
Suggested fix: Consider adding validation logic to verify that generated fixes match the expected format, especially for import statements and code block boundaries.
Scanner: code-review/logic
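A sketch of the validation step the suggested fix asks for: reject LLM fix output that violates the prompt's "code only" contract before it reaches the user. The checks are illustrative, not exhaustive:

````rust
// Heuristic gate for generated fixes: the prompt forbids markdown fences,
// explanations, and before/after comparisons, so reject output that shows
// obvious signs of any of them.
fn looks_like_bare_code(fix: &str) -> bool {
    let trimmed = fix.trim();
    if trimmed.is_empty() {
        return false; // nothing to apply
    }
    if trimmed.contains("```") {
        return false; // markdown fences are explicitly forbidden
    }
    // Crude check for prose-style or before/after responses.
    let lowered = trimmed.to_ascii_lowercase();
    !(lowered.starts_with("before:") || lowered.starts_with("here is"))
}

fn main() {
    assert!(looks_like_bare_code("let n = rng_source();"));
    assert!(!looks_like_bare_code("```rust\nlet n = 1;\n```"));
    assert!(!looks_like_bare_code("Here is the fixed code: let n = 1;"));
}
````

A failed check could trigger a single retry with a reminder of the format contract rather than surfacing malformed output.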
@@ -8,1 +8,3 @@
-const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer. Given a security finding with code context, suggest a concrete code fix. Return ONLY the fixed code snippet that can directly replace the vulnerable code. Include brief inline comments explaining the fix."#;
+const FIX_SYSTEM_PROMPT: &str = r#"You are a security engineer suggesting a code fix. Return ONLY the corrected code that replaces the vulnerable snippet — no explanations, no markdown fences, no before/after comparison.
+
+Rules:

[medium] Insecure Cryptography Guidance in Fix Prompts
The FIX_SYSTEM_PROMPT in fixes.rs contains guidance about using secure random number generators ('crypto/rand' in Go, 'SecureRandom' in Java/Kotlin). However, the guidance itself may inadvertently expose implementation details or create confusion if not carefully reviewed. More critically, if these prompts are used in contexts where the LLM might generate actual code, there's a risk of generating insecure cryptographic practices if the guidance isn't comprehensive or up-to-date.
Suggested fix: Review the cryptographic guidance provided in the system prompt to ensure it's accurate and complete. Consider adding explicit warnings against common cryptographic pitfalls like using weak algorithms or improper key management.
Scanner: code-review/security | CWE: CWE-327
[low] Missing type annotations in function signatures
The function signature in fixes.rs lacks explicit type annotations for parameters and return types, which could make the code harder to understand and maintain. While Rust's type inference works, explicit annotations improve clarity.
Suggested fix: Add explicit type annotations to function parameters and return types for better clarity and maintainability.
Scanner: code-review/convention
@@ -1,69 +1,138 @@
 // System prompts for multi-pass LLM code review.

[high] Potential Command Injection Vulnerability
The SECURITY_REVIEW_PROMPT includes a check for command injection vulnerabilities, but the prompt itself doesn't contain any actual code that could be vulnerable. However, if this file were to be used in a context where user input is passed directly to shell commands, it could introduce a command injection vulnerability.
Suggested fix: Ensure that any user input passed to shell commands is properly sanitized or escaped before being executed.
Scanner: code-review/security | CWE: CWE-78
[medium] Hardcoded Credentials in Prompt
The SECURITY_REVIEW_PROMPT mentions 'hardcoded credentials' as a potential vulnerability, but there are no hardcoded credentials present in the prompt itself. This is a good practice to highlight, but the prompt should also include guidance on how to detect such issues in code.
Suggested fix: Add a note in the prompt to explicitly look for hardcoded credentials in the code being reviewed, especially in configuration files or environment variables.
Scanner: code-review/security | CWE: CWE-798
@@ -13,2 +13,3 @@
-Ignore: style, naming, formatting, documentation, minor improvements.
+Do NOT report:
+- Style, naming, formatting, documentation, or code organization preferences

[medium] Missing CWE field in logic review prompt response format
The LOGIC_REVIEW_PROMPT specifies a JSON response format that includes 'cwe' field, but the example response format doesn't include this field. This inconsistency may cause parsing issues.
Suggested fix: Either remove 'cwe' from the specification or update the example response format to include the 'cwe' field consistently
Scanner: code-review/convention
@@ -15,0 +14,4 @@
+Do NOT report:
+- Style, naming, formatting, documentation, or code organization preferences
+- Theoretical issues without a concrete triggering scenario
+- "Potential" problems that require assumptions not supported by the visible code

[medium] Overly complex boolean expressions in security review prompt
The SECURITY_REVIEW_PROMPT contains extremely long and complex boolean expressions that make it difficult to reason about the conditions for what constitutes a vulnerability. This increases the risk of misinterpreting the rules and potentially flagging non-issues or missing actual vulnerabilities.
Suggested fix: Break down the complex boolean expressions into simpler conditional statements or use helper functions to encapsulate the logic. Consider using a structured format like bullet points or tables to represent the complex conditions.
Scanner: code-review/complexity
@@ -17,0 +24,4 @@
+- Java/Kotlin: checked exception patterns, method overloading, `Optional` vs null checks, Kotlin `?.` safe calls, `!!` non-null assertions in tests, `when` exhaustive matching, companion objects, `lateinit`
+- Ruby: monkey patching in libraries, method_missing, blocks/procs/lambdas, `rescue => e` patterns, `send`/`respond_to?` metaprogramming, `nil` checks via `&.` safe navigation
+- PHP: loose comparisons with `==` (only flag if `===` was clearly intended), `@` error suppression in legacy code, `isset()`/`empty()` patterns, magic methods (`__get`, `__call`), array functions as callbacks
+- C/C++: RAII patterns, move semantics, `const_cast`/`static_cast` in appropriate contexts, macro usage for platform compat, pointer arithmetic in low-level code, `goto` for cleanup in C

[medium] Incorrect severity level in SECURITY_REVIEW_PROMPT
The SECURITY_REVIEW_PROMPT reserves 'critical' severity for vulnerabilities that could lead to remote code execution, auth bypass, or data breach with no preconditions, but 'no preconditions' conflicts with the typical definition of 'critical' severity, which permits minimal preconditions. This inconsistency may mislead reviewers about what constitutes a critical vulnerability.
Suggested fix: Clarify the criteria for 'critical' severity in the description to match the actual requirements. Either change the description to say 'minimal preconditions' or adjust the severity mapping to be consistent.
Scanner: code-review/logic
@@ -35,0 +72,4 @@
+- high: Exploitable vulnerability requiring minimal preconditions
+- medium: Vulnerability requiring specific conditions or limited impact
+
+Prefer returning [] over reporting speculative vulnerabilities. Every false positive erodes trust in the scanner.

[medium] Inconsistent error handling patterns in convention review prompt
The CONVENTION_REVIEW_PROMPT mentions 'Inconsistent error handling within the same module where the inconsistency could hide failures' but doesn't specify what constitutes 'inconsistent' error handling patterns. This creates ambiguity in implementation.
Suggested fix: Clarify what specific patterns constitute inconsistent error handling (e.g., mixing Result and panic, inconsistent use of ? operator, etc.)
Scanner: code-review/convention
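An invented Rust illustration of the kind of intra-module inconsistency the prompt could name explicitly: one function propagates the parse failure to its caller, while its sibling panics on the same input, hiding the failure mode:

```rust
use std::num::ParseIntError;

// Consistent style: the failure propagates via `?` and the caller decides.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port = s.parse::<u16>()?;
    Ok(port)
}

// Inconsistent sibling: same failure mode, but hidden behind a panic.
fn parse_port_or_panic(s: &str) -> u16 {
    s.parse::<u16>().unwrap()
}

fn main() {
    assert!(parse_port("8080").is_ok());
    assert!(parse_port("not-a-port").is_err());
    assert_eq!(parse_port_or_panic("8080"), 8080);
}
```

Naming concrete patterns like this (mixing `?` with `unwrap()`, or `Result` with panics, in one module) in the prompt would remove the ambiguity the finding describes.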
[medium] Deeply nested control flow in convention review prompt
The CONVENTION_REVIEW_PROMPT contains deeply nested conditional logic that makes it hard to follow the decision-making process. The nested 'Do NOT report' sections create a complex structure that could lead to confusion when implementing or maintaining the prompt.
Suggested fix: Refactor the nested conditions into separate logical blocks or use early returns to flatten the control flow. Consider breaking the prompt into smaller, focused sections with clear separation of concerns.
Scanner: code-review/complexity
@@ -9,3 +9,3 @@
 const TRIAGE_CHUNK_SIZE: usize = 30;
-const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a security finding triage expert. Analyze each of the following security findings with its code context and determine the appropriate action.
+const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a pragmatic security triage expert. Your job is to filter out noise and keep only findings that a developer should actually fix. Be aggressive about dismissing false positives — a clean, high-signal list is more valuable than a comprehensive one.

[high] Insecure LLM System Prompt Configuration
The LLM system prompt contains extensive logic for determining when to dismiss findings, including specific language patterns and false positive rules. While this may reduce noise, it introduces potential for manipulation through prompt injection attacks if the LLM is not properly sandboxed or if the prompt is constructed in a way that allows adversarial inputs to influence the decision-making process.
Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence the LLM prompt. Consider using a more robust prompt engineering framework that separates business logic from the core prompt structure.
Scanner: code-review/security | CWE: CWE-94
[medium] Overly Complex System Prompt in Triage Module
The TRIAGE_SYSTEM_PROMPT constant has grown to 46 lines with extensive documentation, examples, and rules. This makes it difficult to maintain and understand the core triage logic. The prompt contains multiple sections with different types of information (actions, dismissal criteria, confirmation rules, confidence scoring, language-specific false positives) that could be better organized.
Suggested fix: Break down the system prompt into smaller, focused prompts or extract the detailed rules into separate constants or documentation files. Consider creating a structured format for the rules that can be programmatically validated.
Scanner: code-review/convention

@@ -24,0 +22,4 @@
+- The "vulnerability" requires preconditions that don't exist in the code
+- The finding is about code style, complexity, or theoretical concerns rather than actual bugs
+- A hash function is used for non-security purposes (dedup, caching, content addressing)
+- Internal logging of non-sensitive operational data is flagged as "information disclosure"

[medium] Incorrect JSON response format in triage system prompt
The updated TRIAGE_SYSTEM_PROMPT removes markdown fences (```json) from the expected JSON response format, but the prompt text still mentions 'no markdown fences' which could confuse LLM interpretation. More importantly, the instruction says to respond with a JSON array but doesn't specify that the response should be valid JSON without any additional text or formatting.
Suggested fix: Clarify that the response should be pure JSON without any markdown fences or extra text, and consider adding explicit examples of valid responses to avoid ambiguity.
Scanner: code-review/logic
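The suggested clarification can also be enforced defensively on the consuming side. A minimal sketch — the `strip_fences` helper is hypothetical, not from the PR — that normalizes the LLM reply before handing it to a JSON parser such as `serde_json`:

```rust
// Hypothetical helper (not from the PR): normalize an LLM reply by stripping
// accidental ```json fences before handing it to a JSON parser.
fn strip_fences(raw: &str) -> &str {
    let s = raw.trim();
    let s = s.strip_prefix("```json").or_else(|| s.strip_prefix("```")).unwrap_or(s);
    let s = s.strip_suffix("```").unwrap_or(s);
    s.trim()
}

fn main() {
    assert_eq!(strip_fences("```json\n[{\"action\":\"dismiss\"}]\n```"), "[{\"action\":\"dismiss\"}]");
    assert_eq!(strip_fences("[]"), "[]");
}
```

With a guard like this, the prompt wording matters less: fenced and bare responses parse the same way.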
@@ -27,0 +32,4 @@
+- Go: `if err != nil` is not "swallowed error", `crypto/rand` is secure, returning errors is not "information disclosure"
+- Java/Kotlin: Spring Security annotations are valid auth, JPA parameterized queries are safe, Kotlin `!!` in tests is fine
+- Ruby: Rails `params.permit` is validation, ActiveRecord finders are parameterized, `html_safe` on generated content
+- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars` is XSS mitigation

[medium] Inconsistent error handling pattern in triage_findings
The triage_findings function uses the `?` operator for error propagation but doesn't handle the case where parsing the LLM response might fail. The function signature suggests it returns `Result<(), Box<dyn Error>>`, but there's no explicit error handling for parsing failures from the LLM response.
Suggested fix: Add explicit error handling for LLM response parsing failures and ensure consistent error propagation throughout the function.
Scanner: code-review/convention
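The suggested pattern can be sketched as follows. This is illustrative only: std's `parse` stands in for `serde_json::from_str` on the triage reply, and `String` stands in for the crate's error type.

```rust
// Sketch of the suggested pattern: convert a parse failure into a descriptive
// domain error instead of letting `?` bubble up an opaque one.
fn parse_confidence(raw: &str) -> Result<f64, String> {
    raw.trim()
        .parse::<f64>()
        .map_err(|e| format!("LLM triage response was not a number ({e}): {raw:?}"))
}

fn main() {
    assert_eq!(parse_confidence(" 0.8 "), Ok(0.8));
    assert!(parse_confidence("high").is_err());
}
```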
@@ -27,0 +37,4 @@
+Confirm only when:
+- You can describe a concrete scenario where the bug manifests or the vulnerability is exploitable
+- The fix is actionable (developer can change specific code to resolve it)

[medium] Potential Command Injection Vulnerability via LLM Input
The triage function processes security findings which could potentially contain untrusted data. If the LLM receives input containing malicious payloads designed to manipulate its behavior or extract information, there's a risk of command injection or other vulnerabilities depending on how the LLM results are processed downstream.
Suggested fix: Ensure that all findings passed to the LLM are sanitized before processing. Implement strict validation of the JSON output from the LLM to prevent execution of unexpected commands or data exfiltration attempts.
Scanner: code-review/security | CWE: CWE-78
@@ -314,6 +314,21 @@ impl PentestOrchestrator {
- For SPA apps: a 200 HTTP status does NOT mean the page is accessible — check the actual

[medium] Missing type annotations in prompt builder
The PentestOrchestrator implementation lacks explicit return type annotations for methods, which could lead to type inference issues and make the API less predictable. The method signatures should include explicit return types to maintain consistency with the project's established patterns.
Suggested fix: Add explicit return type annotations to all methods in the PentestOrchestrator implementation to ensure consistent API behavior.
Scanner: code-review/convention

@@ -317,0 +317,4 @@
+## Finding Quality Rules
+- **Do not report the same issue twice.** If multiple tools detect the same missing header or
+vulnerability on the same endpoint, report it ONCE with the most specific tool's output.
+For example, if the recon tool and the header scanner both find missing HSTS, report it only

[high] Potential off-by-one error in finding grouping logic
In pentest/prompt_builder.rs, the rule states 'Do not report the same issue twice' and suggests reporting only once with the most specific tool's output. However, there's no implementation detail provided here to ensure this deduplication happens correctly in the code. If the deduplication logic isn't properly implemented elsewhere, this could lead to duplicate findings being reported, violating the rule.
Suggested fix: Ensure that the deduplication logic in the actual implementation correctly identifies and groups related findings based on their fingerprints or other identifying characteristics before reporting them.
Scanner: code-review/logic
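A deduplication pass of the kind the finding asks for can be sketched in a few lines. The types are illustrative (not the orchestrator's actual ones): findings are reduced to a (check, endpoint) fingerprint so a missing-HSTS hit reported by two tools survives only once.

```rust
use std::collections::HashSet;

// Illustrative sketch: deduplicate findings by a (check, endpoint) fingerprint,
// keeping the first occurrence of each pair.
fn dedup(findings: Vec<(String, String)>) -> Vec<(String, String)> {
    let mut seen = HashSet::new();
    findings.into_iter().filter(|f| seen.insert(f.clone())).collect()
}

fn main() {
    let raw = vec![
        ("missing-hsts".to_string(), "https://example.com/".to_string()),
        ("missing-hsts".to_string(), "https://example.com/".to_string()),
        ("missing-csp".to_string(), "https://example.com/".to_string()),
    ];
    assert_eq!(dedup(raw).len(), 2);
}
```

Keeping "the most specific tool's output" would additionally require ranking tools before the filter, which this sketch omits.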
Compliance scan found 32 issue(s) in this PR:
@@ -94,3 +93,1 @@
-Answer the user's question based on the code context below. \
-Reference specific files and functions when relevant. \
-If the context doesn't contain enough information, say so.\n\n\
+"You are a code assistant for this repository. Answer questions using the code context below.\n\n\

[medium] Inconsistent Error Handling Pattern
The `chat` function in `compliance-agent/src/api/handlers/chat.rs` uses the `?` operator for error propagation but doesn't handle the case where `code_context` might be empty or invalid, potentially leading to inconsistent error handling compared to other parts of the application.
Suggested fix: Add explicit error handling for empty or malformed `code_context` to ensure consistent error propagation throughout the application.
Scanner: code-review/convention
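The guard the finding asks for is small. A minimal sketch, with `String` standing in for the handler's real error type (presumably `AgentError`):

```rust
// Minimal sketch of the suggested guard: reject an empty code context up front
// instead of sending an unanswerable prompt to the LLM.
fn validate_context(code_context: &str) -> Result<&str, String> {
    if code_context.trim().is_empty() {
        return Err("code_context is empty; nothing to answer from".to_string());
    }
    Ok(code_context)
}

fn main() {
    assert!(validate_context("   \n").is_err());
    assert!(validate_context("fn main() {}").is_ok());
}
```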
@@ -5,15 +5,20 @@
 use compliance_core::models::Finding;
 use crate::error::AgentError;
 use crate::llm::LlmClient;

[medium] Insecure LLM System Prompt Configuration
The system prompts for LLM-based security analysis have been modified to include detailed security guidance and rules. While this may improve output quality, it introduces potential risks if the prompts themselves contain sensitive information or if they inadvertently expose internal security practices. The prompts now include specific language guidance for various programming languages which could potentially be exploited if the LLM generates code based on these prompts.
Suggested fix: Review the system prompt content to ensure no sensitive information is exposed. Consider implementing prompt validation to prevent leakage of internal security practices or methodologies.
Scanner: code-review/security | CWE: CWE-200
@@ -17,0 +12,4 @@
+1. **What**: 1 sentence — what's wrong and where (file:line)
+2. **Why it matters**: 1-2 sentences — concrete impact if not fixed. Avoid generic "could lead to" phrasing; describe the specific attack or failure scenario.
+3. **Fix**: The specific code change needed. Use a code block with the corrected code if possible. If the fix is configuration-based, show the exact config change.
+4. **References**: CWE/CVE link if applicable (one line, not a section)

[low] Missing Type Annotations in Function Signature
The `generate_issue_description` function in `compliance-agent/src/llm/descriptions.rs` lacks explicit type annotations for its parameters, which could make the function signature less clear and harder to maintain.
Suggested fix: Add explicit type annotations to function parameters for better clarity and maintainability.
Scanner: code-review/convention
[medium] Complex System Prompt in Fixes Module
The FIX_SYSTEM_PROMPT constant in fixes.rs contains a very long and complex prompt with extensive rules and language-specific guidance. This makes the prompt hard to maintain and understand, increasing risk of bugs due to misinterpretation of the rules.
Suggested fix: Break down the FIX_SYSTEM_PROMPT into smaller, more manageable prompts or extract language-specific guidance into separate constants/functions to improve readability and maintainability.
Scanner: code-review/complexity

@@ -5,7 +5,24 @@
 use compliance_core::models::Finding;
 use crate::error::AgentError;
 use crate::llm::LlmClient;

[medium] LLM Fix Generation Guidance Contains Security Implementation Details
The FIX_SYSTEM_PROMPT in fixes.rs contains detailed implementation guidance for fixing vulnerabilities across multiple programming languages. This includes specific recommendations like using parameterized queries, secure random generators, and proper error handling patterns. While helpful for generating fixes, this information could potentially be misused by attackers to understand how to exploit similar vulnerabilities in other systems.
Suggested fix: Consider removing or obfuscating the detailed implementation guidance from the system prompt to prevent potential exploitation of security best practices by malicious actors.
Scanner: code-review/security | CWE: CWE-200
@@ -9,0 +17,4 @@
+Language-specific fix guidance:
+- Rust: use `?` for error propagation, prefer `SecretString` for secrets, use parameterized queries with `sqlx`/`diesel`
+- Python: use parameterized queries (never f-strings in SQL), use `secrets` module not `random`, use `subprocess.run([...])` list form, use `markupsafe.escape()` for HTML

[medium] Inconsistent Error Handling in Fix Generation
The `suggest_fix` function in `compliance-agent/src/llm/fixes.rs` returns a `Result<String, AgentError>` but doesn't explicitly handle potential parsing or validation errors from the LLM response, which could lead to unhandled exceptions.
Suggested fix: Add explicit error handling for potential parsing/validation failures from the LLM response to ensure consistent error management.
Scanner: code-review/convention
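One shape such validation could take — a sketch only, with `String` again standing in for `AgentError` — is to reject empty or fence-only replies before `suggest_fix` returns them to the caller:

```rust
// Sketch: validate the LLM's fix text before returning it. Empty or
// backtick-only replies are treated as failures rather than passed through.
fn validate_fix(response: &str) -> Result<String, String> {
    let trimmed = response.trim().trim_matches('`');
    if trimmed.trim().is_empty() {
        return Err("LLM returned no usable fix".to_string());
    }
    Ok(trimmed.trim().to_string())
}

fn main() {
    assert!(validate_fix("  ").is_err());
    assert!(validate_fix("```\n```").is_err());
    assert_eq!(validate_fix("Use sqlx parameterized queries.").unwrap(), "Use sqlx parameterized queries.");
}
```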
@@ -13,2 +13,3 @@
-Ignore: style, naming, formatting, documentation, minor improvements.
+Do NOT report:
+- Style, naming, formatting, documentation, or code organization preferences

[medium] Missing CWE field in logic review prompt response format
The LOGIC_REVIEW_PROMPT specifies a JSON response format that includes 'cwe' field, but the example response format doesn't include this field. This inconsistency may cause parsing issues.
Suggested fix: Either remove 'cwe' from the specification or update the example response format to include the 'cwe' field consistently
Scanner: code-review/convention

@@ -15,0 +14,4 @@
+Do NOT report:
+- Style, naming, formatting, documentation, or code organization preferences
+- Theoretical issues without a concrete triggering scenario
+- "Potential" problems that require assumptions not supported by the visible code

[medium] Overly complex boolean expressions in security review prompt
The SECURITY_REVIEW_PROMPT contains extremely long and complex boolean expressions that make it difficult to reason about the conditions for what constitutes a vulnerability. This increases the risk of misinterpreting the rules and potentially flagging non-issues or missing actual vulnerabilities.
Suggested fix: Break down the complex boolean expressions into simpler conditional statements or use helper functions to encapsulate the logic. Consider using a structured format like bullet points or tables to represent the complex conditions.
Scanner: code-review/complexity

@@ -17,0 +24,4 @@
+- Java/Kotlin: checked exception patterns, method overloading, `Optional` vs null checks, Kotlin `?.` safe calls, `!!` non-null assertions in tests, `when` exhaustive matching, companion objects, `lateinit`
+- Ruby: monkey patching in libraries, method_missing, blocks/procs/lambdas, `rescue => e` patterns, `send`/`respond_to?` metaprogramming, `nil` checks via `&.` safe navigation
+- PHP: loose comparisons with `==` (only flag if `===` was clearly intended), `@` error suppression in legacy code, `isset()`/`empty()` patterns, magic methods (`__get`, `__call`), array functions as callbacks
+- C/C++: RAII patterns, move semantics, `const_cast`/`static_cast` in appropriate contexts, macro usage for platform compat, pointer arithmetic in low-level code, `goto` for cleanup in C

[medium] Incorrect severity level in SECURITY_REVIEW_PROMPT
The SECURITY_REVIEW_PROMPT uses 'critical' severity for vulnerabilities that could lead to remote code execution, auth bypass, or data breach with no preconditions, but the description says 'no preconditions' which contradicts the typical definition of 'critical' severity that requires minimal preconditions. This inconsistency may mislead reviewers about what constitutes a critical vulnerability.
Suggested fix: Clarify the criteria for 'critical' severity in the description to match the actual requirements. Either change the description to say 'minimal preconditions' or adjust the severity mapping to be consistent.
Scanner: code-review/logic

@@ -17,1 +34,4 @@
+Prefer returning [] over reporting low-confidence guesses. A false positive wastes more developer time than a missed low-severity issue.
+Respond with a JSON array (no markdown fences):
+[{"title": "...", "description": "...", "severity": "high|medium|low", "file": "...", "line": N, "suggestion": "..."}]

[medium] Hardcoded Credentials in Prompt
The security review prompt contains a reference to 'hardcoded keys' which could be interpreted as a directive to look for hardcoded credentials in code changes. While the prompt itself doesn't contain actual credentials, it could lead to false positives if developers misunderstand the intent.
Suggested fix: Clarify that hardcoded credentials should not be flagged in the prompt itself, and ensure the prompt explicitly excludes checking for hardcoded credentials in the code being reviewed.
Scanner: code-review/security | CWE: CWE-798
@@ -35,0 +72,4 @@
+- high: Exploitable vulnerability requiring minimal preconditions
+- medium: Vulnerability requiring specific conditions or limited impact
+Prefer returning [] over reporting speculative vulnerabilities. Every false positive erodes trust in the scanner.

[medium] Deeply nested control flow in convention review prompt
The CONVENTION_REVIEW_PROMPT contains deeply nested conditional logic that makes it hard to follow the decision-making process. The nested 'Do NOT report' sections create a complex structure that could lead to confusion when implementing or maintaining the prompt.
Suggested fix: Refactor the nested conditions into separate logical blocks or use early returns to flatten the control flow. Consider breaking the prompt into smaller, focused sections with clear separation of concerns.
Scanner: code-review/complexity
[medium] Inconsistent error handling patterns in convention review prompt
The CONVENTION_REVIEW_PROMPT mentions 'Inconsistent error handling within the same module where the inconsistency could hide failures' but doesn't specify what constitutes 'inconsistent' error handling patterns. This creates ambiguity in implementation.
Suggested fix: Clarify what specific patterns constitute inconsistent error handling (e.g., mixing Result and panic, inconsistent use of ? operator, etc.)
Scanner: code-review/convention
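The kind of pattern the prompt could name explicitly can be shown in a few lines. This is illustrative only, not code from the PR: one function in a module propagates parse errors while a sibling unwraps and panics, hiding the failure path from callers.

```rust
use std::num::ParseIntError;

// Propagating version: the caller decides how to fail.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.parse::<u16>()
}

// Inconsistent sibling: panics on bad input, hiding the failure path.
fn parse_port_or_panic(s: &str) -> u16 {
    s.parse::<u16>().unwrap()
}

fn main() {
    assert!(parse_port("not-a-port").is_err());
    assert_eq!(parse_port_or_panic("8080"), 8080);
}
```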
@@ -9,3 +9,3 @@
 const TRIAGE_CHUNK_SIZE: usize = 30;
-const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a security finding triage expert. Analyze each of the following security findings with its code context and determine the appropriate action.
+const TRIAGE_SYSTEM_PROMPT: &str = r#"You are a pragmatic security triage expert. Your job is to filter out noise and keep only findings that a developer should actually fix. Be aggressive about dismissing false positives — a clean, high-signal list is more valuable than a comprehensive one.

[medium] Potential Information Disclosure in LLM Prompt
The LLM system prompt includes detailed information about how to identify and dismiss various types of false positives, including specific examples from different programming languages. This information could potentially be leveraged by an attacker to craft inputs that would cause the LLM to incorrectly classify security findings.
Suggested fix: Limit the amount of information provided in system prompts that could be used to manipulate LLM behavior. Consider removing or generalizing the specific examples of false positive patterns.
Scanner: code-review/security | CWE: CWE-200
[medium] Insecure LLM System Prompt
The LLM system prompt contains detailed instructions for filtering security findings, including specific dismissal criteria and false positive patterns. While this may be intended to reduce noise, it could potentially be exploited to manipulate the LLM's behavior through prompt injection or by crafting inputs designed to trigger specific dismissal patterns.
Suggested fix: Implement strict input validation and sanitization for any user-provided data that might influence LLM prompts. Consider using a more robust prompt engineering framework that prevents prompt injection attacks.
Scanner: code-review/security | CWE: CWE-94
[medium] Overly Complex System Prompt in Triage Module
The TRIAGE_SYSTEM_PROMPT constant has grown to 46 lines with extensive documentation and rules, making it difficult to maintain and understand. The prompt contains multiple sections with complex nested logic that would benefit from being broken down into smaller, more manageable components.
Suggested fix: Break the system prompt into smaller, focused prompts or extract the detailed rules into a separate documentation file that can be referenced. Consider creating helper functions or constants for the different rule categories (dismiss conditions, confirm criteria, confidence scoring).
Scanner: code-review/complexity

@@ -24,0 +22,4 @@
+- The "vulnerability" requires preconditions that don't exist in the code
+- The finding is about code style, complexity, or theoretical concerns rather than actual bugs
+- A hash function is used for non-security purposes (dedup, caching, content addressing)
+- Internal logging of non-sensitive operational data is flagged as "information disclosure"

[medium] Incorrect JSON response format in triage system prompt
The updated TRIAGE_SYSTEM_PROMPT removes markdown fences (```json) from the expected JSON response format, but the prompt text still mentions 'no markdown fences' which could confuse LLM interpretation. More importantly, the instruction says to respond with a JSON array but doesn't specify that the response should be valid JSON without any additional text or formatting.
Suggested fix: Clarify that the response should be pure JSON without any markdown fences or extra text, and consider adding explicit examples of valid responses to avoid ambiguity.
Scanner: code-review/logic

@@ -27,0 +32,4 @@
+- Go: `if err != nil` is not "swallowed error", `crypto/rand` is secure, returning errors is not "information disclosure"
+- Java/Kotlin: Spring Security annotations are valid auth, JPA parameterized queries are safe, Kotlin `!!` in tests is fine
+- Ruby: Rails `params.permit` is validation, ActiveRecord finders are parameterized, `html_safe` on generated content
+- PHP: PDO prepared statements are safe, Laravel Eloquent is parameterized, `htmlspecialchars` is XSS mitigation

[medium] Inconsistent error handling pattern in triage_findings
The triage_findings function uses the `?` operator for error propagation but doesn't handle the case where parsing the LLM response might fail. The function signature suggests it returns `Result<(), Box<dyn Error>>`, but there's no explicit error handling for parsing failures from the LLM response.
Suggested fix: Add explicit error handling for LLM response parsing failures and ensure consistent error propagation throughout the function.
Scanner: code-review/convention

@@ -314,6 +314,21 @@ impl PentestOrchestrator {
- For SPA apps: a 200 HTTP status does NOT mean the page is accessible — check the actual

[medium] Missing type annotations in prompt builder
The PentestOrchestrator struct has methods without explicit return type annotations, which makes the API less clear and could lead to inconsistencies in error handling patterns. The method signatures should include explicit return types to maintain consistency with the rest of the codebase.
Suggested fix: Add explicit return type annotations to all methods in PentestOrchestrator to ensure consistent API patterns throughout the module.
Scanner: code-review/convention

@@ -317,0 +317,4 @@
+## Finding Quality Rules
+- **Do not report the same issue twice.** If multiple tools detect the same missing header or
+vulnerability on the same endpoint, report it ONCE with the most specific tool's output.
+For example, if the recon tool and the header scanner both find missing HSTS, report it only

[high] Potential off-by-one error in finding grouping logic
In pentest/prompt_builder.rs, the rule states 'Do not report the same issue twice' and suggests reporting only once with the most specific tool's output. However, there's no implementation detail provided here to ensure this deduplication happens correctly in the code. If the deduplication logic isn't properly implemented elsewhere, this could lead to duplicate findings being reported, violating the rule.
Suggested fix: Ensure that the deduplication logic in the actual implementation correctly identifies and groups related findings based on their fingerprints or other identifying characteristics before reporting them.
Scanner: code-review/logic
[medium] Excessive HTML file length
The HTML file contains 419 lines of code which is significantly longer than typical single-file components. This makes it difficult to maintain and understand the overall structure.
Suggested fix: Consider breaking this into smaller components or using a templating system to manage the complexity. The file should be split into separate HTML partials or a component-based approach.
Scanner: code-review/complexity
[medium] Missing type annotations in HTML template
The HTML file contains embedded CSS and JavaScript but lacks type annotations for any JavaScript functions or variables that might be present. This could lead to runtime issues if JavaScript is added later without proper typing.
Suggested fix: Add type annotations to any JavaScript code present in the file, or ensure that any dynamic behavior is properly typed if using TypeScript.
Scanner: code-review/convention
[medium] Incorrect CSS media query for print
The CSS uses `@page { size: A4; margin: 0; }`, which is correct for print media, but the HTML document doesn't specify any print-specific styling or media queries. This could lead to inconsistent rendering when printing.
Suggested fix: Consider adding explicit print media styles or ensuring that all layout elements are properly sized for A4 printing.
Scanner: code-review/logic
[medium] Potential overflow issue in fixed dimensions
The body has fixed dimensions of 210mm x 297mm, but some content like lists and grids might exceed these boundaries without proper overflow handling, potentially causing visual clipping or layout issues.
Suggested fix: Add overflow handling to key containers or use responsive units where appropriate to prevent content overflow.
Scanner: code-review/logic
[low] Hardcoded date in footer
The footer contains a hardcoded date 'März 2026' which will become outdated quickly. While this may be intentional for a static document, it's worth noting as a potential maintenance issue.
Suggested fix: Consider making the date dynamic if this document is intended to be updated regularly, or add a comment indicating this is intentionally static.
Scanner: code-review/logic
[medium] Potential Information Disclosure Through HTML Metadata
The HTML document contains confidential metadata in the form of a 'Confidential — For Investor Review' header and footer, along with a date 'March 2026'. While this is not directly exploitable, it could be used in social engineering attacks or indicate internal information leakage if the document were to be improperly shared.
Suggested fix: Remove or obfuscate any sensitive metadata that could aid in social engineering or information gathering efforts.
Scanner: code-review/security | CWE: CWE-200
[medium] Excessive HTML Structure Complexity
The HTML structure contains deeply nested elements with multiple semantic layers that make it difficult to maintain and understand. The document has complex nested sections with repetitive patterns that could benefit from a more modular approach.
Suggested fix: Consider breaking down the large HTML structure into smaller components or using a templating system to reduce nesting depth and improve maintainability.
Scanner: code-review/complexity
[medium] Missing type annotations in HTML template
The HTML file contains embedded CSS and JavaScript but lacks proper type annotations for any JavaScript functions or variables that might be present. This could lead to runtime issues if JavaScript is added later without proper typing.
Suggested fix: Add type annotations to any JavaScript code if present, or ensure that any dynamic content is properly typed in the surrounding context.
Scanner: code-review/convention
[low] Missing alt attributes on images
While there are no explicit image tags in this HTML, if any images were added later they would lack alt attributes which are essential for accessibility and SEO.
Suggested fix: Add descriptive alt attributes to all images when present.
Scanner: code-review/logic
[medium] Large CSS Stylesheet
The CSS stylesheet spans 300+ lines and contains numerous style definitions that could be better organized into logical groups or separated into external files for better maintainability.
Suggested fix: Break the CSS into logical sections (layout, typography, components) and consider moving to an external stylesheet to improve organization and readability.
Scanner: code-review/complexity
[medium] Incorrect CSS media query for print
The CSS uses `@page { size: A4; margin: 0; }`, which is correct for print styling, but the body has fixed dimensions (`width: 210mm; height: 297mm`) that may conflict with actual page rendering. This could cause layout issues when printing or converting to PDF.
Suggested fix: Consider removing the fixed width/height from the body and relying solely on `@page` for proper A4 sizing during print.
Scanner: code-review/logic
[medium] Hardcoded Date in Document
The document includes a hardcoded date 'March 2026' which may expose internal planning timelines or project schedules. This could potentially be leveraged by attackers for strategic purposes.
Suggested fix: Consider making the date dynamic or removing it entirely to prevent exposure of internal scheduling information.
Scanner: code-review/security | CWE: CWE-200
[medium] Potential layout inconsistency in metrics section
The metrics section uses `grid-template-columns: repeat(5, 1fr)`, but some metric labels contain line breaks (`<br>`). This can cause inconsistent column widths and visual misalignment when rendered in different browsers or environments.
Suggested fix: Avoid using `<br>` tags inside labels, or ensure consistent use across all metrics to maintain grid alignment.
Scanner: code-review/logic