feat: deduplicate code review findings across LLM passes #48
Summary
Test plan
Compliance scan found 9 issue(s) in this PR:
@@ -71,3 +72,4 @@
findings: deduped,
sbom_entries: Vec::new(),
}
}
[medium] Inconsistent error handling in dedup_cross_pass function
The `dedup_cross_pass` function uses `unwrap_or`, which can lead to unexpected behavior when file paths are `None`. This pattern should be handled more explicitly to avoid potential panics or silent data loss.
Suggested fix: Replace `f.file_path.as_deref().unwrap_or("")` with explicit error handling or a default value that makes sense for the deduplication logic, such as a placeholder string like "unknown_file".
Scanner: code-review/convention
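The shape of the suggested fix can be sketched as follows; the `Finding` struct and key format here are illustrative stand-ins, not the PR's actual types:

```rust
#[derive(Debug)]
struct Finding {
    file_path: Option<String>,
    line_number: Option<u32>,
    title: String,
}

/// Build a dedup key, substituting an explicit placeholder when the file
/// path is missing rather than silently collapsing to the empty string.
fn dedup_key(f: &Finding) -> String {
    let path = f.file_path.as_deref().unwrap_or("unknown_file");
    let bucket = f.line_number.unwrap_or(0) / 4;
    format!("{path}:{bucket}:{}", f.title.to_lowercase())
}
```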
@@ -71,3 +72,4 @@
findings: deduped,
sbom_entries: Vec::new(),
}
}
[medium] Potential division by zero in line bucket calculation
The line bucket calculation `f.line_number.unwrap_or(0) / 4` could potentially cause issues if `line_number` is `None`. While `unwrap_or(0)` handles this case, it might not be the intended behavior for deduplication purposes.
Suggested fix: Consider how to handle cases where line numbers are missing more explicitly, perhaps by assigning a special bucket value or skipping deduplication for such entries.
Scanner: code-review/convention
[high] Potential panic in dedup_key function
The `dedup_key` function uses `f.file_path.as_deref().unwrap_or("")`, which could panic if `file_path` is `None` and we try to access it. While this might be intentional, it's not handled gracefully.
Suggested fix: Use `f.file_path.as_deref().unwrap_or_default()` instead to avoid potential panics and ensure consistent behavior.
Scanner: code-review/logic
[medium] Potential Information Disclosure Through Deduplication Key
The deduplication key construction includes the full file path which could expose sensitive information about the repository structure or file locations when findings are logged or serialized. This might aid an attacker in understanding the system's internal layout.
Suggested fix: Consider hashing or obfuscating the file path in the deduplication key, or ensure that file paths are not exposed in logs or external outputs where they could be accessed by unauthorized parties.
Scanner: code-review/security | CWE: CWE-200
[medium] Off-by-one error in line bucket calculation
The line bucket calculation `f.line_number.unwrap_or(0) / 4` will group lines that differ by up to 3 into the same bucket. This may cause false positives when two different issues are on adjacent lines but still grouped together.
Suggested fix: Consider using `f.line_number.unwrap_or(0) / 3 + 1` or adjust the grouping logic to better reflect proximity between lines.
Scanner: code-review/logic
[medium] Insecure Deduplication Logic
The deduplication function uses a simple line bucketing approach that can lead to false negatives. By grouping lines within 3 of each other (`line_bucket = f.line_number.unwrap_or(0) / 4`), different but semantically similar issues on adjacent lines may be incorrectly considered duplicates, potentially masking legitimate security findings.
Suggested fix: Use a more sophisticated approach for line-based deduplication such as considering a fixed window around the line number or using a more robust similarity metric that accounts for code context rather than just proximity.
Scanner: code-review/security | CWE: CWE-473
[high] Inconsistent handling of missing descriptions
When comparing descriptions for tie-breaking, the code assumes that `finding.description.len()` and `existing.description.len()` are always valid. If either description is `None`, this could lead to incorrect behavior or panics.
Suggested fix: Add explicit checks for `None` values before accessing `.len()` to prevent potential panics.
Scanner: code-review/logic
[medium] Missing clone of cwe field during merge
In the `and_modify` block, the code copies `finding.cwe.clone()` into `existing.cwe`, but does not handle the case where `finding.cwe` is `None` while `existing.cwe` has a value. This could result in losing CWE information from existing findings.
Suggested fix: Ensure that the CWE merging logic properly handles both `None` and `Some` cases for both `finding.cwe` and `existing.cwe`.
Scanner: code-review/logic
[medium] Complex boolean expression in deduplication logic
The boolean expression in the `and_modify` closure checks both severity comparison and description length comparison, making it hard to reason about the deduplication criteria. The condition combines multiple logical operations that could be simplified.
Suggested fix: Extract the severity comparison logic into a separate helper function or variable to improve readability. Consider using a match statement or explicit conditional blocks to make the intent clearer.
Scanner: code-review/complexity
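The helper extraction suggested here could look like the following sketch; the `Severity` enum and function name are illustrative stand-ins for the project's actual types:

```rust
/// Illustrative severity ordering; the real enum lives elsewhere in the PR.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum Severity {
    Low,
    Medium,
    High,
}

/// Prefer the higher-severity finding; on a tie, keep the longer
/// (presumably more detailed) description.
fn should_replace_finding(
    new_sev: Severity,
    new_desc: &str,
    old_sev: Severity,
    old_desc: &str,
) -> bool {
    new_sev > old_sev || (new_sev == old_sev && new_desc.len() > old_desc.len())
}
```

A named function like this also gives the tie-breaking rule a single place to be unit-tested.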
Compliance scan found 34 issue(s) in this PR:
@@ -97,3 +97,2 @@
// Fetch DAST findings for this session
let findings: Vec<DastFinding> = match agent
// Fetch DAST findings for this session, then deduplicate
[medium] Inconsistent error handling pattern in export_session_report
The code uses `match` expressions for database operations but handles errors by returning empty vectors. This creates inconsistency with other parts of the codebase that likely use more robust error propagation patterns.
Suggested fix: Use a consistent error handling approach throughout the codebase, possibly propagating errors instead of silently returning empty collections.
Scanner: code-review/convention
[medium] Deeply nested control flow in export_session_report
The export_session_report function has deeply nested control flow with multiple match statements and conditional blocks that make it difficult to follow the main logic path. The deduplication process adds additional nesting.
Suggested fix: Consider extracting the DAST findings fetching and deduplication logic into a separate function to flatten the control flow and improve readability.
Scanner: code-review/complexity
@@ -109,0 +109,4 @@
let raw_count = raw_findings.len();
let findings = crate::pipeline::dedup::dedup_dast_findings(raw_findings);
if findings.len() < raw_count {
tracing::info!(
[medium] Potential Information Disclosure Through Tracing Logs
The code logs information about deduplicated findings including raw counts and final counts. If sensitive data is included in these logs, it could expose internal system details or potentially sensitive findings to unauthorized parties.
Suggested fix: Ensure that logging does not include sensitive information from findings. Consider using structured logging with configurable log levels to prevent accidental exposure of sensitive data.
Scanner: code-review/security | CWE: CWE-200
@@ -327,0 +337,4 @@
let existing = self.db.dast_findings().find_one(doc! {
[medium] Insecure Deduplication Logic in DAST Findings
The deduplication logic for DAST findings relies on a combination of title, endpoint, and method fields which may not be sufficient to uniquely identify findings. This could lead to false negatives where actual duplicates are not detected due to minor variations in these fields.
Suggested fix: Enhance the deduplication criteria by incorporating additional unique identifiers such as vulnerability signature hashes or content-based fingerprints that are more robust than just title, endpoint, and method.
Scanner: code-review/security | CWE: CWE-400
@@ -327,0 +342,4 @@
"title": &finding.title,
"endpoint": &finding.endpoint,
"method": &finding.method,
})
[high] Potential race condition in duplicate checking
In compliance-agent/src/pentest/orchestrator.rs, the duplicate check for DAST findings is performed before insertion but not atomically with the insertion. This could lead to race conditions where two concurrent processes might both pass the duplicate check and insert the same finding.
Suggested fix: Use a database-level unique constraint or atomic upsert operation instead of separate find and insert operations to prevent race conditions.
Scanner: code-review/logic
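The race comes from separating the check from the insert. As an illustration of the atomic alternative, here is an in-memory stand-in where check-and-insert happen under one lock; in MongoDB the equivalent would be a unique index plus an upsert, not this struct:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// In-memory stand-in for atomic "insert if absent": the membership check
/// and the insert happen under a single lock, so two concurrent callers
/// cannot both observe "not present" and insert the same key twice.
struct DedupStore {
    seen: Mutex<HashSet<String>>,
}

impl DedupStore {
    fn insert_if_new(&self, key: &str) -> bool {
        // HashSet::insert returns true only for the first insertion.
        self.seen.lock().unwrap().insert(key.to_string())
    }
}
```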
[medium] Potential performance issue in PentestOrchestrator with redundant database queries
The code performs a database query for each finding to check for duplicates, which could be inefficient when processing many findings. Consider batching these checks or using a more efficient deduplication strategy.
Suggested fix: Consider optimizing duplicate detection by either batching database queries or pre-loading existing findings into memory for faster lookup
Scanner: code-review/convention
@@ -327,0 +344,4 @@
"method": &finding.method,
}).await;
if matches!(existing, Ok(Some(_))) {
[medium] Complex boolean expression in duplicate checking logic
The condition `if finding.severity > existing.severity || (finding.severity == existing.severity && finding.description.len() > existing.description.len())` in orchestrator.rs is complex and hard to reason about. It combines multiple comparisons that could be simplified into a helper function.
Suggested fix: Extract the severity comparison logic into a separate function like `should_replace_finding` to improve readability and maintainability.
Scanner: code-review/complexity
@@ -184,3 +186,51 @@ struct ReviewIssue {
#[serde(default)]
suggestion: Option<String>,
}
[low] Missing type annotation for dedup_cross_pass parameter
The function `dedup_cross_pass` takes a parameter `findings` without an explicit type annotation, which could lead to confusion about expected input types and make refactoring harder.
Suggested fix: Add an explicit type annotation to the parameter: `findings: Vec<Finding>`.
Scanner: code-review/convention
[high] Complex deduplication logic in cross-pass code review
The `dedup_cross_pass` function contains complex logic for building deduplication keys and comparing findings. The combination of string formatting, word normalization, and multi-criteria comparison makes this function hard to understand and maintain.
Suggested fix: Break down the deduplication logic into smaller helper functions: one for generating the dedup key, another for comparing findings by severity/description length, and a third for merging CWE information.
Scanner: code-review/complexity
@@ -187,0 +191,4 @@
///
/// Multiple passes often flag the same issue (e.g. SQL injection reported by
/// logic, security, and convention passes). We group by file + nearby line +
/// normalized title keywords and keep the highest-severity finding.
[medium] Weak Deduplication Key Construction in Cross-Pass Code Review
The deduplication key construction in the `dedup_cross_pass` function uses a simple line bucketing approach (`line_number / 4`) which might incorrectly group unrelated findings or fail to detect true duplicates when line numbers vary slightly. Additionally, the normalization of titles may not handle all edge cases properly.
Suggested fix: Improve the deduplication key generation by using more sophisticated techniques like fuzzy matching or semantic analysis to better identify truly duplicate findings regardless of minor differences in formatting or location.
Scanner: code-review/security | CWE: CWE-400
@@ -187,0 +194,4 @@
/// normalized title keywords and keep the highest-severity finding.
fn dedup_cross_pass(findings: Vec<Finding>) -> Vec<Finding> {
use std::collections::HashMap;
[medium] Off-by-one error in line bucket calculation
In compliance-agent/src/pipeline/code_review.rs, the line bucket calculation uses `f.line_number.unwrap_or(0) / 4`, which puts lines 0, 1, 2, and 3 into bucket 0, but line 4 into bucket 1. This means that lines 3 and 4 land in different buckets when they should probably be grouped together for deduplication purposes.
Suggested fix: Change the line bucket calculation to `(f.line_number.unwrap_or(0) + 3) / 4` to properly group lines within 3 of each other.
Scanner: code-review/logic
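The bucket boundary this finding describes is easy to see in isolation; this sketch reproduces the bucketing expression quoted above (whether the scanner's proposed `+ 3` variant actually fixes it is a separate question):

```rust
/// Fixed-width line bucketing as quoted in the finding: integer
/// division by 4 puts lines 0-3 in bucket 0 and line 4 in bucket 1.
fn line_bucket(line_number: Option<u32>) -> u32 {
    line_number.unwrap_or(0) / 4
}
```

Adjacent lines 3 and 4 therefore fall into different buckets, which is the boundary behavior being flagged.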
@@ -187,0 +213,4 @@
let mut groups: HashMap<String, Finding> = HashMap::new();
for finding in findings {
let key = dedup_key(&finding);
[high] Incorrect deduplication logic in cross-pass deduplication
In compliance-agent/src/pipeline/code_review.rs, the deduplication function compares descriptions by length rather than content. This means that a finding with a longer description will overwrite a finding with a shorter one even if the shorter one has higher severity, which is incorrect behavior.
Suggested fix: Fix the comparison logic to prioritize severity first, then consider description length only as a tiebreaker, or better yet, compare actual description content for meaningful deduplication.
Scanner: code-review/logic
@@ -9,9 +11,209 @@ pub fn compute_fingerprint(parts: &[&str]) -> String {
hex::encode(hasher.finalize())
}
/// Compute a dedup fingerprint for a DAST finding.
[medium] Insecure Hash Usage for Deduplication
While SHA256 is cryptographically secure, using it for deduplication purposes without proper input sanitization can lead to issues. The function computes fingerprints based on user-provided data (titles, endpoints, methods) which could potentially be crafted to cause hash collisions or other issues if not properly validated.
Suggested fix: Ensure that the inputs to compute_fingerprint are properly sanitized and validated before hashing. Consider adding input length limits to prevent potential DoS attacks through excessively long inputs.
Scanner: code-review/security | CWE: CWE-327
@@ -12,0 +35,4 @@
fn canonicalize_dast_title(title: &str) -> String {
let mut s = title.to_lowercase();
// Strip "for <domain>" or "on <domain>" suffixes
[high] Incorrect string truncation in canonicalize_dast_title
The canonicalize_dast_title function uses `s.truncate(idx)`, which removes everything from index idx onwards, but this can leave trailing spaces or break word boundaries. For example, 'test for example.com' becomes 'test ' instead of 'test'.
Suggested fix: Replace `s.truncate(idx)` with `s.truncate(idx.saturating_sub(1))` or, better yet, use a more robust approach that properly handles whitespace around the truncated portion.
Scanner: code-review/logic
[medium] Insecure String Manipulation in Canonicalization
The canonicalize_dast_title function uses manual string manipulation and pattern matching that could be vulnerable to injection or unexpected behavior. Specifically, the domain stripping logic relies on simple substring checks which may not properly handle all edge cases, potentially leading to incorrect deduplication or information leakage.
Suggested fix: Use more robust parsing techniques or regular expressions for domain/URL detection instead of simple substring matching. Consider using a dedicated URL parsing library to properly identify and strip domains.
Scanner: code-review/security | CWE: CWE-1104
@@ -12,0 +51,4 @@
}
}
// Resolve known header synonyms
[high] Potential index out of bounds in canonicalize_dast_title
In canonicalize_dast_title, when checking for synonyms, the code accesses bytes at position `pos + short.len()` without ensuring that `pos + short.len()` is within the bounds of the string. This could cause a panic if `short` is near the end of the string.
Suggested fix: Add bounds checking before accessing `s.as_bytes()[pos + short.len()]` to prevent potential panics.
Scanner: code-review/logic
@@ -12,0 +52,4 @@
}
// Resolve known header synonyms
let synonyms: &[(&str, &str)] = &[
[high] Potential panic in canonicalize_dast_title due to unchecked indexing
In canonicalize_dast_title, there's a potential panic when accessing `s.as_bytes()[pos - 1]` without ensuring `pos > 0`. If `pos` is 0, this would access memory before the string, causing undefined behavior.
Suggested fix: Add a bounds check before accessing `s.as_bytes()[pos - 1]` to ensure `pos > 0` before dereferencing.
Scanner: code-review/convention
@@ -12,0 +55,4 @@
let synonyms: &[(&str, &str)] = &[
("hsts", "strict-transport-security"),
("csp", "content-security-policy"),
("cors", "cross-origin-resource-sharing"),
[medium] Complex boolean expression in canonicalize_dast_title
The canonicalize_dast_title function contains complex boolean logic for checking word boundaries when replacing synonyms. The condition `before_ok && after_ok` involves multiple checks that are hard to follow and maintain.
Suggested fix: Extract the boundary checking logic into a separate helper function to improve readability and maintainability.
Scanner: code-review/complexity
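A sketch of extracted boundary helpers that would address the indexing and readability concerns raised around canonicalize_dast_title; the function names are illustrative, not the PR's:

```rust
/// True if the byte before `pos` is a word boundary; the `pos == 0`
/// guard means `pos - 1` can never underflow.
fn before_is_boundary(s: &str, pos: usize) -> bool {
    pos == 0 || !s.as_bytes()[pos - 1].is_ascii_alphanumeric()
}

/// True if the byte just after the matched word is a word boundary;
/// `get` returns None past the end, so there is no out-of-bounds access.
fn after_is_boundary(s: &str, pos: usize, word: &str) -> bool {
    match s.as_bytes().get(pos + word.len()) {
        None => true, // word ends exactly at the end of the string
        Some(&b) => !b.is_ascii_alphanumeric(),
    }
}
```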
@@ -12,0 +111,4 @@
// Group by (cwe, endpoint_normalized, method) — only when CWE is present
let mut cwe_groups: HashMap<String, Vec<usize>> = HashMap::new();
for (i, f) in deduped.iter().enumerate() {
if let Some(ref cwe) = f.cwe {
[high] Large function with multiple responsibilities
The dedup_dast_findings function is quite large (over 100 lines) and handles multiple distinct responsibilities: exact deduplication, CWE-based deduplication, merging logic, and result processing. This makes it hard to test and maintain.
Suggested fix: Break down this function into smaller, focused functions such as 'perform_exact_dedup', 'perform_cwe_dedup', and 'merge_findings' to improve modularity and testability.
Scanner: code-review/complexity
@@ -12,0 +117,4 @@
cwe,
f.endpoint.to_lowercase().trim_end_matches('/'),
f.method.to_uppercase(),
);
[medium] Deeply nested control flow in dedup_dast_findings
The dedup_dast_findings function has deeply nested control flow with multiple nested if statements and match expressions, making it difficult to follow the execution path and increasing the risk of logic errors.
Suggested fix: Refactor the nested conditions into early returns or separate helper functions to flatten the control flow and improve readability.
Scanner: code-review/complexity
@@ -12,0 +137,4 @@
.severity.cmp(&deduped[b].severity)
.then_with(|| deduped[a].evidence.len().cmp(&deduped[b].evidence.len()))
.then_with(|| {
[medium] Inconsistent error handling in dedup_dast_findings
The `dedup_dast_findings` function uses `unwrap_or` in one place but has explicit `if let Some(...)` checks elsewhere. This inconsistency can lead to confusion about how errors are handled and might mask potential issues in the deduplication logic.
Suggested fix: Use consistent error handling patterns throughout the function. Prefer explicit `match` expressions or the `?` operator for better clarity and error propagation.
Scanner: code-review/convention
[medium] Incorrect removal order in dedup_dast_findings
In dedup_dast_findings, when removing merged findings, the code iterates through remove_indices in reverse order. However, if there are duplicate indices in remove_indices, they won't be handled correctly because dedup() is called after sorting but before iteration.
Suggested fix: Ensure that remove_indices contains unique values before iterating in reverse, or use a HashSet to track already removed indices.
Scanner: code-review/logic
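The robust removal the suggested fix describes can be sketched like this; the helper is generic and the name is illustrative:

```rust
/// Remove the elements of `v` at the given indices, tolerating
/// duplicate indices: sort descending and dedup first, so each index
/// is removed exactly once and earlier removals don't shift later ones.
fn remove_at_indices<T>(v: &mut Vec<T>, mut indices: Vec<usize>) {
    indices.sort_unstable_by(|a, b| b.cmp(a)); // descending order
    indices.dedup(); // duplicates are adjacent after sorting
    for i in indices {
        if i < v.len() {
            v.remove(i);
        }
    }
}
```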
[medium] Potential Integer Overflow in Evidence Merging
In the merge_dast_finding function, there's potential for integer overflow when comparing string lengths during evidence merging. While this is unlikely to cause immediate issues, it represents a potential vulnerability in length comparisons that could be exploited in certain scenarios.
Suggested fix: Add bounds checking before performing length comparisons. Ensure that string length operations are performed safely, especially when dealing with potentially large inputs from external sources.
Scanner: code-review/security | CWE: CWE-190
@@ -58,0 +275,4 @@
#[test]
fn canonicalize_strips_domain_suffix() {
let canon = canonicalize_dast_title("Missing HSTS header for comp-dev.meghsakha.com");
[medium] Incorrect severity comparison in merge_dast_finding
In merge_dast_finding, the condition 'if duplicate.severity > primary.severity' assumes that Severity implements Ord correctly and that the comparison works as expected. If Severity does not implement Ord properly or has custom ordering, this could lead to incorrect behavior.
Suggested fix: Verify that Severity implements Ord correctly and add explicit debug assertions to ensure the comparison behaves as expected.
Scanner: code-review/logic
@@ -58,0 +279,4 @@
assert!(!canon.contains("meghsakha"), "domain should be stripped");
assert!(
canon.contains("strict-transport-security"),
"hsts should be resolved: {canon}"
[medium] Potential panic in merge_dast_finding due to clone usage
In merge_dast_finding, the line 'primary.description.clone_from(&duplicate.description)' could panic if the description field is not properly initialized or if there's an issue with the clone operation. However, more importantly, the comparison logic in dedup_dast_findings might not handle all edge cases correctly.
Suggested fix: Ensure that description fields are always initialized before calling clone_from, and consider adding defensive checks.
Scanner: code-review/logic
[medium] Inconsistent use of clone() in merge_dast_finding
The `merge_dast_finding` function uses `clone()` on `duplicate.description` and `duplicate.remediation`, but not on other fields like `duplicate.evidence`. This creates an inconsistent pattern where some fields are cloned while others are moved, potentially leading to performance issues or unexpected behavior.
Suggested fix: Use `clone()` consistently for all fields that need to be copied, or use `clone_from` for fields that are already owned. Consider using `std::mem::replace` or similar patterns for more efficient moves.
Scanner: code-review/convention
@@ -89,12 +90,37 @@ impl PipelineOrchestrator {
return Ok(());
}
// Dedup findings by fingerprint to avoid duplicate comments
[medium] Complex boolean expression in finding deduplication
The fingerprint computation and deduplication logic uses a complex boolean expression with multiple unwraps and nested conditions that make it hard to reason about. The code also has duplicated fingerprint computation logic.
Suggested fix: Extract the fingerprint computation into a separate function and simplify the deduplication logic. Consider using a more functional approach with iterators and map/filter operations instead of manual loops.
Scanner: code-review/complexity
@@ -92,0 +95,4 @@
let mut unique_findings: Vec<&Finding> = Vec::new();
for finding in &pr_findings {
let fp = compute_fingerprint(&[repo_id,
[medium] Potential panic from unwrap in fingerprint computation
The code uses `finding.file_path.as_deref().unwrap_or("")`, which could panic if file_path is None and the fallback is not properly handled. While it's unlikely to be None, using `unwrap_or_default()` or similar safer alternatives would prevent potential panics.
Suggested fix: Replace `unwrap_or("")` with `unwrap_or_default()` or handle the Option more explicitly to avoid potential panics.
Scanner: code-review/convention
@@ -92,0 +97,4 @@
let fp = compute_fingerprint(&[repo_id,
&pr_number.to_string(),
finding.file_path.as_deref().unwrap_or(""),
[medium] Insecure Direct Object Reference in Finding Processing
The code processes findings based on file paths and line numbers without proper validation. If these values are derived from user input or external sources, they could potentially allow an attacker to access unintended files or lines through manipulation of the finding data.
Suggested fix: Validate and sanitize all inputs related to file paths and line numbers before processing them. Ensure that the file paths are within expected boundaries and that line numbers are valid for the given file.
Scanner: code-review/security | CWE: CWE-639
@@ -92,0 +100,4 @@
finding.file_path.as_deref().unwrap_or(""),
&finding.line_number.unwrap_or(0).to_string(),
&finding.title,
]);
[high] Incorrect deduplication logic
The deduplication logic uses a HashSet to track fingerprints but then reassigns the pr_findings variable which shadows the original one. This means the deduplication only affects the local unique_findings vector, not the actual pr_findings that get processed later. The original pr_findings is never updated with unique_findings.
Suggested fix: Remove the line 'let pr_findings = unique_findings;' and instead modify the original pr_findings variable directly, or better yet, replace the entire loop with a more idiomatic approach using iterator methods.
Scanner: code-review/logic
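The iterator-based rewrite the suggested fix mentions can be sketched generically; in the real code the key closure would compute the fingerprint:

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep the first occurrence of each key, in one pass, with no manual
/// loop and no variable shadowing.
fn dedup_by_key<T, K, F>(items: Vec<T>, key: F) -> Vec<T>
where
    K: Eq + Hash,
    F: Fn(&T) -> K,
{
    let mut seen = HashSet::new();
    // HashSet::insert returns false for keys already seen, so those
    // items are filtered out.
    items.into_iter().filter(|it| seen.insert(key(it))).collect()
}
```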
@@ -92,0 +102,4 @@
&finding.title,
]);
if seen_fps.insert(fp) {
unique_findings.push(finding);
[medium] Duplicated fingerprint computation logic
The fingerprint computation logic is duplicated in two places (lines 93-101 and 105-113) which increases maintenance burden and risk of inconsistencies.
Suggested fix: Create a helper function that takes the required components and returns the fingerprint to eliminate code duplication.
Scanner: code-review/complexity
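The single-helper refactor could look like the sketch below. Note the PR's SHA-256 `compute_fingerprint` is replaced here by a std-only stand-in so the example is self-contained; the point is the one assembly site, not the hash choice:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the PR's SHA-256-based `compute_fingerprint`.
fn compute_fingerprint(parts: &[&str]) -> String {
    let mut h = DefaultHasher::new();
    for p in parts {
        p.hash(&mut h);
    }
    format!("{:016x}", h.finish())
}

/// One place that assembles the fingerprint inputs, so multiple call
/// sites cannot drift apart.
fn finding_fingerprint(
    repo_id: &str,
    pr_number: u64,
    file_path: Option<&str>,
    line_number: Option<u32>,
    title: &str,
) -> String {
    compute_fingerprint(&[
        repo_id,
        &pr_number.to_string(),
        file_path.unwrap_or(""),
        &line_number.unwrap_or(0).to_string(),
        title,
    ])
}
```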
@@ -92,0 +104,4 @@
if seen_fps.insert(fp) {
unique_findings.push(finding);
}
}
[medium] Potential Information Disclosure via Comment Fingerprint
The code includes a compliance fingerprint in PR review comments using `<!-- compliance-fp:{fp} -->`. If the fingerprint contains sensitive information or can be used to reconstruct internal state, it may lead to information disclosure. The fingerprint is computed from repository ID, PR number, file path, line number, and finding title.
Suggested fix: Review the fingerprint computation logic to ensure it does not include sensitive data. Consider using a hash of non-sensitive components only, and validate that the fingerprint cannot be reverse-engineered to expose internal system details.
Scanner: code-review/security | CWE: CWE-200
@@ -123,6 +149,17 @@ impl PipelineOrchestrator {
.join("\n"),
[medium] Deep nesting in PR review handling
The PR review handling logic has multiple nested conditional blocks (if let chains and nested if statements) that make the control flow difficult to follow and increase cognitive load.
Suggested fix: Refactor the nested conditionals into early returns or extract the review creation logic into separate functions to flatten the control flow.
Scanner: code-review/complexity
@@ -123,6 +149,17 @@ impl PipelineOrchestrator {
.join("\n"),
);
if review_comments.is_empty() {
[medium] Inconsistent error handling in PR review pipeline
The code uses both `tracing::warn!` for logging and direct error propagation with the `?` operator inconsistently. The warning log is used for one case but not for others where errors might occur during PR review creation.
Suggested fix: Use a consistent error handling pattern throughout the function. Either propagate all errors with `?` or handle them uniformly with logging.
Scanner: code-review/convention