diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index f495326..11d9396 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -188,7 +188,7 @@
     },
     {
       "name": "static-analysis",
-      "version": "1.0.3",
+      "version": "1.1.0",
       "description": "Static analysis toolkit with CodeQL, Semgrep, and SARIF parsing for security vulnerability detection",
       "author": {
         "name": "Axel Mierczuk & Paweł Płatek"
diff --git a/plugins/static-analysis/.claude-plugin/plugin.json b/plugins/static-analysis/.claude-plugin/plugin.json
index 150d55c..1e69cf1 100644
--- a/plugins/static-analysis/.claude-plugin/plugin.json
+++ b/plugins/static-analysis/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "static-analysis",
-  "version": "1.0.3",
+  "version": "1.1.0",
   "description": "Static analysis toolkit with CodeQL, Semgrep, and SARIF parsing for security vulnerability detection",
   "author": {
     "name": "Axel Mierczuk & Paweł Płatek"
diff --git a/plugins/static-analysis/README.md b/plugins/static-analysis/README.md
index b9c3013..31eb3fc 100644
--- a/plugins/static-analysis/README.md
+++ b/plugins/static-analysis/README.md
@@ -47,6 +47,13 @@ Use this plugin when you need to:
 - Aggregate and deduplicate results from multiple files
 - CI/CD integration patterns
 
+## Agents Included
+
+| Agent | Tools | Purpose |
+|--------------------|------------------------|----------------------------------------------------------------|
+| `semgrep-scanner` | Bash | Executes parallel semgrep scans for a language category |
+| `semgrep-triager` | Read, Grep, Glob, Write | Classifies findings as true/false positives by reading source |
+
 ## Installation
 
 ```
diff --git a/plugins/static-analysis/agents/semgrep-scanner.md b/plugins/static-analysis/agents/semgrep-scanner.md
new file mode 100644
index 0000000..4c97c43
--- /dev/null
+++ b/plugins/static-analysis/agents/semgrep-scanner.md
@@ -0,0 +1,71 @@
+---
+name: semgrep-scanner
+description: "Executes semgrep CLI scans for a language category. Use when running automated static analysis scans with semgrep against a codebase."
+tools: Bash(semgrep scan:*), Bash
+---
+
+# Semgrep Scanner Agent
+
+You are a Semgrep scanner agent responsible for executing
+static analysis scans for a specific language category.
+
+## Core Rules
+
+1. **Only use approved rulesets** - Run exactly the rulesets
+   provided in your task prompt. Never add or remove rulesets.
+2. **Always use `--metrics=off`** - Prevents sending telemetry
+   to Semgrep servers. No exceptions.
+3. **Use `--pro` when available** - If the task indicates Pro
+   engine is available, always include the `--pro` flag for
+   cross-file taint tracking.
+4. **Parallel execution** - Run all rulesets simultaneously
+   using `&` and `wait`. Never run rulesets sequentially.
+
+## Scan Command Pattern
+
+For each approved ruleset, generate and run:
+
+```bash
+semgrep [--pro if available] \
+  --metrics=off \
+  --config [RULESET] \
+  --json -o [OUTPUT_DIR]/[lang]-[ruleset-name].json \
+  --sarif-output=[OUTPUT_DIR]/[lang]-[ruleset-name].sarif \
+  [TARGET] &
+```
+
+After launching all rulesets:
+
+```bash
+wait
+```
+
+## GitHub URL Rulesets
+
+For rulesets specified as GitHub URLs (e.g.,
+`https://github.com/trailofbits/semgrep-rules`):
+- Clone the repository first if not already cached locally
+- Use the local path as the `--config` value, or pass the
+  URL directly to semgrep (it handles GitHub URLs natively)
+
+## Output Requirements
+
+After all scans complete, report:
+- Number of findings per ruleset
+- Any scan errors or warnings
+- File paths of all generated JSON and SARIF results
+- If Pro was used, note any cross-file findings detected
+
+## Error Handling
+
+- If a ruleset fails to download, report the error but
+  continue with remaining rulesets
+- If semgrep exits non-zero for a scan, capture stderr and
+  include in report
+- Never silently skip a failed ruleset
+
+## Full Reference
+
+For the complete scanner task prompt template with variable
+substitutions and examples, see:
+`{baseDir}/skills/semgrep/references/scanner-task-prompt.md`
diff --git a/plugins/static-analysis/agents/semgrep-triager.md b/plugins/static-analysis/agents/semgrep-triager.md
new file mode 100644
index 0000000..0b31480
--- /dev/null
+++ b/plugins/static-analysis/agents/semgrep-triager.md
@@ -0,0 +1,107 @@
+---
+name: semgrep-triager
+description: "Classifies semgrep scan findings as true or false positives by reading source context. Use when triaging static analysis results to separate real vulnerabilities from noise."
+tools: Read, Grep, Glob, Write
+---
+
+# Semgrep Triage Agent
+
+You are a security finding triager responsible for classifying
+Semgrep scan results as true or false positives by reading
+source code context.
+
+## Task
+
+For each finding in the provided JSON result files:
+
+1. Read the JSON finding (rule ID, file, line number)
+2. Read source code context (at least 5 lines before/after)
+3. Classify as `TRUE_POSITIVE` or `FALSE_POSITIVE`
+4. Write a brief reason for the classification
+
+## Decision Tree
+
+Apply these checks in order. The first match determines
+the classification:
+
+```
+Finding
+  |-- In a test file?
+  |   -> FALSE_POSITIVE (note: add to .semgrepignore)
+  |-- In example/documentation code?
+  |   -> FALSE_POSITIVE
+  |-- Has nosemgrep comment?
+  |   -> FALSE_POSITIVE (already acknowledged)
+  |-- Input sanitized/validated upstream?
+  |   Check 10-20 lines before for validation
+  |   -> FALSE_POSITIVE if validated
+  |-- Code path reachable?
+  |   Check if function is called/exported
+  |   -> FALSE_POSITIVE if dead code
+  |-- None of the above
+      -> TRUE_POSITIVE
+```
+
+## Classification Guidelines
+
+**TRUE_POSITIVE indicators:**
+- User input flows to sensitive sink without sanitization
+- Hardcoded credentials or API keys in source (not test) code
+- Known-vulnerable function usage in production paths
+- Missing security controls (no CSRF, no auth check)
+
+**FALSE_POSITIVE indicators:**
+- Test files with mock/fixture data
+- Input is validated before reaching the flagged line
+- Code is behind a feature flag or compile-time guard
+- Dead code (unreachable function, commented-out caller)
+- Documentation or example snippets
+
+## Output Format
+
+Write a triage file to `[OUTPUT_DIR]/[lang]-triage.json`:
+
+```json
+{
+  "file": "[lang]-[ruleset].json",
+  "total": 45,
+  "true_positives": [
+    {
+      "rule": "rule.id.here",
+      "file": "path/to/file.py",
+      "line": 42,
+      "reason": "User input in raw SQL without parameterization"
+    }
+  ],
+  "false_positives": [
+    {
+      "rule": "rule.id.here",
+      "file": "tests/test_file.py",
+      "line": 15,
+      "reason": "Test file with mock data"
+    }
+  ]
+}
+```
+
+## Report
+
+After triage, provide a summary:
+- Total findings examined
+- True positives count
+- False positives count with breakdown by reason category
+  (test files, sanitized inputs, dead code, etc.)
+
+## Important
+
+- Read actual source code for every finding. Never classify
+  based solely on the rule name or file path.
+- When uncertain, classify as TRUE_POSITIVE. False negatives
+  are worse than false positives in security triage.
+- Process all input JSON files for the language category.
+
+## Full Reference
+
+For the complete triage task prompt template with variable
+substitutions and examples, see:
+`{baseDir}/skills/semgrep/references/triage-task-prompt.md`
diff --git a/plugins/static-analysis/skills/semgrep/SKILL.md b/plugins/static-analysis/skills/semgrep/SKILL.md
index ca9e28f..f91fd8c 100644
--- a/plugins/static-analysis/skills/semgrep/SKILL.md
+++ b/plugins/static-analysis/skills/semgrep/SKILL.md
@@ -241,6 +241,8 @@ Present plan to user with **explicit ruleset listing**:
 - Spawn 3 parallel scan Tasks (Python, JavaScript, Docker)
 - Total rulesets: 9
 - [If Pro] Cross-file taint tracking enabled
+- Scan agent: `static-analysis:semgrep-scanner`
+- Triage agent: `static-analysis:semgrep-triager`
 
 **Want to modify rulesets?** Tell me which to add or remove.
 **Ready to scan?** Say "proceed" or "yes".
@@ -282,6 +284,7 @@ Before marking Step 3 complete, verify:
 - [ ] User given opportunity to modify rulesets
 - [ ] User explicitly approved (quote their confirmation)
 - [ ] **Final ruleset list captured for Step 4**
+- [ ] Agent types listed: `static-analysis:semgrep-scanner` and `static-analysis:semgrep-triager`
 
 ### Step 4: Spawn Parallel Scan Tasks
 
@@ -296,7 +299,7 @@ mkdir -p "$OUTPUT_DIR"
 echo "Output directory: $OUTPUT_DIR"
 ```
 
-**Spawn N Tasks in a SINGLE message** (one per language category) using `subagent_type: Bash`.
+**Spawn N Tasks in a SINGLE message** (one per language category) using `subagent_type: static-analysis:semgrep-scanner`.
 
 Use the scanner task prompt template from [scanner-task-prompt.md]({baseDir}/references/scanner-task-prompt.md).
 
@@ -318,7 +321,7 @@ Spawn these 3 Tasks in a SINGLE message:
 
 ### Step 5: Spawn Parallel Triage Tasks
 
-After scan Tasks complete, spawn triage Tasks using `subagent_type: general-purpose` (triage requires reading code context, not just running commands).
+After scan Tasks complete, spawn triage Tasks using `subagent_type: static-analysis:semgrep-triager` (triage requires reading code context, not just running commands).
 
 Use the triage task prompt template from [triage-task-prompt.md]({baseDir}/references/triage-task-prompt.md).
 
@@ -396,6 +399,17 @@ Results written to:
 3. Triage requires reading code context - parallelized via Tasks
 4. Some false positive patterns require human judgment
 
+## Agents
+
+This plugin provides two specialized agents for the scan and triage phases:
+
+| Agent | Tools | Purpose |
+|-------|-------|---------|
+| `static-analysis:semgrep-scanner` | Bash | Executes parallel semgrep scans for a language category |
+| `static-analysis:semgrep-triager` | Read, Grep, Glob, Write | Classifies findings as true/false positives by reading source context |
+
+Use `subagent_type: static-analysis:semgrep-scanner` in Step 4 and `subagent_type: static-analysis:semgrep-triager` in Step 5 when spawning Task subagents.
+
 ## Rationalizations to Reject
 
 | Shortcut | Why It's Wrong |
diff --git a/plugins/static-analysis/skills/semgrep/references/scanner-task-prompt.md b/plugins/static-analysis/skills/semgrep/references/scanner-task-prompt.md
index e0c7872..c8029c9 100644
--- a/plugins/static-analysis/skills/semgrep/references/scanner-task-prompt.md
+++ b/plugins/static-analysis/skills/semgrep/references/scanner-task-prompt.md
@@ -1,6 +1,6 @@
 # Scanner Subagent Task Prompt
 
-Use this prompt template when spawning scanner Tasks in Step 4. Use `subagent_type: Bash`.
+Use this prompt template when spawning scanner Tasks in Step 4. Use `subagent_type: static-analysis:semgrep-scanner`.
 
 ## Template
 
diff --git a/plugins/static-analysis/skills/semgrep/references/triage-task-prompt.md b/plugins/static-analysis/skills/semgrep/references/triage-task-prompt.md
index 3690f86..a476063 100644
--- a/plugins/static-analysis/skills/semgrep/references/triage-task-prompt.md
+++ b/plugins/static-analysis/skills/semgrep/references/triage-task-prompt.md
@@ -1,6 +1,6 @@
 # Triage Subagent Task Prompt
 
-Use this prompt template when spawning triage Tasks in Step 5. Use `subagent_type: general-purpose`.
+Use this prompt template when spawning triage Tasks in Step 5. Use `subagent_type: static-analysis:semgrep-triager`.
 
 ## Template
 
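
Illustration only, not part of this change: the triage output format added in `agents/semgrep-triager.md` (`file`, `total`, `true_positives`, `false_positives`) lends itself to a small consumer that produces the counts the agent's Report section asks for. The sketch below assumes a triage document shaped exactly like that example. The function name `summarize_triage`, the `unclassified` field, and the sample data are hypothetical and do not appear in the plugin.

```python
from collections import Counter
from typing import Any


def summarize_triage(triage: dict[str, Any]) -> dict[str, Any]:
    """Summarize one triage document shaped like the example in semgrep-triager.md."""
    true_pos = triage.get("true_positives", [])
    false_pos = triage.get("false_positives", [])
    classified = len(true_pos) + len(false_pos)
    # Fall back to the classified count when "total" is absent.
    total = triage.get("total", classified)
    return {
        "total": total,
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        # Hypothetical field: findings counted in "total" but present in neither list.
        "unclassified": max(total - classified, 0),
        # Groups by the free-text reason string; real reason categories
        # (test files, sanitized inputs, dead code) would need a mapping.
        "false_positive_reasons": dict(Counter(f["reason"] for f in false_pos)),
    }


# Hypothetical sample data mirroring the documented format.
sample = {
    "file": "python-example.json",
    "total": 3,
    "true_positives": [
        {"rule": "rule.id.here", "file": "path/to/file.py", "line": 42,
         "reason": "User input in raw SQL without parameterization"},
    ],
    "false_positives": [
        {"rule": "rule.id.here", "file": "tests/test_file.py", "line": 15,
         "reason": "Test file with mock data"},
    ],
}

summary = summarize_triage(sample)
print(summary)
```

A nonzero `unclassified` count is one way a caller might notice that a triage run skipped findings, which the triager's "Process all input JSON files" rule is meant to prevent.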