Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 18 additions & 11 deletions src/mycoder/triage_agent.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import logging
import os
import sys
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This re import becomes unused if the suggested refactoring of the JSON parsing logic is adopted, as that suggestion replaces the regular expression with a more robust bracket-balancing algorithm. To keep the codebase clean, please consider removing this import along with that change.

from typing import Any, Dict, List

try:
Expand Down Expand Up @@ -44,7 +45,7 @@
1. **Strict JSON Only:** Your final output must be **only** the JSON array. No markdown, no "Here is the JSON", no fluff.
2. **Label Discipline:** Use ONLY the labels provided in `{available_labels}`. Do not hallucinate new labels.
3. **Variable Safety:** Reference variables strictly.
4. **No Command Injection:** Do not use command substitution `$()` in generated output.
4. **No Command Injection:** Do not use command substitution `$()` in generated shell commands.

## Input Data

Expand Down Expand Up @@ -79,7 +80,7 @@

## Output Specification

Write a JSON array to the output. Format:
Write a JSON array to the output file. Format:

```json
[
Expand All @@ -94,8 +95,8 @@
"explanation": "Request to beautify logging. Low priority per Goat Principle (current logs are ugly but functional)."
}}
]
```
"""
Final Command Construction
Generate the final shell command to write the JSON to the environment variable. Ensure the JSON string is single-quoted to handle special characters correctly. """
Comment on lines +98 to +99
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The updated system prompt instructs the LLM to generate a shell command that includes triaged issue data. This is a dangerous pattern because the issue data (e.g., explanations) is untrusted and can be crafted by an attacker to perform command injection. The prompt's instruction to 'Ensure the JSON string is single-quoted' is insufficient to prevent injection, as an attacker can include a single quote in the input to close the string and append malicious commands (e.g., '; rm -rf /; '). While the current script attempts to strip this command using a regex, the inclusion of this instruction in the prompt is a significant security risk, especially if the prompt is reused in contexts where the output is executed or if the parsing logic fails.



async def triage_issues_with_llm(
Expand Down Expand Up @@ -195,13 +196,19 @@ async def triage_issues_with_llm(
# 5. Parse JSON
content = response.content.strip()

# Strip Markdown code blocks if present
if content.startswith("```json"):
content = content[7:]
if content.startswith("```"):
content = content[3:]
if content.endswith("```"):
content = content[:-3]
# Attempt to extract JSON array using regex if markdown or extra text is present
# Matches [...] with DOTALL
json_match = re.search(r"\[.*\]", content, re.DOTALL)
if json_match:
content = json_match.group(0)
Comment on lines +201 to +203
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The current parsing logic uses a greedy regular expression r"[.*]" which can lead to JSONDecodeError and is susceptible to denial-of-service attacks. If the LLM's response contains brackets outside the intended JSON array, this regex will incorrectly capture extra text, causing json.loads() to fail. An attacker could exploit this by crafting input with brackets to disrupt the triage agent's parsing. A more robust, non-greedy approach is recommended to accurately extract the JSON array.

    # Use a non-greedy match to find the first JSON-like array structure
    json_match = re.search(r"[\s*\{.*?\}\s*]", content, re.DOTALL)

else:
# Fallback to simple stripping if regex fails (e.g. no brackets)
if content.startswith("```json"):
content = content[7:]
if content.startswith("```"):
content = content[3:]
if content.endswith("```"):
content = content[:-3]

content = content.strip()

Expand Down
4 changes: 2 additions & 2 deletions tests/unit/test_triage_agent.py
Original file line number Diff line number Diff line change
Expand Up @@ -203,9 +203,9 @@ async def mock_query(prompt, **kwargs):
self.assertIn(
"Functionality > Aesthetics", prompt_sent
) # Check for Goat Principle
self.assertNotIn(
self.assertIn(
"Final Command Construction", prompt_sent
) # Check for sanitized prompt
) # Check for command construction instructions
self.assertIn(github_env_val, prompt_sent)


Expand Down
Loading