Skip to content

Conversation

@bhavishadawada
Copy link
Contributor

Fix Backend security issues

- Sanitize user-controlled fileName parameter to prevent directory traversal
- Add os.path.basename() to remove path components from filename
- Validate filename contains no '..' or path separators
- Add directory boundary check to ensure resolved path stays within DDA_SYSTEM_FOLDER
- Addresses CodeQL rule py/path-injection
- Prevents attacks like '../../../etc/passwd' from accessing system files
- Add path sanitization using os.path.basename()
- Validate against directory traversal patterns (.., path separators)
- Enforce directory boundary checks within DDA_SYSTEM_FOLDER
- Addresses CodeQL rule py/path-injection
…ction

- Add sanitization for workflowId and captureId parameters
- Validate against directory traversal patterns (.., path separators)
- Use sanitized components in jsonl file path construction
- Addresses CodeQL rule py/path-injection for line 108
detail="Access denied"
)

if not os.path.exists(full_path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 4 months ago

The best practice for managing potentially user-controlled file paths is:

  • Normalize (canonicalize) the complete path after joining the user input to the intended base directory. This step removes symlinks, redundant path separators, and traversal segments like ...
  • Then, check after normalization that the resulting path is strictly within the intended parent directory using robust containment logic (not just startswith, which is error-prone for directories like /tmp/foo and /tmp/foobar).

Edits required:

  • In src/backend/endpoints/download_file.py, lines 216-228:
    • Remove or replace os.path.basename and associated manual checks.
    • Use os.path.normpath (and preferably os.path.abspath) after joining the user-supplied path to the base directory.
    • Ensure the containment check is robust: on Unix, this is often done with os.path.commonpath.
    • Use the normalized check before file access.
  • Add any imports needed (os is already imported).

Suggested changeset 1
src/backend/endpoints/download_file.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/backend/endpoints/download_file.py b/src/backend/endpoints/download_file.py
--- a/src/backend/endpoints/download_file.py
+++ b/src/backend/endpoints/download_file.py
@@ -212,30 +212,22 @@
     # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk
     # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk
     if captureIdPath:
-        # Sanitize path to prevent directory traversal
-        safe_path = os.path.basename(captureIdPath)
-        if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path:
+        # Normalize path and ensure it is descendant of the allowed root
+        candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath))
+        abs_candidate_path = os.path.abspath(candidate_path)
+        dda_system_folder_abs = os.path.abspath(DDA_SYSTEM_FOLDER)
+        # Strict directory containment check using commonpath
+        if os.path.commonpath([abs_candidate_path, dda_system_folder_abs]) != dda_system_folder_abs:
             raise HTTPException(
                 status_code=400,
-                detail="Invalid file path"
+                detail="Invalid or unsafe file path"
             )
-        
-        # Construct full path within allowed directory
-        full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path)
-        
-        # Ensure resolved path stays within DDA_SYSTEM_FOLDER
-        if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)):
+        if not os.path.exists(abs_candidate_path):
             raise HTTPException(
-                status_code=400,
-                detail="Access denied"
-            )
-        
-        if not os.path.exists(full_path):
-            raise HTTPException(
                 status_code=HTTP_404_NOT_FOUND,
                 detail="File not found",
             )
-        with open(full_path) as json_file:
+        with open(abs_candidate_path) as json_file:
             inference_result_data_list = json.load(json_file)
     else:
         # Regular case, query the data ourselves
EOF
@@ -212,30 +212,22 @@
# First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk
# Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk
if captureIdPath:
# Sanitize path to prevent directory traversal
safe_path = os.path.basename(captureIdPath)
if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path:
# Normalize path and ensure it is descendant of the allowed root
candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath))
abs_candidate_path = os.path.abspath(candidate_path)
dda_system_folder_abs = os.path.abspath(DDA_SYSTEM_FOLDER)
# Strict directory containment check using commonpath
if os.path.commonpath([abs_candidate_path, dda_system_folder_abs]) != dda_system_folder_abs:
raise HTTPException(
status_code=400,
detail="Invalid file path"
detail="Invalid or unsafe file path"
)

# Construct full path within allowed directory
full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path)

# Ensure resolved path stays within DDA_SYSTEM_FOLDER
if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)):
if not os.path.exists(abs_candidate_path):
raise HTTPException(
status_code=400,
detail="Access denied"
)

if not os.path.exists(full_path):
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail="File not found",
)
with open(full_path) as json_file:
with open(abs_candidate_path) as json_file:
inference_result_data_list = json.load(json_file)
else:
# Regular case, query the data ourselves
Copilot is powered by AI and may make mistakes. Always verify output.
detail="File not found",
)
with open(captureIdPath) as json_file:
with open(full_path) as json_file:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 4 months ago

The optimal fix is to eliminate reliance on os.path.basename and instead normalize the provided path by using os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)). After normalization, check that the resultant path is strictly within DDA_SYSTEM_FOLDER by verifying it starts with the canonical root directory path. The validation should be based on canonical (absolute, normalized) paths, not just the user-provided string or basename.

Update the logic as follows:

  • Remove safe_path = os.path.basename(captureIdPath) and the related check.
  • Instead, resolve the candidate file path by normalizing with os.path.normpath and then converting to os.path.abspath.
  • Check that the resultant path starts with the canonical root directory.
  • Leave all other logic intact.

All changes should be made in the get_inference_result_data_for_retraining function in src/backend/endpoints/download_file.py.


Suggested changeset 1
src/backend/endpoints/download_file.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/backend/endpoints/download_file.py b/src/backend/endpoints/download_file.py
--- a/src/backend/endpoints/download_file.py
+++ b/src/backend/endpoints/download_file.py
@@ -212,30 +212,21 @@
     # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk
     # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk
     if captureIdPath:
-        # Sanitize path to prevent directory traversal
-        safe_path = os.path.basename(captureIdPath)
-        if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path:
+        # Sanitize and validate path to prevent directory traversal
+        candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath))
+        root_folder = os.path.abspath(DDA_SYSTEM_FOLDER)
+        resolved_path = os.path.abspath(candidate_path)
+        if not resolved_path.startswith(root_folder + os.sep):
             raise HTTPException(
                 status_code=400,
-                detail="Invalid file path"
+                detail="Invalid or disallowed file path"
             )
-        
-        # Construct full path within allowed directory
-        full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path)
-        
-        # Ensure resolved path stays within DDA_SYSTEM_FOLDER
-        if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)):
+        if not os.path.exists(resolved_path):
             raise HTTPException(
-                status_code=400,
-                detail="Access denied"
-            )
-        
-        if not os.path.exists(full_path):
-            raise HTTPException(
                 status_code=HTTP_404_NOT_FOUND,
                 detail="File not found",
             )
-        with open(full_path) as json_file:
+        with open(resolved_path) as json_file:
             inference_result_data_list = json.load(json_file)
     else:
         # Regular case, query the data ourselves
EOF
@@ -212,30 +212,21 @@
# First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk
# Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk
if captureIdPath:
# Sanitize path to prevent directory traversal
safe_path = os.path.basename(captureIdPath)
if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path:
# Sanitize and validate path to prevent directory traversal
candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath))
root_folder = os.path.abspath(DDA_SYSTEM_FOLDER)
resolved_path = os.path.abspath(candidate_path)
if not resolved_path.startswith(root_folder + os.sep):
raise HTTPException(
status_code=400,
detail="Invalid file path"
detail="Invalid or disallowed file path"
)

# Construct full path within allowed directory
full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path)

# Ensure resolved path stays within DDA_SYSTEM_FOLDER
if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)):
if not os.path.exists(resolved_path):
raise HTTPException(
status_code=400,
detail="Access denied"
)

if not os.path.exists(full_path):
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail="File not found",
)
with open(full_path) as json_file:
with open(resolved_path) as json_file:
inference_result_data_list = json.load(json_file)
else:
# Regular case, query the data ourselves
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant