Security fixes #15

bhavishadawada · 2025-10-08T23:28:47Z

Fix Backend security issues

Can be removed.

- Sanitize user-controlled fileName parameter to prevent directory traversal
- Add os.path.basename() to remove path components from filename
- Validate filename contains no '..' or path separators
- Add directory boundary check to ensure resolved path stays within DDA_SYSTEM_FOLDER
- Addresses CodeQL rule py/path-injection
- Prevents attacks like '../../../etc/passwd' from accessing system files

- Add path sanitization using os.path.basename()
- Validate against directory traversal patterns (.., path separators)
- Enforce directory boundary checks within DDA_SYSTEM_FOLDER
- Addresses CodeQL rule py/path-injection
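The basename-based approach described in these commit messages can be sketched as below. sanitize_file_name is a hypothetical helper, and it raises ValueError instead of the endpoint's HTTPException so it runs without FastAPI; /srv/dda in the usage stands in for DDA_SYSTEM_FOLDER as an assumption.

```python
import os


def sanitize_file_name(file_name: str, base_dir: str) -> str:
    """Hypothetical helper: return a path for a bare file name inside base_dir.

    Raises ValueError if the name carries directory components or '..',
    mirroring the checks listed in the commit messages.
    """
    # basename strips any directory components the caller supplied.
    safe_name = os.path.basename(file_name)
    # Reject anything basename had to change, explicit '..' and empty names.
    if not safe_name or safe_name != file_name or ".." in file_name:
        raise ValueError("Invalid file path")
    full_path = os.path.join(base_dir, safe_name)
    # Boundary check: the file must sit directly inside base_dir.
    if os.path.dirname(os.path.abspath(full_path)) != os.path.abspath(base_dir):
        raise ValueError("Access denied")
    return full_path
```

For example, sanitize_file_name("report.json", "/srv/dda") returns /srv/dda/report.json, while "../../../etc/passwd" or "sub/report.json" is rejected. This only permits bare file names, which is the restriction the later Autofix suggestions relax.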

…ction
- Add sanitization for workflowId and captureId parameters
- Validate against directory traversal patterns (.., path separators)
- Use sanitized components in jsonl file path construction
- Addresses CodeQL rule py/path-injection for line 108
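Validating each identifier as a single path component before joining could look like the sketch below. validate_component and jsonl_path are hypothetical names, and the <base_dir>/<workflow_id>/<capture_id>.jsonl layout is an assumption for illustration; the actual file layout is not shown in this PR.

```python
import os


def validate_component(value: str) -> str:
    """Reject a value that could escape its directory when used as one path component."""
    if not value or value in (".", "..") or ".." in value:
        raise ValueError(f"Invalid path component: {value!r}")
    # Check both separators so Windows-style input is also caught on POSIX.
    if "/" in value or "\\" in value:
        raise ValueError(f"Invalid path component: {value!r}")
    return value


def jsonl_path(base_dir: str, workflow_id: str, capture_id: str) -> str:
    # Hypothetical layout: <base_dir>/<workflow_id>/<capture_id>.jsonl
    return os.path.join(
        base_dir,
        validate_component(workflow_id),
        validate_component(capture_id) + ".jsonl",
    )
```

Validating the raw identifiers, rather than the joined path, keeps the error close to the offending parameter.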

src/backend/endpoints/download_file.py

+                detail="Access denied"
+            )
+
+        if not os.path.exists(full_path):


The best practice for managing potentially user-controlled file paths is:

Normalize the complete path after joining the user input to the intended base directory. This step collapses redundant path separators and traversal segments such as '..'. Note that os.path.normpath and os.path.abspath operate only on the path string and do not resolve symlinks; os.path.realpath is needed for that.

Then, after normalization, check that the resulting path is strictly within the intended parent directory using robust containment logic (not a bare startswith, which is error-prone for directories like /tmp/foo and /tmp/foobar).
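The /tmp/foo versus /tmp/foobar pitfall can be reproduced in a few lines. The paths below are illustrative POSIX paths and need not exist on disk.

```python
import os

# Illustrative POSIX paths; neither needs to exist on disk.
root = "/tmp/foo"
sibling = "/tmp/foobar/secret.txt"

# Naive prefix test: compares characters, so the sibling directory "passes".
naive_inside = sibling.startswith(root)

# commonpath compares whole path components, so the shared parent is /tmp,
# which differs from root, and the sibling is correctly rejected.
robust_inside = os.path.commonpath([root, sibling]) == root

print(naive_inside, robust_inside)  # True False
```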

Edits required:

In src/backend/endpoints/download_file.py, lines 216-228:

Remove or replace os.path.basename and associated manual checks.

Use os.path.normpath (and preferably os.path.abspath) after joining the user-supplied path to the base directory.

Ensure the containment check is robust: os.path.commonpath compares whole path components and is available on both Unix and Windows.

Perform the normalized check before any file access.

Add any imports needed (os is already imported).
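The steps listed above can be sketched as a standalone helper. resolve_within is a hypothetical name, and it raises ValueError rather than the HTTPException used in the endpoint so that it runs without FastAPI.

```python
import os


def resolve_within(base_dir: str, user_path: str) -> str:
    """Hypothetical helper: join user_path onto base_dir and return the
    normalized absolute result, or raise ValueError if it escapes base_dir."""
    root = os.path.abspath(base_dir)
    # join + normpath collapses '.', '..' and duplicate separators;
    # abspath anchors a relative base_dir to the current working directory.
    candidate = os.path.abspath(os.path.normpath(os.path.join(root, user_path)))
    # commonpath compares whole components, so /srv/dda2 is not inside /srv/dda.
    if os.path.commonpath([candidate, root]) != root:
        raise ValueError("Invalid or unsafe file path")
    return candidate
```

With /srv/dda as an assumed root, "exports/run1.json" resolves to /srv/dda/exports/run1.json and "a/../b.json" to /srv/dda/b.json, while "../etc/passwd", "/etc/passwd" and "../dda2/x" are rejected.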

src/backend/endpoints/download_file.py

+                detail="File not found",
            )
-        with open(captureIdPath) as json_file:
+        with open(full_path) as json_file:


The suggested fix is to stop relying on os.path.basename and instead normalize the provided path with os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)). After normalization, check that the resulting path is strictly within DDA_SYSTEM_FOLDER by verifying that it starts with the canonical root directory path followed by a path separator, so that a sibling directory whose name merely shares the prefix is not accepted. The validation should be based on canonical (absolute, normalized) paths, not on the user-provided string or its basename.

Update the logic as follows:

Remove safe_path = os.path.basename(captureIdPath) and the related check.

Instead, resolve the candidate file path by normalizing it with os.path.normpath and then making it absolute with os.path.abspath.

Check that the resulting path starts with the canonical root directory followed by os.sep.

Leave all other logic intact.

All changes should be made in the get_inference_result_data_for_retraining function in src/backend/endpoints/download_file.py.
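The startswith-plus-separator check described here can be sketched as follows. is_strictly_inside is a hypothetical helper, and the special case for a root that already ends in a separator (such as "/") is an addition not present in the suggestion.

```python
import os


def is_strictly_inside(root_dir: str, user_path: str) -> bool:
    """Hypothetical helper mirroring the startswith-based variant."""
    root = os.path.abspath(root_dir)
    resolved = os.path.abspath(os.path.normpath(os.path.join(root, user_path)))
    # Require a separator after the root so /srv/dda2/... does not match /srv/dda.
    # A root that already ends in a separator (e.g. "/") is used as-is.
    prefix = root if root.endswith(os.sep) else root + os.sep
    # This also rejects a path that resolves to the root directory itself.
    return resolved.startswith(prefix)
```

One behavioral difference from a commonpath check: commonpath accepts a path that resolves to DDA_SYSTEM_FOLDER itself (for example "."), which would then fail later when open() is called on a directory, whereas this version rejects it up front.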

bhavishadawada added 5 commits October 8, 2025 18:48

Fix information exposure through exception

9d65707

Too few arguments to formatting function

d450881

Can be removed.

bhavishadawada requested a review from rvanderwerf October 8, 2025 23:28

github-advanced-security bot found potential problems Oct 8, 2025


@@ -212,30 +212,22 @@
                 # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk
                 # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk
                 if captureIdPath:
-                    # Sanitize path to prevent directory traversal
-                    safe_path = os.path.basename(captureIdPath)
-                    if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path:
+                    # Normalize path and ensure it is descendant of the allowed root
+                    candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath))
+                    abs_candidate_path = os.path.abspath(candidate_path)
+                    dda_system_folder_abs = os.path.abspath(DDA_SYSTEM_FOLDER)
+                    # Strict directory containment check using commonpath
+                    if os.path.commonpath([abs_candidate_path, dda_system_folder_abs]) != dda_system_folder_abs:
                         raise HTTPException(
                             status_code=400,
-                            detail="Invalid file path"
+                            detail="Invalid or unsafe file path"
                         )
-                    # Construct full path within allowed directory
-                    full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path)
-                    # Ensure resolved path stays within DDA_SYSTEM_FOLDER
-                    if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)):
+                    if not os.path.exists(abs_candidate_path):
                         raise HTTPException(
-                            status_code=400,
-                            detail="Access denied"
-                        )
-                    if not os.path.exists(full_path):
-                        raise HTTPException(
                             status_code=HTTP_404_NOT_FOUND,
                             detail="File not found",
                         )
-                    with open(full_path) as json_file:
+                    with open(abs_candidate_path) as json_file:
                         inference_result_data_list = json.load(json_file)
                 else:
                     # Regular case, query the data ourselves
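One case this hunk handles implicitly: if captureIdPath is an absolute path, os.path.join drops DDA_SYSTEM_FOLDER entirely. The sketch below uses POSIX paths, with /srv/dda standing in for DDA_SYSTEM_FOLDER as an assumption, and shows that the commonpath check still rejects the input because it runs on the joined result rather than the raw string.

```python
import os

base = "/srv/dda"  # stand-in for DDA_SYSTEM_FOLDER (assumption)

# os.path.join discards earlier components when a later one is absolute.
joined = os.path.join(base, "/etc/passwd")
print(joined)  # /etc/passwd

root = os.path.abspath(base)
candidate = os.path.abspath(os.path.normpath(joined))
# The common prefix is "/", not root, so the containment check fails.
inside = os.path.commonpath([candidate, root]) == root
print(inside)  # False
```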

@@ -212,30 +212,21 @@
                 # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk
                 # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk
                 if captureIdPath:
-                    # Sanitize path to prevent directory traversal
-                    safe_path = os.path.basename(captureIdPath)
-                    if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path:
+                    # Sanitize and validate path to prevent directory traversal
+                    candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath))
+                    root_folder = os.path.abspath(DDA_SYSTEM_FOLDER)
+                    resolved_path = os.path.abspath(candidate_path)
+                    if not resolved_path.startswith(root_folder + os.sep):
                         raise HTTPException(
                             status_code=400,
-                            detail="Invalid file path"
+                            detail="Invalid or disallowed file path"
                         )
-                    # Construct full path within allowed directory
-                    full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path)
-                    # Ensure resolved path stays within DDA_SYSTEM_FOLDER
-                    if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)):
+                    if not os.path.exists(resolved_path):
                         raise HTTPException(
-                            status_code=400,
-                            detail="Access denied"
-                        )
-                    if not os.path.exists(full_path):
-                        raise HTTPException(
                             status_code=HTTP_404_NOT_FOUND,
                             detail="File not found",
                         )
-                    with open(full_path) as json_file:
+                    with open(resolved_path) as json_file:
                         inference_result_data_list = json.load(json_file)
                 else:
                     # Regular case, query the data ourselves
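Both suggested hunks rely on os.path.normpath and os.path.abspath, which work only on the path string. If a symlink inside DDA_SYSTEM_FOLDER could point elsewhere (an assumption about the deployment, not something the PR states), the lexical check passes even though the file lives outside the root. The sketch compares a lexical check with a variant using os.path.realpath; the realpath variant is a swapped-in alternative, not part of either hunk, and creating the symlink assumes a POSIX system.

```python
import os
import tempfile


def inside_by_abspath(root: str, path: str) -> bool:
    # Lexical check, as in both hunks: symlinks are not followed.
    root = os.path.abspath(root)
    return os.path.commonpath([os.path.abspath(path), root]) == root


def inside_by_realpath(root: str, path: str) -> bool:
    # Alternative: resolve symlinks on both sides before comparing.
    root = os.path.realpath(root)
    return os.path.commonpath([os.path.realpath(path), root]) == root


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "dda")
    os.mkdir(root)
    outside = os.path.join(tmp, "outside.json")
    with open(outside, "w") as f:
        f.write("{}")
    # A link that sits inside root but points to a file outside it.
    link = os.path.join(root, "link.json")
    os.symlink(outside, link)
    lexical = inside_by_abspath(root, link)
    resolved = inside_by_realpath(root, link)

print(lexical, resolved)  # True False
```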
