-
Notifications
You must be signed in to change notification settings - Fork 2
Security fixes #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Security fixes #15
Conversation
Can be removed.
- Sanitize user-controlled fileName parameter to prevent directory traversal - Add os.path.basename() to remove path components from filename - Validate filename contains no '..' or path separators - Add directory boundary check to ensure resolved path stays within DDA_SYSTEM_FOLDER - Addresses CodeQL rule py/path-injection - Prevents attacks like '../../../etc/passwd' from accessing system files
- Add path sanitization using os.path.basename() - Validate against directory traversal patterns (.., path separators) - Enforce directory boundary checks within DDA_SYSTEM_FOLDER - Addresses CodeQL rule py/path-injection
…ction - Add sanitization for workflowId and captureId parameters - Validate against directory traversal patterns (.., path separators) - Use sanitized components in jsonl file path construction - Addresses CodeQL rule py/path-injection for line 108
| detail="Access denied" | ||
| ) | ||
|
|
||
| if not os.path.exists(full_path): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
The best practice for managing potentially user-controlled file paths is:
- Normalize (canonicalize) the complete path after joining the user input to the intended base directory. This step removes symlinks, redundant path separators, and traversal segments like
... - Then, check after normalization that the resulting path is strictly within the intended parent directory using robust containment logic (not just
startswith, which is error-prone for directories like/tmp/fooand/tmp/foobar).
Edits required:
- In
src/backend/endpoints/download_file.py, lines 216-228:- Remove or replace
os.path.basenameand associated manual checks. - Use
os.path.normpath(and preferablyos.path.abspath) after joining the user-supplied path to the base directory. - Ensure the containment check is robust: on Unix, this is often done with
os.path.commonpath. - Use the normalized check before file access.
- Remove or replace
- Add any imports needed (
osis already imported).
-
Copy modified lines R215-R220 -
Copy modified line R223 -
Copy modified line R225 -
Copy modified line R230
| @@ -212,30 +212,22 @@ | ||
| # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk | ||
| # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk | ||
| if captureIdPath: | ||
| # Sanitize path to prevent directory traversal | ||
| safe_path = os.path.basename(captureIdPath) | ||
| if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path: | ||
| # Normalize path and ensure it is descendant of the allowed root | ||
| candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)) | ||
| abs_candidate_path = os.path.abspath(candidate_path) | ||
| dda_system_folder_abs = os.path.abspath(DDA_SYSTEM_FOLDER) | ||
| # Strict directory containment check using commonpath | ||
| if os.path.commonpath([abs_candidate_path, dda_system_folder_abs]) != dda_system_folder_abs: | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Invalid file path" | ||
| detail="Invalid or unsafe file path" | ||
| ) | ||
|
|
||
| # Construct full path within allowed directory | ||
| full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path) | ||
|
|
||
| # Ensure resolved path stays within DDA_SYSTEM_FOLDER | ||
| if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)): | ||
| if not os.path.exists(abs_candidate_path): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Access denied" | ||
| ) | ||
|
|
||
| if not os.path.exists(full_path): | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail="File not found", | ||
| ) | ||
| with open(full_path) as json_file: | ||
| with open(abs_candidate_path) as json_file: | ||
| inference_result_data_list = json.load(json_file) | ||
| else: | ||
| # Regular case, query the data ourselves |
| detail="File not found", | ||
| ) | ||
| with open(captureIdPath) as json_file: | ||
| with open(full_path) as json_file: |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
The optimal fix is to eliminate reliance on os.path.basename and instead normalize the provided path by using os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)). After normalization, check that the resultant path is strictly within DDA_SYSTEM_FOLDER by verifying it starts with the canonical root directory path. The validation should be based on canonical (absolute, normalized) paths, not just the user-provided string or basename.
Update the logic as follows:
- Remove
safe_path = os.path.basename(captureIdPath)and the related check. - Instead, resolve the candidate file path by normalizing with
os.path.normpathand then converting toos.path.abspath. - Check that the resultant path starts with the canonical root directory.
- Leave all other logic intact.
All changes should be made in the get_inference_result_data_for_retraining function in src/backend/endpoints/download_file.py.
-
Copy modified lines R215-R219 -
Copy modified line R222 -
Copy modified line R224 -
Copy modified line R229
| @@ -212,30 +212,21 @@ | ||
| # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk | ||
| # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk | ||
| if captureIdPath: | ||
| # Sanitize path to prevent directory traversal | ||
| safe_path = os.path.basename(captureIdPath) | ||
| if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path: | ||
| # Sanitize and validate path to prevent directory traversal | ||
| candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)) | ||
| root_folder = os.path.abspath(DDA_SYSTEM_FOLDER) | ||
| resolved_path = os.path.abspath(candidate_path) | ||
| if not resolved_path.startswith(root_folder + os.sep): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Invalid file path" | ||
| detail="Invalid or disallowed file path" | ||
| ) | ||
|
|
||
| # Construct full path within allowed directory | ||
| full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path) | ||
|
|
||
| # Ensure resolved path stays within DDA_SYSTEM_FOLDER | ||
| if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)): | ||
| if not os.path.exists(resolved_path): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Access denied" | ||
| ) | ||
|
|
||
| if not os.path.exists(full_path): | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail="File not found", | ||
| ) | ||
| with open(full_path) as json_file: | ||
| with open(resolved_path) as json_file: | ||
| inference_result_data_list = json.load(json_file) | ||
| else: | ||
| # Regular case, query the data ourselves |
Fix Backend security issues