
Conversation

@bio-boris
Contributor

No description provided.

Copilot AI review requested due to automatic review settings December 15, 2025 22:00
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Copilot AI left a comment

Pull request overview

This PR restructures the TableScanner application by introducing a layered architecture with separate service modules, configuration management, and a more comprehensive search API endpoint.

  • Introduces service layer with modules for MinIO, SQLite, and cache operations
  • Adds configuration management using pydantic-settings for environment variables
  • Replaces GET /search endpoint with a POST endpoint that accepts structured search requests with filtering, ordering, and pagination

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Adds pydantic-settings dependency for configuration management |
| app/config.py | New configuration module using pydantic-settings to manage KB_SERVICE_AUTH_TOKEN and CACHE_DIR settings |
| app/models.py | New Pydantic models defining SearchRequest schema with pangenome_id, table_name, and query parameters |
| app/services/__init__.py | New services module initialization exporting core service functions |
| app/services/cache_service.py | New cache service module with functions for managing local file caching |
| app/services/minio_service.py | New MinIO service stub for downloading binary files from object storage |
| app/services/sqlite_service.py | New SQLite service with conversion and querying functions including flexible filtering and ordering |
| app/main.py | Updates to store settings in app state for access throughout the application |
| app/routes.py | Refactors search endpoint to POST method with integrated workflow for downloading, caching, converting, and querying data |

if op not in allowed_ops:
    raise ValueError(f"Invalid operator: {op}")

where_clauses.append(f"{column} {op} ?")

Copilot AI Dec 15, 2025

The column names from filter_spec are directly interpolated into the SQL query without sanitization. This creates a SQL injection vulnerability. An attacker could pass malicious column names that break out of the query structure. Validate column names against the actual table schema or use a whitelist approach.

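For illustration, a minimal sketch of schema-based column validation, assuming the table name itself has already been checked against a known-good list (the validate_columns helper is hypothetical, not part of this PR):

```python
import sqlite3


def validate_columns(conn: sqlite3.Connection, table_name: str, columns: list[str]) -> None:
    """Hypothetical helper: reject any column not present in the table schema.

    PRAGMA table_info returns one row per column, with the column name in
    the second field. table_name must already be trusted, since PRAGMA
    arguments cannot be bound as query parameters.
    """
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    schema_columns = {row[1] for row in rows}
    unknown = [c for c in columns if c not in schema_columns]
    if unknown:
        raise ValueError(f"Unknown column(s): {', '.join(unknown)}")
```

Deriving the whitelist from PRAGMA table_info keeps it in sync with the actual table instead of relying on a hard-coded list.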
    filters=search_request.filters,
)

#TODO use a return model when we figure out what we want to return

Copilot AI Dec 15, 2025

The TODO comment is missing a space after the hash and a colon after 'TODO'. The standard format is '# TODO:', which keeps it consistent with other TODO comments in the codebase.

Suggested change
- #TODO use a return model when we figure out what we want to return
+ # TODO: use a return model when we figure out what we want to return

if direction not in ["ASC", "DESC"]:
    raise ValueError(f"Invalid direction: {direction}")

order_clauses.append(f"{column} {direction}")

Copilot AI Dec 15, 2025

The column names in order_spec are directly interpolated into the SQL query without sanitization. This creates a SQL injection vulnerability. An attacker could pass malicious column names. Validate column names against the actual table schema or use a whitelist approach.

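As a sketch, the same whitelist idea applied to the ORDER BY path (safe_order_clause is a hypothetical helper; schema_columns would come from PRAGMA table_info as in the previous comment):

```python
def safe_order_clause(schema_columns: set[str], order_by: list[dict]) -> str:
    """Hypothetical helper: build an ORDER BY fragment from known columns only."""
    clauses = []
    for spec in order_by:
        column = spec["column"]
        if column not in schema_columns:
            raise ValueError(f"Unknown column: {column}")
        direction = spec.get("direction", "ASC").upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"Invalid direction: {direction}")
        # Both column and direction are validated, so interpolation is safe here
        clauses.append(f"{column} {direction}")
    return ", ".join(clauses)
```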
Comment on lines +143 to +144
where_clauses.append(f"{column} {op} ?")
params.append(value)

Copilot AI Dec 15, 2025

The IN operator requires special handling with multiple values (e.g., a tuple), but the current implementation treats it like other operators with a single placeholder. When op is "IN" and value is a list, this will fail. The IN operator needs special parameter binding handling or should be removed from the allowed_ops list.

Suggested change
- where_clauses.append(f"{column} {op} ?")
- params.append(value)
+ if op == "IN":
+     if not isinstance(value, (list, tuple)) or not value:
+         raise ValueError(f"Value for IN operator must be a non-empty list or tuple, got: {value}")
+     placeholders = ", ".join(["?"] * len(value))
+     where_clauses.append(f"{column} IN ({placeholders})")
+     params.extend(value)
+ else:
+     where_clauses.append(f"{column} {op} ?")
+     params.append(value)

# Convert to SQLite if not already converted
if not is_cached(sqlite_file_path):
    convert_to_sqlite(cache_file_path, sqlite_file_path)

Copilot AI Dec 15, 2025

The convert_to_sqlite function raises NotImplementedError, but this is called without a try-except block. This will cause the endpoint to fail with an unhandled exception. Either implement the function or add proper error handling to return a meaningful HTTP error response.

Suggested change
- convert_to_sqlite(cache_file_path, sqlite_file_path)
+ try:
+     convert_to_sqlite(cache_file_path, sqlite_file_path)
+ except NotImplementedError:
+     raise HTTPException(
+         status_code=501,
+         detail="The convert_to_sqlite function is not implemented."
+     )

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 18 comments.


"""
headers = {"Authorization": auth_token}
response = requests.get(handle_url, headers=headers)
response.raise_for_status()

Copilot AI Dec 15, 2025

Poor error handling: The raise_for_status() call will raise a generic HTTPError without additional context. Consider catching this exception and raising a more informative error that includes details about what failed (e.g., which URL, what status code) to help with debugging and provide better user feedback.

Suggested change
- response.raise_for_status()
+ try:
+     response.raise_for_status()
+ except requests.HTTPError as e:
+     raise requests.HTTPError(
+         f"Failed to download from {handle_url} (status code: {response.status_code}): {response.text}"
+     ) from e

requests.HTTPError: If download fails
"""
headers = {"Authorization": auth_token}
response = requests.get(handle_url, headers=headers)

Copilot AI Dec 15, 2025

Missing timeout parameter: The requests.get call does not specify a timeout, which could cause the application to hang indefinitely if the remote server is unresponsive. Consider adding a timeout parameter (e.g., timeout=30) to prevent resource exhaustion.

Suggested change
- response = requests.get(handle_url, headers=headers)
+ response = requests.get(handle_url, headers=headers, timeout=30)

Comment on lines +21 to +22
cache_file_path = cache_dir / id / filename
sqlite_file_path = cache_dir / id / f"{Path(filename).stem}.db"

Copilot AI Dec 15, 2025

Path traversal vulnerability: The id and filename parameters are used directly in path construction without validation. An attacker could provide values like "../../../etc/passwd" to access or create files outside the intended cache directory. Consider sanitizing these inputs by removing path separators or validating against a safe pattern.

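A minimal sketch of the resolve-and-compare approach, assuming cache_dir is the trusted root (safe_cache_path is a hypothetical helper; Path.is_relative_to requires Python 3.9+):

```python
from pathlib import Path


def safe_cache_path(cache_dir: Path, id: str, filename: str) -> Path:
    """Hypothetical helper: reject id/filename values that escape cache_dir."""
    candidate = (cache_dir / id / filename).resolve()
    # resolve() collapses any ".." segments, so a traversal attempt
    # lands outside the cache root and fails this containment check
    if not candidate.is_relative_to(cache_dir.resolve()):
        raise ValueError(f"Unsafe path components: {id!r}, {filename!r}")
    return candidate
```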
convert_to_sqlite(cache_file_path, sqlite_file_path)

# Query the SQLite file with parameters
from app.utils.sqlite import get_table_data

Copilot AI Dec 15, 2025

The import for get_table_data should be moved to the top of the file with the other imports from app.utils.sqlite (line 14) for better code organization and consistency. Inline imports should be avoided unless there's a specific reason like circular imports.

Comment on lines +6 to +13
from pydantic import BaseModel


class SearchRequest(BaseModel):
    """Search request with query parameters."""
    pangenome_id: str
    table_name: str
    limit: Optional[int] = None

Copilot AI Dec 15, 2025

Missing validation for limit field: The limit field should have a minimum value (at least 1) and possibly a maximum value to prevent excessive memory usage or denial of service from extremely large result sets. Consider adding Field validators with appropriate constraints.

Suggested change
- from pydantic import BaseModel
- class SearchRequest(BaseModel):
-     """Search request with query parameters."""
-     pangenome_id: str
-     table_name: str
-     limit: Optional[int] = None
+ from pydantic import BaseModel, Field
+ class SearchRequest(BaseModel):
+     """Search request with query parameters."""
+     pangenome_id: str
+     table_name: str
+     limit: Optional[int] = Field(
+         default=None,
+         ge=1,
+         le=1000,
+         description="Maximum number of results to return (min 1, max 1000)"
+     )

Comment on lines +122 to +175
conn = sqlite3.connect(sqlite_file)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()

# Build SELECT query
query = f"SELECT * FROM {table_name}"
params = []

# Add WHERE clause for filters
if filters:
    where_clauses = []
    for filter_spec in filters:
        column = filter_spec["column"]
        op = filter_spec["op"]
        value = filter_spec["value"]

        # Sanitize operator
        allowed_ops = ["=", "!=", "<", ">", "<=", ">=", "LIKE", "IN"]
        if op not in allowed_ops:
            raise ValueError(f"Invalid operator: {op}")

        where_clauses.append(f"{column} {op} ?")
        params.append(value)

    query += " WHERE " + " AND ".join(where_clauses)

# Add ORDER BY clause
if order_by:
    order_clauses = []
    for order_spec in order_by:
        column = order_spec["column"]
        direction = order_spec.get("direction", "ASC").upper()

        if direction not in ["ASC", "DESC"]:
            raise ValueError(f"Invalid direction: {direction}")

        order_clauses.append(f"{column} {direction}")

    query += " ORDER BY " + ", ".join(order_clauses)

# Add LIMIT clause
if limit is not None:
    query += f" LIMIT {int(limit)}"

# Execute query
cursor.execute(query, params)
rows = cursor.fetchall()

# Convert to list of dicts
results = [dict(row) for row in rows]

conn.close()

return results

Copilot AI Dec 15, 2025

Missing resource cleanup on error: If an exception occurs during query execution, the database connection is not properly closed. Consider using a try-finally block or a context manager (with statement) to ensure the connection is always closed, even when errors occur.

Suggested change
- conn = sqlite3.connect(sqlite_file)
- conn.row_factory = sqlite3.Row
- cursor = conn.cursor()
- # Build SELECT query
- query = f"SELECT * FROM {table_name}"
- params = []
- # Add WHERE clause for filters
- if filters:
-     where_clauses = []
-     for filter_spec in filters:
-         column = filter_spec["column"]
-         op = filter_spec["op"]
-         value = filter_spec["value"]
-         # Sanitize operator
-         allowed_ops = ["=", "!=", "<", ">", "<=", ">=", "LIKE", "IN"]
-         if op not in allowed_ops:
-             raise ValueError(f"Invalid operator: {op}")
-         where_clauses.append(f"{column} {op} ?")
-         params.append(value)
-     query += " WHERE " + " AND ".join(where_clauses)
- # Add ORDER BY clause
- if order_by:
-     order_clauses = []
-     for order_spec in order_by:
-         column = order_spec["column"]
-         direction = order_spec.get("direction", "ASC").upper()
-         if direction not in ["ASC", "DESC"]:
-             raise ValueError(f"Invalid direction: {direction}")
-         order_clauses.append(f"{column} {direction}")
-     query += " ORDER BY " + ", ".join(order_clauses)
- # Add LIMIT clause
- if limit is not None:
-     query += f" LIMIT {int(limit)}"
- # Execute query
- cursor.execute(query, params)
- rows = cursor.fetchall()
- # Convert to list of dicts
- results = [dict(row) for row in rows]
- conn.close()
- return results
+ with sqlite3.connect(sqlite_file) as conn:
+     conn.row_factory = sqlite3.Row
+     cursor = conn.cursor()
+     # Build SELECT query
+     query = f"SELECT * FROM {table_name}"
+     params = []
+     # Add WHERE clause for filters
+     if filters:
+         where_clauses = []
+         for filter_spec in filters:
+             column = filter_spec["column"]
+             op = filter_spec["op"]
+             value = filter_spec["value"]
+             # Sanitize operator
+             allowed_ops = ["=", "!=", "<", ">", "<=", ">=", "LIKE", "IN"]
+             if op not in allowed_ops:
+                 raise ValueError(f"Invalid operator: {op}")
+             where_clauses.append(f"{column} {op} ?")
+             params.append(value)
+         query += " WHERE " + " AND ".join(where_clauses)
+     # Add ORDER BY clause
+     if order_by:
+         order_clauses = []
+         for order_spec in order_by:
+             column = order_spec["column"]
+             direction = order_spec.get("direction", "ASC").upper()
+             if direction not in ["ASC", "DESC"]:
+                 raise ValueError(f"Invalid direction: {direction}")
+             order_clauses.append(f"{column} {direction}")
+         query += " ORDER BY " + ", ".join(order_clauses)
+     # Add LIMIT clause
+     if limit is not None:
+         query += f" LIMIT {int(limit)}"
+     # Execute query
+     cursor.execute(query, params)
+     rows = cursor.fetchall()
+     # Convert to list of dicts
+     results = [dict(row) for row in rows]
+     return results

value = filter_spec["value"]

# Sanitize operator
allowed_ops = ["=", "!=", "<", ">", "<=", ">=", "LIKE", "IN"]

Copilot AI Dec 15, 2025

The IN operator is listed as allowed but the current implementation doesn't handle it correctly. The IN operator expects a tuple of values (e.g., "column IN (?, ?, ?)"), but the current code only appends a single placeholder. This will cause SQL syntax errors when the IN operator is used.

Comment on lines +71 to +72
convert_to_sqlite(cache_file_path, sqlite_file_path)


Copilot AI Dec 15, 2025

The convert_to_sqlite function always raises NotImplementedError (as seen in sqlite.py line 48), which will cause the endpoint to fail when attempting to convert uncached files. This should be implemented or the endpoint should handle the NotImplementedError appropriately.

Suggested change
- convert_to_sqlite(cache_file_path, sqlite_file_path)
+ try:
+     convert_to_sqlite(cache_file_path, sqlite_file_path)
+ except NotImplementedError:
+     raise HTTPException(
+         status_code=501,
+         detail="Conversion to SQLite is not implemented yet."
+     )

Comment on lines +88 to +89
"cache_file": str(cache_file_path),
"sqlite_file": str(sqlite_file_path),

Copilot AI Dec 15, 2025

Potential information disclosure: The response exposes internal file system paths (cache_file and sqlite_file). This could reveal sensitive information about the server's directory structure to clients. Consider removing these fields or making them optional for debugging purposes only.

Suggested change
- "cache_file": str(cache_file_path),
- "sqlite_file": str(sqlite_file_path),

KBase Workspace utilities for retrieving object information.
"""

import requests

Copilot AI Dec 15, 2025

Import of 'requests' is not used.

Suggested change
- import requests
