Skip to content

Conversation

@rodrigopitanga
Copy link
Owner

Summary

  • sanitize txtai metadata keys and values before persisting them, reuse the sanitized copy for indexing, and harden post-filter matching
  • extend the sql safety regression to assert sanitized metadata storage/return values and align the fake embeddings filter expectations

Testing

  • pytest tests/test_txtai_store_sql_safety.py

https://chatgpt.com/codex/tasks/task_e_69093a475d08832c840db7f13a54e5b9

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces SQL injection protection for the txtai vector store implementation by implementing comprehensive input sanitization. The changes ensure that user-supplied query strings, filter keys, and filter values cannot be used to inject malicious SQL code.

Key changes:

  • Added sanitization functions (_sanit_sql, _sanit_field, _sanit_meta_dict, _sanit_meta_value) to clean potentially dangerous characters from SQL inputs
  • Updated FakeEmbeddings test utility to support metadata filtering and track generated SQL queries
  • Added comprehensive test coverage for SQL safety with edge cases including special characters, injection attempts, and query length limits

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
pave/stores/txtai_store.py Core implementation of SQL sanitization methods and integration into query building and metadata handling
tests/utils.py Enhanced FakeEmbeddings to parse and apply SQL filters for realistic testing of sanitization logic
tests/test_txtai_store_sql_safety.py New test suite validating SQL injection protection across various attack vectors
tests/conftest.py Added txtai module stub to allow tests to run without optional txtai dependency installed
Comments suppressed due to low confidence (1)

pave/stores/txtai_store.py:523

  • Extra space before parenthesis in isinstance (txt, dict). Should be isinstance(txt, dict) for consistency with Python style conventions.
                "text": txt.get("text") if isinstance (txt, dict) else txt,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI commented Nov 5, 2025

@rodrigopitanga I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link

Copilot AI commented Nov 5, 2025

@rodrigopitanga I've opened a new pull request, #5, to work on those changes. Once the pull request is ready, I'll request review from you.

@rodrigopitanga rodrigopitanga merged commit c014c40 into main Nov 5, 2025
6 checks passed
@rodrigopitanga rodrigopitanga deleted the codex/fix-sql-query-escaping-and-add-tests branch November 5, 2025 01:08
@rodrigopitanga rodrigopitanga added the enhancement New feature or request label Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

codex enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants