-
Notifications
You must be signed in to change notification settings - Fork 1
Sanitize txtai metadata persistence #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize txtai metadata persistence #3
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces SQL injection protection for the txtai vector store implementation by implementing comprehensive input sanitization. The changes ensure that user-supplied query strings, filter keys, and filter values cannot be used to inject malicious SQL code.
Key changes:
- Added sanitization functions (
_sanit_sql,_sanit_field,_sanit_meta_dict,_sanit_meta_value) to clean potentially dangerous characters from SQL inputs - Updated
FakeEmbeddingstest utility to support metadata filtering and track generated SQL queries - Added comprehensive test coverage for SQL safety with edge cases including special characters, injection attempts, and query length limits
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pave/stores/txtai_store.py | Core implementation of SQL sanitization methods and integration into query building and metadata handling |
| tests/utils.py | Enhanced FakeEmbeddings to parse and apply SQL filters for realistic testing of sanitization logic |
| tests/test_txtai_store_sql_safety.py | New test suite validating SQL injection protection across various attack vectors |
| tests/conftest.py | Added txtai module stub to allow tests to run without optional txtai dependency installed |
Comments suppressed due to low confidence (1)
pave/stores/txtai_store.py:523
- Extra space before parenthesis in
isinstance (txt, dict). Should beisinstance(txt, dict)for consistency with Python style conventions.
"text": txt.get("text") if isinstance (txt, dict) else txt,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@rodrigopitanga I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@rodrigopitanga I've opened a new pull request, #5, to work on those changes. Once the pull request is ready, I'll request review from you. |
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_69093a475d08832c840db7f13a54e5b9