@rodrigotoledo

Summary

This PR introduces a new score_threshold parameter to improve the quality of similarity search results by filtering out irrelevant matches based on similarity scores.

Changes

  • Feature: Added score_threshold parameter to similarity_search, similarity_search_by_vector, and ask methods in the Pgvector adapter
  • Feature: Implemented Ruby-side filtering to avoid issues with virtual columns when using score thresholds
  • Testing: Added comprehensive test coverage for the new score_threshold functionality and ask method
  • Version: Bumped version to 0.1.13 with updated CHANGELOG.md

Technical Details

The score_threshold parameter sets the maximum distance a result may have and still be included in search results; a lower distance means a closer match. When provided, the system:

  1. Fetches more candidates than requested (k + 5)
  2. Filters results in Ruby based on neighbor_distance <= score_threshold
  3. Returns the top k filtered results in the correct order
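
The three steps above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the adapter's actual code: Candidate, filter_by_score_threshold, SAMPLE_CANDIDATES, and SAMPLE_FETCH are hypothetical names, and the lambda stands in for the database query that returns neighbors ordered by ascending distance.

```ruby
# Hypothetical stand-in for a record returned by a nearest-neighbor query.
Candidate = Struct.new(:id, :neighbor_distance)

# Hypothetical helper illustrating the over-fetch / filter / truncate flow.
# `fetch` is any callable that returns the `limit` nearest candidates,
# ordered by ascending distance.
def filter_by_score_threshold(fetch, k:, score_threshold: nil)
  # Without a threshold, behave like a plain top-k search.
  return fetch.call(k) if score_threshold.nil?

  candidates = fetch.call(k + 5) # 1. fetch more candidates than requested
  candidates
    .select { |c| c.neighbor_distance <= score_threshold } # 2. filter in Ruby
    .first(k) # 3. keep at most k results; the original order is preserved
end

# Illustrative data only; the distances are made up.
SAMPLE_CANDIDATES = [
  Candidate.new(4, 0.70),
  Candidate.new(1, 0.10),
  Candidate.new(3, 0.45),
  Candidate.new(5, 0.90),
  Candidate.new(2, 0.30)
].freeze

# Stand-in for the database query: nearest first, limited to `limit` rows.
SAMPLE_FETCH = ->(limit) { SAMPLE_CANDIDATES.sort_by(&:neighbor_distance).first(limit) }

result = filter_by_score_threshold(SAMPLE_FETCH, k: 2, score_threshold: 0.5)
puts result.map(&:id).inspect
```

Because filtering only removes candidates, a strict threshold can leave fewer than k results even after the over-fetch.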

Usage Example

# Filter out results whose distance is greater than 0.5 (less similar)
Recipe.similarity_search("chocolate cake", k: 10, score_threshold: 0.5)
Recipe.ask("How to make chocolate cake?", k: 5, score_threshold: 0.3)

- Add score_threshold parameter to similarity_search, similarity_search_by_vector, and ask methods
- Improve test coverage with comprehensive tests for score_threshold functionality and the ask method
- Add exclusion for .gem files to prevent a circular reference error
- Fix the 'contains itself' validation error when building/installing the gem