Add similarity search functions for tasks and tickets #166
dolliecoder wants to merge 1 commit into AOSSIE-Org:main
📝 Walkthrough
This pull request adds two PostgreSQL PL/pgSQL functions for semantic similarity search.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@supabase/migrations/20251021110000_task_ticket_vector_search.sql`:
- Around line 17-20: Add HNSW vector indexes so the <=> similarity ORDER BY on
description_embedding uses an index: create IF NOT EXISTS idx_tasks_embedding on
tasks using hnsw for column description_embedding with vector_cosine_ops, and
create IF NOT EXISTS idx_tickets_embedding on tickets similarly for
description_embedding; ensure the new indexes are applied before running
similarity queries (also add the same index when you see other queries ordering
by description_embedding <=> query_embedding).
🧹 Nitpick comments (1)
supabase/migrations/20251021110000_task_ticket_vector_search.sql (1)
1-23: Consider LANGUAGE sql and marking as STABLE.
Since the function body is a single RETURN QUERY SELECT, PL/pgSQL is unnecessary overhead; plain LANGUAGE sql avoids the PL/pgSQL executor layer. Additionally, these functions have no side effects and return consistent results for the same inputs within a transaction, so marking them STABLE lets the planner optimize repeated calls.
♻️ Suggested diff
 CREATE OR REPLACE FUNCTION get_similar_tasks(
   query_embedding vector(768),
   match_count INT DEFAULT 3
 )
 RETURNS TABLE (
   task_id UUID,
   title TEXT,
   description TEXT,
   similarity FLOAT
-) AS $$
-BEGIN
-  RETURN QUERY
+) LANGUAGE sql STABLE AS $$
   SELECT
     t.id AS task_id,
     t.title,
     t.description,
     1 - (t.description_embedding <=> query_embedding) AS similarity
   FROM tasks t
   WHERE t.description_embedding IS NOT NULL
   ORDER BY t.description_embedding <=> query_embedding
   LIMIT match_count;
-END;
-$$ LANGUAGE plpgsql;
+$$;
Same applies to get_similar_tickets below.
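For reference, the same suggestion applied to get_similar_tickets might look like the sketch below. The migration's actual tickets function is not shown in this thread, so the return column name ticket_id and the tickets columns id, title, and description are assumptions mirrored from the tasks version; only description_embedding vector(768) is confirmed by the PR description.

```sql
-- Sketch only: assumes tickets has id UUID, title TEXT, description TEXT,
-- plus the description_embedding vector(768) column introduced in PR1.
-- The ticket_id output name is hypothetical, mirrored from task_id.
CREATE OR REPLACE FUNCTION get_similar_tickets(
  query_embedding vector(768),
  match_count INT DEFAULT 3
)
RETURNS TABLE (
  ticket_id UUID,
  title TEXT,
  description TEXT,
  similarity FLOAT
)
LANGUAGE sql STABLE AS $$
  SELECT
    t.id AS ticket_id,
    t.title,
    t.description,
    -- <=> is cosine distance, so 1 - distance gives cosine similarity
    1 - (t.description_embedding <=> query_embedding) AS similarity
  FROM tickets t
  WHERE t.description_embedding IS NOT NULL
  ORDER BY t.description_embedding <=> query_embedding
  LIMIT match_count;
$$;
```

Column references stay qualified with t. so they cannot clash with the output column names declared in RETURNS TABLE.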
1 - (t.description_embedding <=> query_embedding) AS similarity
FROM tasks t
WHERE t.description_embedding IS NOT NULL
ORDER BY t.description_embedding <=> query_embedding
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the migration file
find . -name "*20251021110000_task_ticket_vector_search.sql" -type f
Repository: AOSSIE-Org/Ell-ena
Length of output: 128
🏁 Script executed:
#!/bin/bash
# Check all SQL files for vector index creation patterns
rg -n 'hnsw|ivfflat|vector_cosine_ops|vector_l2_ops' --type sql
Repository: AOSSIE-Org/Ell-ena
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Also search for CREATE INDEX with vector patterns
rg -n 'CREATE INDEX.*embedding|CREATE INDEX.*vector' --type sql -i
Repository: AOSSIE-Org/Ell-ena
Length of output: 44
🏁 Script executed:
#!/bin/bash
# List all migration files to understand the structure
find . -path "*/migrations/*.sql" -type f | sort
Repository: AOSSIE-Org/Ell-ena
Length of output: 743
🏁 Script executed:
#!/bin/bash
# Read the migration file to see its full content
cat -n ./supabase/migrations/20251021110000_task_ticket_vector_search.sql
Repository: AOSSIE-Org/Ell-ena
Length of output: 1518
🏁 Script executed:
#!/bin/bash
# Also check the meeting vector search migration for comparison
cat -n ./supabase/migrations/20251021090000_meeting_vector_search.sql
Repository: AOSSIE-Org/Ell-ena
Length of output: 4913
Add vector indexes for efficient similarity queries.
The <=> operator in both ORDER BY clauses will fall back to sequential scans without indexes. Add HNSW indexes to enable efficient vector search:
CREATE INDEX IF NOT EXISTS idx_tasks_embedding
ON tasks USING hnsw (description_embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS idx_tickets_embedding
ON tickets USING hnsw (description_embedding vector_cosine_ops);
This is critical once the tables grow beyond a few thousand rows. If PR1 creates these indexes separately, this reminder can be disregarded.
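One way to check whether such an index is actually picked up is to inspect the query plan. The sketch below assumes the idx_tasks_embedding index from the suggestion above has been created; the constant 768-dimensional vector is only a placeholder for a real query embedding.

```sql
-- Sketch only: inspect the plan for the same ORDER BY ... <=> ... LIMIT shape
-- used inside get_similar_tasks. The array_fill vector is a placeholder.
EXPLAIN
SELECT t.id
FROM tasks t
WHERE t.description_embedding IS NOT NULL
ORDER BY t.description_embedding <=> array_fill(0.1::real, ARRAY[768])::vector(768)
LIMIT 3;
```

A plan that mentions an index scan on idx_tasks_embedding suggests the index is in use, while a sequential scan followed by a sort suggests it is not. On very small tables the planner may reasonably prefer the sequential scan even when the index exists.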
Also applies to: 42-45
I left indexing out for now on purpose because I wanted to keep this one focused on adding the similarity functions for tasks and tickets without expanding the scope too much. I can open another PR for indexing as the tables grow.
@dolliecoder, that makes sense! Keeping the PR focused is a good approach.
Would you like me to create a follow-up issue to track adding the HNSW indexes for tasks.description_embedding and tickets.description_embedding? This way it won't be forgotten as the tables grow.
This PR introduces SQL helper functions to enable vector similarity search for tasks and tickets using the pgvector setup added in PR1.
It is a follow-up, incremental step toward Issue #65. While PR1 introduced the description_embedding vector(768) columns for tasks and tickets (storage layer), this PR builds on that foundation by adding database-level similarity search functions (retrieval layer).
No embedding generation, indexing, or AI service integration is included here. This PR strictly enables semantic retrieval capability at the database level.
Dependency Note:
This PR depends on PR1, as it relies on the description_embedding columns introduced there. PR1 must be merged before this PR so that the functions execute against an existing schema. PR1: #160
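A quick way to confirm that the PR1 schema is in place before applying this migration could be a catalog query along these lines. This is a sketch that assumes both tables live in the public schema, which is not stated in the PR.

```sql
-- Sketch only: list the embedding columns this PR depends on.
-- Assumes tasks and tickets are in the public schema.
SELECT table_name, column_name, udt_name
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name IN ('tasks', 'tickets')
  AND column_name = 'description_embedding';
```

If PR1 has been applied, this should return one row per table; an empty result means the similarity functions would fail on first call.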
Changes Made
- Added get_similar_tasks(query_embedding, match_count) SQL function
- Added get_similar_tickets(query_embedding, match_count) SQL function
- Each function:
  - Computes cosine similarity as 1 - (description_embedding <=> query_embedding), where <=> is pgvector's cosine distance operator
  - Returns the top-k most semantically similar rows (k = match_count, default 3)
  - Skips rows without embeddings (description_embedding IS NOT NULL)
- Added a new Supabase migration file to maintain proper migration ordering
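As an illustration of how the functions above might be called, here is a minimal sketch. Embedding generation is out of scope for this PR, so the constant vector is purely a placeholder for a real 768-dimensional query embedding.

```sql
-- Sketch only: the array_fill vector stands in for a real query embedding
-- produced by whatever embedding model is integrated later.
SELECT task_id, title, similarity
FROM get_similar_tasks(
  array_fill(0.1::real, ARRAY[768])::vector(768),  -- placeholder query_embedding
  5                                                -- match_count
);
```

Omitting the second argument falls back to the default match_count of 3. Rows come back ordered by ascending cosine distance, so the highest similarity appears first.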
✅ Checklist
I have read the contributing guidelines.
I have added tests that prove my fix is effective or that my feature works.
(Not applicable – database-level capability addition only.)
I have added necessary documentation (if applicable).
(Not required at this stage.)
Any dependent changes have been merged and published in downstream modules.
(Depends on PR1 – embedding schema changes.)