
Add similarity search functions for tasks and tickets #166

Open
dolliecoder wants to merge 1 commit into AOSSIE-Org:main from dolliecoder:feat/task-ticket-vector-search

dolliecoder commented Feb 13, 2026

This PR introduces SQL helper functions that enable vector similarity search over tasks and tickets, using the pgvector setup added in PR1 (#160).

It is a follow-up, incremental step toward Issue #65. PR1 added the description_embedding vector(768) columns to tasks and tickets (storage layer); this PR builds on them by adding database-level similarity search functions (retrieval layer).
It does not include embedding generation, indexing, or AI service integration. Its only goal is to make semantic retrieval possible at the database level.

Dependency Note:
This PR depends on PR1 (#160), which introduces the description_embedding columns. PR1 must be merged first so that these functions run against a schema that already has those columns.
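Before applying this migration, one way to confirm that PR1's columns are in place is to query the system catalog. This is a sketch that assumes tasks and tickets live in the public schema; if PR1 has been applied, it should return one row per table with a column type of vector(768).

```sql
-- Sanity check before applying this migration.
-- Assumption: tasks and tickets are in the public schema.
-- Lists the description_embedding column (with its declared type) on each table.
SELECT c.relname AS table_name,
       format_type(a.atttypid, a.atttypmod) AS column_type
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relnamespace = 'public'::regnamespace
  AND c.relname IN ('tasks', 'tickets')
  AND a.attname = 'description_embedding'
  AND NOT a.attisdropped;
```

If either table is missing from the result, PR1 has not been applied to that database yet.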

Changes Made

Added get_similar_tasks(query_embedding, match_count) SQL function
Added get_similar_tickets(query_embedding, match_count) SQL function
Added a new Supabase migration file to maintain proper migration ordering

Each function:
Computes cosine similarity as 1 - (description_embedding <=> query_embedding), where <=> is pgvector's cosine distance operator
Returns the match_count most similar rows, ordered from most to least similar
Ignores rows without embeddings (description_embedding IS NOT NULL)
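As a usage sketch, the query below asks get_similar_tasks for the five tasks closest to an existing task's embedding. The UUID is a placeholder, and the example assumes tasks.id is a UUID, as the function's task_id return column suggests.

```sql
-- Find the 5 tasks whose description embeddings are closest to that of an
-- existing task. The UUID below is a placeholder; substitute a real task id.
SELECT task_id, title, similarity
FROM get_similar_tasks(
    (SELECT description_embedding
       FROM tasks
      WHERE id = '00000000-0000-0000-0000-000000000000'),
    5
)
-- A function's output order is not guaranteed to survive, so re-sort here.
ORDER BY similarity DESC;
```

The source task itself is part of the result, normally first with a similarity close to 1. If the subquery finds no row, or the task has no embedding, the query vector is NULL and every returned similarity is NULL, so callers should handle that case.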

✅ Checklist

I have read the contributing guidelines.

I have added tests that prove my fix is effective or that my feature works.
(Not applicable – database-level capability addition only.)

I have added necessary documentation (if applicable).
(Not required at this stage.)

Any dependent changes have been merged and published in downstream modules.
(Depends on PR1 – embedding schema changes.)

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

This pull request adds two PostgreSQL PL/pgSQL functions for semantic similarity search. The get_similar_tasks and get_similar_tickets functions compute cosine similarity between query embeddings and stored embeddings, returning ranked results up to a configurable limit.

Changes

Cohort: Database Vector Search Functions
File: supabase/migrations/20251021110000_task_ticket_vector_search.sql
Summary: Adds two new PL/pgSQL functions (get_similar_tasks and get_similar_tickets) that perform semantic similarity search using vector embeddings. Both functions accept a 768-dimensional query embedding and a match count parameter, rank rows by cosine distance (the <=> operator), and return the top results with a similarity score of 1 minus that distance.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Embeddings dance in vectors bright,
Seven sixty-eight dimensions take flight,
Cosine whispers "find what's near,"
Similar tasks and tickets appear!
Search with similarity, clear and light. ✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Title check: ✅ Passed. The title 'Add similarity search functions for tasks and tickets' directly and clearly summarizes the main change: adding two SQL functions for vector-based similarity search on tasks and tickets.
Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Merge Conflict Detection: ✅ Passed. No merge conflicts detected when merging into main.
Description Check: ✅ Passed. Check skipped, since CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@supabase/migrations/20251021110000_task_ticket_vector_search.sql`:
- Around line 17-20: Add HNSW vector indexes so the <=> similarity ORDER BY on
description_embedding uses an index: create IF NOT EXISTS idx_tasks_embedding on
tasks using hnsw for column description_embedding with vector_cosine_ops, and
create IF NOT EXISTS idx_tickets_embedding on tickets similarly for
description_embedding; ensure the new indexes are applied before running
similarity queries (also add the same index when you see other queries ordering
by description_embedding <=> query_embedding).
🧹 Nitpick comments (1)
supabase/migrations/20251021110000_task_ticket_vector_search.sql (1)

1-23: Consider LANGUAGE sql and marking as STABLE.

Since the function body is a single RETURN QUERY SELECT, PL/pgSQL adds unnecessary overhead; plain LANGUAGE sql avoids the PL/pgSQL executor layer. In addition, these functions have no side effects and return the same results for the same inputs within a single statement, so marking them STABLE lets the planner optimize repeated calls and makes a single-SELECT SQL function eligible to be inlined into the calling query.

♻️ Suggested diff
 CREATE OR REPLACE FUNCTION get_similar_tasks(
     query_embedding vector(768),
     match_count INT DEFAULT 3
 )
 RETURNS TABLE (
     task_id UUID,
     title TEXT,
     description TEXT,
     similarity FLOAT
-) AS $$
-BEGIN
-    RETURN QUERY
+) LANGUAGE sql STABLE AS $$
     SELECT
         t.id AS task_id,
         t.title,
         t.description,
         1 - (t.description_embedding <=> query_embedding) AS similarity
     FROM tasks t
     WHERE t.description_embedding IS NOT NULL
     ORDER BY t.description_embedding <=> query_embedding
     LIMIT match_count;
-END;
-$$ LANGUAGE plpgsql;
+$$;

Same applies to get_similar_tickets below.
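Applied to get_similar_tickets, the suggested rewrite could look like the sketch below. The ticket function's body is not shown in this review, so the column list (a UUID id, title, description and description_embedding on tickets, returned as ticket_id) is an assumption modeled on get_similar_tasks and should be checked against the actual migration.

```sql
-- Sketch only. ASSUMPTION: tickets mirrors tasks, with id UUID, title TEXT,
-- description TEXT and description_embedding vector(768).
CREATE OR REPLACE FUNCTION get_similar_tickets(
    query_embedding vector(768),
    match_count INT DEFAULT 3
)
RETURNS TABLE (
    ticket_id UUID,
    title TEXT,
    description TEXT,
    similarity FLOAT
) LANGUAGE sql STABLE AS $$
    SELECT
        tk.id AS ticket_id,
        tk.title,
        tk.description,
        -- <=> is cosine distance, so 1 - distance is cosine similarity
        1 - (tk.description_embedding <=> query_embedding) AS similarity
    FROM tickets tk
    WHERE tk.description_embedding IS NOT NULL   -- skip rows without embeddings
    ORDER BY tk.description_embedding <=> query_embedding  -- nearest first
    LIMIT match_count;
$$;
```

CREATE OR REPLACE FUNCTION can switch the language from plpgsql to sql, but PostgreSQL rejects the statement if the RETURNS TABLE column names or types differ from the existing definition; in that case the old function has to be dropped first.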

Comment on lines +17 to +20
1 - (t.description_embedding <=> query_embedding) AS similarity
FROM tasks t
WHERE t.description_embedding IS NOT NULL
ORDER BY t.description_embedding <=> query_embedding
coderabbitai bot commented Feb 13, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the migration file
find . -name "*20251021110000_task_ticket_vector_search.sql" -type f

Repository: AOSSIE-Org/Ell-ena

Length of output: 128


🏁 Script executed:

#!/bin/bash
# Check all SQL files for vector index creation patterns
rg -n 'hnsw|ivfflat|vector_cosine_ops|vector_l2_ops' --type sql

Repository: AOSSIE-Org/Ell-ena

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Also search for CREATE INDEX with vector patterns
rg -n 'CREATE INDEX.*embedding|CREATE INDEX.*vector' --type sql -i

Repository: AOSSIE-Org/Ell-ena

Length of output: 44


🏁 Script executed:

#!/bin/bash
# List all migration files to understand the structure
find . -path "*/migrations/*.sql" -type f | sort

Repository: AOSSIE-Org/Ell-ena

Length of output: 743


🏁 Script executed:

#!/bin/bash
# Read the migration file to see its full content
cat -n ./supabase/migrations/20251021110000_task_ticket_vector_search.sql

Repository: AOSSIE-Org/Ell-ena

Length of output: 1518


🏁 Script executed:

#!/bin/bash
# Also check the meeting vector search migration for comparison
cat -n ./supabase/migrations/20251021090000_meeting_vector_search.sql

Repository: AOSSIE-Org/Ell-ena

Length of output: 4913


Add vector indexes for efficient similarity queries.

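Without an index, each ORDER BY description_embedding <=> query_embedding ... LIMIT query must compute the distance for every row that has an embedding (a sequential scan followed by a top-k sort). Add HNSW indexes to enable efficient vector search: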

CREATE INDEX IF NOT EXISTS idx_tasks_embedding
  ON tasks USING hnsw (description_embedding vector_cosine_ops);

CREATE INDEX IF NOT EXISTS idx_tickets_embedding
  ON tickets USING hnsw (description_embedding vector_cosine_ops);

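This becomes critical once the tables grow beyond a few thousand rows. Note that HNSW is an approximate index, so indexed queries can return slightly different neighbors than an exact scan. If PR1 creates these indexes separately, this reminder can be disregarded.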

Also applies to: 42-45
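Once the indexes exist, a query plan shows whether the planner actually uses them. The sketch below runs the body of get_similar_tasks directly, because EXPLAIN on a call to a PL/pgSQL function only reports an opaque Function Scan. The UUID is a placeholder for any task that already has an embedding.

```sql
-- Assumes idx_tasks_embedding (HNSW, vector_cosine_ops) has been created.
-- The ORDER BY must use <=> to match the vector_cosine_ops operator class.
-- Look for "Index Scan using idx_tasks_embedding" in the output.
EXPLAIN ANALYZE
SELECT t.id, t.title
FROM tasks t
WHERE t.description_embedding IS NOT NULL
ORDER BY t.description_embedding <=> (
    SELECT description_embedding
      FROM tasks
     WHERE id = '00000000-0000-0000-0000-000000000000'
)
LIMIT 5;
```

On a small table the planner may still prefer a sequential scan; running SET enable_seqscan = off; in a test session is a common way to check that the index is usable at all.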


dolliecoder (Author)

I left indexing out on purpose because I wanted to keep this PR focused on adding the similarity functions for tasks and tickets without expanding its scope too much. I can open a separate PR for indexing as the tables grow.

coderabbitai bot (Contributor)

@dolliecoder, that makes sense! Keeping the PR focused is a good approach.

Would you like me to create a follow-up issue to track adding the HNSW indexes for tasks.description_embedding and tickets.description_embedding? This way it won't be forgotten as the tables grow.
