Adding better queries #2

techiejd · 2025-11-19T15:17:54Z

Release v0.3.0: Enhanced Chunking API, Extension Fields, and Filterable Vector Search

🎯 Overview

This release refactors the chunking API, adds support for extension fields, and lets you filter vector search results with Payload-style where clauses. The plugin now uses Drizzle ORM throughout, eliminating raw SQL and making the codebase more maintainable and type-safe.

⚠️ Breaking Changes

1. Field-Based Chunking Replaced with `toKnowledgePool` Functions

Before (v0.2.x):

collections: {
  posts: {
    fields: {
      title: { chunker: chunkText },
      content: { chunker: chunkRichText },
    },
  },
}

After (v0.3.0):

collections: {
  posts: {
    toKnowledgePool: async (doc, payload) => {
      const entries = []
      const titleChunks = await chunkText(doc.title ?? '', payload)
      titleChunks.forEach(chunk => entries.push({ chunk, category: doc.category }))
      
      const contentChunks = await chunkRichText(doc.content, payload)
      contentChunks.forEach(chunk => entries.push({ chunk, category: doc.category }))
      
      return entries
    },
  },
}

Why: This change provides full control over chunking logic, allowing you to:

Combine multiple fields into single chunks
Attach custom metadata (extension fields) to each chunk
Implement complex chunking strategies that weren't possible with the field-based approach
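
As a rough illustration of the first point, the sketch below merges a title and a summary into one string before chunking. The `PostDoc` shape, the `summary` field, and the `combineFields` helper are hypothetical names introduced here and are not part of the plugin.

```typescript
// Hypothetical document shape; adjust to your own collection.
interface PostDoc {
  title?: string | null
  summary?: string | null
}

// Joins the non-empty fields into one block of text so they are chunked together.
function combineFields(doc: PostDoc): string {
  return [doc.title, doc.summary]
    .filter((part): part is string => typeof part === 'string' && part.trim().length > 0)
    .map(part => part.trim())
    .join('\n\n')
}

// Inside toKnowledgePool, the combined text could then be passed to a chunker, e.g.:
//   const chunks = await chunkText(combineFields(doc), payload)
```

Keeping the combining step as a pure function makes it easy to unit test separately from the chunker.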

2. `fieldPath` Removed from Search Results

The fieldPath property has been removed from VectorSearchResult. If you were using this to identify which field a chunk came from, you'll need to track this via extension fields or your chunking logic.

Before:

{
  id: "...",
  fieldPath: "content", // ❌ No longer available
  chunkText: "...",
  // ...
}

After:

{
  id: "...",
  chunkText: "...",
  category: "guides", // ✅ Use extension fields instead
  priority: 4,
  // ...
}
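
If you still need to know which field a chunk came from, one option is to record it yourself. The sketch below assumes chunks are plain strings and uses a hypothetical extension field named `sourceField`, which would also have to be declared in `extensionFields`; `tagChunks` is likewise an illustrative helper, not a plugin export.

```typescript
// One knowledge-pool entry in this sketch. `sourceField` is a hypothetical
// extension field standing in for the removed `fieldPath`.
interface TaggedEntry {
  chunk: string
  sourceField: string
}

// Wraps already-computed chunks with the name of the field they came from.
function tagChunks(chunks: string[], sourceField: string): TaggedEntry[] {
  return chunks.map(chunk => ({ chunk, sourceField }))
}

// Example: entries as they might be returned from toKnowledgePool.
const taggedEntries: TaggedEntry[] = [
  ...tagChunks(['My title'], 'title'),
  ...tagChunks(['First paragraph', 'Second paragraph'], 'content'),
]
```

Search results would then carry `sourceField` alongside the other extension fields.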

✨ New Features

1. Extension Fields

Add custom fields to the embeddings collection schema and persist values per chunk:

collections: {
  posts: {
    toKnowledgePool: postsToKnowledgePool,
    extensionFields: [
      { name: 'category', type: 'text' },
      { name: 'priority', type: 'number' },
    ],
  },
}

Extension fields are:

Added to the embeddings table schema automatically
Stored with each chunk when vectorizing
Returned in search results
Queryable via the where clause (see below)

2. Filterable Vector Search

The vector search endpoint now accepts Payload-style where clauses and a limit parameter:

const response = await fetch('/api/vector-search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'machine learning',
    knowledgePool: 'main',
    where: {
      category: { equals: 'guides' },
      priority: { greater_than_equal: 3 },
    },
    limit: 5,
  }),
})

Supported operators:

equals, not_equals / notEquals
in, not_in / notIn
like, contains
greater_than / greaterThan, greater_than_equal / greaterThanEqual
less_than / lessThan, less_than_equal / lessThanEqual
exists (null checks)
Nested and / or conditions

You can filter on:

Default embedding columns: sourceCollection, docId, chunkIndex, chunkText, embeddingVersion
Any extension fields you've defined
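
Nested conditions follow Payload's convention of `and` / `or` keys holding arrays of where objects. The sketch below builds a request body that matches posts that are either guides or have a priority of at least 8. The local `Where` and `SearchBody` types are simplified stand-ins written for this example, and `category` / `priority` assume the extension fields shown earlier.

```typescript
// Simplified local types for this sketch; Payload and the plugin export their own.
interface Where {
  and?: Where[]
  or?: Where[]
  [field: string]: unknown
}

interface SearchBody {
  query: string
  knowledgePool: string
  where?: Where
  limit?: number
}

// sourceCollection = 'posts' AND (category = 'guides' OR priority >= 8)
const searchBody: SearchBody = {
  query: 'machine learning',
  knowledgePool: 'main',
  where: {
    and: [
      { sourceCollection: { equals: 'posts' } },
      {
        or: [
          { category: { equals: 'guides' } },
          { priority: { greater_than_equal: 8 } },
        ],
      },
    ],
  },
  limit: 5,
}

// This string is what would be sent as the POST body.
const serializedBody = JSON.stringify(searchBody)
```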

3. Improved Chunking Control

The toKnowledgePool function gives you complete control over:

What gets chunked: Combine any fields, transform content, or skip fields entirely
How it's chunked: Use different chunkers for different parts of a document
Metadata per chunk: Attach extension field values that vary per chunk

Example: Chunk a blog post's title separately from its content, and attach different metadata to each:

const postsToKnowledgePool: ToKnowledgePoolFn = async (doc, payload) => {
  const entries = []
  
  // Title chunks get high priority
  const titleChunks = await chunkText(doc.title ?? '', payload)
  titleChunks.forEach(chunk => 
    entries.push({ 
      chunk, 
      category: doc.category,
      priority: 10, // High priority for titles
    })
  )
  
  // Content chunks get normal priority
  const contentChunks = await chunkRichText(doc.content, payload)
  contentChunks.forEach(chunk => 
    entries.push({ 
      chunk, 
      category: doc.category,
      priority: doc.priority ?? 0,
    })
  )
  
  return entries
}

🔧 Technical Improvements

Drizzle ORM Integration

Eliminated raw SQL: All queries now use Drizzle's query builder and type-safe functions
Uses public API only: No reliance on Drizzle's private _ properties, ensuring forward compatibility
Better type safety: Leverages Drizzle's type system throughout
Maintainable: Easier to understand and modify query logic

Custom WHERE Clause Converter

Implemented a custom convertWhereToDrizzle function that:

Converts Payload's Where objects to Drizzle conditions
Handles all common operators and nested and/or logic
Works with both default columns and extension fields
Provides clear error messages for invalid queries
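
To make the recursive shape of such a converter concrete, here is a minimal sketch. It is not the plugin's `convertWhereToDrizzle`: to stay self-contained it evaluates a where object against a plain in-memory row instead of emitting Drizzle conditions, and it covers only a handful of operators. All names (`Row`, `matchOperator`, `matches`) are hypothetical.

```typescript
type Row = Record<string, unknown>

interface Where {
  and?: Where[]
  or?: Where[]
  [field: string]: unknown
}

// Evaluates one operator against a single column value.
function matchOperator(op: string, actual: unknown, expected: unknown): boolean {
  switch (op) {
    case 'equals':
      return actual === expected
    case 'not_equals':
    case 'notEquals':
      return actual !== expected
    case 'in':
      return Array.isArray(expected) && expected.indexOf(actual) !== -1
    case 'greater_than_equal':
    case 'greaterThanEqual':
      return typeof actual === 'number' && typeof expected === 'number' && actual >= expected
    case 'exists':
      return (actual !== null && actual !== undefined) === Boolean(expected)
    default:
      throw new Error(`Unsupported operator: ${op}`)
  }
}

// Walks the where tree: `and` / `or` recurse, every other key is treated as a column name.
function matches(row: Row, where: Where): boolean {
  return Object.keys(where).every(key => {
    const value = where[key]
    if (key === 'and') return (value as Where[]).every(w => matches(row, w))
    if (key === 'or') return (value as Where[]).some(w => matches(row, w))
    if (!(key in row)) throw new Error(`Unknown column: ${key}`)
    const condition = value as Record<string, unknown>
    return Object.keys(condition).every(op => matchOperator(op, row[key], condition[op]))
  })
}
```

A Drizzle-based version would presumably follow the same traversal, returning a condition per branch instead of a boolean.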

Dynamic Table Registration

The plugin now dynamically generates Drizzle table definitions during schema initialization and stores them in a registry. This allows:

Direct access to table columns without introspection
Type-safe column references
Clean separation between schema definition and query building
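
The registry idea can be sketched as follows. This is an illustration only: plain objects stand in for the generated Drizzle table definitions, tables are keyed by knowledge pool name (an assumption), and `registerTable` / `getColumn` are hypothetical names. The default column names are the ones listed under Filterable Vector Search.

```typescript
// Stand-in for a generated Drizzle table: a map from column name to a column handle.
interface TableLike {
  name: string
  columns: Record<string, { name: string }>
}

// Prototype-free objects so that names like "constructor" are never found by accident.
const tableRegistry: Record<string, TableLike> = Object.create(null)

// Builds a table definition from the default columns plus any extension fields
// and stores it under the pool name.
function registerTable(poolName: string, extensionFields: string[]): TableLike {
  const defaults = ['sourceCollection', 'docId', 'chunkIndex', 'chunkText', 'embeddingVersion']
  const columns: Record<string, { name: string }> = Object.create(null)
  for (const columnName of defaults.concat(extensionFields)) {
    columns[columnName] = { name: columnName }
  }
  const table: TableLike = { name: poolName, columns }
  tableRegistry[poolName] = table
  return table
}

// Looks a column up directly from the registry, with no database introspection.
function getColumn(poolName: string, column: string): { name: string } {
  const table = tableRegistry[poolName]
  if (!table) throw new Error(`No table registered for pool "${poolName}"`)
  const found = table.columns[column]
  if (!found) throw new Error(`Unknown column "${column}" in pool "${poolName}"`)
  return found
}
```

A query builder can then resolve column references through `getColumn` and fail early on misspelled field names.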

📝 Migration Guide

Step 1: Update Your Collection Configuration

Replace field-based chunking with toKnowledgePool functions:

// OLD
collections: {
  posts: {
    fields: {
      title: { chunker: chunkText },
      content: { chunker: chunkRichText },
    },
  },
}

// NEW
collections: {
  posts: {
    toKnowledgePool: async (doc, payload) => {
      const entries = []
      const titleChunks = await chunkText(doc.title ?? '', payload)
      titleChunks.forEach(chunk => entries.push({ chunk }))
      
      const contentChunks = await chunkRichText(doc.content, payload)
      contentChunks.forEach(chunk => entries.push({ chunk }))
      
      return entries
    },
  },
}

Step 2: Update Search Result Handling

Remove any code that references fieldPath:

// OLD
results.forEach(result => {
  console.log(`Field: ${result.fieldPath}`) // ❌ No longer exists
})

// NEW
results.forEach(result => {
  console.log(`Chunk: ${result.chunkText}`)
  // Use extension fields if you need metadata
  if (result.category) {
    console.log(`Category: ${result.category}`)
  }
})

Step 3: (Optional) Add Extension Fields

If you want to store and query custom metadata:

collections: {
  posts: {
    toKnowledgePool: postsToKnowledgePool,
    extensionFields: [
      { name: 'category', type: 'text' },
      { name: 'priority', type: 'number' },
    ],
  },
}

Then update your toKnowledgePool function to return these values:

const postsToKnowledgePool: ToKnowledgePoolFn = async (doc, payload) => {
  return [
    { chunk: '...', category: doc.category, priority: doc.priority },
    // ...
  ]
}

Step 4: Re-vectorize Your Content

After updating your configuration, re-vectorize your existing documents. Whenever a document is updated, the plugin automatically:

Deletes the document's old embeddings
Creates new embeddings with the updated structure
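
Since embeddings are rebuilt when a document is updated, one way to re-vectorize everything is to re-save each document. The sketch below pages through a collection and issues an empty update per document. It assumes that an update with no changed fields still triggers the plugin's hooks, which you should verify in your setup. `PayloadLike` is a minimal stand-in interface covering only the `find` and `update` calls used here, and `resaveAll` is a hypothetical helper.

```typescript
// Minimal subset of Payload's Local API used by this sketch.
interface PayloadLike {
  find(args: { collection: string; limit: number; page: number; depth: number }): Promise<{
    docs: Array<{ id: string | number }>
    hasNextPage: boolean
  }>
  update(args: {
    collection: string
    id: string | number
    data: Record<string, unknown>
  }): Promise<unknown>
}

// Re-saves every document in a collection, one page at a time,
// and returns how many documents were touched.
async function resaveAll(payload: PayloadLike, collection: string, pageSize = 50): Promise<number> {
  let page = 1
  let count = 0
  for (;;) {
    const { docs, hasNextPage } = await payload.find({ collection, limit: pageSize, page, depth: 0 })
    for (const doc of docs) {
      // Assumption: an empty update is enough to trigger re-vectorization.
      await payload.update({ collection, id: doc.id, data: {} })
      count += 1
    }
    if (!hasNextPage) return count
    page += 1
  }
}
```

Updating documents sequentially keeps the load on the embedding provider predictable; for large collections you may prefer to batch or queue the work.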

🧪 Testing

✅ All existing tests updated and passing
✅ New test suite for extension fields (dev/specs/extensionFields.spec.ts)
✅ Expanded vector search tests with WHERE clause filtering
✅ Integration tests verify end-to-end functionality

📚 Documentation

Updated README with new API examples
Added comprehensive CHANGELOG.md
Migration guide included in CHANGELOG
All code examples updated to reflect new API

Full Changelog: See CHANGELOG.md for complete details.

techiejd added 3 commits November 19, 2025 04:03

WIP

25c76b9

Vector search with where now works

12e3713

Updates package version, readme and adds changelog

b3c142a

techiejd merged commit 78b1409 into main Nov 19, 2025
1 check passed
