Filters

Anup Ghatage edited this page Feb 12, 2026 · 1 revision

Filters are applied as post-filters on vector search results and as pre-filters on BM25 search. When bitmap indexes are enabled (the default), eligible filters are instead evaluated against pre-built RoaringBitmap indexes before distance computation, which typically takes under a millisecond.

Format

Filters use a tagged JSON format with an "op" discriminator field:

{
  "op": "<operator>",
  ...fields
}
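
As a minimal sketch of producing this format from client code, the snippet below builds a filter as a plain dict and serializes it with Python's standard json module. The make_filter helper is hypothetical and not part of any Zeppelin client; how the resulting JSON is attached to a query request is not covered here.

```python
import json

def make_filter(op, **fields):
    """Hypothetical helper: return a filter dict with the "op" discriminator first."""
    return {"op": op, **fields}

eq_filter = make_filter("eq", field="color", value="red")
payload = json.dumps(eq_filter)
print(payload)  # {"op": "eq", "field": "color", "value": "red"}
```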

Operators

eq — Equality

Match vectors where a field exactly equals a value.

{
  "op": "eq",
  "field": "color",
  "value": "red"
}

not_eq — Inequality

Match vectors where a field does not equal a value.

{
  "op": "not_eq",
  "field": "status",
  "value": "deleted"
}

range — Numeric Range

Match vectors where a numeric field falls within bounds. Every bound is optional: use any combination of gte, lte (inclusive) and gt, lt (exclusive).

{
  "op": "range",
  "field": "price",
  "gte": 10.0,
  "lte": 100.0
}

{
  "op": "range",
  "field": "score",
  "gt": 0.5
}
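
The bound semantics can be sketched as a small Python predicate. This is an illustration, not Zeppelin's implementation: matches_range is a hypothetical name, and it assumes that all supplied bounds must hold together and that a non-numeric value never matches.

```python
def matches_range(value, gte=None, lte=None, gt=None, lt=None):
    """Return True when `value` satisfies every bound that was supplied."""
    # Assumption: booleans and non-numeric values never match a range filter.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if gte is not None and not value >= gte:
        return False
    if lte is not None and not value <= lte:
        return False
    if gt is not None and not value > gt:
        return False
    if lt is not None and not value < lt:
        return False
    return True

print(matches_range(10.0, gte=10.0, lte=100.0))  # True (inclusive lower bound)
print(matches_range(0.5, gt=0.5))                # False (exclusive lower bound)
```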

in — Set Membership

Match vectors where a field's value is in a given set.

{
  "op": "in",
  "field": "category",
  "values": ["electronics", "books", "toys"]
}

not_in — Set Exclusion

Match vectors where a field's value is NOT in a given set.

{
  "op": "not_in",
  "field": "category",
  "values": ["spam", "junk"]
}

contains — List Contains

Match vectors where a list-type field contains a specific value.

{
  "op": "contains",
  "field": "tags",
  "value": "rust"
}
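
The difference between in and contains is easy to mix up: in compares a single stored value against a set supplied in the filter, while contains looks for one value inside a stored list. A rough Python sketch of that distinction follows. The function names are hypothetical, and treating a missing or non-list field as a non-match is an assumption.

```python
def matches_in(attrs, field, values):
    """`in`: the stored value is one of the values listed in the filter."""
    return field in attrs and attrs[field] in values

def matches_contains(attrs, field, value):
    """`contains`: the stored list-type field holds the filter's value."""
    stored = attrs.get(field)
    # Assumption: a scalar (non-list) field never matches `contains`.
    return isinstance(stored, list) and value in stored

record = {"category": "books", "tags": ["rust", "search"]}
print(matches_in(record, "category", ["electronics", "books", "toys"]))  # True
print(matches_contains(record, "tags", "rust"))                          # True
```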

contains_all_tokens — Token Presence

Match vectors where a text field contains all specified tokens, in any order. Useful for full-text search (FTS) pre-filtering.

{
  "op": "contains_all_tokens",
  "field": "content",
  "tokens": ["rust", "programming"]
}

contains_token_sequence — Exact Phrase

Match vectors where a text field contains the tokens as an exact phrase: adjacent and in the given order.

{
  "op": "contains_token_sequence",
  "field": "content",
  "tokens": ["vector", "search", "engine"]
}
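
To make the contrast between the two token operators concrete, here is a small Python sketch. The tokenizer (lowercasing and splitting on whitespace) is an assumption made for illustration; this page does not describe how Zeppelin tokenizes text, and the function names are hypothetical.

```python
def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace."""
    return text.lower().split()

def contains_all_tokens(text, tokens):
    """True when every token occurs somewhere in the text, in any order."""
    present = set(tokenize(text))
    return all(token in present for token in tokens)

def contains_token_sequence(text, tokens):
    """True when the tokens occur next to each other, in the given order."""
    words = tokenize(text)
    n = len(tokens)
    return any(words[i:i + n] == list(tokens) for i in range(len(words) - n + 1))

text = "A fast vector search engine in Rust"
print(contains_all_tokens(text, ["engine", "vector"]))                # True
print(contains_token_sequence(text, ["vector", "search", "engine"]))  # True
print(contains_token_sequence(text, ["search", "vector"]))            # False
```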

Logical Operators

and — All Must Match

{
  "op": "and",
  "filters": [
    {"op": "eq", "field": "color", "value": "red"},
    {"op": "range", "field": "price", "lte": 50.0}
  ]
}

or — Any Must Match

{
  "op": "or",
  "filters": [
    {"op": "eq", "field": "color", "value": "red"},
    {"op": "eq", "field": "color", "value": "blue"}
  ]
}

not — Negate

{
  "op": "not",
  "filter": {
    "op": "eq",
    "field": "archived",
    "value": true
  }
}
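
Because and, or, and not nest other filters, a filter is a tree. The recursive Python sketch below shows one way such a tree could be evaluated against a single record's attributes. It is a reference illustration only: evaluate is a hypothetical name, only eq is handled as a leaf to keep it short, and the missing-field behaviour is an assumption.

```python
def evaluate(flt, attrs):
    """Evaluate a filter tree against one record's attribute dict."""
    op = flt["op"]
    if op == "and":
        return all(evaluate(child, attrs) for child in flt["filters"])
    if op == "or":
        return any(evaluate(child, attrs) for child in flt["filters"])
    if op == "not":
        return not evaluate(flt["filter"], attrs)
    if op == "eq":
        # Assumption: a missing field never equals anything.
        return flt["field"] in attrs and attrs[flt["field"]] == flt["value"]
    raise ValueError(f"unsupported op in this sketch: {op}")

flt = {"op": "and", "filters": [
    {"op": "or", "filters": [
        {"op": "eq", "field": "color", "value": "red"},
        {"op": "eq", "field": "color", "value": "blue"},
    ]},
    {"op": "not", "filter": {"op": "eq", "field": "archived", "value": True}},
]}
print(evaluate(flt, {"color": "red", "archived": False}))  # True
```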

Compound Examples

Products: red, under $50, in stock

{
  "op": "and",
  "filters": [
    {"op": "eq", "field": "color", "value": "red"},
    {"op": "range", "field": "price", "lt": 50.0},
    {"op": "eq", "field": "in_stock", "value": true}
  ]
}

Documents: either "engineering" category OR tagged "technical", but not archived

{
  "op": "and",
  "filters": [
    {
      "op": "or",
      "filters": [
        {"op": "eq", "field": "category", "value": "engineering"},
        {"op": "contains", "field": "tags", "value": "technical"}
      ]
    },
    {
      "op": "not",
      "filter": {"op": "eq", "field": "archived", "value": true}
    }
  ]
}
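
Deeply nested filters are tedious to write by hand. One option is a handful of small builder functions, sketched here in Python. These helpers are hypothetical and not part of any Zeppelin client; they only produce dicts in the format shown above, and the example rebuilds the documents filter.

```python
import json

def eq(field, value):
    return {"op": "eq", "field": field, "value": value}

def contains(field, value):
    return {"op": "contains", "field": field, "value": value}

def and_(*filters):
    return {"op": "and", "filters": list(filters)}

def or_(*filters):
    return {"op": "or", "filters": list(filters)}

def not_(flt):
    return {"op": "not", "filter": flt}

# Documents: "engineering" category OR tagged "technical", but not archived.
documents_filter = and_(
    or_(eq("category", "engineering"), contains("tags", "technical")),
    not_(eq("archived", True)),
)
print(json.dumps(documents_filter, indent=2))
```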

Attribute Value Types

Filters match against attribute values stored during upsert. Supported types:

Type         JSON Example   Description
String       "hello"        Text value
Integer      42             64-bit integer
Float        3.14           64-bit float
Bool         true           Boolean
StringList   ["a", "b"]     List of strings (use contains)
IntegerList  [1, 2, 3]      List of integers
FloatList    [1.5, 2.5]     List of floats
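
As a client-side sanity check before upsert, the type names in the table can be approximated with a small Python classifier. This is a sketch under assumptions: attribute_type is a hypothetical helper, and rejecting empty or mixed-type lists is a choice made here, since this page does not say how the server treats them.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def attribute_type(value):
    """Return the attribute type name from the table for a Python value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError("integer does not fit in 64 bits")
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list) and value:
        kinds = {attribute_type(item) for item in value}
        for scalar in ("String", "Integer", "Float"):
            if kinds == {scalar}:
                return scalar + "List"
    # Assumption: empty lists, mixed lists, and other types are rejected.
    raise TypeError(f"no matching attribute type for {value!r}")

print(attribute_type(["a", "b"]))  # StringList
print(attribute_type(True))        # Bool
```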

Bitmap Pre-Filter Optimization

When bitmap_index is enabled (default: true), Zeppelin builds RoaringBitmap indexes during compaction for each attribute field. At query time:

  1. The filter is evaluated against bitmap indexes to produce a candidate set
  2. Only candidate vectors are scanned during distance computation
  3. If a filter cannot be fully resolved by bitmaps, it falls back to post-filtering

This is automatic and transparent: no query changes are needed. Bitmap evaluation typically takes under 1 ms, even for millions of vectors.
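
The query-time steps above can be illustrated with a toy Python model. It uses plain Python sets as a stand-in for RoaringBitmaps and hypothetical function names. It is not Zeppelin's implementation: it only resolves eq, in, and, and or, and returns None for anything else to signal the post-filter fallback.

```python
def build_index(records):
    """Map field -> value -> set of record ids (sets stand in for bitmaps)."""
    index = {}
    for rid, attrs in records.items():
        for field, value in attrs.items():
            if isinstance(value, list):
                continue  # list-type fields are left out of this toy index
            index.setdefault(field, {}).setdefault(value, set()).add(rid)
    return index

def candidates(flt, index):
    """Return the matching id set, or None when the sketch cannot resolve it."""
    op = flt["op"]
    if op == "eq":
        return set(index.get(flt["field"], {}).get(flt["value"], set()))
    if op == "in":
        by_value = index.get(flt["field"], {})
        return set().union(*(by_value.get(v, set()) for v in flt["values"]))
    if op in ("and", "or"):
        parts = [candidates(child, index) for child in flt["filters"]]
        if not parts or any(part is None for part in parts):
            return None
        return set.intersection(*parts) if op == "and" else set.union(*parts)
    return None  # caller falls back to post-filtering

records = {
    1: {"color": "red", "in_stock": True},
    2: {"color": "blue", "in_stock": True},
    3: {"color": "red", "in_stock": False},
}
index = build_index(records)
flt = {"op": "and", "filters": [
    {"op": "eq", "field": "color", "value": "red"},
    {"op": "eq", "field": "in_stock", "value": True},
]}
print(candidates(flt, index))  # {1}
print(candidates({"op": "range", "field": "price", "lt": 50.0}, index))  # None
```

Only the ids in the returned set would then go through distance computation; a None result means the full filter is applied to the search results afterwards.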
