
Hybrid search & normalization #64

@ShravanSunder


Hello! I see many articles (such as Pinecone's) that use the following approach to combine hybrid search results from dense vectors and SPLADE.

However, I'm a bit confused about how this would work if the dense vectors are normalized to unit length but SPLADE's output is not. Any thoughts? What is the best way to conduct hybrid search with both vectors?

I understand the ANN search is done with dot product, so would we just take the highest score and not try to normalize?

def hybrid_scale(dense, sparse, alpha: float):
    # check alpha value is in range
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    # scale sparse and dense vectors to create hybrid search vecs
    hsparse = {
        'indices': sparse['indices'],
        'values':  [v * (1 - alpha) for v in sparse['values']]
    }
    hdense = [v * alpha for v in dense]
    return hdense, hsparse
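
To make the scale concern concrete, here is a minimal sketch with made-up toy vectors. The helper names (`dot_dense`, `dot_sparse`, `l2_normalize_sparse`) and all numbers are assumptions for illustration, not taken from the articles. It shows two things: because the dot product is linear, scaling the query vectors by `alpha` and `1 - alpha` gives a final score of `alpha * dense_score + (1 - alpha) * sparse_score`; and with unit-norm dense vectors the dense score is at most 1, while an unnormalized sparse score can be much larger and dominate the sum.

```python
import math

def dot_dense(a, b):
    """Dot product of two dense vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))

def dot_sparse(a, b):
    """Dot product of two sparse vectors in {'indices': [...], 'values': [...]} form."""
    b_lookup = dict(zip(b['indices'], b['values']))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a['indices'], a['values']))

def l2_normalize_sparse(sparse):
    """Return a copy of the sparse vector scaled to unit L2 norm (one possible choice)."""
    norm = math.sqrt(sum(v * v for v in sparse['values']))
    if norm == 0:
        return sparse
    return {'indices': sparse['indices'], 'values': [v / norm for v in sparse['values']]}

# Toy data (hypothetical): dense vectors are unit length, sparse ones are not.
q_dense = [0.6, 0.8]
d_dense = [0.8, 0.6]
q_sparse = {'indices': [3, 17], 'values': [2.0, 1.5]}
d_sparse = {'indices': [3, 42], 'values': [3.0, 0.5]}

alpha = 0.5
dense_score = dot_dense(q_dense, d_dense)      # bounded by 1 for unit vectors
sparse_score = dot_sparse(q_sparse, d_sparse)  # unbounded; only index 3 overlaps

# Scale the query side only, as hybrid_scale does, then score against the document.
hdense = [v * alpha for v in q_dense]
hsparse = {'indices': q_sparse['indices'],
           'values': [v * (1 - alpha) for v in q_sparse['values']]}
hybrid_score = dot_dense(hdense, d_dense) + dot_sparse(hsparse, d_sparse)

# Same weighting after putting the sparse vectors on a unit-norm scale.
normalized_sparse_score = dot_sparse(l2_normalize_sparse(q_sparse),
                                     l2_normalize_sparse(d_sparse))
balanced_score = alpha * dense_score + (1 - alpha) * normalized_sparse_score

print(dense_score, sparse_score, hybrid_score, balanced_score)
```

In this toy case the sparse term contributes far more to `hybrid_score` than the dense term at `alpha = 0.5`, so `alpha` no longer means an even split. Normalizing the sparse vectors is only one way to bring the two scores onto a comparable scale; whether it helps retrieval quality is something to check empirically.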

I saw the prior issue #34, but it seemed inconclusive.
