Add callback interface to put_* functions #27

andreasknoepfle · 2025-03-31T08:59:49Z

This adds a callback function interface to all put_* functions in Generation.

We often need to read something from the pipeline and then pass derived values back in. With callbacks we can do this without breaking the pipeline. Currently the pipeline has to be interrupted to compute the context, the context sources, and the prompt:

    generation =
      Generation.new(query)
      |> Embedding.generate_embedding(embedding_provider())
      |> Retrieval.retrieve(:fulltext_results, fn generation -> query_fulltext(generation) end)
      |> Retrieval.retrieve(:semantic_results, fn generation ->
        query_with_pgvector(generation)
      end)
      |> Retrieval.reciprocal_rank_fusion(
        %{fulltext_results: 1, semantic_results: 1},
        :rrf_result
      )
      |> Retrieval.deduplicate(:rrf_result, [:source])

    context =
      Generation.get_retrieval_result(generation, :rrf_result)
      |> Enum.map_join("\n\n", & &1.document)

    context_sources =
      Generation.get_retrieval_result(generation, :rrf_result)
      |> Enum.map(& &1.source)

    prompt = prompt(query, context)

    generation
    |> Generation.put_context(context)
    |> Generation.put_context_sources(context_sources)
    |> Generation.put_prompt(prompt)

With the callback interface, the same steps stay in a single pipeline:

    query
    |> Generation.new()
    |> Embedding.generate_embedding(embedding_provider())
    |> Retrieval.retrieve(:fulltext_results, fn generation -> query_fulltext(generation) end)
    |> Retrieval.retrieve(:semantic_results, &query_with_pgvector/1)
    |> Retrieval.reciprocal_rank_fusion(
      %{fulltext_results: 1, semantic_results: 1},
      :rrf_result
    )
    |> Retrieval.deduplicate(:rrf_result, [:source])
    |> Generation.put_context(fn generation ->
      Generation.get_retrieval_result(generation, :rrf_result)
      |> Enum.map_join("\n\n", & &1.document)
    end)
    |> Generation.put_context_sources(fn generation ->
      Generation.get_retrieval_result(generation, :rrf_result)
      |> Enum.map(& &1.source)
    end)
    |> Generation.put_prompt(&prompt/1)
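As a rough sketch of how a put_* function could accept either a plain value or a callback, the example below uses two function clauses. It is not the code from this PR: `Sketch.Generation`, its fields, and the sample documents are hypothetical stand-ins for the library's struct, chosen only so the example runs on its own.

```elixir
defmodule Sketch.Generation do
  # Hypothetical stand-in for the library's Generation struct; only the
  # fields needed for this illustration are included.
  defstruct query: nil, retrieval_results: %{}, context: nil

  def new(query), do: %__MODULE__{query: query}

  def put_retrieval_result(%__MODULE__{} = generation, key, results) do
    %{generation | retrieval_results: Map.put(generation.retrieval_results, key, results)}
  end

  def get_retrieval_result(%__MODULE__{} = generation, key) do
    Map.fetch!(generation.retrieval_results, key)
  end

  # Callback clause: compute the value from the current generation,
  # then delegate to the plain-value clause.
  def put_context(%__MODULE__{} = generation, callback) when is_function(callback, 1) do
    put_context(generation, callback.(generation))
  end

  # Plain-value clause: store the value as-is.
  def put_context(%__MODULE__{} = generation, context) when is_binary(context) do
    %{generation | context: context}
  end
end

generation =
  "what is elixir?"
  |> Sketch.Generation.new()
  |> Sketch.Generation.put_retrieval_result(:rrf_result, [
    %{document: "Elixir is a functional language.", source: "a.md"},
    %{document: "It runs on the BEAM.", source: "b.md"}
  ])
  |> Sketch.Generation.put_context(fn generation ->
    generation
    |> Sketch.Generation.get_retrieval_result(:rrf_result)
    |> Enum.map_join("\n\n", & &1.document)
  end)

IO.puts(generation.context)
```

Because the callback clause delegates to the plain-value clause, existing callers that pass a string would keep working under this assumption.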

andreasknoepfle · 2025-03-31T12:05:07Z

Actually, looking at the other PR in here I am not sure anymore if we should not duplicate the types and create types for all putable things instead.

joelpaulkoch · 2025-03-31T12:38:16Z

API-wise, I would prefer to leave the put_* functions as they are, for putting simple values into the struct.

Instead, the more semantic functions could take the callback, which should already be in place for most parts of the pipeline.

For instance:
Generation.put_query_embedding(..., callback) -> Embedding.generate_embedding(..., callback)
put_retrieval_result -> Retrieval.retrieve
put_response -> generate_response

What's missing are the equivalents for context, context_sources, prompt and evaluation. Something along these lines:

    query
    |> Generation.new()
    |> Embedding.generate_embedding(embedding_provider())
    |> Retrieval.retrieve(:fulltext_results, fn generation -> query_fulltext(generation) end)
    |> Retrieval.retrieve(:semantic_results, &query_with_pgvector/1)
    |> Retrieval.reciprocal_rank_fusion(
      %{fulltext_results: 1, semantic_results: 1},
      :rrf_result
    )
    |> Retrieval.deduplicate(:rrf_result, [:source])
    |> Generation.build_context(fn generation ->
      Generation.get_retrieval_result(generation, :rrf_result)
      |> Enum.map_join("\n\n", & &1.document)
    end)
    |> Generation.define_context_sources(fn generation ->
      Generation.get_retrieval_result(generation, :rrf_result)
      |> Enum.map(& &1.source)
    end)
    |> Generation.build_prompt(&prompt/1)

In my mind that would leave the put_* functions as escape hatches for when you need control (and break the pipeline).
It also gives us more freedom to do things in the semantic functions, for instance telemetry or setting a default builder function for the prompt.
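To make the split concrete, here is a minimal sketch of a semantic build_prompt with a default builder, layered on top of a plain put_prompt. It is only an illustration under assumptions: `Sketch.PromptGeneration`, `new/2`, `default_prompt/1`, and the prompt format are hypothetical, and this is not necessarily the API that #30 introduced.

```elixir
defmodule Sketch.PromptGeneration do
  # Hypothetical stand-in struct, not the library's actual Generation.
  defstruct query: nil, context: nil, prompt: nil

  def new(query, context), do: %__MODULE__{query: query, context: context}

  # Escape hatch: stores a plain value and does nothing else.
  def put_prompt(%__MODULE__{} = generation, prompt) when is_binary(prompt) do
    %{generation | prompt: prompt}
  end

  # Semantic function: takes a builder callback, with an assumed default.
  # Cross-cutting concerns such as telemetry could be added here without
  # touching put_prompt/2.
  def build_prompt(%__MODULE__{} = generation, builder \\ &__MODULE__.default_prompt/1)
      when is_function(builder, 1) do
    put_prompt(generation, builder.(generation))
  end

  # Assumed default prompt format, purely for illustration.
  def default_prompt(%__MODULE__{query: query, context: context}) do
    "Context:\n#{context}\n\nQuestion: #{query}"
  end
end

# Default builder.
with_default =
  "What is Elixir?"
  |> Sketch.PromptGeneration.new("Elixir runs on the BEAM.")
  |> Sketch.PromptGeneration.build_prompt()

# Custom builder passed as a callback.
with_custom =
  "What is Elixir?"
  |> Sketch.PromptGeneration.new("Elixir runs on the BEAM.")
  |> Sketch.PromptGeneration.build_prompt(fn generation -> "Q: " <> generation.query end)

IO.puts(with_default.prompt)
IO.puts(with_custom.prompt)
```

In this shape put_prompt stays a simple setter, while build_prompt is the place where defaults and instrumentation would live.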

What do you think?

joelpaulkoch · 2025-03-31T12:39:57Z

> Actually, looking at the other PR in here I am not sure anymore if we should not duplicate the types and create types for all putable things instead.

Yes, it would be better to define those types but I think it's not needed for now.
I'm going to set up dialyzer on CI and can rework the types as part of that.

joelpaulkoch · 2025-05-12T11:24:37Z

closing in favor of #30

Add callback interface to put_* functions

c2b04c8

andreasknoepfle requested a review from joelpaulkoch March 31, 2025 08:59

joelpaulkoch mentioned this pull request May 11, 2025

Add build_... functions to enable pipeline #30

Merged

joelpaulkoch closed this May 12, 2025
