
Conversation

@Davidyz (Owner) commented Jun 9, 2025

This PR aims to implement summarisation for retrieval results.

This would:

  1. Use fewer tokens in the main chat because long documents can be replaced by their summaries.
  2. Allow users to use a smaller/cheaper model/adapter for the summarisation, and hence save on cost.
  • Refactor the existing code to move the VectorCode.Result handling into a function
  • Accept an adapter as a config option
  • Implement the summarisation request
    • make it not block the main thread
  • Implement a thresholding mechanism that provides a callback that decides whether summarisation should kick in for each tool call

Example config:

opts.extensions.vectorcode = {
  ---@type VectorCode.CodeCompanion.ExtensionOpts
  opts = {
    tool_opts = {
      query = {
        summarise = {
          ---@type boolean|fun(chat: CodeCompanion.Chat, results: VectorCode.QueryResult[]):boolean
          enabled = true,
          adapter = function()
            return require("codecompanion.adapters").extend("gemini", {
              name = "Summariser",
              schema = {
                model = { default = "gemini-2.0-flash-lite" },
              },
              opts = { stream = false },
            })
          end,
        },
      },
    },
  },
}

[image]

Related PR:

@Davidyz added the enhancement and feature labels on Jun 9, 2025
codecov bot commented Jun 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.49%. Comparing base (b3a8fa2) to head (9ff39fd).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #179   +/-   ##
=======================================
  Coverage   99.49%   99.49%           
=======================================
  Files          21       21           
  Lines        1589     1589           
=======================================
  Hits         1581     1581           
  Misses          8        8           

☔ View full report in Codecov by Sentry.

@Davidyz force-pushed the nvim/result_summary branch from eba101d to 6ce9a03 on June 9, 2025 12:12
@Davidyz (Owner, Author) commented Jun 10, 2025

I've managed to make requests from an adapter, but handling the async requests from a sync context is TRICKY. @olimorris any suggestions on how I might be able to simplify this?

@olimorris (Contributor) commented:

Could you hook into CodeCompanion's event system and listen for CodeCompanionRequestStarted and CodeCompanionRequestFinished?

Alternatively, we could add a sync method on CodeCompanion's http.lua file, something like :request_sync. I think I've mentioned on a few posts that I'm looking to implement a background strategy that will make it much easier for external plugins to leverage the adapters and http module to make calls to LLMs. I intend on adding a sync method for that too.
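
For illustration, a minimal sketch of listening for those events, assuming they are fired as User autocommands like CodeCompanion's other events (the callback body is a placeholder):

vim.api.nvim_create_autocmd("User", {
  pattern = { "CodeCompanionRequestStarted", "CodeCompanionRequestFinished" },
  callback = function(event)
    if event.match == "CodeCompanionRequestFinished" then
      -- placeholder: resume whatever was waiting on the summarisation request
    end
  end,
})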

@Davidyz (Owner, Author) commented Jun 10, 2025

Could you hook into CodeCompanion's event system and listen for CodeCompanionRequestStarted and CodeCompanionRequestFinished?

I hadn't thought of that; I'll look into it. I'll probably still need to work out how to make the wait non-blocking for the main thread, though.

I think I've mentioned on a few posts that I'm looking to implement a background strategy that will make it much easier for external plugins to leverage the adapters and http module to make calls to LLMs. I intend on adding a sync method for that too.

That would be very nice to have for this PR. For me, the really tricky bit is making it NOT block the main UI. To be fair, I feel like I've been spoiled by modern async like Python's asyncio, and I have no idea how to work with coroutines directly 😭

@olimorris (Contributor) commented:

That would be very nice to have for this PR. For me, the really tricky bit is making it NOT block the main UI. To be fair, I feel like I've been spoiled by modern async like Python's asyncio, and I have no idea how to work with coroutines directly

It's high up on my list after I've got the agent mode sorted in CodeCompanion. I've had this exact conversation with sooooo many LLMs over the last 12 months 😆.

I took a lot of inspiration from the lua-async-await library some time ago. I never ended up using it but what she's done in 90 LOC blew my mind.
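
For context, a tiny illustration of the coroutine trick that libraries like lua-async-await build on (not their actual code, and assuming the callback always fires asynchronously): a callback-style function is "awaited" by yielding the current coroutine and resuming it from the callback.

-- Run `fn` inside a coroutine so it can use `await` without blocking the editor.
local function async(fn)
  return function(...) coroutine.wrap(fn)(...) end
end

-- Await a callback-style function of the form `async_fn(arg, callback)`.
local function await(async_fn, arg)
  local co = assert(coroutine.running(), "await() must run inside a coroutine")
  async_fn(arg, function(...)
    coroutine.resume(co, ...) -- hand control back once the callback fires
  end)
  return coroutine.yield()    -- suspend here until the callback resumes us
end

-- Usage: a callback-style sleep, awaited without freezing the UI.
local function sleep(ms, cb) vim.defer_fn(cb, ms) end
local demo = async(function()
  await(sleep, 100)
  print("100ms later, and the main thread was never blocked")
end)
demo()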

@Davidyz force-pushed the nvim/result_summary branch 5 times, most recently from 0ced3a4 to 1c562a1, on June 17, 2025 01:42
@Davidyz force-pushed the nvim/result_summary branch from 1c562a1 to 86b59b4 on June 21, 2025 04:53
@Davidyz (Owner, Author) commented Jun 21, 2025

@olimorris I've managed to implement this without blocking the main UI by putting the summarisation logic into the cmds function instead of the output handler. This way we can take advantage of the async tool callback and use the existing async request. The tradeoff is, apparently, deeper nested callbacks 😢 Also, we'd still need to work out some sort of throttling. Maybe this should be done in CodeCompanion, so that the adapter (if reused by different extensions) doesn't hit the rate limit too often.
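
For illustration, the rough shape of that control flow, with hypothetical stand-ins for the real VectorCode/CodeCompanion pieces (the cmds signature and the callback payload below are simplified assumptions, not the actual tool code):

-- Hypothetical async summariser: sends the results to the configured adapter
-- and invokes `on_done` with the summary once the response arrives.
local function summarise_async(results, on_done)
  vim.defer_fn(function() -- stand-in for the real asynchronous HTTP request
    on_done(("summary of %d results"):format(#results))
  end, 0)
end

-- Sketch of a cmds-style step: `cb` is the async completion callback, so the
-- tool only reports back once the summary is ready and the UI never blocks.
local function query_cmd(args, cb)
  local results = { "chunk 1", "chunk 2" } -- stand-in for the retrieval results
  summarise_async(results, function(summary)
    cb({ status = "success", data = summary }) -- payload shape is an assumption
  end)
end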

@Davidyz (Owner, Author) commented Jun 21, 2025

Or, we could move most of the result handling from the output handler to the cmds function, concatenate all results, and send one single request to the summariser... I'm not sure about this.

@Davidyz changed the title from "[WIP] Summarised retrieval results in CodeCompanion.nvim tool" to "Optional retrieval result summarisation in CodeCompanion.nvim query tool" on Jun 22, 2025
@Davidyz marked this pull request as ready for review on June 22, 2025 06:51
@Davidyz (Owner, Author) commented Jun 22, 2025

I've managed to work around the rate limit by including the full results in one request (obviously, this needs the summariser to be good at long context, but it's MUCH easier to implement than a rate limiter).
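
For illustration, the single-request idea boils down to something like the sketch below (the `path` and `document` field names are assumptions about the query result shape):

-- Concatenate every retrieved document into one prompt so that only a single
-- summarisation request is sent, sidestepping per-request rate limits.
local function build_summary_prompt(results)
  local parts = {}
  for _, result in ipairs(results) do
    table.insert(parts, ("<file path=%q>\n%s\n</file>"):format(result.path, result.document))
  end
  return table.concat(parts, "\n\n")
end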

@Davidyz (Owner, Author) commented Jun 22, 2025

@ravitemer, any suggestions on this feature? I'm asking because you've also done summarisation (from a different perspective), and maybe you can spot something I'm missing?

@Davidyz force-pushed the nvim/result_summary branch from 4e90c51 to b517c76 on June 22, 2025 09:31
@ravitemer commented:

@Davidyz This looks amazing! I didn't follow the previous commits but the current implementation seems solid.

From a user's perspective, when I want to go through some repo and let the LLM get an overview (kind of a repo map, but better), I would certainly use the summary feature. The only thing I can think of is whether you can make the summary option dynamic, through some variable or by adding a summarize field to the query tool, so that the LLM decides whether it needs just an overview (to understand the project) or the accurate file content (to do some edits).

And just an observation: another edge case might be managing context window and max_tokens limits. As you know, for the history summarization (which is less complex than this) we have a hard limit for each summary request, and if there are messages left over we prepend the generated summary to the remaining messages to generate the final summary. That strategy might not work here, I think. We could tweak the max results, or maybe split the files into batches of, say, 5 files per chunk, send multiple requests, and combine all the summaries? I know this adds complexity, and I'm totally okay with not having this at all!
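
Concretely, the batching idea could look roughly like the sketch below (purely illustrative; the batch size of 5 is just the number mentioned above):

-- Split the retrieval results into fixed-size batches so that each summary
-- request stays within the summariser's context window; the per-batch
-- summaries would then be combined into a final summary.
local function batch_results(results, batch_size)
  local batches = {}
  for i = 1, #results, batch_size do
    local batch = {}
    for j = i, math.min(i + batch_size - 1, #results) do
      table.insert(batch, results[j])
    end
    table.insert(batches, batch)
  end
  return batches
end

-- e.g. batch_results(query_results, 5) -> one summarisation request per batch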

@Davidyz (Owner, Author) commented Jun 22, 2025

another edge case might be managing context window and max_tokens limits

I thought this could be done through the adapter configuration, so I didn't do it here. The OpenAI API, for example, offers a max_tokens parameter that caps the number of tokens generated. I prefer to have a single source of truth, so I intentionally chose not to implement my own hard limit.
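
Concretely, that could look something like the snippet below, based on the example config from the PR description; whether the output-limit key is actually called max_tokens for a given adapter/model is an assumption here, mirroring the OpenAI-style parameter:

summarise = {
  enabled = true,
  adapter = function()
    return require("codecompanion.adapters").extend("gemini", {
      name = "Summariser",
      schema = {
        model = { default = "gemini-2.0-flash-lite" },
        max_tokens = { default = 1024 }, -- assumed key name; adjust per adapter
      },
      opts = { stream = false },
    })
  end,
},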

We could tweak the max results, or maybe split the files into batches of, say, 5 files per chunk, send multiple requests, and combine all the summaries?

In the initial iterations, I sent a request for each result (document or chunk). This simply doesn't work out of the box because of rate limits (imagine 50 simultaneous requests hitting a server with a rate limit of 10 per minute). I'm open to the possibility, but it'll be very tricky to implement, and I'll have to think about it. Maybe this should be upstreamed, so that each adapter instance has its own debounce counter and multiple requests can't all fire at the same time. With more extensions making their own requests (outside of the chat buffer itself), I think this would actually make sense.

@Davidyz (Owner, Author) commented Jun 22, 2025

The only thing I can think of is whether you can make the summary option dynamic, through some variable or by adding a summarize field to the query tool, so that the LLM decides whether it needs just an overview (to understand the project) or the accurate file content (to do some edits).

As for this one, currently there's the enabled option that can be a function (see the type annotation), which allows you to write some custom logic (for example, a hard switch based on the length of the retrieval results) to turn the summarisation on or off. I'm not so sure about letting the LLM decide this. To my knowledge, LLMs don't usually have a good understanding of how much of their context window has been used. There are also providers that automatically truncate the input, which makes matters even worse.
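
A minimal sketch of such an enabled callback, using the signature from the type annotation in the example config (the document field name and the 20000-character threshold are illustrative assumptions):

summarise = {
  ---@type boolean|fun(chat: CodeCompanion.Chat, results: VectorCode.QueryResult[]): boolean
  enabled = function(chat, results)
    -- Only summarise when the combined retrieval results are long enough
    -- to be worth the extra request.
    local total = 0
    for _, result in ipairs(results) do
      total = total + #(result.document or "")
    end
    return total > 20000
  end,
},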

@ravitemer commented:

currently there's the enabled option that can be a function (see the type annotation),

Thanks. Didn't see that. That solves it then.

To my knowledge, LLMs don't usually have a good understanding of how much of their context window has been used. There are also providers that automatically truncate the input, which makes matters even worse.

Agreed. It looks solid to me.

@Davidyz (Owner, Author) commented Jun 23, 2025

In a quick (non-rigorous) test, the summarisation reduced the query result from a 50k+ character string to an 18k+ character string, which is roughly a 60% reduction in the token count for the tool result!

@Davidyz merged commit bb3d169 into main on Jun 23, 2025 (13 checks passed)
@Davidyz deleted the nvim/result_summary branch on June 23, 2025 11:53