diff --git a/.github/workflows/panvimdoc.yml b/.github/workflows/panvimdoc.yml index dae4ce0f..71f6f70b 100644 --- a/.github/workflows/panvimdoc.yml +++ b/.github/workflows/panvimdoc.yml @@ -11,11 +11,29 @@ jobs: name: pandoc to vimdoc steps: - uses: actions/checkout@v4 + - name: panvimdoc uses: kdheepak/panvimdoc@main with: vimdoc: "VectorCode" # Output vimdoc project name (required) - pandoc: "./docs/neovim.md" # Input pandoc file + pandoc: "./docs/neovim/README.md" # Input pandoc file + toc: true # Table of contents + description: "A code repository indexing tool to supercharge your LLM experience." # Project description used in title (if empty, uses neovim version and current date) + titledatepattern: "%Y %B %d" # Pattern for the date that used in the title + demojify: true # Strip emojis from the vimdoc + dedupsubheadings: true # Add heading to subheading anchor links to ensure that subheadings are unique + treesitter: true # Use treesitter for highlighting codeblocks + ignorerawblocks: true # Ignore raw html blocks in markdown when converting to vimdoc + docmapping: false # Use h4 headers as mapping docs + docmappingprojectname: true # Use project name in tag when writing mapping docs + shiftheadinglevelby: 0 # Shift heading levels by specified number + incrementheadinglevelby: 0 # Increment heading levels by specified number + + - name: panvimdoc + uses: kdheepak/panvimdoc@main + with: + vimdoc: "VectorCode-API" # Output vimdoc project name (required) + pandoc: "./docs/neovim/api_references.md" # Input pandoc file toc: true # Table of contents description: "A code repository indexing tool to supercharge your LLM experience." # Project description used in title (if empty, uses neovim version and current date) titledatepattern: "%Y %B %d" # Pattern for the date that used in the title diff --git a/README.md b/README.md index d7d68d9b..698451d4 100644 --- a/README.md +++ b/README.md @@ -44,11 +44,11 @@ model output and reduce hallucination. 
- For the setup and usage of the command-line tool, see [the CLI
  documentation](./docs/cli.md);
- For neovim users, after you've gone through the CLI documentation, please refer to
-  [the neovim plugin documentation](./docs/neovim.md) for further instructions.
+  [the neovim plugin documentation](./docs/neovim/README.md) (and optionally the [lua API reference](./docs/neovim/api_references.md))
+  for further instructions.
- Additional resources:
  - the [wiki](https://github.com/Davidyz/VectorCode/wiki) for extra tricks and
-    tips that will help you get the most out of VectorCode, as well as
-    instructions to setup VectorCode to work with some other neovim plugins;
+    tips that will help you get the most out of VectorCode;
  - the [discussions](https://github.com/Davidyz/VectorCode/discussions) where
    you can ask general questions and share your cool use cases of VectorCode.
- If you're feeling adventurous, feel free to check out
diff --git a/doc/VectorCode-API.txt b/doc/VectorCode-API.txt
new file mode 100644
index 00000000..54fc295e
--- /dev/null
+++ b/doc/VectorCode-API.txt
@@ -0,0 +1,406 @@
+*VectorCode-API.txt*A code repository indexing tool to supercharge your LLM experience.
+
+==============================================================================
+Table of Contents *VectorCode-API-table-of-contents*
+
+1. Lua API References |VectorCode-API-lua-api-references|
+  - Synchronous API |VectorCode-API-lua-api-references-synchronous-api|
+  - Cached Asynchronous API|VectorCode-API-lua-api-references-cached-asynchronous-api|
+  - JobRunners |VectorCode-API-lua-api-references-jobrunners|
+
+==============================================================================
+1. Lua API References *VectorCode-API-lua-api-references*
+
+This plugin provides 2 sets of _high-level APIs_ that provide similar
+functionality.
The synchronous APIs provide more up-to-date retrieval results +at the cost of blocking the main neovim UI, while the async APIs use a caching +mechanism to provide asynchronous retrieval results almost instantaneously, but +the result may be slightly out-of-date. For some tasks like chat, the main UI +being blocked/frozen doesn’t hurt much because you spend the time waiting for +response anyway, and you can use the synchronous API in this case. For other +tasks like completion, the cached API will minimise the interruption to your +workflow, but at a cost of providing less up-to-date results. + +These APIs are wrappers around the _lower-level job runner API_, which provides +a unified interface for calling VectorCode commands that can be executed by +either the LSP or the generic CLI backend. If the high-level APIs are +sufficient for your use-case, it’s usually not necessary to use the job +runners directly. + +- |VectorCode-API-synchronous-api| + - |VectorCode-API-`query(query_message,-opts?,-callback?)`| + - |VectorCode-API-`check(check_item?)`| + - |VectorCode-API-`update(project_root?)`| +- |VectorCode-API-cached-asynchronous-api| + - |VectorCode-API-`cacher_backend.register_buffer(bufnr?,-opts?)`| + - |VectorCode-API-`cacher_backend.query_from_cache(bufnr?)`| + - |VectorCode-API-`cacher_backend.async_check(check_item?,-on_success?,-on_failure?)`| + - |VectorCode-API-`cacher_backend.buf_is_registered(bufnr?)`| + - |VectorCode-API-`cacher_backend.buf_is_enabled(bufnr?)`| + - |VectorCode-API-`cacher_backend.buf_job_count(bufnr?)`| + - |VectorCode-API-`cacher_backend.make_prompt_component(bufnr?,-component_cb?)`| + - |VectorCode-API-built-in-query-callbacks| +- |VectorCode-API-jobrunners| + - |VectorCode-API-`run_async(args,-callback,-bufnr)`-and-`run(args,-timeout_ms,-bufnr)`| + - |VectorCode-API-`is_job_running(job_handle):boolean`| + - |VectorCode-API-`stop_job(job_handle)`| + + +SYNCHRONOUS API *VectorCode-API-lua-api-references-synchronous-api* + + 
+QUERY(QUERY_MESSAGE, OPTS?, CALLBACK?) ~ + +This function queries VectorCode and returns an array of results. + +>lua + require("vectorcode").query("some query message", { + n_query = 5, + }) +< + +- `query_message`string or a list of strings, the query messages; +- `opts`The following are the available options for this function (see |VectorCode-API-`setup(opts?)`| for details): + +>lua + { + exclude_this = true, + n_query = 1, + notify = true, + timeout_ms = 5000, + } +< + +- `callback`a callback function that takes the result of the retrieval as the + only parameter. If this is set, the `query` function will be non-blocking and + runs in an async manner. In this case, it doesn’t return any value and + retrieval results can only be accessed by this callback function. + +The return value of this function is an array of results in the format of +`{path="path/to/your/code.lua", document="document content"}`. + +For example, in cmp-ai , you can add the +path/document content to the prompt like this: + +>lua + prompt = function(prefix, suffix) + local retrieval_results = require("vectorcode").query("some query message", { + n_query = 5, + }) + for _, source in pairs(retrieval_results) do + -- This works for qwen2.5-coder. + file_context = file_context + .. "<|file_sep|>" + .. source.path + .. "\n" + .. source.document + .. "\n" + end + return file_context + .. "<|fim_prefix|>" + .. prefix + .. "<|fim_suffix|>" + .. suffix + .. "<|fim_middle|>" + end +< + +Keep in mind that this `query` function call will be synchronous and therefore +block the neovim UI. This is where the async cache comes in. + + +CHECK(CHECK_ITEM?) ~ + +This function checks if VectorCode has been configured properly for your +project. See the CLI manual for details <./cli.md>. + +>lua + require("vectorcode").check() +< + +The following are the available options for this function: - `check_item`Only +supports `"config"` at the moment. Checks if a project-local config is present. 
+Return value: `true` if passed, `false` if failed.
+
+This invokes the `check` command of the CLI, which checks the status of the
+VectorCode project setup. Use this as a pre-condition for any subsequent use of
+other VectorCode APIs that may be more expensive (if this fails, VectorCode
+hasn’t been properly set up for the project, and you should not use
+VectorCode APIs).
+
+The use of this API is entirely optional. You can totally ignore this and call
+`query` anyway, but if `check` fails, you might be spending the waiting time
+for nothing.
+
+
+UPDATE(PROJECT_ROOT?) ~
+
+This function calls `vectorcode update` in the current working directory.
+`--project_root` will be added if the `project_root` parameter is not `nil`.
+This runs asynchronously and doesn’t block the main UI.
+
+>lua
+    require("vectorcode").update()
+<
+
+
+CACHED ASYNCHRONOUS API*VectorCode-API-lua-api-references-cached-asynchronous-api*
+
+The async cache mechanism helps mitigate the issue where the `query` API may
+take too long and block the main thread. The following are the functions
+available through the `require("vectorcode.cacher")` module.
+
+From 0.4.0, the async cache module comes with 2 backends that expose the same
+interface:
+
+1. The `default` backend, which works exactly like the original implementation
+used in previous versions;
+2. The `lsp`-based backend, which makes use of the experimental `vectorcode-server`
+implemented in version 0.4.0. If you want to customise the LSP executable or
+any options supported by `vim.lsp.ClientConfig`, you can do so by using
+`vim.lsp.config()`. This plugin will load the config associated with the name
+`vectorcode_server`. You can override the default config (for example, the
+path to the executable) by calling `vim.lsp.config('vectorcode_server', opts)`.
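+
+For example, a minimal override of the server command might look like the
+following sketch (the executable path is purely illustrative; point it at
+wherever `vectorcode-server` lives on your system):
+
+>lua
+    -- Illustrative sketch: override the executable used by the LSP backend.
+    -- Run this in your config before the first query is made.
+    vim.lsp.config('vectorcode_server', {
+      cmd = { vim.fn.expand('~/.local/bin/vectorcode-server') },
+    })
+<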
+ + ------------------------------------------------------------------------------- + Features default lsp + ---------- ------------------------------- ------------------------------------ + Pros Fully backward compatible with Less IO overhead for + minimal extra config required loading/unloading embedding models; + Progress reports. + + Cons Heavy IO overhead because the Requires vectorcode-server + embedding model and database + client need to be initialised + for every query. + ------------------------------------------------------------------------------- +You may choose which backend to use by setting the |VectorCode-API-`setup`| +option `async_backend`, and acquire the corresponding backend by the following +API: + +>lua + local cacher_backend = require("vectorcode.config").get_cacher_backend() +< + +and you can use `cacher_backend` wherever you used to use +`require("vectorcode.cacher")`. For example, +`require("vectorcode.cacher").query_from_cache(0)` becomes +`require("vectorcode.config").get_cacher_backend().query_from_cache(0)`. In the +remaining section of this documentation, I’ll use `cacher_backend` to +represent either of the backends. Unless otherwise noticed, all the +asynchronous APIs work for both backends. + + +CACHER_BACKEND.REGISTER_BUFFER(BUFNR?, OPTS?) ~ + +This function registers a buffer to be cached by VectorCode. + +>lua + cacher_backend.register_buffer(0, { + n_query = 1, + }) +< + +The following are the available options for this function: - `bufnr`buffer +number. Default: `0` (current buffer); - `opts`accepts a lua table with the +following keys: - `project_root`a string of the path that overrides the +detected project root. Default: `nil`. This is mostly intended to use with the +|VectorCode-API-user-command|, and you probably should not use this directly in +your config. **If you’re using the LSP backend and did not specify this +value, it will be automatically detected based on .vectorcode or .git. 
If this +fails, LSP backend will not work**; - `exclude_this`whether to exclude the file +you’re editing. Default: `true`; - `n_query`number of retrieved documents. +Default: `1`; - `debounce`debounce time in milliseconds. Default: `10`; - +`notify`whether to show notifications when a query is completed. Default: +`false`; - `query_cb``fun(bufnr: integer):string|string[]`, a callback function +that accepts the buffer ID and returns the query message(s). Default: +`require("vectorcode.utils").make_surrounding_lines_cb(-1)`. See +|VectorCode-API-this-section| for a list of built-in query callbacks; - +`events`list of autocommand events that triggers the query. Default: +`{"BufWritePost", "InsertEnter", "BufReadPost"}`; - `run_on_register`whether to +run the query when the buffer is registered. Default: `false`; - +`single_job`boolean. If this is set to `true`, there will only be one running +job for each buffer, and when a new job is triggered, the last-running job will +be cancelled. Default: `false`. + + +CACHER_BACKEND.QUERY_FROM_CACHE(BUFNR?) ~ + +This function queries VectorCode from cache. + +>lua + local query_results = cacher_backend.query_from_cache(0, {notify=false}) +< + +The following are the available options for this function: - `bufnr`buffer +number. Default: current buffer; - `opts`accepts a lua table with the following +keys: - `notify`boolean, whether to show notifications when a query is +completed. Default: `false`; + +Return value: an array of results. Each item of the array is in the format of +`{path="path/to/your/code.lua", document="document content"}`. + + +CACHER_BACKEND.ASYNC_CHECK(CHECK_ITEM?, ON_SUCCESS?, ON_FAILURE?) ~ + +This function checks if VectorCode has been configured properly for your +project. 
+
+>lua
+    cacher_backend.async_check(
+        "config",
+        do_something, -- on success
+        do_something_else -- on failure
+    )
+<
+
+The following are the available options for this function: - `check_item`any
+check that works with the `vectorcode check` command. If not set, it defaults
+to `"config"`; - `on_success`a callback function that is called when the check
+passes; - `on_failure`a callback function that is called when the check fails.
+
+
+CACHER_BACKEND.BUF_IS_REGISTERED(BUFNR?) ~
+
+This function checks if a buffer has been registered with VectorCode.
+
+The following are the available options for this function: - `bufnr`buffer
+number. Default: current buffer. Return value: `true` if registered, `false`
+otherwise.
+
+
+CACHER_BACKEND.BUF_IS_ENABLED(BUFNR?) ~
+
+This function checks if a buffer has been enabled with VectorCode. It is
+slightly different from `buf_is_registered`, because it does not guarantee
+VectorCode is actively caching the content of the buffer. It is the same as
+`buf_is_registered && not is_paused`.
+
+The following are the available options for this function: - `bufnr`buffer
+number. Default: current buffer. Return value: `true` if enabled, `false`
+otherwise.
+
+
+CACHER_BACKEND.BUF_JOB_COUNT(BUFNR?) ~
+
+Returns the number of running jobs in the background.
+
+
+CACHER_BACKEND.MAKE_PROMPT_COMPONENT(BUFNR?, COMPONENT_CB?) ~
+
+Compiles the retrieval results into a string. Parameters: - `bufnr`buffer
+number. Default: current buffer; - `component_cb`a callback function that
+formats each retrieval result, so that you can customise the control token,
+etc. for the component. The default is the following:
+
+>lua
+    function(result)
+        return "<|file_sep|>" .. result.path .. "\n" .. result.document
+    end
+<
+
+`make_prompt_component` returns a table with 2 keys: - `count`number of
+retrieved documents; - `content`The retrieval results concatenated together
+into a string. Each result is formatted by `component_cb`.
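+
+For instance, a completion prompt builder might use the cached results like
+this (a sketch; `base_prompt` is a hypothetical variable holding the rest of
+your prompt):
+
+>lua
+    -- Sketch: prepend cached retrieval results to a hypothetical base prompt.
+    local cacher_backend = require("vectorcode.config").get_cacher_backend()
+    local component = cacher_backend.make_prompt_component(0, function(result)
+        return "<|file_sep|>" .. result.path .. "\n" .. result.document
+    end)
+    local prompt = base_prompt
+    if component.count > 0 then
+        prompt = component.content .. "\n" .. base_prompt
+    end
+<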
+
+
+BUILT-IN QUERY CALLBACKS ~
+
+When using async cache, the query message is constructed by a function that
+takes the buffer ID as the only parameter, and returns a string or a list of
+strings. The `vectorcode.utils` module provides the following callback
+constructors for you to play around with, but you can easily build your own!
+
+- `require("vectorcode.utils").make_surrounding_lines_cb(line_count)`returns a
+  callback that uses `line_count` lines around the cursor as the query. When
+  `line_count` is negative, it uses the full buffer;
+- `require("vectorcode.utils").make_lsp_document_symbol_cb()`returns a
+  callback which uses the `textDocument/documentSymbol` method to retrieve a
+  list of symbols in the current document. This will fall back to
+  `make_surrounding_lines_cb(-1)` when there’s no LSP that supports the
+  `documentSymbol` method;
+- `require("vectorcode.utils").make_changes_cb(max_num)`returns a callback
+  that fetches `max_num` unique items from the `:changes` list. This will also
+  fall back to `make_surrounding_lines_cb(-1)`. The default value for `max_num`
+  is 50.
+
+
+JOBRUNNERS *VectorCode-API-lua-api-references-jobrunners*
+
+The `VectorCode.JobRunner` is an abstract class for vectorcode command
+execution. There are 2 concrete child classes that you can use: -
+`require("vectorcode.jobrunner.cmd")` uses the CLI (`vectorcode` commands) to
+interact with the database; - `require("vectorcode.jobrunner.lsp")` uses the
+LSP server, which avoids some of the IO overhead and provides LSP progress
+notifications.
+
+The available methods for a `VectorCode.JobRunner` object include:
+
+
+RUN_ASYNC(ARGS, CALLBACK, BUFNR) AND RUN(ARGS, TIMEOUT_MS, BUFNR) ~
+
+Calls a vectorcode command.
+
+The `args` parameter (of type `string[]`) contains whatever arguments come
+after `vectorcode` when you run it in the CLI.
For example, if you want to query for
+10 chunks in the shell, you’d run the following command:
+
+>bash
+    vectorcode query -n 10 keyword1 keyword2 --include chunk
+<
+
+Then for the job runner (either LSP or cmd), the `args` parameter would be:
+
+>lua
+    args = {"query", "-n", "10", "keyword1", "keyword2", "--include", "chunk"}
+<
+
+For the `run_async` method, the `callback` function has the following
+signature:
+
+>lua
+    ---@type fun(result: table, error: table, code:integer, signal: integer?)?
+<
+
+For the `run` method, the return value can be captured as follows:
+
+>lua
+    res, err, _code, _signal = jobrunner.run(args, -1, 0)
+<
+
+The result (for both the synchronous and asynchronous methods) is a
+`vim.json.decode`ed table of the result of the command execution. Consult the
+CLI documentation <../cli.md#for-developers> for the schema of the results for
+the command that you call.
+
+For example, the query command mentioned above will return a
+`VectorCode.QueryResult[]`, where `VectorCode.QueryResult` is defined as
+follows:
+
+>lua
+    ---@class VectorCode.QueryResult
+    ---@field path string Path to the file
+    ---@field document string? Content of the file
+    ---@field chunk string?
+    ---@field start_line integer?
+    ---@field end_line integer?
+    ---@field chunk_id string?
+<
+
+`run_async` returns a `job_handle`, which is defined as an `integer?`. For the
+LSP backend, the job handle is the `request_id`. For the cmd runner, the job
+handle is the `PID` of the process.
+
+
+IS_JOB_RUNNING(JOB_HANDLE):BOOLEAN ~
+
+Checks if a job associated with the given handle is currently running.
+
+
+STOP_JOB(JOB_HANDLE) ~
+
+Attempts to stop or cancel the async job associated with the given handle.
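+
+Putting these methods together, an async query that gets cancelled if it runs
+for too long could be sketched as follows (the keyword and the 5-second
+timeout are illustrative):
+
+>lua
+    -- Sketch: run an async query via the cmd job runner, cancel it after 5s.
+    local jobrunner = require("vectorcode.jobrunner.cmd")
+    local handle = jobrunner.run_async(
+        { "query", "-n", "10", "keyword1", "--include", "chunk" },
+        function(result, error)
+            -- `result` is a VectorCode.QueryResult[] on success.
+            vim.print(result or error)
+        end,
+        0 -- buffer number
+    )
+    vim.defer_fn(function()
+        if jobrunner.is_job_running(handle) then
+            jobrunner.stop_job(handle)
+        end
+    end, 5000)
+<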
+
+Generated by panvimdoc
+
+vim:tw=78:ts=8:noet:ft=help:norl:
diff --git a/doc/VectorCode-cli.txt b/doc/VectorCode-cli.txt
index 0530a95f..a2465d5e 100644
--- a/doc/VectorCode-cli.txt
+++ b/doc/VectorCode-cli.txt
@@ -150,6 +150,10 @@ A community-maintained Nix package is available here
If you’re using nix to install a standalone Chromadb server, make sure to
stick to 0.6.3 .
+If you install via Nix and run into an issue, please try to reproduce with the
+PyPI package (install via `uv` or `pipx`). If it’s not reproducible on the
+non-Nix package, I may close the issue immediately.
+
GETTING STARTED *VectorCode-cli-vectorcode-command-line-tool-getting-started*
@@ -350,9 +354,14 @@ reranking method to use. Currently supports `CrossEncoderReranker` (default,
using sentence-transformers cross-encoder
) and `NaiveReranker` (sort chunks by the "distance" between the embedding
-vectors); - `reranker_params`dictionary, similar to `embedding_params`. The
-options passed to the reranker class constructor. For `CrossEncoderReranker`,
-these are the options passed to the `CrossEncoder`
+vectors). Note: If you’re using a good embedding model (e.g. a hosted service
+from OpenAI, or an LLM-based embedding model like Qwen3-Embedding-0.6B
+), you may get better results
+if you use `NaiveReranker` here because a good embedding model may understand
+texts better than a mediocre reranking model. - `reranker_params`dictionary,
+similar to `embedding_params`. The options passed to the reranker class
+constructor. For `CrossEncoderReranker`, these are the options passed to the
+`CrossEncoder`
For example, if you want to use a non-default model, you can use the following: `json { "reranker_params": { "model_name_or_path": "your_model_here" diff --git a/doc/VectorCode.txt b/doc/VectorCode.txt index 6c0e7773..14551f58 100644 --- a/doc/VectorCode.txt +++ b/doc/VectorCode.txt @@ -8,7 +8,6 @@ Table of Contents *VectorCode-table-of-contents* - Integrations |VectorCode-neovim-plugin-integrations| - Configuration |VectorCode-neovim-plugin-configuration| - User Command |VectorCode-neovim-plugin-user-command| - - API Usage |VectorCode-neovim-plugin-api-usage| - Debugging and Logging |VectorCode-neovim-plugin-debugging-and-logging| 2. Links |VectorCode-links| @@ -17,8 +16,8 @@ Table of Contents *VectorCode-table-of-contents* [!NOTE] This plugin depends on the CLI tool. Please go through the CLI - documentation <./cli.md> and make sure the VectorCode CLI is working before - proceeding. + documentation <../cli/README.md> and make sure the VectorCode CLI is working + before proceeding. [!NOTE] When the neovim plugin doesn’t work properly, please try upgrading both the CLI and the neovim plugin to the latest version before opening an @@ -26,26 +25,26 @@ Table of Contents *VectorCode-table-of-contents* - |VectorCode-installation| - |VectorCode-mason.nvim| - |VectorCode-nix| + - |VectorCode-lazy-loading| - |VectorCode-integrations| + - |VectorCode-milanglacier/minuet-ai.nvim| + - |VectorCode-olimorris/codecompanion.nvim| + - |VectorCode-copilotc-nvim/copilotchat.nvim| + - |VectorCode-setup| + - |VectorCode-configuration-options| + - |VectorCode-usage-tips| + - |VectorCode-performance-optimization| + - |VectorCode-using-with-sticky-prompts| + - |VectorCode-status-line-component| + - |VectorCode-nvim-lualine/lualine.nvim| + - |VectorCode-heirline.nvim| + - |VectorCode-fidget.nvim| + - |VectorCode-model-context-protocol-(mcp)| - |VectorCode-configuration| - |VectorCode-`setup(opts?)`| - |VectorCode-user-command| - |VectorCode-`vectorcode-register`| - 
|VectorCode-`vectorcode-deregister`| -- |VectorCode-api-usage| - - |VectorCode-synchronous-api| - - |VectorCode-`query(query_message,-opts?,-callback?)`| - - |VectorCode-`check(check_item?)`| - - |VectorCode-`update(project_root?)`| - - |VectorCode-cached-asynchronous-api| - - |VectorCode-`cacher_backend.register_buffer(bufnr?,-opts?)`| - - |VectorCode-`cacher_backend.query_from_cache(bufnr?)`| - - |VectorCode-`cacher_backend.async_check(check_item?,-on_success?,-on_failure?)`| - - |VectorCode-`cacher_backend.buf_is_registered(bufnr?)`| - - |VectorCode-`cacher_backend.buf_is_enabled(bufnr?)`| - - |VectorCode-`cacher_backend.buf_job_count(bufnr?)`| - - |VectorCode-`cacher_backend.make_prompt_component(bufnr?,-component_cb?)`| - - |VectorCode-built-in-query-callbacks| - |VectorCode-debugging-and-logging| @@ -108,17 +107,366 @@ There’s a community-maintained nix package for the Neovim plugin. +LAZY LOADING ~ + +When you call VectorCode APIs or integration interfaces as a part of another +plugin’s configuration, it’s important to make sure that VectorCode is +loaded BEFORE the plugin you’re trying to use. + +For example, in lazy.nvim , it’s not +sufficient to simply add VectorCode as a dependency. You’d also need to wrap +the `opts` table in a function: + +>lua + { + "olimorris/codecompanion.nvim", + opts = function() + return your_opts_here + end + } +< + +If you pass a table, instead of a function, as the value for the `opts` key, +neovim will try to load the VectorCode components immediately on startup +(potentially even before the plugin is added to the |`rtp`|) and will cause +some errors. + + INTEGRATIONS *VectorCode-neovim-plugin-integrations* -The wiki -contains instructions to integrate VectorCode with the following plugins: +VectorCode is a _library_ plugin that needs to be paired with some AI plugin to +assist your workflow. The core APIs are documented in the API references +<./api_references.md>. 
For some plugins, we provide built-in support that
+simplifies the integration. You can read the relevant section below for the
+specific plugin that you want to use VectorCode with.
+
+If, unfortunately, your AI plugin of choice is not listed here, you can either
+use the APIs listed in the API references <./api_references.md> to build your
+own integration interface, or open an issue (either in this repo or in the AI
+plugin’s repo) to request support.
+
+Currently supported plugins: - milanglacier/minuet-ai.nvim
+; -
+olimorris/codecompanion.nvim ;
+- CopilotC-Nvim/CopilotChat.nvim
+; - ravitemer/mcphub.nvim
+; - nvim-lualine/lualine.nvim
+; - rebelot/heirline.nvim
+.
+
+
+MILANGLACIER/MINUET-AI.NVIM ~
+
+You can use the async caching API <./api_references.md#cached-asynchronous-api>
+to include query results in the prompt.
+
+See minuet-ai documentation
+
+and Prompt Gallery
+for instructions to modify the prompts to use VectorCode context for
+completion.
+
+To control the number of results to be included in the prompt and some other
+behaviour, you can either set the opts when calling the `register_buffer`
+function, or change the value of `async_opts.n_query` in the `setup` function
+(see |VectorCode-configuration|).
+
+
+OLIMORRIS/CODECOMPANION.NVIM ~
+
+
+
+The following requires VectorCode 0.7+ and a recent version of
+CodeCompanion.nvim.
+ +The CodeCompanion extension will register the following tools: - +`@{vectorcode_ls}`an equivalent of `vectorcode ls` command that shows the +indexed projects on your system; - `@{vectorcode_query}`an equivalent of +`vectorcode query` command that searches from a project; - +`@{vectorcode_vectorise}`an equivalent of `vectorcode vectorise` command that +adds files to the database; - `@{vectorcode_files_ls}`an equivalent of +`vectorcode files ls` command that gives a list of indexed files in a project; +- `@{vectorcode_files_rm}`an equivalent of `vectorcode files rm` command that +removes files from a collection. + +By default, it’ll also create a tool group called `@{vectorcode_toolbox}`, +which contains the `vectorcode_ls`, `vectorcode_query` and +`vectorcode_vectorise` tools. You can customise the members of this toolbox by +the `include_in_toolbox` option explained below. + +>lua + ---@module "vectorcode" + opts = { + extensions = { + vectorcode = { + ---@type VectorCode.CodeCompanion.ExtensionOpts + opts = { + tool_group = { + -- this will register a tool group called `@vectorcode_toolbox` that contains all 3 tools + enabled = true, + -- a list of extra tools that you want to include in `@vectorcode_toolbox`. + -- if you use @vectorcode_vectorise, it'll be very handy to include + -- `file_search` here. 
+ extras = {}, + collapse = false, -- whether the individual tools should be shown in the chat + }, + tool_opts = { + ---@type VectorCode.CodeCompanion.ToolOpts + ["*"] = {}, + ---@type VectorCode.CodeCompanion.LsToolOpts + ls = {}, + ---@type VectorCode.CodeCompanion.VectoriseToolOpts + vectorise = {}, + ---@type VectorCode.CodeCompanion.QueryToolOpts + query = { + max_num = { chunk = -1, document = -1 }, + default_num = { chunk = 50, document = 10 }, + include_stderr = false, + use_lsp = false, + no_duplicate = true, + chunk_mode = false, + ---@type VectorCode.CodeCompanion.SummariseOpts + summarise = { + ---@type boolean|(fun(chat: CodeCompanion.Chat, results: VectorCode.QueryResult[]):boolean)|nil + enabled = false, + adapter = nil, + query_augmented = true, + } + }, + files_ls = {}, + files_rm = {} + } + }, + }, + } + } +< + +The following are the common options that all tools supports: + +- `use_lsp`whether to use the LSP backend to run the queries. Using LSP + provides some insignificant performance boost and a nice notification pop-up + if you’re using fidget.nvim . Default: + `true` if `async_backend` is set to `"lsp"` in `setup()`. Otherwise, it’ll be + `false`; +- `requires_approval`whether CodeCompanion.nvim asks for your approval before + executing the tool call. Default: `false` for `ls` and `query`; `true` for + `vectorise`; +- `include_in_toolbox`whether this tool should be included in + `vectorcode_toolbox`. Default: `true` for `query`, `vectorise` and `ls`, + `false` for `files_*`. + +In the `tool_opts` table, you may either configure these common options +individually, or use the `["*"]` key to specify the default settings for all +tools. If you’ve set both the default settings (via `["*"]`) and the +individual settings for a tool, the individual settings take precedence. + +The `query` tool contains the following extra config options: - +`chunk_mode`boolean, whether the VectorCode backend should return chunks or +full documents. 
Default: `false`; - `max_num` and `default_num`If they’re set
+to integers, they represent the default and maximum allowed number of results
+returned by VectorCode (regardless of document or chunk mode). They can also be
+set to tables with 2 keys: `document` and `chunk`. In this case, their values
+would be used for the corresponding mode. You may ask the LLM to request a
+different number of chunks or documents, but they’ll be capped by the values
+in `max_num`. Default: See the sample snippet above. Negative values for
+`max_num` mean unlimited. - `no_duplicate`boolean, whether the query calls
+should attempt to exclude files that have been retrieved and provided in the
+previous turns of the current chat. This helps save tokens and increases the
+chance of retrieving the correct files when the previous retrievals fail to do
+so. Default: `true`. - `summarise`optional summarisation for the retrieval
+results. This is a table with the following keys: - `enabled`This can either be
+a boolean that toggles summarisation on/off completely, or a function that
+accepts the `CodeCompanion.Chat` object and the raw query results as the 2
+parameters and returns a boolean. When it’s the latter, it’ll be evaluated
+for every tool call. This allows you to write some custom logic to dynamically
+turn summarisation on and off. _When the summarisation is enabled, but you find
+the summaries not informative enough, you can tell the LLM to disable the
+summarisation during the chat so that it sees the raw information_; -
+`adapter`See CodeCompanion documentation
+.
+When not provided, it’ll use the chat adapter; - `system_prompt`When set to a
+string, this will be used as the system prompt for the summarisation model.
+When set to a function, it’ll be called with the default system prompt as the
+only parameter, and it should return a string that will be used as a system
+prompt.
This allows you to append/prepend things to the default system prompt; +- `query_augmented`boolean, whether the system prompt should contain the query +so that when the LLM decide what information to include, it _may_ be able to +avoid omitting stuff related to query. + + +COPILOTC-NVIM/COPILOTCHAT.NVIM ~ + +CopilotC-Nvim/CopilotChat.nvim + is a Neovim plugin that +provides an interface to GitHub Copilot Chat. VectorCode integration enriches +the conversations by providing relevant repository context. + + +SETUP + +VectorCode offers a dedicated integration with CopilotChat.nvim that provides +contextual information about your codebase to enhance Copilot’s responses. +Add this to your CopilotChat configuration: + +>lua + local vectorcode_ctx = require('vectorcode.integrations.copilotchat').make_context_provider({ + prompt_header = "Here are relevant files from the repository:", -- Customize header text + prompt_footer = "\nConsider this context when answering:", -- Customize footer text + skip_empty = true, -- Skip adding context when no files are retrieved + }) + + require('CopilotChat').setup({ + -- Your other CopilotChat options... + + contexts = { + -- Add the VectorCode context provider + vectorcode = vectorcode_ctx, + }, + + -- Enable VectorCode context in your prompts + prompts = { + Explain = { + prompt = "Explain the following code in detail:\n$input", + context = {"selection", "vectorcode"}, -- Add vectorcode to the context + }, + -- Other prompts... + } + }) +< + + +CONFIGURATION OPTIONS + +The `make_context_provider` function accepts these options: + +- `prompt_header`Text that appears before the code context (default: "The following are relevant files from the repository. 
Use them as extra context for helping with code completion and understanding:")
+- `prompt_footer`Text that appears after the code context (default: "\nExplain and provide a strategy with examples about: \n")
+- `skip_empty`Whether to skip adding context when no files are retrieved (default: true)
+- `format_file`A function that formats each retrieved file (takes a file result object and returns formatted string)
+
+
+USAGE TIPS
+
+1. Register your buffers with VectorCode (`:VectorCode register`) to enable context fetching
+2. Create different prompt templates with or without VectorCode context depending on your needs
+3. For large codebases, consider adjusting the number of retrieved documents using `n_query` when registering buffers
+
+
+PERFORMANCE OPTIMIZATION
+
+The integration includes caching to avoid sending duplicate context to the LLM,
+which helps reduce token usage when asking multiple questions about the same
+codebase.
+
+
+USING WITH STICKY PROMPTS
+
+You can configure VectorCode to be part of your sticky prompts, ensuring every
+conversation includes relevant codebase context automatically:
+
+>lua
+    require('CopilotChat').setup({
+      -- Your other CopilotChat options...
+
+      sticky = {
+        "Using the model $claude-3.7-sonnet-thought",
+        "#vectorcode", -- Automatically includes repository context in every conversation
+      },
+    })
+<
+
+This configuration will include both the model specification and repository
+context in every conversation with CopilotChat.
+
+------------------------------------------------------------------------------
+
+STATUS LINE COMPONENT ~
+
+
+NVIM-LUALINE/LUALINE.NVIM
+
+A `lualine` component that shows the status of the async job and the number of
+cached retrieval results.
+ +>lua + tabline = { + lualine_y = { + require("vectorcode.integrations").lualine(opts) + } + } +< + +`opts` is a table with the following configuration option: - +`show_job_count`boolean, whether to show the number of running jobs for the +buffer. Default: `false`. + +This will, however, start VectorCode when lualine starts (which usually means +when neovim starts). If this bothers you, you can use the following snippet: + +>lua + tabline = { + lualine_y = { + { + function() + return require("vectorcode.integrations").lualine(opts)[1]() + end, + cond = function() + if package.loaded["vectorcode"] == nil then + return false + else + return require("vectorcode.integrations").lualine(opts).cond() + end + end, + }, + } + } +< + +This will further delay the loading of VectorCode to the moment you (or one of +your plugins that actually retrieves context from VectorCode) load VectorCode. + + +HEIRLINE.NVIM + +A heirline component is available as: + +>lua + local vectorcode_component = require("vectorcode.integrations").heirline({ + show_job_count = true, + component_opts = { + -- put other field of the components here. + -- they'll be merged into the final component. + }, + }) +< + + +FIDGET.NVIM ~ + +If you’re using a LSP backend +, there +will be a notification when there’s a pending request for queries. As long as +the LSP backend is working, no special configuration is needed for this. + + +MODEL CONTEXT PROTOCOL (MCP) ~ + +The Python package contains an optional `mcp` dependency group. After +installing this, you can use the MCP server with any MCP client. For example, +to use it with mcphub.nvim , simply +add this server in the JSON config: + +>json + { + "mcpServers": { + "vectorcode-mcp-server": { + "command": "vectorcode-mcp-server", + "args": [] + } + } + } +< CONFIGURATION *VectorCode-neovim-plugin-configuration* @@ -126,6 +474,11 @@ CONFIGURATION *VectorCode-neovim-plugin-configuration* SETUP(OPTS?) 
~ +This function controls the behaviour of some of the APIs provided by the +VectorCode neovim plugin. If you’re using built-in integration interfaces, +you usually don’t have to worry about this section, unless otherwise +specified in the relevant section. + This function initialises the VectorCode client and sets up some default >lua @@ -154,7 +507,7 @@ This function initialises the VectorCode client and sets up some default on_setup = { update = false, -- set to true to enable update when `setup` is called. lsp = false, - } + }, sync_log_env_var = false, } ) @@ -246,297 +599,6 @@ Deregister the current buffer. Any running jobs will be killed, cached results will be deleted, and no more queries will be run. -API USAGE *VectorCode-neovim-plugin-api-usage* - -This plugin provides 2 sets of APIs that provides similar functionalities. The -synchronous APIs provide more up-to-date retrieval results at the cost of -blocking the main neovim UI, while the async APIs use a caching mechanism to -provide asynchronous retrieval results almost instantaneously, but the result -may be slightly out-of-date. For some tasks like chat, the main UI being -blocked/frozen doesn’t hurt much because you spend the time waiting for -response anyway, and you can use the synchronous API in this case. For other -tasks like completion, the async API will minimise the interruption to your -workflow. - - -SYNCHRONOUS API ~ - - -QUERY(QUERY_MESSAGE, OPTS?, CALLBACK?) - -This function queries VectorCode and returns an array of results. - ->lua - require("vectorcode").query("some query message", { - n_query = 5, - }) -< - -- `query_message`string or a list of strings, the query messages; -- `opts`The following are the available options for this function (see |VectorCode-`setup(opts?)`| for details): - ->lua - { - exclude_this = true, - n_query = 1, - notify = true, - timeout_ms = 5000, - } -< - -- `callback`a callback function that takes the result of the retrieval as the - only parameter. 
If this is set, the `query` function will be non-blocking and - runs in an async manner. In this case, it doesn’t return any value and - retrieval results can only be accessed by this callback function. - -The return value of this function is an array of results in the format of -`{path="path/to/your/code.lua", document="document content"}`. - -For example, in cmp-ai , you can add the -path/document content to the prompt like this: - ->lua - prompt = function(prefix, suffix) - local retrieval_results = require("vectorcode").query("some query message", { - n_query = 5, - }) - for _, source in pairs(retrieval_results) do - -- This works for qwen2.5-coder. - file_context = file_context - .. "<|file_sep|>" - .. source.path - .. "\n" - .. source.document - .. "\n" - end - return file_context - .. "<|fim_prefix|>" - .. prefix - .. "<|fim_suffix|>" - .. suffix - .. "<|fim_middle|>" - end -< - -Keep in mind that this `query` function call will be synchronous and therefore -block the neovim UI. This is where the async cache comes in. - - -CHECK(CHECK_ITEM?) - -This function checks if VectorCode has been configured properly for your -project. See the CLI manual for details <./cli.md>. - ->lua - require("vectorcode").check() -< - -The following are the available options for this function: - `check_item`Only -supports `"config"` at the moment. Checks if a project-local config is present. -Return value: `true` if passed, `false` if failed. - -This involves the `check` command of the CLI that checks the status of the -VectorCode project setup. Use this as a pre-condition of any subsequent use of -other VectorCode APIs that may be more expensive (if this fails, VectorCode -hasn’t been properly set up for the project, and you should not use -VectorCode APIs). - -The use of this API is entirely optional. You can totally ignore this and call -`query` anyway, but if `check` fails, you might be spending the waiting time -for nothing. - - -UPDATE(PROJECT_ROOT?) 
- -This function calls `vectorcode update` at the current working directory. -`--project_root` will be added if the `project_root` parameter is not `nil`. -This runs async and doesn’t block the main UI. - ->lua - require("vectorcode").update() -< - - -CACHED ASYNCHRONOUS API ~ - -The async cache mechanism helps mitigate the issue where the `query` API may -take too long and block the main thread. The following are the functions -available through the `require("vectorcode.cacher")` module. - -From 0.4.0, the async cache module came with 2 backends that exposes the same -interface: - -1. The `default` backend which works exactly like the original implementation -used in previous versions; -2. The `lsp` based backend, which make use of the experimental `vectorcode-server` -implemented in version 0.4.0. If you want to customise the LSP executable or -any options supported by `vim.lsp.ClientConfig`, you can do so by using -`vim.lsp.config()`. This plugin will load the config associated with the name -`vectorcode_server`. You can override the default config (for example, the -path to the executable) by calling `vim.lsp.config('vectorcode_server', opts)`. - - ------------------------------------------------------------------------------- - Features default lsp - ---------- ------------------------------- ------------------------------------ - Pros Fully backward compatible with Less IO overhead for - minimal extra config required loading/unloading embedding models; - Progress reports. - - Cons Heavy IO overhead because the Requires vectorcode-server - embedding model and database - client need to be initialised - for every query. 
- ------------------------------------------------------------------------------- -You may choose which backend to use by setting the |VectorCode-`setup`| option -`async_backend`, and acquire the corresponding backend by the following API: - ->lua - local cacher_backend = require("vectorcode.config").get_cacher_backend() -< - -and you can use `cacher_backend` wherever you used to use -`require("vectorcode.cacher")`. For example, -`require("vectorcode.cacher").query_from_cache(0)` becomes -`require("vectorcode.config").get_cacher_backend().query_from_cache(0)`. In the -remaining section of this documentation, I’ll use `cacher_backend` to -represent either of the backends. Unless otherwise noticed, all the -asynchronous APIs work for both backends. - - -CACHER_BACKEND.REGISTER_BUFFER(BUFNR?, OPTS?) - -This function registers a buffer to be cached by VectorCode. - ->lua - cacher_backend.register_buffer(0, { - n_query = 1, - }) -< - -The following are the available options for this function: - `bufnr`buffer -number. Default: `0` (current buffer); - `opts`accepts a lua table with the -following keys: - `project_root`a string of the path that overrides the -detected project root. Default: `nil`. This is mostly intended to use with the -|VectorCode-user-command|, and you probably should not use this directly in -your config. **If you’re using the LSP backend and did not specify this -value, it will be automatically detected based on .vectorcode or .git. If this -fails, LSP backend will not work**; - `exclude_this`whether to exclude the file -you’re editing. Default: `true`; - `n_query`number of retrieved documents. -Default: `1`; - `debounce`debounce time in milliseconds. Default: `10`; - -`notify`whether to show notifications when a query is completed. Default: -`false`; - `query_cb``fun(bufnr: integer):string|string[]`, a callback function -that accepts the buffer ID and returns the query message(s). Default: -`require("vectorcode.utils").make_surrounding_lines_cb(-1)`. 
See -|VectorCode-this-section| for a list of built-in query callbacks; - -`events`list of autocommand events that triggers the query. Default: -`{"BufWritePost", "InsertEnter", "BufReadPost"}`; - `run_on_register`whether to -run the query when the buffer is registered. Default: `false`; - -`single_job`boolean. If this is set to `true`, there will only be one running -job for each buffer, and when a new job is triggered, the last-running job will -be cancelled. Default: `false`. - - -CACHER_BACKEND.QUERY_FROM_CACHE(BUFNR?) - -This function queries VectorCode from cache. - ->lua - local query_results = cacher_backend.query_from_cache(0, {notify=false}) -< - -The following are the available options for this function: - `bufnr`buffer -number. Default: current buffer; - `opts`accepts a lua table with the following -keys: - `notify`boolean, whether to show notifications when a query is -completed. Default: `false`; - -Return value: an array of results. Each item of the array is in the format of -`{path="path/to/your/code.lua", document="document content"}`. - - -CACHER_BACKEND.ASYNC_CHECK(CHECK_ITEM?, ON_SUCCESS?, ON_FAILURE?) - -This function checks if VectorCode has been configured properly for your -project. - ->lua - cacher_backend.async_check( - "config", - do_something(), -- on success - do_something_else() -- on failure - ) -< - -The following are the available options for this function: - `check_item`any -check that works with `vectorcode check` command. If not set, it defaults to -`"config"`; - `on_success`a callback function that is called when the check -passes; - `on_failure`a callback function that is called when the check fails. - - -CACHER_BACKEND.BUF_IS_REGISTERED(BUFNR?) - -This function checks if a buffer has been registered with VectorCode. - -The following are the available options for this function: - `bufnr`buffer -number. Default: current buffer. Return value: `true` if registered, `false` -otherwise. - - -CACHER_BACKEND.BUF_IS_ENABLED(BUFNR?) 
- -This function checks if a buffer has been enabled with VectorCode. It is -slightly different from `buf_is_registered`, because it does not guarantee -VectorCode is actively caching the content of the buffer. It is the same as -`buf_is_registered && not is_paused`. - -The following are the available options for this function: - `bufnr`buffer -number. Default: current buffer. Return value: `true` if enabled, `false` -otherwise. - - -CACHER_BACKEND.BUF_JOB_COUNT(BUFNR?) - -Returns the number of running jobs in the background. - - -CACHER_BACKEND.MAKE_PROMPT_COMPONENT(BUFNR?, COMPONENT_CB?) - -Compile the retrieval results into a string. Parameters: - `bufnr`buffer -number. Default: current buffer; - `component_cb`a callback function that -formats each retrieval result, so that you can customise the control token, -etc. for the component. The default is the following: - ->lua - function(result) - return "<|file_sep|>" .. result.path .. "\n" .. result.document - end -< - -`make_prompt_component` returns a table with 2 keys: - `count`number of -retrieved documents; - `content`The retrieval results concatenated together -into a string. Each result is formatted by `component_cb`. - - -BUILT-IN QUERY CALLBACKS - -When using async cache, the query message is constructed by a function that -takes the buffer ID as the only parameter, and return a string or a list of -strings. The `vectorcode.utils` module provides the following callback -constructor for you to play around with it, but you can easily build your own! - -- `require("vectorcode.utils").make_surrounding_lines_cb(line_count)`returns a - callback that uses `line_count` lines around the cursor as the query. When - `line_count` is negative, it uses the full buffer; -- `require("vectorcode.utils").make_lsp_document_symbol_cb()`returns a - callback which uses the `textDocument/documentSymbol` method to retrieve a - list of symbols in the current document. 
This will fallback to - `make_surrounding_lines_cb(-1)` when there’s no LSP that supports the - `documentSymbol` method; -- `require("vectorcode.utils").make_changes_cb(max_num)`returns a callback - that fetches `max_num` unique items from the `:changes` list. This will also - fallback to `make_surrounding_lines_cb(-1)`. The default value for `max_num` - is 50. - - DEBUGGING AND LOGGING *VectorCode-neovim-plugin-debugging-and-logging* You can enable logging by setting `VECTORCODE_NVIM_LOG_LEVEL` environment @@ -549,6 +611,7 @@ Linux, this is usually `~/.local/state/nvim/`. 2. Links *VectorCode-links* 1. *@sarahec*: +2. *asciicast*: https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD.svg Generated by panvimdoc diff --git a/docs/cli.md b/docs/cli.md index f16f5034..e9979c1c 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -125,6 +125,10 @@ A community-maintained Nix package is available If you're using nix to install a standalone Chromadb server, make sure to stick to [0.6.3](https://github.com/NixOS/nixpkgs/pull/412528). +If you install via Nix and run into an issue, please try to reproduce with the +PyPi package (install via `uv` or `pipx`). If it's not reproducible on the +non-nix package, I may close the issue immediately. + ## Getting Started `cd` into your project root repo, and run: @@ -305,7 +309,12 @@ The JSON configuration file may hold the following values: `CrossEncoderReranker` (default, using [sentence-transformers cross-encoder](https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html) ) and `NaiveReranker` (sort chunks by the "distance" between the embedding - vectors); + vectors). + Note: If you're using a good embedding model (eg. a hosted service from OpenAI, or + a LLM-based embedding model like + [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B)), you + may get better results if you use `NaiveReranker` here because a good embedding + model may understand texts better than a mediocre reranking model. 
- `reranker_params`: dictionary, similar to `embedding_params`. The options passed to the reranker class constructor. For `CrossEncoderReranker`, these are the options passed to the diff --git a/docs/neovim/README.md b/docs/neovim/README.md new file mode 100644 index 00000000..2a10b528 --- /dev/null +++ b/docs/neovim/README.md @@ -0,0 +1,564 @@ +# NeoVim Plugin +> [!NOTE] +> This plugin depends on the CLI tool. Please go through +> [the CLI documentation](../cli/README.md) and make sure the VectorCode CLI is working +> before proceeding. + +> [!NOTE] +> When the neovim plugin doesn't work properly, please try upgrading both the CLI +> and the neovim plugin to the latest version before opening an issue. + + + + +* [Installation](#installation) + * [Mason.nvim ](#masonnvim-) + * [Nix](#nix) + * [Lazy Loading](#lazy-loading) +* [Integrations](#integrations) + * [milanglacier/minuet-ai.nvim](#milanglacierminuet-ainvim) + * [olimorris/codecompanion.nvim](#olimorriscodecompanionnvim) + * [CopilotC-Nvim/CopilotChat.nvim](#copilotc-nvimcopilotchatnvim) + * [Setup](#setup) + * [Configuration Options](#configuration-options) + * [Usage Tips](#usage-tips) + * [Performance Optimization](#performance-optimization) + * [Using with Sticky Prompts](#using-with-sticky-prompts) + * [Status Line Component](#status-line-component) + * [nvim-lualine/lualine.nvim](#nvim-lualinelualinenvim) + * [heirline.nvim](#heirlinenvim) + * [fidget.nvim](#fidgetnvim) + * [Model Context Protocol (MCP)](#model-context-protocol-mcp) +* [Configuration](#configuration) + * [`setup(opts?)`](#setupopts) +* [User Command](#user-command) + * [`VectorCode register`](#vectorcode-register) + * [`VectorCode deregister`](#vectorcode-deregister) +* [Debugging and Logging](#debugging-and-logging) + + + +## Installation +Using Lazy: + +```lua +{ + "Davidyz/VectorCode", + version = "*", -- optional, depending on whether you're on nightly or release + dependencies = { "nvim-lua/plenary.nvim" }, + cmd = "VectorCode", 
-- if you're lazy-loading VectorCode
+}
+```
+The VectorCode CLI and neovim plugin share the same release scheme (version
+numbers). In other words, CLI 0.1.3 is guaranteed to work with neovim plugin
+0.1.3, but if you use CLI 0.1.0 with neovim plugin 0.1.3, they may not work
+together because the neovim plugin is built for a newer CLI release and depends
+on newer features/breaking changes.
+
+To ensure maximum compatibility, please either:
+1. Use a release build of the VectorCode CLI and pin the neovim plugin to the
+   corresponding release;
+
+**OR**
+
+2. Use the latest commit for the neovim plugin with VectorCode installed from
+   the latest GitHub commit.
+
+It may be helpful to use a `build` hook to automatically upgrade the CLI when
+the neovim plugin updates. For example, if you're using lazy.nvim and `uv`,
+you can use the following plugin spec:
+
+```lua
+{
+  "Davidyz/VectorCode",
+  version = "*",
+  build = "uv tool upgrade vectorcode", -- This helps keep the CLI up-to-date
+  -- build = "pipx upgrade vectorcode", -- If you used pipx to install the CLI
+  dependencies = { "nvim-lua/plenary.nvim" },
+}
+```
+
+> This plugin is developed and tested on neovim _v0.11_. It may work on older
+> versions, but I do not test on them before publishing.
+
+### Mason.nvim
+
+The VectorCode CLI and LSP server are available in `mason.nvim`. If you choose to
+install the CLI through mason, you may need to pay extra attention to version
+pinning, because package updates on mason usually take extra time.
+
+### Nix
+
+There's a community-maintained [nix package](https://nixpk.gs/pr-tracker.html?pr=413395)
+submitted by [@sarahec](https://github.com/sarahec) for the Neovim plugin.
+
+### Lazy Loading
+When you call VectorCode APIs or integration interfaces as a part of another
+plugin's configuration, it's important to make sure that VectorCode is loaded
+BEFORE the plugin you're trying to use.
+
+For example, in [lazy.nvim](https://github.com/folke/lazy.nvim), it's not
+sufficient to simply add VectorCode as a dependency. You'd also need to wrap the
+`opts` table in a function:
+```lua
+{
+  "olimorris/codecompanion.nvim",
+  opts = function()
+    return your_opts_here
+  end
+}
+```
+If you pass a table, instead of a function, as the value for the `opts` key,
+neovim will try to load the VectorCode components immediately on startup
+(potentially even before the plugin is added to the
+[`rtp`](https://neovim.io/doc/user/options.html#'runtimepath')) and will cause
+some errors.
+
+## Integrations
+
+VectorCode is a _library_ plugin that needs to be paired with an AI plugin to
+assist your workflow. The core APIs are documented in the [API references](./api_references.md).
+For some plugins, we provide built-in support that simplifies the integration.
+Read the relevant section below for the specific plugin that you want to use
+VectorCode with.
+
+If, unfortunately, your AI plugin of choice is not listed here, you can either
+use the APIs listed in the [API references](./api_references.md) to build your
+own integration interface, or open an issue (either in this repo or in the AI
+plugin's repo) to request support.
+
+Currently supported plugins:
+- [milanglacier/minuet-ai.nvim](https://github.com/milanglacier/minuet-ai.nvim);
+- [olimorris/codecompanion.nvim](https://github.com/olimorris/codecompanion.nvim);
+- [CopilotC-Nvim/CopilotChat.nvim](https://github.com/CopilotC-Nvim/CopilotChat.nvim);
+- [ravitemer/mcphub.nvim](https://github.com/ravitemer/mcphub.nvim);
+- [nvim-lualine/lualine.nvim](https://github.com/nvim-lualine/lualine.nvim);
+- [rebelot/heirline.nvim](https://github.com/rebelot/heirline.nvim).
+
+### [milanglacier/minuet-ai.nvim](https://github.com/milanglacier/minuet-ai.nvim)
+
+You can use the [async caching API](./api_references.md#cached-asynchronous-api)
+to include query results in the prompt.
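+
+As a rough sketch (the function name below is illustrative, not part of the
+plugin), the cached results can be stitched into a prompt using the documented
+cached async API:
+
+```lua
+-- Minimal sketch: assumes the buffer has been registered with the async
+-- cache (e.g. via `:VectorCode register`). `query_from_cache` returns
+-- entries of the form { path = "...", document = "..." }.
+local cacher = require("vectorcode.config").get_cacher_backend()
+
+local function vectorcode_context(bufnr)
+  bufnr = bufnr or 0
+  if not cacher.buf_is_registered(bufnr) then
+    return "" -- nothing cached for this buffer; fall back to a plain prompt
+  end
+  local context = ""
+  for _, result in ipairs(cacher.query_from_cache(bufnr)) do
+    context = context
+      .. "<|file_sep|>"
+      .. result.path
+      .. "\n"
+      .. result.document
+      .. "\n"
+  end
+  return context
+end
+```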
+ +See +[minuet-ai documentation](https://github.com/milanglacier/minuet-ai.nvim/blob/main/recipes.md#integration-with-vectorcode) +and +[Prompt Gallery](https://github.com/Davidyz/VectorCode/wiki/Prompt-Gallery) for +instructions to modify the prompts to use VectorCode context for completion. + +To control the number of results to be included in the prompt and some other +behaviour, you can either set the opts when calling the `register_buffer` function, +or change the value of `async_opts.n_query` in the `setup` function +(see [configuration](#configuration)). + +### [olimorris/codecompanion.nvim](https://github.com/olimorris/codecompanion.nvim) + +[![asciicast](https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD.svg)](https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD?t=3) + +The following requires VectorCode 0.7+ and a recent version of CodeCompanion.nvim. + +The CodeCompanion extension will register the following tools: +- `@{vectorcode_ls}`: an equivalent of `vectorcode ls` command that shows the + indexed projects on your system; +- `@{vectorcode_query}`: an equivalent of `vectorcode query` command that + searches from a project; +- `@{vectorcode_vectorise}`: an equivalent of `vectorcode vectorise` command + that adds files to the database; +- `@{vectorcode_files_ls}`: an equivalent of `vectorcode files ls` command that + gives a list of indexed files in a project; +- `@{vectorcode_files_rm}`: an equivalent of `vectorcode files rm` command that + removes files from a collection. + +By default, it'll also create a tool group called `@{vectorcode_toolbox}`, which +contains the `vectorcode_ls`, `vectorcode_query` and `vectorcode_vectorise` +tools. You can customise the members of this toolbox by the `include_in_toolbox` +option explained below. 
+
+```lua
+---@module "vectorcode"
+opts = {
+  extensions = {
+    vectorcode = {
+      ---@type VectorCode.CodeCompanion.ExtensionOpts
+      opts = {
+        tool_group = {
+          -- this will register a tool group called `@vectorcode_toolbox` that contains all 3 tools
+          enabled = true,
+          -- a list of extra tools that you want to include in `@vectorcode_toolbox`.
+          -- if you use @vectorcode_vectorise, it'll be very handy to include
+          -- `file_search` here.
+          extras = {},
+          collapse = false, -- whether the individual tools should be shown in the chat
+        },
+        tool_opts = {
+          ---@type VectorCode.CodeCompanion.ToolOpts
+          ["*"] = {},
+          ---@type VectorCode.CodeCompanion.LsToolOpts
+          ls = {},
+          ---@type VectorCode.CodeCompanion.VectoriseToolOpts
+          vectorise = {},
+          ---@type VectorCode.CodeCompanion.QueryToolOpts
+          query = {
+            max_num = { chunk = -1, document = -1 },
+            default_num = { chunk = 50, document = 10 },
+            include_stderr = false,
+            use_lsp = false,
+            no_duplicate = true,
+            chunk_mode = false,
+            ---@type VectorCode.CodeCompanion.SummariseOpts
+            summarise = {
+              ---@type boolean|(fun(chat: CodeCompanion.Chat, results: VectorCode.QueryResult[]):boolean)|nil
+              enabled = false,
+              adapter = nil,
+              query_augmented = true,
+            }
+          },
+          files_ls = {},
+          files_rm = {}
+        }
+      },
+    },
+  }
+}
+```
+
+The following are the common options that all tools support:
+
+- `use_lsp`: whether to use the LSP backend to run the queries. Using LSP
+  provides some insignificant performance boost and a nice notification pop-up
+  if you're using [fidget.nvim](https://github.com/j-hui/fidget.nvim). Default:
+  `true` if `async_backend` is set to `"lsp"` in `setup()`. Otherwise, it'll be
+  `false`;
+- `requires_approval`: whether CodeCompanion.nvim asks for your approval before
+  executing the tool call. Default: `false` for `ls` and `query`; `true` for
+  `vectorise`;
+- `include_in_toolbox`: whether this tool should be included in
+  `vectorcode_toolbox`.
Default: `true` for `query`, `vectorise` and `ls`,
+  `false` for `files_*`.
+
+In the `tool_opts` table, you may either configure these common options
+individually, or use the `["*"]` key to specify the default settings for all
+tools. If you've set both the default settings (via `["*"]`) and the individual
+settings for a tool, the individual settings take precedence.
+
+The `query` tool contains the following extra config options:
+- `chunk_mode`: boolean, whether the VectorCode backend should return chunks or
+  full documents. Default: `false`;
+- `max_num` and `default_num`: If they're set to integers, they represent the
+  default and maximum allowed number of results returned by VectorCode
+  (regardless of document or chunk mode). They can also be set to tables with 2
+  keys: `document` and `chunk`. In this case, their values would be used for the
+  corresponding mode. You may ask the LLM to request a different number of
+  chunks or documents, but they'll be capped by the values in `max_num`.
+  Default: See the sample snippet above. Negative values for `max_num` mean
+  unlimited.
+- `no_duplicate`: boolean, whether the query calls should attempt to exclude files
+  that have been retrieved and provided in the previous turns of the current chat.
+  This helps save tokens and increases the chance of retrieving the correct files
+  when the previous retrievals fail to do so. Default: `true`.
+- `summarise`: optional summarisation for the retrieval results. This is a table
+  with the following keys:
+  - `enabled`: This can either be a boolean that toggles summarisation on/off
+    completely, or a function that accepts the `CodeCompanion.Chat` object and
+    the raw query results as the 2 parameters and returns a boolean. When it's
+    the latter, it'll be evaluated for every tool call. This allows you to write
+    some custom logic to dynamically turn summarisation on and off.
_When the
+    summarisation is enabled, but you find the summaries not informative enough,
+    you can tell the LLM to disable the summarisation during the chat so that it
+    sees the raw information_;
+  - `adapter`: See [CodeCompanion documentation](https://codecompanion.olimorris.dev/configuration/adapters.html#configuring-adapters).
+    When not provided, it'll use the chat adapter;
+  - `system_prompt`: When set to a string, this will be used as the system
+    prompt for the summarisation model. When set to a function, it'll be called
+    with the default system prompt as the only parameter, and it should return
+    a string that will be used as a system prompt. This allows you to
+    append/prepend things to the default system prompt;
+  - `query_augmented`: boolean, whether the system prompt should contain the
+    query so that when the LLM decides what information to include, it _may_ be
+    able to avoid omitting stuff related to the query.
+
+### [CopilotC-Nvim/CopilotChat.nvim](https://github.com/CopilotC-Nvim/CopilotChat.nvim)
+
+[CopilotC-Nvim/CopilotChat.nvim](https://github.com/CopilotC-Nvim/CopilotChat.nvim)
+is a Neovim plugin that provides an interface to GitHub Copilot Chat. VectorCode
+integration enriches the conversations by providing relevant repository context.
+
+#### Setup
+
+VectorCode offers a dedicated integration with CopilotChat.nvim that provides
+contextual information about your codebase to enhance Copilot's responses. Add this
+to your CopilotChat configuration:
+
+```lua
+local vectorcode_ctx = require('vectorcode.integrations.copilotchat').make_context_provider({
+  prompt_header = "Here are relevant files from the repository:", -- Customize header text
+  prompt_footer = "\nConsider this context when answering:", -- Customize footer text
+  skip_empty = true, -- Skip adding context when no files are retrieved
+})
+
+require('CopilotChat').setup({
+  -- Your other CopilotChat options...
+ + contexts = { + -- Add the VectorCode context provider + vectorcode = vectorcode_ctx, + }, + + -- Enable VectorCode context in your prompts + prompts = { + Explain = { + prompt = "Explain the following code in detail:\n$input", + context = {"selection", "vectorcode"}, -- Add vectorcode to the context + }, + -- Other prompts... + } +}) +``` + +#### Configuration Options + +The `make_context_provider` function accepts these options: + +- `prompt_header`: Text that appears before the code context (default: "The following are relevant files from the repository. Use them as extra context for helping with code completion and understanding:") +- `prompt_footer`: Text that appears after the code context (default: "\nExplain and provide a strategy with examples about: \n") +- `skip_empty`: Whether to skip adding context when no files are retrieved (default: true) +- `format_file`: A function that formats each retrieved file (takes a file result object and returns formatted string) + +#### Usage Tips + +1. Register your buffers with VectorCode (`:VectorCode register`) to enable context fetching +2. Create different prompt templates with or without VectorCode context depending on your needs +3. For large codebases, consider adjusting the number of retrieved documents using `n_query` when registering buffers + +#### Performance Optimization + +The integration includes caching to avoid sending duplicate context to the LLM, which helps reduce token usage when asking multiple questions about the same codebase. + +#### Using with Sticky Prompts + +You can configure VectorCode to be part of your sticky prompts, ensuring every conversation includes relevant codebase context automatically: + +```lua +require('CopilotChat').setup({ + -- Your other CopilotChat options... 
+
+  sticky = {
+    "Using the model $claude-3.7-sonnet-thought",
+    "#vectorcode", -- Automatically includes repository context in every conversation
+  },
+})
+```
+
+This configuration will include both the model specification and repository context in every conversation with CopilotChat.
+
+---
+### Status Line Component
+
+#### [nvim-lualine/lualine.nvim](https://github.com/nvim-lualine/lualine.nvim)
+A `lualine` component that shows the status of the async job and the number of
+cached retrieval results.
+```lua
+tabline = {
+  lualine_y = {
+    require("vectorcode.integrations").lualine(opts)
+  }
+}
+```
+`opts` is a table with the following configuration option:
+- `show_job_count`: boolean, whether to show the number of running jobs for the
+  buffer. Default: `false`.
+
+This will, however, start VectorCode when lualine starts (which usually means
+when neovim starts). If this bothers you, you can use the following
+snippet:
+```lua
+tabline = {
+  lualine_y = {
+    {
+      function()
+        return require("vectorcode.integrations").lualine(opts)[1]()
+      end,
+      cond = function()
+        if package.loaded["vectorcode"] == nil then
+          return false
+        else
+          return require("vectorcode.integrations").lualine(opts).cond()
+        end
+      end,
+    },
+  }
+}
+```
+This further delays the loading of VectorCode until the moment you (or one of
+your plugins that actually retrieves context from VectorCode) load VectorCode.
+
+#### [heirline.nvim](https://github.com/rebelot/heirline.nvim)
+
+A heirline component is available as:
+```lua
+local vectorcode_component = require("vectorcode.integrations").heirline({
+  show_job_count = true,
+  component_opts = {
+    -- put other fields of the component here.
+    -- they'll be merged into the final component.
+  },
+})
+```
+
+### [fidget.nvim](https://github.com/j-hui/fidget.nvim)
+
+If you're using
+[an LSP backend](https://github.com/Davidyz/VectorCode/blob/main/docs/cli.md#lsp-mode),
+there will be a notification when there's a pending request for queries.
As long
+as the LSP backend is working, no special configuration is needed for this.
+
+### Model Context Protocol (MCP)
+
+The Python package contains an optional `mcp` dependency group. After installing
+this, you can use the MCP server with any MCP client. For example, to use it
+with [mcphub.nvim](https://github.com/ravitemer/mcphub.nvim), simply add this
+server in the JSON config:
+```json
+{
+  "mcpServers": {
+    "vectorcode-mcp-server": {
+      "command": "vectorcode-mcp-server",
+      "args": []
+    }
+  }
+}
+```
+
+## Configuration
+
+### `setup(opts?)`
+
+This function controls the behaviour of some of the APIs provided by the
+VectorCode neovim plugin. If you're using built-in integration interfaces, you
+usually don't have to worry about this section, unless otherwise specified in
+the relevant section.
+
+This function initialises the VectorCode client and sets up some defaults:
+
+```lua
+-- Default configuration
+require("vectorcode").setup(
+  ---@type VectorCode.Opts
+  {
+    cli_cmds = {
+      vectorcode = "vectorcode",
+    },
+    ---@type VectorCode.RegisterOpts
+    async_opts = {
+      debounce = 10,
+      events = { "BufWritePost", "InsertEnter", "BufReadPost" },
+      exclude_this = true,
+      n_query = 1,
+      notify = false,
+      query_cb = require("vectorcode.utils").make_surrounding_lines_cb(-1),
+      run_on_register = false,
+    },
+    async_backend = "default", -- or "lsp"
+    exclude_this = true,
+    n_query = 1,
+    notify = true,
+    timeout_ms = 5000,
+    on_setup = {
+      update = false, -- set to true to enable update when `setup` is called.
+      lsp = false,
+    },
+    sync_log_env_var = false,
+  }
+)
+```
+
+The following are the available options for the parameter of this function:
+- `cli_cmds`: A table to customize the CLI command names / paths used by the plugin.
+  Supported key:
+  - `vectorcode`: The command / path to use for the main CLI tool. Default: `"vectorcode"`.
+- `n_query`: number of retrieved documents.
A large number gives a higher chance
+ of including the right file, but with the risk of saturating the context
+ window and getting truncated. Default: `1`;
+- `notify`: whether to show notifications when a query is completed.
+  Default: `true`;
+- `timeout_ms`: timeout in milliseconds for the query operation. Applies to
+  synchronous API only. Default:
+  `5000` (5 seconds);
+- `exclude_this`: whether to exclude the file you're editing. Setting this to
+  `false` may lead to an outdated version of the current file being sent to the
+  LLM as the prompt, and can lead to generations with outdated information;
+- `async_opts`: default options used when registering buffers. See
+  [`register_buffer(bufnr?, opts?)`](#register_bufferbufnr-opts) for details;
+- `async_backend`: the async backend to use, currently either `"default"` or
+  `"lsp"`. Default: `"default"`;
+- `on_setup`: some actions that can be registered to run when `setup` is called.
+  Supported keys:
+  - `update`: if `true`, the plugin will run `vectorcode update` on startup to
+    update the embeddings;
+  - `lsp`: if `true`, the plugin will try to start the LSP server on startup so
+    that you won't need to wait for the server loading when making your first
+    request. _Please pay extra attention to lazy-loading so that the LSP server
+    won't be started without a buffer to attach to (see [here](https://github.com/Davidyz/VectorCode/pull/234))._
+- `sync_log_env_var`: `boolean`. If true, this plugin will automatically set the
+  `VECTORCODE_LOG_LEVEL` environment variable for LSP or cmd processes started
+  within your neovim session when logging is turned on for this plugin. Use with
+  caution because the non-LSP CLI writes all logs to stderr, which _may_ make this plugin
+  VERY verbose. See [Debugging and Logging](#debugging-and-logging) for details
+  on how to turn on logging.
+
+You may notice that a lot of options in `async_opts` are the same as the other
+options in the top-level of the main option table.
This is because the top-level
+options are designated for the [Synchronous API](#synchronous-api) and the ones
+in `async_opts` are for the [Cached Asynchronous API](#cached-asynchronous-api).
+The `async_opts` will reuse the synchronous API options if not explicitly
+configured.
+
+## User Command
+
+The neovim plugin provides user commands to work with [async caching](#cached-asynchronous-api).
+
+### `VectorCode register`
+
+Register the current buffer for async caching. It's possible to register the
+current buffer to a different vectorcode project by passing the `project_root`
+parameter:
+```
+:VectorCode register project_root=path/to/another/project/
+```
+This is useful if you're working on a project that is closely related to a
+different project, for example a utility repository for a main library or a
+documentation repository. Alternatively, you can call the [lua API](#cached-asynchronous-api) in an autocmd:
+```lua
+vim.api.nvim_create_autocmd("LspAttach", {
+  callback = function()
+    local bufnr = vim.api.nvim_get_current_buf()
+    cacher.async_check("config", function()
+      cacher.register_buffer(
+        bufnr,
+        {
+          n_query = 10,
+        }
+      )
+    end, nil)
+  end,
+  desc = "Register buffer for VectorCode",
+})
+```
+The latter avoids manual registrations, but registering too many buffers
+means there will be a lot of background processes/requests being sent to
+VectorCode. Choose between these based on your workflow and the capability of your
+system.
+
+### `VectorCode deregister`
+
+Deregister the current buffer. Any running jobs will be killed, cached results
+will be deleted, and no more queries will be run.
+
+
+## Debugging and Logging
+
+You can enable logging by setting the `VECTORCODE_NVIM_LOG_LEVEL` environment
+variable to a
+[supported log level](https://github.com/nvim-lua/plenary.nvim/blob/857c5ac632080dba10aae49dba902ce3abf91b35/lua/plenary/log.lua#L44).
+The log file will be written to `stdpath("log")` or `stdpath("cache")`.
On +Linux, this is usually `~/.local/state/nvim/`. diff --git a/docs/neovim.md b/docs/neovim/api_references.md similarity index 52% rename from docs/neovim.md rename to docs/neovim/api_references.md index eaa3209d..5865b07b 100644 --- a/docs/neovim.md +++ b/docs/neovim/api_references.md @@ -1,241 +1,46 @@ -# NeoVim Plugin -> [!NOTE] -> This plugin depends on the CLI tool. Please go through -> [the CLI documentation](./cli.md) and make sure the VectorCode CLI is working -> before proceeding. +# Lua API References -> [!NOTE] -> When the neovim plugin doesn't work properly, please try upgrading both the CLI -> and the neovim plugin to the latest version before opening an issue. - - - - -* [Installation](#installation) - * [Mason.nvim ](#masonnvim-) - * [Nix](#nix) -* [Integrations](#integrations) -* [Configuration](#configuration) - * [`setup(opts?)`](#setupopts) -* [User Command](#user-command) - * [`VectorCode register`](#vectorcode-register) - * [`VectorCode deregister`](#vectorcode-deregister) -* [API Usage](#api-usage) - * [Synchronous API](#synchronous-api) - * [`query(query_message, opts?, callback?)`](#queryquery_message-opts-callback) - * [`check(check_item?)`](#checkcheck_item) - * [`update(project_root?)`](#updateproject_root) - * [Cached Asynchronous API](#cached-asynchronous-api) - * [`cacher_backend.register_buffer(bufnr?, opts?)`](#cacher_backendregister_bufferbufnr-opts) - * [`cacher_backend.query_from_cache(bufnr?)`](#cacher_backendquery_from_cachebufnr) - * [`cacher_backend.async_check(check_item?, on_success?, on_failure?)`](#cacher_backendasync_checkcheck_item-on_success-on_failure) - * [`cacher_backend.buf_is_registered(bufnr?)`](#cacher_backendbuf_is_registeredbufnr) - * [`cacher_backend.buf_is_enabled(bufnr?)`](#cacher_backendbuf_is_enabledbufnr) - * [`cacher_backend.buf_job_count(bufnr?)`](#cacher_backendbuf_job_countbufnr) - * [`cacher_backend.make_prompt_component(bufnr?, 
component_cb?)`](#cacher_backendmake_prompt_componentbufnr-component_cb) - * [Built-in Query Callbacks](#built-in-query-callbacks) -* [Debugging and Logging](#debugging-and-logging) - - - -## Installation -Using Lazy: - -```lua -{ - "Davidyz/VectorCode", - version = "*", -- optional, depending on whether you're on nightly or release - dependencies = { "nvim-lua/plenary.nvim" }, - cmd = "VectorCode", -- if you're lazy-loading VectorCode -} -``` -The VectorCode CLI and neovim plugin share the same release scheme (version -numbers). In other words, CLI 0.1.3 is guaranteed to work with neovim plugin -0.1.3, but if you use CLI 0.1.0 with neovim plugin 0.1.3, they may not work -together because the neovim plugin is built for a newer CLI release and depends -on newer features/breaking changes. - -To ensure maximum compatibility, please either: -1. Use release build for VectorCode CLI and pin to the releases for the - neovim plugin; - -**OR** - -2. Use the latest commit for the neovim plugin with VectorCode installed from - the latest GitHub commit. - -It may be helpful to use a `build` hook to automatically upgrade the CLI when -the neovim plugin updates. For example, if you're using lazy.nvim and `uv`, -you can use the following plugin spec: - -```lua -{ - "Davidyz/VectorCode", - version = "*", - build = "uv tool upgrade vectorcode", -- This helps keeping the CLI up-to-date - -- build = "pipx upgrade vectorcode", -- If you used pipx to install the CLI - dependencies = { "nvim-lua/plenary.nvim" }, -} -``` - -> This plugin is developed and tested on neovim _v0.11_. It may work on older -> versions, but I do not test on them before publishing. - -### Mason.nvim - -The VectorCode CLI and LSP server are available in `mason.nvim`. If you choose to -install the CLI through mason, you may need to pay extra attention to the version -pinning because the package updates on mason usually takes extra time. 
- -### Nix - -There's a community-maintained [nix package](https://nixpk.gs/pr-tracker.html?pr=413395) -submitted by [@sarahec](https://github.com/sarahec) for the Neovim plugin. - -## Integrations - -[The wiki](https://github.com/Davidyz/VectorCode/wiki/Neovim-Integrations) -contains instructions to integrate VectorCode with the following plugins: - -- [milanglacier/minuet-ai.nvim](https://github.com/milanglacier/minuet-ai.nvim); -- [olimorris/codecompanion.nvim](https://github.com/olimorris/codecompanion.nvim); -- [nvim-lualine/lualine.nvim](https://github.com/nvim-lualine/lualine.nvim); -- [CopilotC-Nvim/CopilotChat.nvim](https://github.com/CopilotC-Nvim/CopilotChat.nvim); -- [ravitemer/mcphub.nvim](https://github.com/ravitemer/mcphub.nvim); -- [rebelot/heirline.nvim](https://github.com/rebelot/heirline.nvim). - -## Configuration - -### `setup(opts?)` -This function initialises the VectorCode client and sets up some default - -```lua --- Default configuration -require("vectorcode").setup( - ---@type VectorCode.Opts - { - cli_cmds = { - vectorcode = "vectorcode", - }, - ---@type VectorCode.RegisterOpts - async_opts = { - debounce = 10, - events = { "BufWritePost", "InsertEnter", "BufReadPost" }, - exclude_this = true, - n_query = 1, - notify = false, - query_cb = require("vectorcode.utils").make_surrounding_lines_cb(-1), - run_on_register = false, - }, - async_backend = "default", -- or "lsp" - exclude_this = true, - n_query = 1, - notify = true, - timeout_ms = 5000, - on_setup = { - update = false, -- set to true to enable update when `setup` is called. - lsp = false, - }, - sync_log_env_var = false, - } -) -``` - -The following are the available options for the parameter of this function: -- `cli_cmds`: A table to customize the CLI command names / paths used by the plugin. - Supported key: - - `vectorcode`: The command / path to use for the main CLI tool. Default: `"vectorcode"`. -- `n_query`: number of retrieved documents. 
A large number gives a higher chance - of including the right file, but with the risk of saturating the context - window and getting truncated. Default: `1`; -- `notify`: whether to show notifications when a query is completed. - Default: `true`; -- `timeout_ms`: timeout in milliseconds for the query operation. Applies to - synchronous API only. Default: - `5000` (5 seconds); -- `exclude_this`: whether to exclude the file you're editing. Setting this to - `false` may lead to an outdated version of the current file being sent to the - LLM as the prompt, and can lead to generations with outdated information; -- `async_opts`: default options used when registering buffers. See - [`register_buffer(bufnr?, opts?)`](#register_bufferbufnr-opts) for details; -- `async_backend`: the async backend to use, currently either `"default"` or - `"lsp"`. Default: `"default"`; -- `on_setup`: some actions that can be registered to run when `setup` is called. - Supported keys: - - `update`: if `true`, the plugin will run `vectorcode update` on startup to - update the embeddings; - - `lsp`: if `true`, the plugin will try to start the LSP server on startup so - that you won't need to wait for the server loading when making your first - request. _Please pay extra attention on lazy-loading so that the LSP server - won't be started without a buffer to be attached to (see [here](https://github.com/Davidyz/VectorCode/pull/234))._ -- `sync_log_env_var`: `boolean`. If true, this plugin will automatically set the - `VECTORCODE_LOG_LEVEL` environment variable for LSP or cmd processes started - within your neovim session when logging is turned on for this plugin. Use at - caution because the non-LSP CLI write all logs to stderr, which _may_ make this plugin - VERY verbose. See [Debugging and Logging](#debugging-and-logging) for details - on how to turn on logging. - -You may notice that a lot of options in `async_opts` are the same as the other -options in the top-level of the main option table. 
This is because the top-level
-options are designated for the [Synchronous API](#synchronous-api) and the ones
-in `async_opts` is for the [Cached Asynchronous API](#cached-asynchronous-api).
-The `async_opts` will reuse the synchronous API options if not explicitly
-configured.
-
-## User Command
-
-The neovim plugin provides user commands to work with [async caching](#cached-asynchronous-api).
-
-### `VectorCode register`
-
-Register the current buffer for async caching. It's possible to register the
-current buffer to a different vectorcode project by passing the `project_root`
-parameter:
-```
-:VectorCode register project_root=path/to/another/project/
-```
-This is useful if you're working on a project that is closely related to a
-different project, for example a utility repository for a main library or a
-documentation repository. Alternatively, you can call the [lua API](#cached-asynchronous-api) in an autocmd:
-```lua
-vim.api.nvim_create_autocmd("LspAttach", {
-  callback = function()
-    local bufnr = vim.api.nvim_get_current_buf()
-    cacher.async_check("config", function()
-      cacher.register_buffer(
-        bufnr,
-        {
-          n_query = 10,
-        }
-      )
-    end, nil)
-  end,
-  desc = "Register buffer for VectorCode",
-})
-```
-The latter avoids the manual registrations, but registering too many buffers
-means there will be a lot of background processes/requests being sent to
-VectorCode. Choose these based on your workflow and the capability of your
-system.
-
-### `VectorCode deregister`
-
-Deregister the current buffer. Any running jobs will be killed, cached results
-will be deleted, and no more queries will be run.
-
-## API Usage
-This plugin provides 2 sets of APIs that provides similar functionalities. The
+This plugin provides 2 sets of _high-level APIs_ that provide similar functionality.
The
synchronous APIs provide more up-to-date retrieval results at the cost of
blocking the main neovim UI, while the async APIs use a caching mechanism to
provide asynchronous retrieval results almost instantaneously, but the result
may be slightly out-of-date. For some tasks like chat, the main UI being
blocked/frozen doesn't hurt much because you spend the time waiting for
response anyway, and you can use the synchronous API in this case. For other
tasks like
-completion, the async API will minimise the interruption to your workflow.
+completion, the cached API will minimise the interruption to your workflow, but
+at the cost of less up-to-date results.
+These APIs are wrappers around the _lower-level
+[job runner API](https://github.com/Davidyz/VectorCode/tree/main/lua/vectorcode/jobrunner)_,
+which provides a unified interface for calling VectorCode commands that can be
+executed by either the LSP or the generic CLI backend. If the high-level APIs
+are sufficient for your use-case, it's usually not necessary to use the job
+runners directly.
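+
+As a minimal sketch of this trade-off (assuming the CLI is configured, the
+project has been indexed, and `cacher_backend` stands for either of the cached
+backends documented below):
+
+```lua
+-- Chat-style use: briefly blocking the UI is acceptable, so the
+-- synchronous API gives the freshest results.
+local results = require("vectorcode").query("some query message")
+
+-- Completion-style use: never block the UI; read from the async cache
+-- instead (the current buffer must have been registered beforehand).
+local cached = cacher_backend.query_from_cache(0)
+```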
-### Synchronous API -#### `query(query_message, opts?, callback?)` + + +* [Synchronous API](#synchronous-api) + * [`query(query_message, opts?, callback?)`](#queryquery_message-opts-callback) + * [`check(check_item?)`](#checkcheck_item) + * [`update(project_root?)`](#updateproject_root) +* [Cached Asynchronous API](#cached-asynchronous-api) + * [`cacher_backend.register_buffer(bufnr?, opts?)`](#cacher_backendregister_bufferbufnr-opts) + * [`cacher_backend.query_from_cache(bufnr?)`](#cacher_backendquery_from_cachebufnr) + * [`cacher_backend.async_check(check_item?, on_success?, on_failure?)`](#cacher_backendasync_checkcheck_item-on_success-on_failure) + * [`cacher_backend.buf_is_registered(bufnr?)`](#cacher_backendbuf_is_registeredbufnr) + * [`cacher_backend.buf_is_enabled(bufnr?)`](#cacher_backendbuf_is_enabledbufnr) + * [`cacher_backend.buf_job_count(bufnr?)`](#cacher_backendbuf_job_countbufnr) + * [`cacher_backend.make_prompt_component(bufnr?, component_cb?)`](#cacher_backendmake_prompt_componentbufnr-component_cb) + * [Built-in Query Callbacks](#built-in-query-callbacks) +* [JobRunners](#jobrunners) + * [`run_async(args, callback, bufnr)` and `run(args, timeout_ms, bufnr)`](#run_asyncargs-callback-bufnr-and-runargs-timeout_ms-bufnr) + * [`is_job_running(job_handle):boolean`](#is_job_runningjob_handleboolean) + * [`stop_job(job_handle)`](#stop_jobjob_handle) + + + +## Synchronous API +### `query(query_message, opts?, callback?)` This function queries VectorCode and returns an array of results. ```lua @@ -288,7 +93,7 @@ end Keep in mind that this `query` function call will be synchronous and therefore block the neovim UI. This is where the async cache comes in. -#### `check(check_item?)` +### `check(check_item?)` This function checks if VectorCode has been configured properly for your project. See the [CLI manual for details](./cli.md). ```lua @@ -310,7 +115,7 @@ The use of this API is entirely optional. 
You can totally ignore this and
call `query` anyway, but if `check` fails, you might be spending the waiting
time for nothing.

-#### `update(project_root?)`
+### `update(project_root?)`
This function calls `vectorcode update` in the current working directory.
`--project_root` will be added if the `project_root` parameter is not `nil`.
This runs async and doesn't block the main UI.
@@ -319,7 +124,7 @@
```lua
require("vectorcode").update()
```

-### Cached Asynchronous API
+## Cached Asynchronous API

The async cache mechanism helps mitigate the issue where the `query` API may
take too long and block the main thread. The following are the functions
@@ -355,7 +160,7 @@
In the remainder of this documentation, I'll use `cacher_backend` to
represent either of the backends. Unless otherwise noted, all the
asynchronous APIs work for both backends.

-#### `cacher_backend.register_buffer(bufnr?, opts?)`
+### `cacher_backend.register_buffer(bufnr?, opts?)`

This function registers a buffer to be cached by VectorCode.

```lua
@@ -389,7 +194,7 @@ The following are the available options for this function:
   cancelled. Default: `false`.


-#### `cacher_backend.query_from_cache(bufnr?)`
+### `cacher_backend.query_from_cache(bufnr?)`

This function queries VectorCode from cache.

```lua
@@ -405,7 +210,7 @@ The following are the available options for this function:

Return value: an array of results. Each item of the array is in the format of
`{path="path/to/your/code.lua", document="document content"}`.

-#### `cacher_backend.async_check(check_item?, on_success?, on_failure?)`
+### `cacher_backend.async_check(check_item?, on_success?, on_failure?)`

This function checks if VectorCode has been configured properly for your
project.
```lua
@@ -422,14 +227,14 @@ The following are the available options for this function:
- `on_success`: a callback function that is called when the check passes;
- `on_failure`: a callback function that is called when the check fails.

-#### `cacher_backend.buf_is_registered(bufnr?)`
+### `cacher_backend.buf_is_registered(bufnr?)`
This function checks if a buffer has been registered with VectorCode.

The following are the available options for this function:
- `bufnr`: buffer number. Default: current buffer.

Return value: `true` if registered, `false` otherwise.

-#### `cacher_backend.buf_is_enabled(bufnr?)`
+### `cacher_backend.buf_is_enabled(bufnr?)`
This function checks if a buffer has been enabled with VectorCode. It is slightly
different from `buf_is_registered`, because it does not guarantee
VectorCode is actively caching the content of the buffer. It is the same as
`buf_is_registered && not is_paused`.
@@ -438,10 +243,10 @@ The following are the available options for this function:
- `bufnr`: buffer number. Default: current buffer.

Return value: `true` if enabled, `false` otherwise.

-#### `cacher_backend.buf_job_count(bufnr?)`
+### `cacher_backend.buf_job_count(bufnr?)`
Returns the number of running jobs in the background.

-#### `cacher_backend.make_prompt_component(bufnr?, component_cb?)`
+### `cacher_backend.make_prompt_component(bufnr?, component_cb?)`
Compiles the retrieval results into a string.
Parameters:
- `bufnr`: buffer number. Default: current buffer;
@@ -459,7 +264,7 @@ end
- `content`: The retrieval results concatenated together into a string. Each
  result is formatted by `component_cb`.

-#### Built-in Query Callbacks
+### Built-in Query Callbacks

When using async cache, the query message is constructed by a function that
takes the buffer ID as the only parameter, and returns a string or a list of
@@ -479,10 +284,69 @@ constructor for you to play around with it, but you can easily build your own!
  fallback to `make_surrounding_lines_cb(-1)`.
The default value for `max_num`
is 50.

-## Debugging and Logging
-You can enable logging by setting `VECTORCODE_NVIM_LOG_LEVEL` environment
-variable to a
-[supported log level](https://github.com/nvim-lua/plenary.nvim/blob/857c5ac632080dba10aae49dba902ce3abf91b35/lua/plenary/log.lua#L44).
-The log file will be written to `stdpath("log")` or `stdpath("cache")`. On
-Linux, this is usually `~/.local/state/nvim/`.
+## JobRunners
+
+The `VectorCode.JobRunner` is an abstract class for vectorcode command
+execution. There are 2 concrete child classes that you can use:
+- `require("vectorcode.jobrunner.cmd")` uses the CLI (`vectorcode` commands) to
+  interact with the database;
+- `require("vectorcode.jobrunner.lsp")` uses the LSP server, which avoids some of
+  the IO overhead and provides LSP progress notifications.
+
+The available methods for a `VectorCode.JobRunner` object include:
+
+### `run_async(args, callback, bufnr)` and `run(args, timeout_ms, bufnr)`
+Calls a vectorcode command.
+
+The `args` parameter (of type `string[]`) is whatever comes after
+`vectorcode` when you run it in the CLI. For example, if you want to query for
+10 chunks in the shell, you'd run the following command:
+
+```bash
+vectorcode query -n 10 keyword1 keyword2 --include chunk
+```
+
+Then for the job runner (either LSP or cmd), the `args` parameter would be:
+```lua
+args = {"query", "-n", "10", "keyword1", "keyword2", "--include", "chunk"}
+```
+
+For the `run_async` method, the `callback` function has the
+following signature:
+```lua
+---@type fun(result: table, error: table, code:integer, signal: integer?)?
+```
+For the `run` method, the return value can be captured as follows:
+```lua
+res, err, _code, _signal = jobrunner.run(args, -1, 0)
+```
+
+The result (for both the synchronous and the asynchronous method) is a `vim.json.decode`d
+table of the result of the command execution.
Consult
+[the CLI documentation](../cli.md#for-developers) for the schema of the results for
+the command that you call.
+
+For example, the query command mentioned above will return a
+`VectorCode.QueryResult[]`, where `VectorCode.QueryResult` is defined as
+follows:
+```lua
+---@class VectorCode.QueryResult
+---@field path string Path to the file
+---@field document string? Content of the file
+---@field chunk string?
+---@field start_line integer?
+---@field end_line integer?
+---@field chunk_id string?
+```
+
+`run_async` returns a `job_handle`, which is defined as an `integer?`.
+For the LSP backend, the job handle is the `request_id`. For the cmd runner, the
+job handle is the `PID` of the process.
+
+### `is_job_running(job_handle):boolean`
+Checks if a job associated with the given handle is currently running.
+
+
+### `stop_job(job_handle)`
+Attempts to stop or cancel the async job associated with the given handle.
diff --git a/lua/vectorcode/types.lua b/lua/vectorcode/types.lua
index 6e71483f..2fc039c7 100644
--- a/lua/vectorcode/types.lua
+++ b/lua/vectorcode/types.lua
@@ -8,7 +8,7 @@
 ---@field start_line integer?
 ---@field end_line integer?
 ---@field chunk_id string?
----@field summary string?
+---@field summary string? Used by the CodeCompanion tool only. Not part of the backend response
 ---@class VectorCode.LsResult
 ---@field project-root string
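
Tying the JobRunner methods described in `docs/neovim/api_references.md` together, a minimal sketch (assuming an indexed project; `jobrunner` is one of the two concrete runners named above, and the query arguments are illustrative):

```lua
-- Pick a backend: the generic CLI runner, or swap in "vectorcode.jobrunner.lsp".
local jobrunner = require("vectorcode.jobrunner.cmd")

-- Same arguments as the CLI invocation `vectorcode query -n 10 keyword1 --include chunk`.
local args = { "query", "-n", "10", "keyword1", "--include", "chunk" }

-- Fire the request asynchronously; `result` is the decoded
-- VectorCode.QueryResult[] described in the JobRunners section.
local handle = jobrunner.run_async(args, function(result, err, code, signal)
  for _, res in ipairs(result or {}) do
    print(res.path)
  end
end, 0)

-- The returned handle (request_id for LSP, PID for cmd) can be used to
-- inspect or cancel the request:
if jobrunner.is_job_running(handle) then
  jobrunner.stop_job(handle)
end
```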