diff --git a/README.md b/README.md index 4de4428e..1383dfdf 100644 --- a/README.md +++ b/README.md @@ -7,18 +7,13 @@ VectorCode is a code repository indexing tool. It helps you build better prompt for your coding LLMs by indexing and providing information about the code repository you're working on. This repository also contains the corresponding -neovim plugin because that's what I used to write this tool. +neovim plugin that provides a set of APIs for you to build or enhance AI plugins, +and integrations for some of the popular plugins. > [!NOTE] > This project is in beta quality and is undergoing rapid iterations. > I know there are plenty of rooms for improvements, and any help is welcomed. -> [!NOTE] -> [Chromadb](https://www.trychroma.com/), the vector database backend behind -> this project, supports multiple embedding engines. I developed this tool using -> SentenceTransformer, but if you encounter any issues with a different embedding -> function, please open an issue (or even better, a pull request :D). - * [Why VectorCode?](#why-vectorcode) @@ -37,14 +32,14 @@ releases. Their capabilities on these projects are quite limited. With VectorCode, you can easily (and programmatically) inject task-relevant context from the project into the prompt. This significantly improves the quality of the model output and reduce hallucination. -![](./images/codecompanion_chat.png) + +[![asciicast](https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD.svg)](https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD?t=3) ## Documentation > [!NOTE] -> The documentation on the `main` branch reflects the code on the latest commit -> (apologies if I forget to update the docs, but this will be what I aim for). To -> check for the documentation for the version you're using, you can [check out +> The documentation on the `main` branch reflects the code on the latest commit. 
+> To check for the documentation for the version you're using, you can [check out > the corresponding tags](https://github.com/Davidyz/VectorCode/tags). - For the setup and usage of the command-line tool, see [the CLI documentation](./docs/cli.md); @@ -52,7 +47,8 @@ model output and reduce hallucination. [the neovim plugin documentation](./docs/neovim.md) for further instructions. - Additional resources: - the [wiki](https://github.com/Davidyz/VectorCode/wiki) for extra tricks and - tips that will help you get the most out of VectorCode; + tips that will help you get the most out of VectorCode, as well as + instructions to setup VectorCode to work with some other neovim plugins; - the [discussions](https://github.com/Davidyz/VectorCode/discussions) where you can ask general questions and share your cool usages about VectorCode. @@ -98,7 +94,7 @@ This project follows an adapted semantic versioning: - [ ] ability to view and delete files in a collection (atm you can only `drop` and `vectorise` again); - [x] joint search (kinda, using codecompanion.nvim/MCP); -- [ ] Nix support (#144); +- [x] Nix support (unofficial packages [here](https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=vectorcode)); - [ ] Query rewriting (#124). diff --git a/doc/VectorCode-cli.txt b/doc/VectorCode-cli.txt index 08d58b1d..25589b47 100644 --- a/doc/VectorCode-cli.txt +++ b/doc/VectorCode-cli.txt @@ -121,8 +121,7 @@ significantly reduce the IO overhead and avoid potential race condition. If you’re setting up a standalone ChromaDB server, I recommend sticking to - v0.6.3. ChromaDB recently released v1.0.0, which may not work with VectorCode. - I’m testing with v1.0.0 and will publish a new release when it’s ready. + v0.6.3, because VectorCode is not ready for the upgrade to ChromaDB 1.0 yet. FOR WINDOWS USERS ~ @@ -146,6 +145,8 @@ NIX ~ A community-maintained Nix package is available here . 
+If you’re using nix to install a standalone Chromadb server, make sure to +stick to 0.6.3 . GETTING STARTED *VectorCode-cli-vectorcode-command-line-tool-getting-started* @@ -212,7 +213,7 @@ REFRESHING EMBEDDINGS ~ To maintain the accuracy of the vector search, it’s important to keep your embeddings up-to-date. You can simply run the `vectorise` subcommand on a file -to refresh the embedding for a particular file, and the CLI provides a +to refresh the embedding for that file. Apart from that, the CLI provides a `vectorcode update` subcommand, which updates the embeddings for all files that are currently indexed by VectorCode for the current project. @@ -241,8 +242,8 @@ For each project, VectorCode creates a collection (similar to tables in traditional databases) and puts the code embeddings in the corresponding collection. In the root directory of a project, you may run `vectorcode init`. This will initialise the repository with a subdirectory -`project_root/.vectorcode/`. This will mark this directory a _project root_, a -concept that will later be used to construct the collection. You may put a +`project_root/.vectorcode/`. This will mark this directory as a _project root_, +a concept that will later be used to construct the collection. You may put a `config.json` file in `project_root/.vectorcode`. This file may be used to store project-specific settings such as embedding functions and database entry point (more on this later). If you already have a global configuration file at @@ -272,31 +273,22 @@ hooks. The `init` subcommand provides a `--hooks` flag which helps you manage hooks when working with a git repository. You can put some custom hooks in `~/.config/vectorcode/hooks/` and the `vectorcode init --hooks` command will pick them up and append them to your existing hooks, or create new hook scripts -if they don’t exist yet. The hook files should be named the same as they -would be under the `.git/hooks` directory. 
For example, a pre-commit hook would -be named `~/.config/vectorcode/hooks/pre-commit`. +if they don’t exist yet. The custom hook files should be named the same as +they would be under the `.git/hooks` directory. For example, a pre-commit hook +would be named `~/.config/vectorcode/hooks/pre-commit`. By default, there are 2 pre-defined hooks: ->bash - # pre-commit hook that vectorise changed files before you commit. - diff_files=$(git diff --cached --name-only) - [ -z "$diff_files" ] || vectorcode vectorise $diff_files -< +1. A pre-commit hook that vectorises the modified files. +2. A post-checkout hook that:- vectorises the full repository if it’s an initial commit/clone and a + `vectorcode.include` spec is available (either locally in the project or + globally); +- vectorises the files changed by the checkout. + ->bash - # post-checkout hook that vectorise changed files when you checkout to a - # different branch/tag/commit - files=$(git diff --name-only "$1" "$2") - [ -z "$files" ] || vectorcode vectorise $files -< -When you run `vectorcode init --hooks` in a git repo, these 2 hooks will be -added to your `.git/hooks/`. Hooks that are managed by VectorCode will be -wrapped by `# VECTORCODE_HOOK_START` and `# VECTORCODE_HOOK_END` comment lines. -They help VectorCode determine whether hooks have been added, so don’t delete -the markers unless you know what you’re doing. To remove the hooks, simply -delete the lines wrapped by these 2 comment strings. +Both hooks will only be triggered on repositories that have a `.vectorcode` +directory in them. CONFIGURING VECTORCODE ~ @@ -328,31 +320,32 @@ model_name="nomic-embed-text")`. Default: `{}`; - `db_url`string, the url that points to the Chromadb server. VectorCode will start an HTTP server for Chromadb at a randomly picked free port on `localhost` if your configured `http://host:port` is not accessible. Default: `http://127.0.0.1:8000`; - -`db_path`string, Path to local persistent database. 
This is where the files for -your database will be stored. Default: `~/.local/share/vectorcode/chromadb/`; - -`db_log_path`string, path to the _directory_ where the built-in chromadb server -will write the log to. Default: `~/.local/share/vectorcode/`; - -`chunk_size`integer, the maximum number of characters per chunk. A larger value -reduces the number of items in the database, and hence accelerates the search, -but at the cost of potentially truncated data and lost information. Default: -`2500`. To disable chunking, set it to a negative number; - -`overlap_ratio`float between 0 and 1, the ratio of overlapping/shared content -between 2 adjacent chunks. A larger ratio improves the coherences of chunks, -but at the cost of increasing number of entries in the database and hence -slowing down the search. Default: `0.2`. _Starting from 0.4.11, VectorCode will -use treesitter to parse languages that it can automatically detect. It uses -pygments to guess the language from filename, and tree-sitter-language-pack to -fetch the correct parser. overlap_ratio has no effects when treesitter works. -If VectorCode fails to find an appropriate parser, it’ll fallback to the -legacy naive parser, in which case overlap_ratio works exactly in the same way -as before;_ - `query_multiplier`integer, when you use the `query` command to -retrieve `n` documents, VectorCode will check `n * query_multiplier` chunks and -return at most `n` documents. A larger value of `query_multiplier` guarantees -the return of `n` documents, but with the risk of including too many -less-relevant chunks that may affect the document selection. Default: `-1` (any -negative value means selecting documents based on all indexed chunks); - -`reranker`string, the reranking method to use. Currently supports -`CrossEncoderReranker` (default, using sentence-transformers cross-encoder +`db_path`string, Path to local persistent database. 
If you didn’t set up a +standalone Chromadb server, this is where the files for your database will be +stored. Default: `~/.local/share/vectorcode/chromadb/`; - `db_log_path`string, +path to the _directory_ where the built-in chromadb server will write the log +to. Default: `~/.local/share/vectorcode/`; - `chunk_size`integer, the maximum +number of characters per chunk. A larger value reduces the number of items in +the database, and hence accelerates the search, but at the cost of potentially +truncated data and lost information. Default: `2500`. To disable chunking, set +it to a negative number; - `overlap_ratio`float between 0 and 1, the ratio of +overlapping/shared content between 2 adjacent chunks. A larger ratio improves +the coherence of chunks, but at the cost of increasing number of entries in the +database and hence slowing down the search. Default: `0.2`. _Starting from +0.4.11, VectorCode will use treesitter to parse languages that it can +automatically detect. It uses pygments to guess the language from filename, and +tree-sitter-language-pack to fetch the correct parser. overlap_ratio has no +effects when treesitter works. If VectorCode fails to find an appropriate +parser, it’ll fallback to the legacy naive parser, in which case +overlap_ratio works exactly in the same way as before;_ - +`query_multiplier`integer, when you use the `query` command to retrieve `n` +documents, VectorCode will check `n * query_multiplier` chunks and return at +most `n` documents. A larger value of `query_multiplier` guarantees the return +of `n` documents, but with the risk of including too many less-relevant chunks +that may affect the document selection. Default: `-1` (any negative value means +selecting documents based on all indexed chunks); - `reranker`string, the +reranking method to use. 
Currently supports `CrossEncoderReranker` (default, +using sentence-transformers cross-encoder ) and `NaiveReranker` (sort chunks by the "distance" between the embedding vectors); - `reranker_params`dictionary, similar to `embedding_params`. The @@ -361,7 +354,7 @@ these are the options passed to the `CrossEncoder` class. For example, if you want to use a non-default model, you can use the following: `json { "reranker_params": { "model_name_or_path": "your_model_here" -} }` ; - `db_settings`dictionary, works in a similar way to `embedding_params`, +} }` - `db_settings`dictionary, works in a similar way to `embedding_params`, but for Chromadb client settings so that you can configure authentication for remote Chromadb ; - `hnsw`a dictionary of hnsw settings @@ -369,9 +362,8 @@ remote Chromadb ; - improve the query performances or avoid runtime errors during queries. **It’s recommended to re-vectorise the collection after modifying these options, because some of the options can only be set during collection creation.** -Example: `json5 // the following is the default value. "hnsw": { "hnsw:M": 64, -}` - `filetype_map``dict[str, list[str]]`, a dictionary where keys are language -name +Example (and default): `json5 "hnsw": { "hnsw:M": 64, }` - +`filetype_map``dict[str, list[str]]`, a dictionary where keys are language name and values are lists of Python regex patterns that will match file extensions. @@ -566,7 +558,7 @@ the `VECTORCODE_LOG_LEVEL` variable to one of `ERROR`, `WARN` (`WARNING`), `INFO` or `DEBUG`. For the CLI that you interact with in your shell, this will output logs to `STDERR` and write a log file to `~/.local/share/vectorcode/logs/`. For LSP and MCP servers, because `STDIO` is -used for the RPC, only the log file will be written. +used for the RPC, the logs will only be written to the log file, not `STDERR`. 
For example: @@ -575,6 +567,9 @@ For example: < + Depending on the MCP/LSP client implementation, you may need to take extra + steps to make sure the environment variables are captured by VectorCode. + SHELL COMPLETION*VectorCode-cli-vectorcode-command-line-tool-shell-completion* VectorCode supports shell completion for bash/zsh/tcsh. You can use `vectorcode @@ -602,9 +597,9 @@ following options in the JSON config file: For Intel users, sentence transformer supports OpenVINO -backend for supported GPU. Run `pipx install vectorcode[intel]` which will -bundle the relevant libraries when you install VectorCode. After that, you will -need to configure `SentenceTransformer` to use `openvino` backend. In your +backend for supported GPU. Run `uv tool install vectorcode[intel]` which will bundle +the relevant libraries when you install VectorCode. After that, you will need +to configure `SentenceTransformer` to use `openvino` backend. In your `config.json`, set `backend` key in `embedding_params` to `"openvino"` >json diff --git a/doc/VectorCode.txt b/doc/VectorCode.txt index dfcaf4a7..22aeeec1 100644 --- a/doc/VectorCode.txt +++ b/doc/VectorCode.txt @@ -6,8 +6,8 @@ Table of Contents *VectorCode-table-of-contents* 1. NeoVim Plugin |VectorCode-neovim-plugin| - Installation |VectorCode-neovim-plugin-installation| - Integrations |VectorCode-neovim-plugin-integrations| - - User Command |VectorCode-neovim-plugin-user-command| - Configuration |VectorCode-neovim-plugin-configuration| + - User Command |VectorCode-neovim-plugin-user-command| - API Usage |VectorCode-neovim-plugin-api-usage| - Debugging and Logging |VectorCode-neovim-plugin-debugging-and-logging| 2. Links |VectorCode-links| @@ -21,15 +21,16 @@ Table of Contents *VectorCode-table-of-contents* proceeding. [!NOTE] When the neovim plugin doesn’t work properly, please try upgrading - the CLI tool to the latest version before opening an issue. + both the CLI and the neovim plugin to the latest version before opening an + issue. 
- |VectorCode-installation| - |VectorCode-nix| - |VectorCode-integrations| +- |VectorCode-configuration| + - |VectorCode-`setup(opts?)`| - |VectorCode-user-command| - |VectorCode-`vectorcode-register`| - |VectorCode-`vectorcode-deregister`| -- |VectorCode-configuration| - - |VectorCode-`setup(opts?)`| - |VectorCode-api-usage| - |VectorCode-synchronous-api| - |VectorCode-`query(query_message,-opts?,-callback?)`| @@ -49,7 +50,7 @@ Table of Contents *VectorCode-table-of-contents* INSTALLATION *VectorCode-neovim-plugin-installation* -Use your favorite plugin manager. +Using Lazy: >lua { @@ -67,7 +68,7 @@ together because the neovim plugin is built for a newer CLI release and depends on newer features/breaking changes. To ensure maximum compatibility, please either: 1. Use release build for -VectorCode CLI and pin to the release tags for the neovim plugin; +VectorCode CLI and pin to the releases for the neovim plugin; **OR** @@ -75,7 +76,7 @@ VectorCode CLI and pin to the release tags for the neovim plugin; the latest GitHub commit. It may be helpful to use a `build` hook to automatically upgrade the CLI when -the neovim plugin updates. For example, if you’re using lazy.nvim and `pipx`, +the neovim plugin updates. For example, if you’re using lazy.nvim and `uv`, you can use the following plugin spec: >lua @@ -111,53 +112,6 @@ contains instructions to integrate VectorCode with the following plugins: - ravitemer/mcphub.nvim . -USER COMMAND *VectorCode-neovim-plugin-user-command* - - -VECTORCODE REGISTER ~ - -Register the current buffer for async caching. It’s possible to register the -current buffer to a different vectorcode project by passing the `project_root` -parameter: - -> - :VectorCode register project_root=path/to/another/project/ -< - -This is useful if you’re working on a project that is closely related to a -different project, for example a utility repository for a main library or a -documentation repository. 
Alternatively, you can call the |VectorCode-lua-api| -in an autocmd: - ->lua - vim.api.nvim_create_autocmd("LspAttach", { - callback = function() - local bufnr = vim.api.nvim_get_current_buf() - cacher.async_check("config", function() - cacher.register_buffer( - bufnr, - { - n_query = 10, - } - ) - end, nil) - end, - desc = "Register buffer for VectorCode", - }) -< - -The latter avoids the manual registrations, but registering too many buffers -means there will be a lot of background processes/requests being sent to -VectorCode. Choose these based on your workflow and the capability of your -system. - - -VECTORCODE DEREGISTER ~ - -Deregister the current buffer. Any running jobs will be killed, cached results -will be deleted, and no more queries will be run. - - CONFIGURATION *VectorCode-neovim-plugin-configuration* @@ -220,9 +174,9 @@ wait for the server loading when making your first request. - `sync_log_env_var``boolean`. If true, this plugin will automatically set the `VECTORCODE_LOG_LEVEL` environment variable for LSP or cmd processes started within your neovim session when logging is turned on for this plugin. Use at -caution because the CLI write all logs to stderr, which _may_ make this plugin -VERY verbose. See |VectorCode-debugging-and-logging| for details on how to turn -on logging. +caution because the non-LSP CLI write all logs to stderr, which _may_ make this +plugin VERY verbose. See |VectorCode-debugging-and-logging| for details on how +to turn on logging. You may notice that a lot of options in `async_opts` are the same as the other options in the top-level of the main option table. This is because the @@ -232,6 +186,56 @@ ones in `async_opts` is for the |VectorCode-cached-asynchronous-api|. The configured. +USER COMMAND *VectorCode-neovim-plugin-user-command* + +The neovim plugin provides user commands to work with +|VectorCode-async-caching|. + + +VECTORCODE REGISTER ~ + +Register the current buffer for async caching. 
It’s possible to register the +current buffer to a different vectorcode project by passing the `project_root` +parameter: + +> + :VectorCode register project_root=path/to/another/project/ +< + +This is useful if you’re working on a project that is closely related to a +different project, for example a utility repository for a main library or a +documentation repository. Alternatively, you can call the |VectorCode-lua-api| +in an autocmd: + +>lua + vim.api.nvim_create_autocmd("LspAttach", { + callback = function() + local bufnr = vim.api.nvim_get_current_buf() + cacher.async_check("config", function() + cacher.register_buffer( + bufnr, + { + n_query = 10, + } + ) + end, nil) + end, + desc = "Register buffer for VectorCode", + }) +< + +The latter avoids the manual registrations, but registering too many buffers +means there will be a lot of background processes/requests being sent to +VectorCode. Choose these based on your workflow and the capability of your +system. + + +VECTORCODE DEREGISTER ~ + +Deregister the current buffer. Any running jobs will be killed, cached results +will be deleted, and no more queries will be run. + + API USAGE *VectorCode-neovim-plugin-api-usage* This plugin provides 2 sets of APIs that provides similar functionalities. The @@ -304,6 +308,9 @@ path/document content to the prompt like this: end < +Keep in mind that this `query` function call will be synchronous and therefore +block the neovim UI. This is where the async cache comes in. + CHECK(CHECK_ITEM?) @@ -367,8 +374,8 @@ path to the executable) by calling `vim.lsp.config('vectorcode_server', opts)`. Cons Heavy IO overhead because the Requires vectorcode-server; Only embedding model and database works if you’re using a standalone - client need to be initialised ChromaDB server; May contain bugs - for every query. because it’s new. + client need to be initialised ChromaDB server. + for every query. 
------------------------------------------------------------------------------- You may choose which backend to use by setting the |VectorCode-`setup`| option `async_backend`, and acquire the corresponding backend by the following API: @@ -397,9 +404,9 @@ This function registers a buffer to be cached by VectorCode. < The following are the available options for this function: - `bufnr`buffer -number. Default: current buffer; - `opts`accepts a lua table with the following -keys: - `project_root`a string of the path that overrides the detected project -root. Default: `nil`. This is mostly intended to use with the +number. Default: `0` (current buffer); - `opts`accepts a lua table with the +following keys: - `project_root`a string of the path that overrides the +detected project root. Default: `nil`. This is mostly intended to use with the |VectorCode-user-command|, and you probably should not use this directly in your config. **If you’re using the LSP backend and did not specify this value, it will be automatically detected based on .vectorcode or .git. If this diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index cabbb293..67d0df42 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -1,6 +1,7 @@ This project uses [pre-commit](https://pre-commit.com/) to perform some -formatting and linting that hasn't made its way into CI/CD. If you're -contributing to this project, make sure you set it up before you make the commit. +formatting and linting. If you're +contributing to this project, having it on your system will help you write code +that passes the CI. You can also see [.pre-commit-config.yaml](https://github.com/Davidyz/VectorCode/blob/main/.pre-commit-config.yaml) for a list of hooks enabled for the repo. @@ -16,9 +17,9 @@ actually optional, but for convenience I decided to leave them here. 
This will include [pytest](https://docs.pytest.org/en/stable/), the testing framework, and [coverage.py](https://coverage.readthedocs.io/en/7.7.1/), the coverage report tool. If you're not familiar with pytest or coverage.py, you can run `make test` to -run tests, and `make coverage` to generate a coverage report. The testing and -coverage report are also in the CI configuration, but it might still help to run -them locally before you open the PR. +run tests on all python code, and `make coverage` to generate a coverage report. +The testing and coverage report are also in the CI configuration, but it might +still help to run them locally before you open the PR. This project also runs static analysis with [basedpyright](https://docs.basedpyright.com). GitHub Action will also run the diff --git a/docs/cli.md b/docs/cli.md index 688c4f8a..393ee0a3 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -99,9 +99,8 @@ set up a standalone local server (they provides detailed instructions through will significantly reduce the IO overhead and avoid potential race condition. > If you're setting up a standalone ChromaDB server, I recommend sticking to -> v0.6.3. -> ChromaDB recently released v1.0.0, which may not work with VectorCode. I'm -> testing with v1.0.0 and will publish a new release when it's ready. +> v0.6.3, +> because VectorCode is not ready for the upgrade to ChromaDB 1.0 yet. ### For Windows Users @@ -120,7 +119,9 @@ architecture, python version and the vectorcode virtual environment ### Nix A community-maintained Nix package is available -[here](https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=vectorcode). +[here](https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=vectorcode). +If you're using nix to install a standalone Chromadb server, make sure to stick +to [0.6.3](https://github.com/NixOS/nixpkgs/pull/412528). 
## Getting Started @@ -173,9 +174,9 @@ reading or use the `--help` flag. To maintain the accuracy of the vector search, it's important to keep your embeddings up-to-date. You can simply run the `vectorise` subcommand on a file -to refresh the embedding for a particular file, and the CLI provides a +to refresh the embedding for that file. Apart from that, the CLI provides a `vectorcode update` subcommand, which updates the embeddings for all files that -are currently indexed by VectorCode for the current project. +are currently indexed by VectorCode for the current project. If you want something more automagic, check out [the advanced usage section](#git-hooks) @@ -197,7 +198,7 @@ For each project, VectorCode creates a collection (similar to tables in traditional databases) and puts the code embeddings in the corresponding collection. In the root directory of a project, you may run `vectorcode init`. This will initialise the repository with a subdirectory -`project_root/.vectorcode/`. This will mark this directory a _project root_, a +`project_root/.vectorcode/`. This will mark this directory as a _project root_, a concept that will later be used to construct the collection. You may put a `config.json` file in `project_root/.vectorcode`. This file may be used to store project-specific settings such as embedding functions and database entry point @@ -226,29 +227,21 @@ hooks. The `init` subcommand provides a `--hooks` flag which helps you manage hooks when working with a git repository. You can put some custom hooks in `~/.config/vectorcode/hooks/` and the `vectorcode init --hooks` command will pick them up and append them to your existing hooks, or create new hook scripts -if they don't exist yet. The hook files should be named the same as they would -be under the `.git/hooks` directory. For example, a pre-commit hook would be named -`~/.config/vectorcode/hooks/pre-commit`. +if they don't exist yet. 
The custom hook files should be named the same as they +would be under the `.git/hooks` directory. For example, a pre-commit hook would +be named `~/.config/vectorcode/hooks/pre-commit`. By default, there are 2 pre-defined hooks: -```bash -# pre-commit hook that vectorise changed files before you commit. -diff_files=$(git diff --cached --name-only) -[ -z "$diff_files" ] || vectorcode vectorise $diff_files -``` -```bash -# post-checkout hook that vectorise changed files when you checkout to a -# different branch/tag/commit -files=$(git diff --name-only "$1" "$2") -[ -z "$files" ] || vectorcode vectorise $files -``` -When you run `vectorcode init --hooks` in a git repo, these 2 hooks will be added -to your `.git/hooks/`. Hooks that are managed by VectorCode will be wrapped by -`# VECTORCODE_HOOK_START` and `# VECTORCODE_HOOK_END` comment lines. They help -VectorCode determine whether hooks have been added, so don't delete the markers -unless you know what you're doing. To remove the hooks, simply delete the lines -wrapped by these 2 comment strings. +1. A pre-commit hook that vectorises the modified files. +2. A post-checkout hook that: + - vectorises the full repository if it's an initial commit/clone and a + `vectorcode.include` spec is available (either locally in the project or + globally); + - vectorises the files changed by the checkout. + +Both hooks will only be triggered on repositories that have a `.vectorcode` +directory in them. ### Configuring VectorCode Since 0.6.4, VectorCode adapted a [json5 parser](https://github.com/dpranke/pyjson5) @@ -279,8 +272,9 @@ The JSON configuration file may hold the following values: - `db_url`: string, the url that points to the Chromadb server. VectorCode will start an HTTP server for Chromadb at a randomly picked free port on `localhost` if your configured `http://host:port` is not accessible. Default: `http://127.0.0.1:8000`; -- `db_path`: string, Path to local persistent database. 
This is where the files for - your database will be stored. Default: `~/.local/share/vectorcode/chromadb/`; +- `db_path`: string, Path to local persistent database. If you didn't set up a standalone + Chromadb server, this is where the files for your database will be stored. + Default: `~/.local/share/vectorcode/chromadb/`; - `db_log_path`: string, path to the _directory_ where the built-in chromadb server will write the log to. Default: `~/.local/share/vectorcode/`; - `chunk_size`: integer, the maximum number of characters per chunk. A larger @@ -288,7 +282,7 @@ The JSON configuration file may hold the following values: search, but at the cost of potentially truncated data and lost information. Default: `2500`. To disable chunking, set it to a negative number; - `overlap_ratio`: float between 0 and 1, the ratio of overlapping/shared content - between 2 adjacent chunks. A larger ratio improves the coherences of chunks, + between 2 adjacent chunks. A larger ratio improves the coherence of chunks, but at the cost of increasing number of entries in the database and hence slowing down the search. Default: `0.2`. _Starting from 0.4.11, VectorCode will use treesitter to parse languages that it can automatically detect. It @@ -323,7 +317,6 @@ The JSON configuration file may hold the following values: } } ``` - ; - `db_settings`: dictionary, works in a similar way to `embedding_params`, but for Chromadb client settings so that you can configure [authentication for remote Chromadb](https://docs.trychroma.com/production/administration/auth); @@ -332,9 +325,8 @@ The JSON configuration file may hold the following values: that may improve the query performances or avoid runtime errors during queries. **It's recommended to re-vectorise the collection after modifying these options, because some of the options can only be set during collection - creation.** Example: + creation.** Example (and default): ```json5 - // the following is the default value. 
"hnsw": { "hnsw:M": 64, } @@ -516,14 +508,17 @@ When something doesn't work as expected, you can enable logging by setting the `VECTORCODE_LOG_LEVEL` variable to one of `ERROR`, `WARN` (`WARNING`), `INFO` or `DEBUG`. For the CLI that you interact with in your shell, this will output logs to `STDERR` and write a log file to `~/.local/share/vectorcode/logs/`. For LSP -and MCP servers, because `STDIO` is used for the RPC, only the log file will be -written. +and MCP servers, because `STDIO` is used for the RPC, the logs will only be +written to the log file, not `STDERR`. For example: ```bash VECTORCODE_LOG_LEVEL=INFO vectorcode vectorise file1.py file2.lua ``` +> Depending on the MCP/LSP client implementation, you may need to take extra +> steps to make sure the environment variables are captured by VectorCode. + ## Shell Completion VectorCode supports shell completion for bash/zsh/tcsh. You can use `vectorcode -s {bash,zsh,tcsh}` @@ -547,7 +542,7 @@ following options in the JSON config file: For Intel users, [sentence transformer](https://www.sbert.net/index.html) supports [OpenVINO](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html) -backend for supported GPU. Run `pipx install vectorcode[intel]` which will +backend for supported GPU. Run `uv tool install vectorcode[intel]` which will bundle the relevant libraries when you install VectorCode. After that, you will need to configure `SentenceTransformer` to use `openvino` backend. In your `config.json`, set `backend` key in `embedding_params` to `"openvino"`: diff --git a/docs/neovim.md b/docs/neovim.md index 1051ffb9..0f61566c 100644 --- a/docs/neovim.md +++ b/docs/neovim.md @@ -5,8 +5,8 @@ > before proceeding. > [!NOTE] -> When the neovim plugin doesn't work properly, please try upgrading the CLI -> tool to the latest version before opening an issue. 
+> When the neovim plugin doesn't work properly, please try upgrading both the CLI +> and the neovim plugin to the latest version before opening an issue. @@ -14,11 +14,11 @@ * [Installation](#installation) * [Nix](#nix) * [Integrations](#integrations) +* [Configuration](#configuration) + * [`setup(opts?)`](#setupopts) * [User Command](#user-command) * [`VectorCode register`](#vectorcode-register) * [`VectorCode deregister`](#vectorcode-deregister) -* [Configuration](#configuration) - * [`setup(opts?)`](#setupopts) * [API Usage](#api-usage) * [Synchronous API](#synchronous-api) * [`query(query_message, opts?, callback?)`](#queryquery_message-opts-callback) @@ -38,7 +38,7 @@ ## Installation -Use your favorite plugin manager. +Using Lazy: ```lua { @@ -55,7 +55,7 @@ together because the neovim plugin is built for a newer CLI release and depends on newer features/breaking changes. To ensure maximum compatibility, please either: -1. Use release build for VectorCode CLI and pin to the release tags for the +1. Use release build for VectorCode CLI and pin to the releases for the neovim plugin; **OR** @@ -64,7 +64,7 @@ To ensure maximum compatibility, please either: the latest GitHub commit. It may be helpful to use a `build` hook to automatically upgrade the CLI when -the neovim plugin updates. For example, if you're using lazy.nvim and `pipx`, +the neovim plugin updates. For example, if you're using lazy.nvim and `uv`, you can use the following plugin spec: ```lua @@ -96,44 +96,6 @@ contains instructions to integrate VectorCode with the following plugins: - [CopilotC-Nvim/CopilotChat.nvim](https://github.com/CopilotC-Nvim/CopilotChat.nvim); - [ravitemer/mcphub.nvim](https://github.com/ravitemer/mcphub.nvim). -## User Command -### `VectorCode register` - -Register the current buffer for async caching. 
It's possible to register the -current buffer to a different vectorcode project by passing the `project_root` -parameter: -``` -:VectorCode register project_root=path/to/another/project/ -``` -This is useful if you're working on a project that is closely related to a -different project, for example a utility repository for a main library or a -documentation repository. Alternatively, you can call the [lua API](#cached-asynchronous-api) in an autocmd: -```lua -vim.api.nvim_create_autocmd("LspAttach", { - callback = function() - local bufnr = vim.api.nvim_get_current_buf() - cacher.async_check("config", function() - cacher.register_buffer( - bufnr, - { - n_query = 10, - } - ) - end, nil) - end, - desc = "Register buffer for VectorCode", -}) -``` -The latter avoids the manual registrations, but registering too many buffers -means there will be a lot of background processes/requests being sent to -VectorCode. Choose these based on your workflow and the capability of your -system. - -### `VectorCode deregister` - -Deregister the current buffer. Any running jobs will be killed, cached results -will be deleted, and no more queries will be run. - ## Configuration ### `setup(opts?)` @@ -200,7 +162,7 @@ The following are the available options for the parameter of this function: - `sync_log_env_var`: `boolean`. If true, this plugin will automatically set the `VECTORCODE_LOG_LEVEL` environment variable for LSP or cmd processes started within your neovim session when logging is turned on for this plugin. Use at - caution because the CLI write all logs to stderr, which _may_ make this plugin + caution because the non-LSP CLI write all logs to stderr, which _may_ make this plugin VERY verbose. See [Debugging and Logging](#debugging-and-logging) for details on how to turn on logging. @@ -211,6 +173,47 @@ in `async_opts` is for the [Cached Asynchronous API](#cached-asynchronous-api). The `async_opts` will reuse the synchronous API options if not explicitly configured. 
+## User Command + +The neovim plugin provides user commands to work with [async caching](#cached-asynchronous-api). + +### `VectorCode register` + +Register the current buffer for async caching. It's possible to register the +current buffer to a different vectorcode project by passing the `project_root` +parameter: +``` +:VectorCode register project_root=path/to/another/project/ +``` +This is useful if you're working on a project that is closely related to a +different project, for example a utility repository for a main library or a +documentation repository. Alternatively, you can call the [lua API](#cached-asynchronous-api) in an autocmd: +```lua +vim.api.nvim_create_autocmd("LspAttach", { + callback = function() + local bufnr = vim.api.nvim_get_current_buf() + cacher.async_check("config", function() + cacher.register_buffer( + bufnr, + { + n_query = 10, + } + ) + end, nil) + end, + desc = "Register buffer for VectorCode", +}) +``` +The latter avoids the manual registrations, but registering too many buffers +means there will be a lot of background processes/requests being sent to +VectorCode. Choose these based on your workflow and the capability of your +system. + +### `VectorCode deregister` + +Deregister the current buffer. Any running jobs will be killed, cached results +will be deleted, and no more queries will be run. + ## API Usage This plugin provides 2 sets of APIs that provides similar functionalities. The synchronous APIs provide more up-to-date retrieval results at the cost of @@ -273,7 +276,8 @@ prompt = function(prefix, suffix) .. "<|fim_middle|>" end ``` - +Keep in mind that this `query` function call will be synchronous and therefore +block the neovim UI. This is where the async cache comes in. #### `check(check_item?)` This function checks if VectorCode has been configured properly for your project. See the [CLI manual for details](./cli.md). 
@@ -328,7 +332,7 @@ interface: | Features | `default` | `lsp` | |----------|-----------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------| | **Pros** | Fully backward compatible with minimal extra config required | Less IO overhead for loading/unloading embedding models; Progress reports. | -| **Cons** | Heavy IO overhead because the embedding model and database client need to be initialised for every query. | Requires `vectorcode-server`; Only works if you're using a standalone ChromaDB server; May contain bugs because it's new. | +| **Cons** | Heavy IO overhead because the embedding model and database client need to be initialised for every query. | Requires `vectorcode-server`; Only works if you're using a standalone ChromaDB server. | You may choose which backend to use by setting the [`setup`](#setupopts) option `async_backend`, and acquire the corresponding backend by the following API: @@ -352,7 +356,7 @@ cacher_backend.register_buffer(0, { ``` The following are the available options for this function: -- `bufnr`: buffer number. Default: current buffer; +- `bufnr`: buffer number. Default: `0` (current buffer); - `opts`: accepts a lua table with the following keys: - `project_root`: a string of the path that overrides the detected project root. Default: `nil`. This is mostly intended to use with the [user command](#vectorcode-register),