67 changes: 34 additions & 33 deletions docs/cli.md
@@ -13,10 +13,10 @@
* [If Anything Goes Wrong...](#if-anything-goes-wrong)
* [Advanced Usage](#advanced-usage)
* [Initialising a Project](#initialising-a-project)
* [Git Hooks](#git-hooks)
* [Configuring VectorCode](#configuring-vectorcode)
* [Vectorising Your Code](#vectorising-your-code)
* [File Specs](#file-specs)
* [Git Hooks](#git-hooks)
* [Making a Query](#making-a-query)
* [Listing All Collections](#listing-all-collections)
* [Removing a Collection](#removing-a-collection)
@@ -123,8 +123,7 @@ vectorcode vectorise src/**/*.py
```
> VectorCode doesn't track file changes, so you need to re-vectorise edited
> files. You may automate this by a git pre-commit hook, etc. See the
> [wiki](https://github.com/Davidyz/VectorCode/wiki/Tips-and-Tricks#git-hooks)
> for examples to set them up.
> [advanced usage section](#git-hooks) for examples to set them up.

Ideally, you should try to vectorise all source code in the repo, but for large
repos you may experience slow queries. If that happens, try to `vectorcode drop`
@@ -164,7 +163,7 @@ to refresh the embedding for a particular file, and the CLI provides a
are currently indexed by VectorCode for the current project.

If you want something more automagic, check out
[this section in the wiki](https://github.com/Davidyz/VectorCode/wiki/Tips-and-Tricks#git-hooks)
[the advanced usage section](#git-hooks)
about setting up git hooks to trigger automatic embedding updates when you
commit/checkout to a different tag.

@@ -205,6 +204,37 @@ contains `.git/` subdirectory and use it as the _project root_. In this case, th
default global configuration will be used. If `.git/` does not exist, VectorCode
falls back to using the current working directory as the _project root_.

#### Git Hooks

To keep the embeddings up-to-date, you may find it useful to set up some git
hooks. The `init` subcommand provides a `--hooks` flag which helps you manage
hooks when working with a git repository. You can put some custom hooks in
`~/.config/vectorcode/hooks/` and the `vectorcode init --hooks` command will
pick them up and append them to your existing hooks, or create new hook scripts
if they don't exist yet. The hook files should be named the same as they would
be under the `.git/hooks` directory. For example, a pre-commit hook would be named
`~/.config/vectorcode/hooks/pre-commit`.

By default, there are 2 pre-defined hooks:
```bash
# pre-commit hook that vectorises changed files before you commit.
diff_files=$(git diff --cached --name-only)
[ -z "$diff_files" ] || vectorcode vectorise $diff_files
```
```bash
# post-checkout hook that vectorises changed files when you check out a
# different branch/tag/commit
files=$(git diff --name-only "$1" "$2")
[ -z "$files" ] || vectorcode vectorise $files
```
When you run `vectorcode init --hooks` in a git repo, these 2 hooks will be added
to your `.git/hooks/`. Hooks that are managed by VectorCode will be wrapped by
`# VECTORCODE_HOOK_START` and `# VECTORCODE_HOOK_END` comment lines. They help
VectorCode determine whether hooks have been added, so don't delete the markers
unless you know what you're doing. To remove the hooks, simply delete the lines
wrapped by these 2 comment strings.
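
The removal step described above can also be scripted. The following is a minimal sketch and not part of VectorCode: `strip_vectorcode_block` is a hypothetical helper that drops every complete marker-wrapped block (markers included) from the lines of a hook file. The two regular expressions are copied from `HookFile` in this PR; everything else is an assumption made for illustration.

```python
import re

# Marker patterns copied from HookFile in this PR.
START = re.compile(r"^\s*#\s*VECTORCODE_HOOK_START\s*")
END = re.compile(r"^\s*#\s*VECTORCODE_HOOK_END\s*")


def strip_vectorcode_block(lines: list[str]) -> list[str]:
    """Return `lines` without marker-wrapped VectorCode blocks (hypothetical helper)."""
    kept: list[str] = []
    pending: list[str] = []  # lines seen since an unmatched START marker
    inside = False
    for line in lines:
        if not inside and START.match(line):
            inside = True
            pending = [line]
        elif inside:
            pending.append(line)
            if END.match(line):
                # A complete block was found: discard it.
                inside = False
                pending = []
        else:
            kept.append(line)
    # An unterminated START marker is not a managed block, so keep those lines.
    return kept + pending
```

Reading `.git/hooks/pre-commit` with `readlines()`, passing the result through this function, and writing it back should leave any hand-written parts of the hook untouched.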


### Configuring VectorCode
Since 0.6.4, VectorCode adopted a [json5 parser](https://github.com/dpranke/pyjson5)
for loading configuration. VectorCode will now look for `config.json5` in
@@ -366,35 +396,6 @@ on certain conditions. See
[the wiki](https://github.com/Davidyz/VectorCode/wiki/Tips-and-Tricks#git-hooks)
for an example to use it with git hooks.

#### Git Hooks

To keep the embeddings up-to-date, you may find it useful to set up some git
hooks. The CLI provides a subcommand, `vectorcode hooks`, that helps you manage
hooks when working with a git repository. You can put some custom hooks in
`~/.config/vectorcode/hooks/` and the `vectorcode hooks` command will pick them
up and append them to your existing hooks, or create new hook scripts if they
don't exist yet. The hook files should be named the same as they would be under
the `.git/hooks` directory. For example, a pre-commit hook would be named
`~/.config/vectorcode/hooks/pre-commit`. By default, there are 2 pre-defined
hooks:
```bash
# pre-commit hook that vectorises changed files before you commit.
diff_files=$(git diff --cached --name-only)
[ -z "$diff_files" ] || vectorcode vectorise $diff_files
```
```bash
# post-checkout hook that vectorises changed files when you check out a
# different branch/tag/commit
files=$(git diff --name-only "$1" "$2")
[ -z "$files" ] || vectorcode vectorise $files
```
When you run `vectorcode hooks` in a git repo, these 2 hooks will be added to
your `.git/hooks/`. Hooks that are managed by VectorCode will be wrapped by
`# VECTORCODE_HOOK_START` and `# VECTORCODE_HOOK_END` comment lines. They help
VectorCode determine whether hooks have been added, so don't delete the markers
unless you know what you're doing. To remove the hooks, simply delete the lines
wrapped by these 2 comment strings.

### Making a Query

To retrieve a list of documents from the database, you can use the following command:
7 changes: 7 additions & 0 deletions src/vectorcode/cli_utils.py
@@ -96,6 +96,7 @@ class Config:
hnsw: dict[str, str | int] = field(default_factory=dict)
chunk_filters: dict[str, list[str]] = field(default_factory=dict)
encoding: str = "utf8"
hooks: bool = False

@classmethod
async def import_from(cls, config_dict: dict[str, Any]) -> "Config":
@@ -307,6 +308,12 @@ def get_cli_parser():
default=False,
help="Wipe current project config and overwrite with global config (if it exists).",
)
init_parser.add_argument(
"--hooks",
action="store_true",
default=False,
help="Add git hooks to the current project, if it's a git repo.",
)

subparsers.add_parser(
"version", parents=[shared_parser], help="Print the version number."
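
To see how the new flag behaves in isolation, here is a minimal sketch of a `store_true` option on an `init` subparser. `build_parser` is a hypothetical, trimmed-down stand-in for `get_cli_parser()`; only the `--hooks` argument definition is taken from the hunk above, and the `init` help text is made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, reduced parser; the real one is built in get_cli_parser().
    parser = argparse.ArgumentParser(prog="vectorcode")
    subparsers = parser.add_subparsers(dest="action")
    init_parser = subparsers.add_parser("init", help="Initialise a project.")
    init_parser.add_argument(
        "--hooks",
        action="store_true",
        default=False,
        help="Add git hooks to the current project, if it's a git repo.",
    )
    return parser


args = build_parser().parse_args(["init", "--hooks"])
print(args.hooks)  # prints: True
```

Without the flag, `args.hooks` stays `False`, which matches the `hooks: bool = False` default added to `Config`.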
4 changes: 4 additions & 0 deletions src/vectorcode/main.py
@@ -63,6 +63,10 @@ async def async_main():

return_val = await chunks(final_configs)
case CliAction.hooks:
logger.warning(
"`vectorcode hooks` has been deprecated and will be removed in 0.7.0."
)
logger.warning("Please use `vectorcode init --hooks`.")
from vectorcode.subcommands import hooks

return await hooks(cli_args)
86 changes: 2 additions & 84 deletions src/vectorcode/subcommands/hooks.py
@@ -1,92 +1,10 @@
import glob
import logging
import os
import platform
import re
import stat
from pathlib import Path
from typing import Optional

from vectorcode.cli_utils import GLOBAL_CONFIG_PATH, Config, find_project_root
from vectorcode.cli_utils import Config, find_project_root
from vectorcode.subcommands.init import __HOOK_CONTENTS, HookFile, load_hooks

logger = logging.getLogger(name=__name__)
__GLOBAL_HOOKS_PATH = Path(GLOBAL_CONFIG_PATH).parent / "hooks"


# Keys: name of the hooks, ie. `pre-commit`
# Values: lines of the hooks.
__HOOK_CONTENTS: dict[str, list[str]] = {
"pre-commit": [
"diff_files=$(git diff --cached --name-only)",
'[ -z "$diff_files" ] || vectorcode vectorise $diff_files',
],
"post-checkout": [
'files=$(git diff --name-only "$1" "$2")',
'[ -z "$files" ] || vectorcode vectorise $files',
],
}


def __lines_are_empty(lines: list[str]) -> bool:
pattern = re.compile(r"^\s*$")
if len(lines) == 0:
return True
return all(map(lambda line: pattern.match(line) is not None, lines))


def load_hooks():
global __HOOK_CONTENTS
for file in glob.glob(str(__GLOBAL_HOOKS_PATH / "*")):
hook_name = Path(file).stem
with open(file) as fin:
lines = fin.readlines()
if not __lines_are_empty(lines):
__HOOK_CONTENTS[hook_name] = lines


class HookFile:
prefix = "# VECTORCODE_HOOK_START"
suffix = "# VECTORCODE_HOOK_END"
prefix_pattern = re.compile(r"^\s*#\s*VECTORCODE_HOOK_START\s*")
suffix_pattern = re.compile(r"^\s*#\s*VECTORCODE_HOOK_END\s*")

def __init__(self, path: str | Path, git_dir: Optional[str | Path] = None):
self.path = path
self.lines: list[str] = []
if os.path.isfile(self.path):
with open(self.path) as fin:
self.lines.extend(fin.readlines())

def has_vectorcode_hooks(self, force: bool = False) -> bool:
for start, start_line in enumerate(self.lines):
if self.prefix_pattern.match(start_line) is None:
continue

for end in range(start + 1, len(self.lines)):
if self.suffix_pattern.match(self.lines[end]) is not None:
if force:
logger.debug("`force` cleaning existing VectorCode hooks...")
new_lines = self.lines[:start] + self.lines[end + 1 :]
self.lines[:] = new_lines
return False
logger.debug(
f"Found vectorcode hook block between line {start} and {end} in {self.path}:\n{''.join(self.lines[start + 1 : end])}"
)
return True

return False

def inject_hook(self, content: list[str], force: bool = False):
if len(self.lines) == 0 or not self.has_vectorcode_hooks(force):
self.lines.append(self.prefix + "\n")
self.lines.extend(i if i.endswith("\n") else i + "\n" for i in content)
self.lines.append(self.suffix + "\n")
with open(self.path, "w") as fin:
fin.writelines(self.lines)
if platform.system() != "Windows":
# for unix systems, set the executable bit.
curr_mode = os.stat(self.path).st_mode
os.chmod(self.path, mode=curr_mode | stat.S_IXUSR)


async def hooks(configs: Config) -> int:
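
The marker handling that moves out of this file can be easier to reason about as a pure function. The sketch below is an assumption-laden restatement of `HookFile.has_vectorcode_hooks` and `HookFile.inject_hook`, not the code in this PR: `find_block` and `inject_block` are hypothetical names, and file writing and the executable bit are left out on purpose.

```python
import re
from typing import Optional

PREFIX = "# VECTORCODE_HOOK_START"
SUFFIX = "# VECTORCODE_HOOK_END"
PREFIX_RE = re.compile(r"^\s*#\s*VECTORCODE_HOOK_START\s*")
SUFFIX_RE = re.compile(r"^\s*#\s*VECTORCODE_HOOK_END\s*")


def find_block(lines: list[str]) -> Optional[tuple[int, int]]:
    """Return (start, end) indices of the first complete marker-wrapped block."""
    for start, line in enumerate(lines):
        if PREFIX_RE.match(line) is None:
            continue
        for end in range(start + 1, len(lines)):
            if SUFFIX_RE.match(lines[end]) is not None:
                return start, end
    return None


def inject_block(
    lines: list[str], content: list[str], force: bool = False
) -> list[str]:
    """Return new lines with `content` appended between the two markers."""
    result = list(lines)
    block = find_block(result)
    if block is not None:
        if not force:
            return result  # already managed: leave the file as it is
        start, end = block
        result = result[:start] + result[end + 1 :]  # drop the stale block
    result.append(PREFIX + "\n")
    result.extend(c if c.endswith("\n") else c + "\n" for c in content)
    result.append(SUFFIX + "\n")
    return result
```

Calling `inject_block` twice without `force` returns the same lines the second time, which is the property that makes repeated `vectorcode init --hooks` runs safe.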
129 changes: 113 additions & 16 deletions src/vectorcode/subcommands/init.py
@@ -1,32 +1,129 @@
import glob
import logging
import os
import platform
import re
import shutil
import stat
from pathlib import Path
from typing import Optional

from vectorcode.cli_utils import Config
from vectorcode.cli_utils import GLOBAL_CONFIG_PATH, Config, find_project_root

logger = logging.getLogger(name=__name__)

__GLOBAL_HOOKS_PATH = Path(GLOBAL_CONFIG_PATH).parent / "hooks"


# Keys: name of the hooks, ie. `pre-commit`
# Values: lines of the hooks.
__HOOK_CONTENTS: dict[str, list[str]] = {
"pre-commit": [
"diff_files=$(git diff --cached --name-only)",
'[ -z "$diff_files" ] || vectorcode vectorise $diff_files',
],
"post-checkout": [
'files=$(git diff --name-only "$1" "$2")',
'[ -z "$files" ] || vectorcode vectorise $files',
],
}


def __lines_are_empty(lines: list[str]) -> bool:
pattern = re.compile(r"^\s*$")
if len(lines) == 0:
return True
return all(map(lambda line: pattern.match(line) is not None, lines))


def load_hooks():
global __HOOK_CONTENTS
for file in glob.glob(str(__GLOBAL_HOOKS_PATH / "*")):
hook_name = Path(file).stem
with open(file) as fin:
lines = fin.readlines()
if not __lines_are_empty(lines):
__HOOK_CONTENTS[hook_name] = lines


class HookFile:
prefix = "# VECTORCODE_HOOK_START"
suffix = "# VECTORCODE_HOOK_END"
prefix_pattern = re.compile(r"^\s*#\s*VECTORCODE_HOOK_START\s*")
suffix_pattern = re.compile(r"^\s*#\s*VECTORCODE_HOOK_END\s*")

def __init__(self, path: str | Path, git_dir: Optional[str | Path] = None):
self.path = path
self.lines: list[str] = []
if os.path.isfile(self.path):
with open(self.path) as fin:
self.lines.extend(fin.readlines())

def has_vectorcode_hooks(self, force: bool = False) -> bool:
for start, start_line in enumerate(self.lines):
if self.prefix_pattern.match(start_line) is None:
continue

for end in range(start + 1, len(self.lines)):
if self.suffix_pattern.match(self.lines[end]) is not None:
if force:
logger.debug("`force` cleaning existing VectorCode hooks...")
new_lines = self.lines[:start] + self.lines[end + 1 :]
self.lines[:] = new_lines
return False
logger.debug(
f"Found vectorcode hook block between line {start} and {end} in {self.path}:\n{''.join(self.lines[start + 1 : end])}"
)
return True

return False

def inject_hook(self, content: list[str], force: bool = False):
if len(self.lines) == 0 or not self.has_vectorcode_hooks(force):
self.lines.append(self.prefix + "\n")
self.lines.extend(i if i.endswith("\n") else i + "\n" for i in content)
self.lines.append(self.suffix + "\n")
with open(self.path, "w") as fin:
fin.writelines(self.lines)
if platform.system() != "Windows":
# for unix systems, set the executable bit.
curr_mode = os.stat(self.path).st_mode
os.chmod(self.path, mode=curr_mode | stat.S_IXUSR)


async def init(configs: Config) -> int:
assert configs.project_root is not None
project_config_dir = os.path.join(str(configs.project_root), ".vectorcode")
is_initialised = 0
if os.path.isdir(project_config_dir) and not configs.force:
logger.warning(
f"{configs.project_root} is already initialised for VectorCode.",
)
return 1
is_initialised = 1
else:
os.makedirs(project_config_dir, exist_ok=True)
for item in ("config.json", "vectorcode.include", "vectorcode.exclude"):
local_file_path = os.path.join(project_config_dir, item)
global_file_path = os.path.join(
os.path.expanduser("~"), ".config", "vectorcode", item
)
if os.path.isfile(global_file_path):
logger.debug(f"Copying global {item} to {project_config_dir}")
shutil.copyfile(global_file_path, local_file_path)

os.makedirs(project_config_dir, exist_ok=True)
for item in ("config.json", "vectorcode.include", "vectorcode.exclude"):
local_file_path = os.path.join(project_config_dir, item)
global_file_path = os.path.join(
os.path.expanduser("~"), ".config", "vectorcode", item
print(f"VectorCode project root has been initialised at {configs.project_root}")
print(
"Note: The collection in the database will not be created until you vectorise a file."
)
if os.path.isfile(global_file_path):
logger.debug(f"Copying global {item} to {project_config_dir}")
shutil.copyfile(global_file_path, local_file_path)

print(f"VectorCode project root has been initialised at {configs.project_root}")
print(
"Note: The collection in the database will not be created until you vectorise a file."
)
return 0

git_root = find_project_root(configs.project_root, ".git")
if git_root:
load_hooks()
for hook in __HOOK_CONTENTS.keys():
hook_file_path = os.path.join(git_root, ".git", "hooks", hook)
logger.info(f"Writing {hook} hook into {hook_file_path}.")
print(f"Processing {hook} hook...")
hook_obj = HookFile(hook_file_path, git_dir=git_root)
hook_obj.inject_hook(__HOOK_CONTENTS[hook], configs.force)

return is_initialised
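
`init` calls `load_hooks()` before writing anything, so custom hook files override the built-in defaults. A minimal sketch of that merge step follows, under stated assumptions: `load_custom_hooks` is a hypothetical, side-effect-free variant of `load_hooks` that takes the directory and the defaults as arguments instead of using module globals, and it skips non-file entries, which the original does not do.

```python
import glob
import tempfile
from pathlib import Path


def load_custom_hooks(
    hooks_dir: Path, defaults: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Merge non-empty hook files from `hooks_dir` over `defaults` (hypothetical)."""
    merged = dict(defaults)
    for file in glob.glob(str(hooks_dir / "*")):
        path = Path(file)
        if not path.is_file():
            continue  # assumption: ignore sub-directories
        lines = path.read_text().splitlines(keepends=True)
        if any(line.strip() for line in lines):
            # Path.stem drops a trailing suffix, so `pre-commit.sh` maps to `pre-commit`.
            merged[path.stem] = lines
    return merged


with tempfile.TemporaryDirectory() as tmp:
    hooks_dir = Path(tmp)
    (hooks_dir / "pre-commit").write_text("vectorcode vectorise src/\n")
    (hooks_dir / "post-merge").write_text("\n\n")  # blank file: ignored
    hooks = load_custom_hooks(hooks_dir, {"pre-commit": ["echo default"]})
    print(sorted(hooks))  # prints: ['pre-commit']
```

Because the hook name comes from `Path(...).stem`, a file with an extension still maps to the hook name without it.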