Commit 3338970

Author: Chris Mason

semcode-lsp: make an LSP server

This is a very simple LSP server, but it's useful for navigating the sources while double checking AI reviews.

docs/lsp-server.md has details on how to set it up

Signed-off-by: Chris Mason <clm@meta.com>
1 parent 80dd9f3 commit 3338970

8 files changed: +882 -9 lines changed

CLAUDE.md

Lines changed: 88 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -4,9 +4,11 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
44

55
## Project Overview
66

7-
Semcode is a semantic code search tool written in Rust that indexes C/C++ codebases using machine learning embeddings. It consists of two main binaries:
7+
Semcode is a semantic code search tool written in Rust that indexes C/C++ codebases using machine learning embeddings. It consists of several binaries:
88
- `semcode-index`: Analyzes and indexes codebases using the CodeBERT model
99
- `semcode`: Interactive query tool for searching the indexed code
10+
- `semcode-mcp`: Model Context Protocol server for Claude Desktop integration
11+
- `semcode-lsp`: Language Server Protocol server for editor integration (see [docs/lsp-server.md](docs/lsp-server.md))
1012

1113
## Build and Development Commands
1214

@@ -41,9 +43,9 @@ Semcode uses the following search order to locate the `.semcode.db` database dir
4143
2. **Source directory**: Look for `.semcode.db` in the source directory specified by `-s`
4244
3. **Current directory**: Fall back to `./.semcode.db` in the current working directory
4345

44-
**For semcode (query tool) and semcode-mcp:**
45-
1. **-d flag**: If provided, use the specified path (direct database path or parent directory containing `.semcode.db`)
46-
2. **Current directory**: Use `./.semcode.db` in the current working directory
46+
**For semcode (query tool), semcode-mcp, and semcode-lsp:**
47+
1. **-d flag / configuration**: If provided, use the specified path (direct database path or parent directory containing `.semcode.db`)
48+
2. **Workspace/Current directory**: Use `./.semcode.db` in the workspace or current working directory
4749

4850
The `-d` flag can specify either:
4951
- A direct path to the database directory (e.g., `./my-custom.db`)
@@ -78,6 +80,88 @@ semcode --database /path/to/code # Uses /path/to/code/.semcode.db
7880

7981
## Architecture
8082

83+
### Git-Aware Operations (IMPORTANT)
84+
85+
**All features that query the database MUST use git-aware lookup functions.**
86+
87+
Semcode indexes codebases at specific git commits and stores multiple versions of functions, types, and macros as the codebase evolves. When implementing any feature that looks up code entities, always use git-aware functions to ensure you're finding the correct version that matches the user's current working directory.
88+
89+
#### Required Approach
90+
91+
1. **Obtain the current git SHA:**
92+
```rust
93+
use semcode::git::get_git_sha;
94+
95+
let git_sha = get_git_sha(&repo_path)?
96+
.ok_or_else(|| anyhow::anyhow!("Not a git repository"))?;
97+
```
98+
99+
2. **Use git-aware lookup functions:**
100+
```rust
101+
// ✅ CORRECT: Git-aware function lookup
102+
let function = db.find_function_git_aware(name, &git_sha).await?;
103+
104+
// ❌ WRONG: Non-git-aware lookup (may return wrong version)
105+
let function = db.find_function(name).await?;
106+
```
107+
108+
3. **Pass git_repo_path to DatabaseManager:**
109+
```rust
110+
// DatabaseManager needs git_repo_path for git-aware resolution
111+
let db = DatabaseManager::new(&db_path, git_repo_path).await?;
112+
```
113+
114+
#### Available Git-Aware Functions
115+
116+
In `DatabaseManager` (src/database/connection.rs):
117+
- `find_function_git_aware(name: &str, git_sha: &str)` - Find function at specific commit
118+
- `find_macro_git_aware(name: &str, git_sha: &str)` - Find macro at specific commit
119+
- `get_function_callees_git_aware(name: &str, git_sha: &str)` - Get callees at specific commit
120+
- `build_callchain_with_manifest()` - Call chain analysis with git manifest
121+
122+
#### When to Use Non-Git-Aware Functions
123+
124+
Non-git-aware functions like `find_function()` should **only** be used:
125+
- As a fallback when git SHA cannot be determined (not in a git repo)
126+
- For administrative/debugging operations that need to see all versions
127+
- When the operation explicitly requires seeing historical data across commits
128+
129+
#### Example: Implementing a New Feature
130+
131+
```rust
132+
// ✅ CORRECT IMPLEMENTATION
133+
async fn my_new_feature(db: &DatabaseManager, repo_path: &str) -> Result<()> {
134+
// 1. Get current git SHA
135+
let git_sha = semcode::git::get_git_sha(repo_path)?
136+
.ok_or_else(|| anyhow::anyhow!("Not a git repository"))?;
137+
138+
// 2. Use git-aware lookup
139+
if let Some(func) = db.find_function_git_aware("my_func", &git_sha).await? {
140+
println!("Found function at current commit: {}", func.name);
141+
}
142+
143+
Ok(())
144+
}
145+
146+
// ❌ WRONG IMPLEMENTATION (may return wrong version)
147+
async fn my_bad_feature(db: &DatabaseManager) -> Result<()> {
148+
if let Some(func) = db.find_function("my_func").await? {
149+
println!("Found function (but which version?): {}", func.name);
150+
}
151+
Ok(())
152+
}
153+
```
154+
155+
#### Why This Matters
156+
157+
Without git-aware lookups:
158+
- Users may jump to outdated function definitions
159+
- Call chains may include deleted or renamed functions
160+
- Type information may not match current code structure
161+
- Results are confusing and incorrect for active development
162+
163+
**Remember: When in doubt, use git-aware functions!**
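The rule above (use the commit-specific lookup, and fall back to the non-git-aware one only when no SHA is available) can be illustrated with a toy in-memory index. This is a minimal sketch under stated assumptions: `ToyIndex`, `FunctionInfo`, `find_git_aware`, `find_any`, and `lookup` are hypothetical names invented for illustration and do not reflect how `DatabaseManager` stores or resolves versions.

```rust
use std::collections::HashMap;

/// Hypothetical record for one indexed version of a function.
#[derive(Debug, Clone, PartialEq)]
struct FunctionInfo {
    name: String,
    file: String,
    line: u32,
}

/// Toy in-memory index keyed by (function name, git SHA).
/// It only imitates the shape of a git-aware lookup; it is not semcode's storage.
#[derive(Default)]
struct ToyIndex {
    versions: HashMap<(String, String), FunctionInfo>,
}

impl ToyIndex {
    fn insert(&mut self, sha: &str, name: &str, file: &str, line: u32) {
        let info = FunctionInfo {
            name: name.to_string(),
            file: file.to_string(),
            line,
        };
        self.versions.insert((name.to_string(), sha.to_string()), info);
    }

    /// Git-aware lookup: only the version recorded for `sha` qualifies.
    fn find_git_aware(&self, name: &str, sha: &str) -> Option<&FunctionInfo> {
        self.versions.get(&(name.to_string(), sha.to_string()))
    }

    /// Non-git-aware lookup: returns some version, with no guarantee that it
    /// matches the checkout. The smallest SHA is picked only to keep the toy
    /// deterministic.
    fn find_any(&self, name: &str) -> Option<&FunctionInfo> {
        self.versions
            .iter()
            .filter(|((n, _), _)| n.as_str() == name)
            .min_by_key(|((_, sha), _)| sha.clone())
            .map(|(_, info)| info)
    }

    /// Prefer the git-aware path; fall back only when no SHA is available.
    fn lookup(&self, name: &str, sha: Option<&str>) -> Option<&FunctionInfo> {
        match sha {
            Some(sha) => self.find_git_aware(name, sha),
            None => self.find_any(name),
        }
    }
}
```

With two versions of `my_func` inserted under different SHAs, `lookup("my_func", Some(sha))` returns exactly the version for that commit, while `lookup("my_func", None)` returns an arbitrary one, which is the ambiguity the guidance warns about.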
164+
81165
### Scalability
82166
- The database is very large. No operation should be implemented via full table
83167
scans unless that operation is effectively a full table scan.

Cargo.toml

Lines changed: 6 additions & 2 deletions
@@ -45,6 +45,8 @@ anstream = "0.6" # Auto TTY/NO_COLOR handling; print!/println! and Write
4545
owo-colors = { version = "4", features = ["supports-colors"] } # .red().bold(), optional detection helpers [web:27]
4646
gix = "0.73"
4747
model2vec-rs = "0.1.4"
48+
tower-lsp = "0.20" # Language Server Protocol framework for LSP server
49+
url = "2.5" # URL parsing for file URIs in LSP
4850

4951
[[bin]]
5052
name = "semcode-index"
@@ -58,7 +60,9 @@ path = "src/bin/query.rs"
5860
name = "semcode-mcp"
5961
path = "src/bin/semcode-mcp.rs"
6062

63+
[[bin]]
64+
name = "semcode-lsp"
65+
path = "src/bin/semcode-lsp.rs"
66+
6167
[profile.release]
6268
debug = false
63-
64-
[workspace]

build.sh

Lines changed: 5 additions & 0 deletions
@@ -52,6 +52,7 @@ mkdir -p bin
5252
ln -sf ../target/release/semcode-index bin/semcode-index
5353
ln -sf ../target/release/semcode bin/semcode
5454
ln -sf ../target/release/semcode-mcp bin/semcode-mcp
55+
ln -sf ../target/release/semcode-lsp bin/semcode-lsp
5556

5657
echo ""
5758
echo "=== Build Complete ==="
@@ -67,6 +68,10 @@ echo ""
6768
echo "To run MCP server:"
6869
echo " ./bin/semcode-mcp --database ./code.db"
6970
echo ""
71+
echo "To run LSP server (for editor integration):"
72+
echo " ./bin/semcode-lsp"
73+
echo " See docs/lsp-server.md for Neovim/editor setup"
74+
echo ""
7075

7176
# Optional: Create a small test directory with sample C files
7277
if [ "$1" == "--with-test" ]; then

docs/lsp-server.md

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
1+
# Semcode LSP Server
2+
3+
A Language Server Protocol (LSP) server that provides navigation features for C/C++ codebases indexed by semcode.
4+
5+
## Features
6+
7+
- **Go to Definition**: Jump to definitions of functions, macros, types, and typedefs using semcode's semantic database
8+
- **Find References**: Show all places where a symbol is referenced (callers)
9+
- **Git-Aware Lookups**: Automatically finds the correct version at your current commit
10+
- **Configuration Support**: Configurable database path through LSP client settings
11+
12+
## Building
13+
14+
```bash
15+
cargo build --release --bin semcode-lsp
16+
```
17+
18+
## Usage
19+
20+
The LSP server communicates over stdin/stdout using the JSON-RPC 2.0 protocol as specified by the Language Server Protocol.
21+
22+
### Prerequisites
23+
24+
1. A semcode-indexed codebase:
25+
```bash
26+
semcode-index --source /path/to/your/code
27+
```
28+
29+
2. The resulting `.semcode.db` database in your workspace directory
30+
31+
## Neovim Configuration
32+
33+
### Using nvim-lspconfig
34+
35+
Add this to your Neovim configuration (`~/.config/nvim/init.lua` or similar):
36+
37+
```lua
38+
-- Ensure you have nvim-lspconfig installed
39+
-- Using lazy.nvim:
40+
-- { 'neovim/nvim-lspconfig' }
41+
42+
-- Configure semcode LSP
43+
local lspconfig = require('lspconfig')
44+
local configs = require('lspconfig.configs')
45+
46+
-- Define semcode-lsp if it's not already defined
47+
if not configs.semcode_lsp then
48+
configs.semcode_lsp = {
49+
default_config = {
50+
cmd = { '/path/to/semcode/target/release/semcode-lsp' },
51+
filetypes = { 'c', 'cpp' }, -- Neovim filetype names; header files are detected as c or cpp
52+
root_dir = function(fname)
53+
-- Prefer the git root, then a directory containing .semcode.db, then the current directory
54+
return lspconfig.util.find_git_ancestor(fname) or
55+
lspconfig.util.root_pattern('.semcode.db')(fname) or
56+
vim.fn.getcwd()
57+
end,
58+
settings = {
59+
semcode = {
60+
database_path = nil -- Uses workspace/.semcode.db by default
61+
}
62+
}
63+
}
64+
}
65+
end
66+
67+
-- Setup the LSP
68+
lspconfig.semcode_lsp.setup({
69+
-- Optional: custom database path
70+
settings = {
71+
semcode = {
72+
database_path = "/custom/path/to/.semcode.db" -- Optional
73+
}
74+
}
75+
})
76+
77+
-- Optional: Set up keybindings
78+
vim.api.nvim_create_autocmd('LspAttach', {
79+
group = vim.api.nvim_create_augroup('UserLspConfig', {}),
80+
callback = function(ev)
81+
-- Enable completion triggered by <c-x><c-o>
82+
vim.bo[ev.buf].omnifunc = 'v:lua.vim.lsp.omnifunc'
83+
84+
local opts = { buffer = ev.buf }
85+
vim.keymap.set('n', 'gd', vim.lsp.buf.definition, opts) -- Go to definition
86+
vim.keymap.set('n', 'gr', vim.lsp.buf.references, opts) -- Find references (callers)
87+
vim.keymap.set('n', 'K', vim.lsp.buf.hover, opts)
88+
vim.keymap.set('n', 'gi', vim.lsp.buf.implementation, opts)
89+
vim.keymap.set('n', '<C-k>', vim.lsp.buf.signature_help, opts)
90+
vim.keymap.set('n', '<space>rn', vim.lsp.buf.rename, opts)
91+
vim.keymap.set('n', '<space>ca', vim.lsp.buf.code_action, opts)
92+
end,
93+
})
94+
```
95+
96+
### Manual Configuration
97+
98+
If you prefer manual configuration without nvim-lspconfig:
99+
100+
```lua
101+
vim.lsp.start({
102+
name = 'semcode-lsp',
103+
cmd = { '/path/to/semcode/target/release/semcode-lsp' },
104+
root_dir = vim.fs.dirname(vim.fs.find({'.semcode.db', '.git'}, { upward = true })[1]),
105+
settings = {
106+
semcode = {
107+
database_path = nil -- Optional custom path
108+
}
109+
}
110+
})
111+
```
112+
113+
## Configuration Options
114+
115+
The LSP server accepts the following configuration through the `semcode` section:
116+
117+
- `database_path` (string, optional): Custom path to the semcode database. If not specified, the server will:
118+
1. Look for `.semcode.db` in the workspace directory
119+
2. Fall back to `./.semcode.db` in the current directory
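The resolution order described here can be sketched as a small pure function. This is an illustrative sketch, not the server's code: `resolve_db_path` and its parameters are hypothetical, the existence check is passed in as a closure so the logic can be exercised without touching the filesystem, and treating a configured path as a parent directory whenever it contains `.semcode.db` is an assumption about how the two accepted forms are told apart.

```rust
use std::path::{Path, PathBuf};

/// Name of the database directory, as used throughout the docs.
const DB_DIR: &str = ".semcode.db";

/// Hypothetical helper: pick the database location from an optional
/// configured path, an optional workspace root, and the current directory.
/// `exists` reports whether a candidate path is present on disk.
fn resolve_db_path(
    configured: Option<&Path>,
    workspace: Option<&Path>,
    cwd: &Path,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    // 1. An explicit setting wins. It may name the database directly or a
    //    parent directory that contains `.semcode.db` (assumed rule: the
    //    nested form is chosen when it exists).
    if let Some(path) = configured {
        let nested = path.join(DB_DIR);
        return if exists(&nested) {
            nested
        } else {
            path.to_path_buf()
        };
    }

    // 2. Otherwise look for `.semcode.db` in the workspace directory.
    if let Some(root) = workspace {
        let candidate = root.join(DB_DIR);
        if exists(&candidate) {
            return candidate;
        }
    }

    // 3. Fall back to `./.semcode.db` in the current directory.
    cwd.join(DB_DIR)
}
```

In a real caller the closure would simply be `|p| p.exists()`; injecting it keeps the ordering logic easy to test in isolation.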
120+
121+
## Testing
122+
123+
To test the LSP server manually, you can run it and send JSON-RPC messages:
124+
125+
```bash
126+
# Build the server
127+
cargo build --release --bin semcode-lsp
128+
129+
# Run the server (it will read from stdin and write to stdout)
130+
./target/release/semcode-lsp
131+
```
132+
133+
Example initialization message:
134+
```json
135+
Content-Length: 186
136+
137+
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":null,"rootUri":"file:///path/to/your/workspace","capabilities":{},"initializationOptions":{},"workspaceFolders":null}}
138+
```
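When crafting test messages by hand, the `Content-Length` header must equal the byte length of the JSON body, and the header is separated from the body by `\r\n\r\n`. A minimal sketch of that framing follows; `frame`, `unframe`, and `INITIALIZE` are hypothetical helper names for illustration, not part of semcode-lsp or tower-lsp.

```rust
/// The example `initialize` request from this document, as a raw string.
const INITIALIZE: &str = r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":null,"rootUri":"file:///path/to/your/workspace","capabilities":{},"initializationOptions":{},"workspaceFolders":null}}"#;

/// Wrap a JSON-RPC body in the LSP base-protocol header.
/// `str::len` counts bytes, which is what `Content-Length` expects.
fn frame(body: &str) -> String {
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body)
}

/// Split a framed message back into its body, returning `None` when the
/// header is missing or the declared length does not match the body.
fn unframe(message: &str) -> Option<&str> {
    let (header, body) = message.split_once("\r\n\r\n")?;
    let declared: usize = header
        .lines()
        .find_map(|line| line.strip_prefix("Content-Length: "))?
        .trim()
        .parse()
        .ok()?;
    (declared == body.len()).then_some(body)
}
```

For the request above, `frame(INITIALIZE)` begins with `Content-Length: 186`; a mismatched length is a common reason a hand-written message is ignored or stalls the server.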
139+
140+
## How It Works
141+
142+
1. **Database Connection**: The server connects to the semcode database (`.semcode.db`) in your workspace
143+
2. **Git Awareness**: The server detects your current git commit (HEAD) to ensure it finds the correct version of all symbols
144+
3. **Go to Definition**: When you request "go to definition" (`gd`) on an identifier, the server:
145+
- Extracts the identifier name at the cursor position
146+
- Uses git-aware lookup to query the database at your current commit
147+
- Checks in priority order: function > macro > type > typedef
148+
- Returns the file path and line number where it is defined
149+
- Your editor jumps to the definition location
150+
4. **Find References**: When you request "find references" (`gr`) on an identifier, the server:
151+
- Extracts the identifier name at the cursor position
152+
- Queries the database for all symbols that reference this identifier (callers)
153+
- Uses git-aware lookup to find references that exist at your current commit
154+
- Returns a list of all locations where it is referenced
155+
- Your editor displays the list of references
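The first step of both requests, extracting the identifier under the cursor, can be sketched as follows. This is a simplified illustration rather than the server's implementation: `identifier_at` is a hypothetical helper, and it treats the column as a byte offset into an ASCII line, whereas LSP positions are expressed in UTF-16 code units by default.

```rust
/// True for bytes that may appear in a C identifier.
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Return the C identifier that contains (or ends at) byte offset `col`.
fn identifier_at(line: &str, col: usize) -> Option<&str> {
    let bytes = line.as_bytes();
    if col > bytes.len() {
        return None;
    }

    // Walk left to the start of the word. Starting from `col` also accepts
    // a cursor placed immediately after the last character of a word.
    let mut start = col;
    while start > 0 && is_ident_byte(bytes[start - 1]) {
        start -= 1;
    }

    // Walk right to the end of the word.
    let mut end = col;
    while end < bytes.len() && is_ident_byte(bytes[end]) {
        end += 1;
    }

    if start == end {
        return None; // cursor is on whitespace or punctuation
    }

    let word = &line[start..end];
    // A C identifier cannot begin with a digit, so numeric literals are skipped.
    if word.as_bytes()[0].is_ascii_digit() {
        return None;
    }
    Some(word)
}
```

The extracted name would then be passed to the database lookups described above; handling multi-byte characters and UTF-16 column conversion is left out to keep the sketch short.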
156+
157+
### Git-Aware Lookups
158+
159+
The LSP server uses **git-aware lookups** to ensure you always jump to the correct version of a symbol:
160+
161+
- On initialization, it determines your current git commit (`HEAD`)
162+
- When looking up symbols, it finds the version that exists at your current commit
163+
- If you have indexed multiple versions of your codebase, it selects the version that matches that commit
164+
- Falls back to non-git-aware lookup if not in a git repository
165+
166+
## Supported Languages
167+
168+
- C (`.c`, `.h`)
169+
- C++ (`.cpp`, `.cc`, `.cxx`, `.hpp`, `.hxx`)
170+
171+
## Troubleshooting
172+
173+
### Database Not Found
174+
If you see "Semcode database not found" messages:
175+
1. Ensure you've run `semcode-index` on your codebase
176+
2. Check that `.semcode.db` exists in your workspace
177+
3. Verify the database path configuration
178+
179+
### Symbol Not Found
180+
If "go to definition" doesn't work for a symbol:
181+
1. Ensure the symbol (function, macro, type, or typedef) was indexed by semcode
182+
2. Try using the `semcode` CLI tool to verify the symbol exists in the database
183+
3. Check that you're using the correct identifier name (no typos)
184+
4. **Git-related issues:**
185+
- Make sure your working directory is at a git commit that has been indexed
186+
- If you've checked out a different commit, restart the LSP server to refresh the git SHA
187+
- Try running `semcode-index` on your current commit if it hasn't been indexed yet
188+
189+
## Limitations
190+
191+
- Supports functions, macros, types, and typedefs (other symbols may be added in the future)
192+
- Requires a pre-indexed semcode database
193+
- Does not support real-time indexing of file changes
194+
- References show symbol-level locations, not exact usage sites within function bodies

scripts/semcode-lsp-wrapper.sh

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
#!/bin/bash
2+
# LSP debugging wrapper - logs all communication to /tmp/semcode-lsp-debug.log
3+
4+
LOG_FILE="/tmp/semcode-lsp-debug.log"
5+
LSP_BINARY="$(dirname "$0")/../target/release/semcode-lsp"
6+
7+
echo "=== LSP Session Started: $(date) ===" >> "$LOG_FILE"
8+
echo "Arguments: $*" >> "$LOG_FILE"
9+
echo "Working Directory: $(pwd)" >> "$LOG_FILE"
10+
echo "Environment:" >> "$LOG_FILE"
11+
env >> "$LOG_FILE"
12+
echo "===================================" >> "$LOG_FILE"
13+
14+
# Use tee to capture the server's stdout while passing it through
15+
# This logs the JSON-RPC communication
16+
exec 3>&1 4>&2
17+
{
18+
# Run the LSP server
19+
SEMCODE_DEBUG=info "$LSP_BINARY" "$@" 2>> "$LOG_FILE" | tee -a "$LOG_FILE" >&3
20+
} 2>&1 | tee -a "$LOG_FILE" >&4
