Conversation

@nicholaspsmith (Owner)

Summary

More conservative Ollama embedding settings to improve reliability:

  • Reduce batch size from 100 to 10 texts per request
  • Reduce concurrency from 10 to 4 parallel requests
  • Reduce timeout from 5 to 2 minutes per batch
  • Add keep_alive: '10m' parameter to prevent model unloading between requests
  • Add detailed logging showing each group start
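The settings above might look roughly like the following. This is a minimal sketch, not the project's actual code: the constant names, the `chunk` helper, and the `embedBatch` function are hypothetical, and the model name is an assumption. The `keep_alive` field, however, is a real parameter of Ollama's `/api/embed` endpoint and is what keeps the model loaded between requests.

```typescript
// Hypothetical constants mirroring the PR's conservative settings.
const BATCH_SIZE = 10;         // texts per request (was 100)
const CONCURRENCY = 4;         // parallel requests per group (was 10)
const TIMEOUT_MS = 2 * 60_000; // 2 minutes per batch (was 5)
const KEEP_ALIVE = "10m";      // keep the model loaded between requests

// Split one indexer batch of texts into request-sized chunks.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Sketch of a single embedding request against Ollama's /api/embed
// endpoint, with a per-batch timeout and keep_alive set.
async function embedBatch(texts: string[]): Promise<number[][]> {
  const res = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text", // assumed model name
      input: texts,
      keep_alive: KEEP_ALIVE,
    }),
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: HTTP ${res.status}`);
  return (await res.json()).embeddings;
}
```

With these numbers, a 2000-text indexer batch becomes 200 small requests, so a single hung or slow request costs at most 2 minutes and 10 texts of work.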

With 2000 texts per indexer batch:

  • 200 batches of 10 texts
  • 50 groups of 4 parallel batches
  • Progress updates every 40 texts
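The numbers above fall out directly from the two settings. The group loop and log format below are assumptions used for illustration, not the project's actual code:

```typescript
const TEXTS_PER_INDEXER_BATCH = 2000;
const BATCH_SIZE = 10; // texts per request
const CONCURRENCY = 4; // batches run in parallel per group

const batches = TEXTS_PER_INDEXER_BATCH / BATCH_SIZE; // 200 batches of 10 texts
const groups = Math.ceil(batches / CONCURRENCY);      // 50 groups of 4 batches
const textsPerGroup = BATCH_SIZE * CONCURRENCY;       // progress every 40 texts

// Assumed shape of the group loop: run CONCURRENCY batches in
// parallel, logging each group start to stderr before it begins.
async function runGroups(embed: (batchIndex: number) => Promise<void>) {
  for (let g = 0; g < groups; g++) {
    console.error(`[ollama] starting group ${g + 1}/${groups} (${textsPerGroup} texts)`);
    await Promise.all(
      Array.from({ length: CONCURRENCY }, (_, i) => embed(g * CONCURRENCY + i)),
    );
  }
}
```

Because each group awaits all four of its batches before the next group starts, progress can only advance in 40-text steps, which matches the update granularity described above.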

Test plan

  • Start indexing a large codebase
  • Verify detailed Ollama logs appear in stderr
  • Verify embedding makes progress without hanging

🤖 Generated with Claude Code

- Reduce batch size from 100 to 10 texts per request
- Reduce concurrency from 10 to 4 parallel requests
- Reduce timeout from 5 to 2 minutes per batch
- Add keep_alive parameter to prevent model unloading
- Add detailed logging for each group start

Co-Authored-By: Claude <noreply@anthropic.com>
nicholaspsmith merged commit 455d0f4 into main on Jan 27, 2026
0 of 2 checks passed
github-actions bot pushed a commit that referenced this pull request on Jan 27, 2026
## [1.18.5](v1.18.4...v1.18.5) (2026-01-27)

### Bug Fixes

* use conservative Ollama settings for reliability ([#83](#83)) ([455d0f4](455d0f4))
@github-actions

🎉 This PR is included in version 1.18.5 🎉

The release is available on:

Your semantic-release bot 📦🚀
