Conversation

@nicholaspsmith
Owner

Summary

  • Increase batch size from 10 to 50 texts per request
  • Increase concurrency from 4 to 20 parallel requests
  • Processes up to 1000 texts per wave (50 × 20) instead of 40 (10 × 4)
  • Add diagnostic log before embedBatch call to help identify hangs

This should better utilize available GPU resources.
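The change above can be sketched as follows. This is a minimal illustration of the batching/concurrency numbers from the summary, not the PR's actual code: only `embedBatch` is named in the PR, and here it is a placeholder stub; `chunk`, `embedAll`, and the constant names are assumptions for the sake of the example.

```typescript
const BATCH_SIZE = 50;  // was 10: texts per request
const CONCURRENCY = 20; // was 4: parallel requests

// Hypothetical stand-in for the real Ollama embedding call.
async function embedBatch(texts: string[]): Promise<number[][]> {
  return texts.map(() => [0, 0, 0]); // placeholder vectors
}

// Split an array into fixed-size slices.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function embedAll(texts: string[]): Promise<number[][]> {
  const batches = chunk(texts, BATCH_SIZE); // up to 50 texts per request
  const results: number[][] = [];
  // Up to 20 batches in flight at once → 50 × 20 = 1000 texts per wave.
  for (const wave of chunk(batches, CONCURRENCY)) {
    const embedded = await Promise.all(
      wave.map((batch) => {
        // Diagnostic log before embedBatch to help identify hangs.
        console.error(`embedBatch: sending ${batch.length} texts`);
        return embedBatch(batch);
      })
    );
    for (const vectors of embedded) results.push(...vectors);
  }
  return results;
}
```

Capping the wave at `CONCURRENCY` rather than firing every batch at once keeps the number of simultaneous requests bounded, which is what lets the batch size and parallelism be raised without flooding the server.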

Test plan

  • Start indexing a large codebase
  • Verify GPU utilization increases
  • Check that diagnostic logs appear

🤖 Generated with Claude Code

nicholaspsmith and others added 2 commits January 27, 2026 11:03
- Increase batch size from 10 to 50 texts per request
- Increase concurrency from 4 to 20 parallel requests
- Processes 1000 texts at a time instead of 40

Co-Authored-By: Claude <noreply@anthropic.com>
@nicholaspsmith force-pushed the fix/ollama-aggressive-parallelism branch from 91fdc05 to 41bbec7 on January 27, 2026 16:03
@nicholaspsmith merged commit 8a1aef7 into main on Jan 27, 2026
2 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 27, 2026
## [1.18.6](v1.18.5...v1.18.6) (2026-01-27)

### Performance Improvements

* increase Ollama parallelism and add diagnostics ([#85](#85)) ([8a1aef7](8a1aef7))
@github-actions

🎉 This PR is included in version 1.18.6 🎉

The release is available on:

Your semantic-release bot 📦🚀
