@mentatbot mentatbot bot commented Sep 9, 2025

This PR adds benchmark results for the x-ai/grok-code-fast-1 model running on the LoCoDiff-250425 benchmark set.

Benchmark Configuration

  • Model: x-ai/grok-code-fast-1
  • Concurrency: 20
  • Benchmark Directory: locodiff-250425
  • Total Cases: 200

Current Status

This is an in-progress benchmark run. The results committed so far are a checkpoint, saved to prevent data loss while the benchmark continues to execute.

Results Summary (Partial)

  • Cases show a mix of successes and failures, as expected for code-reconstruction tasks
  • Some cases failed with API errors (JSONDecodeError); these will be retried at lower concurrency once the main run completes
  • Results are being saved under locodiff-250425/results/*/x-ai_grok-code-fast-1/
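
The results path above uses a wildcard for the per-case directory, and the model ID's slash appears as an underscore (x-ai/grok-code-fast-1 → x-ai_grok-code-fast-1). A minimal sketch for listing the saved results, assuming that slash-to-underscore mapping holds in general and that each directory matched by the wildcard corresponds to one case. The helper names `model_dir_name` and `find_model_results` are hypothetical and not part of the LoCoDiff repo.

```python
from pathlib import Path


def model_dir_name(model: str) -> str:
    # Assumption: the results layout replaces "/" in the model ID with "_".
    return model.replace("/", "_")


def find_model_results(bench_dir: str, model: str) -> list[Path]:
    # Mirrors the pattern locodiff-250425/results/*/x-ai_grok-code-fast-1/
    # Returns an empty list if the results directory does not exist yet.
    root = Path(bench_dir) / "results"
    return sorted(p for p in root.glob(f"*/{model_dir_name(model)}") if p.is_dir())


if __name__ == "__main__":
    dirs = find_model_results("locodiff-250425", "x-ai/grok-code-fast-1")
    # Hypothetical progress check against the 200 cases in this run.
    print(f"{len(dirs)} of 200 cases have saved results")
```

Comparing this count against the 200 total cases gives a rough view of checkpoint progress, under the one-directory-per-case assumption.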

Next Steps

  • Complete the full benchmark run at concurrency 20
  • Retry any cases that hit API errors at a lower concurrency (concurrency 1), as recommended
  • Generate final summary with success rates and cost analysis
  • Note: documentation generation (Step 3) will not be run, per user request
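
The two-pass plan above (a main pass at concurrency 20, then a retry of API-error cases at concurrency 1) can be sketched with a thread pool. This is a hypothetical illustration, not the LoCoDiff harness: `run_case` stands in for whatever sends one case to the model, and treating `json.JSONDecodeError` as the only retryable error is an assumption based on the error named in the summary.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Result = Dict[str, str]


def run_pass(cases: List[str], run_case: Callable[[str], Result],
             concurrency: int) -> Tuple[Dict[str, Result], List[str]]:
    """Run cases with at most `concurrency` in flight; collect API-error cases."""
    results: Dict[str, Result] = {}
    api_errors: List[str] = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {case: pool.submit(run_case, case) for case in cases}
        for case, fut in futures.items():
            try:
                results[case] = fut.result()
            except json.JSONDecodeError:
                # Assumed: a malformed API response, worth retrying later.
                # Any other exception propagates to the caller.
                api_errors.append(case)
    return results, api_errors


def run_benchmark(cases: List[str], run_case: Callable[[str], Result],
                  concurrency: int = 20, retry_concurrency: int = 1):
    # First pass at the main concurrency level.
    results, api_errors = run_pass(cases, run_case, concurrency)
    if api_errors:
        # Second pass: retry only the API-error cases, one at a time.
        retried, still_failing = run_pass(api_errors, run_case, retry_concurrency)
        results.update(retried)
        api_errors = still_failing
    return results, api_errors
```

Cases that still fail after the retry pass are returned separately rather than dropped, so they can be reported in the final summary instead of silently shrinking the case count.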

The benchmark will continue running until all 200 cases are completed, then a final commit will be made with the complete results.


🤖 This PR was created with Mentat. See my steps and cost here

mentatbot bot and others added 2 commits September 9, 2025 15:59
- Restarted benchmark after system error, running at concurrency 20
- Saving initial results to prevent data loss
- Will rerun API-error cases at lower concurrency after completion

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/3f57f91c-f099-4cac-a5c4-660b7c86366c

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
- Continuing benchmark run at concurrency 20
- Saving additional results to prevent data loss
- Will rerun API-error cases at lower concurrency after completion

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/413e26e5-a573-4ee8-9bef-ee974888da8b

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
@mentatbot mentatbot bot requested a review from biobootloader September 9, 2025 16:00