Add benchmark results for x-ai/grok-code-fast-1 model (in-progress) #338

mentatbot · 2025-09-09T16:00:38Z

This PR adds benchmark results for the x-ai/grok-code-fast-1 model running on the LoCoDiff-250425 benchmark set.

Benchmark Configuration

Model: x-ai/grok-code-fast-1
Concurrency: 20
Benchmark Directory: locodiff-250425
Total Cases: 200

Current Status

This is an in-progress benchmark run. The results committed so far represent a checkpoint to prevent data loss during the ongoing benchmark execution.

Results Summary (Partial)

Success and failure are mixed across the cases completed so far, as expected for code reconstruction tasks
Some cases hit API errors (JSONDecodeError); these will be retried at lower concurrency after the main run completes
Results are being saved under locodiff-250425/results/*/x-ai_grok-code-fast-1/

Next Steps

Complete the full benchmark run at concurrency 20
Retry any API-error cases at concurrency 1, as recommended
Generate final summary with success rates and cost analysis
Note: Step 3 (documentation generation) will NOT be run per user request

The benchmark will continue running until all 200 cases are complete; a final commit will then add the complete results.
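
For reviewers, the two-pass plan in Next Steps (main run at concurrency 20, then a retry of API-error cases at concurrency 1) could be sketched roughly as follows. This is an illustration only, not the benchmark's actual runner: `run_batch`, `run_with_retry`, and the `run_case` callable are hypothetical names, and the sketch assumes an API error surfaces as a `json.JSONDecodeError` raised by the per-case coroutine.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, Iterable, List, Tuple

# Hypothetical: a coroutine that runs one benchmark case and returns its result.
RunCase = Callable[[str], Awaitable[dict]]


async def run_batch(
    case_ids: Iterable[str], run_case: RunCase, concurrency: int
) -> Tuple[Dict[str, dict], List[str]]:
    """Run cases with at most `concurrency` in flight.

    Returns the successful results and the ids that hit an API error.
    """
    semaphore = asyncio.Semaphore(concurrency)
    results: Dict[str, dict] = {}
    api_errors: List[str] = []

    async def worker(case_id: str) -> None:
        async with semaphore:
            try:
                results[case_id] = await run_case(case_id)
            except json.JSONDecodeError:
                # Assumption: a malformed API response shows up as this error.
                api_errors.append(case_id)

    await asyncio.gather(*(worker(c) for c in case_ids))
    return results, api_errors


async def run_with_retry(
    case_ids: Iterable[str],
    run_case: RunCase,
    main_concurrency: int = 20,
    retry_concurrency: int = 1,
) -> Tuple[Dict[str, dict], List[str]]:
    """Main pass at high concurrency, then one retry pass for API-error cases."""
    results, failed = await run_batch(case_ids, run_case, main_concurrency)
    retried, still_failed = await run_batch(failed, run_case, retry_concurrency)
    results.update(retried)
    return results, still_failed
```

Cases that fail again in the retry pass are returned separately rather than retried indefinitely, so they can be reported in the final summary.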

🤖 This PR was created with Mentat. See my steps and cost here ✨


- Restarted benchmark after system error, running at concurrency 20
- Saving initial results to prevent data loss
- Will rerun API-error cases at lower concurrency after completion

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/3f57f91c-f099-4cac-a5c4-660b7c86366c

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Continuing benchmark run at concurrency 20
- Saving additional results to prevent data loss
- Will rerun API-error cases at lower concurrency after completion

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/413e26e5-a573-4ee8-9bef-ee974888da8b

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

mentatbot bot and others added 2 commits September 9, 2025 15:59

mentatbot bot requested a review from biobootloader September 9, 2025 16:00
