    Repositories list

    • swt-bench

      Public
      [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
      Python
      146780 Updated Dec 17, 2025
    • insights

      Public
      We track and analyze the activity and performance of autonomous code agents in the wild
      TypeScript
      44700 Updated Dec 5, 2025
    • baxbench

      Public
      Python
      178320 Updated Oct 22, 2025
    • Heavily compressed Docker images for SWE-bench Verified
      Go
      1400 Updated Oct 1, 2025
    • SWEBench

      Public
      SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues?
      Python
      728000 Updated Jul 30, 2025
    • tests

      Public
      0000 Updated May 5, 2025