Conversation

@burtenshaw (Collaborator)

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml (a sketch of this entry and of the md front matter follows this list).
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check that you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This keeps the GitHub base repo lean when cloning and pulling. Keep images small to avoid a slow or expensive experience for readers.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content into https://huggingface.co/new-blog. Do not click publish; this is just an early check.
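
For reference, here is a minimal sketch of what the `_blog.yml` entry and the markdown front matter might look like. The path, title, date, and tags below are illustrative assumptions, not values from this PR; the keys mirror typical existing entries in the blog repo.

```yaml
# Hypothetical _blog.yml entry -- all values are illustrative.
- local: community-evals                # short blog path (assumption)
  title: "Community Evals on the Hub"   # short title (assumption)
  thumbnail: /blog/assets/community-evals/thumbnail.png
  date: Feb 5, 2026
  tags:
    - evaluation
    - community
```

```yaml
---
# Hypothetical front matter at the top of the article's md file.
title: "Community Evals on the Hub"
thumbnail: /blog/assets/community-evals/thumbnail.png
authors:
  - user: burtenshaw
  - user: NathanHB   # add `guest: true` or `org: <org>` where relevant
---
```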

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and all the steps above are completed, you should be able to merge.
No additional reviews are needed if you and your co-authors are happy and all of the above is met.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews (e.g., checking for proper metadata) rather than content reviews unless explicitly asked.

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>

Community evals do not replace benchmarks and leaderboards, and closed evals with published leaderboards are still crucial. However, we want to complement them with open eval results based on reproducible eval specs.

This won't solve benchmark saturation or close the benchmark-reality gap. Nor will it stop training on test sets. But it makes the game visible by exposing what is evaluated, how, when, and by whom.
Member

❤️

Comment on lines 30 to 32
We are going to take evaluations on the Hugging Face Hub in a new direction by decentralizing reporting and allowing the entire community to openly report benchmark scores. We will start with a shortlist of 4 benchmarks and expand over time to the most relevant ones.

**For Benchmarks:** Dataset repos can now register as benchmarks ([MMLU-Pro](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro), [GPQA](https://huggingface.co/datasets/Idavidrein/gpqa), [HLE](https://huggingface.co/datasets/cais/hle) are already live). They automatically aggregate reported results from across the hub and display leaderboards in the dataset card. The benchmark defines the eval spec via `eval.yaml`, based on [Inspect AI](https://inspect.aisi.org.uk/), so anyone can reproduce it. The reported results need to align with the task definition.
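
The post doesn't show the `eval.yaml` schema itself, so the following is a purely hypothetical sketch of what a benchmark repo's spec could contain. Every key below is an assumption; only `multiple_choice` and `choice` are real Inspect AI built-ins (a solver and a scorer, respectively).

```yaml
# Hypothetical eval.yaml for a benchmark dataset repo.
# The actual schema is defined by the Hub; these keys are illustrative only.
task: mmlu_pro                  # name of the Inspect AI task (assumption)
dataset: TIGER-Lab/MMLU-Pro     # dataset repo the spec lives in
split: test
solver: multiple_choice         # built-in Inspect AI solver
scorer: choice                  # built-in Inspect AI scorer
metrics:
  - accuracy
```

Pinning the spec in the dataset repo is what makes reported results comparable: anyone can rerun exactly the same task definition and check their score against the aggregated leaderboard.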
Contributor

(nit) there's a mention of 4 benchmarks, but only 3 get linked

@davanstrien davanstrien left a comment (Member)

small suggestion

burtenshaw and others added 5 commits February 4, 2026 14:58
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>
@julien-c julien-c left a comment (Member)

looks like it's in good shape! 🚢

@burtenshaw burtenshaw merged commit f890f8a into main Feb 5, 2026
1 check passed
@burtenshaw burtenshaw deleted the eval-results branch February 5, 2026 15:03