Conversation

@shreyashkar-ml (Collaborator) commented Aug 10, 2025

This commit adds support for the lm-eval library and moves quantization inference and model context-window extraction to the common utils module.

  1. Added adapter for transforming lm-eval outputs to the unified schema format.
  2. Added converter for running lm-eval and dumping outputs to the unified schema format.
  3. Added test for the adapter and converter, with test config for the lm-eval library in config/lm_eval_test_config.yaml.
  4. Added _infer_quantization and _extract_context_window_from_config functions to the common utils module.

Complete pipeline: YAML → LMEvalRunner → lm-eval → LMEvalAdapter → Unified Schema
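For orientation, a rough sketch of how those stages could be wired together. The import paths and the run/transform method names are assumptions for illustration; the actual interfaces are whatever this PR's runner and adapter modules define.

import yaml

from eval_lmeval.runner import LMEvalRunner     # import paths are assumptions
from eval_lmeval.adapter import LMEvalAdapter

# Load the YAML config that drives the run (test config path from this PR).
with open("config/lm_eval_test_config.yaml") as f:
    config = yaml.safe_load(f)

runner = LMEvalRunner(config)              # wraps the lm-eval invocation
output_dir = runner.run()                  # hypothetical: lm-eval writes its raw logs here

adapter = LMEvalAdapter()                  # reads the lm-eval outputs from disk
records = adapter.transform(output_dir)    # hypothetical: emits unified-schema records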

@damian1996 (Collaborator) commented Aug 14, 2025

The naming changes (for example, helm to eval_helm) are unnecessary because Andrew's PR will clean this up.

@damian1996 (Collaborator) left a comment

Comments for now; I will probably add more on the adapter file, around extracting scores and samples.


# Helpers

def _infer_quantization(model_name_or_path: str) -> tuple[BitPrecision, Method]:
Collaborator

What do you think about moving this function to common/utils.py? There are usually issues with extracting quantization info directly from evaluation logs, so this could be useful as a shared way to get that info.

Collaborator Author

Yes, this is used in more than a few places; I will probably move this and the get_context_size part to utils.py.
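For reference, a minimal, self-contained sketch of what the shared helper could look like after the move. The BitPrecision and Method stand-ins and their members below are assumptions; the real enums live in eval_types.py and may differ.

import re
from enum import Enum


class BitPrecision(Enum):  # stand-in for the real enum in eval_types.py
    None_ = "none"
    INT4 = "int4"
    INT8 = "int8"


class Method(Enum):  # stand-in for the real enum in eval_types.py
    None_ = "none"
    GPTQ = "gptq"
    AWQ = "awq"
    BNB = "bitsandbytes"


def _infer_quantization(model_name_or_path: str) -> tuple[BitPrecision, Method]:
    """Best-effort guess of quantization settings from a model name or path."""
    name = model_name_or_path.lower()

    # Bit precision from common naming patterns such as "4bit", "4-bit", or "int8".
    precision = BitPrecision.None_
    match = re.search(r"int(4|8)|(4|8)[-_]?bit", name)
    if match:
        bits = match.group(1) or match.group(2)
        precision = BitPrecision.INT4 if bits == "4" else BitPrecision.INT8

    # Quantization method from well-known markers in the model name.
    method_map = {"gptq": Method.GPTQ, "awq": Method.AWQ, "bnb": Method.BNB}
    method = next((m for key, m in method_map.items() if key in name), Method.None_)

    return precision, method


# e.g. _infer_quantization("llama-2-7b-awq-int4") -> (BitPrecision.INT4, Method.AWQ)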

pyproject.toml Outdated

[tool.setuptools.packages.find]
include = ["helm*", "schema*", "common*", "config*"]
include = ["eval_helm*", "eval_lmeval*", "schema*", "common*", "config*"]
Collaborator

remove eval_ prefixes

}

method = method_map.get(method_key, Method.None_)
return precision, method
Collaborator

What about adding the quant type (gptq, awq, ...) to the output of this function and handling it in our schema as well?

Collaborator Author

Can do; would you recommend editing the Method class in eval_types.py, or adding a new class that maps from the quant type to the existing Method class?
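As a hypothetical sketch of the second option (all names below are illustrative stand-ins for whatever eval_types.py actually defines): keep the existing Method enum untouched and add a small QuantType enum plus one explicit mapping onto it.

from enum import Enum


class Method(Enum):        # stand-in for the existing class in eval_types.py
    None_ = "none"
    GPTQ = "gptq"
    AWQ = "awq"


class QuantType(Enum):     # new enum for the raw quant type detected from the model name
    NONE = "none"
    GPTQ = "gptq"
    AWQ = "awq"


# One place that states how each quant type maps onto the existing Method values.
QUANT_TYPE_TO_METHOD = {
    QuantType.NONE: Method.None_,
    QuantType.GPTQ: Method.GPTQ,
    QuantType.AWQ: Method.AWQ,
}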


class LMEvalAdapter(BaseEvaluationAdapter):

    CONFIG_FILE = "config.yaml"
Collaborator

Would you always have access to these four files in lm-eval logs?

Collaborator Author

The file names may change; I will add some checks to ensure naming consistency is maintained. But lm-eval works with a YAML config file (unless everything is passed via command-line arguments), so yes.
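A small sketch of the kind of naming-consistency check mentioned above; the non-config file names are placeholders rather than the adapter's actual constants.

from pathlib import Path

# Placeholder names; the adapter defines its own constants (CONFIG_FILE, RESULTS_FILE, ...).
EXPECTED_FILES = ("config.yaml", "results.json", "samples.jsonl")


def check_expected_files(dir_path: Path) -> None:
    """Fail early if an lm-eval output directory is missing any expected file."""
    missing = [name for name in EXPECTED_FILES if not (dir_path / name).exists()]
    if missing:
        raise FileNotFoundError(f"{dir_path} is missing expected lm-eval files: {missing}")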

if not dir_path.is_dir():
    raise FileNotFoundError(f"Directory {dir_path} does not exist")

cfg_path = dir_path / self.CONFIG_FILE
Collaborator

Have you tested it? I received TypeError: unsupported operand type(s) for /: 'str' and 'str'
Maybe use:
cfg_path = os.path.join(dir_path, self.CONFIG_FILE)
or
cfg_path = f'{dir_path}/{self.CONFIG_FILE}'

Collaborator Author

Agreed, I will change this to ensure robustness; it was working fine on my system, so I didn't think much of it at the time.
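For the record, a minimal sketch of that fix: coerce the argument to a pathlib.Path before any `/` joins, so both str and Path inputs work (the TypeError comes from str / str, not from pathlib itself). The function name here is illustrative.

from pathlib import Path
from typing import Union


def resolve_config_path(dir_path: Union[str, Path], config_file: str = "config.yaml") -> Path:
    dir_path = Path(dir_path)  # str or Path -> Path, so the `/` join below always works
    if not dir_path.is_dir():
        raise FileNotFoundError(f"Directory {dir_path} does not exist")
    return dir_path / config_file  # Path / str is supported by pathlib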


# Load task-level metrics
task_scores: Dict[str, Dict[str, float]] = {}
results_path = dir_path / self.RESULTS_FILE
Collaborator

Same check as above.

@@ -0,0 +1,66 @@
from __future__ import annotations
Collaborator

What is the point of this file? We probably don't need it because we won't run any experiments; we only convert users' eval logs to our unified schema.

Collaborator Author

will remove

# generative tasks often expose `exact_match` / `bleu` - handled ad-hoc
}

def detect_prompt_class(task_name: str) -> PromptClass:
Collaborator

Did you consider filling this in later, in the adapter file? Even if we don't have a field like HELM's question type, we can still confirm whether a task is multiple_choice when reading the responses for each question from the eval log.
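A hedged sketch of that idea: classify the prompt type from each per-sample record while the adapter reads it, rather than from the task name alone. The sample keys used here ("doc", "choices") are assumptions about lm-eval's per-sample output, and the real adapter would return PromptClass members instead of strings.

def detect_prompt_class_from_sample(sample: dict) -> str:
    """Classify one eval-log sample as multiple_choice or generation."""
    doc = sample.get("doc", {})
    # Multiple-choice samples typically carry an explicit list of answer options.
    choices = doc.get("choices")
    if isinstance(choices, list) and len(choices) > 1:
        return "multiple_choice"
    return "generation"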

@@ -0,0 +1,340 @@
import pytest
Collaborator

You can remove this file, because we don't run anything here.

@@ -0,0 +1,10 @@
model: hf
Collaborator

Where do you use it?

@shreyashkar-ml (Collaborator Author)

Thanks @damian1996, I will wait until Andrew's PR for the file cleanup and renaming is merged, then commit as per your recommendations to avoid merge conflicts.

…ntization and model context window extraction to the common utils module.

1. Added adapter for transforming lm-eval outputs to the unified schema format.
2. Added converter for running lm-eval and dumping outputs to the unified schema format.
3. Added test for the adapter and converter, with test config for the lm-eval library in config/lm_eval_test_config.yaml.
4. Added _infer_quantization and _extract_context_window_from_config functions to the common utils module.
@shreyashkar-ml force-pushed the shreyashkar/lm-eval-implementation branch from ca040da to b3dd337 on September 6, 2025 08:56
@shreyashkar-ml (Collaborator Author)

@damian1996, I have updated the lm-eval integration accordingly; kindly check.
