Add LM-Evaluation-Harness integration with unified adapter pattern #7
base: main
Conversation
The changes in naming (for example …
damian1996 left a comment
Comments for now; I will probably add more for the adapter file, about extracting scores and samples.
eval_lmeval/adapter.py
Outdated
# Helpers
def _infer_quantization(model_name_or_path: str) -> tuple[BitPrecision, Method]:
What do you think about moving this function to common/utils.py? There are usually issues with extracting quantization info directly from evaluation logs, so this can be a useful way to extract that info.
Yes, this is used on more than a few occasions; I will probably move this and the get_context_size part to utils.py.
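For reference, a minimal sketch of what the name-based inference could look like once it lives in common/utils.py. The import path and the BitPrecision/Method member names (other than Method.None_, which appears in the diff) are assumptions for illustration, not the PR's actual definitions.

from common.eval_types import BitPrecision, Method  # hypothetical import path

def infer_quantization(model_name_or_path: str) -> tuple[BitPrecision, Method]:
    name = model_name_or_path.lower()
    # Guess the quantization method from common markers in HF model names.
    if "gptq" in name:
        method = Method.GPTQ           # assumed enum member
    elif "awq" in name:
        method = Method.AWQ            # assumed enum member
    else:
        method = Method.None_
    # Guess the bit precision from common suffixes such as "4bit" or "int8".
    if any(tag in name for tag in ("4bit", "int4", "w4")):
        precision = BitPrecision.INT4  # assumed enum member
    elif any(tag in name for tag in ("8bit", "int8", "w8")):
        precision = BitPrecision.INT8  # assumed enum member
    else:
        precision = BitPrecision.FP16  # assumed default
    return precision, method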
pyproject.toml
Outdated
[tool.setuptools.packages.find]
- include = ["helm*", "schema*", "common*", "config*"]
+ include = ["eval_helm*", "eval_lmeval*", "schema*", "common*", "config*"]
remove eval_ prefixes
eval_lmeval/adapter.py
Outdated
}

method = method_map.get(method_key, Method.None_)
return precision, method
What about adding the quant type (gptq, awq, ...) to the output of this function and handling it in our schema as well?
Can do this. Would you recommend editing the Method class in eval_types.py, or adding a new class that maps quant type to the existing Method class?
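To make the second option concrete, a rough sketch of a standalone mapping onto the existing Method class; apart from Method.None_, the enum member names and import path are assumptions.

from common.eval_types import Method  # hypothetical import path

# Assumed mapping from quant-type strings (as found in model names/configs)
# to the existing Method enum; member names here are illustrative.
QUANT_TYPE_TO_METHOD: dict[str, Method] = {
    "gptq": Method.GPTQ,
    "awq": Method.AWQ,
}

def method_from_quant_type(quant_type: str) -> Method:
    # Unknown or empty quant types fall back to Method.None_.
    return QUANT_TYPE_TO_METHOD.get(quant_type.lower(), Method.None_)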
class LMEvalAdapter(BaseEvaluationAdapter):

    CONFIG_FILE = "config.yaml"
Would you always have access to these four files in lm-eval logs?
The file name may change, so I will add some checks to ensure naming consistency is maintained. But lm_eval works with a YAML file (unless everything is passed as command-line arguments), so yes.
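A sketch of the kind of naming check this could be; the expected file names below are placeholders (only CONFIG_FILE = "config.yaml" appears in the diff above), and the helper name is just for illustration.

from __future__ import annotations
from pathlib import Path

EXPECTED_FILES = ("config.yaml", "results.json")  # placeholder names

def check_expected_files(dir_path: str | Path) -> None:
    # Fail fast with an explicit error when the lm-eval output layout differs.
    dir_path = Path(dir_path)
    missing = [name for name in EXPECTED_FILES if not (dir_path / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Expected lm-eval output files missing in {dir_path}: {missing}"
        )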
eval_lmeval/adapter.py
Outdated
if not dir_path.is_dir():
    raise FileNotFoundError(f"Directory {dir_path} does not exist")

cfg_path = dir_path / self.CONFIG_FILE
Have you tested it? I received TypeError: unsupported operand type(s) for /: 'str' and 'str'
Maybe use:
cfg_path = os.path.join(dir_path, self.CONFIG_FILE)
or
cfg_path = f'{dir_path}/{self.CONFIG_FILE}'
Agreed, will change to ensure robustness; it was working fine on my system, so I didn't think much of it at the time.
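For context, the TypeError appears when dir_path arrives as a plain str; besides os.path.join, normalizing it to a pathlib.Path up front keeps the existing `/` usage working. A minimal sketch of that alternative (the helper name is just for illustration):

from __future__ import annotations
from pathlib import Path

def resolve_config_path(dir_path: str | Path, config_file: str = "config.yaml") -> Path:
    dir_path = Path(dir_path)          # accept both str and Path callers
    if not dir_path.is_dir():
        raise FileNotFoundError(f"Directory {dir_path} does not exist")
    return dir_path / config_file      # Path / str is well-defined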
eval_lmeval/adapter.py
Outdated
# Load task-level metrics
task_scores: Dict[str, Dict[str, float]] = {}
results_path = dir_path / self.RESULTS_FILE
Same path-joining check as above.
@@ -0,0 +1,66 @@
from __future__ import annotations
What is the point of this file? We probably don't need it, because we won't run any experiments; we only convert users' eval logs to our unified schema.
will remove
    # generative tasks often expose `exact_match` / `bleu` - handled ad-hoc
}

def detect_prompt_class(task_name: str) -> PromptClass:
Did you consider filling this in later, in the adapter file? Even if we don't have a field like HELM's (about question type), we can still confirm whether a task is multiple_choice when reading the responses for its questions from the eval log.
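A rough sketch of the response-based check being suggested; the per-sample field layout (an "arguments" list holding one scored option per entry) and the PromptClass member names are assumptions about the log format, not verified fields.

from common.eval_types import PromptClass  # hypothetical import path

def infer_prompt_class(samples: list[dict]) -> PromptClass:
    for sample in samples:
        # Multiple-choice tasks score one (context, continuation) pair per option,
        # so a sample carrying several scored options signals multiple_choice.
        options = sample.get("arguments") or []
        if len(options) > 1:
            return PromptClass.MultipleChoice  # assumed member name
    return PromptClass.OpenEnded               # assumed member name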
@@ -0,0 +1,340 @@
import pytest
You can remove this file, because we don't run anything here.
@@ -0,0 +1,10 @@
model: hf
Where do you use it?
Thanks @damian1996, I will wait until Andrew's PR for file cleanup and name changes is merged, then commit as per your recommendations to avoid merge conflicts.
…ntization and model context window extraction to the common utils module.
1. Added adapter for transforming lm-eval outputs to the unified schema format.
2. Added converter for running lm-eval and dumping outputs to the unified schema format.
3. Added test for the adapter and converter, with test config for the lm-eval library in config/lm_eval_test_config.yaml.
4. Added _infer_quantization and _extract_context_window_from_config functions to the common utils module.
ca040da to b3dd337
@damian1996, I have updated the lm-eval integration accordingly; kindly check.
This commit adds support for the lm-eval library and moves the quantization inference and model context window extraction helpers to the common utils module.
Complete pipeline: YAML → LMEvalRunner → lm-eval → LMEvalAdapter → Unified Schema
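A rough sketch of how that pipeline could be wired end to end; the runner/adapter class and method names and their signatures below are assumptions for illustration (and the runner may be dropped per the review above), not the PR's exact API.

from pathlib import Path

from eval_lmeval.adapter import LMEvalAdapter   # added in this PR
from eval_lmeval.runner import LMEvalRunner     # hypothetical module/class name

def run_pipeline(config_path: str, output_dir: str) -> dict:
    # 1. The YAML config drives the lm-eval run; raw outputs land in output_dir.
    LMEvalRunner(config_path).run(output_dir)
    # 2. The adapter reads the raw lm-eval outputs and emits the unified schema.
    return LMEvalAdapter().transform(Path(output_dir))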