Add distinct max_len and max_tokens parameters #7
Overview
As discussed with @geoalgo today, this PR adds CLI arguments for controlling input truncation and model generation limits:

- `--max_len` (default 8192): maximum character length at which input text (instructions, completions) is truncated before being sent to models, preventing context-limit overruns. This was previously hard-set to 200, which effectively led to judges noticing cut-off completions and basing their decisions on that.
- `--max_tokens` (default 32768): maximum number of tokens all models (A, B, and the judge) can generate in their responses (previously hard-coded to 32k).

It also fixes a minor bug: `--results_folder` was parsed but never passed to the `CliArgs` dataclass.

The first two parameters were previously hard-coded with inconsistent values (200 and 4096) and conflated with each other. This PR separates them into distinct concepts.
Changes
- `generate_and_evaluate.py`: added `--max_len` and `--max_tokens` CLI arguments; fixed `--result_folder` not being passed
- `generate.py`: separated `max_len` (truncation) from `max_tokens` (generation) in `generate_instructions()` and `generate_base()`
- `evaluate.py`: updated the default `max_len` to 8192

Usage
```
python -m openjury.generate_and_evaluate \
    --dataset alpaca-eval \
    --model_A ... \
    --model_B ... \
    --judge_model ... \
    --max_len 16384 \
    --max_tokens 8192
```
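For reference, the separation of the two limits can be sketched roughly as below. `truncate_input` and `build_parser` are illustrative names, not the actual openjury code; only the flag names and defaults come from this PR.

```python
import argparse


def truncate_input(text: str, max_len: int) -> str:
    """Clamp *input* text to max_len characters before sending it to a model."""
    return text[:max_len]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Character budget for input truncation (instructions, completions).
    parser.add_argument("--max_len", type=int, default=8192)
    # Token budget for model *output* generation (models A, B, and the judge).
    parser.add_argument("--max_tokens", type=int, default=32768)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--max_len", "16384", "--max_tokens", "8192"])
    prompt = truncate_input("x" * 20_000, args.max_len)  # kept to 16384 chars
    print(len(prompt), args.max_tokens)
```

The key point is that the two budgets are measured in different units (characters in vs. tokens out), so conflating them into one value cannot be correct for both uses.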