Conversation

@ferreirafabio (Contributor)

Overview

As discussed with @geoalgo today, this PR adds CLI arguments for controlling input truncation and model generation limits:

  • --max_len (default 8192): Maximum character length for truncating input text (instructions, completions) before sending to models, preventing context-limit overruns. This was previously hard-set to 200, which led judges to notice cut-off completions and base their decisions on the truncation.

  • --max_tokens (default 32768): Maximum number of tokens all models (A, B, and judge) can generate in their responses (previously hard-coded to 32k; now configurable).

  • Fixed a minor bug: --results_folder was parsed but never passed to the CliArgs dataclass.

The first two parameters were previously hard-coded with inconsistent values (200 and 4096) and conflated with each other. This PR separates them into distinct concepts.
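To illustrate the distinction, here is a minimal sketch (helper names are hypothetical, not the actual openjury code): max_len trims input at the character level, while max_tokens is a budget forwarded to the generation call.

```python
# Illustrative sketch only -- helper names are hypothetical, not the
# actual openjury implementation. max_len limits *input characters*;
# max_tokens caps *output tokens*.

def truncate_input(text: str, max_len: int = 8192) -> str:
    """Character-level truncation applied to input text before prompting."""
    return text[:max_len]

def generation_kwargs(max_tokens: int = 32768) -> dict:
    """Token budget forwarded to the model's generation call."""
    return {"max_tokens": max_tokens}

instruction = "x" * 10_000
prompt = truncate_input(instruction)   # at most 8192 characters reach the model
kwargs = generation_kwargs()           # output capped at 32768 tokens
```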

Changes

  • generate_and_evaluate.py: Added --max_len and --max_tokens CLI arguments, fixed --result_folder not being passed
  • generate.py: Separated max_len (truncation) from max_tokens (generation) in generate_instructions() and generate_base()
  • evaluate.py: Updated default max_len to 8192
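A minimal sketch of how such arguments might be wired up with argparse (illustrative only; the actual generate_and_evaluate.py may differ):

```python
import argparse

# Hypothetical reconstruction of the two new CLI arguments.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--max_len",
    type=int,
    default=8192,
    help="Maximum character length for truncating input text before sending to models.",
)
parser.add_argument(
    "--max_tokens",
    type=int,
    default=32768,
    help="Maximum number of tokens models A, B, and the judge may generate.",
)

args = parser.parse_args(["--max_len", "16384"])  # --max_tokens falls back to its default
```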

Usage

```
python -m openjury.generate_and_evaluate \
  --dataset alpaca-eval \
  --model_A ... \
  --model_B ... \
  --judge_model ... \
  --max_len 16384 \
  --max_tokens 8192
```

  • Add --max_len (default 8192) for truncating input text (instructions, completions)
  • Add --max_tokens (default 32768) for limiting model generation output
  • Separate these concepts which were previously conflated
  • Update defaults consistently across generate.py and evaluate.py
  • Fix bug: --result_folder CLI arg was parsed but not passed to CliArgs
@ferreirafabio ferreirafabio changed the title Add --max_len and --max_tokens CLI arguments Add distinct max_len and max_tokens parameters Jan 7, 2026
@geoalgo (Collaborator) left a comment:
Awesome, thanks for catching and fixing this. I have only one comment about the naming.

```python
        " `[result_folder]/[evaluation_name]`.",
    )
    parser.add_argument(
        "--max_len",
```
@geoalgo (Collaborator):

max_len and max_tokens do not convey what the parameters do. Could you replace them with better names?

Perhaps max_token_completion and max_token_judge would be better?

@ferreirafabio (Contributor, Author) replied:

Thanks for the feedback @geoalgo. I've been thinking about this more and realized that max_token_completion and max_token_judge don't fully capture what's happening, since truncation occurs at the character level, not based on tokens.

Here's what I would suggest:
--max_out_tokens_models: max tokens models A/B can generate
--max_out_tokens_judge: max tokens the judge can generate
--truncate_all_input_chars: max chars to truncate all input text (instructions before A/B, completions before judge)

I considered splitting the last one into separate params (--max_in_chars_models for instructions and --max_in_chars_judge for completions), but I couldn't think of a practical use case where you'd want different truncation limits for each. I'd say the common scenarios are "both short" (to save costs) or "both long" (for a thorough eval).

Let me know if this naming works for you, or if you'd prefer something different.

- Rename --max_len → --truncate_all_input_chars
  (truncates instructions before A/B, completions before judge)
- Split --max_tokens into:
  - --max_out_tokens_models (output limit for models A/B)
  - --max_out_tokens_judge (output limit for judge)
- Fix bug in generate_base where max_len was used instead of max_tokens
- Update function signatures in generate.py and evaluate.py
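The generate_base bug in the list above belongs to a common class of mix-up; a hypothetical illustration (not the real generate_base code) of passing the character-truncation limit where the token budget belongs:

```python
# Hypothetical illustration of the bug class fixed here (not the actual
# generate_base code): the character-truncation limit was passed where
# the generation token budget belonged.

def generate(prompt: str, *, max_tokens: int) -> int:
    """Stand-in for a model call; returns the token budget it received."""
    return max_tokens

max_len = 8192      # limit on *input characters*
max_tokens = 32768  # limit on *output tokens*

buggy_budget = generate("hi", max_tokens=max_len)     # wrong limit applied
fixed_budget = generate("hi", max_tokens=max_tokens)  # correct token budget
```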
@geoalgo geoalgo merged commit 593f0f2 into OpenEuroLLM:main Jan 8, 2026
1 check failed
@ferreirafabio ferreirafabio deleted the feature/configurable-max-len branch January 12, 2026 14:40