Mitigate judge's order bias with swap mode #4

Ali-Elganzory · 2025-11-17T07:52:29Z

No description provided.

geoalgo · 2025-11-17T09:46:24Z

openjury/generate_and_evaluate.py

            judge_model=args.judge_model,
            n_instructions=args.n_instructions,
            provide_explanation=args.provide_explanation,
+            swap_mode=args.swap_mode,


At this point args.swap_mode is a string, but we map it to a boolean. Should we test args.swap_mode == "fixed" instead?

Yes, you are right. This got lost in the change from correct_order_bias (bool) to swap_mode (str).
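A minimal sketch of why the comparison matters, under stated assumptions: the helper name is_fixed_mode and the placeholder value "other" are hypothetical and not taken from the PR. Only the value "fixed" appears in the thread, and the thread does not say what each mode does. The point is only that any non-empty string is truthy, so a string passed where a boolean is expected behaves as True regardless of its value.

```python
def is_fixed_mode(swap_mode: str) -> bool:
    """Hypothetical helper: compare the mode string explicitly instead of relying on truthiness."""
    return swap_mode == "fixed"


# Any non-empty string is truthy, so implicit conversion cannot distinguish modes.
implicit_fixed = bool("fixed")
implicit_other = bool("other")  # "other" is a placeholder value, not a real mode name

# The explicit comparison distinguishes them.
explicit_fixed = is_fixed_mode("fixed")
explicit_other = is_fixed_mode("other")

print(implicit_fixed, implicit_other)
print(explicit_fixed, explicit_other)
```

With the implicit conversion both values collapse to the same boolean, which is presumably how the bug survived the rename from a bool flag to a string option.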

geoalgo · 2025-11-17T09:47:41Z

openjury/generate_and_evaluate.py

+                for annotation in annotations_reversed
+            ]
+        )
+        prefs = (prefs + (1 - prefs_reversed)) / 2.0


Does this work in the case of NaNs? Returning 2n battles seems like it would be cleaner.

Yes, it does:

>>> pd.Series([1, 2, 3, None]) + pd.Series([1, 2, None, 4])
0    2.0
1    4.0
2    NaN
3    NaN
dtype: float64

Nonetheless, I think your suggestion is, generally, cleaner. How about the following?

prefs = pd.concat([prefs, (1 - prefs_reversed)]).reset_index(drop=True)

That sounds good to me.
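To make the difference between the two options concrete, here is a small sketch with made-up data. It assumes preferences are values in [0, 1] and that NaN marks a judgment that could not be parsed. The sample values are illustrative, not results from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical preferences for 4 battles, in the original and the reversed order.
prefs = pd.Series([1.0, 0.0, 0.5, np.nan])
prefs_reversed = pd.Series([0.0, 0.0, np.nan, 1.0])

# Option 1: average the two orderings per battle (n results).
# NaN propagates: a battle is lost if either ordering is missing.
averaged = (prefs + (1 - prefs_reversed)) / 2.0

# Option 2: keep both orderings as separate battles (2n results).
# A missing judgment only affects its own row.
concatenated = pd.concat([prefs, 1 - prefs_reversed]).reset_index(drop=True)

print(len(averaged), int(averaged.notna().sum()))
print(len(concatenated), int(concatenated.notna().sum()))
```

In this toy case, averaging leaves 2 usable values out of 4, while concatenation leaves 6 out of 8, because a valid judgment is no longer discarded when its counterpart in the other ordering is NaN.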

Ali-Elganzory · 2025-11-17T16:30:14Z

I have created #6

Mitigate judge's order bias with swap mode

19d45f6

geoalgo merged commit 609eaf3 into OpenEuroLLM:main Nov 17, 2025
1 check failed

geoalgo reviewed Nov 17, 2025

Ali-Elganzory mentioned this pull request Nov 17, 2025

Order bias followup #6

Merged
