@simonmeoni

No description provided.

@simonmeoni simonmeoni changed the base branch from main to ht+sm/conflict-maybe-lrec October 6, 2025 08:26
@simonmeoni simonmeoni requested a review from honghanhh October 6, 2025 08:26

@honghanhh honghanhh left a comment

The logic LGTM 🚀 Just a few comments to remove redundancies & optimize it.


def create_output_filename() -> str:
    """Create output filename with git SHA and date."""
    git_sha = get_git_sha()

Both create_output_filename and create_metadata call get_git_sha() at the beginning. It would be better to call it once in main(), right before calling these two functions, and pass the result in to avoid the redundant call.
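A minimal sketch of that suggestion, assuming both functions can take the SHA as a parameter. The bodies of get_git_sha, create_output_filename and create_metadata are not shown in the diff, so the implementations, the `today` parameter and the filename pattern below are hypothetical stand-ins:

```python
import subprocess
from datetime import date


def get_git_sha() -> str:
    """Return the short SHA of HEAD, or "unknown" outside a git checkout.

    Hypothetical stand-in: the real helper is not shown in the diff.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def create_output_filename(git_sha: str, today: date) -> str:
    """Create output filename with git SHA and date (assumed pattern)."""
    return f"splits_{git_sha}_{today.isoformat()}.json"


def create_metadata(git_sha: str, today: date) -> dict:
    """Build metadata from the same SHA and date (assumed fields)."""
    return {"git_sha": git_sha, "date": today.isoformat()}


def main() -> None:
    # Resolve the SHA once, then share it with both helpers.
    git_sha = get_git_sha()
    today = date.today()
    filename = create_output_filename(git_sha, today)
    metadata = create_metadata(git_sha, today)
    print(filename, metadata)


if __name__ == "__main__":
    main()
```

Passing the SHA in also makes both helpers easier to unit test, since they no longer shell out to git themselves.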

Comment on lines +69 to +90
# Split 1: Best propositions
for item in data:
    if len(best_split) >= best_limit:
        break
    pair_id = get_pair_id(item)
    is_best = item.get("data", {}).get("best_conflict", False)

    if is_best and pair_id not in used_pair_ids:
        best_split.append(item)
        used_pair_ids.add(pair_id)

# Split 2: Low-scored propositions
for item in data:
    if len(low_score_split) >= low_score_limit:
        break
    pair_id = get_pair_id(item)
    avg_score = calculate_average_score(item)

    if avg_score < 4.0 and pair_id not in used_pair_ids:
        low_score_split.append(item)
        used_pair_ids.add(pair_id)

The function iterates through the entire dataset twice. I think this can be done in a single pass.
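One possible single-pass version, as a sketch only. get_pair_id and calculate_average_score are not shown in the diff, so the stand-ins below (and the "pair_id" and "scores" fields they read) are assumptions, as are the names split_single_pass and LOW_SCORE_THRESHOLD. Note one behavioral difference: in the two-pass version every best item claims its pair_id before any low-score item is considered, while here an earlier low-score item can claim a pair_id that a later best item shares, so the results can differ when pair ids repeat.

```python
from typing import Any

LOW_SCORE_THRESHOLD = 4.0  # cutoff taken from the diff (avg_score < 4.0)


def get_pair_id(item: dict[str, Any]) -> str:
    # Hypothetical stand-in: the real helper is not shown in the diff.
    return item["data"]["pair_id"]


def calculate_average_score(item: dict[str, Any]) -> float:
    # Hypothetical stand-in: averages an assumed "scores" list.
    scores = item["data"]["scores"]
    return sum(scores) / len(scores)


def split_single_pass(
    data: list[dict[str, Any]], best_limit: int, low_score_limit: int
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Fill both splits in one loop over data."""
    best_split: list[dict[str, Any]] = []
    low_score_split: list[dict[str, Any]] = []
    used_pair_ids: set[str] = set()

    for item in data:
        best_full = len(best_split) >= best_limit
        low_full = len(low_score_split) >= low_score_limit
        if best_full and low_full:
            break  # nothing left to fill

        pair_id = get_pair_id(item)
        if pair_id in used_pair_ids:
            continue

        if not best_full and item.get("data", {}).get("best_conflict", False):
            best_split.append(item)
        elif not low_full and calculate_average_score(item) < LOW_SCORE_THRESHOLD:
            low_score_split.append(item)
        else:
            continue  # item fits neither split; leave its pair_id free
        used_pair_ids.add(pair_id)

    return best_split, low_score_split
```

The average score is only computed when an item is not placed in the best split, which saves some work on top of the single traversal.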

pair_id = get_pair_id(item)
avg_score = calculate_average_score(item)

if avg_score < 4.0 and pair_id not in used_pair_ids:

This threshold (4.0) will be used in both the moderator agents and the post-processing steps, so it may be worth defining it once in constants.py or utils.py to keep it consistent.
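For illustration, such a shared module could look like the sketch below. The names LOW_SCORE_THRESHOLD and is_low_score are hypothetical; only the value 4.0 and the strict `<` comparison come from the diff.

```python
# constants.py (hypothetical module, following the suggestion above)

# Cutoff below which a proposition counts as low-scored (from the diff).
LOW_SCORE_THRESHOLD: float = 4.0


def is_low_score(avg_score: float) -> bool:
    """Return True when an average score falls strictly below the cutoff."""
    return avg_score < LOW_SCORE_THRESHOLD
```

The moderator agents and the post-processing script could then both import the constant (or the helper), so a later change to the cutoff happens in one place.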

3 participants