add postprocessing script to select only 50% of best examples and 50% of random examples #89

simonmeoni · 2025-10-06T08:26:13Z

No description provided.

honghanhh

The logic LGTM 🚀 Just a few comments to remove redundancies & optimize it.

honghanhh · 2025-10-06T10:04:01Z

lib/conflicts/scripts/postprocess.py

+
+def create_output_filename() -> str:
+    """Create output filename with git SHA and date."""
+    git_sha = get_git_sha()


Both create_output_filename and create_metadata call get_git_sha() at the beginning. It would be better to call it once in main(), right before calling these two functions, and pass the result in to avoid the redundant call.
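A minimal sketch of this suggestion, assuming both helpers can take the SHA as an argument. The `get_git_sha` body, the `today` parameter, the filename pattern, and the metadata keys are all hypothetical stand-ins, since the PR's actual implementations are not shown in this hunk.

```python
import subprocess
from datetime import date


def get_git_sha() -> str:
    """Stand-in for the PR's helper: short SHA, or "unknown" outside a repo."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"


def create_output_filename(git_sha: str, today: date) -> str:
    """Create output filename with git SHA and date (pattern is hypothetical)."""
    return f"postprocessed_{git_sha}_{today.isoformat()}.json"


def create_metadata(git_sha: str, today: date) -> dict:
    """Build run metadata from the same SHA (keys are hypothetical)."""
    return {"git_sha": git_sha, "date": today.isoformat()}


def main() -> None:
    # Resolve the SHA and date once, then share them with both helpers.
    git_sha = get_git_sha()
    today = date.today()
    print(create_output_filename(git_sha, today))
    print(create_metadata(git_sha, today))
```

Passing the SHA in also makes both helpers easier to unit test, because they no longer depend on the state of the working tree.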

honghanhh · 2025-10-06T10:05:33Z

lib/conflicts/scripts/postprocess.py

+    # Split 1: Best propositions
+    for item in data:
+        if len(best_split) >= best_limit:
+            break
+        pair_id = get_pair_id(item)
+        is_best = item.get("data", {}).get("best_conflict", False)
+
+        if is_best and pair_id not in used_pair_ids:
+            best_split.append(item)
+            used_pair_ids.add(pair_id)
+
+    # Split 2: Low-scored propositions
+    for item in data:
+        if len(low_score_split) >= low_score_limit:
+            break
+        pair_id = get_pair_id(item)
+        avg_score = calculate_average_score(item)
+
+        if avg_score < 4.0 and pair_id not in used_pair_ids:
+            low_score_split.append(item)
+            used_pair_ids.add(pair_id)
+


The function iterates through the entire dataset twice. I think this can be done in a single pass.
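One possible single-pass version, as a sketch only. The function name `split_single_pass` and its parameters are hypothetical, and the two helpers from the diff are passed in as callables so the block is self-contained. Note one behavioural difference: the two-pass code lets a best item claim a pair ID even if a low-scored item with the same ID appears earlier in `data`, whereas here the first qualifying item wins. If pair IDs can repeat across items, the two versions can return different splits.

```python
from typing import Any, Callable


def split_single_pass(
    data: list[dict[str, Any]],
    best_limit: int,
    low_score_limit: int,
    get_pair_id: Callable[[dict[str, Any]], Any],
    calculate_average_score: Callable[[dict[str, Any]], float],
    low_score_threshold: float = 4.0,  # same literal as in the diff
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    best_split: list[dict[str, Any]] = []
    low_score_split: list[dict[str, Any]] = []
    used_pair_ids: set[Any] = set()

    for item in data:
        best_full = len(best_split) >= best_limit
        low_full = len(low_score_split) >= low_score_limit
        if best_full and low_full:
            break  # both splits are complete

        pair_id = get_pair_id(item)
        if pair_id in used_pair_ids:
            continue

        is_best = item.get("data", {}).get("best_conflict", False)
        if is_best and not best_full:
            best_split.append(item)
            used_pair_ids.add(pair_id)
        elif not low_full and calculate_average_score(item) < low_score_threshold:
            # Reached by non-best items, and by best items once the best split is full.
            low_score_split.append(item)
            used_pair_ids.add(pair_id)

    return best_split, low_score_split
```

The average score is only computed when an item is a candidate for the low-score split, so the single pass also skips some scoring work.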

honghanhh · 2025-10-06T10:10:16Z

lib/conflicts/scripts/postprocess.py

+        pair_id = get_pair_id(item)
+        avg_score = calculate_average_score(item)
+
+        if avg_score < 4.0 and pair_id not in used_pair_ids:


This value (4.0) will be used in both the moderator agents and the post-processing step, so it may be worth defining it once in a constants.py/utils.py to keep the two consistent.
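A small sketch of what a shared constant could look like. The name `LOW_SCORE_THRESHOLD` and the helper `is_low_score` are hypothetical, and the module location is whichever of constants.py or utils.py the project prefers.

```python
# Hypothetical shared module, e.g. constants.py

# Average scores strictly below this value count as "low-scored".
LOW_SCORE_THRESHOLD: float = 4.0


def is_low_score(avg_score: float, threshold: float = LOW_SCORE_THRESHOLD) -> bool:
    """Return True when the average score falls below the shared threshold."""
    return avg_score < threshold
```

The check in the diff would then read `is_low_score(avg_score) and pair_id not in used_pair_ids`, and the moderator agents could import the same constant.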

simonmeoni added 2 commits September 30, 2025 14:31

feat: add postprocessing script

ee175cb

fix: fix problem with best_conflict col

afc0f74

simonmeoni changed the base branch from main to ht+sm/conflict-maybe-lrec October 6, 2025 08:26

simonmeoni requested a review from honghanhh October 6, 2025 08:26

honghanhh requested changes Oct 6, 2025
