Conversation

@jli-together jli-together commented Dec 22, 2025

Summary

This PR demonstrates how to use the GEPA method with our evaluation APIs for iterative prompt optimization.


Note

Adds end-to-end GEPA optimization workflows as runnable notebooks.

  • New Evals/GEPA_Optimization.ipynb: optimizes a summarization prompt on CNN/DailyMail using dspy, batch summary generation, and Together Eval compare with a judge model; tracks win rates, saves prompts/results
  • New Evals/Prompt_Optimization.ipynb: optimizes a judge/evaluator prompt via a TogetherEvalAdapter (upload, poll, download, per-subset metrics), minibatch reflection with an optimizer LLM, validation/test evaluation, and results export

Written by Cursor Bugbot for commit e704809.
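At a high level, both notebooks implement the same GEPA-style loop: generate outputs with the current prompt, score them against a baseline with a judge, then ask an optimizer LLM to reflect and propose an improved prompt, keeping only candidates that improve the score. A minimal sketch of that loop, with `generate`, `score`, and `reflect` as hypothetical stand-ins for the notebooks' model and Together Eval calls:

```python
def run_gepa_loop(prompt, train_data, generate, score, reflect, max_iterations=5):
    """Iteratively improve a prompt: evaluate, reflect, propose, keep improvements."""
    best_prompt = prompt
    best_score = score(generate(best_prompt, train_data), train_data)
    history = [(best_prompt, best_score)]
    for _ in range(max_iterations):
        # Reflection sees the current prompt, its score, and the training data
        candidate = reflect(best_prompt, best_score, train_data)
        cand_score = score(generate(candidate, train_data), train_data)
        history.append((candidate, cand_score))
        if cand_score > best_score:  # hill climbing: keep only improvements
            best_prompt, best_score = candidate, cand_score
    return best_prompt, best_score, history
```

The notebooks additionally track per-iteration win rates and persist the best prompt and results to disk.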

@cursor cursor bot left a comment

```python
    summarizer_lm: dspy.LM,
    optimizer_lm: SimpleOptimizerLM,
    max_iterations: int = 5
):
```
Unused train_data parameter breaks GEPA methodology

The run_manual_gepa function accepts train_data as a parameter but never uses it. The GEPA methodology relies on sampling failure examples from training data to guide prompt improvement, as correctly implemented in Prompt_Optimization.ipynb. Instead, reflect_and_improve_prompt only receives a win rate percentage without any actual failure examples to analyze. This makes the optimizer LLM blind to specific failure patterns, significantly reducing the effectiveness of the iterative improvement process.
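One way to address this, sketched here as an assumption about the notebook's data shape (`article`/`summary` dicts and per-example win/loss results) rather than its exact helpers: sample a few failing examples from `train_data` each iteration and include them in the reflection prompt, so the optimizer LLM sees concrete failure patterns instead of only an aggregate win rate.

```python
import random

def sample_failures(train_data, results, k=3, seed=0):
    """Pick up to k training examples the current prompt lost on."""
    failures = [ex for ex, won in zip(train_data, results) if not won]
    rng = random.Random(seed)  # seeded for reproducible optimization runs
    return rng.sample(failures, min(k, len(failures)))

def build_reflection_prompt(current_prompt, win_rate, failure_examples):
    """Assemble the optimizer-LM input: current prompt, score, and failures."""
    examples = "\n\n".join(
        f"Input: {ex['article']}\nSummary: {ex['summary']}"
        for ex in failure_examples
    )
    return (
        f"The prompt below achieved a {win_rate:.0%} win rate.\n\n"
        f"Current prompt:\n{current_prompt}\n\n"
        f"Examples it failed on:\n{examples}\n\n"
        "Propose an improved prompt that fixes these failure patterns."
    )
```

This mirrors the sampling-based reflection already implemented in Prompt_Optimization.ipynb.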


```python
    # Remove language tags if present
    if new_prompt.startswith('markdown\n') or new_prompt.startswith('text\n'):
        new_prompt = '\n'.join(new_prompt.split('\n')[1:])
```
Incomplete language tag removal corrupts extracted prompts

The reflect_and_propose_prompt function only removes markdown and text language tags from the extracted prompt, while the equivalent function in GEPA_Optimization.ipynb also handles python and plaintext. If the optimizer LLM wraps its response in a code block with an unhandled language tag (e.g., ```plaintext), the tag text would remain at the start of the new judge prompt, potentially corrupting it and degrading evaluation quality.
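A more robust approach, sketched here as an assumption rather than either notebook's exact code, is to strip any leading code fence or language tag in one pass instead of enumerating tags with `startswith` checks:

```python
import re

def strip_code_fence(text):
    """Remove a wrapping markdown code fence and any leading language tag
    (markdown, text, python, plaintext, ...) from an LLM response."""
    text = text.strip()
    # Leading fence with optional language tag, e.g. ```plaintext
    text = re.sub(r'^```[a-zA-Z]*\n', '', text)
    # Bare language tag left at the start when no fence was present
    text = re.sub(r'^(?:markdown|text|python|plaintext)\n', '', text)
    # Trailing fence
    text = re.sub(r'\n```$', '', text)
    return text.strip()
```

Handling the tag and the fence together also covers the common case where the optimizer LLM returns the whole prompt wrapped in a fenced block.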

@jli-together jli-together changed the title Add GEPA Optimization for Summarization [MOSH-976] Add GEPA Optimization for Summarization Dec 23, 2025