[MOSH-976] Add GEPA Optimization for Summarization #68
base: main
Conversation
Evals/GEPA_Optimization.ipynb
Outdated
| " summarizer_lm: dspy.LM,\n", | ||
| " optimizer_lm: SimpleOptimizerLM,\n", | ||
| " max_iterations: int = 5\n", | ||
| "):\n", |
Unused train_data parameter breaks GEPA methodology
The run_manual_gepa function accepts train_data as a parameter but never uses it. The GEPA methodology relies on sampling failure examples from training data to guide prompt improvement, as correctly implemented in Prompt_Optimization.ipynb. Instead, reflect_and_improve_prompt only receives a win rate percentage without any actual failure examples to analyze. This makes the optimizer LLM blind to specific failure patterns, significantly reducing the effectiveness of the iterative improvement process.
Additional Locations (1)
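To illustrate the fix the comment calls for, here is a minimal, hypothetical sketch of wiring `train_data` into the reflection step, so the optimizer LLM sees concrete failure examples rather than only a win rate. The helper names (`sample_failures`, `build_reflection_prompt`) and the `article`/`summary` keys are assumptions for illustration, not the PR's actual API.

```python
import random

def sample_failures(results, train_data, k=3):
    """Pick up to k training examples where the candidate summary lost.

    results: list of booleans, one per train_data example (True = won).
    """
    failures = [ex for ex, won in zip(train_data, results) if not won]
    return random.sample(failures, min(k, len(failures)))

def build_reflection_prompt(current_prompt, win_rate, failures):
    """Combine the aggregate win rate with concrete failure examples so
    the optimizer LLM can identify specific failure patterns."""
    examples = "\n\n".join(
        f"Article: {ex['article'][:500]}\nLosing summary: {ex['summary']}"
        for ex in failures
    )
    return (
        f"Current prompt:\n{current_prompt}\n\n"
        f"Win rate: {win_rate:.1%}\n\n"
        f"Examples the prompt lost on:\n{examples}\n\n"
        "Propose an improved prompt that addresses these failures."
    )
```

With this shape, `run_manual_gepa` would pass its (currently unused) `train_data` through to the reflection call each iteration, matching the sampling-based approach in Prompt_Optimization.ipynb.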
| "\n", | ||
| " # Remove language tags if present\n", | ||
| " if new_prompt.startswith('markdown\\n') or new_prompt.startswith('text\\n'):\n", | ||
| " new_prompt = '\\n'.join(new_prompt.split('\\n')[1:])\n", |
Incomplete language tag removal corrupts extracted prompts
The reflect_and_propose_prompt function only removes markdown and text language tags from the extracted prompt, while the equivalent function in GEPA_Optimization.ipynb also handles python and plaintext. If the optimizer LLM wraps its response in a code block with an unhandled language tag (e.g., ```plaintext), the tag text would remain at the start of the new judge prompt, potentially corrupting it and degrading evaluation quality.
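One way to address this, sketched here as a hypothetical helper (the name `strip_language_tag` and the tag list are assumptions, not code from the PR): strip any known leading code-fence language tag rather than checking only `markdown` and `text`.

```python
# Language tags an optimizer LLM might emit at the start of a code block.
KNOWN_TAGS = ("markdown", "text", "python", "plaintext")

def strip_language_tag(new_prompt: str) -> str:
    """Remove a leading code-fence language tag line, if present."""
    first_line, _, rest = new_prompt.partition("\n")
    if first_line.strip().lower() in KNOWN_TAGS:
        return rest
    return new_prompt
```

A tuple of tags also makes the check easy to extend if the optimizer LLM starts emitting other labels.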
Summary
This PR demonstrates how to leverage the GEPA method with our evaluation APIs for iterative prompt optimization.
Note
Adds end-to-end GEPA optimization workflows as runnable notebooks.
- Evals/GEPA_Optimization.ipynb: optimizes a summarization prompt on CNN/DailyMail using dspy, batch summary generation, and Together Eval compare with a judge model; tracks win rates, saves prompts/results
- Evals/Prompt_Optimization.ipynb: optimizes a judge/evaluator prompt via a TogetherEvalAdapter (upload, poll, download, per-subset metrics), minibatch reflection with an optimizer LLM, validation/test evaluation, and results export

Written by Cursor Bugbot for commit e704809.