This repository contains a minimal Python implementation of the Promptbreeder concept, a self-referential self-improvement method for Large Language Models (LLMs) that evolves prompts for a given domain. Inspired by the Promptbreeder paper, this project demonstrates the core ideas of using a genetic algorithm to evolve task-specific prompts and their associated mutation instructions.
- Genetic Algorithm Core: Implements a binary tournament genetic algorithm to evolve a population of prompts.
- Coupled Prompt Evolution: Each "individual" in the population consists of a `(task_prompt, mutation_prompt)` pair, allowing the mutation mechanism itself to evolve alongside the task prompts (a minimal sketch of this pairing follows the feature list).
- Hypermutation: Includes a mechanism where the `mutation_prompt` itself can be mutated (hypermutation). The mutated mutation prompt is conditionally accepted based on a "quick check" of its effectiveness.
- Zeroth-Order Mutation: Generates entirely new task prompts from scratch based on the overall task description.
- First-Order Mutation: Modifies an existing task prompt using its coupled `mutation_prompt`.
- Dynamic Fitness Evaluation: Supports loading external Python files as `FitnessEvaluator` classes, enabling flexible, task-specific fitness scoring.
- Ollama Integration: Uses `ollama-python` for local LLM inference, making it easy to run experiments with various models.
- Results Logging: Saves detailed per-generation data (population, fitness, best prompts) to a JSON file for analysis.
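As a rough illustration of how these pieces fit together, here is a minimal sketch of a coupled individual and one binary-tournament step. The names `Individual`, `tournament_step`, and `mutate_fn` are illustrative and do not come from ga.py; in the actual project the mutation is performed by LLM calls in mutator.py.

```python
# Illustrative sketch only: Individual, tournament_step, and mutate_fn are not names from ga.py.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Individual:
    task_prompt: str       # the prompt being evolved for the task
    mutation_prompt: str   # the coupled instruction used to rewrite task_prompt
    fitness: int = 0


def tournament_step(a: Individual, b: Individual,
                    mutate_fn: Callable[[str, str], str]) -> Tuple[Individual, Individual]:
    """Binary tournament: keep the fitter individual, replace the other with a mutated copy."""
    winner = a if a.fitness >= b.fitness else b
    # First-order mutation: rewrite the winner's task prompt using its own mutation prompt.
    # In the real project this is an LLM call; here mutate_fn stands in for it.
    child = Individual(
        task_prompt=mutate_fn(winner.task_prompt, winner.mutation_prompt),
        mutation_prompt=winner.mutation_prompt,
    )
    return winner, child
```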
Before running this project, ensure you have:
- Python 3.8+: Installed on your system.
- Ollama: Installed and running. Download from ollama.com.
- Ollama Models: Pull the necessary LLM model. The default model used in this project is `qwen3:0.6b`. You can pull it with `ollama pull qwen3:0.6b`.
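If you prefer to pull the model from Python rather than the CLI, the `ollama-python` client exposes a `pull` call. The snippet below is just a convenience and is not part of the repository:

```python
# Optional convenience: pull the default model via ollama-python instead of the CLI.
# Equivalent to running `ollama pull qwen3:0.6b`.
import ollama

client = ollama.Client(host="http://localhost:11434")
client.pull("qwen3:0.6b")
```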
- Clone the repository (or create the files):
```
# If you're creating files manually, ensure they are in the correct structure
# promptbreeder_project/
# ├── ga.py
# ├── llm_utils.py
# ├── mutator.py
# ├── mutation_prompts.py
# ├── thinking_styles.py
# ├── fitness_functions/
# │   └── all_ones.py
# └── README.md
```
- Install Python dependencies:
```bash
pip install ollama
```
To run the Promptbreeder GA, execute the ga.py script from your terminal.
```bash
python ga.py [OPTIONS]
```

Command-Line Arguments:
- --task (Required): A high-level description of the task for which prompts are being evolved (e.g., "print 1s", "summarize news articles"). This helps guide initial prompt generation and zeroth-order mutations.
- --seed_prompt (Default: "Print a sequence of ones."): An initial task prompt used as a fallback if LLM generation fails during population initialization.
- --max_gens (Default: 5): The maximum number of generations the GA will run.
- --pop_size (Default: 10): The size of the prompt population. It will be adjusted to an even number if an odd value is provided.
- --fitness_fn (Required): The path to the Python file containing your FitnessEvaluator class (e.g., "fitness_functions/all_ones.py").
- --llm_ga_model (Default: "qwen3:0.6b"): The Ollama model to use for GA operations (initial population generation, prompt mutation, hypermutation).
- --llm_fitness_model (Default: "qwen3:0.6b"): The Ollama model to use for evaluating prompts within the fitness function.
- --ollama_host (Default: "http://localhost:11434"): The host URL for your Ollama server.
- --temp_ga (Default: 0.7): Temperature for LLM calls during GA operations (higher for more creativity in mutations).
- --temp_fitness (Default: 0.1): Temperature for LLM calls within the fitness evaluator (lower for more deterministic and consistent evaluation).
- --output_file (Default: "ga_results.json"): Path to the JSON file where all GA results (population, fitness per generation) will be saved.
- --max_prompt_length (Default: 256): Maximum length for generated prompts (number of tokens/characters).
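For orientation, the flags above correspond to a straightforward argparse definition. The sketch below mirrors the documented names and defaults but is not copied from ga.py:

```python
# Sketch of an argparse setup matching the documented flags (not copied from ga.py).
import argparse

parser = argparse.ArgumentParser(description="Evolve task prompts with a Promptbreeder-style GA.")
parser.add_argument("--task", required=True, help="High-level description of the task.")
parser.add_argument("--seed_prompt", default="Print a sequence of ones.")
parser.add_argument("--max_gens", type=int, default=5)
parser.add_argument("--pop_size", type=int, default=10)
parser.add_argument("--fitness_fn", required=True, help="Path to a file defining FitnessEvaluator.")
parser.add_argument("--llm_ga_model", default="qwen3:0.6b")
parser.add_argument("--llm_fitness_model", default="qwen3:0.6b")
parser.add_argument("--ollama_host", default="http://localhost:11434")
parser.add_argument("--temp_ga", type=float, default=0.7)
parser.add_argument("--temp_fitness", type=float, default=0.1)
parser.add_argument("--output_file", default="ga_results.json")
parser.add_argument("--max_prompt_length", type=int, default=256)
args = parser.parse_args()
```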
Example run:

```bash
python ga.py \
--task="print 1s" \
--seed_prompt="Generate a string of ones." \
--max_gens=3 \
--pop_size=4 \
--max_prompt_length=100 \
--fitness_fn="fitness_functions/all_ones.py" \
--llm_ga_model="qwen3:0.6b" \
--llm_fitness_model="qwen3:0.6b" \
--output_file="my_ones_run_results.json"- ga.py: The main script that orchestrates the Genetic Algorithm, including population management, evaluation, selection, and mutation.
- llm_utils.py: A utility module for interacting with the Ollama LLM client, handling client initialization, API calls, and response cleaning (removing tags and empty lines).
- mutator.py: Contains the mutate_prompt function, which implements the core mutation logic for task prompts (zeroth-order and first-order) and hypermutation for mutation prompts.
- mutation_prompts.py: Defines the INITIAL_MUTATION_PROMPTS list, used to seed the coupled mutation prompts in the GA.
- thinking_styles.py: Contains a list of DEFAULT_THINKING_STYLES that guide the LLM's approach during mutation operations.
- fitness_functions/: A directory for custom fitness evaluation modules.
- fitness_functions/all_ones.py: An example FitnessEvaluator class that scores prompts based on how many '1's the LLM generates in response.
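To make the scoring contract concrete, the core of an all_ones-style check can be as simple as counting the '1' characters in the model's output. This is a hedged sketch of the idea, not the exact code in fitness_functions/all_ones.py:

```python
# Sketch of the all_ones scoring idea (the real fitness_functions/all_ones.py may differ).
def score_all_ones(llm_response: str) -> int:
    """Return the number of '1' characters the LLM produced for a given prompt."""
    return llm_response.count("1")
```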
To adapt the Promptbreeder to your specific task:
- Create a new Python file in the fitness_functions/ directory (e.g., my_task_fitness.py).
- Inside this file, define a class named FitnessEvaluator.
- This class must have:
  - An `__init__(self, task_description: str, llm_model_name: str, ollama_host: str, temperature: float)` method that initializes your evaluator and its LLM client.
  - A `get_fitness(self, prompt: str) -> int` method that takes a task_prompt string and returns an integer fitness score.
  - An optional `new_generation(self)` method, which is called at the start of each GA generation (useful for loading new test data batches, etc.).
- Pass the path to your new fitness file using the --fitness_fn argument when running ga.py.
Example `my_task_fitness.py` structure:

```python
# fitness_functions/my_task_fitness.py
from typing import Any

from llm_utils import get_llm_client, call_llm


class FitnessEvaluator:
    def __init__(self, task_description: str, llm_model_name: str, ollama_host: str, temperature: float):
        self.task_description = task_description
        self.llm_client = get_llm_client(llm_model_name, ollama_host)
        self.temperature = temperature
        # Load your task-specific test data here
        # self.test_data = load_my_data(task_description)

    def new_generation(self):
        # Optional: prepare for a new generation (e.g., sample a new test batch)
        pass

    def get_fitness(self, prompt: str) -> int:
        # Use self.llm_client to evaluate the prompt against your task.
        # For example, for a summarization task:
        # summary = call_llm(self.llm_client, f"{prompt}\nSummarize the following text: {self.test_data['article']}", temperature=self.temperature)
        # score = evaluate_summary_quality(summary, self.test_data['ground_truth'])
        # return score
        return 0  # Implement your actual fitness logic here and return an integer score
```
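For reference, loading a FitnessEvaluator class from a file path (as the --fitness_fn flag implies) is commonly done with importlib. The sketch below shows one standard approach; ga.py may implement the loading differently, and the usage values are placeholders.

```python
# One common way to load a FitnessEvaluator class from a file path.
# This is a sketch; ga.py may implement the loading differently.
import importlib.util


def load_fitness_evaluator(path: str):
    spec = importlib.util.spec_from_file_location("fitness_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.FitnessEvaluator  # the class the file is required to define


# Hypothetical usage:
# EvaluatorCls = load_fitness_evaluator("fitness_functions/my_task_fitness.py")
# evaluator = EvaluatorCls(task_description="print 1s",
#                          llm_model_name="qwen3:0.6b",
#                          ollama_host="http://localhost:11434",
#                          temperature=0.1)
```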