This repository contains a minimal Python implementation of the Promptbreeder concept, a self-referential self-improvement method for Large Language Models (LLMs) that evolves prompts for a given domain. Inspired by the Promptbreeder paper, this project demonstrates the core ideas of using a genetic algorithm to evolve task-specific prompts and their associated mutation instructions.
- Genetic Algorithm Core: Implements a binary tournament genetic algorithm to evolve a population of prompts.
- Coupled Prompt Evolution: Each "individual" in the population consists of a `(task_prompt, mutation_prompt)` pair, allowing the mutation mechanism itself to evolve alongside the task prompts (a minimal sketch of this pairing follows the feature list).
- Hypermutation: Includes a mechanism where the `mutation_prompt` itself can be mutated (hypermutation). The mutated mutation prompt is conditionally accepted based on a "quick check" of its effectiveness.
- Zeroth-Order Mutation: Generates entirely new task prompts from scratch based on the overall task description.
- First-Order Mutation: Modifies an existing task prompt using its coupled `mutation_prompt`.
- Dynamic Fitness Evaluation: Supports loading external Python files as `FitnessEvaluator` classes, enabling flexible, task-specific fitness scoring.
- Ollama Integration: Uses `ollama-python` for local LLM inference, making it easy to run experiments with various models.
- Results Logging: Saves detailed per-generation data (population, fitness, best prompts) to a JSON file for analysis.
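As a rough illustration of how these pieces fit together, here is a minimal sketch of a coupled individual and one binary-tournament step. The names `Individual`, `tournament_step`, and `mutate_fn` are illustrative and do not come from ga.py; in the actual project the mutation is performed by LLM calls in mutator.py.

```python
# Illustrative sketch only: Individual, tournament_step, and mutate_fn are not names from ga.py.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Individual:
    task_prompt: str       # the prompt being evolved for the task
    mutation_prompt: str   # the coupled instruction used to rewrite task_prompt
    fitness: int = 0


def tournament_step(a: Individual, b: Individual,
                    mutate_fn: Callable[[str, str], str]) -> Tuple[Individual, Individual]:
    """Binary tournament: keep the fitter individual, replace the other with a mutated copy."""
    winner = a if a.fitness >= b.fitness else b
    # First-order mutation: rewrite the winner's task prompt using its own mutation prompt.
    # In the real project this is an LLM call; here mutate_fn stands in for it.
    child = Individual(
        task_prompt=mutate_fn(winner.task_prompt, winner.mutation_prompt),
        mutation_prompt=winner.mutation_prompt,
    )
    return winner, child
```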
Before running this project, ensure you have:
- Python 3.8+: Installed on your system.
- Ollama: Installed and running. Download from ollama.com.
- Ollama Models: Pull the necessary LLM model. The default model used in this project is `qwen3:0.6b`. You can pull it with `ollama pull qwen3:0.6b`.
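If you prefer to pull the model from Python rather than the CLI, the `ollama-python` client exposes a `pull` call. The snippet below is just a convenience and is not part of the repository:

```python
# Optional convenience: pull the default model via ollama-python instead of the CLI.
# Equivalent to running `ollama pull qwen3:0.6b`.
import ollama

client = ollama.Client(host="http://localhost:11434")
client.pull("qwen3:0.6b")
```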
- Clone the repository (or create the files):
```
# If you're creating files manually, ensure they are in the correct structure
# promptbreeder_project/
# ├── ga.py
# ├── llm_utils.py
# ├── mutator.py
# ├── mutation_prompts.py
# ├── thinking_styles.py
# ├── fitness_functions/
# │   └── all_ones.py
# └── README.md
```
- Install Python dependencies:
```bash
pip install ollama
```
To run the Promptbreeder GA, execute the ga.py script from your terminal.
```bash
python ga.py [OPTIONS]
```

Command-Line Arguments:
- --task (Required): A high-level description of the task for which prompts are being evolved (e.g., "print 1s", "summarize news articles"). This helps guide initial prompt generation and zeroth-order mutations.
- --seed_prompt (Default: "Print a sequence of ones."): An initial task prompt used as a fallback if LLM generation fails during population initialization.
- --max_gens (Default: 5): The maximum number of generations the GA will run.
- --pop_size (Default: 10): The size of the prompt population. It will be adjusted to an even number if an odd value is provided.
- --fitness_fn (Required): The path to the Python file containing your FitnessEvaluator class (e.g., "fitness_functions/all_ones.py").
- --llm_ga_model (Default: "qwen3:0.6b"): The Ollama model to use for GA operations (initial population generation, prompt mutation, hypermutation).
- --llm_fitness_model (Default: "qwen3:0.6b"): The Ollama model to use for evaluating prompts within the fitness function.
- --ollama_host (Default: "http://localhost:11434"): The host URL for your Ollama server.
- --temp_ga (Default: 0.7): Temperature for LLM calls during GA operations (higher for more creativity in mutations).
- --temp_fitness (Default: 0.1): Temperature for LLM calls within the fitness evaluator (lower for more deterministic and consistent evaluation).
- --output_file (Default: "ga_results.json"): Path to the JSON file where all GA results (population, fitness per generation) will be saved.
- --max_prompt_length (Default: 256): Maximum length for generated prompts (number of tokens/characters).
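For orientation, the flags above correspond to a straightforward argparse definition. The sketch below mirrors the documented names and defaults but is not copied from ga.py:

```python
# Sketch of an argparse setup matching the documented flags (not copied from ga.py).
import argparse

parser = argparse.ArgumentParser(description="Evolve task prompts with a Promptbreeder-style GA.")
parser.add_argument("--task", required=True, help="High-level description of the task.")
parser.add_argument("--seed_prompt", default="Print a sequence of ones.")
parser.add_argument("--max_gens", type=int, default=5)
parser.add_argument("--pop_size", type=int, default=10)
parser.add_argument("--fitness_fn", required=True, help="Path to a file defining FitnessEvaluator.")
parser.add_argument("--llm_ga_model", default="qwen3:0.6b")
parser.add_argument("--llm_fitness_model", default="qwen3:0.6b")
parser.add_argument("--ollama_host", default="http://localhost:11434")
parser.add_argument("--temp_ga", type=float, default=0.7)
parser.add_argument("--temp_fitness", type=float, default=0.1)
parser.add_argument("--output_file", default="ga_results.json")
parser.add_argument("--max_prompt_length", type=int, default=256)
args = parser.parse_args()
```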
Example run:

```bash
python ga.py \
--task="print 1s" \
--seed_prompt="Generate a string of ones." \
--max_gens=3 \
--pop_size=4 \
--max_prompt_length=100 \
--fitness_fn="fitness_functions/all_ones.py" \
--llm_ga_model="qwen3:0.6b" \
--llm_fitness_model="qwen3:0.6b" \
--output_file="my_ones_run_results.json"- ga.py: The main script that orchestrates the Genetic Algorithm, including population management, evaluation, selection, and mutation.
- llm_utils.py: A utility module for interacting with the Ollama LLM client, handling client initialization, API calls, and response cleaning (removing tags and empty lines).
- mutator.py: Contains the mutate_prompt function, which implements the core mutation logic for task prompts (zeroth-order and first-order) and hypermutation for mutation prompts.
- mutation_prompts.py: Defines the INITIAL_MUTATION_PROMPTS list, used to seed the coupled mutation prompts in the GA.
- thinking_styles.py: Contains a list of DEFAULT_THINKING_STYLES that guide the LLM's approach during mutation operations.
- fitness_functions/: A directory for custom fitness evaluation modules.
- fitness_functions/all_ones.py: An example FitnessEvaluator class that scores prompts based on how many '1's the LLM generates in response.
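To make the scoring contract concrete, the core of an all_ones-style check can be as simple as counting the '1' characters in the model's output. This is a hedged sketch of the idea, not the exact code in fitness_functions/all_ones.py:

```python
# Sketch of the all_ones scoring idea (the real fitness_functions/all_ones.py may differ).
def score_all_ones(llm_response: str) -> int:
    """Return the number of '1' characters the LLM produced for a given prompt."""
    return llm_response.count("1")
```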
To adapt the Promptbreeder to your specific task:
- Create a new Python file in the fitness_functions/ directory (e.g., my_task_fitness.py).
- Inside this file, define a class named FitnessEvaluator.
- This class must have:
  - An `__init__(self, task_description: str, llm_model_name: str, ollama_host: str, temperature: float)` method that initializes your evaluator and its LLM client.
  - A `get_fitness(self, prompt: str) -> int` method that takes a task_prompt string and returns an integer fitness score.
  - An optional `new_generation(self)` method, which is called at the start of each GA generation (useful for loading new test data batches, etc.).
- Pass the path to your new fitness file using the --fitness_fn argument when running ga.py.
Example `my_task_fitness.py` structure:

```python
# fitness_functions/my_task_fitness.py
from typing import Any

from llm_utils import get_llm_client, call_llm


class FitnessEvaluator:
    def __init__(self, task_description: str, llm_model_name: str, ollama_host: str, temperature: float):
        self.task_description = task_description
        self.llm_client = get_llm_client(llm_model_name, ollama_host)
        self.temperature = temperature
        # Load your task-specific test data here
        # self.test_data = load_my_data(task_description)

    def new_generation(self):
        # Optional: prepare for a new generation (e.g., sample a new test batch)
        pass

    def get_fitness(self, prompt: str) -> int:
        # Use self.llm_client to evaluate the prompt against your task.
        # For example, for a summarization task:
        # summary = call_llm(self.llm_client, f"{prompt}\nSummarize the following text: {self.test_data['article']}", temperature=self.temperature)
        # score = evaluate_summary_quality(summary, self.test_data['ground_truth'])
        # return score
        return 0  # Implement your actual fitness logic here and return an integer score
```
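For reference, loading a FitnessEvaluator class from a file path (as the --fitness_fn flag implies) is commonly done with importlib. The sketch below shows one standard approach; ga.py may implement the loading differently, and the usage values are placeholders.

```python
# One common way to load a FitnessEvaluator class from a file path.
# This is a sketch; ga.py may implement the loading differently.
import importlib.util


def load_fitness_evaluator(path: str):
    spec = importlib.util.spec_from_file_location("fitness_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.FitnessEvaluator  # the class the file is required to define


# Hypothetical usage:
# EvaluatorCls = load_fitness_evaluator("fitness_functions/my_task_fitness.py")
# evaluator = EvaluatorCls(task_description="print 1s",
#                          llm_model_name="qwen3:0.6b",
#                          ollama_host="http://localhost:11434",
#                          temperature=0.1)
```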