DistributedFunSearch (disfun) is a multi-node distributed implementation of FunSearch (Romera-Paredes et al., 2024) that uses LLM-guided evolutionary search to discover novel algorithms. It relies on RabbitMQ for asynchronous message passing and works with both API-based LLMs (e.g., GPT-4o via Azure OpenAI) and locally hosted models (defaulting to StarCoder2).
- Independent workers: ProgramsDatabase, Samplers, and Evaluators work independently and process tasks asynchronously to maximize throughput
- Multi-node execution: Distributes across multiple nodes and allows adding Samplers or Evaluators from the same or different nodes to a running experiment (see Cluster Setup for SLURM/Enroot example)
- Dynamic scaling: Automatically spawns/terminates Samplers and Evaluators based on workload and available resources (see Scaling Guide)
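The snippet below is a minimal sketch of this kind of asynchronous message passing using the `pika` RabbitMQ client; the queue name and message payload are illustrative only and do not reflect disfun's internal protocol.

```python
# Minimal sketch of RabbitMQ-based message passing between workers.
# The queue name and payload are illustrative, not disfun's actual protocol.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="evaluation_tasks")  # hypothetical queue name

# A sampler-like producer publishes a generated program for evaluation.
channel.basic_publish(
    exchange="",
    routing_key="evaluation_tasks",
    body=json.dumps({"program": "def priority(item, context):\n    return 0.0"}),
)

# An evaluator-like consumer processes tasks as they arrive.
def on_message(ch, method, properties, body):
    task = json.loads(body)
    print("Evaluating candidate of length", len(task["program"]))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="evaluation_tasks", on_message_callback=on_message)
channel.start_consuming()
```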
In each iteration:
- A few-shot prompt is constructed by sampling from the program database, which stores all previously generated functions and their metadata
- The LLM generates a new function variant
- The function is evaluated on user-defined test cases
- If the function is executable and logically distinct from previously stored ones, it is added to the program database along with its evaluation results
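In schematic form, one iteration looks like the following sketch; the class and method names are illustrative, not disfun's actual API.

```python
# Schematic of a single search iteration; names are illustrative, not disfun's API.
def run_iteration(database, llm, evaluate_fn, evaluation_inputs):
    # 1. Few-shot prompt sampled from previously stored functions and their metadata.
    prompt = database.build_few_shot_prompt()
    # 2. The LLM proposes a new variant of the evolved function.
    candidate = llm.generate(prompt)
    # 3. The candidate is evaluated on every user-defined problem instance.
    scores_per_test = {inputs: evaluate_fn(candidate, inputs) for inputs in evaluation_inputs}
    # 4. Executable, non-duplicate candidates are stored with their evaluation results.
    if None not in scores_per_test.values() and not database.contains(candidate):
        database.register(candidate, scores_per_test)
```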
Our implementation includes an example application for discovering large deletion-correcting codes. For details on this specific use case, see our paper.
# 1. Create and activate conda environment
conda create -n env python=3.11 pip numpy==1.26.4 -y
conda activate env
# 2. Install PyTorch (skip if using API-based LLM)
conda install pytorch==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# 3. Start RabbitMQ
sudo systemctl start rabbitmq-server
# 4. Install DistributedFunSearch
pip install .

See Docker Setup for container-based installation or Cluster Setup for cluster execution.
This runs the example specification for discovering deletion-correcting codes:
cd src/experiments/experiment1
python -m disfun
# Resume from checkpoint (if needed)
python -m disfun --checkpoint path/to/checkpoint.pkl

You can monitor message load in real time at http://localhost:15672 (login: guest/guest). Enable the management plugin first with `sudo rabbitmq-plugins enable rabbitmq_management`.
Adapt DistributedFunSearch to your problem by defining a specification file (which function to evolve and how to evaluate it), evaluation inputs (the parameter values your evolved function is tested on), and evaluation outputs (a score and a hash for deduplication, plus optional metrics such as speed or memory).
Add a new specification file to src/disfun/specifications/ (see existing examples in Deletions/ or IDS/ folders).
Specification structure:
"""
[Problem description that becomes the LLM prompt]
Improve the `your_function` function over its previous versions below.
Keep the code short and comment for easy understanding.
"""
import your_dependencies
# Helper functions and classes that define your problem
def helper_function(...):
    ...

# Evaluation entry point, must be named "evaluate"
# Called by the evaluator with test inputs
def evaluate(params):
    input1, input2, input3 = params
    score, hash_value = solve(input1, input2, input3)
    return (score, hash_value)  # Score and hash for deduplication

# Main evaluation logic, which uses the evolved function
def solve(input1, input2, input3):
    # Your problem-specific logic that uses the priority function
    priorities = {item: priority(item, ...) for item in items}
    # ... use priorities to construct the solution ...
    return solution, hash_value

# The function that gets evolved by the LLM, must be named "priority"
def priority(item, context):
    """Returns the priority/score for the given item."""
    return 0.0  # Baseline implementation

Explanation:
- Docstring: Becomes the problem context in the LLM prompt
- Helper functions: Defines your problem (graph construction, constraints, etc.)
- `evaluate(params)`: Entry point called by the evaluator
- `solve(...)`: Implements the evaluation logic using the evolved function
- `priority(item, context)`: The function that the LLM evolves
The function names evaluate and priority are hardcoded in __main__.py (lines 284, 661). If you want to use different names, you also need to update them there.
The evaluator executes this entire script, calling evaluate() with evaluation inputs.
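Conceptually, that execution step can be sketched as follows (a simplified illustration; the real evaluator in src/disfun/evaluator.py additionally handles timeouts, parallel workers, and result extraction):

```python
# Simplified sketch of executing a specification script and calling its evaluate().
# The real evaluator adds timeouts, parallel workers, and result extraction.
with open("src/disfun/specifications/Deletions/StarCoder2/load_graph/baseline.txt") as f:
    spec_source = f.read()

namespace = {}
exec(spec_source, namespace)                            # runs the whole specification script
score, hash_value = namespace["evaluate"]((10, 1, 2))   # one evaluation input tuple, e.g. (n, s, q)
```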
The evolved function is tested on problem instances defined by tuples (the evaluation inputs). Each tuple specifies one problem instance. For example, (n=10, s=1, q=2) specifies a binary code of length n=10 that corrects s=1 deletion.
The evaluate() function in your specification receives each tuple and uses the parameters to define the problem instance. It may call the evolved function with some, all, or none of these parameters (they define the problem context, not necessarily the function inputs).
Configuration example:
evaluator=EvaluatorConfig(
    spec_path="src/disfun/specifications/Deletions/StarCoder2/load_graph/baseline.txt",
    s_values=[1, 2],   # Scalar or list: error correction levels
    start_n=[5, 7],    # Range start: code lengths (one per s_value)
    end_n=[10, 12],    # Range end: code lengths (one per s_value)
    q=2,               # Scalar: alphabet size (2=binary, 4=DNA)
    timeout=90,        # Timeout per evaluation in seconds
    max_workers=2,     # Parallel CPU processes per evaluator
)

Customizing:
To change which problem instances are tested, modify the create_evaluation_inputs() function in src/disfun/__main__.py. This function generates tuples from your config parameters. Customize it by:
- Adding new parameters to `EvaluatorConfig` in `config.py`
- Changing how parameters are combined into tuples in `create_evaluation_inputs()`
- Updating your specification's `evaluate(params)` to unpack the tuple
Your specification's evaluate(params) receives each tuple and uses it to define the problem instance.
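As an illustration, a generator matching the configuration above might look like the following sketch (the actual create_evaluation_inputs() in src/disfun/__main__.py may differ, e.g., in whether the upper end of the range is inclusive):

```python
# Hedged sketch of turning the config values shown above into (n, s, q) input tuples;
# the actual create_evaluation_inputs() in src/disfun/__main__.py may differ.
def create_evaluation_inputs(s_values=(1, 2), start_n=(5, 7), end_n=(10, 12), q=2):
    inputs = []
    for s, n_lo, n_hi in zip(s_values, start_n, end_n):
        for n in range(n_lo, n_hi + 1):  # upper end assumed inclusive here
            inputs.append((n, s, q))
    return inputs

# create_evaluation_inputs() -> [(5, 1, 2), (6, 1, 2), ..., (10, 1, 2), (7, 2, 2), ..., (12, 2, 2)]
```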
Evaluation outputs:
The evaluate() function returns a tuple (score, hash_value) for each problem instance:
- `score`: Numeric value measuring solution quality (higher is better by default)
- `hash_value`: Optional hash for deduplication (set to `None` if not needed)
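One common pattern (a sketch, not a requirement of disfun) is to hash a canonical representation of the constructed solution, so that different programs producing identical solutions are deduplicated:

```python
# Sketch: returning (score, hash_value), where the hash is derived from the solution
# itself. solve() stands for the problem-specific logic defined in your specification.
import hashlib

def evaluate(params):
    n, s, q = params
    codewords = solve(n, s, q)            # e.g. a set of codeword strings
    score = len(codewords)                # e.g. code size; higher is better
    canonical = ",".join(sorted(codewords))
    hash_value = hashlib.sha256(canonical.encode()).hexdigest()
    return (score, hash_value)
```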
Scoring configuration:
evaluator=EvaluatorConfig(
    # ... evaluation input parameters ...
    mode="last",  # How to aggregate scores: "last" = use largest n for each s, "average", "weighted"
)

How scores are aggregated:
- Per problem instance (in `src/disfun/evaluator.py`, function `extract_evaluation_result()`, line 66): extracts `test_output[0]` as the score and `test_output[1]` as the hash, and stores them in the `scores_per_test` dictionary with the full problem instance tuple as key (e.g., `(n, s, q)`)
- Across all problem instances (in `src/disfun/programs_database.py`, function `_reduce_score()`, line 82): aggregates the `scores_per_test` dictionary into a single score that determines sampling. Extracts `(n, s)` from the full tuples and aggregates based on the mode set in the config (`"last"` = use the largest n for each s, `"average"`, `"weighted"`)
To extract additional outputs (e.g., execution time, memory usage) or change how scores are combined, modify these two functions.
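For intuition, "last" mode can be sketched roughly as follows (illustrative only; how scores for different s values are combined is an assumption in this sketch — see _reduce_score() for the actual behavior):

```python
# Rough sketch of "last"-mode aggregation: keep only the score at the largest n for
# each s. Averaging across s values is an assumption made for this sketch; see
# _reduce_score() in src/disfun/programs_database.py for the actual logic.
def reduce_score_last(scores_per_test):
    best_per_s = {}  # s -> (largest n seen, its score)
    for (n, s, q), score in scores_per_test.items():
        if s not in best_per_s or n > best_per_s[s][0]:
            best_per_s[s] = (n, score)
    kept = [score for _, score in best_per_s.values()]
    return sum(kept) / len(kept)

scores = {(9, 1, 2): 40.0, (10, 1, 2): 48.0, (11, 2, 2): 24.0, (12, 2, 2): 28.0}
print(reduce_score_last(scores))  # keeps the n=10 and n=12 entries -> 38.0
```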
For different open-source models:
Edit src/disfun/sampler.py line ~64:
checkpoint = "bigcode/starcoder2-15b" # Any HuggingFace model IDFor OpenAI models:
For OpenAI models:

In config.py, enable GPT mode:
sampler=SamplerConfig(
    gpt=True,
    # ... other sampler config
)

To use a different GPT model, edit src/disfun/gpt.py line 22:
def __init__(self, samples_per_prompt: int, model="gpt-4o-mini"):  # Change model here

Then export your Azure OpenAI credentials:
export AZURE_OPENAI_API_KEY=<your-key>
export AZURE_OPENAI_API_VERSION=<your-version>

- Configuration Guide: Detailed configuration options, CLI arguments, and config blocks
- Scaling Guide: Explanation on how dynamic resource scaling is implemented
- Docker Setup: Setup with Docker containers
- Cluster Setup: Setup on cluster with SLURM and enroot
