This repository provides a reference implementation of A*-Decoding for small language models (SLMs). A*-Decoding is an inference-time search algorithm designed to efficiently find high-quality completions from language models, especially on tasks that require complex reasoning and step-by-step solutions.
In addition to A*-Decoding, the repository includes competitive baseline methods such as Best-of-N search, Self-Consistency decoding, and Particle Filtering, enabling direct comparison and benchmarking across different decoding strategies.
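To make the general idea concrete, the sketch below shows PRM-guided best-first search over partial, step-by-step solutions: keep a frontier of partial solutions ordered by a process reward model (PRM) score, repeatedly expand the most promising one, and return the best completed solution. It is a minimal illustration under assumed interfaces (the propose_steps, prm_score, and is_complete callables and the expansion budget are hypothetical), not the repository's actual A*-Decoding implementation.

```python
import heapq
from typing import Callable, List

def best_first_decode(
    question: str,
    propose_steps: Callable[[str, List[str]], List[str]],  # (question, steps so far) -> candidate next steps
    prm_score: Callable[[str, List[str]], float],           # (question, steps so far) -> scalar quality score
    is_complete: Callable[[List[str]], bool],
    max_expansions: int = 100,
) -> List[str]:
    """PRM-guided best-first search over partial step sequences (illustration only)."""
    counter = 0                      # tie-breaker so heapq never compares the step lists
    frontier = [(0.0, counter, [])]  # min-heap over negated PRM scores
    best_steps, best_score = [], float("-inf")

    for _ in range(max_expansions):
        if not frontier:
            break
        neg_score, _, steps = heapq.heappop(frontier)
        if is_complete(steps):
            # Track the best fully completed solution seen so far.
            if -neg_score > best_score:
                best_steps, best_score = steps, -neg_score
            continue
        # Expand the most promising partial solution with candidate next steps.
        for step in propose_steps(question, steps):
            new_steps = steps + [step]
            counter += 1
            heapq.heappush(frontier, (-prm_score(question, new_steps), counter, new_steps))

    return best_steps


if __name__ == "__main__":
    # Toy demo: "steps" are single characters and the "PRM" rewards matching a target string.
    target = "42"
    answer = best_first_decode(
        question="What is 6 * 7?",
        propose_steps=lambda q, steps: list("0123456789"),
        prm_score=lambda q, steps: sum(a == b for a, b in zip(steps, target)) - 0.01 * len(steps),
        is_complete=lambda steps: len(steps) == len(target),
    )
    print("".join(answer))  # -> "42"
```

A full A* formulation would combine an accumulated path cost with the PRM score used as a heuristic; the greedy best-first variant above is kept deliberately small for readability.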
- A*-Decoding: Core implementation of efficient, guided search for language models.
- Best-of-N Search: Selects the highest-scoring answer from N independently sampled completions, where scoring is done with a process reward model (PRM).
- Self-Consistency Decoding: Picks the most frequent final answer across multiple chain-of-thought samples.
- Particle Filtering: Generates multiple trajectories by iteratively sampling next steps, with probabilities weighted by the PRM scores of the current partial solutions (a toy sketch of these baseline selection rules follows this list).
- Evaluation Suite: Tools for chain-of-thought benchmarking, grading, and verifying model outputs on math and reasoning datasets.
- Configurable Experiments: Easily switch between models, datasets, and decoding strategies using JSON configuration files.
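To make the baselines concrete, here is a toy sketch of their selection and resampling rules, assuming the completions have already been generated and scored; the data structures and function names are illustrative and do not reflect the repository's API.

```python
import random
from collections import Counter
from typing import List, Tuple

def best_of_n(scored: List[Tuple[str, float]]) -> str:
    """Best-of-N: keep the completion with the highest PRM score."""
    return max(scored, key=lambda pair: pair[1])[0]

def self_consistency(final_answers: List[str]) -> str:
    """Self-Consistency: majority vote over the final answers of chain-of-thought samples."""
    return Counter(final_answers).most_common(1)[0][0]

def resample_particles(partial_solutions: List[str], prm_scores: List[float], k: int) -> List[str]:
    """Particle Filtering (one step): resample partial solutions in proportion to their PRM scores."""
    return random.choices(partial_solutions, weights=prm_scores, k=k)

# Toy usage.
print(best_of_n([("x = 12", 0.31), ("x = 14", 0.77), ("x = 14", 0.64)]))  # -> "x = 14"
print(self_consistency(["14", "12", "14"]))                               # -> "14"
print(resample_particles(["step A", "step B"], [0.9, 0.1], k=4))
```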
- a_star_decoding/ — Main A*-Decoding implementation.
- best_of_n/ — Best-of-N search baseline.
- self_consistency/ — Self-Consistency decoding baseline.
- pf/ — Particle Filtering baseline.
- configs/ — JSON configuration files for experiments.
- data/ — Datasets and experiment results.
- evaluate/ — Scripts and utilities for evaluation and grading.
- models/ — Model configuration and utility code.
- run_scripts/ — Shell scripts to run experiments.
- constants.py, pyproject.toml, Dockerfile — Project setup and dependencies.
- Clone the repository:
git clone <repo-url>
cd a_star_decoding
- Install dependencies:
- Using Poetry:
poetry install
- Or with pip:
pip install -r requirements.txt
- (Optional) Docker:
docker build -t a_star_decoding .
docker run -it a_star_decoding
All config files can be found in configs/.
- A* Decoding:
./run_scripts/run_a_star.sh HF_TOKEN MATH500 llama1b_qwen7b
- Best-of-N Search:
./run_scripts/run_bon.sh HF_TOKEN MATH500 bon_llama1b
- Self-Consistency:
./run_scripts/run_sco.sh HF_TOKEN CO_API_KEY MATH500 sco_llama1b_commandrp
- Evaluation (a toy grading sketch follows these commands):
./run_scripts/run_bench.sh HF_TOKEN MATH500 meta-llama/Llama-3.2-1B-Instruct
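As a rough picture of what the evaluation step checks on math benchmarks, the toy grader below pulls the final \boxed{...} expression out of a completion and exact-matches it against the reference answer after light normalization. It is an illustrative stand-in, not the grading logic shipped in evaluate/.

```python
import re
from typing import Optional

def extract_boxed(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion (nested braces not handled)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

def normalize(answer: str) -> str:
    """Light normalization: drop whitespace, surrounding $ signs, and a trailing period."""
    return answer.strip().strip("$").rstrip(".").replace(" ", "")

def is_correct(completion: str, reference: str) -> bool:
    predicted = extract_boxed(completion)
    return predicted is not None and normalize(predicted) == normalize(reference)

# Toy usage.
print(is_correct("... so the final answer is \\boxed{14}.", "14"))  # -> True
print(is_correct("... therefore \\boxed{12}.", "14"))               # -> False
```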
- Datasets are in data/datasets/ (e.g., AIME24, MATH500).
- Experiment results are saved in data/runs/.
- Experiment settings are controlled via JSON files in configs/.
- Modify these files to change models, datasets, or decoding parameters; an illustrative example follows.
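For example, a config tweak might look like the following sketch; the file name and the dataset / n_samples keys are hypothetical placeholders, so check the JSON files in configs/ for the actual schema.

```python
import json
from pathlib import Path

# Hypothetical workflow: the file name and the "dataset" / "n_samples" keys are
# illustrative assumptions; inspect the JSON files in configs/ for the real schema.
config_path = Path("configs/llama1b_qwen7b.json")
config = json.loads(config_path.read_text())

config["dataset"] = "AIME24"   # e.g., switch to a different benchmark
config["n_samples"] = 8        # e.g., shrink the sampling budget

Path("configs/llama1b_qwen7b_aime24.json").write_text(json.dumps(config, indent=2))
```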
If you use this codebase in your research, please cite the relevant papers or this repository. For example:
@misc{chatziveroglou2025adecodingtokenefficientinferencescaling,
title={A*-Decoding: Token-Efficient Inference Scaling},
author={Giannis Chatziveroglou},
year={2025},
eprint={2505.13672},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.13672},
}

For questions or contributions, please open an issue or pull request.
