A Python project that trains a Large Language Model (LLM) to perform listwise reranking for information retrieval.
Reranking is typically done with a cross-encoder, so why use an LLM?
Compared to traditional cross-encoder rerankers, LLMs offer several advantages:
- Listwise reasoning: LLMs can consider the entire candidate list jointly, enabling true listwise ranking instead of scoring each document independently (see the prompt sketch after this list).
- Explainability: LLMs can explain, in natural language, why they ranked the candidates in a given order.
- Task adaptability: The same LLM can be adapted to different domains or ranking criteria (e.g. factuality, diversity, recency).
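
To make the listwise setup concrete, here is a minimal sketch of how a reranking prompt might be assembled; the function name and template are illustrative, not the exact format this project uses:

```python
def build_listwise_prompt(query: str, candidates: list[str]) -> str:
    """Build a single prompt showing the model every candidate at once.

    Hypothetical template: the model is expected to answer with a
    permutation of the identifiers, e.g. "[2] > [1] > [3]".
    """
    lines = [f"Query: {query}", "Rank the following documents by relevance to the query:"]
    for i, doc in enumerate(candidates, start=1):
        lines.append(f"[{i}] {doc}")
    lines.append("Answer with the identifiers in order, most relevant first.")
    return "\n".join(lines)
```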
The pipeline:
- Build a reranking dataset using BM25 as the first-stage retriever (a retrieval sketch follows this list)
- Fine-tune an LLM to rerank candidate documents listwise
- Evaluate reranking performance with standard IR metrics
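
For the first stage, here is a minimal BM25 retrieval sketch using the `rank_bm25` package; whether `dataset.py` uses this exact library is an assumption, and the corpus and query are toy examples:

```python
from rank_bm25 import BM25Okapi  # assumed library; the project may use another BM25 implementation

# Toy corpus standing in for the real document collection.
corpus = [
    "BM25 is a bag-of-words ranking function.",
    "LLMs can rerank candidate documents listwise.",
    "Cross-encoders score each query-document pair independently.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "listwise reranking with LLMs"
# Retrieve the top-k candidates that would be handed to the LLM reranker.
top_docs = bm25.get_top_n(query.lower().split(), corpus, n=2)
print(top_docs)
```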
- Python 3.12+
- The project can be installed and run using uv.
- Other dependencies are specified in `pyproject.toml`.

Dependencies can be installed with:

```bash
uv sync
```
@TODO
```bash
cd src
uv run dataset.py
```
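
The exact schema produced by `dataset.py` is not documented here; a plausible, purely hypothetical record pairs a query with its BM25 candidates and the gold ordering:

```python
# Hypothetical record layout; the actual fields in dataset.py may differ.
example = {
    "query": "what is bm25",
    "candidates": [
        "BM25 is a ranking function used by search engines.",
        "LLMs are large neural language models.",
    ],
    "gold_ranking": [0, 1],  # candidate indices, most relevant first
}
```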
```bash
cd src
uv run train.py
```
```bash
cd src
uv run infer.py [--model-type {base,fine-tuned} ...] [--dataset {train,validation}]
```
Example: running inference with the fine-tuned model on the validation dataset:
```bash
uv run infer.py --model-type fine-tuned --dataset validation
```
```bash
cd src
uv run eval.py
```
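
As an illustration of the standard IR metrics mentioned above, here is a self-contained NDCG@k implementation; whether `eval.py` computes NDCG this way is an assumption:

```python
import math

def ndcg_at_k(ranked_relevances: list[int], k: int) -> float:
    """NDCG@k for one query, given graded relevances in predicted order."""
    def dcg(rels: list[int]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# With a single relevant document per query (see the note below), NDCG@k
# reduces to 1 / log2(rank + 1) for the 1-based rank of that document.
print(ndcg_at_k([0, 1, 0], k=3))  # ~0.63: relevant document at rank 2
```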
- Only queries from the original MS MARCO dataset that have exactly one relevant document are used in this project. This may change in the future.
- Implement a logging system.