STEPS is an efficient framework for optimizing text prompts in text-to-image (T2I) generation through sequential probability tensor decomposition.
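To give a feel for what sampling a prompt from a tensor-decomposed distribution can look like, here is a minimal NumPy sketch. It is not the STEPS implementation: the helper names (`random_tt_cores`, `right_marginals`, `sample_prompt`), the random non-negative cores, and the tiny vocabulary size are all assumptions made for illustration. It represents an unnormalized probability tensor over token sequences in tensor train form and draws one token at a time, conditioning each draw on the tokens already chosen.

```python
import numpy as np


def random_tt_cores(prompt_len, vocab_size, rank, rng):
    """Build non-negative tensor train cores of shape (r_left, vocab_size, r_right).

    Hypothetical helper: random cores stand in for a learned distribution.
    """
    ranks = [1] + [rank] * (prompt_len - 1) + [1]
    return [rng.random((ranks[k], vocab_size, ranks[k + 1])) for k in range(prompt_len)]


def right_marginals(cores):
    """right[k] sums the product of cores k..end over all of their tokens."""
    d = len(cores)
    right = [None] * (d + 1)
    right[d] = np.ones(1)
    for k in range(d - 1, -1, -1):
        # Sum out the token index of core k, then contract with what lies to the right.
        right[k] = cores[k].sum(axis=1) @ right[k + 1]
    return right


def sample_prompt(cores, rng):
    """Draw one token sequence, position by position, from the tensor train."""
    right = right_marginals(cores)
    left = np.ones(1)  # contraction of the cores fixed so far
    tokens = []
    for k, core in enumerate(cores):
        # Unnormalized conditional weights for position k given the earlier tokens.
        weights = np.einsum("a,aib,b->i", left, core, right[k + 1])
        probs = weights / weights.sum()
        token = int(rng.choice(len(probs), p=probs))
        tokens.append(token)
        left = left @ core[:, token, :]
    return tokens


rng = np.random.default_rng(0)
cores = random_tt_cores(prompt_len=5, vocab_size=8, rank=3, rng=rng)
prompt = sample_prompt(cores, rng)
print(prompt)  # a list of 5 token indices in [0, 8)
```

In this sketch the rank controls the size of the matrices passed from one position to the next, so the cost per position scales with the rank and the vocabulary size instead of with the number of possible sequences.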
STEPS provides several key parameters for optimization:
- alg: The algorithm to run
- prompt_len: Length of the prompt sequence
- iter: Number of optimization iterations
- rank: Rank of the tensor train decomposition
- top_n: Number of candidates retained, used to limit the sequentially increasing memory usage
- sample_bs: The maximum sampling size
- dataset_name: The dataset to run the algorithm on
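As an illustration of how these parameters might be exposed on the command line, here is a small `argparse` sketch. The actual argument handling in `run_STEPS.py` may differ: the `build_parser` function, the argument types, and the default values (borrowed from the example command in this README) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the parameters described in this README."""
    parser = argparse.ArgumentParser(description="STEPS-style options (illustrative only)")
    parser.add_argument("--alg", type=str, default="td", help="Algorithm to run")
    parser.add_argument("--prompt_len", type=int, default=10, help="Length of the prompt sequence")
    parser.add_argument("--iter", type=int, default=100, help="Number of optimization iterations")
    parser.add_argument("--rank", type=int, default=10, help="Rank of the tensor train decomposition")
    parser.add_argument("--top_n", type=int, default=8, help="Number of candidates retained")
    parser.add_argument("--sample_bs", type=int, default=1000, help="Maximum sampling size")
    parser.add_argument("--dataset_name", type=str, default="coco", help="Dataset to run on")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--prompt_len", "16", "--rank", "4"])
print(vars(args))
```

Options that are not passed explicitly fall back to the assumed defaults, so `args.alg` is `"td"` in the call above.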
diffusers==0.11.1
ftfy==6.3.1
horovod==0.28.1
huggingface_hub==0.25.2
jax==0.4.34
numpy==2.1.3
optax==0.2.4
Pillow==11.0.0
regex==2024.9.11
Requests==2.32.3
sentence_transformers==2.2.2
timm==1.0.11
torch==1.13.0
torchvision==0.14.0
tqdm==4.66.5
transformers==4.23.1
- Install dependencies:

      pip install -r requirements.txt

- Prepare your dataset in data/.
- Configure parameters and run the optimization:

      python run_STEPS.py \
          --alg td \
          --prompt_len 10 \
          --iter 100 \
          --rank 10 \
          --topk 64 \
          --top_n 8 \
          --sample_bs 1000 \
          --dataset_name coco
For detailed examples, please refer to the code documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
We thank the contributors and maintainers of the following projects that made STEPS possible: