
Hana 🌸

My homemade text-to-image model - a frankensteined Sana (NVIDIA).

Latest specs:

This is a public log of my work in progress.

Samples

Alpha-44 samples, more

💪 Training runs - WIP, unfiltered list, mostly fails

| Model | Text Encoder | AE | Transformer | Dataset | Training | Compute | Model Code | Loss | Samples |
|---|---|---|---|---|---|---|---|---|---|
| Beta-8 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 100 epochs, ??? steps, BS 1024, single GPU, LR 5e-4 constant, 10% label dropout | 1x RTX 6000 Pro, ??? hrs | Model, Code | ??? | loss |
| Beta-7 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | CC12M+IN21K 256px, 17M subset | ~4 epochs, 300k steps, BS 256, single GPU, LR 5e-4 constant, 10% label dropout | 1x RTX 6000 Ada, 64 hrs | Model, Code | 1.01 | loss, clipscore, gallery |
| Beta-6 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 70 epochs, 280k steps, BS 320, single GPU, LR 5e-4 constant, 20% label dropout | 1x RTX 5090, 28 hrs | Model, Code | 1.03 | loss, clipscore, gallery |
| Beta-5 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 40 epochs, 168k steps, BS 320, single GPU, LR 5e-4 constant, 10% label dropout | 1x RTX 5090, 16 hrs | Model, Code | 1.03 | loss, clipscore, gallery |
| Beta-4 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 65 epochs, 250k steps, BS 320, single GPU, LR 5e-4 constant, no label dropout | 1x RTX 5090, 25 hrs | Model, Code | 1.02 | loss, clipscore, gallery |
| Alpha-45 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | Alpha-44 | PD12M 256px | 200k steps, BS 256 x4, grad_checkpointing, LR 4e-4 with linear decay to 1e-4 over 50k steps, 10% dropout, CFG | 4x 3090, 250 hrs | Model, Code | 0.89 | loss, clipscore, gallery, output |
| Alpha-44 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (590M), num_layers=28, hidden_dim=1152 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, all aspects | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 4e-4 with linear decay to 1e-4 over 50k steps, 10% dropout, CFG | 4x 3090, 150 hrs | Model, Code | 0.992 | loss_eval, clipscore, gallery, output |
| Alpha-42 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (590M), num_layers=28, hidden_dim=1152 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3; 2 runs, model broke after 45 epochs, restarted at epoch 25 with lower LR | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, then 1e-4 after restart, 10% dropout is back | 4x 3090, 50 hrs + 85 hrs | Model, Code (run1, run2) | — | loss_eval, clipscore, gallery_small, output |
| Alpha-39 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 32 hrs | Model, Code | 0.99 | loss_eval, clipscore, gallery |
| Alpha-38 | SigLIP2 | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 24 hrs | Model, Code | 0.99 | loss_eval, clipscore, gallery, output_half |
| Alpha-37 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 24 hrs | Model, Code | 0.98 | loss_eval, clipscore, gallery_eval, output_half |
| Alpha-35 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 34 hrs | Model, Code | 0.97 | loss_eval, gallery_eval, output_half |
| Alpha-34 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel, wide variant (205M), num_layers=12, hidden_dim=1024 | IN1k_256px, no aug., +aspect ratios 1:1 3:4 4:3 | 88 epochs, 140k steps, BS 192 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 31 hrs | trashed, Code | 1.00 | loss, gallery |
| Alpha-33 | ModernBERT-large | KBlueLeaf/EQ-SDXL-VAE | SanaTransformer2DModel (116.0M), num_layers=12, hidden_dim=768 | imagenet1k_eqsdxlvae_latents_withShape | 4.4 epochs, 16k steps, BS 80 x4, 1000 timesteps, LR 3e-4, 10% dropout, CFG | 4x 3090, 3.4 hrs | trashed, Code | 0.72 | loss, clipscore, gallery |
| Alpha-32 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, no aug., +aspect ratios 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 24 hrs | Model, Code | 0.98 (still undertrained I guess) | loss_eval, gallery, output_half |
| Alpha-31 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_128px, no aug., +aspect ratios 1:1 3:4 4:3 | 100 epochs, 38.6k steps, BS 832 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 12 hrs | Model, Code | 1.14 (not 100% sure if a31>a30) | loss, gallery, gallery-2 |
| Alpha-30 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_128px, +4 augmentations | 102 epochs, 32k steps, BS 1024 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 16 hrs | Model, Code | 1.14 | loss, gallery |
| Alpha-29 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_96px, +4 augmentations | 103 epochs, 32k steps, BS 1024 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 12.5 hrs | Model, Code | 1.21 (less overfitting with augmentations) | loss, gallery |
| Alpha-27 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116.08M), num_layers=12, hidden_dim=768 | IN1k_96px | 143 epochs, BS 1024 x4, 1000 timesteps, LR 5e-4 | 4x 3090, 17 hrs | none (run killed, overfitting), Code | 1.14 | loss, gallery |
| Alpha-15 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 128px | 800 epochs, BS 512, 1000 timesteps, LR 5e-4 | 3x 3090, 8.5 hrs | forgot to save, Code | 0.78 | loss, eval_images |
| Alpha-14 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 128px | 1000 epochs, BS 384, 1000 timesteps, LR 5e-4, scaling fix | 1x 3090, 23 hrs | Model, Code | 0.78 | W&B loss chart, gallery |
| Alpha-13 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 128px | 1000 epochs, BS 384, 1000 timesteps, LR 5e-4 | 1x 3090, 23 hrs | trashed, Code | 1.9 | W&B loss chart, eval samples |
| Alpha-22 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116.08M), num_layers=12, hidden_dim=768 | CIFAR10-augmented 64px | 500 epochs, BS 1024 x4, 1000 timesteps, LR 5e-4 | 4x 3090, 2 hrs | Model, Code | 1.07 | W&B loss chart, gallery, eval samples |
| Alpha-12 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 64px | 500 epochs, BS 896, 1000 timesteps, LR 5e-4 | 1x 3090, 8 hrs | Model, Code | 1.27 | W&B loss chart, gallery (classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) |
| Alpha-11 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 64px | 500 epochs, BS 896, 1000 timesteps, LR 5e-4 | 1x 3090, 12 hrs | Model, Code | 1.25 | W&B loss chart, eval samples (CIFAR10 class prompts) |
| Alpha-20 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (17.8M), num_layers=7, hidden_dim=384 | Fashion MNIST | 20 epochs, 1170 steps, BS 1024, 1000 timesteps (logit-normal), LR 5e-4 | 1x 3090, 13 mins | gone, Code | 0.61 (logit-normal), 0.71 (uniform), 0.61 (beta high), 0.82 (beta low) | loss, notes |
| Alpha-17 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (17.8M), num_layers=7, hidden_dim=384 | Fashion MNIST | 20 epochs, 600 steps, BS 2048, 1000 timesteps (uniform), LR 5e-4 | 1x 3090, 20 mins | Model, Code | 0.85 | loss, clipscore, gallery, notes |
| Alpha-8d | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 400 epochs, BS 896, 1000 timesteps | 1x 4090, 6.2 hrs | Model, Code | 1.14 | wandb, loss_comparison, eval samples (classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot) |
| Alpha-8c | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 236 epochs, BS 896, 40 timesteps | 1x 4090, 2.6 hrs | trashed, Code | 1.33 | wandb, eval samples |
| Alpha-8b | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 250 epochs, BS 896, 20 timesteps | 1x 4090, 2.6 hrs | trashed, Code | 1.33 | wandb, eval samples |
| Alpha-8a | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 250 epochs, BS 896, 10 timesteps | 1x 4090, 2.6 hrs | trashed, Code | 1.34 | wandb, eval samples |
| Alpha-10 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 3 epochs, BS 256, 10 timesteps (lognormal), LR 5e-4 | 1x 4090, 10 mins | Model, Code | 1.069 | loss_TS-lin-vs-logn, eval samples |
| Alpha-7 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 3 epochs, BS 256, 10 timesteps, LR 5e-4 | 1x 4090, 10 mins | Model, Code | 0.99 | eval samples |
| Alpha-9 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | Imagenet-1k | 60 epochs, BS 320, LR 5e-4 | 1x 4090, 83 hrs | FAIL, Code | 2.32 | W&B loss chart, eval samples |
| Alpha-5 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | Imagenet-1k | 20 epochs, BS 128 | 1x 4090, 22 hrs | FAIL (diverging after epoch 2), Code | 2.57 | eval samples |
| Alpha-4 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 3 epochs, BS 128 | 1x 4090, 9 mins | Model, Code | 1.050 | eval samples |
| Alpha-3 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 85,660 steps (150 epochs), BS 128 | 1x 4090, 8 hrs | Model, Code | 0.833 | eval samples |
| Alpha-2 | Gemma2 2b | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), 7 layers instead of 28 | MNIST | 7940 steps, BS 128 | 1x 4090, 40 mins | Model, Code | 0.933 | eval samples |
| Alpha-1 | Gemma2 2b | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), 7 layers instead of 28 | MNIST | 5 epochs, 300k steps, BS 1, LR 1e-4 | 1x 4090, 4 hrs | Model, Code | 0.958 | eval samples |
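Many of the runs above train with label dropout (10–20%) so that classifier-free guidance (CFG) works at sampling time: a fraction of captions is replaced by a null caption during training, and at inference the conditional and unconditional predictions are blended. A minimal sketch of the idea (not the repo's actual code; function names are mine):

```python
import random

def apply_label_dropout(captions, p_drop=0.1, null_caption="", rng=None):
    """Randomly replace a fraction of captions with the null caption,
    so the model also learns the unconditional distribution."""
    rng = rng or random.Random()
    return [null_caption if rng.random() < p_drop else c for c in captions]

def cfg_combine(pred_uncond, pred_cond, guidance_scale=4.5):
    """CFG at inference: v = v_uncond + s * (v_cond - v_uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]
```

Beta-4 (no label dropout) corresponds to `p_drop=0.0`, i.e. CFG has no learned unconditional branch to lean on.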
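The Alpha-44/45 schedule "LR 4e-4 with linear decay to 1e-4 over 50k steps" (then constant) can be sketched as a plain function of the step count (illustrative only; the actual runs use a PyTorch scheduler):

```python
def lr_at_step(step, lr_start=4e-4, lr_end=1e-4, decay_steps=50_000):
    """Linear decay from lr_start to lr_end over decay_steps, then hold lr_end."""
    if step >= decay_steps:
        return lr_end
    frac = step / decay_steps
    return lr_start + frac * (lr_end - lr_start)
```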
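Alpha-20 compares timestep-sampling distributions (logit-normal vs uniform vs beta), with logit-normal giving the lowest loss. A logit-normal sampler draws from a Gaussian and squashes through a sigmoid, concentrating timesteps around the middle of (0, 1); a hedged stdlib sketch, not the training script itself:

```python
import math
import random

def sample_t_logit_normal(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) from a logit-normal: sigmoid of a N(mean, std) sample."""
    x = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-x))

def sample_t_uniform(rng):
    """Baseline: uniform timesteps in (0, 1)."""
    return rng.random()
```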

🏅🏅 Thank you - people, components, videos, articles

About

A toy text-to-image model trained from scratch.
