
Hana 🌸

My homemade text-to-image model - a frankensteined Sana (NVIDIA).

Latest specs:

This is a public log of my work in progress.

Samples

Alpha-44 samples, more

💪 Training runs - WIP, unfiltered list, mostly fails

| Model | Text Encoder | AE | Transformer | Dataset | Training | Compute | Model Code | Loss | Samples |
|---|---|---|---|---|---|---|---|---|---|
| Beta-8 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 100 epochs, ??? steps, BS 1024, single GPU, LR 5e-4 constant, 10% label dropout | 1x RTX 6000 Pro, ??? hrs | Model, Code | ??? | loss |
| Beta-7 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | CC12M+IN21K 256px, 17M subset | ~4 epochs, 300k steps, BS 256, single GPU, LR 5e-4 constant, 10% label dropout | 1x RTX 6000 Ada, 64 hrs | Model, Code | 1.01 | loss, clipscore, gallery |
| Beta-6 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 70 epochs, 280k steps, BS 320, single GPU, LR 5e-4 constant, 20% label dropout | 1x RTX 5090, 28 hrs | Model, Code | 1.03 | loss, clipscore, gallery |
| Beta-5 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 40 epochs, 168k steps, BS 320, single GPU, LR 5e-4 constant, 10% label dropout | 1x RTX 5090, 16 hrs | Model, Code | 1.03 | loss, clipscore, gallery |
| Beta-4 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2+qwen2-vl+smolvlm2, AR 1:1 3:4 4:3 | 65 epochs, 250k steps, BS 320, single GPU, LR 5e-4 constant, no label dropout | 1x RTX 5090, 25 hrs | Model, Code | 1.02 | loss, clipscore, gallery |
| Alpha-45 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | Alpha-44 | PD12M 256px | 200k steps, BS 256 x4, grad_checkpointing, LR 4e-4 with linear decay to 1e-4 over 50k steps, 10% dropout, CFG | 4x 3090, 250 hrs | Model, Code | 0.89 | loss, clipscore, gallery, output |
| Alpha-44 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (590M), num_layers=28, hidden_dim=1152 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, all aspects | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 4e-4 with linear decay to 1e-4 over 50k steps, 10% dropout, CFG | 4x 3090, 150 hrs | Model, Code | 0.992 | loss_eval, clipscore, gallery, output |
| Alpha-42 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (590M), num_layers=28, hidden_dim=1152 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3; 2 runs, model broke after 45 epochs, restarted at epoch 25 with lower LR | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, then 1e-4 after restart, 10% dropout is back | 4x 3090, 50 hrs + 85 hrs | Model, Code (run1, run2) | — | loss_eval, clipscore, gallery_small, output |
| Alpha-39 | SmolLM2-360M | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 32 hrs | Model, Code | 0.99 | loss_eval, clipscore, gallery |
| Alpha-38 | SigLIP2 | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 24 hrs | Model, Code | 0.99 | loss_eval, clipscore, gallery, output_half |
| Alpha-37 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, captions by md2+qwen2-vl+smolvlm2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 24 hrs | Model, Code | 0.98 | loss_eval, clipscore, gallery_eval, output_half |
| Alpha-35 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px recaptioned with md2, bf16, AR 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 34 hrs | Model, Code | 0.97 | loss_eval, gallery_eval, output_half |
| Alpha-34 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel, wide variant (205M), num_layers=12, hidden_dim=1024 | IN1k_256px, no aug., +aspect ratios 1:1 3:4 4:3 | 88 epochs, 140k steps, BS 192 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 31 hrs | trashed, Code | 1.00 | loss, gallery |
| Alpha-33 | ModernBERT-large | KBlueLeaf/EQ-SDXL-VAE | SanaTransformer2DModel (116.0M), num_layers=12, hidden_dim=768 | imagenet1k_eqsdxlvae_latents_withShape | 4.4 epochs, 16k steps, BS 80 x4, 1000 timesteps, LR 3e-4, 10% dropout, CFG | 4x 3090, 3.4 hrs | trashed, Code | 0.72 | loss, clipscore, gallery |
| Alpha-32 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_256px, no aug., +aspect ratios 1:1 3:4 4:3 | 100 epochs, 125k steps, BS 256 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 24 hrs | Model, Code | 0.98 (still undertrained I guess) | loss_eval, gallery, output_half |
| Alpha-31 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_128px, no aug., +aspect ratios 1:1 3:4 4:3 | 100 epochs, 38.6k steps, BS 832 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 12 hrs | Model, Code | 1.14 (not 100% sure if a31>a30) | loss, gallery, gallery-2 |
| Alpha-30 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_128px, +4 augmentations | 102 epochs, 32k steps, BS 1024 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 16 hrs | Model, Code | 1.14 | loss, gallery |
| Alpha-29 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (125.3M), num_layers=12, hidden_dim=768 | IN1k_96px, +4 augmentations | 103 epochs, 32k steps, BS 1024 x4, 1000 timesteps, LR 5e-4, 10% dropout, CFG | 4x 3090, 12.5 hrs | Model, Code | 1.21 (less overfitting with augmentations) | loss, gallery |
| Alpha-27 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116.08M), num_layers=12, hidden_dim=768 | IN1k_96px | 143 epochs, BS 1024 x4, 1000 timesteps, LR 5e-4 | 4x 3090, 17 hrs | none (run killed, overfitting), Code | 1.14 | loss, gallery |
| Alpha-15 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 128px | 800 epochs, BS 512, 1000 timesteps, LR 5e-4 | 3x 3090, 8.5 hrs | forgot to save, Code | 0.78 | loss, eval_images |
| Alpha-14 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 128px | 1000 epochs, BS 384, 1000 timesteps, LR 5e-4, scaling fix | 1x 3090, 23 hrs | Model, Code | 0.78 | W&B loss chart, gallery |
| Alpha-13 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 128px | 1000 epochs, BS 384, 1000 timesteps, LR 5e-4 | 1x 3090, 23 hrs | trashed, Code | 1.9 | W&B loss chart, eval samples |
| Alpha-22 | ModernBERT-large | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (116.08M), num_layers=12, hidden_dim=768 | CIFAR10-augmented 64px | 500 epochs, BS 1024 x4, 1000 timesteps, LR 5e-4 | 4x 3090, 2 hrs | Model, Code | 1.07 | W&B loss chart, gallery, eval samples |
| Alpha-12 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 64px | 500 epochs, BS 896, 1000 timesteps, LR 5e-4 | 1x 3090, 8 hrs | Model, Code | 1.27 | W&B loss chart, gallery (classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) |
| Alpha-11 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | CIFAR10 64px | 500 epochs, BS 896, 1000 timesteps, LR 5e-4 | 1x 3090, 12 hrs | Model, Code | 1.25 | W&B loss chart, eval samples (CIFAR10 class prompts) |
| Alpha-20 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (17.8M), num_layers=7, hidden_dim=384 | Fashion MNIST | 20 epochs, 1170 steps, BS 1024, 1000 timesteps (logit-normal), LR 5e-4 | 1x 3090, 13 mins | gone, Code | 0.61 (logit-normal), 0.71 (uniform), 0.61 (beta high), 0.82 (beta low) | loss, notes |
| Alpha-17 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (17.8M), num_layers=7, hidden_dim=384 | Fashion MNIST | 20 epochs, 600 steps, BS 2048, 1000 timesteps (uniform), LR 5e-4 | 1x 3090, 20 mins | Model, Code | 0.85 | loss, clipscore, gallery, notes |
| Alpha-8d | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 400 epochs, BS 896, 1000 timesteps | 1x 4090, 6.2 hrs | Model, Code | 1.14 | wandb, loss_comparison, eval samples (classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot) |
| Alpha-8c | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 236 epochs, BS 896, 40 timesteps | 1x 4090, 2.6 hrs | trashed, Code | 1.33 | wandb, eval samples |
| Alpha-8b | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 250 epochs, BS 896, 20 timesteps | 1x 4090, 2.6 hrs | trashed, Code | 1.33 | wandb, eval samples |
| Alpha-8a | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7 | Fashion MNIST | 250 epochs, BS 896, 10 timesteps | 1x 4090, 2.6 hrs | trashed, Code | 1.34 | wandb, eval samples |
| Alpha-10 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 3 epochs, BS 256, 10 timesteps (lognormal), LR 5e-4 | 1x 4090, 10 mins | Model, Code | 1.069 | loss_TS-lin-vs-logn, eval samples |
| Alpha-7 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 3 epochs, BS 256, 10 timesteps, LR 5e-4 | 1x 4090, 10 mins | Model, Code | 0.99 | eval samples |
| Alpha-9 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | Imagenet-1k | 60 epochs, BS 320, LR 5e-4 | 1x 4090, 83 hrs | FAIL, Code | 2.32 | W&B loss chart, eval samples |
| Alpha-5 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | Imagenet-1k | 20 epochs, BS 128 | 1x 4090, 22 hrs | FAIL (diverging after epoch 2), Code | 2.57 | eval samples |
| Alpha-4 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 3 epochs, BS 128 | 1x 4090, 9 mins | Model, Code | 1.050 | eval samples |
| Alpha-3 | ModernBERT-base | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), num_layers=7, cross_attention_dim=1152 | MNIST | 85,660 steps (150 epochs), BS 128 | 1x 4090, 8 hrs | Model, Code | 0.833 | eval samples |
| Alpha-2 | Gemma2 2b | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), 7 layers instead of 28 | MNIST | 7940 steps, BS 128 | 1x 4090, 40 mins | Model, Code | 0.933 | eval samples |
| Alpha-1 | Gemma2 2b | dc-ae-f32c32-sana-1.0 | SanaTransformer2DModel (158.18M), 7 layers instead of 28 | MNIST | 5 epochs, 300k steps, BS 1, LR 1e-4 | 1x 4090, 4 hrs | Model, Code | 0.958 | eval samples |
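Many of the runs above train with label dropout (10–20%) so that classifier-free guidance (CFG) works at sampling time: a fraction of captions is replaced by a null caption during training, and at inference the conditional and unconditional predictions are blended. A minimal sketch of the idea (not the repo's actual code; function names are mine):

```python
import random

def apply_label_dropout(captions, p_drop=0.1, null_caption="", rng=None):
    """Randomly replace a fraction of captions with the null caption,
    so the model also learns the unconditional distribution."""
    rng = rng or random.Random()
    return [null_caption if rng.random() < p_drop else c for c in captions]

def cfg_combine(pred_uncond, pred_cond, guidance_scale=4.5):
    """CFG at inference: v = v_uncond + s * (v_cond - v_uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]
```

Beta-4 (no label dropout) corresponds to `p_drop=0.0`, i.e. CFG has no learned unconditional branch to lean on.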
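The Alpha-44/45 schedule "LR 4e-4 with linear decay to 1e-4 over 50k steps" (then constant) can be sketched as a plain function of the step count (illustrative only; the actual runs use a PyTorch scheduler):

```python
def lr_at_step(step, lr_start=4e-4, lr_end=1e-4, decay_steps=50_000):
    """Linear decay from lr_start to lr_end over decay_steps, then hold lr_end."""
    if step >= decay_steps:
        return lr_end
    frac = step / decay_steps
    return lr_start + frac * (lr_end - lr_start)
```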
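Alpha-20 compares timestep-sampling distributions (logit-normal vs uniform vs beta), with logit-normal giving the lowest loss. A logit-normal sampler draws from a Gaussian and squashes through a sigmoid, concentrating timesteps around the middle of (0, 1); a hedged stdlib sketch, not the training script itself:

```python
import math
import random

def sample_t_logit_normal(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) from a logit-normal: sigmoid of a N(mean, std) sample."""
    x = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-x))

def sample_t_uniform(rng):
    """Baseline: uniform timesteps in (0, 1)."""
    return rng.random()
```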

🏅🏅 Thank you - people, components, videos, articles

About

A toy text-to-image model trained from scratch.
