
Conversation

@wkpark (Owner) commented Aug 31, 2024

FLUX1 support

⚠experimental Flux1 support

Usage

ChangeLog

  • LoRA support added. (09/09)
  • Baked VAE supported.
  • One-time-pass LoRA support, to use LoRA with less memory (Options -> Optimization -> "use LoRA without backup weight"); a conceptual sketch follows the changelog.
  • Some minor attention optimizations.
  • Reduced VRAM usage.
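
Below is a rough conceptual sketch of the one-time-pass idea (not the extension's actual code): the LoRA delta is merged directly into the module weight, so no backup copy of the original weight is kept and memory use stays lower. Names and shapes are assumptions for a plain fp16/fp32 linear weight.

```python
# Conceptual sketch only; the real extension hooks into the webui's LoRA code.
import torch

def merge_lora_inplace(weight: torch.Tensor,
                       lora_up: torch.Tensor,    # (out_features, rank)
                       lora_down: torch.Tensor,  # (rank, in_features)
                       alpha: float) -> None:
    """Add the scaled LoRA delta into `weight` in place; no backup copy is kept."""
    rank = lora_up.shape[1]
    scale = alpha / rank
    delta = (lora_up.float() @ lora_down.float()) * scale
    weight.add_(delta.to(weight.dtype))  # one pass; the original weight is not restorable
```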

License: Apache 2.0
original author: Tim Dockhorn @timudk
@rkfg commented Aug 31, 2024

Thank you for your efforts! However, I can't make it work. If I load the Civitai version (11 GB), it's recognized as SD1 and loads the v1 config, then quickly renders a completely gray output. If I load the official fp8 version (17 GB), it loads the Flux config and rendering is slower, but the result is still the same gray image. I suppose it doesn't use T5 and the VAE; it's not clear where they should be put (there are no errors whatsoever in the log). I see there are some environment vars in the code that probably contain paths?

stablediff-runner-cuda  | 2024-08-31T23:13:04.175171516Z Loading model flux1-dev-fp8.safetensors [8e91b68084] (2 out of 3)
stablediff-runner-cuda  | 2024-08-31T23:13:04.178122249Z Loading weights [8e91b68084] from /stablediff-web/models/Stable-diffusion/flux1-dev-fp8.safetensors
stablediff-runner-cuda  | 2024-08-31T23:13:04.196063647Z Creating model from config: /stablediff-web/configs/flux1-inference.yaml
stablediff-runner-cuda  | 2024-08-31T23:13:04.196584786Z state_dict dtype: {'model.diffusion_model.': {torch.float8_e4m3fn: 780}, 'text_encoders.': {torch.float32: 2, torch.float16: 197, torch.float8_e4m3fn: 219}, 'vae.': {torch.float32: 244}}
stablediff-runner-cuda  | 2024-08-31T23:13:05.726793994Z state_dict dtype: {'model.diffusion_model.': {torch.float8_e4m3fn: 780}, 'text_encoders.': {torch.float32: 2, torch.float16: 197, torch.float8_e4m3fn: 219}, 'vae.': {torch.float32: 244}}
stablediff-runner-cuda  | 2024-08-31T23:13:05.747544610Z Applying attention optimization: xformers... done.
stablediff-runner-cuda  | 2024-08-31T23:13:05.768056293Z Model loaded in 1.6s (create model: 0.2s, apply weights to model: 1.4s).
stablediff-runner-cuda  | 2024-08-31T23:13:07.553464063Z 
Total progress: 100%|██████████| 20/20 [00:45<00:00,  2.26s/it]
Total progress: 100%|██████████| 20/20 [00:45<00:00,  2.32s/it]

I have 3090 Ti / Debian testing.

@wkpark (Owner, Author) commented Aug 31, 2024

The 11 GB model contains only the U-Net weights and is not compatible with A1111. And yes, the T5-XXL text encoder is not used at all; it can be activated with the SD3 T5 option that was added recently.

You also need to install the official ae.safetensors VAE and select it in the VAE UI to fix the gray results.
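
As a side note, a quick way to check which components a checkpoint actually ships (diffusion model, text encoders, VAE) is to scan its key prefixes with the safetensors library. This is just a diagnostic sketch; the file path is an example.

```python
# Diagnostic sketch: count tensor keys per top-level prefix in a checkpoint.
from collections import Counter
from safetensors import safe_open

def component_summary(path: str) -> Counter:
    """Count tensor keys per top-level prefix (e.g. 'model', 'text_encoders', 'vae')."""
    counts = Counter()
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            counts[key.split(".", 1)[0]] += 1
    return counts

# An all-in-one checkpoint lists 'model', 'text_encoders' and 'vae' groups;
# a UNet-only dump (like the 11 GB Civitai file) shows only diffusion weights.
print(component_summary("models/Stable-diffusion/flux1-dev-fp8.safetensors"))
```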

@rkfg commented Sep 1, 2024

Ah, interesting! Yes, I bound the VAE and now it works. I also tried enabling T5, but it OOMs on generation; I assume the encoder isn't unloaded as it is in ComfyUI, so they both don't fit in VRAM. Hopefully this can be fixed relatively easily.
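
For reference, the ComfyUI-style behaviour described here amounts to moving the text encoder off the GPU once the prompt embeddings are computed. A minimal sketch of that idea, with stand-in names (`text_encoder`, `tokens`) rather than the webui's actual objects:

```python
# Minimal sketch of encode-then-offload; not the webui's actual code path.
import torch

def encode_then_offload(text_encoder, tokens, device="cuda"):
    """Compute prompt embeddings on the GPU, then move the encoder back to CPU."""
    text_encoder.to(device)
    with torch.no_grad():
        cond = text_encoder(tokens)   # conditioning stays on the GPU
    text_encoder.to("cpu")            # free VRAM for the diffusion transformer
    torch.cuda.empty_cache()
    return cond
```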

@rkfg commented Sep 1, 2024

I can run with T5-XXL enabled, but only with the --medvram switch; otherwise it OOMs even before generation begins. Apparently 24 GB is the new mid now 😩

PS: I think it's worth adding another flag such as --medvram-flux to apply it only to Flux. The slowdown isn't very big: for me it's 10.3 s without the flag for an SDXL generation vs 11.4 s with it. But there's no reason to make that sacrifice if SDXL fits in my VRAM. I'll take a look later; it's basically one line in https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/82a973c04367123ae98bd9abdf80d9eda9b910e2/modules/lowvram.py#L21, I just need to find an attribute that's only present in Flux and not in other models.
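
A rough sketch of what that change could look like; the flag name `--medvram-flux` and the `is_flux` attribute are assumptions, not existing webui options:

```python
# Hypothetical sketch around modules/lowvram.py: apply the medvram path only
# when the loaded model is Flux. Attribute and option names are guesses.
from modules import shared  # webui module; available when run inside the webui

def medvram_applies(sd_model) -> bool:
    """Return True if the medvram code path should be used for this model."""
    if shared.cmd_opts.medvram:
        return True
    # getattr keeps this harmless for SD1/SDXL models that lack the attribute
    is_flux = getattr(sd_model, "is_flux", False)
    return bool(getattr(shared.cmd_opts, "medvram_flux", False) and is_flux)
```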

@Ultron55 commented Sep 4, 2024

I loaded the model, added ae.safetensors to the VAE folder, and selected it in Settings, but it gives this error:

state_dict dtype: {}
loading stable diffusion model: AttributeError
Traceback (most recent call last):
  File "C:\Users\ultron\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\ultron\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\ultron\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\stable-diffusion-webui\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "C:\stable-diffusion-webui\modules\shared_items.py", line 175, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "C:\stable-diffusion-webui\modules\sd_models.py", line 769, in get_sd_model
    load_model()
  File "C:\stable-diffusion-webui\modules\sd_models.py", line 893, in load_model
    use_fp8_unet = has_loadable_weights("F8", "model.diffusion_model.", state_dict=state_dict)
  File "C:\stable-diffusion-webui\modules\sd_models.py", line 448, in has_loadable_weights
    "F8": (torch.float8_e4m3fn,),
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'

@rkfg commented Sep 4, 2024

You probably need to update your torch.
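
A quick way to check whether the installed PyTorch build has the float8 dtype the loader expects (it is only present in relatively recent releases):

```python
# Quick environment check for float8 support.
import torch

print(torch.__version__)
print(hasattr(torch, "float8_e4m3fn"))  # False on older builds -> the AttributeError above
```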

 * check supported dtypes
 * detect non_blocking
 * update autocast() to use non_blocking, target_device and current_dtype
 - check float8 unet dtype to save memory
 - check vae dtype
 * add QkvLinear class for Flux lora
 * devices.dtype_unet, dtype_vae could be considered as storage dtypes
 * use devices.dtype_inference as computational dtype
 * misc fixes to support float8 unet storage
@Ultron55 commented Sep 8, 2024

> You probably need to update your torch

OK, my bad. I was in no hurry to update because after a previous update I had to roll back some of the libraries, since the newer versions weren't supported.

Now I have another problem: when I try to select ae.safetensors as the VAE, I get this:

  File "C:\stable-diffusion-webui\modules\options.py", line 165, in set
    option.onchange()
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 14, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui\modules\initialize_util.py", line 182, in <lambda>
    shared.opts.onchange("sd_vae", wrap_queued_call(lambda: sd_vae.reload_vae_weights()), call=False)
  File "C:\stable-diffusion-webui\modules\sd_vae.py", line 273, in reload_vae_weights
    load_vae(sd_model, vae_file, vae_source)
  File "C:\stable-diffusion-webui\modules\sd_vae.py", line 212, in load_vae
    _load_vae_dict(model, vae_dict_1)
  File "C:\stable-diffusion-webui\modules\sd_vae.py", line 239, in _load_vae_dict
    model.first_stage_model.load_state_dict(vae_dict_1)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AutoencoderKL:
        Missing key(s) in state_dict: "quant_conv.weight", "quant_conv.bias", "post_quant_conv.weight", "post_quant_conv.bias".
        size mismatch for encoder.conv_out.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 512, 3, 3]).
        size mismatch for encoder.conv_out.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([8]).
        size mismatch for decoder.conv_in.weight: copying a param with shape torch.Size([512, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 4, 3, 3]).

If I try to generate as things are now, I get a solid gray image:
(attached image: 315466-1793676599-bear, solid gray output)

@wkpark (Owner, Author) commented Sep 8, 2024

You don't have to use a separate VAE now (checkpoints with a baked-in VAE are correctly supported now).

Also, there is an A1111 bug that prevents changing the VAE. In that case, change the VAE to None first, then change it to the VAE you want.

wkpark self-assigned this on Sep 8, 2024