
Conversation

@wkpark (Owner) commented Aug 31, 2024

FLUX1 support

⚠experimental Flux1 support

Usage

ChangeLog

  • LoRA support added. (09/09)
  • Baked VAE supported.
  • One-time-pass LoRA support, to use LoRA with less memory (Options -> Optimization -> "use LoRA without backup weight"); a conceptual sketch follows the changelog.
  • Some minor attention optimizations.
  • Reduced VRAM usage.
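
Below is a rough conceptual sketch of the one-time-pass idea (not the extension's actual code): the LoRA delta is merged directly into the module weight, so no backup copy of the original weight is kept and memory use stays lower. Names and shapes are assumptions for a plain fp16/fp32 linear weight.

```python
# Conceptual sketch only; the real extension hooks into the webui's LoRA code.
import torch

def merge_lora_inplace(weight: torch.Tensor,
                       lora_up: torch.Tensor,    # (out_features, rank)
                       lora_down: torch.Tensor,  # (rank, in_features)
                       alpha: float) -> None:
    """Add the scaled LoRA delta into `weight` in place; no backup copy is kept."""
    rank = lora_up.shape[1]
    scale = alpha / rank
    delta = (lora_up.float() @ lora_down.float()) * scale
    weight.add_(delta.to(weight.dtype))  # one pass; the original weight is not restorable
```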

License: Apache 2.0
original author: Tim Dockhorn @timudk
@rkfg commented Aug 31, 2024

Thank you for your efforts! However, I can't make it work. If I load the Civitai version (11 GB), it's recognized as SD1 and loads the v1 config, then quickly renders a completely gray output. If I load the official fp8 version (17 GB), it loads the Flux config and rendering is slower, but the result is still the same gray image. I suppose it doesn't use T5 and the VAE; it's not clear where they should be put (there are no errors whatsoever in the log). I see there are some environment vars in the code that probably contain paths?

stablediff-runner-cuda  | 2024-08-31T23:13:04.175171516Z Loading model flux1-dev-fp8.safetensors [8e91b68084] (2 out of 3)
stablediff-runner-cuda  | 2024-08-31T23:13:04.178122249Z Loading weights [8e91b68084] from /stablediff-web/models/Stable-diffusion/flux1-dev-fp8.safetensors
stablediff-runner-cuda  | 2024-08-31T23:13:04.196063647Z Creating model from config: /stablediff-web/configs/flux1-inference.yaml
stablediff-runner-cuda  | 2024-08-31T23:13:04.196584786Z state_dict dtype: {'model.diffusion_model.': {torch.float8_e4m3fn: 780}, 'text_encoders.': {torch.float32: 2, torch.float16: 197, torch.float8_e4m3fn: 219}, 'vae.': {torch.float32: 244}}
stablediff-runner-cuda  | 2024-08-31T23:13:05.726793994Z state_dict dtype: {'model.diffusion_model.': {torch.float8_e4m3fn: 780}, 'text_encoders.': {torch.float32: 2, torch.float16: 197, torch.float8_e4m3fn: 219}, 'vae.': {torch.float32: 244}}
stablediff-runner-cuda  | 2024-08-31T23:13:05.747544610Z Applying attention optimization: xformers... done.
stablediff-runner-cuda  | 2024-08-31T23:13:05.768056293Z Model loaded in 1.6s (create model: 0.2s, apply weights to model: 1.4s).
stablediff-runner-cuda  | 2024-08-31T23:13:07.553464063Z 
Total progress: 100%|██████████| 20/20 [00:45<00:00,  2.26s/it]
Total progress: 100%|██████████| 20/20 [00:45<00:00,  2.32s/it]

I have 3090 Ti / Debian testing.

@wkpark (Owner, Author) commented Aug 31, 2024

The 11 GB model contains only the U-Net weights and is not compatible with A1111. And yes, the T5-XXL text encoder is not used at all; it can be activated with the SD3 T5 option that was added recently.

You also need to install the official ae.safetensors VAE and select it in the VAE UI to fix the gray results.
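
As a side note, a quick way to check which components a checkpoint actually ships (diffusion model, text encoders, VAE) is to scan its key prefixes with the safetensors library. This is just a diagnostic sketch; the file path is an example.

```python
# Diagnostic sketch: count tensor keys per top-level prefix in a checkpoint.
from collections import Counter
from safetensors import safe_open

def component_summary(path: str) -> Counter:
    """Count tensor keys per top-level prefix (e.g. 'model', 'text_encoders', 'vae')."""
    counts = Counter()
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            counts[key.split(".", 1)[0]] += 1
    return counts

# An all-in-one checkpoint lists 'model', 'text_encoders' and 'vae' groups;
# a UNet-only dump (like the 11 GB Civitai file) shows only diffusion weights.
print(component_summary("models/Stable-diffusion/flux1-dev-fp8.safetensors"))
```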

@rkfg commented Sep 1, 2024

Ah, interesting! Yes, I bound the VAE and now it works. I also tried enabling T5, but it OOMs on generation; I assume the encoder isn't unloaded as it is in ComfyUI, so they both don't fit in VRAM. Hopefully this can be fixed relatively easily.
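
For reference, the ComfyUI-style behaviour described here amounts to moving the text encoder off the GPU once the prompt embeddings are computed. A minimal sketch of that idea, with stand-in names (`text_encoder`, `tokens`) rather than the webui's actual objects:

```python
# Minimal sketch of encode-then-offload; not the webui's actual code path.
import torch

def encode_then_offload(text_encoder, tokens, device="cuda"):
    """Compute prompt embeddings on the GPU, then move the encoder back to CPU."""
    text_encoder.to(device)
    with torch.no_grad():
        cond = text_encoder(tokens)   # conditioning stays on the GPU
    text_encoder.to("cpu")            # free VRAM for the diffusion transformer
    torch.cuda.empty_cache()
    return cond
```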

@rkfg commented Sep 1, 2024

I can run with T5-XXL enabled, but only with the --medvram switch; otherwise it OOMs even before generation begins. Apparently 24 GB is the new mid now 😩

PS: I think it's worth adding another flag such as --medvram-flux to apply it only to Flux. The slowdown isn't very big: for me it's 10.3 s without the flag for an SDXL generation vs 11.4 s with it. But there's no reason to make that sacrifice if SDXL fits in my VRAM. I'll take a look later; it's basically one line in https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/82a973c04367123ae98bd9abdf80d9eda9b910e2/modules/lowvram.py#L21, I just need to find an attribute that's only present in Flux and not in other models.
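
A rough sketch of what that change could look like; the flag name `--medvram-flux` and the `is_flux` attribute are assumptions, not existing webui options:

```python
# Hypothetical sketch around modules/lowvram.py: apply the medvram path only
# when the loaded model is Flux. Attribute and option names are guesses.
from modules import shared  # webui module; available when run inside the webui

def medvram_applies(sd_model) -> bool:
    """Return True if the medvram code path should be used for this model."""
    if shared.cmd_opts.medvram:
        return True
    # getattr keeps this harmless for SD1/SDXL models that lack the attribute
    is_flux = getattr(sd_model, "is_flux", False)
    return bool(getattr(shared.cmd_opts, "medvram_flux", False) and is_flux)
```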

@Ultron55 commented Sep 4, 2024

I loaded the model, added ae.safetensors to the VAE folder, and selected it in Settings, but it gives this error:

state_dict dtype: {}
loading stable diffusion model: AttributeError
Traceback (most recent call last):
  File "C:\Users\ultron\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\ultron\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\ultron\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\stable-diffusion-webui\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "C:\stable-diffusion-webui\modules\shared_items.py", line 175, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "C:\stable-diffusion-webui\modules\sd_models.py", line 769, in get_sd_model
    load_model()
  File "C:\stable-diffusion-webui\modules\sd_models.py", line 893, in load_model
    use_fp8_unet = has_loadable_weights("F8", "model.diffusion_model.", state_dict=state_dict)
  File "C:\stable-diffusion-webui\modules\sd_models.py", line 448, in has_loadable_weights
    "F8": (torch.float8_e4m3fn,),
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'

@rkfg commented Sep 4, 2024

You probably need to update your torch.
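
A quick way to check whether the installed PyTorch build has the float8 dtype the loader expects (it is only present in relatively recent releases):

```python
# Quick environment check for float8 support.
import torch

print(torch.__version__)
print(hasattr(torch, "float8_e4m3fn"))  # False on older builds -> the AttributeError above
```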

 * check supported dtypes
 * detect non_blocking
 * update autocast() to use non_blocking, target_device and current_dtype
 - check float8 unet dtype to save memory
 - check vae dtype
 * add QkvLinear class for Flux lora
 * devices.dtype_unet, dtype_vae could be considered as storage dtypes
 * use devices.dtype_inference as computational dtype
 * misc fixes to support float8 unet storage
@Ultron55 commented Sep 8, 2024

> You probably need to update your torch

OK, my bad. I was in no hurry to update because after a previous update I had to roll back some of the libraries, since the newer versions weren't supported.

Now I have another problem: when I try to select ae.safetensors as the VAE, I get this:

  File "C:\stable-diffusion-webui\modules\options.py", line 165, in set
    option.onchange()
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 14, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui\modules\initialize_util.py", line 182, in <lambda>
    shared.opts.onchange("sd_vae", wrap_queued_call(lambda: sd_vae.reload_vae_weights()), call=False)
  File "C:\stable-diffusion-webui\modules\sd_vae.py", line 273, in reload_vae_weights
    load_vae(sd_model, vae_file, vae_source)
  File "C:\stable-diffusion-webui\modules\sd_vae.py", line 212, in load_vae
    _load_vae_dict(model, vae_dict_1)
  File "C:\stable-diffusion-webui\modules\sd_vae.py", line 239, in _load_vae_dict
    model.first_stage_model.load_state_dict(vae_dict_1)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AutoencoderKL:
        Missing key(s) in state_dict: "quant_conv.weight", "quant_conv.bias", "post_quant_conv.weight", "post_quant_conv.bias".
        size mismatch for encoder.conv_out.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 512, 3, 3]).
        size mismatch for encoder.conv_out.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([8]).
        size mismatch for decoder.conv_in.weight: copying a param with shape torch.Size([512, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 4, 3, 3]).

If I try to generate as things are now, I get a solid gray image:
(attached image: 315466-1793676599-bear, solid gray output)

@wkpark (Owner, Author) commented Sep 8, 2024

You don't have to use a separate VAE now (checkpoints with a baked-in VAE are correctly supported now).

Also, there is an A1111 bug that prevents changing the VAE. In that case, change the VAE to None first, then change it to the VAE you want.

wkpark self-assigned this on Sep 8, 2024