
FEATURE REQUEST: VAE GGUF Loader #406

@OrsoEric

First, I want to say thanks to @city96 for the GGUF loaders for CLIP and Model. They are a godsend.

I am running a 7900 XTX with ROCm 7.1 on Windows, and I found no way to accelerate FP8: the weights get auto-converted to BF16, which uses 2 bytes per parameter and a lot of VRAM. And ROCm isn't good with VRAM; AMD is improving, but slowly.
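
For a sense of scale, a back-of-the-envelope comparison (a minimal sketch with a hypothetical 6B-parameter model; GGUF Q8_0 stores blocks of 32 int8 weights plus one FP16 scale, about 8.5 bits per parameter):

```python
# Rough weight-memory arithmetic; figures are illustrative, not measured.
params = 6e9                             # hypothetical 6B-parameter model
bf16_gib = params * 2 / 1024**3          # BF16: 2 bytes per parameter
q8_gib = params * (34 / 32) / 1024**3    # GGUF Q8_0: 34 bytes per 32-weight block
print(f"BF16: {bf16_gib:.1f} GiB, Q8_0: {q8_gib:.1f} GiB")
# BF16: 11.2 GiB, Q8_0: 5.9 GiB
```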

With your GGUF loaders my 7900 XTX is able to use INT8 hardware acceleration, which finally brings the memory footprint down and stabilizes workflow execution. At least with Zimage, things work much better using GGUF for both CLIP and the model.


VAE Issues under ROCm

A persistent issue I have with ROCm acceleration is poor performance on VAE decode. I'm under the impression that VAE decode is almost instant under Nvidia CUDA, which is probably why nobody has looked into GGUF quantization for VAE models. As far as I can tell no quants exist; the available VAE files are FP32, FP16, or BF16.
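
(A quick way to verify what a given VAE file ships as; a minimal sketch assuming a local safetensors VAE, where the filename is a placeholder:)

```python
# List the dtypes stored in a VAE checkpoint ("ae.safetensors" is a placeholder).
from safetensors import safe_open

with safe_open("ae.safetensors", framework="pt") as f:
    print({str(f.get_tensor(k).dtype) for k in f.keys()})
# e.g. {'torch.float32'} -- no quantized tensor types
```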

On AMD ROCm, VAE decode is a slow and expensive step that requires a lot of extra VRAM and causes spillage into system RAM.

On ROCm 6.4 I found a workaround; on 7.1, Flux and Zimage Turbo VAE decode work, although they spill into RAM even at a moderate resolution of 1024px.

The Qwen Image VAE for some reason requires much more memory: even at 1024px it fills the 24 GB of VRAM and 64 GB of RAM on my system, then goes into OOM and a segmentation fault.

VAE GGUF Loader

Would it be possible to have a VAE GGUF loader that feeds the VAE Encode and VAE Decode nodes, along with GGUF quantization of the VAE models themselves?
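
As a rough idea of the load path, here is a minimal sketch (not the actual ComfyUI-GGUF code; it assumes the gguf-py package and ComfyUI's comfy.sd.VAE constructor, the class name VAELoaderGGUF is hypothetical, and it simply dequantizes everything back to FP16 at load time):

```python
# Hypothetical ComfyUI custom node (names are illustrative): load a
# GGUF-quantized VAE by dequantizing its tensors into a plain state dict.
import torch
from gguf import GGUFReader
from gguf.quants import dequantize
import comfy.sd

class VAELoaderGGUF:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"vae_path": ("STRING", {"default": "vae.gguf"})}}

    RETURN_TYPES = ("VAE",)
    FUNCTION = "load_vae"
    CATEGORY = "loaders"

    def load_vae(self, vae_path):
        reader = GGUFReader(vae_path)
        sd = {}
        for t in reader.tensors:
            # Dequantize back to float; GGUF stores dimensions in reverse order.
            data = dequantize(t.data, t.tensor_type)
            shape = tuple(int(d) for d in reversed(t.shape))
            sd[t.name] = torch.from_numpy(data.copy()).reshape(shape).to(torch.float16)
        # comfy.sd.VAE(sd=...) is how ComfyUI's built-in VAELoader builds a VAE.
        return (comfy.sd.VAE(sd=sd),)
```

On its own this would only shrink the file and the initial load; getting actual INT8 compute during decode would additionally need quantized kernels for the VAE's convolution layers, presumably along the lines of the on-the-fly dequantization the existing CLIP and Model loaders already do.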
