Fix: Add memory precheck before VAE decode to prevent crash #12109
Conversation
We have made a lot of VAE VRAM fixes recently, so pre-emptively tiling is going to false positive without a fairly major audit of the VAE estimates. Some of them (like LTX) have non-trivial, non-linear VRAM consumption patterns. The comfy-aimdo project is trying to take things the other way and control the allocator under VRAM pressure, to get us away from the maintenance of accurate model estimates (#11845). No AMD support yet though. Is there a path forward on PyTorch being able to allocate with a clean exception on OOM? Which VAEs are the worst offenders, and how much VRAM are you generally trying to support?
Not the OP, but the most VRAM is consumed by the video VAEs; all the new image VAEs are very efficient by default. On another note: even though you recently reduced the LTX 2 VAE VRAM consumption by 1/3, ComfyUI still offloads the whole model before decoding with the LTX 2 VAE (likely because the VAE memory estimation wasn't changed). This is critical for 16GB VRAM users, since the model and TEs cannot be kept within 32GB of RAM and the model spills onto the pagefile. A custom node that sets the amount of model to offload before decoding would be really nice to have, since the LTX 2 VAE takes only about 3GB of VRAM when decoding with tiled decoding, yet ComfyUI still offloads the whole model into RAM and the pagefile.
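Such a node could probably be a thin wrapper over ComfyUI's model manager. Below is a minimal sketch of the idea, assuming the internal helpers `comfy.model_management.get_torch_device()`, `comfy.model_management.free_memory(bytes, device)` and `VAE.decode_tiled(samples, tile_x, tile_y, overlap)` behave as on current master; these are internal APIs, and the node name, inputs, and tile-size handling here are made up for illustration:

```python
import comfy.model_management as mm

class FreeVRAMThenTiledDecode:
    """Hypothetical custom node: free a chosen amount of VRAM, then tiled-decode."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "samples": ("LATENT",),
            "vae": ("VAE",),
            "free_gb": ("FLOAT", {"default": 4.0, "min": 0.0, "max": 64.0, "step": 0.5}),
            "tile_size": ("INT", {"default": 64, "min": 16, "max": 512, "step": 8}),
        }}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "decode"
    CATEGORY = "latent"

    def decode(self, samples, vae, free_gb, tile_size):
        device = mm.get_torch_device()
        # Ask the model manager to offload only enough to expose `free_gb` of VRAM,
        # instead of unloading the whole diffusion model before the decode.
        mm.free_memory(free_gb * (1024 ** 3), device)
        # tile_x/tile_y are assumed to be in latent units here; adjust for your VAE.
        images = vae.decode_tiled(samples["samples"], tile_x=tile_size,
                                  tile_y=tile_size, overlap=tile_size // 4)
        return (images,)

NODE_CLASS_MAPPINGS = {"FreeVRAMThenTiledDecode": FreeVRAMThenTiledDecode}
```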
#NOTE: We don't know what tensors were allocated to stack variables at the time of the
#exception and the exception itself refs them all until we get out of this except block.
#So we just set a flag for tiler fallback so that tensor gc can happen once the
#exception is fully off the books.
I think you need to set `do_tile = True` here to actually do the tiled VAE retry.
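For context, the pattern those comments describe (set a flag inside the except block, retry outside it) would look roughly like the sketch below; `decode_full` and `decode_tiled_fallback` are hypothetical stand-ins, not the PR's actual functions:

```python
import torch

def vae_decode_with_fallback(vae, latents):
    do_tile = False
    try:
        return decode_full(vae, latents)  # hypothetical non-tiled decode path
    except torch.cuda.OutOfMemoryError:
        # The live exception still references every tensor from decode_full's
        # frame, so nothing can be freed while we are inside this block.
        # Record the fallback and deal with it once the exception is gone.
        do_tile = True
    if do_tile:
        # The exception is fully off the books here, so the failed attempt's
        # tensors can be garbage collected before we allocate again.
        torch.cuda.empty_cache()
        return decode_tiled_fallback(vae, latents)  # hypothetical tiled path
```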
I think this patch would be fairly helpful on AMD especially. Some VAE VRAM estimates with AMD seem to be kind of bonkers; the Flux VAE requests 11.6GB of VRAM to decode a 1 megapixel image and somehow I don't think it actually uses anywhere near that much.
EDIT: I just did a quick memory dump after a VAE decode. Torch maximum memory usage was about 6.6GB, and that would probably include the loaded VAE model and anything else that might be in VRAM. I'm not sure how to accurately tell what the VAE decode itself used, but clearly not 11.6GB.
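For anyone who wants to reproduce that kind of measurement, the peak usage of a single decode can be isolated with PyTorch's memory stats (a sketch; `vae.decode(latents)` stands in for whatever decode call is being profiled):

```python
import torch

torch.cuda.synchronize()
torch.cuda.reset_peak_memory_stats()
baseline = torch.cuda.memory_allocated()  # VAE weights + anything already resident

images = vae.decode(latents)              # the decode being measured

torch.cuda.synchronize()
peak = torch.cuda.max_memory_allocated()
print(f"already resident:   {baseline / 2**30:.2f} GiB")
print(f"peak during decode: {peak / 2**30:.2f} GiB")
print(f"decode overhead:    {(peak - baseline) / 2**30:.2f} GiB")
```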
Adding a memory precheck before VAE decode to prevent Windows `0xC0000005` (access violation) crashes, particularly on devices with limited VRAM.

Problem

VAE decode loading could trigger `0xC0000005` (access violation) crashes when model offloading is unavailable (`--highvram`, `--gpu-only`, or insufficient CPU RAM). The existing OOM exception handling couldn't catch these crashes because they occur at the driver/system level before PyTorch can raise an exception.

Solution

Added a proactive memory check (`use_tiled_vae_decode()`) that evaluates memory conditions before attempting decode:
- whether offloading is disabled (`--highvram`, `--gpu-only` flags)
- whether smart memory is disabled (`--disable-smart-memory`)

If any condition fails, switch to tiled VAE decode preemptively.
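As a rough illustration of what such a precheck can look like, here is a sketch (not the PR's actual implementation; the parameter names, the `can_offload` flag, and the 1.25x safety margin are made up for the example):

```python
import torch

def use_tiled_vae_decode(estimated_bytes, device, can_offload=True, safety_margin=1.25):
    """Return True if the decode should fall back to tiling pre-emptively.

    `estimated_bytes` is the VAE's memory estimate for this input; `can_offload`
    is False when offloading cannot make room (--highvram, --gpu-only, or too
    little free CPU RAM). Names and the margin are illustrative only.
    """
    if device.type != "cuda":
        return False  # non-CUDA paths keep their existing behaviour
    free_vram, _total = torch.cuda.mem_get_info(device)
    if estimated_bytes * safety_margin <= free_vram:
        return False  # enough headroom for a full decode
    # Not enough free VRAM: a full decode is only safe if something can still be
    # offloaded to make room; otherwise tile pre-emptively instead of crashing.
    return not can_offload
```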