Conversation

@tvukovic-amd
Adds a memory precheck before VAE decode to prevent Windows 0xC0000005 access violation crashes, particularly on devices with limited VRAM.

Problem

VAE decode could trigger 0xC0000005 (access violation) crashes when:

  • GPU memory was insufficient for full decode
  • Models couldn't be offloaded to CPU (due to --highvram, --gpu-only, or insufficient CPU RAM)

The existing OOM exception handling couldn't catch these crashes because they occur at the driver/system level before PyTorch can raise an exception.

Solution

Added a proactive memory check (use_tiled_vae_decode()) that evaluates memory conditions before attempting decode:

  1. Check if GPU has enough space for full decode (with reserves)
  2. Check if models can be offloaded to CPU (respects --highvram, --gpu-only flags)
  3. Check if CPU has enough space to receive offloaded models (respects --disable-smart-memory)

If any condition fails, switch to tiled VAE decode preemptively.
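For illustration, a minimal sketch of the shape such a precheck could take. The helper names (estimate_decode_vram, get_free_vram, can_offload_to_cpu, cpu_free_ram, offload_size) and the reserve constant are hypothetical stand-ins, not the functions used in the PR:

```python
# Hypothetical sketch of the precheck described above; helper names are
# illustrative stand-ins, not the PR's actual code.
def use_tiled_vae_decode(vae, samples, device):
    required = estimate_decode_vram(vae, samples)   # estimated VRAM for a full decode
    reserve = 1 * 1024**3                           # headroom kept free on the GPU

    # 1. Enough free VRAM for a full decode (plus reserve)?
    if get_free_vram(device) >= required + reserve:
        return False                                # full decode is safe

    # 2. Can other models be offloaded at all?
    #    (--highvram / --gpu-only pin them to the GPU)
    if not can_offload_to_cpu():
        return True                                 # fall back to tiled decode

    # 3. Does system RAM have room to receive the offloaded models?
    #    (--disable-smart-memory affects this)
    if cpu_free_ram() < offload_size():
        return True

    return False
```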

@rattus128
Contributor

We have made a lot of VAE VRAM fixes recently so pre-emptively tiling is going to false positive without a fairly major audit of the VAE estimates. Some of them (like LTX) have non-trivial non-linear VRAM consumption patterns.

The comfy-aimdo project is trying to take things the other way and control the allocator under VRAM pressure to get us away from the maintenance of accurate model estimates.

#11845
https://pypi.org/project/comfy-aimdo/

No AMD support yet though.

Is there a path forward on pytorch being able to allocate with clean exception on OOM?

Which VAEs are the worst offenders and how much VRAM are you generally trying to support?

@MeiYi-dev commented Jan 29, 2026

> Which VAEs are the worst offenders and how much VRAM are you generally trying to support?

Not the OP, but the most VRAM is consumed by the video VAEs; all the new image VAEs are very efficient by default. On another note: even though you recently reduced the LTX 2 VAE's VRAM consumption by a third, ComfyUI still offloads the whole model before decoding with the LTX 2 VAE (likely because the VAE memory estimation wasn't updated). This is critical for 16 GB VRAM users, since the model and text encoders cannot be kept within 32 GB of RAM and the model spills onto the pagefile. A custom node that sets how much of the model to offload before decoding would be very nice to have, since the LTX 2 VAE takes only about 3 GB of VRAM with tiled decoding, yet ComfyUI still offloads the whole model into RAM and the pagefile.
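Purely as a sketch of the kind of node being asked for, assuming ComfyUI's comfy.model_management.free_memory(bytes, device) helper and the standard custom-node class conventions; the node itself is hypothetical and does not exist today:

```python
import comfy.model_management as mm

# Hypothetical custom node: free only a requested amount of VRAM before a VAE
# decode, instead of relying on ComfyUI offloading the entire diffusion model
# into RAM/pagefile. Names and behaviour here are illustrative only.
class PartialOffloadBeforeDecode:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "samples": ("LATENT",),
            "free_gb": ("FLOAT", {"default": 3.0, "min": 0.0, "max": 64.0}),
        }}

    RETURN_TYPES = ("LATENT",)
    FUNCTION = "run"
    CATEGORY = "latent"

    def run(self, samples, free_gb):
        device = mm.get_torch_device()
        # Ask the model manager to free roughly what the VAE decode will need.
        mm.free_memory(free_gb * (1024 ** 3), device)
        return (samples,)
```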

# NOTE: We don't know what tensors were allocated to stack variables at the time of the
# exception, and the exception itself refs them all until we get out of this except block.
# So we just set a flag for tiler fallback so that tensor gc can happen once the
# exception is fully off the books.
@asagi4 (Contributor) commented Jan 29, 2026
I think you need to set do_tile = True here to actually do the tiled VAE retry.
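For illustration, a minimal sketch of the flag-based retry the NOTE above describes, including the do_tile assignment being pointed out here; decode_full/decode_tiled, vae, and samples are placeholders, not the PR's actual names:

```python
import torch

# Sketch of the flag-based fallback: don't retry inside the except block, where
# the in-flight exception still references the partially allocated tensors; just
# record the decision and retry after the block, once those tensors can be freed.
do_tile = False
try:
    images = vae.decode_full(samples)      # placeholder for the full decode path
except torch.cuda.OutOfMemoryError:
    do_tile = True                         # the missing assignment noted above

if do_tile:
    torch.cuda.empty_cache()               # dead tensors are now collectable
    images = vae.decode_tiled(samples)     # placeholder for the tiled fallback
```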

I think this patch would be fairly helpful on AMD especially. Some VAE VRAM estimates with AMD seem to be kind of bonkers; the Flux VAE requests 11.6GB of VRAM to decode a 1 megapixel image and somehow I don't think it actually uses anywhere near that much.

EDIT: I just did a quick memory dump after a VAE decode. Torch maximum memory usage was about 6.6GB, and that would probably include the loaded VAE model and anything else that might be in VRAM. I'm not sure how to accurately tell what the actual VAE decoding used, but clearly not 11.6GB.
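For what it's worth, a sketch of one way to bracket just the decode and read the peak (PyTorch keeps the torch.cuda namespace under ROCm, so this applies on AMD too); vae.decode(latent) is a placeholder for whatever call is being measured:

```python
import torch

device = torch.device("cuda")
torch.cuda.reset_peak_memory_stats(device)   # reset *after* the model is loaded

images = vae.decode(latent)                  # placeholder for the decode being measured
torch.cuda.synchronize(device)

peak_alloc = torch.cuda.max_memory_allocated(device)    # tensors actually allocated
peak_reserved = torch.cuda.max_memory_reserved(device)  # including allocator cache
print(f"peak allocated: {peak_alloc / 2**30:.2f} GiB, "
      f"peak reserved: {peak_reserved / 2**30:.2f} GiB")
```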
