
@uael (Contributor) commented Dec 9, 2025

Description
This PR reverts #7842 and proposes a proper fix: explicitly retaining the resources used by command buffers.

When retain_command_buffer_references is false (the new default after the revert), Metal's command buffers don't automatically retain the resources they reference.
Previously, this was "fixed" by setting retain_command_buffer_references to true, but since command buffers are pooled and reused, this caused unbounded memory growth: every resource ever touched by a command buffer stayed alive indefinitely.

This commit implements proper resource lifetime tracking:

  • Track the buffers and textures used during command encoding
  • Transfer those references to the CommandBuffer when encoding completes
  • Release the references when the command buffer is recycled
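The three steps above can be sketched in Rust as follows. This is a minimal illustration assuming resources are reference-counted with `Arc`; the types `Buffer`, `Texture`, `RetainedResources`, `CommandEncoder`, `CommandBuffer` and the methods `set_vertex_buffer`, `bind_texture`, `end_encoding`, `recycle` are hypothetical stand-ins, not the actual wgpu-hal Metal API or the code in this PR.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for HAL resources (a real backend would wrap MTLBuffer / MTLTexture).
struct Buffer;
struct Texture;

/// Strong references collected while encoding (hypothetical).
#[derive(Default)]
struct RetainedResources {
    buffers: Vec<Arc<Buffer>>,
    textures: Vec<Arc<Texture>>,
}

/// Hypothetical encoder that records every resource it binds.
#[derive(Default)]
struct CommandEncoder {
    retained: RetainedResources,
}

impl CommandEncoder {
    /// Step 1: track a buffer used during encoding.
    fn set_vertex_buffer(&mut self, buffer: &Arc<Buffer>) {
        // ... the actual Metal command would be encoded here ...
        self.retained.buffers.push(Arc::clone(buffer));
    }

    /// Step 1: track a texture used during encoding.
    fn bind_texture(&mut self, texture: &Arc<Texture>) {
        self.retained.textures.push(Arc::clone(texture));
    }

    /// Step 2: move the collected references into the finished command buffer,
    /// leaving the encoder with an empty set for its next use.
    fn end_encoding(&mut self) -> CommandBuffer {
        CommandBuffer {
            retained: std::mem::take(&mut self.retained),
        }
    }
}

/// Hypothetical command buffer that keeps its resources alive until recycled.
struct CommandBuffer {
    retained: RetainedResources,
}

impl CommandBuffer {
    /// Step 3: called once the GPU has finished and the buffer returns to the pool.
    fn recycle(&mut self) {
        self.retained.buffers.clear();
        self.retained.textures.clear();
    }
}

fn main() {
    let vb = Arc::new(Buffer);
    let tex = Arc::new(Texture);

    let mut encoder = CommandEncoder::default();
    encoder.set_vertex_buffer(&vb);
    encoder.bind_texture(&tex);
    let mut cmd_buf = encoder.end_encoding();

    // Dropping the user's handle does not free the buffer: the command buffer still holds it.
    let weak_vb = Arc::downgrade(&vb);
    drop(vb);
    assert!(weak_vb.upgrade().is_some());

    // Once the command buffer is recycled, its references are released.
    cmd_buf.recycle();
    assert!(weak_vb.upgrade().is_none());
    assert_eq!(Arc::strong_count(&tex), 1);
    println!("ok");
}
```

Because `recycle` clears the retained lists, a pooled command buffer holds references only for the duration of one submission instead of accumulating every resource it has ever touched.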

Testing
Tested manually.

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

uael added 2 commits December 9, 2025 08:31
@ErichDonGubler (Member)

Tentatively assigning to @andyleiserson, who handled #7842.

@andyleiserson (Contributor)

"since command buffers are pooled and reused, this caused unbounded memory growth - every resource ever touched by a command buffer stayed alive indefinitely"

This is obviously bad, but I'm surprised we hadn't noticed. Do you have a test case?

I don't think it should be necessary to add new tracking for used textures and buffers. We already have a "tracker" that keeps track of them. The problem is that there are corner cases where we don't get it quite right. Vulkan is also sensitive to this issue and has no ability to track references internally, which has motivated fixing a number of these corner cases. When we turn the retain references flag back off, Metal will also benefit from those fixes.

#7854 (comment) describes one case where we suspected Metal has a problem that differs from the one on Vulkan. It would be worth running the test case from the issue that comment links to (#7816) against whatever change we're considering, to make sure the issue doesn't come back.

Other relevant changes are #8418 and #8567.

@uael (Contributor, Author) commented Dec 10, 2025

To give a bit more context, we got this crash on v25:

-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:3459: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x1220c7a00 (label: (wgpu internal) Signal):
<MTLToolsObject: 0x600002b02450> -> <MTLSimBuffer: 0x60000382c300>
    label = Render Pass Vertex Buffer 
    length = 36 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    purgeableState = MTLPurgeableStateNonVolatile'
CoreSimulator 1048 - Device: iPad mini (A17 Pro) (395B57A4-D87A-4845-90CB-168FA4CE7140) - Runtime: iOS 26.0 (23A339) - DeviceType: iPad mini (A17 Pro)
Can't show file for stack frame : <DBGLLDBStackFrame: 0x827be4000> - stackNumber:12 - name:core::ptr::drop_in_place$LT$wgpu_hal..metal..Buffer$GT$::h4d4355beee563fc7 [inlined]. The file path does not exist on the file system: /rustc/1159e78c4747b02ef996e55082b704c09b970588/library/core/src/ptr/mod.rs

As an attempt to fix it, I blindly back-ported #7842, which resulted in "leaking" resources as explained above. Maybe I was just missing #8220?
