fix: timing issue in S3 locker to prevent zombie locks #812

itslenny · 2025-12-10T19:02:25Z

What kind of change does this PR introduce?

Bug fix

What is the current behavior?

There is a potential race condition in the S3 locker that can create zombie locks if the lock is released during the renewal operation (between the get and the put).

What is the new behavior?

After the S3 get operation completes, the renewal checks that the lock is still held and exits early if it was already unlocked.
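A minimal sketch of the race and the check described above, for illustration only. `FakeStore`, `LockRecord`, and `simulateUnlockDuringRenewal` are hypothetical names, and the in-memory store stands in for S3; this is not the PR's actual code.

```typescript
// Hypothetical in-memory stand-in for the S3 bucket (assumption, not the real client).
type LockRecord = { id: string; expiresAt: number }

class FakeStore {
  private objects = new Map<string, LockRecord>()

  async get(key: string): Promise<LockRecord | undefined> {
    // Read first, then delay: the response is "in flight" while other code runs.
    const value = this.objects.get(key)
    await new Promise<void>((resolve) => setTimeout(resolve, 5))
    return value
  }

  async put(key: string, value: LockRecord): Promise<void> {
    this.objects.set(key, value)
  }

  async delete(key: string): Promise<void> {
    this.objects.delete(key)
  }

  has(key: string): boolean {
    return this.objects.has(key)
  }
}

async function renewLock(
  store: FakeStore,
  key: string,
  id: string,
  checkLocked: () => boolean,
  ttlMs = 30_000
): Promise<boolean> {
  const current = await store.get(key)
  if (!current || current.id !== id) return false
  // The lock may have been released while the get was in flight.
  if (!checkLocked()) return false
  await store.put(key, { id, expiresAt: Date.now() + ttlMs })
  return true
}

// Returns true when a zombie lock object is left behind after unlock.
async function simulateUnlockDuringRenewal(withCheck: boolean): Promise<boolean> {
  const store = new FakeStore()
  await store.put('lock', { id: 'a', expiresAt: Date.now() + 1_000 })
  let locked = true
  const renewal = renewLock(store, 'lock', 'a', withCheck ? () => locked : () => true)
  // Unlock while the renewal's get is still pending.
  locked = false
  await store.delete('lock')
  await renewal
  return store.has('lock')
}
```

With the check disabled the renewal writes the object back after it was deleted; with the check enabled it exits before the put.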

Additional context

This issue was causing intermittent test failures that surfaced in the "double unlock" test but were actually caused by the abort test leaving behind zombie objects. Running the S3 locker tests in a loop, I was able to reproduce this semi-consistently. It usually took only a few test runs, but sometimes it went upwards of 50 runs without a failure.

The newly added test expands on the "handle abort signal" test and consistently fails without this fix.

coveralls · 2025-12-10T19:11:24Z

Pull Request Test Coverage Report for Build 20110090962

Details

8 of 8 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.03%) to 75.96%

Totals
Change from base Build 20069848970:	0.03%
Covered Lines:	25433
Relevant Lines:	33205

💛 - Coveralls

fenos · 2025-12-10T19:45:12Z

src/storage/protocols/tus/s3-locker.ts

  }

-  async renewLock(id: string): Promise<boolean> {
+  async renewLock(id: string, checkLocked: () => boolean): Promise<boolean> {


Any specific reason why we need to pass a function rather than checking this.isLocked directly?

itslenny · Dec 12, 2025 (edited)

This is in renewLock(), which is part of S3Locker, but isLocked is part of S3Lock. So when S3Lock calls this.locker.renewLock(), it passes in a function that lets S3Locker check the lock state of S3Lock.

Alternatives would be:

- move the lock states to S3Locker with a map<string, boolean> (id -> isLocked) and share them that way
- pass in the whole S3Lock instance (instead of checkLocked) and expose isLocked from S3Lock (currently private) so the locker can check the lock directly
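The callback wiring described in this reply can be sketched roughly as follows. The class and method names mirror the thread, but the bodies are hypothetical and the S3 calls are elided; treat it as an illustration of the design, not the project's implementation.

```typescript
// Sketch only: bodies are assumptions, S3 get/put are elided.
class S3Locker {
  async renewLock(id: string, checkLocked: () => boolean): Promise<boolean> {
    // ... S3 get for the lock object identified by `id` would happen here ...
    if (!checkLocked()) return false
    // ... S3 put with the refreshed expiration would happen here ...
    return true
  }
}

class S3Lock {
  private isLocked = false

  constructor(
    private readonly id: string,
    private readonly locker: S3Locker
  ) {}

  lock(): void {
    this.isLocked = true
  }

  unlock(): void {
    this.isLocked = false
  }

  renew(): Promise<boolean> {
    // The closure gives the locker a read-only view of private state,
    // so isLocked does not have to become public.
    return this.locker.renewLock(this.id, () => this.isLocked)
  }
}
```

Passing a closure keeps isLocked private to S3Lock, at the cost of a slightly wider renewLock() signature.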

fenos · Dec 15, 2025 (edited)

I think this fix wouldn't work in a distributed environment; it would only work in a single-instance setup.
I think the proper fix here is to use the If-Match directive in the PutObject call after retrieving the current lock, and only write the file if the ETag matches the current lock. For example:

// Get current lock to verify ownership
const response = await this.s3Client.send(
  new GetObjectCommand({
    Bucket: this.bucket,
    Key: lockKey,
  })
)

// Update the lock with new expiration
await this.s3Client.send(
  new PutObjectCommand({
    Bucket: this.bucket,
    Key: lockKey,
    IfMatch: response.ETag, // <--- here is the trick
    Body: JSON.stringify(updatedLock),
    ContentType: 'application/json',
    Metadata: {
      lockId: id,
      expiresAt: updatedLock.expiresAt.toString(),
    },
  })
)

This way it won't create the zombie lock if there isn't a file with the same ETag, and it should work in a distributed environment.

fenos · Dec 15, 2025

We might also want to add a check if (id === currentLock.id) before sending the put.
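To make the compare-and-swap idea concrete, here is a rough in-memory sketch of a renewal guarded by an ETag precondition plus the ownership check. `FakeBucket`, `renewWithIfMatch`, and the `onAfterGet` hook are hypothetical; the counter-based ETag is a stand-in for real S3 ETags, and a failed precondition is modelled as a `false` return, whereas S3 would respond with an error.

```typescript
// Hypothetical stand-in for an S3 bucket with If-Match semantics (assumption).
type Stored = { body: string; etag: string }

class FakeBucket {
  private objects = new Map<string, Stored>()
  private counter = 0

  get(key: string): Stored | undefined {
    return this.objects.get(key)
  }

  // Writes only if no precondition is given or the current ETag matches it.
  put(key: string, body: string, ifMatch?: string): boolean {
    const existing = this.objects.get(key)
    if (ifMatch !== undefined && existing?.etag !== ifMatch) return false
    this.counter += 1
    this.objects.set(key, { body, etag: `v${this.counter}` })
    return true
  }

  delete(key: string): void {
    this.objects.delete(key)
  }
}

function renewWithIfMatch(
  bucket: FakeBucket,
  key: string,
  id: string,
  onAfterGet?: () => void // test hook: lets another actor run between get and put
): boolean {
  const current = bucket.get(key)
  if (!current) return false
  // Ownership check: only the holder of this lock id may renew it.
  const currentLock = JSON.parse(current.body) as { id: string }
  if (currentLock.id !== id) return false
  onAfterGet?.()
  const updated = JSON.stringify({ id, expiresAt: Date.now() + 30_000 })
  // The write is rejected if the object changed or vanished since the get.
  return bucket.put(key, updated, current.etag)
}
```

Because the precondition is evaluated by the store rather than by local process state, the same guard applies no matter which instance deleted or replaced the lock object.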

fix: timing issue in S3 locker to prevent zombie locks

1bf4fb0

fenos reviewed Dec 11, 2025


itslenny requested a review from fenos December 12, 2025 21:07
