Description
Summary
I am the maintainer of graphile_worker_rs, a Rust rewrite of graphile/worker.
A user recently reported a bug, and I think this exposes a race condition in the keyed scheduling path (add_jobs / add_job).
Original report: leo91000/graphile_worker_rs#378
Steps to reproduce
- Start PostgreSQL and initialize Graphile Worker schema.
- Run a worker with concurrency 10 and two tasks:
  - printer: very short task (e.g. sleeps ~2ms) so keyed jobs frequently become locked/running.
  - scheduler: loops many times (e.g. 100), each time calling addJob("printer", { key }, { jobKey: key, jobKeyMode: "preserve_run_at" }) with key chosen from a small keyspace (e.g. 10 keys).
- Enqueue multiple scheduler jobs concurrently (e.g. 4).
- Let it run for ~30-60 seconds.
This creates high contention on the same jobKey while some conflicting rows are locked.
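The two tasks from the steps above could look roughly like the sketch below. Only the addJob call shape comes from this report; KEYS, ITERATIONS, sleep, and makeTaskList are illustrative names, and the wiring in the trailing comment assumes the usual run({ connectionString, concurrency, taskList }) entry point of graphile-worker.

```javascript
// Hypothetical reproduction tasks; names other than the addJob arguments are illustrative.
const KEYS = Array.from({ length: 10 }, (_, i) => `key-${i}`); // small keyspace
const ITERATIONS = 100; // addJob calls per scheduler job

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makeTaskList() {
  return {
    // Very short task, so keyed jobs are frequently locked/running.
    printer: async (_payload) => {
      await sleep(2);
    },

    // Repeatedly schedules keyed "printer" jobs against the small keyspace.
    scheduler: async (_payload, helpers) => {
      for (let i = 0; i < ITERATIONS; i++) {
        const key = KEYS[Math.floor(Math.random() * KEYS.length)];
        const job = await helpers.addJob(
          "printer",
          { key },
          { jobKey: key, jobKeyMode: "preserve_run_at" },
        );
        // Fail loudly if the scheduling call did not yield a job row.
        if (!job || job.id == null) {
          throw new Error(`addJob returned no row for jobKey ${key}`);
        }
      }
    },
  };
}

// Assumed wiring (not executed here):
//   const runner = await run({ connectionString, concurrency: 10, taskList: makeTaskList() });
//   for (let i = 0; i < 4; i++) await runner.addJob("scheduler", {});
```

The explicit check inside scheduler is there so that a missing row fails the job instead of passing silently in a lenient client.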
Expected results
- addJob(...) should always return a valid job row.
- For replace/preserve_run_at, scheduling should not occasionally return “no row” under contention.
Actual results
Under contention, graphile_worker.add_jobs(...) can return no row for a spec because of:
ON CONFLICT (key) DO UPDATE ... WHERE jobs.locked_at IS NULL
When that WHERE condition is false, the conflict path does nothing and returns nothing for that spec.
Then add_job(...) (which selects from add_jobs(...) LIMIT 1) can return a null/empty row.
In strict clients this surfaces clearly (example from Rust/sqlx):
error occurred while decoding column "id": unexpected null; try decoding as an Option
In JS, this can manifest as rows[0] being missing from the add_job(...) result in edge cases.
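To make the conflict path concrete, here is a toy in-memory model of the upsert. It is not Postgres and not the actual add_jobs implementation. It only mirrors the rule that a conflicting row whose WHERE condition is false is neither updated nor returned. The function name upsertKeyedJob and the row fields are hypothetical.

```javascript
// Toy model of: INSERT ... ON CONFLICT (key) DO UPDATE ... WHERE locked_at IS NULL RETURNING *
// `table` is a Map from job key to row; the returned array stands in for the RETURNING rows.
function upsertKeyedJob(table, spec) {
  const existing = table.get(spec.key);
  if (!existing) {
    // No conflict: the insert path returns the new row.
    const row = { id: table.size + 1, key: spec.key, payload: spec.payload, lockedAt: null };
    table.set(spec.key, row);
    return [row];
  }
  if (existing.lockedAt === null) {
    // Conflict with an unlocked row: the update path returns the updated row.
    existing.payload = spec.payload;
    return [existing];
  }
  // Conflict with a locked row: WHERE is false, nothing is updated, nothing is returned.
  return [];
}

const jobs = new Map();
const first = upsertKeyedJob(jobs, { key: "a", payload: 1 });
jobs.get("a").lockedAt = new Date(); // a worker picks the job up
const second = upsertKeyedJob(jobs, { key: "a", payload: 2 });
console.log(first.length, second.length); // prints "1 0"
```

A caller that reads the first returned row unconditionally gets undefined in the second case, which corresponds to the null id seen by the strict client.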
Additional context
- Reproduced from issue https://github.com/leo91000/graphile_worker_rs/issues/378
- Reproduced against upstream SQL shape in sql/000018.sql:
  - add_job delegates to add_jobs (select * into v_job from ...add_jobs(...))
  - add_jobs uses ON CONFLICT (key) DO UPDATE ... WHERE jobs.locked_at is null
- Current repo version checked: 0.17.0-rc.0 (from package.json)
- PostgreSQL: reproduced on Docker Postgres (15/16 class behavior)