Conversation
Now that DAP-16 allows for job reacquisition / idempotency at the HTTP layer, a job in AwaitingRequest has to be continuable directly, without error; e.g., the database needs to include it in `acquire_incomplete_aggregation_jobs`. While I was at it, I wrote some more documentation on the state transitions in the `AggregationJobState` database model.

These changes unblock further job state updates for the Leader, but do not go so far as to remove the no-longer-relevant AwaitingRequest state; that removal will be done separately in #4305.

Resolves #4322
aggregator_core/src/datastore.rs
Outdated
| -- DAP (§4.6.3 [dap-16]) describes the continuation phase where the Leader sends
| -- AggregationJobContinueReq messages to advance preparation. Helper jobs in
| -- AWAITING_REQUEST state represent work that has processed one round but needs
| -- additional Leader requests to complete. These jobs must be acquirable so the
| -- Helper can respond to incoming continuation requests.
| WHERE aggregation_jobs.state IN ('ACTIVE', 'AWAITING_REQUEST')
I don't think this change makes sense for helper tasks. The acquire_incomplete_aggregation_jobs() method is only used in the AggregationJobDriver, when fetching leases on aggregation jobs to process. When it steps an aggregation job, it will eventually set the aggregation job state to AwaitingRequest, in either step_aggregation_job_helper_init() or step_aggregation_job_helper_continue(). I'm not sure what these routines would do if they run on the same set of report aggregations, but assuming they don't return an error and eventually mark the job as abandoned, the same aggregation jobs would still be eligible to be picked up by this query again, which would impair the liveness of aggregation jobs in the same task.
This method does not directly affect how the HTTP route handler works, so we shouldn't need to change it at all for request idempotency reasons. Rather, that code path would fetch an individual aggregation job by its identifiers.
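To make the contrast concrete, here is a rough sketch of the two access patterns; the table and column names are illustrative, not necessarily the exact Janus schema:

```sql
-- Driver path (acquire_incomplete_aggregation_jobs, roughly): lease a batch of jobs
-- that still need the driver's attention, based on their state and lease expiry.
SELECT aggregation_job_id
FROM aggregation_jobs
WHERE state = 'ACTIVE'
  AND lease_expiry <= NOW()
LIMIT $1;

-- HTTP handler path: look up the single job named by the request, whatever its state.
SELECT state
FROM aggregation_jobs
JOIN tasks ON tasks.id = aggregation_jobs.task_id
WHERE tasks.task_id = $1
  AND aggregation_jobs.aggregation_job_id = $2;
```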
| /// phase (§4.6.3 [dap-16]). The Helper has sent an AggregationJobResp but some reports
| /// remain in the Continued state awaiting the next AggregationJobContinueReq.
| ///
| /// Note: This state will be removed as part of the DAP-16 state model redesign (#4305).
I think we will need two states like Active and AwaitingRequest for at least some of our modes of operation, particularly when operating as the helper and doing asynchronous processing. We need to hop back and forth between them as we wait to do computationally expensive work in the aggregation job driver or wait to get polled after finishing that work, until we finish all VDAF rounds. Note that we also need to do this when handling the aggregation job initialization request, not just the continuation request. The aggregation job driver needs some way of efficiently finding jobs that are ready for it to process, and we currently achieve that with the aggregation_jobs_state_and_lease_expiry partial index.
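For context, that partial index makes the driver's scan cheap by only indexing rows in acquirable states. The sketch below is illustrative; the exact column list and predicate in Janus's schema may differ:

```sql
-- Illustrative sketch of a partial index like aggregation_jobs_state_and_lease_expiry:
-- only rows the driver may acquire are indexed, so terminal-state jobs never bloat it
-- and the acquire query can find ready work without a full table scan.
CREATE INDEX aggregation_jobs_state_and_lease_expiry
    ON aggregation_jobs (state, lease_expiry)
    WHERE state = 'ACTIVE';
```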
@divergentdave has corrected my interpretation of the state machine.
| /// Job is being actively processed. This is the initial state for both Leader and Helper
| /// aggregation jobs. Corresponds to the initialization phase in DAP (§4.6.2 [dap-16]).
This isn't always the initial state. That only holds for Leader aggregation jobs and for Helper aggregation jobs when using the asynchronous aggregation mode. For Helper aggregation jobs when using the synchronous aggregation mode, the aggregation job starts in either AwaitingRequest or Finished (depending on the number of rounds). What this state really indicates is that the job is ready for the aggregation job driver to pick up.
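A rough sketch of the initial-state choice being described, written as it might appear when inserting a new Helper aggregation job; the parameter and enum value names are assumptions, not the exact Janus schema:

```sql
-- Illustrative only: pick the initial state for a newly written Helper aggregation job.
-- $3 is the task's aggregation mode, $4 is the number of VDAF rounds remaining after
-- handling the initialization request.
INSERT INTO aggregation_jobs (task_id, aggregation_job_id, state)
VALUES (
    $1, $2,
    CASE
        WHEN $3 = 'asynchronous' THEN 'ACTIVE'           -- driver still has work to do
        WHEN $4 > 0              THEN 'AWAITING_REQUEST' -- sync mode, waiting on the Leader
        ELSE                          'FINISHED'         -- sync mode, single-round VDAF done
    END
);
```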
tgeoghegan left a comment:
I see how this PR adds aggregation job state transitions in step_aggregation_job_helper_init and step_aggregation_job_helper_continue. But I don't see the corresponding change to make the aggregation_job_writer stop evaluating those state changes. Should WriteState::update_aggregation_job_state_from_report_aggregations be changed?
This has been trimmed down to just the documentation updates.
tgeoghegan left a comment:
Some wording nits to ponder
| /// corresponds to the AGGREGATION_JOB_STATE enum in the schema.
| ///
| /// These are implementation-specific states used for Janus's internal state management.
| /// DAP §4.6 [dap-16] defines aggregation job completion in terms of individual report
nit: could make this a link to draft 16 in the Datatracker
Could, but that's verbose, and mostly I'm trying to add easy-to-grep tags for our future use rather than to live in a hypertext utopia.
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, ToSql, FromSql)]
| #[postgres(name = "aggregation_job_state")]
| pub enum AggregationJobState {
| /// Job is ready for the aggregation job driver to pick up. Corresponds to the
nit: "pick up" is perhaps ambiguous. What we mean is that the job is ready for the aggregation job driver to run, or to drive, right?
Had to use drive since you basically gave me permission. :)
| #[postgres(name = "AWAITING_REQUEST")]
| AwaitingRequest,
| /// All report aggregations have reached a terminal state (Finished or Failed), completing
| /// the aggregation job lifecycle (§4.6 [dap-16]). Output shares are committed to batch
Suggested change:
| /// the aggregation job lifecycle (§4.6 [dap-16]). Output shares are committed to batch
| /// the aggregation job lifecycle (§4.6 [dap-16]). Output shares have been committed to batch
I think the tense matters: if we see an aggregation job in state Finished, output shares from its constituent report aggregations have been computed and committed at some point before the Finished job was observed, which has implications for handling of subsequent aggregation jobs. The tense "are committed" leaves it unclear when the commitment happens (perhaps the entity observing the Finished state is expected to do so?).
| /// Job has been marked for deletion and should not be processed further. This is a terminal
| /// state used during cleanup.
By cleanup, do we mean garbage collection? Or is this also possible if someone sends DELETE /tasks/{task-id}/aggregation_jobs/{agg-job-id}? That's an honest question, I can't remember. Anyway, if it's just those two, we could afford a few more words here explaining how a job enters this state.
/// Job has been marked for deletion, either by garbage collection or by using the HTTP
/// DELETE endpoint, and should not be processed further. This is a terminal state used
/// during cleanup.
I wrote some more documentation on the state transitions in the `AggregationJobState` database model.

Resolves #4322