-
Notifications
You must be signed in to change notification settings - Fork 84
Add lease duration to worker heartbeat request #661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2467,6 +2467,25 @@ message RecordWorkerHeartbeatRequest { | |
| string identity = 2; | ||
|
|
||
| repeated temporal.api.worker.v1.WorkerHeartbeat worker_heartbeat = 3; | ||
|
|
||
| // Duration for which the worker lease should be valid. During this time, the server considers the worker to be active. | ||
| // The worker is expected to send periodic heartbeats to renew its lease before it expires. | ||
| // | ||
| // Server will calculate the actual expiration time based on when it receives this request. | ||
| // If not specified or zero, the server will use a default lease duration of 1 minute. | ||
| // | ||
| // There are 3 states for a worker: Active, Inactive, and CleanedUp. | ||
| // Lifecycle transitions: | ||
| // - Active->Active: Each time the server receives a heartbeat from the worker, it will renew the lease and keep the worker in the active state. | ||
| // | ||
| // - Active->Inactive: When the lease expires, the server will consider the worker to be inactive, and reschedule activities that were known to be running as of that time. | ||
| // | ||
| // - Inactive->Active: If the server receives subsequent heartbeat from this worker, then it will transition it back to the active state. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would call out that once the state machine transitions to inactive, any activity task that was assigned to this worker is considered failed and should be canceled by the worker. The actual protocol for syncing the activity task inventory should be defined sooner rather than later IMHO. |
||
| // | ||
| // - Inactive->CleanedUp: If the worker remains inactive for a prolonged period, the server will cleanup the worker state. This is a terminal state. | ||
| // If the server receives subsequent heartbeat from this worker, then it will always return an non-retryable FailedPrecondition error. | ||
| // The worker will need to shutdown and re-register using a different WorkerInstanceKey to become active again. | ||
| google.protobuf.Duration lease_duration = 4; | ||
| } | ||
|
|
||
| message RecordWorkerHeartbeatResponse { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this already true? If not we may want to go back and update the comment after it is, or, alternatively, let's hold off merging this until the implementation behind it is also ready.
Love the detail in the comment now though!
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, none of this is implemented yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all but the most obvious things, we usually would like an accompanying server implementation to ensure that the API matches what we want. We learn a lot during implementation and it can cause problems providing "under development" fields. We find that we so often cut releases of API that people will employ these models into their proxies and in other ways that makes backwards incompatible alterations problematic.
Would strongly suggest at least having server implementation built before merging this. For larger efforts, even with the server implementation built, we may want it in a separate branch so SDKs can design against it and we can apply any API learnings from that, but not sure this qualifies for such a need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I have converted this into a draft PR, so you can still review it. I will not be merging this.