Skip to content

Comments

Split provisioning and execution#55

Merged
ispasov merged 3 commits intomainfrom
is/split-execution
Feb 18, 2026
Merged

Split provisioning and execution#55
ispasov merged 3 commits intomainfrom
is/split-execution

Conversation

@ispasov
Copy link
Collaborator

@ispasov ispasov commented Feb 17, 2026

Description

  • Split provisioning and execution

Previously the provisioning and execution were bundled together in the same func.
This leads to several issues:

  1. When the execution fails, the provisioning tried to deploy a new VM, thinking the provisioning of the VM failed.
  2. When the execution fails due to ssh connection, a new VM was either deployed or the running one was deleted, although it was running an actual active job.

Now the provisioning retries until a VM and a runner are present.
Then the execution is performed.
The resources are cleaned if the execution completes. If there is an SSH error then the VM is not deleted and it is relied on the JobCompleted event to clean up the resources.
This ensures active VMs are not deleted and the jobs are not lost.

Finally, add more logging

Testing

Case 1

  1. Start the runner make run
  2. Run a job and ensure it completes

Case 2

  1. Ensure you have a job that takes at least 10-15 seconds.
  2. Start it
  3. Follow the runner logs, the moment you see that the runner has started, stop the VPN connection.
  4. The job should complete and the runner should be deleted together with the VM

Case 3

  1. Use the same job
  2. The moment the runner starts the job, delete the VM
  3. You should not see any attempts to cleanup the VM
  4. Stop the runner and cancel the job

Case 4

  1. Use a job that fails
  2. Ensure the runner correctly deletes the VM and the job fails successfully

@ispasov ispasov requested a review from a team as a code owner February 17, 2026 13:32
@ispasov ispasov force-pushed the is/split-execution branch 2 times, most recently from a28d11f to f046ae3 Compare February 17, 2026 13:46
@ispasov ispasov marked this pull request as draft February 17, 2026 13:58
@ispasov ispasov marked this pull request as ready for review February 17, 2026 14:42
@ispasov ispasov marked this pull request as draft February 17, 2026 14:53
@ispasov ispasov marked this pull request as ready for review February 17, 2026 15:32

select {
case <-provisioningContext.Done():
if executionErr != nil && !errors.As(executionErr, &exitErr) && !errors.Is(executionErr, context.Canceled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to wrap this into a private function so that the behavior is a little more semantic

@ispasov ispasov force-pushed the is/add-more-logging branch from 20f5f0c to 8676f77 Compare February 17, 2026 16:35
Base automatically changed from is/add-more-logging to main February 17, 2026 16:48
Previously the provisioning and execution were bundled together in the same func.
This leads to several issues:
1. When the execution fails, the provisioning tried to deploy a new VM, thinking the provisioning of the VM failed.
2. When the execution fails due to ssh connection, a new VM was either deployed or the running one was deleted, although it was running an actual active job.

Now the provisioning retries until a VM and a runner are present.
Then the execution is performed.
The resources are cleaned if the execution completes. If there is an SSH error then the VM is not deleted and it is relied on the JobCompleted event to clean up the resources.
This ensures active VMs are not deleted and the jobs are not lost.

Finally, add more logging
@ispasov ispasov merged commit 46b174f into main Feb 18, 2026
3 of 4 checks passed
@ispasov ispasov deleted the is/split-execution branch February 18, 2026 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants