@edersonbrilhante
Contributor

Summary

This PR makes the list of EC2 scaling error codes configurable instead of hardcoded in the control-plane Lambda. It allows users to extend or override the default retryable error set without forcing a change on everyone.

Motivation

Issue [#4105] was closed without a PR, leaving the scale error list hardcoded.
Different environments can encounter additional EC2 error codes that should trigger retries; making this list configurable lets users adapt behavior without modifying the library code.
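
A minimal sketch of the idea, not the PR's actual code: the configured list could reach the Lambda as a JSON array in a config string and fall back to built-in defaults when absent or malformed. The function name `resolveScaleErrors`, the constant `DEFAULT_SCALE_ERRORS`, and the two default codes shown are assumptions for illustration only.

```typescript
// Hypothetical defaults for illustration; the module's real default list may differ.
const DEFAULT_SCALE_ERRORS: readonly string[] = [
  "InsufficientInstanceCapacity",
  "RequestLimitExceeded",
];

// Parse a JSON array of error codes from a raw config string (for example the
// value of an environment variable). Falls back to the defaults when the value
// is missing, is not valid JSON, or is not a non-empty array of strings.
function resolveScaleErrors(raw: string | undefined): string[] {
  if (!raw) {
    return DEFAULT_SCALE_ERRORS.slice();
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      Array.isArray(parsed) &&
      parsed.length > 0 &&
      parsed.every((code) => typeof code === "string")
    ) {
      const codes = parsed as string[];
      // Drop duplicate codes while keeping the first occurrence of each.
      return codes.filter((code, index) => codes.indexOf(code) === index);
    }
  } catch (err) {
    // Malformed JSON: fall through to the defaults below.
  }
  return DEFAULT_SCALE_ERRORS.slice();
}
```

In this sketch a user-supplied list replaces the defaults entirely; merging it with the defaults would be an equally reasonable design, depending on whether users should be able to remove a default code.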

@edersonbrilhante edersonbrilhante marked this pull request as ready for review December 4, 2025 13:55
@edersonbrilhante edersonbrilhante requested review from a team as code owners December 4, 2025 13:55
@edersonbrilhante edersonbrilhante force-pushed the feat-custom-scale-error branch 2 times, most recently from c3b568d to 68f1f4b Compare December 4, 2025 14:04
@npalm
Member

npalm commented Dec 13, 2025

@edersonbrilhante the update to the new 7.x release introduced some breaking changes. Could you please rebase the PR? Thanks!

@edersonbrilhante
Contributor Author

@npalm I tested and it is working fine :)

I added InsufficientFreeAddressesInSubnet to custom_scale_errors:

custom_scale_errors = [
  "UnfulfillableCapacity",
  "MaxSpotInstanceCountExceeded",
  "TargetCapacityLimitExceededException",
  "RequestLimitExceeded",
  "ResourceLimitExceeded",
  "MaxSpotFleetRequestCountExceeded",
  "InsufficientInstanceCapacity",
  "InsufficientFreeAddressesInSubnet", # Deployed in a small subnet to trigger this error
]

The logs (I redacted the real IDs) show the message was sent back to the queue because this error is now accepted as retryable: InsufficientFreeAddressesInSubnet


{"level":"WARN","message":"Failed to create instance, create fleet failed. (Failed to create 12 instances) A retry will be attempted via SQS.","timestamp":"2025-12-16T12:59:03.030Z","service":"runners-scale-up","sampling_rate":0,"xray_trace_id":"1-00000000-000000000000000000000000","region":"eu-west-1","environment":"acme-standard","aws-request-id":"00000000-0000-0000-0000-000000000000","function-name":"acme-standard-scale-up","module":"lambda.ts","error":{"name":"ScaleError","location":"file:///var/task/index.js:124990","message":"Failed to create instance, create fleet failed.","stack":"ScaleError: Failed to create instance, create fleet failed.\n    at processFleetResult (file:///var/task/index.js:124990:15)\n    at createRunner (file:///var/task/index.js:124946:29)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async createRunners (file:///var/task/index.js:131774:23)\n    at async scaleUp (file:///var/task/index.js:131912:27)\n    at async BufferedInvokeProcessor.scaleUpHandler [as handler] (file:///var/task/index.js:132732:36)\n    at async BufferedInvokeProcessor.processInvoke (file:///var/runtime/index.mjs:1092:22)\n    at async _Runtime.processSingleConcurrent (file:///var/runtime/index.mjs:1178:7)\n    at async _Runtime.start (file:///var/runtime/index.mjs:1165:7)\n    at async ignition (file:///var/runtime/index.mjs:1634:5)","failedInstanceCount":12}}


{"level":"WARN","message":"Create fleet failed, ScaleError will be thrown to trigger retry for ephemeral runners.","timestamp":"2025-12-16T12:59:03.030Z","service":"runners-scale-up","sampling_rate":0,"xray_trace_id":"1-00000000-000000000000000000000000","region":"eu-west-1","environment":"acme-standard","module":"runners","aws-request-id":"00000000-0000-0000-0000-000000000000","function-name":"acme-standard-scale-up","runner":{"ephemeral":true,"type":"Org","namePrefix":"","n_events":1}}
{"level":"WARN","message":"No instances created.","timestamp":"2025-12-16T12:59:03.029Z","service":"runners-scale-up","sampling_rate":0,"xray_trace_id":"1-00000000-000000000000000000000000","region":"eu-west-1","environment":"acme-standard","module":"runners","aws-request-id":"00000000-0000-0000-0000-000000000000","function-name":"acme-standard-scale-up","runner":{"ephemeral":true,"type":"Org","namePrefix":"","n_events":1},"data":{"$metadata":{"httpStatusCode":200,"requestId":"00000000-0000-0000-0000-000000000000","attempts":1,"totalRetryDelay":0},"FleetId":"fleet-00000000-0000-0000-0000-000000000000","Errors":[{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.medium","SubnetId":"subnet-00000000000000000"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-00000000000000000' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.medium","SubnetId":"subnet-11111111111111111"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-11111111111111111' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.large","SubnetId":"subnet-11111111111111111"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-11111111111111111' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.large","SubnetId":"subnet-00000000000000000"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-00000000000000000' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"m5.large","SubnetId":"subnet-00000000000000000"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-00000000000000000' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"m5.large","SubnetId":"subnet-11111111111111111"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-11111111111111111' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.xlarge","SubnetId":"subnet-11111111111111111"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-11111111111111111' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.xlarge","SubnetId":"subnet-00000000000000000"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-00000000000000000' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"m5.xlarge","SubnetId":"subnet-11111111111111111"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-11111111111111111' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"m5.xlarge","SubnetId":"subnet-00000000000000000"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-00000000000000000' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.2xlarge","SubnetId":"subnet-11111111111111111"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-11111111111111111' to satisfy the requested number of instances."},{"LaunchTemplateAndOverrides":{"LaunchTemplateSpecification":{"LaunchTemplateId":"lt-00000000000000000","Version":"16"},"Overrides":{"InstanceType":"t3.2xlarge","SubnetId":"subnet-00000000000000000"}},"Lifecycle":"on-demand","ErrorCode":"InsufficientFreeAddressesInSubnet","ErrorMessage":"There are not enough free addresses in subnet 'subnet-00000000000000000' to satisfy the requested number of instances."}],"Instances":[]}}
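
To make the log above easier to read, here is a rough sketch of how a response of that shape could be checked against a configured error list. It only uses the `Errors`, `ErrorCode`, and `Instances` fields visible in the log; the interfaces, the function names `matchingScaleErrors` and `isRetryable`, and the rule "retry only when no instances were created" are assumptions for illustration, not the module's actual implementation.

```typescript
// Minimal subset of the response fields visible in the log above.
interface FleetError {
  ErrorCode?: string;
  ErrorMessage?: string;
}

interface FleetResult {
  Errors?: FleetError[];
  Instances?: unknown[];
}

// Return the distinct error codes in the response that appear in the
// configured list of retryable scale errors.
function matchingScaleErrors(result: FleetResult, scaleErrors: string[]): string[] {
  const codes = (result.Errors ?? [])
    .map((error) => error.ErrorCode)
    .filter((code): code is string => typeof code === "string");
  const distinct = codes.filter((code, index) => codes.indexOf(code) === index);
  return distinct.filter((code) => scaleErrors.indexOf(code) !== -1);
}

// Assumed rule for this sketch: retry only when nothing was created and at
// least one reported error code is in the configured list.
function isRetryable(result: FleetResult, scaleErrors: string[]): boolean {
  const createdCount = (result.Instances ?? []).length;
  return createdCount === 0 && matchingScaleErrors(result, scaleErrors).length > 0;
}
```

With the response from the log, where every entry carries `InsufficientFreeAddressesInSubnet` and `Instances` is empty, `isRetryable` in this sketch returns true only if that code is present in the configured list.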

npalm previously approved these changes Dec 18, 2025
Member

@npalm npalm left a comment

@edersonbrilhante @guicaulada looks good to me. Left a comment. But this approach is fine as well.
