
[ML] Add per allocation and per deployment memory metadata fields to …#6

Open
MitchLewis930 wants to merge 1 commit into pr_016_before from pr_016_after

Conversation

@MitchLewis930

@MitchLewis930 MitchLewis930 commented Jan 30, 2026

PR_016


Note

Medium Risk
Touches ML deployment task serialization (new transport version) and changes the memory estimation formula used for allocation/stats, which could affect deployment sizing and autoscaling decisions.

Overview
Adds support for model-provided memory requirements in ML deployments. StartTrainedModelDeploymentAction.TaskParams now carries per_deployment_memory_bytes and per_allocation_memory_bytes, serializes them behind a new transport version (V_8_500_064), and includes them in toXContent/parsing.

Updates the required native memory estimation. estimateMemoryUsageBytes(...) now takes the new metadata plus number_of_allocations and computes max(240MB + 2*model_size, per_deployment + per_allocation*allocations + model_size) (with ELSER v1 still pinned to a fixed value), and TransportGetTrainedModelsStatsAction wires this into the required_native_memory_bytes stats calculation.
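The revised estimate described above can be sketched as follows. This is a minimal illustration of the formula, not the actual Elasticsearch source: the method name matches the PR, but the signature, the helper class, and the omission of the ELSER v1 special case are assumptions.

```java
// Hedged sketch of the revised memory estimate described in the PR.
// Assumed signature; ELSER v1's pinned fixed value is omitted for brevity.
class MemoryEstimate {
    static final long BASELINE_BYTES = 240L * 1024 * 1024; // legacy 240MB overhead

    static long estimateMemoryUsageBytes(long modelSizeBytes,
                                         long perDeploymentMemoryBytes,
                                         long perAllocationMemoryBytes,
                                         int numberOfAllocations) {
        // Legacy estimate: 240MB + 2 * model size.
        long legacy = BASELINE_BYTES + 2 * modelSizeBytes;
        // Refined estimate built from the new metadata fields.
        long refined = perDeploymentMemoryBytes
                + perAllocationMemoryBytes * (long) numberOfAllocations
                + modelSizeBytes;
        // The larger of the two keeps the estimate backward compatible:
        // models without metadata (zeros) fall back to the legacy value.
        return Math.max(legacy, refined);
    }
}
```

When the metadata fields are zero, refined collapses to model_size alone, so the legacy term dominates and behavior is unchanged for existing models.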

Deployment start/updates propagate the metadata from TrainedModelConfig into task params across assignment/task update paths, and tests/QA add coverage for the new estimation behavior and request helpers.

Written by Cursor Bugbot for commit 2e13a9f. This will update automatically on new commits.

…the trained models config (elastic#98139)

To improve the required memory estimation of NLP models, this PR introduces two new metadata fields: per_deployment_memory_bytes and per_allocation_memory_bytes.

- per_deployment_memory_bytes is the memory required to load the model into the deployment.
- per_allocation_memory_bytes is the temporary additional memory used during inference by each allocation.

This PR extends the memory usage estimation logic while ensuring backward compatibility.

In a follow-up PR, I will adjust the assignment planner to use the refined memory usage information.
@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

  parentTaskId,
- modelSizeStatsListener
+ modelSizeStatsListener,
+ numberOfAllocations

Global allocation sum used for per-model memory estimation

High Severity

The numberOfAllocations is calculated by summing allocations across ALL deployments, but this single total is then used to calculate memory estimates for EACH individual model. When a model has perAllocationMemoryBytes set, the memory formula uses perAllocationMemoryBytes * numberOfAllocations, so using the global sum instead of each model's specific allocation count produces incorrect memory estimates. For example, if Model A has 2 allocations and Model B has 3 allocations, both models would incorrectly use 5 allocations in their memory calculation.
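The overestimate the reviewer describes can be shown with the numbers from the comment. This is a hypothetical illustration, not the actual code: the estimate helper and the 10MB figure are assumptions made solely to demonstrate the effect of using the global allocation sum.

```java
// Hypothetical illustration of the reported bug (numbers assumed).
class GlobalSumBug {
    // Simplified per-allocation portion of the memory estimate.
    static long perAllocationCost(long perAllocationMemoryBytes, int numberOfAllocations) {
        return perAllocationMemoryBytes * (long) numberOfAllocations;
    }

    public static void main(String[] args) {
        long perAllocBytes = 10L * 1024 * 1024; // assume 10MB per allocation
        int modelA = 2, modelB = 3;             // allocations per model
        int globalSum = modelA + modelB;        // 5, as the buggy code computes

        // Buggy: Model A's estimate uses the global sum of 5 allocations.
        long buggy = perAllocationCost(perAllocBytes, globalSum);
        // Correct: Model A should use its own 2 allocations.
        long correct = perAllocationCost(perAllocBytes, modelA);

        // The buggy estimate is 30MB too high for Model A alone.
        System.out.println((buggy - correct) / (1024 * 1024) + "MB overestimate");
    }
}
```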

Additional Locations (1)
