[ML] Add per allocation and per deployment memory metadata fields to the trained models config #6
MitchLewis930 wants to merge 1 commit into pr_016_before from
Conversation
…the trained models config (elastic#98139)

To improve the required memory estimation of NLP models, this PR introduces two new metadata fields: per_deployment_memory_bytes and per_allocation_memory_bytes.

- per_deployment_memory_bytes is the memory required to load the model in the deployment.
- per_allocation_memory_bytes is the temporary additional memory used during inference for every allocation.

This PR extends the memory usage estimation logic while ensuring backward compatibility. In a follow-up PR, I will adjust the assignment planner to use the refined memory usage information.
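In configuration JSON, the two new fields might look roughly like this (a hypothetical fragment: the field names and byte units come from the PR, but the model id, the values, and the exact nesting inside the trained model config are assumptions for illustration):

```json
{
  "model_id": "my_nlp_model",
  "per_deployment_memory_bytes": 314572800,
  "per_allocation_memory_bytes": 104857600
}
```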
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```diff
     parentTaskId,
-    modelSizeStatsListener
+    modelSizeStatsListener,
+    numberOfAllocations
```
Global allocation sum used for per-model memory estimation
High Severity
The numberOfAllocations is calculated by summing allocations across ALL deployments, but this single total is then used to calculate memory estimates for EACH individual model. When a model has perAllocationMemoryBytes set, the memory formula uses perAllocationMemoryBytes * numberOfAllocations, so using the global sum instead of each model's specific allocation count produces incorrect memory estimates. For example, if Model A has 2 allocations and Model B has 3 allocations, both models would incorrectly use 5 allocations in their memory calculation.
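The effect of the reported bug can be sketched in a few lines (an illustrative stand-in, not the actual Elasticsearch code; the method name and values are hypothetical):

```java
// Sketch of the Bugbot finding: the per-allocation term of the memory
// estimate must use each model's own allocation count, not the sum of
// allocations across ALL deployments.
public class AllocationSumBug {

    // Simplified stand-in for the per-allocation portion of the estimate.
    static long perAllocationTerm(long perAllocationMemoryBytes, int numberOfAllocations) {
        return perAllocationMemoryBytes * numberOfAllocations;
    }

    public static void main(String[] args) {
        long perAllocBytes = 10_000_000L; // 10 MB per allocation (illustrative)
        int modelAAllocations = 2;
        int modelBAllocations = 3;
        int globalSum = modelAAllocations + modelBAllocations; // 5

        // Buggy: both models are billed for 5 allocations.
        System.out.println(perAllocationTerm(perAllocBytes, globalSum));          // 50000000
        // Correct: each model uses its own allocation count.
        System.out.println(perAllocationTerm(perAllocBytes, modelAAllocations));  // 20000000
        System.out.println(perAllocationTerm(perAllocBytes, modelBAllocations));  // 30000000
    }
}
```

With the global sum, both models are overestimated; the fix is to pass each deployment's own allocation count into the estimate.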


Note — Medium Risk
Touches ML deployment task serialization (a new transport version) and changes the memory estimation formula used for allocation/stats, which could affect deployment sizing and autoscaling decisions.
Overview
Adds support for model-provided memory requirements in ML deployments.
- StartTrainedModelDeploymentAction.TaskParams now carries per_deployment_memory_bytes and per_allocation_memory_bytes, serializes them behind a new transport version (V_8_500_064), and includes them in toXContent/parsing.
- Updates required native memory estimation: estimateMemoryUsageBytes(...) now takes the new metadata plus number_of_allocations and computes max(240MB + 2*model_size, per_deployment + per_allocation*allocations + model_size) (with ELSER v1 still pinned to a fixed value), and TransportGetTrainedModelsStatsAction wires this into the required_native_memory_bytes stats calculation.
- Deployment start/updates propagate the metadata from TrainedModelConfig into task params across assignment/task update paths, and tests/QA add coverage for the new estimation behavior and request helpers.

Written by Cursor Bugbot for commit 2e13a9f.
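The estimation formula from the overview can be sketched as a minimal standalone method (an illustrative sketch, assuming the max-of-two-terms formula stated above; the class name, method signature, and constant layout are not the actual Elasticsearch implementation):

```java
// Sketch of the updated estimate: take the larger of the legacy heuristic
// (240 MB overhead + 2 * model size) and the refined, metadata-driven term
// (per_deployment + per_allocation * allocations + model size).
public class MemoryEstimate {

    static final long BASE_OVERHEAD_BYTES = 240L * 1024 * 1024; // 240 MB

    static long estimateMemoryUsageBytes(long modelSizeBytes,
                                         long perDeploymentMemoryBytes,
                                         long perAllocationMemoryBytes,
                                         int numberOfAllocations) {
        long legacy = BASE_OVERHEAD_BYTES + 2 * modelSizeBytes;
        long refined = perDeploymentMemoryBytes
                + perAllocationMemoryBytes * (long) numberOfAllocations
                + modelSizeBytes;
        return Math.max(legacy, refined);
    }

    public static void main(String[] args) {
        // Model without the new metadata (fields default to 0): the refined
        // term collapses to model_size, so the legacy heuristic wins and
        // backward compatibility is preserved.
        System.out.println(estimateMemoryUsageBytes(100_000_000L, 0L, 0L, 1));

        // Model with metadata: 300 MB per deployment + 100 MB * 4 allocations
        // + 100 MB model size exceeds the legacy heuristic.
        System.out.println(estimateMemoryUsageBytes(100_000_000L, 300_000_000L, 100_000_000L, 4));
    }
}
```

The max(...) is what makes the change backward compatible: models that never set the new fields fall back to the old 240 MB + 2 * model_size estimate.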