[ML] Add per allocation and per deployment memory metadata fields to … #6

MitchLewis930 wants to merge 1 commit into pr_016_before
Conversation
…the trained models config (elastic#98139)

To improve the required memory estimation of NLP models, this PR introduces two new metadata fields: per_deployment_memory_bytes and per_allocation_memory_bytes.

- per_deployment_memory_bytes is the memory required to load the model in the deployment.
- per_allocation_memory_bytes is the temporary additional memory used during inference for every allocation.

This PR extends the memory usage estimation logic while ensuring backward compatibility. In a follow-up PR, I will adjust the assignment planner to use the refined memory usage information.
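The shape of the refined estimate these two fields enable can be sketched as follows. This is a minimal illustration, not the actual Elasticsearch implementation: the class name, method signature, and the default-calculation formula (including the overhead constant) are assumptions for the sake of the example.

```java
// Minimal sketch of a memory estimate refined by the two new metadata
// fields. Names and the default formula are illustrative assumptions,
// not the actual Elasticsearch code.
public class MemoryEstimateSketch {
    // Illustrative stand-in for the default per-deployment overhead.
    static final long DEFAULT_OVERHEAD_BYTES = 240L * 1024 * 1024;

    static long estimateMemoryUsageBytes(long modelSizeBytes,
                                         long perDeploymentMemoryBytes,
                                         long perAllocationMemoryBytes,
                                         int numberOfAllocations) {
        if (perDeploymentMemoryBytes == 0 && perAllocationMemoryBytes == 0) {
            // No metadata supplied: fall back to the default calculation.
            return modelSizeBytes * 2 + DEFAULT_OVERHEAD_BYTES;
        }
        // Metadata supplied: fixed load cost plus a per-allocation term.
        return perDeploymentMemoryBytes + perAllocationMemoryBytes * numberOfAllocations;
    }
}
```

The key point is that the per-deployment cost is paid once, while the per-allocation cost scales with the number of allocations, which the old single-number estimate could not express.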
Pull request overview
This PR adds per-deployment and per-allocation memory metadata fields to the trained model deployment system. These fields enable more accurate memory usage estimation for PyTorch model deployments by allowing models to specify custom memory requirements that can differ from the default calculations.
Changes:
- Added `per_deployment_memory_bytes` and `per_allocation_memory_bytes` fields to `TaskParams` and `TrainedModelConfig`
- Updated memory estimation logic to use custom memory values when available, falling back to default calculations
- Added transport version `V_8_500_064` for backward compatibility of the new fields
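Backward compatibility here follows the usual transport-version gating pattern: the new fields are only serialized when the peer is on or after the version that introduced them, and older peers simply never see them. A hypothetical sketch of that gate (the constant value mirrors the version name; the helper is illustrative, not Elasticsearch code):

```java
// Hypothetical sketch of version-gated serialization for the new fields,
// modeled on the Elasticsearch transport-version pattern.
public class VersionGateSketch {
    // Numeric stand-in mirroring the V_8_500_064 version constant.
    static final int V_8_500_064 = 8_500_064;

    static boolean writesNewFields(int peerTransportVersion) {
        // Older peers never receive the new fields and keep using
        // the default memory calculation.
        return peerTransportVersion >= V_8_500_064;
    }
}
```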
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| StartTrainedModelDeploymentAction.java | Added new memory fields to TaskParams, updated estimateMemoryUsageBytes to support custom memory calculations |
| TrainedModelConfig.java | Added getter methods for per-deployment and per-allocation memory bytes from metadata |
| TransportStartTrainedModelDeploymentAction.java | Updated to extract and pass memory values from model config to task params |
| TransportGetTrainedModelsStatsAction.java | Modified memory estimation to include number of allocations parameter |
| TrainedModelDeploymentTask.java | Updated updateNumberOfAllocations to preserve new memory fields |
| TrainedModelAssignmentNodeService.java | Updated task params construction to include memory fields |
| TrainedModelAssignment.java | Updated setNumberOfAllocations to preserve memory fields |
| TransportVersion.java | Added V_8_500_064 version constant and updated CURRENT version |
| PyTorchModelRestTestCase.java | Added test utility methods for creating models with memory metadata |
| PyTorchModelIT.java | Added integration test for memory estimation with and without metadata |
| Multiple test files | Updated test constructors to include new memory parameters with default values |
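The test utilities described above create models whose config carries the new fields under the model's metadata object. A self-contained sketch of building that metadata payload (field names are from this PR; the helper and the values are illustrative):

```java
// Sketch of the model-config metadata carrying the two new fields.
// Field names come from the PR; the helper and values are illustrative.
public class ModelMetadataSketch {
    static String metadataJson(long perDeploymentMemoryBytes, long perAllocationMemoryBytes) {
        return """
            {
              "metadata": {
                "per_deployment_memory_bytes": %d,
                "per_allocation_memory_bytes": %d
              }
            }""".formatted(perDeploymentMemoryBytes, perAllocationMemoryBytes);
    }
}
```

A payload like this would be sent in the body of the PUT trained-model request that the REST test case issues, as exercised by the integration test with and without metadata.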
```java
Request request = new Request("PUT", "/_ml/trained_models/" + modelId);
request.setJsonEntity("""
String metadata;
if (perDeploymentMemoryBytes > 0 && perAllocationMemoryBytes > 0) {
```
The condition requires both values to be greater than 0, but the logic in estimateMemoryUsageBytes treats them independently (line 726 checks that both equal 0). Consider checking whether either value is non-zero, to stay consistent with the estimation logic.
```diff
- if (perDeploymentMemoryBytes > 0 && perAllocationMemoryBytes > 0) {
+ if (perDeploymentMemoryBytes > 0 || perAllocationMemoryBytes > 0) {
```
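The reviewer's point is easiest to see with both guards side by side: when only one of the two fields is set, the `&&` guard skips the metadata path even though the estimation logic would use that field. A small sketch (hypothetical helper names):

```java
// Illustrates the && vs || inconsistency flagged in the review.
// Hypothetical helpers, not the actual PR code.
public class GuardSketch {
    static boolean usesMetadataAnd(long perDeploymentBytes, long perAllocationBytes) {
        // The guard as written in the PR: both fields must be set.
        return perDeploymentBytes > 0 && perAllocationBytes > 0;
    }

    static boolean usesMetadataOr(long perDeploymentBytes, long perAllocationBytes) {
        // The suggested guard: either field being set is enough,
        // matching an estimation path that treats them independently.
        return perDeploymentBytes > 0 || perAllocationBytes > 0;
    }
}
```

With, say, only per_deployment_memory_bytes set to a non-zero value, the first guard returns false and the second returns true, which is exactly the divergence the comment describes.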