[ML] Report actual memory usage for pytorch_inference process #2885

@valeriy42

Add the capability to report the actual OS memory usage (RSS) for the pytorch_inference process, similar to what was implemented for autodetect in #2846.

Background

PR #2846 introduced reporting of actual memory usage (via getrusage RSS) for the autodetect process. This provides valuable insight into the real memory footprint of anomaly detection jobs as reported by the OS, rather than relying solely on internal memory tracking.
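
For reference, this is roughly what the underlying OS call looks like. The sketch below is standalone and is not the CProcessStats implementation; it just shows getrusage-based peak RSS with the usual platform unit convention (kilobytes on Linux, bytes on macOS):

```cpp
// Standalone illustration of reading peak RSS via getrusage.
// Not the CProcessStats implementation - just the underlying OS call.
#include <sys/resource.h>

#include <cstdint>
#include <iostream>

std::uint64_t peakResidentSetSizeBytes() {
    rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return 0;
    }
#ifdef __APPLE__
    // macOS reports ru_maxrss in bytes.
    return static_cast<std::uint64_t>(usage.ru_maxrss);
#else
    // Linux reports ru_maxrss in kilobytes.
    return static_cast<std::uint64_t>(usage.ru_maxrss) * 1024;
#endif
}

int main() {
    std::cout << "peak RSS: " << peakResidentSetSizeBytes() << " bytes\n";
    return 0;
}
```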

The pytorch_inference process currently:

  • Has the infrastructure to report RSS values via writeProcessStats(), which is called on demand in response to the E_ProcessStats control message (see the sketch after this list)
  • Uses CProcessStats::residentSetSize() and CProcessStats::maxResidentSetSize() in Main.cc
  • Does not periodically report this information back to the Java process
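
A simplified, self-contained sketch of that on-demand pattern is below. The stub type and the JSON field names are illustrative only, not the actual CProcessStats interface or result format:

```cpp
// Simplified sketch of the existing on-demand pattern: the E_ProcessStats
// control message triggers a writeProcessStats() call that emits the
// current and peak RSS. The stub stands in for the real CProcessStats
// (assumed to return bytes); field names are illustrative.
#include <cstddef>
#include <iostream>
#include <ostream>

struct CProcessStatsStub {
    static std::size_t residentSetSize() { return 0; }
    static std::size_t maxResidentSetSize() { return 0; }
};

// Hypothetical shape of the handler; in pytorch_inference the real version
// is wired to the E_ProcessStats control message.
void writeProcessStats(std::ostream& output) {
    output << "{\"process_stats\":{"
           << "\"memory_resident_set_size\":" << CProcessStatsStub::residentSetSize() << ','
           << "\"memory_max_resident_set_size\":" << CProcessStatsStub::maxResidentSetSize()
           << "}}" << std::endl;
}

int main() {
    writeProcessStats(std::cout);
    return 0;
}
```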

Proposed Changes

  1. Add periodic reporting of system memory usage for pytorch_inference, similar to how autodetect updates E_TSADSystemMemoryUsage and E_TSADMaxSystemMemoryUsage program counters.

  2. Include the RSS values in the output stream that can be consumed by the Java side. Options include:

    • Adding new fields to an existing result type
    • Creating a new periodic stats message
    • Extending the E_ProcessStats response so that it is sent periodically

  3. Report the following values (a rough sketch of the periodic reporting follows this list):

    • system_memory_bytes - current resident set size (CProcessStats::residentSetSize())
    • max_system_memory_bytes - peak resident set size (CProcessStats::maxResidentSetSize())
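
A rough, hypothetical sketch of what periodic reporting into the output stream could look like. The class name, threading approach, interval, and JSON envelope are all assumptions for illustration; only the two field names above come from this proposal:

```cpp
// Hypothetical sketch of periodic RSS reporting for pytorch_inference.
// The stub stands in for the real CProcessStats; the real implementation
// would route the output through CResultWriter rather than a raw stream.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <ostream>
#include <thread>

struct CProcessStatsStub {
    static std::size_t residentSetSize() { return 0; }
    static std::size_t maxResidentSetSize() { return 0; }
};

class CPeriodicMemoryReporter {
public:
    CPeriodicMemoryReporter(std::ostream& output, std::chrono::seconds interval)
        : m_Output{output}, m_Interval{interval} {}

    ~CPeriodicMemoryReporter() { this->stop(); }

    void start() {
        m_Running = true;
        m_Thread = std::thread{[this] {
            while (m_Running) {
                this->report();
                std::this_thread::sleep_for(m_Interval);
            }
        }};
    }

    void stop() {
        m_Running = false;
        if (m_Thread.joinable()) {
            m_Thread.join();
        }
    }

private:
    void report() {
        // Proposed field names from this issue; the surrounding envelope
        // is a placeholder.
        m_Output << "{\"system_memory_bytes\":" << CProcessStatsStub::residentSetSize()
                 << ",\"max_system_memory_bytes\":" << CProcessStatsStub::maxResidentSetSize()
                 << "}" << std::endl;
    }

    std::ostream& m_Output;
    std::chrono::seconds m_Interval;
    std::atomic<bool> m_Running{false};
    std::thread m_Thread;
};

int main() {
    CPeriodicMemoryReporter reporter{std::cout, std::chrono::seconds{1}};
    reporter.start();
    std::this_thread::sleep_for(std::chrono::seconds{3});
    reporter.stop();
    return 0;
}
```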

Files likely to be modified

  • bin/pytorch_inference/Main.cc - Add periodic memory reporting
  • bin/pytorch_inference/CResultWriter.cc / CResultWriter.h - Potentially extend output format
  • bin/pytorch_inference/CCommandParser.cc / CCommandParser.h - If new message types are needed
