Conversation
ocrd/ocrd/processor/helpers.py
Outdated
```python
t0_wall = perf_counter()
t0_cpu = process_time()
processor.process()
mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True)
```
Maybe you wanna add `multiprocess=True` in there for processors that use `multiprocessing` or similar, or that delegate to shell calls. Perhaps also `include_children=True` (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).
> Maybe you wanna add `multiprocess=True` in there for processors that use `multiprocessing` or similar, or that delegate to shell calls.

I will.

> Perhaps also `include_children=True` (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).

```python
if include_children:
    mem += sum(_get_child_memory(process, meminfo_attr))
```

https://github.com/pythonprofilers/memory_profiler/blob/master/memory_profiler.py#L135-L136
> Perhaps also `include_children=True` (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).
>
> ```python
> if include_children:
>     mem += sum(_get_child_memory(process, meminfo_attr))
> ```

I saw that, but the question remains: what is in that sum? (We probably don't want to count shared and CoW pages repeatedly. But it may be hard to calculate exactly. IIRC standard `ps` and `top` and even `time` do count repetitions, so they appear too large – don't "add up" – for multiprocessed programs, whereas `smem` gets it right. This tells me we are actually interested in the PSS, the proportional set size, rather than the RSS.)
Ok, but PSS vs RSS is a more general problem/discussion. Any multiprocessing coder will know about that. And RSS is still valuable information. So let's stick to your current implementation.
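To make the RSS vs PSS distinction concrete, here is an editorial sketch (not part of this PR) that sums both metrics over a process tree with `psutil`; it assumes Linux, where `memory_full_info()` exposes PSS:

```python
import os
import psutil

def rss_vs_pss(pid=None):
    """Sum RSS and PSS over a process and its children (Linux only)."""
    proc = psutil.Process(pid or os.getpid())
    procs = [proc] + proc.children(recursive=True)
    # RSS counts shared/CoW pages once per process, so the sum overestimates
    rss_total = sum(p.memory_info().rss for p in procs)
    # PSS apportions each shared page among the processes mapping it
    pss_total = sum(p.memory_full_info().pss for p in procs)
    return rss_total / 1024**2, pss_total / 1024**2

if __name__ == '__main__':
    rss_mib, pss_mib = rss_vs_pss()
    print(f"RSS sum: {rss_mib:.1f} MiB, PSS sum: {pss_mib:.1f} MiB")
```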
This will conflict with #652 (where that part of …). Oh, since I saw this also uses …
Is that just an issue of resolving conflicts, or do you mean it's now incompatible API-wise?
I haven't, yet. I did notice that when I profile the current process (…)
It looks like this is only cosmetic (since the workflow server does not handle the retval either, it just adds exception handling, and does not use multiprocessing itself, at least in the current implementation). But I would have to check/test.
ocrd/ocrd/processor/helpers.py
Outdated
```python
t0_wall = perf_counter()
t0_cpu = process_time()
processor.process()
mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True)
```
Suggested change:

```python
mem_usage = memory_usage(proc=processor.process,
                         interval=.1, timeout=None, timestamps=True,
                         # include sub-processes
                         multiprocess=True, include_children=True,
                         # get proportional set size instead of RSS
                         backend='psutil_pss')
```
If it is really that simple – let's make this configurable (PSS vs RSS measuring). Did you try this?
Sure, however we first need a mechanism to toggle this behavior. A simple idea would be to add CLI flags `--mem-profile-rss` and `--mem-profile-pss`.
@bertsky prefers a way to have this configurable via the logging configuration – @M3ssman, do you have a good idea how that might work? That way, we could keep the number of CLI arguments small and the `--help` output less confusing.
Perhaps for the moment it could be as simple as

```python
mem_prof = os.getenv('OCRD_PROFILE', '')
if mem_prof:
    ...
    backend='psutil_pss' if mem_prof == 'PSS' else 'psutil')
```
I like environment variables in general, but when the choice is between environment variables and CLI flags, I prefer the latter, because at least it's consistent – otherwise you could end up with `OCRD_PROFILE=PSS ocrd-dummy --profile-file my.prof ...`.
@kba I agree, however I did not argue for env var plus CLI arg, but only the former. Perhaps we could also move `--profile` and `--profile-file` into env var control entirely.

Or we go the other direction and put everything into a single CLI arg, just with a multi-faceted value: `--profile mem=rss,cpu,file=foo.log`, for example.

For the broader discussion of how to configure processors without adding too many options/parameters, and what for, see this old thread.
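Purely for illustration, a hypothetical sketch of how such a multi-faceted `--profile` value could be parsed with `click` (option name and facet syntax follow the example in the comment above; nothing here is an agreed-upon interface):

```python
import click

def parse_profile(_ctx, _param, value):
    """Split e.g. 'mem=pss,cpu,file=foo.log' into {'mem': 'pss', 'cpu': True, 'file': 'foo.log'}."""
    facets = {}
    for facet in filter(None, value.split(',')):
        key, _, val = facet.partition('=')
        facets[key] = val or True
    return facets

@click.command()
@click.option('--profile', callback=parse_profile, default='',
              help='Profiling facets, e.g. "mem=pss,cpu,file=foo.log"')
def cli(profile):
    click.echo(profile)

if __name__ == '__main__':
    cli()
```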
OK, now that the wonky multiple-process-call behavior is fixed, I'll implement this via environment variables `OCRD_PROFILE` and `OCRD_PROFILE_FILE`:

- `OCRD_PROFILE` not set or empty: no profiling
- `OCRD_PROFILE_FILE` set to a value (== current `--profile-file` flag):
  - if `OCRD_PROFILE` is defined but empty: no profiling
  - if `OCRD_PROFILE` is not defined: set `OCRD_PROFILE` to `CPU`
  - if `OCRD_PROFILE` is defined but does not contain `CPU`: ~~raise error~~ too strict, `PROFILE_FILE` should imply `PROFILE` ^= `CPU`
- `OCRD_PROFILE` contains `CPU`: do CPU profiling (== current `--profile` flag)
- `OCRD_PROFILE` contains `RSS`: do RSS memory profiling
- `OCRD_PROFILE` contains `PSS`: do proportionate memory profiling
- If `OCRD_PROFILE` contains both `PSS` and `RSS`: ~~raise error~~ too strict, `PSS` takes precedence

Did I catch everything?
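A rough editorial sketch of the semantics listed above (not the final implementation; the function name and return shape are made up for illustration):

```python
import os

def get_profile_config():
    profile = os.getenv('OCRD_PROFILE')
    profile_file = os.getenv('OCRD_PROFILE_FILE')
    if profile_file and profile is None:
        # a profile file implies CPU profiling
        profile = 'CPU'
    profile = profile or ''
    do_cpu = 'CPU' in profile
    do_mem = 'RSS' in profile or 'PSS' in profile
    # PSS takes precedence over RSS if both are given
    mem_backend = 'psutil_pss' if 'PSS' in profile else 'psutil'
    return do_cpu, do_mem, mem_backend, profile_file
```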
Environment variables look good to me.
In the readme, I would also suggest referencing an external link where the difference between RSS and PSS is explained or at least briefly described. This may not be the best suggestion, but please consider it.
yes, sounds reasonable and complete AFAICS – thx!
I wonder if it would be much effort to incorporate `gputil` for (avg/peak) GPU memory usage statistics. (Or perhaps one could add it to `memory_profiler` as an alternative …
I'm afraid it's not that simple at all. GPUtil only gives snapshots, no aggregation/statistics there. And it looks like one would need to use the respective ML framework's facilities for process/session measurements anyway. (See here and there for PyTorch. TF now also has things like `get_memory_info` / `get_memory_usage` / `reset_memory_stats`.) Let's treat that as off-topic for now!
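For reference only (clearly off-topic for this PR), a minimal sketch of framework-level peak measurement in PyTorch; it assumes `torch` with a working CUDA device:

```python
import torch

def run_with_gpu_peak(fn, *args, **kwargs):
    """Run fn and report the peak CUDA memory allocated by this process (MiB)."""
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    peak_mib = torch.cuda.max_memory_allocated() / 1024**2
    return result, peak_mib
```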
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
```markdown
* `RSS`: Enable RSS memory profiling
* `PSS`: Enable proportionate memory profiling
* `OCRD_PROFILE_FILE`: If set, then the CPU profile is written to this file for later perusal with analysis tools like [snakeviz](https://jiffyclub.github.io/snakeviz/)
```
Suggested change:

```markdown
Logging is configured via configuration files, see [module logging](./ocrd_utils#ocr-d-module-logging).
```
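For context, a small sketch of how a profile file consumable by snakeviz can be produced; it assumes the CPU profile is a `cProfile`/`pstats` dump (which snakeviz reads), and `work()` is just a stand-in for a processor run:

```python
import cProfile

def work():
    # stand-in for processor.process()
    sum(i * i for i in range(10**6))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats('my.prof')  # inspect later with: snakeviz my.prof
```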
When running processors with `ocrd process`, we already measure the wall and CPU time of a processor run. This adds basic tracking of memory usage with https://github.com/pythonprofilers/memory_profiler. Currently, this just outputs a sparkline of memory usage and max/min memory, e.g.
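Beyond that output, here is an editorial, self-contained sketch of the measuring approach discussed above, combining wall/CPU timing with `memory_usage` sampling (the sampling interval, peak calculation and function boundaries are assumptions, not the merged code):

```python
from time import perf_counter, process_time
from memory_profiler import memory_usage

def run_measured(processor):
    """Run processor.process() while recording wall time, CPU time and memory samples."""
    t0_wall, t0_cpu = perf_counter(), process_time()
    mem_usage = memory_usage(
        proc=processor.process,
        interval=.1, timeout=None, timestamps=True,
        # also sample child processes (shell calls, multiprocessing workers)
        multiprocess=True, include_children=True)
    wall = perf_counter() - t0_wall
    cpu = process_time() - t0_cpu
    peak_mib = max(mem for mem, _ts in mem_usage)
    return wall, cpu, peak_mib
```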