can we identify (and re-route) intrinsically CPU-intensive workflows? #17

In [this](https://opensciencegrid.slack.com/archives/CJBRGH1EJ/p1559591919004800) OSG Slack thread, after the PRP folks observed/complained that they were seeing unusually low GPU utilization by RIFT jobs, @astroclark explained:

> turns out the latest batch of RIFT jobs that were submitted are intrinsically more CPU-intensive (a more expensive waveform approximant SEOBNRv4 fwiw).  The waveform generation - CPU-bound - in this case is more expensive than the likelihood calculations - the GPU part.  They're aware and agree that it would make more sense to run this type of job on CPUs.

This all makes sense and is not a problem, but sparks some questions and thoughts for me:

1. How common is this (as a proportion of all the RIFT workflows you run, over time)?
1. In practice, do/can you know in advance which runs will behave like this?  Or is it something you can only really discover after you've run a workflow?
1. Does it make sense to assign runs to CPUs or GPUs manually and ad hoc based on this knowledge, or can it be done programmatically, at either workflow-generation time or run time, so humans aren't in the loop?
1. Do you think it might be a good idea to instrument RIFT to collect some basic performance data while it runs, and then report, post facto, the CPU and GPU utilization of each run as part of its results?  I'm going to turn this last question into its own ticket (#16), because PyCBC did this a long time ago and it's been _enormously_ helpful: it lets you set alarms when things go outside expected bounds (e.g., CPU utilization approaches zero), run reports on RIFT performance over time, run automated performance regression tests between RIFT versions, etc.
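
To make the "programmatically, at workflow-generation time" option in question 3 concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `CPU_BOUND_APPROXIMANTS`, `ResourceRequest`, and `choose_resources` are names I made up, not anything in RIFT, and the only approximant I know belongs in the set is SEOBNRv4 (from the thread above). The field names mirror HTCondor's `request_cpus` / `request_gpus` submit commands, on the assumption that the workflow generator writes HTCondor submit files.

```python
from dataclasses import dataclass

# Hypothetical lookup table: approximants for which waveform generation (CPU)
# is assumed to dominate the likelihood calculation (GPU).
# Only SEOBNRv4 comes from the Slack thread; anything else would need measuring.
CPU_BOUND_APPROXIMANTS = frozenset({"SEOBNRv4"})


@dataclass(frozen=True)
class ResourceRequest:
    """Resources to request for one job (names mirror HTCondor submit commands)."""
    request_cpus: int
    request_gpus: int


def choose_resources(approximant: str,
                     cpu_bound: frozenset = CPU_BOUND_APPROXIMANTS) -> ResourceRequest:
    """Route known CPU-bound approximants to CPU-only slots, everything else to GPUs."""
    if approximant in cpu_bound:
        # A GPU would mostly sit idle here, so don't ask for one.
        return ResourceRequest(request_cpus=1, request_gpus=0)
    return ResourceRequest(request_cpus=1, request_gpus=1)
```

For example, `choose_resources("SEOBNRv4")` asks for no GPU, while an approximant that isn't in the table falls through to the GPU default. A static table like this only helps if the answer to question 2 is "yes, we can know in advance"; if not, the decision would have to be driven by measured performance instead.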



