This is the README file for the

PyLauncher

by

Victor Eijkhout eijkhout@tacc.utexas.edu

The pylauncher is a python-based parametric job launcher, that is, a utility for executing many small runs in one batch submission. On many batch-based cluster computers this is a better strategy than submitting many individual small jobs.

The latest version of the pylauncher is always available from the repository: https://github.com/TACC/pylauncher

The only required source files for running are pylauncher.py and hostlist.py (if the latter is already installed on your system you don't even need that one).

Basic usage

The pylauncher is used from inside a (Slurm or PBS or SGE or whatever) job script. In your job script make sure that pylauncher.py is where python can find it, then:

python my_pylauncher_script.py

where the script contains

import pylauncher
pylauncher.ClassicLauncher( commandlinesfile )

Here ClassicLauncher is the simplest launcher -- more sophisticated launchers are discussed below, and commandlinesfile is a file containing one commandline per line.
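As an example, a minimal Slurm job script could look like the following sketch; the job name, node count, time limit, and any other directives are placeholders that you should replace with values appropriate for your cluster and account:

```shell
#!/bin/bash
#SBATCH -J pylauncher-run     # job name (placeholder)
#SBATCH -N 2                  # number of nodes (placeholder)
#SBATCH -t 00:30:00           # time limit (placeholder)

# make sure pylauncher.py is where python can find it, e.g. via PYTHONPATH,
# then run the script that invokes the launcher
python my_pylauncher_script.py
```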

Your file of commandlines can be simple, containing a program invocation for a sequence of parameters:

./my_program 1
./my_program 2
./my_program 3

We will refer to these lines as "tasks".
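A file of tasks like this can also be generated by a small script rather than written by hand. A sketch in Python, where the file name "commandlines" is whatever name you pass to the launcher, and my_program and the parameter range stand in for your own program and parameters:

```python
# Generate a commandlines file with one task per line,
# which is the format the launcher expects.
with open("commandlines", "w") as f:
    for p in range(1, 4):
        f.write(f"./my_program {p}\n")
```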

Tasks can be more complicated:

mkdir run1 && cd run1 && ./my_program 1 > out1
mkdir run2 && cd run2 && ./my_program 2 > out2
# et cetera

(Blank lines and comment lines, recognizable by the hash symbol, are ignored in this file.)
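That filtering rule can be sketched as follows; is_task is a hypothetical helper for illustration only, not part of the pylauncher API:

```python
def is_task(line):
    """Return True if a line of the commandlines file counts as a task:
    blank lines and lines starting with a hash symbol are ignored."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")
```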

In the simplest case, pylauncher assigns one task per core, and as cores finish their tasks, they receive new tasks until the file of commandlines is exhausted.

Classic launcher and standard options

Launchers, including the classic launcher, can take several options.

Parallelism

By default, each task gets one core. The cores option can be used to assign more cores per task. For example:

ClassicLauncher( "commandlines",cores=4 )

assigns four cores per task. You can use that for multi-threaded tasks; it also gives each task four times the memory of a single core.

If you want each task to have all the cores (and the memory) of a node, use cores="node".

If a uniform core count is too limiting, you can specify cores="file". In this case each commandline has its core count as a prefix:

1,./simple_program 
5,./medium_program
16,./big_program
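The core count is whatever comes before the first comma; pylauncher does this parsing internally. A sketch of the format, where parse_task is a hypothetical helper for illustration only:

```python
def parse_task(line):
    """Split a 'count,commandline' line into (core count, commandline)."""
    count, command = line.split(",", 1)
    return int(count), command.strip()
```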

More options

You can limit the runtime of individual tasks: taskmaxruntime=60 limits each task to one minute.

Job state, tracing

By default, pylauncher outputs some statistics at the end of the run. For purposes of tracing and debugging you can add a debug option. Minimally, debug="job" outputs job progress. For more output, debug="job+host+exec".

The job also produces a file queuestate. In cases where your batch job is killed for exceeding its time limit this file can be used to restart the job.

Each launcher run also generates a work directory. The name by default includes the job id, giving something like pylauncher_tmp_1234567.

  • The option workdir="my_own_tmp_name" can be used to specify non-default names.
  • The work directory contains (among much more) files out0, out1, out2 et cetera that contain the standard out and error streams of the tasks.
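If you want to inspect the task outputs afterwards, you can list those out files in task order. A sketch, where collect_outputs is a hypothetical helper (not part of pylauncher) and the work directory name depends on your job id or workdir option:

```python
import glob
import os

def collect_outputs(workdir):
    """Return the out<N> files of a pylauncher work directory,
    sorted numerically by task number (out2 before out10)."""
    files = glob.glob(os.path.join(workdir, "out[0-9]*"))
    return sorted(files, key=lambda f: int(os.path.basename(f)[3:]))
```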

Other use cases

Let me first stress that in 95 percent of cases the ClassicLauncher is the right choice. Here are some exceptional use cases.

MPI jobs

The ClassicLauncher, through the cores option, can handle multi-threaded parallelism but not MPI parallelism. If your tasks involve an MPI program, use:

pylauncher.IbrunLauncher( "commandlines",cores=10 )

GPUs

For tasks that need a GPU, use the GPULauncher. Since the number of GPUs is not easily detected by the launcher, you need to specify an option gpuspernode=3 or whatever the number is.

Very short tasks

Pylauncher uses a short delay between starting tasks, to prevent excessive file system activity. You can shorten this delay with the option delay=.1.

Still, it takes some time to fill up all the cores. For this there is the option schedule="block8", which groups tasks in blocks of 8 that are started together.

Dynamically generated tasks

If tasks are dynamically generated by another process, you can use

job = pylauncher.DynamicLauncher() # no commandline file!

job.append( "./my_program" ) # any number of times

job.finish() # declare no more tasks

job.tick() # delay, and process tasks

Resuming an aborted job

If your batch job was killed for exceeding runtime, or because of a hardware failure, you can use the queuestate file to restart the job and execute only the tasks that did not finish.

Support

If you are a TACC or XSEDE/ACCESS user, please submit a ticket in the respective ticket system. Otherwise, create a ticket in the GitHub repo. You can also mail me, putting "pylauncher" somewhere in the subject.
