This is the README file for the
PyLauncher
by
Victor Eijkhout eijkhout@tacc.utexas.edu
The pylauncher is a Python-based parametric job launcher, that is, a utility for executing many small runs in one batch submission. On many batch-based cluster computers this is a better strategy than submitting many individual small jobs.
The latest version of the pylauncher is always available from the repository: https://github.com/TACC/pylauncher
The only sources required for running are pylauncher.py and hostlist.py (if the latter is already installed on your system, you do not even need that).
The pylauncher is used from inside a (Slurm or PBS or SGE or whatever) job script. In your job script make sure that pylauncher.py is where python can find it, then:
python my_pylauncher_script.py
where the script contains
import pylauncher
pylauncher.ClassicLauncher( commandlinesfile )
Here ClassicLauncher is the simplest launcher -- more sophisticated launchers are discussed below, and commandlinesfile is a file containing one commandline per line.
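Putting the pieces together, a job script might look like the following sketch for Slurm; the job name, resource values, the PYTHONPATH location, and the script name my_pylauncher_script.py are example values, not prescriptions:

```shell
#!/bin/bash
#SBATCH -J mysweep        # job name        (example value)
#SBATCH -N 2              # number of nodes (example value)
#SBATCH -t 01:00:00       # wall time       (example value)

# make sure pylauncher.py (and hostlist.py) can be found by python:
export PYTHONPATH=$PYTHONPATH:$HOME/pylauncher

python my_pylauncher_script.py
```

For PBS or SGE, replace the #SBATCH directives with the equivalents of your batch system.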
Your file of commandlines can be simple, containing a program invocation for a sequence of parameters:
./my_program 1
./my_program 2
./my_program 3
We will refer to these lines as "tasks".
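For a larger sweep, such a file is easily generated programmatically. A minimal sketch, in which "./my_program" and the parameter range are placeholders for your own setup:

```python
# Sketch: generate the commandlines file for a parameter sweep.
# "./my_program" and the range 1..3 are placeholders for your own setup.
with open("commandlines", "w") as f:
    for p in range(1, 4):
        f.write(f"./my_program {p}\n")
```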
Tasks can be more complicated:
mkdir run1 && cd run1 && ./my_program 1 > out1
mkdir run2 && cd run2 && ./my_program 2 > out2
# et cetera
(Blank lines and comment lines, recognizable by the hash symbol, are ignored in this file.)
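To illustrate that rule, here is a small filter that keeps only the active task lines. This mirrors the stated behavior but is not pylauncher's own parser:

```python
def active_tasks(lines):
    """Keep only lines that are neither blank nor comments (hash-prefixed)."""
    tasks = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            tasks.append(stripped)
    return tasks
```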
In the simplest case, pylauncher assigns one task per core; as cores finish their tasks, they receive new tasks until the file of commandlines is exhausted.
Launchers, including the classic launcher, can take several options.
By default, each task gets one core.
The cores option can be used to assign more cores per task.
For example:
ClassicLauncher( "commandlines",cores=4 )
assigns four cores per task. You can use that for multi-threaded tasks; alternatively, you can use it simply to give each task four times as much memory.
If you want each task to have all the cores (and the memory)
of a node, use cores="node".
If specifying a uniform core count is limiting,
you can specify cores="file".
In this case the commandlines have the core count as prefix:
1,./simple_program
5,./medium_program
16,./big_program
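Such a line splits at the first comma into a core count and a command. A sketch of that convention (again, not pylauncher's internal parser):

```python
def parse_corecount_line(line):
    # Split "5,./medium_program" into (5, "./medium_program");
    # maxsplit=1 so any commas inside the command itself survive.
    count, command = line.split(",", 1)
    return int(count), command
```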
You can limit the runtime of individual tasks: taskmaxruntime=60 allows each task at most one minute.
By default, pylauncher outputs some statistics at the end of the run.
For purposes of tracing and debugging you can add a debug option.
Minimally, debug="job" outputs job progress.
For more output, debug="job+host+exec".
The job also produces a file queuestate.
In cases where your batch job is killed for exceeding its time limit
this file can be used to restart the job.
Each launcher run also generates a work directory.
The name by default includes the job id,
giving something like pylauncher_tmp_1234567.
- The option workdir="my_own_tmp_name" can be used to specify a non-default name.
- The work directory contains (among much more) files out0, out1, out2, et cetera, that contain the standard output and error streams of the tasks.
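After a run you will typically want to inspect those per-task output files. A minimal sketch; the work-directory name is whatever your run produced:

```python
import glob
import os

def task_outputs(workdir):
    # Return the per-task output files (out0, out1, ...) from a
    # pylauncher work directory. Note: the sort is lexicographic,
    # so out10 sorts before out2.
    return sorted(glob.glob(os.path.join(workdir, "out*")))
```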
Let me first stress that in 95 percent of cases the ClassicLauncher is the right choice. Here are some exceptional use cases.
The ClassicLauncher, through the cores option,
can handle multi-threaded parallelism but not MPI parallelism.
If your tasks involve an MPI program, use:
pylauncher.IbrunLauncher( "commandlines",cores=10 )
For tasks that need a GPU, use the GPULauncher.
Since the number of GPUs is not easily detected by the launcher,
you need to specify it with an option such as gpuspernode=3, or whatever the number is on your system.
Pylauncher uses a short delay between starting tasks.
This prevents excessive file system activity.
You can shorten this delay with the option delay=.1.
Still, it takes some time to fill up all the cores.
For this there is the option schedule="block8",
which groups tasks in blocks of 8 that are started together.
If tasks are dynamically generated by another process, you can use
job = pylauncher.DynamicLauncher() # no commandline file!
job.append( "./my_program" ) # any number of times
job.finish() # declare no more tasks
job.tick() # delay, and process tasks
If your batch job was killed for exceeding runtime,
or because of a hardware failure,
you can use the queuestate file to restart the job
and execute only the tasks that did not finish.
If you are a TACC or XSEDE/ACCESS user, please submit a ticket in the respective ticket system. Otherwise, create an issue in the GitHub repo. You can also mail me, putting "pylauncher" somewhere in the subject.