This is a template codespace for using the attpc_spyral analysis framework on Polaris, the supercomputer at the ALCF at Argonne National Laboratory. It provides a place to supply the appropriate configuration controls and contains a script that creates a PBS job script and submits it to PBS to run the analysis.
This template can be used as-is, or simply as a guide for how to set up a codespace to use attpc_spyral on Polaris.
This template comes with a bunch of preloaded examples and directories to demonstrate
how to run a Spyral job on Polaris. But first, you need to install Spyral and the
Dragon libraries to make Spyral node-aware. This can be done with the scripts in the
bin directory. If it is your first time using Spyral, run

```bash
source ./bin/create_spyral_env.sh
```

This will load the appropriate Polaris modules, create a virtual environment, install
Spyral and Dragon, and activate the environment.
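For reference, the rough shape of such a setup script is sketched below. This is illustrative only; the module and package names are assumptions, not the actual contents of bin/create_spyral_env.sh.

```bash
# Illustrative sketch only -- see bin/create_spyral_env.sh for the real steps.
module load cray-python          # load a Python module (module name is an assumption)
python -m venv spyral_env        # create a virtual environment (name is an assumption)
source spyral_env/bin/activate   # activate it
pip install attpc-spyral         # install Spyral (PyPI name is an assumption)
pip install dragonhpc            # install Dragon (PyPI name is an assumption)
```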
Note that for best performance, the LIBFABRIC_PATH environment variable should be
specified to use your system's High Speed Transport Agent (HSTA). If this variable
isn't set (or is invalid), Dragon will default to a slower TCP implementation.
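For example, to set it in your shell (using the Polaris value that also appears in the default .env file shown later):

```bash
export LIBFABRIC_PATH=/opt/cray/libfabric/1.15.2.0/lib64
```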
If you are returning to the codespace, you can use

```bash
source ./bin/activate_spyral_env.sh
```

which will load the modules and activate the environment. Finally, to deactivate the
environment you can use

```bash
source ./bin/deactivate_spyral_env.sh
```

which will deactivate the environment.
We'll start at the top with the file spyral_job.py. This is the main script you will
actually run when you want to use Spyral on Polaris. spyral_job.py does two things:
- Create a PBS script describing the job you want to run, including invoking the image containing the attpc_spyral install
- Submit the PBS script to the PBS system
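To give a sense of what gets generated, a PBS script produced from the example configuration shown later might look roughly like the sketch below (illustrative only; the directives and launch step in the real generated script may differ):

```bash
#!/bin/bash
#PBS -A attpc                   # project_name from the job configuration
#PBS -q debug                   # queue to submit to
#PBS -l select=1:system=polaris # number of (v)nodes requested
#PBS -l walltime=01:00:00       # walltime (60 minutes)
#PBS -o /some/path/somewhere/   # log_path for stdout
#PBS -e /some/path/somewhere/   # log_path for stderr

# ... environment setup, then the spyral_start_script is launched via Dragon ...
```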
This means you don't need to know much about how PBS and Polaris work to run Spyral
(though you should still read up on them). To use spyral_job.py, make sure you have your
environment active. You can then simply run the script as
```bash
python spyral_job.py COMMAND CONFIG
```

You should replace COMMAND with one of the following commands:
- help - prints the help message
- create - creates a job script
- submit - submits a job script to the requested queue (if the script does not exist it is created)
and you should replace CONFIG with the path to a JSON configuration file. An example
is provided in the configs directory of the template. See the Job Configuration
Parameters section for more details.
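For example, to submit a job using the example configuration (the file name here is an assumption; use whatever JSON file lives in your configs directory):

```bash
python spyral_job.py submit configs/config.json
```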
The other directories are for storing common Spyral data files, such as gas definitions
and particle IDs, and for defining extensions. An example Spyral script is also included,
example_spyral_script.py. There are a few differences from the normal scripts that
are important, mostly for working with Dragon. First, notice at the top of the script

```python
import dragon
```

This is critical. Dragon must be imported first. Then, at the bottom
```python
if __name__ == "__main__":
    multiprocessing.set_start_method("dragon")
    main()
```

Using multiprocessing.set_start_method("dragon") sets Dragon as the method by which
we start new processes. Both of these are critical for correctly using Spyral with Dragon!
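Putting both pieces together, a minimal Dragon-aware script has the following shape. This is a sketch only: the square function and pool size are placeholders, not part of example_spyral_script.py.

```python
import dragon  # must be the very first import so Dragon can hook multiprocessing
import multiprocessing


def square(x: int) -> int:
    # Placeholder work; a real script would run the Spyral pipeline here.
    return x * x


def main() -> None:
    # With the "dragon" start method, pool workers can be placed across nodes.
    with multiprocessing.Pool(4) as pool:
        print(pool.map(square, range(8)))


if __name__ == "__main__":
    multiprocessing.set_start_method("dragon")
    main()
```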
Here we'll go over the available configuration parameters for jobs with Spyral:

- pbs_script_path: The path to where you want to store the created PBS job script
- spyral_start_script: The script to be run by Spyral. It should exist at the same location as spyral_job.py.
- log_path: The location at which you would like PBS logs to be written
- project_name: The name of the project at ALCF that you will be running under
- queue: The queue you would like to submit the job to. See the Polaris docs
- nodes: The number of vnodes to use. See the Polaris docs
- walltime: How long the job is allowed to run, in units of minutes. See the Polaris docs

An example configuration:
```json
{
    "pbs_script_path": "/some/path/somewhere.pbs",
    "spyral_start_script": "some_script.py",
    "log_path": "/some/path/somewhere/",
    "project_name": "attpc",
    "queue": "debug",
    "nodes": 1,
    "walltime": 60
}
```

By default, the .env file looks like
```
LIBFABRIC_PATH=/opt/cray/libfabric/1.15.2.0/lib64
OMP_NUM_THREADS=1
OPENBLAS_NUM_THREADS=1
MKL_NUM_THREADS=1
VECLIB_MAXIMUM_THREADS=1
NUMEXPR_NUM_THREADS=1
POLARS_MAX_THREADS=1
```

In general, the only variable you need to modify is LIBFABRIC_PATH. This should be
set to the location of the HSTA (High Speed Transport Agent) library installation for
your system. By default, it is set to the appropriate value at the time of writing for
Argonne Leadership Computing Facility's Polaris system. The other variables most likely
do not need modification.
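If you are on a different system, one way to locate a libfabric installation (assuming a module-based, Cray-style environment; the module name and install prefix are assumptions and may differ on your machine) is:

```bash
# Assumption: a "libfabric" module exists on your system.
module show libfabric 2>&1 | grep -i lib
# Or search a common install prefix for the shared library directly:
ls /opt/cray/libfabric/*/lib64/libfabric.so* 2>/dev/null
```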