This repo contains scripts related to data flow and management, from file generation to distribution. Each server uses different scripts, but all are contained in this repo. Directories are separated by server and usage.
- borealis - Scripts run on the site Borealis computers (bore206, main207). These scripts transfer Borealis rawacf and antennas_iq data files from the computer to the site NAS (nas203).
- site-linux - Scripts run on the site distribution computers (dist204, dist205). These computers perform extra data processing and transfer files from the site NAS to campus.
- campus - Scripts run on the on-campus data server (sdc-serv). These scripts verify files received from site and move them to respective directories for data backup, staging for other institutions, and staging for the SuperDARN data mirror.
- mirror - Scripts run on the on-campus data server (sdc-serv). These scripts control the USask SuperDARN mirror using Globus scripts and sync data with the other SuperDARN mirrors.
- inotify_daemons - Scripts that use `inotifywait` to monitor the data flow and trigger scripts once the preceding script has finished execution. These scripts are to be set up as a `systemd` service. Example service files are within the `inotify_daemons/services/` directory.
- library - Bash and Python functions and tools used by data flow scripts.
The SuperDARN data flow scripts execute in the following order:
1. `borealis/rsync_to_nas`: Moves 2-hour blocks of Borealis files (rawacf, antennas_iq) to the site storage (default is the site NAS). Triggered via `borealis.daemon` when the next 2-hour Borealis file is first written.
2. `site-linux/plot_antennas_iq`: Reads in the antennas_iq files after the previous script has run and creates plots of the iq data for each rx path using the last generated file. Triggered via `site-linux.daemon` once `borealis.daemon` finishes running.
3. `site-linux/rsync_to_campus`: Moves rawacf files and iq plots from the site NAS to the university campus server sdc-serv. Triggered via `site-linux.daemon` when `plot_antennas_iq` finishes executing.
4. `campus/convert_on_campus`: Converts rawacf array files to DMAP files for sites specified in `config.sh`. Sites that don't need to convert anything simply skip to the end of the script. Triggered via `campus.daemon` when `site-linux.daemon` finishes running.
5. `campus/distribute_borealis_data`: Copies DMAP files to the respective directories for distribution to other institutions, the Globus mirror, and CEDAR. Backs up DMAP and array files to the campus NAS. Triggered via `campus.daemon` when `convert_on_campus` finishes executing.
6. `campus/archive_iq_plots`: Archives any iq plots on campus that are over 24 hours old by moving them to an archive directory. Triggered via `campus.daemon` when `distribute_borealis_data` finishes executing.
Each script is triggered by an inotify daemon unique to each computer. These daemons run on each of
the data flow computers (borealis, site-linux, and sdc-serv) and run sequentially using
inotify to trigger each daemon script in order. Hidden directories `.inotify_flags/` and
`inotify_watchdir/` are created and used to manage inotify flags. The daemon scripts are as follows:
- `borealis.daemon`: Runs on the Borealis computer. Executes `rsync_to_nas` when inotify sees a new 2-hour Borealis file get created. When `rsync_to_nas` finishes, the daemon sends a "flag" file to the Site-Linux computer to trigger the next data flow daemon.
- `site-linux.daemon`: Runs on the Site-Linux computer. Executes `plot_antennas_iq` and `rsync_to_campus` sequentially as soon as `rsync_to_nas` finishes. Triggered by the flag sent by `borealis.daemon`.
- `campus.daemon`: Runs on sdc-serv. Executes `convert_on_campus`, `distribute_borealis_data`, and `archive_iq_plots` sequentially as soon as `rsync_to_campus` finishes. Triggered by the flag sent by `site-linux.daemon`.
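The flag handoff these daemons rely on can be sketched roughly as follows. This is an illustrative sketch only, not the repo's actual daemon code: the flag naming convention, the use of `/tmp` for staging, and `inotifywait --include` (available in inotify-tools 3.20+) are all assumptions.

```shell
#!/bin/bash
# Illustrative sketch of the inotify flag handoff between data flow computers.
# Names and conventions here are assumptions, not the repo's actual code.

WATCH_DIR="${HOME}/.inotify_watchdir"

# Assumed flag-file naming convention for a given upstream script
flag_name() {
    echo ".${1}_flag"
}

# Upstream side: after a script finishes, send a flag file to the next computer
send_flag() {
    local script="$1" remote="$2"
    touch "/tmp/$(flag_name "$script")"
    rsync -av "/tmp/$(flag_name "$script")" "${remote}:${WATCH_DIR}/"
}

# Downstream side: block until the flag file appears, then run the next script
wait_and_run() {
    local script="$1"; shift
    mkdir -p "$WATCH_DIR"
    inotifywait -e create --include "$(flag_name "$script")" "$WATCH_DIR"
    "$@"
}
```

A downstream daemon would loop over `wait_and_run rsync_to_nas ./plot_antennas_iq`, for example, so each stage only starts once the preceding one has signalled completion.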
Each of these daemons is configured through systemd, as described below. Example .service files
are provided in inotify_daemons/services/.
To make a daemon with systemd, create a .service file within `/usr/lib/systemd/system/` (this
requires super user privileges). For example, `borealis.daemon` is run with the following
`borealis_dataflow.service` file:
```ini
[Unit]
Description=Borealis data flow inotify daemon

[Service]
User=radar
ExecStart=/home/radar/data_flow/inotify_daemons/borealis.daemon
Restart=always

[Install]
WantedBy=multi-user.target
```
Useful systemctl commands for operating systemd daemons:
- `systemctl daemon-reload`
- `systemctl enable borealis_dataflow.service`
- `systemctl start borealis_dataflow.service`
- `systemctl status borealis_dataflow.service`
- `systemctl restart borealis_dataflow.service`
- `systemctl stop borealis_dataflow.service`
- `systemctl disable borealis_dataflow.service`
The University of Saskatchewan has a firewall which blocks an IP address from connecting to the
campus network if 50 connection requests are made within one minute. The rsync_to_campus script
can easily hit this mark when sending antennas_iq plots, so SSH multiplexing is recommended. To
configure this:
- Log into the remote computer that will be running the `rsync_to_campus` script (i.e. a Site-Linux computer).
- `cd ~/.ssh`
- `mkdir controlmasters`
- Edit the file called `config`, adding the following (`[xxx]` indicates a value to fill in):

  ```
  Host [address of the campus computer, either hostname or IP]
      User [username]
      ControlPath ~/.ssh/controlmasters/%C
      ControlMaster auto
      ControlPersist 10m
  ```
- Verify that the settings are working by running `ssh -N -f [username]@[address]; ssh -O check [username]@[address]`, where username and address are the same as those set in the config file. The output will be something like:
Success:

```
transfer@pgrdist205:~> ssh -N -f dataman@sdc-serv.usask.ca; ssh -O check dataman@sdc-serv.usask.ca
Master running (pid=15739)
```

Failure:

```
transfer@pgrdist205:~> ssh -N -f dataman@sdc-serv.usask.ca; ssh -O check dataman@sdc-serv.usask.ca
Control socket connect(/home/transfer/.ssh/controlmasters/3354587955ba492d0d5f595f8619d902ac0192a7): No such file or directory
```
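With multiplexing configured, a small guard can make sure the master connection is up before each transfer, so repeated rsyncs reuse one TCP connection instead of opening new ones. This is a hypothetical helper, not part of the repo's scripts; the destination in the comment is taken from the example output above.

```shell
#!/bin/bash
# Hypothetical helper: ensure an SSH ControlMaster connection exists before
# transferring, so repeated rsync calls reuse a single connection.

ensure_master() {
    local dest="$1"
    # `ssh -O check` exits 0 if a master connection is already running
    if ! ssh -O check "$dest" 2>/dev/null; then
        # Start a background master connection (-N: no command, -f: background)
        ssh -N -f "$dest"
    fi
}

# Example usage:
# ensure_master dataman@sdc-serv.usask.ca && rsync -av plots/ dataman@sdc-serv.usask.ca:plots/
```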
To use this data flow repository, follow these steps:
- Clone the data flow repository: `git clone https://github.com/SuperDARNCanada/data_flow.git`
- Install `inotifywait` via zypper with `sudo zypper in inotify-tools` if it is not already installed.
- Set up ssh between the current computer and the next computer in the data flow so it works without a password. This is required for sending inotify flags between computers.
  - As the user running the data flow, create a key (if one does not already exist): `ssh-keygen -t ecdsa -b 521`
  - Copy the public key to the destination computer: `ssh-copy-id user@host`
  - Computers that must be linked: Borealis -> Site-Linux, Site-Linux -> sdc-serv
  - For telemetry purposes, each data flow computer must also be linked to the logman user on sdc-serv, so copy the ssh keys to logman@sdc-serv as well.
- Install the inotify daemon for the respective computer (for example, install `borealis.daemon` with `borealis_dataflow.service` on the Borealis computer). As super user, do the following:
  - Copy the correct `.service` file from `inotify_daemons/services/` to `/usr/lib/systemd/system/`
  - Reload the daemons: `systemctl daemon-reload`
  - Enable the daemon: `systemctl enable [dataflow].service`
  - Start the daemon: `systemctl start [dataflow].service`
  - Check that the daemon is running: `systemctl status [dataflow].service`
  - To specify the radar for `campus_dataflow@.service` (using sas as an example): `systemctl [command] campus_dataflow@sas.service`
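The install procedure is just the commands above; as a convenience they can be wrapped in a small function. This is a sketch, not a script shipped in the repo, and the unit directory is a parameter so the copy step can be exercised without touching the real system.

```shell
#!/bin/bash
# Sketch of the daemon install steps above; run as super user on the
# target computer. Function name and structure are illustrative.

install_dataflow_service() {
    local service="$1"                         # e.g. borealis_dataflow.service
    local src="$2"                             # path to inotify_daemons/services
    local unit_dir="${3:-/usr/lib/systemd/system}"

    cp "${src}/${service}" "${unit_dir}/" || return 1
    systemctl daemon-reload
    systemctl enable "$service"
    systemctl start "$service"
    systemctl status "$service"
}

# Example:
# install_dataflow_service borealis_dataflow.service ~/data_flow/inotify_daemons/services
```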
- Ensure the pydarnio-env virtual environment is set up in the home directory and configured correctly.
  - To link `pydarnio-env/` to the current branch in the `~/pyDARNio` local repo, run the following commands:
    - `source ~/pydarnio-env/bin/activate`
    - `pip install -e ~/pyDARNio`
  - If the `-e` is omitted, pydarnio-env will simply be installed with the current branch of `~/pyDARNio` and won't be updated if the branch changes.
- Check the logs to ensure the data flow is working correctly.
  - The inotify daemon logs are available in the `~/logs/inotify_daemons/` directory.
  - The data flow script logs are available in the `~/logs/[script name]/` directory.
- For telemetry purposes, summary logs are available for each script in the `~/logs/[script name]/summary/` directory. These logs contain the status of all operations on each file and are easily parseable to monitor data flow operation. Each script rsyncs the summary files to logman@sdc-serv for uploading to the Engineering dashboard. Password-free SSH must be set up between each computer and logman@sdc-serv for this to work correctly.
- To modify the data flow easily, a `config.sh` file is provided. This file specifies:
  - Whether the data flow can use the NAS at a site
  - What Borealis filetypes are to be converted and restructured
  - Which sites have bandwidth / memory limitations
  - Where logs should be synced for telemetry
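A `config.sh` covering those options might look like the following. The variable names here are purely illustrative assumptions; consult the actual file in the repo for the real option names and values.

```shell
# Illustrative config.sh sketch -- variable names are assumptions, not the
# repo's actual options.
SITE_HAS_NAS=true                   # can the data flow use the NAS at this site?
CONVERT_FILETYPES="rawacf"          # Borealis filetypes to convert/restructure
LOW_BANDWIDTH_SITE=false            # site has bandwidth / memory limitations
TELEMETRY_DEST="logman@sdc-serv"    # where summary logs are synced for telemetry
```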