Add what's needed to make probes work with normal cron #482
ericvaandering wants to merge 1 commit into `master`
Pull request overview
This PR adds support for scheduling the probes in the Rucio container with standard cron instead of Jobber. `run-probes.sh` now chooses cron when the `RUCIO_USING_CRON` environment variable is set to a non-empty value and uses Jobber otherwise. A new Python utility script, `DeterministicDelay`, delays each job by a deterministic amount so that the cron load is spread out.
Changes:
- Modified `run-probes.sh` to use either cron or Jobber, depending on an environment variable
- Added installation of the `cronie-noanacron` package to the Dockerfile
- Introduced a new `DeterministicDelay` Python script that spreads out cron job execution times
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| probes/run-probes.sh | Added conditional logic to choose between cron and Jobber scheduler based on RUCIO_USING_CRON environment variable |
| probes/Dockerfile | Added cronie-noanacron package and DeterministicDelay script to support cron functionality |
| probes/DeterministicDelay | New utility script that delays command execution by a deterministic amount based on command hash |
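The idea behind `DeterministicDelay` can be sketched as below. The review excerpts show `delay = checksum % max_delay` but not how `checksum` is computed, so this sketch assumes a CRC-32 (`zlib.crc32`) of the space-joined command line; `command_checksum` and `deterministic_delay` are hypothetical names. The same command always maps to the same offset in `[0, max_delay)`, so jobs scheduled for the same cron minute are spread out while each one keeps a stable start time.

```python
import zlib
from typing import Sequence


def command_checksum(command: Sequence[str]) -> int:
    """Stable, non-negative checksum of a command line.

    Assumption: CRC-32 of the space-joined arguments. Python's built-in
    hash() is not suitable, because str hashing is randomized per process
    (PYTHONHASHSEED), so the delay would change on every cron run.
    """
    return zlib.crc32(" ".join(command).encode("utf-8"))


def deterministic_delay(command: Sequence[str], max_delay: int) -> int:
    """Seconds to sleep before running `command`, in [0, max_delay)."""
    if max_delay <= 0:
        return 0  # no window to spread over: run immediately
    return command_checksum(command) % max_delay
```

Calling `deterministic_delay(["echo", "probe-a"], 600)` twice returns the same value, which is what lets a crontab keep a fixed schedule while still staggering jobs.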
`probes/DeterministicDelay`
```python
else:
    max_delay = int(args.delay)

delay = checksum % max_delay
```
If the delay argument is `0` or `0s`, `max_delay` will be 0, so computing `checksum % max_delay` on line 41 raises a `ZeroDivisionError`. The script should validate that `max_delay` is greater than 0, or handle the zero case by not delaying at all.
Suggested change:
```diff
-delay = checksum % max_delay
+if max_delay <= 0:
+    delay = 0
+else:
+    delay = checksum % max_delay
```
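The suggested guard can be checked in isolation with a small sketch. `pick_delay` and `unguarded` are hypothetical names; in the script the logic is inline. Treating negative values like zero also matters: in Python, `checksum % -10` is non-positive, and `time.sleep()` raises `ValueError` for a negative argument.

```python
def pick_delay(checksum: int, max_delay: int) -> int:
    """Suggested-change logic: a non-positive window means 'run now'."""
    if max_delay <= 0:
        return 0
    return checksum % max_delay


def unguarded(checksum: int, max_delay: int) -> int:
    """Original logic, kept only to reproduce the failure."""
    return checksum % max_delay


try:
    unguarded(123456, 0)
    zero_division_raised = False
except ZeroDivisionError:
    zero_division_raised = True
```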
`probes/run-probes.sh`
```sh
cp /etc/jobber-config/dot-jobber.yaml /root/.jobber
if [ ! -z "$RUCIO_USING_CRON" ]; then
    echo "Setting up and starting cron"
    cp /etc/cron.rucio/probes-crontab /etc/cron.d/
```
The script references `/etc/cron.rucio/probes-crontab`, but the Dockerfile does not add this file to the image. When `RUCIO_USING_CRON` is set, the copy will fail and the container will fail with it. Either add the crontab file in the Dockerfile (e.g. `ADD probes-crontab /etc/cron.rucio/`) or create it during the build.
Suggested change:
```diff
-cp /etc/cron.rucio/probes-crontab /etc/cron.d/
+if [ -f /etc/cron.rucio/probes-crontab ]; then
+    cp /etc/cron.rucio/probes-crontab /etc/cron.d/
+else
+    echo "Warning: /etc/cron.rucio/probes-crontab not found; skipping probes cron configuration."
+fi
```
`probes/Dockerfile`
```dockerfile
ADD rucio.config.default.cfg /tmp/

ADD run-probes.sh /
ADD DeterministicDelay /
```
The `DeterministicDelay` script is added to the container but is not made executable. Add a `RUN chmod +x /DeterministicDelay` instruction after the `ADD`, or make the file executable before it is added to the image. Without execute permission, cron jobs cannot run the script directly.
Suggested change:
```diff
 ADD DeterministicDelay /
+RUN chmod +x /DeterministicDelay
```
`probes/DeterministicDelay`
```python
executed = subprocess.run(args.commands, capture_output=True)

print("STDOUT")
print(executed.stdout)
print("STDERR")
print(executed.stderr)
```
The `subprocess.run` call neither checks the return code nor propagates the exit status of the executed command, so cron jobs will always appear to succeed even when the command fails. Consider adding `sys.exit(executed.returncode)` after printing the output to propagate the exit status.
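A minimal sketch of the exit-status propagation the comment asks for. `run_and_report` and `main` are hypothetical names (the script does this inline, and `main` here reads `sys.argv` directly instead of the script's argparse setup). Returning the child's `returncode` and handing it to `sys.exit` lets cron, or any wrapper that checks `$?`, see the failure.

```python
import subprocess
import sys
from typing import List


def run_and_report(commands: List[str]) -> int:
    """Run `commands`, print its output, and return the child's exit status."""
    # text=True so stdout/stderr are str rather than bytes
    executed = subprocess.run(commands, capture_output=True, text=True)
    print("STDOUT")
    print(executed.stdout)
    print("STDERR")
    print(executed.stderr)
    return executed.returncode


def main() -> None:
    # Exit with the wrapped command's status instead of an implicit 0.
    sys.exit(run_and_report(sys.argv[1:]))
```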
`probes/DeterministicDelay`
```python
print("STDOUT")
print(executed.stdout)
print("STDERR")
print(executed.stderr)
```
With `capture_output=True`, `subprocess.run` returns the output as bytes, and printing it directly produces `b'...'` instead of the actual text. Decode the bytes before printing, e.g. `print(executed.stdout.decode('utf-8'))` and `print(executed.stderr.decode('utf-8'))`, or pass `text=True` to `subprocess.run` to get strings directly.
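The difference between the two options can be seen in a short sketch. The child command is a stand-in that runs the current interpreter (`sys.executable`) so it does not depend on `PATH`; `errors="replace"` is an optional extra that keeps a stray non-UTF-8 byte from crashing the wrapper.

```python
import subprocess
import sys

# Stand-in child process that writes one line to stdout.
cmd = [sys.executable, "-c", "print('probe ok')"]

# Default: captured output is bytes, so print() would show b'probe ok\n'.
raw = subprocess.run(cmd, capture_output=True)

# Option 1: decode explicitly; errors="replace" tolerates invalid bytes.
decoded = raw.stdout.decode("utf-8", errors="replace")

# Option 2: let subprocess decode for us.
as_text = subprocess.run(cmd, capture_output=True, text=True)
```

Option 2 is the smaller change to the script; option 1 gives explicit control over the encoding and error handling.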
`probes/DeterministicDelay`
```python
if args.delay[-1] in ['s', 'm', 'h', 'd']:
    delay_value, delay_unit = args.delay[:-1], args.delay[-1]
    if delay_unit == 'm':
        max_delay = int(delay_value) * 60
    elif delay_unit == 'h':
        max_delay = int(delay_value) * 60 * 60
    elif delay_unit == 'd':
        max_delay = int(delay_value) * 24 * 60 * 60
    else:
        max_delay = int(delay_value)
else:
    max_delay = int(args.delay)
```
The delay argument parsing does not handle invalid input: a non-numeric value (e.g. `-d abc` or `-d 5x`) makes the `int()` conversion raise a `ValueError`. Consider wrapping the parsing in a `try`/`except` block so the user gets a meaningful error message instead of an unhandled exception.
Suggested change:
```diff
-if args.delay[-1] in ['s', 'm', 'h', 'd']:
-    delay_value, delay_unit = args.delay[:-1], args.delay[-1]
-    if delay_unit == 'm':
-        max_delay = int(delay_value) * 60
-    elif delay_unit == 'h':
-        max_delay = int(delay_value) * 60 * 60
-    elif delay_unit == 'd':
-        max_delay = int(delay_value) * 24 * 60 * 60
-    else:
-        max_delay = int(delay_value)
-else:
-    max_delay = int(args.delay)
+try:
+    if args.delay[-1] in ['s', 'm', 'h', 'd']:
+        delay_value, delay_unit = args.delay[:-1], args.delay[-1]
+        if delay_unit == 'm':
+            max_delay = int(delay_value) * 60
+        elif delay_unit == 'h':
+            max_delay = int(delay_value) * 60 * 60
+        elif delay_unit == 'd':
+            max_delay = int(delay_value) * 24 * 60 * 60
+        else:
+            max_delay = int(delay_value)
+    else:
+        max_delay = int(args.delay)
+except ValueError:
+    parser.error(
+        f"Invalid delay value '{args.delay}'. Delay must be a non-negative integer "
+        "optionally suffixed with s, m, h, or d."
+    )
```
`probes/DeterministicDelay`
```python
print(f"Sleeping {delay}s")
time.sleep(delay)

executed = subprocess.run(args.commands, capture_output=True)
```
When no commands are provided (an empty `args.commands` list), `subprocess.run` raises an exception. The script should check that at least one command was given and print a meaningful error message, e.g. by adding `if not args.commands: parser.error('No commands provided')` after parsing the arguments.
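A sketch of that validation, assuming the positional `commands` argument uses `nargs="*"` (the review does not show the real `add_argument` calls, and `build_parser`/`parse` are hypothetical names). `parser.error()` prints the usage line and exits with status 2, so cron logs a clear message instead of a traceback.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run a command after a deterministic delay (sketch)."
    )
    parser.add_argument("-d", "--delay", default="0")
    parser.add_argument("commands", nargs="*")  # assumed; see note below
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.commands:
        parser.error("No commands provided")  # raises SystemExit(2)
    return args
```

Declaring the positional with `nargs="+"` instead would make argparse reject an empty list by itself, at the cost of its generic "the following arguments are required" message.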
I'm told no one else uses this container, so it should be OK to merge. But comments welcome on the PR regardless.