Open
Description
There seems to be a bug in CycleCloud PBS that my customer reported (CC version 8.7.1-3364 .... with pbspro):
Any user can make autoscaling stop working, i.e. no further jobs are handled, by specifying a memory size with uppercase units. For example:
$ echo sleep 300 | qsub -l select=1:ncpus=120:mem=4194304KB:mpiprocs=120
or
$ echo sleep 300 | qsub -l nodes=1:ppn=120 -l mem=4194304KB
OpenPBS handles this just fine, but /opt/cycle/pbspro/autoscale.log shows Python exceptions, and autoscaling no longer proceeds for the entire cluster.
The workaround is to manually alter the job so that the unit is lowercase, like this:
$ qalter -l select=1:ncpus=120:mem=4194304kb:mpiprocs=120 JOBID
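To avoid retyping the resource string by hand, the lowercase rewrite could be scripted. The following is only a sketch: `lowercase_size_units` is a hypothetical helper, not part of CycleCloud or PBS, and it assumes that only `mem` and `vmem` carry size values in the request.

```python
import re

# Hypothetical helper, not part of CycleCloud or PBS.
# Assumption: only mem= and vmem= hold size values in the resource string.
_SIZE_VALUE = re.compile(r"\b(mem|vmem)=(\d+)([A-Za-z]+)")


def lowercase_size_units(resource_spec: str) -> str:
    """Return resource_spec with the unit suffix of mem/vmem values lowercased."""
    return _SIZE_VALUE.sub(
        lambda m: "{}={}{}".format(m.group(1), m.group(2), m.group(3).lower()),
        resource_spec,
    )


if __name__ == "__main__":
    print(lowercase_size_units("1:ncpus=120:mem=4194304KB:mpiprocs=120"))
```

The rewritten string can then be passed to `qalter -l select=...` as in the workaround above.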
The resulting traceback in autoscale.log:
Traceback (most recent call last):
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/resource.py", line 154, in parse
    return HPCSize.value_of(expr)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/hpctypes.py", line 106, in value_of
    return Size._value_of(Size, value)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/hpctypes.py", line 142, in _value_of
    ).format(mag, mag)
RuntimeError: Unknown SizeMagnitude 'KB'. To register custom magnitudes, call hpc.autoscale.hpctypes.add_magnitude_conversion('KB', N), where N is the number of bytes.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 1892, in main
    args.func(**kwargs)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 436, in autoscale
    output_columns = output_columns or self._get_default_output_columns(config)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 1077, in _get_default_output_columns
    default_cmd = self._default_output_columns(config, cmd_name)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 76, in _default_output_columns
    env = self._pbs_env(driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 110, in _pbs_env
    self.__pbs_env = environment.from_driver(pbs_driver.config, pbs_driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/environment.py", line 58, in from_driver
    jobs = pbs_driver.parse_jobs(queues, default_scheduler.resources_for_scheduling)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/driver.py", line 594, in parse_jobs
    self.pbscmd, self.resource_definitions, queues, resources_for_scheduling
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/driver.py", line 747, in parse_jobs
    rdict = parser.convert_resource_list(res_list)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/parser.py", line 38, in convert_resource_list
    ret["select"] = self.parse_select(str(raw_dict["select"]))
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/parser.py", line 101, in parse_select
    value = self.resource_definitions[key].type.parse(value)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/resource.py", line 157, in parse
    "Could not parse '{}' as type size (e.g. 1mb)".format(expr)
pbspro.resource.ResourceParsingError: Could not parse '4194304KB' as type size (e.g. 1mb)
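The traceback suggests the size parser only recognizes lowercase magnitudes. A case-insensitive parser might look roughly like the sketch below. This is a standalone illustration, not the actual `hpctypes` code: `parse_size_bytes` and the `_MAGNITUDES` table are hypothetical, and it assumes binary multiples of 1024 and that a bare number means bytes.

```python
import re

# Hypothetical magnitude table (bytes per unit); not taken from hpc.autoscale.hpctypes.
# Assumption: units are binary multiples of 1024.
_MAGNITUDES = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}

_SIZE_RE = re.compile(r"^(\d+)([A-Za-z]*)$")


def parse_size_bytes(expr: str) -> int:
    """Parse a size such as '4194304KB' or '1mb' into bytes, ignoring unit case."""
    match = _SIZE_RE.match(expr.strip())
    if match is None:
        raise ValueError("Could not parse {!r} as a size".format(expr))
    number, unit = match.groups()
    # Normalize the unit so that 'KB', 'Kb' and 'kb' are treated the same.
    # Assumption: a missing unit means bytes.
    unit = unit.lower() or "b"
    if unit not in _MAGNITUDES:
        raise ValueError("Unknown size unit {!r} in {!r}".format(unit, expr))
    return int(number) * _MAGNITUDES[unit]


if __name__ == "__main__":
    print(parse_size_bytes("4194304KB") == parse_size_bytes("4194304kb"))
```

Lowercasing the unit before the lookup keeps the set of accepted magnitudes unchanged while accepting the same spellings that OpenPBS already tolerates.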