Skip to content

autoscaler not handling well bad formatted JSON qstat output #43

@xpillons

Description

@xpillons

cyclecloud-pbspro version 2.0.10

With OpenPBS 19.1.1 output in JSON for qstat can be bad formatted in case of complex environment variables.
For example : qstat -f <job_id> -F json | jq '.' will return an error meaning the JSON is bad formatted.
As a result this make the autoscaler to stop working, so a single bad job can hang all the whole system and no new nodes can be added.

Here the output of azpbs autoscale

File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/environment.py", line 58, in from_driver
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/init.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 47623-47628: ordinal not in range(128)
Call stack:
  File "/root/bin/azpbs", line 4, in
    main()
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 284, in main
    clilib.main(argv or sys.argv[1:], "pbspro", PBSCLI())
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 1739, in main
    args.func(**kwargs)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 1315, in analyze
    dcalc = self._demand(config, ctx_handler=ctx_handler)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/clilib.py", line 360, in _demand
    dcalc, jobs = self._demand_calc(config, driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 113, in _demand_calc
    pbs_env = self._pbs_env(pbs_driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/cli.py", line 106, in _pbs_env
    self.__pbs_env = environment.from_driver(pbs_driver.config, pbs_driver)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/environment.py", line 58, in from_driver
    jobs = pbs_driver.parse_jobs(queues, default_scheduler.resources_for_scheduling)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/driver.py", line 414, in parse_jobs
    self.pbscmd, self.resource_definitions, queues, resources_for_scheduling
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/driver.py", line 530, in parse_jobs
    response: Dict = pbscmd.qstat_json("-f", "-t")
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/pbscmd.py", line 31, in qstat_json
    response = self.qstat(*args)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/pbscmd.py", line 25, in qstat
    return self._check_output(cmd)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/pbspro/pbscmd.py", line 76, in _check_output
    logger.info("Response: %s", ret)
  File "/usr/lib64/python3.6/logging/init.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/opt/cycle/pbspro/venv/lib/python3.6/site-packages/hpc/autoscale/hpclogging.py", line 45, in _log
    **stacklevelkw
Message: 'Response: %s'

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions