ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known #2

@kinow

Hi,

I followed the instructions in the README.md file and got the Docker Compose cluster working after #1.

But running the example results in the error below.

kinow@ranma:/tmp/docker-compss-runtime$ docker-compose exec compss-master bash
(eddl_onnx_last) root@c6e014895899:~# cd pyeddl/third_party/compss_runtime/
(eddl_onnx_last) root@c6e014895899:~/pyeddl/third_party/compss_runtime# runcompss --lang=python --python_interpreter=python3 --project=linux-based/project.xml --resources=linux-based/resources.xml eddl_train_batch_compss.py
[  INFO] Using default execution type: compss

----------------- Executing eddl_train_batch_compss.py --------------------------

WARNING: COMPSs Properties file is null. Setting default values
[(778)    API]  -  Starting COMPSs Runtime v2.6.rc2003 (build 20200408-1126.rcbac84bafe556637e165de38764868ac68a8a75e)
Sleeping 30 seconds...
E:  uname_result(system='Linux', node='c6e014895899', release='5.4.0-120-generic', version='#136-Ubuntu SMP Fri Jun 10 13:40:48 UTC 2022', machine='x86_64', processor='x86_64')
Generating Random Table
---------------------------------------------
---------------------------------------------

None
CS with low memory setup
Model training...
Number of epochs:  1
Number of epochs for parameter syncronization:  1
Training epochs [ 1  -  1 ] ...
Num workers:  4
Num images per worker:  15000
Workers batch size:  250
[ERRMGR]  -  WARNING: There was an exception when initiating worker deephealth_compss-worker_4.
[ERRMGR]  -  WARNING: There was an exception when initiating worker deephealth_compss-worker_2.
                      Stack trace:
                      Stack trace:
                      es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_4 through user .
                      es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_2 through user .
                      OUTPUT:
                      OUTPUT:
                      ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known
                      
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:90)
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:142)
                      	at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:153)
                      	at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:119)
                      	at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$1.run(StartWorkerAction.java:109)
[ERRMGR]  -  ERROR:   [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_2 through user .
                      OUTPUT:
                      ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known
[ERRMGR]  -  Shutting down COMPSs...
                      ERROR:ssh: Could not resolve hostname deephealth_compss-worker_4: Name or service not known
                      
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:90)
                      	at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:142)
                      	at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:153)
                      	at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:119)
                      	at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$1.run(StartWorkerAction.java:109)
[(163161)    API]  -  Execution Finished
Shutting down the running process

Error running application

(eddl_onnx_last) root@c6e014895899:~/pyeddl/third_party/compss_runtime#
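
To narrow this down, a check like the one below could be run inside the master container to see whether the worker hostnames resolve at all, independently of COMPSs. This is only a minimal sketch: the helper name `unresolved_hosts` is hypothetical, and the two hostnames are copied from the log above. It uses nothing beyond Python's standard `socket` module.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the hostnames that the local resolver cannot resolve.

    Hypothetical helper for illustration only; it is not part of COMPSs.
    """
    failed = []
    for name in hostnames:
        try:
            # Same kind of lookup ssh performs before connecting.
            socket.gethostbyname(name)
        except (socket.gaierror, UnicodeError):
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hostnames taken from the error messages in the log above.
    workers = ["deephealth_compss-worker_2", "deephealth_compss-worker_4"]
    for name in unresolved_hosts(workers):
        print(f"cannot resolve: {name}")
```

If both names are printed, the problem would appear to be in how the container names are resolved from the master, rather than in the SSH setup or in COMPSs itself.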

Thanks!
-Bruno
