Hi,
I followed the instructions in the README.md file and got the Docker Compose cluster working after #1.
However, running the example fails with the error below: the master cannot resolve the worker hostnames over SSH.
kinow@ranma:/tmp/docker-compss-runtime$ docker-compose exec compss-master bash
(eddl_onnx_last) root@c6e014895899:~# cd pyeddl/third_party/compss_runtime/
(eddl_onnx_last) root@c6e014895899:~/pyeddl/third_party/compss_runtime# runcompss --lang=python --python_interpreter=python3 --project=linux-based/project.xml --resources=linux-based/resources.xml eddl_train_batch_compss.py
[ INFO] Using default execution type: compss
----------------- Executing eddl_train_batch_compss.py --------------------------
WARNING: COMPSs Properties file is null. Setting default values
[(778) API] - Starting COMPSs Runtime v2.6.rc2003 (build 20200408-1126.rcbac84bafe556637e165de38764868ac68a8a75e)
Sleeping 30 seconds...
E: uname_result(system='Linux', node='c6e014895899', release='5.4.0-120-generic', version='#136-Ubuntu SMP Fri Jun 10 13:40:48 UTC 2022', machine='x86_64', processor='x86_64')
Generating Random Table
---------------------------------------------
---------------------------------------------
None
CS with low memory setup
Model training...
Number of epochs: 1
Number of epochs for parameter syncronization: 1
Training epochs [ 1 - 1 ] ...
Num workers: 4
Num images per worker: 15000
Workers batch size: 250
[ERRMGR] - WARNING: There was an exception when initiating worker deephealth_compss-worker_4.
[ERRMGR] - WARNING: There was an exception when initiating worker deephealth_compss-worker_2.
Stack trace:
Stack trace:
es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_4 through user .
es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_2 through user .
OUTPUT:
OUTPUT:
ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known
at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:90)
at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:142)
at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:153)
at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:119)
at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$1.run(StartWorkerAction.java:109)
[ERRMGR] - ERROR: [START_CMD_ERROR]: Could not start the NIO worker in resource deephealth_compss-worker_2 through user .
OUTPUT:
ERROR:ssh: Could not resolve hostname deephealth_compss-worker_2: Name or service not known
[ERRMGR] - Shutting down COMPSs...
ERROR:ssh: Could not resolve hostname deephealth_compss-worker_4: Name or service not known
at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:90)
at es.bsc.compss.nio.master.starters.WorkerStarter.startWorker(WorkerStarter.java:142)
at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:153)
at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:119)
at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$1.run(StartWorkerAction.java:109)
[(163161) API] - Execution Finished
Shutting down the running process
Error running application
(eddl_onnx_last) root@c6e014895899:~/pyeddl/third_party/compss_runtime#
Thanks!
-Bruno
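In case it helps with reproducing the problem, below is a minimal sketch for checking name resolution from inside the compss-master container. `check_hosts` is a hypothetical helper written for this report, not part of COMPSs or pyeddl. It only uses Python's standard `socket` module and assumes that the hostnames in the error output are the ones the runtime tries to reach.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_hosts(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved IPv4 address, or None if lookup fails.

    `resolver` is injectable so the helper can be exercised without a network;
    by default it performs a real lookup with socket.gethostbyname.
    """
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results
```

Calling `check_hosts(["deephealth_compss-worker_2", "deephealth_compss-worker_4"])` in the master container should return `None` for any worker name that SSH would also fail to resolve. That would narrow the problem down to a mismatch between the configured hostnames and the names the containers actually have on the Compose network.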