Skip to content
This repository was archived by the owner on Jan 26, 2021. It is now read-only.
This repository was archived by the owner on Jan 26, 2021. It is now read-only.

why workers are blocked when I start 6 process in three node, 2 process per node #143

@teki1981

Description

@teki1981

11 class TestMultiversoSharedVariable:
12 def _test_sharedvar(self, row, col):
13 W = sharedvar.mv_shared(
14 value=np.zeros(
15 (row, col),
16 dtype=theano.config.floatX
17 ),
18 name='W',
19 borrow=True
20 )
21 delta = np.array(range(1, row * col + 1),
22 dtype=theano.config.floatX).reshape((row, col))
23 train_model = theano.function([], updates=[(W, W + delta)])
24 for i in xrange(10):
25 train_model()
26 train_model()
27 sharedvar.sync_all_mv_shared_vars() #sent to server
28 #mv.barrier()
29 # to get the newest value, we must sync again
30 mv.barrier()
31 sharedvar.sync_all_mv_shared_vars()
32 for j, actual in enumerate(W.get_value().reshape(-1)):
33 print "[%d] %d %d %d"%(i,j, (j + 1) * (i + 1) * 2 * mv.workers_num(), actual)
34
35 def test_sharedvar(self):
36 self._test_sharedvar(10, 10)
37
38
39 if name == 'main':
40 mv.init()
41 test_shared = TestMultiversoSharedVariable()
42 test_shared.test_sharedvar()
43 mv.shutdown()

I run this test, found When start one worker in one node, it is OK
but When start two worker in one node , all workers were blocked。
mpirun -hostfile alg_cluster.txt -npernode 1 python test_multi.py
mpirun -hostfile alg_cluster.txt -npernode 2 python test_multi.py

there are three ips in my cluster.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions