Conversation
Almost there; apparently there is a problematic interplay of tf and keras:
Done implementing multi-GPU training. I hope putting that into the constructor of … I'll supply more extensive numbers later; my current estimate for training …
I'll provide 4-GPU numbers later. Note that this "improvement" is expected to be non-linear, as keras internally parallelizes the batches, so a batch size of … Would love to hear your feedback on this.
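For context, here is a minimal sketch of what Keras-level data parallelism looks like in the Keras 2.2.x / TF 1.x generation used here; the toy model, variable names, and batch size are illustrative assumptions, not the actual code from this PR:

```python
# Assumed sketch (not the PR code): Keras 2.2.x / TF 1.x data-parallel training.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

n_gpus = 2  # assumed number of visible GPUs

# Toy single-GPU model standing in for the real N2V network.
base_model = Sequential([
    Dense(64, activation='relu', input_shape=(32,)),
    Dense(32),
])

# multi_gpu_model replicates the model on each GPU and splits every incoming
# batch across the replicas, so each GPU only sees batch_size / n_gpus samples
# per step -- which is why the speed-up is expected to be non-linear.
train_model = multi_gpu_model(base_model, gpus=n_gpus) if n_gpus > 1 else base_model
train_model.compile(optimizer='adam', loss='mse')

# Scale the batch size with the GPU count so each replica keeps its per-GPU batch.
X, Y = np.random.rand(1024, 32), np.random.rand(1024, 32)
train_model.fit(X, Y, batch_size=128 * n_gpus, epochs=1)
```

One caveat of this approach is that weights should be saved from the template model (base_model above), not from the parallel wrapper.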
Thank you for this PR! I have this on my to-do list, but wasn't able to get my hands on a multi-GPU system. I guess the cluster should work for testing. Although I am very confident that it just works, I would like to test it as well :)
Thanks for having a look. Last time I checked, all GPU configs with >= 3 GPUs fail to run due to some problems with the keras data augmentations. Maybe this could be addressed by looking into bringing …
Hi, I have set CUDA_VISIBLE_DEVICES to 1,2 before running the training. I used pip install n2v to install N2V. My tensorflow-gpu version is 1.14.1, keras is 2.2.5, and numpy is 1.19.1. The training still uses only 1 GPU. Please let me know what I am missing.
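For reference, a small sketch (assumed, not N2V-specific) of how GPU visibility is usually restricted and verified; the key point is that CUDA_VISIBLE_DEVICES must be set before TensorFlow initializes:

```python
# Assumed illustration: restrict TensorFlow 1.x to GPUs 1 and 2 and verify it.
# The variable must be set before TF creates a session, otherwise TF has
# already claimed every device.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

from tensorflow.python.client import device_lib

# With the setting above, TF should report exactly two GPUs, renumbered
# internally as /device:GPU:0 and /device:GPU:1.
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print(gpus)
```

Note that restricting visibility alone does not make Keras train on more than one GPU; the model still has to be wrapped for data parallelism (e.g. as sketched above), which is what this PR adds.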
Hi @piby2, this functionality is not part of the official N2V release yet. If you would like to test it, you would have to clone the fork.
This needs a bit more testing, but I think going multi-GPU is somewhat straightforward. Or did you try that already?