About your DDP code #22

chaolongy · 2022-05-04T02:49:32Z

Thank you very much for your excellent work. While reading your training and test code, I noticed the following:
In 'train.py', val_sampler = torch.utils.data.distributed.DistributedSampler(val_data), and the code calls dist.all_reduce(intersection), dist.all_reduce(union), and dist.all_reduce(target). In 'test.py', val_sampler = None, and the code calls dist.all_reduce(output_3d).

My questions:

1. Why is the sampler inconsistent between 'train.py' and 'test.py'?
2. I found that performance did not change when val_sampler=None. What is the purpose of dist.all_reduce() here?
3. I found that when val_sampler=torch.utils.data.distributed.DistributedSampler(val_data) and dist.all_reduce() was not used, mIoU increased during testing. Why is this?
4. In 'train.py', the "intersectionAndUnionGPU" function from 'util.py' is used, while in 'test.py', the 'evaluate' function from 'iou.py' is used. What are the essential differences between these two evaluation methods in practice?

I look forward to hearing from you and thank you again for your excellent work.
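
For context on the second and third questions above, here is a minimal sketch of how summed intersection/union counts behave under different sampler settings. It is a pure-Python simulation, not the repository's code: list entries stand in for ranks, the hypothetical helper all_reduce_sum stands in for dist.all_reduce with its default SUM op, the slicing pred[r::world_size] is a simplified stand-in for how DistributedSampler shards indices without shuffling, and the toy predictions and labels are made up. Under those assumptions, summing counts over shards reproduces the single-process mIoU; summing over ranks that each saw the full data scales intersection and union by the same factor, so the ratio is unchanged; and skipping the reduction leaves each rank with the mIoU of its own shard, which may be higher or lower than the true value.

```python
# Simulated DDP validation: "ranks" are list entries, no real process group.

def intersection_and_union(pred, target, num_classes):
    """Per-class intersection and union counts for one shard of labels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    return inter, union


def all_reduce_sum(per_rank):
    """Hypothetical stand-in for dist.all_reduce (SUM): element-wise sum over ranks."""
    return [sum(vals) for vals in zip(*per_rank)]


def miou(inter, union):
    """Mean IoU over classes that appear in the union."""
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


# Made-up toy data: 8 labels, 2 classes, 2 simulated ranks.
pred = [0, 0, 1, 1, 0, 1, 1, 1]
target = [0, 1, 1, 1, 0, 0, 1, 0]
num_classes, world_size = 2, 2

# Reference: one process evaluates everything.
miou_single = miou(*intersection_and_union(pred, target, num_classes))

# Case A: sharded data (sampler-like), counts summed across ranks.
shards = [(pred[r::world_size], target[r::world_size]) for r in range(world_size)]
stats = [intersection_and_union(p, t, num_classes) for p, t in shards]
miou_sharded_reduced = miou(
    all_reduce_sum([s[0] for s in stats]),
    all_reduce_sum([s[1] for s in stats]),
)

# Case B: no sampler, every rank sees the full data, counts summed across ranks.
full = [intersection_and_union(pred, target, num_classes)] * world_size
miou_full_reduced = miou(
    all_reduce_sum([s[0] for s in full]),
    all_reduce_sum([s[1] for s in full]),
)

# Case C: sharded data, no reduction: each rank only knows its own shard.
miou_per_rank = [miou(i, u) for i, u in stats]

print(miou_single, miou_sharded_reduced, miou_full_reduced, miou_per_rank)
```

In this toy case the per-rank values in Case C differ sharply from each other and from the reduced value, so a number logged from a single rank without dist.all_reduce() reflects only that rank's subset. Whether this accounts for the mIoU increase reported above depends on details of the actual code that are not shown here.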
