After modifying the script to launch with mpirun, training on 8x RTX 3090 with the nccl backend recovers the expected accuracy, but after switching to the cgx backend the top-1 accuracy always stays under 1%. The model used for validation is resnet50.
Apart from the default hyper-parameter settings (e.g. batch, lr, wd), the quantization bits and bucket size are set to 4 and 1024, following the paper.
Could you share more details on how to reproduce the resnet50 results on imagenet with the cgx backend? Thanks.
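For reference, here is a minimal Python sketch of what "4 quantization bits with a bucket size of 1024" is assumed to mean in this issue: each run of 1024 consecutive gradient values is quantized independently onto 16 evenly spaced levels between that bucket's minimum and maximum. This is an illustration only. `quantize_bucketed` and `max_abs_error` are hypothetical helpers, the rounding is deterministic, and the code is not taken from the cgx implementation, which may differ.

```python
import random


def quantize_bucketed(values, bits=4, bucket_size=1024):
    """Quantize and dequantize a flat list of floats, one bucket at a time.

    Assumption: plain min-max uniform quantization with round-to-nearest.
    This is not the cgx kernel, only a model of the two settings.
    """
    levels = (1 << bits) - 1  # 4 bits -> 16 codes -> 15 intervals
    out = []
    for start in range(0, len(values), bucket_size):
        bucket = values[start:start + bucket_size]
        lo, hi = min(bucket), max(bucket)
        scale = (hi - lo) / levels
        if scale == 0.0:
            # All values in the bucket are equal, nothing to quantize.
            out.extend(bucket)
            continue
        for v in bucket:
            q = round((v - lo) / scale)  # integer code in [0, levels]
            out.append(lo + q * scale)   # value seen after dequantization
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for a gradient
    approx = quantize_bucketed(grad, bits=4, bucket_size=1024)
    print("max abs error:", max_abs_error(grad, approx))
```

Under this model the per-element error is bounded by half a quantization step of the bucket, so an accuracy below 1% would point at a setup problem rather than at the expected quantization noise.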