multi-gpu training & maml baseline #42
Hi Danqing, Thanks for your interest in our project. If you have any further questions, feel free to leave additional comments. Best,
Hi Yaoyao, thanks for the reply! I see; I can report the numbers here later once I finish the experiments. For (2), what you mentioned are the FT and SS meta-training operations in your paper. I actually have one question about Table 2 of your paper. For the row "MAML deep, HT", did you combine the pre-training step with the MAML algorithm? Do you have the experiment "MAML deep, HT" without the fine-tuning? Then we could see how much of the performance improvement the fine-tuning contributes.
Hi Danqing, For "MAML deep, HT" in Table 2, we used the pre-trained model (ResNet-12 (pre)). Best,
@yaoyao-liu, then for "SS[Θ;θ], HT meta-batch" in Table 2, is that also the pre-trained model without the first fine-tuning step? I mean, which experiments in Table 2 have the "(a) large-scale DNN training" step?
The differences between your proposed MTL algorithm and the MAML-ResNet algorithm are: 1) fine-tuning; 2) HT; and 3) the FT->SS meta-training operations.
I am not sure what you mean by the "first fine-tuning" step. In Table 2, if the feature extractor is labeled with "(pre)" (e.g., ResNet-12 (pre)), then the pre-trained model is applied. The model is pre-trained on all base-class samples. The results in Table 1 show that the "SS" meta-training operation works: comparing the 3rd block with the 1st and 2nd blocks, you can observe that our "SS" performs better than "FT" and "update". "HT meta-batch" is not applied in Table 1.
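For readers following along, here is a minimal sketch of the scaling-and-shifting (SS) idea discussed above. It is not the repository's actual code: the class name `Conv2dSS` and the shapes are illustrative, and the sketch assumes the pre-trained conv layers have no bias (as in typical ResNet blocks). The pre-trained convolution weights stay frozen, and only light-weight per-channel scale and shift parameters are meta-learned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv2dSS(nn.Module):
    """Illustrative SS wrapper around a pre-trained conv layer (not the repo's code)."""

    def __init__(self, pretrained_conv: nn.Conv2d):
        super().__init__()
        # Frozen weights from the "(a) large-scale DNN training" (pre-training) step.
        self.weight = nn.Parameter(pretrained_conv.weight.data.clone(),
                                   requires_grad=False)
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding
        out_channels = pretrained_conv.out_channels
        # Meta-learned SS parameters: scale starts at 1 and shift at 0, so the
        # wrapped layer initially behaves exactly like the frozen pre-trained one.
        self.scale = nn.Parameter(torch.ones(out_channels, 1, 1, 1))
        self.shift = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        # Scale the frozen filters channel-wise and add the learned shift as a bias.
        return F.conv2d(x, self.weight * self.scale, self.shift,
                        stride=self.stride, padding=self.padding)

# Usage: wrap a pre-trained layer and meta-train only the SS parameters.
base = nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False)
layer = Conv2dSS(base)
ss_params = [p for p in layer.parameters() if p.requires_grad]  # scale and shift only
out = layer(torch.randn(2, 3, 84, 84))
```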
Oh, I see. I thought "ResNet-12 (pre)" meant the ResNet-12 without any fine-tuning. By the "first fine-tuning" step I mean the "(a) large-scale DNN training" step.
For Table 1, did you first conduct the "(a) large-scale DNN training" step?
Yes. In the caption, you can see "ResNet-12 (pre)" is applied. |
Yeah, I understand: when loading the pre-trained models, we have to drop the classifier parameters and only use the encoder parameters. This is like a domain fine-tuning step, adapting the pre-trained model weights to the new domain.
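A minimal sketch of what "drop the classifier parameters and only use the encoder parameters" can look like in PyTorch. The torchvision ResNet-18 below is only a stand-in for the repository's ResNet-12, and the `fc.` key prefix is torchvision's naming, not necessarily this repo's checkpoint layout.

```python
import torchvision

# Stand-in for the model pre-trained on all base-class samples
# (e.g., a 64-way classifier over the miniImageNet base classes).
pretrained = torchvision.models.resnet18(num_classes=64)

# Model used for meta-learning; its head is rebuilt for the few-shot task.
meta_model = torchvision.models.resnet18(num_classes=5)

# Keep only the encoder weights: drop the 'fc' classifier before copying.
encoder_state = {k: v for k, v in pretrained.state_dict().items()
                 if not k.startswith('fc.')}
missing, unexpected = meta_model.load_state_dict(encoder_state, strict=False)
# 'missing' should now list only the fc.* keys of the freshly built head.
```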
I see, thanks for the clarification! I misunderstood "ResNet-12 (pre)". |
You're welcome. |
Hi @yaoyao-liu, I have an additional question: if we don't run the large-scale DNN training step and just run the experiment with "SS[Θ;θ], HT meta-batch", will the performance be better than "MAML, HT meta-batch"?
Hi, thank you so much for the codebase! I am looking for a multi-GPU PyTorch MAML implementation, and I am wondering if I can use your codebase for this.
For multi-GPU training, can I simply use DataParallel to parallelize the model (see the sketch at the end of this post)? Will the existing data loader work with the DataParallel-wrapped model?
Also, I am wondering: if I skip the pre-training step and run meta-learning directly (I made some changes so the pre-trained model is not loaded), is that equivalent to MAML?
Many thanks, and I look forward to your reply!
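For reference, the usual DataParallel wrapping looks like the sketch below. This is only a generic illustration, not the repository's code: the torchvision model and the tensor shapes are placeholders, and whether the repo's episodic data loader and inner-loop weight updates work unchanged with DataParallel would still need to be verified against the actual code.

```python
import torch
import torch.nn as nn
import torchvision

# Placeholder learner; the repo's own ResNet-12 meta-learner would go here.
learner = torchvision.models.resnet18(num_classes=5)

if torch.cuda.device_count() > 1:
    # DataParallel splits the batch dimension across GPUs and gathers outputs,
    # so each task's [n_way * k_shot, C, H, W] batch must be large enough to split.
    learner = nn.DataParallel(learner)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
learner = learner.to(device)

# Example episode batch: 5-way 5-shot support set of 84x84 images.
support = torch.randn(25, 3, 84, 84, device=device)
logits = learner(support)  # forward pass is replicated across visible GPUs
```

One caveat worth checking: MAML-style inner loops that build adapted ("fast") weights functionally may not parallelize transparently with DataParallel, since those weights live outside the module's registered parameters; this is an assumption to verify against how the codebase implements its inner-loop updates.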