
multi-gpu training & maml baseline #42

Open
DanqingZ opened this issue Dec 27, 2020 · 13 comments

@DanqingZ commented Dec 27, 2020

Hi, thank you so much for the codebase! I am looking for a multi-GPU PyTorch MAML implementation, and I am wondering if I can use your codebase for this.

For multi-GPU training, can I simply use DataParallel to parallelize the model? Will the existing data loader work with the DataParallel model?

self.model = torch.nn.DataParallel(self.model) 
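
For context, here is roughly how I imagine the wrapping (a toy backbone, not your actual model class), assuming the episodic tensors have a leading batch dimension that DataParallel can split:

import torch
import torch.nn as nn

# Toy stand-in backbone; the actual model class in this repo will differ.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 5),
)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module on every visible GPU and splits
    # the first (batch) dimension of the input across the replicas.
    model = nn.DataParallel(model)
model = model.cuda()

# Dummy 5-way query batch: 75 images of size 3x84x84.
images = torch.randn(75, 3, 84, 84).cuda()
logits = model(images)  # outputs are gathered back onto the default device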

Also, I am wondering: if I skip the pre-training step and run meta-learning directly (i.e., make some changes so that the pre-trained model is not loaded), is that MAML?
Many thanks, and I look forward to your reply!

@yaoyao-liu (Owner)

Hi Danqing,

Thanks for your interest in our project.
For (1): I have never tried running this project on multiple GPUs, but you are welcome to try it and report your results here.
For (2): It is different from the original MAML. In our method, during base-learning we only update the FC classifier weights, and during meta-learning we update the scaling and shifting (SS) weights. In MAML, all network parameters are updated during both base-learning and meta-learning.
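
To make the distinction concrete, here is a simplified sketch (a toy layer, not our actual code) of which parameter groups each stage would optimize:

import torch
import torch.nn as nn

class SSConv2d(nn.Module):
    """Toy conv layer whose pre-trained weights stay frozen; only the
    per-channel scaling and shifting (SS) parameters are meta-learned."""
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad = False  # frozen pre-trained weights
        out_ch = conv.out_channels
        self.scale = nn.Parameter(torch.ones(out_ch, 1, 1))
        self.shift = nn.Parameter(torch.zeros(out_ch, 1, 1))

    def forward(self, x):
        return self.conv(x) * self.scale + self.shift

backbone = SSConv2d(nn.Conv2d(3, 64, 3, padding=1))  # meta-learned via SS
classifier = nn.Linear(64, 5)                         # updated in base-learning

# Base-learning (inner loop): only the FC classifier is updated.
inner_opt = torch.optim.SGD(classifier.parameters(), lr=0.01)
# Meta-learning (outer loop): only the SS parameters are updated.
outer_opt = torch.optim.Adam([backbone.scale, backbone.shift], lr=0.001)

# In MAML, by contrast, both loops would update all parameters of the
# backbone and the classifier.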

If you have any further questions, feel free to leave additional comments.

Best,
Yaoyao

@DanqingZ (Author)

Hi Yaoyao, thanks for the reply! I see; I can report the numbers here when I finish the experiments.

For (2), so what you mentioned are the FT and SS meta-training operations in your paper. I actually have one question about Table 2 of your paper. For the row "MAML deep, HT", did you combine the pre-training step with the MAML algorithm? Do you have the experiment "MAML deep, HT" without the fine-tuning? Then we could see how much performance improvement the fine-tuning contributes.
The differences between your proposed MTL algorithm and the MAML-ResNet algorithm are: 1) fine-tuning; 2) HT; and 3) the FT->SS meta-training operations. I am actually curious how much performance improvement each component contributes. Thanks!

@yaoyao-liu (Owner)

Hi Danqing,

For "MAML deep, HT" in Table 2, we used the pre-trained model (ResNet-12 (pre)).
For different ablative fine-tuning settings, you may see the results in Table 1.
As the model is pre-trained on 64 classes (miniImageNet), we are not able to directly apply it to 5-class tasks without any fine-tuning steps. At least, we need to fine-tune the FC classifiers.
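
In other words, the pre-trained 64-way head cannot score 5 novel classes, so at minimum a fresh 5-way FC layer has to be fitted on the support set. A simplified sketch (toy tensors and dimensions, not our actual code):

import torch
import torch.nn as nn

feature_dim, num_base_classes, num_way = 640, 64, 5

# Pre-trained model: encoder plus a 64-way head trained on the base classes.
encoder = nn.Linear(3 * 84 * 84, feature_dim)          # stand-in for ResNet-12
base_head = nn.Linear(feature_dim, num_base_classes)   # unusable for novel classes

# For a 5-way episode, keep the encoder but fit a new 5-way FC head
# on the support set (the minimal fine-tuning step mentioned above).
episode_head = nn.Linear(feature_dim, num_way)
opt = torch.optim.SGD(episode_head.parameters(), lr=0.01)

support_x = torch.randn(5, 3 * 84 * 84)  # 5-way 1-shot support set (flattened)
support_y = torch.arange(num_way)
for _ in range(100):
    logits = episode_head(encoder(support_x).detach())
    loss = nn.functional.cross_entropy(logits, support_y)
    opt.zero_grad()
    loss.backward()
    opt.step()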

Best,
Yaoyao

@DanqingZ (Author) commented Jan 2, 2021

@yaoyao-liu, then for "SS[Θ;θ], HT meta-batch" in Table 2, is that also the pre-trained model without the first fine-tuning step? I mean, which experiments in Table 2 include the "(a) large-scale DNN training" step?

@DanqingZ (Author) commented Jan 2, 2021

The differences between your proposed MTL algorithm and the MAML-ResNet algorithm are: 1) fine-tuning; 2) HT; and 3) the FT->SS meta-training operations.
If we want to claim that the "SS meta-training operations" work, then we need to make sure the comparison experiments also have 1) fine-tuning and 2) HT.
I am trying to understand your work better; please correct me if I am wrong. Thanks.

@yaoyao-liu (Owner)

I am not sure what you mean by the "first fine-tuning" step.

In Table 2, if the feature extractor is labeled with "(pre)" (e.g., ResNet-12 (pre)), then the pre-trained model is applied. The model is pre-trained on all base class samples.

The results in Table 1 show that the "SS meta-training operation" works. Comparing the 3rd block with the 1st and the 2nd blocks, you can observe that our "SS" performs better than "FT" and "update". "HT meta-batch" is not applied in Table 1.

@DanqingZ (Author) commented Jan 2, 2021

Oh, I see. I thought "ResNet-12 (pre)" meant ResNet-12 without any fine-tuning.

By "first fine-tuning" step, I mean the "(a) large-scale DNN training" step.

@DanqingZ (Author) commented Jan 2, 2021

For Table 1, did you first conduct the "(a) large-scale DNN training" step?

@yaoyao-liu (Owner)

For Table 1, did you first conduct the "(a) large-scale DNN training" step?

Yes. In the caption, you can see "ResNet-12 (pre)" is applied.

@DanqingZ (Author) commented Jan 2, 2021

Yeah, I understand: by loading the pre-trained model, we have to drop the classifier parameters and only use the encoder parameters. This is like a domain fine-tuning step, adapting the pre-trained model weights to the new domain.
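
Roughly what I have in mind (toy module and key names; the actual checkpoint layout in your repo may differ):

import torch
import torch.nn as nn

# Toy stand-in for the pre-trained model: encoder + 64-way classifier.
pretrained = nn.ModuleDict({
    "encoder": nn.Linear(3 * 84 * 84, 640),
    "classifier": nn.Linear(640, 64),
})
state = pretrained.state_dict()  # in practice: torch.load(<checkpoint path>)

# Keep only the encoder parameters and drop the 64-way classifier head.
encoder_state = {k.replace("encoder.", "", 1): v
                 for k, v in state.items() if k.startswith("encoder.")}

new_encoder = nn.Linear(3 * 84 * 84, 640)
new_encoder.load_state_dict(encoder_state)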

@DanqingZ (Author) commented Jan 2, 2021

For Table 1, did you first conduct the "(a) large-scale DNN training" step?

Yes. In the caption, you can see "ResNet-12 (pre)" is applied.

I see, thanks for the clarification! I misunderstood "ResNet-12 (pre)".

@yaoyao-liu (Owner)

You're welcome.

DanqingZ closed this as completed Jan 2, 2021
DanqingZ reopened this Jan 2, 2021
@DanqingZ (Author) commented Jan 2, 2021

Hi @yaoyao-liu, I have an additional question: if we don't run the large-scale DNN training step and just run the experiment with "SS[Θ;θ], HT meta-batch", will the performance be better than "MAML, HT meta-batch"?
