Transfer Learning Code

Using the excellent timm package, the article's results can be reproduced almost completely. Specifically, timm makes it possible to compare the official pretraining and the miil pretraining of the ViT and Mixer models, and to validate the improvement in transfer learning results. The comparison also shows how miil pretraining stabilizes transfer learning results and makes them far less susceptible to hyper-parameter selection.

An example training command on CIFAR-100:

python train.py \
/Cifar100Folder/ \
-b=128 \
--img-size=224 \
--epochs=50 \
--color-jitter=0 \
--squish \
--amp \
--sched='cosine' \
--model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
--nonstrict_checkpoint --min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
--pretrained \
--lr=2e-4 \
--model=mixer_b16_224_in21k \
--opt=adam --weight-decay=1e-4
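
To cover several hyper-parameter settings for both pretrained checkpoints, the command can be wrapped in a loop. The following is a minimal dry-run sketch, not the authors' script: it only prints the commands. The `print_sweep` helper is a hypothetical name, and the sketch assumes that every flag other than `--model`, `--opt`, `--weight-decay` and `--lr` stays as in the example command, which the source does not state explicitly. The five settings are the ones reported in the results table.

```shell
#!/bin/sh
# Dry-run sketch: prints one train.py command per (model, optimizer,
# weight decay, learning rate) combination. Nothing is executed.
set -eu

DATA_DIR="/Cifar100Folder/"   # dataset path taken from the example command

# print_sweep is a hypothetical helper, not part of timm.
print_sweep() {
  for model in mixer_b16_224_in21k mixer_b16_224_miil_in21k; do
    # Each heredoc line is: optimizer, weight decay, learning rate.
    while read -r opt wd lr; do
      printf '%s\n' "python train.py $DATA_DIR -b=128 --img-size=224 --epochs=50 \
--color-jitter=0 --squish --amp --sched='cosine' \
--model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
--nonstrict_checkpoint --min-lr=1e-8 --warmup-epochs=3 \
--train-interpolation=bilinear --aa=v0 --pretrained \
--lr=$lr --model=$model --opt=$opt --weight-decay=$wd"
    done <<EOF
adam 1e-4 4e-4
adam 1e-4 2e-4
adamw 1e-4 2e-4
adamw 1e-2 2e-4
sgd 1e-4 2e-4
EOF
  done
}

print_sweep
```

Piping the output to `sh` would run the ten commands one after another; printing them first makes it easy to check the grid before spending GPU time.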

These are the results we got for the official 21k pretraining (--model=mixer_b16_224_in21k) and the miil 21k pretraining (--model=mixer_b16_224_miil_in21k), for different hyper-parameter selections:

Optimizer	Weight decay	Learning rate	Official pretrain Mixer-B-16 score	Miil pretrain Mixer-B-16 score
adam	1e-4	4e-4	82.6	90.5 (+7.9)
adam	1e-4	2e-4	84.0	91.1 (+7.1)
adamw	1e-4	2e-4	84.4	90.9 (+6.5)
adamw	1e-2	2e-4	84.7	90.9 (+6.2)
sgd	1e-4	2e-4	91.7	92.4 (+0.7)

We can see that miil pretraining reaches almost the same accuracy for all hyper-parameters (90.5 to 92.4), while the official pretraining suffers a major drop in accuracy with adam and adamw (82.6 to 84.7, versus 91.7 with sgd). For every hyper-parameter setting tested, miil pretraining achieves better accuracy.
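
The stability claim can be checked by comparing the accuracy range (maximum minus minimum) of each pretraining across the five settings. The sketch below copies the scores from the table above and computes that range with awk; the `accuracy_spread` helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Accuracy range per pretraining, from the five rows of the results table.
# Heredoc columns: optimizer, weight decay, learning rate,
# official pretrain score, miil pretrain score.
accuracy_spread() {
  awk '
    { o = $4 + 0; m = $5 + 0 }                      # force numeric values
    NR == 1 { omin = omax = o; mmin = mmax = m }    # initialise from first row
    { if (o < omin) omin = o; if (o > omax) omax = o
      if (m < mmin) mmin = m; if (m > mmax) mmax = m }
    END {
      printf "official min=%.1f max=%.1f spread=%.1f\n", omin, omax, omax - omin
      printf "miil min=%.1f max=%.1f spread=%.1f\n", mmin, mmax, mmax - mmin
    }
  ' <<EOF
adam 1e-4 4e-4 82.6 90.5
adam 1e-4 2e-4 84.0 91.1
adamw 1e-4 2e-4 84.4 90.9
adamw 1e-2 2e-4 84.7 90.9
sgd 1e-4 2e-4 91.7 92.4
EOF
}

accuracy_spread
# official min=82.6 max=91.7 spread=9.1
# miil min=90.5 max=92.4 spread=1.9
```

Across these settings the official pretraining varies by 9.1 points, while the miil pretraining varies by 1.9 points.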
