Paper accepted at ICDM 2019: 19th IEEE International Conference on Data Mining, Beijing, China, 8-11 November 2019.
The original code base for the experiments and results for Image datasets.
Bibtex :
@inproceedings{DBLP:conf/icdm/BalgiD19,
author = {Sourabh Balgi and
Ambedkar Dukkipati},
editor = {Jianyong Wang and
Kyuseok Shim and
Xindong Wu},
title = {{CUDA:} Contradistinguisher for Unsupervised Domain Adaptation},
booktitle = {2019 {IEEE} International Conference on Data Mining, {ICDM} 2019,
Beijing, China, November 8-11, 2019},
pages = {21--30},
publisher = {{IEEE}},
year = {2019},
url = {https://doi.org/10.1109/ICDM.2019.00012},
doi = {10.1109/ICDM.2019.00012},
timestamp = {Mon, 03 Feb 2020 19:47:40 +0100},
biburl = {https://dblp.org/rec/conf/icdm/BalgiD19.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Paper URL :
CUDA: Contradistinguisher for Unsupervised Domain Adaptation
You will need:
- Python 3.6 (Anaconda Python recommended)
- torch (PyTorch)
- torchvision
- nltk
- pandas
- scipy
- tqdm
- scikit-image
- scikit-learn
- tensorboardX
- tensorflow==1.13.1 (for tensorboard visualizations)
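A quick way to see which of the packages listed above are already importable is sketched below. It is a minimal check, not part of the repository; the helper name `missing_modules` is hypothetical, and the import names `skimage` and `sklearn` are the modules that scikit-image and scikit-learn install.

```python
import importlib.util

# Import names for the packages listed above. Some differ from the
# pip/conda package name (scikit-image -> skimage, scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "torch", "torchvision", "nltk", "pandas", "scipy", "tqdm",
    "skimage", "sklearn", "tensorboardX", "tensorflow",
]


def missing_modules(module_names):
    """Return the subset of module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

This only checks that the modules can be located; it does not verify versions such as tensorflow==1.13.1.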
On Linux:
> conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Install the CUDA toolkit version that matches your system if GPUs are available. Using GPUs is strongly recommended, and in practice necessary, because of the size of the models and datasets.
Use requirements.txt in the respective sub-folders with pip as below:
> pip install -r requirements.txt
Toydataset Domain Adaptation Experiments (toydataset_README.md)
Full details on toydataset domain adaptation, including the code, are in the toydataset subdirectory.
> cd toydataset
We create two blobs in 2D to represent 2 classes, and two sets of such blobs, one for each of the domains D0 and D1.
The following images show examples from the Source (odd columns) and Target (even columns) domains, with one example per class.
The plots also illustrate the difference between CUDA and all the other domain alignment approaches used for domain adaptation.
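The two-blob, two-domain setup can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the repository's generator: the class centers, standard deviation, rotation angle and offset are hypothetical, the helper names are made up for this example, and the seed value 22 will not reproduce the repository's dataset.

```python
import math
import random


def make_blobs_2d(centers, n_per_class, std, rng):
    """Sample n_per_class Gaussian points around each (x, y) center.

    Returns a list of ((x, y), label) pairs, where label is the center index.
    """
    samples = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            samples.append(((rng.gauss(cx, std), rng.gauss(cy, std)), label))
    return samples


def shift_domain(samples, angle_deg, offset):
    """Rotate every point about the origin by angle_deg, then translate by offset."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    dx, dy = offset
    return [((cos_a * x - sin_a * y + dx, sin_a * x + cos_a * y + dy), label)
            for (x, y), label in samples]


rng = random.Random(22)  # arbitrary seed for repeatability
centers = [(-2.0, 0.0), (2.0, 0.0)]  # hypothetical class centers
d0 = make_blobs_2d(centers, n_per_class=100, std=0.5, rng=rng)
# Hypothetical domain shift for D1: fresh samples, rotated 30 degrees and translated.
d1 = shift_domain(make_blobs_2d(centers, n_per_class=100, std=0.5, rng=rng),
                  angle_deg=30.0, offset=(1.0, 1.0))
```

Here `d0` and `d1` each hold 200 labeled points; swapping which one is treated as the labeled source gives the D0 -> D1 and D1 -> D0 tasks.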
- Example toydataset 1 with seed 22: plots for Domain Alignment Approaches and CUDA (rows) under D0 -> D1 and D1 -> D0 (columns).
- Example toydataset 2 with seed 3234: plots for Domain Alignment Approaches and CUDA (rows) under D0 -> D1 and D1 -> D0 (columns).
- As seen in the plots, the domain alignment approaches align the target domain onto the source domain, morphing the two domains together. After morphing, a classifier is learnt on the labeled source domain. Because the source domain changes when the domains are swapped, the learnt classifier depends mainly on the source domain.
- In contrast, since CUDA learns jointly on both domains in a source supervised + target unsupervised manner, the same classifier adapts to learn the best possible decision boundary. Hence the decision boundaries are almost the same even when the source and target domains are swapped.
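The idea of training one classifier on labeled source data and unlabeled target data together can be illustrated with a generic pseudo-label objective. The sketch below is a stand-in chosen for illustration, not the contradistinguish loss from the paper; the function names are hypothetical and the inputs are plain lists of class probabilities.

```python
import math


def nll(probs, label):
    """Negative log-likelihood of `label` under one probability vector."""
    return -math.log(probs[label])


def joint_loss(source_probs, source_labels, target_probs):
    """Mean source NLL (true labels) plus mean target NLL (pseudo-labels)."""
    source_term = sum(nll(p, y) for p, y in zip(source_probs, source_labels))
    source_term /= len(source_probs)
    # Pseudo-label each unlabeled target sample with its most probable class.
    pseudo_labels = [max(range(len(p)), key=p.__getitem__) for p in target_probs]
    target_term = sum(nll(p, y) for p, y in zip(target_probs, pseudo_labels))
    target_term /= len(target_probs)
    return source_term + target_term
```

Because both terms are computed from the same classifier's outputs, minimizing their sum pushes a single decision boundary to fit both domains, which is the behaviour described above.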
toydataset/git_images/plots/videos contains videos of the Contradistinguisher being trained with CUDA as the epochs progress.
We can observe the decision boundary being updated to satisfy both the domains as they are jointly trained without domain alignment.
- ss : source supervised only setting with domain alignment
- ss_tu : source supervised + target unsupervised only setting with CUDA
- ss_tu_ta : source supervised + target unsupervised + target adversarial setting with CUDA
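The setting names are built from abbreviations joined by underscores, so a name can be expanded into the training terms it enables. The helper below is a hypothetical illustration of that naming scheme, not code from the repository.

```python
# Abbreviations taken from the setting names above.
TERMS = {
    "ss": "source supervised",
    "tu": "target unsupervised",
    "ta": "target adversarial",
}


def active_terms(setting):
    """Expand a setting name such as 'ss_tu_ta' into its training terms."""
    parts = setting.split("_")
    unknown = [p for p in parts if p not in TERMS]
    if unknown:
        raise ValueError(f"unknown setting component(s): {unknown}")
    return [TERMS[p] for p in parts]
```

For example, `active_terms("ss_tu")` returns the source supervised and target unsupervised terms, in that order.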
More illustrations of CUDA with different domain shifts and orientations
Visual Domain Adaptation Experiments (visual_README.md)
Full details on visual domain adaptation, including the code, are in the visual subdirectory.
The experiments in the visual domain include Digits, Objects and Traffic Signs.
- Digits : USPS, MNIST, SVHN, SYNNUMBERS with 10 digits for classification
  - USPS : Train, Test
  - MNIST : MNIST
  - SVHN : SVHN
  - SYNNUMBERS : SYNNUMBERS
- Objects : CIFAR, STL with 9 overlapping classes for classification
- Traffic Signs : SYNSIGNS, GTSRB with 43 classes for classification
The experiments in the visual domain also include real-world objects from the Office dataset.
- Office Objects : objects from the Office-31 dataset with 3 real-world domains (AMAZON, DSLR, WEBCAM) and 31 classes for classification
  - AMAZON, DSLR, WEBCAM : Office-31
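The groups above can be captured in a small lookup table, which makes it easy to check that a source and target share a label set before pairing them. The dictionary layout and the `num_classes` helper are assumptions made for illustration; only the domain names and class counts come from the lists above.

```python
# Groups and class counts as listed above; the dictionary layout is illustrative.
DATASET_GROUPS = {
    "digits": {"domains": ["USPS", "MNIST", "SVHN", "SYNNUMBERS"], "classes": 10},
    "objects": {"domains": ["CIFAR", "STL"], "classes": 9},
    "traffic_signs": {"domains": ["SYNSIGNS", "GTSRB"], "classes": 43},
    "office": {"domains": ["AMAZON", "DSLR", "WEBCAM"], "classes": 31},
}


def num_classes(source, target):
    """Return the shared class count for a source -> target pair.

    Raises ValueError if the two domains are identical or are not in the same group.
    """
    if source == target:
        raise ValueError("source and target must be different domains")
    for group in DATASET_GROUPS.values():
        if source in group["domains"] and target in group["domains"]:
            return group["classes"]
    raise ValueError(f"{source} and {target} do not share a label set")
```

The table does not say which pairs within a group the experiments actually run; it only encodes which domains are compatible.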
Visual dataset statistics used for visual domain adaptation
Target domain test accuracy reported using CUDA over several SOTA domain alignment methods
Language Domain Adaptation Experiments (language_README.md)
Full details on language domain adaptation, including the code, are in the language subdirectory.
We consider the Amazon Customer Reviews dataset with 4 domains (Books, DVDs, Electronics and Kitchen Appliances), located in the data folder. Each domain has 2 classes, positive and negative reviews, which serve as the labels for binary classification.
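With 4 domains, there are 12 possible ordered source -> target tasks. The sketch below enumerates them; the integer label encoding is a hypothetical choice, and whether every pair is evaluated is not stated here.

```python
from itertools import permutations

DOMAINS = ["Books", "DVDs", "Electronics", "Kitchen Appliances"]
LABELS = {"negative": 0, "positive": 1}  # hypothetical integer encoding

# Every ordered (source, target) pair with source != target.
PAIRS = list(permutations(DOMAINS, 2))
```

Order matters because only the source domain is labeled, so `("Books", "DVDs")` and `("DVDs", "Books")` are different tasks.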
Language dataset statistics used for language domain adaptation
Target domain test accuracy reported using CUDA over several SOTA domain alignment methods
Special thanks to the Statistics and Machine Learning Group, Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, India, for providing the necessary computational resources for the experiments.
Sourabh Balgi
M. Tech., Artificial Intelligence
Indian Institute of Science, Bangalore
<firstname><lastname>[at]gmail[dot]com, <firstname><lastname>[at]iisc[dot]ac[dot]in