Paper accepted at ICDM 2019: 19th IEEE International Conference on Data Mining, Beijing, China, 8-11 November 2019.
This is the original code base for the experiments and results on the language datasets.
We consider the Amazon Customer Reviews Dataset with 4 domains: Books, DVDs, Electronics and Kitchen Appliances, located in the data folder. Each domain has 2 classes, positive and negative reviews, which serve as the labels for binary classification.
You will need:
- Python 3.6 (Anaconda Python recommended)
- PyTorch
- torchvision
- nltk
- pandas
- scipy
- tqdm
- scikit-image
- scikit-learn
- tensorboardX
- tensorflow==1.13.1 (for tensorboard visualizations)
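
As a quick sanity check before running any experiment, the requirements above can be verified from Python. This is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the mapping from package names to import names (for example, scikit-learn imports as `sklearn`) assumes the usual import names of these libraries. It only checks that each module can be found, not its version.

```python
from importlib.util import find_spec

# Install name -> import name for the packages listed above.
# The import names are the customary ones; adjust if your setup differs.
REQUIRED = {
    "PyTorch": "torch",
    "torchvision": "torchvision",
    "nltk": "nltk",
    "pandas": "pandas",
    "scipy": "scipy",
    "tqdm": "tqdm",
    "scikit-image": "skimage",
    "scikit-learn": "sklearn",
    "tensorboardX": "tensorboardX",
    "tensorflow": "tensorflow",
}


def missing_packages(required=REQUIRED):
    """Return the install names whose top-level module cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All requirements found.")
```
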
On Linux:
> conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Install the relevant CUDA toolkit version if GPUs are available. Using GPUs is strongly recommended, and in practice necessary, because of the size of the model and datasets.
Use pip as below:
> pip install -r requirements.txt
We consider the Amazon Customer Reviews Dataset with 4 domains: Books (B), DVDs (D), Electronics (E) and Kitchen Appliances (K), located in the data folder. Each domain has 2 classes, positive and negative reviews, which serve as the labels for binary classification. The processed data of the Amazon Customer Reviews dataset is obtained from the MAN GitHub repo.
[Table: Language dataset statistics used for language domain adaptation]
[Table: Target domain test accuracy reported using CUDA over several SOTA domain alignment methods]
[Figure: t-SNE plots, one panel per source -> target task]
B -> D | B -> E | B -> K
D -> B | D -> E | D -> K
E -> B | E -> D | E -> K
K -> B | K -> D | K -> E
- The t-SNE plots indicate inclined, line-like clustering in both the source (x) and target (+) domains, with the two classes at opposite ends.
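
The twelve source -> target tasks above are simply all ordered pairs of distinct domains. The sketch below enumerates them; it is illustrative only, and the names `DOMAINS` and `adaptation_tasks` are hypothetical rather than taken from the repository.

```python
from itertools import permutations

# Domain abbreviations as used in this README.
DOMAINS = {"B": "Books", "D": "DVDs", "E": "Electronics", "K": "Kitchen Appliances"}


def adaptation_tasks(domains=DOMAINS):
    """Return every ordered (source, target) pair of distinct domains."""
    return list(permutations(domains, 2))


tasks = adaptation_tasks()
for source, target in tasks:
    print(f"{source} -> {target}")
```

With 4 domains this yields 4 x 3 = 12 tasks, in the same row order as the grid above.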
- SB_lang_00_ss.py : Code for source supervised only setting
- SB_lang_00_ss_tu.py : Code for source supervised + target unsupervised only setting
- SB_lang_00_ss_tu_su.py : Code for source supervised + target unsupervised + source unsupervised setting
- SB_lang_00_ss_tu_su_sa.py : Code for source supervised + target unsupervised + source unsupervised + source adversarial setting
- SB_lang_00_ss_tu_su_ta.py : Code for source supervised + target unsupervised + source unsupervised + target adversarial setting
- SB_lang_00_ss_tu_su_ta_sa.py : Code for source supervised + target unsupervised + source unsupervised + target adversarial + source adversarial setting
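
The script names follow a suffix convention (ss, tu, su, ta, sa) that can be read off the descriptions above. The helper below is a small sketch that expands a script name into its setting; `describe_setting` and `COMPONENTS` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Suffix -> training component, as spelled out in the script descriptions.
COMPONENTS = {
    "ss": "source supervised",
    "tu": "target unsupervised",
    "su": "source unsupervised",
    "ta": "target adversarial",
    "sa": "source adversarial",
}


def describe_setting(script_name):
    """Expand the suffixes of a script name into a readable setting."""
    parts = Path(script_name).stem.split("_")
    # Tokens such as "SB", "lang" and "00" are not components and are skipped.
    return " + ".join(COMPONENTS[p] for p in parts if p in COMPONENTS)


print(describe_setting("SB_lang_00_ss_tu_su_ta.py"))
```
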
- cuda.sh : This file consists of commands to run the experiments simultaneously in batch on multiple GPUs.
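
For readers who prefer to drive batch runs from Python, the sketch below shows one common way to spread scripts across GPUs: each process is restricted to a single device through the `CUDA_VISIBLE_DEVICES` environment variable. It does not reproduce the contents of cuda.sh; the function names `plan_jobs` and `launch` are hypothetical, and it assumes the scripts can be started without command-line arguments, which may not hold for this repository.

```python
import os
import subprocess
import sys


def plan_jobs(scripts, gpu_ids):
    """Assign each script to a GPU id in round-robin order."""
    return [(script, gpu_ids[i % len(gpu_ids)]) for i, script in enumerate(scripts)]


def launch(jobs):
    """Start one process per job, each seeing only its GPU, and wait for all."""
    procs = []
    for script, gpu in jobs:
        # The child process sees only the assigned GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen([sys.executable, script], env=env))
    return [p.wait() for p in procs]


# Planning only; call launch(jobs) to actually start the runs.
jobs = plan_jobs(["SB_lang_00_ss.py", "SB_lang_00_ss_tu.py", "SB_lang_00_ss_tu_su.py"], [0, 1])
print(jobs)
```

All jobs start at once, so with more scripts than GPUs several processes share a device; limit the number of scripts per call if memory is tight.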
- data : This folder is where the datasets are stored.
- datasets : This folder consists of all the PyTorch dataset files used for data loading.
- logs : This folder consists of logs from the previous simulations whose results are reported in the paper. The logs store all the settings and parameters for reproducibility.
- model : This folder consists of all the variants of neural networks used in MAN and CMD.