Analyzing the Impact of Domain Similarity: A New Perspective in Cross-Domain Recommendation

Abstract:

Cross-domain recommendation (CDR) has recently emerged as an effective way to alleviate the cold-start and sparsity issues faced by recommender systems, by transferring information from an auxiliary domain to a target domain to improve recommendations. Studying the similarity between domains is a novel direction in CDR research, potentially opening doors for further exploration. In this context, we introduce a systematic approach to quantify similarity between a pair of domains and explore how current CDR methods perform with both similar and dissimilar domain combinations. We achieve this by presenting two original similarity metrics. Our extensive empirical evaluation on different domain combinations demonstrates that the state-of-the-art CDR algorithms do not perform significantly better when using source domains that are more similar to the target domain, compared to those that are less similar. Importantly, we find that no matter how similarity is measured, it does not correlate with the recommendation performance of the state-of-the-art algorithms.

Usage

This repository contains the code for this research project, including the source code of the two similarity metrics presented in the paper (Embedding-based Domain Similarity and Inter-domain Item Similarity). It has two folders: Full_Project_Code and Custom_Project_Code. The Full_Project_Code folder contains the code and data we used to run the experiments in our paper. The Custom_Project_Code folder lets you apply our similarity metrics to your own source and target domain data.

Files/Directories Used to Run Experiments From Our Paper

The data we used to run the experiments is in the Full_Project_Code directory. Its files and directories are described below:

GloVe_File: This directory contains the pre-trained GloVe embeddings file that we use to retrieve embeddings for tags.
dataframes: This directory contains the dataframes for every domain that we used in the study.
domain_embeddings: This directory contains domain embeddings for each domain, which were created by running create_domain_embeddings.py.
domain_embedding_similarity_results: This directory contains the similarity values between different domain combinations across three datasets using the Embedding-based Domain Similarity method.
pairwise_similarities: This directory contains the similarity values between different domain combinations across three datasets using the Inter-domain Item Similarity method.
create_domain_embeddings.py: This file creates the domain embeddings for each domain from that domain's dataframe.
domain_embedding_similarities.py: This file computes the similarity between domain embeddings using the Embedding-based Domain Similarity method, and writes the results to the domain_embedding_similarity_results directory.
pairwise_similarities.py: This file computes the similarity between domains using the Inter-domain Item Similarity method, and writes the results to the pairwise_similarities directory.
utils.py: This file contains helper functions used throughout the Python files in this repository.
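The README only says that create_domain_embeddings.py builds a domain embedding from GloVe tag embeddings. The sketch below shows one plausible way to do that, assuming the GloVe file is in the standard text format (a token followed by its vector components on each line) and that a domain embedding is the average of the GloVe vectors of all tags in the domain. The function names load_glove and domain_embedding are hypothetical, and the averaging step is an assumption; check create_domain_embeddings.py for what the script actually does.

```python
import numpy as np
import pandas as pd


def load_glove(path):
    """Parse a GloVe text file: each line is a token followed by its vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def domain_embedding(df, glove):
    """Hypothetical domain embedding: the mean GloVe vector of every tag in the domain.

    df must have a 'tags' column of comma-separated strings. Tags are stripped and
    lower-cased (assuming a lower-cased GloVe vocabulary); tags missing from the
    vocabulary, including multi-word tags, are skipped.
    """
    tag_vectors = []
    for tags in df["tags"]:
        for tag in tags.split(","):
            tag = tag.strip().lower()
            if tag in glove:
                tag_vectors.append(glove[tag])
    if not tag_vectors:
        raise ValueError("none of the domain's tags are in the GloVe vocabulary")
    return np.mean(tag_vectors, axis=0)
```

Averaging keeps the embedding the same dimensionality as the GloVe vectors regardless of how many items a domain has; frequent tags pull the average toward themselves, which may or may not match the weighting used in the paper.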

Running Experiments From Our Paper

Enter the folder that contains the data we used for experimentation:
- cd Full_Project_Code
To retrieve the similarity values between domains using the Embedding-based Domain Similarity method, run the command below:
- python3 domain_embedding_similarities.py
To retrieve the similarity values between domains using the Inter-domain Item Similarity method, run the command below:
- python3 pairwise_similarities.py

Using Custom Data to Run Your Own Experiments

To run the similarity metrics using your own data, navigate to the Custom_Project_Code directory:

dataframes: This directory should contain the dataframes for your source and target domains. Each dataframe must have exactly two columns, item_id and tags: item_id should be an integer, and tags should be a string of comma-separated tags. Save each dataframe as a pickle file using pandas.DataFrame.to_pickle() and place the pickle files in this directory. The two dataframes must be named source_domain_df and target_domain_df.
domain_embeddings: This directory will contain domain embeddings for your source and target domains, which are created by running custom_domain_embeddings.py.
domain_embedding_similarity_results: This directory will contain the similarity values between your source and target domains using the Embedding-based Domain Similarity method.
pairwise_similarities: This directory will contain the similarity values between your source and target domains using the Inter-domain Item Similarity method.
custom_domain_embeddings.py: This file creates the domain embeddings for your source and target domains based on the dataframes in the dataframes directory.
custom_domain_embedding_similarities.py: This file computes the similarity between your domain embeddings using the Embedding-based Domain Similarity method, and writes the results to the domain_embedding_similarity_results directory.
custom_pairwise_similarities.py: This file computes the similarity between your domains using the Inter-domain Item Similarity method, and writes the results to the pairwise_similarities directory.
custom_utils.py: This file contains helper functions used throughout the Python files in this directory, plus extra functions for handling your custom data.
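Because the custom scripts expect a strict input layout, it can help to check each dataframe before pickling it. The sketch below validates the two-column layout described for the dataframes directory (an integer item_id column and a comma-separated string tags column) and writes the pickle with pandas.DataFrame.to_pickle(). The helper names validate_domain_df and save_domain_df are hypothetical, and the .pkl file extension is an assumption; check custom_utils.py for the exact file names the scripts load.

```python
from pathlib import Path

import pandas as pd


def validate_domain_df(df):
    """Check the layout the README asks for: exactly item_id (int) and tags (str)."""
    if list(df.columns) != ["item_id", "tags"]:
        raise ValueError(f"expected columns ['item_id', 'tags'], got {list(df.columns)}")
    if not pd.api.types.is_integer_dtype(df["item_id"]):
        raise TypeError("item_id must be an integer column")
    if not df["tags"].map(lambda t: isinstance(t, str)).all():
        raise TypeError("tags must be comma-separated strings")


def save_domain_df(df, name, out_dir="dataframes"):
    """Validate df, then pickle it to out_dir/<name>.pkl (the .pkl suffix is an assumption)."""
    validate_domain_df(df)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{name}.pkl"
    df.to_pickle(path)
    return path
```

For example, after building source_domain_df with item_id values such as 101 and 102 and tags strings such as "comedy,romance", calling save_domain_df(source_domain_df, "source_domain_df") from inside Custom_Project_Code writes the file into the dataframes directory; repeat with target_domain_df.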

Running Embedding-based Domain Similarity Using Custom Source and Target Domains

Enter the folder for running the metrics on custom data:
- cd Custom_Project_Code
Create two dataframes called source_domain_df and target_domain_df that have two columns (item_id & tags).
Convert your dataframes into pickle files, and add them to the dataframes directory.
Create domain embeddings for both your source and target domains:
- python3 custom_domain_embeddings.py
To retrieve the similarity values between domains using the Embedding-based Domain Similarity method, run the command below:
- python3 custom_domain_embedding_similarities.py
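As a rough illustration of what comparing two domain embeddings involves, the sketch below computes cosine similarity between a source and a target embedding vector. It assumes each domain embedding is a single vector and that the comparison is cosine similarity; both are assumptions, and custom_domain_embedding_similarities.py may use a different measure or file format. The function name cosine_similarity is hypothetical here.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two domain embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cannot compare a zero vector")
    return float(np.dot(a, b) / denom)
```

Cosine similarity ignores vector length, so two domains whose averaged tag vectors point in the same direction score close to 1 even if one domain has many more items.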

Running Inter-domain Item Similarity Using Custom Source and Target Domains

Enter the folder for running the metrics on custom data:
- cd Custom_Project_Code
Create two dataframes called source_domain_df and target_domain_df that have two columns (item_id & tags).
Convert your dataframes into pickle files, and add them to the dataframes directory.
To retrieve the similarity values between domains using the Inter-domain Item Similarity method, run the command below:
- python3 custom_pairwise_similarities.py
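The README does not spell out how Inter-domain Item Similarity aggregates item pairs. As a hedged illustration only, the sketch below scores two domains by the mean cosine similarity over every (source item, target item) pair, assuming each item already has an embedding vector (for instance, the average GloVe vector of its tags). The function name mean_pairwise_cosine and the mean aggregation are assumptions, not necessarily what custom_pairwise_similarities.py computes.

```python
import numpy as np


def mean_pairwise_cosine(source_items, target_items):
    """Hypothetical inter-domain score: average cosine over all source x target item pairs.

    source_items: (n_s, d) array, one embedding row per source item.
    target_items: (n_t, d) array, one embedding row per target item.
    """
    s = np.asarray(source_items, dtype=np.float64)
    t = np.asarray(target_items, dtype=np.float64)
    s_norm = np.linalg.norm(s, axis=1, keepdims=True)
    t_norm = np.linalg.norm(t, axis=1, keepdims=True)
    if np.any(s_norm == 0) or np.any(t_norm == 0):
        raise ValueError("every item needs a non-zero embedding")
    # Row-normalise, then one matrix product yields all n_s x n_t cosine values.
    sims = (s / s_norm) @ (t / t_norm).T
    return float(sims.mean())
```

The full n_s x n_t similarity matrix is held in memory, so for large domains you may want to process the source items in chunks and accumulate the running sum instead.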

Repository: ajaykv1/Domain-Similarity-CDR