Reported data statistics do not match #57

Open
hv-abacus opened this issue May 25, 2023 · 1 comment
Assignees
Labels
question Further information is requested

Comments

@hv-abacus

Hi, I downloaded the Amazon dataset from here: https://recbole.s3-accelerate.amazonaws.com/CrossDomain/Amazon.zip
The dataset statistics that you report here do not match what I compute from the original data.
I removed all rows with NaNs and computed the number of unique values in the user_id column of the original .inter files. This gives the following statistics:

Number of users in AmazonBooks: 687827
Number of users in AmazonMov: 66317
Number of overlapping users: 27516

Am I doing something wrong?
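
For reference, the computation described above can be sketched roughly as follows. This is a minimal sketch on toy in-memory data, not the exact script that was run: the helper name `user_stats` is hypothetical, and the column name `user_id` is an assumption (the header in the actual .inter files may differ, and loading them is left out here).

```python
import pandas as pd


def user_stats(source: pd.DataFrame, target: pd.DataFrame, user_col: str = "user_id"):
    """Return (#users in source, #users in target, #overlapping users).

    Rows containing any NaN are dropped first, as described in the post.
    `user_col` is an assumed column name, adjust it to the real header.
    """
    src_users = set(source.dropna()[user_col].unique())
    tgt_users = set(target.dropna()[user_col].unique())
    return len(src_users), len(tgt_users), len(src_users & tgt_users)


# Toy stand-ins for the two domains (hypothetical data, not the Amazon files).
books = pd.DataFrame({"user_id": ["a", "b", "c", None],
                      "item_id": ["i1", "i2", "i3", "i4"]})
movies = pd.DataFrame({"user_id": ["b", "c", "d"],
                       "item_id": ["m1", None, "m3"]})

n_books, n_movies, n_overlap = user_stats(books, movies)
print(n_books, n_movies, n_overlap)
```

Note that this counts every user with at least one complete row, so no interaction-count filtering is applied.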

@hv-abacus hv-abacus added the bug Something isn't working label May 25, 2023
@Wicknight
Collaborator

Hello @hv-abacus,
It seems that you are not filtering the data. The dataset statistics that we report were obtained after 10-core filtering, which is specified by the parameters 'user_inter_num_interval' and 'item_inter_num_interval' in the yaml file. If you run the code on the Amazon datasets with our yaml file, you should obtain the same statistics.
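
To illustrate what 10-core filtering means, here is a minimal pandas sketch. It is not the library's own implementation, only an approximation of the idea under stated assumptions: the function name `k_core_filter` and the column names `user_id` and `item_id` are hypothetical, and the filter is assumed to be applied repeatedly until every remaining user and item has at least `k` interactions.

```python
import pandas as pd


def k_core_filter(inter: pd.DataFrame, k: int = 10,
                  user_col: str = "user_id", item_col: str = "item_id") -> pd.DataFrame:
    """Drop users and items with fewer than k interactions until nothing changes."""
    df = inter.dropna(subset=[user_col, item_col])
    while True:
        # Interaction count of each row's user and item in the current data.
        user_counts = df[user_col].map(df[user_col].value_counts())
        item_counts = df[item_col].map(df[item_col].value_counts())
        keep = (user_counts >= k) & (item_counts >= k)
        if keep.all():
            return df
        # Removing rows can push other users/items below k, so loop again.
        df = df[keep]


# Toy example with k=2 (hypothetical data): item i3 has one interaction,
# and once it is removed user u3 falls below the threshold as well.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3", "u3"],
    "item_id": ["i1", "i2", "i1", "i2", "i1", "i3"],
})
filtered = k_core_filter(toy, k=2)
print(filtered["user_id"].nunique(), filtered["item_id"].nunique())
```

Counting unique users after a filter of this kind gives much smaller numbers than counting them in the raw files, which would explain the mismatch.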

@Wicknight Wicknight added question Further information is requested and removed bug Something isn't working labels Dec 8, 2023
@Wicknight Wicknight self-assigned this Dec 8, 2023