Run Hydrator for multiple CSV files #94

Open
grrigore opened this issue May 31, 2021 · 7 comments

Comments

@grrigore

I have a folder with 100 .csv files of different sizes. Is there a way to hydrate those files without manually adding each file into the Hydrator app?

@edsu
Member

edsu commented May 31, 2021

Do your CSV files only contain a column of numbers? Or do they include other columns as well? Also what operating system are you using?

@grrigore
Author

My .csv files contain tweet IDs. I am using Ubuntu.

@edsu
Member

edsu commented May 31, 2021

Do the CSV files have a column header? Or are the files just lines of numbers?

@grrigore
Author

grrigore commented May 31, 2021

This is a preview from a .csv file:
ID, TextBlob score (I can remove this)

1385449730818285569,0.125
1385449730981842946,0
1385449730981957635,-0.0062500000000000056
1385449730948288516,0.26666666666666666
1385449731132989440,-0.016666666666666677
1385449731086708736,0
1385449731267178496,0.3

I am using data from here

@edsu
Member

edsu commented May 31, 2021

You will want to ensure that your input file is a text file where each line contains a tweet id and nothing else. So the TextBlob score will need to be removed, as will any column headers.
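As a rough sketch of that cleanup for a whole folder, something like the following Python script could work. It assumes the tweet id is always the first column, as in the preview above; the function names `extract_ids` and `convert_folder` and the folder names in the usage note are made up for illustration and are not part of Hydrator.

```python
import csv
from pathlib import Path


def extract_ids(csv_path, txt_path):
    """Write the first column of csv_path to txt_path, one tweet id per line.

    Rows whose first cell is not purely digits (a header row, a blank
    line) are skipped. Returns the number of ids written.
    """
    count = 0
    # utf-8-sig reads plain UTF-8 too and drops a byte-order mark if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
            open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            if not row:
                continue  # csv.reader yields an empty list for blank lines
            tweet_id = row[0].strip()
            if tweet_id.isascii() and tweet_id.isdigit():
                dst.write(tweet_id + "\n")
                count += 1
    return count


def convert_folder(in_dir, out_dir):
    """Convert every *.csv in in_dir to an id-only .txt file in out_dir.

    Returns a dict mapping each CSV file name to the number of ids kept.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for csv_path in sorted(Path(in_dir).glob("*.csv")):
        txt_path = out / (csv_path.stem + ".txt")
        results[csv_path.name] = extract_ids(csv_path, txt_path)
    return results
```

Calling `convert_folder("csv_files", "id_files")` (both folder names hypothetical) would leave one `.txt` file per CSV, each ready to be added to Hydrator as a dataset.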

I don't actually see data with that format in the dataset you linked to. If you are working with a very large dataset (hundreds of millions of tweets) you might want to use twarc instead of Hydrator.
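If you go the twarc route, a minimal sketch for hydrating a whole folder might look like the one below. It assumes twarc is installed, that credentials were already set up with `twarc2 configure`, and that each input file holds one tweet id per line. The helper names and the folder layout are hypothetical; only the `twarc2 hydrate <infile> <outfile>` command comes from twarc itself.

```python
import subprocess
from pathlib import Path


def build_hydrate_commands(id_dir, out_dir):
    """Return one `twarc2 hydrate <ids.txt> <tweets.jsonl>` command per id file."""
    out = Path(out_dir)
    commands = []
    for id_file in sorted(Path(id_dir).glob("*.txt")):
        jsonl = out / (id_file.stem + ".jsonl")
        commands.append(["twarc2", "hydrate", str(id_file), str(jsonl)])
    return commands


def hydrate_folder(id_dir, out_dir):
    """Run the commands one after another, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_hydrate_commands(id_dir, out_dir):
        # check=True raises CalledProcessError if twarc2 exits with an error
        subprocess.run(cmd, check=True)
```

Running `hydrate_folder("id_files", "hydrated")` (hypothetical folder names) would then write one `.jsonl` file per input file, with no manual step per dataset.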

@edsu edsu closed this as completed May 31, 2021
@edsu edsu reopened this May 31, 2021
@edsu
Member

edsu commented May 31, 2021

Sorry, I should have left this open to see if you have any more questions.

@grrigore
Author

No problem. 🙂 I think twarc is a better tool for what I want. Thank you.
