Getting results beyond defined time period #671
Comments
What's the full code snippet you have? How are you defining the times?
This is the snippet relevant to the dates. I iterate over a dataframe, reading dates like 2017-01-28 from each row and turning them into datetime objects. A loop then expands each one into 15 dates (the original date ±7 days), which are appended to start_list; its items are later used as start_date. Applying timedelta again to each element of start_list gives me the end_date.
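For readers following along, the date handling described above can be sketched roughly as follows. This is not the poster's actual code (the original snippet is not shown here); the function name `day_windows` and the choice of one-day windows in UTC are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def day_windows(date_str, days_each_side=7):
    """Build (start, end) pairs for the given day and the days around it.

    Hypothetical helper: each window covers one full UTC day, with an
    inclusive start and an exclusive end at the next midnight.
    """
    center = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    windows = []
    for offset in range(-days_each_side, days_each_side + 1):
        start = center + timedelta(days=offset)
        end = start + timedelta(days=1)
        windows.append((start, end))
    return windows

windows = day_windows("2017-01-28")
for start, end in windows[:2]:
    print(start.isoformat(), "->", end.isoformat())
```

Printing each pair before passing it on as the start and end time makes it easy to confirm that the windows themselves are correct and that any overshoot comes from the returned results.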
Ah, very strange. Does the same thing happen if you run the same query on the command line? What's the twarc.log output if you try this?
Yeah, unfortunately the same thing happens on the command line. Both commands yield results including tweets from 2019-10-02. Attached is the log file for the first command:
It's probably the result of some microservice Elon turned off.
More seriously, if you are using twarc2 as a library then you should be able to notice when you've passed the limit and stop, right?
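One way to act on that suggestion is to check each tweet's timestamp on the client side and keep only those inside the requested window. The sketch below is an illustration, not part of twarc: it assumes each page is a dict with a `data` list and that every tweet carries a `created_at` field in the API's ISO 8601 format. `collect_in_window` is a hypothetical helper and the pages are made-up sample data.

```python
from datetime import datetime, timezone

def parse_created_at(value):
    # Python versions before 3.11 do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def collect_in_window(pages, start, end, limit=500):
    """Keep tweets with start <= created_at < end, stopping at `limit`."""
    kept = []
    for page in pages:
        for tweet in page.get("data", []):
            if start <= parse_created_at(tweet["created_at"]) < end:
                kept.append(tweet)
                if len(kept) >= limit:
                    return kept
    return kept

# Made-up pages standing in for what the search would return (assumed shape).
fake_pages = [
    {"data": [{"id": "1", "created_at": "2017-01-29T00:40:00.000Z"},
              {"id": "2", "created_at": "2017-01-28T23:10:00.000Z"}]},
    {"data": [{"id": "3", "created_at": "2017-01-28T08:00:00.000Z"}]},
]
start = datetime(2017, 1, 28, tzinfo=timezone.utc)
end = datetime(2017, 1, 29, tzinfo=timezone.utc)
kept = collect_in_window(fake_pages, start, end)
print([t["id"] for t in kept])
```

Filtering this way means out-of-window tweets no longer count toward the 500 per query, at the cost of possibly fetching extra pages.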
Yeah, I print every argument to check if they're correct.
Do you see the same behaviour if you leave off the time zone argument and just have a naive datetime? (It shouldn't matter, but...)
Also, is the overshoot a consistent amount of time? I'm wondering if there's a subtle timezone problem somewhere in twarc.
On Fri, 2 Dec 2022, 07:44, xaracai wrote:
Yeah, I print every argument to check if they're correct.
The problem is that I want to analyze 500 tweets from only particular days for a given query, so results going beyond the set date take away from my total results; it seems like the date is simply being shifted a few hours beyond the set end date.
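To answer the question of whether the overshoot is a consistent amount of time, the distance of each stray tweet past the end time can be measured directly. This is a minimal sketch with made-up timestamps; `overshoots` is a hypothetical helper, and the `created_at` format is assumed to be the API's ISO 8601 form.

```python
from datetime import datetime, timedelta, timezone

def overshoots(created_at_values, end):
    """Return how far past `end` each out-of-window timestamp lies."""
    result = []
    for value in created_at_values:
        # Replace the trailing "Z" so older Python versions can parse it.
        created = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if created >= end:
            result.append(created - end)
    return result

end = datetime(2017, 1, 29, tzinfo=timezone.utc)
# Made-up sample timestamps, not taken from the reported results.
sample = [
    "2017-01-28T22:15:00.000Z",
    "2017-01-29T00:59:00.000Z",
    "2017-01-29T05:30:00.000Z",
]
deltas = overshoots(sample, end)
print(max(deltas))
```

A constant offset such as exactly one hour would point toward a timezone conversion problem, while offsets that vary widely would suggest something else.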
Something to note is that I can't seem to reproduce the same thing on the command line now anyway, so it may be something with their index or cache. I don't think we can or should try to fix it in twarc code though.
At first I thought it was only +1h, but the more results I get for a given query, the more tweets belong to the next day, which would fit igorbrigadir's explanation of reverse chronological order... @SamHames, leaving out the timezone argument does not change the results at all... I'm not using twarc from the command line, but I suppose iterating over the pages returned by the API would be the same as using the So there is nothing you can help me with?
Yeah, the command line uses the exact same code, so there shouldn't be any difference. It's a very strange error. Is it still happening? Since I can't reproduce the error with exact dates, maybe the bug is actually somewhere in the code that reads a dataframe and extracts dates with substrings here: #671 (comment) (this seems very specific to your case so I can't offer any help here, but the pandas tag on Stack Overflow is usually much more active for figuring these things out!)
Hey,
I'm using twarc2 as a library in Python (not from the command line) with search_all and academic access.
Everything is working fine except that the defined start and end times do not seem to be working properly.
Whenever my time settings look like this, from 2017-01-28 01:00:00+00:00 to 2017-01-28 23:00:00+00:00 or from 2017-01-28 00:00:00+00:00 to 2017-01-29 00:00:00+00:00, I get tweets from 2017-01-29 as well.
To clarify, I just want to get tweets for one particular day, 500 tweets per query (6 pages using enumerate), but this returns around 60 tweets from the following day as well.
Can someone help or point me toward a solution? (I can post the Python code too if needed.)