Intermittent slow/error nominatim responses #1069

Closed
Tuxrug opened this issue Apr 30, 2024 · 5 comments
Labels
location:osuosl service:nominatim The geocoding service that powers search on osm.org

Comments

@Tuxrug

Tuxrug commented Apr 30, 2024

Another user and I noticed errors coming from the Nominatim API. @mtmail saw that the nominatim.openstreetmap.org usage graphs show a lot of slow queries today and suggested reporting it here. It seems to be clearing up, as I am now having trouble reproducing it, but I am reporting it in case it needs further investigation.

Original issue: osm-search/Nominatim#3405

Steps to reproduce:

Observed behavior:

  • Queries are intermittently delayed heavily or return errors such as HTTP 500
@tomhughes tomhughes changed the title Intermittent slow/error responses Intermittent slow/error nominatim responses Apr 30, 2024
@tomhughes tomhughes added service:nominatim The geocoding service that powers search on osm.org location:osuosl labels Apr 30, 2024
@tomhughes
Member

It's just the US server I think - it looks like somebody is probably doing some scraping or something and it is overloaded.

I did try and investigate earlier when it was first reported but I couldn't find any sort of access log that would let me look for IPs to block so it probably needs @lonvia to deal with it.

@lonvia

lonvia commented May 1, 2024

Looks like they've hit stormfly with 500 parallel connections. They are gone now, so feel free to close. I need to think about better monitoring for this kind of situation.

For historical reasons, the logs are in different locations on the different servers. I've now added a symlink in /var/log/nginx to make them easier to find next time.
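
One way to watch for a burst of parallel connections like the one described above is to tally established TCP connections per peer address. The following is a minimal sketch, not the monitoring actually used on the server: `connections_per_peer` is a hypothetical helper name, and it assumes its input comes from `ss -Htn state established` run without `-p`, so that the peer address is the last column.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -Htn state established` output on stdin and
# print "<connections> <peer address>" for each peer, busiest first.
# Assumption: the peer "address:port" is the last column (ss run without -p).
connections_per_peer() {
    awk '
        NF >= 4 {
            peer = $NF
            sub(/:[0-9]+$/, "", peer)   # drop the trailing :port
            count[peer]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -k1,1nr -k2,2
}
```

It could be used as `ss -Htn state established | connections_per_peer | head`. Note that a tally like this only counts TCP connections, so it would not reveal many HTTP/2 streams multiplexed over a few connections.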

@tomhughes
Member

I think the nftables rate limiting should have blocked that - more likely it was a small number of real connections with lots of multiplexed HTTP/2 streams, much like we saw on the main site some weeks ago.

That said, since you're using nginx, which has much better support for rate limiting, that ought to be usable to block that sort of thing, I think.

@Firefishy
Member

Firefishy commented May 1, 2024

Just a quick look at the logs...

  • 24x IPs exceeding 1 req/s. (34x if including 301 responses)
  • BLOCKED: 4x AWS EC2 IPs reverse scraping in same region using same python version (user-agent), from above 24x.
  • BLOCKED: 2x AWS EC2 IPs scraping Brazil addresses using same user-agent which identifies as jobs company, from above 24x.
  • BLOCKED: 1x AWS EC2 IPs reverse scraping using generic user agent, likely vehicle, from above 24x.

Some other user-agents in top 24x are: YourAppName, my_app, python-requests/x.y.z, Java/x.y.z_aaa, "Chome" and tutorial which I would consider blocking as against usage policy.

Here is my code:

# Creating the top IP list
grep -F '" 200 ' /var/log/nominatim/nominatim.openstreetmap.org-access.log | cut -d ' ' -f 1 | sort -S 25% --parallel=4 | uniq -c | sort -S 25% --parallel=4 -nr | head -n 1000 | tee top-ips-nominatim-200response-20240501.txt
# Viewing sample queries from top IP list
for i in $(head -n 24 /home/grant/top-ips-nominatim-200response-20240501.txt|awk '{print $2}'); do echo "${i}"; tac /var/log/nominatim/nominatim.openstreetmap.org-access.log|grep -F "${i}" | head -n 10; done | less
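
The commands above rank IPs by total request count. To check the "exceeding 1 req/s" criterion more directly, requests can be bucketed per IP and per second. This is a minimal sketch rather than the method used for the numbers above: `peak_rate_per_ip` is a hypothetical helper, and it assumes that field 1 of each log line is the client IP and field 4 is the timestamp at one-second resolution (as in nginx's default combined format).

```shell
#!/bin/sh
# Hypothetical helper: read access-log lines on stdin and print
# "<peak requests in one second> <ip>" for every client whose busiest
# second exceeded the threshold (first argument, default 1).
# Assumption: field 1 is the client IP, field 4 is "[dd/Mon/yyyy:HH:MM:SS".
peak_rate_per_ip() {
    threshold="${1:-1}"
    awk -v t="$threshold" '
        { n[$1 SUBSEP $4]++ }           # requests per (ip, second)
        END {
            for (k in n) {
                split(k, p, SUBSEP)     # p[1] is the ip
                if (n[k] > peak[p[1]]) peak[p[1]] = n[k]
            }
            for (ip in peak) if (peak[ip] > t) print peak[ip], ip
        }
    ' | sort -k1,1nr -k2,2
}
```

For example: `grep -F '" 200 ' /var/log/nominatim/nominatim.openstreetmap.org-access.log | peak_rate_per_ip 1 | head -n 24`. This measures the busiest single second per IP, which is stricter than an average rate over the whole log.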

@lonvia

lonvia commented May 1, 2024

I've checked the logs now. This was a mass geocoder using approx. 580 servers from Google Cloud, each sending requests at a rate of a bit less than 1 request/s.

We are pretty well set when it comes to limiting requests from single IPs. It's only when people start using botnets that things fail. Thankfully it is rare to see it on this scale.
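
A fleet in which every server stays just under the per-IP limit only becomes visible when requests are aggregated by something the clients share. The following is a rough sketch of two such aggregations, not a description of how the servers in this incident were identified. Both helper names are hypothetical, the log is assumed to be in nginx's combined format (user agent as the last quoted field), and grouping by IPv4 /16 is only a crude stand-in for a cloud provider's real address ranges.

```shell
#!/bin/sh
# Hypothetical helper: print "<distinct IPs>\t<requests>\t<user agent>",
# user agents seen from the most IPs first.
# Assumption: combined log format, so splitting on '"' puts the user
# agent in field 6 and the client IP at the start of field 1.
ips_per_user_agent() {
    awk -F'"' '
        {
            split($1, head, " ")        # head[1] is the client IP
            ua = $6
            reqs[ua]++
            if (!((head[1], ua) in seen)) { seen[head[1], ua] = 1; ips[ua]++ }
        }
        END { for (u in reqs) printf "%d\t%d\t%s\n", ips[u], reqs[u], u }
    ' | sort -t "$(printf '\t')" -k1,1nr -k2,2nr -k3,3
}

# Hypothetical helper: print "<distinct IPs> <requests> <a.b.0.0/16>"
# for each IPv4 /16, prefixes with the most IPs first. IPv6 is skipped.
requests_per_prefix16() {
    awk '
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            split($1, o, ".")
            prefix = o[1] "." o[2] ".0.0/16"
            reqs[prefix]++
            if (!($1 in seen)) { seen[$1] = 1; ips[prefix]++ }
        }
        END { for (p in reqs) print ips[p], reqs[p], p }
    ' | sort -k1,1nr -k2,2nr -k3,3
}
```

Either function reads the access log on stdin, for example `ips_per_user_agent < /var/log/nominatim/nominatim.openstreetmap.org-access.log | head`. A user agent or prefix with hundreds of distinct IPs, each individually slow, is the pattern to look for.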
