Would caching these changeset diffs help? #28
Comments
@iandees that sounds amazing! We have been toying with some ideas of side-stepping Overpass entirely and building up changeset diffs by looking at minutely replication files, but that is probably a way away. Caching this on
Sounds good. This is something that I've been wanting to do for quite some time, so it'll be nice to have someone to test it 😄 .
Happy to test any time :) One thing to be aware of: the query to Overpass is a bit of a hack -- while it is generally pretty reliable, there is a possibility that it returns wrong data. What it does is roughly this: query the OSM API for the changeset's details -- bbox, created_at, and closed_at -- then query Overpass for all features within that bbox that were modified during that window. So it is entirely possible that other changes made in that bbox at the same time will get mixed into the result. Unfortunately, this seems like the best way to get a diff of the features in a changeset -- it mimics the technique used by Achavi. So far we have not noticed any discrepancies, but just so you know, it is entirely possible that there will be some. (If you are hoping to have an authoritative cache of changeset diffs, there may be edge cases where the data is wrong.)
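For anyone following along, the two-step technique described above can be sketched roughly like this. All names here are illustrative -- this is not the app's actual code, just a minimal stand-in for the "changeset metadata, then time-boxed Overpass `adiff` query" approach:

```python
import urllib.request
import xml.etree.ElementTree as ET

OSM_API = "https://api.openstreetmap.org/api/0.6/changeset/{}"

def changeset_bbox_and_times(changeset_id):
    """Step 1: fetch the changeset's bbox and open/close times from the OSM API."""
    xml = urllib.request.urlopen(OSM_API.format(changeset_id)).read()
    cs = ET.fromstring(xml).find("changeset")
    bbox = (cs.get("min_lat"), cs.get("min_lon"),
            cs.get("max_lat"), cs.get("max_lon"))
    return bbox, cs.get("created_at"), cs.get("closed_at")

def overpass_diff_query(bbox, created_at, closed_at):
    """Step 2: build an Overpass adiff query for everything edited in the
    bbox during the changeset's lifetime. This is the 'hack' part: edits
    from *other* changesets in the same bbox/time window get mixed in."""
    south, west, north, east = bbox
    return (
        '[out:xml][adiff:"{0}","{1}"];'
        '(node({2},{3},{4},{5});way({2},{3},{4},{5});'
        'relation({2},{3},{4},{5}););'
        'out meta geom;'
    ).format(created_at, closed_at, south, west, north, east)
```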
A little update here: in the interest of being able to build more than just changeset diffs, I'm importing full-history data into a database and will use that to build the diffs. This is why it's taking longer :).
@iandees I'm very curious to know if there's any update here :) -- no rush, of course. What you outlined seems super exciting, so I'd be happy to jump in if there's anything to test, even if it's a bit raw. Also happy to help if there's anything I can do to make this happen.
My progress so far has been slowed by trying to load a full-history dump into a database. I eventually hacked together a libosmium program to dump a TSV file that I then loaded into an RDS PostgreSQL instance.

On top of that, I wrote a bit of Python that downloads an OSM changeset and then queries the database above to find the before and after geometry of the data in the changeset. I ended up using a format that's slightly different from the Overpass result. Check out an example output here. It's rather long because it includes geometry for the relations, but it would be fairly simple to (optionally) exclude relations.

The credits I'm using to pay for the ~$150/mo database expire today, so I will have to throw out this database or find somewhere else to host it. I've also thought about putting together a PR for Overpass to better handle this specific changeset-diff situation (via caching and a more reliable query).
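The "before/after from a full-history table" idea can be illustrated with a tiny runnable sketch. The schema and data below are made-up stand-ins (one row per node id + version), not the actual Postgres import described above; the real tables and columns will differ:

```python
import sqlite3

def before_after(conn, changeset_id):
    """For each node touched by the changeset, return the version the
    changeset wrote plus the version immediately preceding it."""
    return conn.execute(
        """
        SELECT id, version, lat, lon, changeset = ? AS is_after
        FROM nodes n
        WHERE EXISTS (SELECT 1 FROM nodes t
                      WHERE t.id = n.id AND t.changeset = ?)
          AND (changeset = ?
               OR version = (SELECT MAX(v.version) FROM nodes v
                             WHERE v.id = n.id
                               AND v.version < (SELECT t.version FROM nodes t
                                                WHERE t.id = n.id
                                                  AND t.changeset = ?)))
        ORDER BY id, version
        """,
        (changeset_id,) * 4,
    ).fetchall()

# Illustrative full-history data: node 1 was moved by changeset 200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INT, version INT, changeset INT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO nodes VALUES (?,?,?,?,?)", [
    (1, 1, 100, 51.0, -0.1),  # original position
    (1, 2, 200, 51.5, -0.2),  # moved by changeset 200
    (2, 1, 150, 52.0, -0.3),  # untouched by changeset 200
])
rows = before_after(conn, 200)
```

The same query shape works against PostgreSQL; sqlite is used here only so the sketch is self-contained.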
I noticed that you've recently moved away from using Overpass to get changeset diff data. I've started thinking about this ticket again, and I'm wondering whether it would be useful to break the code you're writing to build the diff data out into a separate module.
Hey @iandees! We started caching changesets and augmented diffs. More here: http://www.openstreetmap.org/user/geohacker/diary/40846 |
I've been poking around with caching OSM data like crazy, and these changeset diffs seem like a prime example of a perfect caching candidate.
Am I correct in thinking that the changeset diff from Overpass does not change once the changeset has been closed? Do you make frequent requests for the same changeset?
I would propose that I set up an S3 bucket that this webapp can hit. If the changeset isn't in the S3 bucket, it forwards to a lambda function (via API gateway) that makes the same request that you do now, but also saves it to S3 so that future requests can return the static file immediately.