Duplicated results - WOF - DiffPlace.js #1071
Comments
Hi @jbgriesner, if I had to design it right now, I would say that it should operate by looking at multiple WOF records: if one is a locality, the other is a localadmin, their names are the same, and the localadmin is the parent of the locality, we should consider them duplicates. Which one to prefer is an interesting question. My intuition is that it should default to the locality. If needed, we could come up with something more complex.
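To make the rule above concrete, here is a minimal sketch of how it could look. The record shape (`id`, `layer`, `name`, `parentLocaladminId`) and both function names are hypothetical and chosen only for illustration; they are not the actual Pelias document structure or the existing dedupe code.

```javascript
// Hypothetical, simplified record shape (an assumption, not the Pelias schema):
//   { id: string, layer: string, name: string, parentLocaladminId?: string }

// True when one record is a locality, the other is a localadmin with the
// same name, and that localadmin is the parent of the locality.
function isLocalityLocaladminPair(a, b) {
  const locality = [a, b].find((r) => r.layer === 'locality');
  const localadmin = [a, b].find((r) => r.layer === 'localadmin');
  if (!locality || !localadmin) {
    return false;
  }
  return locality.name === localadmin.name &&
    locality.parentLocaladminId === localadmin.id;
}

// Drops every localadmin that duplicates one of its child localities,
// so the locality is the record that is kept by default.
function dedupeLocalityLocaladmin(records) {
  return records.filter((candidate) =>
    !(candidate.layer === 'localadmin' &&
      records.some((other) => isLocalityLocaladminPair(candidate, other)))
  );
}
```

With this sketch, a "Lognes" localadmin would be dropped only if a "Lognes" locality pointing at it as its parent is also in the result set; a localadmin without a matching child locality is left alone.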
I'm currently in the process of refactoring the dedupe middleware in #1222. However, I suspect this issue will be improved by the work the WOF team is currently doing in whosonfirst-data/whosonfirst-data#1343. Deduplicating between We would need to choose whether we want to be technically correct or user-friendly :)
Here's another example of administrative area duplication: /v1/autocomplete?boundary.country=aus&text=gungahlin. Basically, we get a WOF neighbourhood, locality, and localadmin with the same name, plus a Geonames record of the same name. The Geonames record shows as a venue, but is probably an admin area that's incorrectly classified by our importer.
All of these examples have now been fixed after #1230, except for http://pelias.github.io/compare/#/v1/search%3Ftext=Boissy%20St%20L%C3%A9ger, which appears to be failing because of differing diacritics. We can probably both fix that in the WOF data and add code to ignore diacritics when deduping.
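As a rough illustration of what ignoring diacritics during name comparison could look like, the sketch below uses the standard `String.prototype.normalize` method to decompose accented characters and then strips the combining marks. The helper names are hypothetical, and this is not necessarily how the dedupe code does or should do it.

```javascript
// Decomposes accented characters (NFD) and removes the combining marks
// in the U+0300..U+036F range, so "é" becomes "e".
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: compares two place names while ignoring
// diacritics and letter case.
function sameNameIgnoringDiacritics(a, b) {
  return stripDiacritics(a).toLowerCase() === stripDiacritics(b).toLowerCase();
}
```

Under this comparison, "Boissy-Saint-Léger" and "Boissy-Saint-Leger" would be treated as the same name. Note that this only covers characters that decompose into a base letter plus combining marks; letters such as "ø" or "ß" would need separate handling.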
Brussels also has several duplicates. Some are in fact part of other localadmins, like Dilbeek. It seems like a WOF issue, though?
Some searches in France (such as "Lognes", "Sucy en Brie" or "Boissy St Léger") seem to lead to duplicated results.
These queries return, among others, respectively:
So there is apparently a problem with duplicated WOF "locality" and "localadmin" data, and also with the duplicate checking (in "middleware/dedup.js").
To fix this, it is apparently possible either to change the WOF import to prevent "locality" and "localadmin" duplicates, or to add another test to the "isDifferent()" function in "helper/diffPlaces.js".
What do you think?