Duplicated results - WOF - DiffPlace.js #1071

Open
jbgriesner opened this issue Dec 6, 2017 · 5 comments

Comments

@jbgriesner

jbgriesner commented Dec 6, 2017

Some searches in France (such as "Lognes", "Sucy en Brie" or "Boissy St Léger") seem to lead to duplicated results.

These queries return, among others, respectively:

So there is apparently a problem with duplicated WOF "locality" and "localadmin" records, and also with the duplicate check (in "middleware/dedup.js").

To fix this, it seems possible either to change the WOF import to prevent "locality"/"localadmin" duplicates, or to add another test to the "isDifferent()" function in "helper/diffPlaces.js".
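A minimal sketch of the first option (filtering at import time), assuming a simplified, hypothetical record shape with `id`, `layer`, `name` and `parentIds` fields. This is not the real WOF or Pelias schema, and `dropRedundantLocaladmins` is an illustrative name, not part of any Pelias importer. It keeps the locality and drops a localadmin only when one of that localadmin's child localities has exactly the same name.

```javascript
// ASSUMED record shape for this sketch (not the real WOF/Pelias schema):
//   { id: number, layer: string, name: string, parentIds: number[] }

// Hypothetical helper: drop any localadmin that has a same-named child locality.
function dropRedundantLocaladmins(records) {
  // Build "parentId|name" keys from the parents of every locality.
  const localityKeys = new Set();
  for (const record of records) {
    if (record.layer !== "locality") continue;
    for (const parentId of record.parentIds) {
      localityKeys.add(`${parentId}|${record.name}`);
    }
  }

  // Keep every record except localadmins shadowed by a same-named child locality.
  return records.filter(
    (record) =>
      !(record.layer === "localadmin" && localityKeys.has(`${record.id}|${record.name}`))
  );
}

// Made-up IDs, for illustration only.
const records = [
  { id: 1, layer: "localadmin", name: "Lognes", parentIds: [] },
  { id: 2, layer: "locality", name: "Lognes", parentIds: [1] },
  { id: 3, layer: "localadmin", name: "Some Other Commune", parentIds: [] },
];

console.log(dropRedundantLocaladmins(records).map((r) => r.id)); // [ 2, 3 ]
```

Filtering at import time removes the localadmin from every query, so it is the less flexible of the two options if the localadmin is sometimes wanted.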

What do you think?

@orangejulius
Member

Hi @jbgriesner,
Thanks for providing some very nice test cases. I believe we should solve this in the API deduplication middleware.

If I had to design it right now, I would say that it should operate by looking at multiple WOF records: if one is a locality, the other is a localadmin, their names are the same, and the localadmin is the parent of the locality, we should consider them duplicates.
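The proposed rule can be sketched as a pairwise predicate. This assumes a hypothetical, simplified result shape in which each record lists the IDs of its parent localadmins in `parentLocaladminIds`; that field name and `isLocalityLocaladminDuplicate` are illustrative, not the actual Pelias document schema or an existing function. Names are compared exactly here.

```javascript
// ASSUMED result shape for this sketch (not the actual Pelias schema):
//   { id: number, layer: string, name: string, parentLocaladminIds: number[] }
function isLocalityLocaladminDuplicate(a, b) {
  // Put the pair in a fixed order: locality first, localadmin second.
  let locality;
  let localadmin;
  if (a.layer === "locality" && b.layer === "localadmin") {
    locality = a;
    localadmin = b;
  } else if (a.layer === "localadmin" && b.layer === "locality") {
    locality = b;
    localadmin = a;
  } else {
    return false; // the rule only applies to locality/localadmin pairs
  }

  // Their names must be the same...
  if (locality.name !== localadmin.name) return false;

  // ...and the localadmin must be the parent of the locality.
  return locality.parentLocaladminIds.includes(localadmin.id);
}

// Made-up IDs, for illustration only.
const admin = { id: 10, layer: "localadmin", name: "Lognes", parentLocaladminIds: [] };
const town = { id: 20, layer: "locality", name: "Lognes", parentLocaladminIds: [10] };

console.log(isLocalityLocaladminDuplicate(town, admin)); // true
console.log(isLocalityLocaladminDuplicate(admin, town)); // true
```

Requiring the parent relationship keeps two unrelated places that merely share a name from being merged.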

Which one to prefer is an interesting question. My intuition is that it should default to the locality. If needed, we could come up with something more complex.
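One way to express "default to the locality" while leaving room for something more complex later is a small layer-priority table. This is a sketch under stated assumptions: only the locality-over-localadmin preference comes from this comment; `LAYER_PRIORITY` and `preferredRecord` are hypothetical names, and the input is assumed to be a pair already judged to be duplicates.

```javascript
// ASSUMED preference order (lower number wins). Only "locality over localadmin"
// comes from the discussion; the lookup-table form is an illustrative choice.
const LAYER_PRIORITY = { locality: 0, localadmin: 1 };

// Given two records already judged to be duplicates, return the one to keep.
function preferredRecord(a, b) {
  const pa = LAYER_PRIORITY[a.layer] ?? Number.POSITIVE_INFINITY;
  const pb = LAYER_PRIORITY[b.layer] ?? Number.POSITIVE_INFINITY;
  // Keep the first record on a tie or when neither layer is listed.
  return pb < pa ? b : a;
}

const kept = preferredRecord(
  { id: 10, layer: "localadmin", name: "Lognes" },
  { id: 20, layer: "locality", name: "Lognes" }
);
console.log(kept.layer); // locality
```

A more complex policy could add layers to the table or replace it with a comparison function, without changing the callers.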

@missinglink
Member

I'm currently in the process of refactoring the dedupe middleware in #1222.

However, I suspect this issue will be improved by the work the WOF team is currently doing in whosonfirst-data/whosonfirst-data#1343

Deduplicating between the localadmin and locality layers is a UX question: in a lot of cases, these two concepts are different from a legal and administrative point of view but synonymous from a casual user's perspective.

We would need to choose if we want to be technically correct or user-friendly :)

@orangejulius
Member

orangejulius commented Oct 30, 2018

Here's another example of administrative area duplication:

/v1/autocomplete?boundary.country=aus&text=gungahlin,

Basically, we get a WOF neighbourhood, locality, and localadmin with the same name, plus a Geonames record of the same name. The Geonames record shows up as a venue, but it is probably an admin area that our importer classified incorrectly.

@orangejulius
Member

All of these examples have now been fixed after #1230, except for http://pelias.github.io/compare/#/v1/search%3Ftext=Boissy%20St%20L%C3%A9ger which appears to be failing because of differing diacritics. We can probably both fix that in the WOF data and add code to ignore diacritics when deduping.
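A minimal sketch of ignoring diacritics when comparing names: Unicode NFD normalization splits an accented letter such as "é" into a base letter plus a combining mark, and the marks are then stripped. `normalizeName` and `namesMatch` are hypothetical helpers, not existing Pelias functions. Lowercasing and trimming are extra assumptions. This only addresses diacritics; it would not make abbreviations such as "St" and "Saint" match.

```javascript
// Hypothetical helper: normalize a place name so diacritics (and, as an extra
// assumption, letter case and surrounding whitespace) are ignored.
function normalizeName(name) {
  return name
    .normalize("NFD") // decompose, e.g. "é" -> "e" + U+0301 combining acute accent
    .replace(/[\u0300-\u036f]/g, "") // strip the Combining Diacritical Marks block
    .toLowerCase()
    .trim();
}

function namesMatch(a, b) {
  return normalizeName(a) === normalizeName(b);
}

console.log(namesMatch("Boissy St Léger", "Boissy St Leger")); // true
console.log(namesMatch("Boissy St Léger", "Boissy-Saint-Léger")); // false
```

In a dedupe check, a comparison like this would replace a plain `===` on names.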

@bboure
Member

bboure commented Jun 22, 2020

Brussels also has several duplicates:
https://pelias.github.io/compare/#/v1/autocomplete?layers=locality&text=Bruss&debug=0

Some of them are in fact part of other localadmins, such as Dilbeek.

It seems like a WOF issue though?

4 participants