Apertium RDF - duplicated entries #8

Open
jogracia opened this issue Jul 24, 2020 · 1 comment

@jogracia

The following query retrieves information about "abrupt"@en. It returns (wrongly, I guess) two different URIs for the corresponding lexical entry:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX vartrans: <http://www.w3.org/ns/lemon/vartrans#>
PREFIX lime: <http://www.w3.org/ns/lemon/lime#>
PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?lex_entry ?pos ?source ?lexicon ?language ?written_rep
FROM <http://linguistic.linkeddata.es/id/apertium-lexinfo/>
WHERE {
   ?lex_entry ontolex:lexicalForm ?lemon_form ;
      lexinfo:partOfSpeech ?pos ;
      dc:source ?source .

   ?lemon_form ontolex:writtenRep "abrupt"@en ;
      ontolex:writtenRep ?written_rep .

   ?lexicon lime:entry ?lex_entry ;
      lime:language ?language  .
   
}

Result:

| | lex_entry | pos | source | lexicon | language | written_rep |
|---|---|---|---|---|---|---|
| 1 | http://linguistic.linkeddata.es/id/apertium/lexiconEN/abrupt-en | lexinfo:adjective | https://github.com/apertium/apertium-trunk.git | http://linguistic.linkeddata.es/id/apertium/lexiconEN | "en" | "abrupt"@en |
| 2 | http://linguistic.linkeddata.es/id/apertium/lexiconEN/abrupt-adj-en | lexinfo:adjective | https://github.com/apertium/apertium-trunk.git | http://linguistic.linkeddata.es/id/apertium/lexiconEN | "en" | "abrupt"@en |

Note the two URIs that represent the same entity:
http://linguistic.linkeddata.es/id/apertium/lexiconEN/abrupt-en
http://linguistic.linkeddata.es/id/apertium/lexiconEN/abrupt-adj-en
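
To check whether other entries are affected, the result rows could be grouped by lexicon, written representation and part of speech, flagging any group with more than one entry URI. The following is only a minimal sketch: the rows are copied from the result above, and `ROWS` and `find_duplicates` are hypothetical names that are not part of the Apertium RDF tooling.

```python
from collections import defaultdict

# Rows copied from the query result above:
# (lex_entry, pos, lexicon, written_rep)
ROWS = [
    ("http://linguistic.linkeddata.es/id/apertium/lexiconEN/abrupt-en",
     "lexinfo:adjective",
     "http://linguistic.linkeddata.es/id/apertium/lexiconEN",
     "abrupt@en"),
    ("http://linguistic.linkeddata.es/id/apertium/lexiconEN/abrupt-adj-en",
     "lexinfo:adjective",
     "http://linguistic.linkeddata.es/id/apertium/lexiconEN",
     "abrupt@en"),
]


def find_duplicates(rows):
    """Return groups of entry URIs that share lexicon, written form and POS."""
    groups = defaultdict(set)
    for lex_entry, pos, lexicon, written_rep in rows:
        groups[(lexicon, written_rep, pos)].add(lex_entry)
    # Only groups with more than one distinct URI are suspicious.
    return {key: sorted(uris) for key, uris in groups.items() if len(uris) > 1}


duplicates = find_duplicates(ROWS)
for (lexicon, written_rep, pos), uris in duplicates.items():
    print(f"{written_rep} ({pos}) in {lexicon}:")
    for uri in uris:
        print("   ", uri)
```

With real data, the rows would come from running the query without the `"abrupt"@en` restriction.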

Why is this happening?

@jogracia
Author

Adding here a preliminary answer by Max Ionov (22/5/20):

This is not so much about duplicate URIs as about the same word coming from different lexicons. More specifically, the first “abrupt” comes from LexiconEN from either the EN-ES or EN-KK dictionaries. The second one comes from EN-CA, and it’s connected to that dictionary’s strange tagset.

This problem actually unearths several issues (which were already known): (a) converting the part-of-speech part of URIs to UD or another standardised tagset and (b) providing metadata about the dictionary from which each lexicon comes.
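
As an illustration of point (a), one option would be to build every entry URI from the lemma, a UD part-of-speech tag and the language code, so that entries coming from different source dictionaries collapse to the same URI. This is only a sketch under stated assumptions: the `lemma-UPOS-lang` pattern, the `LEXINFO_TO_UPOS` table and `canonical_entry_uri` are hypothetical and do not describe the scheme actually used by Apertium RDF.

```python
from urllib.parse import quote

# Hypothetical, partial mapping from LexInfo part-of-speech values to UD tags.
LEXINFO_TO_UPOS = {
    "lexinfo:adjective": "ADJ",
    "lexinfo:noun": "NOUN",
    "lexinfo:verb": "VERB",
}


def canonical_entry_uri(lexicon_uri, lemma, pos, lang):
    """Build an entry URI of the assumed form <lexicon>/<lemma>-<UPOS>-<lang>."""
    upos = LEXINFO_TO_UPOS[pos]  # raises KeyError for unmapped tags
    return f"{lexicon_uri}/{quote(lemma)}-{upos}-{lang}"


LEXICON_EN = "http://linguistic.linkeddata.es/id/apertium/lexiconEN"

# Both source entries ("abrupt-en" and "abrupt-adj-en") describe the same
# lemma, part of speech and language, so they map to a single URI.
uri = canonical_entry_uri(LEXICON_EN, "abrupt", "lexinfo:adjective", "en")
print(uri)
```

Point (b) would still need to be handled separately, for example by recording the source dictionary as metadata on the lexicon rather than encoding it in the URI.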
