Issue with the Ntriple file from Kaggle #1
Comments
Legend! Thanks for writing this out, we will try to integrate this in our pipeline so that the issue is resolved for next versions.
Hi, I noticed that the latest version available on Kaggle seems to have solved those encoding issues, thanks! The version 11 file is half the size (500M) of version 9 (1G), and I cannot find MeSH keywords in the latest version (previously defined using http://idlab.github.io/covid19#paragraphEntities). We can only find DBpedia mappings, defined using http://idlab.github.io/covid19#hasConcept. Is this expected?
Hi, @bsteenwi made some changes to the final version to reduce the size. He did indeed remove some of the relations, but I am not sure which ones exactly...
Hi, the last version of the KG does indeed miss some links. I will update the mapping scripts in this repository so it is easier to see which relations are available.
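Until the mapping scripts are updated, one way to see which relations a given version contains is to count the predicates in the N-Triples file directly. The following is a minimal sketch using only the Python standard library; the function name `count_predicates` and the `example.org` sample triples are made up for illustration, and the parsing relies on the fact that subjects and predicates in N-Triples contain no whitespace.

```python
from collections import Counter
from typing import Iterable


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count how often each predicate IRI occurs in N-Triples lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate never contain whitespace in N-Triples,
        # so the second whitespace-separated token is the predicate.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1].strip("<>")] += 1
    return counts


# Illustrative sample only (hypothetical subjects, not taken from the KG).
sample = [
    "<http://example.org/paper1> <http://idlab.github.io/covid19#hasConcept> <http://dbpedia.org/resource/Virus> .",
    "<http://example.org/paper1> <http://idlab.github.io/covid19#hasConcept> <http://dbpedia.org/resource/Fever> .",
]
predicate_counts = count_predicates(sample)

# For the real file, pass the open file object instead of the sample list:
# with open("kg.nt", encoding="utf-8") as handle:
#     predicate_counts = count_predicates(handle)
```

A predicate that was dropped from a version, such as `http://idlab.github.io/covid19#paragraphEntities`, would simply show a count of zero.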
OK, we were planning to integrate your KG with the MeSH vocabulary and complementary resources (other publication KGs about COVID, drug and pathway databases, etc.), and we are less interested in the DBpedia mappings (mainly due to data quality issues). Do you plan to make the MeSH annotations available again soon? A small note also: for MeSH URIs you are using HTTPS (e.g. https://id.nlm.nih.gov/mesh/D007251), whereas the MeSH RDF URIs use HTTP (http://id.nlm.nih.gov/mesh/D007251). Thanks!
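If the HTTPS form gets in the way of linking, the URIs can be normalized on the consumer side. This is a minimal sketch, assuming the resources being linked identify MeSH descriptors with `http://` URIs; the helper name `normalize_mesh_uri` is hypothetical.

```python
MESH_HTTPS = "https://id.nlm.nih.gov/mesh/"
MESH_HTTP = "http://id.nlm.nih.gov/mesh/"


def normalize_mesh_uri(uri: str) -> str:
    """Rewrite an https MeSH URI to its http form; leave other URIs untouched."""
    if uri.startswith(MESH_HTTPS):
        return MESH_HTTP + uri[len(MESH_HTTPS):]
    return uri


normalized = normalize_mesh_uri("https://id.nlm.nih.gov/mesh/D007251")
unchanged = normalize_mesh_uri("http://dbpedia.org/resource/Virus")
```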
First I would like to thank you for this KG and its documentation!
I tried to deploy your notebooks on my infrastructure (in JupyterLab, as the root user).
I ran into the following issues when loading the N-Triples file provided on Kaggle: https://www.kaggle.com/group16/covid19-literature-knowledge-graph
http://dbpedia.org/datatype/polishZ\u0142oty does not look like a valid URI, trying to serialize this will break.
datatype rdf:langString requires a language tag
I am not sure whether the encoding issue is due to my environment (running Ubuntu 18.04).
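To check whether such IRIs are really present in the file rather than being an artifact of the environment, the file can be scanned independently of any RDF library. The sketch below is a rough heuristic, not a validator: the names `suspicious_iris`, `LITERAL`, `IRI` and `SUSPICIOUS` are made up, and the character set is an approximation. Note that N-Triples itself allows `\uXXXX` escapes inside IRIs, so a hit only marks something worth inspecting.

```python
import re
from typing import Iterable, Iterator, Tuple

# Quoted literals are blanked out first so that "<" or ">" inside text is ignored.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
IRI = re.compile(r"<([^<>]*)>")
# Whitespace, quotes, braces, pipe, caret, backtick or backslash inside an IRI.
SUSPICIOUS = re.compile(r'[\s"{}|^`\\]')


def suspicious_iris(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, IRI) for IRIs containing suspicious characters."""
    for number, line in enumerate(lines, start=1):
        without_literals = LITERAL.sub('""', line)
        for iri in IRI.findall(without_literals):
            if SUSPICIOUS.search(iri):
                yield number, iri


# Illustrative sample only (hypothetical subjects and predicates).
sample = [
    '<http://example.org/s> <http://example.org/p> "12"^^<http://dbpedia.org/datatype/polishZ\\u0142oty> .',
    '<http://example.org/s> <http://example.org/p> "a < b > c" .',
]
hits = list(suspicious_iris(sample))
```

Running the same function over the open `kg.nt` file lists the line numbers to look at before deciding whether to convert or patch the data.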
I found a rather clean way to solve those issues:
apt-get install raptor2-utils
rapper -i ntriples -o turtle kg.nt > ugent-covid-kg.ttl
Then replace ^^rdf:langString with ^^<http://www.w3.org/2001/XMLSchema#string>
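The datatype replacement can also be done without shell escaping. This is a minimal sketch in Python, assuming the substitution is applied to the Turtle file produced by the conversion; the function names and the output file name are hypothetical, and the plain string replacement would also touch any literal text that happens to contain `^^rdf:langString`.

```python
LANG_STRING = "^^rdf:langString"
XSD_STRING = "^^<http://www.w3.org/2001/XMLSchema#string>"


def fix_datatype(line: str) -> str:
    """Swap the rdf:langString datatype for xsd:string on one line."""
    return line.replace(LANG_STRING, XSD_STRING)


def rewrite_file(src: str, dst: str) -> None:
    """Stream src to dst line by line, applying fix_datatype to each line."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(fix_datatype(line))


# Illustrative line only (hypothetical prefixes and value).
fixed = fix_datatype('ex:p1 ex:title "Fever"^^rdf:langString .')

# For the converted file (the output name is an assumption):
# rewrite_file("ugent-covid-kg.ttl", "ugent-covid-kg-fixed.ttl")
```

Streaming line by line keeps memory use flat, which matters for a file in the 500M to 1G range.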
I uploaded the notebooks to this GitHub repository and detailed the process to download the N-Triples file: https://github.com/MaastrichtU-IDS/covid-kg-notebooks/#download-data
I loaded the graph into a GraphDB triplestore; it can be browsed, and its URIs resolved, using this web browser:
http://trek.semanticscience.org/describe?uri=http://idlab.github.io/covid19#ffe663e4ef5018da41f057533520b9d85ec86e18&endpoint=https://graphdb.dumontierlab.com/repositories/covid-kg
I will add a search index and HCLS descriptive metadata soon, if you are interested.