Searches on cl.obo, uberon.obo fail #49

Closed
caufieldjh opened this issue Apr 29, 2022 · 3 comments

@caufieldjh
Collaborator

I'm following the tutorial, and diverged a bit to try searches in other OBO ontologies.

The following search fails:

$ runoak -i obolibrary:cl.obo search 'epithelial cell of lung'
/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/parsers/_fastobo.py:84: SyntaxWarning: source document contains incomplete creation date: 2021-11-08
  process_clause_typedef(clause, data, self.ont)
/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/parsers/_fastobo.py:84: NotImplementedWarning: cannot process `equivalent_to_chain: attaches_to part_of` macro
  process_clause_typedef(clause, data, self.ont)
Traceback (most recent call last):
  File "/home/harry/oak-tutorial/oak-env/bin/runoak", line 8, in <module>
    sys.exit(main())
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1654, in invoke
    super().invoke(ctx)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/oaklib/cli.py", line 140, in main
    settings.impl = impl_class(resource)
  File "<string>", line 6, in __init__
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/oaklib/implementations/pronto/pronto_implementation.py", line 73, in __post_init__
    ontology = Ontology.from_obo_library(resource.slug)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/ontology.py", line 206, in from_obo_library
    return cls(
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/ontology.py", line 283, in __init__
    cls(self).parse_from(_handle)  # type: ignore
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/parsers/obo.py", line 48, in parse_from
    self.symmetrize_lineage()
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/parsers/base.py", line 84, in symmetrize_lineage
    graphdata.lineage[superentity].sub.add(subentity)
KeyError: 'CARO:0000000'

Looks like a pronto parsing issue?
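For context, the last frame fails on a dictionary lookup keyed by a superclass ID. Below is a minimal sketch of that failure mode, assuming (this is not pronto's actual code) that the lineage table only holds entries for terms declared in the file, while a parent reference can point at a term from another ontology such as CARO. The names `symmetrize_strict`, `symmetrize_tolerant`, and the `EX:` identifier are hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature of an ontology: one declared term whose parent
# (CARO:0000000) is referenced but never declared in the same file.
declared_parents = {"EX:0000001": {"CARO:0000000"}}


def symmetrize_strict(parents):
    """Build child sets, assuming every parent is itself a declared term."""
    children = {term: set() for term in parents}
    for term, supers in parents.items():
        for sup in supers:
            children[sup].add(term)  # KeyError if `sup` was never declared
    return children


def symmetrize_tolerant(parents):
    """Same idea, but create entries for undeclared parents on demand."""
    children = defaultdict(set)
    for term, supers in parents.items():
        children[term]  # touch the key so childless declared terms still appear
        for sup in supers:
            children[sup].add(term)
    return dict(children)
```

Calling `symmetrize_strict(declared_parents)` raises `KeyError: 'CARO:0000000'`, while the tolerant variant records the undeclared parent instead of failing.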

Searching cl.owl instead produces numerous errors but eventually yields the desired result (CL:0000082 ! epithelial cell of lung).

I get a similar result when running

$ runoak -i obolibrary:uberon.obo search 'epithelial cell'

for Uberon, though in that case searching uberon.owl instead raises an AttributeError:

Traceback (most recent call last):
  File "/home/harry/oak-tutorial/oak-env/bin/runoak", line 8, in <module>
    sys.exit(main())
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1654, in invoke
    super().invoke(ctx)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/oaklib/cli.py", line 140, in main
    settings.impl = impl_class(resource)
  File "<string>", line 6, in __init__
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/oaklib/implementations/pronto/pronto_implementation.py", line 73, in __post_init__
    ontology = Ontology.from_obo_library(resource.slug)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/ontology.py", line 206, in from_obo_library
    return cls(
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/ontology.py", line 283, in __init__
    cls(self).parse_from(_handle)  # type: ignore
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/parsers/rdfxml.py", line 115, in parse_from
    self._extract_annotation_property(prop, curies)
  File "/home/harry/oak-tutorial/oak-env/lib/python3.9/site-packages/pronto/parsers/rdfxml.py", line 668, in _extract_annotation_property
    label = elem.find(_NS["rdfs"]["label"]).text
AttributeError: 'NoneType' object has no attribute 'text'
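
The `AttributeError` is what you would expect if an annotation property in the file has no `rdfs:label`: `find()` returns `None` and `.text` is then read from it. A minimal sketch using the standard library's `xml.etree.ElementTree`, assuming a Clark-notation tag for `rdfs:label`; the XML fragment and the `label_or_none` helper are hypothetical, and this is not pronto's code.

```python
import xml.etree.ElementTree as ET

RDFS_LABEL = "{http://www.w3.org/2000/01/rdf-schema#}label"

# Hypothetical RDF/XML fragment: an annotation property with no rdfs:label.
FRAGMENT = """\
<owl:AnnotationProperty
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:about="http://example.org/prop"/>
"""


def label_or_none(elem):
    """Return the rdfs:label text, or None when the element has no label."""
    label = elem.find(RDFS_LABEL)
    # Compare against None explicitly: an Element without children is falsy.
    return label.text if label is not None else None


elem = ET.fromstring(FRAGMENT)
print(elem.find(RDFS_LABEL))  # find() gives None for a missing child
print(label_or_none(elem))    # the guarded lookup also gives None, without raising
```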

@cmungall
Collaborator

Yes, see althonos/pronto#159 and althonos/pronto#156

Very soon it will be possible to work with local RDF/OWL files in OAK.

@cmungall
Collaborator

cmungall commented May 4, 2022

There are now a few more options that don't depend on pronto:

https://incatools.github.io/ontology-access-kit/selectors.html

ubergraph:

runoak -i ubergraph:cl search 'epithelial cell of lung' 

Warning: this is a relevance-ranked list backed by Blazegraph, so you will get a lot of partial matches after the first exact match.
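
If only the exact match is wanted, the ranked output can be filtered afterwards. A rough sketch, assuming the results arrive as `CURIE ! label` lines like the one quoted elsewhere in this thread; `exact_matches` is a hypothetical helper, not part of OAK, and the `EX:` lines are made-up stand-ins for partial matches.

```python
def exact_matches(lines, query):
    """Keep only 'CURIE ! label' lines whose label equals the query."""
    wanted = query.strip().casefold()
    hits = []
    for line in lines:
        curie, sep, label = line.partition(" ! ")
        # Skip lines without the ' ! ' separator (warnings, blank lines).
        if sep and label.strip().casefold() == wanted:
            hits.append((curie.strip(), label.strip()))
    return hits


# The first line is the result reported in this thread; the EX: lines are
# invented placeholders for lower-ranked partial matches.
ranked = [
    "CL:0000082 ! epithelial cell of lung",
    "EX:0000001 ! epithelial cell",
    "EX:0000002 ! lung",
]
print(exact_matches(ranked, "epithelial cell of lung"))
```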

If you give it an OWL file, it will use rdflib:

wget http://purl.obolibrary.org/obo/cl.owl -O /tmp/cl.owl
runoak -i /tmp/cl.owl search 'epithelial cell of lung' 

However, rdflib is quite slow on the initial parse.

You can also use a ready-made SQLite database:

semsql download cl -o cl.db
runoak -i cl.db search 'epithelial cell of lung' 

Very soon it will be possible to do this:

wget http://purl.obolibrary.org/obo/cl.owl -O /tmp/cl.owl
runoak -i sqlite:/tmp/cl.owl search 'epithelial cell of lung' 

and rdftab/relation-graph will be used behind the scenes to build the database.
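
To summarise the input forms used in this thread: some carry a `scheme:` prefix and some are plain file paths. The sketch below only illustrates that distinction; it is not OAK's own selector parsing, `split_selector` is a hypothetical helper, and the scheme list is limited to what appears above.

```python
# Schemes that appear in this thread; OAK itself may recognise others.
KNOWN_SCHEMES = {"obolibrary", "ubergraph", "sqlite"}


def split_selector(selector):
    """Split 'scheme:rest' into (scheme, rest); plain paths get scheme None."""
    scheme, sep, rest = selector.partition(":")
    if sep and scheme in KNOWN_SCHEMES:
        return scheme, rest
    return None, selector


for s in ["obolibrary:cl.obo", "ubergraph:cl", "sqlite:/tmp/cl.owl", "/tmp/cl.owl", "cl.db"]:
    print(s, "->", split_selector(s))
```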

Each implementation will handle search a little differently, but the goal is a unified search datamodel so that search behavior can be better specified by the user and explained by the provider. Comments welcome:

https://incatools.github.io/ontology-access-kit/datamodels/search/index.html

@caufieldjh
Collaborator Author

Great, local search on an OWL file works, though it is slow, as promised.

$ runoak -i /tmp/cl.owl search 'epithelial cell of lung'
WARNING:root:Using rdflib rdf/xml parser; this behavior may change in future
CL:0000082 ! epithelial cell of lung

And the same with the identical term in Uberon:

$ runoak -i /tmp/uberon.owl search 'epithelial cell of lung'
WARNING:root:Using rdflib rdf/xml parser; this behavior may change in future
CL:0000082 ! epithelial cell of lung
