-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace SPARQL queries and OWLTools by a ROBOT plugin #1174
base: master
Are you sure you want to change the base?
Conversation
Install my experimental ROBOT plugin as a built-in plugin under the name "odk". This is for experimentation only -- I use this plugin to trial the use of pluggable commands in the ODK workflows. If we go on with that route, we will create a dedicated ODK plugin later.
When preparing import modules, we do a few things: (1) add a dc:source ontology annotation, derived from the version IRI of the original ontology; (2) remove all other ontology annotations, keeping only the newly added dc:source; (3) inject proper SubAnnotationPropertyOf axioms for properties representing subsets and synonym types. All those steps are currently performed by SPARQL queries. Here we replace those queries by calls to the `odk:annotate` command, which takes care of (1) and (2), and to the `odk:normalize` command, which takes care of (3). Of note, the fact that we are no longer going through a SPARQL processing step means that we could end up with duplicated axioms with different sets of annotations. Those were automatically merged as a side-effect of the SPARQL processing (which involves dumping the output of the SPARQL processing and re-parsing it again into OWLAPI objects). Since we no longer benefit from that side-effect, we must explicitly include a step in which we merge duplicated axioms (theoretically this could be done with `robot repair --merge-axiom-annotations`, but unfortunately this command does not behave exactly like we would [1]). [1] ontodev/robot#1239
We are still using OWLTools for two things: (1) creating ontology subsets; (2) merging duplicated axioms in the source file. Those tasks can now be done by the `odk:subset` command and the `odk:normalize` command, respectively.
The inject-subset-declaration.ru and inject-synonymtype-declaration.ru SPARQL queries are no longer used in any standard workflows.
Now that the standard workflows no longer use OWLTools, there is no longer any need for OWLTools to be present in ODKLite (whose purpose is to contain all the tools needed by the standard workflows, and only those tools). We thus move it to ODKFull.
By default, the `odk:subset` does _not_ send the generated subset down the ROBOT pipeline, unless the `--replace true` option is used.
We add a new option in the 'robot_report' section called 'upper_ontology'. If set, it should be the (resolvable) IRI of an upper ontology (such as http://purl.obolibrary.org/obo/cob.owl). When set, a a new report is added to the list of ROBOT report, one that tests whether all classes of the ontology are classified under one of the classes of the upper ontology. The new report uses the same parameters as the standard ROBOT reports regarding the file to perform the check on ('report_on') and whether the check should be limited to classes within the project's namespaces or not ('use_base_iris'). See #1175
This commit fixes several issues with the generated Makefile rule that performs the alignment check: * Only include the aligment report when an upper ontology is defined. * When asked to perform the check on the -edit file, actually perform it on the $(SRCMERGED) file (for consistency with other reports). * Use the reasoner defined in the project, if any. * Fix formatting so that the generated rules are somewhat readable.
The status of the `project.robot_report.upper_ontology` field cannot simply be tested with either 'is defined' or 'is not none', because it will depend on whether a `robot_report` section exists at all: * without a `robot_report` section, the field is *defined* but is None (default value); * with a `robot_report` section but no `upper_ontology` field, the field is *not defined*. So to cover both cases, we need to test both for the existence of the field, and whether it is None or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is sooooo awesome. I have a few questions (in code) and 1 general concern.
Is there any way you would agree to move the ODK robot plugin into the INCATools org? I would feel a bit better if that component that is shaping up to be a core component of the ODK build system would live a bit more visibly here..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to be careful here of one thing: Not to delete this file in repos during some future implementation update_repo
cleanup. Currently I think this is ok.
The reason, I know for a fact that this file (and the subset one) is used in at least 8 or so custom.Makefile
(basically everytime someone had to customise a imports/*_import.owl
goal).
@@ -290,7 +288,8 @@ all_mappings: $(MAPPING_FILES) | |||
# ---------------------------------------- | |||
|
|||
OBO_REPORT = {% for x in project.robot_report.report_on|default(["edit"]) %} {% if x=="edit" %}$(SRC){% else %}{{ x }}{% endif %}-obo-report{% endfor %} | |||
REPORTS = $(OBO_REPORT) | |||
ALIGNMENT_REPORT = {% for x in project.robot_report.report_on|default(["edit"]) %} {% if x =="edit" %}$(SRC){% else %}{{ x }}{% endif %}-align-report{% endfor %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
|
||
$(REPORTDIR)/$(SRC)-align-report.tsv: $(SRCMERGED) | $(REPORTDIR) all_robot_plugins | ||
$(ROBOT) odk:validate -i $< --reasoner $(REASONER) \ | ||
--upper-ontology-iri {{ project.robot_report.upper_ontology }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fantastic design decision as it allows us to use version IRIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also means that the validation step will download the upper ontology whenever it is needed, which in turns mean that the validation becomes susceptible to network glitches.
I’d like to recommend that projects that want to enable validation against an upper ontology take the supplementary steps of:
- downloading the upper ontology (in the version that they want) once and for all, and commit it somewhere in the repo;
- add an entry in their XML catalog so that the upper ontology IRI redirects to the local file.
{% if project.namespaces is not none %}{% for iri in project.namespaces %}--base-iri {{ iri }} \ | ||
{% endfor %}{% else %}--base-iri $(URIBASE)/{{ project.id.upper() }}_ --base-iri $(URIBASE)/{{ project.id }} \ | ||
{% endif %}{% else %} \ | ||
{% endif %}--report-output $@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some additional thoughts here (what would be useful in that report), not sure if you want to discuss this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not in the dedicated issue (#1175)?
$(IMPORTDIR)/%_import.owl: {% if project.import_group.use_base_merging %}$(MIRRORDIR)/merged.owl{% else %}$(MIRRORDIR)/%.owl{% endif %} $(IMPORTDIR)/%_terms_combined.txt | all_robot_plugins | ||
$(ROBOT) odk:annotate -i $< --remove-all true --add-source true \ | ||
extract -T $(IMPORTDIR)/$*_terms_combined.txt --force true --copy-ontology-annotations false --individuals {{ project.import_group.slme_individuals }} --method {{ project.import_group.module_type_slme }} \ | ||
odk:normalize --base-iri {{ project.uribase }} --inject-subset-declarations true --inject-synonym-declarations true --merge-axioms true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be such a relief on memory consumption for larger ontologies! awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of note, the odk:annotate
could hopefully be replaced by
annotate --remove-annotations --interpolate --link-annotation dc:source %{version_iri}
if the interpolation feature makes it way into upstream ROBOT.
$(ROBOT) merge -i $< \{% if project.import_group.exclude_iri_patterns is not none %} | ||
{% for pattern in project.import_group.exclude_iri_patterns %}remove --select "{{ pattern }}" {% endfor %} \{% endif %} | ||
extract -T $(IMPORTDIR)/merged_terms_combined.txt --force true --copy-ontology-annotations true --individuals {{ project.import_group.slme_individuals }} --method {{ project.import_group.module_type_slme }} \ | ||
remove $(patsubst %, --term %, $(ANNOTATION_PROPERTIES)) -T $(IMPORTDIR)/merged_terms_combined.txt --select complement --select annotation-properties \ | ||
query --update ../sparql/inject-subset-declaration.ru --update ../sparql/inject-synonymtype-declaration.ru --update ../sparql/postprocess-module.ru \ | ||
odk:normalize --base-iri {{ project.uribase }} --inject-subset-declarations true --inject-synonym-declarations true --merge-axioms true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mildly concerned of using {{ project.uribase }}
; what exactly is this parameter used for in odk:normalize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing as in the original inject-subset-declaration.ru.jinja2
SPARQL query:
INSERT { ?y rdfs:subPropertyOf oboInOwl:SubsetProperty . }
WHERE {
?x oboInOwl:inSubset ?y .
FILTER(isIRI(?y))
FILTER(regex(str(?y),"^(http://purl.obolibrary.org/obo/)")
|| regex(str(?y),"^(http://www.ebi.ac.uk/efo/)")
|| regex(str(?y),"^(https://w3id.org/biolink/)")
|| regex(str(?y),"^({{ project.uribase }})"))
}
That is, it restricts the injection of the SubAnnotationPropertyOf
axiom to subset properties that are in the namespace of the project (in addition to the built-namespaces http://purl.obolibrary.org/obo/
, http://www.ebi.ac.uk/efo/
, https://w3id.org/biolink/
).
So a oboInOwl:inSubset
annotation with a value of http://example.org/external/namespace
, where http://example.org/external/namespace
is unrelated to {{ project.uribase }}
, would not cause http://example.org/external/namespace
to be classified as a subproperty of oboInOwl:SubsetProperty
.
This is the original behaviour, which is unchanged by the use of the plugin.
Likewise for synonym type declarations.
$(OWLTOOLS) $< --extract-ontology-subset --fill-gaps --subset $* -o [email protected] && mv [email protected] $@ &&\ | ||
$(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o [email protected] && mv [email protected] $@ | ||
$(SUBSETDIR)/%.owl: $(ONT).owl | $(SUBSETDIR) all_robot_plugins | ||
$(ROBOT) odk:subset -i $< --subset $* --fill-gaps true --replace true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱 WOW! That would be awesome, but...
I was a bit surprised to see this owltools use anyways because I thought this might have been replaced by robot extract -m subset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wish!
The subset
method of robot extract
only implements¹ the “gap spanning” mode of owltools --extract-ontology-subset
(the default mode, used when the --fill-gaps
option is not specified). There is no way with ROBOT to reproduce the behaviour of the “gap filling” mode – I have already raised that issue here.
More generally, OWLTools has 3 different ways of creating an “ontology subset” (that I know of – there might even be more that I have never encoutered). And of course they all behave slightly differently, because where would be the fun otherwise?
--extract-ontology-subset --subset <SUBSET_NAME>
Used to extract a subset defined by oboInOwl:inSubset
annotations. This uses “gap spanning”, meaning basically that the subset only contains the classes defined in the subset, but if two classes are indirectly related through several relationships that involve intermediate classes that are not in the subset, that relation will be preserved (a direct relation between the two classes will be automatically fabricated).
This is what the -m subset
method of robot extract
does.
--extract-ontology-subset --subset <SUBSET_NAME> --fill-gaps
Also used to extract a subset defined by oboInOwl:inSubset
annotations, but using “gap filling”, meaning basically that the initial list of classes in the subset is expanded to include any classes referred from within the subset. This creates a self-sufficient “closure”.
Not doable at all with ROBOT.
--reasoner-query <DL_QUERY> --make-ontology-from-results
Used to create a subset defined by a DL query, such as 'part of' some 'nervous system'
. It doesn’t do any “gap filling” or “gap spanning”, but it does take care of including any object property used in the subset. It can optionally do a limited form of “gap filling”, but only for is-a relationships (including all parent classes all the way to owl:Thing
), by asking the reasoner to include the ancestors that match the query (the default is to only include the equivalent and the subclasses).
Not doable at all with ROBOT.
My proposed odk:subset
command implements the last two methods.
(I do wonder why the first method is the one that got implemented in ROBOT, as it seems to be the least used method. It is not used at all in any ODK standard workflow, and even Uberon, which creates literally dozens of subsets, only uses it for one subset.)
¹ Kind of. robot extract -m subset
does not take care of retrieving the classes flagged as belonging to the subset, so that must be done by another step beforehand. Also, it requires an explicit list of properties to include in the subset, instead of automatically using all properties that are used by the classes of the subset.
$(ROBOT) {% if ont.is_large %}extract -i $< -T $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt --copy-ontology-annotations true --force true --method BOT \{% else %}query -i $< --update ../sparql/preprocess-module.ru \ | ||
extract -T $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt --copy-ontology-annotations true --force true --method BOT \{% endif %} | ||
$(IMPORTDIR)/{{ ont.id }}_import.owl: {% if project.import_group.use_base_merging %}$(MIRRORDIR)/merged.owl{% else %}$(MIRRORDIR)/{{ ont.id }}.owl{% endif %} $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt | all_robot_plugins | ||
$(ROBOT) odk:annotate -i $< --remove-all true --add-source true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does --remove-all true
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It removes all existing ontology annotations (which is what the postprocess-module.ru
query currently does).
I wrote that before realizing that the original annotate
command in upstream ROBOT already has the same feature (robot annotate --remove-annotations
). 🤦
There is currently no ODK ROBOT plugin. This PR is using my experimental plugin (emphasis on experimental), which nobody should ever think about using seriously in a production pipeline. That plugin is basically my playground, where I When there will be a ROBOT plugin (using consolidated ideas from my experimental one), it could of course be hosted somewhere in the INCATools organisation. Though its Java namespaces would likely still be somewhere under |
Well, this is no longer true. :) The “ODK ROBOT plugin” exists now. Derived from my experimental plugin, but containing only the commands that are deemed useful for the ODK, not the more dubious experimental ones. It is in my own account for now, but happy to transfer its ownership to the INCATools organisation if you prefer. |
Replace my experimental ROBOT plugin by a proper ODK plugin. This involves modifying the calls to some commands, as the consolidated ODK plugin use slightly different options compared to the original experimental plugin.
This PR is intended to experiment using a ROBOT plugin to replace both
(1) SPARQL queries (as suggested in #1169), and
(2) OWLTools (still used in standard workflows for two things: normalising a OBO source file, and creating subsets -- #622).
For now, this is using my own “experimental ROBOT plugin”. If we are happy with the experiment, we can then create a proper ODK plugin for ROBOT (and/or push some of the features in upstream ROBOT).