Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace SPARQL queries and OWLTools by a ROBOT plugin #1174

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

gouttegd
Copy link
Contributor

@gouttegd gouttegd commented Feb 3, 2025

This PR is intended to experiment using a ROBOT plugin to replace both

(1) SPARQL queries (as suggested in #1169), and

(2) OWLTools (still used in standard workflows for two things: normalising a OBO source file, and creating subsets -- #622).

For now, this is using my own “experimental ROBOT plugin”. If we are happy with the experiment, we can then create a proper ODK plugin for ROBOT (and/or push some of the features in upstream ROBOT).

Install my experimental ROBOT plugin as a built-in plugin under the name
"odk". This is for experimentation only -- I use this plugin to trial
the use of pluggable commands in the ODK workflows.

If we go on with that route, we will create a dedicated ODK plugin
later.
When preparing import modules, we do a few things:

(1) add a dc:source ontology annotation, derived from the version IRI of
the original ontology;
(2) remove all other ontology annotations, keeping only the newly added
dc:source;
(3) inject proper SubAnnotationPropertyOf axioms for properties
representing subsets and synonym types.

All those steps are currently performed by SPARQL queries. Here we
replace those queries by calls to the `odk:annotate` command, which
takes care of (1) and (2), and to the `odk:normalize` command, which
takes care of (3).

Of note, the fact that we are no longer going through a SPARQL
processing step means that we could end up with duplicated axioms with
different sets of annotations. Those were automatically merged as a
side-effect of the SPARQL processing (which involves dumping the output
of the SPARQL processing and re-parsing it again into OWLAPI objects).
Since we no longer benefit from that side-effect, we must explicitly
include a step in which we merge duplicated axioms (theoretically this
could be done with `robot repair --merge-axiom-annotations`, but
unfortunately this command does not behave exactly like we would [1]).

[1] ontodev/robot#1239
We are still using OWLTools for two things:

(1) creating ontology subsets;
(2) merging duplicated axioms in the source file.

Those tasks can now be done by the `odk:subset` command and the
`odk:normalize` command, respectively.
The inject-subset-declaration.ru and inject-synonymtype-declaration.ru
SPARQL queries are no longer used in any standard workflows.
Now that the standard workflows no longer use OWLTools, there is no
longer any need for OWLTools to be present in ODKLite (whose purpose is
to contain all the tools needed by the standard workflows, and only
those tools). We thus move it to ODKFull.
@gouttegd gouttegd marked this pull request as draft February 3, 2025 20:10
@gouttegd gouttegd self-assigned this Feb 3, 2025
@gouttegd gouttegd added this to the 1.6 milestone Feb 3, 2025
By default, the `odk:subset` does _not_ send the generated subset down
the ROBOT pipeline, unless the `--replace true` option is used.
gouttegd and others added 3 commits February 4, 2025 23:24
We add a new option in the 'robot_report' section called
'upper_ontology'. If set, it should be the (resolvable) IRI of an upper
ontology (such as http://purl.obolibrary.org/obo/cob.owl).

When set, a a new report is added to the list of ROBOT report, one that
tests whether all classes of the ontology are classified under one of
the classes of the upper ontology. The new report uses the same
parameters as the standard ROBOT reports regarding the file to perform
the check on ('report_on') and whether the check should be limited to
classes within the project's namespaces or not ('use_base_iris').

See #1175
This commit fixes several issues with the generated Makefile rule that
performs the alignment check:

* Only include the aligment report when an upper ontology is defined.
* When asked to perform the check on the -edit file, actually perform it
  on the $(SRCMERGED) file (for consistency with other reports).
* Use the reasoner defined in the project, if any.
* Fix formatting so that the generated rules are somewhat readable.
The status of the `project.robot_report.upper_ontology` field cannot
simply be tested with either 'is defined' or 'is not none', because it
will depend on whether a `robot_report` section exists at all:

* without a `robot_report` section, the field is *defined* but is None
  (default value);
* with a `robot_report` section but no `upper_ontology` field, the field
  is *not defined*.

So to cover both cases, we need to test both for the existence of the
field, and whether it is None or not.
@gouttegd gouttegd marked this pull request as ready for review February 5, 2025 16:44
@gouttegd gouttegd requested a review from matentzn February 5, 2025 16:45
Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is sooooo awesome. I have a few questions (in code) and 1 general concern.

Is there any way you would agree to move the ODK robot plugin into the INCATools org? I would feel a bit better if that component that is shaping up to be a core component of the ODK build system would live a bit more visibly here..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to be careful here of one thing: Not to delete this file in repos during some future implementation update_repo cleanup. Currently I think this is ok.

The reason, I know for a fact that this file (and the subset one) is used in at least 8 or so custom.Makefile (basically everytime someone had to customise a imports/*_import.owl goal).

@@ -290,7 +288,8 @@ all_mappings: $(MAPPING_FILES)
# ----------------------------------------

OBO_REPORT = {% for x in project.robot_report.report_on|default(["edit"]) %} {% if x=="edit" %}$(SRC){% else %}{{ x }}{% endif %}-obo-report{% endfor %}
REPORTS = $(OBO_REPORT)
ALIGNMENT_REPORT = {% for x in project.robot_report.report_on|default(["edit"]) %} {% if x =="edit" %}$(SRC){% else %}{{ x }}{% endif %}-align-report{% endfor %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?


$(REPORTDIR)/$(SRC)-align-report.tsv: $(SRCMERGED) | $(REPORTDIR) all_robot_plugins
$(ROBOT) odk:validate -i $< --reasoner $(REASONER) \
--upper-ontology-iri {{ project.robot_report.upper_ontology }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic design decision as it allows us to use version IRIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also means that the validation step will download the upper ontology whenever it is needed, which in turns mean that the validation becomes susceptible to network glitches.

I’d like to recommend that projects that want to enable validation against an upper ontology take the supplementary steps of:

  • downloading the upper ontology (in the version that they want) once and for all, and commit it somewhere in the repo;
  • add an entry in their XML catalog so that the upper ontology IRI redirects to the local file.

{% if project.namespaces is not none %}{% for iri in project.namespaces %}--base-iri {{ iri }} \
{% endfor %}{% else %}--base-iri $(URIBASE)/{{ project.id.upper() }}_ --base-iri $(URIBASE)/{{ project.id }} \
{% endif %}{% else %} \
{% endif %}--report-output $@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some additional thoughts here (what would be useful in that report), not sure if you want to discuss this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not in the dedicated issue (#1175)?

$(IMPORTDIR)/%_import.owl: {% if project.import_group.use_base_merging %}$(MIRRORDIR)/merged.owl{% else %}$(MIRRORDIR)/%.owl{% endif %} $(IMPORTDIR)/%_terms_combined.txt | all_robot_plugins
$(ROBOT) odk:annotate -i $< --remove-all true --add-source true \
extract -T $(IMPORTDIR)/$*_terms_combined.txt --force true --copy-ontology-annotations false --individuals {{ project.import_group.slme_individuals }} --method {{ project.import_group.module_type_slme }} \
odk:normalize --base-iri {{ project.uribase }} --inject-subset-declarations true --inject-synonym-declarations true --merge-axioms true \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be such a relief on memory consumption for larger ontologies! awesome!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of note, the odk:annotate could hopefully be replaced by

annotate --remove-annotations --interpolate --link-annotation dc:source %{version_iri}

if the interpolation feature makes it way into upstream ROBOT.

$(ROBOT) merge -i $< \{% if project.import_group.exclude_iri_patterns is not none %}
{% for pattern in project.import_group.exclude_iri_patterns %}remove --select "{{ pattern }}" {% endfor %} \{% endif %}
extract -T $(IMPORTDIR)/merged_terms_combined.txt --force true --copy-ontology-annotations true --individuals {{ project.import_group.slme_individuals }} --method {{ project.import_group.module_type_slme }} \
remove $(patsubst %, --term %, $(ANNOTATION_PROPERTIES)) -T $(IMPORTDIR)/merged_terms_combined.txt --select complement --select annotation-properties \
query --update ../sparql/inject-subset-declaration.ru --update ../sparql/inject-synonymtype-declaration.ru --update ../sparql/postprocess-module.ru \
odk:normalize --base-iri {{ project.uribase }} --inject-subset-declarations true --inject-synonym-declarations true --merge-axioms true \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mildly concerned of using {{ project.uribase }}; what exactly is this parameter used for in odk:normalize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as in the original inject-subset-declaration.ru.jinja2 SPARQL query:

INSERT { ?y rdfs:subPropertyOf oboInOwl:SubsetProperty . }

WHERE {
  ?x oboInOwl:inSubset  ?y .
  FILTER(isIRI(?y))
  FILTER(regex(str(?y),"^(http://purl.obolibrary.org/obo/)")
         || regex(str(?y),"^(http://www.ebi.ac.uk/efo/)")
         || regex(str(?y),"^(https://w3id.org/biolink/)")
         || regex(str(?y),"^({{ project.uribase }})"))
}

That is, it restricts the injection of the SubAnnotationPropertyOf axiom to subset properties that are in the namespace of the project (in addition to the built-namespaces http://purl.obolibrary.org/obo/, http://www.ebi.ac.uk/efo/, https://w3id.org/biolink/).

So a oboInOwl:inSubset annotation with a value of http://example.org/external/namespace, where http://example.org/external/namespace is unrelated to {{ project.uribase }}, would not cause http://example.org/external/namespace to be classified as a subproperty of oboInOwl:SubsetProperty.

This is the original behaviour, which is unchanged by the use of the plugin.

Likewise for synonym type declarations.

$(OWLTOOLS) $< --extract-ontology-subset --fill-gaps --subset $* -o [email protected] && mv [email protected] $@ &&\
$(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o [email protected] && mv [email protected] $@
$(SUBSETDIR)/%.owl: $(ONT).owl | $(SUBSETDIR) all_robot_plugins
$(ROBOT) odk:subset -i $< --subset $* --fill-gaps true --replace true \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱 WOW! That would be awesome, but...

I was a bit surprised to see this owltools use anyways because I thought this might have been replaced by robot extract -m subset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wish!

The subset method of robot extract only implements¹ the “gap spanning” mode of owltools --extract-ontology-subset (the default mode, used when the --fill-gaps option is not specified). There is no way with ROBOT to reproduce the behaviour of the “gap filling” mode – I have already raised that issue here.

More generally, OWLTools has 3 different ways of creating an “ontology subset” (that I know of – there might even be more that I have never encoutered). And of course they all behave slightly differently, because where would be the fun otherwise?

  • --extract-ontology-subset --subset <SUBSET_NAME>

Used to extract a subset defined by oboInOwl:inSubset annotations. This uses “gap spanning”, meaning basically that the subset only contains the classes defined in the subset, but if two classes are indirectly related through several relationships that involve intermediate classes that are not in the subset, that relation will be preserved (a direct relation between the two classes will be automatically fabricated).

This is what the -m subset method of robot extract does.

  • --extract-ontology-subset --subset <SUBSET_NAME> --fill-gaps

Also used to extract a subset defined by oboInOwl:inSubset annotations, but using “gap filling”, meaning basically that the initial list of classes in the subset is expanded to include any classes referred from within the subset. This creates a self-sufficient “closure”.

Not doable at all with ROBOT.

  • --reasoner-query <DL_QUERY> --make-ontology-from-results

Used to create a subset defined by a DL query, such as 'part of' some 'nervous system'. It doesn’t do any “gap filling” or “gap spanning”, but it does take care of including any object property used in the subset. It can optionally do a limited form of “gap filling”, but only for is-a relationships (including all parent classes all the way to owl:Thing), by asking the reasoner to include the ancestors that match the query (the default is to only include the equivalent and the subclasses).

Not doable at all with ROBOT.

My proposed odk:subset command implements the last two methods.

(I do wonder why the first method is the one that got implemented in ROBOT, as it seems to be the least used method. It is not used at all in any ODK standard workflow, and even Uberon, which creates literally dozens of subsets, only uses it for one subset.)


¹ Kind of. robot extract -m subset does not take care of retrieving the classes flagged as belonging to the subset, so that must be done by another step beforehand. Also, it requires an explicit list of properties to include in the subset, instead of automatically using all properties that are used by the classes of the subset.

$(ROBOT) {% if ont.is_large %}extract -i $< -T $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt --copy-ontology-annotations true --force true --method BOT \{% else %}query -i $< --update ../sparql/preprocess-module.ru \
extract -T $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt --copy-ontology-annotations true --force true --method BOT \{% endif %}
$(IMPORTDIR)/{{ ont.id }}_import.owl: {% if project.import_group.use_base_merging %}$(MIRRORDIR)/merged.owl{% else %}$(MIRRORDIR)/{{ ont.id }}.owl{% endif %} $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt | all_robot_plugins
$(ROBOT) odk:annotate -i $< --remove-all true --add-source true \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does --remove-all true do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It removes all existing ontology annotations (which is what the postprocess-module.ru query currently does).

I wrote that before realizing that the original annotate command in upstream ROBOT already has the same feature (robot annotate --remove-annotations). 🤦

@gouttegd
Copy link
Contributor Author

gouttegd commented Feb 8, 2025

Is there any way you would agree to move the ODK robot plugin into the INCATools org?

There is currently no ODK ROBOT plugin. This PR is using my experimental plugin (emphasis on experimental), which nobody should ever think about using seriously in a production pipeline. That plugin is basically my playground, where I have fun test various ideas that may or may not turn out to be good ideas.

When there will be a ROBOT plugin (using consolidated ideas from my experimental one), it could of course be hosted somewhere in the INCATools organisation. Though its Java namespaces would likely still be somewhere under org.incenp.obofoundry, since the “INCATools organisation” is merely a GitHub thing that does not have its own domain name.

@gouttegd
Copy link
Contributor Author

gouttegd commented Feb 8, 2025

There is currently no ODK ROBOT plugin.

Well, this is no longer true. :) The “ODK ROBOT plugin” exists now. Derived from my experimental plugin, but containing only the commands that are deemed useful for the ODK, not the more dubious experimental ones.

It is in my own account for now, but happy to transfer its ownership to the INCATools organisation if you prefer.

Replace my experimental ROBOT plugin by a proper ODK plugin.

This involves modifying the calls to some commands, as the consolidated
ODK plugin use slightly different options compared to the original
experimental plugin.
@gouttegd gouttegd changed the title Experiment with a ODK plugin for ROBOT Replace SPARQL queries and OWLTools by a ROBOT plugin Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants