Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation for entity reference to clarify CURIE/URI type #358

Merged
merged 2 commits into from
Apr 17, 2024

Conversation

matentzn
Copy link
Collaborator

@matentzn matentzn commented Apr 8, 2024

Resolves mapping-commons/sssom-py#512

  • docs/ have been added/updated if necessary

The documentation for "entity references" is largely confusing, and leading, in its current form, to the expectation that metadata fields types with these can be either a CURIE or a full URI.

This is not quite right: in the TSV serialistion, as well as JSON, we expect the values to be CURIE-strings. This PR documents this design decision, but, as @joeflack4 puts it rightfully, this has a bit of a "smell" to it. Not sure what the right thing is here, but this documentation here is certainly better than not having it.

Copy link
Contributor

@gouttegd gouttegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specific details about the serialisations do not belong in the data model. The data model is (or must be) independent of any serialisation format.

In the model, an entity reference is an identifier. That’s it.

If a serialisation format wants to impose a specific format for the identifiers (e.g., in SSSOM/TSV, we only want CURIEs), that must be documented in the specification of that serialisation format.

Of course the problem right now is that we do not have proper specifications for the serialisation formats, so we have no place where to describe those details. That’s part of what I want to address in #330.

@matentzn
Copy link
Collaborator Author

matentzn commented Apr 8, 2024

Ok, lets replace this PR by a "see also" reference to the doc on serialisations.

That said, the current description of Entity reference is still terrible :P I will leave this in draft mode to fix it another time.

Thanks for weighing in!

@matentzn matentzn marked this pull request as draft April 8, 2024 10:07
@gouttegd
Copy link
Contributor

gouttegd commented Apr 8, 2024

Note that I can’t say when I will complete the reorganisation of the docs, so if you don’t want to wait for that, you can add a note to the TSV section of the overview (which is the best/only thing we have as a “specification” for the TSV format, for now) to explicitly state that identifiers in a SSSOM/TSV file MUST be in CURIE form.

Something like

All identifiers in a SSSOM/TSV file (both in the metadata section and in the TSV columns) MUST be in CURIE form. The use of full-length identifiers is officially not supported.

That said, the current description of Entity reference is still terrible.

Agreed. “A reference to a mapped entity” is plain wrong, actually, since EntityReference is not only used for the subject_id and object_id, but for all cases where an identifier is required -- including identifiers to other things than the entities that are being mapped.

@joeflack4
Copy link
Contributor

I think I agree @gouttegd that the documentation for the model should be serialization agnostic. However, from a user experience perspective, I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user; the serialization.

So I think I'll add a suggestion to your pull request, or it can be in a new PR, to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection.

I'm also fine with adding a "see also" linking to the serialization documentation, but with the changes I suggested above, that is less of a necessary / useful change than it otherwise would be.

@gouttegd
Copy link
Contributor

gouttegd commented Apr 8, 2024

I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user

That’s definitely something that is likely to happen right now. In fact the documentation for the data model is currently the home page for the SSSOM website, which is very unfortunate and is one of the things I want to change (issue #330; changes are underway in that branch).

to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection.

That’s the plan. Won’t happen overnight, though.

@gouttegd
Copy link
Contributor

gouttegd commented Apr 11, 2024

By the way, among the small things that need to be updated: the examples in the spec consistently use a full-length IRI for the creator_id slot, despite that slot being a EntityReference:

#creator_id: "https://orcid.org/0000-0002-7356-1779"
#curie_map: 
#  HP: "http://purl.obolibrary.org/obo/HP_"
#  MP: "http://purl.obolibrary.org/obo/MP_"
#  skos: "http://www.w3.org/2004/02/skos/core"
#license: "https://creativecommons.org/publicdomain/zero/1.0/"
#mapping_provider: "http://purl.obolibrary.org/obo/upheno.owl"
subject_id  predicate_id    object_id   mapping_justification   subject_label   object_label
HP:0009124  skos:exactMatch MP:0000003  Lexical Abnormal adipose tissue morphology  abnormal adipose tissue morphology
HP:0008551  skos:exactMatch MP:0000018  Lexical Microtia    small ears
HP:0000411  skos:exactMatch MP:0000021  Lexical Protruding ear  prominent ears

It’s bad enough for the spec never to explicitly say that full-length IRIs are not expected in the TSV format, but to give examples where full-length IRIs are actually used is actively misleading and is very bad.

Adding a see_also link to the TSV docs in the spec to the data model and removing the serialisation specific comments
@matentzn matentzn marked this pull request as ready for review April 12, 2024 11:57
@matentzn matentzn requested a review from gouttegd April 12, 2024 11:57
@matentzn matentzn requested a review from ehartley April 13, 2024 09:53
Copy link
Contributor

@ehartley ehartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good when combined with the changes in #362.

@matentzn matentzn merged commit 836e2bd into master Apr 17, 2024
3 checks passed
@matentzn matentzn deleted the matentzn-patch-1 branch April 17, 2024 10:24
@matentzn
Copy link
Collaborator Author

@ehartley @gouttegd thank you for your reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EntityReference typedef: docs & validation inconsistent
4 participants