-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update documentation for entity reference to clarify CURIE/URI type #358
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specific details about the serialisations do not belong in the data model. The data model is (or must be) independent of any serialisation format.
In the model, an entity reference is an identifier. That’s it.
If a serialisation format wants to impose a specific format for the identifiers (e.g., in SSSOM/TSV, we only want CURIEs), that must be documented in the specification of that serialisation format.
Of course the problem right now is that we do not have proper specifications for the serialisation formats, so we have no place where to describe those details. That’s part of what I want to address in #330.
Ok, lets replace this PR by a "see also" reference to the doc on serialisations. That said, the current description of Entity reference is still terrible :P I will leave this in draft mode to fix it another time. Thanks for weighing in! |
Note that I can’t say when I will complete the reorganisation of the docs, so if you don’t want to wait for that, you can add a note to the TSV section of the overview (which is the best/only thing we have as a “specification” for the TSV format, for now) to explicitly state that identifiers in a SSSOM/TSV file MUST be in CURIE form. Something like
Agreed. “A reference to a mapped entity” is plain wrong, actually, since |
I think I agree @gouttegd that the documentation for the model should be serialization agnostic. However, from a user experience perspective, I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user; the serialization. So I think I'll add a suggestion to your pull request, or it can be in a new PR, to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection. I'm also fine with adding a "see also" linking to the serialization documentation, but with the changes I suggested above, that is less of a necessary / useful change than it otherwise would be. |
That’s definitely something that is likely to happen right now. In fact the documentation for the data model is currently the home page for the SSSOM website, which is very unfortunate and is one of the things I want to change (issue #330; changes are underway in that branch).
That’s the plan. Won’t happen overnight, though. |
By the way, among the small things that need to be updated: the examples in the spec consistently use a full-length IRI for the
It’s bad enough for the spec never to explicitly say that full-length IRIs are not expected in the TSV format, but to give examples where full-length IRIs are actually used is actively misleading and is very bad. |
Adding a see_also link to the TSV docs in the spec to the data model and removing the serialisation specific comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good when combined with the changes in #362.
Resolves mapping-commons/sssom-py#512
docs/
have been added/updated if necessaryThe documentation for "entity references" is largely confusing, and leading, in its current form, to the expectation that metadata fields types with these can be either a CURIE or a full URI.
This is not quite right: in the TSV serialistion, as well as JSON, we expect the values to be CURIE-strings. This PR documents this design decision, but, as @joeflack4 puts it rightfully, this has a bit of a "smell" to it. Not sure what the right thing is here, but this documentation here is certainly better than not having it.