Example Query Model Structures for discussion #32

Open
colinveal opened this issue Jul 26, 2018 · 7 comments

Comments

@colinveal
Contributor

colinveal commented Jul 26, 2018

Here are the examples from the teleconference:
Example 1
Example 2

I believe we were leaning towards the hierarchical model with external logic:

Model 1

@harindra-a
Contributor

harindra-a commented Jul 26, 2018

Great, thanks for sharing this, @colinveal!

@Relequestual and @fschiettecatte, we reached consensus on this yesterday, right? How about we ask Colin to add this directly to the draft (or have Colin and Ben meet for a 1-1 teleconference to do it jointly) to keep things moving, as we are pretty close to the v0.1.0 freeze?

@fschiettecatte
Contributor

Fine with me.

@Relequestual
Member

Relequestual commented Aug 1, 2018

Thanks for these @colinveal
It looks like we've reached an updated consensus at ga4gh-discovery/data-connect#9 (comment)

Are you happy to close this issue in favour of ga4gh-discovery/data-connect#9 ?

The gists should stay around for reference.

@colinveal
Contributor Author

colinveal commented Aug 2, 2018

Hi, I've updated models 1 (hierarchical) and 2 (non-hierarchical) with JSON Pointers.
model 1
model 2

@Relequestual
Member

Looking at these updated examples, it looks like this issue is still discussing whether we need hierarchical components, not how the components are referenced (which I believe we have now agreed on).

Could you explain in pseudo-logic the query you're expecting, please?


Following from my previous experience with the MME API specification, and how we and others represent variants in our database, subjectvariant from your example would encompass allele, zygosity, and pathogenicity, but not phenotype as you have shown in your first model example.

So, in terms of a subjectvariant component, we're pretty close to agreeing, I think.

I would say gender comes under a subject component, as it's information about the subject; that component would probably include some identifier too. This still needs to be ironed out.

For disease, I don't think we intend to support free text at the API layer. OMIM codes only for now. Given that OMIM isn't a disease ontology, text-based matching could be useful, but you could equally provide that ability by including all the terms that contain the specific phrase your user inputs.
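
As a rough illustration of that last point, here is a minimal sketch of a client expanding a user's phrase into codes before calling the API. The catalogue, its placeholder codes, and the function name `codes_for_phrase` are all hypothetical; nothing here comes from the draft specification or from real OMIM data.

```python
# Hypothetical catalogue mapping disease codes to labels.
# The codes are placeholders, not real OMIM entries.
CATALOGUE = {
    "OMIM:PLACEHOLDER-1": "Alzheimer disease, early onset",
    "OMIM:PLACEHOLDER-2": "Alzheimer disease, late onset",
    "OMIM:PLACEHOLDER-3": "Frontotemporal dementia",
}


def codes_for_phrase(phrase, catalogue=CATALOGUE):
    """Return every code whose label contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    return sorted(
        code for code, label in catalogue.items() if needle in label.casefold()
    )


# The client would then send these codes, not the free text, to the API.
print(codes_for_phrase("alzheimer"))
```

The API itself stays code-only under this approach; the text matching lives entirely in the client.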

For phenotype, I feel that for now we should specify that a term matches up or down the tree, apart from the second-level generic terms (as in, not children of HP:0000118). HPO only for now.
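
One possible reading of "matches up or down the tree" is sketched below. Apart from HP:0000118, every term ID is a made-up `TOY:` placeholder rather than a real HPO term, and the rule that an exact match on a generic term still counts is an assumption, not something agreed in this thread.

```python
ROOT = "HP:0000118"

# Toy child -> parents map. Only ROOT is a real HPO ID; the rest are placeholders.
PARENTS = {
    "TOY:nervous": {ROOT},
    "TOY:skeletal": {ROOT},
    "TOY:cognitive": {"TOY:nervous"},
    "TOY:dementia": {"TOY:cognitive"},
    "TOY:fracture": {"TOY:skeletal"},
}


def ancestors(term):
    """Collect every ancestor of a term by walking the parent links."""
    seen = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def is_generic(term):
    """True for the root and for its direct children (the second-level terms)."""
    return term == ROOT or ROOT in PARENTS.get(term, ())


def phenotype_matches(query_term, subject_term):
    """Match when one term is an ancestor of the other, never via generic terms."""
    if query_term == subject_term:
        return True  # assumption: an exact match always counts
    if is_generic(query_term) or is_generic(subject_term):
        return False
    return query_term in ancestors(subject_term) or subject_term in ancestors(query_term)


print(phenotype_matches("TOY:cognitive", "TOY:dementia"))  # down the tree
print(phenotype_matches("TOY:nervous", "TOY:dementia"))    # blocked: generic term
```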

In "model 2", data is split into components that are too granular, so they lose their meaning and context. For example, value or allele should not be a component on its own. Components should have meaning on their own.

@Relequestual
Member

To summarise my previous comment, I want to combine fields into components which represent different contexts, removing ambiguity. There may be components which have similar fields but different contextual meaning. I find that a preferable solution to nesting components.

I'm even unhappy about the potential to assign different meanings to components based on how they are combined, which seems to be the suggestion you're putting forward here.

@colinveal
Contributor Author

The query is:
A subject that has (heterozygous allele 'A' at variant rs123, where variant rs123 is pathogenic for dementia and variant rs123 has a relationship with MMSE, AND the subject has dementia or Alzheimer's disease) OR (the subject has homozygous allele 'C' at variant rs124 AND the subject has Alzheimer's disease or MMSE > 20).
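
To make the structure of this query concrete, here is a minimal sketch of it as flat components plus an external logic tree whose leaves are JSON Pointers. The field names (`pathogenic_for`, `related_to`, `logic`, and so on) and the subject record layout are assumptions for illustration, not the shape used in the gists. The second clause is read as rs124 AND (Alzheimer's disease OR MMSE > 20), which is also an assumption.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

query = {
    "components": {
        "rs123": {"type": "subjectvariant", "variant": "rs123", "allele": "A",
                  "zygosity": "heterozygous", "pathogenic_for": "dementia",
                  "related_to": "MMSE"},
        "rs124": {"type": "subjectvariant", "variant": "rs124", "allele": "C",
                  "zygosity": "homozygous"},
        "dementia": {"type": "disease", "name": "dementia"},
        "alzheimers": {"type": "disease", "name": "Alzheimer's disease"},
        "mmse": {"type": "measurement", "name": "MMSE", "operator": ">", "value": 20},
    },
    "logic": {"or": [
        {"and": ["/components/rs123",
                 {"or": ["/components/dementia", "/components/alzheimers"]}]},
        {"and": ["/components/rs124",
                 {"or": ["/components/alzheimers", "/components/mmse"]}]},
    ]},
}


def resolve(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    node = doc
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def component_matches(component, subject):
    """True if any subject record of the component's type satisfies it."""
    kind = component["type"]
    for record in subject.get(kind, []):
        if kind == "measurement":
            compare = OPS[component["operator"]]
            if record["name"] == component["name"] and compare(record["value"], component["value"]):
                return True
        elif all(record.get(k) == v for k, v in component.items() if k != "type"):
            return True
    return False


def evaluate(node, query, subject):
    """Evaluate the logic tree; string leaves are pointers to components."""
    if isinstance(node, str):
        return component_matches(resolve(query, node), subject)
    op, children = next(iter(node.items()))
    results = [evaluate(child, query, subject) for child in children]
    return all(results) if op == "and" else any(results)


subject = {
    "subjectvariant": [{"variant": "rs124", "allele": "C", "zygosity": "homozygous"}],
    "disease": [],
    "measurement": [{"name": "MMSE", "value": 24}],
}
print(evaluate(query["logic"], query, subject))  # matches via the second clause
```

Here the components stay flat and self-describing, and all of the AND/OR nesting lives in the separate `logic` tree.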

I can see how we can use fields to replicate a lot of the hierarchy within a component; however, to replicate the complexity available using hierarchies, some components could need a lot of fields. Also, where qualifiers are required for multiple fields (e.g. 'ontology', 'operator', 'value', 'source', 'unit'), they would need distinct names to show which field they apply to, which also increases the overall number of fields in a component.

I feel there should be a way for us to take advantage of both approaches.
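
The naming concern can be sketched as follows: flattening a nested structure forces each shared qualifier to be prefixed with the field it qualifies. The nested layout and the `flatten` helper are hypothetical and only illustrate how the field count grows.

```python
def flatten(component, prefix=""):
    """Flatten nested dicts into one level, prefixing keys with their parent's name."""
    flat = {}
    for key, value in component.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested form: each sub-object reuses the same qualifier names.
nested = {
    "disease": {"value": "dementia", "operator": "=", "source": "OMIM"},
    "measurement": {"value": 20, "operator": ">", "unit": "points"},
}

flat = flatten(nested)
for name in sorted(flat):
    print(name, "=", flat[name])
```

Two sub-objects with three qualifiers each become six distinctly named fields, and the count grows with every additional qualified field.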


4 participants