Clarify what sort of biomedical data ARAs will be required to obtain … #5
Conversation
…from KPs or other translator resources.
@dkoslicki - these are great points, and I'd like to ask a few questions to help clarify my understanding.

For use case 1, is it fair to say that this is mostly a concern about caching? If so, then I think it's mostly orthogonal to whether ARAs can get non-KP information. If caching of KPs were allowed, then I think we would want to make a KP for that information, so that any of the ARAs could use it. One problem could be if the data being pulled is not amenable to being served via a ReasonerAPI, but I'm not sure if that's the problem here...

For use case 2, I agree that we need to be able to instantiate those aggregate graphs for all kinds of big-graph analysis. The question in my mind is whether they can be created from KPs. My hope is that they could be: the KGX tools developed by @cmungall and @deepakunni3, among others, should allow this to happen pretty easily. Or do you see a need to bring non-KP information into such a graph?
For use case 1, yes, this is “mostly” a concern about caching. Hitting an API is significantly slower than hitting a database stored in memory. While KPs could provide endpoints that allow bulk download of the data:
For use case 2: I think the aggregated graphs (from tools such as KGX) would be a great starting point, but additional (non-KP) information might be required. For example, some GNNs require graphs to have a specific format (easily done by post-processing, say, a KGX aggregated graph), but also require additional information decorating node/edge properties (specifically, numerical values related to non-KP-derived, or only partially KP-derived, training data, etc.). It would be nice to clarify whether it’s “ok” for ARAs to keep/control these modified graphs (originally derived from some Translator KP-aggregated graph(s)) as local sources of information to inform machine learning models.
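To make use case 2 concrete, here is a minimal sketch of what "decorating" a KP-aggregated graph might look like. Everything in it is an assumption for illustration only: the graph is a plain dict with `nodes` and `edges` lists in the style of a KGX JSON export, the `EX:` identifiers and feature values are made up, and `decorate_graph` is a hypothetical helper, not part of KGX or any Translator tool.

```python
from typing import Any

Graph = dict[str, list[dict[str, Any]]]


def decorate_graph(graph: Graph,
                   node_features: dict[str, list[float]],
                   default: list[float]) -> Graph:
    """Return a copy of the graph whose nodes carry a numeric feature vector.

    Hypothetical helper: the input graph (assumed to come from KP
    aggregation) is left untouched; the locally derived features live only
    on the returned copy.
    """
    nodes = [
        # Nodes without a locally derived vector fall back to `default`.
        {**node, "features": list(node_features.get(node["id"], default))}
        for node in graph["nodes"]
    ]
    # Edges are copied unchanged in this sketch.
    edges = [dict(edge) for edge in graph["edges"]]
    return {"nodes": nodes, "edges": edges}


# Made-up example data standing in for a KP-aggregated graph.
kp_graph: Graph = {
    "nodes": [
        {"id": "EX:drug1", "category": "biolink:Drug"},
        {"id": "EX:disease1", "category": "biolink:Disease"},
    ],
    "edges": [
        {"subject": "EX:drug1", "predicate": "biolink:treats",
         "object": "EX:disease1"},
    ],
}

# Made-up numeric values standing in for non-KP-derived training data.
local_features = {"EX:drug1": [0.12, 0.80]}

decorated = decorate_graph(kp_graph, local_features, default=[0.0, 0.0])
```

The point of returning a copy is that the KP-derived graph and the locally modified graph stay distinguishable, which is the kind of artifact the question above is asking about.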
@dkoslicki - did the Relay discussion help with the clarification here? I think that the outcome of that discussion was that:
Yup! I think that accurately summarizes the discussion. Though on point 2, "in the short term" may end up being longer than we think (as KP teams have their own milestones to prioritize). Might be wise to make the architecture doc just not say anything/much on this point (as I think is currently the case) until a KP/KS registry is settled on, a request/ticketing system is set up, etc.
@NCATSTranslator/architecture-core: Test: Review Pull Request.
About No. 7:
[[ARAs obtain Message nodes and edges only via KPs (or other ARAs), not from locally-cached aggregated graphs or non-Translator data sources.]]
It sounds odd that ARAs can obtain Message nodes and edges from other ARAs: to provide those, wouldn't the ARAs need to cache graphs locally? Apologies if I'm missing something.
…from KPs or other translator resources.
In particular, while obtaining Message nodes and edges from Translator resources makes complete sense imho, ARAs frequently require additional information to perform reasoning. Examples include: